Weighting Ripley’s K-Function to Account for the Firm Dimension in the Analysis of Spatial Concentration

Abstract

The spatial concentration of firms has long been a central issue in economics, from both the theoretical and the applied points of view, mainly because of its important policy implications. A popular approach to its measurement, which does not suffer from the arbitrariness of regional boundaries, makes use of micro data and treats firms as if they were dimensionless points distributed in the economic space. In practice, however, the points (firms) observed in the economic space are far from dimensionless: they differ in size in terms of number of employees, product, capital, and so on. The works that originally introduced this approach disregard differences in firm size and ignore the fact that a high degree of spatial concentration may result either from many small points clustering in definite portions of space or from only a few large points clustering together (e.g., a few large firms). We refer to these two phenomena as clustering of firms and clustering of economic activities, respectively. The present article tackles this problem by adapting the popular K-function to account for the point dimension using the framework of marked point process theory.

Keywords

agglomeration, marked point processes, spatial clusters, spatial econometrics

Introduction

Spatial economics theories show that economic integration may boost spatial concentration of economic activities and industrial specialization both at a regional and at an international level (Bickenbach and Bode 2008). Furthermore, due to the external increasing returns driven by the spatial concentration, the core regions (where spatial clusters of firms are more likely to occur) may reach higher levels of economic growth than the peripheral regions (see Krugman 1991; Fujita, Krugman, and Venables 1999 among others). As a consequence, the phenomenon of spatial concentration is of paramount importance to explain the determinants of growth and development on one hand and regional disparities on the other hand.

Fostered by the centrality of these issues from both the theoretical and the practical points of view, a variety of empirical studies have tried to develop proper indices and statistical tests to measure the degree of spatial clustering in real industrial situations. In this respect, a series of recent articles (Arbia, Espa, and Quah 2008; Arbia et al. 2010; Marcon and Puech 2003, 2010; Duranton and Overman 2005, 2008) have introduced the use of distance-based methods. These methods are more robust than the traditional measures of spatial concentration (such as the Gini index [Gini 1912, 1921] or the Ellison-Glaeser index [Ellison and Glaeser 1997]), which make use of regional aggregates and thus depend on the arbitrary definition of the spatial units. The distance-based methods, conversely, make use of microeconomic data, treating each firm as a point on a map and studying the spatial distribution of firms with methods borrowed from so-called point pattern analysis (Diggle 2003).

In many empirical circumstances where the presence of spatial clusters of firms is tested using micro-geographical data, an important element to be taken into account is represented by the firm dimension.

Indeed, a high level of spatial concentration can be due to two very different phenomena (see Figure 1). Namely,

Figure 1.

Two extreme paradigmatic situations of spatial concentration. Case 1. Clustering of firms; Case 2. Clustering of economic activities.

Case 1: many small firms clustering in space, and

Case 2: few large firms (in the limit just one firm) clustering in space.

We can refer to the first case as the case of clustering of firms and to the second as the case of clustering of economic activities.

A proper test for the presence of spatial clusters should thus consider the impact of the firm dimension on industrial agglomeration by clearly distinguishing these two cases.

In this respect, Marcon and Puech (2010) and Duranton and Overman (2005) have extended the use of Ripley’s K-function (Ripley 1977) to account for firm size, treating it as a weight attached to each of the points constituting the pattern. Both articles developed relative measures of spatial concentration, detecting the extra concentration of firms belonging to a specific industry with respect to the distribution of firms of the whole economy. Following this procedure, a positive (or negative) spatial dependence between firms is detected when the pattern of a specific sector is more aggregated (or more dispersed) than the pattern of the whole economy. Although measures of relative spatial concentration are very useful in controlling for the idiosyncratic characteristics of the territories under study, they do not allow comparisons across different economies (see Haaland et al. 1999; Mori et al. 2005 for a more detailed discussion).

In this article, we propose a similar extension of Ripley’s K-function which leads to an absolute (rather than a relative) measure of the industrial agglomeration and which allows comparability among different empirical situations. More specifically, referring to the theory of marked point processes, we develop a stochastic mechanism that generates weighted point patterns of firms representing stylized facts of the different phenomena occurring in real cases (essentially spatial randomness or spatial concentration in the sense indicated in case 1 or case 2 above). The values assumed by the proposed measure in the various cases constitute the benchmark that allows us to formally test the departure from spatial randomness.

We present our new approach along the following lines. In the section on Measuring the Spatial Concentration of Firms Disregarding Size: The Basic K-Function, we briefly discuss the classical Ripley’s K-function, which represents the starting point for developing more sophisticated measures of spatial concentration. The section on Measuring the Spatial Concentration of Firms Considering Size: The Mark-Weighted K-Function is devoted to introducing the stochastic mechanism, based on marked point process theory, that allows us to develop a test for the presence of absolute spatial concentration of firms and economic activities. In this section, we introduce the new model, discuss the meaning of the model’s parameters in the context of spatial concentration of firms and economic activities, and present some simulation results to better illustrate how the model works in practice. The section on A Case Study: The Localization of the High- and Medium-High-Technology Manufacturing Industry in the Metropolitan Areas of Milan and Turin consists of an empirical application of this model to the study of the spatial distribution of the high- and medium-high-technology manufacturing industry in the metropolitan areas of Milan and Turin. The final section comprises a discussion of the results, some conclusions, and directions for further studies in the field.

Measuring the Spatial Concentration of Firms Disregarding Size: The Basic K-Function

It is probably fair to say that Ripley’s K-function (Ripley 1976, 1977) is currently the most popular distance-based measure to summarize the cumulative characteristics of a spatial distribution of events in the context of microgeographic data. It has indeed proved a very versatile tool to test for the presence of spatial concentration within a stationary point pattern where each event is considered as a dimensionless point. As a consequence, the K-function has been largely applied in various fields such as geography, ecology, epidemiology, and, more recently, economics (see Arbia and Espa 1996; Marcon and Puech 2003).

The K-function is defined as follows:

K (d) = λ^{-1} E \{\text{number of points falling at a distance} \leq d \text{ from an arbitrary point}\},

with $E \{\cdot\}$ indicating the expectation operator and $λ$ representing the mean number of events per unit area, a parameter called intensity. Therefore, $λ K (d)$ can be interpreted as the expected number of further points within a distance d of an arbitrary point of the process (Ripley 1977). In case of a homogeneous field (where the probability of hosting a point is constant across the study area), the K-function quantifies the level of spatial dependence between points at each distance d.

In order to develop a test for the presence of absolute spatial concentration, we can rely on the fact that for many stochastic processes, it is possible to compute the expectation in the right-hand side of equation (1), so that $K (d)$ can be written in a closed form (Dixon 2002). A point process generating a spatial distribution of events completely at random (i.e., points are distributed uniformly and independently on space) is the so-called homogeneous Poisson process. It can be shown that if a point pattern is a realization of a homogeneous Poisson process, then $K (d)$ equals $π d^{2}$ (see Diggle 2003). Therefore

K (d) = π d^{2}, d > 0,

represents the null hypothesis of random location of events. Significant departures from this benchmark value represent the alternative hypothesis of spatial dependence. More precisely, for $K (d) > π d^{2}$, we have positive dependence and hence clustering (where points tend to attract each other); for $K (d) < π d^{2}$, we have negative dependence and hence inhibition (where points tend conversely to repel each other). Therefore, to formally test whether the observed points tend to cluster in space we can verify whether, for some d, $K (d)$ is significantly greater than $π d^{2}$. Critical values can be computed by Monte Carlo simulation of homogeneous Poisson processes (see Besag and Diggle 1977).
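The comparison between an estimated K-function and the benchmark $π d^{2}$ can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the estimator below ignores edge effects, and the function name `ripley_k`, the sample sizes, and the two toy patterns are hypothetical choices made here, not part of the original analysis.

```python
import math
import random


def ripley_k(points, d, area):
    """Naive estimate of K(d): ordered pairs within distance d, divided by n * lambda_hat.

    No edge correction is applied (an assumption made to keep the sketch short).
    """
    n = len(points)
    pairs = 0
    for i in range(n):
        xi, yi = points[i]
        for j in range(n):
            if i != j and math.hypot(xi - points[j][0], yi - points[j][1]) <= d:
                pairs += 1
    lam_hat = n / area  # estimated intensity
    return pairs / (lam_hat * n)


rng = random.Random(0)

# Pattern 1: points scattered uniformly and independently on the unit square.
csr = [(rng.random(), rng.random()) for _ in range(300)]

# Pattern 2: the same number of points squeezed into a small square of side 0.1.
clustered = [(0.45 + 0.1 * rng.random(), 0.45 + 0.1 * rng.random()) for _ in range(300)]

d = 0.05
benchmark = math.pi * d ** 2  # value of K(d) under complete spatial randomness
k_csr = ripley_k(csr, d, 1.0)
k_clustered = ripley_k(clustered, d, 1.0)

print("benchmark:", benchmark)
print("random pattern:", k_csr)
print("clustered pattern:", k_clustered)
```

For the random pattern the estimate should stay close to the benchmark (slightly below it, because pairs near the boundary are undercounted), whereas the clustered pattern should lie far above it.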

The test for the presence of absolute concentration based on Ripley’s K-function, however, can be used to detect industrial agglomeration only if firms can be considered to have the same dimension. Indeed, in a context where economic activities differ in dimension, with small, medium, and large firms present, a point pattern is not a good representation of the location pattern of economic activities and, as a result, the K-function is no longer a proper tool to summarize the spatial distribution. For instance, the simple K-function cannot recognize a situation like the one reported in Figure 1 as Case 2 as a cluster. In other words, the test does not control for the overall agglomeration of manufacturing (Duranton and Overman 2005).

In such a context, in order to define a proper test, we need to refer to the concepts and methods of marked point process statistics, a branch of spatial statistics devoted to analyzing sets of events scattered in space, where each event is defined not only by its spatial location but also by a mark, a supplementary set of information that might be either quantitative or qualitative (Illian et al. 2008).

Measuring the Spatial Concentration of Firms Considering Size: The Mark-Weighted K-Function

The Mark-Weighted K-Function

The mark-weighted K-function, indicated as $K_{m m} (d)$ , is an explorative tool proposed by Penttinen (2006) to summarize the cumulative characteristics of a homogeneous quantitative marked point pattern (i.e., a pattern where a quantitative mark is attached on each point). It has been proposed as a natural generalization of Ripley’s K-function. In order to introduce it, let us first rewrite the classical K-function as:

K (d) = E [\sum_{i = 1}^{N} \sum_{j \neq i} I (d_{i j} \leq d)] / λ,

where the term $d_{i j}$ is the Euclidean distance between the ith and jth arbitrary points, N is the number of points the underlying point process generates, and $I (d_{i j} \leq d)$ represents the indicator function such that I = 1 if $d_{i j} \leq d$ and 0 otherwise. Following this notation, the mark-weighted K-function has a similar form but the marks are now taken into account:

K_{m m} (d) = E [\sum_{i = 1}^{N} \sum_{j \neq i} m_{i} m_{j} I (d_{i j} \leq d)] / (λ μ^{2}) .

In equation (2), $m_{i}$ and $m_{j}$ are the marks attached to the ith and jth points, respectively, and $μ$ is the mean of the marks. Thus, the term $λ μ^{2} K_{m m} (d)$ can be interpreted as the mean of the sum of the products formed by the mark of the ith arbitrary point and the marks of all other points in the circle of radius d centered on it (Illian et al. 2008). Therefore, the mark-weighted K-function measures the joint cumulative distribution of marks and points at each distance d.

Turning now to estimation, following Penttinen (2006), an edge-corrected, approximately unbiased estimator of $K_{m m} (d)$ for a marked point pattern with n observations is

{\hat{K}}_{m m} (d) = (\sum_{i = 1}^{n} \sum_{j \neq i} m_{i} m_{j} w_{i j} I (d_{i j} \leq d)) / (n \hat{λ} {\hat{μ}}^{2}),

where $\hat{λ} = n / |A|$ is the estimated spatial intensity, $|A|$ is the area of the study region, and $\hat{μ}$ is the mean of the observed marks. Due to the presence of edge effects arising from the arbitrariness of the boundaries of the study region, the adjustment factor $w_{i j}$ is introduced, thus avoiding potential biases in the estimates in proximity to the boundaries of the study region. More precisely, the weight function $w_{i j}$ expresses the reciprocal of the proportion of the area of a circle centered on the ith point, passing through the jth point, which lies within the study region A (Boots and Getis 1988).
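To make the estimator concrete, the following Python sketch computes ${\hat{K}}_{m m} (d)$ on a toy pattern. It assumes $w_{i j} = 1$ for every pair, that is, no edge correction; the function name `mark_weighted_k` and the four-point example are hypothetical and serve only to show how the marks enter the sum.

```python
import math


def mark_weighted_k(points, marks, d, area):
    """Mark-weighted K estimate with w_ij = 1 (edge correction omitted by assumption)."""
    n = len(points)
    lam_hat = n / area        # estimated spatial intensity
    mu_hat = sum(marks) / n   # mean of the observed marks
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
            if dist <= d:
                total += marks[i] * marks[j]
    return total / (n * lam_hat * mu_hat ** 2)


# Two tight pairs of points, far away from each other, on the unit square.
points = [(0.10, 0.10), (0.12, 0.10), (0.80, 0.80), (0.82, 0.80)]

# Same locations and same set of marks, but arranged differently.
large_together = [10, 10, 1, 1]  # the two large firms are neighbours
large_apart = [10, 1, 10, 1]     # each large firm sits next to a small one

d = 0.05
k_together = mark_weighted_k(points, large_together, d, 1.0)
k_apart = mark_weighted_k(points, large_apart, d, 1.0)
k_equal_marks = mark_weighted_k(points, [1, 1, 1, 1], d, 1.0)

print(k_together, k_apart, k_equal_marks)
```

With equal marks the function reduces to the unweighted estimate, while placing the two large marks next to each other raises the value relative to the arrangement where they are apart, even though the point locations are identical.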

In an economic context, in which the marks are the values of a quantitative variable representing the firm’s size, the mark-weighted K-function might be used to develop a test for the presence of absolute spatial concentration. However, we need to derive the benchmark value of the function representing the null hypothesis of spatial randomness. For this reason, the next subsection is devoted to deriving a stochastic model that generates marked point patterns of firms and is able to represent the stylized situations of spatial randomness and concentration in the sense of case 1 (i.e., many small firms clustering in space) and case 2 (i.e., few large firms clustering in space).

A Model for the Null Hypothesis of Spatial Randomness

The basic idea we follow is that the spatial concentration of economic activities (in the sense of Case 1 and Case 2) can be originated by some form of correlation between the spatial point intensity and the marks. This would imply, for instance, that in regions characterized by high spatial point intensity the marks tend to be systematically large if such a correlation is positive or, conversely, small if such correlation is negative.

To define a model that incorporates such a correlation structure, we refer to the design, already explored by Ho and Stoyan (2008), of an intensity-marked Cox process, where the spatial point intensity is driven by a Cox process and the marks are realizations of a process whose parameters are conditioned by the values of the spatial point intensity.

The Log Gaussian Cox Process for the Spatial Point Intensity

To start with, we assume that the spatial point intensity can be modeled as a log Gaussian Cox process (a specific kind of Cox process proposed by Møller, Syversveen, and Waagepetersen 1998). According to this model, each point pattern represents a partial realization of an inhomogeneous Poisson process characterized by a spatial intensity function $λ (x)$, with x representing the spatial coordinates of an arbitrary point (see Diggle 2003). The values of $λ (x)$ constitute a realization of a positive random field $\{Λ (x)\}$ such that $Λ (x) = exp \{S (x)\}$, where $\{S (x)\}$ is a Gaussian random field with mean $μ_{S}$, variance $σ_{S}^{2}$, and correlation function $ρ_{S} (d)$. $\{Λ (x)\}$ is known as a log Gaussian Cox process.

The log Gaussian assumption is particularly useful because explicit expressions can be derived for the intensity and covariance structure of the point process. Indeed, according to the moment generating function of a log Gaussian distribution, the intensity $λ$ of a log Gaussian Cox process $\{Λ (x)\}$ can be written as:

λ = E [Λ (x)] = E [exp (S (x))] = exp (μ_{S} + \frac{1}{2} σ_{S}^{2}) .

Concerning the covariance structure, for any arbitrary pair of points (say $x$ and $x^{'}$), $Λ (x) Λ (x^{'}) = exp \{S (x) + S (x^{'})\}$ and $S (x) + S (x^{'})$ is also Gaussian with mean $m = 2 μ_{S}$ and variance $v = 2 σ_{S}^{2} [1 + ρ_{S} (d)]$, where d is the Euclidean distance between $x$ and $x^{'}$. As a result,

E [Λ (x) Λ (x^{'})] = exp (m + v / 2)

and hence

E [Λ (x) Λ (x^{'})] = λ^{2} exp \{σ_{S}^{2} ρ_{S} (d)\} .
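The two moment expressions above can be checked numerically. The sketch below draws correlated Gaussian pairs $(S (x), S (x^{'}))$ and compares sample averages with the closed forms. The parameter values, the seed, and the single correlation value `rho` (standing for $ρ_{S} (d)$ at one fixed distance) are illustrative assumptions.

```python
import math
import random

# Illustrative values (assumptions): mean and variance of S, and rho_S(d) at one fixed d.
mu_s, sigma2_s, rho = 0.5, 0.25, 0.6
sigma = math.sqrt(sigma2_s)
rng = random.Random(42)

n_draws = 200_000
sum_lam = 0.0
sum_prod = 0.0
for _ in range(n_draws):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    # Two Gaussian values with common mean/variance and correlation rho.
    s1 = mu_s + sigma * z1
    s2 = mu_s + sigma * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    lam1, lam2 = math.exp(s1), math.exp(s2)
    sum_lam += lam1
    sum_prod += lam1 * lam2

lam_mc = sum_lam / n_draws    # Monte Carlo estimate of E[Lambda(x)]
prod_mc = sum_prod / n_draws  # Monte Carlo estimate of E[Lambda(x) Lambda(x')]

lam_theory = math.exp(mu_s + 0.5 * sigma2_s)
prod_theory = lam_theory ** 2 * math.exp(sigma2_s * rho)

print("intensity:", lam_mc, "vs", lam_theory)
print("second moment:", prod_mc, "vs", prod_theory)
```

Both sample averages should agree with the closed-form values up to Monte Carlo error.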

The Marks Process

Our model assumes that the mark $m (x_{n})$ attached to the point $x_{n}$ generated by the log Gaussian Cox process depends on the intensity of the process itself. More formally we have:

m (x_{n}) = a Λ (x_{n}) + b exp \{R (x_{n})\},

where $Λ (x_{n})$ is the value of the spatial intensity at point $x_{n}$ and $exp \{R (x_{n})\}$ is due to a residual process, where $R (x)$ is a Gaussian random field with mean $μ_{R}$, variance $σ_{R}^{2}$, and correlation function $ρ_{R} (d)$. Thus, the expected value of the process $exp \{R (x)\}$, indicated with $ε$, is

ε = E [exp \{R (x)\}] = exp \{μ_{R} + \frac{1}{2} σ_{R}^{2}\} .

The two constants a and b appearing in equation (3) are the model parameters. It is important to understand the role of these two parameters in the generation of the patterns of firms and the way in which they can model the relationship between the intensity with which firms are distributed in space and their dimension. More specifically, a is the parameter driving the correlation between the spatial point intensity process and the marks process. When a = 0, the marks are independent of the spatial intensity. Conversely, when a > 0, the model generates marks that tend to be larger (i.e., larger firms) in regions characterized by a high spatial point intensity. Finally, when a < 0, the marks tend to be smaller (and hence the firms of smaller dimension) in regions characterized by a high spatial point intensity. The parameter b, on the other hand, represents the perturbation effect of the residual process on the correlation between marks and intensity. The larger the absolute value of b, the more the residual process disturbs the correlation controlled by a.

The log Gaussian assumption makes the computation of the expected value of the marks process mathematically tractable; indeed, we have:

μ = E [m (x)] = a λ exp \{σ_{S}^{2}\} + b ε .

At first sight, the expected value of the marks process would be $a λ + b ε$. However, following Ho and Stoyan (2008), the true expected value is $μ = a λ exp \{σ_{S}^{2}\} + b ε$, which is larger than $a λ + b ε$ when a > 0 and smaller when a < 0. For a detailed explanation of the need to introduce $exp \{σ_{S}^{2}\}$ to obtain the true expression for $μ$, see Ho and Stoyan (2008).
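One way to read the factor $exp \{σ_{S}^{2}\}$, offered here as an informal check rather than as the argument of Ho and Stoyan (2008), is that the points of a Cox process fall preferentially where the intensity is high, so the intensity seen from a typical point is size-biased: its mean is $E [Λ^{2}] / E [Λ]$ rather than $E [Λ]$. The Python sketch below verifies this numerically. The parameter values are illustrative, and the residual term is assumed to be independent of the intensity, so that it contributes $b ε$ on average.

```python
import math
import random

# Illustrative parameter values (assumptions, not taken from the article).
mu_s, sigma2_s = 0.5, 0.25
a, b, eps = 0.25, 1.0, 1.0  # eps stands for the mean of exp{R(x)}

rng = random.Random(7)
draws = [math.exp(rng.gauss(mu_s, math.sqrt(sigma2_s))) for _ in range(200_000)]

lam = math.exp(mu_s + 0.5 * sigma2_s)  # E[Lambda], the intensity of the process

# Mean intensity at a typical point: locations are selected with probability
# proportional to Lambda, which gives the size-biased mean E[Lambda^2] / E[Lambda].
lam_at_points = sum(v * v for v in draws) / sum(draws)

mu_naive = a * lam + b * eps
mu_true = a * lam * math.exp(sigma2_s) + b * eps
mu_mc = a * lam_at_points + b * eps

print("naive:", mu_naive)
print("closed form:", mu_true)
print("Monte Carlo:", mu_mc)
```

The Monte Carlo value should match the closed form with the $exp \{σ_{S}^{2}\}$ factor and, since a > 0 here, exceed the naive value.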

The model proposed here is particularly interesting having in mind the economic application and specifically the study of firm location. In fact in the application of the present methodological framework to the problem of assessing industrial agglomeration, the marked point patterns generated when a = 0 represent the null hypothesis of spatial randomness of firms. Similarly, a > 0 and a < 0 refer to the alternative hypothesis of spatial concentration of economic activities in the sense expressed in case 1 and case 2, respectively, in the Introduction section.

To better illustrate how the model works, in the remainder of this section we show some realizations of a marked point process. In what follows, all the generated patterns are obtained using the same random seed, so that all realizations are directly comparable and the differences between the patterns can be ascribed only to differences in the model parameters. Figure 2 shows the realization of the underlying spatial point intensity process given as $Λ (x) = exp \{S (x)\}$ on the unit square, with mean $μ_{S} = 5$, variance $σ_{S}^{2} = 0.25$ and correlation function $ρ_{S} (d) = exp \{- d / 0.25\}$.¹ As we can see, in this particular realization, the spatial point intensity tends to be higher (light gray colors) toward the center of the unit square.

Figure 2.

A realization of the underlying spatial point intensity (gray-scale image).

In order to illustrate the role of parameter a in driving the correlation between the spatial point intensity and the marks, Figure 3 displays different realizations of the marked point process with different values for a. The six simulated marked point patterns appearing in Figure 3 show the net effect of parameter a, since b is always set to zero. In each pattern, the marks are rescaled to the unit interval, and each point is represented by a circle with radius proportional to its rescaled mark. Figure 3 shows quite clearly that, for positive values of a, the marks tend to be larger where the spatial point intensity is higher, that is, approximately at the center of the unit square (see patterns i, iii, and v). On the other hand, for negative values of a, the marks tend to be smaller where the spatial point intensity is higher (see patterns ii, iv, and vi). The two kinds of clustering situation, namely case 1 and case 2, tend to be more evident when a increases in absolute value.

Figure 3.

Simulated patterns of marks according to model 3. The figure illustrates the role of parameter a when b = constant = 0.

Figure 4 shows six simulated marked point patterns with different values for b, which illustrate the role of this parameter in disturbing the correlation between the spatial point intensity and the marks. In all six cases, the residual process $exp \{R (x)\}$ is characterized by $μ_{R} = 5$, $σ_{R}^{2} = 0.25$, and $ρ_{R} (d) = exp \{- d / 0.25\}$, and a is set equal to 0.25. To understand how the parameter b disturbs the effect of the parameter a, we can compare the patterns of Figure 4 with the pattern of Figure 3i, where a = 0.25. As b increases in absolute terms, the residual process becomes relatively more important in generating the marked point patterns. In this situation, the correlation between the spatial point intensity and the marks depicted by the pattern reported in Figure 3i becomes weaker.

Figure 4.

Simulated patterns of marks according to model 3. The figure illustrates the role of parameter b when a = constant = 0.25.

The Benchmark Value of the Mark-Weighted K-Function

Because of the mathematical tractability of the model defined above, the corresponding theoretical mark-weighted K-function can be derived in closed form. Indeed, for such a marked log-Gaussian Cox process (for d > 0), the mark-weighted K-function assumes the form:

K_{m m} (d) = 2 π \int_{0}^{d} u \frac{a^{2} λ^{2} exp \{2 σ_{S}^{2} + 3 σ_{S}^{2} ρ_{S} (u)\} + 2 a b λ exp \{σ_{S}^{2} + \frac{3}{2} σ_{S}^{2} ρ_{S} (u)\} ε + b^{2} ε^{2} exp \{σ_{R}^{2} ρ_{R} (u)\}}{{[a λ exp \{σ_{S}^{2}\} + b ε]}^{2}} d u .

The formal derivation of equation (4) is reported in Appendix A. Equation (4) allows us to develop a test for the presence of absolute concentration of economic activities using the mark-weighted K-function, in which the null hypothesis of spatial randomness of firms is represented by the values of $K_{m m} (d)$ when a = 0. In fact, when a = 0, we have:

K_{m m} (d) = 2 π \int_{0}^{d} u exp \{σ_{R}^{2} ρ_{R} (u)\} d u .
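Equations (4) and (5) have no elementary antiderivative for a general correlation function, but they are easy to evaluate numerically. The following Python sketch transcribes both formulas as written above and integrates them with the trapezoidal rule. The function names, the number of integration steps, and the exponential correlation with range 0.25 (borrowed from the simulation settings) are assumptions made for illustration.

```python
import math


def rho_exp(u, phi=0.25):
    """Exponential correlation function exp(-u / phi)."""
    return math.exp(-u / phi)


def _trapezoid(f, d, steps):
    """Trapezoidal approximation of the integral of f over [0, d]."""
    h = d / steps
    total = 0.5 * (f(0.0) + f(d)) + sum(f(k * h) for k in range(1, steps))
    return h * total


def k_mm_theory(d, a, b, lam, eps, sigma2_s, sigma2_r,
                rho_s=rho_exp, rho_r=rho_exp, steps=2000):
    """Numerical evaluation of equation (4), transcribed term by term."""
    mu = a * lam * math.exp(sigma2_s) + b * eps  # mean mark

    def integrand(u):
        numerator = (
            a ** 2 * lam ** 2 * math.exp(2 * sigma2_s + 3 * sigma2_s * rho_s(u))
            + 2 * a * b * lam * math.exp(sigma2_s + 1.5 * sigma2_s * rho_s(u)) * eps
            + b ** 2 * eps ** 2 * math.exp(sigma2_r * rho_r(u))
        )
        return u * numerator / mu ** 2

    return 2 * math.pi * _trapezoid(integrand, d, steps)


def k_mm_null(d, sigma2_r, rho_r=rho_exp, steps=2000):
    """Numerical evaluation of equation (5), the benchmark for a = 0."""
    return 2 * math.pi * _trapezoid(lambda u: u * math.exp(sigma2_r * rho_r(u)), d, steps)


d = 0.1
lam = math.exp(5 + 0.5 * 0.25)  # intensity for mu_S = 5, sigma_S^2 = 0.25
eps = math.exp(0 + 0.5 * 0.25)  # mean of exp{R} for mu_R = 0, sigma_R^2 = 0.25

null_value = k_mm_null(d, sigma2_r=0.25)
full_at_a0 = k_mm_theory(d, a=0.0, b=1.0, lam=lam, eps=eps, sigma2_s=0.25, sigma2_r=0.25)
no_mark_variance = k_mm_null(d, sigma2_r=0.0)

print(null_value, full_at_a0, no_mark_variance, math.pi * d ** 2)
```

Setting a = 0 in the general expression reproduces the benchmark, and when $σ_{R}^{2} = 0$ the benchmark collapses to $π d^{2}$, the value of the unweighted K-function under complete spatial randomness.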

To help the visualization, Figure 5 shows the mean of ${\hat{K}}_{m m} (d)$ for 1,000 marked point patterns generated in the unit square from model 3 with parameters $μ_{S} = 5$, $σ_{S}^{2} = 0.25$, $ρ_{S} (d) = exp \{- d / 0.25\}$, $μ_{R} = 0$, $σ_{R}^{2} = 0.25$, $ρ_{R} (d) = exp \{- d / 0.25\}$, a = 0 and b = 1. Since the theoretical function (dashed line), given by equation (5), lies within the confidence envelopes (resulting from the highest and lowest values of ${\hat{K}}_{m m} (d)$ calculated from the 1,000 simulations) and very close to the mean of ${\hat{K}}_{m m} (d)$ (solid line), the graph confirms that equation (5) may well represent the proper benchmark to verify the presence of spatial concentration of economic activities.

Figure 5.

Mean of ${\hat{K}}_{m m} (d)$ estimated from 1,000 simulations of the marked point process following model 3 with parameters a = 0 and b = 1. The behavior of the empirical mean is represented by the solid line. The theoretical function given by (5) is reported in the graph as a dashed line.

As can be seen from equation (5), the functional form of the benchmark of spatial randomness is affected by the characteristics of the underlying random field $R (x)$ , that is the variance $σ_{R}^{2}$ and the spatial correlation function $ρ_{R} (d)$ . These two parameters define how firms tend to scatter in space up to each distance d in case of absence of any spatial interaction phenomena among economic agents. In particular, $σ_{R}^{2}$ and $ρ_{R} (d)$ regulate the shape and scale of the function in equation (5), respectively. Therefore, the correct estimate of the random field $R (x)$ is important to define properly the null hypothesis of spatial randomness.

The Statistical Significance of the Hypothesis of Spatial Randomness: A Monte Carlo Test

In order to identify the presence of absolute spatial concentration in an observed location pattern of firms, one can then evaluate the statistical significance of the deviations of the estimated mark-weighted K-function, ${\hat{K}}_{m m} (d)$, from the values of the theoretical function, as represented by equation (5). However, the exact, or asymptotic, probability distribution of the estimator ${\hat{K}}_{m m} (d)$ is unknown, and hence we need to rely on Monte Carlo simulation techniques. To develop confidence bands for the values of $K_{m m}$ under the null hypothesis of spatial randomness, we should simulate n marked point patterns from the model described by equation (3) with a = 0. Calibrating the model’s parameters in practical cases is, however, complex, because this particular model specification is overparameterized. Leaving to future research the task of defining a method to correctly estimate all the parameters, for the time being, in order to show the potential of the proposed methodology, we refer to the following simplified specification:

m (x_{n}) = a Λ (x_{n}) + exp \{R (x_{n})\} .

In point of fact, adopting this simplified version of the model is like assuming that the value of the disturbance parameter b is fixed, equal to 1. With the aim of testing for the presence of spatial randomness, this assumption is however innocuous since the parameter b does not appear in the benchmark represented by equation (5).

The artificial patterns simulated by the model must have the same number of points as the observed pattern. The values of the model parameters ($μ_{S}$, $σ_{S}^{2}$, $ρ_{S} (d)$, $μ_{R}$, $σ_{R}^{2}$, and $ρ_{R} (d)$) must be adequately estimated from the data. For each simulated pattern, which is consistent with the null hypothesis since it is generated by the model with a = 0, ${\hat{K}}_{m m} (d)$ can then be computed. In this way, it is possible to obtain the approximate $n / (n + 1) \times 100\%$ confidence bands by taking, for each distance d, the highest and lowest values of the functions ${\hat{K}}_{m m} (d)$ computed on the n simulations under the null hypothesis of spatial randomness. If the curve of the observed ${\hat{K}}_{m m} (d)$ lies, for some distance d, outside the plotted confidence bands, this indicates a significant departure from the null hypothesis.
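The envelope construction just described reduces to taking pointwise minima and maxima over the simulated curves. The Python sketch below shows only that step; the simulated curves are passed in as plain lists, and how they are generated (simulation of the model with a = 0) is deliberately left out. The function names and the toy numbers are hypothetical.

```python
def envelope(sim_curves):
    """Pointwise lowest and highest values across simulated curves.

    sim_curves is a list of curves, each a list of K_mm estimates evaluated
    on the same grid of distances.
    """
    lower = [min(values) for values in zip(*sim_curves)]
    upper = [max(values) for values in zip(*sim_curves)]
    return lower, upper


def significant_distances(distances, observed, sim_curves):
    """Distances at which the observed curve leaves the simulation envelope."""
    lower, upper = envelope(sim_curves)
    return [
        d for d, obs, lo, hi in zip(distances, observed, lower, upper)
        if obs < lo or obs > hi
    ]


# Toy numbers, for illustration only: three simulated curves on three distances.
distances = [0.1, 0.2, 0.3]
simulated = [
    [1.0, 2.0, 3.0],
    [1.2, 1.8, 3.3],
    [0.9, 2.1, 2.9],
]
observed = [1.1, 2.5, 3.0]

lower, upper = envelope(simulated)
flagged = significant_distances(distances, observed, simulated)
print(lower, upper, flagged)
```

In this toy case the observed curve exceeds the upper band only at the second distance, so only that distance is flagged as a departure from the null hypothesis.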

Concerning the estimation of the model parameters from the observed data, the values of $μ_{S}$, $σ_{S}^{2}$, and $ρ_{S} (d)$ of the spatial point intensity of the process, $Λ (x) = exp \{S (x)\}$, can be estimated using the method of minimum contrast based on the K-function (Diggle and Gratton 1984; Møller and Waagepetersen 2003). This method is a general estimation technique that allows one to determine the parameters of the point-generating model that best fits the observed point pattern. The estimation procedure consists of finding the model parameters that lead to the minimum deviation between the theoretical K-function of the model and the observed K-function. Specifically, following Waagepetersen (2007), the values of the parameters of a log-Gaussian Cox process $\{Λ (x)\}$ that ensure this minimum deviation are the values minimizing the following criterion function:

D (μ_{S}, σ_{S}^{2}, ρ_{S} (d)) = \int_{0}^{d_{max}} {[{\{\hat{K} (u)\}}^{1 / 4} - {\{K (μ_{S}, σ_{S}^{2}, ρ_{S} (u); u)\}}^{1 / 4}]}^{2} d u,

where

K (μ_{S}, σ_{S}^{2}, ρ_{S} (d); d) = 2 π \int_{0}^{d} u exp \{σ_{S}^{2} ρ_{S} (u)\} d u

indicates the theoretical K-function of a log-Gaussian Cox process $\{Λ (x)\}$ and $\hat{K} (d)$ is the estimator of the K-function computed on the observed data. The criterion function, $D (μ_{S}, σ_{S}^{2}, ρ_{S} (d))$, as formalized here, is the integral of the squared difference of the fourth roots of the two functions (Waagepetersen 2007).
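A minimal sketch of the minimum contrast idea is given below. It assumes an exponential correlation function $ρ_{S} (u) = exp \{- u / φ\}$, discretizes the criterion on a grid of distances, and replaces a proper numerical optimizer with a coarse grid search; all function names and grid values are hypothetical. Since $μ_{S}$ does not enter the theoretical K-function given above, only $σ_{S}^{2}$ and $φ$ are searched here.

```python
import math


def k_lgcp(d, sigma2, phi, steps=200):
    """Theoretical K-function of a log-Gaussian Cox process with
    exponential correlation exp(-u / phi), via the trapezoidal rule."""
    h = d / steps

    def f(u):
        return u * math.exp(sigma2 * math.exp(-u / phi))

    total = 0.5 * (f(0.0) + f(d)) + sum(f(k * h) for k in range(1, steps))
    return 2 * math.pi * h * total


def contrast(k_hat, distances, sigma2, phi):
    """Discretized criterion D: integral of squared differences of fourth roots."""
    gaps = [
        (k_obs ** 0.25 - k_lgcp(u, sigma2, phi) ** 0.25) ** 2
        for u, k_obs in zip(distances, k_hat)
    ]
    return sum(
        0.5 * (gaps[i] + gaps[i + 1]) * (distances[i + 1] - distances[i])
        for i in range(len(gaps) - 1)
    )


def fit_grid(k_hat, distances, sigma2_grid, phi_grid):
    """Grid point (sigma2, phi) with the smallest contrast."""
    candidates = ((s, p) for s in sigma2_grid for p in phi_grid)
    return min(candidates, key=lambda sp: contrast(k_hat, distances, sp[0], sp[1]))


# Sanity check: use the theoretical curve itself as the "observed" K-function.
distances = [0.02 * k for k in range(1, 11)]
k_hat = [k_lgcp(u, 0.25, 0.25) for u in distances]

fit = fit_grid(k_hat, distances, sigma2_grid=[0.1, 0.25, 0.5, 1.0], phi_grid=[0.1, 0.25, 0.5])
print(fit)
```

When the input curve is generated from the model itself, the search recovers the generating parameters. With real data, $\hat{K}$ would come from the observed pattern, and $μ_{S}$ could then be obtained from the estimated intensity through $λ = exp (μ_{S} + \frac{1}{2} σ_{S}^{2})$.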

The values of the parameters $μ_{R}$ , $σ_{R}^{2}$ , and $ρ_{R} (d)$ of the Gaussian random field $\{R (x)\}$ of the residual process $exp \{R (x)\}$ can be estimated, conditionally on the choice of a parametric functional form for the spatial correlation function $ρ_{R} (d)$ , using the method of the maximum likelihood. For a comprehensive treatment of the maximum likelihood methods applied to Gaussian random fields, see Diggle and Ribeiro (2007). In this context, the random field $\{R (x)\}$ of interest can be adequately estimated referring to the following regression model:

ln \{m (x_{i})\} = R (x_{i}) + Z (x_{i}) : i = 1, \ldots, n,

where the index $i = 1, \ldots, n$ refers to the observed locations, $m (x_{i})$ is the mark associated with location $x_{i}$, and the $Z (x_{i})$’s are iid $N (0, τ^{2})$ random variables (Diggle and Ribeiro 2007).

A Case Study: The Localization of the High- and Medium-High-Technology Manufacturing Industry in the Metropolitan Areas of Milan and Turin

The empirical part of this article concerns the analysis of the spatial distribution of the high- and medium-high-technology manufacturing firms,² operating between 1996 and 2004, located in the municipalities of Milan and Turin. The data set is a subset of the Analisi Informatizzata Delle Aziende (AIDA) archive (Bureau Van Dijk), which provides the geographic location and size (in terms of number of employees) of the productive plants of 162 and 24 limited companies operating, respectively, in Milan and Turin. These companies are all single plant and hence have a single production site. By way of illustration, Figure 6 shows the spatial distribution of these firms in the municipalities of Milan (Figure 6a) and Turin (Figure 6b) by means of marked point patterns where the location of each plant is identified by a circle with radius proportional to the number of employees of the firm.

Figure 6.

Spatial distribution of high and medium-high technology manufacturing firms in the municipalities of Milan and Turin in the period 1996–2004.

For both the location patterns of high- and medium-high-technology manufacturing firms, the presence of phenomena of spatial concentration has been tested, while controlling for the firms’ sizes, through the use of the mark-weighted K-function, as described in the section Measuring the Spatial Concentration of Firms Considering Size: The Mark-Weighted K-Function.

In order to evaluate the statistical significance of the deviations of the observed mark-weighted K-function from the hypothesis of random localization of firms, as identified by equation (5), we computed approximate 99.9% confidence envelopes from 999 simulations of the model of equation (6). More precisely, at each step of the simulation procedure, a marked point pattern was generated from the model with a = 0 and the other parameters estimated from the data with the minimum contrast and maximum likelihood methods described in the previous section. Then, on the generated pattern, the mark-weighted K-function, ${\hat{K}}_{m m} (d)$, was computed. Repeating this step 999 times and taking, for each distance d, the minimum and maximum of the 999 values of ${\hat{K}}_{m m} (d)$, we obtained the confidence bands for the hypothesis of random location of firms and hence of absence of significant spatial concentration phenomena.

The two plots reported in Figure 7 show the behavior of the function ${\hat{K}}_{m m} (d)$ at various distances d for the high- and medium-high-technology manufacturing firms located, respectively, in Milan and Turin. These graphs also depict the confidence bands for the null hypothesis of absence of spatial concentration at a significance level of $\alpha = .001$. The values of ${\hat{K}}_{m m} (d)$ are essentially interpretable in graphical terms: the values of d at which ${\hat{K}}_{m m} (d)$ lies above the confidence envelope identify the distances at which there is significant spatial concentration. Looking at the graphs in Figure 7, we note that the high- and medium-high-technology manufacturing firms located in Milan tend to concentrate in space at distances greater than 0.5 km. By contrast, the location pattern of the same kind of economic activities in Turin is not characterized by significant spatial concentration.
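The link between 999 simulations and a significance level of .001 can be illustrated with a pointwise one-sided Monte Carlo test in the spirit of Besag and Diggle (1977): if the observed value exceeds all 999 simulated values at a given distance, its rank-based p-value is 1/(999 + 1) = .001. The sketch below uses made-up numbers, not the Milan or Turin data, and the function name `pointwise_mc_pvalues` is hypothetical.

```python
import numpy as np


def pointwise_mc_pvalues(observed, simulated):
    """One-sided Monte Carlo p-value at each distance.

    observed has shape (n_dist,); simulated has shape (n_sim, n_dist).
    """
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    n_sim = simulated.shape[0]
    # Number of simulated values at least as large as the observed one.
    n_at_least = (simulated >= observed).sum(axis=0)
    return (1 + n_at_least) / (n_sim + 1)


# Toy inputs (assumption): simulated values lie in [0, 1), so the last two
# observed values exceed every simulated value at their distance.
rng = np.random.default_rng(1)
distances = np.array([0.25, 0.50, 0.75, 1.00])
simulated = rng.uniform(0.0, 1.0, size=(999, 4))
observed = np.array([0.5, 0.9, 2.0, 3.0])

alpha = 0.001
p_values = pointwise_mc_pvalues(observed, simulated)
significant = distances[p_values <= alpha + 1e-12]
```

Because the test is applied separately at each distance, the level .001 is pointwise and does not control the error rate over the whole range of d simultaneously.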

Figure 7.

Behavior of the estimated mark-weighted K-function (continuous line) and the corresponding 99.9 percent confidence bands (dashed lines) for the high and medium-high technology manufacturing firms located in Milan (graph a) and Turin (graph b).

According to the spatial economics and regional science literature, at least two general and antithetical localization phenomena regarding the innovation- and technology-intensive manufacturing industry can be identified: one leads to spatial concentration in the large metropolitan areas (such as those of Milan and Turin); the other produces location patterns in which spatial interactions among economic agents are irrelevant (Arbia et al. 2010).

Owing to knowledge spillovers and the circulation of tacit knowledge (Storper and Venables 2003; Yeung, Coe, and Kelly 2007) or to the existence of innovation milieux (Camagni 1991; Capello 1999), economic agents may tend to locate close to other firms with the aim of exploiting agglomeration advantages. The location pattern of Milan seems to be consistent with this stylized fact.

In contrast, given the evolution of communication technologies, geographic space may play only a limited role in the location choices of economic agents (Sassen 1994; Castells 1996; Cairncross 2001). The decreased need for physical interaction in the transmission of knowledge and information may reduce the localization advantages due to spatial proximity. The location pattern of Turin seems to be consistent with this second stylized fact. A more in-depth study of the territorial differences between Milan and Turin in terms of agglomeration advantages would also require the analysis of other variables in addition to the information regarding the locations of firms.

Discussion and Conclusions

The spatial concentration of firms has long been a central issue in economics from both the theoretical and the applied point of view, mainly because of its important policy implications. An approach to its measurement that has recently become very popular makes use of micro data and looks at the firms as if they were dimensionless points distributed in the economic space. This approach is very attractive because it does not suffer from the problem of choosing an arbitrary partition of the economic space (such as regions, counties, or countries). However, in practical circumstances this is an excessive simplification, since the points (firms) observed in the economic space are far from being dimensionless and are instead characterized by different dimensions measured in terms of the number of employees, the product, the capital, and so on. In the literature, the articles that introduced such an approach (e.g., Arbia and Espa 1996; Marcon and Puech 2003) disregard the different firm dimension and ignore the fact that a high degree of spatial concentration may result from many small points clustering in definite portions of space (as is usually considered in the literature) but also from only a few large points clustering together (e.g., a few large firms). In other words, they are not able to distinguish between two very different issues, namely the clustering of firms and the clustering of economic activities. The aim of this article was to introduce absolute measures of spatial concentration of firms based on an extension of Ripley’s K-function that accounts for the different firm dimension. In order to derive the null hypothesis of spatial randomness in this more complex environment, we developed a new stochastic model that generates marked point patterns of firms and is able to describe the various situations that could arise in empirical cases.

In our model, the firm dimension is expressed as a function of the spatial intensity of the point process. Depending on the values assumed by the model parameters, this could result either in larger points located in areas with high intensity or, conversely, in smaller points located in areas with high intensity. The first case is better grounded from the economic point of view, since we can postulate that the same conditions that lead to a higher clustering of firms in some portions of space may also lead to the growth of the dimension of the existing firms. A good example is the action of the three Marshallian forces fostering agglomeration (Marshall 1920). In his seminal work, Marshall emphasized that industrial agglomeration can be explained by the fact that firms try to locate near suppliers to save shipping costs, by the theory of labor market pooling, and by the theory of knowledge spillovers. If some of the services are internalized in one leading big company, then the same forces could produce a growth of the firms’ dimension rather than an increase in the number of firms located in the area. We would therefore expect that, in most practical cases, the parameter a in equation (3) will be positive and large in absolute value. Similar arguments reinforcing this empirical expectation may be found in Krugman (1991).

On the basis of the stochastic model introduced here, we derived the corresponding mark-weighted K-function and, by making use of some simulated patterns, we presented evidence that this tool represents a proper means to detect the presence of absolute concentration of firms while taking their dimension into account. The problem of calibrating the values of the model’s parameters in practical cases is complex and is only partially addressed here: we restricted ourselves to the presentation of an inferential procedure to test whether a = 0, which detects the presence of absolute concentration as a violation of the null hypothesis of spatial randomness. This procedure, however, does not allow us to identify the relevant alternative hypothesis: clustering of firms (a < 0) or clustering of economic activities (a > 0). We will address this problem in future work.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Arbia. 2006. Spatial Econometrics: Statistical Foundations and Applications to Regional Convergence. Berlin, Germany: Springer.

Arbia and Espa. 1996. Statistica Economica Territoriale. Padua, Italy: Cedam.

Arbia, Espa, Giuliani, and Mazzitelli. 2010. “Detecting the Existence of Space-Time Clustering of Firms.” Regional Science and Urban Economics 40:311–23.

Arbia, Espa, and Quah. 2008. “A Class of Spatial Econometric Methods in the Empirical Analysis of Clusters of Firms in the Space.” Empirical Economics 34:81–103.

Besag and P. J. Diggle. 1977. “Simple Monte Carlo Tests for Spatial Pattern.” Applied Statistics 26:327–33.

Bickenbach and Bode. 2008. “Disproportionality Measures of Concentration, Specialization, and Localization.” International Regional Science Review 31:359–88.

Boots, B. N., and Getis. 1988. Point Pattern Analysis, Vol. 8. London, UK: Sage Scientific Geography Series, Sage.

Cairncross. 2001. The Death of Distance 2.0; How the Communications Revolution will Change our Lives. Cambridge, UK: Harvard Business School Press.

Camagni. 1991. “Local Milieu, Uncertainty and Innovation Networks: Towards a New Dynamics Theory of Economic Space.” In Innovation Networks: Spatial Perspective, edited by Camagni, 121–44. London, UK: Belhaven Press.

Capello. 1999. “Spatial Transfer of Knowledge in High Technology Milieux: Learning Versus Collective Processes.” Regional Studies 33:353–65.

Castells. 1996. The Rise of the Network Society. Malden, MA: Blackwell.

Diggle, P. J. 2003. Statistical Analysis of Spatial Point Patterns, 2nd ed. London, UK: Edward Arnold.

Diggle, P. J., and R. J. Gratton. 1984. “Monte Carlo Methods of Inference for Implicit Statistical Models.” Journal of the Royal Statistical Society, Series B 46:193–212.

Diggle, P. J., and P. J. Ribeiro. 2007. Model-Based Geostatistics. New York, NY: Springer.

Dixon. 2002. “Ripley’s K-Function.” In The Encyclopedia of Environmetrics, edited by A. H. El-Shaarawi and W. W. Piegorsch, 1976–803. New York, NY: John Wiley.

Duranton and H. G. Overman. 2005. “Testing for Localisation using Micro-Geographic Data.” Review of Economic Studies 72:1077–106.

Duranton and H. G. Overman. 2008. “Exploring the Detailed Location Patterns of UK Manufacturing Industries Using Microgeographic Data.” Journal of Regional Science 48:213–43.

Ellison and E. L. Glaeser. 1997. “Geographic Concentration in U.S. Manufacturing Industries: A Dartboard Approach.” Journal of Political Economy 105:889–927.

Fujita, Krugman, and Venables. 1999. The Spatial Economy: Cities, Regions, and International Trade. Cambridge: MIT Press.

Gini. 1912. “Variabilità e mutabilità.” Reprinted in Memorie di Metodologica Statistica, edited by Pizetti and Salvemini. Rome.

Gini. 1921. “Measurement of Inequality of Incomes.” The Economic Journal 31:124–26.

Haaland, J. I., H. J. Kind, K. H. Midelfart-Knarvik, and Torstensson. 1999. “What Determines the Economic Geography of Europe?” Centre for Economic Policy Research, Discussion paper, 2072.

Ho, L. P., and Stoyan. 2008. “Modelling Marked Point Patterns by Intensity-Marked Cox Processes.” Statistics & Probability Letters 78:1194–99.

Illian, Penttinen, and Stoyan. 2008. Statistical Analysis and Modelling of Spatial Point Pattern. Chichester, UK: John Wiley.

Krugman. 1991. Geography and Trade. Cambridge: MIT Press.

Marcon and Puech. 2003. “Evaluating the Geographic Concentration of Industries using Distance-Based Methods.” Journal of Economic Geography 3:409–28.

Marcon and Puech. 2010. “Measures of the Geographic Concentration of Industries: Improving Distance-Based Methods.” Journal of Economic Geography 10(5):745–62.

Marshall. 1920. Principles of Economics, revised edition. London, UK: Macmillan.

Møller, A. R. Syversveen, and R. P. Waagepetersen. 1998. “Log Gaussian Cox Processes.” Scandinavian Journal of Statistics 25:451–82.

Møller and Waagepetersen. 2003. Statistical Inference and Simulation for Spatial Point Processes. Boca Raton, FL: Chapman and Hall/CRC.

Mori, Nishikimi, and T. E. Smith. 2005. “A Divergence Statistic for Industrial Localization.” Review of Economics and Statistics 87:635–51.

Penttinen. 2006. “Statistics for Marked Point Patterns.” The Yearbook of the Finnish Statistical Society 2006:70–91.

Ripley, B. D. 1976. “The Second-Order Analysis of Stationary Point Processes.” Journal of Applied Probability 13:255–66.

Ripley, B. D. 1977. “Modelling Spatial Patterns (with discussion).” Journal of the Royal Statistical Society, B 39:172–212.

Sassen. 1994. Cities in a World Economy. London, UK: Pine Forge Press.

Storper and A. J. Venables. 2003. “Buzz: Face-to-Face Contact and the Urban Economy.” Journal of Economic Geography 4:351–70.

Waagepetersen. 2007. “An Estimating Function Approach to Inference for Inhomogeneous Neyman-Scott Processes.” Biometrics 63:252–58.

Yeung, H. W., Coe, and Kelly. 2007. Economic Geography. Introduction to Contemporary. London: Blackwell Publishing.