Abstract
One should expect real estate sales, and properties listed as for sale, to be concentrated in market hotspots. Using data on real estate listings from San José, Costa Rica, this expected clustering is examined using point pattern processes of detached housing, apartments, and vacant lots. Non-stationary G and J functions describe the patterns and their interactions. Potential determinants of the point pattern were selected based on previous studies and theory. Their effect on the point pattern was estimated using an inhomogeneous Poisson model, with its intensity a log-linear function of the determinants. Results show detached houses, apartments, and lots are all clustered point patterns. The cross density (joint G function) of houses with apartments and with lots exhibits clustering, suggesting the patterns are related; however, the cross density of apartments and lots is no different from that of a Poisson process (they are not related). The inhomogeneous Poisson model with Euclidean distance to the central business district (CBD), nearest municipal center, and nearest main road, as well as elevation and slope, proved better than homogeneous Poisson models in explaining the point patterns of houses, apartments, and lots.
Introduction
The analysis of urban land markets plays a central role in designing policies to balance basic conflicts among human actors in cities. Urban land values are closely linked to the configuration of land use patterns (Irwin and Geoghegan 2001). Because land is a unique commodity—its price determined by both the amount of land and its location (Smolka and Goytia 2019)—land market dynamics are complex and give rise to distortions such as land speculation (Louw 2005), residential segregation (Smolka and Goytia 2019), and environmental impacts (Mehrhoff 2005). Furthermore, public policy interventions into urban systems that aim to improve on these distortions may themselves have unintended negative consequences (Smolka and Goytia 2019); in particular, higher income groups have often used land use controls to exclude disadvantaged populations by adopting stricter development rules (Trounstine 2020); additionally, the distribution of the benefits of land use regulations may favor land owners at the expense of housing consumers (Quigley 2007). Land markets also present unique opportunities to promote equitable development, such as value capture as an instrument of urban finance (Smolka 2019).
The analysis of urban land markets has concentrated on explaining land and property prices (Smolka and Goytia 2019; Paci et al. 2017; Winson-Geideman and Evangelopoulos 2013), a variable theoretically determined by spatial differentials in accessibility (Smolka and Goytia 2019; Brueckner 1987). These analyses have been operationalized through hedonic price modeling (Cheshire and Sheppard 1995; Kain and Quigley 1970; Rosen 1974), which disaggregates the price of a complex good into the implicit prices of the attributes in its bundle. In this way, sales prices are linked to diverse attributes of real estate: intrinsic, of their location, and of their surrounding environment (Liu and Roberts 2012). While the focus on land values is understandable because of their importance for policy making and in the configuration of urban land patterns, other possibly important features have been neglected. In particular, as pointed out in Paci et al. (2017), the variation in land prices is modeled conditional on the set of locations for which data were recorded, and these are taken as fixed. Exploring the locations at which sales are offered, that is, the pattern of locations itself, is a different and understudied problem. Paci et al. point to an illuminating analogy with ecological modeling: interest in property prices is analogous to studying a feature or trait of a given species in space; the study of real estate listing locations, on the other hand, would be related to the question of where individuals of a species were observed.
Real estate sales and real estate listings can be thought of as point patterns, the realization of a stochastic point process: observation of events of interest within a bounded region (Bivand et al. 2008). The analysis of point patterns has a long tradition in ecology, where it explains the spatial distribution of species, and in spatial epidemiology, where it explains the clustering of diseases (Bivand et al. 2008; Baddeley et al. 2016). Since the data used for hedonic price models consist mainly of point locations with known price (and other attributes), the paucity of point pattern analysis of real estate is perhaps surprising. The analysis of spatial autocorrelation of the price variable, and its incorporation into hedonic price models, is ubiquitous in land market analysis (Devaux and Dubé 2016). Yet with the exception of the work by Paci et al. (2017, 2020), no study treating real estate sales as point patterns seems to have been reported in the literature. Paci et al. (2017) developed models of inhomogeneous Poisson and log Gaussian Cox processes to predict intensity; this intensity was later incorporated into a hierarchical hedonic price model (Paci et al. 2020). Though the point pattern was indeed found to be non-random and best predicted by the log Gaussian Cox process (Paci et al. 2017), its effect was small when locations were substantially biased and nil when the full dataset was used (Paci et al. 2020), suggesting that problems introduced by preferential sampling could be generally minor. Preferential sampling is a source of bias that results from the lack of independence between the process being modeled (for example, land value patterns) and the process that determines data locations; in this case, the density of known locations with land value data is a function of determinants similar to those of the land value pattern itself. Gelfand et al. (2012) argued that the bias is more important for predictions than for parameter estimates in geostatistical modeling.
The objective of this paper is to develop a spatial statistical analysis of real estate listings as a point pattern that improves understanding of their non-random location. Unlike the previous effort by Paci et al. (2017), the focus of this study is on the description of the patterns and their implications rather than on the methodological issue of preferential sampling. For this reason, in addition to modeling, a wider array of descriptive analysis was used. In addition, the case study of San José affords an opportunity to explore the importance of geography in explaining the point pattern (given it is a polycentric urban region with a very irregular terrain).
The paper reports on these methods and results as follows: both the methodology and the results sections, first, describe the point patterns of detached houses, apartments, and vacant lots (using kernel densities, G and J functions, as well as the cross G function to compare two patterns), and second, they report on inhomogeneous Poisson models for each pattern based on a limited set of determinants: Euclidean distances to the central business district (CBD), nearest municipality, nearest main road, elevation, and slope.
Methodology and data
Given the relations between property and land price, urban development and land cover change, and the real estate market (as a description of the complex interactions between urban agents), there is an expectation that the point pattern of real estate listings is actually clustered. This implies an inhomogeneous Poisson process, in the sense that any given point (property) may enter the market at a random moment, and therefore is randomly located. However, the intensity of this randomness systematically varies with the determinants of the behavior of urban agents (in particular, their locational preferences).
In addition to this basic hypothesis, there is also an expectation that different segments of the real estate market respond to different preferences from urban agents. Consequently, their patterns should be clustered but in distinct ways. Specifically, detached houses, apartments (mostly in buildings), and vacant lots may exhibit different distributions, albeit all clustered. Moreover, although they may all share a basic set of determinants, each of these likely impacts the considered point patterns in a different way.
The methodology described in this section, therefore, must address (1) the determination of clustering in the point pattern and (2) the relation of this clustering with the potential geographical determinants of the point pattern.
Descriptive statistics and spatial clustering in the point process
The point patterns were described using a Gaussian kernel density with a fixed bandwidth. The likelihood cross-validation criterion was maximized to determine this bandwidth (Baddeley et al. 2016): this criterion compares (the sum of the logarithms of) the density estimate at x_i, obtained excluding the data point at i, with n times the integral of the density function; a full discussion is presented in section 5.3 of Loader (1999). This criterion was chosen because it assumes that the point pattern follows an inhomogeneous Poisson process, which is the working hypothesis, as clustering is expected to result in a varying intensity across the pattern.
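The bandwidth selection just described can be illustrated with a minimal sketch. It is not the spatstat routine used for the analysis: it assumes an isotropic Gaussian kernel, applies no edge correction (so the integral term reduces to n), and searches a hypothetical list of candidate bandwidths over simulated points. The function names `lcv_score` and `select_bandwidth` are illustrative.

```python
import math
import random

def lcv_score(points, sigma):
    """Likelihood cross-validation score for a Gaussian kernel of bandwidth sigma.

    Score = sum_i log(leave-one-out estimate at x_i) - integral of the estimate.
    Assumption: no edge correction, so the integral over the plane equals n.
    """
    n = len(points)
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        loo = 0.0  # intensity estimate at point i, computed without point i
        for j, (xj, yj) in enumerate(points):
            if i != j:
                d2 = (xi - xj) ** 2 + (yi - yj) ** 2
                loo += norm * math.exp(-d2 / (2.0 * sigma ** 2))
        total += math.log(max(loo, 1e-300))  # guard against log(0)
    return total - n

def select_bandwidth(points, candidates):
    """Return the candidate bandwidth that maximizes the criterion."""
    return max(candidates, key=lambda s: lcv_score(points, s))

# Simulated example: two tight clusters in the unit square (hypothetical data)
random.seed(1)
pts = [(random.gauss(0.3, 0.03), random.gauss(0.3, 0.03)) for _ in range(40)]
pts += [(random.gauss(0.7, 0.03), random.gauss(0.7, 0.03)) for _ in range(40)]
candidates = [0.005, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32]
best = select_bandwidth(pts, candidates)
print("selected bandwidth:", best)
```

A very small bandwidth is penalized because isolated points receive almost no leave-one-out density, and a very large one because it flattens the clusters; the criterion settles between the two.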
To determine whether the point patterns are clustered, spatial statistical tools, specifically the inhomogeneous G and J functions (Baddeley et al. 2016), were used.
The G function, called the nearest neighbor function, is the cumulative distribution of the distances between any given point in the pattern ($x_i$) and the nearest other point of the pattern, $G(r) = P\left( d(x_i, X \setminus \{x_i\}) \le r \right)$. For a homogeneous Poisson process of intensity $\lambda$, it is given by $G_{pois}(r) = 1 - \exp(-\lambda \pi r^2)$.
The empty space (F) function is a similar measure, but of the average space left between events (points in the pattern). It is defined similarly to the G function, except that it describes the probability of finding an event (point of the pattern) within a radius r of any given location u within the bounded region of the point process, $F(r) = P\left( d(u, X) \le r \right)$. For a homogeneous Poisson process, $F_{pois}(r)$ is also given by $1 - \exp(-\lambda \pi r^2)$.
The ratio of the probability that the nearest neighbor is at a distance greater than r to the probability that the empty space distance is also greater than r is called the J function, $J(r) = \frac{1 - G(r)}{1 - F(r)}$.
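As an illustration of these three summary functions, the following sketch computes uncorrected, stationary versions of G, F, and J for a simulated clustered pattern and compares G with its homogeneous Poisson benchmark. It is a simplification under stated assumptions: a unit-square window, no edge correction, a regular grid of reference locations for F, and simulated data; the inhomogeneous estimators used in the analysis are more involved. All function names are illustrative.

```python
import math
import random

def nearest_distances(sources, targets, exclude_self=False):
    """Distance from each source location to its nearest target point."""
    out = []
    for i, (sx, sy) in enumerate(sources):
        best = math.inf
        for j, (tx, ty) in enumerate(targets):
            if exclude_self and i == j:
                continue  # a point is not its own neighbor
            best = min(best, math.hypot(sx - tx, sy - ty))
        out.append(best)
    return out

def ecdf(values, r):
    """Empirical cumulative distribution evaluated at r."""
    return sum(v <= r for v in values) / len(values)

def g_pois(r, lam):
    """G (and F) for a homogeneous Poisson process of intensity lam."""
    return 1.0 - math.exp(-lam * math.pi * r * r)

def gfj(points, r, grid=25):
    """Uncorrected G, F and J at distance r for a pattern in the unit square."""
    g = ecdf(nearest_distances(points, points, exclude_self=True), r)
    # F measures distances from arbitrary locations, here a regular grid
    refs = [((a + 0.5) / grid, (b + 0.5) / grid)
            for a in range(grid) for b in range(grid)]
    f = ecdf(nearest_distances(refs, points), r)
    j = (1.0 - g) / (1.0 - f) if f < 1.0 else float("nan")
    return g, f, j

# Simulated clustered pattern: 10 parent locations with 20 offspring each
random.seed(7)
centers = [(random.random(), random.random()) for _ in range(10)]
clustered = [(cx + random.gauss(0, 0.01), cy + random.gauss(0, 0.01))
             for cx, cy in centers for _ in range(20)]
lam = len(clustered) / 1.0  # points per unit area
r = 0.02
g, f, j = gfj(clustered, r)
print("G above Poisson benchmark:", g > g_pois(r, lam), "| J below 1:", j < 1.0)
```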
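Inhomogeneous G and J functions were proposed by van Lieshout (see Baddeley et al. 2016). They assume that the k-point correlation functions, for k ≥ 2, are invariant under translation and that the true intensity λ(u) of the point process is bounded away from zero. Under these assumptions, points are reweighted according to the local intensity, so that departures are measured relative to an inhomogeneous Poisson process rather than a homogeneous one.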
Testing the relation between two patterns can be achieved by estimating the multitype (cross) G function: in practice, the cumulative distribution of distances from a point in subset I to the nearest point in subset K. The inhomogeneous version of this function was used, for consistency with the analysis of the individual patterns.
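The interpretation of the G and J functions is straightforward: if $G(r) > G_{pois}(r)$ and $J(r) < 1$, then nearest neighbor distances are shorter than would be expected from a random pattern; therefore, such patterns are clustered. Conversely, when $G(r) < G_{pois}(r)$ and $J(r) > 1$, distances are longer than would be expected from a purely random pattern and, thus, the point pattern is regular. In a similar fashion, when the cross G function of two patterns lies above its Poisson benchmark, points of one pattern are closer to points of the other than would be expected if the two were unrelated; when it cannot be distinguished from the benchmark, the patterns show no evidence of interaction.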
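The comparison between two patterns can be sketched in the same spirit. The example below computes an uncorrected, stationary cross G function for simulated data: one pattern that is placed next to the reference points ("related") and one that is independent of them ("unrelated"). The labels, the unit-square window, and all numeric settings are assumptions made for illustration only.

```python
import math
import random

def cross_g(type_i, type_k, r):
    """Share of type-I points whose nearest type-K point lies within r."""
    hits = 0
    for (x, y) in type_i:
        nearest = min(math.hypot(x - u, y - v) for (u, v) in type_k)
        hits += nearest <= r
    return hits / len(type_i)

def cross_g_pois(r, lam_k):
    """Benchmark when type-K points are Poisson and independent of type I."""
    return 1.0 - math.exp(-lam_k * math.pi * r * r)

random.seed(3)
houses = [(random.random(), random.random()) for _ in range(150)]
# "Related" pattern: each point is placed close to a randomly chosen house
related = [(hx + random.gauss(0, 0.01), hy + random.gauss(0, 0.01))
           for hx, hy in random.sample(houses, 60)]
# "Unrelated" pattern: uniform and independent of the houses
unrelated = [(random.random(), random.random()) for _ in range(60)]

r = 0.03
lam_k = 60.0  # type-K points per unit area in both cases
g_rel = cross_g(houses, related, r)
g_unrel = cross_g(houses, unrelated, r)
bench = cross_g_pois(r, lam_k)
print("related:", round(g_rel, 3), "unrelated:", round(g_unrel, 3),
      "benchmark:", round(bench, 3))
```

The related pair should sit well above the benchmark, while the unrelated pair should stay close to it (slightly below, since no edge correction is applied).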
The spatial analysis of point patterns was performed using the spatstat package (Baddeley et al. 2016) for R (R Core Team 2022).
Modeling the determinants of the point process
The determinants of the real estate listings point process were assumed to be similar to those of property prices. Previous studies (Pérez-Molina 2022) and theoretical developments (Koomen and Stillwell 2007) have explored the role of these determinants. Five basic determinants were selected: slope, as flatter areas are more desirable than steeper locations; elevation, because relatively higher elevations correspond to the less attractive periphery of the region; Euclidean distance from the CBD and from the nearest municipal center, as main and secondary urban centralities (Euclidean distance was chosen to reduce endogeneity introduced by the local network); and Euclidean distance from main roads, proximity to which represents especially good accessibility.
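These determinants were explored by (1) generating histograms of each variable at the locations of each point pattern, as well as for a sample of the entire region, and contrasting them, and (2) estimating an inhomogeneous Poisson point process model whose intensity is a log-linear function of the determinants, $\lambda(u) = \exp\left( \beta_0 + \sum_j \beta_j Z_j(u) \right)$, where $Z_j(u)$ is the value of determinant j at location u and the $\beta_j$ are the trend coefficients.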
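A minimal sketch of the modeling step follows. It assumes a log-linear intensity with a single covariate (distance to a hypothetical CBD at the center of a unit square), simulates a pattern by thinning, and fits the model by maximum likelihood using a grid approximation of the integral. A likelihood ratio statistic then compares the fit with a homogeneous Poisson model. This is not the spatstat procedure, and every name and numeric setting is an assumption for illustration.

```python
import math
import random

CBD = (0.5, 0.5)  # hypothetical centre of a unit-square study window
GRID = 50         # quadrature grid resolution (assumption)

def dist_cbd(x, y):
    return math.hypot(x - CBD[0], y - CBD[1])

# Covariate values at the centres of the quadrature cells
Z_GRID = [dist_cbd((a + 0.5) / GRID, (b + 0.5) / GRID)
          for a in range(GRID) for b in range(GRID)]
CELL = 1.0 / GRID ** 2  # area of each cell

def poisson_draw(mean, rng):
    """Poisson variate: arrivals of a unit-rate process before time `mean`."""
    k, t = 0, rng.expovariate(1.0)
    while t < mean:
        k += 1
        t += rng.expovariate(1.0)
    return k

def simulate(b0, b1, rng):
    """Inhomogeneous Poisson pattern by thinning (requires b1 <= 0)."""
    lam_max = math.exp(b0)  # intensity is largest at the CBD, where Z = 0
    pts = []
    for _ in range(poisson_draw(lam_max, rng)):
        x, y = rng.random(), rng.random()
        if rng.random() < math.exp(b1 * dist_cbd(x, y)):
            pts.append((x, y))
    return pts

def loglik(pts, b0, b1):
    """Poisson process log-likelihood; the integral is approximated on a grid."""
    integral = CELL * sum(math.exp(b0 + b1 * z) for z in Z_GRID)
    return sum(b0 + b1 * dist_cbd(x, y) for x, y in pts) - integral

def fit(pts):
    """Maximum likelihood for lambda(u) = exp(b0 + b1 * Z(u))."""
    n = len(pts)
    z_sum = sum(dist_cbd(x, y) for x, y in pts)

    def score(b1):
        # Derivative of the profile log-likelihood; decreasing in b1
        num = sum(z * math.exp(b1 * z) for z in Z_GRID)
        den = sum(math.exp(b1 * z) for z in Z_GRID)
        return z_sum - n * num / den

    lo, hi = -50.0, 50.0
    for _ in range(60):  # bisection on the score
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    b1 = (lo + hi) / 2.0
    b0 = math.log(n / (CELL * sum(math.exp(b1 * z) for z in Z_GRID)))
    return b0, b1

rng = random.Random(42)
points = simulate(math.log(2000.0), -6.0, rng)
b0_hat, b1_hat = fit(points)
# Homogeneous benchmark: constant intensity n / area (area = 1)
ll_hom = loglik(points, math.log(len(points)), 0.0)
lr_stat = 2.0 * (loglik(points, b0_hat, b1_hat) - ll_hom)
# 6.635 is the chi-square critical value with 1 degree of freedom at 0.01
print("trend coefficient:", round(b1_hat, 2), "| LR test significant:", lr_stat > 6.635)
```

A negative fitted coefficient means intensity decays with distance from the center, which is the reading applied to the distance variables in the models.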
Study area and data
The San José Metropolitan Region is an urban agglomeration of four metropolitan areas (San José, Alajuela, Cartago, and Heredia) in Costa Rica. It has a total area of 1780 km² and an urban growth boundary enclosing 25% of it. According to the latest census (2011), it had a population of 2.3 million residents, with half living in San José and the other half distributed in approximately equal proportions among the other metropolitan areas (Centro Centroamericano de Población 2012).
The San José Metropolitan Region is located in a tectonic depression, at the continental divide; it is oriented in an east-west direction. It is constituted by the upper reaches of the Virilla river catchment (which drains towards the Pacific Ocean and includes San José, Alajuela, and Heredia) and of the Reventazón river (that drains towards the Caribbean Sea and where Cartago is located). Deep river canyons and relatively steep mountains act as physical barriers within the region, conditioning human settlement patterns and infrastructure (Bergoeing 2017; Pujol-Mesalles 2005).
The spatial distribution of the compiled real estate listings is shown in Figure 1. They correspond to a sample of 5859 detached houses, 2679 apartments, and 1579 vacant lots downloaded during 2020 and 2021 from an aggregator site, encuentra24. Real estate listings were classified in each advertisement; records within the San José Metropolitan Region and with known location were selected.
Figure 1. Real estate listings pattern in the San José Metropolitan Region (left) and density of real estate listings (right) expressed in the number of real estate listings per km². The density is obtained as a kernel function, with bandwidth of 525 m.
Results and discussion
Spatial clustering and interactions of houses, apartments, and lots
Table 1. Likelihood cross-validation bandwidth estimates.
The houses point pattern is roughly distributed over the entire built area of the region. The density shows intensities are larger towards the central-western part of the region (San José and likely along national route 27, through Escazú and the higher income areas which are the main regional hotspot for real estate). Compared to it, the apartments point pattern is less extensive and more spatially concentrated near the regional CBD, which is the municipality of San José proper. Indeed, one can see the dark blue core area of higher densities is much larger in the houses than in the apartments density kernel. Despite their smaller number, the pattern of vacant lots is more dispersed, with a concentration towards the eastern part of San José (Montes de Oca-Curridabat-La Unión); this is the second hotspot for the real estate market, also socially associated with the upper middle class. Unlike the western part, however, there is more developable land and smoother terrain, which help explain the greater relative availability of lots.
The densities generated for the point patterns coincide with theoretical expectations. Apartments, for example, are generally part of multi-storey buildings (or, at the very least, relatively dense attached housing units). They exhibit a greater structural density (the ratio of capital to land) than other building forms. As is predicted by urban microeconomic theory (Brueckner 1987), structural density is a function of distance to the CBD, with greater densities (and therefore also larger intensities of apartments) in central locations (in this case, the center of San José). Vacant lots, on the contrary, should be located in relatively peripheral areas (it should be noted that, while the dark blue zone of greater intensity is relatively central to the region as a whole, it is in the periphery of the San José metropolitan area; in addition to which, the points themselves are distinctly more dispersed than for houses or apartments). They had not been developed because land rents in the periphery likely did not justify, until 2020–2021, their conversion into urban land and later construction (Irwin and Geoghegan 2001). They have the lowest structural density and, thus, are also predicted to be located in the periphery of each city (Brueckner 1987).
The point pattern of houses corresponds to the norm, in the sense that detached housing occupies most of the urban area and can be found, at lower densities, over the entire region. One should note that the San José Metropolitan Region has land use regulations that define relatively low floor-to-area ratios for most municipalities (for the 20 out of 31 municipalities with regulation; the other 11 have similarly restrictive floor-to-area ratios from national building codes), as has been discussed by Pérez-Molina et al. (2023). Such stringent constraints have been found to cause sprawling development patterns (Bertaud and Brueckner 2005), which likely have contributed to the wide extent of the detached housing point pattern and to its relatively large zones of high intensity.
The inhomogeneous G functions (Figure 2) provide strong evidence of clustering: they are all clearly above and far outside of the 95% confidence interval of randomness. The inhomogeneous J functions (Figure 3) provide similar corroborating evidence: they depart at very small distances from the value of 1, are shown to be below it, and are evidently outside the 95% confidence band.
Figure 2. Inhomogeneous G functions for houses (left), apartments (center), and lots (right) with 95% confidence bounds in gray.
Figure 3. Inhomogeneous J functions for houses (left), apartments (center), and lots (right) with 95% confidence bounds in gray.

Figure 4. Cross-density curves for houses-apartments (left), houses-lots (center), and lots-apartments (right).
Spatial modeling of houses, apartments, and lots
The contrast between the histograms for a sample of the region (first column of histograms in Figure 5) and the values corresponding to each point pattern (second to fourth columns in Figure 5) can be seen to be substantial in all cases but for distance to main roads and, perhaps, elevation. Recall the histograms of the first column represent the overall pattern of the region, whereas the histograms of columns two to four correspond to the variation, for each factor, of the point pattern. Therefore, should the statistical distribution of the region differ from that of a point pattern for any given factor, this difference represents how the factor contributes to determine the point pattern.
Figure 5. Histograms of potential determinants of real estate listings patterns.
For distance to CBD, the distribution for the region is clearly symmetrical and relatively flat between 10 and 30 km; in contrast, for all point patterns, the distributions are right skewed with lower median values (19 km for the region, 9 km for houses and lots, and 6 km for apartments). Consistent with the theoretical expectation of locations closer to the regional CBD, the point pattern for apartments presents a lower median than the corresponding houses and lots point patterns.
Distance to nearest municipal centers and distance to main roads in Figure 5 are similarly right skewed for all histograms, but the right tail includes a larger proportion of the data for the region when compared to any of the point patterns. In addition, the range of the variable is nearly double that of the point patterns. This association is to be expected, as undeveloped land (where real estate listings should be rare) is farther from centralities than zones where most real estate transactions (and thus listings) were located.
Elevation and slope are also different for the region relative to the point patterns, and the point patterns themselves, while showing some differences, are similar to each other. The median elevation in the sample of regional values is 1351 m above mean sea level and its median slope, 22.1%; for the houses, apartments, and lots point patterns, the respective median elevations are 1135, 1113, and 1183 m (nearly identical), and the median slope percentages, 6.1, 5.9, and 7.2 (again very similar, though houses and lots have greater slope percentage ranges than apartments).
Table 2. Fitted trend coefficients for estimated inhomogeneous Poisson models.
Note: All trend coefficients and likelihood ratio tests are significant at p < .01.
The trend coefficients reported in Table 2 are all highly significant, as should perhaps be expected. Comparing the models for the different point processes provides additional evidence on the relation between intensity and microeconomic theory. The trend coefficients for distance to CBD are all negative; for apartments, this coefficient is an order of magnitude larger than for houses or lots, underscoring the importance of accessibility to the most central locations (associated in turn, as was discussed, with a built form of greater structural density). On the contrary, while distance to nearest municipality is negatively related to the intensity of the houses and lots point patterns (in effect, indicating these function as secondary centralities), it is positive and an order of magnitude lower for apartments. This result suggests apartments are a phenomenon concentrated in the San José metropolitan area and not in Alajuela, Cartago, and Heredia: places where half the municipal centers are located yet that show much lower intensities in the apartments point pattern. This is consistent with the 2011 census: 63% of all rented homes in the metropolitan region were located in San José rather than in the other metropolitan areas (Centro Centroamericano de Población 2012). Other determinants (distance to nearest main road, elevation, and slope) have similar signs and trend coefficients for all point patterns. An interpretation of this result is that these variables control for the difference between the urban zones and the peripheral rural zones of the region (and that all three point patterns are similarly located within the more urbanized area).
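To make the comparison of coefficient magnitudes concrete, the following worked relation may help. It assumes the intensity is log-linear in the determinants, and the two coefficient values are hypothetical illustrations, not the estimates of Table 2.

```latex
% Ratio of intensities at two locations that differ only in determinant j
\frac{\lambda(u')}{\lambda(u)} = \exp\left\{ \beta_j \left[ Z_j(u') - Z_j(u) \right] \right\}

% Hypothetical coefficients for a distance measured in km,
% evaluated for a one-kilometre increase in distance:
\beta_j = -0.1 \;\Rightarrow\; \exp(-0.1) \approx 0.905 \quad \text{(intensity falls by about 9.5\% per km)}
\beta_j = -1.0 \;\Rightarrow\; \exp(-1.0) \approx 0.368 \quad \text{(intensity falls by about 63\% per km)}
```

Under this reading, a coefficient that is an order of magnitude larger in absolute value implies a far steeper decay of listing intensity with distance, not merely a tenfold change in the decay rate per kilometre.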
Concluding remarks
The spatial statistical analysis of real estate listings as point patterns results in three main conclusions. (1) The point patterns of houses, apartments, and lots are all evidently clustered (as can be seen in Figures 2 and 3). However, the clustering is distinct for each segment of the real estate market (see Figure 1). (2) The point pattern of houses is correlated with the point patterns of lots and of apartments, but the latter two do not seem correlated (see Figure 4). This finding is consistent with theoretical expectations on the patterns of structural density and their relation to accessibility to urban centralities. (3) The selected determinants showed significant relationships with the point patterns. The signs of coefficients in general coincide with theoretical expectations; in particular, as distance increases from the CBD, the point pattern intensity decreases. The inhomogeneous Poisson model for apartments showed some differences when compared to houses and lots, likely because accessibility to the regional CBD is a more important trait, and accessibility to municipal centers a less important one, for this category of real estate listings.
In summary, the analysis of the point processes has provided valuable information on urban real estate markets that may inform public policy. Two findings are of special interest. First, the point pattern of lots and its relation to the determinants resemble, to a degree greater than expected, that of detached houses. This suggests many of the recorded listings are vacant lots in urbanized areas (since the pattern of detached houses is a good proxy for overall urban development) rather than lots concentrated on the regional periphery. While deeper analysis is required, (1) it is likely that lot transactions (as a precursor of urban development) do not imply significant environmental impacts and (2) the development of additional vacant land for urban uses in the periphery, as judged by the supply patterns analyzed, seems unwarranted. Second, the concentration of apartment listings in San José suggests the need to increase the spatial diversity of their supply (in particular, to other metropolitan areas). In addition, it may be productive to expand the point pattern analysis from sales to real estate listings of properties for rent (in order to assess if their supply is similarly limited in space to San José).
The location of available real estate for sale is distinctly and systematically determined by geographic factors, as should be expected from such a complex setting. Future work on the point pattern analysis of real estate markets in San José could aim to find spatio-temporal patterns (and in particular, given the collected data is for 2020 and 2021, any short run variations related to the COVID-19 pandemic); in addition, previous hedonic price models of the region (Pérez-Molina 2022) would benefit from a critical reappraisal and the consideration of preferential sampling bias, along the lines of Paci et al. (2020).
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Data Availability Statement
The data analyzed in this article was compiled as part of project ED-3466 Asesoría técnica y económica a la política económica, Escuela de Economía, Universidad de Costa Rica.
