Abstract
The process of urbanization is one of the most important phenomena of our societies, and it is only recently that the availability of massive amounts of geolocalized historical data has allowed us to address some of its features quantitatively. Here, we discuss how the number of buildings evolves with population and we show on different datasets (Chicago, 1930–2010; London, 1900–2015; New York City, 1790–2013; Paris, 1861–2011) that this ‘fundamental diagram’ evolves in a possibly universal way with three distinct phases. After an initial pre-urbanization phase, the first phase is a rapid growth of the number of buildings versus population. In a second regime, where residences are converted into another use (such as offices or stores), the population decreases while the number of buildings stays approximately constant. In a subsequent phase, the number of buildings and the population grow again, corresponding to a re-densification of cities. We propose a stochastic model based on these simple mechanisms to reproduce the first two regimes and show that it is in excellent agreement with empirical observations. These results provide evidence for the possibility of constructing a minimal model that could serve as a tool for understanding quantitatively urbanization and the future evolution of cities.
Introduction
Understanding urbanization and the evolution of urban systems is a long-standing problem tackled by geographers, historians, and economists. It has been abundantly discussed in the literature but is still widely debated (see Champion, 2001). The term urbanization has been used in the literature with various definitions and, depending on the particular choice, has been considered as a continuous or an intermittent process. In particular, urbanization measured by the fraction of individuals (in a country for example) living in urban areas describes a continuous process that gradually increased in many countries, with a quick growth since the middle of the 19th-century, until reaching values around 80% in most European countries (Antrop, 2004). Another definition has been introduced by Fielding (1982) and presented by Geyer and Kontuly (1993) as a theory of differential urbanization, where it is assumed that in general we observe the three regimes of urbanization, polarization reversal and counter-urbanization, characterized by a gross migration which favors the larger, intermediate, and small-sized cities, respectively.
Another approach to study urban changes is presented in the stages of urban development proposed by Hall (1971). According to this model, the city has a life cycle going from an early growing phase to an older phase of stability or decline, and four main intermediate phases of development are identified. The first one, called urbanization, consists of a concentration of the population in the city core by migration of people from the outer rings. The second phase, suburbanization, is characterized by a population growth of the urban agglomeration as a whole but with a population loss in the inner city and an increase in the urban rings. During the third phase (counterurbanization or disurbanization), the urban population decreases both in the core and in the ring. Finally, the last phase of reurbanization displays a renewed increase of the urban population. Within this framework, we observe that for most Western countries after the Second World War, urbanization was dominating in the 1950s, followed by a suburbanization in the 1960s during which the population moved from the city core to the suburbs. The standard theory of suburbanization suggests that it is driven by a combination of technological progress (leading to transport infrastructure development) and rising incomes (Anas et al., 1998; Antrop, 2004; Glaeser and Kahn, 2003). In the 1970s, we observe in many urbanized areas a regime of counter-urbanization where the population decreases. The significance of this regime and of the re-urbanization period for the 1980s and beyond, and more generally the possibility of a cyclic development, are controversial topics (see for example Champion, 2001).
Urban development and the spatial distribution of residences in urban areas are long-standing problems that have been discussed in many fields such as geography, history and economics. Few of these approaches tackled the problem from a quantitative point of view (Beckmann, 1969; Harrison and Kain, 1974; Herbert and Stevens, 1960; Makse et al., 1995; Meuriot, 1898; Mills, 1967; Taylor, 1949; Tobler, 1970; Wheaton, 1982). Among the first empirical analyses of population density, Meuriot (1898) provided a large number of density maps of European cities during the 19th-century, and Clark (1951) proposed the first quantitative analysis of empirical data. Anas (1978) presented an economic model for the dynamics of urban residential growth where different zones of a region exchange goods, capital, etc. according to some optimization rule. In this same framework, Allen and Sanglier (1981) proposed a dynamical central place model highlighting the importance of both determinism and fluctuations in the evolution of urban systems. For a review of different approaches, one can consult Benenson (1999), which presents studies that model population dynamics in cities, in particular the ecological approach, where ideas from mathematical ecology are introduced for modeling urban systems. An example is given by Dendrinos and Mullally (1982), where phase portraits of differential equations bring qualitative insights into the behavior of urban systems. Other important theoretical approaches comprise the classical Alonso-Muth-Mills model (Anas et al., 1998) developed in urban economics, and also numerical simulations based on cellular automata (O’Sullivan and Torrens, 2001). More recently, the fractal nature of city structures (see the review by Tannier and Pumain, 2005) served as a guide for the development of models (Batty and Longley, 1994; Makse et al., 1995, 1998). In particular, Makse et al. (1998) proposed a variant of percolation models for describing the evolution of the morphological structure of urban areas. This coarse-grained approach however neglects all economic ingredients, which suggests that an intermediate way between these purely morphological approaches and economic models should be found.
For most of these quantitative studies however, numerical models usually require a large number of parameters, which makes it difficult to test their validity and to identify the main mechanisms governing the urbanization process. On the other hand, theoretical approaches in general propose a large set of coupled equations that are difficult to handle and rarely amenable to quantitative predictions that can be tested against data. In addition, even if these theoretical models bring a qualitative understanding, empirical tests are often lacking.
The recent availability of geolocalized, historical data (such as in Perret et al., 2015 for example) from world cities (Angel et al., 2012) has the potential to change our quantitative understanding of urban areas and allows us to revisit long-standing problems with a fresh eye. Many cities created open-data websites (Mason, 2013), and the city of New York (US) played an important role with the release of the PLUTO dataset (short for Property Land Use Tax Lot Output), where tax lot records contain very useful information about the urbanization process. For example, in addition to the location, property value, square footage etc., this dataset gives access to the construction date of each building. This type of geolocalized data at a very small spatial scale makes it possible to monitor the urbanization process precisely in time.
These datasets make it possible in particular to produce ‘age maps’ where the construction date of buildings is displayed on a map (see Figure 1 for the example of the Bronx borough in New York City). Many building age maps are now available: Chicago (Jacobsen, 2013), New York City (Brandom, 2016), Ljubljana, Slovenia (Plahuta, 2013), Reykjavik, Iceland (Riggott, 2016), etc. In addition to being visually attractive (see for example, Morphocode, 2016; Palmer, 2014), these maps together with new mapping tools (such as the urban layers proposed in Morphocode, 2016) provide qualitative insights into the history of specific buildings and also into the evolution of entire neighborhoods. Palmer (2013) studied the evolution of the city of Portland (Oregon, US) from 1851 and observed that only 942 buildings are still left from the end of the 19th-century, while 75,434 buildings were built at the end of the 20th-century and are still standing, followed by a steady decline in new building construction since 2005. Inspired by Palmer’s map, Plahuta (2013) constructed a map of building ages in his home town of Ljubljana, Slovenia, and proposed a video showing the growth of this city from 1500 until now (O’Hara, 2016). Plahuta observed that the number of new buildings constructed each year displays huge spikes that signalled important events: an important spike occurred when people were able to rebuild a few years after a major earthquake hit the area in 1899, and other periods of rebuilding occurred after the two world wars. In the case of Los Angeles (USA), the ‘Built:LA project’ shows the ages of almost every building in the city and reveals the city growth over time (Ureta, 2016).
Map of building construction dates for the case of the Bronx (New York City, US). Most of the buildings were constructed during the beginning of the 20th-century, followed by the construction in some localized areas of buildings in the second half of the 20th-century (see Supplemental materials for details on the dataset).
These different datasets thus make it possible to monitor urban processes at a very small spatial resolution. In particular, we aim to focus on a given district or zone, without considering for the moment its position and its role in the whole urban agglomeration it belongs to. We ask quantitative questions about the evolution over time of the population and of the number of buildings, and we aim to understand whether different districts of different cities can be compared. Surprisingly enough, such dual information is difficult to find and – to our knowledge – has not been thoroughly studied at the quantitative level (except at a morphological level with fractal studies, Tannier and Pumain, 2005). Here, we use data for different cities (Chicago, 1930–2010; London, 1900–2015; New York City, 1790–2013; Paris, 1861–2011) in order to answer questions about these fundamental quantities. We note that although these cities are among the most urbanized ones, they are characterized by quite different historical paths, with US cities being usually ‘younger’ than the European ones. Chicago for example is a young city founded at the beginning of the 19th-century, whereas Paris has a history of about two thousand years.
More precisely, in this study we will show that the number of buildings versus the population follows the same pattern for all the cities studied here. Despite the small number of cities analysed, the strong similarities observed suggest the possibility of a universal behavior that can be tested quantitatively. In order to go further in our understanding of this pattern, we propose a theoretical model and empirical evidence supporting it.
Empirical results
We investigate the urban growth of four different cities: Chicago (US), London (UK), New York (US), and Paris (France). We discuss here urbanization from the point of view of two dual aspects. First, we consider the evolution of the population of urban areas and second, the evolution of the number of buildings. These aspects thus concern both an individual-related aspect (the population) and an important physical aspect of cities, the buildings.
We study here age maps and in order to go beyond a simple visual inspection of these objects, we study how the number of buildings varies with the population. In most datasets, we essentially have access to buildings that were built and survived until now. In this respect, we do not take into account the destruction, replacement or modifications of buildings. Although replacement or modifications do not alter our discussion, replacement with buildings of another land-use certainly has an impact on the evolution of the population and could potentially lead to a major impact on the evolution of cities. As we will see in our model, this can be in a way encoded in the ‘conversion’ process where a residential building is converted into a non-residential one. The important point is to describe the temporal evolution of buildings and their function, and we encode all these aspects in the simpler quantity that is the number of buildings. Further studies are however certainly needed in order to clarify the impact of these points on our results.
The urbanization process can be described by many different aspects and we will concentrate on two main indicators. Urbanization is about concentration of individuals and the first natural parameter is the population. Urbanization is also about built areas and in order to describe the physical evolution of a city, the natural parameter is the number of buildings (for a given area). Once both these parameters are known (density of population and of buildings), we already have an important piece of information. The following question is how these two parameters relate to each other, and it is then natural to plot the number of buildings versus the population when the city evolves. This ‘fundamental’ diagram contains the core information about the urbanization process and will be the focus of this study.
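As a minimal sketch of how the points of such a diagram could be assembled, the snippet below pairs each census population with the number of currently standing buildings built up to that year. The function name `fundamental_diagram`, the input layout, and the numbers are illustrative assumptions and do not come from the datasets used in this study; as discussed above, demolished buildings are not accounted for.

```python
from bisect import bisect_right


def fundamental_diagram(construction_years, population_by_year):
    """Return (population, number of buildings) pairs ordered by census year.

    construction_years: construction year of each building still standing.
    population_by_year: mapping from census year to district population.
    """
    years_built = sorted(construction_years)
    points = []
    for year in sorted(population_by_year):
        # Count surviving buildings constructed in or before this census year.
        n_buildings = bisect_right(years_built, year)
        points.append((population_by_year[year], n_buildings))
    return points


# Made-up illustrative values (hypothetical district, not real data).
built = [1895, 1901, 1901, 1910, 1925, 1925, 1960]
census = {1900: 120, 1930: 400, 1970: 310}
diagram = fundamental_diagram(built, census)
print(diagram)  # [(120, 1), (400, 6), (310, 7)]
```

In this toy example the population drops between 1930 and 1970 while the building count barely changes, which is the kind of trajectory the diagram is meant to expose.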
Choice of the areal unit
An important discussion concerns the choice of the scale at which we study the urbanization process. We have to analyse the processes of urban change at a spatial scale that is large enough in order to obtain statistical regularities, but not too large as different zones may evolve differently. Indeed geographers observed that the population density is not homogeneous and decreases in general with the distance to the center (Bertaud and Malpezzi, 2003; Guérois and Pumain, 2008). Also, during the evolution most cities tend to spread out with the density decreasing in central districts and increasing in the outer ones (Clark, 1951) and indeed in the literature, the core of the city is often analysed in relation to its suburbs. In this study, we aim to simplify the analysis and we focus on a fixed area without considering its role in the whole urban agglomeration; nevertheless we would like this area to be mostly homogeneous and not mixing zones behaving in different ways.
We choose to focus here on the evolution of administrative districts of each city. At this level, data are available and we can hope to exclude longer term processes. We will show in the following that even if this choice may appear surprising, districts in the different cities considered here display homogeneous growth. More precisely, we consider the 5 boroughs of New York, the 9 sides of Chicago, the 20 arrondissements of Paris and the 33 London districts. In this way, we also do not have to tackle the difficult problem of city definition and its impact on various measures (see for example, Arcaute et al., 2015) and can focus on the urban changes of a given zone with fixed surface area. The datasets for these cities come from different sources (see the Supplemental material) and cover different time periods: 1930–2010 for Chicago, 1900–2015 for London, 1790–2013 for New York, and 1861–2011 for Paris. An important limitation that guided our choice of these cities is the simultaneous availability of building age and historical data for district population.
The cities studied here display very different scales, ranging from Paris with 20 districts for 2–3 million inhabitants and an average of
Homogeneity of growth in districts. Average distance between buildings at a given time (this distance is normalized by the maximum distance found for each district). Top: Chicago (central and far southwest sides). Middle: New York City (Manhattan and Staten Island). Bottom: Paris (1st and 14th arrondissements). The dotted line represents the average value computed for a random uniform distribution and the grey zone the dispersion computed with this null model.
Population density growth
In order to provide a historical context, we first measure the evolution of the population density and then analyse the evolution of the number of buildings in a given district as a function of its population. In Figure 3, we show the average population density for the four cities studied here. This plot reveals that these different cities follow similar dynamics, at least at a coarse-grained level. After a positive growth and a population increase that accelerates around 1900, we observe a density peak. After this peak, the density decreases (even sharply in the case of NYC) or stays roughly constant. This decreasing regime is associated with the post Second World War years, defined by geographers as the suburbanization/counter-urbanization period. In recent years, New York City, Paris and London display a re-densification period. The possibility of this latter period has been proposed in cyclic models such as the stages of urban development (Hall, 1971). Nevertheless, the evidence for this phase and its interpretation are still highly debated. At least, this first figure highlights the existence of a seemingly ‘universal’ pattern governing the urban change process, probably driven by technological changes.
Population density versus time. The average population density versus time for the four cities studied in the paper. All these cities display a density peak in the first half of the 20th-century (see the Supplemental material for details on datasets).
However, at the smaller scale of districts, these large cities display different behaviors, as shown in Figure 4 where we plot the time evolution of some district densities (all results are presented in the Supplemental material). In the case of London (Figure 4, top panels), we note that the district City of London reached a density peak before 1800, while other districts (for example, Lewisham, Brent and Newham) display all the different phases of urbanization described above. For Chicago (see the Supplemental material) and Paris (Figure 4, bottom panels), the different districts are not all synchronized and simultaneously display different urbanization phases. The central districts of Paris (the 1st and the 4th for example) typically reached their density peak before 1860, while less central districts (11th to 20th) reached their density peak in the first half of the 20th-century, consistent with the idea of a centrifugal urbanization process.
Population density versus time. Local population densities for a selection of London districts (top), and a selection of Paris arrondissements (bottom). For the sake of clarity we did not plot all the districts studied and additional results can be found in the Supplemental material.
For the five boroughs of New York (see the Supplemental material), we observe that Manhattan (MN), the Bronx (BX) and Brooklyn (BK) have already passed through the different phases of urbanization, and are now in a re-densification period. In contrast, Staten Island (SI) and Queens (QN) are still in the urbanization period characterized by a positive population growth rate and have not yet reached a density peak.
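The distinction made here between districts that have passed their density peak and districts that are still growing can be illustrated with a small helper. This is only a sketch under simple assumptions: the function name `classify_district`, the three labels, and the rule "the peak is the largest value in the series" are ours, and the numbers are made up rather than taken from the data.

```python
def classify_district(years, densities):
    """Return (label, peak_year) for a district density time series.

    label is 'growing' if the largest density is the last data point,
    'peaked_before_data' if it is the first data point,
    and 'peaked' otherwise. Ties are resolved in favor of the earliest year.
    """
    if len(years) != len(densities) or len(years) < 2:
        raise ValueError("need two or more matching (year, density) samples")
    peak_index = max(range(len(densities)), key=densities.__getitem__)
    if peak_index == len(densities) - 1:
        label = "growing"
    elif peak_index == 0:
        label = "peaked_before_data"
    else:
        label = "peaked"
    return label, years[peak_index]


# Made-up illustrative series (hypothetical districts).
years = [1900, 1950, 2000]
print(classify_district(years, [10, 30, 20]))  # ('peaked', 1950)
print(classify_district(years, [10, 20, 30]))  # ('growing', 2000)
print(classify_district(years, [30, 20, 10]))  # ('peaked_before_data', 1900)
```

One limitation of this rule is that a district whose re-densification exceeds its earlier peak would be labeled as still growing, so a more careful treatment would need to look at the whole trajectory.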
These preliminary results highlight the importance of spatial delimitations when studying a city. The dynamics of different districts might be the same, as also suggested by the qualitative models presented in the introduction, but they are not necessarily simultaneous, mainly because of the difference between districts belonging to the core of the city and districts belonging to the ring: the farther a district is from the core of the city, the later it will reach the second phase. For this reason, we will not consider in the following cities as a whole, but rather follow the evolution of various quantities for each district, which displays a better level of homogeneity.
We note here that a large number of empirical studies in which the densification and the disurbanization phases were observed have already been performed (Beale, 1975; Berry, 1976; Champion, 1989; Fielding, 1982; Frey, 1990; Nucci and Long, 1995). In most of these studies, the analysis focused on the dependence between the behavior of the core and of the ring districts or on the size of the urban agglomeration.
Number of buildings versus population
We now turn to the main result of this paper which is the characterization of the urbanization from the point of view of both the physical aspect via the number of buildings, and the individual aspect described here by the population.
For each district, we then study the relation between the number of buildings Nb and the population P (Figure 5), and plot Nb versus P. We thus connect an element of the infrastructure – the building – to the population, which allows us to get rid of exogenous effects that govern, for example, the time evolution of the population. This plot encodes these two fundamental aspects of the urbanization process and we refer to this representation as ‘the fundamental diagram’. In Figure 5, we observe an apparent diversity of behaviors but, as we will see in the following, they can all be interpreted and compared in the framework of a simple quantitative model. In Figure 5(a), we show the result for the nine sides of Chicago, and we observe a clear growth phase followed by a ‘saturation’ (corresponding to the density peak) for the Far North, Northwest, Southwest, Far Southeast and Far Southwest sides (plotted in continuous line). In contrast, the other sides (Central, North, West and South), in dotted line, seem to have reached a saturation before 1930. Indeed, the dotted lines (that have to be read chronologically from right to left) do not display the growth regime, suggesting that it stopped before 1930, the year of the earliest available data. In Figure 5(b) for London districts, we observe that all districts displayed here reached a saturation, but that the district Tower Hamlets (dotted line) reached it before 1900, the year of the first available data. In Figure 5(c), we plot the five boroughs of New York City. We observe that Staten Island and Queens (dashed lines) are in a growing phase characterized by a positive value of
Number of buildings versus population. We represent with continuous lines the districts that have reached their density peak, and with dashed lines the districts that are still in the growing phase. We use dotted lines for the districts that reached the density peak before the first year available in the dataset. (a)–(d) Results for districts in the cities studied here. (e)–(h) We show examples illustrating the ‘universal’ diagram for districts in different cities that display all the regimes described in the text. In (e), we added to the
These various plots show that for different districts we have essentially the same trajectory in the plane
Schematic representation of the fundamental curve. We represent here the typical district growth curve characterized by three main phases: after a pre-urbanization period, there is first an urbanization phase with a positive growth rate
Theoretical model
The data studied in the previous section display a pattern that seems to encompass specific features of the different cities and we propose a theoretical model based on the following interpretation for these different regimes. The first regime corresponds to the urbanization where buildings are constructed on empty lots until the ‘saturation point’
We model the evolution of a given zone of building land area A by a two-dimensional square grid where each cell of surface
Finally, nothing happens with probability
Solving equation (5) and equation (6) leads to
Equations (4) to (6) imply that the population is an increasing function of the number of buildings up to a saturation value
The saturation happens only if
This relation allows us to determine the average number of people per building floor
In order to test this model, we focus on all districts that have already reached saturation (the others are still in the first growth phase) and we exclude the re-densification regime that is not described by the model. From the data we know the area A of each district and the average building footprint surface al of each district. This allows us to compute the maximum number of buildings
Collapse for the rescaled variables Z and X. We plot the rescaled variable Z versus X (equation (12)) for all the 47 saturated districts of all cities. We excluded from the plot the points belonging to the re-densification regime that is not described by the model. Each city is characterized by a different symbol and each district by a different color. The continuous red line is the theoretical prediction given by equation (13). All the cities considered in this study are present and we kept the districts that have saturated and for which we can compute
This collapse is a validation of the model: it shows that the non-trivial relation between variables (equation (13) together with definitions equation (12)) predicted by the model is in agreement with the data.
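The qualitative mechanism behind the first two regimes (construction on empty lots, then conversion of residences into another use) can be illustrated with a toy simulation on a square grid. We stress that this is not the model defined by the equations above: the update rule, the probabilities `p_build` and `p_convert`, and the fixed `people_per_building` occupancy are all simplifying assumptions introduced only for illustration.

```python
import random

EMPTY, RESIDENTIAL, CONVERTED = 0, 1, 2


def simulate_district(side, p_build, p_convert, people_per_building, steps, seed=0):
    """Toy growth/conversion dynamics on a side x side grid of lots.

    At each step one cell is picked uniformly at random. An empty cell
    receives a residential building with probability p_build; a residential
    cell is converted to non-residential use with probability p_convert;
    otherwise nothing happens. Returns a list of (population, buildings).
    """
    rng = random.Random(seed)
    grid = [EMPTY] * (side * side)
    n_residential = 0
    n_converted = 0
    trajectory = []
    for _ in range(steps):
        cell = rng.randrange(len(grid))
        draw = rng.random()
        if grid[cell] == EMPTY and draw < p_build:
            grid[cell] = RESIDENTIAL
            n_residential += 1
        elif grid[cell] == RESIDENTIAL and draw < p_convert:
            grid[cell] = CONVERTED
            n_residential -= 1
            n_converted += 1
        # Only residential buildings contribute to the population.
        population = n_residential * people_per_building
        trajectory.append((population, n_residential + n_converted))
    return trajectory


trajectory = simulate_district(side=4, p_build=1.0, p_convert=1.0,
                               people_per_building=10, steps=5000)
peak_population = max(p for p, _ in trajectory)
print(peak_population, trajectory[-1])
```

In such a run the number of buildings can only grow until all lots are used, while the population first rises and then falls as residences are converted, which mimics the growth and conversion regimes but not the re-densification phase.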
Discussion and perspectives
Theoretical urban models can be roughly divided into two categories. On the one hand, there are economic models characterized by complex mathematical equations rarely amenable to quantitative predictions that can be tested against data. On the other hand, there are computer simulations (such as agent-based models or cellular automata) that are characterized by a large number of parameters, which prevents an understanding of the hierarchy of processes governing the phenomenon. In the approach presented here, we build a simple model with the smallest number of parameters that is able to describe quantitatively the evolution of various quantities such as the number of buildings and the population for a given district.
The agreement with data is tested with a data collapse which does not rely on a parameter fit. The excellent agreement observed shows that the model is able to explain empirical data. However, this agreement is not a definitive proof that the model described here is the fundamental one. Ideally one should compare with other existing models but in this case our proposal seems to be the first attempt to describe quantitatively the evolution of fundamental quantities with the help of simple fundamental mechanisms. Interestingly enough, this random model relies on a set of simple reasonable assumptions such as growth and conversion and also on non-correlated growth of buildings inside districts, an assumption that seems to be supported both by empirical measures on districts and the theoretical model.
Further quantitative studies are however needed and are of two types. First, other datasets for other cities are needed in order to test the validity of the quantitative behavior observed here. Second, the comparison with other competing theoretical models could be very fruitful and we can only encourage the construction of such models.
Our empirical analysis confirms that there are essentially three different phases of the urbanization process: a growth phase where we observe an increase of both the number of buildings and the population; a second regime where the population decreases while the number of buildings stays roughly constant; and a last phase where both the population and the number of buildings are increasing. The first two phases are well described by the simple model proposed here, which integrates the crucial ingredient of converting residential space into non-residential land-use. We observe empirically the existence of a ‘re-densification’ phase where both population and the number of buildings increase after the conversion phase. This phase seems to happen simultaneously for the different districts in a city, which suggests that it is an effect due to planning decisions and does not result from self-organization. Modelling the appearance of this regime is thus at this point a challenge for future studies. Another interesting issue is related to the estimation and a better understanding of the value
Besides showing that a minimal modeling for describing urbanization is possible despite the large variety of cities, we believe that this approach could constitute the basis for more elaborate models. These models could then be thoroughly tested against data, could describe the impact of various parameters and also help to understand some features of the possible future evolution of cities.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material is available online for this article.
