Identifying urban form typologies in Seoul using a new Gaussian mixture model-based clustering framework

Abstract

Seoul, the capital city of South Korea, has diverse urban forms shaped by its complex history. Previous studies that identified typical Seoul urban forms using expert knowledge suffer from strong subjectivity and are difficult to scale. A data-driven approach offers an opportunity to address these challenges, but previous data-driven studies often applied clustering algorithms directly to a given area with diverse methods and workflows, lacking a systematic framework. This study addressed these issues by developing a new form clustering framework to systematically identify form typologies at a large scale and demonstrated its application in Seoul. With a 500 m × 500 m grid as the basic spatial unit and twelve urban form attributes as learning features, 14 clusters were identified using the Gaussian mixture model. These clusters were then translated into form typologies using a semantic typology naming system, and a representative form sample was identified for each. The resulting typologies were verified and validated through comparisons with previous studies. Their relationships with zoning classes were also examined, emphasizing their role in urban planning and design. The results suggest that this new framework is an effective and promising way to identify urban form typologies in complex urban environments and to better support urban planning and management.

Keywords

Urban morphology; typical urban form; unsupervised learning; form features; urban planning and design

Introduction

Understanding complex urban form in Seoul

Seoul is the largest city in South Korea, the country’s capital, and a major megacity in East Asia (Joo, 2018). During its rapid urbanization, urban areas developed in diverse patterns, producing the highly heterogeneous urban form evident in Seoul today (Chu and Yang, 1991; Seoul Metropolitan Government, 2017). Differences among these patterns reflect their diverse socio-economic, cultural, historical, and regulatory characteristics. Urban planners and policymakers in the city have, therefore, continuously formulated targeted and effective policies for these highly diverse urban patterns (SDI, 2009).

To simplify the complexity of urban form and better support policymaking, the urban form typology approach has been adopted to classify these patterns into several groups (Yeo and Byun, 2010). Most studies focused on individual urban form typologies and the transformation of their street and plot patterns; the typologies studied include the downtown area (Sohn and Han, 1996; Baek and Ahn, 2012), grid residential areas (Park and Choi, 2003; Yeo and Cho, 2010), apartment complexes (Lee, 2012), and superblocks (Jung and Kang, 2012). See the Individual Urban Form Typologies in Seoul section of the Supplementary Material for details.

Beyond these studies of individual typologies, two studies have classified Seoul’s urban forms into typologies more comprehensively. Kim (2003) identified seven typologies for eight 500 m × 500 m areas based on typicality, density, and housing type; two of these are city-center typologies and five are mid- or high-density residential typologies. In the other study, the Seoul Development Institute (SDI) identified seven urban form typologies from 30 area samples according to their morphological forms, development methods, and development periods (SDI, 2009). While these studies provide a helpful reference for planning and policy, both relied on researcher expertise, and their study areas covered only a tiny proportion of the entire city.

Expert knowledge-based and data-driven urban form typology studies

Urban form typology is an important concept in urban morphology studies for understanding spatial structures and the evolution of urban development (Scheer, 2016; Moudon, 1997). Scholars in other fields of urban studies, including urban sociology (Vaughan et al., 2005), urban climate (Stewart and Oke, 2012), energy (Quan et al., 2020), and transportation (Frank et al., 2008), have also applied similar typology concepts to classify distinctive urban areas. In urban studies, directly examining urban form over a large geographic area is often challenging because of its complexity. The urban form typology method reduces urban form to a small number of groups, called typologies, within which urban forms are similar. The identified typologies provide a succinct representation of complex urban forms for more straightforward analysis.

Two major approaches have been mainly adopted to identify urban form typology in previous studies: expert knowledge-based and data-driven approaches. In the first approach, urban form typologies are generally determined using expert knowledge and experience for analysis of different topics, such as urban morphology (Kropf, 2018; Moudon, 1994), energy use (Rode et al., 2014), solar energy potential (Chatzipoulka et al., 2016), and microclimate (Tsoka et al., 2017). However, identifying the typologies is mostly subjective and depends on researchers’ expert knowledge. When the urban form of interest is highly complex, processing the large and diverse form information and subsequently providing synthesized solutions become difficult for researchers (Alexander, 1965). Moreover, this approach often requires a significant amount of manual work, which prevents it from being applied to a large urban area. Hence, previous works mostly focus on a few neighborhoods instead of an entire city.

In the second approach, urban form typologies are usually identified directly from urban form data. While data collection and processing are not immune to subjectivity and bias (Bender et al., 2021), the data-driven approach does not rely heavily on expert knowledge and experience to identify urban form typologies. Because it is based on formal data analysis, it can better handle complex urban forms and large-scale implementation, and it is more scalable and generalizable. This advantage has made the approach increasingly popular, especially with the growing availability of big urban data.

The data-driven approach has been applied to identify urban form typologies in Europe (Berghauser Pont and Olsson, 2017; Bobkova, 2019; Gil et al., 2012; Schirmer and Axhausen, 2016), North America (Song et al., 2013; Vialard, 2013), and Asia (Asami and Niwa, 2008). These studies differ in their definitions of the basic spatial unit (BSU), form components, and clustering methods. Their BSUs include a buffered building footprint (Schirmer and Axhausen, 2016; Song and Knaap, 2007), a plot (Asami and Niwa, 2008; Bobkova et al., 2019), a buffered plot (Berghauser Pont et al., 2019), an urban block (Gil et al., 2012), a neighborhood (Song et al., 2013), and a grid (Jochem and Tatem, 2021; Jochem et al., 2021; Heris et al., 2020). In these studies, urban form was mostly studied as a whole consisting of buildings, plots, and streets, with only a few focusing on a specific individual component.

The clustering algorithms used in most studies are k-means (Gil et al., 2012; Berghauser Pont et al., 2019; Schirmer and Axhausen, 2016) and hierarchical clustering (Song et al., 2013; Asami and Niwa, 2008). Other studies adopted alternatives such as k-medoids (Schirmer and Axhausen, 2016), fuzzy c-means (Berghauser Pont and Olsson, 2017), latent class analysis (Schirmer and Axhausen, 2019), Bayesian clustering (Araldi and Fusco, 2019), and the Gaussian mixture model (GMM) (Jochem et al., 2021).

Major limitations in data-driven typology studies

While data-driven studies have shown great potential for finding urban form typologies in cities, they often focused on directly applying specific clustering algorithms to a given area, with limited discussion of how to integrate domain knowledge into the process. The field generally lacks a systematic framework that connects urban data, clustering algorithms, and the interpretation of the resulting typologies to streamline the process and better support planning and design practice. Specifically, beyond routine steps such as data preprocessing, such a framework should include the definition of the BSU, the measurement of urban form, the selection of a clustering algorithm, the identification of typical urban forms, and the verification of the resulting typologies. However, previous form clustering studies mainly focused on the steps before clustering and gave limited consideration to the post-clustering steps of interpreting and verifying the typologies. Even in the preparation steps for clustering, the selected methods were generally not justified.

The limitations in these five steps present corresponding challenges for the further development of form typology studies. The BSU definitions in previous studies, ranging from a building to a neighborhood, were chosen according to each study’s purpose. However, these definitions are tied too closely to specific areas, which makes comparative studies across cities difficult. For urban form components, many previous studies selected components for clustering without sufficient justification. In this critical step, where urban form information is extracted from raw data, domain knowledge about urban form structure should be carefully considered to properly measure urban form characteristics and obtain meaningful clusters as urban form typologies.

The choice of clustering algorithm is also a major challenge. Previous studies applied different clustering algorithms, which can generally be grouped into three categories: partitional, hierarchical, and model-based clustering algorithms (Abdullah and Hamdan, 2015). While some studies compared a few clustering algorithms, a general comparison of their suitability for urban form studies is lacking, which leaves the choice of clustering algorithm poorly founded. Among the three categories, partitional clustering algorithms usually determine clusters in a single partition by minimizing the sum of squared distances between data points and the centers of the clusters to which they are assigned (Hartigan and Wong, 1979).

However, partitional clustering algorithms often rely on strong assumptions that do not hold for complex urban forms. For example, the widely used k-means algorithm assumes equal cluster size and attribute variance, which is often far from the case in actual urban environments. Hierarchical clustering algorithms work differently: they determine clusters as a hierarchy, built in agglomerative or divisive mode from a proximity matrix (Jain, 2010). However, they are often criticized for their lower accuracy in clustering high-dimensional data. The third category, model-based clustering algorithms, models data with a finite mixture of probability distributions instead of the proximity between data points (Raftery and Dean, 2006). Common algorithms in this category include latent class analysis, Bayesian clustering, and GMM.

Among the three categories of clustering algorithms, the model-based clustering algorithms seem to be the most suitable for clustering urban forms due to their better handling of high-dimensional data with different scales in dimensions and certain underlying distributions (Vermunt and Magidson, 2002; Dahl, 2006). Particularly, GMM (Mohamed et al., 2016) is widely used in many fields due to its advantages of high interpretability and straightforward implementation based on the assumption of Gaussian distributions in data (Bouveyron and Brunet-Saumard, 2014). These characteristics are very important in urban form studies; hence, GMM has been receiving more attention in recent urban form typology studies (Jochem et al., 2021; Jochem and Tatem, 2021).
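The difference between the equal-variance assumption of k-means and the per-component covariance of GMM can be illustrated with a short Python sketch using scikit-learn. This is an illustrative sketch on synthetic data, not the study’s analysis; the cluster locations, spreads, and sizes are arbitrary assumptions chosen to create unequal cluster sizes and variances.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical two-attribute data: a small, tight cluster next to a large,
# diffuse one, i.e. unequal cluster sizes and variances.
tight = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
diffuse = rng.normal(loc=[6.0, 0.0], scale=2.5, size=(400, 2))
X = np.vstack([tight, diffuse])
y_true = np.repeat([0, 1], [100, 400])

# k-means assigns each point to the nearest centroid (Euclidean distance only).
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# GMM with full covariances gives each component its own spread, shape, and weight.
gmm = GaussianMixture(n_components=2, covariance_type="full",
                      n_init=10, random_state=0).fit(X)
gmm_labels = gmm.predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping.
ari_km = adjusted_rand_score(y_true, km_labels)
ari_gmm = adjusted_rand_score(y_true, gmm_labels)
print(f"k-means ARI: {ari_km:.3f}  GMM ARI: {ari_gmm:.3f}")
print("GMM mixing weights:", np.round(gmm.weights_, 2))
```

The exact ARI values depend on the random data; the point of the comparison is that the k-means boundary is always the perpendicular bisector between centroids, whereas the GMM boundary can shift toward the tighter cluster.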

The interpretation challenges for the resulting clusters include naming the form typology of each cluster and identifying a representative form for each typology. Naming clusters is critical because it translates clusters expressed in statistical language into typologies expressed in the language of design and policy. Despite this, most previous studies named clusters as form typologies based on observation and convention without clear rules, often resulting in omissions, overlaps, and conflicts in the semantic meanings of cluster names. A naming system is hence required to better interpret clusters. Once typologies are defined and named, identifying a representative form, a typical sample of each typology, can greatly help in understanding the typology, which still contains forms with variations. However, most previous studies did not provide such an illustration, and even the few studies with representative forms selected those samples without clear and systematic criteria (Song and Knaap, 2007; Schirmer and Axhausen, 2016; Colaninno et al., 2011; Berghauser Pont and Olsson, 2017). This remains a research gap to fill.

Typology verification and validation were generally neglected in previous studies; only a few studies validated their clustering results through visual inspection (Berghauser Pont and Olsson, 2017; Schirmer and Axhausen, 2016, 2019) or comparison with predefined typologies (Gil et al., 2012). Verifying and validating the resulting typologies is critical for evaluating the effectiveness of the data-driven method and avoiding bias. However, this remains a challenge because there is no “ground truth” for urban form typology, and very few manually defined typologies exist in cities for comparison.

Contribution of this study

This study aims to address these research gaps by developing a new urban form clustering framework and applying it to identify urban form typologies in Seoul. Unlike previous studies, this framework defines a systematic workflow comprising the definition of the BSU, the selection of form attributes, the implementation of the clustering algorithm, the interpretation of the resulting clusters, and the verification of the identified typologies. Within this framework, domain knowledge and the data-driven method are integrated to identify meaningful urban form typologies. The framework was applied in Seoul to identify 14 urban form typologies, which were then compared with those of previous Seoul form typology studies based on expert knowledge to verify and validate the results. These results were further compared with zoning classes to highlight the potential complementary role of form typologies in urban planning and management.

Methodology

A new Gaussian mixture model-based clustering framework

A new clustering framework was developed to identify urban form typologies in a large urban area. The framework is based on GMM, a widely used clustering algorithm that can handle complex urban forms and produces results that are straightforward to interpret. It has five steps: the definition of the BSU, the measurement of urban form attributes, the application of the clustering algorithm, the interpretation of clusters as typologies, and the verification and validation of the resulting typologies. Through these steps, the new framework streamlines the workflow connecting raw urban data, domain knowledge of urban form, and analytical models to systematically identify urban form typologies.

Study area and data

Seoul has an area of approximately 605 km², with the Han River flowing through the city center, and four major mountains surrounding the city. Its population is approximately 9.9 million, with a population density of about 16,000 people per km².

This study collected GIS data on Seoul’s buildings, streets, plots, water features, and green areas in 2018. The building construction, street, water body, and elevation data were provided by the National Geographic Information Institute (2019). The building age information and plot data were provided by the Ministry of Land, Infrastructure and Transport (2018; 2019). The green space information was provided by the Seoul Metropolitan Government (2019).

Grid as BSU

While there have been discussions about different BSU definitions in urban form studies (Fleischmann, 2019; Ma et al., 2021; Quan and Bansal, 2021), this study adopted the grid as BSU as it provides consistent scale comparisons for urban forms within the same city and across cities. The BSU of a grid also allows integration with other spatial data, such as raster-based remote sensing data (Jochem et al., 2021). This makes it a promising approach in urban data management with increasing applications in recent years (Heris et al., 2020; Jochem and Tatem, 2021; Jochem et al., 2021).

Here, the basic grid structure was generated as a north-south-oriented orthogonal grid starting from the bottom-left corner of the bounding box of the city boundary. The grid size was determined with reference to the urban tissue, the neighborhood scope, and previous form typology studies in Seoul. As the BSU of urban form typology, a grid cell acts as an urban tissue reflecting the local form structure and usually contains a few blocks. The grid size was therefore set larger than 100–200 m, given an average block size of 9490 m² (about 97 m × 97 m if square) with a standard deviation about seven times that average.
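A minimal sketch of generating such a grid from a bounding box, assuming projected coordinates in metres. `make_grid` is a hypothetical helper, not the tool used in the study, and the bounding box in the usage example is invented for illustration.

```python
import math


def make_grid(xmin, ymin, xmax, ymax, size=500.0):
    """Return square cells covering a bounding box, as
    (col, row, cell_xmin, cell_ymin, cell_xmax, cell_ymax) tuples.

    The grid is aligned north-south and anchored at the bottom-left corner
    (xmin, ymin); coordinates are assumed to be projected, in metres.
    """
    n_cols = math.ceil((xmax - xmin) / size)
    n_rows = math.ceil((ymax - ymin) / size)
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = xmin + col * size
            y0 = ymin + row * size
            cells.append((col, row, x0, y0, x0 + size, y0 + size))
    return cells


# Side length of a square with the average block area quoted above (about 97.4 m).
avg_block_side = math.sqrt(9490)

# Usage with a hypothetical 1200 m x 800 m bounding box.
cells = make_grid(0, 0, 1200, 800)
print(len(cells), round(avg_block_side, 1))
```

Rounding the cell counts up with `math.ceil` ensures the grid fully covers the bounding box; cells that end up containing only natural features would be dropped in a later step.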

The BSU was also chosen to correspond to the typical scope of neighborhoods and local communities. In Seoul, a neighborhood unit is often defined with a diameter of 500 m based on walking distance, public facility management, and community levels (Yang, 1993; Gye and Cheon, 2000). A 500 m grid size is therefore plausibly aligned with neighborhood development. Additionally, the 500 m BSU size defined by experts in previous form typology studies in the same city is an important reference. After considering all these factors, a grid size of 500 m was chosen.

Because the study focuses on the built environment, grid cells with an entirely natural environment, which refer to areas with no buildings, streets, or plots, were excluded from the study (Figure S1 in the Supplementary Material). The final study area comprised 1701 grid cells (see Figure 1).

Figure 1.

Mapping of different study areas (Kim, 2003; SDI, 2009).

Urban form attributes

The choice of urban form attributes followed the structure of urban form elements (see Figure 2), which was developed following studies in urban morphology (Moudon, 1997, 2019; Kropf, 2018). Twelve attributes were selected to measure these elements (Table 1) based on previous studies (Schirmer and Axhausen, 2019; Berghauser Pont et al., 2019; Gil et al., 2012; Berghauser Pont and Olsson, 2017; Kropf, 2018). For simplicity and consistency, attribute values for each grid cell were estimated in ArcGIS from the urban form elements in the cell, counting only the part of an element that lies inside the cell when the grid boundary cuts across it.
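A minimal sketch of this clipping rule for two attributes in Table 1 (building coverage ratio and elevation difference), assuming building footprints simplified to axis-aligned rectangles in metre coordinates. The helper names and data are hypothetical; real footprints are polygons and would be intersected in a GIS, as the study did in ArcGIS.

```python
def clipped_area(rect, cell):
    """Area of the part of an axis-aligned rectangle lying inside a cell.

    Both arguments are (xmin, ymin, xmax, ymax) in metres.
    """
    width = min(rect[2], cell[2]) - max(rect[0], cell[0])
    height = min(rect[3], cell[3]) - max(rect[1], cell[1])
    return max(width, 0.0) * max(height, 0.0)


def building_coverage_ratio(footprints, cell):
    """Total footprint area inside the cell divided by the cell area."""
    cell_area = (cell[2] - cell[0]) * (cell[3] - cell[1])
    return sum(clipped_area(f, cell) for f in footprints) / cell_area


def elevation_difference(elevations):
    """Difference between the maximum and minimum elevations in a cell."""
    return max(elevations) - min(elevations)


# Usage with hypothetical data for one 500 m cell.
cell = (0.0, 0.0, 500.0, 500.0)
footprints = [
    (10.0, 10.0, 60.0, 60.0),      # fully inside: 2500 m^2
    (480.0, 100.0, 520.0, 150.0),  # cut by the cell edge: only 20 m x 50 m counted
    (600.0, 600.0, 700.0, 700.0),  # outside the cell: ignored
]
print(building_coverage_ratio(footprints, cell))  # (2500 + 1000) / 250000
print(elevation_difference([12.0, 48.5, 30.2]))
```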

Figure 2.

Structure of urban form elements.

Table 1.

Urban form attributes used in the analysis of Seoul.

Attribute	Element	Description	Unit
Number of buildings	Building	Total number of buildings in a cell	N
Average building height	Building	Average building height in a cell	Floors
Average building age	Building	Average building age in a cell	Years
Total street length	Street	Total street length in a cell	m
Average street width	Street	Average street width in a cell	m
Number of street intersections	Street	Total street intersections in a cell	N
Average plot size	Plot	Average plot size in a cell	m²
Number of plots	Plot	Total number of plots in a cell	N
Building coverage ratio	Building	Total building footprint area in a cell/cell area	n/a
Green area ratio	Greenery	Total green areas in a cell/cell area	n/a
Water body area ratio	Water body	Total water body areas in a cell/cell area	n/a
Elevation difference	Elevation	Difference between maximum and minimum elevations in a cell	m

Note: N: Number.

Gaussian mixture model clustering algorithm

Here, the GMM algorithm was used to cluster urban forms. GMM assumes that the samples are drawn from a mixture of k Gaussian distributions (Figure S2 in the Supplementary Material). Each Gaussian distribution is called a “component” and has its own mean vector, covariance matrix, and mixing coefficient. The expectation-maximization (EM) algorithm was applied to iteratively estimate these component parameters from the sample set (Rasmussen, 1999). The Bayesian information criterion (BIC), a penalized log-likelihood measure, was used to evaluate clustering performance (see Appendix A for details on the EM algorithm).
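For reference, the mixture density and the BIC can be written in standard notation as follows. The symbols are introduced here for illustration and are not taken from the paper’s appendix: n is the number of grid cells, d the number of attributes, and m the number of free parameters, assuming full covariance matrices (the covariance structure is not stated in the text).

```latex
p(\mathbf{x}) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j\right),
\qquad \pi_j \ge 0, \quad \sum_{j=1}^{k} \pi_j = 1

\mathrm{BIC} = -2 \ln \hat{L} + m \ln n,
\qquad \ln \hat{L} = \sum_{i=1}^{n} \ln p(\mathbf{x}_i)

m = \underbrace{k d}_{\text{means}}
  + \underbrace{k \, \frac{d(d+1)}{2}}_{\text{covariances}}
  + \underbrace{(k-1)}_{\text{weights}}
```

Lower BIC indicates a better trade-off between fit and model complexity. For example, under the full-covariance assumption with d = 12 attributes, m = 12k + 78k + (k − 1) = 91k − 1.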

GMMs with k values ranging from 2 to 30 were applied to the study area, and their BIC values were compared to decide k. For each k, GMM was run 50 times with random starts because the EM algorithm may get stuck in local optima under specific random initializations (Kinnunen et al., 2011; Kwedlo, 2015). For each k, the best of these runs was then chosen based on its BIC value for further comparison. Each run was capped at 1000 iterations. This study used the sklearn.mixture module of the scikit-learn library in Python (Pedregosa et al., 2011) to implement the GMM algorithm.
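A minimal sketch of this model-selection loop, assuming scikit-learn’s `GaussianMixture` estimator with full covariance matrices and its default initialization (the text does not state the covariance type or initialization scheme, so both are assumptions). `n_init=50` and `max_iter=1000` mirror the settings described above; for each k, scikit-learn keeps the initialization with the best likelihood, which stands in for the per-k choice described in the text. The synthetic blobs in the usage example are hypothetical stand-ins for the attribute table.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture


def select_gmm_by_bic(X, k_values, n_init=50, max_iter=1000, seed=0):
    """Fit one GaussianMixture per k and return the model with the lowest BIC.

    For each k, scikit-learn runs `n_init` initializations and keeps the run
    with the best likelihood, which guards against poor local optima of EM.
    """
    best_model = None
    bics = {}
    for k in k_values:
        gmm = GaussianMixture(
            n_components=k,
            covariance_type="full",  # assumption: not stated in the text
            n_init=n_init,
            max_iter=max_iter,
            random_state=seed,
        ).fit(X)
        bics[k] = gmm.bic(X)  # lower is better
        if best_model is None or bics[k] < bics[best_model.n_components]:
            best_model = gmm
    return best_model, bics


# Usage on hypothetical data with three well-separated groups;
# for the study itself, k_values would be range(2, 31).
X, _ = make_blobs(n_samples=600, centers=[[0, 0], [10, 0], [0, 10]],
                  cluster_std=0.7, random_state=0)
best, bics = select_gmm_by_bic(X, range(2, 7), n_init=5)
print(best.n_components, {k: round(v, 1) for k, v in bics.items()})
```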

An initial examination of the attribute values in the grid cells revealed a highly skewed distribution of the water body area ratio, owing to the sharp contrast between river-covered areas and other developed areas in Seoul; the latter account for 80% of the grid cells. Such a distribution is likely to violate the Gaussian assumption of GMM, which raises a major concern about applying GMM directly to the data. To avoid this issue, the study area was divided into two groups of 1364 and 337 cells: the former with a water body area ratio of less than 0.02 and the latter with the remaining cells. The GMM algorithm was then applied separately to each group.
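A minimal NumPy sketch of this two-group split, using the 0.02 threshold stated above. `merge_labels` is a hypothetical helper that offsets the river-side cluster labels so that the two separate GMM runs yield one set of unique cluster IDs (for example, 11 land clusters followed by 3 river-side clusters); the ratios and labels in the usage example are invented.

```python
import numpy as np

WATER_THRESHOLD = 0.02  # water body area ratio threshold used to split the cells


def split_by_water(water_ratio, threshold=WATER_THRESHOLD):
    """Boolean masks for the land group (< threshold) and river-side group (>= threshold)."""
    water_ratio = np.asarray(water_ratio, dtype=float)
    land = water_ratio < threshold
    return land, ~land


def merge_labels(land_mask, land_labels, water_labels, n_land_clusters):
    """Combine labels from two separate GMM runs into one array of unique IDs.

    River-side labels are shifted by the number of land clusters so that,
    e.g., river-side cluster 0 becomes n_land_clusters.
    """
    merged = np.empty(len(land_mask), dtype=int)
    merged[land_mask] = land_labels
    merged[~land_mask] = np.asarray(water_labels) + n_land_clusters
    return merged


# Usage with hypothetical values for five cells.
ratios = [0.0, 0.5, 0.01, 0.02, 0.3]
land, water = split_by_water(ratios)
merged = merge_labels(land, land_labels=[0, 1], water_labels=[0, 1, 0], n_land_clusters=11)
print(merged.tolist())
```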

Interpretation of clusters as urban form typologies

This study further named the clusters from the GMM analysis as urban form typologies. A representative urban form sample was then identified for each typology. These provided interpretations of the results from the data-driven approach for better understanding from the perspectives of urban design and policymaking.

A semantic typology naming system was developed in this study. In this system, the mean attribute values of each cluster (its mean sample) are translated into semantic descriptions with morphological meaning using data discretization (Witten et al., 2017). The system has three incremental steps. First, clusters identified from the group of cells with a water body area ratio of 0.02 or more were given “river-side” in their names. Next, quartiles of five important urban form features, namely building coverage, average building height, average building age, green area ratio, and elevation difference, were generated to categorize and name the mean samples. For the latter three features, only the fourth (highest) quartile was used in naming because it provided necessary and novel information. Finally, an additional feature (average plot size) was used to further distinguish clusters. Following this system, succinct typology names with at most three feature descriptors were generated (Table S1 and Table S2 in the Supplementary Material).
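A minimal sketch of the discretization and naming steps, assuming quartile breaks are computed across the cluster mean samples with NumPy percentiles. The descriptor vocabularies and the `name_cluster` helper below are hypothetical placeholders, not the paper’s mapping, which is given in Tables S1 and S2.

```python
import numpy as np

# Hypothetical descriptor vocabularies (quartile 1 = lowest values).
COVERAGE_TERMS = {1: "open", 2: "low-coverage", 3: "high-coverage", 4: "compact"}
HEIGHT_TERMS = {1: "low-rise", 2: "mid-rise", 3: "mid-to-high-rise", 4: "high-rise"}


def quartile_bins(values):
    """Assign each value to quartile 1-4 using the 25th, 50th and 75th percentiles."""
    values = np.asarray(values, dtype=float)
    breaks = np.percentile(values, [25, 50, 75])
    # right=True puts a value equal to a break into the lower quartile.
    return np.digitize(values, breaks, right=True) + 1


def name_cluster(coverage_q, height_q, river_side=False, old=False):
    """Compose a name from quartile descriptors; `old` would flag the top age quartile."""
    parts = []
    if old:
        parts.append("old")
    if river_side:
        parts.append("river-side")
    parts.append(COVERAGE_TERMS[coverage_q])
    parts.append(HEIGHT_TERMS[height_q])
    return " ".join(parts)


# Usage with hypothetical mean values of one feature for eight clusters.
print(quartile_bins([1, 2, 3, 4, 5, 6, 7, 8]).tolist())
print(name_cluster(4, 2, river_side=True, old=True))
```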

Representative forms were identified as the samples with the smallest distance to the mean samples, on the premise that the closer two samples are in the attribute space, the more similar they are. The Mahalanobis distance (MD), which builds on the Gaussian assumption, was adopted as the distance measure instead of the simple Euclidean distance used in k-means. Specifically, the MD is the distance between a sample x and a given distribution, calculated from the mean and covariance matrix of that distribution (Eq. B.1, Figure S3 in the Supplementary Material). The identified representative forms are intended to describe the general characteristics of the typologies.
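A minimal NumPy sketch of the Mahalanobis distance, D_M(x) = sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)), and of picking the cluster member closest to the cluster distribution. It assumes the mean and covariance come from the fitted GMM component (for example, scikit-learn’s `means_` and `covariances_` attributes under full covariances); `representative_index` is a hypothetical helper, and the samples are invented.

```python
import numpy as np


def mahalanobis(X, mean, cov):
    """Mahalanobis distance of each row of X to a Gaussian N(mean, cov):
    sqrt((x - mean)^T cov^-1 (x - mean))."""
    diff = np.atleast_2d(np.asarray(X, dtype=float)) - np.asarray(mean, dtype=float)
    # Solve cov @ z = diff^T rather than inverting cov explicitly.
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff.T)
    return np.sqrt(np.einsum("ij,ji->i", diff, z))


def representative_index(X, labels, cluster, mean, cov):
    """Index of the member of `cluster` with the smallest Mahalanobis distance."""
    members = np.flatnonzero(np.asarray(labels) == cluster)
    distances = mahalanobis(np.asarray(X)[members], mean, cov)
    return int(members[np.argmin(distances)])


# Usage with hypothetical two-attribute samples; with a fitted GaussianMixture,
# mean and cov could be gmm.means_[c] and gmm.covariances_[c].
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
mean, cov = np.zeros(2), np.diag([1.0, 4.0])
print(mahalanobis(X, mean, cov))
print(representative_index(X, [0, 0, 0, 1], cluster=0, mean=mean, cov=cov))
```

Unlike the Euclidean distance, the MD divides each deviation by the spread of the distribution, so in this example a 2-unit offset along the high-variance second attribute counts the same as a 1-unit offset along the first.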

Resulting typologies were spatially joined to the grid in ArcGIS, and their adjacencies were calculated based on Rook’s contiguity, considering only edge sharing to define cell neighbors (Figure S4 in the Supplementary Material).
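A minimal sketch of rook contiguity on a raster of typology labels, in which two cells count as neighbors only if they share an edge. The same-typology adjacency ratio is a hypothetical measure introduced for illustration; this section does not define the paper’s adjacent ratio, and the label grid is invented.

```python
import numpy as np


def rook_neighbor_pairs(grid):
    """List (label_a, label_b) for every pair of study cells that share an edge.

    `grid` is a 2-D array of typology labels in which -1 marks excluded cells.
    Only right and down neighbors are checked, so each shared edge is counted once.
    """
    grid = np.asarray(grid)
    n_rows, n_cols = grid.shape
    pairs = []
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r, c] < 0:
                continue
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols and grid[rr, cc] >= 0:
                    pairs.append((int(grid[r, c]), int(grid[rr, cc])))
    return pairs


def same_type_adjacency_ratio(grid, typology):
    """Hypothetical measure: share of the rook neighbors of `typology` cells
    that belong to the same typology (each cell's neighbors counted separately)."""
    same = total = 0
    for a, b in rook_neighbor_pairs(grid):
        if a == typology:
            total += 1
            same += int(b == typology)
        if b == typology:
            total += 1
            same += int(a == typology)
    return same / total if total else float("nan")


# Usage with a hypothetical 2 x 3 label grid containing one excluded cell.
grid = np.array([[0, 0, 1],
                 [1, -1, 1]])
print(rook_neighbor_pairs(grid))
print(same_type_adjacency_ratio(grid, 0), same_type_adjacency_ratio(grid, 1))
```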

Verification and validation of identified typologies

Visual inspections were used to verify the identified typologies and their representative forms. Additional form samples were examined to find whether important features were learned in the clustering and how diverse urban forms are within each typology. The choice of those form samples was based on the MD value ranking (Suzuki et al., 2008), representing the similarity level with the representative form. Ten form samples were verified in each typology, including the top and bottom five in the MD value ranking.
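A small sketch of selecting the five closest and five farthest members of a typology by MD ranking, assuming MD values have already been computed for each cell of the typology. The helper name, cell IDs, and MD values are hypothetical.

```python
import numpy as np


def top_bottom_samples(md_values, cell_ids, n=5):
    """Return the n cells closest to (lowest MD) and farthest from (highest MD)
    the cluster distribution, for visual inspection."""
    order = np.argsort(np.asarray(md_values, dtype=float), kind="stable")
    ranked = np.asarray(cell_ids)[order]
    return ranked[:n].tolist(), ranked[-n:].tolist()


# Usage with hypothetical MD values for twelve cells with IDs 100-111.
md = [0.5, 3.0, 1.2, 0.1, 2.2, 4.0, 0.9, 5.5, 1.0, 2.9, 0.3, 6.1]
top, bottom = top_bottom_samples(md, list(range(100, 112)))
print(top, bottom)
```

The first element of `top` is the representative form itself, since it has the smallest MD in the typology.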

The typologies identified here were also compared with those of Kim (2003) and SDI (2009) using a manual approach: the typologies from both studies were overlaid with those of this study in ArcGIS and then analyzed. The identified typologies were further compared with zoning classes in Seoul using the same method to examine their similarities and differences.
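The overlay comparison can be summarized as an area-weighted cross-tabulation. This sketch assumes the overlay has already produced one record per intersected polygon, holding the typology from this study, the class from the other classification (a previous study’s typology or a zoning class), and the intersected area. All labels and areas in the example are hypothetical.

```python
from collections import Counter, defaultdict


def overlap_table(our_labels, other_labels, areas):
    """Area-weighted cross-tabulation: table[other_class][our_typology] = overlapping area."""
    table = defaultdict(Counter)
    for ours, theirs, area in zip(our_labels, other_labels, areas):
        table[theirs][ours] += area
    return table


def dominant_match_share(table):
    """For each class of the other classification, the typology covering most of
    its area and the share of the class area it covers."""
    result = {}
    for theirs, counts in table.items():
        ours, area = counts.most_common(1)[0]
        result[theirs] = (ours, area / sum(counts.values()))
    return result


# Usage with hypothetical overlay records (areas in km^2).
ours = ["T1", "T1", "T2", "T4"]
theirs = ["downtown", "downtown", "downtown", "apartment complex"]
areas = [0.25, 0.25, 0.125, 0.25]
shares = dominant_match_share(overlap_table(ours, theirs, areas))
print(shares)
```

A high dominant share means that a class in the other classification maps mostly onto one typology of this study; low shares flag classes that this study splits across several typologies.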

Results

Clustering results

The clustering resulted in 14 clusters of urban forms in Seoul from two separate clustering analyses for grid cells with and without significant water bodies (Figure S5 in the Supplementary Material). Their area proportions and mean and variance of attribute values are shown in Table 2 and Figure S6 in the Supplementary Material, respectively. The resulting clusters show distinctive characteristics with respect to those values.

Table 2.

Numbers of cells, percentages in all cells, and mean values of urban form attributes in 14 clusters.

Cluster	Number of Cells (N)	Percentage of All Cells (%)	No. of Bldgs (N)	Avg. Bldg. Height (Floors)	Avg. Bldg. Age (Years)	No. of Street Ints (N)	Total Street Length (m)	Avg. Street Width (m)	Avg. Plot Size (m²)	No. of Plots (N)	Bldg. CR (n/a)	Green Area Ratio (n/a)	Water Body Area Ratio (n/a)	Elevation Difference (m)
1	270	15.9	730.91	3.16	28.75	165.21	8559.07	5.47	247.91	761.05	0.38	0.01	—	21.48
2	241	14.2	536.06	3.22	27.89	113.98	6525.86	5.63	330.41	550.91	0.31	0.13	—	49.45
3	196	11.5	176.57	3.21	28.15	41.79	3048.52	6.35	633.54	173.78	0.13	0.51	—	89.17
4	188	11.1	296.19	4.42	27.39	63.49	4888.53	7.96	678.03	286.43	0.27	0.03	—	29.31
5	131	7.7	319.44	3.65	29.21	76.71	3337.33	8.25	651.38	319.08	0.20	0.16	0.12	28.05
6	122	7.2	530.59	2.90	36.46	141.01	6927.29	5.57	282.69	595.49	0.28	0.12	—	52.23
7	115	6.8	71.75	5.59	25.28	13.06	1696.20	10.24	3146.78	51.75	0.11	0.42	—	69.01
8	111	6.5	28.86	7.67	26.10	4.21	1554.14	22.29	17687.76	11.78	0.09	0.25	0.22	28.43
9	95	5.6	560.37	3.01	31.16	135.84	7471.86	5.98	278.47	574.50	0.30	0.05	0.06	28.03
10	87	5.1	100.47	6.85	27.85	19.49	2841.70	14.25	2897.46	81.21	0.20	0.04	—	10.14
11	75	4.4	40.47	9.21	20.88	5.23	1412.58	16.96	12453.15	14.09	0.10	0.42	—	52.17
12	44	2.6	37.01	6.07	30.03	2.23	1150.45	23.86	21008.79	9.65	0.12	0.32	—	34.82
13	25	1.5	255.53	2.59	29.44	54.84	3701.02	6.24	1251.33	226.70	0.20	0.29	—	71.17
14	1	0.1	10.59	2.59	27.20	0.00	51.06	49.00	527521.87	0.46	0.43	0.00	—	4.67

Note: No: Number, Avg: Average, Bldg: Building, Bldgs: Buildings, Ints: Intersections, CR: Coverage Ratio.

Urban form typologies and representative forms

The 14 clusters were given semantic names as 14 form typologies following the semantic typology naming system, with a representative form identified for each typology (see Figure 3(a)). Typology 1, named the small-plot compact mid-rise typology, is characterized by a high building density, the smallest plot size, and mid-rise buildings. Typology 2, the compact mid-rise typology, has the same patterns as Typology 1, except it has slightly larger average plot sizes. Typologies 3, 11, and 12 similarly have large green areas, but due to the differences in other patterns, they are named mountain-sited low-coverage mid-rise, green open high-rise, and old open mid-to-high-rise types, respectively.

Figure 3.

a) Representative samples of 14 urban form typologies; b) mapping of urban form typologies in Seoul.

Typology 4, or the high-coverage mid-to-high-rise typology, is typified with a high building coverage ratio and a mixture of mid-rise and high-rise buildings. Typologies 5, 8, and 9 reveal common river-side development patterns in Seoul, although Typologies 5 and 9 have mid-rise buildings, while Typology 8 has high-rise buildings. Typology 6 is characterized by old neighborhoods with low-rise buildings. Typologies 7 and 10 share similar mixed patterns with mid-rise and high-rise buildings where high-rise buildings are predominant, but they differ in other urban form attributes. Typology 13, the mountain-sited low-coverage low-rise typology, has a large elevation difference, a large green area ratio, and diverse plot sizes. Unlike other urban forms, Typology 14 has only one sample area that contains a massive wholesale market, an important heritage site in Seoul. Therefore, it stands out as one typology because of its historical and morphological uniqueness.

Spatial distribution of urban form typologies

Urban form typologies and their representative form samples are mapped in Seoul (see Figure 3(b)). Visual and analytical results suggest three major spatial distribution patterns. First, more than half of the urban form typologies have an adjacent ratio of less than 0.25, indicating that various urban forms are mixed and that small-scale developments have occurred (Figure S7 in the Supplementary Material). Second, specific typologies often appear next to each other, especially the small-plot compact mid-rise, compact mid-rise, old high-coverage low-rise, river-side open high-rise, and old river-side compact mid-rise typologies. Third, some typologies are often located next to topographic features; for example, the low-coverage mid-rise types are mostly found close to large green, mountainous, and river-side areas. These three observations align with corresponding urban development patterns in Seoul: historical development, rapid and large-scale urbanization, and interactions between urban development and the natural environment.

Verification and comparison

Verification of urban form typologies

The top five form samples (with the lowest MD values) and the bottom five (with the highest MD values) were selected for each typology (Table S3 in the Supplementary Material). Generally, similarity to the representative sample is discernible for the other four top samples but less so for the bottom five. This suggests that peripheral samples are more likely to exhibit a mixture of the characteristics of their own cluster and neighboring clusters. Extreme cases are found in the mountain-sited low-coverage low-rise typology, where the bottom five samples differ considerably from the representative sample. Despite these sample differences, the typologies are generally clearly separated by their building footprint sizes, plot sizes, and street patterns, indicating that the data-driven approach of this study can identify important and distinctive typologies.

Comparisons with expert knowledge-based studies

The clustering results were compared with the two previous form typology studies in Seoul to understand their differences (Table S4 and Table S5 in the Supplementary Material). The study area of this study covers 70.2% of Seoul, whereas the previous studies by Kim (2003) and SDI (2009) cover only 0.3% and 6.9% of Seoul, respectively. The overlap results show that the present study is more consistent with the study by Kim (2003) than with that of SDI (2009); because the former was considered more systematic, this suggests that the present approach performs well (Figure S8 in the Supplementary Material). Closer scrutiny indicates that the typologies identified with the new framework better reflect the actual urban form pattern and surrounding urban context than those of Kim (2003) (Figure S9 in the Supplementary Material). Moreover, the typology names given in this study are more intuitive and systematic than those in the two previous studies.

Comparison with zoning classes

Urban form in Seoul is regulated by zoning, which sets rules on land use, development density, and building and street geometry. However, modern zoning has been criticized for its oversimplification and its lack of a grand vision for shaping good urban form (Talen, 2012). Comparing zoning classes with actual urban forms can help clarify the impact of zoning on urban form development, a question that remains underexplored in the field because complex urban forms are difficult to measure and understand. This study compared zoning classes with the identified form typologies and found a large discrepancy between the two (Figure S10 in the Supplementary Material). For example, the seven residential zoning classes have clearly defined height differences, which were generally not reflected in the typologies that appear to be residential. This mismatch suggests that the typologies reveal the structure of urban form from a perspective different from that of zoning and can be used as a complementary tool in urban planning and management.
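
Figure S10 presents the zoning comparison visually. If a single summary of the discrepancy is wanted, Cramér's V on the cross-tabulation of zoning class and typology per grid cell is one standard option; it is an assumed addition here, not a measure reported in the study. The sketch assumes each cell has already been given a single zoning class (for example, by majority area, a hypothetical preprocessing step). Values near 0 indicate that typology membership is nearly independent of zoning; 1 indicates perfect association.

```python
import numpy as np

def cramers_v(zoning, typology):
    """Cramér's V between two categorical labelings of the same grid cells."""
    z_levels, z_idx = np.unique(np.asarray(zoning), return_inverse=True)
    t_levels, t_idx = np.unique(np.asarray(typology), return_inverse=True)
    r, c = len(z_levels), len(t_levels)
    if min(r, c) < 2:
        raise ValueError("both labelings need at least two categories")
    table = np.zeros((r, c))
    np.add.at(table, (z_idx.ravel(), t_idx.ravel()), 1)  # contingency counts
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()    # Pearson chi-square
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))
```

The uncorrected statistic is biased upward for small samples; with thousands of grid cells this matters little, but a bias-corrected variant would be safer for small districts.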

Discussion

The new urban form clustering framework developed in this study is general and can be applied to other cities. However, most of its steps require consideration of local characteristics. The definition of the BSU depends on the local neighborhood size. The selection of form features should be adapted to the local urban form structure and data availability. The detailed implementation of the clustering algorithm depends on the data distribution. Validation also relies heavily on previous studies of the same area.

Moreover, a general issue in urban form clustering studies arises from aggregating spatial data, commonly known as the modifiable areal unit problem (MAUP) (Jelinski and Wu, 1996). Two experiments were performed to understand the effect of grid size on the clustering result and to determine the best grid size. In the first experiment, a grid sensitivity analysis varied the grid size from 200 to 800 m and found significant differences in the number of clusters (Table S6, Figure S11, and Figure S12 in the Supplementary Material), indicating a strong MAUP effect. The second experiment used basic optimal zoning to reduce the MAUP (Jelinski and Wu, 1996). Within the 500 to 800 m range, which is considered the neighborhood scale, the 500 m grid was identified as optimal because it minimized intra-cell variation, measured as the mean standard deviation of five important form features (Table S7 in the Supplementary Material). This result confirms the grid size chosen in this study.
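
The second experiment can be sketched as a search over candidate grid sizes that keeps the size with the lowest mean intra-cell standard deviation. The sketch assumes that features are measured on finer units (for example, plots or buildings) with projected coordinates in metres, and that features are z-scored first so that no single feature's units dominate the average; the function names are hypothetical, and the study's own results are those reported in Table S7.

```python
import numpy as np

def mean_intra_cell_std(xy, features, size):
    """Mean within-cell standard deviation of features on a square grid of `size` metres."""
    cells = np.floor(xy / size).astype(np.int64)  # integer (col, row) of each point
    _, cell_id = np.unique(cells, axis=0, return_inverse=True)
    cell_id = np.ravel(cell_id)                   # keep 1-D across NumPy versions
    stds = [features[cell_id == c].std(axis=0)
            for c in range(cell_id.max() + 1)
            if np.count_nonzero(cell_id == c) > 1]  # a lone point has no spread
    if not stds:
        raise ValueError("no grid cell contains more than one point")
    return float(np.mean(stds))                   # average over cells and features

def pick_grid_size(xy, features, sizes):
    """Return the candidate size with the lowest mean intra-cell std, plus all scores."""
    sd = features.std(axis=0)
    z = (features - features.mean(axis=0)) / np.where(sd > 0, sd, 1.0)  # assumed z-scoring
    scores = {s: mean_intra_cell_std(xy, z, s) for s in sizes}
    return min(scores, key=scores.get), scores
```

Smaller grids almost always reduce within-cell variation, which is why the search is only meaningful over a range fixed in advance on substantive grounds, such as the 500 to 800 m neighborhood scale used above.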

Conclusion

Classifying complex urban environments into urban form typologies has been explored in two expert knowledge-based studies to support urban development and policymaking in Seoul. Like other expert knowledge-based form typology studies, both relied on subjective judgment and manual analysis, which prevented them from being feasibly applied at a larger scale. Another school of urban form typology studies has used a data-driven approach, usually based on clustering, to achieve better objectivity, scalability, and generalizability. However, many studies in this school focused on directly applying clustering to urban form patterns without a systematic framework connecting domain knowledge and data analytics.

This study addresses these gaps by developing a new urban form clustering framework and applying it to identify urban form typologies in Seoul. The framework defines a systematic five-step workflow: definition of the BSU, selection of form attributes, implementation of the clustering algorithm, interpretation of the resulting clusters, and verification of the identified typologies. For Seoul, a 500 m × 500 m grid was used as the BSU. Twelve features were selected based on the spatial structure of urban form. They were used as learning features in GMM, a clustering algorithm widely used for complex datasets such as urban form data, to identify clusters based on their similarities. The clusters were then given semantic names and translated into 14 form typologies following a newly developed semantic typology naming system. A representative form was identified for each typology based on MD. These 14 typologies are considered to best represent the complex urban forms of Seoul. Their spatial relationships were also analyzed to understand how Seoul’s complex urban forms and spatial structures developed over time.
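
The clustering and representative-selection steps summarized above can be sketched with scikit-learn's GaussianMixture (Pedregosa et al., 2011). Several choices here are assumptions not stated in this section: features are standardized, full covariance matrices are used, the number of components is chosen by the Bayesian information criterion (BIC), and each component's representative cell is the member with the smallest MD to the component mean under that component's precision matrix. The function name cluster_cells and its default search range are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

def cluster_cells(X, k_range=range(2, 21), seed=0):
    """Fit one GMM per candidate component count, keep the lowest-BIC model,
    and pick each component's representative cell by Mahalanobis distance.

    X: (n_cells, n_features) array of form features per grid cell.
    """
    Z = StandardScaler().fit_transform(X)  # put features on a common scale
    models = [
        GaussianMixture(n_components=k, covariance_type="full",
                        n_init=5, random_state=seed).fit(Z)
        for k in k_range
    ]
    best = min(models, key=lambda m: m.bic(Z))  # lower BIC is better
    labels = best.predict(Z)                    # hard assignment by max posterior
    reps = {}
    for j in range(best.n_components):
        members = np.flatnonzero(labels == j)
        if members.size == 0:                   # a component may win no cells
            continue
        diff = Z[members] - best.means_[j]
        prec = best.precisions_[j]              # inverse covariance of component j
        md = np.sqrt(np.einsum("ij,jk,ik->i", diff, prec, diff))
        reps[j] = int(members[np.argmin(md)])
    return best, labels, reps
```

For Seoul, X would hold the twelve form features of each 500 m cell. Whether this sketch recovers the 14 clusters reported in the study depends on the selection criterion and settings, which this section does not specify.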

The identified typologies were further verified and validated by examining form samples with different MD values, by comparing them with the previous expert knowledge-based form typology studies of Seoul, and by comparing them with zoning classes. The results suggest that the identified typologies are generally consistent and highly capable of representing complex urban forms, and that they can play an important complementary role to zoning in urban planning and design.

Urban form typologies were formed through complex processes in their unique socio-economic, cultural, and regulatory contexts and can therefore be seen as representations of different types of complex systems. Introducing urban form typology into urban development and management helps develop policies that are better targeted to local contexts and better reflect urban complexity. In addition, urban form typologies provide simple but effective representations of such complex urban systems, with prototypical examples serving as references in urban planning and design practice. Compared with traditional expert knowledge-based and other data-driven typology studies, the new urban form clustering framework developed in this study allows for more systematic, reasonable, and interpretable identification of urban form typologies at a larger scale. This allows it to act as an important and effective policy support tool for urban planning and management.

There are still limitations in this study, including the BSU definition and GMM assumptions. These issues may be examined in future research.

Supplemental Material

Supplemental Material for Identifying urban form typologies in Seoul using a new Gaussian mixture model-based clustering framework by Na Li, Steven Jige Quan in Environment and Planning B: Urban Analytics and City Science

Footnotes

Acknowledgements

We would like to express our gratitude to Haylin Kim for her help in proofreading.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Creative-Pioneering Researchers Program through Seoul National University (SNU), the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (No. 2018R1C1B5043758; No. 2022R1C1C1004953), the Seoul National University AI Institute through the Data Science Research Project 2018, and the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Education) (No. 5120200113713).

ORCID iDs

Na Li

Steven Jige Quan

Supplemental Material

Supplemental material for this article is available online.

Author Biographies

Na Li is a Ph.D. candidate in the Graduate School of Environmental Studies at Seoul National University, South Korea, and a research student at the University of Tokyo, Japan. Na Li holds a bachelor's degree in Landscape Architecture from Beijing Forestry University, China, and a master's degree in Landscape Architecture from Seoul National University, South Korea. Her research focuses on data-driven urban planning and design, urban form typology, building energy efficiency, thermal environment, and urban vitality.

Steven Jige Quan is an Associate Professor and the Director of the City Energy Lab in the Graduate School of Environmental Studies at Seoul National University, South Korea. He is also an Affiliated Faculty member in the Artificial Intelligence Institute at Seoul National University. His research interests include smart and sustainable urban systems, urban form and energy, AI-aided design, and sustainable zoning systems. He previously worked as a Department Chief Planner at Beijing Tsinghua Urban Planning & Design Institute, China. He holds a B.Arch and an M.Eng in Urban Planning from Tsinghua University, China, and a Ph.D. in City and Regional Planning from the Georgia Institute of Technology.

References

Abdullah and Hamdan (2015) Hierarchical clustering algorithms in data mining. World Academy of Science, Engineering and Technology International Journal of Computer, Electrical, Automation, Control and Information Engineering 9(10): 2201–2206.

Alexander (1965) A City is Not a Tree. In: Larice and Macdonald (eds) The Urban Design Reader. 2nd ed. New York: Routledge, 152–166.

Araldi and Fusco (2019) From the street to the metropolitan region: Pedestrian perspective in urban fabric analysis. Environment and Planning B: Urban Analytics and City Science 46(7): 1243–1263.

Asami and Niwa (2008) Typical lots for detached houses in residential blocks and lot shape analysis. Regional Science and Urban Economics 38(5): 424–437.

Baek and Ahn (2012) Seoul hanokjugeojiui 1970nyeon ihu byeonie yeonghyangeul michineun jugeoji gujowa hanok yuhyeongui yoin [Factors of structure of residence and Hanok type involved in the transformation of Hanok residence in Seoul since 1970]. Journal of the Architectural Institute of Korea 28(4): 189–198.

Bender, Gebru, McMillan-Major, et al. (2021) On the dangers of stochastic parrots: can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. Association for Computing Machinery, 610–623.

Berghauser Pont and Olsson (2017) Typology based on three density variables central to Spacematrix using cluster analysis. ISUF 2017: City and Territory in the Globalization Age. Valencia, Spain: Editorial Universitat Politècnica de València, 1337–1348.

Berghauser Pont, Stavroulaki, Bobkova, et al. (2019) The spatial distribution and frequency of street, plot and building types across five European cities. Environment and Planning B: Urban Analytics and City Science 46(7): 1226–1242.

Bobkova (2019) Towards a Theory of Natural Occupation. Doctoral Dissertation. Chalmers University of Technology.

Bobkova, Marcus, Berghauser Pont, et al. (2019) Structure of plot systems and economic activity in cities: linking plot types to retail and food services in London, Amsterdam and Stockholm. Urban Science 3(3): 66.

Bouveyron and Brunet-Saumard (2014) Model-based clustering of high-dimensional data: A review. Computational Statistics & Data Analysis 71: 52–78.

Chatzipoulka, Compagnon and Nikolopoulou (2016) Urban geometry and solar availability on façades and ground of real urban forms: using London as a case study. Solar Energy 138: 53–66.

Chu and Yang (1991) Seoul shi doshimbu doshi hyeongtae byeonhwa gwajeong ae gwanhan yeongu (I) [A study on urban morphological changing process in a part of CBD in Seoul (I)]. Journal of Korea Planning Association 26(4): 5–20.

Colaninno, Roca and Pfeffer (2011) An automatic classification of urban texture: form and compactness of morphological homogeneous structures in Barcelona. 51st Congress of the European Regional Science Association. Barcelona, Spain: European Regional Science Association, Louvain-la-Neuve.

Dahl (2006) Model-based clustering for expression data via a Dirichlet process mixture model. Bayesian Inference for Gene Expression and Proteomics 4: 201–218.

Fleischmann (2019) MOMEPY: Urban morphology measuring toolkit. Journal of Open Source Software 4(43): 1807.

Frank, Bradley, Kavage, et al. (2008) Urban form, travel time, and cost relationships with tour complexity and mode choice. Transportation Research Part A: Policy and Practice 35(1): 37–54.

Gil, Beirão, Montenegro, et al. (2012) On the discovery of urban typologies: data mining the many dimensions of urban form. Urban Morphology 16(1): 27–40.

Gye and Cheon (2000) Keomyuniti Jungsimui Jugeohwangyeongjeongbi Yeongu [Community-Based Improvement of Residential Environment in Urban Area]. Sejong, South Korea: Korea Research Institute for Human Settlements.

Hartigan and Wong (1979) Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics) 28(1): 100–108.

Heris, Foks, Bagstad, et al. (2020) A rasterized building footprint dataset for the United States. Scientific Data 7(1): 1–10.

Jain (2010) Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31(8): 651–666.

Jelinski and Wu (1996) The modifiable areal unit problem and implications for landscape ecology. Landscape Ecology 11(3): 129–140.

Jochem, Leasure, Pannell, et al. (2021) Classifying settlement types from multi-scale spatial patterns of building footprints. Environment and Planning B: Urban Analytics and City Science 48(5): 1161–1179.

Jochem and Tatem (2021) Tools for mapping multi-scale settlement patterns of building footprints: An introduction to the R package foot. PLoS One 16(2): e0247535.

Joo (2018) Megacity Seoul: Urbanization and the Development of Modern South Korea. London, UK: Routledge.

Jung and Kang (2012) An analysis of the plot subdivision patterns of urban blocks in Gangnam, Seoul. Journal of the Architectural Institute of Korea 28(5): 189–196.

Kim and Seoul Development Institute International Seminar on Urban Form (2003) International Urban Form Study: Development Pattern and Density of Selected World Cities. Seoul, South Korea: Seoul Development Institute.

Kinnunen, Sidoroff, Tuononen, et al. (2011) Comparison of clustering methods: A case study of text-independent speaker modeling. Pattern Recognition Letters 32(13): 1604–1617.

Kropf (2018) The Handbook of Urban Morphology. West Sussex, UK: John Wiley & Sons.

Kwedlo (2015) A new random approach for initialization of the multiple restart EM algorithm for Gaussian model-based clustering. Pattern Analysis and Applications 18(4): 757–770.

Lee (2012) 1970-80nyeondae hanguk gocheung apateuui giwongwa seonggyeoge gwanhan yeongu - reu kkoreubyuje, jideullung, hilbeljaimeoui gocheungdosi irongwaui bigyo [Origins and characteristics of Korean apartments in 1970-80’s - Comparison with ideas of Le Corbusier, Siedlung and Hilberseimer]. Journal of the Korean Housing Association 23(1): 67–77.

Chen (2021) An elastic urban morpho-blocks (EUM) modeling method for urban building morphological analysis and feature clustering. Building and Environment 192: 107646.

Ministry of Land, Infrastructure and Transport (2019) Building Age Information. Available at: http://openapi.nsdi.go.kr/nsdi/eios/ServiceDetail.do?svcSe=F&svcId=F025 (accessed 22 February 2021).

Ministry of Land, Infrastructure and Transport (2018) Land characteristics map. National Spatial Data Infrastructure Portal. Ministry of Land, Infrastructure and Transport.

Mohamed, Côme, Oukhellou, et al. (2016) Clustering smart card data for urban mobility analysis. IEEE Transactions on Intelligent Transportation Systems 18(3): 712–728.

Moudon (1997) Urban morphology as an emerging interdisciplinary field. Urban Morphology 1: 3–10.

Moudon (2019) Introducing supergrids, superblocks, areas, networks, and levels to urban morphological analyses. ICONARP International Journal of Architecture and Planning 7: 1–14.

Moudon (1994) Getting to know the built landscape: typomorphology. Ordering Space: Types in Architecture and Design. New York, United States: Van Nostrand Reinhold, 289–311.

National Geographic Information Institute (2019) Continuous Numerical Map. Available at: http://map.ngii.go.kr/ms/map/NlipMap.do?tabGb=total (accessed 22 February 2021).

Park and Choi (2003) Geonchukjedo byeonhwae ttareun ilbanjugeoji dosijojige gwanhan yeongu [A study of the urban tissue of a residential area according to the change of building code]. Journal of the Architectural Institute of Korea 19(10): 69–76.

Pedregosa, Varoquaux, Gramfort, et al. (2011) Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12: 2825–2830.

Quan and Bansal (2021) A systematic review of GIS-based local climate zone mapping studies. Building and Environment 196: 107791.

Quan, Economou, Grasl, et al. (2020) An exploration of the relationship between density and building energy performance. URBAN DESIGN International 25(1): 92–112.

Raftery and Dean (2006) Variable selection for model-based clustering. Journal of the American Statistical Association 101(473): 168–178.

Rasmussen (1999) The infinite Gaussian mixture model. NIPS 12: 554–560.

Rode, Keim, Robazza, et al. (2014) Cities and energy: urban morphology and residential heat-energy demand. Environment and Planning B: Planning and Design 41(1): 138–162.

Scheer (2016) The epistemology of urban morphology. Urban Morphology 20(1): 5–17.

Schirmer and Axhausen (2016) A multiscale classification of urban morphology. Journal of Transport and Land Use 9(1): 101–130.

Schirmer and Axhausen (2019) A Multiscale Clustering of the Urban Morphology for Use in Quantitative Models. In: D'Acci (ed) The Mathematics of Urban Morphology. Cham: Springer, 355–382.

SDI (Seoul Development Institute) (2009) Urban Form Study of Seoul. Seoul, South Korea: Seoul Development Institute.

Seoul Metropolitan Government (2017) Seoul Dosigyehoek Yeonhyeok [History of Seoul Urban Planning]. Seoul, South Korea: Seoul Metropolitan Government.

Seoul Metropolitan Government (2019) Green Space. Available at: https://data.seoul.go.kr/dataList/datasetList.do (accessed 20 February 2022).

Sohn and Han (1996) Yuhyeongjeok hyeongtaehagui yeongubangbeobe gwanhan yeongu [A study of the research methods of typomorphology]. Journal of the Architectural Institute of Korea 12(6): 73–83.

Song, Gordon-Larsen and Popkin (2013) A national-level analysis of neighborhood form metrics. Landscape and Urban Planning 116: 73–85.

Song and Knaap (2007) Quantitative classification of neighbourhoods: the neighbourhoods of new single-family homes in the Portland Metropolitan Area. Journal of Urban Design 12(1): 1–24.

Stewart and Oke (2012) Local climate zones for urban temperature studies. Bulletin of the American Meteorological Society 93(12): 1879–1900.

Suzuki, Sota, Brown, et al. (2008) Using Mahalanobis distance to compare genomic signatures between bacterial plasmids and chromosomes. Nucleic Acids Research 36(22): e147–e147.

Talen (2012) City Rules: How Regulations Affect Urban Form. Washington, DC: Island Press.

Tsoka, Tsikaloudaki and Theodosiou (2017) Urban space’s morphology and microclimatic analysis: A study for a typical urban district in the Mediterranean city of Thessaloniki, Greece. Energy and Buildings 156: 96–108.

Vaughan, Clark DLC, Sahbaz, et al. (2005) Space and exclusion: does urban morphology play a part in social deprivation? Area 37(4): 402–412.

Vermunt and Magidson (2002) Latent class cluster analysis. Applied Latent Class Analysis 11(89–106): 60–106.

Vialard (2013) A Typology of Block-faces. Doctoral Dissertation. Georgia Institute of Technology.

Witten, Frank, Hall, et al. (2017) Data Mining: Practical Machine Learning Tools and Techniques. Cambridge, United States: Todd Green.

Yang (1993) Dosi Jugeodanjigyehoek [Urban Residential Complex Plan]. Seoul, South Korea: Kimundang.

Yeo and Byun (2010) Seoul Neibeohudeu Gongganpaeteon Yeongu [Seoul Neighborhood Spatial Pattern Study]. Seoul, South Korea: Seoul Development Institute.

Yeo and Cho (2010) Gyeokjahyeong jugeobeullogui mullijeok byeonhwateukseong yeongu [A study on the physical transformation of grid residential blocks]. Seoul Studies 11(4): 91–100.

Supplementary Material

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
