Abstract
This study examines the relationship between various measures of environmental product variety and retail rents in central urban shopping areas. Using a Geographic Information System (GIS)-based detailed survey database, this research identified 34 layers of environmental product variety in the most representative single-centred shopping areas of the six largest cities in Taiwan. This research extracted layers of product variety and other measures of product variety, such as the number of layers of product variety above each point of interest, the density, the Core/Periphery factor scores, the Shannon entropy index, the Simpson diversity index and the Herfindahl–Hirschman index of each street line buffer area. The proposed method was used to generate three-dimensional maps of the rent gradient and the extracted core and periphery layers of product variety. Thus, a tool was developed for examining the variety features from various angles. The results showed that, in general, the higher the product variety, the higher the rents. Nevertheless, the scores for the core and periphery of the environmental product variety were the dominant determinants: street line buffer areas that lacked the correct (i.e. the core layers) environmental product variety could only command lower rents, even if they scored higher on other variety measures.
Keywords
Introduction
Preferences for product variety have long constituted a central concern in urban and spatial research (Dixit and Stiglitz, 1977; Lancaster, 1990; Van Nes, 2005). Product variety is generally treated as a significantly favourable and productive factor in urban and industrial agglomeration (Fujita and Thisse, 2002; Ogawa, 1998; Schiff, 2015), but the methods for precisely measuring variety are insufficient, whether in academia or practice, for two possible reasons. First, regarding the intangibility of spatial features, such as product variety 1 and rental value, researchers and practitioners can only imagine these ambiguous and intangible variety features of a city or spatial study region. However, the spatial distribution of variety sometimes lacks a visual context, leading to contradictions between theory and practice. These intangible locational or environmental externalities, both Marshallian-specialised and Jacobian-diversification, become crucial in determining city sizes and inducing regional innovativeness (Abdel-Rahman and Fujita, 1990; Carlino and Kerr, 2015; van der Panne, 2004). However, what if the spatial distribution of product variety could be observed? The environmental product variety could then be analysed. The patterns of retail agglomeration would then become visible and measurable. Such visibility of intangible rental distributions and environmental features could induce more extensive research.
The second reason why current methods for measuring variety are unsatisfactory derives from the conceptual ambiguity of product variety. Variety, diversity, difference and even complexity could be, to some extent, synonymous (Batty, 2005; Gans and Hill, 1997; Jost, 2006; Straathof, 2007). When variety is discussed from a spatial perspective (e.g. a city or a shopping area), the characteristics of variety become a relative concept of agglomeration and the clustering of activities. Under this circumstance, the concepts of variety include not only higher ‘abundance’, ‘difference’ or ‘number’ but also the degrees of ‘richness’, ‘concentration’ or ‘evenness’. Moreover, the characteristics of this product variety depend on whether or not the spatial pattern of agglomeration can generate higher increasing returns, and such quality can entail different concepts. For example, a higher number of retail stores of different categories indicates higher variety (in scope); a higher number of retail stores in the same category is also treated as higher variety (in scale). A heterogeneous agglomeration and its incentive are based on economies of scope (Fujita, 1989), whereas a homogeneous agglomeration pursues increasing returns from economies of scale (Fujita and Thisse, 2002). Their benefits increase the frequency of consumer searches or reduce the shopping costs of the same trip in multipurpose shopping (Fischer and Harrington, 1996).
Due to the above-discussed complexity in measuring environmental product variety, this paper proposes a multi-stage method to tackle this significant subject. The retail agglomeration in central urban places is the most observable environment for product variety in terms of the availability of data on the number, type and location of retail stores. Using detailed survey data from the central shopping areas of the six largest cities in Taiwan, this paper develops a six-stage process to extract different retail-category layers, each with exact clustering locations. This spatial data mining process provides the specific environmental retail distribution of each defined measuring unit. The specific distributions of retail-category layers of measuring units later generate different diversity and variety indices. Retail rental data of measuring units, as the performance variable, are then examined to understand whether a higher product variety translates into a higher rental level. Because of the exact locational extraction process of product variety, three-dimensional visualised maps were generated in the final stage, providing an observer-friendly interface for later research.
Literature review: agglomeration and diversity
For a city or town centre pursuing vitality and viability, a retail development and business environmental management strategy plays a crucial role in generating the power of attraction (Cirman and Pahor, 2009; Dolega and Lord, 2020; Ravenscroft, 2000). Shopping atmosphere and retail amenities have undoubtedly become the engines of customer drawing power for a city centre (Teller et al., 2016; Teller and Elms, 2012; Weiler et al., 2003). The clustering of various retail stores and service providers attracts consumers with multiple demands. Product variety is, thus, closely related to productivity (Behrens and Robert-Nicoud, 2015; Ciccone and Hall, 1996; Koster et al., 2014). Retail-led urban planning and development have become essential instruments for city centre regeneration (Balsas, 2018; Guimarães, 2017). Consequently, such a cluster of various retail stores or shops also shows the personality of a city and forms its attractiveness (Scoppa and Peponis, 2015). Moreover, the role of diversity and heterogeneous agglomeration has been considered as the source of sorting of heterogeneous location choices of workers and firms as well as the selection opportunities (Behrens and Robert-Nicoud, 2015). All these externalities generated from diversity and spatial clustering stimulate the ‘natural advantage’ for innovation (Carlino and Kerr, 2015).
Because such a retail structure is crucial in generating city centre agglomeration economies, previous relevant research has focused on two aspects. The first is the retail agglomeration patterns based on classical location theories, such as the hierarchical spatial distribution of retail activities from central place theory (Clark and Rushton, 1970) and the spatial competition process under Hotelling’s principle of minimum differentiation (Meagher, 2012). These classic locational theories can be applied to explain the general spatial patterns of city activities. The second aspect of relevant research has involved focusing on the sources and optimal conditions of agglomeration patterns, including economies of scale and economies of scope caused by Marshallian (specialisation) and Jacobian (diversification) externalities (Abdel-Rahman and Fujita, 1990; Caragliu et al., 2016; Carlino and Kerr, 2015; van der Panne, 2004); optimal variety conditions under monopolistic competition (Chang, 2012; Dixit and Stiglitz, 1977; Peng and Tabuchi, 2007); and the spatial distribution of increasing and constant returns of industries as suggested by Krugman’s (1991) core and periphery relationship. 3 These externalities, product variety and spatial distribution concepts are considered and designed into later empirical studies, including the influential buffering range of retail stores, spatial product variety feature extraction and generating different product variety indices.
Moreover, the pattern of spatial distribution of product variety is also closely related to the characteristics of agglomeration economies. Hence, the spatial structure of retail concentration is also a significant concern to city retail amenities (Burger et al., 2014; Konishi, 2005; Murata, 2003; Potter, 1980). According to the principle of minimum differentiation suggested by Meagher (2012), retailers selling similar products tend to cluster together. However, Fujita and Thisse (2002) indicate that in addition to agglomeration (or centripetal) forces, the spatial configuration should also be influenced by dispersion (or centrifugal) forces. This paper suggests that for a shopping area, the spatial agglomeration pattern should also follow the core and periphery relationship similar to the concept indicated by Krugman (1991) in the regional study. Moreover, the core retail categories that could generate increasing returns should occupy core areas, and other supportive retailers, who generate constant returns, should be at periphery locations.
Regarding the discussed intangibility and ambiguity, it is extremely difficult to identify and measure variety or diversity because of the ambiguity of the essence of variety. In particular, in a complex environment, the dimensions of variety may generate even greater concerns when a simplified concept is preferred. For example, the number of layers of retail categories, the number of differentiated agglomeration areas and the density of shops within observed street areas are easier to apply directly compared with the Shannon entropy index, Simpson diversity index or Herfindahl–Hirschman index (Gans and Hill, 1997; Jost, 2006; Straathof, 2007). The concepts of variety, diversity, differentiation and complexity are sometimes difficult to delineate. Nevertheless, while a model extracts specific effects and those that are influential to certain performance variables, precise measurement is the top priority.
Although retailers of the same type tend to cluster together, preferences for variety still dominate the choice of location. Therefore, a prime location with higher rent should still have higher environmental product variety and, of course, retailers within these dominant groups should have sufficient rental payment ability. Consequently, the main research hypothesis of this study is for a central urban shopping area: the higher the environmental product variety, the higher the productivity and, thus, the higher the retail rent. Furthermore, this study also suggests that not all characteristics of variety exert positive effects on productivity and rents. This concept is simple to understand if product variety is treated as a part of resource allocation; thus, the distribution of resources should follow the core and periphery relationship. The core layers generating increasing returns are generally the main purpose of shoppers; the periphery layers, by contrast, generate constant returns and perform supportive functions.
Data and methods
To connect the above theoretical concept and the empirical study, a six-stage data mining process was developed. The value of this six-stage method is that it extracts the environmental product variety at the specific location of each retail store. Using the minimum influential buffering area of each retail store ensures that the measurement is environmental, rather than merely recording the existence of each retail store at a specific location. The spatial clustering of the retail stores of the same type gives us the measure of Marshallian externalities. At the same time, a defined system of 34 retail categories provides diversity measures based on Jacobian externalities. Using the spatial join function provided by ArcGIS, the detailed distribution and the combination of product variety could be extracted. This data set allows us to generate various product variety indices, including Shannon entropy, the Simpson diversity index, HHI and the core and periphery scores. These generated indices and variables are then examined further to determine whether higher product variety corresponds to a higher rent level. Finally, these results were presented in 3D visualisations to provide policymakers with alternative viewpoints of the spatial distribution of the environmental product variety.
The data
Data were gathered from open government data sources or other standard geographical open data sources such as OpenStreetMap or Geofabrik. 4 However, the data quality was unsatisfactory. Therefore, establishing a GIS-based spatial database from a detailed field survey became the primary task. The fundamental layer of a retail shop data set follows the concept of a point of interest (POI) used in OpenStreetMap (OpenStreetMap Wiki, 2016). This database can identify the specific location of each retail shop, the name of the business, the retail activity or activities of the shop and the floor level(s) of the store. Other than the fundamental POI layer, this research used three other original data layers: (1) the point data of retail rents, (2) the line data of street networks from the open data source Geofabrik and (3) the basemap from the open data source OpenStreetMap. The point data layers of retail rental data were collected from several open data sources, such as the websites of leading real estate agents. Thus, with these four basic data layers in GIS, this study generated all the extensive results, in 2D and 3D, such as the distribution of agglomeration areas of each layer of retail category, indexed product variety and the rent gradient of each selected shopping area.
To understand the product variety features of the highest retail hierarchy, 5 six shopping areas were surveyed. The selected shopping areas of this research were as follows, from north to south: (1) the Ximending shopping area of the Taipei metropolitan area, (2) the Taoyuan railway station shopping area of the Taoyuan metropolitan area, (3) the Hsinchu railway station shopping area of the Hsinchu metropolitan area, (4) the Chungyo shopping area of the Taichung metropolitan area, (5) the Tainan railway station shopping area of the Tainan metropolitan area and (6) the Shinkuchan shopping area of the Kaohsiung metropolitan area. The field research took place from late 2014 to early 2016. The image or video data taken by surveyors were then coded into ArcGIS.
The basic details of the six surveyed shopping areas, such as the population, size of the surveyed area, number of POIs and overall density of POIs, are shown in Table 1. Using the same survey method, the density of POIs was higher in the northern cities than in the central or southern cities. Furthermore, the highest density shopping area of this survey was in Taipei, the largest metropolitan area.
Table 1. Basic details of the selected shopping areas.
The rental value of each high-street retail store is always difficult to determine in data collection. Asking prices and after-transaction prices are both collected, but after-transaction prices were mainly used in this study. The data sources were based on open transaction websites such as the actual price registration webpage of the Ministry of the Interior and the transaction records provided by private real estate agencies such as Sinyi Realtor, Yungching Realtor and 591 Realtor. The rental data were also collected and coded in ArcGIS as point data. Time and regional differences among records were considered necessary tunings for inflation and CPI adjustments, which were conducted to prevent general bias. As described, these rental points were captured using the line of street data from the open data source, Geofabrik. The street lines with rental values then became the source of performance variables and the base for plotting the rental gradient of each shopping area.
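As a rough illustration of the kind of price-level tuning described above, the sketch below rescales nominal rents to a common base period with a price index. The source does not give the exact adjustment procedure, so the function name `deflate_rent`, the record layout and all CPI values are hypothetical.

```python
def deflate_rent(nominal_rent, cpi_at_record, cpi_base):
    """Express a nominal rent in base-period prices (hypothetical helper)."""
    if cpi_at_record <= 0:
        raise ValueError("CPI must be positive")
    return nominal_rent * cpi_base / cpi_at_record


# Hypothetical records: (monthly rent, CPI in the period the rent was recorded)
records = [(10000.0, 100.0), (10000.0, 102.0), (12240.0, 102.0)]
CPI_BASE = 102.0  # hypothetical index value of the chosen base period

# Rents recorded in earlier, cheaper periods are scaled up to base-period prices.
real_rents = [deflate_rent(rent, cpi, CPI_BASE) for rent, cpi in records]
```

A regional adjustment could be handled the same way by using a region-specific index in place of a single national series.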
Research design: six stages of product variety data mining process
After the field survey and coding process of retail shop POIs and rental value points, the next step entailed defining and extracting the layers of product variety. As mentioned, this extraction process is the most crucial task for verifying the characteristics of product variety. This study carried out the product variety data mining process in six stages (Figure 1).

Figure 1. Product variety data mining process of urban shopping areas.
Stage 1. Basic data preparation of the selected areas
In this research, four basic data sets were used: (a) the POIs of each retail shop location, (b) rental data points, (c) street lines from Geofabrik and (d) the basemap from OpenStreetMap. All the subsequent extension data sets and variables were generated from these four basic data sets, such as the retail agglomeration areas and the rental gradient. The required tunings of the raw data, e.g. the rent adjustments and retail activity identification, were also completed at this stage.
Stage 2. Defining and identifying product variety
To extract the layers of retail categories, the classification and definition system of product variety was essential for deriving research results. The design of the elements for indices depends on the research objectives.
This research is the first to extract the layers of retail categories for a 3D visual demonstration of spatial distributions. Therefore, a standard retail-category system was applied according to the most general differences in retail features (Table 2), involving 34 retail categories. A retail-category system is generally used in department stores or shopping centres as the way to classify products or services into different groups or departments. Once all POIs were categorised into 34 retail categories (combined from at least 120 subcategories), the layers of retail categories could be developed.
Table 2. The 34 layers of retail categories and the defined influential buffering range.
Stage 3. Generating the spatial distribution of the agglomeration areas for each layer of retail category
This stage involved generating all 34 layers of retail categories separately. Each layer presented the actual spatial distribution of agglomeration areas of the single product category. The process for determining the agglomeration areas is described as follows: (a) all POIs were classified into 34 standard retail categories, (b) the layer of each retail category was selected and exported and (c) each POI was buffered according to the effect range shown in Table 2. For example, in standard category shops such as apparel shops, footwear shops, or hair salons, 15 m is the environmental influential effect range. 6 Hence, if one or more shops of the same type are within 15 m, then the affected buffer areas overlap. Under this condition, dissolving all the overlapping buffer areas generates the agglomeration areas.
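The buffer-and-dissolve step can be sketched without GIS software. The example below is not the ArcGIS workflow used in the study. It assumes circular buffers and merges two buffers whenever they geometrically intersect (centres closer than twice the radius), so that chains of overlapping buffers form one agglomeration group. The function name and the shop coordinates are hypothetical; only the 15 m radius comes from the text.

```python
import math


def agglomeration_groups(points, radius):
    """Group points whose circular buffers of the given radius overlap.

    Chains of overlapping buffers end up in the same group, which mirrors
    dissolving overlapping buffer polygons into one agglomeration area.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Two circles of equal radius intersect when the centres
            # are closer than twice that radius.
            if math.dist(points[i], points[j]) < 2 * radius:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical apparel-shop coordinates in metres (projected CRS assumed).
apparel = [(0, 0), (20, 0), (45, 0), (200, 200)]
groups = agglomeration_groups(apparel, radius=15)
```

Here the first three shops form one chain and the fourth stays on its own. A stricter merge rule, such as requiring a shop to lie inside its neighbour's buffer, only changes the distance threshold.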
Stage 4. Generating street line buffer areas: capturing rents and other variety features
These street line buffer areas (SBAs) become the basic measurement units of performance data (i.e. rents). Hence, other than the retail rental values, the number of shops, shop density of each street and the calculation of product variety indices were based on these SBAs. Data mining processes such as multiple regression, factorial analysis, clustering analysis and the 3D rental gradient also were based on SBAs of each shopping area.
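A minimal sketch of how rent points might be captured by street line buffer areas is given below. It assumes straight street segments in a projected coordinate system and a fixed buffer width. The width of 10 m, the street names and the rent values are hypothetical, since the source does not state these details.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def mean_rent_by_sba(streets, rent_points, width):
    """Average the rents lying within `width` metres of each street segment."""
    result = {}
    for name, (a, b) in streets.items():
        rents = [rent for point, rent in rent_points
                 if point_segment_distance(point, a, b) <= width]
        result[name] = sum(rents) / len(rents) if rents else None
    return result


# Hypothetical street segments and rent observations (metres, rent per month).
streets = {
    "S1": ((0, 0), (100, 0)),
    "S2": ((0, 50), (100, 50)),
    "S3": ((0, 200), (100, 200)),
}
rent_points = [((10, 5), 3000.0), ((90, -8), 5000.0),
               ((50, 45), 2000.0), ((50, 25), 9000.0)]
sba_rent = mean_rent_by_sba(streets, rent_points, width=10)
```

Streets with no captured rent observation are returned as `None` so that they can be excluded from later regressions rather than treated as zero rent.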
Stage 5. Connecting all data sets: generating product variety indices
The major task at Stage 5 was to establish a connection between the data sets and generate more variety indices from them. The process involved (a) using ArcGIS to establish the link among POIs, rents and spatial features; (b) conducting a factorial analysis to extract the core and periphery variety layers and scores; and (c) generating variety indices (e.g. Shannon, Simpson diversity and Herfindahl indices).
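The link in step (a) can be illustrated with a simplified stand-in for a spatial join: for one POI, find which retail-category layers have a buffered shop covering it. The sketch assumes circular buffers. The coordinates and the 30 m radius for L6 are hypothetical, and the 15 m radius for L2 and L3 follows the example given for standard category shops.

```python
import math


def layers_above(poi, shops_by_category, radius_by_category):
    """Return the set of categories whose buffered shops cover the POI."""
    covering = set()
    for category, shops in shops_by_category.items():
        radius = radius_by_category[category]
        # The layer covers the POI if any shop of that category is in range.
        if any(math.dist(poi, shop) <= radius for shop in shops):
            covering.add(category)
    return covering


# Hypothetical shop locations (metres) keyed by retail-category layer.
shops_by_category = {
    "L2": [(0, 0), (10, 0)],   # apparel and accessories
    "L3": [(12, 0)],           # beauty, hair and cosmetics
    "L6": [(100, 100)],        # hospital, clinic and dental
}
radius_by_category = {"L2": 15, "L3": 15, "L6": 30}  # L6 value is assumed

covering = layers_above((5, 0), shops_by_category, radius_by_category)
n_layers = len(covering)  # number of layers above this POI
```

Counting the covering layers for every POI gives the number of layers above each POI, which serves as one of the explicit variety variables.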
Figure 2 shows the combined two-dimensional (2D) visualisation of the related data sets. The visualised results contain so much information that illustrating the details in a single map was difficult. However, the underlying data sets (Figure 2) were clear and enabled the calculation of the product variety indices.

Figure 2. The connecting process to capture the characteristics of each POI and SBA.
Factorial analysis was used for extracting the core and periphery layers of retail categories. This basic data mining process should reveal the dominant product variety textures that generate higher rents. The factor scores for each SBA were tested in the regression model to examine their effects on rents. For other variety variables, other than the captured explicit variety variables (e.g. number of stores, number of retail-category layers and size of agglomeration area), the combined data sets provided sufficient information for calculating variety indices (e.g. Shannon entropy index, Simpson diversity index and Herfindahl–Hirschman index). These extended indices provided more information on richness, evenness and concentration. The definitions of the extended variables used in this research are as follows:
Shannon entropy index (expected value of choice and diversity 7):
Simpson diversity index (richness and evenness):
Herfindahl–Hirschman index (the concentration of the market):
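In their standard textbook forms, these three indices can be written as follows, assuming that $p_i$ denotes the share of retail category $i$ among the $S$ categories present in a street line buffer area. The notation is an assumption and may differ from the authors' own symbols or scaling.

```latex
\begin{align}
  H &= -\sum_{i=1}^{S} p_i \ln p_i
      && \text{(Shannon entropy)} \\
  1 - D &= 1 - \sum_{i=1}^{S} p_i^{2}
      && \text{(Simpson diversity)} \\
  \mathrm{HHI} &= \sum_{i=1}^{S} p_i^{2}
      && \text{(Herfindahl--Hirschman index)}
\end{align}
```

Under these definitions the Simpson diversity index and the HHI sum to one, so a unit with a high concentration necessarily has a low evenness.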
The selection of these three indices is based on what each of them measures. The Shannon entropy quantifies the degree of surprise, i.e. the chance that a shopper could encounter the abundance of product variety in the measuring unit. The Simpson diversity index (1 – D) for evenness and the HHI for concentration are two complementary indices, which measure whether the retailers of the same category are concentrated or evenly dispersed within the measuring unit.
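To make the behaviour of the three indices concrete, the sketch below computes them from the shop counts per retail category in one measuring unit. It uses the standard share-based definitions, which are assumed here rather than taken from the source, and the counts are hypothetical.

```python
import math


def variety_indices(counts):
    """Compute Shannon entropy, Simpson diversity (1 - D) and HHI from counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one shop is required")
    # Category shares; empty categories contribute nothing to any index.
    shares = [c / total for c in counts if c > 0]
    shannon = -sum(p * math.log(p) for p in shares)
    hhi = sum(p * p for p in shares)
    return {"shannon": shannon, "simpson": 1.0 - hhi, "hhi": hhi}


# Hypothetical SBAs: four evenly represented categories vs. a single category.
even_sba = variety_indices([5, 5, 5, 5])
single_sba = variety_indices([20, 0, 0, 0])
```

The evenly mixed unit has the maximum entropy for four categories (ln 4) and a low HHI, whereas the single-category unit has zero entropy, zero Simpson diversity and an HHI of one.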
Stage 6. Generating data mining results
After the extraction stages, the final stage involved mining for detailed results by including (a) the factorial analysis to extract the core and periphery retail categories, (b) a regression model for examining the main hypothesis and other product variety features, (c) the TwoStep cluster analysis for SBAs and (d) 3D illustrations.
Factorial analysis, or the principal component analysis method used for dimension reduction, is widely used in data mining processes. In this analysis, the core and periphery layers of retail categories could be extracted. The research hypothesis here is that the core layers could generate higher environmental product variety externalities to the POIs but not the periphery layers. Therefore, to understand the significant contributions of each layer, this analysis involved using the distribution of the frequencies of the 34 layers in all SBAs 8 to generate potential factors.
For the regression tests, at least three general models were examined in regression
Models 1 and 2 were examined based on the environmental externalities of each POI. Model 3, in contrast, was examined according to the variety of the SBAs. Because SBAs are the basic units of shopping areas, Model 3 was used to test seven product variety indices of SBAs regarding retail rents. The descriptive statistics of the applied product variety variables are listed in Table 3.
Table 3. Descriptive statistics of product variety measurements.
Nevertheless, a preliminary examination of these product variety indices revealed high correlations among them. Consequently, Model 3 was disaggregated into Model 3-1 and Model 3-2 to prevent the problem of multicollinearity. The first subtest (Model 3-1) entailed testing the core and periphery factor scores generated at Stage 5. The research hypothesis suggested that the agglomeration of core layers could generate higher rents; conversely, the clustering of periphery layers was expected to occur in locations with lower rents. A second subtest (Model 3-2) was performed to examine the influence of the independent product variety variables of Model 3 on retail rents. However, because these variables are highly correlated, multiple regression cannot be used to test their separate effects. Thus, principal component regression (PCR) was used to examine the model.
Moreover, a robust regression process was used to deal with potential problems of heteroscedasticity. White’s adjustment and weighted least squares method were applied to provide consistent standard error and covariance. The final mining process at this stage entailed using the basic clustering algorithm to explore the product variety characteristics among various clustering groups of SBAs. The TwoStep cluster component of SPSS is a scalable clustering algorithm that can handle both continuous and categorical variables, revealing the specific variety characteristics of each identified cluster.
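The idea behind White's adjustment can be shown for the simplest case of one regressor plus an intercept, such as ln rent regressed on the number of layers. The sketch below computes the OLS slope and its heteroscedasticity-consistent (HC0) standard error in plain Python. It illustrates the technique and does not reproduce the software procedure used in the study. The data are hypothetical, and weighted least squares is not shown.

```python
import math


def ols_with_white_se(x, y):
    """Simple OLS of y on x with a White (HC0) standard error for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # HC0 sandwich estimator for the slope:
    # sum((x_i - x_bar)^2 * e_i^2) / (sum((x_i - x_bar)^2))^2
    var_hc0 = sum((xi - x_bar) ** 2 * e * e
                  for xi, e in zip(x, residuals)) / sxx ** 2
    return slope, intercept, math.sqrt(var_hc0)


# Hypothetical data: number of layers above a POI and ln(rent).
layers = [1, 2, 3, 4, 5]
ln_rent = [7.0, 7.4, 7.5, 8.1, 8.0]
slope, intercept, white_se = ols_with_white_se(layers, ln_rent)
```

Because each squared residual is weighted by its own leverage term, the standard error stays consistent when the error variance differs across observations, which is the concern the robust procedure addresses.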
Results
The empirical results illustrated in this section mainly pertain to the core research hypotheses at Stages 5 and 6, involving four parts: (a) the factorial analysis results of core and periphery retail categories, (b) the regression models for main hypotheses, (c) the descriptive and cluster analyses for POIs and SBAs in the six cities and (d) the 3D visualisation of the spatial allocation of product variety. The demonstrations and related implications are addressed subsequently.
Factorial analysis results
The factorial analysis results in Table 4 show the representative core and periphery retail categories. The results of the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy and Bartlett’s test of sphericity were favourable for factorial analysis (the KMO measure was 0.887, and the significance of Bartlett’s test of sphericity was lower than 0.001). Both the eigenvalues and scree plot show that the core factor had the highest eigenvalue and the steepest slope of the scree plot. Therefore, the representative variables of the core factor were determined to be the core layers of retail categories, and the remaining variables of the selected potential factors were the periphery layers of retail categories.
Table 4. Extracted core and periphery factor of product variety.
Note: The initial eigenvalue of Factor 1 is 10.146 (29.842% of the variance); the initial eigenvalue of Factor 2 is 3.079 (9.055% of the variance). Kaiser–Meyer–Olkin measure of sampling adequacy: 0.887. Bartlett’s test of sphericity: approx. χ² = 15,545.256, sig. < 0.001.
From the strength of loadings (higher than 0.8), the most representative layers of the core factors were L2 (apparel and accessories), L28 (department stores) and L3 (beauty, hair and cosmetics); the loadings of L10 (footwear) and L1 (restaurants and foods) were also higher than 0.75. The representative layers of the periphery factor (loadings higher than 0.6) are L6 (hospital, clinic and dental), L20 (drug stores and medical care), L27 (DIY and hardware) and L13 (specialty services). In subsequent regression models, the factor scores for both core and periphery factors were examined to determine the effect of each SBA on rents. A positive effect was expected for the core factor.
Regression results
(a) Tests of
As mentioned in the previous section, three basic regression models were examined. The first was the main hypothesis of this research: the higher the variety, the higher the productivity and, thus, the higher the rent. Table 5 shows that all tests on the data sets, for overall and each of the six cities, the variable ‘
Table 5. Regression tests of whether a higher number of layers is associated with higher retail rents.
Dependent variable: ln rent.
*** Significant at the 1% level.
(b) Tests of
Whereas Model 1 indicated that a higher number of layers above a POI generate higher rent, Model 2 enabled a detailed examination of the effect of different retail-category layers on each POI. Table 6 shows the multiple regression results of retail-category layer above the POI regarding its rental value. There was no collinearity problem (all VIF < 2) in this test. The result indicates a general concept of the influence of different types of retail-category layers, in that they are significantly positive, significantly negative or not sufficiently significant to rent. Therefore, not all layers of retail categories are favourable sources of positive environmental externalities because some of them even lower rents. To some degree, these results match the factorial analysis results in Table 4. The representative core layers (i.e. L2, L28, L3, L10 and L1) had a significantly positive relationship with the retail rents of POIs, whereas the representative periphery layers (i.e. L6, L20, L27 and L13) had a significantly negative relationship. Therefore, in subsequent regression models, the core scores were expected to have positive effects on the rents of SBAs, whereas the periphery scores were expected to have negative effects.
Table 6. Regression results for each layer of product variety.
Dependent variable: ln rent.
*** Significant at the 1% level.
(c) Tests of
The results in Table 7 for Model 3-1 support the hypothesis that the ‘
Table 7. Principal component regression of product variety measures.
For Model 3-2, Table 7a shows the potential factor extraction process (two factors extracted with the eigenvalue > 1) and the order and loadings for each product variety index. The KMO measure of sample adequacy was 0.574, and Bartlett’s test of sphericity yielded sig. < 0.001. Table 7b shows that the six product variables had strong positive loadings (higher than 0.65) in Factor 1.
The ranking orders of the representative variables, to some degree, showed the importance of these variables; ‘
Descriptive and TwoStep cluster results
This part of the empirical study entailed examining the data sets by using descriptive statistics and the TwoStep cluster data mining process to unveil detailed features of the product variety of shopping areas. The results of Models 1–3 indicated that higher product variety was favourable in retail agglomeration for both individual POIs and SBAs; however, the information was insufficient for realising the detailed characteristics. The dominant layers were L1 (restaurants, cafés and foods), L2 (apparel and accessories) and L3 (beauty, hair and cosmetics). The L1 number totalled more than 300 shops (and as many as 700 shops in Taipei’s Ximending shopping area). In total, 5717 of 10,648 POIs (53.7%) were from L1, L2 and L3. Almost all of the remaining 31 layers totalled fewer than 100 shops.
Another crucial variable that had to be examined in detail was the number of layers of environmental retail categories for each POI. ANOVA showed that the difference was significant. However, the detailed information showed that, for each POI, the overall mode comprised three layers; for each shopping area, the mode comprised two to four layers. Only the shopping area in Taipei had a few POIs (five points) with 10 layers of retail categories; the average rent for these five points was NT$13,335 per month, which is high but not the highest rental value in the area. Therefore, examining the details of SBAs was necessary for obtaining additional information on the characteristics of product variety indices.
The next data mining process used in this study required using TwoStep cluster analysis. After several trial and error processes and consideration of the main research objectives, the first TwoStep clustering was applied using the average rent in a natural logarithm as the input variable for determining the clusters, because the rental value was the main performance variable. Hence, the detailed product variety indices under different levels of rents were one of the major concerns of this study. The cluster quality chart (Figure 3(a)) indicates that the overall model quality was ‘good’. The cluster size view (Figure 3(b)) shows the frequency of each cluster. Viewing a particular slice in the pie chart indicates the number of SBAs (844 in total) assigned to the cluster. A total of 1.8% of the SBAs were assigned to the first cluster, 4.7% to the second, 10.5% to the third, 60.3% to the fourth and 22.6% to the fifth. The cluster number was obtained using SPSS, and the order from left to right was ranked according to the highest to the lowest average rent of the cluster. Clearly, Cluster 1, despite having only 15 SBAs, had the highest rental value ‘

The SPSS TwoStep cluster analysis results of overall SBAs. Note: The variables Simpsont and HHIt are used here simply to avoid identical 0 values among clusters; hence, the transformed variables are: Simpsont = Simpson × 10^6; HHIt = HHI × 10^6.
Figure 3(c) presents a detailed distribution of each feature of product variety, illustrating both the rich texture of variety and the difficulty it poses for researchers and planners. Only 15 of the 844 SBAs showed the pattern that higher product variety indices accompany higher rents. Cluster 3 with 89 SBAs (10.5%) had the third-highest rents but the second-lowest contributions in ‘
3D illustrations
The final results of the analysis of product variety in this study were the 3D visualisations of the rental gradient and the layers of retail categories. Owing to advances in spatial analysis technology, the integrated GIS-based spatial data sets established in this research generated the rental gradients efficiently. They provided a clear view of the spatial distribution of each layer of the retail category; the 3D demonstration presents the specific location and allocation of the layers of retail categories. As shown in Figure 4, previously invisible rents and product variety could be easily observed at a specific location. The same method was used for the illustration in Figure 4, to demonstrate the spatial distribution of different shopping areas; only the POIs and street lines were added to the bottom layers.

The spatial distribution of layers of product variety and the rent gradients of six shopping areas. Note: The values of Core and Periphery are the means of SBAs in each shopping area. The P values are all <1%.
Figure 4 presents the results of the regression and TwoStep cluster analyses. The 3D maps provided more specific and precise tools for observing environmental product variety. Exploring the rental gradient of each shopping area from different angles on the map enables the reader to accurately identify individual layers or product variety hotspots.
Discussion
This study aimed to propose an effective method for extracting environmental externalities through a multi-stage spatial data mining process. With the six-stage data mining process, this study provided more thorough and detailed measurements of environmental product variety than previous studies such as Brown (1987), Clark and Rushton (1970) and Konishi (2005). As the final results show, this multi-stage process could extract the exact locational distributions of different retail-category layers, which provide the fundamental information for generating further product variety indices. The data mining process of the environmental product variety raises several points for discussion. First, the empirical results illustrated that, for a shopping area in a central city, product variety is a factor favoured by consumers and is also an environmental variable that increases the productivity of retail shops. However, the concept of product variety entails numerous ambiguous and invisible features for researchers and urban planners. Hence, precisely measuring the characteristics of product variety is crucial for accurately evaluating the environmental features and understanding their spatial distributions. This study suggests using a series of methodologies for precisely measuring the features of product variety and establishing a six-stage data mining process.
Through data mining, a series of empirical results were used to establish a profile of product variety for central urban shopping areas. The factor analysis of the 34 layers of product variety extracted the core and periphery layers from the original data set of the distribution of the 34 retail categories of each SBA. The regression results of the individual POIs show that, although the number of layers above each POI had a significantly positive relationship with the rental value, none of the individual layers was associated with a higher rent.
The tests of the scores for the core and periphery layers also confirmed that the core layers had a significantly positive relationship with rents, whereas the periphery layers had a significantly negative relationship. These results showed that, although higher product variety could indicate a higher rent, only the agglomeration of the ‘correct layers’, namely the core layers, had a positive influence on retail rents. Other product variety indices provided an even clearer profile for understanding the texture of the product variety. PCR showed that the scores for the favourable product variety measurement variables had a significantly positive relationship with rents. Nevertheless, although the regression models generally support the conclusion that greater variety corresponds with a higher rent, a detailed product variety profile can only be demonstrated with the results from descriptive statistics and TwoStep cluster analysis. The findings suggest that the environmental product variety, in general, exerts a positive effect on shopping areas. Characteristics of product variety, such as the number of layers above each POI, the number of shops, density, Shannon entropy index (variety in choice), Simpson diversity index (richness and evenness) and HHI (concentration), all had a significantly positive relationship with retail rents. However, the spatial distribution of the core and periphery layers of product variety is the dominant characteristic of retail clustering. The strongest hotspots have favourable features with higher core factor scores. By contrast, areas with high periphery scores and low core scores, even with higher Shannon, Simpson or HHI factors, can only have low rents.
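To make the three diversity measures named above concrete, the following is a minimal Python sketch using their common textbook definitions: Shannon entropy as −Σ p ln p, the Simpson diversity index as 1 − Σ p², and HHI as Σ p², where p is the share of shops in each layer within one buffer area. These definitions, the function name `variety_indices` and the input format (one layer label per shop) are assumptions made for illustration; the study's exact formulas and data structures are not stated in this section and may differ.

```python
import math
from collections import Counter


def variety_indices(shop_layers):
    """Compute illustrative variety indices for one street line buffer area.

    shop_layers: iterable with one retail-layer label per shop
    (a hypothetical input format, not the study's actual data structure).
    """
    counts = Counter(shop_layers)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("buffer area contains no shops")

    # Share of shops belonging to each layer present in the area.
    shares = [n / total for n in counts.values()]

    hhi = sum(p * p for p in shares)                  # concentration
    shannon = -sum(p * math.log(p) for p in shares)   # variety in choice
    simpson = 1.0 - hhi                               # richness and evenness

    return {
        "n_layers": len(counts),
        "n_shops": total,
        "shannon": shannon,
        "simpson": simpson,
        "hhi": hhi,
    }


# Hypothetical buffer area: 5 shops in L1, 3 in L2, 2 in L3.
example_sba = ["L1"] * 5 + ["L2"] * 3 + ["L3"] * 2
result = variety_indices(example_sba)
print(result)
```

Under these definitions a buffer area with a single layer has zero Shannon entropy, zero Simpson diversity and an HHI of 1, so an even mix of many layers raises the first two indices and lowers the third.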
Finally, the GIS-based spatial database of product variety enables the generation of 3D maps of the rental gradient and the spatial distribution of the product variety of shopping areas. The 3D maps provide an opportunity to observe the features and distributions from different angles and for various purposes. For example, the 3D maps of the spatial distribution of the core and periphery layers demonstrated that only relatively few SBAs could generate higher rents, and the concentration areas of the core layer activities were closely connected to higher rental hotspots. With this powerful 3D visualisation tool, researchers and planners can explore the product variety environment in depth.
For environment and planning policymakers, we believe that this multi-stage spatial data mining process could also be implemented in other environmental diversity identification exercises, such as understanding the abundance of tourism resources, the sufficiency of medical services and the gaps in public facility provision. The mining process could become more effective once the influential range of POIs can be identified more precisely and accurately through, for example, the Internet of Things.
Note: Weight type: inverse standard deviation (no scaling). White heteroskedasticity-consistent standard errors and covariance. Dependent variable: ln avrent. *** p < 1%.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Ministry of Science and Technology, Taiwan, ROC (MOST: 104-2119-M-305-001).
