Abstract
This study explores the effect of the spatial configuration of street networks on movement patterns of users of a cycling monitoring app, employing crowdsourced information from OpenStreetMap and Strava Metro. Choice and Integration measures from Space Syntax were used to analyse the street network’s configuration for different radii. Multiple linear regression models were fitted to explore the influence of these measures on cycling activity at the street segment level after controlling for other variables such as land use, household density, socio-economic status, and cycling infrastructure. The variation of such influence for different time periods (weekday vs. weekend) and trip purposes (commuting vs. sports) was also analysed. The results show a positive significant association between normalised angular choice (NACH) and cycling activity. Although the final regression model explained 5.5% of the log-likelihood of the intercept model, it represents an important improvement compared with the base (control-only) model (3.8%). The incidence rate ratio of NACH’s Z scores was 1.63, implying that for an increase of one standard deviation of NACH, there is an expected increment of about 63% in the total cyclist counts while keeping all other variables the same. These results are of interest for researchers, practitioners, and urban planners, since the inclusion of Space Syntax measures derived from available public data can improve movement behaviour modelling and cycling infrastructure planning and design.
Introduction
Promoting active modes of transportation, and particularly cycling, is one of the main strategies towards sustainable mobility. Urban cycling implementation efforts have mainly focused on building and improving infrastructure, including segregated lanes, bikeways, and parking facilities. Urban planners usually face the question of where to build such infrastructure, since new cycle lanes reclaim space traditionally dedicated to motorised vehicles. However, little attention has been given to the network’s configuration effect on the planning and design process (Law et al., 2014). The link between cycling and the built environment has been explored before, finding that connectivity and land use mixture (Saelens et al., 2003; Winters et al., 2013) and the existence of differentiated bicycle infrastructure (Chang and Chang, 2009; Fagnant and Kockelman, 2016; Winters et al., 2013) are important factors. However, the network’s intrinsic relational, structural, and geometric properties and their influence on cycling movement have been less studied. Exploring this relationship at a detailed geographical level has one very practical purpose: it can help urban designers and policy-makers determine where to build cycling infrastructure from an evidence-based approach. To serve this purpose, the importance of using data from an extended period to draw a general pattern of cycling movement has been highlighted (Heinen et al., 2010).
Cycling data with high spatial coverage and detailed granularity is costly; therefore, studies were usually based on constrained data sets: observations limited to certain urban area sectors and certain time periods, phone surveys prone to memory and perception bias, census data without routing information, or experiments with limited numbers of volunteers or for short periods. Moreover, most research on this subject has been conducted in developed countries, where local governments have the financial and logistic capabilities to conduct and maintain traffic monitoring programs, including pedestrian and cyclists’ counts. In Latin America, transport research on cycling has been conducted mostly from a health perspective (Hoehner et al., 2008; Sadarangani et al., 2018) or as a component in general commuting trends (Guerra et al., 2018). Some studies have explored the relationship between cycling and the built environment in more detail, but they were limited to areas with specific initiatives that enabled data collection at a fine scale (Zieff et al., 2018). Consequently, studies about the relationship between the urban structure and bicycle movement at a detailed scale in Latin America are lacking.
The use of crowdsourced Big Data represents a promising solution for the inconvenience of data-gathering limitations. Volunteer-generated information can provide data at enough resolution to make it relevant for research and informed decision-making. This is especially true for Latin American cities where resources for extensive data collection and infrastructure implementation are scarce.
In this study, we explore the possibilities of using large volunteered and crowdsourced geo-information data sets (OpenStreetMap and Strava Metro) to study the relationship between urban morphology and cycling movement behaviour in Cuenca, Ecuador. Specifically, we analyse the influence of the network structure on cycling activity on each street segment recorded with the Strava fitness app, using Space Syntax. The study’s contribution is twofold: on the one hand, we introduce a methodological approach to explore the relationship between the structure of the urban network and cycling activity that can be cost-effective, scalable, and systematic, especially for cities where dedicated data on cycling are scarce or even non-existent. On the other hand, we provide evidence of the influence of Space Syntax measures on the spatial behaviour of a cycling group using a fitness monitoring app.
The following section briefly introduces the theoretical background relevant to our study: Big Data, Space Syntax, urban morphology, and cycling behaviour. The third section explains the methodology, the data sets used, and statistical analyses. Then the main results are presented and discussed. Finally, conclusions are given.
Background and related work
Big Data and urban studies
Big Data have been a popular topic related to urban studies in the last decade, creating enthusiasm, hype, and a fair amount of cynicism (Rae and Singleton, 2015). They are mainly related to the concept of smart cities, where sensor-generated geo-located data enrich the understanding of the functioning of cities (Batty, 2013). The core characteristics of Big Data (big in volume, high in velocity, exhaustive in scope, fine-grained in resolution, flexible, extensible and scalable) (Chen et al., 2016; Kitchin, 2014) allow the dynamism in urban processes to be unveiled, which forces planners and researchers to create a new understanding of urban relations. Kitchin (2014) draws attention to recent urban studies made by physicists employing Big Data analytics to model social and spatial processes while ignoring previous social science development and resulting in reductionist and functionalist analyses that overlook the effects of culture, politics, policy, governance, and capital. Rae and Singleton (2015) highlighted the importance of addressing basic questions such as, “How have Big Data helped cities and regions?”, which have been overlooked or unsatisfactorily answered. Big Data are usually treated as if a whole domain were captured with full resolution, while in reality they are only a sample, shaped by technology, and therefore subject to sampling bias (Kitchin, 2014).
Beyond the precautions that the use of Big Data requires, several studies have employed Big Data for applied urban research in many fields, mobility and transportation amongst them. Examples include data analysis from smart travel cards for Bus Rapid Transit travel planning (Tao et al., 2014), the use of sensor data in trash containers to organise routes for trash collection vehicles (Ma et al., 2018), and the use of mobile phone location to examine job–housing balance and employment self-containment (Zhou et al., 2018).
Although Big Data have both advocates and detractors, it is important to recognise their potential and to direct efforts towards applying them to gain insight into concrete issues regarding urban relations. This is especially important for cities in developing regions, where official programs to produce data sets for urban management are limited.
The built environment and cycling
The relation between cycling and the urban built environment has been researched over recent years, providing important theoretical, empirical, and methodological contributions. Parkin et al. (2008) developed a model based on census data to explore the socio-economic, physical, and transport infrastructure determinants of preference for cycling to work in the United Kingdom (UK). Krenn et al. (2015) developed a bikeability index for Graz (Austria) based on 278 GPS recorded trips from 113 participants. The index was composed of four variables: bikeways, green areas, main roads, and topography. The authors found that regular cyclists live in more bicycle-friendly neighbourhoods than non-cyclists, although it is unknown whether cyclists choose bike-friendly neighbourhoods or infrastructure improvements increased cycling. Lovelace et al. (2017) designed the propensity to cycle tool (PCT), an online platform based on origin-destination census data for the UK to map current and what-if scenarios of cycling levels at different scales with the aim of illustrating the potential effects of public policies. Yeboah et al. (2015) examined the proportional use of cycling facilities by utility cyclists in Tyneside (UK) using a proposed corridor analysis based on surveys and GPS data collected from 79 volunteers. They found that 57% of the cyclists used bikeways, while 34.1% cycled outside bikeways and 8.9% near them.
Street network structure and the spatial behaviour of cycling
Street network structure is one of the main features of urban space that influence movement behaviour. This relationship is at the core of Space Syntax, a theory that uses analytic, quantitative, and descriptive tools to extract spatial properties and their association to social performance with graph theory fundamentals (Hillier and Hanson, 1984; Turner, 2007). The pattern of relations between spatial elements, like streets, is called configuration and a set of configurations forms a structure (Al-Sayed et al., 2014). Hillier (2007) states that cognitively, these patterns of relations highly affect our behaviour, and this effect can be quantified and studied in different areas: natural movement and movement economies (Hillier, 2007), crime (Chiaradia et al., 2009), and poverty (Bolton et al., 2017), among others.
The movement of cyclists in the urban environment is not oblivious to the effect of network structure properties, and therefore can be read and interpreted with Space Syntax measures. Raford et al. (2007) found that the aggregated journeys of cyclists, independent of origin and destination, seem to form a powerful spatial logic, where Mean Angular Depth is the most important factor for explaining the variation in cyclist volume at a system scale, probably because streets with lower angular mean depth are probabilistically “shallower” to more origin–destination pairs. Law et al. (2014), in a comparison study using data between 2003 and 2012, showed that Normalised Angular Choice at radius N (global) had statistically greater explanatory power on cyclist movement than the London Cycle Superhighway presence, meaning that route accessibility was more important to explain aggregate cyclist movement than cycling infrastructure. Rybarczyk and Wu (2014), using Space Syntax Visual Graph Analysis, found that utilitarian cycling was positively linked to the perceived ease of reaching the closest traversable space with visual Mean Depth. The authors also concluded that Mean Visual Entropy had a negative effect on bicycle mode choice, meaning that increased disorder near the trip’s origin reduced the probability of cycling. All these studies on the relationship between cyclist movement and urban form show the importance of the intrinsic properties of the built environment for shaping and modifying route choice.
Network structure properties can be affected by other attributes such as restrictions or impedance values derived from physical or normative characteristics. Yeboah and Alvanides (2015) explored the influence of network restrictions (i.e. one way) on route choice behaviour of cyclists and found that actual routes were significantly longer than shortest path routes computed by an algorithm in Newcastle, UK. The authors also found that low-hierarchy roads and footways are preferred in utilitarian trips.
The approaches reported in the literature show some characteristics: (a) most relations between the built environment and cycling were examined at neighbourhood or area level; (b) several were based on self-reported cycling activity and perceptions through surveys; (c) objective cycling activities were analysed for short time periods or at specific places; and (d) most research based on fine-scale data gathered during a long time span was conducted in developed countries. Our investigation aims to contribute to the study of spatial cycling behaviour in Latin America by providing empirical results about the influence of network structures on the cycling activity of fitness app users at the street segment scale. Additionally, it proposes a replicable methodology based on crowdsourced Big Data.
Methodology
Our methodological approach consisted of fitting multiple regression control models with variables related to the physical and socioeconomic environment as predictors and cycling activity counts in each street segment as the response variable. Street network influence was assessed by adding variables representing Space Syntax measures to the control model and comparing their model fitting.
Study area
Cuenca is an Andean city in southern Ecuador, with an urban population of 400,000 covering 73 km². The street network’s length is 1483.4 km. With a mean altitude of 2550 m and an average temperature of 15°C, the weather is relatively stable year-round. It has four universities, good coverage of basic services, and a relatively high quality of life. Its historical development, several watercourses, and the Andean topography have shaped urban morphology (Figure 1).

Figure 1. Study area: Cuenca (Ecuador).
Data sets
Two main data sets were used: OpenStreetMap and Strava Metro. Both data sets present an enormous opportunity to have up-to-date data at fine geographical and temporal levels to perform detailed analyses on the relationship between cycling patterns and street network structure. Moreover, these data sets have large coverage including thousands of cities around the world, ensuring this approach’s replicability and scalability.
OpenStreetMap (OSM, OpenStreetMap, 2016) is a free, online, volunteered geographic information (VGI) service and database of street-level features with global coverage. OSM has been used for cycling research related to route choice (Yeboah and Alvanides, 2015), to assess the health impacts of cycling (Mueller et al., 2018), and to study cycling behaviour (Xie et al., 2013), and it has been key for other research areas (Arsanjani et al., 2019). The OSM road network for the study area was downloaded and pre-processed to obtain a centreline map of street segments (pre-processing is detailed in supplemental material 03).
Strava Metro (Strava LCC, 2017) is a commercial service offering different products with fine-grained, aggregated, and anonymised data, collected with mobile phones or GPS devices by Strava App users. This study is based on the roll-up product for cycling activities added to each street segment in the study area from 1 September 2014 to 30 September 2015. The data set includes counts for different time periods (weekdays vs. weekends) and trip purposes (sport vs. commuting activities). Strava Metro’s data sets are increasingly popular for studying cycling behaviour, mainly to analyse the spatial distribution of cycling activity (Heesch and Langdon, 2016; Jestico et al., 2016; Sun and Mobasheri, 2017). The demographics of the analysed data set were 75% male, 12% female, and 13% gender not reported. The total age range was between 16 and 94 years old, and the most common ranges were 25–34 (27%) and 35–44 (21%). According to the National Cyclist Profile Survey (LlactaLAB, 2018), the cycling community’s gender distribution in Ecuador is 78% male and 21% female (1% did not report gender), and the most representative age ranges are 25–34 (32%), <25 (29%) and 35–44 (23%). As for the economic profile, the same survey reports that in Cuenca, 51% of surveyed cyclists have a household income of 500–1000 US dollars, while 35% are private employees and 30% students; 55% have finished secondary school as the highest level of education (see supplemental material 01). For privacy reasons, Strava does not provide economic profiles of its users, and therefore comparisons are not possible. Although the demographics of both data sources are similar in terms of gender and age, the results obtained in this paper apply only to Strava users.
Other data sets include: National census data from 2010 (INEC, 2011) used for computing household density and socio-economic status; Digital Terrain Model at 3m resolution provided by the SIGTierras program (MAGAP, 2012); and a land-use data set provided by the local municipality.
Variables
In this study, the spatial behaviour of cyclists is represented as the number of cycling activities of users per street segment from the Strava Metro data set. Three dependent variables were modelled separately to explore differential influences of the street network structure regarding the temporal dimension and trip purpose: (1) Total cyclists: the overall number of Strava cyclists on each street segment in the study area for a year; (2) Cyclists weekdays: the number of cyclists during weekdays; and (3) Commuting activities weekdays: the number of commuting trips on weekdays.
Based on the literature, a set of explanatory variables related to different aspects of the urban environment was selected as control variables. Three were related to socio-economic conditions (Heinen et al., 2010; Parkin et al., 2008): household density represented as the number of households divided by the total block surface; living conditions index (ICV) reported by Orellana and Osorio (2014); and land use mixture (Saelens et al., 2003; Winters et al., 2013) computed as the Shannon index with the following equation
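For reference, the Shannon index is commonly written as shown below. This is a sketch of the standard form and not necessarily the exact expression used in the study: the symbols H, k and p_i are notation introduced here, with p_i assumed to be the proportion of land use category i within the spatial unit and k the number of categories present.

```latex
H = -\sum_{i=1}^{k} p_i \ln p_i
```

Under this form, H equals 0 when a single land use occupies the whole unit and reaches its maximum, ln k, when all k uses are equally represented.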
The spatial dimension of the explanatory variables was represented with Space Syntax Integration and Choice (Turner, 2007). Previous research indicates that when these measures are normalised by angular variation and segment length, they are better at capturing the influence of the network’s structure than non-normalised measures, because the syntax normalisation process considers the angular change between lines and lengths, which has been shown to correlate more accurately with movement than other methods, like shorter paths between nodes (Law et al., 2014; Turner, 2007). Therefore, NACH (normalised choice) and NAIN (normalised integration) were computed. NACH is calculated by counting the number of times each street segment falls on the path of the least angular deviation between all pairs of segments within a selected distance radius (Hillier and Iida, 2005), and is computed by the equation
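The normalisation most often cited for NACH in the Space Syntax literature is sketched below. We assume this is the form intended here; CH_r (the angular choice of a segment at radius r) and TD_r (its total angular depth at the same radius) are symbols introduced for illustration.

```latex
\mathrm{NACH}_r = \frac{\log\left(\mathrm{CH}_r + 1\right)}{\log\left(\mathrm{TD}_r + 3\right)}
```

Relating choice to total depth in this way is intended to make values comparable between networks of different size and between segments with different levels of segregation.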
The normalised angular integration NAIN measures how close each segment is to all others in terms of the sum of angular changes that are made on each route (Hillier and Iida, 2005), according to the equation
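A commonly cited form of NAIN is sketched below, assuming it matches the one used in the study. NC_r (the node count, i.e. the number of segments reachable within radius r) and TD_r (the total angular depth from the segment to those segments) are symbols introduced for illustration; some software implementations add a small constant to the denominator.

```latex
\mathrm{NAIN}_r = \frac{\mathrm{NC}_r^{1.2}}{\mathrm{TD}_r}
```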
NAIN and NACH depend on the analysis radius r, and their influence on human movement might depend on the transport mode. Consequently, they were computed for different radii to explore their influence on cycling counts and to select the adequate distance for the final model. The selected values for r represented steps from local to global: 400 m, 800 m, 1200 m, 1500 m, 1800 m, 2000 m, 2500 m and Rn (global radius).
Descriptive statistics of the explanatory and dependent variables are presented in supplemental material 02.
Data preparation and the mapping process
This study uses street segments as the unit of analysis; therefore, pre-processing the original data was necessary for computing the required variables at street-segment level. Brief explanations of the process are included in this section and the details are presented in supplemental material 03.
Network simplification
Computation of Space Syntax measures requires a simplified representation of the street network (i.e. centre lines) to appropriately capture the overall structure and to enhance the accuracy of the analyses (Kolovou et al., 2017). OSM and Strava data, however, are drawn in high detail, with complex roundabouts and separate lines for multi-lane roads. Consequently, network simplification was required, which included steps for line generalisation, simplification, connectivity cleaning, unlinking, topology cleaning and validation. The result was a simple street network of fully connected centrelines with complete topology, including unlinks.
Control variables
Socio-economic variables were available as attributes of a regular mesh of hexagonal cells with an incircle diameter of 300 m (Orellana et al., 2017). Hexagonal meshes are often preferred over rectangular grids in spatial analysis because distances between centroids of queen neighbours are equal, and they have the lowest perimeter to area ratio of all regular tessellation figures, reducing the edge effect (Birch et al., 2007; Burdziej, 2019). Each street segment received the values of the cell in which its centroid was located. This established a univocal segment-to-area relation and guarantees reproducibility. Road Hierarchy and Existence of cycle paths were available in the OSM data. Number of Intersections was obtained by counting intersections in each cell of the socio-economic mesh. Slope was computed using the altitude of the two ends of each street segment, obtained from the Digital Terrain Model.
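The slope computation can be illustrated with a minimal sketch. The text does not state the exact formula, so we assume here an absolute rise-over-run ratio between the two segment ends; the function name and arguments are hypothetical.

```python
def segment_slope(z_start: float, z_end: float, length_m: float) -> float:
    """Absolute slope of a street segment as rise over run.

    Assumption: slope is the absolute altitude difference between the
    two segment ends divided by the segment length (both in metres).
    """
    if length_m <= 0:
        raise ValueError("segment length must be positive")
    return abs(z_end - z_start) / length_m


# Hypothetical segment: 120 m long, climbing from 2550 m to 2556 m.
slope = segment_slope(2550.0, 2556.0, 120.0)  # 0.05, i.e. a 5% gradient
```

Taking the absolute value treats uphill and downhill the same, which seems reasonable when counts are aggregated over both travel directions.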
Network structure variables
Space Syntax measures NACH and NAIN were computed on the cleaned street network for the corresponding radii using the Angular Segment Analysis algorithm included in the Space Syntax Toolkit in QGIS.
Cycling activity variables
Original Strava data corresponding to the original OSM network were transferred to the cleaned network using spatial join with tolerance values of 50 m for avenues and 7 m for streets, based on the maximum distance from computed centrelines to the corresponding original lines.
Statistical analysis
The statistical analysis comprised three phases. First, individual associations between each one of the explanatory variables and cycling activity variables were computed with generalised linear models. The purpose of this phase was threefold: to explore the non-controlled influence of each variable and decide its inclusion, to select the adequate radius for the spatial variables for the following phases, and to decide between Poisson and Negative Binomial distributions for the regression models.
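One common heuristic for choosing between Poisson and Negative Binomial distributions is to check for overdispersion in the counts. The sketch below illustrates that heuristic only; it is not necessarily the criterion applied in this study, and the function name and sample counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio of a list of counts.

    A Poisson variable has a ratio close to 1; values well above 1
    suggest overdispersion, for which a Negative Binomial model is
    usually the safer choice.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean of counts is zero; ratio is undefined")
    return variance(counts) / m


# Hypothetical segment counts: many quiet streets and a few busy ones.
counts = [0, 0, 1, 2, 0, 3, 150, 0, 1, 420]
overdispersed = dispersion_index(counts) > 1
```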
In the second phase, multiple regressions were applied using the distribution and control variables selected in the previous step, to obtain parsimonious base models representing the influence of the socio-economic, infrastructure, physical and topographic factors. The variable choice for each model was based on three criteria: (a) each variable’s individual influence; (b) stepwise selection based on Akaike information criterion (AIC) and Bayesian information criterion (BIC), which are measures of a model’s relative quality (Akaike, 1974; Kass and Raftery, 1995) and Vuong’s test for nested models under the null hypothesis that two models are indistinguishable (Vuong, 1989); and (c) results reported in previous studies based on the literature reviewed and cited in the “Variables” subsection. One model for each dependent variable was produced. To estimate the explanatory potential of components over cycling activity, each model was assessed using the likelihood ratio (LR) statistic test and compared with the intercept-only model using McFadden’s pseudo R².
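The model-quality measures named above follow standard definitions, sketched here for reference. The function names and the log-likelihood values in the usage lines are hypothetical and do not reproduce the study's results.

```python
import math


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik


def mcfadden_r2(log_lik_model: float, log_lik_null: float) -> float:
    """McFadden's pseudo R2: 1 - ln L(model) / ln L(intercept-only)."""
    return 1 - log_lik_model / log_lik_null


# Hypothetical log-likelihoods for an intercept-only and a fitted model.
r2 = mcfadden_r2(log_lik_model=-950.0, log_lik_null=-1000.0)  # about 0.05
```

Because McFadden's pseudo R² compares log-likelihoods rather than explained variance, its values are typically much lower than the R² of an ordinary least squares model.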
Finally, full models were fitted for each dependent variable by including the selected spatial variable and comparing it with the base models. Each variable’s influence was reported as incidence rate ratios (IRRs) to facilitate the interpretation. An IRR is the exponentiated coefficient of a negative binomial model and represents the expected multiplicative change in the dependent variable for a one-unit increase in the predictor, keeping all other variables the same (Zwilling, 2013). In addition, original variables and model residuals were tested for spatial correlation using global Moran’s I (see supplemental material 02). All statistical analyses were conducted in R (R Core Team, 2016) (see supplemental material 07).
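The interpretation of an IRR as a percentage change can be made concrete with a short sketch. The helper names are hypothetical; the two IRR values are those reported in this paper for NACH and household density in the Total cyclists model.

```python
import math


def irr_from_coefficient(beta: float) -> float:
    """Incidence rate ratio: the exponentiated regression coefficient."""
    return math.exp(beta)


def percent_change(irr: float) -> float:
    """Expected percentage change in the count per one-unit increase."""
    return (irr - 1.0) * 100.0


nach_change = percent_change(1.63)      # about +63% per SD of NACH
density_change = percent_change(0.716)  # about -28% per SD of household density
```

An IRR above 1 therefore indicates more expected cyclists per unit of the predictor, and an IRR below 1 indicates fewer.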
Results and discussion
A total of 19,103 segments were included in the final cleaned network. Visual exploration revealed some correspondence between NACH values and cycling activity (see Figure 1 and Space Syntax distribution maps in supplemental material 04).
Individual associations
Negative binomial models performed better than Poisson models; consequently, they were used for the remaining modelling steps. Table 1 shows the results for individual NB models.
Table 1. Results of individual associations.
Significance codes: (.): p < 0.1; (*): p < 0.05; (**): p < 0.01; (***): p<0.001
All individual variables except ICV displayed highly significant association with the three cycling activity variables. ICV had a small but significant negative association with Total cyclists. This was unexpected, since the literature has reported the importance of socio-economic conditions for cycling activity with similar data sets (Musakwa and Selala, 2016) and the Strava Metro data set might be biased towards high economic conditions. However, the association was positive for Commuting activities weekdays. Our interpretation is that since socio-economic spatial segregation in Cuenca is relatively low (Orellana and Osorio, 2014), it has less influence on cycling activity. Moreover, socio-economic conditions might influence the decision of cycling but not necessarily the route choice. Total cyclists includes sport and recreation cycling, which might occur on city outskirts, where living conditions are relatively low. This variable was kept for the next steps to assess its influence when other variables are controlled. Household density showed significant negative association for all three variables. This result contrasts with previous studies and predominant theory, which associates high density with more active mobility. However, some studies in Latin American cities have found negative or no association between residential density and the number of pedestrians at street level (e.g. Hermida et al., 2019; Rodríguez, Brisson and Estupiñán, 2009). Land use mixture showed higher significance for commuting activity weekdays. This is consistent with the theory of natural movement and cities as movement economies: cyclists seem to choose routes in areas with land use diversity. Road hierarchy, existence of cycle paths and number of intersections all had strong association with the three types of cycling activity. Existence of cycle paths had a higher IRR for commuting activities weekdays. 
Slope had a strong negative association with the three types of activity, which is consistent with previous studies and with the general behaviour of cyclists, who avoid steep slopes. Finally, an assessment of NACH and NAIN variables at different radii indicated that their association with cycling activity increased with the value of the radius, with NACH at global radius (NACH_Rn) having the strongest effect (see supplemental material 05), consistent with previous findings (Turner, 2007). There is some indication that the influence of the network is stronger for commuting trips, which are closer to natural movement theory. Therefore, NACH_Rn was selected to represent the network configuration for further analysis.
Multiple regression models
Partial models with incremental complexity were fitted for each variable to decide a suitable set of control socio-economic factors using the LR and ANOVA tests for model comparison. Again, evidence showed that ICV had slight if any contribution to cycling activity variation, since including it did not improve the model fitting (see Model Comparison in supplemental material 06). Therefore, ICV was excluded for the next steps.
Then, base models were tested. The LR for the base model of Total cyclists was −74,167, and the model explained 3.8% of the log-likelihood of the intercept-only model (pseudo R² = 0.038). This value was the same for the base model for Cyclists weekdays. For Commuting activities weekdays, the base model had a pseudo R² of 0.029.
The full models, fitted by including NACH_Rn in each base model, were compared with the corresponding base models using McFadden’s pseudo R² and Z values from Vuong’s test. Table 2 summarises these comparisons.
Table 2. Regression model fitting.
The full model for Total cyclists explained 5.5% of the log-likelihood of the intercept-only model, which implies a relative improvement of 44.7% compared to the base model. For Cyclists weekdays, the full model explained 5.4% of the intercept-only model, which represented a relative improvement of 42%. Finally, for Commuting activities weekdays, the full model explained 4.2% of the intercept-only model, which means a 44.8% improvement from the base model. Vuong’s tests indicated that, in all cases, full models were closer to true models while accounting for complexity. This implies that including the network structure in cycling activity modelling improves the model’s performance by about 42–45%, and the model is a good compromise between simplicity and power.
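The relative improvements follow directly from the reported pseudo R² values, as the worked arithmetic below shows. The subscripts "full" and "base" are labels introduced here for the models with and without NACH_Rn.

```latex
\frac{R^2_{\text{full}} - R^2_{\text{base}}}{R^2_{\text{base}}}:
\qquad
\frac{0.055 - 0.038}{0.038} \approx 0.447,
\qquad
\frac{0.054 - 0.038}{0.038} \approx 0.421,
\qquad
\frac{0.042 - 0.029}{0.029} \approx 0.448
```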
As for spatial autocorrelation, Moran’s I values were low for the three models but statistically significant: total cyclists I = 0.072, weekday cyclists I = 0.077, commuting activities weekdays I = 0.072, p < 0.001 (500 m inverse distance squared neighbourhood). This implies that there is a remaining spatial effect that was not captured by the model. Although the global autocorrelation of the residuals was rather low, further analysis will be necessary to assess the convenience of using spatial regression techniques to improve the models.
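For readers unfamiliar with the statistic, a minimal sketch of global Moran's I with inverse-distance-squared weights and a 500 m cut-off follows. It is an illustration under stated assumptions, not the routine used in the study: weights are not row-standardised, each segment is reduced to a single point (for example its centroid), coordinates are in metres, and the function name and toy data are hypothetical.

```python
import math


def morans_i(points, values, max_dist=500.0):
    """Global Moran's I with weights w_ij = 1 / d_ij**2 for d_ij <= max_dist.

    points: list of (x, y) coordinates in metres.
    values: one numeric value (e.g. a model residual) per point.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("values are constant; Moran's I is undefined")
    num = 0.0  # weighted sum of cross-products of deviations
    s0 = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if 0 < d <= max_dist:
                w = 1.0 / d ** 2
                s0 += w
                num += w * dev[i] * dev[j]
    if s0 == 0:
        raise ValueError("no pair of points lies within max_dist")
    return (n / s0) * (num / denom)


# Toy example: four points 100 m apart along a line.
pts = [(0, 0), (100, 0), (200, 0), (300, 0)]
clustered = morans_i(pts, [1, 1, 5, 5])    # positive: similar values are adjacent
alternating = morans_i(pts, [1, 5, 1, 5])  # negative: neighbours differ
```

The double loop scales quadratically with the number of points, so a network of many thousands of segments would call for a spatial index or a dedicated library.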
Table 3 summarises the influence of each variable in the full models. Household density remained negatively associated with cycling activity when controlled for other variables. For Total cyclists, the IRR was 0.716 (CI: 0.69–0.74), which means that an increase of one standard deviation in household density implies a reduction of about 28% in cyclists, keeping all the other variables the same. Land use mixture was positively associated with cycling activity, and the highest IRR was for Commuting activities weekdays (1.326, CI = 1.275–1.380), implying that each SD increment in the Shannon index is associated with around one third more commuting cycling trips. Reinforcing the explanation presented in the previous section, cycling activity in our study seems to be larger in areas with high commercial diversity where residential density is relatively low.
Table 3. Final regression models.
Significance codes: (.): p < 0.1; (*): p < 0.05; (**): p < 0.01; (***): p<0.001
Regarding the infrastructure, Number of Intersections had a positive significant association with all cycling activity and was higher for Commuting activities weekdays (IRR = 1.227, CI = 1.172–1.285), implying that cycling activity is higher in well-connected areas. Road Hierarchy was also positively associated with all cycling activity, and its IRR for Total cyclists was 3.3 (CI = 3.1–3.6). This means that primary roads had, on average, three times more cyclists than residential streets. A similar effect was found for Existence of cycle paths, although the confidence intervals were wider (IRR = 3.1, CI = 2.72–3.55).
One unexpected result was that the significance of Slope was the lowest of all explanatory variables for Total cyclists (p = 0.007) and for Cyclists weekdays (p = 0.01), when controlled for all other variables, with IRRs of 0.431 (CI = 0.26–0.72) and 0.438 (CI = 0.26–0.76), respectively. The significance of Slope for Commuting activities weekdays was also the lowest but remained highly significant (p = 0.003), and the IRR was 0.23 (CI = 0.125–0.44). It is also worth noting that its confidence intervals were the widest of all explanatory variables. One possible explanation is that while commuting cyclists generally avoid steep streets, these are more tolerable for sport cyclists.
Finally, regarding the network structure, NACH_Rn had an IRR of 1.63 (CI = 1.59–1.67) in the total cyclists model, implying that streets with a NACH_Rn value one SD above the mean had, on average, 63% more cyclists, keeping all other variables equal. This effect was stronger for commuting activities weekdays (IRR = 1.68, CI = 1.62–1.74), a finding that corroborates the assumption that commuting movement is closer to natural movement from Space Syntax theory.
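The link between the IRR and the underlying regression coefficient can be written out as a short worked derivation. It assumes a count model with a log link (the usual setting in which IRRs are reported); the symbols $\beta_{\mathrm{NACH}}$, $z_i$ and $k$ are introduced here for illustration only.

```latex
% Assumed log-link count model; z_i is the Z score of NACH_Rn for segment i
\log \mathbb{E}[y_i] = \beta_0 + \beta_{\mathrm{NACH}}\, z_i + \dots

% A shift of k standard deviations multiplies the expected count by IRR^k
\frac{\mathbb{E}[y \mid z + k]}{\mathbb{E}[y \mid z]}
  = e^{k\,\beta_{\mathrm{NACH}}}
  = \mathrm{IRR}^{\,k}

% With the reported IRR of 1.63 for total cyclists
\beta_{\mathrm{NACH}} = \ln 1.63 \approx 0.489,
\qquad
\mathrm{IRR}^{2} = 1.63^{2} \approx 2.66
```

Under this assumption the effect compounds multiplicatively: a segment two SDs above the mean would be expected to carry about 2.66 times the cyclists of an average segment, not 2 × 63% more.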
Overall, there were no important differences between the models for the three kinds of cycling activity. All coefficients had the same direction and similar values for the three models and the confidence intervals overlapped. The only two exceptions were Land Use Mixture, whose effect was significantly higher for commuting activities weekdays compared with the other two variables of cycling activity, and Household Density, with a significantly lower effect also for commuting activities weekdays.
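The overlap criterion used in this comparison can be sketched as a simple interval check. The function name `intervals_overlap` is illustrative; the first pair of intervals is the NACH_Rn pair quoted above, while the second pair is hypothetical and included only to show the non-overlapping case. Overlapping confidence intervals are a rough heuristic and not a formal test of the difference between two coefficients.

```python
from typing import Tuple

Interval = Tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """Return True when two closed intervals share at least one point."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return a_lo <= b_hi and b_lo <= a_hi


# NACH_Rn confidence intervals quoted in the text.
nach_total = (1.59, 1.67)
nach_commuting = (1.62, 1.74)
print(intervals_overlap(nach_total, nach_commuting))  # the two intervals share 1.62 to 1.67

# Hypothetical intervals, only to illustrate a case with no overlap.
print(intervals_overlap((1.00, 1.10), (1.20, 1.30)))
```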
These findings evidence three key characteristics of streets for cycling activity: hierarchy, dedicated cycling space and directness. These characteristics are even more important for commuting activity and should guide the design processes of cycling infrastructure.
Conclusions
In this study, we explored the relationship between urban street network structure and the movement behaviour of cyclists using a fitness app at street segment level. With two big crowdsourced data sets with ample coverage and detailed scale, we applied Space Syntax to study the road network morphology and used generalised linear regression modelling to better understand associations, with a replicable methodological approach.
From the spatial perspective, the results of the study show that network structure has a strong influence on cycling activity for our data set. The inclusion of Normalised Angular Choice at global radius in the regression models improved their explanatory power by 42% for total cyclists and 45% for commuting cycling activities on weekdays. Road hierarchy and segregated cycle paths also had a strong positive influence on cycling activity. This is relevant for cycling behaviour modelling, since all three variables can be easily extracted or computed for any street network from OSM and are highly informative. There is also a practical implication for decision-making, since they can be used as guidelines for planning and implementing cycling infrastructure.
There are some other important outcomes related to variables commonly analysed in the literature. First, socio-economic conditions seem to be irrelevant for cycling activity at street segment level for our data set, in contrast with findings from previous studies (Heinen et al., 2010; Musakwa and Selala, 2016; Parkin et al., 2008). These studies, however, analysed the socio-economic variables at area level and related them to the trip’s origin; therefore, they are more representative of the decision to cycle. Our study analyses the data at street segment level, and is more representative of route-choice decision making. Also, our data set is prone to socio-economic bias since it only represents the behaviour of Strava users. Second, household density had a negative influence on street level cycling activity, in contrast with the cited previous studies in developed countries. However, other studies in Latin America have also found a negative or null association of residential density with pedestrian counts at the street level (e.g. Hermida et al., 2019; Rodríguez, Brisson and Estupiñán, 2009); hence, these results might indicate geographic or cultural specificities in active mobility in Latin American cities that must be further studied. Finally, another important finding was related to the slope. Although it remained statistically significant and had a strong effect when controlled for other variables, its significance level was the lowest of all analysed variables and its confidence intervals were relatively wide, mainly for total cyclists and cyclists on weekdays, whereas for commuting activities, the significance remained high. In the first two cases, the dependent variables reflected the behaviour of both sporting and commuting activities, so the effect of the slope varies; in the third case, this effect is more consistent.
The results derived from this study are the first of their kind in Latin America and will help to understand cycling behaviour in Andean cities. Our approach is replicable in any area where the data are available. Commercial and free platforms of geographic information such as Strava Metro and OSM have large coverage worldwide and represent a promising solution for studying cycling behaviour and its relationship with urban morphology. A limitation of our study is related to the potential bias of the Strava data set, because it is constrained to users with enough purchasing power to own smart devices that can support the platform. This means that the data might be biased by the socio-economic status of its users; therefore, it might not represent the whole spectrum of the population. Further analysis can help to improve the model to correct for socio-economic conditions (Roy et al., 2019). Given that Cuenca is a city with relatively low socio-spatial segregation, we expect that the bias would be stronger in other cities.
Further research should explore the applicability of these findings to the behaviour of other cyclists by applying the methodology to other data sources (e.g. on-site counts). Moreover, it will be important to refine the regression model and to compare this city with others in the region to test the universality of the results. Specifically, the relevance of socio-economic variables and slope should be further explored in other cities of the region. Finally, since human movement is expected to exhibit spatial association, other approaches such as spatial regression techniques will help us to understand the underlying spatial processes affecting cycling behaviour.
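As an illustration of how spatial association could be checked before moving to spatial regression, the sketch below computes global Moran's I, a common diagnostic of spatial autocorrelation, for values attached to street segments. It is not part of the original study: the function `morans_i`, the binary neighbour lists and the toy values are all hypothetical, and a real analysis would use model residuals together with a segment adjacency derived from the street network.

```python
from typing import Dict, List, Sequence


def morans_i(values: Sequence[float], neighbours: Dict[int, List[int]]) -> float:
    """Global Moran's I with binary weights (1 for each listed neighbour pair)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("values have zero variance")

    cross = 0.0      # sum of w_ij * dev_i * dev_j
    weight_sum = 0   # sum of all weights w_ij
    for i, js in neighbours.items():
        for j in js:
            cross += dev[i] * dev[j]
            weight_sum += 1
    if weight_sum == 0:
        raise ValueError("no neighbour pairs supplied")

    return (n / weight_sum) * (cross / denom)


# Hypothetical chain of four consecutive street segments: 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Toy values that rise smoothly along the chain (similar neighbours).
print(morans_i([1.0, 2.0, 3.0, 4.0], chain))
# Toy values that alternate along the chain (dissimilar neighbours).
print(morans_i([1.0, -1.0, 1.0, -1.0], chain))
```

Positive values indicate that neighbouring segments tend to have similar values, and negative values indicate that they tend to differ. Clearly non-zero autocorrelation in the residuals of a non-spatial model would be one motivation for the spatial regression techniques mentioned above.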
Supplemental Material
Supplemental Material 1 to 6 - Supplemental material for Exploring the influence of road network structure on the spatial behaviour of cyclists using crowdsourced data, by Daniel Orellana and Maria L Guerrero, in Environment and Planning B: Urban Analytics and City Science
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Dirección de Investigación de la Universidad de Cuenca DIUC.
Supplemental material
Supplemental material for this article is available online.
References
