Abstract
The accelerated growth of cities and urban populations over recent decades, together with the complexity and diversity of urban areas, demands proficient spatial affordance assessment, especially for the vulnerable sections of society. Lately, machine learning and computer vision models have become highly competent in analyzing urban images for assessing the built environment. This study harnesses the potential of computer vision techniques to assess the age-friendliness of urban areas. The developed machine learning model utilizes Google’s Street View images and is trained using lived experience-based image ratings provided by elderly participants. The model then rates new urban images for their level of age-friendliness with an accuracy of 85%. This paper reviews the associated literature and explains the data collection approach and the developed machine learning model. The success of the implementation is also demonstrated, confirming the validity of the proposed methodology.
Introduction
Rapid urbanization and incremental increase in the global aging population pose new challenges for urban designers and planners. It is becoming increasingly necessary to promote demographic inclusivity and encourage active aging to ensure a healthier elderly population (Buffel and Phillipson, 2018). Consequently, an agenda known as aging-in-place concentrates on the necessities to support older residents to live in a community where they can maintain an independent, active, and socially connected life (Woolrych et al., 2019). However, rapid urbanization, often underpinned by ill-informed planning decisions, increasingly results in developing physical environments that are counterproductive for sustaining a healthy lifestyle for an aging population. Policymakers and urban planners worldwide, in order to help remedy the situation, are heavily invested in designing age-friendly towns and neighborhoods (Buffel and Phillipson, 2018). Such developments aim to enhance security, safety, active mobility, and connectedness via public transportation systems for the elderly (George, 2018; Gorman et al., 2019).
Considering the overwhelming size of urban regions, assessing age-friendliness attributes at the granular level of every street becomes a daunting task (Beard and Montawi, 2015). Since sending human auditors to collect the required information is time-consuming and costly, many researchers have adopted street view images as the information source for their research work (Liu et al., 2017; Ye et al., 2019). Technologies such as Google Street View (GSV), which offer a more suitable alternative to in-person field audits for studying the physical characteristics of urban environments, are increasingly being used for conducting research. Street view imagery reduces the resources required for street audits since the data is collected virtually (Rzotkiewicz et al., 2018). Among automatic analysis methods, vision-based approaches generalize better and are better suited to building comprehensive models than other approaches (Li and Sheng, 2021). This research focuses on visually observable characteristics that reflect the needs of the elderly and does not consider other types of analysis.
Computer vision can be used to extract urban attributes from street view images, and Machine Learning (ML) methods can extract geospatial and urban information from images. These techniques enable intelligent models to evaluate each scene independently after the learning process is completed. This is an advantage over more traditional approaches, such as content-based image retrieval (CBIR) (Szeliski, 2010). Most of these novel works utilize deep neural networks for classifying urban images. Convolutional Neural Networks (CNNs) (Bhandare et al., 2016) have demonstrated more reliable performance in analyzing visual imagery. These networks can achieve human-like accuracy when exposed to extensive datasets (Goodfellow et al., 2016). Some prominent open image datasets, such as ADE20K (Zhou et al., 2017), include image segmentation data from indoor and outdoor urban areas. Some model structures are also available for further use, for example, GoogLeNet (Szegedy et al., 2015), VGG16 (Simonyan and Zisserman, 2015), and ResNet (He et al., 2016). Most of these models are trained and tested on the ImageNet dataset and inherit their feature extraction capabilities from the same source. Other datasets, such as the Urban Environments dataset (Albert et al., 2017), are utilized to fine-tune pre-trained structures to build urban analysis models. Several open deep learning frameworks and libraries are also available along with these datasets and model structures. Extensive research has been conducted on urban images to evaluate different urban attributes. However, no comprehensive methods exist to assess features vital for deciphering a city’s age-friendliness. This study proposes a method to investigate the age-friendliness of the urban environment using computer vision methods. The data collection procedure using GSV images and a database of ratings of urban streetscapes, acquired through a participatory process involving older adults, are presented. A pre-trained model, adapted through transfer learning, is used to assess each image’s characteristics and measure the corresponding area’s age-friendliness.
The article is structured as follows: Sections “studies on age-friendliness” and “AI-driven urban analysis” outline recent studies on age-friendliness and the role of Artificial Intelligence in urban studies. Section “methodology” elaborates on the processes involved in the research. Section “evaluation of the machine learning model” evaluates the results of the applied experiments. Sections “critical reflection” and “conclusion” provide a critical reflection outlining the limitations and opportunities of the proposed method and conclude the study.
Studies on age-friendliness
Beard and Montawi (2015) defined an age-friendly city as a supportive environment that enables citizens to grow older more actively within their society, neighborhoods, and civic communities. Age-friendly cities promote older adults’ being actively involved, socially integrated, and supported by infrastructure developed to serve their needs. The World Health Organization has made countless efforts to support cities in becoming more age-friendly through its Global Age-Friendly Cities Guide (World Health Organization, 2007). Outdoor spaces, buildings, transportation, and housing are some of the key attributes that make a city ideal for aging populations. As stated in WHO’s Checklist of Essential Features of Age-friendly Cities (World Health Organization, 2007), urban planners must strive towards designing clean and pleasant public areas, green spaces, safe and sufficient outdoor seats, well-maintained pavements that are wide enough for wheelchairs, adequate pedestrian crossings, separated cycle paths from pavements, good street lighting, and well-maintained and accessible public toilets outdoors and indoors (World Health Organization, 2007) in their planning initiatives. It is important to note that age-friendliness is multi-scalar. Apart from social factors such as loneliness, community engagement, racial segregation, poverty, crime victimization, etc., environmental determinants such as noise levels, pollution levels, density, civic amenities, etc., all contribute to developing age-friendly urban environments. Physical attributes of the built environment, such as the ones outlined by WHO, are also inherently multi-scalar. Aspects ranging from civic infrastructures such as public parks, uncluttered streets, cycle paths, wide pavements, and adequate slopes to enable mobility to public amenities such as adequate street lighting, public toilets, public seating at regular intervals, shaded streets, etc., all impact the experience of aging. 
These physical granularities can be analyzed individually using computational tools and techniques. However, capturing lived experience of the elderly demographic using images of the urban environment further allows for a comprehensive reflection of these diverse physical attributes. The lived experience of the elderly is essential since it embodies their everyday experiences, choices, navigation abilities, and conscious choices about utilizing city infrastructure that shapes their perception of the built environment. Lived experience is defined as “personal knowledge about the world gained through direct, first-hand involvement in everyday events rather than through representations constructed by other people” (Chandler and Munday, 2011). In the case of an elderly demographic, lived experience and the knowledge gained from their experiences pertaining to their interactions with the physical urban environment, the difficulties they encountered during navigating the urban environment, and deterrents versus enablers that aided them in living a healthy and independent life are of critical importance. This human sciences-oriented knowledge is banked upon to develop meaningful interpretations from the urban images that they are exposed to during this study.
Several research investigations have been conducted to assess the degree of age-friendliness associated with urban features. Most of these assessments are conducted using statistical analysis of user feedback derived from questionnaires. For instance, Knapp (2009) explored aging-in-place amongst females aged 65 or older to define a suitable aging environment for older adults and found that being socially active before stepping into older age helps them adapt and discover alternatives to nursing homes. Plouffe and Kalache (2010) compared age-friendly features in urban areas between developed and developing countries and uncovered why age-friendly characteristics were more diverse in developed countries. In another study, Kadoya (2013) reveals two policy implications for a more active elderly community: living with someone encourages an elderly person to interact with society, and an elderly person’s ability to be mobile encourages their social participation. Green (2013) investigated how city governments could help older adults gain independence and aid them in actively participating in the social and economic operations of the city. Kendig et al. (2014) also presented case studies to evaluate the impact of consultative, political, policy, and research processes to achieve age-friendly cities (AFC) in Australia. Beard and Montawi (2015) collected the most innovative approaches adopted by the members of the WHO’s Global Network of age-friendly cities and communities in various parts of the globe.
Although these studies are informative, they are confined to their respective study areas and cannot be extrapolated to other neighborhoods, cities, or nations. Upscaling such studies necessitates extensive surveys and audits to determine their applicability (Ibrahim et al., 2019; Law et al., 2018). On the other hand, investigations by the World Health Organization have demonstrated that the demands of the elderly demographic remain similar across most urban areas worldwide (World Health Organization, 2007). The assessment of age-friendliness using computational advancements such as machine learning models can be used effectively to encourage scalability and allow for rapid investigations of urban areas across nations. The paper accordingly harnesses machine learning processes to develop a single model that can learn about the physical characteristics of urban environments desired by the elderly and accordingly perform assessments of an urban area. The developed model can be further generalized when it is trained on datasets gathered from various urban settings and a higher number of participants. The proposed model, instead of offering only a binary result as the other studies outlined in this section do (either the evaluated areas are age-friendly, or they are not), is able to provide a range in between, thus allowing for more thorough assessments to be performed.
AI-driven urban analysis
Street view images can help researchers access urban data readily while using computer vision techniques enables us to automate the assessment of urban features from such images (Dubey et al., 2016). Researchers have also investigated the correlation between artificial neural networks' interpretations of such images and the human perception of these images in terms of uniqueness, safety, liveliness, and health (Acosta and Camargo, 2019; Law et al., 2018; Ordonez and Berg, 2014). Compared to traditional machine learning methods, deep learning structures require less feature engineering and often perform more accurately when larger datasets are available (Goodfellow et al., 2016; Liu et al., 2017). Some best-known urban planning research works that utilize deep learning visual models are introduced in this section.
Suel et al. (2019) applied a deep neural network trained on a collection of images of London to measure income distribution and evaluate education, unemployment rate, housing, health, and crime in urban regions. Dubey et al. (2016) trained a CNN to learn from a ranking loss and joint classification to anticipate human interpretations of pairwise picture comparisons. Many studies have also taken advantage of the CNN structure due to its ability to perceive image data. For example, Ibrahim et al. (2019) offered a realistic-dynamic urban modeling framework using deep CNNs to detect slums, informal areas, and pedestrians from street-level images worldwide. Ye et al. (2019) applied a pre-trained model named SegNet (Badrinarayanan et al., 2017) to extract pixels representing design elements responsible for a street’s visual quality. The Street-Frontage-Net (Law et al., 2018) demonstrated the effects of active frontage on walkability. Their findings suggest that active frontage might be more significant in places where pedestrians tend to walk more. De Nadai et al. (2016) explored the relationship between the residents' safety perception and activity levels by consolidating estimations of perceived safety using a CNN to extract mobile phone data for estimating the liveliness of an urban area. Similarly, Naik et al. (2014) developed StreetScore, an algorithm for predicting a cityscape’s perceived safety, assessing GSV images in 21 cities in the United States with support vector regression. Deep learning structures, especially convolutional neural networks, can outperform other state-of-the-art methods in computer vision problems where the model predicts citizens' perception of various concepts (Zhao et al., 2018). Among these models, pre-trained image analyzers can help urban studies to be performed more swiftly and accurately. Object detection methods are typically used to collect information on the details present in street view images. They can return numerous object classes in the output, or the same image with bounding boxes around street elements such as people, trees, and cars (Cheng et al., 2014; Khan and Al-habsi, 2020). One of the most reliable pre-trained models for such image analysis is VGG16 (Simonyan and Zisserman, 2015), a convolutional neural network (CNN) trained on the ImageNet dataset for image classification. ImageNet (Krizhevsky et al., 2012) includes over 15 million labeled images in approximately 22 thousand categories. This structure is employed as the base model to implement computer vision analysis on urban images.
Investigating the age-friendliness of the built environment requires the analysis of multiple urban attributes. A computational model should thus be capable of characterizing diverse aspects per image. A model with multiple trainable layers, such as the one used in this study, can be trained to identify the physical features constituting an image rated by an elderly cohort (with respect to its age-friendliness) and can subsequently rate any new urban image while emulating the same human-like rating tendency; it is therefore deemed suitable for such analysis. The advantages of using this model are varied. The model is highly rated for its performance in image analysis and can be fine-tuned to solve other problems, such as motion analysis and autonomous driving (Ren et al., 2017). It has also been used in many urban studies, presenting promising results and showing its ability to extract valuable information from urban images (Srivastava et al., 2019). The literature review on AI identified VGG16 as one of the best-performing pre-trained models (according to the ImageNet challenge), and it is therefore used in the study.
Existing literature pertaining to urban computational models typically lacks comprehensive models explicitly built for systematically evaluating urban attributes pertinent to the elderly (Kwon and Cho, 2020). The literature review uncovered computational models that involve generic assessment of urban features but no models that analyze urban environments based explicitly on the lived experience of the elderly. For instance, walkability is covered in various articles; however, evaluating a neighborhood’s walkability for younger individuals and children differs significantly from that for the elderly. Factors such as a suitable slope of the walkways and sufficient sidewalk width are significantly more essential for the elderly than for other residents (Plouffe and Kalache, 2010; McGarry and Morris, 2011).
Actual street view images, when subjected to voting by the elderly (to capture the age-friendly nature of the street), result in an unbiased rating since the votes are driven by emotional and experience-based accounts provided specifically by this demographic. Physical attributes that are either missing or deemed problematic, including ambient associations that result from a combination of these physical attributes of the urban environment (such as fear of falling, safety, discomfort, and loneliness), are thus intuitively captured by the elderly as they rate street view images. Such digital processes also overcome the limitations of traditional physical survey-driven approaches by eliminating the need to be physically present in the studied area. Although automated GSV image analysis cannot wholly replace in-situ data collection, it can still assist in identifying problem-prone locations and thus direct precisely geolocated physical inspections to the areas needing the most attention, saving time and resources for field surveyors.
Methodology
Study workflow
The research workflow involves the following six processes:
Census data-driven site selection
This process involved analyzing census data provided by the Australian Bureau of Statistics to extract potential geolocations with a concentration of the elderly demographic within Sydney, Australia. Findings of the analysis suggested that few Australians aged 75 and over prefer to live in Sydney’s Central Business District (CBD) and that they are more likely to live in the capital city’s inner suburbs (James et al., 2019). It was thus deemed necessary to explore the physical characteristics of the built environment of both the central zones of an Australian city and its suburbs to decipher the underlying reasons for this location choice. The city district of Sydney, preferred by relatively fewer elderly residents, and the north shore, where many postcodes are home to the largest percentages of elderly residents (Census QuickStats, 2016), were thus chosen as the regions for study. Within these regions, two suburbs in Central Sydney, Haymarket (4.7% aged 60 years and above) and Ultimo (6% aged 60 years and above), and one north shore suburb, Crows Nest (12.6% aged 60 years and above), were chosen as case study locations to establish whether the age-friendliness of these suburbs affects the difference in the percentage of elderly occupants.
Lived experience-driven urban image rating by the elderly
Dealing with a vulnerable demographic such as the elderly necessitates accommodating their lived experience within the research. The assessment data capturing their perspective of their immediate surroundings' age-friendliness reveals bottom-up opinion and serves as a credible data source for training machine learning models. A decision to develop an online application for surveys with the elderly cohort was actively taken. This decision was partly owing to the COVID-19 pandemic and the sensitivities associated with in-person interaction with this vulnerable community. However, successful research initiatives such as StreetScore (Naik et al., 2014) reinforce the premise of this research. StreetScore’s underlying principles of using machine learning algorithms to predict street safety rates using street images laid the foundations for developing online platforms for conducting the lived experience survey.
Primary datasets for developing the image repository used in the survey included Google Street View and OpenStreetMap (OSM). The images are extracted using Google’s API (Application Programming Interface) and visualized on a map. The extraction process combined the OpenStreetMap API and the Google API to deploy a virtual grid over the selected neighborhoods and plot geographical points 50 m apart. The resultant geolocations are stored in a CSV file. To extract images, the Google API is provided with a user-specific key (created by Google Maps), the latitude and longitude of the point in the study area, and the user angle with the vertical axis or field of view (fov). A size of 800 × 600 pixels is specified for the images. The image angles per geographical point were specified as 0, 90, 180, and 270° to cover the urban features around each point. The viewing angle is specified at 90°, parallel to the horizontal axis, to maintain a horizontal view. The total image collection comprises 1961 images incorporating streets, sidewalks, outdoor seating areas, pedestrian crossings, and public spaces, encompassing a significant portion of the visible physical aspects relevant to an elderly person’s well-being as discussed in the literature.
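The image collection step described above can be sketched as follows. This is a minimal illustration rather than the study's actual extraction script: it lays a regular grid over a bounding box (the study additionally used OSM data, which is not reproduced here) and builds Street View Static API request URLs for the four headings. The function names, the bounding-box approach, the metres-per-degree approximation, and the mapping of the 90° viewing angle to `fov=90` with `pitch=0` are all assumptions made for illustration.

```python
import math
from urllib.parse import urlencode

STREETVIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"
HEADINGS = (0, 90, 180, 270)      # four views per point, as stated in the text
SPACING_M = 50.0                  # grid spacing stated in the text
METRES_PER_DEG_LAT = 111_320.0    # rough approximation (assumption)


def grid_points(south, west, north, east, spacing_m=SPACING_M):
    """Return (lat, lng) points on a regular grid inside a bounding box."""
    mid_lat = math.radians((south + north) / 2)
    dlat = spacing_m / METRES_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlng = spacing_m / (METRES_PER_DEG_LAT * math.cos(mid_lat))
    n_rows = int((north - south) / dlat) + 1
    n_cols = int((east - west) / dlng) + 1
    return [(south + r * dlat, west + c * dlng)
            for r in range(n_rows) for c in range(n_cols)]


def streetview_urls(lat, lng, api_key, fov=90, pitch=0, size="800x600"):
    """Build one request URL per heading for a single grid point."""
    urls = []
    for heading in HEADINGS:
        params = {
            "size": size,
            "location": f"{lat:.6f},{lng:.6f}",
            "heading": heading,
            "fov": fov,        # assumed interpretation of the 90 degree angle
            "pitch": pitch,    # 0 keeps the camera horizontal (assumption)
            "key": api_key,
        }
        urls.append(f"{STREETVIEW_ENDPOINT}?{urlencode(params)}")
    return urls
```

The URLs are only constructed here, not requested; downloading the images and writing the grid points to a CSV file would be separate steps.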
A web application was developed to expose several random images from this image repository to the participants (n = 460). The application enabled collecting assessments from participants through a simplified rating system. The participants are prompted to rate each image from 1 to 5 based on how age-friendly it appears. This rating scale is chosen based on similar state-of-the-art research works (Liu et al., 2017). On this scale, “1” denotes the least age-friendly, while “5” denotes the most age-friendly. Each image is assigned several ratings by different participants. Therefore, each pairing of an image with the rating given by an individual participant is considered a data sample, and these samples form the dataset.
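As a rough sketch of how such survey responses could be organized, the snippet below keeps each valid (image, rating) pair as one data sample and also computes the mean rating per image, which is useful for summarizing the cohort's general opinion. The response format and the function names are hypothetical and are not taken from the study's web application.

```python
from collections import defaultdict
from statistics import mean

VALID_RATINGS = {1, 2, 3, 4, 5}   # rating scale used in the survey


def to_samples(responses):
    """Keep only (image_id, rating) pairs whose rating is on the 1-5 scale."""
    return [(image_id, rating) for image_id, rating in responses
            if rating in VALID_RATINGS]


def mean_rating_per_image(samples):
    """Average the individual ratings collected for each image."""
    ratings_by_image = defaultdict(list)
    for image_id, rating in samples:
        ratings_by_image[image_id].append(rating)
    return {image_id: mean(ratings)
            for image_id, ratings in ratings_by_image.items()}
```

For example, two ratings of 4 and 2 for the same image yield two data samples and a mean rating of 3 for that image.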
As elaborated in section 2, most modern works consider rating problems as a classification task. For instance, participants could label the images in our dataset “age-friendly” and “non–age-friendly” to generate a classification dataset. However, research has shown that collecting assessments in the form of ratings is more rational for the participants and provides more opportunities to acknowledge lived experience-based viewpoints (Kwon and Cho, 2020). Each image in the dataset is rated by at least four participants, and analyzing these ratings results in more accurate interpretations. Therefore, rather than assigning a scene to one of two categories, participants can rate an image on a range between high and low levels of age-friendliness. Such ratings enable the final model to generalize participants’ views and develop an analysis method to interpret visual attributes regardless of individual opinions. The primary aim of this research is to analyze “visual features” experienced by the elderly. Studies have shown that assessing visual features distinguishes several urban attributes and enables investigators to analyze the physical characteristics of urban areas since these attributes are always analyzable by researchers (Law et al., 2018; Liu et al., 2017; Zhao et al., 2018).
All images are reformatted to a fixed size of 224 × 224 pixels RGB and are fed into a deep learning model. Considering this fixed image size as the input data, we aimed to generate natural numbers between 1 and 5 in the output. Such a problem falls into the category of regression predictions (Bishop and Nasrabadi, 2006). In this case, the independent variables are the input images, and the dependent variables are the ratings generated by the model. This generated value is the model’s rating on how age-friendly each input image appears based on its past ranking history.
Development of the machine learning model
A VGG16 model is fine-tuned to perform an image regression task on image ratings, instead of its original task of image classification. This base model is retrained using the collected images, and the corresponding ratings for the images are considered as the output. The model learns the relationship between the input-output pairs during this training process. The model’s implementation in this study is borrowed from the object detection library presented by TensorFlow (Abadi et al., 2016) and Keras (Ketkar and Santana, 2017). The library provides a VGG16 model with a structure consisting of 22 layers combining CNNs (Simonyan and Zisserman, 2015). This structure implies retraining a pre-trained model with attached layers using a new dataset, which is termed “transfer learning” in the literature. This method takes advantage of the visual feature extraction layers of the base model to develop a new model for the current task. Our study replaced the model’s final layer with another dense layer consisting of one neuron to generate a value for the input image’s score.
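A minimal sketch of this kind of transfer learning setup, using the VGG16 model shipped with Keras Applications, is given below. It is not the study's implementation: the choice to freeze the pre-trained layers, the Adam optimizer, and the mean squared error loss are assumptions made for illustration, and `build_rating_model` is a hypothetical name.

```python
INPUT_SHAPE = (224, 224, 3)   # fixed RGB input size stated in the text


def build_rating_model(weights="imagenet", freeze_base=True):
    """Return a VGG16 whose final layer is replaced by a single-neuron output."""
    # Imported lazily so this module can be loaded without TensorFlow installed.
    import tensorflow as tf

    base = tf.keras.applications.VGG16(
        weights=weights, include_top=True, input_shape=INPUT_SHAPE)
    # Freezing the pre-trained layers is an assumption, not stated in the text.
    base.trainable = not freeze_base

    # Take the output of the last hidden dense layer and drop the original
    # 1000-class prediction layer.
    features = base.layers[-2].output
    rating = tf.keras.layers.Dense(1, activation="linear", name="rating")(features)

    model = tf.keras.Model(inputs=base.input, outputs=rating)
    model.compile(
        optimizer="adam",   # assumed optimizer
        loss="mse",         # assumed regression loss
        metrics=[tf.keras.metrics.RootMeanSquaredError()],
    )
    return model
```

Calling `build_rating_model()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random initialization, which is convenient for checking shapes.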
Establishing the evaluation criteria for the machine learning model
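Three metrics are used to evaluate the proposed method’s efficiency and robustness: accuracy, Root Mean Squared Error (RMSE), and the coefficient of determination (r-squared). Accuracy measures how closely the predicted values match the actual values, while RMSE calculates their disparity. The coefficient of determination, in contrast, describes the correspondence between these two sets of values. Accuracy, RMSE, and the coefficient of determination return a percentage, a non-negative real number, and a real number no greater than 1, respectively. Higher values of accuracy and r-squared indicate more precise predictions, while lower values of RMSE indicate more reliable performance in rating predictions. The formulas deployed for calculating RMSE and the coefficient of determination are given in equations (1) and (2), where $y_i$ is the actual rating of sample $i$, $\hat{y}_i$ is the predicted rating, $\bar{y}$ is the mean of the actual ratings, and $n$ is the number of samples:
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{1} \]
\[ R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{2} \]
The regression accuracy (equation (3)) shows how close the predictions are to the actual ratings. It is calculated by dividing the smaller of the actual and predicted ratings by the larger one:
\[ \mathrm{Accuracy} = \frac{100\%}{n}\sum_{i=1}^{n}\frac{\min\left(y_i, \hat{y}_i\right)}{\max\left(y_i, \hat{y}_i\right)} \tag{3} \]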
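The three metrics described in this section can be sketched in a few lines of Python. The RMSE and coefficient of determination follow their standard definitions; the accuracy function follows the verbal description (smaller value divided by larger value), with averaging over samples and scaling to a percentage added as assumptions. All ratings are assumed to be positive, as they are on the 1 to 5 scale.

```python
import math


def regression_accuracy(actual, predicted):
    """Mean of min/max ratios between actual and predicted ratings, in percent."""
    ratios = [min(a, p) / max(a, p) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

With actual ratings `[2, 4]` and predictions `[1, 5]`, the per-sample ratios are 0.5 and 0.8, so the accuracy is 65%, the RMSE is 1, and the coefficient of determination is 0.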
Two-stage implementation and validation of the model
Two experiments for implementation and testing of the model were developed:
Experiment 01: In this experiment, the 1961 urban images from the chosen neighborhoods rated by the elderly participants were organized into batches of 10 before being fed to the model. The dataset was randomly divided into a training set and a test set, with a 90/10 ratio. The training samples were further split into two sets, training and validation, containing 75% and 15% of the whole dataset, respectively. The training set was first fed to the model to complete the training phase, and the validation set helped prevent the model from overfitting. This stage was followed by an evaluation phase in which the trained model was deployed to make predictions on the test set. These predictions were compared against the actual values (rated by the elderly), and the model’s performance was evaluated using the criteria introduced in 4.1.4.
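The split and batching described for Experiment 01 can be sketched as follows. The shuffling seed, the rounding of set sizes, and the function names are assumptions made for illustration; the study's own splitting code is not given in the text.

```python
import random


def split_dataset(samples, train=0.75, val=0.15, seed=0):
    """Shuffle and split into training, validation, and test sets (75/15/10)."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # fixed seed for repeatability (assumption)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    train_set = items[:n_train]
    val_set = items[n_train:n_train + n_val]
    test_set = items[n_train + n_val:]   # the remainder, roughly 10%
    return train_set, val_set, test_set


def batches(items, batch_size=10):
    """Yield consecutive batches; the last batch may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Applied to 1961 items, this yields 1471 training, 294 validation, and 196 test items.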
Experiment 02: Spatial mapping to geo-locate age-friendly zones within the chosen urban precincts was conducted after training and testing the computational model (Experiment 01). Experiment 02 involved geospatial mapping and comparative analysis of predicted versus rated age-friendly hotspots in the chosen neighborhoods. The underlying process involved linking the collected image data with the geospatial database (GSV images and OSM-based street network data). Besides geo-locating the images on a map (advancing from Experiment 01, where only image rating comparisons could be visualized), a color code was assigned to each geo-point dispersed throughout the studied street segments. Class 1 (Red) is assigned to the lowest level of age-friendliness of the geo-located urban image, while Class 5 (Green) denotes the highest level of perceived age-friendliness. A set of maps comparing the ratings assigned to each image by the elderly lived-experience cohort with the predictions made by the model was accordingly produced.
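The color-coding step can be illustrated with a small sketch that converts a continuous rating into one of the five classes and attaches a color to a geo-point. Only the endpoints (red for Class 1, green for Class 5) come from the text; the intermediate colors, the rounding rule, and the function names are assumptions.

```python
# Red (Class 1) and green (Class 5) follow the text; the three
# intermediate colors are illustrative assumptions.
CLASS_COLORS = {
    1: "#d7191c",  # red: least age-friendly
    2: "#fdae61",
    3: "#ffffbf",
    4: "#a6d96a",
    5: "#1a9641",  # green: most age-friendly
}


def rating_to_class(rating):
    """Round a positive rating half-up and clamp it to the classes 1..5."""
    return min(5, max(1, int(rating + 0.5)))


def styled_point(lat, lng, rating):
    """Return a map-ready record for one geo-located image."""
    rating_class = rating_to_class(rating)
    return {"lat": lat, "lng": lng,
            "class": rating_class, "color": CLASS_COLORS[rating_class]}
```

The same function can be applied to the participants' mean ratings and to the model's predictions, so that both maps share one legend.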
Evaluation of the machine learning model
The study results were evaluated in two stages to address Experiment 01 and Experiment 02 outlined in 4.1.5.
Evaluation of Experiment 01: The model’s rating results for the test set, a random sample of street view images containing 10% of the dataset, were examined against the actual ratings provided by the participants to evaluate the model’s capability. For a quantitative assessment of the algorithm, the lived experience ratings were compared with the outcomes obtained from the model to calculate the accuracy, RMSE, and coefficient of determination. The evaluated prediction accuracy was close to 80%, and the coefficient of determination was approximately +0.4, demonstrating precise predictions and a positive correlation between the actual and predicted values.
The RMSE results also reveal that the root error between the actual and predicted ratings is slightly more than one, a desirable level at this stage of the experiment. Sample cases of these predictions are illustrated in Figure S1 in the Supplementary Material. These examples are selected from the average values attained from the individual ratings of the participating cohort. Therefore, they represent the general opinion of the participants and not a single person’s opinion. The rating classes ranged from 1 to 5, where larger numbers indicate a higher satisfaction level expressed by participants. The predictions demonstrate the model’s ability to identify age-friendly features in urban images. The developed model is also able to generalize from the training observations and apply what it has learned to new datasets, while performing robustly.
Evaluation of Experiment 02: After testing the model for image identification and rating, Experiment 02 focused on integrating geospatial locations of the identified images to develop comparative maps (comparing the lived experience-based rating and the predicted model rating of street segments). 10% of the entire image dataset was thus assigned as the model’s testing set (these images were never used in the training step and were thus new to the model). Image processing and formatting criteria developed for Experiment 01 were adhered to.
The result of the experiment (Figure 1) showcased the high accuracy of the model in rating the assigned urban images (via Experiment 01) and mapping the resultant ratings as color codes assigned to the geospatial points (via Experiment 02). In this figure, each map on the left shows the participants’ ratings of images that were not observed by the model during the training phase. The maps on the right illustrate the ratings generated by the trained model for the same (previously unobserved) images that the participants rated. This enabled an unbiased comparison between the participants’ ratings and the model-generated ratings. Accordingly, maps 1a, 1c, and 1e showcase the actual ratings for the studied neighborhoods, while 1b, 1d, and 1f show the model-generated ratings for the same areas. The assigned color-coding (Red for least age-friendly and Green for most age-friendly) allowed for the development of 2D geospatial visualizations specifically meant for visual comparison. A high rate of association between the lived experience ratings provided by the elderly participants and the model-generated values was found. The model-generated interpretations were deemed accurate and dependable, thus instilling confidence in the model’s ability to mimic the opinions provided by an elderly person. The experiment supported the validity of deep learning structures for analyzing urban environments from the perspective of age-friendliness.
Figure 1. Visual comparison between the lived experience rating and the rating assigned by the model for the Crows Nest, Haymarket, and Ultimo neighborhoods: (a) lived experience rating of Crows Nest, (b) model's rating of Crows Nest, (c) lived experience rating of Haymarket, (d) model's rating of Haymarket, (e) lived experience rating of Ultimo, (f) model's rating of Ultimo.
After comparing actual participant ratings with the model-generated values and corresponding street view images, the Crows Nest, Ultimo, and Haymarket suburbs of Sydney were revealed as having inadequate sidewalks, unsafe back streets, insufficient time to cross intersections during green lights, and restricted access to green spaces, resting areas, and public bathrooms. The total rating distribution of these locations was also examined to assess the overall quality of these suburbs, revealing that low-rated images are more common than images with ratings of 3 or above. Such analysis-driven findings and their ease of interpretation can further enable urban planners and designers to identify problematic design attributes and mitigate unfavorable scenarios that can hamper the participation of the elderly demographic.
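The rating-distribution check mentioned above amounts to counting images per rating and comparing the share below 3 with the share at 3 or above. The sketch below uses made-up ratings and hypothetical helper names; it does not reproduce the study's data.

```python
from collections import Counter

def rating_distribution(ratings):
    """Count how many images received each rating, ordered by rating."""
    return dict(sorted(Counter(ratings).items()))

def share_below(ratings, threshold=3):
    """Fraction of images rated strictly below the threshold."""
    return sum(1 for r in ratings if r < threshold) / len(ratings)

# Made-up ratings for one suburb, used only for illustration.
sample = [1, 2, 2, 3, 2, 1, 4, 2, 5, 1]
print(rating_distribution(sample))
print(share_below(sample))
```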
The generated results should be seen as an informed initial attempt to label and categorize built environments from the perspective of an elderly demographic. Attempts to post-evaluate the performance and suitability of our built environment from the perspective of the most vulnerable population segments are often neglected. The developed cartographic representation can be further enhanced by embedding 360° Google Street View (as pop-ups) connected to each point, and ultimately be presented in the form of a digital dashboard to urban planners and designers. This will aid in developing a spatial understanding of the rated points and help identify missing physical attributes, ambience, and environmental factors whose absence is detrimental to engaging the elderly demographic in a healthy and comforting manner. Advanced machine learning processes can also be deployed to automatically list deficiencies in physical attributes for each point. Such extracted feature lists, in combination with the embedded street view dashboards, can further help urban planners, architects, and designers remain contextually grounded while developing sensitive mitigation decisions. Designing the built environment while considering the needs, abilities, and expectations of the most vulnerable sections of our society can result in the development of inclusive and responsive urban environments. This research and its intended results promote an avenue towards achieving this goal through the proposed methodology.
Overall, the model’s error decreased iteratively throughout the training phase, reaching a root mean squared error (RMSE) of 0.6 for the model-generated values. The model’s accuracy is also greater than 85%. The coefficient of determination is positive, indicating a positive association (reported as +70%) between the lived experience data and the model’s rating outputs. The model’s performance for measuring and interpreting age-friendliness using GSV images is thus deemed adequate based on the three performance criteria listed above.
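For readers who wish to reproduce the three criteria on their own data, a minimal sketch follows. RMSE and the coefficient of determination use their standard definitions. How accuracy was computed for the regression output is not specified in this section, so the tolerance-based definition below (a prediction counts as correct when it lies within 0.5 of the participant rating) is an assumption, as are the function names and sample values.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted ratings."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def tolerance_accuracy(y_true, y_pred, tolerance=0.5):
    """Assumed accuracy definition: share of predictions within the tolerance."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)

# Made-up ratings, used only for illustration.
lived = [1, 2, 3, 4, 5]
predicted = [1.4, 2.0, 3.0, 4.0, 3.8]
print(rmse(lived, predicted), r_squared(lived, predicted),
      tolerance_accuracy(lived, predicted))
```

Note that the coefficient of determination and a correlation coefficient are distinct quantities, so the two should be reported separately when both are of interest.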
Critical reflection
The research study presented in this paper attempts to add to the critical work conducted within computer vision and associated urban analytics research by adding an evaluative layer/field of age-friendliness. This field is otherwise saturated with semantics-focused research. Evaluating the urban environment from an age-friendliness perspective using machine learning processes and harnessing street view images extracted from Google has thus been the primary focus of the research. The research, though conclusive in proving the validity of the developed model, presents both limitations and opportunities.
The developed machine learning model’s primary task is to learn from the training dataset. This dataset consists primarily of lived experience-based visual ratings of whole images rather than granular ratings of individual physical features within the built environment. Further development of the model could involve automated feature extraction and object identification within a given image to infer the reasons behind the rating provided by the elderly. This feature could provide precise feedback outlining the presence or absence of physical features, which would support the development of contextually sensitive mitigation solutions. An iterative process of counter-checking the findings and mitigation solutions proposed by the model with the elderly participants can further help evaluate and strengthen the model’s potential.
Similarly, the generated geolocation plots currently serve the purpose of visual analysis in an analog manner. Though extremely useful for visualizing non-age-friendly hot spots in a 2D map format, further development to host inbuilt digital analysis capacity that reveals the underlying reasons for a rating and associated mitigation suggestions would be a welcome addition. Furthermore, the online platform has both advantages and disadvantages. Considering the COVID pandemic and the vulnerability of the elderly demographic, online participation proved to be a highly valuable tool for garnering feedback. However, technical literacy, speed of internet services, and the availability of assistance to successfully complete the survey are issues that need to be considered. For this research, a participatory cohort of 460 elderly people was sufficient. However, larger-scale research initiatives would need a hybrid approach to recruiting participants, communicating the intent of the survey, and supporting its successful completion, wherein face-to-face and digital means are deployed in tandem. Empathizing with the elderly cohort could increase involvement and interest in successive research initiatives. Besides this, higher participation rates translate into larger training datasets, which is beneficial for improving the performance of the developed model.
Additionally, one of the drawbacks of the existing method is the difficulty in determining the physical characteristics of a rated image that trigger a participant’s rating and therefore pinpointing the features that need improvement in the urban area. To address this drawback, the image dataset could include multiple labels corresponding with positive and negative age-friendliness characteristics. This will aid in constructing a multi-label classification model for identifying features within an image that underlie the ratings assigned by the elderly. Besides these limitations and opportunities, the researchers also acknowledge the complexity surrounding social and environmental determinants of health and well-being. For instance, factors such as loneliness, physical impairment, neurodegenerative disorders, accessibility, presence of green-blue infrastructure, economic limitations, etc., holistically contribute to the overall state of age-friendliness.
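The multi-label extension suggested above would require each image to carry a set of feature labels alongside its rating. One common way to represent such targets is a multi-hot vector, sketched below. The label vocabulary, function names, scores, and the 0.5 decision threshold are all hypothetical and serve only to show the data layout such a classifier would train on and predict.

```python
# Hypothetical vocabulary of positive and negative age-friendliness features.
LABELS = ["wide_sidewalk", "resting_area", "green_space",
          "steep_stairs", "poor_lighting"]

def encode_labels(present, vocabulary=LABELS):
    """Turn a set of feature names into a multi-hot target vector."""
    unknown = set(present) - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in present else 0 for name in vocabulary]

def decode_scores(scores, vocabulary=LABELS, threshold=0.5):
    """List the features whose predicted score reaches the threshold."""
    return [name for name, s in zip(vocabulary, scores) if s >= threshold]

# One annotated image and one made-up vector of per-label model scores.
target = encode_labels({"resting_area", "poor_lighting"})
predicted_features = decode_scores([0.9, 0.2, 0.6, 0.1, 0.4])
print(target, predicted_features)
```

Because each label is decided independently, a single image can be flagged for several features at once, which is what distinguishes this setup from single-label classification.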
Conclusion
This paper elaborates upon a computer vision technique developed to predict and visualize urban age-friendliness. Many urban studies have demonstrated the success of deep learning approaches in computer vision tasks that evaluate urban scenes for attributes such as safety, greenery, and walkability. However, no comprehensive deep neural network model has yet been developed to amalgamate these concepts to address the needs of the elderly. This study acknowledges the importance of the visual appearance and lived experience of urban environments as critical factors shaping the engagement of the elderly within the urban environment. The research accordingly utilizes street view images and an online participatory platform to capture the lived experience of the elderly through an easy-to-comprehend rating system and uses these rated images to train a modified VGG16 machine learning model.
The model, once trained, is used to predict the age-friendliness of newly assigned urban images with a high degree of success. The developed model is scalable and can analyze the age-friendliness of multiple urban areas. Such participatory and evaluative tools can enable urban designers and policymakers alike to analyze and compare urban areas for their age-friendliness without mobilizing physical survey teams and resources while developing context-specific policies benefiting the elderly population. Aside from a visual machine learning model that investigates physical attributes for assessing age-friendliness, the proposed methodology makes additional contributions to the literature: the accumulated data and the data collection technique for regression-based analysis of urban environments can be reused for similar investigations in this research area. Other datasets, including pictures and participant ratings, can be utilized in the future to perform similar analyses in other cities and countries. The presented methodology thus not only allows swift and precise assessment of the age-friendliness of the built environment but also provides a comprehensive workflow for employing GSV images in other locations and evaluating physical features for vulnerable residents or the general population alike.
In conclusion, a novel computational tool and online participatory platform for gathering lived experience data and measuring perceived age-friendliness of the urban environment is presented to promote age-friendly inclusive urban environments.
Supplemental Material
Supplemental material for Analyzing the age-friendliness of the urban environment using computer vision methods by Fereshteh Moradi, Nimish Biloria and Mukesh Prasad in Environment and Planning B: Urban Analytics and City Science
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
