Abstract
This work addresses recent research on pedestrian navigation aids that seeks alternatives to the widely used map-based turn-by-turn navigation systems in the context of Smart City environments. Four approaches to pedestrian navigation were compared in a user experiment conducted in a virtual environment: (1) map-based, (2) landmark-based, (3) augmented reality, and (4) public display navigation. The results of the experiment with 45 participants suggest that augmented reality navigation performs best concerning efficiency and effectiveness and that landmark-based navigation performs worst in the context of Smart Cities.
Introduction
By the year 2050, approximately 80% of the European population will be living in urban areas according to Caragliu et al. (2011). This requires an intelligent infrastructure to handle the demands of the growing cities. The vision for this type of infrastructure, especially from the point of view of Information and Communication Technology (ICT), is captured by the concept of “Smart Cities”. According to Harrison et al. (2010), a Smart City has to be (1) “Instrumented”, referring to the real-time data tracking from various sources, (2) “Interconnected”, referring to the integration of the captured data, and (3) “Intelligent”, referring to the analysis of these interconnected data. Their definition of a Smart City is “connecting the physical infrastructure, the IT infrastructure, the social infrastructure, and the business infrastructure to leverage the collective intelligence of the city” (Harrison et al., 2010: 2). According to Batty et al. (2012), coupling, coordination, and integration of technology and environment are required for solving key problems of the growing cities.
Mobile devices and the surrounding environment can have clear synergies in their operation and can be coupled to improve the quality of pedestrian navigation, too. These synergies can provide better real-time location tracking and more efficient and accurate communication of relevant information which can significantly improve not only the performance but also the user experience (UX) of pedestrian navigation. Existing concepts for pedestrian navigation have to be reconsidered due to the changing situation and novel concepts can emerge.
Technological developments toward Smart Cities and the increasing availability of environmental data (e.g. building and street junction information (Fogliaroni et al., 2018)) require reconsidering the feasibility and quality, in terms of performance and UX, of existing pedestrian navigation approaches. In this work, we present the results of a user study with 45 participants that evaluates four navigation assistance systems in a virtual Smart City environment based on augmented reality, landmarks, public displays as well as on the typical digital map approach. The contribution of this work is manifold, highlighting novel insights into pedestrian navigation assistance and identifying effective and efficient navigation assistance systems for application in Smart Cities.
The rest of this paper is structured as follows: the first section provides an overview of related work in the field of pedestrian navigation and Smart Cities. This is followed by a presentation of the four navigation assistance approaches that were implemented for the user experiments. The next section introduces the relevant research questions and the subsequent two sections describe the implementation of the navigation systems in the virtual environment as well as the experimental setup and procedure. Next, the results of the user experiment are presented, followed by a conclusion and a short outlook on future research directions.
Pedestrian navigation
According to Raubal and Winter (2002), pedestrians prefer navigation instructions that are presented as salient objects such as landmarks. Due to different levels of orientation skills, no generalizations can be made about a preferred presentation mode which makes the case for user-adaptive navigation interfaces as described in Ohm et al. (2015).
Nowadays, pedestrian navigation devices are almost exclusively mobile based. Navigation instructions are typically displayed as turn-by-turn instructions on a digital map presented on small mobile screens. Mobile devices come with certain disadvantages for pedestrians, such as visual attention switches between the display and the environment, which can lead to high cognitive load and distraction (e.g. in busy traffic situations) as shown in Giannopoulos et al. (2015). In recent years, researchers have tried to overcome several disadvantages of pedestrian navigation systems by introducing novel approaches, ranging from auditory (Holland et al., 2002), vibro-tactile (Gkonos et al., 2017; Schirmer et al., 2015), and augmented reality approaches (Takeuchi and Perlin, 2012) to gaze-based approaches (Giannopoulos et al., 2015).
While pedestrians navigate, they are affected strongly by their surrounding environment. It is important to distinguish between familiar and unfamiliar environments (Gokla et al., 2019) as well as to consider the complexity of an environment according to Raubal and Winter (2002).
Smart Cities
A city network is a complex system of different components that fulfill functions (e.g. streets connect places) and interact with each other. The main aim of Smart Cities is to support the information exchange within a city by connecting the various elements with the aid of wireless networks. By using digital technologies or ICT, the quality and performance of urban services can be enhanced. However, a city can only be truly smart if the applications using those networks are able to synthesize and aggregate data in a way that actually improves efficiency, equity, sustainability, and quality of life in cities (Batty et al., 2012).
The most promising concept to support wayfinding in an increasingly digital urban environment is the so-called Internet of Things in a public context. With objects in the public space being connected to WIFI and Radio-Frequency Identification (RFID) tags that can exchange information between a user with a mobile phone and any object in the network that they are attached to, wayfinding supposedly becomes more and more interactive. At the same time, questions of security, privacy, and reliability of such systems need to be addressed as those networks are especially vulnerable to external manipulation, e.g. man-in-the-middle attacks (Atzori et al., 2010).
Navigation and especially pedestrian navigation has so far been overlooked as a chance for innovative applications in Smart Cities. Mobile location analytics are used as proxies for human mobile activities to investigate mobility and daily activity patterns (e.g. M-Atlas) as seen in Batty et al. (2012). Those analytics are, for example, used to make policy decisions on traffic networks. Public displays will be present in more and more public spaces in the future and therefore it can be assumed that they will also be utilized for the provision of personalized navigation information presented to pedestrians (Rukzio et al., 2009).
Pedestrian navigation assistance
Pedestrian navigation aids can differ in the way they present navigation instructions, which is crucial for decision making. Those instructions can address one or multiple human senses such as the visual (e.g. maps), auditory (e.g. sound as in Holland et al. (2002)), or even tactile sense (e.g. vibrations as in Giannopoulos et al. (2015)). The navigation aids used in the user experiment follow four different approaches and are presented below.
Map-based navigation
Mobile map-based navigation systems based on turn-by-turn instructions such as Google Maps are widely used by millions of users. These systems come along with two major issues that are well researched: visual attention switches and a decrease of spatial knowledge acquisition as described in Parush et al. (2007). For this user experiment, a map-based system based on the Google Maps design was used as a baseline and applied using the Wizard-of-Oz methodology (Kelley, 1983). This methodology is often employed to avoid implementation costs, but also for the evaluation of systems that do not exist in the required form. During a Wizard-of-Oz experiment, the participants have the impression that they are interacting with an actual computer system. In reality, the experimenter acts as a “Wizard”, intercepting the established communication between the participant and the tested system, providing the feedback to the user.
Landmark-based navigation
According to Richter and Winter (2014), landmarks are considered to be a key construct for humans to make sense of the environment they live in. Studies show that landmarks are by far the most frequently used category of information when humans communicate about navigation (May et al., 2003). This makes them particularly interesting for implementation in pedestrian navigation systems.
Augmented reality navigation
Driven by the gaming industry, augmented reality has already been implemented in diverse navigation systems for driving and walking (e.g. compass navigation with 3D Compass Plus) and has also been shown to be suitable for Smart City applications, e.g. accessing bus routes and tourist landmarks in a city in Serbia (Pokric et al., 2014). Recent systems simply add instructions over an image of the real world that is displayed on a mobile device. However, the future of augmented reality applications is heading toward data glasses (Fogliaroni et al., 2019) and contact lenses as in Takeuchi and Perlin (2012) that project information onto an imaginary screen in the user’s field of view. Even though the technology behind augmented reality has improved rapidly, recent studies have shown that such systems still suffer from usability, hardware, and orientation issues leading to higher uncertainty for navigation processes (Brunyé et al., 2016; Rehrl et al., 2014). This limitation is one of the reasons that this study has been conducted in a virtual environment. A virtual setup makes it possible to simulate the technology as it is expected to work in the future. However, this also implies that the findings of this study in terms of augmented reality cannot simply be transferred to a real-world scenario.
Public display-based navigation
The public display-based system is an approach that has been compared with other navigation systems in the context of pedestrian navigation for outdoor (Rukzio et al., 2009) as well as indoor environments (Taher and Cheverst, 2011). Developments in the field of Smart Cities and sensor networks imply that public displays will occur in more and more public spaces to display, for example, sensor or real-time traffic data, and can eventually also be used to show navigation instructions.
A futuristic scenario where a pedestrian uses a public display-based navigation system in a Smart City can be imagined as follows: the user starts a mobile app to indicate her desired destination, puts the phone back into her pocket, and starts walking. At every intersection, once the user is within a certain range, her mobile phone gets recognized by a public display system, for example via RFID, Bluetooth, or WIFI, and connects to it. The mobile app, which has routing capabilities, then transfers the navigation instruction needed to reach the destination to the public display, which in response displays this instruction to the user.
Research questions
The focus of this work lies on the evaluation of pedestrian navigation assistance for Smart Cities, trying to identify the challenges for design and implementation, and provide directions in order to optimize navigation regarding efficiency, effectiveness, UX, and spatial knowledge acquisition. The main research questions are the following:
RQ1: How do these different navigation approaches influence the process of wayfinding for pedestrians in an unfamiliar urban environment considering efficiency and effectiveness?
RQ2: How do these different navigation approaches vary regarding UX?
RQ3: How do these different navigation approaches enable the process of spatial knowledge acquisition while navigating through an unfamiliar urban environment?
RQ4: How can traditional navigation assistance be adapted to the opportunities and challenges of Smart Cities?
Implementation
Prototypes of the four introduced navigation approaches were implemented in order to perform an evaluation in a virtual Smart City environment.
Hardware
The used hardware consisted of a Logitech 3D Precision Pro Joystick to enable movement, an HP XB31 digital projector for the virtual environment, and a gaming computer for rendering, executing the experiment, and logging all user data. A 19″ monitor was further utilized during the map-based condition to display the navigation instructions.
Software
The main focus for the implementation of the virtual Smart City environment was a high degree of realism (see Figure 1). This was achieved by adding detailed street furniture as well as dynamic elements to make the environment more “alive”. The urban environment was designed using the ESRI CityEngine with the aid of the Complete Street Rule which features realistic street furniture and random placement. The generated city was then imported into the game engine Unity3D where a realistic skybox was added as well as a dynamic traffic system including pedestrians and cars moving along predefined trajectories.

Figure 1. The virtual environment with the implementation of the four conditions: map-based, landmark-based, AR, and public displays (from left to right and top to bottom).
Navigation systems
In the following, the four conditions for the user experiment are described and can also be viewed in Figure 1.
In the user experiment, a second screen was used for the map-based condition to display an overview map with the corresponding user location and the path that the user was supposed to move along as well as an arrow that previews the next turn. The overview map and the user location on the map were not automatically connected to the location in the virtual environment as the second screen only showed static images that were changed by the experimenter corresponding to the location of the user in the virtual environment.
The landmark-based condition was implemented by using local landmarks (e.g. shops, restaurants) as defined by Raubal and Winter (2002) in combination with audio instructions. The audio instructions were linked to the user’s distance to an intersection and were played when the user came within a certain distance that was previously evaluated through a pilot study and an additional online study (see section Pilot Study). The instructions were recorded in American English and included information about the name of the landmark, the type (e.g. restaurant, museum), and the facade color.
The augmented reality condition was implemented in the virtual environment by displaying directional 3D arrows as navigation instructions. The static 3D arrows were positioned in the middle of intersections, floating at a height of about 3 meters, which is above the head of the user’s avatar.
The public displays were placed at decision points in the virtual environment and animated similarly to an LED panel (see Figure 2). The navigation instruction moves in the direction of movement, and the animation of the panel starts when the user comes within a certain distance of the public display, as in the landmark-based approach. The choice of this type of public display was based on the results of a pilot study (see section Pilot Study).

Figure 2. Design of the utilized public display. The direction of the animation indicates the correct direction (e.g. the dots change color from left to right in order to indicate a right turn).
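The distance-based triggering shared by the landmark-based audio instructions and the public display animation can be sketched as follows. This is only an illustrative sketch, not the implementation used in the study: the class name `ProximityTrigger`, the coordinates, the trigger distance of 20 units, and the assumption that an instruction fires only once are all hypothetical, since the text does not report these details.

```python
import math


class ProximityTrigger:
    """Fires once when the user comes within a given distance of a target.

    Hypothetical helper: the trigger distance and the fire-once behavior
    are assumptions made for illustration, not values from the study.
    """

    def __init__(self, target_xy, trigger_distance):
        self.target_xy = target_xy
        self.trigger_distance = trigger_distance
        self.fired = False

    def update(self, user_xy):
        """Return True exactly once, on the first update inside the range."""
        if self.fired:
            return False
        if math.dist(user_xy, self.target_xy) <= self.trigger_distance:
            self.fired = True
            return True
        return False


# Example: an intersection at the origin with an assumed trigger distance of 20.
trigger = ProximityTrigger(target_xy=(0.0, 0.0), trigger_distance=20.0)
far_away = trigger.update((50.0, 0.0))      # outside the range
entering = trigger.update((15.0, 0.0))      # first position inside the range
still_inside = trigger.update((10.0, 0.0))  # already fired
```

In such a sketch, the same trigger could start either an audio clip or the panel animation, which mirrors the statement that both conditions rely on the user's distance to the decision point.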
Experiments
A user experiment in a virtual environment was conducted with 45 participants to compare the four navigation approaches with each other.
Setup
The participants stood in front of a height-adjustable table. The 3D Joystick used to navigate through the virtual environment was placed in the middle of the table. The virtual environment was projected onto the opposite wall so that participants faced the screen from a distance of 3 meters. During the map-based condition, the 19″ screen was positioned in front of them (∼15° angle).
Design
Prior to the actual user experiment with 45 participants, a pilot study with nine participants was conducted in order to test the virtual environment design and the experiment setup. The actual user experiment was designed as a between-subjects study. Every participant was assigned to one of the implemented navigation systems and had to move along the same path in the virtual environment. Participants were distributed equally to the approaches regarding gender. The navigation path consisted of 13 decision points with varying levels of complexity (see Figure 3, yellow path). We tried to balance the complexity of the environment in our experiment by varying the options at decision points. Furthermore, we chose properties from the environment for our instructions that can be determined by the users.

Figure 3. The complete route with 13 decision points in yellow as well as the training route in light blue.
Procedure
First, the participants had to fill out the pre-experimental questionnaires including questions about demographics (i.e. age and gender), questions for the assessment of their spatial abilities (see Section 6.0.1) as well as indicate their prior experience with the utilized technology using a seven-point Likert scale (i.e. experience concerning virtual environments, digital maps, 3D Joysticks, and first-person-view).
Afterward, during a training session, the participants were asked to move along a test path that included three decision points (see Figure 3, light blue path) to make themselves familiar with the 3D Joystick as well as with the navigation system. To conduct the experiment, the participants were standing in front of a projected screen behind a table on which the 3D Joystick was placed.
After clarifying possible questions, the participants started with the actual task on the navigation path in a first-person-view. They were asked to navigate as fast as possible and to behave like a pedestrian as they would in a real environment (staying on sidewalks, using crossings, etc.). Before the navigation task started, participants received a starting signal that synchronized with the start of the recording of the data collection, e.g. the completion time. In case participants took a wrong turn, they were advised by the experimenter after 5 seconds to turn around and get back on the right track, simulating a correction.
After successfully completing the navigation task, the participants had to fill out the post-experimental questionnaires as a last step including a standardized questionnaire about the UX (User Experience Questionnaire (UEQ) by Laugwitz et al. (2008)), a Scene Recognition task, and a standardized questionnaire about cognitive workload (NASA-TLX by Hart and Staveland (1988)).
In the Scene Recognition task, which was used as a proxy for local spatial knowledge acquisition, participants were asked to decide for given images whether they recognized a scene from the virtual environment. A set of 11 pictures was presented to the participants, and they had to indicate for each picture whether or not they remembered having seen the shown scenery while navigating through the virtual environment. The set of pictures included five positives and six negatives. An odd number of pictures was selected because participants might otherwise guess that there is a balanced number of positives and negatives when they count an even number of total pictures.
Tracked data
While the participants were moving along the path, they were automatically tracked by the virtual environment, which recorded their location (x, y), head movements (rotations in the horizontal dimension caused by joystick movements), and locomotion interruptions. Additionally, relevant temporal data were also collected (time between decision points).
Participants
A total of 45 persons participated in the user experiment with ages ranging from 20 to 59 years and an average age of 26.7 years. Twenty-three males and 22 females were distributed equally to the conditions. The participants came from various professional (e.g. computer science, marketing, and finance) and cultural (e.g. Swiss, Japanese, German, Greek, Canadian, Mexican, and Polish) backgrounds.
Pilot study
A pilot study was conducted with nine participants between 19 and 27 years old. This study showed that the Scene Recognition task with initially 20 different pictures was too difficult and therefore resulted in random selection of seen and unseen images by the participants.
Additionally, the participants in the pilot study helped to decide on a preferred design for the public display approach due to the fact that there were two designs at the beginning: a panel design and a display design.
The display design follows the idea of a rotating compass as seen in Rukzio et al. (2009) where the desired direction is flashing (see Figure 4). The shape of the display is reminiscent of a traffic signal to fit better into an urban context. The three different types of displays correspond to the shape of the intersection (1: T intersection, 2: cross intersection, and 3: star intersection or square). The design of the panel version is inspired by LED panels that indicate directions with a flow of flashing rows of lights.

Figure 4. Alternative display designs with flashing lights.
The LED-panel design was chosen because the participants of the pilot study liked it better and it also showed higher UX values. Participants also criticized that there was a different display design for each intersection type, which caused confusion and ambiguities.
Results
Since the dependent variables were not normally distributed, non-parametric statistical tests were used for the analysis. A Kruskal–Wallis test was applied to analyze all approaches combined, and the Mann–Whitney–Wilcoxon test was then applied for the pairwise comparisons. Furthermore, to avoid a type I error due to these multiple comparisons, the Bonferroni correction as described in Armstrong (2014) was applied, resulting in an
Summary of the most relevant significant differences between the evaluated navigation approaches.
Note: The table has to be read from left to right. For instance, augmented reality is better than the landmarks approach concerning the elements in the cell. The details can be found in the Results section.
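The Bonferroni correction used for the pairwise comparisons can be illustrated with a short sketch. The corrected significance level used by the authors is not recoverable from this text, so the values below are assumptions for illustration only: a base level of .05 and a correction over all pairwise comparisons between four conditions. The function names are hypothetical.

```python
from math import comb


def bonferroni_alpha(base_alpha, n_comparisons):
    """Corrected per-comparison significance level (Bonferroni)."""
    return base_alpha / n_comparisons


def is_significant(p_value, base_alpha, n_comparisons):
    """True if p_value is below the Bonferroni-corrected level."""
    return p_value < bonferroni_alpha(base_alpha, n_comparisons)


# Assumptions for illustration: base level .05, all pairs of four conditions.
N_CONDITIONS = 4
n_pairs = comb(N_CONDITIONS, 2)          # number of pairwise comparisons
corrected = bonferroni_alpha(0.05, n_pairs)
```

Under these assumptions, each pairwise Mann–Whitney–Wilcoxon result would be compared against the corrected level rather than against the base level.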
Spatial abilities
The participants were asked to fill in the “Santa Barbara Sense of Direction Scale” questionnaire developed by Hegarty et al. (2002), which provides a self-estimation of spatial abilities. The participants had to rate their abilities using a seven-point Likert scale (values close to 7 indicate high spatial abilities). A Kruskal–Wallis test did not reveal any significant differences between the participants of the four evaluated conditions (p = .16,
Experience with virtual environments
The participants were also asked to state their experience with virtual environments and with the typical first-person-view (values close to 7 indicate high expertise). A Kruskal–Wallis test did not reveal any significant differences between the four evaluated conditions (p = .07,
Navigation performance
For user performance, several measurements were collected including completion time, number of errors, and number of locomotion interruptions and head movements (joystick rotations in the horizontal dimension).
The measured completion time from start to end of the navigation task revealed significant differences between the four conditions. Due to a certain number of wrong turns (errors) that occurred during the navigation task, the captured total time had to be normalized by first excluding the additional time for wrong turns (without errors) and afterward adding a penalty for each error (penalty). The penalty time of 10 seconds per error was calculated from the average time that a participant needed to get back on the right track after taking a wrong turn. This also corresponds with the experiment setup where participants taking a wrong turn were informed after 5 seconds about their wrong decision. The post-processing resulted in three different representations of completion time for the four approaches that all showed significant differences (measured: p < .001,

Comparison of different representations of completion time.
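The three representations of completion time described above can be made concrete with a minimal sketch. The 10-second penalty per error is taken from the text. The record layout (`Run`, `wrong_turn_time`) is a hypothetical assumption about how the logged data might be organized, and the example numbers are invented for illustration.

```python
from dataclasses import dataclass

PENALTY_SECONDS_PER_ERROR = 10.0  # penalty per wrong turn, as stated in the text


@dataclass
class Run:
    """One participant's navigation run (hypothetical record layout)."""
    measured_time: float    # total time from start to end, in seconds
    wrong_turn_time: float  # time spent on wrong turns and getting back on track
    errors: int             # number of wrong turns


def completion_times(run):
    """Return the measured, error-free, and penalized completion times."""
    without_errors = run.measured_time - run.wrong_turn_time
    with_penalty = without_errors + PENALTY_SECONDS_PER_ERROR * run.errors
    return {
        "measured": run.measured_time,
        "without_errors": without_errors,
        "penalty": with_penalty,
    }


# Invented example: 300 s in total, 25 s lost on two wrong turns.
times = completion_times(Run(measured_time=300.0, wrong_turn_time=25.0, errors=2))
```

The penalized time replaces each participant's individual recovery time with a fixed cost per error, so that participants are compared on the same footing regardless of how long a particular detour happened to take.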
A pairwise comparison using a Mann–Whitney–Wilcoxon test revealed significant differences concerning the total completion time between the landmark-based and augmented reality condition (p = .0003, W = 10) as well as between the augmented reality and public display conditions (p = .008, W = 24). Similarly, a pairwise comparison revealed also significant differences concerning the corrected time (time without errors) between the landmark-based and augmented reality conditions (p = .001, W = 15). A pairwise comparison revealed also significant differences concerning the penalty time (time penalty of 10 seconds/error) between the landmark- and map-based conditions (p = .008, W = 21), the landmark-based and augmented reality conditions (p = .0005, W = 11) as well as between the public display and augmented reality condition (p = .008, W = 24).
The number of wrong turns (errors) that can be seen as an indicator for effectiveness has also been shown to be significantly different between the approaches (p < .010,
Another indicator for effectiveness that might also be linked to UX is the degree of confusion, which represents the extent to which participants showed signs of temporary disorientation. To assess the degree of confusion, locomotion interruptions as well as head movements (rotations in the horizontal dimension) were captured and evaluated. Head movements that occurred while participants were moving were excluded from the result as they would signify a turning event. Additionally, measurements for both indicators that did not last longer than 278 milliseconds were not considered as signs of disorientation and were also excluded from the evaluation. The threshold of 278 milliseconds was set due to the distribution of the measurements and because it corresponds to a measured average human reaction time for human–computer interaction systems. A Kruskal–Wallis test did not reveal any significant differences for the number of interruptions (p = .012,

Degree of confusion comparison.
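The filtering of the confusion indicators can be sketched as follows. The 278-millisecond threshold and the two exclusion rules are taken from the text. The `Event` record and its field names are hypothetical assumptions about the logged data, and the example events are invented.

```python
from dataclasses import dataclass

MIN_DURATION_MS = 278  # events not lasting longer than this are excluded


@dataclass
class Event:
    """One logged event (hypothetical record layout)."""
    kind: str               # "interruption" or "head_movement"
    duration_ms: float
    while_moving: bool = False  # only meaningful for head movements


def confusion_events(events):
    """Keep only the events that count as signs of disorientation."""
    kept = []
    for event in events:
        if event.duration_ms <= MIN_DURATION_MS:
            continue  # too short to count as disorientation
        if event.kind == "head_movement" and event.while_moving:
            continue  # rotation while moving signifies a turning event
        kept.append(event)
    return kept


# Invented example log.
log = [
    Event("interruption", 500),
    Event("interruption", 200),
    Event("head_movement", 400, while_moving=True),
    Event("head_movement", 400),
    Event("head_movement", 278),
]
kept = confusion_events(log)
n_interruptions = sum(e.kind == "interruption" for e in kept)
n_head_movements = sum(e.kind == "head_movement" for e in kept)
```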
Spatial knowledge acquisition
The Scene Recognition task was used as a proxy for spatial knowledge acquisition. The assignments (seen/not seen) to the corresponding pictures were evaluated with an accuracy measure as well as with an F1-score. The accuracy measure shows how often the participants assigned the pictures correctly (true positives and true negatives), whereas the F1-score as defined in Zhang and Zhang (2009) is a combination of precision and recall and includes quality as well as quantity. To calculate both measures, true positive, true negative, false negative, and false positive assignments have been evaluated from the user experiment. Both measurements did not show statistically significant differences between the approaches (accuracy: p = .46,

Comparison of the accuracy and F1-score for the Scene Recognition task.
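The two measures for the Scene Recognition task can be computed from the four assignment counts, as in the following minimal sketch. The set of five positives and six negatives matches the description of the task. The participant's answers are invented for illustration and the function name is hypothetical.

```python
def scene_recognition_scores(truth, answers):
    """Compute accuracy and F1-score from seen/not-seen assignments.

    truth[i] is True if picture i shows a scene that was actually seen;
    answers[i] is True if the participant marked picture i as seen.
    """
    pairs = list(zip(truth, answers))
    tp = sum(1 for t, a in pairs if t and a)
    tn = sum(1 for t, a in pairs if not t and not a)
    fp = sum(1 for t, a in pairs if not t and a)
    fn = sum(1 for t, a in pairs if t and not a)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Five positives and six negatives, as in the task; answers are invented.
truth = [True] * 5 + [False] * 6
answers = [True, True, True, True, False,
           True, False, False, False, False, False]
accuracy, f1 = scene_recognition_scores(truth, answers)
```

Accuracy rewards correct rejections as much as correct recognitions, whereas the F1-score ignores true negatives, which is why the two measures can diverge on an unbalanced picture set.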
User experience
The evaluation of the UEQ shows that the map-based approach tended to perform worse than the three other approaches, whereas the augmented reality and public display approaches were liked the most. However, only the Novelty aspect revealed statistically significant differences between the approaches (p = .006,

User experience comparison between the four approaches and against the benchmark.
Cognitive workload
A Kruskal–Wallis test did not reveal any significant differences concerning the total cognitive workload (p = .8,
Discussion
The results of the user experiment demonstrate that the participants behaved very differently across the four evaluated approaches. The differences can mainly be explained by the fact that different navigation approaches require different levels of attention and cognitive load. From this we can confirm the following known general requirements for a navigation system that aims to make wayfinding in unfamiliar environments more efficient and effective:
Navigation instructions must be simple and generalized for all types of intersections according to Richter and Duckham (2008).
Navigation instructions should be visible in advance according to Winter (2003).
Visual attention switches between instructions and environment should be avoided according to Giannopoulos et al. (2015).
The following section presents more detailed insights in how to answer the research questions:
All four navigation approaches showed differences for completion time, number of errors, and the degree of confusion. The differences in the completion time can be explained mainly by the number of interruptions that were necessary to interpret the given instruction. The number of errors and the degree of confusion, especially during the landmark-based condition, can probably be explained by the choice of landmarks that were used for the instructions. Elements in the environment that one person considers a landmark are not necessarily perceived and recognized as such by others. This suggests that a landmark-based approach can be successful only if appropriate landmarks are used in the instructions, i.e. landmarks that are easily distinguishable and visible in advance. This might be a bottleneck for this type of navigation.
According to the results of the UEQ, the UX did not differ significantly between the four approaches, except for the Novelty aspect. As expected, the map-based approach was not perceived as novel as the other navigation approaches.
The Scene Recognition task did not reveal any significant differences, suggesting that the participants of the four evaluated approaches were equally good in acquiring local spatial knowledge during navigation. Another explanation for this result could be that recognizing images out of a generically built environment has been shown to be fairly difficult for most participants. It is also to be noted that the scene recognition is not sufficient to evaluate spatial knowledge as the process of acquiring this knowledge is far more complicated (e.g. the participants would have had to also draw the route that they took to have a better understanding of this process).
Map-based navigation has been widely used and can be considered a traditional approach with which a broad range of users, including the participants in the presented experiments, are familiar. From the statistical analysis, it could be shown that the public displays approach has performed at least as well as the map-based approach by making use of information available in Smart Cities such as the users’ location relative to infrastructure elements.
The user experiment suggests that there can be a large potential in the implementation of such systems in a Smart City framework and that it would add great value for the user, especially when criteria such as hands-free interaction are required. Traditional navigation systems need to be improved by incorporating available information of a Smart City not only for the design and implementation of such systems, but also by making navigation aids adaptable and responsive to real-time data.
Limitations
The experimental setup of this study lacks a certain degree of realism or natural experience for the participants, not only due to limitations of the graphical representation but also because the visual field of the user was limited from 360 degrees (immersive virtual environment) to less than 90 degrees by using a screen projection and because they navigated with a joystick. As an improvement of the experimental setting, the virtual environment could be projected in a VR cave on multiple displays or be shown to the participants through a VR headset to increase realism. Additionally, the Scene Recognition task can only be seen as a proxy for spatial knowledge acquisition to a certain degree as it does not take into account any further aspects of spatial knowledge, such as route knowledge. Two important aspects concerning assistance systems, i.e. privacy issues and degradation in spatial knowledge acquisition (Parush et al., 2007), were not in the focus of this work. Although we tried to optimize every condition concerning the type of instructions, it can be very difficult to choose appropriate landmarks that can serve as anchors in the environment and be recognized correctly and quickly by every participant. Our results suggest that a measure for evaluating landmarks according to different criteria (e.g. advanced visibility (Winter, 2003)) would be very beneficial for such systems, as it would make it possible to select appropriate landmarks that can be incorporated into the instructions.
Conclusion and outlook
A first evaluation of pedestrian navigation systems in a virtual Smart City environment has been accomplished with a user experiment comparing four different navigation approaches. The augmented reality navigation system was the overall winner compared to the other three approaches. The landmark-based navigation system used the most available information from the Smart City environment, which led to a higher degree of confusion that might be explained by an overflow of information. The landmark-based approach performed worst in this evaluation. It could be shown that the public display navigation system performed at least as well as the traditional baseline approach with a digital map.
Future work includes further investigations of designs for public displays to be used in urban environments. Especially integrating real-time data such as traffic monitoring should be taken into consideration for future studies.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
