Abstract
Cognitive maps are assumed to be fundamentally spatial and grounded only in perceptual processes, as supported by the discovery of functionally dedicated cell types in the human brain, which tile the environment in a maplike fashion. Challenging this view, we demonstrate that spatial representations, such as large-scale geographical maps, can also be retrieved with high confidence from natural language through cognitively plausible artificial-intelligence models on the basis of nonspatial associative-learning mechanisms. More critically, we show that linguistic information accounts for the specific distortions observed in tasks in which college-age adults judge the geographical positions of cities, even when these positions are estimated on real maps. These findings indicate that language experience can encode and reproduce cognitive maps without the need for a dedicated spatial-representation system, thus suggesting that the formation of these maps is the result of a strict interplay between spatial- and nonspatial-learning principles.
In recent years, the scientific study of human memory has been revolutionized by the discovery of a wide network of spatially modulated neurons, such as place and grid cells, each having distinct roles in the representation of space (Bicanski & Burgess, 2020; Hafting et al., 2005; Moser et al., 2008, 2014). Grid and place cells would, indeed, support real and imagined navigation (Bush et al., 2015; Horner et al., 2016) as well as the exploration of visual scenes (Nau et al., 2018). The presence of grid cells in the human hippocampal–entorhinal region (Doeller et al., 2010), in particular, provides a neural substrate for the existence of cognitive maps, a representational format that encodes environmental elements in terms of their position in space and enables the flexible use of spatial knowledge (O’Keefe & Nadel, 1978; Tolman, 1948). Evidence for these internal representations of the spatial structure of the world has been reported for navigable spaces that vary in scale, spanning from small- to large-scale environments (Peer et al., 2020). Interestingly, these domain-general core coding principles from navigation have been proposed to be involved in the organization of nonspatial information (Behrens et al., 2018; Bellmund et al., 2018). According to this view, not only spatial but also nonspatial conceptual knowledge would be organized through low-dimensional geometries that rely on the very same computations and brain areas involved in spatial representations, taking the form of cognitive maps (Bottini & Doeller, 2020).
Although the idea of a universal spatial metric in the formation of cognitive maps across multiple domains has been substantiated by various empirical findings (Aronov et al., 2017; Constantinescu et al., 2016; Viganò & Piazza, 2020), this proposal has not gone unchallenged. First, there is evidence that navigators rely heavily on associative information rather than on spatial metrics when they navigate (Ekstrom et al., 2020). Second, the hypothesis of a systematic spatial organization of knowledge that spans all domains of behavior is apparently at odds with data demonstrating that language (i.e., a nonspatial domain) can encode the structural organization of mental representations (Rinaldi & Marelli, 2020; Utsumi, 2020).
Evidence from models that induce semantic representations from statistical regularities (namely, word co-occurrences) in natural language, indeed, suggests that some properties of geographical maps can be retrieved from linguistic experience (Rinaldi & Marelli, 2020). Typically, these distributional-semantic models represent words as points in a high-dimensional vector space: Because similar words will occur in similar contexts, they will end up being associated with vectors that are close to each other (Günther et al., 2019). Interestingly, exploiting these language-based models, some studies could successfully reproduce the spatial structure of maps, despite these models being based only on linguistic data (Avery et al., 2021; Louwerse & Benesh, 2012; Louwerse & Zwaan, 2009). In language, indeed, speakers encode the perceptual world they live in (Louwerse, 2011); thus, perceptual information can be bootstrapped from patterns of language use. It follows that the importance of linguistic and sensorimotor experience is far from being mutually exclusive—rather, overwhelming evidence for a strict interdependency between the linguistic and perceptual systems has accumulated (Louwerse & Connell, 2011; for a complete discussion, see Louwerse, 2018). Thus, these codes may reciprocally contribute to mental representations, with no clear priority assigned to either source: The extent to which linguistic and perceptual information would play a role can vary as a function of different factors, including the experimental task employed, the type of stimuli used, and the domain assessed (Louwerse et al., 2015). However, and critically, these previous studies were able to demonstrate only that relative spatial estimates (i.e., the structure of maps and not their directional properties) can be retrieved from natural language. That is, linguistic data were not informative about absolute locations of cities defined by coordinate axes, a main property characterizing Euclidean spaces. 
Thus, the linguistic information extracted from natural language could not be used to objectively distinguish between latitude and longitude planes or between locations in which north and south or east and west were simply reversed (Louwerse & Zwaan, 2009).
Statement of Relevance
How do people come to learn and represent geographical information? Intuitively speaking, geographical knowledge is primarily learned through the visual modality, by the inspection of cartographic maps. Visual experience is therefore conceived as the foundational cognitive building block for the formation of mental maps of the environment. In this study, we tested the counterintuitive hypothesis that geographical maps can be learned from natural language alone, without the direct involvement of any sensorimotor experience. We first demonstrated that psychologically plausible computational models can derive geographical information from written texts only, reproducing the spatial layout of real-world maps. Next, in two behavioral experiments, we further found that these language-based maps reliably resemble human representation of geographical information. This suggests that language encodes the spatial structure of the world, calling into question the claim that sensorimotor experience is the key ingredient in the formation of mental maps.
Here, we provide direct evidence that one of the most prototypical Euclidean environmental representations, namely, geographical maps, does not strictly require any spatial computation or any dedicated spatial memory system, suggesting in turn that the formation of cognitive maps can also be traced back to domain-general associative-learning principles devoted to the detection of environmental regularities. This suggests that, when available, language-based information can effectively complement and supplement spatial information derived from perceptual experience. We take advantage of distributional-semantic models and, specifically, of word embeddings, a machine-learning technique based on a neural network predicting word co-occurrences (Günther et al., 2019; Mikolov et al., 2013). These models are typically trained on large collections of text that document natural-language use and learn to represent the meaning of a target word on the basis of the lexical contexts in which it appears (i.e., the words with which it co-occurs in the text), incrementally updating a set of weights by minimizing the difference between model predictions and observed data at each learning event (i.e., every word occurrence). The estimated set of weights will eventually capture the semantics of a specific word in distributed terms. These distributed representations, or vectors, can be quantitatively compared by measuring their proximity in a multidimensional space. Similarity between words is thus measured as proximity in this semantic space (this geometrical interpretation, however, is purely a methodological convenience, as the computational architecture of word embeddings is not grounded on any spatial principle). We thus computed the language-based coordinates of cities by extracting the linguistic distance between each of them and the four cardinal points (north, south, east, and west). 
We then explored (a) whether language-based maps retain similarity to real spatial maps as well as (b) whether the distortions in language-based maps reflect human biases when processing spatial information.
Experiment 1
We first show that word embeddings can reproduce the spatial organization of geographical maps, which encode elements in a continuous Euclidean space defined by reference axes (i.e., latitude and longitude). In particular, in Experiment 1, we focused on 25 European capitals, extracting for each of them both the real geographical coordinates and the language-based coordinates (Figs. 1a and 1b). The latter were retrieved from word embeddings applied to eight different languages (English, German, Italian, Spanish, Portuguese, French, Dutch, Norwegian) through fastText, a recently developed distributional-semantic model for which word vectors in several languages were made available.

Fig. 1. Geographical maps retrieved from natural language in Experiment 1. The top row shows (a) geographic and (b) linguistic maps of the European cities included in the experiment. The bottom row shows the relationship between (c) geographic and linguistic latitude and (d) geographic and linguistic longitude for the cities included, separately for each of the eight languages tested.
Method
The complete methodology is reported in the Supplemental Material available online.
Stimuli
Twenty-five European capitals were selected as stimuli, and for each capital, latitude and longitude were retrieved from GeoHack (https://www.mediawiki.org/wiki/GeoHack).
Distributional-semantic model
The distributional-semantic model used here was fastText (Bojanowski et al., 2017; Grave et al., 2018; Schütze, 1993). From the semantic spaces of the languages considered (i.e., English, German, Italian, Spanish, Portuguese, French, Dutch, Norwegian), we extracted vector representations for the 25 European capitals, together with the vector representations for words describing the four cardinal points, namely, North, South, East, and West.
Computation of language-based coordinates
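For each city and separately for each language, we computed a linguistic latitude and a linguistic longitude (for lowercase cardinal points, see the Supplemental Material). Linguistic latitude of a city k was obtained with the following formula:
\[ \text{linguistic latitude}_k = \cos(k, \text{North}) - \cos(k, \text{South}) \]
Thus, we subtracted from the cosine (cos) of the angle formed by the vectors representing each city (k) and North the cosine of the angle formed by the vectors representing that city and South. Positive values will thus indicate a northern position.
Similarly, linguistic longitude was obtained with the following formula:
\[ \text{linguistic longitude}_k = \cos(k, \text{East}) - \cos(k, \text{West}) \]
Also in this case, positive values will thus indicate an eastern position.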
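The computation described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes that linguistic latitude is the cosine between a city vector and the North vector minus the cosine between that city vector and the South vector, and likewise East minus West for longitude; this sign convention is inferred from the statement that positive values indicate northern and eastern positions. The function names and the toy two-dimensional vectors are hypothetical, and real word embeddings are high-dimensional vectors taken from a pretrained semantic space.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def linguistic_coordinates(city, north, south, east, west):
    """Return (linguistic latitude, linguistic longitude) for one city vector.

    Assumed form: latitude  = cos(city, North) - cos(city, South)
                  longitude = cos(city, East)  - cos(city, West)
    """
    latitude = cosine(city, north) - cosine(city, south)
    longitude = cosine(city, east) - cosine(city, west)
    return latitude, longitude

# Toy 2-D vectors, purely illustrative (not taken from any semantic space).
NORTH, SOUTH = [0.0, 1.0], [0.0, -1.0]
EAST, WEST = [1.0, 0.0], [-1.0, 0.0]
toy_city = [1.0, 1.0]  # hypothetical vector lying between North and East

lat, lon = linguistic_coordinates(toy_city, NORTH, SOUTH, EAST, WEST)
print(lat > 0, lon > 0)
```

Because each coordinate is a difference of two cosines, it is bounded between -2 and 2 and says nothing about physical distance; only its sign and relative magnitude are interpreted.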
Data analysis
All the analyses were performed using RStudio (RStudio Team, 2015). Using the lme4 R package (Bates et al., 2015), we estimated two linear mixed models with linguistic latitude or linguistic longitude as the dependent variable, geographic latitude or geographic longitude as the continuous predictor, and languages and cities as random intercepts. Marginal pseudo R2s are reported.
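Written out, a model of this form for the latitude analysis might look as follows. The notation is introduced here for illustration and is not the authors' own: k indexes cities, l indexes languages, and u and v are the random intercepts for cities and languages. The second line gives one common definition of a marginal pseudo R2 (variance of the fixed-effect predictions over total variance); the source does not state which definition was used, so that part is an assumption.

```latex
\begin{aligned}
\text{LingLat}_{kl} &= \beta_0 + \beta_1\,\text{GeoLat}_k + u_k + v_l + \varepsilon_{kl},
\qquad u_k \sim \mathcal{N}(0, \sigma_u^2),\;
v_l \sim \mathcal{N}(0, \sigma_v^2),\;
\varepsilon_{kl} \sim \mathcal{N}(0, \sigma_\varepsilon^2) \\
R^2_{\text{marginal}} &= \frac{\sigma_f^2}{\sigma_f^2 + \sigma_u^2 + \sigma_v^2 + \sigma_\varepsilon^2},
\qquad \sigma_f^2 = \operatorname{Var}\!\left(\beta_0 + \beta_1\,\text{GeoLat}_k\right)
\end{aligned}
```

The longitude model has the same structure, with linguistic and geographic longitude in place of the latitude terms.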
Results
We explored whether these language-based coordinates resembled the geographical ones by estimating two linear mixed models with linguistic latitude or linguistic longitude as the dependent variable, geographic latitude or geographic longitude as the continuous predictor, and languages and cities as random intercepts. The effect of geographic latitude on linguistic latitude was significant, t(23) = 5.92, p < .001, b = 0.002, pseudo R2 = .24, indicating that higher geographic-latitude values (i.e., geographically northern locations) correspond to higher linguistic-latitude values (i.e., linguistically northern locations). The effect of geographic longitude on linguistic longitude was also significant, t(23) = 4.20, p < .001, b = 0.001, pseudo R2 = .10, indicating that higher geographic-longitude values (i.e., geographically eastern locations) correspond to higher linguistic-longitude values (i.e., linguistically eastern locations). For geographical and linguistic maps, see Figures 1a and 1b, respectively; for the slopes for each language across latitude and longitude, see Figures 1c and 1d, respectively.
These findings indicate that real geographic positions can be retrieved with high confidence solely from language experience, without the need for any spatial computation (as the architecture of word embeddings is based on nonspatial associative-learning principles applied to linguistic data only). Notwithstanding the overall convergence between linguistic estimates and geographical positions, language-based maps were distorted compared with the real maps because a good portion of variance remained unexplained (for latitude: pseudo R2 = .24; for longitude: pseudo R2 = .10).
Experiment 2
On the basis of these results, in Experiment 2, we explored whether the distortions emerging from language are in line with biases in the mental representation of maps. Forty-three participants were presented with the names of two Italian cities on a computer screen and asked to indicate the northern one as fast and accurately as possible (Fig. 2a). For each city pair, we computed two different predictors, namely, a geographic predictor and a linguistic predictor. The geographic predictor (Δgeographic latitude) was computed as the absolute value of the difference between latitudes of the two cities comprising the pair. Similarly, the linguistic predictor (Δlinguistic latitude) was computed as the absolute value of the difference between the linguistic latitudes of the two cities comprising the pair. Additionally, as a further control, we included other language-based predictors (for details and results, see the Supplemental Material).

Fig. 2. Cognitive maps predicted by nonspatial information in Experiment 2. The timeline (a) shows an example trial, in which participants saw the names of two cities and had to indicate whether the one on the left (L) or right (R) was farther north. Akaike information criterions (AICs) for the models estimated on accuracy (b) are shown separately for Δgeographic latitude and Δlinguistic latitude. The effect of the best-fitting model (i.e., the geographic model) on accuracy is shown in (c). AICs of the models estimated on reaction time (RT; d) are shown separately for Δgeographic latitude and Δlinguistic latitude. The effect of the best-fitting model (i.e., the linguistic model) on log-transformed RT is shown in (e).
Method
The complete methodology is reported in the Supplemental Material.
Participants
Sample size was determined a priori on the basis of the suggestion by Brysbaert and Stevens (2018) that an experiment should have at least 1,600 observations per cell of the design (i.e., per condition tested) to achieve proper power—that is, at least 40 stimuli for 40 participants.
Forty-three Italian students (11 male; age: M = 23.81 years, SD = 3.12) participated in the experiment (they were recruited via institutional email advertisements). All participants were native Italian speakers, had normal or corrected-to-normal vision, and were naive to the purpose of the study. Informed consent was obtained from all participants before the experiment. The protocol was approved by the psychological ethical committee of the University of Pavia, and participants were treated in accordance with the Declaration of Helsinki.
Stimuli and procedure
We selected 11 Italian cities distributed in the country along the north–south axis (Milan, Venice, Bologna, Florence, Pisa, Perugia, Rome, Naples, Lecce, Cagliari, and Palermo), which were then paired one to another (110 pairs, so that in half of the pairs a given city will appear on the left and in the other half on the right side of the screen). Participants were told that they would be shown the names of two Italian cities and that they were to indicate the northern one. Participants were instructed to respond as fast and accurately as possible by pressing the A key to choose the city on the left or the L key to choose the city on the right (Fig. 2a). Participants were tested online using PsychoPy (Peirce, 2007, 2009; Peirce et al., 2019).
Distributional-semantic model
As in Experiment 1, the distributional-semantic model used was fastText. Word vectors corresponding to the included Italian cities and to the cardinal points were retrieved from the Italian pretrained semantic space.
Computation of geographical and linguistic predictors
For each city pair, we computed four different predictors, namely, a geographic predictor and three linguistic predictors. The geographic predictor (Δgeographic latitude) was computed as the absolute value of the difference between latitudes of the two cities comprising the pair (retrieved from https://github.com/MatteoHenryChinaski/Comuni-Italiani-2018-Sql-Json-excel). Similarly, a linguistic predictor (Δlinguistic latitude) was computed as the absolute value of the difference between the linguistic latitude of the two cities comprising the pair. Additionally, as a further control, we included other language-based predictors (for details and results, see the Supplemental Material).
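As a rough sketch of how such pairwise predictors could be built, the snippet below forms every ordered city pair and takes absolute differences. It is an illustration under stated assumptions rather than the authors' pipeline: only three cities are used, the geographic latitudes are approximate values in decimal degrees, the linguistic latitudes are made-up numbers, and the names `geo_lat`, `ling_lat`, and `pair_predictors` are hypothetical.

```python
from itertools import permutations

# Approximate latitudes in decimal degrees, for illustration only;
# the study retrieved its coordinates from the repository cited above.
geo_lat = {"Milan": 45.46, "Rome": 41.90, "Palermo": 38.12}

# Hypothetical linguistic latitudes (made-up numbers, not values from the study).
ling_lat = {"Milan": 0.05, "Rome": 0.01, "Palermo": -0.04}

def pair_predictors(left, right):
    """Absolute latitude differences for one ordered (left, right) city pair."""
    return {
        "pair": (left, right),
        "delta_geo_lat": abs(geo_lat[left] - geo_lat[right]),
        "delta_ling_lat": abs(ling_lat[left] - ling_lat[right]),
    }

# Ordered pairs: each city appears on the left and on the right equally often.
pairs = [pair_predictors(a, b) for a, b in permutations(geo_lat, 2)]
print(len(pairs))  # 3 cities give 3 * 2 ordered pairs; 11 cities give 11 * 10 = 110
```

Taking the absolute value makes each predictor independent of which city is shown on the left, so the two orderings of a pair share the same predictor values.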
Data analysis
Our dependent variables were log-transformed participants’ correct response times (RTs), which were analyzed using linear mixed models, and participants’ accuracy, which was analyzed using generalized linear mixed models fitted on a binomial family distribution (i.e., correct answers were computed as 1s and wrong answers as 0s). Marginal pseudo R2s are reported.
The same predictors were included in the respective analyses. Specifically, because of multicollinearity issues, we estimated a separate model for each of the two predictors (Δgeographic latitude and Δlinguistic latitude). These models contained participants and items as random intercepts. We then selected the best model on the basis of the lowest Akaike information criterion (AIC; Akaike, 1973).
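The selection step can be illustrated with a short sketch. It assumes the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood; the log-likelihoods and parameter counts below are made-up placeholders, not results from the study, and the dictionary keys are hypothetical labels.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits (made-up log-likelihoods, not results from the study).
# Both models are given the same number of estimated parameters here.
fits = {
    "delta_geographic_latitude": {"log_likelihood": -1510.0, "n_params": 5},
    "delta_linguistic_latitude": {"log_likelihood": -1505.0, "n_params": 5},
}

scores = {name: aic(f["log_likelihood"], f["n_params"]) for name, f in fits.items()}
best = min(scores, key=scores.get)               # lowest AIC wins
delta_aic = max(scores.values()) - scores[best]  # margin over the other model
print(best, delta_aic)
```

When the competing models have the same number of parameters, as in this sketch, the AIC difference reduces to twice the difference in log-likelihood.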
Results
Trials in which overall RTs were faster than 300 ms or in which participants did not provide an answer were excluded from the analyses (2% of the trials). For accuracy data, we found that the effect of Δgeographic latitude was significant (z = 7.55, p < .001, b = 0.69, pseudo R2 = .29), indicating that the higher the Δgeographic latitude (the higher the difference between the two geographic latitudes; i.e., the geographically farther the two cities on a north–south axis), the higher participants’ accuracy (Fig. 2c). The effect of Δlinguistic latitude (z = 6.89, p < .001, b = 32.96, pseudo R2 = .23) was also significant, indicating that the higher the Δlinguistic distance, the higher participants’ accuracy. Crucially, the model including Δgeographic latitude outperformed the other by a ΔAIC of at least 11 (Fig. 2b), indicating that geographical information better explained participants’ performance compared with linguistic experience. These findings indicate that participants successfully relied on their spatial knowledge to correctly solve the task.
For RT data, the effect of Δlinguistic latitude, t(53) = −5.62, p < .001, b = −2.55, pseudo R2 = .07, was significant, indicating that the higher the Δlinguistic latitude (the higher the difference between the two linguistic latitudes; i.e., the linguistically farther the two cities on a north–south axis), the faster the participants’ responses (Fig. 2e). The effect of Δgeographic latitude, t(53) = −4.39, p < .001, b = −0.04, pseudo R2 = .05, was also significant, indicating that the higher the Δcoordinate, the faster the participants’ responses. Importantly, here we found the reverse pattern in terms of AIC values: The model including Δlinguistic latitude outperformed the other by a ΔAIC of at least 17 (Fig. 2d). These results suggest that linguistic experience better explains the chronometric performance in a computerized task, indicating in turn that the formation of cognitive maps does not rely solely on a spatial memory system.
Experiment 3
Experiment 2 indicated that spatial representations can be directly retrieved from nonspatial associative information. However, an overreliance on language experience may have been accentuated by the linguistic task employed. On these grounds, in Experiment 3 we used a spatial task, presenting participants (N = 41) with a map of Italy depicting a red dot: They had to indicate which city, among two possible alternatives, was located in correspondence with the dot (Fig. 3a). Data analysis was identical to that for Experiment 2; the only difference was that in this case, each model additively included geographic or linguistic predictors describing both Δcoordinates (i.e., latitude and longitude).

Fig. 3. Effect of language experience on human performance in a spatial task in Experiment 3. The timeline (a) shows an example trial, in which participants had to indicate whether the city name on the left (L) or right (R) was indicated by the red dot on the map. Akaike information criterions (AICs) for the models estimated on accuracy (b) are shown separately for Δgeographic coordinates and Δlinguistic coordinates. The effects of the best-fitting model (i.e., the geographic model) on accuracy are shown in (c). AICs of the models estimated on reaction time (RT; d) are shown separately for Δgeographic coordinates and Δlinguistic coordinates. The effects of the best-fitting model (i.e., the linguistic model) on log-transformed RT are shown in (e).
Method
The complete methodology is reported in the Supplemental Material.
Participants
Forty-one Italian students (11 male; age: M = 21.8 years, SD = 1.69) participated in Experiment 3. Inclusion criteria were identical to those for Experiment 2.
Stimuli and procedure
We selected 9 Italian cities from the 11 included in Experiment 2, for a total of 72 possible pairs (in half of the pairs a given city will appear on the left and in the other half on the right side of the screen), which were shown twice. In each trial, the location of either city was presented as a red dot on a blank Italian map. Participants were told that they would be shown a map of Italy with a red dot and the names of two Italian cities and that they were to indicate which one was located approximately at the dot location. Participants were instructed to respond as fast and accurately as possible by pressing the A key to choose the city on the left or the L key to choose the city on the right (Fig. 3a). Participants were tested online using PsychoPy (Peirce, 2007, 2009; Peirce et al., 2019).
Distributional-semantic model
As in Experiments 1 and 2, the distributional-semantic model used was fastText.
Computation of geographical and linguistic predictors
As in Experiment 2, for each city pair, we computed geographic and linguistic predictors. Geographic predictors were computed as the absolute values of the differences between the latitudes (Δgeographic latitude) and the longitudes (Δgeographic longitude) of the two cities. Similarly, Δlinguistic latitude and Δlinguistic longitude were computed as the absolute values of the difference between the linguistic latitudes and the linguistic longitudes of the two cities, respectively. Additionally, as in Experiment 2, as a further control, we included other language-based predictors (for details and results, see the Supplemental Material).
Data analysis
Data analysis was identical to that for Experiment 2; the only difference was that in this case, each model additively included geographic or linguistic predictors for both Δcoordinates.
Results
Trials in which overall RTs were faster than 300 ms or in which participants did not provide an answer were excluded from the analysis (1% of the trials). Results fully replicated those of Experiment 2. For accuracy, the effect of Δgeographic latitude was significant (z = 3.79, p < .001, b = 0.17), indicating that the higher the Δgeographic latitude (the higher the difference between the two geographic latitudes; i.e., the geographically farther the two cities on a north–south axis), the higher participants’ accuracy, whereas the effect of Δgeographic longitude was not significant (z = 0.81, p = .42, b = 0.03, pseudo R2 = .03; Fig. 3c). Regarding the other model, the effect of Δlinguistic latitude was significant (z = 2.75, p = .01, b = 7.18), indicating that the higher the Δlinguistic latitude (the higher the difference between the two linguistic latitudes; i.e., the linguistically farther the two cities on a north–south axis), the higher participants’ accuracy, whereas the effect of Δlinguistic longitude was not significant (z = −0.11, p = .91, b = −0.68, pseudo R2 = .02). Crucially, the geographic model outperformed the other by a ΔAIC of at least 6 (Fig. 3b).
For RT data, the effect of Δlinguistic latitude was significant, t(68) = −4.02, p < .001, b = −0.90, indicating that the higher the Δlinguistic latitude (the higher the difference between the two linguistic latitudes; i.e., the linguistically farther the two cities on a north–south axis), the faster the participants’ responses, whereas the effect of Δlinguistic longitude was not significant, t(68) = 0.16, p = .87, b = 0.09, pseudo R2 = .01 (Fig. 3e). Regarding the other model, the effects of both Δgeographic latitude, t(68) = −4.93, p < .001, b = −0.02, and Δgeographic longitude, t(68) = −4.16, p < .001, b = −0.01, were significant (pseudo R2 = .02), indicating that the higher the Δgeographic coordinates (the higher the difference between the two coordinates; i.e., the geographically farther the two cities on both north–south and west–east axes), the faster the participants’ responses. Importantly, the linguistic model here outperformed the other by a ΔAIC of at least 2.2 (a ΔAIC > 2 can be considered statistically significant; Hilbe, 2011; Fig. 3d).
Although here the task tapped spatial processing to a greater extent because participants were presented with a real map, we found that RTs were again better predicted by the language-based estimates. This indicates that language captures the cognitive-based distortions that ground humans’ representations of geographical maps.
Discussion
In contrast to leading views maintaining that cognitive maps build almost uniquely on a common set of spatial mechanisms (Bellmund et al., 2018; Bottini & Doeller, 2020), we demonstrated that humans can also construct these representational formats by drawing on learning capacities that are not exclusively spatial, applied to linguistic data. In fact, we first showed that models built on nonspatial principles and merely based on linguistic data can successfully reproduce the spatial layout (i.e., directional properties) of real maps. Perhaps more importantly, we also showed that the chronometric performance of adult humans when exploring real maps is better accounted for by language-based coordinates than by the real geographical coordinates. This pattern was replicated across two behavioral experiments, one of which directly tapped contingent spatial processing (i.e., participants were shown a real map and had to indicate which, among two possible alternatives, was the city located in correspondence with a dot). This testifies to the occurrence of systematic biases in the chronometric profiles of human judgments, with the systematicity of these biases being compatible with distortions from natural language. Hence, humans can in principle create spatial representations without strictly relying on real spatial computations. Interestingly, we further found across both behavioral experiments that accuracy was better explained by real geographical information, suggesting in turn that participants were also relying on spatial knowledge to solve the task. Such a dissociation between chronometric and accuracy data suggests that spatial and nonspatial knowledge likely interact in the formation of cognitive maps in humans. 
The primacy of linguistic over spatial information is likely dependent on the experiential traces available: In the case of geographical maps, linguistic traces would be abundant and easily accessible; yet for other contexts (e.g., environments that are not represented in language), spatial information derived from sensorimotor traces would be essential. More generally, these findings are in line with theoretical views maintaining that linguistic and perceptual experiences mutually participate in constructing mental representations (Louwerse, 2018).
Notably, previous studies have already reported that when people construct mental models of local-scale locations (e.g., towns) by reading text (e.g., route descriptions), these mental maps are functionally equivalent to those constructed by traveling in a real environment (Ferguson & Hegarty, 1994; Taylor & Tversky, 1992). However, these previous studies mainly employed explicit spatial cues, such as the use of spatial prepositions (e.g., Taylor & Tversky, 1996), and, more critically, could not rule out that participants were redeploying the spatial system to construct mental maps; that is, these studies could not exclude the possibility that the construction of mental models was based on the recruitment of spatial computations. In contrast, here we demonstrated that representations of the environment can be derived without a dedicated spatial memory system.
The models we used are indeed mathematically related to cognitively plausible learning models. That is, word embeddings are consistent with relatively simple, psychologically grounded associative-learning mechanisms (Günther et al., 2019). Thus, these language-based models account well for the formation of the conceptual system, because they are psychologically plausible learning models. We thus trace back the formation of cognitive maps not only to the mechanisms involved in the spatial memory system but also to domain-general error-driven learning, which is based on the detection of probabilistic relationships between regularities in the environment and the cues that allow those events to be predicted (Atick & Redlich, 1992; Attneave, 1954; Wei & Stocker, 2017). This means that the human structural organization of cognitive maps can be conceived as the result of a strict, continuous interplay between spatial- and nonspatial-learning mechanisms.
An alternative possibility is that, although language is a learning environment available only to humans, the core mechanisms enabling the formation of spatial representations from linguistic experience may be the same as those at play for cognitive maps in nonverbal animals. Inconsistent with the view of an underlying purely spatial structure, neuronal populations in the hippocampal–entorhinal system have indeed been shown to encode both spatial and nonspatial dimensional representations (Aronov et al., 2017). Recent proposals thus account for the formation of cognitive maps as arising primarily not from inherently spatial principles but rather via reinforcement learning (Momennejad, 2020). The reinforcement-learning framework can in fact be successfully used to learn the structures of spatial locations but also of associated abstract representations such as nonspatial memory items (Collins, 2017). Learning predictive representations of structures hence may be the basic mechanism subserving cognitive maps. Analogously, the prediction-based computational model that we used is mathematically equivalent to the Rescorla-Wagner learning model (Rescorla & Wagner, 1972), which is highly influential in comparative psychology (Miller et al., 1995). Taken together, these considerations suggest that domain-general, simple learning principles can explain the formation of maps in both humans and nonverbal animals regardless of the specific information processed (i.e., spatial or nonspatial).
The present research also has some limitations. First, although in Experiment 3 we used a spatial task, participants’ performance was assessed by means of a two-alternative forced choice task. Future work is needed to confirm whether the observed findings hold true in spatial-localization tasks (e.g., those requiring participants to locate the position of a target city on a map), which tap into spatial operations more overtly. Second, it remains to be seen whether other spatial representations—for which the impact of direct sensorimotor experience has been thoroughly documented—can be analogously derived from natural language. For instance, humans conceptualize number and time primarily in terms of space, and the direction of these mental representations is thought to be rooted in our sensorimotor interactions with the surrounding world (Rinaldi et al., 2018). Whether nuanced distributional patterns of words in natural language may eventually capture the canonical spatial orientation of number and time is thus an interesting question that remains to be addressed.
Our study bears relevant implications for the current debate on the neural substrate of cognitive maps, in that it suggests that the neural network of these maps may be more widely distributed than previously thought. These findings also suggest that the involvement of the hippocampal–entorhinal region in the formation of cognitive maps (Bellmund et al., 2018; Bottini & Doeller, 2020) may be dependent on basic learning principles subserving nonspatial associative memory rather than on spatial computations only. Future work should thus clarify the role of these brain areas, probing the involvement of nonspatial mechanisms also for small-scale cognitive maps, which can be experienced through direct firsthand sensorimotor processes.
Supplemental Material
Supplemental material, sj-docx-1-pss-10.1177_09567976221094863 for Spatial Representations Without Spatial Computations by Daniele Gatti, Marco Marelli, Tomaso Vecchi and Luca Rinaldi in Psychological Science
Footnotes
Transparency
Action Editor: Sachiko Kinoshita
Editor: Patricia J. Bauer
Author Contributions
D. Gatti, M. Marelli, T. Vecchi and L. Rinaldi conceptualized the study. D. Gatti, M. Marelli and L. Rinaldi designed the methodology. D. Gatti programmed the experiments, collected and analyzed the data, with M. Marelli and L. Rinaldi providing critical supervision. L. Rinaldi wrote the first draft of the manuscript, which was revised and edited by D. Gatti, M. Marelli and T. Vecchi. All the authors approved the final version of the manuscript for submission.
