Abstract
Bibliometrics have become commonplace and are widely used by authors and journals to monitor, evaluate and identify their readership in a scientific world that publishes ever more. This contribution introduces a multi-method corpus analysis tool, specifically conceived for scientific corpora with spatialised content. We propose a dedicated interactive application that integrates three strategies for building semantic networks, using keywords (self-declared themes), citations (areas of research using the papers) and full-texts (themes derived from the words used in writing). The networks can be studied with respect to their temporal evolution as well as to their spatial expressions, by considering the countries studied in the papers under inquiry. The tool is applied as a proof-of-concept to the papers published in the online open access geography journal Cybergeo since its creation in 1996. Finally, we compare the three methods and conclude that their complementarity can help go beyond simple statistics to better understand the epistemological evolution of a scientific community and the readership target of the journal. Our tool can be applied by any journal to its own corpus, thus fostering open science and reflexivity.
Introduction
Faced with the increasing number of articles, journals and channels of publication used by researchers in a digital world with increasing open access, journals need tools to identify their readership and authors need this information to better reach their target audience, using the relevant keywords, vocabulary and citations. This paper suggests a set of complementary digital tools that journals themselves can use to tackle these needs, empowering them through a reflexive analysis of their content and fostering open science through more transparency. To perform their functions and provide useful insights, the tools need to meet three requirements: (1) to go beyond the usual citation metrics and to give semantic and network analytics directly from the scientific contents of the papers; (2) to situate sets of papers according to the semantic fields of their topic and their geography; and (3) to identify significant variations in research topics that may be linked to the geographical origin of their authors or to the country they analyse. This last point is especially interesting for scientific journals of geography.
Since the seminal work of Thomas Kuhn in the early 1960s, the development of science studies has been based on three disciplinary pillars: history of science, philosophy of science and sociology of science. In the 1980s, political sciences contributed by focusing on the links between knowledge production and knowledge utilisation. This ‘political turn’ began with the creation of the journal Knowledge in 1979. Since the late 1990s, science studies have been affected by a ‘spatial turn’ and eventually, a geography of science emerged (Livingstone, 1995, 2003; Withers, 2009). More recently, the conjunction of complexity-based approaches, network science and big data has introduced a ‘quant turn’, with systematic analyses of citation networks and automated mining of large textual corpora (see e.g. recent syntheses such as Hicks et al. (2015) on research metrics or Börner et al. (2015) on maps of science). This new, highly interdisciplinary field has been termed a ‘science of science’ by Fortunato et al. (2018). The study of science by itself is indeed a crucial aspect of the production of scientific knowledge, also known in social sciences as reflexivity. This paper contributes to this effort by showing a proof-of-concept of reflexive corpus analysis methods and tools for electronic journals, which are prone to such developments (Pumain, 1996). We particularly insist on two aspects, the heterogeneity of measures and the spatial dimension, which we now contextualise.
First, measures and approaches in bibliometrics are highly multi-dimensional. Cronin and Sugimoto (2014) attempt to provide an overview of the complex nature of the measure of scientific publications and the intrinsic multidimensional nature of knowledge production. They provide both recent technical contributions and a critical approach, insisting on the ‘Janus-faced nature of metrics’. This confirms that reducing knowledge production to a few dimensions is not only wrong but dangerous for science. Studies in bibliometrics which have the complementarity of different approaches as their main focus are rather rare. Wen et al. (2017) construct maps of hydrological science by combining different types of networks such as citation networks and keyword networks and show the complementarity of these entries. Part of the difficulty arises from the disciplinary context. Omodei et al. (2017) show that taking into account citation and discipline data into a multilayer network is useful to understand patterns of interdisciplinarity.
Second, the geographical dimension of science has also received attention. Frenken et al. (2009) propose a specific research program for the emerging field of spatial scientometrics, which includes not only specific questions such as the spatial distribution of citations and activities, but also specific methodological issues linked to noise in spatial data or more classical geographical issues such as the modifiable areal unit problem. The work presented in the following will thus focus on corpora with a spatial dimension.
The 20-year anniversary of the first digital-only journal in geography (Kosmopoulos, 2002; Pumain, 2001), namely Cybergeo (http://journals.openedition.org/cybergeo/), was the occasion to analyse a consistent corpus of over 700 articles published in seven languages, with respect to its semantics as well as to the geography of its authorship and readership. We performed a quantitative epistemology analysis of the scientific papers published since 1996 to measure their similarities according to three types of textual indicators: their keywords (the way authors advertise their research), their citation network (the way the paper is used by other fields and disciplines) and their full-text (the vocabulary used to write the paper and present the research).
These analyses are complementary and show the evolution of a journal toward emergent themes of research. They also highlight the need for Cybergeo to keep extending its authorship base beyond the French-speaking community, in order to match its ambition to be a European Journal of Geography. Our contribution mainly consists of a methodological and technical product developed to interactively handle a large-scale heterogeneous scientific corpus, with particular attention to spatialised corpora. We show how the coupling of complementary views can create a second-order type of knowledge on the scientific context of the corpus studied: the spatial embedding of the three classification methods unveils unexpected patterns. Furthermore, the dedicated online tool that we designed is available as open source software which can be used not only by journals for a collective scientific reflexivity, but also by institutions and individual scientists for a bottom-up empowerment of Open Science.
The remainder of the paper is organised as follows. We first describe the different methods used to analyse semantic networks and how these are coupled through interactive spatial data exploration. We then describe results at the first order (each method) and at the second order (achieved through coupling) before discussing broader implications for quantitative epistemology and reflexivity in Open Science.
A multi-method spatialised corpus analysis tool
One main aspect of our contribution is the combination of different methodologies, each having not only its potentialities and pitfalls, but also specific questions and objects of study. We present in this section the different methods and how they are coupled together to produce second-order knowledge.
Internal semantic network
The first exploration method is based on the set of keywords declared by the authors themselves when publishing in Cybergeo. We consider articles and keywords as a bipartite network. This network can be decomposed into two simple networks: a network of articles (vertices) linked by common keywords (edges) and a network of keywords (vertices) linked by common articles (edges) (Roth and Cointet, 2010). We consider the second one as a semantic network. We construct semantic communities (see Supplemental material for methodological details) with the Louvain algorithm (Blondel et al., 2008). This community detection method is chosen among others because it is based on modularity measures such as the modal weight defined above. Like other algorithms, the Louvain method performs a modularity maximisation. In this case, the semantic network is small and simple, and any modularity-based algorithm would give a similar output.
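As an illustration of the decomposition described above, the following minimal sketch projects a toy article-keyword bipartite network onto its keywords and computes the weighted modularity of a candidate partition, which is the quantity the Louvain algorithm seeks to maximise. It does not implement the Louvain algorithm itself; the function names, the toy articles and the partition are hypothetical, and edge weights are assumed here to be simple co-occurrence counts rather than the modal weight.

```python
from collections import defaultdict
from itertools import combinations


def keyword_network(article_keywords):
    """Project the article-keyword bipartite network onto keywords.

    Returns {(kw_a, kw_b): number of articles declaring both}, with kw_a < kw_b.
    """
    weights = defaultdict(int)
    for keywords in article_keywords.values():
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def modularity(weights, community):
    """Weighted modularity Q of a partition given as {keyword: label}."""
    total = sum(weights.values())        # total edge weight m
    strength = defaultdict(float)        # weighted degree of each keyword
    internal = defaultdict(float)        # edge weight inside each community
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
        if community[a] == community[b]:
            internal[community[a]] += w
    degree_sum = defaultdict(float)      # summed strength per community
    for node, s in strength.items():
        degree_sum[community[node]] += s
    # Q = sum over communities of (internal share - expected share)
    return sum(internal[c] / total - (degree_sum[c] / (2 * total)) ** 2
               for c in degree_sum)


# Toy corpus (hypothetical articles and keywords)
articles = {
    "a1": ["city", "urban", "model"],
    "a2": ["city", "urban"],
    "a3": ["climate", "risk"],
    "a4": ["climate", "risk", "model"],
}
weights = keyword_network(articles)
partition = {"city": "urban", "urban": "urban", "model": "urban",
             "climate": "environment", "risk": "environment"}
q_split = modularity(weights, partition)
q_single = modularity(weights, {k: "all" for k in partition})
```

On this toy example the two-community partition scores higher than putting every keyword in a single community, which is the kind of comparison a modularity-based algorithm performs when moving vertices between communities.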
External semantic network
The second methodological development focuses on the combination of citation network exploration and semantic network analysis. The full method we apply here is described in detail by Raimbault (2019). Citation networks have been widely used in science studies, for example, as a predictive tool for the success of a paper (Newman, 2014), or to unveil emerging research fronts (Shibata et al., 2008). Indeed, the bibliography of a paper contains a certain scientific positioning as well as a line of inheritance to which it aims to contribute and which fields it is based on. Reverse citations (i.e. contributions citing a given paper, up to a given level) on the other hand show how the knowledge presented in a paper was understood, interpreted and used, and in particular by which field (on this point, the interesting example of Jacobs (1961), heavily cited today by quantitative studies of the city by physicists, shows how unexpected the audience can be over time).
We define the citation neighbourhood of our corpus as all the articles citing articles published in Cybergeo, all the articles citing the ones cited by Cybergeo and all the articles citing these ones. We therefore have a network of depth 2, with a control group to compare to Cybergeo articles. The citation data are collected using automatic data collection (Raimbault, 2019). Once citation neighbourhoods have been constructed, keywords are automatically extracted from the abstracts of corresponding publications using natural language processing techniques, and a semantic network is constructed (see Supplemental material for details). This network and its communities make it possible to associate a list of keywords and corresponding disciplines with each paper. These are complementary to the declared keywords and the full-text themes presented in the next subsection, as they reveal how authors position their article in the semantic landscape associated with the citation neighbourhood, or what their ‘cultural background’ is.
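The construction of the citation neighbourhood can be sketched as follows on a toy citation graph. This is only one possible reading of the definition above: "all the articles citing these ones" is interpreted here as the articles citing any first-level article. The function name, the two input dictionaries and the paper identifiers are hypothetical and do not reflect the data collection pipeline of Raimbault (2019).

```python
def citation_neighbourhood(corpus, references, cited_by):
    """Build the depth-2 citation neighbourhood of a corpus.

    references: {paper: set of papers it cites}
    cited_by:   {paper: set of papers citing it}
    Returns the three layers as a dict of sets.
    """
    def citers(papers):
        # All papers citing at least one paper of the given set
        found = set()
        for p in papers:
            found |= cited_by.get(p, set())
        return found

    corpus = set(corpus)
    citing_corpus = citers(corpus)                 # papers citing the corpus
    cited = set()
    for p in corpus:
        cited |= references.get(p, set())          # papers cited by the corpus
    citing_references = citers(cited)              # control group
    first_level = citing_corpus | citing_references
    second_level = citers(first_level)             # one more step of reverse citations
    return {
        "citing_corpus": citing_corpus,
        "citing_references": citing_references,
        "second_level": second_level,
    }


# Toy graph (hypothetical identifiers): c1 is a corpus article citing r1
references = {"c1": {"r1"}}
cited_by = {"c1": {"x1"}, "r1": {"c1", "y1"}, "x1": {"z1"}, "y1": {"z2"}}
layers = citation_neighbourhood({"c1"}, references, cited_by)
```

Keeping the layers separate, rather than returning a single set, preserves the distinction between papers that use the corpus and the control group of papers that share its references.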
Topics allocation using full-text documents
The third and last exploration method details the allocation of topics in full-text documents, and is thus complementary to the previous ones that used declared keywords and relevant keywords within abstracts of the citation neighbourhood. Topics allocation is used widely for many purposes. Citron and Way (2018) have identified semantically related scientific communities. Karami et al. (2018) have characterised diabetes, diet, exercise and obesity in comments on Twitter. Niebles et al. (2008) have found human action categories in video sequences translated into spatiotemporal words. All these authors have used a topic classification model such as the Latent Dirichlet Allocation model (LDA) and its derivatives. We apply here the LDA method to extract topics from full-text documents (see Supplemental material for a thorough description of the method).
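To make the principle of LDA concrete, here is a minimal sketch of a collapsed Gibbs sampler, one common way of estimating the model. It is not the estimation procedure used for the results of this paper (which is described in the Supplemental material); the function name, the hyperparameter values `alpha` and `beta`, the number of iterations and the toy documents are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents.

    Returns (theta, phi, vocab): per-document topic shares,
    per-topic word probabilities and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic for every token
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, z = word_id[w], assignments[d][i]
                # Remove the token, then resample its topic
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                probs = [(doc_topic[d][k] + alpha)
                         * (topic_word[k][v] + beta)
                         / (topic_total[k] + n_words * beta)
                         for k in range(n_topics)]
                z = rng.choices(range(n_topics), weights=probs)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
             for counts, doc in zip(doc_topic, docs)]
    phi = [[(c + beta) / (topic_total[k] + n_words * beta) for c in row]
           for k, row in enumerate(topic_word)]
    return theta, phi, vocab


# Toy tokenised documents (hypothetical)
docs = [["ville", "transport", "ville"], ["transport", "ville"],
        ["eau", "risque", "eau"], ["risque", "eau"]]
theta, phi, vocab = lda_gibbs(docs, n_topics=2)
```

Each row of `theta` gives the share of each topic in a document, which is the kind of per-article theme distribution that the geographical aggregation of the next subsection takes as input.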
Geographical aggregation of semantic profiles
Given a semantic characterisation of articles (using keywords, citations or full-texts), it is then possible to determine two semantic profiles of countries: one using countries as authoring ‘origins’ and one using countries as subject ‘destination’. The semantic profile of a country X is made of the mean share of themes Y present in articles authored from or studying country X. At one extreme, if only one article A1 came from a country X1, the semantic profile of X1 would be exactly that of A1. At the other extreme, if all articles came from X2, the semantic profile of X2 would be the overall distribution of themes across the corpus.
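The aggregation can be sketched in a few lines. The example below computes the profile of a country as the mean of the theme shares of the articles linked to it, and reproduces the first extreme case mentioned above (a country linked to a single article inherits that article's profile). The function name, theme labels, shares and country codes are hypothetical.

```python
def semantic_profile(article_themes, article_countries, country):
    """Mean share of each theme over the articles linked to `country`.

    article_themes:    {article: {theme: share}}
    article_countries: {article: set of country codes (authoring or studied)}
    """
    selected = [a for a, countries in article_countries.items()
                if country in countries]
    if not selected:
        return {}
    themes = sorted({t for a in selected for t in article_themes[a]})
    return {t: sum(article_themes[a].get(t, 0.0) for a in selected) / len(selected)
            for t in themes}


# Toy data (hypothetical shares and country codes)
article_themes = {
    "A1": {"mobility": 0.75, "climate": 0.25},
    "A2": {"mobility": 0.25, "climate": 0.75},
}
studied = {"A1": {"FR"}, "A2": {"FR", "SN"}}

profile_sn = semantic_profile(article_themes, studied, "SN")  # only A2
profile_fr = semantic_profile(article_themes, studied, "FR")  # mean of A1 and A2
```

The same function serves both geographical allocations: passing the countries of affiliation gives the authoring profile, and passing the countries studied gives the subject profile.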
All in all, given the three semantic characterisations of articles (using keywords, citations and full-texts) and the two geographical allocations of articles (authoring or studied), each country has a maximum of six distinct semantic profiles. We use these semantic profiles to cluster countries. The clustering method applied is an ascending hierarchical clustering algorithm using the Ward criterion, which at each step merges the two groups whose fusion least increases within-group inertia (and thus keeps between-group inertia as high as possible). When analysing authoring clusters, we consider groups of countries from which a certain geography is made and written. This option is interesting from a reflexive point of view but more hazardous in practice, because of the high concentration of emissions (and the consequently low number of emitting countries) and because of the uncertainty of national provenance as captured by the institutional affiliation of authors at the time of publication. Therefore, in the application section, we base our clustering on studied countries only. When analysing clusters of studied countries, we consider how certain groups of territories are studied, what words authors use to talk about them and in which research areas the papers about them are used.
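A minimal sketch of ascending hierarchical clustering with the Ward criterion is given below. It is a naive implementation meant to show the merging rule, not the routine used to produce the results of this paper; the function name and the toy semantic profiles are hypothetical. The cost of merging two groups A and B is taken as |A||B|/(|A|+|B|) times the squared Euclidean distance between their centroids, which is the increase in within-group inertia caused by the merge (with unit weights).

```python
def ward_clustering(points, n_clusters):
    """Naive agglomerative clustering with the Ward merging cost.

    points: list of equal-length numeric vectors (e.g. semantic profiles).
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-group inertia if clusters a and b are merged
        ca, cb = centroid(a), centroid(b)
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]],
                                                    clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy semantic profiles of four countries over two themes (hypothetical)
profiles = [[1.0, 0.0], [0.75, 0.25], [0.0, 1.0], [0.25, 0.75]]
groups = ward_clustering(profiles, n_clusters=2)
```

On this toy example the two countries dominated by the first theme end up in one group and the two dominated by the second theme in the other.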
Open data + interactivity = reproducibility and transparency
Last but not least, our methodological contribution is also closely linked to issues of reflexivity, transparency and reproducibility in the process of knowledge production. It is now a well-established idea that all these aspects are closely linked and that their strong coupling participates in a virtuous circle enhancing and accelerating knowledge production, as seen in the various approaches of Open Science (Fecher and Friesike, 2014). For example, open peer review is progressively emerging as an alternative to the rigid and slow classical canons of scientific communication (Ross-Hellauer, 2017). In the domain of computational science, tools to ensure reproducibility and transparency are numerous but require a strict discipline of use and are not easily accessible (Wilson et al., 2017). Open Science suggests transparency of not only the knowledge production process itself, but also the knowledge communication patterns: on this point, we claim that the interactive exploration of quantitative epistemological patterns is necessary. We therefore built an interactive application to enable the exploration of heterogeneous scientific corpora.
The web application is available online at https://analytics.huma-num.fr/geographie-cites/cybergeonetworks/. Source code and data, both for analyses and the web application, are available on the open git repository of the project at https://github.com/Geographie-cites/cybergeo20.
Application to the Cybergeo corpus
Data
The data used to test our method and tools consist of the corpus of 20 years of article publication from a free online geography journal. Cybergeo was founded in 1996 as a digital-only European Journal of Geography. Between April 1996 and May 2016, 737 scientific articles were published by 1351 authors from 51 countries. These articles have generated 2710 citations altogether over the last 20 years, which corresponds to about half the number of all the other articles cited by Cybergeo articles (5545).
In order to produce analyses at the country level, articles have been geo-tagged in two ways. First, the country of affiliation as it was declared by the author(s) at the moment of publishing has been coded following the two-letter identifiers of the International Organisation for Standardisation. This information is available as authors have to choose a single or major affiliation and fill in a form on the publishing platform. Second, the articles were read one by one to extract the major geographical subjects. Articles were tagged with a country if this country or a sub-region of it constituted the focus of the study. In the case of European countries, different sets of countries were associated with the publication, depending on the perimeter of the subject (for instance: EU15, EU25, Schengen area, EuroMed, etc.).
Summary statistics of authorship by country are given in the Supplemental material. By linking institutions of authors to their geographical subject (Figure 1), we find different patterns:

Figure 1. Geographical origins and destinations of papers – 1996–2015. Reciprocal links are represented in blue.
European and North American countries tend to study each other in a symmetric way through Cybergeo articles;
Latin American countries are mainly studied by authors affiliated in Europe and North America;
African and Asian countries are studied mainly by Europeans and marginally by Americans and themselves;
Russia and Australia are studied by Western authors and study their own hinterland.
Finally, we find privileged links between France and (formerly) French-speaking countries (including Belgium, Canada, Vietnam, Madagascar, Senegal, etc.), whereby a common language and at times a shared history through colonialism have produced favoured national subjects of study to be published in the European (yet predominantly French-speaking) Cybergeo journal.
Internal semantic network (keywords)
Communities and semantic fields
The community detection algorithm reaches a modularity optimum with 10 clusters, which we summarise by expert knowledge (as we will do for each clustering result in the following) as: mobility and transportation; imagery and GIS; climate and environment; history and epistemology; sustainability, risk and planning; economic geography; territory and population; urban dynamics; statistics and modelling; and emotional geography. Some clusters concentrate a large number of keywords and articles, such as ‘imagery and GIS’ or ‘statistics and modelling’. This result was expected because of the original aim and scope of the journal (quantitative geography). Besides the main clusters and a set of medium-sized clusters, two small clusters emerged that the journal editors had not expected at all: ‘emotional geography’ and ‘climate and environment’. The CybergeoNetworks application offers a set of visualisation parameters to draw the communities (see online Supplemental material), such as setting the size of vertices and edges according to different variables (degree, number of articles and modal weight).
As explained above, modal weight metrics can be used to draw semantic fields. The CybergeoNetworks application presents the full list of keywords. The user chooses one keyword from that list, the word is placed at the centre of the plot and all its neighbours are arranged at a distance inversely proportional to the preferential attachment (modal weight). We illustrate this feature in Figure 2: to ease reading, a circle is drawn at a distance of 1. Some proximities are expected (‘urban’ is closely linked to ‘city’), some are expected knowing the original scope of the journal in the field of theoretical and quantitative geography (‘model’ or ‘spatial statistics’ are linked to ‘city’). Some proximities are totally unexpected: for ‘city’, the preferential attachment of keywords like ‘movie’, ‘web’ and ‘virtual’.
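The layout rule of the semantic field plot can be sketched as follows: the chosen keyword sits at the origin and each neighbour is placed at a radius equal to the inverse of its weight, so that strongly attached keywords fall inside the circle of radius 1. This is an illustrative assumption about the plot, not the code of the CybergeoNetworks application; the function name, the proportionality constant of 1, the even angular spacing and the weight values are all hypothetical.

```python
import math


def semantic_field_layout(neighbour_weights):
    """Place neighbours of a central keyword at distance 1 / weight.

    neighbour_weights: {keyword: positive weight (e.g. modal weight)}
    Returns {keyword: (x, y)} with the central keyword at (0, 0).
    """
    # Strongest attachments first, spread evenly around the centre
    items = sorted(neighbour_weights.items(), key=lambda kv: -kv[1])
    layout = {}
    for rank, (keyword, weight) in enumerate(items):
        radius = 1.0 / weight
        angle = 2 * math.pi * rank / len(items)
        layout[keyword] = (radius * math.cos(angle), radius * math.sin(angle))
    return layout


# Hypothetical weights of three neighbours of the keyword 'city'
layout = semantic_field_layout({"urban": 2.0, "model": 1.0, "movie": 0.5})
radii = {k: math.hypot(x, y) for k, (x, y) in layout.items()}
```

With these toy weights, 'urban' lies inside the reference circle, 'model' on it and 'movie' outside it.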

Figure 2. Semantic fields (a) ‘City’ and (b) ‘Network’.
Spatial communities
Using the keywords distributions to draw the semantic profile of the 129 countries studied in a Cybergeo article, we obtain a clustering in four groups representing 16.3% of the initial inertia. Its geographical distribution is shown in Figure 3 with the average profile of each group.
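The share of initial inertia represented by a clustering can be illustrated with the sketch below, under the assumption that this share is the ratio of between-group inertia to total inertia, computed with unit weights and squared Euclidean distances. The function name and the toy profiles are hypothetical, and the exact definition used for the figures reported here may differ.

```python
def inertia_share(points, clusters):
    """Ratio of between-group inertia to total inertia for a partition.

    points:   list of equal-length numeric vectors
    clusters: list of clusters, each a list of point indices
    """
    dim = len(points[0])

    def centroid(indices):
        return [sum(points[i][d] for i in indices) / len(indices)
                for d in range(dim)]

    def sqdist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    overall = centroid(range(len(points)))
    total = sum(sqdist(p, overall) for p in points)
    between = sum(len(c) * sqdist(centroid(c), overall) for c in clusters)
    return between / total


# Toy semantic profiles of four countries over two themes (hypothetical)
profiles = [[1.0, 0.0], [0.75, 0.25], [0.0, 1.0], [0.25, 0.75]]
share_two_groups = inertia_share(profiles, [[0, 1], [2, 3]])
share_one_group = inertia_share(profiles, [[0, 1, 2, 3]])
```

A share close to 1 means that the groups summarise the profiles almost perfectly, whereas a low share indicates that most of the variation remains inside the groups.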

Figure 3. (Left) Geographical communities of declared interest. (Right) Corresponding semantic profile of groups.
Countries are differentiated first by whether or not the articles studying them also declare keywords related to transport and mobility, history and epistemology, urban systems and/or emotional geography. Indeed, the first group of 79 countries (in blue, Figure 3) is defined by these themes. The corresponding countries are the most developed and richest territories of the world, including emergent countries such as the BRICS countries (Brazil, Russia, India, China and South Africa). The keywords used to advertise the articles follow the latest trends of geography, with mentions of emotions and mobility for instance.
The countries of the other groups over-represent the keywords related to:
Methods (in orange) such as statistics and modelling. The countries associated with these keywords are all located in Central and Southern Africa, with the exception of Laos. These countries are studied by a small number of articles focusing on methodological approaches. For example, the only article studying Rwanda (Querriau et al., 2004) relates to an optimal location problem, whereas Vallée (2009) uses ‘multilevel modelling’ as a keyword for the only article about Lao.
Sustainability and risks (in yellow). This is the case of articles about Indonesia, for example, which all relate to hazards and vulnerability: to tsunamis (Ozer and De Longueville, 2005), to volcanoes (Bélizal et al., 2011) and to water scarcity (Putra and Baier, 2009).
Finally, 19 countries are associated with keywords related to imagery and GIS (teal colour). They are located primarily in Saharan Africa. In many cases, this happens because the articles present a methodology which uses aerial and satellite images to substitute missing socioeconomic data (Ackermann et al., 2003; Devaux et al., 2007).
Thus, drawing communities of declared interest, we find an interesting dichotomy between rich countries on the one hand, which are studied extensively in the literature and for which authors use trendy keywords to distinguish themselves from past and competing work; and developing countries on the other hand, which are associated with more technical keywords reflecting a narrower spectrum of domains and specific data challenges.
External semantic network (citations)
The application makes it possible to explore the citation neighbourhood of chosen articles in terms of semantic contents (visualising the full network is not technically feasible, as the full corpus contains around 200,000 articles). Wordclouds on the CybergeoNetworks application give the content of the article and the content of the articles in the neighbourhood, with each word being associated with its semantic community. The user can therefore situate a work within a semantic context, and we expect that unanticipated connexions can be made with these tools, as authors may not be aware of similar works in other disciplines.
Community structure
As explained before, the raw semantic network is optimised for modularity and size, making a compromise between these two opposite objectives as the edge and node filtering parameters vary. This provides 12 communities, which can correspond to existing disciplines, to methodological issues or to very precise thematic subjects. The communities are, in order of importance in terms of proportion of total keywords: Political Science/Communication; Biogeography; Social and Economic Geography; Climate; Physical Geography; Commerce; Spatial Analysis; Microbiology; Neuroscience; GIS; Agriculture; and Health. This method has the property of grouping together keywords based on co-occurrence, thus revealing the actual structure of abstracts’ contents: this is an advantage when it reveals links, as for the large field of Social and Economic Geography, but it can also blur information by merging more detailed communities. Very precise and small communities such as Health Geography appear as they are strongly isolated from the rest of the communities. This structure is particular, and shows a dimension of knowledge that classical citation analysis would not reveal.
Spatial communities
Using the citation network communities to draw the semantic profile of the 130 countries studied in a Cybergeo article, we obtain a clustering in four groups representing 16.4% of the initial inertia. Its geographical distribution is shown in Figure 4 with the average profile of each group.

Figure 4. (Left) Geographical communities of bibliographical use. (Right) Corresponding semantic profile of groups.
The largest group of countries largely overlaps with the largest cluster of keywords communities (cf. previous section). Indeed, rich and emergent countries (BRICS included) are studied in articles used in similar ways in the citation network. There are further divides within this group. A first subgroup (in blue) of countries is studied by Cybergeo articles cited preferentially in the fields of commerce, socio-economic and political analysis. These correspond to articles mostly in Economics and Social Sciences. The nearest subgroup of countries (in orange) comprises 32 ‘Southern’ countries such as Australia, Pakistan, Chile, Madagascar, Iran, Lao, the Philippines or Iceland. It corresponds to countries treated by articles cited preferentially in methodological fields (spatial analysis and GIS). Indeed, the only article about Iran presents a collaborative decision support system (Jelokhani-Niaraki and Malczewski, 2012), while the only article about Australia reviews online cartographic products (Escobar et al., 2000). This kind of article then tends to stay in the citation clique of geomatics. The third group comprises 14 countries in South-East Asia (Indonesia, Thailand and Myanmar), Eastern Africa (Somalia, Kenya and Tanzania) and North Africa (Libya, Tunisia and Egypt). The articles studying them are cited preferentially in the fields of physical geography and health studies. The vision of these countries through the articles citing works published in Cybergeo is thus dominantly one of morphological wonders and health vulnerability. Finally, a group of 12 sub-Saharan countries (including Côte d’Ivoire, Zimbabwe and the Republic of Congo) is associated with papers cited in the climatology citation community.
Thus, drawing communities of bibliographical use, we find an interesting dichotomy between rich countries on the one hand, which are associated with papers cited in broad communities, including topical and methodological fields; and poor and developing countries on the other hand, which are associated with papers cited mainly in relation to their natural geography, health and climatic risks. This could suggest a need for the journal to call for more articles about such countries’ populations and economies.
Topics allocation (full-texts)
Evolution of the topics addressed in the corpus
The LDA model is applied to a reduced corpus of French articles, French being the leading language of Cybergeo. We chose to keep only the French articles rather than translate the others to obtain a larger corpus, in order to avoid potential additional bias in our results due to the translation process. After destructuring the texts and filtering nouns, articles and verbs, our corpus counts no less than four million words, which leads to a dictionary of 137,224 unique words. The optimal number of topics was chosen by estimating the LDA parameters for different numbers of topics and choosing a compromise between perplexity and entropy of the resulting classification (see details of this optimisation and the detailed description of topics in Supplemental material), which results in 20 topics. We give in Supplemental material a part of the matrix describing, for each topic index, the first 20 translated words (except for index 7, where words were already in English) in decreasing order of probability of belonging to the topic. It is then interesting to observe how many documents addressed a given topic each year, i.e. the evolution of topics in the Cybergeo corpus (Figure 5). We can distinguish several evolution profiles: decreasing, one-off, regular and increasing topics. Articles about cartography tend to decrease. Articles about remote sensing were mainly produced in 2000, just like articles about water management in 2004 and 2011. Articles about agglomeration are regularly produced. Geographical epistemology is also often debated across articles. Topics such as district and mobility tend to increase.
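Counting the documents that address a topic each year can be sketched as follows. Since LDA gives every document a share of every topic, deciding that a document "addresses" a topic requires a rule; the threshold of 0.2 used here is an arbitrary assumption for illustration, as are the function name, the topic labels, the years and the shares.

```python
from collections import Counter


def topics_per_year(doc_years, doc_topics, threshold=0.2):
    """Count, for each (topic, year), the documents addressing the topic.

    doc_years:  {document: publication year}
    doc_topics: {document: {topic: share of the topic in the document}}
    A document addresses a topic when its share reaches `threshold`.
    """
    counts = Counter()
    for doc, year in doc_years.items():
        for topic, share in doc_topics[doc].items():
            if share >= threshold:
                counts[(topic, year)] += 1
    return counts


# Toy data (hypothetical documents, years and topic shares)
doc_years = {"d1": 2000, "d2": 2000, "d3": 2004}
doc_topics = {
    "d1": {"remote sensing": 0.75, "cartography": 0.25},
    "d2": {"remote sensing": 0.5, "mobility": 0.5},
    "d3": {"water management": 0.875, "cartography": 0.125},
}
counts = topics_per_year(doc_years, doc_topics)
```

Plotting these counts by year for each topic would give curves of the kind used to distinguish decreasing, one-off, regular and increasing topics.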

Figure 5. Number of documents addressing a topic per year, between 1996 and 2015. For visualisation purposes, only selected topics commented in text are shown.
Spatial full-text communities
Using the full-texts to draw the semantic profile of the 129 countries studied in a Cybergeo article, we obtain a clustering in four groups representing 13.4% of the initial inertia. Its geographical distribution is shown in Figure 6 with the average profile of each group.

Figure 6. (Left) Geographical communities of writing practice. (Right) Corresponding semantic profile of groups.
In this clustering analysis, we do not find the dichotomy of countries based on their wealth and economic development levels. The link between semantic and geographical proximity is also less obvious at the world level, in spite of one region being strikingly revealed: the institutional boundaries of Europe. The group of countries included in the EU27 plus the USA, Brazil and Chile (in yellow) appears strongly similar in terms of vocabulary used to talk about them. In particular, themes related to borders (frontière) and villages describe these countries well (for example, Le Néchet, 2011; Lusso, 2009; Santamaria, 2009).
A second group includes countries studied by papers written in English, as Cybergeo is a multilingual journal. A group of 59 countries, including Canada, Russia, Namibia, Malaysia and Ecuador, are studied in Cybergeo using preferentially words such as ‘commune’, ‘déplacement’ (mobility or displacement) and ‘aménagement’ (planning), suggesting an effort from French-speaking authors to present and explain the geographical and urban planning context of other countries around the world. Finally, a group includes countries from all continents and corresponds to papers written preferentially with words such as ‘eau’ (water) and ‘entreprise’ (enterprise), that is, a very heterogeneous set of papers which could easily be classified as ‘other’.
The communities of vocabulary and writing practice thus appear less straightforward and less linked to geographical proximity. The main result lies in the fact that there is a specific set of words used to write about the European Union, a sort of EU27 ‘Newspeak’ made of words like ‘Eurovision’, ‘subsidiarity’ and ‘Spatial Development Perspectives’.
Discussion
Why three classifications? Evaluating the complementarity of approaches
This section backs up the previous qualitative comparison of approaches through their spatialisation with quantitative measures of their complementarity. Although we have seen that the communities obtained from the three different methods are semantically and geographically distinct, we do not know precisely how they complement each other. The overlap analysis is complicated by the fact that articles belong simultaneously to several clusters for each classification.
Therefore, we compare the methods two by two by computing the share of articles classified simultaneously in each possible pair of clusters from the two methods. Methodological details are given in Supplemental material, together with the diagram synthesising overlap between communities of the different classifications. We obtain for instance a clear preferential positive and negative relation between some citations communities and keywords communities (Figure 7). On the one hand, 35% of the Cybergeo articles in the GIS citation cluster are characterised by keywords identified as ‘Imagery and GIS’. On the other hand, there is no article in the ‘crime’ citation cluster which has keywords of the ‘Climate and environment’ community. These relationships make sense, because the way a paper is advertised by its keywords is one of the first elements indicating to the potential reader whether the paper is relevant or not. Interestingly, the ‘complex systems’ citation community is characterised by a variety of keywords communities (27% of the articles cited by this community are tagged in the ‘statistics and modelling’ cluster, 17% in the ‘Imagery and GIS’ cluster, 13% in ‘history and epistemology’ and 11% in ‘urban dynamics’). This suggests that the field of complex systems, being unified by methods rather than objects of inquiry, is more open to diverse topics than other citation communities. It could also mean that within Cybergeo, authors of articles relevant to the complex systems community advertise their paper with keywords from the discipline of geography rather than methods only, in order to attract topical readers as well. Looking at the relations between keywords communities and themes communities, we find that some topics require specific words to write about them. For example, ‘Imagery and GIS’-tagged articles use more words from the ‘EN’ theme category, which corresponds to English words (rather than French).
Urban studies are split between a quantitative side (advertised by keywords around ‘urban dynamics’ and using words such as ‘agglomeration’) and a qualitative side (advertised by keywords around ‘sustainability, risk and planning’ and using words such as ‘femme’, i.e. woman). Interestingly, words like ‘risk’ (risque) are themselves used more in articles tagged around ‘Climate and environment’ than in those tagged around ‘sustainability, risk and planning’.
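As an illustration of the pairwise comparison described above, the following minimal sketch computes, for two classifications, the share of articles of each cluster of the first method that also belong to each cluster of the second. It is not the implementation used for the paper (the exact procedure is in the Supplemental material): the function name `overlap_shares`, the normalisation by the size of the first cluster, and the toy article identifiers are assumptions made for the example. Only the cluster labels are borrowed from the text, and the memberships are invented.

```python
from collections import defaultdict


def overlap_shares(memberships_a, memberships_b):
    """Share of the articles of each cluster of A also found in each cluster of B.

    memberships_a, memberships_b: dict mapping an article id to the set of
    cluster labels it belongs to (an article may belong to several clusters).
    Returns a dict {(cluster_a, cluster_b): share}, where the share is
    normalised by the number of articles in cluster_a (an assumption).
    """
    size_a = defaultdict(int)   # number of articles per cluster of A
    joint = defaultdict(int)    # number of articles in both cluster_a and cluster_b
    for article, clusters_a in memberships_a.items():
        clusters_b = memberships_b.get(article, set())
        for ca in clusters_a:
            size_a[ca] += 1
            for cb in clusters_b:
                joint[(ca, cb)] += 1
    labels_b = sorted({cb for clusters in memberships_b.values() for cb in clusters})
    return {
        (ca, cb): joint[(ca, cb)] / n
        for ca, n in size_a.items()
        for cb in labels_b
    }


# Toy memberships (hypothetical, not the Cybergeo corpus).
citations = {
    "a1": {"GIS"},
    "a2": {"GIS", "complex systems"},
    "a3": {"crime"},
    "a4": {"complex systems"},
}
keywords = {
    "a1": {"Imagery and GIS"},
    "a2": {"statistics and modelling"},
    "a3": {"urban dynamics"},
    "a4": {"Imagery and GIS", "urban dynamics"},
}

shares = overlap_shares(citations, keywords)
for (citation_cluster, keyword_cluster), share in sorted(shares.items()):
    print(f"{citation_cluster} -> {keyword_cluster}: {share:.0%}")
```

Because articles can belong to several clusters, the shares of one row do not need to sum to 100%, which is one reason why the overlap is read pair by pair rather than as a partition.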

Figure 7. Overlap between the three methods of semantic classification.
Finally, the flows between themes communities and citations communities appear roughly proportional to the size of clusters at origin and destination, suggesting that citations are rather independent of the vocabulary used in the articles. This is reflected in the quantitative analysis of correlations in the Supplemental material, where this pair has the smallest mean absolute correlation. In short, the words that count in a citation strategy are the keywords much more than the actual content of the paper. Together, these analyses show the complementarity of the classifications for exploring the semantic diversity of publications in a 20-year-old journal.
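To make the notion of mean absolute correlation between two classifications concrete, a minimal sketch is given here. It assumes that each classification is encoded as a binary membership matrix (one row per article, one column per cluster) and that the indicator is the average of the absolute Pearson correlations over all pairs of columns. Both the encoding and the names `pearson` and `mean_abs_correlation` are assumptions made for the example; the definition actually used is the one in the Supplemental material, and the toy matrices are not results from the corpus.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant column carries no information; treating it as
        # uncorrelated is a convention chosen for this sketch.
        return 0.0
    return cov / sqrt(var_x * var_y)


def mean_abs_correlation(matrix_a, matrix_b):
    """Average |correlation| over all (cluster of A, cluster of B) pairs.

    matrix_a, matrix_b: lists of rows, one row per article (same order in
    both), one 0/1 entry per cluster of the classification.
    """
    columns_a = list(zip(*matrix_a))
    columns_b = list(zip(*matrix_b))
    values = [abs(pearson(ca, cb)) for ca in columns_a for cb in columns_b]
    return sum(values) / len(values)


# Toy membership matrices for four articles and two clusters per method
# (hypothetical values chosen to show the two extreme cases).
themes = [[1, 0], [1, 0], [0, 1], [0, 1]]
keywords = [[1, 0], [1, 0], [0, 1], [0, 1]]    # same partition as themes
citations = [[1, 0], [0, 1], [1, 0], [0, 1]]   # unrelated to themes

print(mean_abs_correlation(themes, keywords))
print(mean_abs_correlation(themes, citations))
```

A value close to zero for a pair of methods corresponds to flows that are simply proportional to cluster sizes, as expected when the two classifications are independent.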
Conclusion: Fostering open science and reflexivity
The open tools and software we provide contribute to a larger effort to build reflexivity tools in the context of Open Science. They are intended to complement existing platforms, such as the Community Explorer for the Complex Systems community developed by ISCPIF, which provides an interactive visualisation of social research networks combined with semantic networks based on self-declared keywords provided by researchers. Another example, closer to what we developed, is Gargantext, which provides corpus exploration functionalities. Linkage is a similar tool with different methods, using latent topic allocation for networks with textual annotations (Bouveyron et al., 2016). Our approach differs from these by simultaneously exploring multiple dimensions of semantic classification and, more importantly, by adding the geographical aspect. Furthermore, in comparison to the various tools that private publishers are beginning to introduce, the open and collaborative nature of our work is crucial. For example, Bohannon (2014) suggests that one must remain careful when using search results from a popular academic search engine, as the mechanisms of the ranking algorithm, and thus its multiple biases, are unknown. The same holds for paid text-mining services provided by private companies: we suggest that a subtle synergy between knowledge content and knowledge production processes (to which closed tools are an obstacle) can be more beneficial to both.
We have studied the scientific corpus of a geography journal, combining multiple points of view through their embedding in geographical space. This work is therefore in itself reflexive, illustrating the kind of new approach to science it aims to promote. We believe that the open tools we develop in this context will contribute to the empowerment of authors within Open Science.
Supplemental Material
Supplemental material for Empowering open science with reflexive and spatialised indicators by Juste Raimbault, Pierre-Olivier Chasset, Clémentine Cottineau and Christine Kosmopoulos
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.