Abstract
The aim of this paper is to study the topographical features of a transportation infrastructure through graph theory. First, we construct a planar, connected, and simple graph for each considered infrastructure; then, we compute some normalized indices associated to the graph, namely largest eigenvalue, gap, a Betti number, and codimension. The set of indices proposed in this paper is new for this application. These indices are computed from either the adjacency matrix or the edge ideal of the graph, and so they depend on the overall topology of the graph itself; furthermore, since the normalized indices are scale-free, they allow us a more effective comparison between different transportation infrastructures. Two scenarios are considered in order to understand advantages and limits of the proposed approach: the first scenario concerns a set of underground networks of certain large cities in the world, whereas the second one concerns a set of bus transit networks of several medium-sized cities in Italy. Indices calculated for both scenarios show two types of results. First, they show that the proposed indices are able to estimate the different topologies of the considered networks: networks with the same number of vertices and of edges but not with the same graph have different indices. Second, they show that the values of the indices in the two scenarios not only belong to the same curve separately but fit well also into the same curve: the transportation networks, no matter whether underground or bus transit, seem to be controlled by similar mechanisms.
Introduction
Nowadays, it is standard practice to represent transportation networks as graphs. In fact, such representation of networks allows researchers to highlight the wanted features of the network among the many ones. When graphs represent transportation networks, nodes and links of the network usually become vertices and edges of the associated graph. The degree of detail in this representation is somewhat subjective because it depends on the scope of investigation.
In addition, as in literature many different indices have been associated to a graph, researchers can get a better and more comprehensive understanding of the features of the transportation network from the analysis of values of some indices associated to the graph. When the chosen indices are normalized, even the comparison of different networks becomes easier. Hence, the representation of transportation infrastructures with graphs, and the following computation of some associated indices, should aim at understanding the different behaviors of networks. Besides, the comparison of a new designed network with working ones could help planners to get a network that fully satisfies transportation needs. In this sense, graph indices can integrate planning factors such as demand, demography, geography, demand assignment, and costs.
In the present paper, which is mainly applicative and not theoretic or algorithmic, we want to investigate properties of underground and bus (and sometimes tramway) transit transportation networks that depend on the way a line crosses all the other ones. Then, we transform networks into undirected, planar, simple graphs by coupling a node to either a terminal stop or a transfer one (for us, a stop is either an underground station or a bus stop). With transfer stop, we mean a stop connected to at least three other stops. This choice affects the analysis related to land and service accessibility but not to the functional properties of the network. In the “The underground and bus transit graphs” section, we motivate the reason why we consider only undirected, planar, and simple graphs, and some consequences of our choice. Then, for each graph, we compute four indices, namely the largest eigenvalue of the graph, the gap, difference between the largest and the second largest eigenvalues, the codimension (codim, for brief), and the Betti number
The analysis of the four computed indices has two main purposes. The first objective is to understand to what extent the considered indices are suitable to analyze transportation networks. To understand this, we construct a new dataset consisting of 33 graphs associated to bus transit networks of medium-sized Italian cities and we compute the values the indices take. Then, we compare those values with the ones in the underground network dataset by Derrible and Kennedy, computed in Mussone and Notari (2017). For the sake of completeness, we report the values in both datasets. We stress that, while the underground dataset includes cities from five major world areas, the new dataset involves cities all lying in a geographically small area compared to the worldwide one. Moreover, the Italian cities date back to at least the Middle-Ages, and so their transportation networks, though redesigned in the last decades, had to be integrated with a settled urban network. Hence, there is no reason to believe a priori that the indices are equally suitable for both datasets.
Once we have clear evidence that the chosen scale-free indices are suitable to analyze transportation networks, our second objective is to merge the two datasets to get a more robust one, suitable for further investigation on transportation networks. To obtain this, we plot the indices on the same plane and we check that they complete each other, that is to say, the same curve fits both the datasets. We have to underline that those indices capture certain relationships between vertices and edges of a graph but probably not all the existing ones; to clarify that point, one has to push forward the present research.
At last, among the results of our research, we emphasize that no matter whether underground or bus transit, these networks present an increasing structure depending on similar principles, since we can think of the different sizes of the analyzed networks as the evolution over time of a city, according to Alexander (1965).
The paper has five other sections: The next section presents a brief literature review; the “The indices” section lists the indices proposed for the analysis; the “The underground and bus transit graphs” section describes the datasets used for the application; the “Results” section shows and discusses the results of the proposed indices and a comparison between underground transportation networks and bus transit ones. Finally, in the “Conclusions” section, we summarize the conclusions.
Literature review and paper aims
The study of transportation infrastructures via the computation of indices from the associated graphs began with the seminal papers by Garrison and Marble (1962, 1964). Those authors introduced three indices depending only on the number of vertices and edges of the graphs associated to networks: circuit α, degree of connectivity γ, and complexity β. Since then, such an approach has become very common, and many other indices have been proposed and discussed by applying them to various datasets.
Among many others, we quote the papers by Derrible and Kennedy (2010a, 2010b, 2012), also because we re-analyzed their dataset, consisting of graphs associated to underground networks, in Mussone and Notari (2017). In their papers, Derrible and Kennedy consider 33 urban rapid rail transit networks of varying sizes, whether at grade, underground, or elevated, currently working in five major world areas (North and Latin America, Europe, Africa, and Asia). They transform the networks into graphs by coupling a node to each transfer station, where transfer station is understood to be either a station connected to at least three other stations, or a terminal one. We adopt the same procedure to construct a graph from a transportation network. The indices considered by Derrible and Kennedy are the number of vertices; the number of edges; the network diameter, i.e. the total number of edges linking the two furthest vertices using the shortest path; and the complexity β and the degree of connectivity γ, both by Garrison and Marble (1962). They also consider indicators depending on physical characteristics of the network, such as edge lengths. A further development of this work can be found in Wanga et al. (2017).
For completeness, we stress that, in Mussone and Notari (2017), more indices were considered: in addition to the four ones considered in this paper, we took into account the number of vertices, edges, the indices
Gattuso and Miriello (2005) face the problem of a topological and geographical evaluation of metro networks through a set of indicators, at graph and geographical levels. At graph level they use, other than the ones by Garrison and Marble (1962):
the local degree, i.e. the number of concurrent links on the generic node; the number of destinations directly connected by i nodes; the number of nodes; generally it does not correspond to the number of network stations (two or three different line stations, placed at the same point, are represented by a unique network node); the number of links (a link covered by several lines is counted only once); the number of cycles.
A list of parameters at geographical level is also proposed. Actually, this study is notable for the development of two indicators at geographical level: node’s range of influence (to differentiate the attractiveness of a node) and network covering (circular surfaces defined by the node’s range of influence that can be placed one upon another).
Gastner and Newman (2006) find some clear patterns that imply a connection between network structure and geography by analyzing empirical data and overcoming past limitations on computer resources. Blumenfeld-Lieberthal (2009) analyzes the topology of transportation networks within different systems of cities using their characteristics as indicator of economic activity between them.
Xie and Levinson (2007) propose to calculate three parameters, namely heterogeneity, connection pattern, and continuity, to measure structural characteristics of road networks. Heterogeneity is derived from the probability pk that a vertex is connected to other k vertices and it is proportional to
Ducruet and Lugo (2013) propose a discussion about structure and dynamics in transportation networks whose static dimension is analyzed with regard to topology, geometry, morphology, and spatial structure from a global and local point of view, according to the required application (e.g. looking for interconnectivity or vulnerability, respectively). Kurant and Thiran (2006) extract the topology of mass transportation networks from their timetables. Many other papers focus on dynamic features (mainly the growth) of transportation networks (see, e.g. Xie and Levinson, 2009a, 2009b), so many and so specialized that this topic could deserve a dissertation apart.
In Estrada (2012) and Estrada et al. (2012), the authors study how much information is contained in the communicability distance matrix
Finally, we quote Alexander (1965), as the conclusions in that paper have been recently recovered by Derrible (2016). In Alexander (1965), the author argues that when a city grows organically, its parts are highly interconnected. As we consider urban transportation networks, we expect the associated graphs to be so complex that they merit a detailed analysis.
The indices
In this section, we introduce the indices we use to analyze graphs. To allow for an easy comparison between indices computed for graphs with a different number of vertices or edges, we normalize indices with respect to some maximizing functions, empirically computed in the class of undirected, planar, connected, and simple graphs. As in Section 2 of Mussone and Notari (2017), we divide the indices into two classes, according to the techniques needed to compute them: linear algebraic and algebraic indices. The linear algebraic indices are computed from the adjacency matrix of graphs via linear algebra, whereas the algebraic indices are computed from the edge ideal of the graph via the algebra of polynomial rings. All indices, then, depend on the topology of the graph itself.
Let
Now, we describe the algebraic indices. An edge is a couple of vertices, the endpoints of the edge. We say that the endpoints of the edge meet the edge itself. The codimension of a graph is the least number of vertices needed to meet all the edges. Of course, if we take all the vertices, they meet all the edges. However, in general, a proper subset of vertices is enough to meet all the edges, and so the codimension is smaller than
The whole set of indices is the following (“Linear algebraic indexes” section).
Linear algebraic indexes
λv
Gap g
It is the difference between the largest eigenvalue and the second largest one of
Normalized largest eigenvalue
It is the normalized largest eigenvalue of
Let
if
The adjacency matrix of
Normalized gap of G
It is the normalized gap of
Let
if
if
if
if
The adjacency matrix of
Algebraic indices
Codim(G)
It is the codimension of the graph
b21(G)
It is an algebraic Betti number of
Normalized codimension of
It is the normalized codimension of
If the number of vertices is odd, i.e.
Normalized
It is the normalization of
As in the previous cases,
Our study focuses on the values of the above described four normalized indices.
In Table 1, we report the values of the indices for the graphs in Figure 1, and the maximum values of the indices for

Three graphs with
Values of indices for the graphs in Figure 1.
The underground and bus transit graphs
Before describing in detail the two datasets taken into consideration, we debate on the statistical significance of the datasets. While it is evident how many networks belong to each dataset, it is hard to estimate the total number of transportation networks (i.e. the universe) that we have to consider. Hence, we do not know what percentage of the universe corresponds to our datasets. Now, we list some reasons to explain why it is hard to estimate the universe. A first fact is that many cities, but not all, have a transportation network in the sense we are considering. Second, there are networks that serve areas containing several cities, usually a large one with a few small satellites. Finally, there are networks connecting different and very crowded areas. All of them are transportation networks but they work on different scales. We are interested in local transportation networks, and so we ought to count the networks of the first type but also some networks of the second type, if not too big. Again, it is optional to include or exclude some of them. Moreover, it is reasonable that some of them are essentially the same network even if on a different scale or in different world areas, and so they have to be counted only once. In conclusion, it is not possible to estimate how big the universe is. A complete different approach could be the following one. As networks are represented by graphs, we can consider all the possible graphs. However, real networks are represented by a relatively small number of graphs if compared to the set of all graphs that can be constructed with the same number of vertices and edges. This remains true even if we change the number of details we consider when passing from networks to graphs. Then, if one considers all possible graphs, information on real networks is limited. A refined strategy could consist in selecting the graphs that do represent transportation networks. To implement such a strategy, however, we need to characterize such graphs. Unfortunately, up to now, in literature, no one has proposed such a characterization. Finally but not least, real networks dynamically change very rapidly, and so a graph represents a real network only for the time the considered network does not change. In conclusion, we are not able to evaluate the statistical significance of our datasets. However, we are sure we can consider the actual transportation networks as expressions of similar processes, reducing dramatically our universe, and so the proposed datasets are suitable at least for inferring information about the universe.
A further remark concerns the restriction on graphs we associate to networks. In fact, we only consider undirected, planar, connected, and simple graphs. While every transportation network is connected, the planarity requirement could not be achieved for all networks, even if it generally does for the ones we took into account. In general, we assume there is a node every time two or more lines cross. The choice of considering undirected and simple graphs limits the network properties we can deal with. For example, we cannot consider physical network characteristics, such as the performance of a link (length, travel time, fare, or other costs), as well as interaction with demand. In fact, to capture such properties, one needs weighted, directed, and non-planar graphs. The definition of the proposed indices for such extended graphs deserves much further theoretical consideration and we are dealing with them for a following study. Finally, we remark that, as a byproduct of the procedure we use to construct a graph from a network, degree 2 vertices are very rare. The choice of not considering degree 2 vertices is justified by the fact that they do not modify the topology of the graph, but, of course, it is an information that can be taken into consideration when studying weighted graphs.
Now, we describe the two datasets in detail.
The first set consists of 41 graphs representing underground networks, from the largest cities worldwide (Table 2). The second set contains 33 graphs representing bus transit local transport networks for 38% of Italian cities with a population in the range of 90,000–200,000 inhabitants (Table 3). Though both samples can be considered statistically significant with respect to the represented universe, they were limited only by the availability of transportation network maps.
Number of vertices, edges, and some results for the graphs of the first scenario – Underground graphs of some worldwide cities (norm. stands for normalized).
Number of vertices, edges, and some results for the graphs of the second scenario – Bus transit local transport graphs of some Italian cities (norm. stands for normalized).
In the first scenario, there are 41 graphs (Table 2). Thirty-three of these graphs were built up by Derrible and Kennedy (2010a) to represent functionally the underground networks of certain large cities worldwide. Variants of eight graphs from the above ones are also taken into consideration (indicated with the label “modified”) and added to the list. In the paper by Roth et al. (2012), 14 networks are used, and in that above-mentioned by Derrible and Kennedy (2010a) they are 33.
For the second scenario, we built up 33 graphs to represent functionally the bus transit local transportation networks for some medium-sized Italian cities (Table 3). For some of those cities, the available data were the list of the main stops of each bus line. In those cases, we superimposed the list to the city map and we considered as nodes the junctions with at least three entering/exiting connections, or terminal stops, as in the first scenario.
As indicated in the “Introduction” section, this new dataset consists of transit networks of cities all lying in a geographically small area if compared to the other dataset. Moreover, the considered cities date back to at least the Middle-Ages, and so their transportation networks, although designed in the last decades, had to be integrated with a settled urban network.
The graphs representing underground networks have mostly a number of vertices less than 26, and the others are scattered in the range 30–131. Moreover, graphs with the number
Considering both datasets together, however, the number of edges is in the range
The considered cities in the first scenario are much more populated than those in the second scenario. Nevertheless, the associated networks have a comparable number of nodes and links, due to the cost of building, managing, and maintaining a complex underground network with respect to a bus transit one.
We computed the indices proposed in the “The indices” section for both the datasets. The computation of the linear algebraic indices was performed by using MatLab©, while the algebraic indices were computed by using Singular (Decker et al., 2015). In Tables 2 and 3, for the sake of brevity, we report only the normalized indices. In fact, the normalized indices are more suitable for comparing graphs that have different numbers of vertices and edges.
Results
Tables 2 and 3 show that the values of the indices, when considered all together, vary significantly between graphs of the same set. The only exception is Lyon and Montreal, but their graphs are the same, once the vertices are re-labeled. Hence, the considered indices are suitable for detecting the different topologies of the analyzed graphs. The indices show more or less a monotonic trend with respect to the number of vertices. In fact, the largest eigenvalue, the gap, and the degree 1 algebraic Betti number are clearly decreasing, while codim is increasing in the first scenario and constant in the second one.
Now, we begin to analyze the data of Tables 2 and 3. We observed in the “The underground and bus transit graphs” section that the number of vertices of the graphs in the first scenario is not homogeneous. In order to understand how much this affects results, we cluster the graphs having the number of vertices in [19, 19 + p − 1], in [19 + p, 19 + 2p − 1], in [19 + 2p, 19 + 3p−1], and so on until it reaches 79, the largest number of vertices for graphs in the second dataset, with p = 2, 5, 10. We perform an analogous clustering also on the first dataset. For each subset, we compute the averages of the indices. By comparing those numbers, it is easy to see that clustering acts only as a low-pass filter and trends present in data do not change. As expected, we get a smooth interpolating curve when p grows.
To confirm that the proposed indices are suitable for detecting different topologies, we perform a further analysis. We compare three graphs of the second scenario (the cities of Bergamo, Parma, and Reggio Emilia), with 50 vertices each, but with different numbers of edges: 62, 65, and 65, respectively. The comparison confirms that the proposed indicators are capable of pointing out differences between the topologies of these graphs as shown in Table 4, where values are re-normalized according to their maximum column values for easy comparison. Although the three considered cities have graphs with the same number of vertices and two of them also have the same number of edges, the indices are different. The Parma graph emerges as the most complex one. By analyzing the distribution of node degrees (Figure 2(a)), we can observe that the Parma graph has the highest number of degree 6 vertices (i.e. with node degree equal to 6) which indicates a more complex local structure. By analyzing also the minimum spanning trees of these graphs, we observe some differences, particularly Parma spanning tree differs from the other two for its number of levels which is lower (9 versus 11). That denotes the presence of higher degree vertices. Also its diameter is lower. However, we note that the node degree or distance, considered alone, is not enough to render the complexity of a graph, because it is easy to construct trees (i.e. the simplest graphs) with many vertices of very large degree.

Distribution of vertex degree for the three graphs with 50 vertices of the second scenario (a) and for the three graphs with 43 vertices from both scenarios (b); the number in brackets below city name represents the number of edges.
Normalized indices for the three graphs with 50 vertices of the second scenario, and for the three graphs with 43 vertices from both scenarios,
We perform another analysis of the same type on three other graphs, one from the first scenario and two from the second one, namely the cities of Moscow, Bolzano, and Lecce, with 43 vertices each, and with a different number of edges: 66, 69, and 52, respectively. As shown in Table 4, also in this case the indices show some differences. The Moscow graph turns out to be the most complex. The node degree distribution of Moscow has the highest number of degree 5 vertices, while the number of degree 6 vertices is the same for Moscow and Bolzano (see Figure 2(b)). It holds also for this triplet of graphs that the minimum spanning tree of Moscow has the lowest number of levels and also its diameter is the lowest.
It must be pointed out that these two last analyses have to be considered only as indicative because of the limited number of cases they include, and the many spanning trees a graph has. Despite this statistical weakness, however, sensitiveness to all the indices cannot be denied. In conclusion, we have clear evidence that the four scale-free proposed indices are suitable for analyzing networks.
To compare the values of the indices for the two scenarios, we plotted them together, getting a diagram for each normalized index versus the number of vertices (Figure 3). The four pictures highlight the different distribution of vertices for graphs of the two datasets. In fact, as remarked in the “The underground and bus transit graphs” section, the second scenario (the bus transit one) has the most number of graphs concentrated in the central range of the considered number of vertices, whereas the first scenario (the underground one) is more scattered over the whole range. Nevertheless, the two sets of data actually complete each other, as Figure 3 shows, by arranging data for each index around a hypothetic fitting curve. Repeating the above procedure versus the number of edges, we get very similar results. Hence, we can merge the two datasets, no matter what type of network they represent, and we get a more robust database. This might be useful for investigating whether or not a graph represents a transportation network: if the values of the indices of the graph are comparable to the ones of the dataset then the graph can be associated to a transportation network.

Relationships between the number of vertices and normalized codim (a), b21 (b), gap (c), and λv (d) for graphs of underground and transit networks.
In order to investigate the best regression curve for each of the four normalized indices, we compute the fitting curve first on separated datasets and then on the joint dataset. The null hypothesis of both computations is that data in datasets belong to different population. We used curves belonging to different families: exponential, rational, and polynomial curves. The best performance (based on R2 value) is obtained by using a double exponential curve
This result suggests two considerations. First, the way a transit network changes when a new vertex or a new edge is added is quite well defined. In fact, it is enough to use the fitting curve to predict the values of the indices for a new configuration. What is still difficult to assess is how the new vertex is connected even if (or even though) the range of possibilities is likely to be very wide. Second, the two sets of transit networks behave similarly with respect to the number of vertices and edges of the associated graphs. Probably, this is due to the fact that the proposed analysis on topology is essentially scale-free and so it points out only some aspects of the transportation networks.
Conclusions
The present paper can be included in the vast literature aimed at studying transportation networks via the computation of indices of the associated graphs. We propose a set of scale-free indices to measure the differences in topology of graphs associated to transportation networks. We compute the values of the indices for graphs belonging to two different technological scenarios: the graphs of the first set represent underground networks, while the graphs of the second set represent bus transit networks. The underground set refers to underground systems of some relevant cities in the world, whereas the bus transit set contains the transit lines of some medium-sized Italian cities (mainly served by buses but also by tramways).
Particular care is employed to normalize indices by using the maximum value we can get from undirected, planar, connected, and simple graphs with exactly the same number of vertices and edges.
The analysis of the values of the indices for both scenarios shows that they vary even if the graphs have the same number of vertices and/or edges. Hence, we can use them to evaluate the differences, whereby edges connect vertices.
We also observe a relationship between node degrees and the values of indices in the sense that a graph with the highest node degree, among graphs with the same number of vertices and edges, has also the highest values of indices and vice versa. Moreover, the normalized values of the indices are well fitted by double exponential curves for both datasets and their union. The R2 fitting index is greater than 0.80 for all indices except for codim which has a higher dispersion of values and a R2 not greater than 0.37. This deserves further investigation but it can be considered a prospective merit since data dispersion is mainly due to differences reported by the index for different topologies.
Finally, we can conclude that the two datasets can be merged together in order to build a more robust/significant one, suitable for deciding whether a generic graph can be considered a possible representation of a transportation network.
From the topological point of view, our results prove that the two types of networks are built, more or less, by using the same underlying criteria. In fact, the list of an increasing number of vertices and edges could be thought of as a time modification of a transportation network, no matter what type it is. In particular, this result shows that the structure of transportation graphs tends to be restricted to a few basic topologies and not to the most complex ones involving an equal number of vertices and edges. We could consider the sets of considered graphs, from the smallest to the largest one, like a possible way a network evolves in time, though in our application we have not taken into account temporal dynamics.
This result seems very close to the one published by Roth et al. (2012). However, the limitation to a few “dominant mechanisms governing the evolution of these structures” emerges quite clearly from the fact that index values lie around the same curve.
Our last comment concerns a possible definition of a transportation graph, that is to say, a graph that could represent a transportation network from an abstract point of view. From the cases under investigation, such a graph is undirected, connected, planar, and simple, with a number of edges in the range
The extension of this approach to direct and/or weighted graphs needs further development and research. Direct or weighted graphs will allow us to take into account other crucial features associated to transportation graphs, such as e.g. length, travel time, costs, number of stops, speed, frequency, potential and satisfied demand, real estate values of buildings close to lines, and so on. On the other hand, the adjacency matrix of a direct graph is no longer symmetric, and so its eigenvalues may be not real. The algebraic indices, too, need to be redefined if they are to be used in this new setting.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
