Stated Preference Analysis With Latent Variables for Higher Speed Rail

Abstract

The diversification of transportation systems has recently made it necessary to redefine the mode selection model (modal split model). In addition to socio-economic factors, preference factors and user characteristics significantly affect the choice of transportation mode; therefore, user preferences should be reflected in the modal split model. The latent class model is highly descriptive and can improve the fit of the modal split model. With the advent of high-speed rail (Great Train eXpress [GTX]) in Seoul, a competitive and complementary relationship in passenger demand is forming between the general and express rail. Therefore, a modal split model that reflects the characteristics of the GTX, redefined by applying a latent variable to the stated preference analysis, is necessary. Latent class analysis uses the Bayesian information criterion and log-likelihood to distinguish the travel property (Cluster 1), station property (Cluster 2), and transfer property (Cluster 3). After comparing the time values for each cluster, the GTX preference is analyzed for the inner and outer circle situations. A model that accurately reflects the characteristics and preferences of passengers is proposed in this paper. In the future, strategies can be established for the inner and outer circle situations, and higher operational efficiency can be achieved by combining the GTX and subway.

Keywords

planning and analysis, choice models, mode choice, public transportation, model, multimodal, rail transit systems, urban

Existing modal split models are calculated with travel time, travel cost, and socio-economic factors as the main variables. However, the advent and diversification of new transportation modes has recently made it necessary to redefine the modal split model. In this study, an evaluation technique used by the current government is applied to the modal split model, which is primarily used to predict transportation demand. Various studies are incorporating additional variables so that the current modal split model, which reflects only general socio-economic indicators, represents reality more closely ( 1 , 2 ). Typically, not only individual socio-economic attributes but also individual preferences significantly affect the choice of transportation mode ( 3 , 4 ). Therefore, models that use variables reflecting user preferences as well as socio-economic factors need to be developed. The latent class model is descriptive and can improve the fit of the modal split model ( 5 ). The Great Train eXpress (GTX) is currently under construction in Seoul as a new high-speed rail. The GTX connects major hubs in Seoul and Gyeonggi-do within 30 min. It is similar to Crossrail in London, the regional express rail in France, and the Réseau express métropolitain (REM) in Canada. Crossrail increases London’s rail capacity by 10% and carries approximately 1500 passengers per train; furthermore, it brings 1.5 million people within 45 min of central London ( 6 , 7 ). Meanwhile, Canada’s REM aims to provide rail access to the Montreal International Airport, which was previously reached by car or taxi ( 8 ). Likewise, the GTX can enhance people’s quality of life by improving the transportation systems in Seoul, thereby alleviating chronic traffic problems and reducing travel time. 
With the advent of the GTX as a new transportation system, passengers are expected to prefer the subway and the GTX over other public transportation modes. Therefore, competitive and complementary relationships between the subway and GTX will be formed. For regional railways, a modal split model was established to distinguish between subway and high-speed rail. However, Seoul’s railway modal split model cannot distinguish different transfer effects, and it applies the same variables to both of the above-mentioned transportation systems. Consequently, spatial correlations might not be reflected accurately ( 9 – 11 ).

Consequently, applying the subway modal split model to the GTX can result in errors, and a modal split model that reflects the characteristics of a high-speed rail needs to be developed. In this study, the modal split model for the GTX was redefined using latent class analysis (LCA), which can reflect individual characteristics and preferences. The share rates of the outer and inner circles were obtained based on a stated preference (SP) survey about changes in the use of the metropolitan railway because of GTX operation. The Bayesian information criterion (BIC), a criterion for model selection, and the log-likelihood (LL) value, which indicates the fit of the model, were evaluated, and the latent variables were derived. The modal split model was applied in a few case studies via LCA. In addition, the application to the GTX enabled a comparison of preference differences among public transportation modes and of in-vehicle time values by distance.

Literature Review

SP Survey and Modal Split Model Implementation via LCA

In previous studies, models have been developed in various fields by applying preference factors to LCA, and in many cases LCA has been applied to the modal split model. Galdames et al. ( 3 ) surveyed commuters to acquire socio-economic and preference factor data and to investigate the role of preference factors in the modal split model. In their study, LCA based on path analysis allowed both preference factors and socio-economic data to be used in the discrete choice model. People’s travel behavior and mode choice can be explained by treating preference properties as a latent variable, and the consideration of latent variables demonstrates the practical importance of travel costs and service levels in a person’s decision-making process ( 3 ). Madanat et al. ( 12 ) applied LCA to identify the factors that affect whether a driver considers a detour when encountering traffic congestion. They found that the attitude toward path switching and the reliability of information provided by radio traffic reports or changeable message signs are important explanatory variables for path switching ( 12 ). Wen and Lai ( 13 ) reported that the latent class model performed better than the existing multinomial logit (MNL) model. In addition, they explained that if individual socio-economic and travel characteristics are included as variables, the fit of the latent class model can be further improved ( 13 ). In another study, Wen et al. ( 14 ) investigated railway access mode choice using survey data obtained from Taiwan. They proposed LCA, and the results showed that the proposed model explained both alternative patterns and preference heterogeneity better than other models. They identified user preferences at the individual level using a modal split model based on socio-economic and travel characteristics. 
In addition, the suitability of LCA was investigated, and highly explanatory results were obtained ( 14 ). Tawfik and Rakha ( 15 ) reported that choice behavior differs by individual and that a segmented model should therefore be developed. Accordingly, they presented a route-choice model for drivers via LCA. The classes of the model were defined based on the demographics, personality, and choice characteristics of 20 drivers. In addition, it was reported that the route-choice model using LCA performed better than the hierarchical behavior model ( 15 ). Ben-Akiva et al. ( 5 ) modeled preference factors as a latent variable affecting the decision-making process. Comparing the choice model with and without the latent variable (preference factors), they found that integrating the latent variable improved the fit of the choice model significantly ( 5 ). Prato et al. ( 16 ) reported that modeling the process by which each individual considers alternatives when selecting a path is complex. Therefore, they proposed a methodology that applies factor analysis to path selection; individual behavior data were acquired from a survey of commuters and subsequently analyzed ( 16 ). Walker ( 17 ) explicitly incorporated an atypical concept as a preference indicator to develop a more realistic behavior model. Behavioral researchers have emphasized the importance of factors that affect behavior, such as situation, knowledge, and attitude ( 17 ). In addition, conceptual and methodological frameworks for integrating the preference factors that affect decision making into the choice model as explanatory variables were developed by modeling these factors as latent variables. Afghari et al. ( 18 ) proposed a latent class model that is consistent with “multiple-risk process” theory, based on the geometric characteristics of a road, the spatial characteristics of the surrounding environment, and driver behavior factors. A methodological approach for estimating the Bayesian latent class model was proposed considering engineering and unobserved spatial factors. This implies that the latent class model can consider spatial correlation ( 18 ). The current modal split model integrates in-vehicle and out-vehicle times within the overall travel time. Within this framework, it is effective to separate the transfer effect and spatial correlation into different factors and analyze them simultaneously. Therefore, LCA is more descriptive than other models because it considers various factors and preferences. In addition, the latent variable is derived primarily from satisfaction surveys to analyze unobservable preferences and individual characteristics.

LCA is used not only in the modal split model, but also in the transit and freight sectors. Cerwick et al. ( 19 ) developed a model using mixed logit and LCA to investigate the relationship between the severity of large truck crashes and their contributing factors; subsequently, they investigated the differences between the models. After comparing the models, the latent class model was discovered to be more suitable. In transportation and freight, a survey was conducted to investigate the LCA and people’s preferences ( 19 ). Román et al. ( 20 ) analyzed preferences when performing mode choices for freight transport. Kim et al. ( 21 ) conducted a study using preference surveys from 190 New Zealand freight forwarders and companies to understand the decision-making process when making mode-choice decisions. Based on LCA, mode-choice decisions differ depending on the transport distance and shipment size. Therefore, LCA is used in preference and SP surveys. In addition, mode-choice decisions can be achieved appropriately using preference surveys and LCA.

LCA Using Structural Equation Modeling

A study was conducted to identify the relationship between the latent variable and the variable available in the LCA through structural equation modeling (SEM). Hurtubia et al. ( 22 ) proposed a method where preference metrics are introduced to the general discrete choice model, which considers only quantitative variables, such as travel time and cost. Recognizing the importance of users’ lifestyles, attitudes, and perceptions, a model was developed using a structural equation ( 22 ). Kim et al. ( 23 ) reported that structural equations best describe the underlying relationship between variables, and they analyzed the severity of collisions based on factors contributing to collision accidents. The standardized coefficient from the LCA is useful for assessing the relative importance of latent factors to the severity of collisions ( 23 ). Wen et al. ( 24 ) developed a structural equation to determine passenger loyalty to intercity bus services and explained the causal relationship between latent factors. Outwater et al. ( 25 ) predicted an expanded ferry service by adding variables to consider the importance of passengers’ attitudes and various markets in mode-choice modeling. Therefore, SEM was performed, which rendered it easy to identify the causal relationship between traffic behavior and the socio-economic profile of passengers ( 25 ). Wang and Qin ( 26 ) analyzed the relationship between factors that contribute to the severity of collisions and the severity of a single vehicle crash via SEM. This is because the complex relationships between variables can be investigated by simultaneously processing both endogenous and exogenous latent variables ( 26 ). Spada et al. ( 27 ) hypothesized a theoretical route for analyzing the relationship among COVID-19 (Coronavirus), population density, and climate. In addition, it was tested via SEM, a statistical technique, for correlation analysis. 
Consequently, climate factors and population density were discovered to be correlated with COVID-19 ( 27 ). Al-Mahameed et al. ( 28 ) acquired more than 60 explanatory variables pertaining to pedestrian and bicycle collisions on highways to establish relationships between these collisions and the explanatory variables. SEM that incorporated exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) was performed. Consequently, important latent variables (e.g., road and degree of activity) that affected the frequency of collisions were revealed ( 28 ).

Methods

Introduction of the SP Survey

The SP technique can be defined as a family of techniques that construct hypothetical scenarios via statistical experimental design and elicit individual preferences for them ( 29 ). In other words, it investigates the preferences and intentions that an individual holds in a hypothetical situation. The SP survey presents each respondent with alternatives based on hypothetical situations. A hypothetical alternative is expressed through explanatory variables that represent its services or characteristics. The alternatives are generally composed using experimental design, and respondents rank them in order of preference, rate each of them on a scale, or choose the preferred one. This SP survey technique allows the independence of the data to be maintained: with experimental design, correlation between attributes can be excluded, and questions without interaction between two variables can be designed and presented to the respondents. In addition, data about situations that do not yet exist can be obtained. For example, the demand for a new transportation mode that does not exist in the current market can be estimated by presenting fares and travel times for the hypothetical mode according to the experimental design. Moreover, when SP data are used, the values of the attribute variables are determined in advance and presented to the respondent. This affords the advantage of no measurement error in the attribute variables when inputting data or constructing a model.

Theory of Relevance

The modal split model, based on the existing SP survey, was calculated using the out-vehicle, in-vehicle, access, and wait times. However, a modal split model using invisible user characteristics has been developed recently. In this study, the modal split model was redefined using LCA, which applies latent variables to the SP survey.

A latent variable is an intrinsic characteristic that cannot be measured directly, such as the preference underlying a respondent’s answer to a survey question; that is, it is a variable with latent characteristics that cannot be observed. LCA is modeled based on these latent characteristics. It can reveal people’s travel behavior by reflecting their preferences in mode choice, and it can form clusters using various exogenous variables, such as demographics. Furthermore, it improves the explanatory power of the model by reducing the error caused by unobserved factors and fits better than other models ( 3 , 17 ). This enables a reasonable interpretation by analyzing the behavioral preferences of users. Therefore, efforts to treat preference properties as latent variables have continued, and studies that use latent variables in the mode choice process are currently in progress. Owing to these advantages and characteristics of LCA, a model was developed by adding a latent variable to the existing modal split model. Previously, travel time, travel cost, and socio-economic factors were used; in this study, the model was redefined by additionally considering individual characteristics, as shown in Figure 1.

Figure 1.

Schematic illustration of the modal split model.

An SP survey is intended to collect personal preferences or opinions about specific transportation systems and behaviors in hypothetical situations, serving as a basis for preference surveys about new transportation choices in the future. An SP survey exhibits the following characteristics.

Provides respondents with an option based on hypothetical situations.

Each alternative attribute is presented by the analyst based on the respondents’ current situation.

Each alternative configuration is combined based on the experimental design.

A hypothetical alternative is expressed as a service or characteristic of the explanatory variable.

Respondents express their preferences in one of three forms: choice, ranking, or rating.

Based on the existing case, the SP survey requires 75–100 samples in each group ( 30 ). When r responses are obtained from an individual, the required number of samples (n) is divided by the number of questions and determined by the actual number of samples. The minimum number of samples is shown in Equation 1 if multiple responses are obtained from n respondents r times ( 31 ):

n \geq \frac{q}{r p a^{2}} \left[ Φ^{- 1} (\frac{1 + α}{2}) \right]^{2}

(1)

where n is the number of samples; $r$ is the number of responses per respondent; $p$ is the share rate of cars; $q$ = $1 - p$ ; $a$ is the relative error; $α$ is the confidence level; $Φ^{- 1}$ is the inverse of the cumulative standard normal distribution function.

Table 1 shows the minimum number of samples based on the proportion (p) of the population at the 95% confidence level, relative errors of 5% and 10%, and number of multiple responses (r).

Table 1.

Minimum Number of Samples Based on Proportion of Population (p), Relative Error (a), and Number of Questions (r)

Category	Number of samples (relative error $a = 0.05$ )		Number of samples (relative error $a = 0.10$ )
$p$	$r = 1$	$r = 9$	$r = 1$	$r = 9$
0.10	13,828	1536	3457	384
0.15	8707	967	2177	242
0.20	6146	683	1536	171
0.25	4609	512	1152	128
0.30	3585	398	896	100
0.35	2853	317	713	79
0.40	2305	256	576	64
0.45	1878	209	469	52
0.50	1536	171	384	43
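
The entries of Table 1 can be checked numerically. The following Python sketch evaluates the sample-size bound under the assumption that the normal quantile is taken at the 95% confidence level and squared, which is the reading that reproduces Table 1 to within rounding. The function name `min_sp_sample_size` and its argument names are illustrative and are not part of the original study.

```python
from statistics import NormalDist


def min_sp_sample_size(p, r, a, confidence=0.95):
    """Lower bound on the number of respondents for an SP survey.

    p: expected share of the alternative of interest
    r: number of responses collected per respondent
    a: allowed relative error of the estimated share
    confidence: confidence level (assumed to enter through the normal quantile)
    """
    q = 1.0 - p
    # Two-sided standard normal quantile, about 1.96 at the 95% level.
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    return q / (r * p * a ** 2) * z ** 2


if __name__ == "__main__":
    for p in (0.10, 0.35, 0.50):
        n1 = min_sp_sample_size(p, r=1, a=0.05)
        n9 = min_sp_sample_size(p, r=9, a=0.05)
        print(f"p={p:.2f}: r=1 -> {n1:.1f}, r=9 -> {n9:.1f}")
```

For $p = 0.35$, $r = 9$, and $a = 0.05$, the sketch gives roughly 317, in line with the corresponding entry of Table 1.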

The design process is illustrated in Figure 2. The purpose of investigating transportation preferences was to create a modal split model that reflects the characteristics of the new transportation system; the survey targets and number of samples were determined accordingly. Next, the response format was selected from choice, ranking, and rating. Choice is the most realistic SP response format and is widely used in transportation; therefore, it was used in this survey. When SP techniques are applied in the general transportation sector, route-choice and mode-choice situations are assumed. The demand for routes and modes was estimated by presenting routes and modes with different travel times and costs, and the willingness to pay for reduced travel time was estimated. Subsequently, the attribute variables and levels were determined. A variable that describes a choice alternative is known as an attribute variable. After determining the choice situation and the number of alternatives, the attribute variables describing each alternative were set. In this study, the attribute variables were the travel time (out-of-vehicle time, in-vehicle time, and total travel time) and travel cost (fare and operating cost). The levels of each variable were then set by varying the value within a range around the reference level. Experimental design is a technique that economically identifies the optimal conditions of a product by selecting several factors that affect the characteristics of transportation and conducting experiments to identify their relationships. In this case, the survey used questionnaires based on a 16-run orthogonal array with 14 two-level attributes ( 31 ). In this study, the number of clusters was set to add a latent variable, and SEM was performed to establish relationships between variables. Statistical significance was confirmed using the BIC and LL.

Figure 2.

Design process with stated preference and latent class.

The latent class model is represented by the following equation. In this study, SP investigation was conducted, followed by LCA using Equation 2:

f (y_{i} | z_{i}^{COU}) = \sum_{x = 1}^{K} P (x | z_{i}^{COU}) \prod_{h = 1}^{H} f (y_{ih} | x, z_{i}^{COU})

(2)

where x is the latent class; K is the number of latent classes; H is the number of response variables; $f (y_{ih} | x, z_{i}^{COU})$ is the probability density of response h of individual i, conditional on the latent class and the covariates; $P (x | z_{i}^{COU})$ is the probability of belonging to latent class x given the covariates; $f (y_{i} | z_{i}^{COU})$ is the probability density of the observations given the covariates ( 32 ). Furthermore, when applying the latent class model to a discrete choice model, it was assumed that choice behavior is attributable to the decision maker’s latent behavioral preferences. Differences in these preferences were associated with lifestyle, attitudes, and transportation preferences. In addition, various responses associated with personal behavior were analyzed and defined as latent variables. In this case, the utility that individual n in class s derives from transportation alternative i can be expressed as shown in Equation 3:

U_{in}^{s} = V^{s} (X_{in}, Z_{n}, β^{s}) + ε_{in}^{s}

(3)

where $V^{s}$ is the deterministic part of the utility function; $X_{in}$ is the vector of attributes of alternative $i$ ; $Z_{n}$ is the vector of the characteristics of an individual $n$ ; $β^{s}$ is the vector of the parameters; and $ε_{in}^{s}$ is the random component accounting for unobserved attributes and characteristics.

For example, the probability that an individual (n) belongs to a latent class (s) and selects transportation (i) is expressed as shown in Equation 4:

P_{n} (i | s) = \frac{e^{V^{s} (X_{in}, Z_{n}, β^{s})}}{\sum_{j \in C_{s}} e^{V^{s} (X_{jn}, Z_{n}, β^{s})}}

(4)
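
As an illustration of Equation 4, the following Python sketch computes class-conditional logit probabilities. It assumes a linear-in-parameters form for $V^{s}$ and omits the individual characteristics $Z_{n}$ for brevity; the attribute values and coefficients are hypothetical placeholders rather than estimates from this study, and the function names are illustrative.

```python
import math


def linear_utility(attributes, beta):
    """Deterministic utility V^s, assumed linear in the attributes."""
    return sum(beta[name] * value for name, value in attributes.items())


def class_choice_probs(alternatives, beta):
    """Logit probabilities P_n(i | s) over the choice set of one class."""
    utilities = {alt: linear_utility(x, beta) for alt, x in alternatives.items()}
    v_max = max(utilities.values())  # subtract the maximum for numerical stability
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}


if __name__ == "__main__":
    # Hypothetical attribute levels (minutes, won) and class-specific coefficients.
    alternatives = {
        "Subway": {"time": 60.0, "cost": 2000.0},
        "GTX": {"time": 30.0, "cost": 4000.0},
    }
    beta_class = {"time": -0.05, "cost": -0.0004}
    for alt, prob in class_choice_probs(alternatives, beta_class).items():
        print(f"{alt}: {prob:.3f}")
```

Each latent class would carry its own coefficient vector, so the same choice set can yield different shares across classes.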

Because the class to which an individual belongs is unknown, reverse inference is performed based on individual characteristics, as shown in Equation 5:

F_{ns} = f (Z_{n}, γ^{s}) + ζ_{ns}

(5)

where $F_{ns}$ is a latent continuous variable associated with the probability of belonging to class s, which can be understood as the “utility” of belonging to that class; $Z_{n}$ is the vector of the characteristics of individual $n$ ; $γ^{s}$ is the vector of the parameters to be estimated; $ζ_{ns}$ is the random error term of the class-membership function for individual $n$ and class $s$ .

Finally, the probability of an individual selecting an alternative (i) is expressed as shown in Equation 6 ( 33 ):

P_{n} (i) = \sum_{s \in S} P_{n} (i | s) P_{n} (s)

(6)
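
The mixing in Equation 6 can be sketched in Python as follows. The sketch assumes that the class-membership probabilities take a logit form in the systematic part of Equation 5 (the text does not state the error distribution, so this is an assumption), and all utilities and membership scores are hypothetical inputs.

```python
import math


def softmax(values):
    """Normalized exponentials, computed stably."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def latent_class_choice_probs(class_utilities, membership_scores):
    """Unconditional choice probabilities P_n(i).

    class_utilities: one list of alternative utilities per latent class
    membership_scores: systematic part f(Z_n, gamma^s) for each class
    """
    class_probs = softmax(membership_scores)             # P_n(s), logit form assumed
    cond_probs = [softmax(v) for v in class_utilities]   # P_n(i | s)
    n_alts = len(class_utilities[0])
    return [
        sum(class_probs[s] * cond_probs[s][i] for s in range(len(class_probs)))
        for i in range(n_alts)
    ]


if __name__ == "__main__":
    # Hypothetical utilities for two alternatives in three classes.
    utilities = [[0.5, 0.0], [-0.2, 0.3], [0.0, 0.0]]
    scores = [0.4, 0.0, -0.3]
    print(latent_class_choice_probs(utilities, scores))
```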

Therefore, the latent variable calculates the modal split and is useful for deriving a single group or type from multivariate category data. The components that represent latent class models as structural equation models include latent variables, measured variables, and measurement errors. Latent variables are difficult to observe directly, and in this study, they were described as propensities for traffic behavior or preference factors. Measured variables, such as responses to each item in the questionnaire, can be observed directly. Furthermore, they are used to indirectly measure the latent variables. “Error” refers to measurement error, and arrows indicate causality. In the case of K, the association between the two variables is shown, where a larger size indicates a higher association, as shown in Figure 3.

Figure 3.

Relationship between latent and measurement variables.

This can be expressed via SEM, CFA, and EFA. CFA statistically assesses whether it is appropriate to fix factor loadings that are not theoretically justified to zero. Consequently, a factor does not account for all observed variables, whereas the covariance between the latent variables and the error covariance between the measured variables can be modeled. However, in this study, a new latent variable was created; therefore, EFA, in which the latent variable describes all the observed variables, was applied. The EFA process is shown in Figure 4, where “factor” is the latent variable, x1–x9 are measured variables, and e1–e9 are error variances (unique variances).

Figure 4.

Exploratory factor analysis model.

Each latent variable was set to one for standardization. All measurement variables were assumed to exert an effect. It can be regarded as a realistic model in which the latent variable that describes the measurement variable already exists ( 34 ).

In addition, the three-step estimation method was used in the LCA employed in this study ( 35 ).

The latent class (LC) model was developed for a set of response variables. The number of latent classes was determined by jointly considering the information criteria, statistical tests ( $χ^{2}$ ), classification quality, and interpretability ( 36 ).

The subjects were assigned to the latent class based on their posterior class membership probabilities. This involves estimating the most likely latent variable and determining the most likely group.

The association between assigned class membership and external variables was investigated using simple cross-tabulations or multiple logistic regression analysis.
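
The second of these steps can be sketched in Python as follows, assuming categorical questionnaire items, no covariates, and class sizes and item-response probabilities that have already been estimated in the first step. All names and numbers are hypothetical and serve only to illustrate the posterior calculation.

```python
def posterior_class_probs(responses, class_priors, item_probs):
    """Posterior membership probabilities P(x | y_i) for one respondent.

    responses: observed category index for each of the H items
    class_priors: P(x) for each latent class
    item_probs: item_probs[x][h][c] = P(y_h = c | class x)
    """
    joint = []
    for x, prior in enumerate(class_priors):
        likelihood = prior
        for h, category in enumerate(responses):
            likelihood *= item_probs[x][h][category]
        joint.append(likelihood)
    total = sum(joint)  # mixture density of the response pattern
    return [j / total for j in joint]


def modal_assignment(posteriors):
    """Index of the class with the highest posterior probability."""
    return max(range(len(posteriors)), key=posteriors.__getitem__)


if __name__ == "__main__":
    priors = [0.5, 0.5]
    # Two binary items; each inner list gives P(category 0), P(category 1).
    probs = [
        [[0.9, 0.1], [0.8, 0.2]],  # class 0
        [[0.2, 0.8], [0.3, 0.7]],  # class 1
    ]
    post = posterior_class_probs([0, 0], priors, probs)
    print(post, modal_assignment(post))
```

The assigned classes can then be cross-tabulated or regressed against external variables in the third step.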

Finally, the BIC and LL, which are typically used to evaluate the adequacy of the latent class model, were applied. This can be expressed as shown in Equation 7 ( 37 ):

BIC = - 2 \ln L + q [\ln (n)]

(7)

where $\ln L$ is the LL; $q$ is the number of parameters in the model; $n$ is the sample size.
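
Equation 7 translates directly into code. The following Python sketch computes the BIC and selects the candidate with the lowest value; the log-likelihoods, parameter counts, and sample size in the usage example are hypothetical and do not correspond to the estimates reported in this study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion as in Equation 7."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_by_bic(candidates, n_obs):
    """Return the candidate with the lowest BIC and all scores.

    candidates: {number_of_classes: (log_likelihood, n_params)}
    """
    scores = {k: bic(ll, q, n_obs) for k, (ll, q) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical fits for models with one to four latent classes.
    candidates = {
        1: (-1300.0, 10),
        2: (-1200.0, 21),
        3: (-1140.0, 32),
        4: (-1130.0, 43),
    }
    best, scores = select_by_bic(candidates, n_obs=1000)
    for k, value in scores.items():
        print(f"{k} classes: BIC = {value:.1f}")
    print("lowest BIC:", best)
```

Because the penalty term grows with the number of parameters, a model with more classes is preferred only if its gain in LL outweighs the added complexity.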

Case Study

In the case study, a preference survey about transportation choices affected by the operation of the GTX was conducted to redefine the modal split model of the GTX. The GTX is constructed at a considerable depth, unlike the existing subway. It travels fast but requires a long out-of-vehicle time. Users’ preferences or preference factors can affect the choice of the GTX. However, in general, the modal split model is built based on the travel time and transportation cost. Therefore, the modal split model of the GTX was estimated via LCA, which can reflect people’s preference factors and personalities. Accordingly, the survey was designed using choice, among the SP response formats. The SP technique was applied to hypothetical route-choice and mode-choice situations. The demand for routes and modes was estimated by presenting routes or modes with different travel times and travel costs, and the willingness to pay for reduced travel time was estimated ( 27 ). An online survey was conducted on 1000 metropolitan railway users within and outside the GTX sphere of influence. Assuming a confidence level of 95%, a relative error of 5%, a population proportion of 0.35, nine responses, and a reserve rate of 10%, the minimum sample size was calculated as 330 per survey point, or 660 in total. Accordingly, a survey was conducted using 1000 samples. Based on the residential location and GTX usage behavior of the users, the representative routes were established by classifying them into an inner circle (Gyeonggi–Gyeonggi, Seoul–Seoul) and an outer circle (Gyeonggi–Seoul), as shown in Figure 5.

Figure 5.

Selected survey points.

The SP survey combined attribute variables and levels via the design of experiments (DOE) to create a hypothetical situation. Therefore, the attribute variable of the survey that describes the effects of future environmental changes, such as the GTX, on metropolitan railways was composed of the travel time (out-of-vehicle time, in-vehicle time, and total travel time) and travel cost (operating cost). The results based on the inner and outer circles combined are presented in Table 2.

Table 2.

Results Based on Combination of Inner and Outer Circles

Transportation	In-vehicle time (min)	Out-vehicle time (min)	Total travel time (min)	Travel cost (won)
Inner circle
Auto	28	0	28	5040
Taxi	28	5	33	33,640
Bus	59	10	69	2800
Subway	61	15	76	1850
GTX	10	20	30	3800
Outer circle
Auto	66	0	66	10,315
Taxi	66	5	71	55,600
Bus	50	10	60	3100
Subway	74	10	84	2050
GTX	29	20	49	4400

Note: GTX = Great Train eXpress.
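
A simple piece of arithmetic on Table 2 helps to interpret the trade-off presented to respondents. The Python sketch below computes the break-even value of time at which a traveler who considers only total travel time and travel cost would be indifferent between the subway and the GTX. This is a deliberate simplification that ignores the distinction between in-vehicle and out-vehicle time as well as individual preferences; the function name is illustrative.

```python
# Total travel time (min) and travel cost (won) copied from Table 2.
TABLE_2 = {
    "inner": {"Subway": (76, 1850), "GTX": (30, 3800)},
    "outer": {"Subway": (84, 2050), "GTX": (49, 4400)},
}


def break_even_value_of_time(slow, fast):
    """Won per minute at which the faster, costlier mode becomes worthwhile."""
    time_saved = slow[0] - fast[0]
    extra_cost = fast[1] - slow[1]
    if time_saved <= 0:
        raise ValueError("the second mode must be faster than the first")
    return extra_cost / time_saved


if __name__ == "__main__":
    for circle, modes in TABLE_2.items():
        value = break_even_value_of_time(modes["Subway"], modes["GTX"])
        print(f"{circle} circle: {value:.1f} won/min")
```

Under this simplification, the break-even value is about 42 won/min for the inner circle and about 67 won/min for the outer circle.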

When the attribute variables and levels were determined, the levels were assumed to vary around the reference level. Because the questions presented to the respondents must describe realistic situations, realistic level values had to be set in the SP survey design. The DOE is a technique that economically identifies optimal conditions for targets by selecting various factors that affect their characteristics and conducting experiments to identify their relationships ( 27 ). Therefore, a survey was conducted using 16 questions built from 14 two-level attributes (Table 3).

Table 3.

Orthogonal Array for Stated Preference Survey Questionnaires

Number	Auto		Bus			Taxi			Subway			GTX
	In-vehicle time	Travel cost	In-vehicle time	Out-vehicle time	Travel cost	In-vehicle time	Out-vehicle time	Travel cost	In-vehicle time	Out-vehicle time	Travel cost	In-vehicle time	Out-vehicle time	Travel cost
	1	2	3	4	5	6	7	8	9	10	11	12	13	14
1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
2	0	0	0	0	1	0	1	1	1	0	1	1	1	0
3	0	0	0	1	0	1	0	1	1	1	0	0	1	1
4	0	0	0	1	1	1	1	0	0	1	1	1	0	1
5	0	1	1	0	0	0	1	1	0	1	1	0	1	1
6	0	1	1	0	1	0	0	0	1	1	0	1	0	1
7	0	1	1	1	0	1	1	0	1	0	1	0	0	0
8	0	1	1	1	1	1	0	1	0	0	0	1	1	0
9	1	0	1	0	0	1	0	1	1	0	1	1	0	1
10	1	0	1	0	1	1	1	0	0	0	0	0	1	1
11	1	0	1	1	0	0	0	0	0	1	1	1	1	0
12	1	0	1	1	1	0	1	1	1	1	0	0	0	0
13	1	1	0	0	0	1	1	0	1	1	0	1	1	0
14	1	1	0	0	1	1	0	1	0	1	1	0	0	0
15	1	1	0	1	0	0	1	1	0	0	0	1	0	1
16	1	1	0	1	1	0	0	0	1	0	1	0	1	1

Note: GTX = Great Train eXpress.
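
For readers unfamiliar with orthogonal arrays, the following Python sketch shows one standard way to generate a 16-run, two-level array and to check its orthogonality. It builds the columns as sums modulo 2 of four basis columns, which yields 15 mutually orthogonal columns, of which 14 can be assigned to attributes. The column order produced here is an assumption and need not match the order used in Table 3.

```python
from itertools import combinations, product


def two_level_orthogonal_array(n_basis=4):
    """Rows of a 2**n_basis-run two-level array with 2**n_basis - 1 columns."""
    # One column per non-empty subset of the basis factors.
    subsets = [
        combo
        for size in range(1, n_basis + 1)
        for combo in combinations(range(n_basis), size)
    ]
    rows = []
    for basis in product((0, 1), repeat=n_basis):
        rows.append([sum(basis[j] for j in combo) % 2 for combo in subsets])
    return rows


def is_orthogonal(rows):
    """True if every pair of columns shows each level pair equally often."""
    n_cols = len(rows[0])
    target = len(rows) / 4
    for a, b in combinations(range(n_cols), 2):
        counts = {}
        for row in rows:
            key = (row[a], row[b])
            counts[key] = counts.get(key, 0) + 1
        if len(counts) != 4 or any(c != target for c in counts.values()):
            return False
    return True


if __name__ == "__main__":
    array = two_level_orthogonal_array()
    design = [row[:14] for row in array]  # keep 14 columns, one per attribute
    print(len(design), "runs,", len(design[0]), "attributes")
    print("orthogonal:", is_orthogonal(design))
```

Orthogonality means that the level of one attribute carries no information about the level of another, which is what allows the attribute effects to be estimated independently.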

In the SP survey, variables such as age and residence were considered. The age and mode characteristic variables entered the model as continuous values (Table 4). Subsequently, the BIC and LL were used to determine the number of latent variables. In general, the number of latent classes is determined using the information criteria as quantitative evidence and the significance and interpretability of the model parameters as qualitative evidence. The BIC was used to determine the number of classes in the LCA, and three classes, among the up to four classes considered, were found to be the most appropriate. The GTX runs deeper underground than existing transportation modes. Although it can guarantee a short in-vehicle time, it may require a long out-of-vehicle time. In this survey, the questions listed in Table 5 were posed, and the latent class (i.e., latent variable) was obtained to distinguish respondents based on their propensity. Accordingly, the latent class was classified into three categories, GTX travel property, GTX station property, and GTX transfer property, which reflect preferences for reduced total travel time, access to the platform, and minimum transfers, respectively, as shown in Table 5.

Table 4.

Personal and Transportation Characteristic Variables

Variable classification	Variable value
Personal characteristic variable
Gender	Male (1)/Female (−1)
Age	Age
Job	Have a job (1)/Have no job (−1)
Student	Yes (1)/No (−1)
Residential area	Seoul (1)/etc. (−1)
Transportation characteristic variable
Total travel cost	Won
Travel time	Minutes
Transfer time	Minutes
GTX, subway, taxi dummy	NA
Number of samples	n
In-vehicle time	Minutes
Transfer time (out-vehicle time)	Minutes

Note: GTX = Great Train eXpress; NA = not applicable.

Table 5.

Definition of Latent Variables and Related Questionnaires

No.	GTX latent question	Average preference	Latent class
1	It is more important to get there faster than any other transportation.	4.24	GTX travel property
2	If GTX travel time is short, it doesn’t matter if it takes a long time to get to the platform.	2.94
3	It is important to go quickly even if the price is higher.	3.51
4	If you can save money, it doesn’t matter if it takes a little longer.	3.01
5	It doesn’t matter if you’re deep underground as long as you have transportation to the platform.	3.84	GTX station property
6	Even if GTX travel time is short, it will not be used if it takes long to get to the platform.	3.18
7	If there is a system that connects directly to the underground platform, I will use GTX.	4.20
8	Even if it takes more time, I prefer transportation that arrives directly at the destination.	3.77	GTX transfer property
9	It doesn’t matter if you transfer with GTX several times to get to your destination quickly (ex: home–subway (transfer)–GTX ride–GTX (transfer)–subway–destination).	2.57
10	If there is no transfer discount with GTX, I will not use it.	3.83

Note: GTX = Great Train eXpress.

Results

The results in Table 6 were obtained by estimating the SP model, and the SP model combined with the latent variables, on the SP survey responses of 1000 people. For each of the inner and outer circle alternatives, the coefficients of the modal split model, their t-values, and the LL were estimated. In both the inner and outer circle situations, users preferred auto over the other modes. With regard to gender, males showed a higher preference than females. With regard to age, users aged 30–40 years preferred the GTX. The SP + latent analysis captured preferences by including the latent variables in the modal split model, and its results were compared with those of the SP model alone. The LCA grouped respondents into three clusters to determine people’s preferences for the future introduction of the GTX. Transfer minimization, with an estimated value of approximately −0.01784, was important in the inner circle, whereas station position, with an estimated value of approximately −0.0262, was more important in the outer circle. This implies that GTX use increases as the number of transfers decreases and as accessibility to the station improves. Therefore, factors such as GTX transit time, accessibility, and transfer minimization will affect the usage of the GTX. In addition, the goodness of fit of the two models was assessed using the LL, the BIC, and the chi-squared statistic; a higher LL and a lower BIC indicate a better-fitting model. The LL values were approximately −15,859 (inner circle) and −17,234 (outer circle) for the SP model, and −15,712 and −17,205 for the SP model with latent variables. The corresponding BIC values were approximately 1.986 and 2.158 for the SP model alone, and approximately 1.970 and 2.156 when the SP model contained the latent variables.
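The reported BIC values are of order 2 rather than of order 30,000, which suggests a BIC normalized by the number of observations. The sketch below is one reading that reproduces the figures in Table 6; it is not stated in the text. It assumes 16,000 observations (1000 respondents × 16 SP scenarios), 7 parameters for the SP model (the cost, time, and dummy coefficients), and 10 parameters once the three latent variables are added. The function name and these counts are assumptions made for illustration.

```python
import math

def normalized_bic(log_likelihood, n_params, n_obs):
    """BIC per observation: (-2 * LL + k * ln(N)) / N."""
    return (-2.0 * log_likelihood + n_params * math.log(n_obs)) / n_obs

# Assumptions (not stated in the paper): 1000 respondents x 16 scenarios,
# k = 7 for the SP model and k = 10 for the SP model with latent variables.
N_OBS = 1000 * 16
N_PARAMS = {"SP": 7, "SP+latent": 10}

# Log-likelihoods as reported in Table 6.
REPORTED_LL = {
    ("SP", "inner"): -15859.85,
    ("SP", "outer"): -17234.91,
    ("SP+latent", "inner"): -15712.61,
    ("SP+latent", "outer"): -17205.47,
}

bic = {
    key: normalized_bic(ll, N_PARAMS[key[0]], N_OBS)
    for key, ll in REPORTED_LL.items()
}

for (model, circle), value in bic.items():
    print(f"{model:10s} {circle:6s} BIC = {value:.5f}")
```

Under these assumptions the computed values agree with the reported 1.98672, 2.15860, 1.97013, and 2.15673 to five decimal places, and the latent-variable model has the lower BIC in both situations.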

Table 6.

Results of Model Estimation

Category	SP model
	Inner circle		Outer circle
	Estimated value	t-Value	Estimated value	t-Value
Gender
Subway	−0.016	−1.53**	−0.015	−3.14*
GTX	0.01	1.45**	0.025	6.35**
Taxi	0.002	4.22**	0.001	3.78*
Bus	−0.007	−1.06*	−0.023	−10.54**
Age
Subway	−0.007	−2.44*	−0.006	−1.17*
GTX	−0.005	−3.48*	0.002	−0.14**
Taxi	0	−0.36**	0	−0.22*
Bus	−0.008	−1.09*	−0.004	−1.54*
Job
Subway	0.09	5.80**	0.151	7.02**
GTX	0.692	8.95**	0.407	11.34**
Taxi	0	0.77*	0.001	0.93*
Bus	0.015	4.95**	0.339	8.11**
Student
Subway	−0.074	−9.63**	−0.133	−5.15**
GTX	−0.61	−12.89**	−0.381	−10.50**
Taxi	0	−0.10**	−0.001	−0.44**
Bus	−0.007	−1.95**	−0.277	−6.72**
Residential area
Subway	0.018	0.72**	0.017	1.87**
GTX	−0.03	−2.20**	−0.057	−2.92**
Taxi	0	−0.30**	−0.001	−0.33**
Bus	−0.001	−1.68**	−0.011	−1.83**
Total travel cost	−0.000428	−29.26**	−0.000365	−27.942**
Travel time	−0.022201	−15.256**	−0.041097	−38.843**
Transfer time	−0.017992	−4.756**	−0.012796	−3.337**
GTX dummy	1.259548	15.857**	0.338272	2.935**
Subway dummy	−0.256440	−2.878**	−0.182898	−1.664**
Bus dummy	−1.056969	−14.155**	−0.258328	−2.556*
Taxi dummy	5.966824	16.423**	10.625253	21.429**
In-vehicle time value	3113		6754
Transfer time value	2523		2103
LL (log-likelihood)	−15,859.85		−17,234.91
BIC (Bayesian information criterion)	1.98672		2.15860
$\chi^{2}$ (Chi-squared)	1209.84883		2883.68203
Category	SP model + latent variable
	Inner circle		Outer circle
	Estimated value	t-Value	Estimated value	t-Value
Total travel cost	−0.000433	0.0000**	−0.000366	0.0000**
Travel time	−0.022336	0.0015**	−0.041171	0.0011**
Transfer time	−0.018226	0.0038**	−0.012825	0.0038**
GTX dummy	−0.983459	0.2097**	−0.455665	0.2109**
Subway dummy	−2.504525	0.2140**	−0.976300	0.2079**
Bus dummy	−1.060760	0.0749**	−0.261671	0.1011**
Taxi dummy	6.070560	0.36492**	10.645355	0.49620**
In-vehicle time value	3098		6755
Transfer time value	2528		2104
Latent variable
GTX travel property	0.66687	16.54	0.2723	0.035913
GTX station property	0.00801	0.22	−0.0262	0.033011
GTX transfer property	−0.01784	−0.49	−0.0142	0.032391
LL (log-likelihood)	−15,712.61		−17,205.47
BIC (Bayesian information criterion)	1.97013		2.15673
$\chi^{2}$ (Chi-squared)	1504.32629		2942.55354
Number of samples	1000

Note: GTX = Great Train eXpress; SP = stated preference.

*P-value ≤ 0.15; **P-value ≤ 0.05.
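The time values in Table 6 are close to the ratio of the time coefficient to the cost coefficient, scaled from minutes to hours. The sketch below checks this for the SP model. The formula and the unit (won per hour) are inferred from the numbers rather than stated in the text, and the function name is hypothetical.

```python
def time_value_per_hour(beta_time, beta_cost):
    """Marginal rate of substitution between time and cost, in won per hour.

    Coefficients are per minute and per won, so the ratio is scaled by 60.
    """
    return beta_time / beta_cost * 60.0

# SP model coefficients from Table 6.
SP_COEFFICIENTS = {
    "inner": {"cost": -0.000428, "travel": -0.022201, "transfer": -0.017992},
    "outer": {"cost": -0.000365, "travel": -0.041097, "transfer": -0.012796},
}

time_values = {
    circle: {
        "in_vehicle": time_value_per_hour(c["travel"], c["cost"]),
        "transfer": time_value_per_hour(c["transfer"], c["cost"]),
    }
    for circle, c in SP_COEFFICIENTS.items()
}

for circle, values in time_values.items():
    print(circle, {name: round(v, 1) for name, v in values.items()})
```

The computed values fall within about 2 won of the reported 3113 and 2523 (inner circle) and 6754 and 2103 (outer circle); the small gaps are consistent with the coefficients being rounded in the table.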

Tables 7 and 8 show the preferences for subway (with GTX), bus, auto, and taxi in the inner and outer circle situations. Comparing the two situations, the preference for the GTX was higher in the outer circle than in the inner circle. This shows that, although the travel cost and travel time differed between the two situations, the benefits of the GTX were more pronounced for trips across boundaries. Next, the preferences under the basic state, the SP model, and the SP + latent variable model were analyzed. In the basic state, the preference order in the inner circle situation was auto, bus, subway (with GTX), and taxi, whereas in the outer circle situation it was auto, subway (with GTX), bus, and taxi. The same orders were observed in the SP model. Compared with the basic state, the shift to subway (with GTX) increased in the SP model and the SP + latent model because the suitability of the model improved. In the SP + latent model (GTX travel property), the auto preference decreased relative to the basic state, and the subway (with GTX) preference increased. In addition, the preference under the GTX transfer property was lower than that under the GTX travel property and the GTX station property. Overall, the preference for subway (with GTX) increased in both the SP model and the SP + latent model.

Table 7.

Preference for Each Model

Model	Category	Inner circle (A) (%)	Outer circle (B) (%)	Difference (B – A) (%)
Basic	Auto	54.43	53.31	−1.12
	Bus	19.99	19.30	−0.69
	Taxi	9.74	7.49	−2.25
	Subway (with GTX)	15.84	19.90	4.06
SP model	Auto	53.84	52.38	−1.46
	Bus	19.46	18.54	−0.92
	Taxi	9.41	7.05	−2.36
	Subway (with GTX)	17.29	22.03	4.74
SP + latent (GTX travel property)	Auto	52.17	50.63	−1.54
	Bus	18.96	18.02	−0.94
	Taxi	9.04	6.63	−2.41
	Subway (with GTX)	19.83	24.72	4.89
SP + latent (GTX station property)	Auto	53.36	51.88	−1.48
	Bus	19.11	18.19	−0.92
	Taxi	9.26	6.89	−2.37
	Subway (with GTX)	18.27	23.04	4.77
SP + latent (GTX transfer property)	Auto	53.58	52.11	−1.47
	Bus	19.26	18.33	−0.93
	Taxi	9.33	6.96	−2.37
	Subway (with GTX)	17.83	22.60	4.77

Note: GTX = Great Train eXpress; SP = stated preference.

Table 8.

Difference From Basic State

Model	Category	Inner circle difference (%)	Outer circle difference (%)
Basic—SP model	Auto	−0.59	−0.93
	Bus	−0.53	−0.76
	Taxi	−0.33	−0.44
	Subway (with GTX)	1.45	2.13
Basic—GTX travel property	Auto	−2.26	−2.68
	Bus	−1.03	−1.28
	Taxi	−0.70	−0.86
	Subway (with GTX)	3.99	4.82
Basic—GTX station property	Auto	−1.07	−1.43
	Bus	−0.88	−1.11
	Taxi	−0.48	−0.60
	Subway (with GTX)	2.43	3.14
Basic—GTX transfer property	Auto	−0.85	−1.20
	Bus	−0.73	−0.97
	Taxi	−0.41	−0.53
	Subway (with GTX)	1.99	2.70

Note: GTX = Great Train eXpress; SP = stated preference.
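Table 8 can be recomputed from Table 7 by subtracting the basic-state share from each model's share. The short sketch below does this for every model and mode; the dictionary layout and function name are illustrative choices, while the numbers are copied from Table 7.

```python
# Mode shares (%) from Table 7 as (inner circle, outer circle).
TABLE7 = {
    "Basic": {"Auto": (54.43, 53.31), "Bus": (19.99, 19.30),
              "Taxi": (9.74, 7.49), "Subway (with GTX)": (15.84, 19.90)},
    "SP model": {"Auto": (53.84, 52.38), "Bus": (19.46, 18.54),
                 "Taxi": (9.41, 7.05), "Subway (with GTX)": (17.29, 22.03)},
    "GTX travel property": {"Auto": (52.17, 50.63), "Bus": (18.96, 18.02),
                            "Taxi": (9.04, 6.63), "Subway (with GTX)": (19.83, 24.72)},
    "GTX station property": {"Auto": (53.36, 51.88), "Bus": (19.11, 18.19),
                             "Taxi": (9.26, 6.89), "Subway (with GTX)": (18.27, 23.04)},
    "GTX transfer property": {"Auto": (53.58, 52.11), "Bus": (19.26, 18.33),
                              "Taxi": (9.33, 6.96), "Subway (with GTX)": (17.83, 22.60)},
}

def difference_from_basic(model):
    """Return {mode: (inner diff, outer diff)} relative to the basic state."""
    basic = TABLE7["Basic"]
    return {
        mode: (round(inner - basic[mode][0], 2), round(outer - basic[mode][1], 2))
        for mode, (inner, outer) in TABLE7[model].items()
    }

table8 = {model: difference_from_basic(model) for model in TABLE7 if model != "Basic"}

for model, rows in table8.items():
    for mode, (d_inner, d_outer) in rows.items():
        print(f"Basic vs {model:22s} {mode:18s} {d_inner:+.2f} {d_outer:+.2f}")
```

Because the shares in each column of Table 7 sum to 100%, the differences for each model sum to zero: the gain in subway (with GTX) is drawn from auto, bus, and taxi.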

This shows that the fit of the model was higher when the latent variable was applied. Therefore, adding the latent variable made the model more effective and improved its fit. These results are similar to those of previous studies that explain people’s travel behavior with regard to the choice of transportation mode. Moreover, they support the conclusions of Wen et al. ( 14 ), Tawfik and Rakha ( 15 ), and Ben-Akiva et al. ( 5 ), which indicate that latent models are superior to other modal split models because they reflect people’s preferences. The t-statistics of each dummy variable were significant at P-value ≤ 0.05 or P-value ≤ 0.15; therefore, including the latent variable was appropriate.
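The improvement in fit can also be expressed as a likelihood ratio statistic between the SP model and the SP model with latent variables. The sketch below is not a test reported in the paper. It assumes the two models are nested and differ by three parameters (the three latent variables), and it uses the standard 5% critical value of the chi-squared distribution with three degrees of freedom.

```python
# 5% critical value of the chi-squared distribution with 3 degrees of freedom.
CHI2_CRITICAL_DF3_5PCT = 7.815

def likelihood_ratio(ll_restricted, ll_unrestricted):
    """LR statistic: twice the gain in log-likelihood of the larger model."""
    return 2.0 * (ll_unrestricted - ll_restricted)

# Log-likelihoods from Table 6 (SP model vs. SP model + latent variable).
lr_inner = likelihood_ratio(-15859.85, -15712.61)
lr_outer = likelihood_ratio(-17234.91, -17205.47)

for name, lr in (("inner", lr_inner), ("outer", lr_outer)):
    verdict = "rejects" if lr > CHI2_CRITICAL_DF3_5PCT else "does not reject"
    print(f"{name} circle: LR = {lr:.2f}, {verdict} the SP-only model at 5%")
```

The statistics are about 294.48 (inner circle) and 58.88 (outer circle), which also match the differences between the chi-squared values of the two models in Table 6 (1504.33 − 1209.85 and 2942.55 − 2883.68). Both exceed 7.815, so under these assumptions the latent variables improve the fit significantly.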

The modal split models of the GTX and other modes of transportation are expressed as follows (Equations 8–12):

$$\mathrm{Utility}_{Auto} = \beta_{pr} \times T.Itime + \gamma_{pr} \times Mcost \tag{8}$$

$$\mathrm{Utility}_{Taxi} = \alpha_{T} + \beta_{pri} \times T.Itime + \beta_{pro} \times T.Otime + \gamma_{pr} \times Mcost \tag{9}$$

$$\mathrm{Utility}_{Bus} = \alpha_{B} + \beta_{pbi} \times T.Itime + \beta_{pbo} \times T.Otime + \gamma_{pb} \times Mcost \tag{10}$$

$$\mathrm{Utility}_{Subway} = \alpha_{S} + \beta_{pbi} \times T.Itime + \beta_{pbo} \times T.Otime + \gamma_{pb} \times Mcost \tag{11}$$

$$\mathrm{Utility}_{GTX} = \alpha_{G} + \beta_{pbi} \times T.Itime + \beta_{pbo} \times T.Otime + \gamma_{pb} \times Mcost, \tag{12}$$

where $T.Itime$ is the in-vehicle time (min); $T.Otime$ is the out-of-vehicle time (min); $Mcost$ is the travel cost of the mode; $\alpha_{T}$, $\alpha_{B}$, $\alpha_{S}$, and $\alpha_{G}$ are the mode-specific constants (dummies) for taxi, bus, subway, and GTX; $\beta_{pri}$ is the parameter of private in-vehicle time; $\beta_{pro}$ is the parameter of private out-of-vehicle time; $\beta_{pbi}$ is the parameter of public transit in-vehicle time; $\beta_{pbo}$ is the parameter of public transit out-of-vehicle time; $\gamma_{pb}$ is the parameter of public transit cost; and $\gamma_{pr}$ is the parameter of private cost.
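To show how utilities of this form translate into mode shares, the sketch below evaluates them for one trip and converts them with a multinomial logit formula. Several things here are assumptions rather than the authors' procedure: the logit form itself, the reuse of the generic inner-circle SP coefficients from Table 6 for every mode (the equations distinguish private and public transit parameters), the use of the transfer-time coefficient for out-of-vehicle time, and the trip attributes, which are purely hypothetical.

```python
import math

# Table 6, SP model, inner circle. Reusing one set of generic coefficients
# for all modes is an assumption made for this illustration.
BETA_IN = -0.022201    # in-vehicle (travel) time, per minute
BETA_OUT = -0.017992   # out-of-vehicle (transfer) time, per minute
GAMMA = -0.000428      # travel cost, per won
ALPHA = {"auto": 0.0, "taxi": 5.966824, "bus": -1.056969,
         "subway": -0.256440, "gtx": 1.259548}

def utility(mode, in_time, out_time, cost):
    """Linear utility: constant + time terms + cost term."""
    return ALPHA[mode] + BETA_IN * in_time + BETA_OUT * out_time + GAMMA * cost

def logit_shares(utilities):
    """Multinomial logit shares, computed stably by subtracting the maximum."""
    top = max(utilities.values())
    exp_u = {mode: math.exp(u - top) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: value / total for mode, value in exp_u.items()}

def shares_for(trip):
    return logit_shares({mode: utility(mode, *attrs) for mode, attrs in trip.items()})

# Hypothetical trip: (in-vehicle min, out-of-vehicle min, cost in won).
trip = {"auto": (40, 0, 6000), "taxi": (40, 5, 25000), "bus": (55, 10, 1500),
        "subway": (45, 12, 1500), "gtx": (20, 15, 3000)}
shares = shares_for(trip)

# Same trip with a slower GTX, to show the direction of the effect.
slower_gtx = dict(trip, gtx=(40, 15, 3000))
shares_slower = shares_for(slower_gtx)

print({mode: round(share, 3) for mode, share in shares.items()})
```

Because the time and cost coefficients are negative, lengthening the GTX in-vehicle time lowers its share and raises the shares of the other modes. The absolute shares depend entirely on the hypothetical trip attributes.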

Conclusion

In this study, the modal split model was redefined by adding latent variables that capture preferences not sufficiently explained by existing socio-economic and transportation attributes. People’s latent travel behavior significantly affects mode choice ( 4 ). In other words, mode choice behaviors that cannot be sufficiently explained by individual socio-economic attributes and mode attributes can be explained using latent variables. In existing studies on mode choice, less realistic components of the modal split model were rectified by adding a constant value to the coefficient ( 2 ). In this study, practical components were therefore added on a theoretical basis by including values that reflect people’s preference factors as constants. Consequently, the explanatory power was higher when a latent variable was added than in the modal split model based on the existing SP model alone. This was confirmed via a case study in which modal split models for the subway and other modes were derived in view of the construction of the GTX in Seoul. An SP survey was conducted on 1000 potential users of the GTX, and factor analysis was performed for the LCA to define the latent variables. In particular, the latent variable was used to classify respondents into three clusters based on the BIC and LL values to determine people’s preferences for the future introduction of the GTX. Subsequently, the inner and outer circle situations were distinguished based on the boundaries to derive the dummy and estimated values. The LCA results showed that factors such as GTX transit time, access to the station, and transfer minimization can affect GTX demand and the modal split model. In particular, transfer minimization was more important in the inner circle, whereas access to the station was more important in the outer circle. Passenger travel behavior was quantified effectively by applying a latent variable to the SP model. Furthermore, statistical analysis confirmed that mode choice behavior can be analyzed more accurately and reasonably when the modal split model incorporates a latent variable. Therefore, LCA, which compensates for the disadvantages of the SP model, is appropriate for analyzing choice behavior for new transportation modes.

This study is useful in that it proposes a model that accurately reflects the characteristics and preferences of passengers by redefining the modal split model for a new mode such as higher-speed rail. By performing stated preference analysis (SPA) with latent variables, strategies for the inner and outer circle situations can be established in the future, and operational efficiency can be improved by combining the GTX and subway. In addition, a more reasonable modal split model can be derived by considering the preferences of each user for the new transportation mode. The model used in this study is specific to Seoul. Because socio-economic indicators and transportation characteristics differ among cities and countries, separate surveys should be conducted to derive the parameters even when the same methodology is applied. A more realistic model can then be constructed by reflecting the socio-economic indicators and preference characteristics of each country. Finally, the accuracy of the model is expected to improve with larger samples that better reflect individual preferences and characteristics; future studies should therefore survey respondents across a wider range of transportation users.

Footnotes

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: D. Ku and M. Choi; data collection: H. Oh and S. Lee; analysis and interpretation of results: D. Ku, S. Na, M. Choi, H. Oh; draft manuscript preparation: D. Ku, H. Oh, S. Lee. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This paper was financially supported by the Korea Ministry of Land, Infrastructure, and Transport (MOLIT) as an Innovative Talent Education Program for Smart City, the Basic Study, and Interdisciplinary R&D Foundation Fund of the University of Seoul (2019) and the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2017R1D1A1B06032857).

ORCID iDs

Donggyun Ku

Minje Choi

Haram Oh

Sungyoung Na

Seungjae Lee

References

1. Yang, Wang. Travel Mode Choice Based on Latent Variable Enriched Discrete Choice Model. Proc., International Conference on Transportation Engineering, Southwest Jiaotong University, Chengdu, China, American Society of Civil Engineers, Reston, VA, 2009, pp. 4372–4377.

2. Chen. Mode Choice Model for Public Transport With Categorized Latent Variables. Mathematical Problems in Engineering, Vol. 2017, 2017, pp. 1–8. https://doi.org/10.1155/2017/7861945.

3. Galdames, Tudela, Carrasco J. A. Exploring the Role of Psychological Factors in Mode Choice Models by a Latent Variables Approach. Transportation Research Record, 2011. 2230: 68–74.

4. Kim J. H., Cheong J. H., Sohn K. M. Combined RP/SP Model With Latent Variables. Journal of Korean Society of Transportation, Vol. 28, No. 4, 2010, pp. 119–128.

5. Ben-Akiva, Walker, Bernardino A. T., Gopinath D. A., Morikawa, Polydoropoulou. Integration of Choice and Latent Variable Models. In In Perpetual Motion: Travel Behavior Research Opportunities and Application Challenges (H. S. Mahmassani, ed.), Elsevier Science, Amsterdam, The Netherlands, 2002, pp. 431–470.

6. Crossrail. Economic Progress - Maximising Competitiveness & Productivity of the Economy. https://www.crossrail.co.uk/benefits/economic-sustainability/.

7. Dodgson, Gann, MacAulay, Davies. Innovation Strategy in New Transportation Systems: The Case of Crossrail. Transportation Research Part A: Policy and Practice, Vol. 77, 2015, pp. 261–275.

8. Dent, Hawa, DeWeese, Wasfi, Kestens, El-Geneidy. Market-Segmentation Study of Future and Potential Users of the New Réseau Express Métropolitain Light Rail in Montreal, Canada. Transportation Research Record: Journal of the Transportation Research Board, 2021. 2675: 1043–1054.

9. Yang C. H., Son U. Y. Estimation of Transfer Related Values of Seoul Subway Users Using Stated Preference and Revealed Preference Analyses. Journal of Korean Society of Transportation, Vol. 18, No. 4, 2000, pp. 19–30.

10. Lee K. T. Developing Transportation Mode Choice Models Reflecting Behavioral Variation by Travel Distance. Republic of Korea, Hanyang University, Seoul, 2016.

11. Hur T. Y., Eom J. K., Park M. S. Comparison of Rail Mode Share by District between Before and After KTX Opening in Seoul. Journal of the Korean Data Analysis Society, Vol. 14, No. 5, 2012, pp. 2451–2461.

12. Madanat S. M., Yang C. Y., Yen Y. M. Analysis of Stated Route Diversion Intentions Under Advanced Traveler Information Systems Using Latent Variable Modeling. Transportation Research Record: Journal of the Transportation Research Board, 1995. 1485: 10–17.

13. Wen C. H., Lai S. C. Latent Class Models of International Air Carrier Choice. Transportation Research Part E: Logistics and Transportation Review, Vol. 46, No. 2, 2010, pp. 211–221.

14. Wen C. H., Wang W. C. Latent Class Nested Logit Model for Analyzing High-Speed Rail Access Mode Choice. Transportation Research Part E: Logistics and Transportation Review, Vol. 48, No. 2, 2012, pp. 545–554.

15. Tawfik A. M., Rakha H. A. Latent Class Choice Model of Heterogeneous Drivers’ Route Choice Behavior Based on Learning in a Real-World Experiment. Transportation Research Record: Journal of the Transportation Research Board, 2013. 2334: 84–94.

16. Prato C. G., Bekhor, Pronello. Methodology for Exploratory Analysis of Latent Factors Influencing Drivers’ Behavior. Transportation Research Record: Journal of the Transportation Research Board, 2005. 1926: 115–125.

17. Walker J. L. Extended Discrete Choice Models: Integrated Framework, Flexible Error Structures, and Latent Variables. Doctoral dissertation. Massachusetts Institute of Technology, Cambridge, 2001.

18. Afghari A. P., Haque M. M., Washington, Smyth. Bayesian Latent Class Safety Performance Function for Identifying Motor Vehicle Crash Black Spots. Transportation Research Record: Journal of the Transportation Research Board, 2016. 2601: 90–98.

19. Cerwick D. M., Gkritza, Shaheed M. S., Hans. A Comparison of the Mixed Logit and Latent Class Methods for Crash Severity Analysis. Analytic Methods in Accident Research, Vol. 3, 2014, pp. 11–27.

20. Román, Arencibia A. I., Feo-Valero. A Latent Class Model With Attribute Cut-Offs to Analyze Modal Choice for Freight Transport. Transportation Research Part A: Policy and Practice, Vol. 102, 2017, pp. 212–227.

21. Kim H. C., Nicholson, Kusumastuti. Analysing Freight Shippers’ Mode Choice Preference Heterogeneity Using Latent Class Modelling. Transportation Research Procedia, Vol. 25, 2017, pp. 1109–1125.

22. Hurtubia, Nguyen M. H., Glerum, Bierlaire. Integrating Psychometric Indicators in Latent Class Choice Models. Transportation Research Part A: Policy and Practice, Vol. 64, 2014, pp. 135–146.

23. Kim, Pant, Yamashita. Measuring Influence of Accessibility on Accident Severity With Structural Equation Modeling. Transportation Research Record: Journal of the Transportation Research Board, 2011. 2236: 1–10.

24. Wen C. H., Lan L. W., Cheng H. L. Structural Equation Modeling to Determine Passenger Loyalty Toward Intercity Bus Services. Transportation Research Record: Journal of the Transportation Research Board, 2005. 1927: 249–255.

25. Outwater M. L., Castleberry, Shiftan, Ben-Akiva, Shuang Zhou, Kuppam. Attitudinal Market Segmentation Approach to Mode Choice and Ridership Forecasting: Structural Equation Modeling. Transportation Research Record: Journal of the Transportation Research Board, 2003. 1854: 32–42.

26. Wang, Qin. Use of Structural Equation Modeling to Measure Severity of Single-Vehicle Crashes. Transportation Research Record: Journal of the Transportation Research Board, 2014. 2432: 17–25.

27. Spada, Tucci F. A., Ummarino, Ciavarella P. P., Calà, Troiano, Caputo, et al. Structural Equation Modeling to Shed Light on the Controversial Role of Climate on the Spread of SARS-CoV-2. Scientific Reports, Vol. 11, No. 1, 2021, pp. 1–11.

28. Al-Mahameed F. J., Qin, Schneider R. J., Shaon M. R. R. Analyzing Pedestrian and Bicyclist Crashes at the Corridor Level: Structural Equation Modeling Approach. Transportation Research Record: Journal of the Transportation Research Board, 2019. 2673: 308–318.

29. Fowkes, Wardman. The Design of Stated Preference Travel Choice Experiments: With Special Reference to Interpersonal Taste Variations. Journal of Transport Economics and Policy, Vol. 22, No. 1, 1988, pp. 27–44.

30. Bradley M. A., Kroes E. P. Forecasting Issues in Stated Preference Survey Research. Selected Readings in Transport Survey Methodology. Proc., 3rd International Conference on Survey Methods in Transportation, January 5–7, 1990, Washington, D.C., Eucalyptus Press, New South Wales, Australia, 1992, pp. 89–107.

31. Kim K. S., Cho H. J. SP Survey Design and Analysis Methodology. Boseonggak, Seoul, 2006.

32. Bae Y. G., Jeong J. H., Kim H. J. Latent Class Analysis for Mode Choice Behavior. Journal of Korean Society of Transportation, Vol. 28, No. 3, 2010, pp. 99–107.

33. S. Y. Mode Choice Models Based on Behavioral Economics. Dissertation. Republic of Korea, University of Seoul, 2021.

34. Brown T. A. Confirmatory Factor Analysis for Applied Research. The Guilford Press, New York, NY, 2006.

35. Vermunt J. K. Latent Class Modeling With Covariates: Two Improved Three-Step Approaches. Political Analysis, Vol. 18, No. 4, 2010, pp. 450–469.

36. Lanza S. T., Collins L. M. A New SAS Procedure for Latent Transition Analysis: Transitions in Dating and Sexual Risk Behavior. Developmental Psychology, Vol. 44, No. 2, 2008, p. 446.

37. Kaplan, Keller. A Note on Cluster Effects in Latent Class Analysis. Structural Equation Modeling: A Multidisciplinary Journal, Vol. 18, No. 4, 2011, pp. 525–536.