Abstract
Contemporary models consider choice sets to be either fully deterministic or fully probabilistic. Deterministic choice set models do not account for stochasticity in the choice set formation, whereas probabilistic choice set models fail to recognize that exclusion and inclusion can be deterministic for some alternatives and individuals and yet random for others. A more general scenario is, therefore, one where some alternatives are deterministically included or excluded and others probabilistically included. This paper proposes a richer framework that combines the features of both deterministic and probabilistic choice set models and explicitly allows an alternative to be deterministically included, deterministically excluded, or probabilistically considered in the choice set. This framework is better than the conventional models in four aspects: (a) the factors influencing consideration type are explicitly and parametrically analyzed rather than being assumed to be 0 or 1; (b) the specification can disentangle factors that affect the inclusion outcome from the type of consideration; and (c) the specification also permits differential sensitivity to factors in conditional choice probability among those who consider an alternative deterministically versus probabilistically. The partially probabilistic choice set model, a special case of the proposed generalized framework, is developed using empirical data collected from working commuters in Chennai city and is benchmarked against the fully probabilistic choice set models. The results show that the former had improved goodness-of-fit, realistic consideration probability estimates, and better predictability of mode shares than the latter. Relevant policies have been evaluated by identifying the appropriate target segments at both the consideration and choice stages using the proposed model.
Keywords
Discrete choice models have been extensively used to analyze choice behavior for many decades (1). A large body of literature on choice set formation for discrete choice models exists. Broadly, choice sets are represented deterministically or probabilistically in the literature. In the former, an alternative is classified as either deterministically included or excluded from the choice set; the choice set is, therefore, assumed to be fully observable (2–7). However, the inclusion of an alternative may depend on availability, awareness, and contextual factors that may be unobserved and/or involve stochastic components. To address this concern, probabilistic choice set models assume that the inclusion or exclusion of an alternative in the choice set is latent and not observable by the analyst. Here, the consideration of an alternative in the choice set is modeled probabilistically (8–15). Such fully latent probabilistic choice set models have found their application in various choice constructs (9, 15–20). Both deterministic and fully probabilistic choice set models in the literature impose two strong restrictions simultaneously: identical or uniform treatment of consideration (purely deterministic or probabilistic) across all alternatives for an individual, and homogeneous treatment of consideration of each alternative across individuals. This paper proposes a richer framework to relax these restrictive assumptions.
This paper presents a richer modeling framework that allows a partially probabilistic choice set. Including an alternative in a choice set is a complex decision as it depends on the availability, awareness, feasibility, extent/degree of its consideration, and so forth. Therefore, the choice set may not be fully observed or fully latent but may only be partially revealed to the analyst. The proposed framework combines the features of both deterministic and probabilistic choice set models and explicitly allows an alternative to be deterministically included, deterministically excluded, or probabilistically considered in the choice set.
This research is motivated by several considerations. Deterministic choice set models do not account for stochasticity in choice set formation. On the other hand, probabilistic choice set models fail to recognize that exclusion or inclusion can be deterministic for some alternatives and individuals and yet random for others. Therefore, a more general scenario is one where some alternatives are deterministically included or excluded and others probabilistically included. Also, this treatment of deterministic versus probabilistic could vary across respondents. Fully probabilistic choice set models do not offer a mechanism to distinguish between two plausible motivations leading to exclusion, for example, bicycle may be deterministically excluded by some individuals because it is unavailable to them, whereas, for others, it may be probabilistically excluded because the distance of a work trip is too long (say more than 20 km) making it inconvenient. There is also a need to differentiate between two dissimilar processes leading to inclusion: fully or always included versus probabilistically or partially included. For instance, even when a vehicle is available in a household, it might be exclusively or always available for some individuals and therefore deterministically included. In contrast, it might be shared among multiple workers in another household and only partially or probabilistically available in such cases.
It is necessary to understand, articulate, and define the treatment of alternatives in the decision maker’s choice set. Should these alternatives be specified deterministically (included/excluded) or probabilistically considered? Further, neither the deterministic choice set models nor the fully probabilistic choice set models explicitly account for the “deterministic inclusion” of alternatives in the choice set based on empirical observations. It is, therefore, necessary to develop a framework that acknowledges the potential difference between deterministic and probabilistic consideration of alternatives in the choice set. The following objectives are pursued in this study:
to propose and estimate a partially probabilistic choice set model which accounts for deterministic and probabilistic consideration of alternatives in the choice set;
to understand and highlight the behavioral difference in the sensitivities between deterministic and probabilistic consideration of alternatives in the choice set; and
to apply the proposed model to differentiate between factors that affect choice and consideration of alternatives in the choice set and illustrate the effect of selected policies in each stage.
Literature Review
Many mode choice models assume the choice set to be known a priori and therefore deterministic (2–7). Since these studies assume the choice set to be given, their estimation process is relatively straightforward. However, when choice sets are not completely known, they need to be explicitly generated for such choice problems. Consideration refers to the stage in which the decision maker decides to either include an alternative or exclude an alternative. The choice set is generated after consideration of alternatives. The next stage in which the decision maker chooses an alternative from the choice set is the choice stage. Manski’s (8) two-stage theoretical framework was one of the seminal works involving explicit and probabilistic choice set generation, where the probability that decision maker i chooses alternative j is given by
where
where
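For reference, Manski’s two-stage formulation is usually written in the following standard form. The symbols used here follow common textbook usage and are assumptions, not necessarily the notation of this paper: $G$ denotes the set of all non-empty subsets of the universal choice set, $C$ a candidate choice set, $P_i(C)$ the probability that $C$ is the choice set of decision maker $i$, and $P_i(j \mid C)$ the conditional probability of choosing $j$ from $C$.

```latex
P_i(j) \;=\; \sum_{C \in G} P_i(j \mid C)\, P_i(C)
```

The first factor corresponds to the choice stage and the second to the consideration (choice set generation) stage.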
The probabilistic consideration may not be solely motivated by an analyst’s lack of knowledge. It could also be motivated by random variation in the extent of consideration of some alternatives based on context (e.g., dynamic vehicle unavailability, perception of walking distance, weather). Furthermore, despite the best efforts of the analyst, there might be insufficient data to determine whether such alternatives are always included in the choice set. The efforts made to construct the choice set as perceived by the decision maker and the assumptions made en route are discussed in sufficient detail in some studies (15, 17, 18). These studies depend on stated choice experiments to gather information about the perceived availability of different modes in the choice set. The probability of considering these stated choice sets is estimated using socioeconomic and trip-related characteristics for revealed preference data, making the choice set probabilistic but not fully latent (15). Some of these studies delineate the availability and consideration of alternatives in the choice set (17). They adopt alternative specifications to define the availability and consideration of each alternative in the choice set, which gives rise to several empirical model structures. The model structure proposed by Capurso et al. (18) also uses additional information with regard to the perceived availability of alternatives in the choice set. Still, it imposes restrictive assumptions with regard to consideration of certain alternatives in the choice set, for example, that faster and expensive modes are always considered. While some of these studies acknowledge that choice sets could be partially deterministic and partially probabilistic (17, 18), they have not attempted to behaviorally differentiate between deterministic and probabilistic inclusion/exclusion of alternatives in the choice set.
Nevertheless, it is important to understand that the choice behavior, influential factors, and the sensitivities to these factors could be significantly different between deterministic exclusion or inclusion and probabilistic exclusion or inclusion. It is plausible that a policy that attempts to improve the chances of inclusion of bus in the choice set would hardly affect consideration in the two extreme groups: the deterministically excluded and deterministically included segments. Therefore, the policy to improve inclusion may be expected to influence only the group which considers bus in their choice set probabilistically. Although the policy implications and the behavioral inferences for these segments are potentially different, the fully probabilistic choice set models do not acknowledge and capture these differences. To overcome these limitations, in this study, the choice set is not assumed to be fully latent but “partially probabilistic” and partially latent. As mentioned earlier, both deterministic and probabilistic choice set models noted above make two assumptions. First, the type of consideration (deterministic versus probabilistic) is the same across all alternatives for a given individual. Second, the type of consideration of a given alternative is the same across all individuals. However, the type of consideration of alternatives could vary across each alternative in the choice set and/or across individuals. Therefore, it is necessary to account for the non-uniform type of consideration of alternatives in the choice set and the heterogeneous type of consideration across individuals.
The consequences of these assumptions are twofold when the choice set is partially latent. First, the consideration and choice stage probabilities are confounded. In the fully probabilistic choice set models, the consideration probability of deterministically excluded alternatives is overestimated. Because the consideration and conditional choice stages are estimated jointly, this overestimation could in turn bias the conditional choice probability estimates. Similarly, the consideration probabilities of deterministically included alternatives are underestimated and confounded with the conditional choice probability estimates. A second consequence is the bias that arises from neglecting heterogeneity in conditional choice probability between deterministic and probabilistic inclusion segments. Thus, the type of consideration of an alternative in the choice set (deterministic versus probabilistic) and the conditional choice to include or exclude an alternative are both confounded in the fully probabilistic choice set models. Therefore, in the absence of a valid model specification that addresses the potential differences between an alternative’s deterministic and probabilistic exclusion or inclusion, the choice set needs to be modeled as partially probabilistic rather than fully probabilistic. A more robust and behaviorally realistic framework that allows the treatment or consideration of alternatives in the choice set to vary is, therefore, needed.
Model Formulation and Specification
This section discusses the formulation of a generalized framework that allows partially probabilistic consideration of alternatives in the choice set. Conventional models including fixed, deterministic, and fully probabilistic choice set models are special cases of this framework.
Likelihood Formulation of Consideration Type, Inclusion Outcome, and Choice
The consideration decision for individual i and alternative j is assumed to be composed of two separate but related binary outcomes:
the nature or type of consideration of an alternative is denoted by (
the inclusion decision in the choice set (
Let
Accordingly,
where
Existing models assume that
Let
where
Effectively, three types of consideration outcome are possible for any alternative j,
The nature of consideration
Let
The probability of a vector of type of consideration of alternatives in the choice set,
where
assuming independence in the type of consideration of all the alternatives in the choice set.
Let
Let
Note that on probabilistic consideration, the above condition probability is a function of
Then, the overall choice probability can be expressed based on the conditional probabilities above as follows:
With suitable specification for conditional choice probabilities, the log-likelihood can be expressed as,
where
= 0, if not.
The maximum likelihood estimation technique can be used to estimate the coefficients.
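The structure described above can be illustrated with a minimal Python sketch. It is not the paper's specification: it assumes a multinomial logit conditional choice model, independent consideration across alternatives, a single utility per alternative regardless of consideration type, and renormalization over non-empty choice sets. The function names (`mnl_probs`, `choice_probs`, `log_likelihood`) and the example values are hypothetical.

```python
import itertools
import math


def mnl_probs(utilities, choice_set):
    """Multinomial logit probabilities over the alternatives in choice_set."""
    top = max(utilities[j] for j in choice_set)  # subtract max for stability
    expu = {j: math.exp(utilities[j] - top) for j in choice_set}
    total = sum(expu.values())
    return {j: v / total for j, v in expu.items()}


def choice_probs(utilities, consider):
    """Unconditional choice probabilities for one individual.

    consider[j] is 0.0 (deterministically excluded), 1.0 (deterministically
    included), or strictly between 0 and 1 (probabilistically considered).
    """
    fixed = [j for j, q in consider.items() if q == 1.0]
    random_alts = [j for j, q in consider.items() if 0.0 < q < 1.0]
    probs = {j: 0.0 for j in utilities}
    mass = 0.0  # total probability of non-empty choice sets
    # Enumerate inclusion patterns of the probabilistic alternatives only.
    for pattern in itertools.product((0, 1), repeat=len(random_alts)):
        cset = fixed + [j for j, inc in zip(random_alts, pattern) if inc]
        if not cset:
            continue  # an empty choice set cannot produce a choice
        p_set = 1.0
        for j, inc in zip(random_alts, pattern):
            p_set *= consider[j] if inc else 1.0 - consider[j]
        mass += p_set
        for j, p in mnl_probs(utilities, cset).items():
            probs[j] += p_set * p
    if mass == 0.0:
        raise ValueError("every alternative is deterministically excluded")
    return {j: p / mass for j, p in probs.items()}


def log_likelihood(sample):
    """Sum of log-probabilities of the chosen alternatives.

    sample is a list of (utilities, consider, chosen) tuples; a chosen
    alternative with zero probability raises ValueError from math.log.
    """
    return sum(math.log(choice_probs(u, q)[c]) for u, q, c in sample)


# Hypothetical individual: car always included, bus considered with
# probability 0.5, train deterministically excluded, equal utilities.
u = {"car": 0.0, "bus": 0.0, "train": 0.0}
q = {"car": 1.0, "bus": 0.5, "train": 0.0}
print(choice_probs(u, q))
```

In this example the possible choice sets are {car} and {car, bus}, each with probability 0.5, so the sketch returns 0.75 for car, 0.25 for bus, and 0 for train. Enumerating only the probabilistically considered alternatives keeps the number of candidate choice sets smaller than in a fully latent specification.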
Special Cases Arising From the Proposed Framework
Equation 3 represents a more general framework from which several existing deterministic and probabilistic choice models can be derived as special cases.
Case i: fixed and deterministic choice set model arises if
Case ii: deterministic unavailability model is obtained if
Case iii: fully probabilistic choice set model occurs when
The first two cases assume that the nature of consideration is deterministic for all individuals, but the consideration outcome is heterogeneous in case ii, unlike case i. In case iii, the consideration of all alternatives is probabilistic for all individuals and is consistent with a fully latent choice set assumption.
These models impose the following restrictions simultaneously:
nature or type of consideration is identical across all alternatives for a given individual; and
consideration type is identical across individuals for a given alternative.
However, the proposed generalized framework (Equations 1–3) allows the type of consideration to be latent and, therefore, a random variable across observations and alternatives. Further, the decision to include/exclude an alternative in the choice set is assumed to be conditional on the type of consideration permitting different choice behaviors across deterministic and probabilistic inclusion segments.
The proposed model enables a richer representation of the choice set by relaxing both assumptions. Thus, this model offers considerable flexibility in representing choice sets. Some individuals may choose from the fixed universal choice set, some from a partial subset deterministically, others from a fully probabilistic choice set, and yet others from partially latent choice sets.
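The four kinds of choice set mentioned above can be distinguished per individual from the consideration type and inclusion indicators. The following is a minimal sketch, assuming the coding used in the empirical section (z = 0 for deterministic and z = 1 for probabilistic consideration; y = 1 for inclusion and y = 0 for exclusion). The function name and the returned labels are hypothetical.

```python
def choice_set_type(z, y):
    """Label one individual's choice set.

    z[j] = 0 if alternative j is considered deterministically, 1 if
    probabilistically. y[j] = 1 if included, 0 if excluded; y[j] is ignored
    wherever z[j] == 1 because the inclusion outcome is then latent.
    """
    if all(zj == 0 for zj in z):
        if all(yj == 1 for yj in y):
            return "fixed universal choice set"
        return "deterministic subset"
    if all(zj == 1 for zj in z):
        return "fully probabilistic choice set"
    return "partially latent choice set"


# Hypothetical individual: first alternative deterministically included,
# second probabilistically considered, third deterministically excluded.
print(choice_set_type([0, 1, 0], [1, None, 0]))
```

The first three labels correspond to the restrictive cases in which the type of consideration is the same for every alternative; only the last label requires the generalized framework.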
The generalized framework is richer and is behaviorally more realistic in four respects. First, it has the additional ability to differentiate those segments based on the type of consideration, deterministic (
Empirical Application
Work-Trip Mode Choice Data Description
Empirical data were collected from 872 workers in Chennai city (21) through face-to-face interviews to estimate the models. The sample means of household income (Rs 15,527, with a 99% confidence interval [CI] of Rs 14,561–Rs 16,493), age (36.9 years [CI of 35.9–38.01]), household size (4.37 [CI of 4.13–4.61]), and number of motorized vehicles in the household (1.31 [CI of 1.21–1.40]) were reasonably close to the population measures: household income of Rs 14,500, age of 38 years, household size of 4.51, and 1.38 motorized vehicles in the household (22). So, the data are reasonably representative of the population. The mode shares for work trips were motorized two-wheeler (42%), car (6%), bus (20.5%), train (16.6%), intermediate public transport (IPT) modes such as auto-rickshaw (2.2%; an auto-rickshaw is a motorized version of the pulled rickshaw or cycle-rickshaw) and shared autos (1.4%), company bus (5.6%), and non-motorized modes (5.7%).
Data were collected with regard to age, family income, personal vehicle ownership, work details such as workdays/week, work duration, distance to the workplace, and flexible work schedule, and details of alternative mode use in the last three months for various purposes. Modal attributes such as distance to the nearest bus stop and railway station, availability of direct bus service to the workplace, and subjective ratings about travel experiences in different modes were also obtained. Based on vehicle ownership and driving knowledge, commuters were classified into three categories: choice riders (those with driving knowledge and exclusive access to a personal motorized vehicle); semi-captive (those who own a vehicle but do not have exclusive access to a motorized vehicle); and captive to non-personal vehicles (those who do not own a personal vehicle or do not have driving knowledge).
Joint Classification of the Type of Consideration (z) and the Inclusion/Exclusion Decisions (y) of Alternatives for Work-Trip Mode Choice Data
This section presents simple empirical rules to classify the type of consideration of different alternatives as deterministic exclusion, probabilistic consideration, and deterministic inclusion.
Motorized personal vehicles such as two-wheelers and cars are classified as being deterministically excluded, deterministically included, or probabilistically considered based on: (1) availability of vehicles in the household; (2) exclusive availability to the worker; and (3) driving knowledge. The motorized vehicle(s) are assumed to be deterministically excluded if they are unavailable (z = 0, y = 0) and deterministically included if they are exclusively available to the worker who has driving knowledge (z = 0, y = 1). If the household has vehicles, but the worker does not have driving knowledge or a vehicle is not always available to the worker, the consideration is assumed to be probabilistic (z = 1). In such cases, the vehicles may be included in the choice set but not fully considered.
The type of consideration of bus is classified as deterministic or probabilistic based on the access distance at both ends. For short access distances (<750 m at each end), bus is assumed to be deterministically considered (z = 0, y = 1) only for the commuters who were captive by vehicle ownership (who did not own any vehicle), whereas, for longer bus stop access distances, the degree of consideration is assumed probabilistic (z = 1, y = 0 or 1) as the perception of walkability can vary depending on the age, health, and gender of the respondent. This limit of 0.75 km is determined empirically by an exploratory analysis as the share of workers who considered bus (used it at least twice in three months) decreased drastically beyond 750 m. Further, it is assumed that whenever a bus stop is within walking distance at both ends, it is deterministically included in the choice set given its wide network coverage and low cost.
Two factors are used to classify train consideration into deterministic or probabilistic segments: (1) access distance; and (2) work distance. It is observed that for very short distances (<2 km), train is rarely used as the out-of-vehicle travel time and cost/km (flat fare of Rs. 5 up to 20 km) might be too large compared with other modes. In this case, train is assumed to be deterministically excluded (z = 0, y = 0). On the other hand, depending on the access distance, the difficulty in reaching the railway stations (use of stairs in most cases), delays associated with buying tickets at stations, and discomfort with waiting at railway stations, some segments would include train only to a partial extent in their choice set. Thus, for longer work distances, it is assumed that train consideration is probabilistic (z = 1, y = 0 or 1). The probabilistic rather than deterministic inclusion is also justified by the limited availability and length of the train network (compared with the bus network). This threshold of distances <2 km for deterministically excluding train from the choice set is not generalizable to other data sets and scenarios. However, in this particular data set, it is observed that the least distance between any two stations in both the Mass Rapid Transit System (MRTS) and suburban rail systems in the city is 2 km. It is also observed that the shortest work distance for which train was chosen for any trip purpose (other than work) at least twice in the last three months was 2.5 km. Further, the shortest distance for which train was chosen for a work trip was 4 km. Therefore, 2 km was considered a safe threshold to exclude train from the choice set, since one would end up traveling effectively much more than 2 km (including the last-mile egress component of the trip) by train to reach a workplace which is less than 2 km from home.
On the other hand, the consideration of IPT modes, shared auto, company bus, and auto-rickshaw is assumed to be probabilistic (z = 1, y = 0 or 1) for two reasons. The spatial or temporal availability of a company bus and shared auto is not known deterministically to the analyst and is, therefore, treated as a random variable. Although auto-rickshaw is commonly available, because of operational issues such as fare haggling, service refusal, and non-compliance with regulatory tariff by drivers, there may be a refusal by either passenger or driver. Therefore, its consideration is modeled as being probabilistic.
The type of consideration for non-motorized modes is classified based on the work distance for both walk and bicycle. These modes are assumed to be excluded deterministically for very long commute distances because of travel time or feasibility considerations (z = 0, y = 0). Based on the maximum distance traveled using these modes in the empirical data, thresholds of 5 km for walk and 8 km for bicycle were obtained. For shorter trip lengths, these modes are assumed to be probabilistically considered (because of the availability of footpaths, perception of walking effort, safety and security, or bicycle infrastructure, which are unobserved). If the bicycle is unavailable, it is deterministically excluded from the choice set.
With this classification, the consideration of a mode could vary across individuals, and, for some individuals, some alternatives are considered deterministically and others probabilistically. Thus, this represents one instance or special case of the generalized partially probabilistic and heterogeneous consideration model above.
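The classification rules of this section can be summarized in a short Python sketch. The thresholds (750 m, 2 km, 5 km, 8 km) come from the text; the field names, the function names, the treatment of distances exactly at a threshold, and the use of a single motorized vehicle type per commuter are simplifying assumptions made for illustration only.

```python
from dataclasses import dataclass

EXCLUDED = (0, 0)          # z = 0, y = 0: deterministically excluded
INCLUDED = (0, 1)          # z = 0, y = 1: deterministically included
PROBABILISTIC = (1, None)  # z = 1: inclusion outcome y is latent


@dataclass
class Commuter:
    has_vehicle: bool         # household has the motorized vehicle type
    exclusive_access: bool    # vehicle is always available to this worker
    can_drive: bool           # worker has driving knowledge
    owns_any_vehicle: bool    # False for commuters captive by ownership
    bus_access_home_m: float  # distance to bus stop at the home end
    bus_access_work_m: float  # distance to bus stop at the work end
    work_distance_km: float
    has_bicycle: bool


def classify_personal_vehicle(c):
    if not c.has_vehicle:
        return EXCLUDED
    if c.exclusive_access and c.can_drive:
        return INCLUDED
    return PROBABILISTIC


def classify_bus(c):
    short_access = c.bus_access_home_m < 750 and c.bus_access_work_m < 750
    if short_access and not c.owns_any_vehicle:
        return INCLUDED
    return PROBABILISTIC


def classify_train(c):
    return EXCLUDED if c.work_distance_km < 2 else PROBABILISTIC


def classify_walk(c):
    return EXCLUDED if c.work_distance_km > 5 else PROBABILISTIC


def classify_bicycle(c):
    if not c.has_bicycle or c.work_distance_km > 8:
        return EXCLUDED
    return PROBABILISTIC


def classify_ipt_or_company_bus(c):
    # Availability is not observed by the analyst, so always probabilistic.
    return PROBABILISTIC


# Hypothetical commuter: shared household vehicle, 6 km commute.
c = Commuter(True, False, True, True, 500, 600, 6.0, True)
print(classify_personal_vehicle(c), classify_bus(c), classify_walk(c))
```

Returning a (z, y) pair with y left as None for the probabilistic case keeps the type of consideration separate from the inclusion outcome, which mirrors the two binary outcomes used in the model formulation.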
Utility Specification for Partially Probabilistic and Heterogeneous Consideration Type
The utility specification and probability computations for the above classification are discussed next. This model is compared against a fully probabilistic choice set model.
Because of the classification above, the nature of consideration z, and the binary inclusion decision can be combined to form three outcomes for each mode: deterministically excluded (z = 0, y = 0); deterministically included (z = 0, y = 1); or probabilistically considered (z = 1). To simplify the analysis and presentation, p(y,z) is taken as 0, 1, and
For the probabilistic consideration case, the binary inclusion outcome probabilities are assumed to be based on an underlying utility specification:
where
Further, it is assumed that the set of factors affecting the utility of the segments, which include alternative j deterministically and probabilistically, could be different.
where
For this case, the joint likelihood (Equation 8) is simplified as:
Model Results
Comparison of the Fully Probabilistic (M1) and the Proposed Partially Probabilistic (M2) Choice Set Models
This section compares the partially probabilistic choice set model (M2) with the fully probabilistic choice set model (M1) in predictive ability, especially with regard to consideration of alternatives in the choice set. The Akaike information criterion (AIC, 1,920.16 [M1] versus 1,777.55 [M2]) and Bayesian information criterion (BIC, 2,197.53 [M1] versus 2,070.23 [M2]) are smaller for the partially probabilistic choice set model than the fully probabilistic one. Since the choice set of individuals is different in the two models (M1 and M2), the AIC, BIC, or log-likelihood values of these two different models cannot be directly compared to investigate the model superiority. We use the Akaike likelihood ratio index test to compare these non-nested models. The test results show that the M2 is statistically superior to M1. Therefore, the partially probabilistic choice set models are statistically superior to the fully probabilistic choice set models. In other words, partial and probabilistic representation of the choice sets is better than fully probabilistic representation of the choice sets. The test statistic, that is,
critical value of 0.001 at the 95% confidence level. (Here
Comparison of Consideration Probability Estimates of Fully Probabilistic and Partially Probabilistic Choice Set Models
The consideration of alternatives is classified as deterministically excluded, probabilistically considered, or deterministically included, as noted already. The consideration probabilities (denoted as Pc1 for model M1 and Pc2 for model M2) for each segment are evaluated and compared across the two models. Specifically, t-tests are conducted to investigate: (1) whether the mean consideration probabilities for various alternatives are significantly different from 0 (Hypothesis H0: Pc1 = 0 versus Ha: Pc1
The segments shown in Table 1 are the classifications adopted in M2 (as already defined above). The first two columns in the table show the estimated average consideration probability values of M1 for segments that deterministically exclude and deterministically include each of the respective alternatives in M2. The corresponding values in M2 are 0 and 1 and are, therefore, not mentioned explicitly. The deterministic exclusion was defined only for personal vehicles, train, and non-motorized modes but not for other alternatives in M2. Deterministic inclusion conditions were defined only for personal vehicles and bus but not for other alternatives in M2. The third column shows the estimated average consideration probability in both M1 and M2 for all alternatives when they are probabilistically considered in M2. These test results suggest that the consideration probability estimates of fully probabilistic choice set models (M1) are biased, as shown by the following:
the consideration probabilities in M1 are significantly different from 0 for those commuters who deterministically exclude two-wheeler, car, train, and non-motorized modes in M2;
the consideration probabilities in M1 are significantly different from 1 for those commuters who deterministically include two-wheeler, car, and bus in M2; and
the consideration probability estimates in M1 are significantly different from the consideration probability estimates of M2 in the probabilistically considered segments for all alternatives.
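The comparisons against 0 and 1 listed above rest on a one-sample t-statistic for a segment's mean consideration probability. A minimal sketch using only the Python standard library is given below; the function name and the three probability values are hypothetical and are not taken from Table 1.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """t-statistic for H0: mean(values) == mu0, using the sample std dev."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu0) / std_err


# Hypothetical M1 consideration probabilities for a segment that M2
# treats as deterministically excluded (so M2 implies a value of 0).
pc1 = [0.2, 0.4, 0.6]
print(one_sample_t(pc1, 0.0))  # test against deterministic exclusion
print(one_sample_t(pc1, 1.0))  # test against deterministic inclusion
```

The resulting statistic would be compared with the critical value of a t-distribution with n − 1 degrees of freedom. Comparing the M1 and M2 estimates within the probabilistically considered segment would instead call for a two-sample or paired test, depending on how the observations are matched.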
Mean Consideration Probabilities for Both Fully Probabilistic and Partially Probabilistic Choice Set Models and t-Test Results
Note: NMT = Non-motorized Transport; NA = not available.
Behavioral Insights/Inferences in the Proposed Partially Probabilistic Choice Set Models Which are Different From the Fully Probabilistic Choice Set Models
This section attempts to highlight the behavioral effects, which are: (a) observed in the fully probabilistic choice set model (M1) but absent in the partially probabilistic choice set model (M2); (b) observed in M2 but not in M1; and (c) significantly different between M1 and M2. The estimation coefficients for the consideration stage from both models are shown in Table 2. While tests such as t-difference tests (23) could be used to compare coefficients of nested models or sub-segments within a model, to the best of our knowledge there is no direct, universally accepted statistical test to compare the coefficients of non-nested models. Therefore, both the cases
Comparison of Fully (M1) and Partially Probabilistic Choice Set (M2) Models, Consideration Stage
Note: 2w = two-wheeler; NMT = Non-motorized Transport.
Effects observed in M1 but not in M2.
Effects observed in M2 but not in M1.
Evident differences in the sensitivities of M1 and M2.
Significant at 95% confidence level (rest are significant at 90% confidence level).
The effect of owning car alone on consideration of car is significant in M2 but not in M1 since M2 delineates probabilistic consideration (55% of this group) and deterministic inclusion (the remaining 45%), a distinction that gets normalized in M1. The semi-captive commuters were less likely to consider car than the choice rider segment in both models. However, the marginal preference of choice riders in considering car in the choice set in relation to the semi-captive commuters is lower in M2 (−0.50) than in M1 (−2.40). This is probably because the negative preference of commuters who deterministically exclude car from their choice set is included in addition to the preference of the probabilistically considering group in M1.
On the other hand, M2 reflects the preference of only the probabilistically considered segment. The semi-captive commuters who deterministically exclude it have a strong negative effect, while those who deterministically include it have a strong affinity to consider car.
The higher-income group is more likely to consider car in M1 (2.09) than the lower and middle-income groups. This effect was absent in M2. This could be a result of the relatively small sample size of the higher-income group commuters who include car probabilistically in their choice set.
Two-wheeler owners were less likely to consider car in their choice set than others in M1, but not in M2. This effect was absent in M2 since 80% of these commuters who own two-wheelers do not own car and deterministically exclude it from their choice set. Therefore, the lower preference of two-wheeler owners in considering car (in M1) is masking the deterministic exclusion of car by these commuters.
In M1, the consideration of bus did not differ significantly across the captive, semi-captive, and choice segments. However, in M2, these segments were found to have a differential preference in considering bus in their choice set (semi-captive −0.90, choice segment −1.22). Car ownership has a negative influence on the consideration of bus. Commuters who own only a car (−0.2) and those with both two-wheelers and a car (−0.4) were less likely to consider bus than those who own only two-wheelers or do not own any motorized vehicle. Further, the semi-captive (−0.9) and choice segments (−1.22) were less likely to consider bus in their choice set than the captive commuters, given their greater degree of access to personal vehicles. In addition, those who own only a two-wheeler (−0.1) and those who own both a two-wheeler and a car (−0.4) were less likely to consider train than those who did not own motorized vehicles, which highlights the greater affordability and valuation of personal vehicle modes. Further, the semi-captive (−0.73) and choice segment (−0.9) groups have an even lower tendency to consider train in their choice set. This trend is similar for both train and bus and could reflect the increased accessibility to other modes, including personal vehicles, among the non-captive commuters. Since the chances of considering train are higher among commuters who own only a two-wheeler than among the rest of the segments (except those who do not own any vehicles), these commuters could be potential target segments for improving consideration of train in the choice set.
Comparison of the Probabilistic versus Deterministic Inclusion Segments in the Choice Stage
Table 3 shows the choice stage of the fully probabilistic and partially probabilistic choice set models. In the proposed model, M2, the intercept values differ between the deterministic and probabilistic consideration segments for car and bus. The probabilistic consideration segment had a smaller probability of choosing car than the deterministic segment when other factors were identical. The reverse effect is seen for bus. The segments that are relatively easy to shift from two-wheeler and car to sustainable modes are those that include these alternatives probabilistically rather than deterministically in their choice sets. From a policy perspective, the improved specification in M2 allows strategies to improve bus ridership to be differentiated between segments that include two-wheeler, car, and bus probabilistically and those that include them deterministically in their choice set. For example, the innate preference for two-wheeler and car would be higher among commuters who include them deterministically than among those who include them probabilistically. An increase in the operating cost of personal vehicles (say, a fuel price increase) is more likely to induce a shift to sustainable modes among the segments that probabilistically consider personal vehicles than among those that include them deterministically. Similarly, a reduction in bus fare will have a greater impact on the deterministic bus inclusion segment than on the probabilistic one. Therefore, the partially probabilistic choice set models present a more realistic representation of a policy’s target segments and the anticipated shift to sustainable modes than the fully probabilistic models.
Choice Stage and Goodness-of-Fit Comparison of Fully Probabilistic and Partially Probabilistic Choice Set Models
Note: 2w = two-wheeler; NMT = Non-motorized Transport; NA = not available.
Deterministically included segment’s sensitivity in M2.
Probabilistically considered segment’s sensitivity in M2.
Comparison of Predicted Mode Shares in M1 and M2 Against Observed Mode Shares for Selected Sub-Segments
The mean predicted choice probability estimates of M1 and M2 are compared against the observed market shares for sub-segments defined by the type of consideration of each alternative. Figure 1 shows selected sub-segments for which noticeable differences were observed between M1 and M2 in the predicted and sample-based market shares. The plots show the mode share comparison (predicted versus observed) for commuters who deterministically exclude train (Figure 1a), probabilistically consider car (Figure 1b), and deterministically include bus (Figure 1c) in their choice sets, respectively. Similar plots have been developed for all possible cases (type of consideration) for each alternative. However, those show little or negligible bias in predicted mode shares and are not shown here.

Comparison of predicted mode shares in M1 and M2 with observed market shares for selected sub-segments: (a) mode share for commuters who deterministically exclude train in M2; (b) mode share for commuters who probabilistically consider car in M2; and (c) mode share for commuters who deterministically include bus in M2.
From Figure 1, a to c, it follows that the average mode shares in M1 deviate from the sample-based market shares considerably more than those in M2, especially in the following cases. For commuters who
i) deterministically exclude train from their choice set:
two-wheeler, car, auto, and shared auto shares are overestimated, in addition to the train shares;
Non-motorized Transport (NMT) shares are underestimated.
ii) probabilistically consider car in their choice set:
two-wheeler, bus, train, shared auto, and NMT shares are overestimated;
car and company bus shares are underestimated.
iii) deterministically include bus in their choice set:
two-wheeler, car, train, and shared auto shares are overestimated;
auto, company bus, and NMT shares are underestimated, in addition to the bus shares.
Such bias in the consideration and the choice probability estimates of M1 could lead to misinformed policy evaluation and skewed inferences from the model.
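The sub-segment comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the record layout (the keys `chosen` and `pred`) and the function name are hypothetical, and the numbers in the usage example are invented purely for illustration. For one sub-segment, the sketch averages the predicted choice probabilities over commuters, computes the observed share of each mode, and reports the bias, where a positive value indicates overestimation.

```python
from collections import defaultdict


def mode_share_comparison(records, modes):
    """Compare observed mode shares with mean predicted choice probabilities.

    records: list of dicts for one sub-segment, each with the hypothetical keys
        'chosen' (the observed mode) and 'pred' (dict: mode -> predicted probability).
    modes: list of mode names to report.
    Returns dict: mode -> (observed share, mean predicted share, bias),
    where bias = predicted - observed (positive means overestimation).
    """
    n = len(records)
    if n == 0:
        raise ValueError("sub-segment is empty")
    observed = defaultdict(float)
    predicted = defaultdict(float)
    for rec in records:
        observed[rec["chosen"]] += 1.0 / n
        for mode in modes:
            predicted[mode] += rec["pred"].get(mode, 0.0) / n
    return {m: (observed[m], predicted[m], predicted[m] - observed[m]) for m in modes}


# Illustrative data only (not taken from the Chennai sample).
segment = [
    {"chosen": "bus", "pred": {"bus": 0.5, "car": 0.5}},
    {"chosen": "bus", "pred": {"bus": 0.7, "car": 0.3}},
]
for mode, (obs, pred, bias) in mode_share_comparison(segment, ["bus", "car"]).items():
    print(f"{mode}: observed={obs:.2f} predicted={pred:.2f} bias={bias:+.2f}")
```

Running the same function on the predictions of two competing models for the same sub-segment would give the kind of side-by-side comparison plotted in Figure 1.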
Model Validation
A subset of the sample (70% or 609 observations) was drawn and used as the calibration sample. The remaining 30% was used as the hold-out validation sample (23). The partially probabilistic choice set model was calibrated using both these sub-samples. The coefficients estimated using the calibration sample were applied to the hold-out validation sample to compute the predicted log-likelihood and the likelihood ratio index. The difference between the actual likelihood ratio index and the predicted likelihood ratio index was 6.1% in M2 and 8.1% in M1, both of which are below the acceptable error tolerance limit (10% to 15%). Thus, both models are validated.
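The hold-out check can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the likelihood ratio index is taken here as 1 − LL(β)/LL(0), with LL(0) computed from an equal-probability null model over each commuter's available alternatives, and the reported difference is assumed to be a relative difference between the calibration and validation indices. The function names are hypothetical.

```python
import math


def log_likelihood(chosen_probs):
    """Sum of log predicted probabilities of the alternatives actually chosen."""
    return sum(math.log(p) for p in chosen_probs)


def likelihood_ratio_index(chosen_probs, n_alternatives):
    """rho^2 = 1 - LL(beta) / LL(0).

    Assumption: LL(0) is an equal-shares null model, i.e. each commuter picks
    uniformly among the n_alternatives[k] alternatives available to them.
    """
    ll_model = log_likelihood(chosen_probs)
    ll_null = sum(math.log(1.0 / j) for j in n_alternatives)
    return 1.0 - ll_model / ll_null


def relative_difference(rho_calibration, rho_validation):
    """Relative gap between calibration and hold-out indices (assumed definition)."""
    return abs(rho_calibration - rho_validation) / rho_calibration


# Illustrative numbers only: two commuters, four alternatives each.
rho = likelihood_ratio_index([0.5, 0.5], [4, 4])
print(round(rho, 3))
print(round(relative_difference(0.50, 0.47), 3))
```

Under this reading, a model would pass validation when the relative difference stays below the chosen tolerance.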
Policy Application
From a policy analysis standpoint, identifying influential variables at the consideration and choice stages can enable the selection of appropriate policies at each stage. To this end, the consideration probability, as well as the conditional choice probability, for each alternative is classified into low (<0.25), medium (0.25 to 0.75), and high (>0.75) probability levels. In total, nine possible combinations can arise. Depending on the combination (low consideration and high choice probability, high consideration and low choice probability, etc.), suitable target segments can be identified and appropriate policies suggested. For example, short- and medium-distance two-wheeler trips could be identified as a target segment for shifting to public transport.
The Chennai MTC (Metropolitan Transport Corporation) follows a distance-slab-based bus fare structure (stage of approximately 2 km), with a minimum bus fare of Rs 2 for the first stage (in 2007) and subsequent increases (by Rs 1/.50) for additional distances. The fare change scenario tested involves extending the minimum bus fare of Rs 2 up to 5 km, reducing the fare (by 15% to 45% in steps) for medium distances (5–12 km), and retaining current fares for longer-distance trips (>12 km). Under this changed fare structure, there was a 1.2% to 1.6% increase in bus shares, a 0.6% to 0.8% decrease in two-wheeler shares, and a 0.2% decrease in car shares, as shown in Table 4. However, a 0.5% to 0.7% decrease in train shares was also observed, since bus and train are competing modes in the medium-distance range.
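The three-by-three classification described above can be written as a short sketch. The function names are hypothetical, and the treatment of the boundary values is an assumption: probabilities of exactly 0.25 or 0.75 are assigned to the medium level here, which the text does not specify.

```python
def level(p):
    """Classify a probability as 'low' (<0.25), 'medium', or 'high' (>0.75).

    Assumption: the boundary values 0.25 and 0.75 fall in 'medium'.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p < 0.25:
        return "low"
    if p > 0.75:
        return "high"
    return "medium"


def segment_cell(consideration_prob, conditional_choice_prob):
    """Return the (consideration level, choice level) cell, one of nine combinations."""
    return (level(consideration_prob), level(conditional_choice_prob))


# A commuter who rarely considers bus but would very likely choose it if
# considered lands in the ('low', 'high') cell.
print(segment_cell(0.10, 0.90))
```

A cell such as low consideration with high conditional choice probability would point toward consideration-stage policies, whereas high consideration with low choice probability would point toward choice-stage policies.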
Comparison of Predicted Mode Shares From the Partially Probabilistic Choice Set and Fully Probabilistic Choice Set Models (%)
Note: 2w = two wheeler; SHAUTO = shared auto; CBUS = company bus; NMT = Non-motorized Transport.
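The distance-based fare change scenario described above can be expressed as a small function. This is a sketch under stated assumptions rather than the MTC fare table: the current fare is passed in as an input because the stage-wise increments are not reproduced here, the function name is hypothetical, and the handling of trips of exactly 5 km and 12 km (assigned to the lower band) is an assumption.

```python
def scenario_fare(distance_km, current_fare, reduction=0.15, minimum_fare=2.0):
    """Bus fare (Rs) under the tested scenario.

    - up to 5 km: the minimum fare applies;
    - above 5 km and up to 12 km: the current fare is reduced by `reduction`
      (0.15 to 0.45 in the scenarios described in the text);
    - above 12 km: the current fare is retained.
    Assumption: trips of exactly 5 km or 12 km fall in the lower band.
    """
    if distance_km < 0 or current_fare < 0:
        raise ValueError("distance and fare must be non-negative")
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    if distance_km <= 5.0:
        return minimum_fare
    if distance_km <= 12.0:
        return current_fare * (1.0 - reduction)
    return current_fare


# Hypothetical current fares, used only to exercise the three bands.
print(scenario_fare(3.0, current_fare=3.0))
print(scenario_fare(8.0, current_fare=4.0, reduction=0.25))
print(scenario_fare(15.0, current_fare=6.0, reduction=0.45))
```

The scenario fares would then replace the current fares in the bus cost attribute before recomputing the predicted mode shares.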
Another policy to encourage public transit use was tested in which bus fare concessions are provided for commuters who own only one two-wheeler or who do not own vehicles. The model showed an increase of 1.2% to 3% in bus shares with incremental concessions ranging from 15% to 45%. Corresponding decreases of 0.5% to 1.2% in two-wheeler shares and 0.1% in car shares were observed.
A policy providing a bus fare concession for commuters whose monthly vehicle mileage is less than 300 km was also tested. (For a short-distance 5 km trip, the daily distance traveled is 10 km for the round trip plus 2 km for maintenance activities, that is, 12 km; 12 km × 25 working days/month = 300 km/month.) The model showed a 1% to 2.3% increase in bus shares, a 0.4% to 0.9% decrease in two-wheeler shares, and a marginal drop in all other mode shares.
While the above policies focused on improving the conditional choice, given that bus is considered in the choice set, policies that improve consideration of bus are equally important. The policy tested in this context was to increase the accessibility of bus stops from home. A reduction in the distance between home and bus stops of 25%, 50%, and 75% would produce an increase of 8% to 10% in bus consideration and 0.3% to 0.6% in bus shares.
Comparing models M1 and M2, it was observed that the fully probabilistic choice set model underestimated the increase in bus shares by 0.9% to 2% in each of these policies. This was further confirmed by the model validation exercise. On applying the coefficients of models M1 and M2 (estimated using the calibration sample of 70%) to the hold-out validation sample (30%), the predicted shares of bus (and other modes) in M2 (19.8%) were closer to the observed shares (20.5%) than those in M1 (18.6%), owing to the empirically based partially probabilistic choice set specification. Therefore, a similar trend in model validation and policy analysis confirms the under(over)estimation in M1. The policy that attempted to improve the consideration of bus in the choice set predicted a relatively low increase (0.3% to 0.6%) in bus shares compared with the policies that targeted the choice stage. Policies based particularly on vehicle ownership and vehicle mileage were more effective than those based on work distance.
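The arithmetic behind the 300 km/month threshold can be checked with a few lines. The function and parameter names are hypothetical; the values (a 5 km one-way trip, 2 km per day for maintenance activities, 25 working days per month) are those quoted in the text.

```python
def monthly_mileage_km(one_way_km, maintenance_km_per_day=2.0, working_days=25):
    """Monthly vehicle mileage: (round trip + maintenance travel) x working days."""
    return (2.0 * one_way_km + maintenance_km_per_day) * working_days


def eligible_for_concession(mileage_km, threshold_km=300.0):
    """Concession applies when monthly mileage is less than the threshold."""
    return mileage_km < threshold_km


# A 5 km one-way commute gives (10 + 2) * 25 = 300 km/month, the threshold itself.
print(monthly_mileage_km(5.0))
print(eligible_for_concession(monthly_mileage_km(4.0)))
```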
Conclusions
Existing mode choice models assume one of two extremes with regard to choice sets: they are either fully deterministic or fully probabilistic (latent). A new partially probabilistic choice set framework is presented in this study. Such a model can improve choice set and conditional choice specification by distinguishing segments based on deterministic exclusion, deterministic inclusion, and probabilistic consideration. Fully probabilistic choice set models, by contrast, do not acknowledge, search for, or use the additional empirically available information to reduce choice set latency. Further, a fully probabilistic choice set model overestimates the choice probability of alternatives excluded from the choice set while underestimating the choice probability of alternatives that are included/perceived in the choice set completely.
The proposed framework and models are illustrated using empirical data and compared with the fully probabilistic choice set model at the consideration and conditional choice decision stages. The proposed model offers the following advantages. First, it has the additional ability to differentiate segments that vary in their type of consideration, deterministic versus probabilistic. Second, the factors influencing consideration type (deterministic or probabilistic) are explicitly and parametrically analyzed instead of being assumed to be 0 or 1 as in the existing models. Third, the specification can disentangle factors that affect the inclusion outcome from those that affect the type of consideration. Finally, the specification also permits differential sensitivity to factors in conditional choice probability between those who consider an alternative deterministically and those who consider it probabilistically. Further, various models, including the partially probabilistic and fully probabilistic choice set models, are special cases of this framework.
The empirical results indicate that, in the fully probabilistic model, the mean consideration probabilities of alternatives that have been deterministically excluded from and deterministically included in the choice set were significantly different from 0 and 1, respectively. Similarly, the mean consideration probabilities of alternatives for the probabilistic consideration segment differ between the proposed and the fully probabilistic choice set models. The predicted mode shares of the latter are also shown to be biased. This is also reflected in the normalized sensitivities at both the consideration and choice stages of the fully probabilistic choice set models. The partially probabilistic choice set models could capture the differential sensitivities among segments that include these alternatives deterministically and probabilistically with regard to travel time and the travel cost of two-wheeler, car, and bus.
Choice set misspecification in the fully probabilistic choice set models led to biased sensitivities and behavioral misinterpretations. Spurious significance (of long-distance trips and car ownership on two-wheeler; of high income and two-wheeler ownership on car) and spurious insignificance (of semi-captive and choice segment effects on bus and train) of certain factors were observed in these models.
It is often challenging to specify suitable variables at the appropriate stage in two-stage simultaneous models: consideration stage, choice stage, or both. Certain variables were significant only at the consideration stage (distance from home to bus stops and railway stations) and certain other variables at the choice stage (travel time and cost). However, certain variables that were significant at both levels (bus frequency, presence of direct bus service) were included only at the consideration level to avoid correlation/confounding errors. Rigorous stage-wise specification of these attributes requires further investigation.
The effect of alternative policy scenarios was evaluated using both models at both the consideration and choice stages. Significant differences in policy impacts were observed across the models. The results show a significant increase in bus mode share of 1% to 3% for varying levels of bus fare concessions targeted at specific segments based on vehicle ownership and work distance. Increasing accessibility to bus stops led to a modest increase of 0.6% in bus share.
In this paper, deterministic inclusion/exclusion of IPT modes such as auto, shared auto, and company bus has not been defined in the proposed model owing to data limitations. Data on the availability of alternatives, awareness of their availability and feasibility, evidence of their inclusion in or exclusion from the choice set, and the contextual factors that influence it are often not given due importance in questionnaires, since such questions might make the questionnaires too long or lead to incomplete responses. Careful framing of these questions could help gain greater clarity on choice sets and improve the specification of the choice construct. However, such an improved choice set specification comes with additional but reasonable search costs and effort before modeling. In the presence of such data, partially probabilistic choice set models serve as a useful framework to understand consideration and choice behavior more realistically and logically. This work could be further extended by developing empirical models for choice dimensions such as route or destination. It would also be a useful contribution to develop empirical models for the generalized framework.
Footnotes
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: Parthan Kunhikrishnan, Karthik K. Srinivasan; data collection: Karthik K. Srinivasan; analysis and interpretation of results: Parthan Kunhikrishnan, Karthik K. Srinivasan; draft manuscript preparation: Parthan Kunhikrishnan, Karthik K. Srinivasan. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The support rendered by the pCoE on Connected Intelligent Urban Transport Laboratory at IIT Madras, funded by the Ministry of Urban Development, Government of India, is gratefully acknowledged.
