Abstract
Most project management research focuses almost exclusively on explanatory analyses. Evaluation of the explanatory power of statistical models is generally based on F-type statistics and the R² metric, followed by an assessment of the model parameters (e.g., beta coefficients) in terms of their significance, size, and direction. However, these measures are not indicative of a model’s predictive power, which is central for deriving managerial recommendations. We recommend that project management researchers routinely use additional metrics, such as the mean absolute error or the root mean square error, to accurately quantify their statistical models’ predictive power.
Two Performance Dimensions of a Model: Explanatory and Predictive Power
A key concern in the evaluation of statistical models is to establish explanatory power, which refers to “the strength of association indicated by a statistical model” (Shmueli & Koppius, 2011, p. 561). Researchers typically evaluate their models’ explanatory power based on F-type statistics and the R² (Cohen, 1988), followed by an assessment of the model parameters in terms of their significance, size, and direction. Similarly, in structural equation modeling (SEM), which is arguably the most prominent method for testing complex cause-effect models, project management researchers often rely on covariance-based methods, which strongly emphasize assessing a model’s goodness-of-fit by using the χ² statistic or alternative fit indices, such as CFI, RMSEA, and SRMR (Bagozzi & Yi, 2012). These SEM metrics are derived from an explanatory perspective in that they quantify the divergence between the empirical covariance matrix and the model-implied covariance matrix. As such, they indicate how well the hypothesized model fits the entire dataset at hand.
These parameters and metrics are estimated and evaluated using all of the available information; that is, the entire dataset. For example, the computation of the R² draws on the estimates produced by the entire dataset to predict the dependent variables’ data that have already been used to obtain an optimal statistical solution. Solving the statistical model using the same sample of data to both explain the relationships between the variables and predict that same sample data is also referred to as in-sample prediction. But measures of in-sample prediction, such as the R², provide no indication of how well a statistical model is able to predict outcome values for previously unseen data (Shmueli & Koppius, 2011). To obtain such a measure, researchers need to use an initial sample to estimate the model parameters, and then use those parameters to predict the values of the dependent variables in a second sample. The process of using one sample to develop model parameters and then predicting the dependent variable in a second sample is referred to as out-of-sample prediction.
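The distinction between in-sample and out-of-sample prediction can be illustrated with a minimal sketch. It assumes an ordinary least squares regression on simulated data; the helper names (fit_ols, predict_ols, r_squared), the sample sizes, and the simulated variables are hypothetical and serve illustration only.

```python
import numpy as np

def fit_ols(X, y):
    """Estimate OLS coefficients (intercept first) by least squares."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict_ols(beta, X):
    """Generate predictions from previously estimated coefficients."""
    X1 = np.column_stack([np.ones(X.shape[0]), X])
    return X1 @ beta

def r_squared(y, y_hat):
    """1 - SSE/SST, with SST computed around the mean of the evaluated sample."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Simulated data (assumption): 15 predictors, only the first one matters.
rng = np.random.default_rng(0)
n, p = 60, 15
X = rng.normal(size=(n, p))
y = 0.5 * X[:, 0] + rng.normal(size=n)

# In-sample: the same 60 cases are used for estimation and prediction.
beta_all = fit_ols(X, y)
r2_in = r_squared(y, predict_ols(beta_all, X))

# Out-of-sample: estimate on the first 40 cases, predict the remaining 20.
train, hold = np.arange(40), np.arange(40, 60)
beta_train = fit_ols(X[train], y[train])
r2_out = r_squared(y[hold], predict_ols(beta_train, X[hold]))

print(f"in-sample R2: {r2_in:.3f}, out-of-sample R2: {r2_out:.3f}")
```

In this setup the in-sample value cannot fall below zero, whereas the out-of-sample value can, because the holdout cases played no role in estimating the coefficients.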
Assessing a model’s (out-of-sample) predictive power requires gathering new data or separating the dataset into a training sample and a holdout sample. The training sample does not include the cases to be predicted and serves as a basis for the model estimation. The estimated parameters are then used to generate predictions for the cases of the newly gathered dataset or the holdout sample. Prediction statistics, such as the mean absolute error or the root mean square error, facilitate quantifying the prediction error (Hastie et al., 2013). Researchers can also draw on k-fold cross-validation, which randomly splits the dataset into k equally sized subsets of data (Browne, 2000). The procedure then combines k-1 subsets into a training sample, which is used to estimate the model parameters. In the next step, the model estimates are used to generate case-level predictions for all observations in the omitted subset (i.e., the holdout sample). This process is repeated until each subset has served as a holdout sample. The splitting of the dataset is generally done randomly, but there are variants, such as stratified cross-validation, in which researchers can enforce a certain data distribution in each subset (Burman, 1989).
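The k-fold procedure described above can be sketched as follows. This is a minimal illustration that assumes an ordinary least squares model and simulated data; the function names (mae, rmse, k_fold_predictions) are hypothetical, and the folds are only approximately equal in size when the number of cases is not divisible by k.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error: average of |y - y_hat|."""
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    """Root mean square error: square root of the mean squared error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def k_fold_predictions(X, y, k=5, seed=0):
    """Return one out-of-sample OLS prediction per case via k-fold cross-validation."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)  # random, near-equal subsets
    X1 = np.column_stack([np.ones(n), X])
    y_hat = np.empty(n)
    for hold in folds:
        # The k-1 remaining subsets form the training sample.
        train = np.setdiff1d(np.arange(n), hold)
        beta, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        # Predict only the omitted subset (the holdout sample).
        y_hat[hold] = X1[hold] @ beta
    return y_hat

# Simulated data (assumption): two predictors plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 1.0 + 0.8 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=100)

y_hat = k_fold_predictions(X, y, k=5)
print(f"MAE: {mae(y, y_hat):.3f}, RMSE: {rmse(y, y_hat):.3f}")
```

Because every case is predicted exactly once by a model that did not see it, the resulting MAE and RMSE summarize out-of-sample rather than in-sample error.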
Evaluating a model’s explanatory and predictive power is not solely a question of using the right metrics. It also guides the choice of methods for estimating statistical models, as they differ in their ability to accommodate explanation- and prediction-oriented model assessments. In the context of SEM, for example, covariance-based methods do not provide any reliable indication with regard to prediction because the construct scores produced by the method, which serve as a basis for any predictive power assessment, are indeterminate (Rigdon et al., 2019). This means there is an infinite number of different sets of construct scores that will fit the model equally well (Guttman, 1955).
By contrast, composite-based SEM methods, such as partial least squares (PLS), always produce single specific (i.e., determinate) construct scores for each observation, which serve as a basis for assessing the model’s predictive power. For this reason, PLS is conceived as a causal-predictive approach to SEM (Hair & Sarstedt, 2019), which enables researchers to assess their models from both explanation and prediction perspectives (Chin et al., 2020). In the context of predictive power assessment, Shmueli et al. (2016) have proposed the PLSpredict procedure, which applies k-fold cross-validation to PLS path models (Shmueli et al., 2019). Similarly, researchers using PLS-SEM can engage in prediction-oriented model comparisons using metrics such as Schwarz’s (1978) BIC and Geweke and Meese’s (1981) GM criterion. These criteria have been shown to achieve a sound trade-off between model fit and predictive power (Sharma et al., 2020) and can also be used to compute the relative plausibility of a model, given the data and set of models (Danks et al., 2020).
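A BIC-based model comparison might be sketched as below. The sketch assumes the common Gaussian form of the BIC for a linear model (up to an additive constant) and converts BIC differences into normalized weights as one possible way to express relative plausibility; the cited studies may operationalize these quantities differently, and the GM criterion is not shown. The function names and the input values are hypothetical.

```python
import numpy as np

def bic_gaussian(sse, n, n_params):
    """BIC for a Gaussian linear model, up to an additive constant.

    sse: sum of squared errors, n: sample size, n_params: estimated parameters.
    """
    return float(n * np.log(sse / n) + n_params * np.log(n))

def bic_weights(bic_values):
    """Turn BIC values into weights that sum to one (lower BIC, higher weight)."""
    bic = np.asarray(bic_values, dtype=float)
    delta = bic - bic.min()          # difference to the best model in the set
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical competing models estimated on the same 200 cases.
models = {
    "model_a": {"sse": 150.0, "n_params": 4},
    "model_b": {"sse": 148.0, "n_params": 7},
}
bics = [bic_gaussian(m["sse"], 200, m["n_params"]) for m in models.values()]
for name, b, w in zip(models, bics, bic_weights(bics)):
    print(f"{name}: BIC = {b:.2f}, weight = {w:.3f}")
```

The weights are relative to the set of models considered, so adding or removing a candidate model changes them.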
Model Evaluation in Project Management Research
In project management, as in other social sciences, researchers generally put little emphasis on prediction (Shmueli, 2010). This is surprising, since assessing the predictive power of models lies at the heart of the scientific enterprise. Researchers evaluate, compare, and reject theories on the basis of their ability to make falsifiable predictions about new observations. In the physical sciences, prediction-driven explanation has proven uncontroversial, especially in cases where theories make relatively unambiguous predictions and data are plentiful (Hofman et al., 2017). Similarly, in bioinformatics, scholars have concluded that “a predictive model represents the gold standard in understanding a biological system and will permit us to investigate the underlying causes of diseases and help us to develop therapeutics” (Gifford, 2001, p. 2049).
Instead of testing whether a theory can accurately predict an outcome of interest (e.g., project performance, risk resilience, or team effectiveness), project management researchers’ primary objective has been assessing whether model coefficients are significant and in the hypothesized direction. As a result, the focus is on the form of the input–output relationship, rather than on predicting new output data given the input. Yet, at the same time, project management researchers frame their managerial recommendations as prescriptive statements—which inherently follow a prediction logic. For example, researchers frequently make conditional statements that foreshadow a specific result if a specific activity is implemented (i.e., prescriptive statements), such as Haffer et al. (2021, p. 156), when recommending that “project managers should positively affect project team members’ perceived work meaningfulness and work engagement by giving project team members an ability to utilize multiple skills, adequate autonomy, and opportunities to obtain job related feedback.” Making such statements, however, requires verifying their adequacy by conducting an additional assessment of the models’ predictive power (Shmueli & Koppius, 2011).
Researchers’ use of predictive statements in managerial implications sections when their model evaluations represent mostly explanation raises questions about the conceptual and practical relevance of their findings. As Kaplan (1964) notes, if we cannot predict successfully on the basis of a certain explanation, we have no good reason for accepting the explanation. This logic is consistent with Popper (1962), who posited that prediction is the primary criterion for evaluating theoretical falsifiability. A singular focus on explanation therefore limits the potential to foster understanding of behavioral phenomena. A stronger prediction focus, by contrast, can take project management researchers to the next level in terms of developing new theories, testing their practical relevance, and improving existing models. It can also guide the comparison of alternative competing models derived from different theories by selecting a model that most accurately generalizes to other contexts, while avoiding models that overfit the data by tapping spurious sample-specific patterns (Sharma et al., 2019). Overall, there is much to be gained by putting greater emphasis on predictive power assessments.
How to Do Better
Project management researchers should begin putting a stronger emphasis on the routine use of out-of-sample prediction metrics. For example, studies that draw on large-scale empirical datasets, such as from social networks (e.g., Wang et al., 2018), can easily separate the dataset into training and holdout samples and apply out-of-sample prediction metrics. Similarly, choice experiments that project management researchers have used, for example, to identify the relative importance of criteria for supplier selection (Watt et al., 2010), can readily implement holdout tasks as the basis for assessing the model’s predictive power. To implement these predictive assessment techniques, project management researchers can draw on a wide range of methodological literature that offers clear guidance on how to implement them (Hastie et al., 2013; Shmueli & Koppius, 2011; Shmueli et al., 2019).
Project management scholars would also benefit from more carefully distinguishing between explanation and prediction in their model evaluations. A quick look at most journals publishing project management research will confirm that authors frequently refer to their findings as predictive when in fact they are explanatory. To illustrate this point, consider Dasí et al.’s (2021) recent study on the impact of different combinations of team ability, motivation, and opportunity on project performance. Using data from 285 projects, the authors run a series of regression analyses in which they interpret the (adjusted) R² as indicative of a model’s superior predictive power for project performance compared to alternative models. For example, in their results discussion section, Dasí et al. (2021, p. 83) note that “the multiplicative model clearly provides the best solution with an F-value of 9.62 and an adjusted R-squared of 0.40.” They continue noting that these results support their hypothesis (H2b), which states that this model “is a better predictor of project performance than the constraining factor model” (Dasí et al., 2021, p. 79). The authors further extend their discussions in their managerial implications section, where they conclude by noting that “in complex projects, the greatest improvements in project performance can be achieved by increasing motivation. In addition to its own positive effect, this will amplify the effects of ability and opportunity” (Dasí et al., 2021, p. 85; emphasis added). Their predictive scenario would be further substantiated by moving beyond their explanatory model evaluations to an assessment of out-of-sample predictive power.
We also recommend that researchers consider applying statistical methods that bridge the apparent dichotomy between explanation and prediction. Most notably, PLS-SEM emphasizes prediction in the estimation of a model whose structure is grounded in causal explanations (Hair et al., 2019). As such, the method represents a sound balance between machine learning methods, which are fully predictive in nature, and covariance-based SEM, which focuses on explanation. Note also that out-of-sample predictive metrics can improve our understanding of prediction for both PLS-SEM and regression analysis.
Finally, in implementing our recommendations, researchers would benefit from acknowledging that there is often an inherent tension between models that perform well in terms of explanation and those with high predictive accuracy. To address this tension, researchers should first establish the main goal of their research and evaluate the model’s performance in terms of this goal. If explanation alone is sufficient, an assessment of explanatory power is adequate. But if the goal is explanation plus prediction, then an out-of-sample assessment should be included in their analysis. Such an assessment may involve defining a minimum threshold of predictive power, which project management researchers expect their explanatory models to achieve. Researchers could also compare alternative models that are equally strong in terms of explanation, and then identify the model that exhibits higher predictive power (Sharma et al., 2019).
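A threshold-based comparison of alternative models could look like the following minimal sketch. It assumes, purely for illustration, that the minimum threshold is a naive benchmark that predicts every holdout case with the training-sample mean, and that the competing models are ordinary least squares regressions; the function names and the example data are hypothetical.

```python
import numpy as np

def holdout_rmse(X_train, y_train, X_hold, y_hold):
    """Estimate OLS on the training sample and return the RMSE in the holdout sample."""
    A_train = np.column_stack([np.ones(len(y_train)), X_train])
    A_hold = np.column_stack([np.ones(len(y_hold)), X_hold])
    beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return float(np.sqrt(np.mean((y_hold - A_hold @ beta) ** 2)))

def naive_rmse(y_train, y_hold):
    """Benchmark (assumption): predict each holdout case with the training mean."""
    return float(np.sqrt(np.mean((y_hold - np.mean(y_train)) ** 2)))

def select_model(candidates, y_train, y_hold):
    """Return the candidate with the lowest holdout RMSE among those beating the benchmark.

    candidates maps a model name to a (X_train, X_hold) pair of predictor arrays.
    """
    benchmark = naive_rmse(y_train, y_hold)
    scores = {
        name: holdout_rmse(X_tr, y_train, X_ho, y_hold)
        for name, (X_tr, X_ho) in candidates.items()
    }
    passing = {name: s for name, s in scores.items() if s < benchmark}
    best = min(passing, key=passing.get) if passing else None
    return best, scores, benchmark

# Simulated data (assumption): the outcome depends on x1 but not on x2.
rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=120), rng.normal(size=120)
y = 0.7 * x1 + rng.normal(scale=0.5, size=120)
tr, ho = slice(0, 90), slice(90, 120)

best, scores, benchmark = select_model(
    {"theory_1": (x1[tr], x1[ho]), "theory_2": (x2[tr], x2[ho])}, y[tr], y[ho]
)
print(best, scores, benchmark)
```

If no candidate beats the benchmark, the function returns None for the selected model, which signals that none of the models meets the minimum level of predictive power defined here.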
Considering that project management research implies an understanding of the causes as well as prediction of theoretical concepts and their relationships (Gregor, 2006), the dual focus on explanation plus prediction seems logical. Implementing model evaluation procedures that include both explanation and prediction is therefore a fundamental step toward increasing the rigor and relevance of project management research.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
