Abstract
This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model strength. The authors also illustrate and test a jackknife procedure to correct for the bias in the OOR and estimate its standard error. An example applying the OOR to evaluate logistic regression models predicting organizational turnover is provided. The authors discuss implications and offer recommendations for using the OOR to quantify and compare the effectiveness of logistic regression models in applied research.
Many important criterion variables in educational and organizational research are dichotomous in nature (e.g., college dropout, turnover), and thus require multiple logistic regression (MLOGR) analysis. Unfortunately, users of the method often find themselves disadvantaged by lack of a meaningful index to describe the overall strength or usefulness of their models. Allen and Le (2008) introduced the overall odds ratio (OOR) as a new index for quantifying the overall effect size in logistic regression models. The OOR can be interpreted in the same way as the odds ratio of individual independent variables: It is the ratio of the odds of belonging to a category of the dependent variable that a researcher is interested in predicting (e.g., leaving vs. staying in turnover) when the weighted linear combination of the independent variables increases one standard deviation to the odds before such an increase. Allen and Le derived a procedure to adjust for the inflation of OOR in samples of finite size analogous to the well-known Wherry (1931) formula in multiple linear regression (MLR). The authors then illustrated via Monte Carlo simulation that the OOR was not influenced by study base rate. Thus, the OOR addresses many shortcomings of current indices and has potential as a useful index of overall effect size for users of logistic regression.
However, crucial issues remain to be investigated before the OOR can be widely adopted in applied research. First, Allen and Le’s (2008) simulation was based on the logit model. Because there exists an alternative model, the probit model, which is arguably more relevant in many situations in educational and organizational research (Kemery, Dunlap, & Bedeian, 1989), their findings regarding independence of the OOR from study base rates are inconclusive. Additionally, Allen and Le did not establish any procedure to estimate the standard error of the OOR. This is an important limitation, because the standard error is essential to judge stochastic agreement between a sample statistic and a population parameter (Hanushek & Jackson, 1977) and provides information allowing the combination of research findings across studies (Hunter & Schmidt, 2004). Finally, the procedure for correction of OOR bias provided by Allen and Le is computationally complicated because it involves matrix manipulation and, therefore, may be both complex to implement in computer programs and difficult for most empirically oriented researchers to follow.
The present study extends the work of Allen and Le (2008) by addressing the issues noted above. Specifically, we first simulate data under the probit model to thoroughly examine how the OOR varies in accordance with study base rate. Next, we illustrate a jackknife bootstrapping procedure as an alternative approach to adjust for bias in the OOR when it is estimated in samples of finite size. The jackknife procedure also provides a means to estimate the standard error of the OOR (Sapra, 2002). Finally, we demonstrate an application of the OOR in applied research. An SAS macro for estimating the OOR, its standard error, and its corrected value is provided to enable application and use of the index in logistic regression analysis. Thus, this study provides an important complement to Allen and Le’s (2008) work by making the OOR more readily accessible to empirically oriented researchers and practitioners.
The Overall Odds Ratio
Effects of individual predictors in MLOGR models can be intuitively represented using the odds ratio, which is the exponential function (natural base) of the regression weight, exp(β_k). Note that when there is only one predictor, exp(β_k) can be seen as representing the overall effect of the MLOGR model including only that predictor. Furthermore, when the only predictor is standardized, its regression coefficient β_1 is equal to the standard deviation of η, the “linear combination of the predictors.” That is,

\[ \exp(\beta_1) = \exp\left[\mathrm{Var}(\eta)^{1/2}\right]. \tag{1} \]

Extending this concept, Allen and Le (2008) suggested that a similar odds ratio can be used to index the strength of MLOGR models with more than one predictor. This odds ratio of the full model is referred to as the OOR and is defined as

\[ \mathrm{OOR} = \exp\left[\mathrm{Var}(\eta)^{1/2}\right]. \tag{2} \]

Because

\[ \mathrm{Var}(\eta) = \sum_{k}\sum_{k'} \beta_k \beta_{k'} \,\mathrm{Cov}(X_k, X_{k'}), \tag{3} \]

Equation (2) can be alternatively presented as (Allen & Le, 2008)

\[ \mathrm{OOR} = \exp\left\{\left[\sum_{k}\sum_{k'} \beta_k \beta_{k'} \,\mathrm{Cov}(X_k, X_{k'})\right]^{1/2}\right\}, \tag{4} \]

with Cov(X_k, X_k′) being the covariance between two predictors X_k and X_k′ (or the variance if k = k′). The OOR is interpreted as the factor by which the odds of belonging to Category “1” in the criterion change when the linear combination of the predictors (η) increases by one standard deviation. Arguably, this interpretation is intuitively meaningful because it is a straightforward extension of the familiar odds ratio for individual predictors. Furthermore, from Equations (2) and (4), it can be seen that the OOR is determined by the variance of the linear combination of the predictors η. In other words, the OOR is a function of Var(η). This is analogous to the R² (or multiple R) in MLR. As such, the OOR in MLOGR can be seen as an extension of the R² (or more directly, the multiple R) in MLR.
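The definition above can be sketched in a few lines of code: Var(η) is accumulated from the regression weights and the predictor covariances, and the OOR is the exponential of its square root. This is a minimal illustration only; the function name `overall_odds_ratio`, the weights, and the covariance matrix are hypothetical values chosen for the example, not results from this study.

```python
import math


def overall_odds_ratio(betas, cov):
    """Return exp(sqrt(Var(eta))), where
    Var(eta) = sum_k sum_k' beta_k * beta_k' * Cov(X_k, X_k').

    betas: regression weights, intercept excluded.
    cov:   covariance matrix of the predictors (list of lists).
    """
    k = len(betas)
    var_eta = sum(
        betas[i] * betas[j] * cov[i][j] for i in range(k) for j in range(k)
    )
    return math.exp(math.sqrt(var_eta))


# Hypothetical example: two standardized predictors correlated at .30.
betas = [0.5, 0.4]
cov = [[1.0, 0.3], [0.3, 1.0]]
oor = overall_odds_ratio(betas, cov)  # Var(eta) = 0.25 + 0.16 + 2(0.5)(0.4)(0.3) = 0.53

# With a single standardized predictor, the OOR reduces to exp(beta_1).
single = overall_odds_ratio([0.7], [[1.0]])
```

With a single standardized predictor the function returns the ordinary odds ratio exp(β_1), which is the special case the text starts from.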
Data simulations show that OOR is not dependent on study base rate (Allen & Le, 2008). Indeed, it can be analytically inferred from Equation (2) that when data are generated based on the logit model (i.e., the expected value [E(Y) or probability] of the dichotomous criterion is directly generated from the logit function of the linear combination of independent variables), the OOR is not affected by the base rate, which is a function of β0. However, although the logit model is more technically appropriate for logistic regression, the probit model, which assumes that there is a normally distributed variable Y′ underlying the dichotomous criterion variable Y, may better reflect some major constructs in educational and organizational research (Kemery et al., 1989). Unlike the logit model, it is not possible to analytically derive the potential relationship between OOR and the base rate in the probit model. As such, Allen and Le’s (2008) findings regarding the independence of OOR from study base rates may not be conclusive. Accordingly, we present a simulation based on the probit model.
Examining the Effect of Base Rate on Overall Effect Size Indices
To study the effect of base rate, we examined five effect size indices for MLOGR models: The ordinary least square,
Simulation conditions
Apart from base rate, which was varied from .10 to .50 in steps of .10, we systematically varied (a) number of the predictors (NV), (b) intercorrelations among the predictors (IR), and (c) correlations between the predictors and the latent continuous variable Y′ underlying the dichotomous criterion Y (DR). There are two levels for the NV factor (two or five predictors), four levels for the IR factor (.00, .30, .50, and .70; correlations among all the predictors were specified to be the same), and four levels for the DR factor (.00, .10, .30, and .50; correlations between any predictor and the latent continuous variable Y′ were specified to be the same). In total, there are 150 conditions,¹ representing a wide range of values that researchers may encounter.
Simulation procedure
For each condition, we first created m (m = NV + 1) multivariate-normally distributed variables following the procedure described in Fan and Fan (2005). Out of these, NV variables serve as the predictors, and the remaining variable is the latent continuous variable Y′ (underlying the dichotomous criterion Y). The predictors were specified to correlate with each other at IR and with Y′ at DR (NV, IR, and DR were determined by the simulation condition). To avoid the potential confounding effect of sampling error, we needed to compare the population values of the indices. Since there is no closed-form formula for the population value of the overall effect size indices in MLOGR, we simulated data based on a very large sample size (N = 2,000,000). These 2,000,000 observations thus constituted the “population.” The criterion Y was then created by dichotomizing Y′. Dichotomization was achieved by comparing the value of Y′ in each observation to a cutoff score. The cutoff score was determined by the value in the normal distribution where its cumulative probability is equal to the desired base rate defined in the simulation condition. If Y′ is larger than the cutoff score, then Y is set to 1; otherwise, Y = 0.
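The data-generation steps just described can be sketched as follows. This is a rough illustration under stated assumptions, not the SAS program used in the study: it draws correlated normal variables through a hand-written Cholesky factor rather than the Fan and Fan (2005) procedure, it uses a much smaller sample than the 2,000,000 observations of the simulated population, and it places the cutoff at the (1 − base rate) quantile so that the proportion of Y = 1 equals the base rate, which is an interpretive choice. The names `cholesky` and `simulate_condition` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular L with L L^T = a (a is assumed positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_condition(nv, ir, dr, base_rate, n, seed=0):
    """Return n rows of (predictor values, Y) for one simulation condition."""
    rng = random.Random(seed)
    m = nv + 1  # NV predictors plus the latent variable Y', stored last
    corr = [
        [1.0 if i == j else (dr if (m - 1) in (i, j) else ir) for j in range(m)]
        for i in range(m)
    ]
    low = cholesky(corr)
    # Assumption: the cutoff leaves a proportion `base_rate` of Y' above it.
    cutoff = NormalDist().inv_cdf(1.0 - base_rate)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        v = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
        rows.append((v[:nv], 1 if v[-1] > cutoff else 0))
    return rows


# One condition: NV = 2, IR = .30, DR = .30, base rate = .20.
sample = simulate_condition(nv=2, ir=0.30, dr=0.30, base_rate=0.20, n=20000, seed=1)
observed_rate = sum(y for _, y in sample) / len(sample)
```

The observed proportion of Y = 1 should fall close to the requested base rate; a logistic regression fitted to such data would then supply the weights needed for the OOR.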
Analyses
We analyzed the data using the general linear model with four between-subject factors (base rate, NV, IR, and DR) to examine the effect of base rate on the five indices of effect size. Five separate analyses were conducted, one for each index.
Results
Results are presented in Table 1. As can be seen, the proportion of variance in the indices (eta squared) attributable to base rate ranges from .001
Table 1. Effects of Simulation Factors on the Variations of the OOR and Other R² Analogs
Note. N = 150 (number of conditions simulated).
Proportion of variance in the indices accounted for by NV, IR, DR, and their interaction effects (two-way and three-way effects) = (2) + (3) + (4) + (5) + (6) + (7) + (8). These factors are deemed relevant because they are theoretically expected to influence the variance of these indices (see text for details).
The last row of Table 1 shows the combined effects of NV, IR, and DR on the indices. As noted earlier, these effects denote the extent to which the indices appropriately reflect model strength. It can be seen that most of the variation in the five indices is due to these effects (from 97.0% for the
Correcting for Bias in the OOR
Bias in OOR estimates
In MLR, it is well known that the R² and multiple R obtained in samples of limited size overestimate their population values due to overfitting error specific to a sample (Schmitt, Coyle, & Rauschenberger, 1977; Wherry, 1931). Indices for overall effect size in MLOGR models are also susceptible to this bias. Procedures to correct for the bias in R² analogs have been developed (Liao & McGee, 2003), although they have not been widely adopted. Allen and Le (2008) suggested a procedure to correct for the bias in the OOR. Although this procedure was found to work reasonably well, it is computationally complicated as it involves matrix manipulation. We present an alternative approach, which does not require estimation and manipulation of intervariable matrices.
A jackknife correction procedure
Sapra (2002) discussed the bias for maximum likelihood estimators for probit models and applied the jackknifed bootstrapping procedure to correct for such bias (Efron, 1981). The procedure involves the following steps:
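1. Sequentially delete one observation (observation i, with i = 1, 2, . . . , n, where n is the sample size) from the data and then calculate the statistic of interest (regression coefficient) in the remaining data set; denote this estimate \(\hat{\theta}_{(i)}\).
2. Repeat Step 1 until every observation has been excluded once.
3. Calculate the corrected statistic using the following equation:

\[ \hat{\theta}_{J} = n\hat{\theta} - (n - 1)\,\bar{\theta}_{(\cdot)}, \tag{5} \]

where \(\hat{\theta}\) is the estimate obtained from the full sample and \(\bar{\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(i)}\) is the mean of the n leave-one-out estimates.

Because the jackknife correction procedure introduced by Sapra (2002) was meant for regression coefficients, we adapted it for the OOR. Specifically, Var(η)^{1/2} (in Equations 2 and 4) was calculated as our statistic of interest, and then

\[ \mathrm{OOR}_{\mathrm{corrected}} = \exp\left(\hat{\theta}_{J}\right) \quad \text{if } \hat{\theta}_{J} \ge 0, \tag{6a} \]

\[ \mathrm{OOR}_{\mathrm{corrected}} = \exp(0) = 1 \quad \text{if } \hat{\theta}_{J} < 0. \tag{6b} \]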
Note that Equation (6b) was needed to address situations where the correction procedure overcorrects for bias due to sampling error. Since the statistic of interest, Var(η)^{1/2}, cannot be negative, we fixed its value at 0 whenever the corrected estimate was negative. To facilitate calculation, we developed an SAS macro, which estimates the OOR, the corrected OOR, and its standard error (discussed later). The macro is available on request.
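The delete-one procedure and the truncation at zero can be sketched generically. The code below assumes the textbook jackknife bias correction (n times the full-sample estimate minus n − 1 times the mean of the leave-one-out estimates) and is not the authors' SAS macro. In the study the statistic is Var(η)^{1/2}, re-estimated from a logistic regression on each reduced sample; to stay self-contained, the example substitutes a simple toy statistic, the variance computed with divisor n. All function names are hypothetical.

```python
import math


def jackknife_corrected(data, statistic):
    """Textbook jackknife bias correction for statistic(data)."""
    n = len(data)
    full = statistic(data)
    # Leave-one-out estimates: drop observation i, recompute the statistic.
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    return n * full - (n - 1) * sum(loo) / n


def corrected_oor(theta_j):
    """Exponentiate the corrected SD of eta, truncating negative values at 0."""
    return math.exp(max(theta_j, 0.0))


def biased_variance(xs):
    """Toy statistic: variance with divisor n (biased downward)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
corrected = jackknife_corrected(data, biased_variance)
# For this toy statistic the correction recovers the usual variance
# with divisor n - 1, which is 37/6 for these six values.
```

Passing a function that refits the logistic regression and returns Var(η)^{1/2} in place of `biased_variance` would give the corrected statistic that `corrected_oor` then exponentiates; an overcorrected (negative) value yields a corrected OOR of exactly 1.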
Simulation procedure and data analysis
Data were simulated based on both the logit and probit models. We varied six factors in our data: (a) data model (logit and probit), (b) sample size (n = 100 and 500), (c) base rate (.10, .20, .30, .40, and .50), (d) number of predictors (NV = 2 and 5), (e) intercorrelations among the predictors (IR = .00, .30, .50, and .70), and (f) correlations between the predictors and the latent variable Y′ underlying the dichotomous criterion Y (DR = .00, .10, .30, and .50).² In total, there were 600 simulation conditions.³ For each simulation condition, 500 data sets were generated. Equation (2) was then used to estimate the OOR, and the jackknife correction procedure described above was applied to estimate the corrected OOR for each data set. Results were then compared with the population values of the OOR (obtained in the previous analysis). For each condition, the mean and standard deviation of the absolute error (|corrected_OOR − true_OOR|) across simulated data sets were calculated, with corrected_OOR being the value of the OOR obtained from the jackknife correction procedure and true_OOR being the population value of the OOR in that condition. We wrote an SAS program to perform the simulation and analysis described in this section.⁴
Results
Because of space limitations, we present only one representative combination of conditions based on the sample size (n = 100) and number of predictors (NV = 2) for the probit model (Table 2). Results for other conditions are available on request.
Table 2. Monte Carlo Results for Probit Model (Number of IVs = 2, n = 100): Estimated and Corrected OOR
Note. OOR = overall odds ratio; IR = intercorrelations among the IV; DR = correlation between the DV and IV; True = true value of the OOR (population value); Est. = estimated value of the OOR (mean of the OOR estimates from 500 simulated samples); Cor. = estimated OOR corrected for bias (mean of the corrected OOR from 500 simulated samples); Md = mean absolute difference between the estimated OOR and the true OOR; SDd = standard deviation of the absolute difference between the estimated OOR and true OOR.
As can be seen in Table 2, the corrected OOR reproduced the population values reasonably well: The mean absolute difference ranges from .24 to 2.73, with an overall mean of .55. Although the value of 2.73 appears to be very high, it occurred when the true OOR was 6.47 (IR = .00 and DR = .50); percentagewise, the difference is not large. For conditions where the population values of the OOR are 1.00 (i.e., where the predictors have no effect on the criterion), the correction yielded relatively less accurate estimates percentagewise, generally overestimating the true values by 17% (1.17) to 37% (1.37). This overestimation is expected given the truncation imposed on the minimum value of the corrected OOR (Equation 6b). Specifically, when the true value of Var(η)^{1/2} is zero, its corrected estimate would be negative about 50% of the time due to sampling error. In those cases, we had to set the corrected value at .00 because, theoretically, a standard deviation cannot be smaller than zero. Similar situations (i.e., overestimation due to theoretical truncations) have been found and discussed in past simulation studies (e.g., Le & Schmidt, 2006; Overton, 1998).
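The direction of this truncation effect can be made explicit with a short derivation. It rests on an assumption that is not part of the study: that when the true value is zero, the corrected statistic \(\hat{\theta}_J\) is approximately normal with mean 0 and standard deviation \(\sigma\). Under that assumption, truncating at zero and exponentiating necessarily gives an expected corrected OOR above 1.

```latex
% Assumption: \hat{\theta}_J \sim N(0, \sigma^2) when the true Var(\eta)^{1/2} = 0.
\begin{align*}
E\!\left[\max(\hat{\theta}_J, 0)\right]
  &= \int_0^{\infty} t \,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-t^2/(2\sigma^2)}\,dt
   = \frac{\sigma}{\sqrt{2\pi}} > 0, \\
E\!\left[\exp\{\max(\hat{\theta}_J, 0)\}\right]
  &\ge \exp\!\left\{E\!\left[\max(\hat{\theta}_J, 0)\right]\right\}
   = \exp\!\left(\frac{\sigma}{\sqrt{2\pi}}\right) > 1
  \quad \text{(Jensen's inequality).}
\end{align*}
```

The bound shrinks toward 1 as \(\sigma\) decreases, which is in line with the overestimation being smaller in larger samples.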
Closer examination of Table 2 further reveals that the accuracy of the corrected OOR improves as base rate increases. For example, with IR = .00, the mean absolute difference MD averaged across levels of DR is 1.06 when base rate is .10 (the first row in Table 2); the mean absolute difference MD (average across levels of DR) reduces to .67 when base rate increases to .50. As noted above, although mean absolute difference appears to increase with DR, this increase is proportional to the increase of the true OOR. Thus, percentagewise, there is no noticeable trend regarding effects of other simulation factors (IR and DR, except when DR = 0 as discussed earlier) on the accuracy of the corrected OOR.
Across all simulation conditions, a similar pattern emerges: Accuracy of the correction improves as the base rate increases. The bias (overestimation) of the corrected OOR is worst when DR = .00. The accuracy of the corrected OOR also appears to depend on sample size (accuracy increases as sample size increases) and, to a much lesser extent, data model (i.e., estimates for logit model are slightly more accurate than those for the probit model). Other factors (NV, IR, and DR, except when DR = .00) do not seem to significantly influence the accuracy of the corrected OOR. For example, for conditions with n = 500 and NV = 5 under the probit model, the averaged mean absolute difference MD across levels of DR, IR, and base rate is .32 (compared with the averaged MD of .55 in Table 2). With the same combination of conditions (n = 500 and NV = 5) under the logit model, the mean absolute difference MD averaged across all levels of DR, IR, and base rate is .28. Overall, the jackknife correction procedure yields reasonably accurate estimates for the OOR in all simulation conditions. Although results for conditions under the logit model are slightly more accurate than those under the probit model, the differences are minimal and practically negligible.
Estimating Standard Error of the Corrected OOR
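The standard error of the corrected estimate can be calculated based on the following equation (Sapra, 2002):

\[ SE\left(\hat{\theta}_{J}\right) = \left[\frac{1}{n(n - 1)}\sum_{i=1}^{n}\left(\tilde{\theta}_{i} - \hat{\theta}_{J}\right)^{2}\right]^{1/2}, \tag{7} \]

where

\[ \tilde{\theta}_{i} = n\hat{\theta} - (n - 1)\,\hat{\theta}_{(i)} \tag{8} \]

is the pseudo-value obtained when observation i is deleted, \(\hat{\theta}\) is the full-sample estimate, \(\hat{\theta}_{(i)}\) is the estimate with observation i deleted, and \(\hat{\theta}_{J}\) is the mean of the n pseudo-values (i.e., the jackknife-corrected estimate).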
To estimate standard error for the corrected OOR, we replaced θ in the above equations with the OOR.
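A jackknife standard error can be sketched in the same generic way. The code assumes the textbook form, the square root of (n − 1)/n times the sum of squared deviations of the leave-one-out estimates from their mean, which is taken here to correspond to the equations referenced above. The statistic in the example is a simple mean standing in for the OOR, and the function names are hypothetical.

```python
import math


def jackknife_se(data, statistic):
    """Textbook jackknife standard error of statistic(data)."""
    n = len(data)
    # Leave-one-out estimates of the statistic.
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in loo))


def mean(xs):
    """Toy statistic used only for illustration."""
    return sum(xs) / len(xs)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
se = jackknife_se(data, mean)
# For the mean, the jackknife SE equals the usual s / sqrt(n);
# here s^2 = 37/6 and n = 6, so se = sqrt(37/36).
```

Replacing `mean` with a function that refits the model on the reduced sample and returns the OOR gives the kind of estimate evaluated in the simulation that follows.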
Simulation procedure and data analysis
We used the same data sets developed in the previous section. Equations (7) and (8) were used to estimate the standard error of the corrected OOR. These values were then compared with the “true” standard errors, which were the observed standard deviations of the corrected OOR estimated across 500 samples in each condition. As in previous analyses, the mean and standard deviation of the absolute error (|Est_SE − Obs_SE|) across simulated data sets were calculated for each condition, where Est_SE is the standard error estimated by the jackknife procedure and Obs_SE is the observed standard deviation of the corrected OOR across 500 samples (true standard error). Simulations and analyses were performed using the previously discussed SAS program.
Results
There are three conditions where the jackknife procedure failed to provide appropriate estimates of the standard error. In these conditions, the estimates were seriously inflated (on the order of hundreds). All of these conditions are those with the lowest base rate (.10) and highest correlation with the underlying continuous dependent variable (DR = .50), except where the intercorrelations between the independent variables are very high (IR = .70). It is likely that small sample size (100) coupled with low base rate made it difficult for the logistic regression models to converge (alternatively, the models may have converged with unrealistic results). For the remaining conditions, the standard error (SE) estimated by the jackknife procedure consistently overestimates the observed SE. Table 3 presents results for one representative combination of conditions (n = 500 and NV = 5) for the logit model. As can be seen, the overestimation ranges from .02 to .52, with a mean absolute difference averaged across all conditions of 0.08. Percentagewise, the overestimation is most serious when DR = .00, averaging almost 36% (.04/.11). For other conditions (excluding those with DR = .00), the overestimation is less severe. These observations can again be partially explained by the truncation of the distribution of the corrected OOR when its population value is 1.00. Although there is no clear pattern, results indicate that overestimation increases as the base rate decreases.
Table 3. Monte Carlo Results for Logit Model (Number of IVs = 5, n = 500): Estimated Standard Error of the Corrected OOR
Note. IR = intercorrelations among the IV; DR = correlation between the DV and IV; Obs. = observed standard error of the corrected OOR (this is the standard deviation of the corrected OOR across 500 simulated samples); Est. = estimated standard error of the corrected OOR obtained from the jackknife procedure; Md = mean absolute difference between the estimated SE and the observed SE; SDd = standard deviation of the absolute difference between the estimated SE and the observed SE.
Across simulation conditions, the same pattern can be seen: The jackknife procedure generally overestimates the observed SE, and the overestimation is most serious (percentagewise) when DR = .00. Accuracy improves with larger sample size. Overestimation for conditions with large sample size (n = 500) is generally small across all conditions examined. With smaller sample size (n = 100), the overestimation is considerably larger. For example, for the combination of conditions with n = 100 and NV = 2 for the logit model, the averaged mean absolute difference is 0.43. Overall, estimates for conditions under the logit model are slightly more accurate than those under the probit model. However, the differences are small and mostly negligible. For example, under the probit model, the averaged mean absolute difference MD across levels of DR, IR, and base rate is 0.10 (compared with the averaged MD of 0.08 in Table 3).
An Example of Using the OOR to Evaluate Model Effectiveness
To demonstrate the application of the OOR in organizational research, we used results from a recent study examining predictors of employee turnover (Barrick & Zimmerman, 2009). In that study, the researchers used three sets of predictors, including biodata, personality, and prehire attitudes, to predict voluntary turnover.⁵
It was found that the predictors significantly predicted employee turnover 6 months after hire (base rate = .20; n = 119). The Cox and Snell R² analogs
From the information provided in Barrick and Zimmerman (2009), we applied Equations (3) and (4) to calculate the OOR.⁶ The OORs for models with one predictor were found to be 1.70 (personality only), 2.02 (attitudes only), and 5.28 (biodata only). These values indicate the factors by which the odds of leaving the organization after 6 months will increase when the score on these predictors changes by one standard deviation. For example, for biodata, the odds of leaving for an employee whose score is one standard deviation above the mean of the biodata measure will be 5.28 times the odds for an employee whose score is at the mean. When all three predictors were included to predict turnover, the OOR is 7.01. This value indicates that the odds of leaving the organization after 6 months will increase approximately sevenfold when the score on the composite created by linearly combining the three predictors increases by one standard deviation. Compared with the model where only biodata is included as the predictor, the model with all predictors is about 33% more effective (7.01/5.28 = 1.33). Similarly, it is 247% and 312% more effective than the models with attitudes only and personality only, respectively. Interpretation of the OOR for these models is intuitively meaningful, and the values can be compared with those of models predicting turnover in other studies based on different predictors.
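The model comparisons above are simple ratios of OORs, and the arithmetic can be checked directly. The sketch below uses the four OOR values reported in the text; the dictionary layout and the helper name `percent_more_effective` are illustrative choices.

```python
# OOR values reported in the text for the turnover example.
oors = {
    "personality": 1.70,
    "attitudes": 2.02,
    "biodata": 5.28,
    "all three": 7.01,
}


def percent_more_effective(oor_a, oor_b):
    """Percentage by which model A's OOR exceeds model B's OOR."""
    return (oor_a / oor_b - 1.0) * 100.0


full = oors["all three"]
gains = {
    name: round(percent_more_effective(full, value))
    for name, value in oors.items()
    if name != "all three"
}
# gains == {"personality": 312, "attitudes": 247, "biodata": 33}
```

Each percentage is the ratio of the full-model OOR to the single-predictor OOR, minus one: for example, 7.01/2.02 is about 3.47, or 247% more.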
Discussion
Our study complements the work of Allen and Le (2008). Using a different data model (i.e., probit vs. the logit used in Allen and Le), we found that the OOR, together with the log likelihood ratio R-square analog
We believe that the OOR can be a useful index for applied researchers to evaluate and communicate research findings regarding the strength of their MLOGR models. Obviously, the OOR is not without limitations. In particular, it does not have a well-defined range with endpoints corresponding to perfect fit and complete lack of fit (.00 to 1.00 for R² or −1.00 to 1.00 for multiple R in MLR), which is a desirable feature for indices of overall effect size (Menard, 2000). We therefore recommend that the OOR be used together with the log likelihood ratio R² analog
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
