The Overall Odds Ratio as an Intuitive Effect Size Index for Multiple Logistic Regression

Abstract

This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model strength. The authors also illustrate and test a jackknife procedure to correct for the bias in the OOR and estimate its standard error. An example applying the OOR to evaluate logistic regression models predicting organizational turnover is provided. The authors discuss implications and offer recommendations for using the OOR to quantify and compare the effectiveness of logistic regression models in applied research.

Keywords

effect size, logistic regression, odds ratio

Many important criterion variables in educational and organizational research are dichotomous in nature (e.g., college dropout, turnover), and thus require multiple logistic regression (MLOGR) analysis. Unfortunately, users of the method often find themselves disadvantaged by lack of a meaningful index to describe the overall strength or usefulness of their models. Allen and Le (2008) introduced the overall odds ratio (OOR) as a new index for quantifying the overall effect size in logistic regression models. The OOR can be interpreted in the same way as the odds ratio of individual independent variables: It is the ratio of the odds of belonging to a category of the dependent variable that a researcher is interested in predicting (e.g., leaving vs. staying in turnover) when the weighted linear combination of the independent variables increases one standard deviation to the odds before such an increase. Allen and Le derived a procedure to adjust for the inflation of OOR in samples of finite size analogous to the well-known Wherry (1931) formula in multiple linear regression (MLR). The authors then illustrated via Monte Carlo simulation that the OOR was not influenced by study base rate. Thus, the OOR addresses many shortcomings of current indices and has potential as a useful index of overall effect size for users of logistic regression.

However, there remain crucial issues that need to be further investigated before the OOR can be widely adopted in applied research. First, Allen and Le’s (2008) simulation was based on the logit model. Because there exists an alternative model, the probit model, which is arguably more relevant in many situations in educational and organizational research (Kemery, Dunlap, & Bedeian, 1989), their findings regarding independence of the OOR from study base rates are inconclusive. Additionally, Allen and Le did not establish any procedure to estimate the standard error of the OOR. This is an important limitation, because the standard error is essential to judge stochastic agreement between a sample statistic and a population parameter (Hanushek & Jackson, 1977) and provides information allowing the combination of research findings across studies (Hunter & Schmidt, 2004). Finally, the procedure for correction of OOR bias provided by Allen and Le is computationally complicated because matrix manipulation is involved and, therefore, may be both logically complex to model in computer programs and difficult to understand by most empirically oriented researchers.

The present study extends the work of Allen and Le (2008) by addressing the issues noted above. Specifically, we first simulate data under the probit model to thoroughly examine how the OOR varies in accordance with study base rate. Next, we illustrate a jackknife resampling procedure as an alternative approach to adjust for bias in the OOR when it is estimated in samples of finite size. The jackknife procedure also provides a means to estimate the standard error of the OOR (Sapra, 2002). Finally, we demonstrate an application of the OOR in applied research. An SAS macro for estimating the OOR, its standard error, and its corrected value is provided to enable application and use of the index in logistic regression analysis. Thus, this study provides an important complement to Allen and Le’s (2008) work by making the OOR more readily accessible to empirically oriented researchers and practitioners.

The Overall Odds Ratio

Effects of individual predictors in MLOGR models can be intuitively represented using the odds ratio, which is the exponential function with natural log base of the regression weights [exp(β_k)]. Note that when there is only one predictor, the exp(β_k) can be seen as representing the overall effect of the MLOGR model including only that predictor. Furthermore, when the only predictor is standardized, its regression coefficient β₁ is equal to the standard deviation of η, the “linear combination of the predictors.” That is,

β_{1} = [Var (η)]^{1 / 2} . \qquad (1)

Extending this concept, Allen and Le (2008) suggested that a similar odds ratio can be used to index the strength of MLOGR models with more than one predictor. This odds ratio of the full model is referred to as the OOR and is defined as

OOR = \exp \{[Var (η)]^{1 / 2}\} = \exp \{[Var (β_{0} + \sum_{k = 1}^{p} β_{k} X_{k})]^{1 / 2}\} = \exp \{[Var (\sum_{k = 1}^{p} β_{k} X_{k})]^{1 / 2}\} . \qquad (2)

Because

Var (η) = \sum_{k = 1}^{p} \sum_{k' = 1}^{p} β_{k} β_{k'} Cov (X_{k}, X_{k'}), \qquad (3)

Equation (2) can be alternatively presented as (Allen & Le, 2008)

OOR = \exp \{[Var (η)]^{1 / 2}\} = \exp \{[\sum_{k = 1}^{p} \sum_{k' = 1}^{p} β_{k} β_{k'} Cov (X_{k}, X_{k'})]^{1 / 2}\}, \qquad (4)

with Cov(X_k, X_k′) being the covariance between two predictors X_k and X_k′ (or variance if k = k′). The OOR is interpreted as the factor by which the odds of belonging to Category “1” in the criterion changes when the linear combination of the predictors (η) increases one standard deviation. Arguably, this interpretation is intuitively meaningful because it is a straightforward extension of the familiar odds ratio for individual predictors. Furthermore, from Equations (2) and (4), it can be seen that the OOR is determined by the variance of the linear combination of the predictors η. In other words, OOR is a function of Var(η). This is analogous to the R² (or multiple R) in MLR. As such, OOR in MLOGR can be seen as an extension of the R² (or more directly, the multiple R) in MLR.
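As a minimal sketch of the double-sum form of the OOR given above, the following Python function computes the index from a vector of logistic regression weights and the covariance matrix of the predictors. The function name and the example weights are illustrative assumptions, not values taken from Allen and Le (2008).

```python
import math

def overall_odds_ratio(betas, cov):
    """Return exp(SD of the linear predictor) for slope weights `betas`
    and predictor covariance matrix `cov` (the intercept is not needed)."""
    p = len(betas)
    # Var(eta) = sum_k sum_k' beta_k * beta_k' * Cov(X_k, X_k')
    var_eta = sum(betas[k] * betas[j] * cov[k][j]
                  for k in range(p) for j in range(p))
    return math.exp(math.sqrt(var_eta))

# Hypothetical weights for two standardized predictors that correlate .30.
betas = [0.40, 0.25]
cov = [[1.0, 0.3],
       [0.3, 1.0]]
oor = overall_odds_ratio(betas, cov)
print(round(oor, 2))
```

With a single standardized predictor the function reduces to exp(|β₁|), which is the familiar odds ratio for that predictor when β₁ is positive.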

Data simulations show that OOR is not dependent on study base rate (Allen & Le, 2008). Indeed, it can be analytically inferred from Equation (2) that when data are generated based on the logit model (i.e., the expected value [E(Y) or probability] of the dichotomous criterion is directly generated from the logit function of the linear combination of independent variables), the OOR is not affected by the base rate, which is a function of β₀. However, although the logit model is more technically appropriate for logistic regression, the probit model, which assumes that there is a normally distributed variable Y′ underlying the dichotomous criterion variable Y, may better reflect some major constructs in educational and organizational research (Kemery et al., 1989). Unlike the logit model, it is not possible to analytically derive the potential relationship between OOR and the base rate in the probit model. As such, Allen and Le’s (2008) findings regarding the independence of OOR from study base rates may not be conclusive. Accordingly, we present a simulation based on the probit model.

Examining the Effect of Base Rate on Overall Effect Size Indices

To study the effect of base rate, we examined five effect size indices for MLOGR models: the ordinary least squares R² ($R_{O}^{2}$); Cox and Snell R² (Cox & Snell, 1989; $R_{M}^{2}$); Nagelkerke R² (Nagelkerke, 1991; $R_{N}^{2}$); the log likelihood R² ($R_{L}^{2}$); and the OOR. The first three indices were selected because of their relative popularity and availability. The fourth index, $R_{L}^{2}$, was included because it has been suggested to be the most appropriate index for MLOGR (Menard, 2000, 2002), and past empirical findings show that it is not dependent on base rate.
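For readers who want to see how three of these indices are obtained, the sketch below uses the standard textbook definitions based on the log likelihoods of the intercept-only and fitted models. The function names are ours, and the last function is added only to illustrate why the Cox and Snell index is tied to base rate: its maximum attainable value depends on the intercept-only log likelihood, which is a function of the base rate alone.

```python
import math

def r2_log_likelihood(ll_null, ll_model):
    """Log likelihood R-square: proportional reduction in -2LL."""
    return 1.0 - ll_model / ll_null

def r2_cox_snell(ll_null, ll_model, n):
    """Cox and Snell R-square: 1 - (L_null / L_model)^(2/n)."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def r2_nagelkerke(ll_null, ll_model, n):
    """Cox and Snell R-square rescaled by its maximum attainable value."""
    return r2_cox_snell(ll_null, ll_model, n) / (1.0 - math.exp(2.0 * ll_null / n))

def max_cox_snell(base_rate):
    """Upper bound of the Cox and Snell R-square for a given base rate."""
    ll_null_per_case = (base_rate * math.log(base_rate)
                        + (1.0 - base_rate) * math.log(1.0 - base_rate))
    return 1.0 - math.exp(2.0 * ll_null_per_case)

# The ceiling is .75 at a base rate of .50 and lower at more extreme base rates.
print(round(max_cox_snell(0.50), 3), round(max_cox_snell(0.10), 3))
```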

Simulation conditions

Apart from base rate, which was varied from .10 to .50 with the interval of .10, we systematically varied (a) number of the predictors (NV), (b) intercorrelations among the predictors (IR), and (c) correlations between the predictors and the latent continuous variable Y′ underlying the dichotomous criterion Y (DR). There are two levels for the NV factor (two or five predictors), four levels for the IR factor (.00, .30, .50, and .70—correlations among all the predictors were specified to be the same), and four levels for the DR factor (.00, .10, .30, and .50—correlations between any predictors and the latent continuous variable Y′ were specified to be the same). In total, there are 150 conditions,¹ representing a wide range of values that researchers may encounter.

Simulation procedure

For each condition, we first created m (m = NV + 1) multivariate-normally distributed variables following the procedure described in Fan and Fan (2005). Out of these, NV variables serve as the predictors, and the remaining variable is the latent continuous variable Y′ (underlying the dichotomous criterion Y). The predictors were specified to correlate with each other at IR and with Y′ at DR (NV, IR, and DR were determined by the simulation condition). To avoid the potential confounding effect of sampling error, we needed to compare the population values of the indices. Because there is no closed-form formula for the overall effect size indices in MLOGR, we simulated data based on a very large sample size (N = 2,000,000). These 2,000,000 observations thus constituted the “population.” The criterion Y was then created by dichotomizing Y′. Dichotomization was achieved by comparing the value of Y′ in each observation to a cutoff score. The cutoff score was determined by the value in the normal distribution where its cumulative probability is equal to the desired base rate defined in the simulation condition. If Y′ is larger than the cutoff score, then Y is set to 1; otherwise, Y = 0.
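The data-generating step can be sketched in Python as follows. This is not the Fan and Fan (2005) procedure or our SAS program: it swaps in a one-factor construction that yields the same equicorrelated structure, uses a much smaller sample than the 2,000,000 cases described above, and assumes that the cutoff is placed at the (1 − base rate) quantile so that the proportion of cases with Y = 1 equals the base rate. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def simulate_probit_sample(n, nv, ir, dr, base_rate, seed=0):
    """Return a list of (predictors, y) pairs under the probit data model."""
    rng = random.Random(seed)
    # Squared multiple correlation implied by the condition; must be below 1.
    r2 = nv * dr ** 2 / (1 + (nv - 1) * ir)
    if r2 >= 1:
        raise ValueError("infeasible condition: implied R-square >= 1")
    weight = dr / (1 + (nv - 1) * ir)          # common weight of each predictor
    cutoff = NormalDist().inv_cdf(1 - base_rate)  # assumption: P(Y = 1) = base rate
    sample = []
    for _ in range(n):
        common = rng.gauss(0, 1)               # shared factor induces correlation IR
        x = [math.sqrt(ir) * common + math.sqrt(1 - ir) * rng.gauss(0, 1)
             for _ in range(nv)]
        y_latent = weight * sum(x) + math.sqrt(1 - r2) * rng.gauss(0, 1)
        sample.append((x, 1 if y_latent > cutoff else 0))
    return sample

sample = simulate_probit_sample(n=20000, nv=2, ir=0.30, dr=0.30, base_rate=0.20, seed=1)
observed_rate = sum(y for _, y in sample) / len(sample)
print(round(observed_rate, 3))
```

The feasibility check reflects the fact that some combinations of NV, IR, and DR (e.g., five uncorrelated predictors each correlating .50 with Y′) imply a squared multiple correlation above 1 and cannot be simulated.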

Analyses

We analyzed data based on the general linear model with four between-subject factors (base rate, NV, IR, and DR) to examine the effect of base rate on the five indices of effect size. Five separate analyses were conducted, one for each index ($R_{O}^{2}$, $R_{M}^{2}$, $R_{N}^{2}$, $R_{L}^{2}$, and OOR), based on the population values of the indices computed in 150 simulation conditions. We also examined the extent to which the indices appropriately reflected model strength. Because it has been well established that NV, IR, and DR determine model R² in MLR, they are expected to influence model effectiveness in MLOGR via their effects on the latent continuous variable Y′. Thus, the “helpful” variation of the five indices of overall effect size in MLOGR (vis-à-vis the “unhelpful” variation due to base rate) is determined by the main effects of NV, IR, and DR, together with their interaction effects.
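The eta-square values used to summarize these analyses are ratios of an effect's sum of squares to the total sum of squares. A minimal sketch for a single main effect is given below. It assumes a balanced layout, in which the sum of squares for a factor is simply the between-level sum of squares; it does not reproduce the full four-factor model, and the function name and data are hypothetical.

```python
def eta_squared_main_effect(values, levels):
    """Eta square for one factor: SS_between_levels / SS_total."""
    n = len(values)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    groups = {}
    for value, level in zip(values, levels):
        groups.setdefault(level, []).append(value)
    ss_effect = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                    for g in groups.values())
    return ss_effect / ss_total

# Hypothetical index values observed at two levels of a factor.
index_values = [1.0, 2.0, 3.0, 4.0]
factor_levels = ["low", "low", "high", "high"]
print(eta_squared_main_effect(index_values, factor_levels))
```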

Results

Results are presented in Table 1. As can be seen, the proportion of variance in the indices (eta square) attributable to base rate ranges from .001 ($R_{L}^{2}$) to .013 ($R_{M}^{2}$). Note that these values appear small because the range of simulated effect sizes is very large (MLR R² of the predictors and the latent variable Y′ ranges from .00 to .63). It is the relative magnitudes of the eta-square values that are of interest here. As shown in Table 1, the indices can be divided into two groups based on how much they are influenced by base rates. Three R² analogs, $R_{O}^{2}$, $R_{M}^{2}$, and $R_{N}^{2}$, with eta squares ranging from .010 to .013, can be grouped together; the $R_{L}^{2}$ and OOR belong to the second group, with eta squares substantially smaller than those of the first group (.001 and .003, respectively). This result is consistent with past research, which shows that $R_{L}^{2}$ and OOR are less dependent on base rate as compared with other R² analogs in MLOGR (Allen & Le, 2008; Menard, 2000).

Table 1.

Effects of Simulation Factors on the Variations of the OOR and Other R² Analogs

	Eta Square
Effects	$R_{O}^{2}$	$R_{M}^{2}$	$R_{N}^{2}$	$R_{L}^{2}$	OOR
1. Base rate	.010	.013	.012	.001	.003
2. Number of IVs (NV)	.014	.049	.048	.015	.024
3. Intercorrelation among the IV (IR)	.033	.006	.007	.035	.060
4. Correlation between the IVs and the DV (DR)	.847	.876	.880	.857	.658
Interactions (between NV, IR, and DR)
5. NV × IR	.048	.004	.004	.053	.130
6. NV × DR	.021	.033	.033	.022	.052
7. IR × DR	.006	.000	.000	.007	.020
8. NV × IR × DR	.010	.002	.002	.011	.048
Proportion of variance accounted for by relevant factors^a	.979	.970	.974	.998	.992

Note. N = 150 (number of conditions simulated). $R_{O}^{2}$ = ordinary least square R²; $R_{M}^{2}$ = Cox and Snell R²; $R_{N}^{2}$ = Nagelkerke R²; $R_{L}^{2}$ = log likelihood R²; OOR = overall odds ratio.

^a Proportion of variance in the indices accounted for by NV, IR, DR, and their interaction effects (two-way and three-way effects) = (2) + (3) + (4) + (5) + (6) + (7) + (8). These factors are deemed relevant because they are theoretically expected to influence the variance of these indices (see text for details).

The last row of Table 1 shows the combined effects of NV, IR, and DR on the indices. As noted earlier, these effects denote the extent to which the indices appropriately reflect model strength. It can be seen that most of the variation in the five indices is due to these effects (from 97.0% for the $R_{M}^{2}$ to 99.8% for the $R_{L}^{2}$), as should be expected. Here again the $R_{L}^{2}$ and OOR have the highest percentage of variance attributable to variations in model strength (99.8% and 99.2%, respectively). Taken together, these results provide evidence demonstrating the superiority of the $R_{L}^{2}$ and the OOR to other R² analogs as indices of overall effect size for MLOGR models.

Correcting for Bias in the OOR

Bias in OOR estimates

In MLR, it is well known that the R² and multiple R obtained in samples of limited size overestimate their population values due to overfitting error specific to a sample (Schmitt, Coyle, & Rauschenberger, 1977; Wherry, 1931). Indices for overall effect size in MLOGR models are also susceptible to this bias. Procedures to correct for the bias in R² analogs have been developed (Liao & McGee, 2003), although they have not been widely adopted. Allen and Le (2008) suggested a procedure to correct for the bias in the OOR. Although this procedure was found to work reasonably well, it is computationally complicated as it involves matrix manipulation. We present an alternative approach, which does not require estimation and manipulation of intervariable matrices.

A jackknife correction procedure

Sapra (2002) discussed the bias of maximum likelihood estimators for probit models and applied the jackknife resampling procedure (Efron, 1981) to correct for such bias. The procedure involves the following steps:

1. Delete one observation [observation i, with i = 1, 2, . . . , n (n = sample size)] from the data and then calculate the statistic of interest (regression coefficient) in the remaining data set; denote this estimate ${\hat{θ}}_{(i)}$.

2. Repeat Step 1 until every observation has been excluded once.

3. Calculate the corrected statistic using the following equation:

\tilde{θ} = n \hat{θ} - \frac{n - 1}{n} \sum_{i = 1}^{n} {\hat{θ}}_{(i)}, \qquad (5)

where $\tilde{θ}$ is the corrected statistic and $\hat{θ}$ is the statistic estimated in the full sample.
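The three steps above can be sketched generically in Python. The function below is an illustration only (its name and the toy data are ours). It is checked against a textbook case rather than a regression coefficient: applying the jackknife to the biased variance estimator (divisor n) recovers the unbiased estimator (divisor n − 1).

```python
def jackknife_correct(data, statistic):
    """Return (bias-corrected estimate, list of leave-one-out estimates)."""
    n = len(data)
    theta_hat = statistic(data)                       # full-sample estimate
    # Steps 1 and 2: drop each observation once and re-estimate.
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    # Step 3: n * theta_hat - ((n - 1) / n) * sum of leave-one-out estimates.
    theta_tilde = n * theta_hat - (n - 1) / n * sum(loo)
    return theta_tilde, loo

def biased_variance(xs):
    """Variance with divisor n, which underestimates the population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

toy = [1.0, 2.0, 3.0, 4.0]
corrected, _ = jackknife_correct(toy, biased_variance)
print(biased_variance(toy), round(corrected, 4))
```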

Because the jackknife correction procedure introduced by Sapra (2002) was meant for regression coefficients, we adapted it for the OOR. Specifically, Var(η)^1/2 (in Equations 2 and 4) was calculated as our statistic of interest, and then $\tilde{θ}$ was estimated as follows:

If \tilde{θ} = [Var (η)]^{1 / 2} > 0, corrected OOR = \exp (\tilde{θ}) = \exp \{[Var (η)]^{1 / 2}\}; \qquad (6a)

If \tilde{θ} = [Var (η)]^{1 / 2} \leq 0, corrected OOR = 1 . \qquad (6b)

Note that Equation (6b) was needed to address situations where the correction procedure overcorrects for bias due to sampling error. Since the statistic of interest, Var(η)^1/2, cannot be negative, we fixed its value at 0. To facilitate calculation, we developed an SAS macro, which estimates the OOR, corrected OOR, and its standard error (discussed later). The macro is available on request.
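To make the adaptation concrete, the sketch below runs the whole sequence on one small simulated sample: fit the logistic regression, take the standard deviation of the fitted linear combination as the statistic of interest, jackknife it, and apply the truncation rule. It is not the SAS macro. The Newton-Raphson fitter, the linear solver, the seed, and the generating weights are all assumptions made for illustration, and no safeguards against separation or nonconvergence are included beyond clamping the linear predictor.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_logit(xs, ys, iters=15):
    """Maximum likelihood logistic regression; returns [b0, b1, ..., bp]."""
    p = len(xs[0]) + 1
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, y in zip(xs, ys):
            z = [1.0] + list(x)
            eta = max(-30.0, min(30.0, sum(b * v for b, v in zip(beta, z))))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for j in range(p):
                grad[j] += (y - mu) * z[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * z[j] * z[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def sd_eta(xs, ys):
    """Statistic of interest: SD of the fitted linear combination of predictors."""
    beta = fit_logit(xs, ys)
    etas = [sum(b * v for b, v in zip(beta[1:], x)) for x in xs]
    mean = sum(etas) / len(etas)
    return math.sqrt(sum((e - mean) ** 2 for e in etas) / (len(etas) - 1))

def jackknife_oor(xs, ys):
    """Return (uncorrected OOR, jackknife-corrected OOR truncated at 1)."""
    n = len(ys)
    theta_hat = sd_eta(xs, ys)
    loo = [sd_eta(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(n)]
    theta_tilde = n * theta_hat - (n - 1) / n * sum(loo)
    corrected = math.exp(theta_tilde) if theta_tilde > 0 else 1.0
    return math.exp(theta_hat), corrected

# Hypothetical sample: two predictors, logit data model, n = 80.
rng = random.Random(2024)
xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)]
ys = [1 if rng.random() < 1 / (1 + math.exp(-(-1.0 + 0.8 * x[0] + 0.5 * x[1]))) else 0
      for x in xs]
oor_hat, oor_corrected = jackknife_oor(xs, ys)
print(round(oor_hat, 2), round(oor_corrected, 2))
```

Refitting the model n times is affordable here because the sample is small; for large samples a production implementation would likely need a faster fitter.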

Simulation procedure and data analysis

Data were simulated based on both the logit and probit models. We varied six factors in our data: (a) data model (logit and probit), (b) sample sizes (n = 100 and 500), (c) base rate (.10, .20, .30, .40, and .50), (d) number of predictors (NV = 2 and 5), (e) intercorrelations among the predictors (IR = .00, .30, .50, and .70), and (f) correlations between the predictors and the latent variable Y′ underlying the dichotomous criterion Y (DR = .00, .10, .30, and .50).² In total, there were 600 simulation conditions.³ For each simulation condition, 500 data sets were generated. Equation (2) was then used to estimate the OOR, and the jackknife correction procedure described above was applied to estimate the corrected OOR for each data set. Results were then compared with the population values of the OOR (obtained in the previous analysis). For each condition, the mean and standard deviation of the absolute error (|corrected_OOR − true_OOR|) across simulated data sets were calculated, with corrected_OOR being the value of OOR obtained from the jackknife correction procedure and true_OOR being the population value of the OOR in that condition. We wrote an SAS program to perform the simulation and analysis described in this section.⁴

Results

Because of space limitations, we present only one representative combination of conditions based on the sample size (n = 100) and number of predictors (NV = 2) for the probit model (Table 2). Results for other conditions are available on request.

Table 2.

Monte Carlo Results for Probit Model (Number of IVs = 2, n = 100): Estimated and Corrected OOR

		True, Estimated, and Corrected Values of the OOR
		DR = .00				DR = .10				DR = .30				DR = .50
Intercorrelation Among the IV (IR)	Base Rate	True	Est.	Cor.	M_d (SD_d)	True	Est.	Cor.	M_d (SD_d)	True	Est.	Cor.	M_d (SD_d)	True	Est.	Cor.	M_d (SD_d)	Average Bias
IR = .00	.10	1	1.66	1.37	.37 (.38)	1.32	1.76	1.45	.34 (.32)	2.44	3.25	2.55	.80 (.72)	6.47	12.88	6.01	2.73 (2.69)	1.06
	.20	1	1.42	1.25	.25 (.27)	1.28	1.56	1.38	.27 (.24)	2.27	2.68	2.38	.53 (.46)	5.88	7.51	5.91	1.97 (1.81)	.76
	.30	1	1.34	1.21	.21 (.21)	1.27	1.48	1.34	.23 (.18)	2.20	2.44	2.23	.47 (.37)	5.63	7.03	5.86	1.85 (1.85)	.69
	.40	1	1.31	1.19	.19 (.19)	1.26	1.41	1.29	.20 (.16)	2.16	2.42	2.23	.45 (.40)	5.51	6.84	5.82	1.74 (1.73)	.64
	.50	1	1.29	1.17	.17 (.17)	1.26	1.42	1.30	.19 (.15)	2.15	2.37	2.19	.42 (.33)	5.47	7.02	5.97	1.88 (2.02)	.67
	Ave.	1	1.41	1.24	.24 (.25)	1.28	1.53	1.35	.25 (.21)	2.25	2.63	2.32	.53 (.45)	5.79	8.26	5.91	2.03 (2.02)	.77
IR = .30	.10	1	1.62	1.33	.33 (.36)	1.27	1.75	1.42	.33 (.33)	2.16	2.82	2.26	.68 (.64)	4.42	6.93	4.44	1.68 (1.82)	.75
	.20	1	1.43	1.27	.27 (.24)	1.24	1.50	1.32	.24 (.21)	2.19	2.27	2.03	.46 (.39)	4.03	5.12	4.23	1.29 (1.37)	.56
	.30	1	1.35	1.22	.22 (.21)	1.23	1.42	1.28	.20 (.18)	1.96	2.20	2.01	.43 (.40)	3.86	4.64	4.05	1.14 (1.09)	.50
	.40	1	1.32	1.20	.20 (.19)	1.22	1.41	1.29	.20 (.18)	1.93	2.13	1.97	.39 (.33)	3.79	4.33	3.86	.93 (.83)	.43
	.50	1	1.29	1.17	.17 (.17)	1.22	1.40	1.28	.19 (.15)	1.92	2.10	1.94	.38 (.30)	3.76	4.53	4.02	1.02 (1.07)	.45
	Ave.	1	1.40	1.24	.24 (.24)	1.24	1.49	1.32	.23 (.21)	1.99	2.30	2.04	.47 (.41)	3.97	5.11	4.12	1.22 (1.24)	.54
IR = .50	.10	1	1.64	1.35	.35 (.36)	1.25	1.68	1.37	.30 (.28)	2.03	2.51	2.04	.60 (.47)	3.79	5.62	3.81	1.35 (1.24)	.65
	.20	1	1.41	1.25	.25 (.24)	1.23	1.48	1.31	.23 (.19)	1.91	2.13	1.91	.42 (.37)	3.47	4.27	3.62	1.00 (.91)	.48
	.30	1	1.34	1.21	.21 (.20)	1.21	1.40	1.27	.19 (.15)	1.86	2.07	1.89	.39 (.32)	3.33	3.85	3.41	.83 (.65)	.41
	.40	1	1.31	1.19	.19 (.20)	1.21	1.42	1.30	.21 (.18)	1.83	2.01	1.86	.33 (.26)	3.26	3.80	3.41	.81 (.73)	.39
	.50	1	1.32	1.20	.20 (.19)	1.20	1.41	1.29	.20 (.16)	1.82	2.04	1.89	.36 (.31)	3.25	3.86	3.47	.80 (.70)	.39
	Ave.	1	1.41	1.24	.24 (.24)	1.22	1.48	1.31	.23 (.19)	1.89	2.15	1.92	.42 (.35)	3.42	4.28	3.55	.95 (.84)	.46
IR = .70	.10	1	1.65	1.36	.36 (.40)	1.24	1.68	1.39	.31 (.30)	1.94	2.51	2.00	.61 (.51)	3.39	3.93	3.53	.82 (.76)	.52
	.20	1	1.42	1.26	.26 (.25)	1.21	1.46	1.29	.22 (.20)	1.83	2.13	1.90	.44 (.37)	3.11	3.52	3.30	.67 (.58)	.40
	.30	1	1.34	1.21	.21 (.20)	1.20	1.43	1.30	.21 (.17)	1.78	2.00	1.83	.35 (.30)	2.99	3.13	2.98	.49 (.38)	.31
	.40	1	1.32	1.19	.19 (.19)	1.19	1.40	1.27	.19 (.17)	1.75	1.93	1.78	.33 (.28)	2.93	3.11	2.97	.46 (.39)	.30
	.50	1	1.29	1.18	.18 (.18)	1.19	1.38	1.26	.18 (.16)	1.75	1.89	1.75	.32 (.25)	2.91	3.14	3.01	.48 (.39)	.29
	Ave.	1	1.41	1.24	.24 (.25)	1.21	1.47	1.30	.22 (.20)	1.81	2.09	1.86	.41 (.34)	3.07	3.36	3.16	.58 (.50)	.36
Overall average bias		1	1.41	1.24	.24 (.24)	1.40	1.66	1.47	.28 (.24)	2.37	2.83	2.44	.59 (.51)	4.28	5.58	4.40	1.28 (1.25)	.55

Note. OOR = overall odds ratio; IR = intercorrelations among the IV; DR = correlation between the DV and IV; True = true value of the OOR (population value); Est. = estimated value of the OOR (mean of the OOR estimates from 500 simulated samples); Cor. = estimated OOR corrected for bias (mean of the corrected OOR from 500 simulated samples); M_d = mean absolute difference between the corrected OOR and the true OOR; SD_d = standard deviation of the absolute difference between the corrected OOR and the true OOR.

As can be seen in Table 2, the corrected OOR reproduced the population values reasonably well: The mean absolute difference ranges from .24 to 2.73 with an overall mean of .55. Although the value of 2.73 appears to be very high, that value occurred when the true OOR was 6.47 (IR = .00 and DR = .50); percentagewise, the difference is not large. For conditions where the population values of the OOR are 1.00 (i.e., where the predictors have no effect on the criterion), the correction yielded relatively less accurate estimates percentagewise, generally overestimating the true values from 17% (1.17) to 37% (1.37). This overestimation is actually expected given the imposed truncation on the minimum values for the corrected OOR (Equation 6b). Specifically, when the true value of Var(η)^1/2 is zero, its estimated value would be negative 50% of the time due to sampling error. Here, we had to set the corrected values at .00 because, theoretically, a standard deviation cannot be smaller than zero. Similar situations (i.e., overestimation due to theoretical truncations) have been found and discussed in past simulation studies (e.g., Le & Schmidt, 2006; Overton, 1998).

Closer examination of Table 2 further reveals that the accuracy of the corrected OOR improves as base rate increases. For example, with IR = .00, the mean absolute difference M_d averaged across levels of DR is 1.06 when base rate is .10 (the first row in Table 2); the corresponding average decreases to .67 when base rate increases to .50. As noted above, although mean absolute difference appears to increase with DR, this increase is proportional to the increase of the true OOR. Thus, percentagewise, there is no noticeable trend regarding effects of other simulation factors (IR and DR, except when DR = .00 as discussed earlier) on the accuracy of the corrected OOR.

Across all simulation conditions, a similar pattern emerges: Accuracy of the correction improves as the base rate increases. The bias (overestimation) of the corrected OOR is worst when DR = .00. The accuracy of the corrected OOR also appears to depend on sample size (accuracy increases as sample size increases) and, to a much lesser extent, data model (i.e., estimates for the logit model are slightly more accurate than those for the probit model). Other factors (NV, IR, and DR, except when DR = .00) do not seem to significantly influence the accuracy of the corrected OOR. For example, for conditions with n = 500 and NV = 5 under the probit model, the averaged mean absolute difference M_d across levels of DR, IR, and base rate is .32 (compared with the averaged M_d of .55 in Table 2). With the same combination of conditions (n = 500 and NV = 5) under the logit model, the mean absolute difference M_d averaged across all levels of DR, IR, and base rate is .28. Overall, the jackknife correction procedure yields reasonably accurate estimates for the OOR in all simulation conditions. Although results for conditions under the logit model are slightly more accurate than those under the probit model, the differences are minimal and practically negligible.

Estimating Standard Error of the Corrected OOR

The standard error of the corrected estimate can be calculated based on the following equation (Sapra, 2002):

S E_{\tilde{θ}} = \sqrt{\frac{n - 1}{n} \sum_{i = 1}^{n} {({\hat{θ}}_{(i)} - \bar{θ})}^{2}}, \qquad (7)

where ${\hat{θ}}_{(i)}$ is the statistic of interest estimated in the jackknifed sample i (i = 1,2, . . ., n) and $\bar{θ}$ is the mean of the statistic across all n jackknifed samples:

\bar{θ} = \frac{1}{n} \sum_{i = 1}^{n} {\hat{θ}}_{(i)} . \qquad (8)

To estimate standard error for the corrected OOR, we replaced θ in the above equations with the OOR.
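A minimal Python sketch of the standard error formula is given below; the function name and toy data are ours. As a check on the formula rather than on the OOR itself, the example uses the sample mean as the statistic, for which the jackknife standard error is known to equal the usual s/√n.

```python
import math

def jackknife_se(loo_estimates):
    """Jackknife standard error from the n leave-one-out estimates."""
    n = len(loo_estimates)
    theta_bar = sum(loo_estimates) / n                 # mean of leave-one-out estimates
    ss = sum((t - theta_bar) ** 2 for t in loo_estimates)
    return math.sqrt((n - 1) / n * ss)

toy = [1.0, 2.0, 3.0, 4.0]
# Leave-one-out means: the mean of the sample with observation i removed.
loo_means = [(sum(toy) - x) / (len(toy) - 1) for x in toy]
se = jackknife_se(loo_means)
print(round(se, 4))
```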

Simulation procedure and data analysis

We used the same data sets developed in the previous section. Equations (7) and (8) were used to estimate the standard error for the corrected OOR. These values were then compared with the “true” standard errors, which were the observed standard deviations of the corrected OOR estimated across 500 samples in each condition. As in previous analyses, the mean and standard deviation of the absolute error (|Est_SE − Obs_SE|) across simulated data sets were calculated for each condition, where Est_SE is the standard error estimated by the jackknife procedure and Obs_SE is the observed standard deviation of the corrected OOR across 500 samples (true standard error). Simulations and analyses were performed using the previously discussed SAS program.

Results

There are three conditions where the jackknife procedure failed to provide appropriate estimates of the standard error. In these conditions, the estimates were seriously inflated (in the order of hundreds). All these conditions are those with the lowest base rate (.10) and highest correlation with the underlying continuous dependent variable (DR = .50), except where the intercorrelations between the independent variables are very high (IR = .70). It is likely that small sample size (100) coupled with low base rate made it difficult for the logistic regression models to converge (alternatively, the models may have converged with unrealistic results). For the remaining conditions, the standard error (SE) estimated by the jackknife procedure consistently overestimates the observed SE. Table 3 presents results for one representative combination of conditions (n = 500 and NV = 5) for the logit model. As can be seen, the overestimation ranges from .02 to .52, with a mean absolute difference averaged across all conditions of 0.08. Percentagewise, the overestimation is most serious when DR = .00, averaging at almost 36% (.04/.11). For other conditions (excluding those with DR = .00), the overestimation is less severe. These observations here can again be partially explained by the truncation of the distribution of the corrected OOR when its population value is 1.00. Although there is no clear pattern, results indicate that overestimation increases as the base rate decreases.

Table 3.

Monte Carlo Results for Logit Model (Number of IVs = 5, n = 500): Estimated Standard Error of the Corrected OOR

		Observed and Estimated Standard Errors of the Corrected OOR
		DR = .00			DR = .10			DR = .30			DR = .50			Average Bias
Intercorrelation Among the IV (IR)	Base Rate	Obs.	Est.	M_d (SD_d)	Obs.	Est.	M_d (SD_d)	Obs.	Est.	M_d (SD_d)	Obs.	Est.	M_d (SD_d)
IR = .00	.10	.16	.23	.06 (.04)	.22	.26	.05 (.04)	.86	1.03	.25 (.24)	—	—	— (—)	.12
	.20	.10	.12	.03 (.01)	.15	.16	.02 (.02)	.71	.85	.18 (.15)	—	—	— (—)	.08
	.30	.09	.12	.03 (.01)	.16	.16	.02 (.01)	.74	.85	.18 (.15)	—	—	— (—)	.08
	.40	.09	.12	.03 (.01)	.16	.16	.02 (.01)	.76	.82	.16 (.13)	—	—	— (—)	.07
	.50	.09	.12	.04 (.01)	.16	.16	.02 (.01)	.71	.82	.17 (.15)	—	—	— (—)	.07
	Ave.	.11	.14	.04 (.02)	.17	.18	.03 (.02)	.76	.87	.19 (.17)	—	—	— (—)	.08
IR = .30	.10	.15	.22	.06 (.04)	.19	.23	.04 (.04)	.37	.42	.09 (.07)	1.64	1.90	.52 (.49)	.18
	.20	.10	.13	.03 (.01)	.13	.14	.01 (.01)	.28	.30	.05 (.04)	1.36	1.64	.41 (.35)	.12
	.30	.09	.12	.03 (.01)	.14	.14	.01 (.01)	.27	.30	.05 (.04)	1.36	1.62	.39 (.35)	.12
	.40	.10	.13	.03 (.01)	.13	.14	.02 (.01)	.27	.30	.05 (.04)	1.41	1.61	.37 (.36)	.12
	.50	.09	.12	.03 (.01)	.14	.14	.02 (.01)	.29	.30	.05 (.03)	1.54	1.66	.40 (.36)	.12
	Ave.	.11	.14	.04 (.02)	.15	.16	.02 (.02)	.29	.32	.06 (.05)	1.46	1.69	.41 (.38)	.13
IR = .50	.10	.17	.22	.06 (.04)	.19	.23	.05 (.04)	.32	.35	.06 (.06)	.71	.86	.21 (.29)	.09
	.20	.10	.12	.02 (.01)	.13	.14	.01 (.01)	.23	.25	.04 (.03)	.61	.72	.15 (.13)	.06
	.30	.10	.12	.03 (.01)	.14	.14	.01 (.01)	.22	.25	.04 (.03)	.58	.69	.15 (.12)	.06
	.40	.09	.12	.03 (.01)	.12	.14	.02 (.01)	.22	.25	.04 (.03)	.67	.71	.14 (.11)	.06
	.50	.09	.12	.03 (.01)	.13	.14	.02 (.01)	.22	.25	.04 (.03)	.61	.71	.15 (.13)	.06
	Ave.	.11	.14	.04 (.02)	.14	.16	.02 (.02)	.24	.27	.04 (.04)	.64	.74	.16 (.14)	.06
IR = .70	.10	.16	.22	.06 (.04)	.18	.23	.05 (.03)	.28	.32	.06 (.05)	.54	.64	.14 (.12)	.08
	.20	.10	.13	.03 (.01)	.13	.14	.02 (.01)	.20	.21	.03 (.02)	.49	.53	.10 (.09)	.04
	.30	.10	.13	.03 (.01)	.13	.13	.01 (.01)	.19	.21	.03 (.02)	.44	.49	.09 (.07)	.04
	.40	.10	.12	.03 (.01)	.13	.13	.01 (.01)	.20	.21	.03 (.02)	.43	.47	.08 (.07)	.04
	.50	.10	.13	.03 (.01)	.11	.13	.02 (.01)	.20	.22	.03 (.03)	.43	.46	.08 (.06)	.04
	Ave.	.11	.14	.03 (.02)	.14	.15	.02 (.02)	.21	.23	.03 (.03)	.47	.52	.10 (.08)	.05
Overall average bias		.11	.14	.04 (.02)	.15	.16	.02 (.02)	.38	.42	.08 (.07)	.85	.98	.22 (.20)	.08

Note. IR = intercorrelations among the IV; DR = correlation between the DV and IV; Obs. = observed standard error of the corrected OOR (this is the standard deviation of the corrected OOR across 500 simulated samples); Est. = estimated standard error of the corrected OOR obtained from the jackknife procedure; M_d = mean absolute difference between the estimated SE and the observed SE; SD_d = standard deviation of the absolute difference between the estimated SE and the observed SE.

Across simulation conditions, the same pattern can be seen: The jackknife procedure generally overestimates the observed SE, and the overestimation is most serious (percentagewise) when DR = .00. The accuracy improves with larger sample size. Overestimation for conditions with large sample size (n = 500) is generally small across all conditions examined. With smaller sample size (n = 100), the overestimation is substantially larger. For example, for the combination of conditions with n = 100 and NV = 2 for the logit model, the averaged mean absolute difference is 0.43. Overall, estimates for conditions under the logit model are slightly more accurate than those under the probit model. However, the differences are small and mostly negligible. For example, under the probit model, the averaged mean absolute difference M_d across levels of DR, IR, and base rate is 0.10 (compared with the averaged M_d of 0.08 in Table 3).

An Example of Using the OOR to Evaluate Model Effectiveness

To demonstrate the application of the OOR in organizational research, we used results from a recent study examining predictors of employee turnover (Barrick & Zimmerman, 2009). In that study, the researchers used three sets of predictors, including biodata, personality, and prehire attitudes, to predict voluntary turnover.⁵ It was found that the predictors significantly predicted employee turnover 6 months after hire (base rate = .20; n = 119). The Cox and Snell R² analogs $(R_{M}^{2})$ of the logistic regression models with only one predictor were reported to be .054 (for prehire attitudes), .064 (for personality), and .073 (for biodata). When the three sets of predictors were combined, the $R_{M}^{2}$ was estimated at .134. Although these values provide information regarding the relative strengths of the models, they are not intuitively meaningful and do not allow readers to compare their effectiveness with similar models based on different sets of predictors from other studies.

From the information provided in Barrick and Zimmerman (2009), we applied Equations (3) and (4) to calculate the OOR.⁶ The OORs for models with one predictor set were found to be 1.70 (personality only), 2.02 (attitudes only), and 5.28 (biodata only). These values indicate the factors by which the odds of leaving the organization after 6 months increase when the score on the predictor increases by one standard deviation. For example, for biodata, the odds of leaving for an employee whose score is one standard deviation above the mean of the biodata measure are 5.28 times the odds for an employee whose score is at the mean. When all three predictor sets were included to predict turnover, the OOR was 7.01. This value indicates that the odds of leaving the organization after 6 months increase approximately sevenfold when the score on the composite created by linearly combining the three predictors increases by one standard deviation. Compared with the model where only biodata is included as the predictor, the model with all predictors is about 33% more effective (7.01/5.28 = 1.33). Similarly, it is 247% and 312% more effective than the models with attitudes only (7.01/2.02 = 3.47) and personality only (7.01/1.70 = 4.12), respectively. Interpretation of the OOR for these models is intuitively meaningful, and the values can be compared with those of models predicting turnover in other studies based on different predictors.
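The relative-effectiveness percentages in this example follow directly from ratios of the reported OORs. The short Python sketch below reproduces that arithmetic; the dictionary keys and the helper name `percent_more_effective` are labels introduced here for illustration only.

```python
# OOR values reported in the turnover example
oor = {"personality": 1.70, "attitudes": 2.02, "biodata": 5.28, "combined": 7.01}

def percent_more_effective(full, reduced):
    """Percentage by which one OOR exceeds another, based on their ratio."""
    return (full / reduced - 1.0) * 100.0

# Compare the combined model with each single-predictor-set model
gains = {
    name: round(percent_more_effective(oor["combined"], value))
    for name, value in oor.items()
    if name != "combined"
}
print(gains)  # {'personality': 312, 'attitudes': 247, 'biodata': 33}
```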

Discussion

Our study complements the work of Allen and Le (2008). Using a different data model (i.e., probit vs. the logit used in Allen and Le), we found that the OOR, together with the log likelihood ratio R² analog $R_{L}^{2}$, is less dependent on study base rate than other commonly used R² analogs in MLOGR. Monte Carlo simulation further confirmed the usefulness of the jackknife procedure (Sapra, 2002) for correcting the bias of the OOR when it is estimated in finite samples under both logit and probit models. We also provided a procedure by which the standard error of this index can be estimated. When applied to estimate the standard error of the corrected OOR, the jackknife procedure was found to generally overestimate the true standard error, especially when the predictors were not related to the criterion Y. Arguably, these conditions are rare in applied research, where the predictors are likely to be selected based on theoretical reasons or past research findings. If these conditions are excluded, the accuracy of the jackknife procedure improves. Although the overestimation is still relatively large with small sample size (n = 100), it is arguably tolerable in studies with larger sample size (n = 500). The level of accuracy is comparable to that found in past simulations examining methods for estimating standard deviations (Le & Schmidt, 2006). Thus, the jackknife procedure can provide an initial solution to the problem of estimating the standard error of overall effect size indices in MLOGR. As noted earlier, we developed a SAS macro for estimating the OOR, its standard error, and its corrected value. The macro is available to any researcher who wants to use the OOR index in MLOGR analyses.
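For readers unfamiliar with the jackknife, the generic delete-one form can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the SAS macro itself: whether the macro uses exactly this delete-one variant is an assumption, and the function names and data are hypothetical. To keep the sketch self-contained, the plug-in (divide-by-n) variance stands in for the statistic; in the present application the statistic would instead refit the logistic regression and recompute the OOR on each leave-one-out sample.

```python
from math import sqrt
from statistics import mean

def jackknife(sample, statistic):
    """Delete-one jackknife: return (bias-corrected estimate, standard error)."""
    n = len(sample)
    full_estimate = statistic(sample)
    # Recompute the statistic n times, leaving out one observation each time
    leave_one_out = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    loo_mean = mean(leave_one_out)
    # Bias-corrected estimate: n * theta_hat - (n - 1) * mean of leave-one-out values
    corrected = n * full_estimate - (n - 1) * loo_mean
    # Jackknife standard error of the statistic
    se = sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in leave_one_out))
    return corrected, se

def plug_in_variance(xs):
    """Biased (divide-by-n) variance, used here only as a stand-in statistic."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical values
corrected, se = jackknife(data, plug_in_variance)
print(corrected, se)
```

For this stand-in statistic, the jackknife correction recovers the familiar n − 1 sample variance, which makes the bias-removal step easy to verify by hand.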

We believe that the OOR can be a useful index for applied researchers to evaluate and communicate research findings regarding the strength of their MLOGR models. Obviously, the OOR is not without limitations. In particular, it does not have a well-defined range with endpoints corresponding to perfect fit and complete lack of fit (.00 to 1.00 for R² or −1.00 to 1.00 for multiple R in MLR), which is a desirable feature for indices of overall effect size (Menard, 2000). We therefore recommend that the OOR be used together with the log likelihood ratio R² analog $(R_{L}^{2})$ , which has been suggested to be the most appropriate index for MLOGR (Menard, 2000). The desired characteristics of the $R_{L}^{2}$ (i.e., independence of base rate and variation appropriately reflecting model strength) were confirmed via simulation in the current study. As also evident from current results, the OOR is comparable to the $R_{L}^{2}$ on these characteristics. Furthermore, unlike the current indices for overall effect size in MLOGR (including $R_{L}^{2}$ ), information provided by the OOR is intuitively meaningful, thereby allowing applied researchers to communicate their results to consumers of research, and especially to practitioners. We hope that the current study contributes toward facilitating such communication by making the OOR more accessible to applied researchers and practitioners.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes

References

Allen & Le (2008). An additional measure of overall effect size for logistic regression models. Journal of Educational and Behavioral Statistics, 33, 416-441.

Barrick, M. R., & Zimmerman, R. D. (2009). Hiring for retention and performance. Human Resource Management, 48, 189-206.

Cox, D. R., & Snell, E. J. (1989). The analysis of binary data (2nd ed.). London, England: Chapman & Hall.

Efron (1981). Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods. Biometrika, 68, 589-599.

Fan (2005). Using SAS for Monte Carlo simulation research in SEM. Structural Equation Modeling, 12, 299-333.

Hanushek, E. A., & Jackson, J. E. (1977). Statistical methods for social scientists. New York, NY: Academic Press.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Thousand Oaks, CA: Sage.

Kemery, E. R., Dunlap, & Bedeian, A. G. (1989). The employee separation process: Criterion-related issues associated with tenure and turnover. Journal of Management, 15, 417-424.

Le & Schmidt, F. L. (2006). Correcting for indirect range restriction in meta-analysis: Testing a new meta-analytic procedure. Psychological Methods, 11, 416-438.

Liao, J. G., & McGee (2003). Adjusted coefficients of determination for logistic regression. The American Statistician, 57, 161-165.

Menard (2000). Coefficients of determination for multiple logistic regression analysis. The American Statistician, 54, 17-24.

Menard (2002). Applied logistic regression analysis (2nd ed.). Thousand Oaks, CA: Sage.

Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78, 691-692.

Overton, R. C. (1998). A comparison of fixed effects and mixed (random effects) models for meta-analysis tests of moderator effects. Psychological Methods, 3, 354-379.

Sapra, S. K. (2002). A jackknife maximum likelihood estimator for the probit model. Applied Economics Letters, 9, 73-74.

Schmitt, Coyle, B. W., & Rauschenberger (1977). A Monte Carlo evaluation of three formula estimates of cross-validated multiple correlation. Psychological Bulletin, 84, 751-758.

Wherry, R. J. (1931). A new formula for predicting the shrinkage of the coefficient of multiple correlation. Annals of Mathematical Statistics, 2, 440-451.