Coefficient Omega Bootstrap Confidence Intervals

Abstract

The performance of the normal theory bootstrap (NTB), the percentile bootstrap (PB), and the bias-corrected and accelerated (BCa) bootstrap confidence intervals (CIs) for coefficient omega was assessed through a Monte Carlo simulation under conditions not previously investigated. Of particular interest were nonnormal Likert-type and binary items. The results show a clear order in performance. The NTB CI had the best performance in that it had more consistently acceptable coverage under the simulation conditions investigated. The results suggest that the NTB CI can be used for sample sizes larger than 50. The NTB CI is still a good choice for a sample size of 50 so long as there are more than 5 items. If one does not wish to make the normality assumption about coefficient omega, then the PB CI for sample sizes of 100 or more or the BCa CI for sample sizes of 150 or more are good choices.

Keywords

coefficient omega, reliability, composite reliability, bootstrap, confidence interval, interval estimate, nonnormality, ordinal, dichotomous, binary

McDonald (1970) first proposed coefficient omega as a reliability index for homogeneous items a little over 40 years ago. Coefficient omega uses the item factor loadings and uniquenesses from a factor analytic model to compute reliability. As such, coefficient omega is a more intuitive reliability measure because it is based on factor loadings and not strictly on correlations as compared to coefficient alpha (Cronbach, 1951; Guttman, 1945). However, there are probably two reasons why it is rarely used in applied settings. First, its utilization in scientific research is dwarfed by coefficient alpha, which is well cited in the literature. More than half a century of cited research gives the impression that coefficient alpha is the only viable reliability coefficient. Second, knowledge of its statistical properties is narrow, in particular with respect to its statistical distribution, which is needed for inference.

Measurement error resulting from using multiple-item questionnaires, inventories, and other measurement instruments is a common issue faced by behavioral/social science researchers. Measurement error from such instruments is commonly quantified by a reliability coefficient. Typically, behavioral/social science researchers use these reliability measures to evaluate items to aid in the creation/modification of reliable measurement instruments. By far, the most common reliability coefficient used in the behavioral/social sciences is coefficient alpha (Hogan, Benjamin, & Brezinski, 2000; Peterson, 1994).

Coefficient alpha is an excellent estimator of internal consistency when used correctly. In addition, confidence interval (CI) estimates for coefficient alpha have been developed (Maydeu-Olivares, Coffman, & Hartmann, 2007; Padilla, Divers, & Newton, 2012; Romano, Kromrey, & Hibbard, 2010; van Zyl, Neudecker, & Nel, 2000; Yuan, Guarnaccia, & Hayslip, 2003). However, coefficient alpha is a biased estimate of reliability when items are not at least tau-equivalent or essentially tau-equivalent (Graham, 2006; Lord, Novick, & Birnbaum, 1968; McDonald, 1999; Zinbarg, Revelle, Yovel, & Li, 2005). Tau-equivalence can best be described in terms of the classical true score model (CTSM) from classical test theory (CTT) in relation to a one-factor factor analysis model. We briefly highlight the most restrictive case to the least restrictive case of the CTSM.

The most restrictive case of the classical true score model assumes the items to be parallel. In this case, the model is written as

$$x_{ij} = τ_{i} + u_{i}, \tag{1}$$

where x_ij is the observed score, τ_i is the true score, and u_i is the measurement error for $i = 1, 2, \dots, n$ individuals and $j = 1, 2, \dots, k$ items. Here the items are assumed to have the same true score, and the errors are assumed to be uncorrelated and to follow the same distribution with a mean of zero and a covariance matrix where the diagonal elements are all equal $(σ_{i}^{2} = σ_{j}^{2})$ and the off-diagonal elements are also equal $(σ_{ij} = σ_{ji})$ with $σ_{ij} \neq σ_{i}^{2}$. The one-factor model from such a covariance matrix has one factor loading (λ) of multiplicity k. When items are parallel, coefficient alpha is equal to the reliability of the set of items.

A less restrictive case is when items are assumed to be tau-equivalent. For tau-equivalence, the model is written as

$$x_{ij} = τ_{i} + u_{ij} . \tag{2}$$

For essential tau-equivalence, the model is slightly modified as

$$x_{ij} = (a_{j} + τ_{i}) + u_{ij} . \tag{3}$$

In this case, the true scores differ by a unique constant (a_j). The main difference between Model (1) and Models (2) and (3) is that the error variances for the latter two models need not be equal. Here, the item covariance matrix for the x_js need not have equal variances $(σ_{i}^{2} \neq σ_{j}^{2})$ but has equal covariances $(σ_{ij} = σ_{ji})$. In terms of the one-factor model from this covariance matrix, it also has one factor loading (λ) with multiplicity k. For Models (2) and (3), coefficient alpha is also equal to the reliability of the set of items.

Finally, the least restrictive case is when items are congeneric. Here, the model is written as

$$x_{ij} = (a_{j} + b_{j} τ_{i}) + u_{ij} . \tag{4}$$

The main difference between tau-equivalent and congeneric items is the linear relationship (b_j) between the true (τ_i) and observed (x_ij) scores. The item covariance matrix here need not have equal variances $(σ_{i}^{2} \neq σ_{j}^{2})$ or equal covariances $(σ_{ij} \neq σ_{ji})$. The one-factor model from such a covariance matrix has unequal factor loadings (λ_j). In this final case, coefficient alpha underestimates the reliability of a set of items (Zinbarg et al., 2005) while coefficient omega is unbiased (McDonald, 1999; Zinbarg et al., 2005). It should be noted that coefficients alpha and omega are both unbiased when the set of items is at least tau-equivalent.
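As a small numerical illustration of these cases, the sketch below builds the item covariance matrix implied by a one-factor model, $λλ^{t} + \mathrm{diag}(ψ)$, and compares population coefficient alpha with the reliability of the sum score. The loadings and error variances are hypothetical values chosen only for illustration, the function names are ours, and a unit factor variance is assumed.

```python
import numpy as np

def implied_covariance(loadings, uniquenesses):
    """Item covariance matrix implied by a one-factor model (unit factor variance)."""
    lam = np.asarray(loadings, dtype=float)
    return np.outer(lam, lam) + np.diag(uniquenesses)

def population_alpha(sigma):
    """Coefficient alpha computed from a covariance matrix."""
    k = sigma.shape[0]
    return k / (k - 1) * (1.0 - np.trace(sigma) / sigma.sum())

def sum_score_reliability(loadings, sigma):
    """True-score variance of the sum score divided by its total variance."""
    return np.sum(loadings) ** 2 / sigma.sum()

# Tau-equivalent items: equal loadings, unequal error variances (hypothetical values).
tau_load = [0.55] * 5
tau_sigma = implied_covariance(tau_load, [0.5, 0.6, 0.7, 0.8, 0.9])
tau_alpha = population_alpha(tau_sigma)
tau_rel = sum_score_reliability(tau_load, tau_sigma)

# Congeneric items: unequal loadings, standardized so that psi_j = 1 - lambda_j^2.
con_load = np.array([0.3, 0.4, 0.5, 0.6, 0.7])
con_sigma = implied_covariance(con_load, 1.0 - con_load ** 2)
con_alpha = population_alpha(con_sigma)
con_rel = sum_score_reliability(con_load, con_sigma)

print(f"tau-equivalent: alpha = {tau_alpha:.3f}, reliability = {tau_rel:.3f}")
print(f"congeneric:     alpha = {con_alpha:.3f}, reliability = {con_rel:.3f}")
```

Under these assumptions, alpha matches the reliability in the tau-equivalent case and falls below it in the congeneric case, which is the pattern described above.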

Raykov (1998) proposed a bootstrap percentile CI for the composite reliability of congeneric items measuring a common dimension (Raykov, 1997). The method is specified as a structural equation model (SEM) and showed promise. An illustration of the method with 1,000 bootstrap samples was provided for a small simulation study that considered a sample size of 400 individuals and 6 multivariate normal congeneric items assuming unidimensionality.

In another study, Raykov (2002) derived the standard error for the composite reliability via the delta method, a non-bootstrap method. As before, the model is specified through an SEM framework. It showed promise, and was illustrated with a small simulation. The simulation included a sample size of 500 individuals and 5 multivariate normal congeneric items assuming unidimensionality. It is important to note that the delta method CI was compared to the bootstrap percentile CI with 2,000 bootstrap samples. Both methods had comparable results.

In a parallel study, Raykov and Shrout (2002) presented a more general form of the composite reliability again within the SEM framework with bootstrapped percentile CIs. The method extends the previous method by Raykov (1997, 1998). As before, the authors illustrated the method with 1,000 bootstrap samples in a small simulation that had a sample size of 300 individuals, 6 multivariate normal congeneric items, and a model that assumed two dimensions. The simulation provided evidence that the composite reliability estimate is unbiased and the CIs contain the population parameter.

In more recent publications (Raykov, 2012; Raykov & Marcoulides, 2011), the aforementioned bootstrap and non-bootstrap methods are discussed, and the non-bootstrap method is also illustrated on large example data sets (i.e., n ≥ 350). An important consideration here is that the logit transformation was added to the non-bootstrap method because the reliability parameter is bounded by $[0, 1]$. In terms of the non-bootstrap method, the authors indicate the method is applicable to approximately continuous items having a multivariate normal distribution. In addition, the method is applicable to nonnormal items having at least 5 to 7 response categories with the use of the robust maximum likelihood (MLR) estimator. The authors further indicate that the MLR estimator can also be used with items with fewer than 5 response categories by using a three-step procedure that includes parcels. For further details about using the three-step procedure, see Raykov and Marcoulides (2011). In terms of the bootstrap method, the authors point out that “its large-sample properties” should be examined. This observation is in relation to the asymptotic theory that underlies SEM (Bollen, 1989; Raykov, 1998, 2002; Raykov & Shrout, 2002).

The composite reliability CI studies are a promising beginning, but there is still more research to conduct. First, the impact of nonnormal and noncontinuous items remains to be evaluated. Second, the large-sample properties of the methods remain unknown. These two conditions have important implications for applied work because items are rarely normal (Micceri, 1989) or continuous (Raykov, 2002), and knowledge of a sufficient sample size for robustness/inference is vital.

The purpose here is to investigate the performance of coefficient omega bootstrap CIs under simulation conditions that investigate the situations described above. Of particular interest is the impact of nonnormality, Likert/ordinal (e.g., categorical) and binary items, and sample sizes less than 300. We begin with the definition of coefficient omega.

Coefficient Omega and Reliability

Consider a set of k items $x_{1}, x_{2}, \dots, x_{k}$ in a measurement instrument designed to measure a single construct, factor, or latent variable. In the behavioral/social sciences, it is common to compute the reliability of the composite/sum score $x = \sum_{j = 1}^{k} x_{j}$. Reliability of the composite/sum score is defined as

$$ρ = \frac{σ_{τ}^{2}}{σ_{x}^{2}} = \frac{σ_{τ}^{2}}{σ_{τ}^{2} + σ_{u}^{2}}, \tag{5}$$

where $σ_{x}^{2}$ denotes the observed score variance, $σ_{τ}^{2}$ the true score variance, and $σ_{u}^{2}$ the error variance. This definition of reliability assumes that all the items are parallel (Allen & Yen, 1979; Crocker & Algina, 1986).

Coefficient omega for congeneric items is defined as

$$ω = \frac{{(\sum_{j = 1}^{k} λ_{j})}^{2}}{{(\sum_{j = 1}^{k} λ_{j})}^{2} + \sum_{j = 1}^{k} ψ_{j}}, \tag{6}$$

where λ_j and ψ_j are the jth factor loading and uniqueness, respectively (McDonald, 1970, 1999). Coefficient omega is estimated $(\hat{ω})$ by using sample estimates ${\hat{λ}}_{j}$ and ${\hat{ψ}}_{j}$ in Equation (6). While there are several methods for estimating ${\hat{λ}}_{j}$, here maximum likelihood (ML) will be used.
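The definition of coefficient omega translates directly into a few lines of code. The sketch below is illustrative only: the function name is ours, and the loadings are example values for five standardized congeneric items, so that each uniqueness is assumed to be $1 - λ_{j}^{2}$.

```python
def coefficient_omega(loadings, uniquenesses):
    """Squared sum of loadings over itself plus the summed uniquenesses."""
    common = sum(loadings) ** 2
    return common / (common + sum(uniquenesses))

# Example values: five standardized congeneric items (assumed psi_j = 1 - lambda_j^2).
loadings = [0.3, 0.4, 0.5, 0.6, 0.7]
uniquenesses = [1 - lam ** 2 for lam in loadings]

omega = coefficient_omega(loadings, uniquenesses)
print(round(omega, 3))  # 6.25 / 9.90, i.e. 0.631
```

In practice the population values would be replaced by the sample estimates ${\hat{λ}}_{j}$ and ${\hat{ψ}}_{j}$ from a fitted one-factor model.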

Bootstrapped Coefficient Omega CIs

Bootstrapping for coefficient omega can be summarized in three steps. Suppose $X = {(x_{1}, x_{2}, \dots, x_{n})}^{t}$ are the observed data where each x_i is a 1 × k vector. First, obtain a bootstrap sample $X^{(b)} = {(x_{1}^{(b)}, x_{2}^{(b)}, \dots, x_{n}^{(b)})}^{t}$, which is the bth random resample from X with replacement. Note that X and X^(b) have the same sample size. Second, compute and store the bth bootstrap estimate of coefficient omega $({\hat{ω}}^{(b)})$ from $X^{(b)}$. Last, the stored estimates ${\hat{ω}}^{(1)}, {\hat{ω}}^{(2)}, \dots, {\hat{ω}}^{(B)}$ represent the empirical sampling distribution (ESD) of $\hat{ω}$ for $b = 1, 2, \dots, B$ bootstrap samples. The ESD can then be summarized for statistical inference about ω. Typical estimates are the bootstrap percentiles/quantiles, mean, and standard error (SE). The bootstrap estimate of SE is

$$SE(\hat{ω}) = {\left[\frac{1}{B - 1} \sum_{b = 1}^{B} {({\hat{ω}}^{(b)} - \bar{ω})}^{2}\right]}^{1/2}, \tag{7}$$

where

$$\bar{ω} = \frac{1}{B} \sum_{b = 1}^{B} {\hat{ω}}^{(b)} . \tag{8}$$
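The three bootstrap steps and the SE formula can be sketched as follows. Two caveats apply. First, `omega_hat` is a stand-in that fits the one-factor model by iterated principal axis factoring with NumPy only; it is not the ML estimator used in the study and is included only so the example runs on its own. Second, the data are toy continuous items generated from hypothetical loadings, and B = 500 is used here only to keep the run short.

```python
import numpy as np

def omega_hat(data, n_iter=25):
    """Stand-in omega estimate: iterated principal axis factoring (NOT ML)."""
    s = np.cov(data, rowvar=False)
    psi = 0.5 * np.diag(s)                       # crude starting uniquenesses
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(s - np.diag(psi))
        lam = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))   # factor for largest eigenvalue
        psi = np.clip(np.diag(s) - lam ** 2, 1e-6, None)  # keep uniquenesses positive
    common = lam.sum() ** 2
    return common / (common + psi.sum())

def bootstrap_omega(data, n_boot=500, seed=0):
    """Steps 1 and 2: resample rows with replacement and store each estimate."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)        # same sample size as the original data
        estimates[b] = omega_hat(data[rows])
    return estimates

# Toy data: 200 cases on 5 continuous one-factor items (hypothetical loadings).
rng = np.random.default_rng(1)
lam = np.array([0.3, 0.4, 0.5, 0.6, 0.7])
factor = rng.standard_normal((200, 1))
noise = rng.standard_normal((200, 5)) * np.sqrt(1.0 - lam ** 2)
X = factor * lam + noise

esd = bootstrap_omega(X)                         # step 3: the ESD of omega-hat
omega_bar = esd.mean()
se = np.sqrt(np.sum((esd - omega_bar) ** 2) / (len(esd) - 1))
print(f"bootstrap mean = {omega_bar:.3f}, bootstrap SE = {se:.3f}")
```

The last two computed quantities correspond to the bootstrap mean and the bootstrap SE defined above; the SE line is equivalent to `np.std(esd, ddof=1)`.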

The three most common bootstrap CIs were examined. First, the normal theory bootstrap (NTB) CI is estimated as $\hat{ω} \pm Z_{α/2} SE(\hat{ω})$. Second, the percentile bootstrap (PB) CI is obtained by computing the $α/2$ and $1 - α/2$ percentiles from the $\hat{ω}$ ESD, where α is the significance level (i.e., Type I error rate). Third, the bias-corrected and accelerated (BCa) CI is an improved version of the PB CI in that it adjusts the PB CI $α/2$ and $1 - α/2$ percentiles in two ways: (a) it makes a correction for bias and (b) it makes a correction for skewness (or acceleration). Note that the NTB CI assumes that the ESD is normally distributed, whereas the PB and BCa make no assumption about the shape of the ESD. For technical and theoretical details concerning the bootstrap and the three CIs investigated, see Efron and Tibshirani (1998).
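A minimal sketch of the three intervals, written from the standard textbook definitions, is given below. The BCa acceleration is computed from leave-one-out (jackknife) estimates, which is the usual nonparametric choice but is an assumption here, since the computation is not spelled out in the text. As before, `omega_hat` is a principal-axis stand-in rather than the ML estimator, and the data, seed, and B = 500 are illustrative.

```python
import numpy as np
from statistics import NormalDist

def ntb_ci(theta_hat, boot, alpha=0.05):
    """Normal theory bootstrap CI: estimate +/- z * bootstrap SE."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = np.std(boot, ddof=1)
    return theta_hat - z * se, theta_hat + z * se

def pb_ci(boot, alpha=0.05):
    """Percentile bootstrap CI: alpha/2 and 1 - alpha/2 quantiles of the ESD."""
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return lo, hi

def bca_ci(theta_hat, boot, jack, alpha=0.05):
    """BCa CI: percentile levels adjusted for bias (z0) and acceleration (a)."""
    nd = NormalDist()
    # Bias correction from the share of bootstrap estimates below the estimate
    # (clipped so that the normal quantile is always defined).
    p = float(np.clip(np.mean(boot < theta_hat), 1e-6, 1 - 1e-6))
    z0 = nd.inv_cdf(p)
    d = jack.mean() - jack
    a = np.sum(d ** 3) / (6.0 * np.sum(d ** 2) ** 1.5)
    def adjusted(q):
        z = nd.inv_cdf(q)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
    lo, hi = np.quantile(boot, [adjusted(alpha / 2), adjusted(1 - alpha / 2)])
    return lo, hi

def omega_hat(data, n_iter=25):
    """Stand-in omega estimate: iterated principal axis factoring (NOT ML)."""
    s = np.cov(data, rowvar=False)
    psi = 0.5 * np.diag(s)
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(s - np.diag(psi))
        lam = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        psi = np.clip(np.diag(s) - lam ** 2, 1e-6, None)
    common = lam.sum() ** 2
    return common / (common + psi.sum())

# Toy data: 200 cases on 5 continuous one-factor items (hypothetical loadings).
rng = np.random.default_rng(1)
load = np.array([0.3, 0.4, 0.5, 0.6, 0.7])
X = rng.standard_normal((200, 1)) * load + rng.standard_normal((200, 5)) * np.sqrt(1 - load ** 2)

theta_hat = omega_hat(X)
boot = np.array([omega_hat(X[rng.integers(0, 200, size=200)]) for _ in range(500)])
jack = np.array([omega_hat(np.delete(X, i, axis=0)) for i in range(200)])

ntb, pb, bca = ntb_ci(theta_hat, boot), pb_ci(boot), bca_ci(theta_hat, boot, jack)
for name, ci in [("NTB", ntb), ("PB", pb), ("BCa", bca)]:
    print(name, np.round(ci, 3))
```

Note that the NTB interval is symmetric about $\hat{ω}$ and can extend beyond 1, whereas the PB and BCa limits are always values taken from the ESD.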

The bootstrap is a general statistical method with several attractive features (Efron & Tibshirani, 1998). However, its most notable feature is its application to situations where the theoretical distribution of a statistic of interest is complicated or unknown. To date, the distribution for coefficient omega remains unknown. Thus, this presents an ideal situation for the bootstrap and in particular for the PB and BCa CIs.

It should be noted that bootstrapping has been investigated for coefficient alpha (Padilla et al., 2012). In that study, the bootstrapped CIs performed well across a variety of simulation conditions that included items that were noncontinuous and a variety of sample sizes.

Method

Simulation Design

Five different simulation factors were investigated in a 4 (# of items) × 3 (corr. type) × 4 (# of item response categories) × 2 (distribution type) × 6 (sample size) Monte Carlo simulation design for a total of 576 conditions. All simulated items were nonnormal and Likert-type (ordinal) or binary; none of the items were continuous. For each simulation condition, 1,000 replications were obtained.
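The size of the design follows from crossing the factor levels, which can be checked with a short enumeration. The levels are those listed in the subsections below; the variable names and the labels for the correlation types are ours.

```python
from itertools import product

# Factor levels as described in the Simulation Design section.
n_items = [5, 10, 15, 20]
corr_types = ["compound symmetric, rho = .30",
              "compound symmetric, rho = .56",
              "congeneric (unstructured)"]
response_categories = [2, 3, 5, 7]
distribution_types = ["Type 1", "Type 2"]
sample_sizes = [50, 100, 150, 200, 250, 300]

conditions = list(product(n_items, corr_types, response_categories,
                          distribution_types, sample_sizes))
print(len(conditions))  # 4 * 3 * 4 * 2 * 6 = 576
```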

Likert-type and binary items were generated using the method used in Maydeu-Olivares et al. (2007). This method is outlined below:

1. Select the structure of the k × k correlation matrix P, where k is the number of items.

2. Select a set of thresholds ν to categorize items to a predetermined skewness and kurtosis.

3. Generate an n × k multivariate data matrix X* ~ N(0, P), where n is the sample size.

4. Categorize the generated data X* using the thresholds in ν to generate the dataset X. Each variable x in X is categorized by the thresholds as follows: $x = m$ if $ν_{m} < x^{*} < ν_{m + 1}$ for $m = 0, 1, \dots, M - 1$, where $ν_{0} = -\infty$ and $ν_{M} = \infty$, and M is the number of categories.

5. Compute the true population coefficient omega (ω) according to P and the thresholds in ν. See Maydeu-Olivares et al. (2007) for full details.

6. Estimate the coefficient omega bootstrapped CIs from X as outlined above.

7. Determine if the bootstrapped CIs include ω.
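The data-generation part of this procedure (drawing X* and categorizing it with thresholds) can be sketched as follows. The thresholds shown are hypothetical cut points for a 3-category item and are not the values used in the study, which were chosen to hit specific skewness and kurtosis targets; the function name and seed are also ours.

```python
import numpy as np

def generate_items(n, corr, thresholds, rng):
    """Draw X* ~ N(0, P) and categorize every column with the same thresholds."""
    k = corr.shape[0]
    latent = rng.multivariate_normal(np.zeros(k), corr, size=n)
    # np.digitize returns m when thresholds[m-1] <= x* < thresholds[m], so the
    # M - 1 interior thresholds yield categories 0, 1, ..., M - 1. Ties at a
    # threshold have probability zero for continuous x*.
    return np.digitize(latent, thresholds)

# Compound symmetric correlation matrix with rho = .30 for k = 5 items.
k, rho = 5, 0.30
P = np.full((k, k), rho)
np.fill_diagonal(P, 1.0)

rng = np.random.default_rng(2024)
thresholds = np.array([-0.5, 1.0])   # hypothetical interior thresholds (M = 3)
X = generate_items(200, P, thresholds, rng)
print(X.shape, np.unique(X))
```

Binary items correspond to a single interior threshold, for example `np.array([0.8])`, again as an illustrative value only.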

Below are the specific simulation conditions investigated.

Number of items (k)

Past research on coefficient alpha has looked at various numbers of items ranging from 2 to 20 (Duhachek & Iacobucci, 2004; Enders, 2003; Maydeu-Olivares et al., 2007). To make the results here consistent for coefficient omega, the following number of items were selected: k = 5, 10, 15, and 20.

Item correlation type (ρ)

Three different item correlation structures P were investigated. The first two correlation structures were from a parallel-item one-factor model with common loadings $λ = .55$ or $.705$. These two parameters generated compound symmetric item correlation structures with $ρ = .30$ or $.56$, respectively. The third correlation structure was generated from a congeneric item one-factor model with loadings of $λ = .3, .4, .5, .6, .7$. The third correlation structure was used by Maydeu-Olivares et al. (2007), but modified here for cases with multiples of 5 items instead of 7.
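For standardized items, a one-factor model implies correlations $ρ_{jl} = λ_{j} λ_{l}$ off the diagonal, which is how loadings map to the correlation structures. The sketch below shows this mapping for the first parallel-item loading and for the congeneric loadings. Repeating the five congeneric loadings to reach k = 10 is our assumption about how the pattern extends to multiples of 5 items; the function name is also ours.

```python
import numpy as np

def one_factor_correlation(loadings):
    """Correlation matrix implied by a standardized one-factor model."""
    lam = np.asarray(loadings, dtype=float)
    corr = np.outer(lam, lam)        # off-diagonal entries lambda_j * lambda_l
    np.fill_diagonal(corr, 1.0)
    return corr

# Parallel items with a common loading of .55: every correlation is .55^2 = .3025.
parallel = one_factor_correlation([0.55] * 5)

# Congeneric items for k = 10, assuming the five loadings are simply repeated.
congeneric = one_factor_correlation(np.tile([0.3, 0.4, 0.5, 0.6, 0.7], 2))

print(round(parallel[0, 1], 2))      # 0.3
print(congeneric.shape)              # (10, 10)
```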

Item response categories (IRCs)

Four item response categories were investigated: 2, 3, 5, and 7. As noted above, none of the items were continuous. Item 4 above highlights the categories that were used.

Distribution type

Two different distribution types were investigated. When IRC = 2 (i.e., binary items), ν was chosen so that the distributions had the following characteristics:

Type 1: skewness = 1.70 and kurtosis = 0.88

Type 2: skewness = 0.41 and kurtosis = −1.83

The Type 2 distribution for binary categorization was studied by Maydeu-Olivares et al. (2007). When IRC = 3, 5, 7, ν was chosen so that the distributions had the following characteristics:

Type 1: skewness = 0 and kurtosis = 0.88

Type 2: skewness = 0.97 and kurtosis = −0.20

These two categorizations were studied by Maydeu-Olivares et al. (2007). The combination of number of items, item correlations, and item categorization created a range of .43 to .95 for ω.

Sample size (n)

The following typical sample sizes in behavioral/social science research were investigated: n = 50, 100, 150, 200, 250, 300. Duhachek and Iacobucci (2004) point out that $n > 200$ reaches a point of diminishing returns for reliability estimates. However, here we went a little further in order to be conservative.

In each simulation replication, coefficient omega and corresponding quantities were estimated. Relative bias for coefficient omega was computed as

$${\hat{ω}}_{bias} = \frac{\hat{ω} - ω}{ω} . \tag{9}$$

Here, the 100(1 − α)% CIs for coefficient omega were estimated from a total of 2,000 bootstrap samples, where $α = .05$. CI coverage was assessed using Bradley’s (1978) liberal criterion, which is defined as $1 - 1.5α \leq 1 - α^{*} \leq 1 - 0.5α$, where α* is the true Type I error probability. Coverage is defined as the proportion of estimated CIs that contain ω. Therefore, acceptable coverage is given by [.925, .975].
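The evaluation criteria can be written out as a few small helper functions. This is a sketch with function names of our choosing, and the four intervals in the usage example are made-up numbers used only to show the coverage calculation.

```python
def relative_bias(omega_hat, omega):
    """Relative bias of an estimate with respect to the population value."""
    return (omega_hat - omega) / omega

def bradley_liberal_range(alpha=0.05):
    """Acceptable coverage range [1 - 1.5*alpha, 1 - 0.5*alpha]."""
    # Rounding guards against floating-point noise at the boundaries.
    return round(1 - 1.5 * alpha, 10), round(1 - 0.5 * alpha, 10)

def coverage(intervals, omega):
    """Proportion of (lower, upper) intervals that contain omega."""
    hits = sum(1 for lower, upper in intervals if lower <= omega <= upper)
    return hits / len(intervals)

def is_acceptable(cov, alpha=0.05):
    low, high = bradley_liberal_range(alpha)
    return low <= cov <= high

print(bradley_liberal_range())            # (0.925, 0.975)

# Made-up intervals from four hypothetical replications with omega = .80.
toy_intervals = [(0.70, 0.85), (0.75, 0.90), (0.81, 0.92), (0.72, 0.88)]
print(coverage(toy_intervals, 0.80))      # 0.75
```

In the study itself, the coverage for a condition is the same proportion taken over the 1,000 replications.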

Results

Point Estimate Bias

The estimate of bias was investigated because it can have an impact on bootstrap CIs. However, all combinations of the simulation conditions were inspected and no noticeable bias was observed. By far, the largest bias was observed in four instances with the Type 1 distribution, binary items, and a sample size of 50. Specifically, ${\hat{ω}}_{bias} = -.05$ with a compound symmetric item correlation matrix with $ρ = .30$ and 10 items. For an unstructured correlation matrix and 5, 10, and 15 items, ${\hat{ω}}_{bias} = .06, -.06,$ and $.05$, respectively.

Confidence Interval Coverage

The NTB CI had the best performance in terms of coverage. However, the major impact on the CIs was sample size. Thus, the results are presented in the context of sample size. To preserve space, only tables for sample sizes of 50 to 150 will be presented because of the clear impact on CI coverage.

Sample size of 50

Here, the NTB CI was slightly impacted, but the PB and BCa CIs were heavily impacted. Results are presented in Table 1. For Type 1 distributions, the NTB CI only had 7 instances of unacceptable coverage with 5 binary items (7/576 = .012).

Table 1.

95% Coverage Probabilities for a Sample Size of 50.

	ρ	.30				.56				Unstructured
IRC	k	5	10	15	20	5	10	15	20	5	10	15	20
2^a	NTB	.913	.942	.950	.971	.918	.951	.975	.986	.897	.943	.950	.942
	PB	.936	.899	.882	.862	.931	.905	.912	.905	.935	.912	.875	.856
	BCa	.897	.893	.919	.932	.919	.927	.930	.939	.888	.887	.902	.913
3^a	NTB	.951	.969	.976	.966	.932	.947	.949	.947	.934	.974	.978	.975
	PB	.949	.927	.891	.910	.928	.908	.895	.914	.941	.932	.912	.896
	BCa	.905	.944	.933	.931	.930	.922	.917	.924	.903	.949	.945	.943
5^a	NTB	.943	.967	.979	.975	.945	.937	.938	.945	.952	.959	.967	.963
	PB	.927	.925	.920	.908	.927	.922	.917	.912	.954	.912	.918	.913
	BCa	.909	.938	.946	.922	.928	.919	.925	.923	.930	.917	.934	.926
7^a	NTB	.925	.962	.971	.968	.933	.948	.939	.938	.946	.975	.971	.949
	PB	.925	.915	.905	.933	.925	.919	.911	.906	.942	.924	.929	.908
	BCa	.895	.928	.919	.945	.915	.929	.916	.923	.915	.953	.940	.908
2^b	NTB	.918	.958	.983	.970	.935	.948	.956	.953	.917	.965	.973	.974
	PB	.933	.938	.944	.956	.937	.940	.964	.946	.937	.950	.943	.956
	BCa	.919	.943	.971	.973	.938	.951	.963	.952	.926	.961	.959	.972
3^b	NTB	.946	.971	.977	.970	.938	.959	.961	.940	.936	.966	.966	.976
	PB	.950	.942	.916	.925	.944	.929	.937	.919	.942	.927	.922	.921
	BCa	.909	.952	.950	.946	.935	.938	.940	.920	.915	.942	.938	.951
5^b	NTB	.937	.973	.972	.971	.943	.944	.944	.942	.941	.963	.968	.965
	PB	.944	.930	.916	.917	.937	.928	.918	.923	.945	.938	.926	.926
	BCa	.915	.948	.945	.944	.942	.930	.920	.922	.911	.946	.946	.939
7^b	NTB	.932	.977	.970	.964	.934	.947	.948	.951	.927	.963	.959	.964
	PB	.942	.940	.933	.917	.919	.930	.919	.934	.928	.940	.919	.926
	BCa	.897	.956	.934	.935	.919	.927	.932	.937	.910	.946	.930	.937

Note. IRC = item response category; NTB = normal theory bootstrap; PB = percentile bootstrap; BCa = bias-corrected and accelerated bootstrap. Unacceptable coverage is bolded and outside [.925, .975]. All methods based on 2,000 bootstrap samples and 1,000 simulation replications.

Type 1^a Dist.: skew = 1.70 and kurtosis = 0.88 for IRC = 2; skew = 0 and kurtosis = 0.88 for IRC = 3, 5, 7.

Type 2^b Dist.: skew = 0.41 and kurtosis = −1.83 for IRC = 2; skew = 0.97 and kurtosis = −0.20 for IRC = 3, 5, 7.

On the other hand, the PB and BCa CIs consistently had coverage below the acceptable range (42/576 = .073 and 36/576 = .063, respectively). Most of the impact occurred with Type 1 distributions (31/576 = .054 and 25/576 = .043, respectively), and mostly with binary items (9/576 = .016 and 8/576 = .014, respectively). With 3-, 5-, and 7-point Likert-type items, the PB and BCa CIs had sporadic unacceptable coverage.

With Type 2 distributions, the NTB CI had unacceptable coverage in 3 instances for binary items with a compound symmetric $(ρ = .30)$ and unstructured correlation matrix (3/576 = .005). For 3-point Likert-type items, the NTB CI had 2 instances of unacceptable coverage (2/576 = .003). Here, the BCa CI had unacceptable coverage with 5 binary items and a compound symmetric correlation matrix $(ρ = .30)$ (5/576 = .009). For 3-, 5-, and 7-point Likert-type items, the PB CI tended to have unacceptable coverage for 15 to 20 items (12/576 = .021); the BCa CI tended to have unacceptable coverage with 5 items (7/576 = .012).
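The tallies reported for this sample size can be reproduced from the entries of Table 1. As one worked example, the sketch below counts the NTB coverage values for Type 1 distributions (rows 2^a, 3^a, 5^a, and 7^a) that fall outside [.925, .975]; the values are copied from the table and the variable names are ours.

```python
# NTB coverage for Type 1 distributions at n = 50, copied row by row from Table 1.
ntb_type1 = [
    .913, .942, .950, .971, .918, .951, .975, .986, .897, .943, .950, .942,  # IRC = 2
    .951, .969, .976, .966, .932, .947, .949, .947, .934, .974, .978, .975,  # IRC = 3
    .943, .967, .979, .975, .945, .937, .938, .945, .952, .959, .967, .963,  # IRC = 5
    .925, .962, .971, .968, .933, .948, .939, .938, .946, .975, .971, .949,  # IRC = 7
]

# Boundary values (.925 and .975) count as acceptable.
unacceptable = [c for c in ntb_type1 if not (0.925 <= c <= 0.975)]
print(len(unacceptable), round(len(unacceptable) / 576, 3))  # 7 0.012
```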

Sample size of 100

At this sample size, the methods began to stabilize. Results are presented in Table 2. With Type 1 distributions, the NTB CI only had one instance of unacceptable coverage for 20 binary items with a compound symmetric correlation structure $(ρ = .30)$. Continuing with binary items, the PB CI tended to have unacceptable coverage for 10 to 20 items (5/576 = .009); the BCa CI tended to have unacceptable coverage with 5 items and a compound symmetric $(ρ = .30)$ and unstructured correlation matrix (2/576 = .003). There were 6 instances where the BCa CI had unacceptable coverage for 5 items with 3-, 5-, and 7-point Likert-type responses with a compound symmetric $(ρ = .30)$ and unstructured correlation matrix (6/576 = .010). For 10 items, the BCa CI had unacceptable coverage with 5- and 7-point Likert-type responses with a compound symmetric correlation matrix $(ρ = .30)$ (2/576 = .003).

Table 2.

95% Coverage Probabilities for a Sample Size of 100.

	ρ	.30				.56				Unstructured
IRC	k	5	10	15	20	5	10	15	20	5	10	15	20
2^a	NTB	.940	.953	.961	.981	.948	.943	.946	.950	.942	.956	.967	.965
	PB	.946	.922	.903	.912	.953	.928	.917	.934	.950	.931	.927	.906
	BCa	.896	.944	.946	.956	.954	.944	.936	.953	.918	.948	.965	.940
3^a	NTB	.933	.955	.954	.939	.939	.947	.947	.952	.937	.967	.960	.932
	PB	.942	.925	.930	.924	.933	.935	.934	.944	.939	.942	.942	.932
	BCa	.910	.938	.937	.927	.936	.944	.944	.952	.910	.949	.946	.934
5^a	NTB	.929	.933	.947	.949	.945	.933	.939	.946	.928	.950	.954	.954
	PB	.931	.922	.932	.937	.943	.920	.926	.933	.932	.927	.950	.928
	BCa	.924	.918	.934	.937	.941	.927	.935	.941	.905	.936	.947	.939
7^a	NTB	.928	.941	.948	.949	.944	.939	.948	.946	.950	.958	.964	.950
	PB	.928	.929	.939	.946	.938	.931	.937	.928	.932	.939	.950	.937
	BCa	.911	.924	.943	.937	.940	.932	.940	.942	.910	.941	.945	.935
2^b	NTB	.927	.964	.955	.962	.950	.944	.960	.949	.933	.959	.962	.957
	PB	.934	.949	.947	.957	.955	.937	.952	.938	.942	.945	.946	.946
	BCa	.926	.962	.952	.966	.955	.944	.957	.947	.939	.958	.955	.946
3^b	NTB	.944	.959	.943	.960	.951	.942	.929	.944	.942	.955	.952	.945
	PB	.936	.932	.934	.943	.945	.930	.927	.932	.942	.931	.930	.928
	BCa	.917	.943	.938	.947	.949	.937	.922	.939	.911	.947	.941	.929
5^b	NTB	.941	.959	.939	.958	.936	.937	.946	.940	.941	.961	.954	.958
	PB	.952	.942	.928	.944	.937	.933	.939	.926	.952	.950	.938	.959
	BCa	.930	.945	.926	.950	.939	.933	.938	.930	.924	.951	.938	.955
7^b	NTB	.942	.952	.953	.959	.934	.937	.944	.937	.949	.954	.945	.935
	PB	.943	.944	.941	.950	.940	.926	.940	.926	.954	.942	.941	.924
	BCa	.932	.945	.951	.952	.938	.930	.946	.931	.940	.940	.937	.922

Type 1^a Dist.: skew = 1.70 and kurtosis = 0.88 for IRC = 2; skew = 0 and kurtosis = 0.88 for IRC = 3, 5, 7.

Type 2^b Dist.: skew = 0.41 and kurtosis = −1.83 for IRC = 2; skew = 0.97 and kurtosis = −0.20 for IRC = 3, 5, 7.

For Type 2 distributions, the PB CI had one instance of unacceptable coverage for 20 seven-point Likert-type items with an unstructured correlation matrix. The BCa CI had 3 instances of unacceptable coverage for 5 items with a compound symmetric $(ρ = .30)$ and unstructured correlation matrix (3/576 = .005). It also had two instances of unacceptable coverage with 15 and 20 items with a compound symmetric $(ρ = .56)$ and unstructured correlation matrix (2/576 = .003).

For the remaining sample sizes, the PB and BCa CIs were the ones primarily impacted.

Sample size of 150

With this sample size, the PB and BCa CIs were the ones primarily impacted. Results are presented in Table 3. Here, most unacceptable coverage occurred with Type 1 distributions. For binary items, the PB CI had two instances of unacceptable coverage for 10 and 15 items with a compound symmetric correlation matrix $(ρ = .30)$ (2/576 = .003); the BCa CI also had two instances of unacceptable coverage for 5 items with compound symmetric $(ρ = .30)$ and unstructured correlation matrices (2/576 = .003). For 3-point Likert-type items, the BCa CI had unacceptable coverage in one instance for each type of correlation (3/576 = .005); both NTB and PB CIs had one instance of unacceptable coverage for 10 items with a compound symmetric correlation matrix $(ρ = .56)$. For 5-point Likert-type items, the PB CI had one instance of unacceptable coverage each for compound symmetric $(ρ = .30)$ and unstructured correlation matrices (2/576 = .003); the BCa CI had a single instance of unacceptable coverage for 5 items with an unstructured correlation matrix. Last, for 7-point Likert-type items, the BCa CI had a single instance of unacceptable coverage for 10 items with an unstructured correlation matrix.

Table 3.

95% Coverage Probabilities for a Sample Size of 150.

	ρ	.30				.56				Unstructured
IRC	k	5	10	15	20	5	10	15	20	5	10	15	20
2^a	NTB	.952	.952	.950	.958	.949	.942	.947	.930	.945	.945	.963	.961
	PB	.952	.916	.921	.930	.957	.928	.942	.927	.961	.928	.929	.942
	BCa	.909	.940	.939	.947	.955	.937	.946	.932	.906	.942	.956	.950
3^a	NTB	.937	.942	.940	.952	.942	.922	.947	.946	.959	.939	.935	.951
	PB	.930	.941	.933	.948	.936	.921	.948	.945	.950	.937	.932	.948
	BCa	.920	.938	.935	.944	.942	.919	.954	.948	.923	.934	.931	.944
5^a	NTB	.944	.950	.942	.944	.946	.949	.946	.947	.925	.953	.946	.940
	PB	.943	.940	.924	.936	.934	.942	.939	.941	.919	.941	.944	.929
	BCa	.937	.941	.929	.936	.940	.947	.939	.946	.903	.944	.942	.934
7^a	NTB	.946	.953	.946	.940	.951	.944	.942	.954	.948	.927	.944	.948
	PB	.948	.939	.942	.936	.944	.940	.942	.944	.946	.926	.942	.940
	BCa	.940	.942	.941	.936	.949	.945	.938	.949	.935	.923	.938	.941
2^b	NTB	.956	.949	.969	.950	.945	.949	.945	.937	.938	.963	.955	.957
	PB	.964	.946	.963	.948	.950	.950	.948	.941	.942	.958	.944	.954
	BCa	.955	.951	.967	.951	.944	.952	.949	.945	.922	.967	.947	.955
3^b	NTB	.948	.954	.936	.943	.943	.943	.959	.940	.939	.945	.952	.947
	PB	.955	.946	.935	.933	.947	.937	.943	.947	.938	.944	.937	.942
	BCa	.941	.942	.937	.932	.945	.943	.952	.946	.920	.940	.938	.941
5^b	NTB	.944	.956	.950	.947	.948	.944	.943	.946	.937	.950	.944	.946
	PB	.943	.954	.951	.946	.939	.943	.937	.948	.936	.936	.942	.935
	BCa	.938	.949	.950	.945	.935	.944	.941	.949	.920	.938	.945	.938
7^b	NTB	.937	.954	.956	.945	.946	.943	.958	.938	.939	.949	.954	.951
	PB	.940	.934	.942	.929	.947	.942	.954	.939	.939	.944	.956	.952
	BCa	.941	.935	.944	.934	.945	.942	.956	.940	.926	.945	.956	.955

Type 1^a Dist.: skew = 1.70 and kurtosis = 0.88 for IRC = 2; skew = 0 and kurtosis = 0.88 for IRC = 3, 5, 7.

Type 2^b Dist.: skew = 0.41 and kurtosis = −1.83 for IRC = 2; skew = 0.97 and kurtosis = −0.20 for IRC = 3, 5, 7.

Only the BCa CI was impacted with Type 2 distributions. Specifically, the BCa CI had unacceptable coverage for 5 binary, 3-, and 5-point Likert-type items (3/576 = .005).

Sample size of 200

In this instance again, most unacceptable coverage occurred with Type 1 distributions. The PB CI had a coverage probability of .919 $(CI_{prob} = .919)$ with 20 binary items with an unstructured correlation matrix; the BCa CI had $CI_{prob} = .918, .914$ with 5 binary items and compound symmetric $(ρ = .30)$ and unstructured correlation matrices, respectively. The BCa CI also had $CI_{prob} = .921$ with five 3-point Likert-type items with an unstructured correlation matrix.

With Type 2 distributions, the BCa CI had $CI_{prob} = .923$ for 5 binary items with an unstructured correlation matrix.

Sample size of 250

Here, all unacceptable coverage occurred with Type 1 distributions. The PB CI had $CI_{prob} = .923$ for 10 binary items with a compound symmetric $(ρ = .30)$ item correlation matrix and $CI_{prob} = .920$ for 20 five-point Likert-type items with an unstructured correlation matrix. For the BCa CI, $CI_{prob} = .922, .905$ with 5 binary items and compound symmetric $(ρ = .30)$ and unstructured correlation matrices, respectively. The BCa CI also had $CI_{prob} = .923$ for 20 five-point Likert-type items with an unstructured correlation matrix.

Sample size of 300

In this situation, only a single instance of unacceptable coverage with Type 2 distributions occurred. Specifically, the BCa CI had $CI_{prob} = .923$ for 15 five-point Likert-type items with a compound symmetric $(ρ = .30)$ item correlation matrix.

Figure 1 displays the 95% CI coverage for each CI method by sample size. The figure clearly displays the sample size impact. While all CIs had much variability with a sample size of 50, the PB and BCa CIs tended to have coverage below 95%. Another noticeable feature is that as the sample size increases all of the CIs stabilize within the acceptable coverage of $[.925, .975]$. However, coverage for the NTB CI was within acceptable range for sample sizes larger than 50.

Figure 1.

Distribution of 95% CI coverage for method by sample size.

Discussion

The NTB, PB, and BCa CI estimates for coefficient omega were investigated via a simulation study. The coefficient omega CI estimate considered in this simulation is for a composite of unidimensional (i.e., one factor or latent variable) congeneric items. To date, no study has thoroughly investigated the performance of CI estimates for composite reliability of unidimensional congeneric items. Of particular interest was the impact of nonnormality, Likert-type and binary items, and a sample size of less than 300. The results indicate that the NTB CI had the best coverage across all of the simulation conditions investigated. Even so, the major impact was sample size.

While the sample size impacted all three bootstrap CIs, it was most noticeable for the PB and BCa CIs. In general, all CIs underperformed when sample size was 50. However, the NTB CI was the best performing CI with only 8 instances of unacceptable coverage that tended to occur with 5 binary items. On the other hand, the PB and BCa CIs were heavily and negatively impacted with a sample size of 50. Most of the impact occurred with Type 1 distributions, mostly with binary items. Coverage was better with Type 2 distributions. However, the BCa CI was still underperforming with 5 items whereas the PB CI was impacted with 15 to 20 items.

When the sample size was 100 or more, coverage for the PB and BCa CIs began to stabilize; the NTB CI, by contrast, was already largely stable. Here, the NTB CI had only two sporadic instances of unacceptable coverage, one with a sample size of 100 and one with a sample size of 150. With Type 1 distributions, the PB and BCa CIs had several instances of unacceptable coverage, most of which occurred with binary items. With Type 2 distributions, there were fewer instances of unacceptable coverage for the PB and BCa CIs. However, the BCa CI tended to underperform with 5 items and an unstructured item correlation matrix.

One feature to note is that the CIs had issues with binary items under Type 1 distributions with a small sample size (n = 50). In this situation, the distribution had skewness = 1.70. In fact, this was the largest skewness in the simulation. This suggests that a sample size of 50 might be too small to provide an accurate estimate of the SE required for the NTB CI and of the $\alpha/2$ and $1 - \alpha/2$ percentiles required for the PB and BCa CIs. We also note that these percentiles are usually located near the tails of the empirical distribution function (EDF), which is the cumulative distribution associated with the ESD. There are three potential solutions. One is to have more parametric knowledge of the EDF or use some sort of smoothing in the estimation of the EDF for the NTB CI (Efron & Tibshirani, 1998). While not derived here, the evidence in the results suggests that the ESD for coefficient omega may be normal. It should be noted that van Zyl et al. (2000) showed the estimate of coefficient alpha to be normally distributed, and coefficient alpha is similar to coefficient omega. Second, for the PB and BCa CIs, the m-out-of-n method may be useful (see Chernick & LaBudde, 2011). We reiterate that this was only an issue for small sample sizes, and only occurred with 5 items for the NTB CI. Therefore, a third solution is to just increase the sample size. When the sample size increased, this situation was no longer an issue for any of the CIs.
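The m-out-of-n idea mentioned above can be sketched as follows. This is one common variant, shown only as an illustration: each bootstrap sample contains m < n cases drawn with replacement, and the centered replicates are rescaled by $\sqrt{m/n}$ before the percentiles are taken. The function name, the choice m = 30, the root-n rescaling, and the use of the sample mean as a stand-in statistic are all assumptions. The method was not evaluated in this study.

```python
import numpy as np


def m_out_of_n_percentile_ci(data, statistic, m, n_boot=2000, level=0.95, seed=0):
    """Percentile-type CI from an m-out-of-n bootstrap (illustrative variant)."""
    data = np.asarray(data)
    n = data.shape[0]
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= n")
    rng = np.random.default_rng(seed)
    estimate = statistic(data)
    centered = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=m)  # draw only m cases, with replacement
        centered[b] = statistic(data[rows]) - estimate
    alpha = 1 - level
    lower, upper = np.quantile(centered, [alpha / 2, 1 - alpha / 2])
    scale = np.sqrt(m / n)  # assumes the statistic converges at the root-n rate
    return estimate, (estimate + scale * lower, estimate + scale * upper)


# Hypothetical skewed sample of size 50; the mean stands in for the statistic.
rng = np.random.default_rng(7)
sample = rng.exponential(size=50)
estimate, ci = m_out_of_n_percentile_ci(sample, np.mean, m=30)
print(f"estimate = {estimate:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Setting m = n recovers the ordinary percentile bootstrap, so the same function can be used to compare the two approaches on a given data set.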

Within the context of the simulation conditions investigated, there is a clear order of preference among the three proposed bootstrap CIs. The NTB CI had the best performance in that it had acceptable coverage under all but 16 simulation conditions (560/576 = .972). This was followed by the PB and BCa CIs, whose performance was similar to one another (516/576 = .896 and 508/576 = .882, respectively). A noticeable feature is that while all CIs tend to be slightly below 95%, the NTB CI is the most liberal and the PB CI most conservative. As mentioned above, all methods were impacted with a sample size of 50 with Type 1 distributed items, and in particular with binary items; however, the NTB CI was only impacted with 5 items. Even so, the PB and BCa CIs stabilized with a sample size of 100 or more. Therefore, the NTB CI can be used for sample sizes larger than 50. The NTB CI is still a good choice for a sample size of 50 so long as there are more than 5 items. If one does not wish to make the normality assumption about the ESD, then the PB CI for sample sizes of 100 or more or the BCa CI for sample sizes of 150 or more are good choices.
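The proportions quoted above follow directly from the counts of acceptable conditions out of the 576 simulation conditions. The short Python sketch below reproduces that arithmetic and shows how a single coverage value can be checked against the acceptable range of [.925, .975]. The variable and function names are hypothetical; the counts, the range, and the example coverage of .923 are taken from the text.

```python
LOWER, UPPER = 0.925, 0.975  # acceptable coverage range for a 95% CI
TOTAL_CONDITIONS = 576

# Number of conditions with acceptable coverage, as reported in the text.
acceptable_counts = {"NTB": 560, "PB": 516, "BCa": 508}


def is_acceptable(coverage):
    """True when the empirical coverage lies inside the acceptable range."""
    return LOWER <= coverage <= UPPER


proportions = {
    method: count / TOTAL_CONDITIONS for method, count in acceptable_counts.items()
}
for method, proportion in proportions.items():
    print(f"{method}: {acceptable_counts[method]}/{TOTAL_CONDITIONS} = {proportion:.3f}")

# The BCa coverage of .923 reported for 15 five-point items falls just outside.
print(is_acceptable(0.923))
```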

In spite of the promising results, more research is still warranted. One aspect future research can investigate is a comparison between the bootstrap CIs proposed here and the non-bootstrap CIs from previous research such as the delta-based or the three-step methods with logit transformation (Raykov, 2012; Raykov & Marcoulides, 2011). The results here were assessed for three different correlation matrices, of which two were compound symmetric and one was unstructured. However, it is unlikely that data will conform to a compound symmetric correlation matrix in applied settings. Thus, future research should investigate the impact of correlation matrices that deviate from compound symmetry on coefficient omega CI estimation. Last, missing data are a reality in the behavioral/social sciences. Therefore, another possible aspect to consider in future research is the impact of missing data.

Through the results provided here, four advantages can be pointed out regarding the application of the bootstrap CIs investigated. First, none of the investigated items were generated from a normal distribution. Second, none of the investigated items were continuous; all were Likert-type or binary. Third, all conditions were investigated with a sample size of 50 to 300. Last, the type of correlation structure did not have an impact. As such, coefficient omega appears to be appropriate for parallel and congeneric items.

In summary, the performance of three bootstrap CIs for coefficient omega was investigated under conditions not previously considered in published work. The focus was primarily on scenarios with nonnormal Likert-type or binary items. Such scenarios arise quite often in applied work (Micceri, 1989; Raykov, 2002). The results provide clear evidence in terms of performance between the three CIs investigated. The CIs discussed here are perhaps known and have been applied in other contexts. However, here it is shown how they can be applied to this particular problem, and more important, their performance under these less explored conditions is fully described. It is hoped that the results can guide researchers when they need to address this type of reliability problem.

Finally, interested readers can obtain a free and easy-to-use R function for the coefficient omega bootstrap CIs investigated here with example data by visiting the corresponding author’s website (www.omegalab-padilla.org).

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.

Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: Wiley.

Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144-152. doi:10.1111/j.2044-8317.1978.tb00581.x

Chernick, M. R., & LaBudde, R. A. (2011). An introduction to bootstrap methods with applications to R. Hoboken, NJ: Wiley.

Crocker, L. M., & Algina (1986). Introduction to classical and modern test theory. New York, NY: Holt, Rinehart, & Winston.

Cronbach (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334. doi:10.1007/bf02310555

Duhachek, & Iacobucci (2004). Alpha’s standard error (ASE): An accurate and precise confidence interval estimate. Journal of Applied Psychology, 89, 792-808. doi:10.1037/0021-9010.89.5.792

Efron, & Tibshirani (1998). An introduction to the bootstrap. Boca Raton, FL: CRC Press.

Enders, C. K. (2003). Using the expectation maximization algorithm to estimate coefficient alpha for scales with item-level missing data. Psychological Methods, 8, 322-337. doi:10.1037/1082-989x.8.3.322

Graham, J. M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability: What they are and how to use them. Educational and Psychological Measurement, 66, 930-944. doi:10.1177/0013164406288165

Guttman (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255-282. doi:10.1007/bf02288892

Hogan, T. P., Benjamin, & Brezinski, K. L. (2000). Reliability methods: A note on the frequency of use of various types. Educational and Psychological Measurement, 60, 523-531. doi:10.1177/00131640021970691

Lord, Novick, & Birnbaum (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Maydeu-Olivares, Coffman, D. L., & Hartmann, W. M. (2007). Asymptotically distribution-free (ADF) interval estimation of coefficient alpha. Psychological Methods, 12, 157-176. doi:10.1037/1082-989x.12.2.157

McDonald, R. P. (1970). The theoretical foundations of principal factor analysis, canonical factor analysis, and alpha factor analysis. British Journal of Mathematical and Statistical Psychology, 23, 1-21. doi:10.1111/j.2044-8317.1970.tb00432.x

McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum.

Micceri (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156-166. doi:10.1037/0033-2909.105.1.156

Padilla, M. A., Divers, & Newton (2012). Coefficient alpha bootstrap confidence interval under nonnormality. Applied Psychological Measurement, 36, 331-348. doi:10.1177/0146621612445470

Peterson, R. A. (1994). A meta-analysis of Cronbach’s coefficient alpha. Journal of Consumer Research, 21, 381-391.

Raykov (1997). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement, 21, 173-184. doi:10.1177/01466216970212006

Raykov (1998). A method for obtaining standard errors and confidence intervals of composite reliability for congeneric items. Applied Psychological Measurement, 22, 369-374. doi:10.1177/014662169802200406

Raykov (2002). Analytic estimation of standard error and confidence interval for scale reliability. Multivariate Behavioral Research, 37, 89-103. doi:10.1207/s15327906mbr3701_04

Raykov (2012). Scale construction and development using structural equation modeling. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 472-492). New York, NY: Guilford Press.

Raykov, & Marcoulides, G. A. (2011). Introduction to psychometric theory. New York, NY: Routledge.

Raykov, & Shrout, P. E. (2002). Reliability of scales with general structure: Point and interval estimation using a structural equation modeling approach. Structural Equation Modeling, 9, 195-212. doi:10.1207/s15328007sem0902_3

Romano, J. L., Kromrey, J. D., & Hibbard, S. T. (2010). A Monte Carlo study of eight confidence interval methods for coefficient alpha. Educational and Psychological Measurement, 70, 376-393. doi:10.1177/0013164409355690

van Zyl, Neudecker, & Nel (2000). On the distribution of the maximum likelihood estimator of Cronbach’s alpha. Psychometrika, 65, 271-280. doi:10.1007/bf02296146

Yuan, K.-H., Guarnaccia, C. A., & Hayslip, Jr. (2003). A study of the distribution of sample coefficient alpha with the Hopkins Symptom Checklist: Bootstrap versus asymptotics. Educational and Psychological Measurement, 63, 5-23. doi:10.1177/0013164402239314

Zinbarg, R. E., Revelle, & Yovel (2005). Cronbach’s α, Revelle’s β, and McDonald’s ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70, 123-133. doi:10.1007/s11336-003-0974-7