Making Reliability Reliable

Abstract

The current conventions for test score reliability coefficients are unsystematic and chaotic. Reliability coefficients have long been denoted using names that are unrelated to each other, with each formula being generated through different methods, and they have been represented inconsistently. Such inconsistency prevents organizational researchers from understanding the whole picture and misleads them into using coefficient alpha unconditionally. This study provides a systematic naming convention, formula-generating methods, and methods of representing each of the reliability coefficients. This study offers an easy-to-use solution to the issue of choosing between coefficient alpha and composite reliability. This study introduces a calculator that enables its users to obtain the values of various multidimensional reliability coefficients with a few mouse clicks. This study also presents illustrative numerical examples to provide a better understanding of the characteristics and computations of reliability coefficients.

Keywords

reliability, coefficient alpha, structural equation modeling, tau-equivalency, hierarchical omega

Coefficient alpha (hereinafter alpha) is the most commonly used single-administration test score reliability coefficient (hereinafter reliability coefficient). Whereas previous studies such as Cortina (1993) and Schmitt (1996) offered influential lessons on alpha for organizational researchers, it is still commonly misconceived and widely misused (Cho & Kim, 2015; Dunn, Baguley, & Brunsden, 2013; Green & Yang, 2009a; Raykov, 2012; Sijtsma, 2009a; Yang & Green, 2011). Another study that focuses only on alpha is not likely to resolve the chronic misconceptions and misapplications. A better approach to ascertaining the characteristics of alpha is considering other reliability coefficients together with alpha. Once we know the commonalities and differences between alpha and other reliability coefficients, we can naturally discern the conditions under which it should or should not be used. However, there is an obstacle that prevents organizational researchers from looking at the big picture.

The current conventions for reliability coefficients are haphazard and undisciplined. Our knowledge of reliability was not built in a day by a single genius. For more than a hundred years, numerous researchers (e.g., Brown, 1910; Spearman, 1910) have developed reliability coefficients in various ways, but in the process the coefficients were named, interpreted, and expressed inconsistently. Present practice is locked in by these past choices (i.e., path dependence). The way reliability coefficients are currently named, computed, and used lacks reliability, which makes it difficult for new users to see the whole picture.

This study attempts to improve the reliability of reliability coefficients. I describe my approach as systematic because I propose a system that is composed of reliability coefficients and that suggests appropriate coefficients conditionally, depending on the characteristics of the data. The system includes most of the reliability coefficients commonly used in real-world data analyses or explained in research methods textbooks, such as the Spearman–Brown formula, the Flanagan–Rulon formula, standardized alpha, alpha, stratified alpha, McDonald’s omega, and so-called composite reliability.

This study proves that various reliability coefficients are generated from measurement models nested within the bifactor measurement model. The idea of estimating reliability based on a measurement model in the broad framework of structural equation modeling (SEM) is not new. For example, Miller (1995) employed an SEM path diagram to explain the meaning of alpha and its correct use. McDonald (1985, 1999) and Zinbarg, Revelle, and Yovel (2007) argued that alpha is a special case of omega when the data meet a certain prerequisite. This study extends such previous studies and offers a more comprehensive analysis of a number of reliability coefficients and their algebraically equivalent variations instead of concentrating on one or two.

This study consists of three sections. The “problems” section examines current practice and argues that alpha is ill positioned as the representative of reliability coefficients. The “a systematic approach” section argues that reliability coefficients can be successfully repositioned by renaming them and re-expressing their formulas. The “examples” section offers various computation examples and introduces a calculator that computes various reliability coefficients with a few mouse clicks.

Problems in Current Practice

This study begins by determining what the problem is and assessing how widespread it is.¹ In previous studies criticizing alpha’s misuse, such misuse has typically consisted of one or both of the following types:

Alpha is most frequently used even though it is not the most accurate reliability coefficient (i.e., it is overused)

Alpha’s use is unqualified if its assumptions such as tau-equivalency are not examined (i.e., it is incorrectly used)

However, little research has empirically examined the fundamental premise that alpha is overused and/or incorrectly used in practice. One may raise a counterargument that alpha is not as severely misused as the existing literature suggests. For example, a reasonable argument is that although alpha was overused in the past, organizational researchers are increasingly switching from alpha to other reliability coefficients based on the influence of recent methodological studies that discourage the use of alpha. Another plausible expectation is that although articles appearing in less prestigious journals may use alpha without examining its assumptions, high-impact journal articles demonstrate exemplary use of reliability coefficients because the reviewers and editors demand higher standards of methodological rigor. To evaluate how seriously alpha is misused in organizational research, this study examined two elite journals, namely, the Academy of Management Journal (AMJ) and the Journal of Applied Psychology (JAP).

In addition to diagnosing whether alpha is overused and/or incorrectly used, this study addresses two other issues relevant to current practice. First, this study examines what terms are currently used to denote reliability coefficients. This is necessary because the study will later discuss the unreliability of reliability coefficients’ names. Second, this study considers whether the use of confirmatory factor analysis (CFA) or SEM was reported. Composite reliability, which is based on a unidimensional SEM measurement model, is the second most frequently used reliability coefficient in organizational research (Peterson & Kim, 2013). This study predicts that the use of SEM will have an effect on the choice of reliability coefficient: Those studies that employ SEM are more likely to report composite reliability rather than alpha, and those studies that do not rely on SEM are more likely to use alpha.

Let us explain the method of data collection. After searching all articles except editorials that were published in AMJ and JAP during the years 2013 and 2014, I included empirical studies that reported single-administration test score reliability estimates and excluded studies that used other types of reliability (e.g., interrater reliability) and meta-analysis. The sample consisted of 42 AMJ articles from a total of 145 (29.0%) and 96 JAP articles from a total of 147 (65.3%). When multiple names were used to express the same reliability coefficient, the unabridged or more descriptive ones were recorded. For example, if both Cronbach’s alpha and α were used, the former was coded as the name. If both internal consistency reliability and α were used, the latter was coded as the name.

Table 1 shows the results. More than 80% of the studies used alpha. No study reported the use of any reliability coefficient other than alpha and composite reliability. Approximately 10% of the studies did not clearly indicate which reliability coefficients were used. No study examined the assumption of tau-equivalency. A total of 16 versions of alpha’s name were recorded; the most frequently used were Cronbach’s alpha, coefficient alpha, and α.

Table 1.

The Current Use of Reliability Coefficients in Two Top Journals.

		AMJ (n = 42)		JAP (n = 96)
		n	%	n	%
Used reliability coefficients	Alpha only	35	83.3	84	87.5
	CR only	1	2.4	1	1.0
	Alpha and CR	2	4.8	2	2.1
	Not identified	4	9.5	9	9.4
Tau-equivalency	Examined	0	0.0	0	0.0
Tau-equivalency	Not examined	42	100.0	96	100.0
Use of CFA/SEM	Reported	27	64.3	44	45.8
Use of CFA/SEM	Not reported	15	35.7	52	54.2
Name of alpha	Cronbach’s alpha	13	35.1	30	34.9
	Coefficient alpha	7	18.9	15	17.4
	α	10	27.0	31	36.0
	Others	7	18.9	10	11.6

Note: AMJ is the Academy of Management Journal, JAP is the Journal of Applied Psychology, CR is composite reliability, CFA is confirmatory factor analysis, and SEM is structural equation modeling. The percentages in the “name of alpha” rows are the ratios of each cell to the number of studies that reported the use of alpha (AMJ: 37; JAP: 86), and the percentages in the other rows are the ratios of each cell to the total number of examined studies in each journal. Names such as internal consistency reliability, internal consistency, and reliability were classified as “not identified.” Names such as Cronbach alpha, Cronbach’s alpha reliability, and Cronbach’s α were classified as “Cronbach’s alpha.” Names such as alpha, alpha reliability, and alpha coefficients were classified as “others.”
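The two denominators described in the note can be checked directly from the counts. The following Python sketch recomputes the percentages in Table 1; the dictionary layout and key names are illustrative and are not part of the original coding scheme. For example, the name percentages for AMJ use the 37 studies that reported alpha (35 + 2), not all 42 studies.

```python
# Counts transcribed from Table 1 (key names are illustrative).
TABLE1 = {
    "AMJ": dict(total=42, alpha_only=35, cr_only=1, alpha_and_cr=2,
                not_identified=4, sem_reported=27,
                cronbach=13, coefficient=7, greek_alpha=10, other_names=7),
    "JAP": dict(total=96, alpha_only=84, cr_only=1, alpha_and_cr=2,
                not_identified=9, sem_reported=44,
                cronbach=30, coefficient=15, greek_alpha=31, other_names=10),
}

# Rows whose percentages use the number of alpha-reporting studies as denominator.
NAME_ROWS = {"cronbach", "coefficient", "greek_alpha", "other_names"}


def percentages(row):
    """Return each cell as a percentage (one decimal) using the note's denominators."""
    alpha_users = row["alpha_only"] + row["alpha_and_cr"]
    result = {}
    for key, count in row.items():
        if key == "total":
            continue
        denominator = alpha_users if key in NAME_ROWS else row["total"]
        result[key] = round(100 * count / denominator, 1)
    return result


for journal, row in TABLE1.items():
    print(journal, percentages(row))
```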

Alpha’s overuse was more serious than anticipated. First, the use of SEM had little effect on the choice of reliability coefficients. Most studies that employed SEM still used alpha instead of SEM-based reliability coefficients such as composite reliability. Second, 1 in 10 studies unexpectedly did not specify the name of the utilized reliability coefficients. A plausible explanation for why researchers report reliability estimates namelessly is that they take the use of alpha for granted and thus feel little need to report a commonplace thing.

Alpha’s incorrect use was also more severe than expected. Not a single study examined tau-equivalency. That is, organizational researchers are automatically using alpha without considering whether its assumptions are satisfied. Such unconditional choice probably stems from one or some combination of the following misconceptions:

1. Alpha is a versatile reliability coefficient that is applicable to any type of data

2. Alpha is robust to any violation of its assumptions (i.e., even a serious violation of the assumptions has an insignificant effect on the value of the reliability estimate)

3. A high value of alpha itself verifies that its assumptions are satisfied

4. Identifying whether alpha’s assumptions are satisfied is difficult

5. Alternative methods that can be used when its assumptions are violated are difficult to use

This study refutes each of the above statements by providing formula derivations (Misconception 1), counterexamples (Misconceptions 2 and 3), and illustrative examples together with an easy-to-use solution (Misconceptions 4 and 5). For example, the computation examples show that alpha can produce estimation errors as large as .14 when it is misapplied to data that violate one of its assumptions. In addition to disproving such misconceptions, it is necessary to understand what caused them in order to find a fundamental solution.

A brief review of alpha’s history is useful for identifying the underlying source of its misuse. Its popularity did not originate from technical superiority. It became a de facto standard even though a mathematically superior predecessor existed (λ₂: Guttman, 1945), for several reasons that seemed important at the time of Cronbach (1951) but are trivial from a modern view (Cho & Kim, 2015). First, alpha was simpler to compute. Second, Cronbach (1951) positioned alpha as a reliability coefficient, whereas Guttman (1945) described the λ_i as lower bounds of reliability, a description that was mathematically correct but unpopular. Third, Cronbach’s (1951) proof that alpha equals the average of the split-half reliability values (λ₄: Guttman, 1945) calculated over all possible split-halves positioned it as a general reliability coefficient. Although intuitively attractive, this average is not as meaningful as the maximum (Osburn, 2000) or the minimum (Revelle, 1979) of the λ₄ values obtained from all possible split-halves (for a modern interpretation, see Hunt & Bentler, 2015). A positive feedback loop built on past popularity produced today’s situation: once the habit of unconditionally using alpha was formed, it persisted despite the development of more sophisticated methods.
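Cronbach’s averaging result can be checked numerically. The sketch below assumes a test with an even number of items split into two equal halves and uses the split-half formula λ₄ = 4 Cov(A, B) / Var(X). The four-item covariance matrix is hypothetical, generated from loadings (2, 3, 4, 1.5) and error variances (6, 7, 6, 5). The mean of λ₄ over all distinct equal splits equals alpha, whereas the maximum and minimum split values differ from it.

```python
from itertools import combinations

import numpy as np


def alpha(cov):
    """Coefficient alpha from an item covariance matrix."""
    k = cov.shape[0]
    return k / (k - 1) * (1 - np.trace(cov) / cov.sum())


def split_half_lambda4(cov, half):
    """Guttman's lambda_4 (Flanagan-Rulon form) for one split: 4 Cov(A, B) / Var(X)."""
    k = cov.shape[0]
    other = [i for i in range(k) if i not in half]
    cov_ab = cov[np.ix_(list(half), other)].sum()
    return 4 * cov_ab / cov.sum()


def all_equal_splits(cov):
    """lambda_4 for every distinct split into two equal halves (k must be even)."""
    k = cov.shape[0]
    if k % 2:
        raise ValueError("this sketch needs an even number of items")
    # Keeping only halves that contain item 0 counts each unordered split once.
    return np.array([split_half_lambda4(cov, half)
                     for half in combinations(range(k), k // 2) if 0 in half])


# Hypothetical congeneric covariance matrix: outer product of loadings plus error variances.
loadings = np.array([2.0, 3.0, 4.0, 1.5])
cov = np.outer(loadings, loadings) + np.diag([6.0, 7.0, 6.0, 5.0])

values = all_equal_splits(cov)
print("alpha:", round(alpha(cov), 4))
print("mean lambda_4:", round(values.mean(), 4))
print("min / max lambda_4:", round(values.min(), 4), round(values.max(), 4))
```

With unequal halves or an odd number of items, the Flanagan–Rulon values no longer average exactly to alpha, which is why the sketch rejects odd k.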

Alpha’s habitual use is a matter not of mathematics but of marketing. Alpha ranked consistently low in previous comparison studies that examined the accuracy of reliability coefficients (Kamata, Turhan, & Darandari, 2003; Osburn, 2000; Revelle & Zinbarg, 2009; Tang & Cui, 2012; van der Ark, van der Palm, & Sijtsma, 2011). What differentiates alpha from other reliability coefficients is that awareness of the name alpha far exceeds that of any other reliability coefficient. In other words, its name is the main cause of its immense use. Analyzing the reason for the phenomenal citation record of Cronbach (1951), Cronbach (1978) echoed this argument: “I am sure the paper is cited mostly because I put a brand name on a common-place coefficient” (p. 263).

His comment captures the essence: alpha is a brand name. A researcher who automatically chooses alpha without understanding its formula resembles a consumer who habitually selects a familiar laundry detergent brand without knowing its chemical composition. Even if a comparison study revealed that the most popular brand underperforms its competitors, the top-of-mind brand would not lose significant market share. This is what is occurring with alpha: its frequency of use remains unchanged despite unfavorable results in several performance tests. However, if the brand name lost its force for some reason, its sales volume would decrease rapidly. Previous studies that relied solely on a mathematical approach failed to change the 65-year-old habit. A deep-rooted problem requires a more comprehensive solution that includes a radical cure. Rebranding alpha is an efficient way to reposition it where it belongs.

A Systematic Approach to Reliability Coefficients

Measurement Models

Before proceeding, let us explain the measurement models used in this study, starting with unidimensional models. Figure 1 shows SEM path diagrams of the unidimensional parallel, tau-equivalent, and congeneric measurement models. The term unidimensional will be omitted when there is little possibility of confusion. The modifiers strictly and essentially indicate whether item means are constrained to be equal. For example, an essentially tau-equivalent model includes an item-specific constant, whereas a strictly tau-equivalent model does not. Although the addition of a constant affects the mean, it does not affect the variances, the covariances, or the value of reliability. This study focuses on essentially parallel, tau-equivalent, and congeneric models and omits the term essentially for simplicity. Manifest variables (X₁, X₂, …) have a common latent variable (F) and errors (e₁, e₂, …). Errors are assumed to be purely random and independent of each other. To determine the scale, the variance of the latent variable is fixed to a positive constant (typically 1.0). The congeneric model has no additional constraints. The tau-equivalent model is the congeneric model with the constraint that all factor loadings are equal. The parallel model is the tau-equivalent model with the additional constraint that all error variances are equal.

Figure 1.

The parallel, tau-equivalent, and congeneric measurement models.

Table 2 presents interitem covariance matrices of data that perfectly satisfy the conditions of being parallel, tau-equivalent, and congeneric. Covariances (i.e., off-diagonal elements of the covariance matrices) between observed item scores are determined only by the common latent variable, whereas variances (i.e., diagonal elements) of item scores are determined by both the common latent variable and the errors. Parallel data have equal interitem covariances and equal item variances. Tau-equivalent data have equal interitem covariances but may have different item variances. Congeneric data require no equality constraints on variances or covariances. Any parallel data are also tau-equivalent, and any tau-equivalent data are also congeneric.
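The equality patterns described above can be checked mechanically for a population covariance matrix. The following sketch (the function name and tolerance are illustrative assumptions) labels a matrix with the most restrictive pattern it satisfies. It checks only the equalities of variances and covariances and does not test whether a one-factor model fits, so a matrix labeled “not tau-equivalent” is merely a candidate for the congeneric model. With sample data, exact equalities never hold, and a formal model comparison would be needed instead.

```python
import numpy as np


def covariance_pattern(cov, tol=1e-9):
    """Return 'parallel', 'tau-equivalent', or 'not tau-equivalent'."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    off_diagonal = cov[~np.eye(k, dtype=bool)]   # interitem covariances
    diagonal = np.diag(cov)                      # item variances
    equal_covariances = off_diagonal.max() - off_diagonal.min() <= tol
    equal_variances = diagonal.max() - diagonal.min() <= tol
    if equal_covariances and equal_variances:
        return "parallel"
    if equal_covariances:
        return "tau-equivalent"
    return "not tau-equivalent"


# Observed covariance matrices from Table 2.
parallel = [[10, 6, 6], [6, 10, 6], [6, 6, 10]]
tau_equivalent = [[8, 5, 5], [5, 10, 5], [5, 5, 12]]
congeneric = [[10, 6, 8], [6, 16, 12], [8, 12, 22]]
for name, matrix in [("parallel", parallel), ("tau-equivalent", tau_equivalent),
                     ("congeneric", congeneric)]:
    print(name, "->", covariance_pattern(matrix))
```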

Table 2.

Covariance Matrices of Data That Satisfy the Conditions of Being Parallel, Tau-equivalent, and Congeneric.

Model	Item	A. Observed Scores			B. True Scores			C. Errors
		X1	X2	X3	X1	X2	X3	X1	X2	X3
Parallel	X1	10	6	6	6	6	6	4	0	0
	X2	6	10	6	6	6	6	0	4	0
	X3	6	6	10	6	6	6	0	0	4
Parallel reliability: $\hat{\rho}_{P} = (3^{2} \cdot 6) / (3^{2} \cdot 6 + 3 \cdot 4) = 54/66 \approx .82$
Tau-equivalent	X1	8	5	5	5	5	5	3	0	0
	X2	5	10	5	5	5	5	0	5	0
	X3	5	5	12	5	5	5	0	0	7
Tau-equivalent reliability: $\hat{\rho}_{T} = (3^{2} \cdot 5) / 60 = 45/60 = .75$
Congeneric	X1	10	6	8	2²	3·2	4·2	6	0	0
	X2	6	16	12	2·3	3²	4·3	0	7	0
	X3	8	12	22	2·4	3·4	4²	0	0	6
Congeneric reliability: $\hat{\rho}_{C} = (2 + 3 + 4)^{2} / 100 = 81/100 = .81$

Note: If tau-equivalent reliability (e.g., alpha) is misapplied to the above congeneric data, it underestimates the reliability by .03 ( ${\hat{ρ}}_{T} = \hat{α} = (3^{2} \cdot ((6 + 8 + 12) / 3)) / 100 = .78$ ).
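The values in Table 2 and its note can be reproduced with a few lines of Python. In this sketch, parallel reliability is computed as standardized alpha (the conventional name listed for it in Table 3). Tau-equivalent reliability is computed as alpha in the form used in the note, k² × (mean interitem covariance) / (test score variance). Congeneric reliability is computed as (Σλ_i)² / ((Σλ_i)² + Σσ²_ei) with the latent variance fixed to 1. The function names are illustrative.

```python
import numpy as np


def parallel_reliability(cov):
    """Standardized alpha: k * r_bar / (1 + (k - 1) * r_bar), r_bar = mean interitem correlation."""
    cov = np.asarray(cov, float)
    k = cov.shape[0]
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    r_bar = corr[~np.eye(k, dtype=bool)].mean()
    return k * r_bar / (1 + (k - 1) * r_bar)


def tau_equivalent_reliability(cov):
    """Alpha written as k^2 * (mean interitem covariance) / (test score variance)."""
    cov = np.asarray(cov, float)
    k = cov.shape[0]
    mean_cov = cov[~np.eye(k, dtype=bool)].mean()
    return k**2 * mean_cov / cov.sum()


def congeneric_reliability(loadings, error_vars):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances), latent variance 1."""
    true_var = np.sum(loadings) ** 2
    return true_var / (true_var + np.sum(error_vars))


parallel = [[10, 6, 6], [6, 10, 6], [6, 6, 10]]
tau_equivalent = [[8, 5, 5], [5, 10, 5], [5, 5, 12]]
loadings, errors = [2, 3, 4], [6, 7, 6]
congeneric = np.outer(loadings, loadings) + np.diag(errors)  # panel A of Table 2

print(round(parallel_reliability(parallel), 3))
print(round(tau_equivalent_reliability(tau_equivalent), 3))
print(round(congeneric_reliability(loadings, errors), 3))
print(round(tau_equivalent_reliability(congeneric), 3))  # alpha misapplied to congeneric data
```

Applied to the parallel matrix, the tau-equivalent formula returns the same value as standardized alpha (54/66), as expected, because parallel data are also tau-equivalent.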

Now, let us consider multidimensional models. Three major models conceptualize multidimensionality in SEM (Figure 2): the correlated factor model, the second-order factor model, and the bifactor model. A unidimensional model consists of only a common construct (i.e., T in Figure 1) and omits subtest constructs. A correlated factor model includes only subtest constructs (i.e., D_p in Figure 2) and neglects a common construct. The common construct of a bifactor model is called a general factor (i.e., F), and its subtest constructs are called group factors (i.e., G_p). The common construct in a second-order factor model is called a second-order factor (i.e., Q), and its subtest constructs are called first-order factors (i.e., O_p). To determine the scale, the variances of common constructs and subtest constructs are usually set to 1.0 (i.e., $\mathrm{Var}(G_{p}) = \mathrm{Var}(F) = \mathrm{Var}(O_{p}) = \mathrm{Var}(Q) = \mathrm{Var}(D_{p}) = 1$ for all p).

Figure 2.

Three major multidimensional measurement models.

The bifactor model is a generalization of the second-order factor model, and the latter is nested within the former. Mathematically, the two are equivalent only under the proportionality constraint (Yung, Thissen, & McLeod, 1999). Before explaining this constraint, note that group factors are defined to be independent of the general factor, whereas first-order factors depend on the second-order factor. Disturbances (i.e., ζ_p) are mathematically analogous to group factors because both explain the variance that is not explained by the common construct. The proportionality constraint states that, for manifest variables that load on the same first-order factor, the ratio of the variance due to the disturbance to the variance due to the second-order factor is the same. For example, consider Y₁ and Y₂, which load on the same first-order factor. Any effect of the second-order factor (i.e., Q) or the disturbance on Y₁ or Y₂ must be mediated by the loading λ₁₁ or λ₂₁, respectively. Hence, Y₁’s ratio of the variance due to the disturbance to the variance due to the second-order factor (i.e., $\lambda_{11}^{2} \sigma_{\zeta_{1}}^{2} / \lambda_{11}^{2} \gamma_{1}^{2}$) equals that of Y₂ (i.e., $\lambda_{21}^{2} \sigma_{\zeta_{1}}^{2} / \lambda_{21}^{2} \gamma_{1}^{2}$). In a bifactor model, by contrast, the corresponding ratio for X₁ (i.e., $\lambda_{11}^{2} / \lambda_{1F}^{2}$) may differ from that for X₂ (i.e., $\lambda_{21}^{2} / \lambda_{2F}^{2}$).
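The proportionality constraint can be illustrated with the Schmid–Leiman transformation. The sketch below assumes standardized factors (Var(Q) = Var(O_p) = 1, so σ²_ζp = 1 − γ_p²) and uses hypothetical loadings for six items on two first-order factors. The implied bifactor loadings are λ_iγ_p on the general factor and λ_i(1 − γ_p²)^½ on the group factor. Therefore, the ratio of group-factor to general-factor variance is (1 − γ_p²)/γ_p² for every item of first-order factor p, and the implied covariances of the two models coincide. A freely estimated bifactor model need not satisfy this equality of ratios.

```python
import numpy as np


def schmid_leiman(first_order, gamma, group_of):
    """Bifactor loadings implied by a second-order factor model with standardized factors."""
    lam = np.asarray(first_order, float)
    g = np.asarray(gamma, float)[np.asarray(group_of)]   # gamma_p of each item's factor
    general = lam * g                                    # loading on the general factor
    group = lam * np.sqrt(1 - g**2)                      # loading on the group factor
    return general, group


def common_cov_second_order(first_order, gamma, group_of):
    """Common (non-error) covariance implied by the second-order model."""
    lam = np.asarray(first_order, float)
    gamma = np.asarray(gamma, float)
    group_of = np.asarray(group_of)
    Lam = np.zeros((lam.size, gamma.size))
    Lam[np.arange(lam.size), group_of] = lam
    phi = np.outer(gamma, gamma) + np.diag(1 - gamma**2)   # Var(O_p) = 1
    return Lam @ phi @ Lam.T


def common_cov_bifactor(general, group, group_of):
    """Common covariance implied by a bifactor model with orthogonal factors."""
    group_of = np.asarray(group_of)
    cov = np.outer(general, general)
    for p in np.unique(group_of):
        g_p = np.where(group_of == p, group, 0.0)   # loadings on group factor p only
        cov += np.outer(g_p, g_p)
    return cov


# Hypothetical parameters: six items, two first-order factors.
first_order = [0.7, 0.6, 0.8, 0.5, 0.6, 0.7]
gamma = [0.8, 0.6]
group_of = [0, 0, 0, 1, 1, 1]

general, group = schmid_leiman(first_order, gamma, group_of)
print("group/general variance ratios:", np.round(group**2 / general**2, 4))
print("same implied covariances:",
      np.allclose(common_cov_second_order(first_order, gamma, group_of),
                  common_cov_bifactor(general, group, group_of)))
```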

This study offers a direct formula that computes the omega coefficient of a second-order factor model without a transformation. McDonald derived omega from a bifactor model. Applying its formula to a second-order factor model requires a Schmid–Leiman transformation (Schmid & Leiman, 1957) of the parameter estimates (Brunner, Nagy, & Wilhelm, 2012; Yung et al., 1999), which is unfamiliar to typical organizational researchers. A direct formula provides an easier computation and a better understanding of its meaning.
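To illustrate what a direct formula offers, the sketch below computes an omega-total-type coefficient for a second-order factor model in two ways. The first uses the definition: the true score variance of the implied covariance matrix divided by its total variance. The second uses a closed-form expression in the second-order parameters, with no Schmid–Leiman step. The closed form used here, [(Σ_p γ_p L_p)² + Σ_p (1 − γ_p²) L_p²] / σ²_X, where L_p is the sum of the loadings on first-order factor p, is an assumption derived under standardized factors (Var(Q) = Var(O_p) = 1). It is not necessarily the exact expression given in Appendix B. The first term alone gives the share due to the second-order factor (often reported as hierarchical omega). All parameter values are hypothetical.

```python
import numpy as np


def implied_cov_second_order(first_order, gamma, group_of, error_vars):
    """Return (common covariance, total covariance) implied by a second-order model."""
    lam = np.asarray(first_order, float)
    gamma = np.asarray(gamma, float)
    Lam = np.zeros((lam.size, gamma.size))
    Lam[np.arange(lam.size), np.asarray(group_of)] = lam
    phi = np.outer(gamma, gamma) + np.diag(1 - gamma**2)   # Var(O_p) = 1
    common = Lam @ phi @ Lam.T
    return common, common + np.diag(error_vars)


def second_order_omega_direct(first_order, gamma, group_of, error_vars):
    """Closed-form (omega_total, omega_hierarchical) without any transformation."""
    gamma = np.asarray(gamma, float)
    # L_p: sum of first-order loadings of the items on factor p
    L = np.bincount(np.asarray(group_of), weights=np.asarray(first_order, float),
                    minlength=gamma.size)
    general = (gamma @ L) ** 2                    # variance due to Q
    disturbance = np.sum((1 - gamma**2) * L**2)   # variance due to the disturbances
    total = general + disturbance + np.sum(error_vars)
    return (general + disturbance) / total, general / total


first_order = [0.7, 0.6, 0.8, 0.5, 0.6, 0.7]
gamma = [0.8, 0.6]
group_of = [0, 0, 0, 1, 1, 1]
error_vars = [0.51, 0.64, 0.36, 0.75, 0.64, 0.51]

common, total = implied_cov_second_order(first_order, gamma, group_of, error_vars)
omega_total, omega_h = second_order_omega_direct(first_order, gamma, group_of, error_vars)
print(round(common.sum() / total.sum(), 4), round(omega_total, 4), round(omega_h, 4))
```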

This study introduces multidimensional parallel models and multidimensional tau-equivalent models. In the literature, the conditions of being parallel and tau-equivalent have been discussed only for unidimensional models. Because these restrictions are useful for deriving meaningful reliability coefficients (e.g., alpha and standardized alpha) from unidimensional models, they can be expected to be equally useful for multidimensional models. The second-order factor parallel model (Figure 3) imposes four restrictions: the path coefficients between the second-order factor and all first-order factors are equal (i.e., γ_p = γ for all p), the first-order factor loadings of all items are equal (i.e., λ_i = λ for all i), all first-order factors have equal numbers of items (i.e., n_p = n for all p), and the error variances of all items are equal (i.e., $\sigma_{e_{i}}^{2} = \sigma_{e}^{2}$ for all i). The second-order factor tau-equivalent model imposes the first three of these restrictions but not the equality of error variances (i.e., $\sigma_{e_{i}}^{2}$ may differ across items). Appendix B shows that the reliability coefficients derived from bifactor parallel/tau-equivalent models are identical to those derived from second-order factor parallel/tau-equivalent models.
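Under the second-order factor tau-equivalent constraints with standardized first-order factors, every within-factor covariance equals λ², every between-factor covariance equals λ²γ², and the error variances may differ. The sketch below builds such a matrix from hypothetical parameter values (the function names are illustrative). It shows that stratified alpha, the conventional name listed in the multidimensional tau-equivalent cell of Table 3, recovers the true score variance ratio exactly for these population data.

```python
import numpy as np


def second_order_tau_equivalent(m, n, lam, gamma, error_vars):
    """Implied (common, total) covariance for m first-order factors with n items each."""
    group_of = np.repeat(np.arange(m), n)
    same_factor = group_of[:, None] == group_of[None, :]
    # Var(O_p) = 1: within-factor covariance lam^2, between-factor lam^2 * gamma^2
    common = np.where(same_factor, lam**2, (lam * gamma) ** 2)
    return common, common + np.diag(error_vars), group_of


def stratified_alpha(cov, group_of):
    """1 - sum_p Var(X_p) * (1 - alpha_p) / Var(X), with alpha_p computed within subtest p."""
    cov = np.asarray(cov, float)
    group_of = np.asarray(group_of)
    loss = 0.0
    for p in np.unique(group_of):
        idx = np.flatnonzero(group_of == p)
        sub = cov[np.ix_(idx, idx)]
        k_p, var_p = idx.size, sub.sum()
        alpha_p = k_p / (k_p - 1) * (1 - np.trace(sub) / var_p)
        loss += var_p * (1 - alpha_p)
    return 1 - loss / cov.sum()


error_vars = [0.5, 0.4, 0.6, 0.5, 0.3, 0.5, 0.7, 0.4, 0.6, 0.5, 0.4, 0.55]
common, total, group_of = second_order_tau_equivalent(3, 4, 0.7, 0.6, error_vars)
print(round(common.sum() / total.sum(), 4), round(stratified_alpha(total, group_of), 4))
```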

Figure 3.

Two suggested multidimensional measurement models.

Systematic Nomenclature

Table 3 presents the names of reliability coefficients that are currently used in the literature. The conventional names of reliability coefficients are not user friendly. First, they do not deliver meaningful information to the users. For example, names such as Spearman–Brown or Flanagan–Rulon provide only the names of those who discovered these formulas without expanding on their characteristics. Although respect for these scholars is displayed, the names do not consider the needs of users.

Table 3.

Names of Reliability Coefficients Currently Used in the Literature.

	Unidimensional		Multidimensional
	Split-Half	General	General
Parallel	Spearman–Brown formula	Standardized alpha	(Not yet published)
Tau-equivalent	Flanagan–Rulon formula; Flanagan formula; Rulon formula; Guttman’s λ₄	Cronbach’s alpha; Coefficient alpha; Guttman’s λ₃; Hoyt method; KR-20	Stratified alpha
Congeneric	Raju (1970) coefficient; Angoff–Feldt coefficient; Angoff coefficient	Composite reliability; Construct reliability; Congeneric reliability; Omega; Unidimensional omega; Raju (1977) coefficient; Classical congeneric reliability coefficient	Omega; Omega total; McDonald’s omega; Multidimensional omega

Second, the names are inconsistent. Some are called formulas (e.g., Spearman–Brown and Flanagan–Rulon), some are called coefficients (e.g., alpha), and others are called lower bounds (e.g., Guttman’s λ_i ). Some bear their originators’ names (e.g., Spearman–Brown), and some use a combination of the first and second developers’ names (e.g., Flanagan–Rulon). One estimator goes by the name of the fourth person to propose it (Cronbach, 1951), and others do not bear the name of any developers.

Third, they are not mutually exclusive. Algebraically equivalent formulas have different names, such as the Flanagan–Rulon formula and Guttman’s λ₄. Without background knowledge, users may assume that these names refer to different formulas. Conversely, one name is used to represent multiple formulas. McDonald (1978) defined omega in a multidimensional context and later used the term regardless of dimensionality (McDonald, 1999). Previous studies increasingly use omega as a general name for various SEM-based reliability coefficients (Brunner et al., 2012; Cho & Kim, 2015; Dunn et al., 2013; Green & Yang, 2015; Lucke, 2005; Padilla & Divers, 2013; Revelle & Zinbarg, 2009). Whenever readers encounter the term omega, they must rely on the context to know exactly which formula was used. Although methodologists may find such mixed use convenient, it can confuse nonexperts. The Raju coefficients also require special attention: because Raju proposed two different reliability coefficients (Raju, 1970, 1977), the year of publication must be specified to avoid confusion.

Fourth, the current nomenclature is not expandable. Not all reliability coefficients have names. As Table 3 indicates, a reliability coefficient based on a multidimensional parallel model is theoretically possible but has not yet been formally proposed, and it therefore lacks a name. Although there are no generally accepted rules for naming a newly developed reliability coefficient, a method popular in the past was to choose a Greek letter, for example, α (Cronbach, 1951), β (Revelle, 1979), λ (Guttman, 1945), μ (Ten Berge & Zegers, 1978), θ (Armor, 1974), and ω (McDonald, 1978). This naming strategy is not sustainable because we are running short of Greek letters. Greek letters such as σ, ε, ρ, and τ are almost exclusively used for frequently used methodological notions, and most of the remaining Greek letters (e.g., φ, ζ, γ, and ξ) are habitually used in the SEM literature. Using one of them as the name of a reliability coefficient would confuse users.

Table 4 shows the names proposed by this study. A systematic nomenclature should be informative, consistent, mutually exclusive, and expandable. It should effectively and economically convey the characteristics of each method as well as their commonalities and differences. A systematic name should also indicate when the coefficient should be used. For example, imagine that Fisher (1925) had named the formula of two-way ANOVA (analysis of variance) after himself; we would spend a long time memorizing when to use the Fisher formula. Likewise, if we rename alpha tau-equivalent reliability, we do not need to learn separately the conditions under which it should be used once we know the tau-equivalent measurement model. A systematic nomenclature becomes easy to use once we become accustomed to it.

Table 4.

Names and Notations of Reliability Coefficients Suggested in This Study.

	Unidimensional		Multidimensional
	Split-Half	General	General
Parallel	Split-half parallel reliability (ρ_SP)	Parallel reliability (ρ_P)	Multidimensional parallel reliability (ρ_MP)
Tau-equivalent	Split-half tau-equivalent reliability (ρ_ST)	Tau-equivalent reliability (ρ_T)	Multidimensional tau-equivalent reliability (ρ_MT)
Congeneric	Split-half congeneric reliability (ρ_SC)	Congeneric reliability (ρ_C)	Bifactor reliability (ρ_BF; bifactor model); Second-order factor reliability (ρ_SOF; second-order factor model); Correlated factors reliability (ρ_CF; correlated factors model)

This study provides additional cautionary tales on the expressions Cronbach’s alpha and composite reliability because they are the most commonly used but misleading names. First, let us consider Cronbach’s alpha. At the time of Cronbach’s (1951) publication, this formula was already “a common-place coefficient” (Cronbach, 1978, p. 263) and usually called KR-20. Kuder and Richardson (1937) proposed several new reliability formulas but did not name them. The designation “KR-20” referred to its being the 20th formula in their article. Cronbach (1951) claimed that KR-20 was an awkward name for something that would be used frequently; thus, he proposed a new name: coefficient alpha. This name, however, had the potential to be confusing because the name alpha was regularly used in research methodology textbooks to denote other concepts such as significance level (e.g., α = .05). Thus, calling it Cronbach’s alpha would have provided the users more discernibility than simply referring to it as alpha. Such convention, however, gives a misleading impression that Cronbach first developed the formula, and Cronbach (2004) opposed the expression Cronbach’s alpha.
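The formula that Cronbach renamed is KR-20 when items are scored 0/1: KR-20 = k/(k − 1) × (1 − Σp_iq_i / σ²_X), where p_i is the proportion of correct responses to item i and q_i = 1 − p_i. Because p_iq_i is the variance of a 0/1 item, KR-20 and alpha coincide for dichotomous data. The small check below uses a hypothetical response matrix; the function names are illustrative.

```python
import numpy as np


def kr20(responses):
    """Kuder-Richardson formula 20 for 0/1 item scores."""
    x = np.asarray(responses, float)
    k = x.shape[1]
    p = x.mean(axis=0)                    # proportion correct per item
    total_var = x.sum(axis=1).var()       # population variance (ddof=0), like p*q
    return k / (k - 1) * (1 - np.sum(p * (1 - p)) / total_var)


def coefficient_alpha(scores, ddof=0):
    """Alpha from raw scores; the ddof choice cancels in the variance ratio."""
    x = np.asarray(scores, float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=ddof)
    total_var = x.sum(axis=1).var(ddof=ddof)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


# Hypothetical responses: 6 examinees by 4 dichotomous items.
responses = np.array([[1, 1, 0, 1],
                      [1, 0, 0, 0],
                      [1, 1, 1, 1],
                      [0, 1, 0, 1],
                      [1, 1, 1, 0],
                      [0, 0, 0, 1]])
print(round(kr20(responses), 4), round(coefficient_alpha(responses), 4),
      round(coefficient_alpha(responses, ddof=1), 4))
```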

Next, let us consider composite reliability. At least seven names are used in the literature to represent a reliability estimator based on a congeneric model: composite reliability, construct reliability (e.g., Hair, Black, Babin, & Anderson, 2010), congeneric reliability (e.g., Graham, 2006; Lucke, 2005), classical congeneric reliability coefficient (Feldt & Brennan, 1989), omega (McDonald, 1999), unidimensional omega (Cho & Kim, 2015), and Raju (1977) coefficient. Among them, composite reliability is the least appropriate name for a specific reliability coefficient because it is shorthand for the reliability of composite scores (Cho & Kim, 2015). This name may even invite erroneous associations, such as with complex or synthesized reliability. The following paragraph critiques the name from a historical perspective.

The originator of this formula intended to use the term congeneric, not composite. The term composite reliability first appeared in Werts, Rock, Linn, and Jöreskog (1978), in which the authors called it simply “reliability” but referred to it as “the composite reliability” once when they compared it with single-item reliability. Few pioneers of this formula called it composite reliability or any other term. It is ironic that the lack of an appropriate alternative caused the unintended name to become widely used. Jöreskog (1971) also did not suggest a special name for the reliability coefficient he developed and simply called it reliability. However, the term congeneric is what he coined to describe its measurement model. Congeneric reliability is a name that honors his contribution.

Readers should consider using conventional and systematic names in parallel if necessary. Some conventional names, such as alpha, are too deeply embedded in our memory to remove in the short term. Even if a researcher uses only a systematic name in his or her study, the reviewers and readers are likely to be unfamiliar with it. Using both conventional and systematic names may enhance the fidelity of communication. For example, tau-equivalent reliability can be denoted as “tau-equivalent reliability (i.e., alpha).” If the systematic nomenclature becomes more common, it will replace the conventional nomenclature in the long term.

Systematic Derivation of the Formula

Although understanding how a formula was derived is the best way to understand its essence, the literature on reliability often overlooks the derivations of formulas. Four problems are associated with this practice. First, few articles comprehensively and systematically address the derivation of formulas for various reliability coefficients. Second, due to the lack of related research, the original studies that discovered these formulas are used as the sole source of derivations. Third, the original studies could not use methodologies developed later, so they rely on more complex derivations even though simpler ones have since become available. Fourth, most reliability coefficients were derived using different sets of logic, leading to a lack of consistency in derivational methods. For example, to derive algebraically equivalent formulas, Hoyt (1941) used the ANOVA approach, whereas Cronbach (1951) calculated the average of the split-half reliability coefficients obtained from all possible split halves.
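The algebraic equivalence of Hoyt’s ANOVA approach and alpha can be confirmed numerically. In the sketch below, Hoyt’s coefficient is computed as 1 − MS_residual / MS_persons from a persons × items two-way ANOVA without replication, and alpha is computed from sample item and test score variances. The simulated data (a common latent score plus item noise, rounded and clipped to a 1 to 5 scale) are hypothetical; the identity holds for any score matrix.

```python
import numpy as np


def hoyt_reliability(scores):
    """Hoyt (1941): 1 - MS_residual / MS_persons from a persons x items ANOVA."""
    x = np.asarray(scores, float)
    n, k = x.shape
    grand = x.mean()
    ss_persons = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_items = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_residual = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return 1 - ms_residual / ms_persons


def coefficient_alpha(scores):
    """Alpha from sample (ddof=1) item and test score variances."""
    x = np.asarray(scores, float)
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))


rng = np.random.default_rng(1941)
latent = rng.normal(size=(50, 1))
scores = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(50, 5))), 1, 5)
print(round(hoyt_reliability(scores), 6), round(coefficient_alpha(scores), 6))
```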

The current study shows that various reliability coefficients can be derived from measurement models nested within the bifactor model. Its goal is to propose a derivational method that clearly reveals the basic assumptions of each reliability coefficient and the differences among them. The derivations based on the unidimensional model are displayed here, and the derivations based on multidimensional models appear in Appendix B.

The definition of reliability

Let us define test score reliability in the unidimensional model, that is, the classical true score model (Lord & Novick, 1968). Consider a test composed of k dichotomously or polytomously scored items. This study defines the test score X as the sum of the k observed item scores X_i: $X = \sum_{i = 1}^{k} X_{i}$. The unidimensional model assumes that X_i consists of two unobserved scores: the true score T_i and the error e_i. To circumvent the controversial issue of distinguishing specific factors from errors (Bollen, 1989), this study assumes that no specific factors exist. T_i is further decomposed into two components such that $T_{i} = μ_{i} + λ_{i} T$. Thus, the observed score of item i is expressed as $X_{i} = T_{i} + e_{i} = μ_{i} + λ_{i} T + e_{i}$. The constant μ_i has no effect on variances or covariances and thus no effect on the value of the reliability. The factor loading λ_i is interpreted as the importance, discrimination, or effective length of item i. To set the scale, the variance of the latent variable is fixed to unity (i.e., Var(T) = 1) without loss of generality. Errors are purely random and are uncorrelated with the true scores (i.e., $\mathrm{Cov}(T_{i}, e_{i}) = \mathrm{Cov}(T_{i}, e_{j}) = 0$). This study further assumes that errors are uncorrelated with each other (i.e., $\mathrm{Cov}(e_{i}, e_{j}) = 0$ for i ≠ j). Reliability is defined as the ratio of the true score variance to the test score variance: $ρ_{X X^{'}} = \frac{σ_{T}^{2}}{σ_{X}^{2}} = \frac{σ_{T}^{2}}{σ_{T}^{2} + σ_{E}^{2}}$.

The unidimensional parallel model

The unidimensional parallel model restricts the factor loading and error variance of each item to be equal (i.e., λ_i = λ and $σ_{e_{i}}^{2} = σ_{e}^{2}$ for all i). The variance of item i is $λ^{2} + σ_{e}^{2}$ , and the covariance between item i and j is λ² . Let c denote $λ^{2} + σ_{e}^{2}$ .

Special case (k = 2)

Let ρ₁₂ denote the ratio of λ² to c. ρ₁₂ equals the product-moment correlation between the first and second items (or split halves). The interitem covariance matrix is as follows:

Σ_{s p} = \begin{bmatrix} λ^{2} + σ_{e}^{2} & λ^{2} \\ λ^{2} & λ^{2} + σ_{e}^{2} \end{bmatrix} = c \begin{bmatrix} 1 & ρ_{12} \\ ρ_{12} & 1 \end{bmatrix} = c \begin{bmatrix} ρ_{12} & ρ_{12} \\ ρ_{12} & ρ_{12} \end{bmatrix} + c \begin{bmatrix} 1 - ρ_{12} & 0 \\ 0 & 1 - ρ_{12} \end{bmatrix}

The sums of the elements of the second, third, and fourth matrices are $σ_{X}^{2}$, $σ_{T}^{2}$, and $σ_{E}^{2}$, respectively. Therefore, $σ_{T}^{2} = 4 c ρ_{12}$ and $σ_{E}^{2} = 2 c (1 - ρ_{12})$, and c cancels from their ratio. Brown (1910) and Spearman (1910) independently and simultaneously developed an algebraically equivalent formula for split-half parallel reliability. The derived version (ρ_SP) and the original version (${\tilde{ρ}}_{S P}$) are as follows:

ρ_{S P} = \frac{σ_{T}^{2}}{σ_{T}^{2} + σ_{E}^{2}} = \frac{4 ρ_{12}}{4 ρ_{12} + 2 (1 - ρ_{12})}, and {\tilde{ρ}}_{S P} = \frac{2 ρ_{12}}{1 + ρ_{12}}

General case

Let ${\bar{ρ}}_{i j}$ denote the ratio of λ² to c. ${\bar{ρ}}_{i j}$ equals the average product-moment correlation between items. The interitem covariance matrix in the unidimensional parallel model is as follows:

Σ_{u p} = c \begin{bmatrix} 1 & {\bar{ρ}}_{i j} & \dots & {\bar{ρ}}_{i j} \\ {\bar{ρ}}_{i j} & 1 & \dots & {\bar{ρ}}_{i j} \\ \vdots & \vdots & \ddots & \vdots \\ {\bar{ρ}}_{i j} & {\bar{ρ}}_{i j} & \dots & 1 \end{bmatrix} = c \begin{bmatrix} {\bar{ρ}}_{i j} & {\bar{ρ}}_{i j} & \dots & {\bar{ρ}}_{i j} \\ {\bar{ρ}}_{i j} & {\bar{ρ}}_{i j} & \dots & {\bar{ρ}}_{i j} \\ \vdots & \vdots & \ddots & \vdots \\ {\bar{ρ}}_{i j} & {\bar{ρ}}_{i j} & \dots & {\bar{ρ}}_{i j} \end{bmatrix} + c \begin{bmatrix} 1 - {\bar{ρ}}_{i j} & 0 & \dots & 0 \\ 0 & 1 - {\bar{ρ}}_{i j} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 - {\bar{ρ}}_{i j} \end{bmatrix}

The sum of the second k × k matrix (i.e., $σ_{T}^{2}$ ), of which all elements are ${\bar{ρ}}_{i j}$ , is $c k^{2} {\bar{ρ}}_{i j}$ . The sum of the third k × k matrix (i.e., $σ_{E}^{2}$ ), of which all diagonal elements are $1 - {\bar{ρ}}_{i j}$ , is $c k (1 - {\bar{ρ}}_{i j})$ . The derived version (ρ_P ) of parallel reliability is algebraically equivalent to the formula known as standardized alpha ( ${\tilde{ρ}}_{P}$ ). Specifically,

ρ_{P} = \frac{σ_{T}^{2}}{σ_{T}^{2} + σ_{E}^{2}} = \frac{k^{2} {\bar{ρ}}_{i j}}{k^{2} {\bar{ρ}}_{i j} + k (1 - {\bar{ρ}}_{i j})}, and {\tilde{ρ}}_{P} = \frac{k {\bar{ρ}}_{i j}}{1 + (k - 1) {\bar{ρ}}_{i j}}
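As a quick numerical check of this algebraic equivalence, the following minimal Python sketch evaluates the derived and the conventional expressions side by side; k = 2 corresponds to the Spearman–Brown case. The function names are hypothetical, and the (k, ${\bar{ρ}}_{i j}$) pairs are arbitrary illustrative values rather than data from this study.

```python
def parallel_derived(k: int, r_bar: float) -> float:
    """Systematic expression k^2 r / (k^2 r + k (1 - r)); the factor c has cancelled."""
    true_part = k ** 2 * r_bar      # sum of the matrix whose elements all equal r_bar
    error_part = k * (1 - r_bar)    # sum of the diagonal error matrix
    return true_part / (true_part + error_part)


def parallel_conventional(k: int, r_bar: float) -> float:
    """Standardized alpha; identical to the Spearman-Brown formula when k = 2."""
    return k * r_bar / (1 + (k - 1) * r_bar)


# Arbitrary illustrative (k, r_bar) pairs; k = 2 is the split-half case.
for k, r_bar in [(2, 0.6), (6, 0.3), (10, 0.15)]:
    print(k, r_bar,
          round(parallel_derived(k, r_bar), 4),
          round(parallel_conventional(k, r_bar), 4))
```

Both columns agree for every pair, which is what the cancellation of the common factor k (and of c) in the derived expression implies.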

The unidimensional tau-equivalent model

The unidimensional tau-equivalent model (Novick & Lewis, 1967) restricts all factor loadings to be equal (i.e., λ_i = λ for all i). The observed score of item i is expressed as $X_{i} = μ_{i} + λ T + e_{i}$. The unidimensional parallel model is nested within the unidimensional tau-equivalent model, and the only difference between them is the equality restriction on error variances. The variance of item i is $λ^{2} + σ_{e_{i}}^{2}$, and the covariance between items i and j is λ².

Special case (k = 2)

Let $σ_{1}^{2}$ (= $λ^{2} + σ_{e_{1}}^{2}$) and $σ_{2}^{2}$ (= $λ^{2} + σ_{e_{2}}^{2}$) denote the variances of the first and second items (or split halves), respectively, and let σ₁₂ (= λ²) denote the covariance between them. The interitem covariance matrix and split-half tau-equivalent reliability (ρ_ST) are as follows:

Σ_{s t} = \begin{bmatrix} λ^{2} + σ_{e_{1}}^{2} & λ^{2} \\ λ^{2} & λ^{2} + σ_{e_{2}}^{2} \end{bmatrix} = \begin{bmatrix} σ_{1}^{2} & σ_{12} \\ σ_{12} & σ_{2}^{2} \end{bmatrix} = \begin{bmatrix} σ_{12} & σ_{12} \\ σ_{12} & σ_{12} \end{bmatrix} + \begin{bmatrix} σ_{1}^{2} - σ_{12} & 0 \\ 0 & σ_{2}^{2} - σ_{12} \end{bmatrix}, and

ρ_{S T} = \frac{σ_{T}^{2}}{σ_{X}^{2}} = \frac{4 σ_{12}}{σ_{X}^{2}} .

Flanagan (1937), Guttman (λ₄, 1945), Rulon (1939), and Mosier (1941) proposed formulas that are algebraically equivalent to ρ_ST:

{\tilde{ρ}}_{F} = \frac{4 ρ_{12} σ_{1} σ_{2}}{σ_{1}^{2} + σ_{2}^{2} + 2 ρ_{12} σ_{1} σ_{2}} = \frac{4 σ_{12}}{σ_{1}^{2} + σ_{2}^{2} + 2 σ_{12}} = \frac{4 σ_{12}}{σ_{X}^{2}} = ρ_{S T},

{\tilde{ρ}}_{\text{Rulon}} = 1 - \frac{σ_{X_{1} - X_{2}}^{2}}{σ_{X}^{2}} = \frac{(σ_{1}^{2} + σ_{2}^{2} + 2 σ_{12}) - (σ_{1}^{2} + σ_{2}^{2} - 2 σ_{12})}{σ_{X}^{2}} = \frac{4 σ_{12}}{σ_{X}^{2}} = ρ_{S T},

{\tilde{ρ}}_{M} = \frac{4 (ρ_{1 X} σ_{1} σ_{X} - σ_{1}^{2})}{σ_{X}^{2}} = \frac{4 (σ_{1}^{2} + σ_{12} - σ_{1}^{2})}{σ_{X}^{2}} = \frac{4 σ_{12}}{σ_{X}^{2}} = ρ_{S T}, and

{\tilde{λ}}_{4} = 2 (1 - \frac{σ_{1}^{2} + σ_{2}^{2}}{σ_{X}^{2}}) = 2 (\frac{(σ_{1}^{2} + σ_{2}^{2} + 2 σ_{12}) - (σ_{1}^{2} + σ_{2}^{2})}{σ_{X}^{2}}) = \frac{4 σ_{12}}{σ_{X}^{2}} = ρ_{S T} .
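The four split-half formulas above can also be confirmed numerically. The following minimal Python sketch evaluates each of them for two hypothetical halves with unequal variances (a situation consistent with tau-equivalence but not with parallelism). The function name and the input values are illustrative assumptions; every formula should return the same value, 4σ₁₂/σ_X².

```python
import math


def split_half_coefficients(var1: float, var2: float, cov12: float) -> dict:
    """Evaluate the classic split-half formulas for two halves with
    variances var1, var2 and covariance cov12."""
    var_x = var1 + var2 + 2 * cov12                 # total test score variance
    sd1, sd2, sd_x = math.sqrt(var1), math.sqrt(var2), math.sqrt(var_x)
    r12 = cov12 / (sd1 * sd2)                       # correlation between the halves
    var_diff = var1 + var2 - 2 * cov12              # Var(X1 - X2), used by Rulon
    r1x = (var1 + cov12) / (sd1 * sd_x)             # correlation of half 1 with X, used by Mosier
    return {
        "rho_ST": 4 * cov12 / var_x,
        "Flanagan": 4 * r12 * sd1 * sd2 / (var1 + var2 + 2 * r12 * sd1 * sd2),
        "Rulon": 1 - var_diff / var_x,
        "Mosier": 4 * (r1x * sd1 * sd_x - var1) / var_x,
        "Guttman_lambda4": 2 * (1 - (var1 + var2) / var_x),
    }


# Hypothetical halves: variances 10 and 14, covariance 7.
results = split_half_coefficients(10.0, 14.0, 7.0)
for name, value in results.items():
    print(f"{name:16s}{value:.4f}")
```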

General case

Let ${\bar{σ}}_{i j}$ denote λ². ${\bar{σ}}_{i j}$ equals the average covariance between items. The interitem covariance matrix is as follows:

Σ_{u t} = \begin{bmatrix} σ_{1}^{2} & {\bar{σ}}_{i j} & \dots & {\bar{σ}}_{i j} \\ {\bar{σ}}_{i j} & σ_{2}^{2} & \dots & {\bar{σ}}_{i j} \\ \vdots & \vdots & \ddots & \vdots \\ {\bar{σ}}_{i j} & {\bar{σ}}_{i j} & \dots & σ_{k}^{2} \end{bmatrix} = \begin{bmatrix} {\bar{σ}}_{i j} & {\bar{σ}}_{i j} & \dots & {\bar{σ}}_{i j} \\ {\bar{σ}}_{i j} & {\bar{σ}}_{i j} & \dots & {\bar{σ}}_{i j} \\ \vdots & \vdots & \ddots & \vdots \\ {\bar{σ}}_{i j} & {\bar{σ}}_{i j} & \dots & {\bar{σ}}_{i j} \end{bmatrix} + \begin{bmatrix} σ_{1}^{2} - {\bar{σ}}_{i j} & 0 & \dots & 0 \\ 0 & σ_{2}^{2} - {\bar{σ}}_{i j} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & σ_{k}^{2} - {\bar{σ}}_{i j} \end{bmatrix} .

Cronbach ($\tilde{α}$, 1951), Guttman (${\tilde{λ}}_{3}$, 1945), Hoyt (1941), and Kuder and Richardson (1937) proposed formulas that are algebraically equivalent to tau-equivalent reliability. Kuder and Richardson (1937) expressed the formula (i.e., KR-20) slightly differently from the others because they applied it to dichotomously scored items (i.e., X_i = 0 or 1). Let p_i denote the proportion of correct responses to item i; for such items, the item variance is $σ_{i}^{2} = p_{i} (1 - p_{i})$. We obtain the formula of tau-equivalent reliability (ρ_T) by dividing the sum of the second matrix by $σ_{X}^{2}$. The conventional version (${\tilde{ρ}}_{T}$) and KR-20 (${\tilde{ρ}}_{K R - 20}$) are both algebraically equivalent to ρ_T. Specifically,

ρ_{T} = \frac{σ_{T}^{2}}{σ_{X}^{2}} = \frac{k^{2} {\bar{σ}}_{i j}}{σ_{X}^{2}}, {\tilde{ρ}}_{T} = \tilde{α} = {\tilde{λ}}_{3} = \frac{k}{k - 1} (1 - \frac{\sum σ_{i}^{2}}{σ_{X}^{2}}), and

{\tilde{ρ}}_{K R - 20} = \frac{k}{k - 1} (1 - \frac{\sum p_{i} (1 - p_{i})}{σ_{X}^{2}}) = \frac{k}{k - 1} (1 - \frac{\sum σ_{i}^{2}}{σ_{X}^{2}}) = \frac{k}{k - 1} (\frac{\sum \sum_{i \neq j} σ_{i j}}{σ_{X}^{2}}) = \frac{k^{2} {\bar{σ}}_{i j}}{σ_{X}^{2}} = ρ_{T} .
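The equivalence of ρ_T, the conventional alpha expression, and KR-20 can be verified on a small data set. The following minimal Python sketch uses hypothetical 0/1 responses and, as an assumption, population (divisor n) variances and covariances throughout, which is the convention under which p_i(1 − p_i) equals the item variance. With consistent divisors, the three versions agree exactly. The function names are illustrative.

```python
from statistics import fmean, pvariance


def pcov(a, b):
    """Population covariance (divisor n), matching p(1 - p) for 0/1 items."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def tau_equivalent_versions(data):
    """Return (derived rho_T, conventional alpha, KR-20) for 0/1 item data."""
    items = list(zip(*data))                      # one tuple of responses per item
    k = len(items)
    var_x = pvariance([sum(row) for row in data])  # test score variance
    item_vars = [pvariance(col) for col in items]
    off_diag = [pcov(items[i], items[j])
                for i in range(k) for j in range(k) if i != j]
    derived = k ** 2 * fmean(off_diag) / var_x     # k^2 * average covariance / var_X
    conventional = k / (k - 1) * (1 - sum(item_vars) / var_x)
    p = [fmean(col) for col in items]              # proportion correct per item
    kr20 = k / (k - 1) * (1 - sum(pi * (1 - pi) for pi in p) / var_x)
    return derived, conventional, kr20


# Hypothetical responses of six examinees to four dichotomous items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(tau_equivalent_versions(responses))
```

If sample (divisor n − 1) variances are used for alpha while KR-20 keeps p_i(1 − p_i), the two will differ slightly; the discrepancy reflects the mixed divisors, not the formulas.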

The unidimensional congeneric model

The unidimensional congeneric model (Jöreskog, 1971) is an unrestricted base model. The observed score of item i is expressed as $X_{i} = μ_{i} + λ_{i} T + e_{i}$ . The unidimensional tau-equivalent model is nested within the unidimensional congeneric model, and the only difference between them is the equality restriction on the factor loadings. The variance of item i is $λ_{i}^{2} + σ_{e_{i}}^{2}$ , and the covariance between items i and j is $λ_{i} λ_{j}$ .

Special case (k = 2)

The interitem covariance matrix and split-half congeneric reliability (ρ_SC ) are as follows:

Σ_{s c} = \begin{bmatrix} λ_{1}^{2} + σ_{e_{1}}^{2} & λ_{1} λ_{2} \\ λ_{1} λ_{2} & λ_{2}^{2} + σ_{e_{2}}^{2} \end{bmatrix} = \begin{bmatrix} λ_{1}^{2} & λ_{1} λ_{2} \\ λ_{1} λ_{2} & λ_{2}^{2} \end{bmatrix} + \begin{bmatrix} σ_{e_{1}}^{2} & 0 \\ 0 & σ_{e_{2}}^{2} \end{bmatrix}, and

ρ_{S C} = \frac{σ_{T}^{2}}{σ_{X}^{2}} = \frac{{(λ_{1} + λ_{2})}^{2}}{σ_{X}^{2}} .

This coefficient cannot be estimated without further constraints because the model is under-identified. Specifically, there are more unknowns (i.e., λ₁, λ₂, $σ_{e_{1}}^{2}$, and $σ_{e_{2}}^{2}$) than known pieces of information (i.e., $σ_{1}^{2}$, $σ_{2}^{2}$, and σ₁₂) (Feldt & Brennan, 1989; Haertel, 2006). Appendix A shows how previous studies addressed this issue.

General case

The interitem covariance matrix, the derived formula of congeneric reliability (ρ_C ), and the conventional version ( ${\tilde{ρ}}_{C}$ ) are as follows:

Σ_{u c} = \begin{bmatrix} λ_{1}^{2} + σ_{e_{1}}^{2} & λ_{2} λ_{1} & \dots & λ_{k} λ_{1} \\ λ_{1} λ_{2} & λ_{2}^{2} + σ_{e_{2}}^{2} & \dots & λ_{k} λ_{2} \\ \vdots & \vdots & \ddots & \vdots \\ λ_{1} λ_{k} & λ_{2} λ_{k} & \dots & λ_{k}^{2} + σ_{e_{k}}^{2} \end{bmatrix} = \begin{bmatrix} λ_{1}^{2} & λ_{2} λ_{1} & \dots & λ_{k} λ_{1} \\ λ_{1} λ_{2} & λ_{2}^{2} & \dots & λ_{k} λ_{2} \\ \vdots & \vdots & \ddots & \vdots \\ λ_{1} λ_{k} & λ_{2} λ_{k} & \dots & λ_{k}^{2} \end{bmatrix} + \begin{bmatrix} σ_{e_{1}}^{2} & 0 & \dots & 0 \\ 0 & σ_{e_{2}}^{2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & σ_{e_{k}}^{2} \end{bmatrix},

ρ_{C} = \frac{σ_{T}^{2}}{σ_{X}^{2}} = \frac{{(\sum λ_{i})}^{2}}{σ_{X}^{2}}, and {\tilde{ρ}}_{C} = \frac{{(\sum λ_{i})}^{2}}{{(\sum λ_{i})}^{2} + \sum σ_{e_{i}}^{2}} .

Systematic Expression of Formula

A formula may have multiple algebraically equivalent versions. For example, the definition of reliability can be expressed in several ways, including $ρ_{X X^{'}} = \frac{σ_{T}^{2}}{σ_{X}^{2}} = \frac{σ_{T}^{2}}{σ_{T}^{2} + σ_{E}^{2}} = 1 - \frac{σ_{E}^{2}}{σ_{X}^{2}}$. Henceforth, algebraically equivalent variations will be referred to as versions or formula expressions. The more complex a formula is, the harder it is to see that its many versions are algebraically equivalent. If no consistent rule exists for selecting which version to use, users will express the same formula in different ways, which ultimately makes it harder to understand the meaning of the formula and to discover commonalities among formulas.

Table 5 presents a summary of conventional formula expressions. Apart from the relationship between split-half parallel reliability (i.e., the Spearman–Brown formula) and parallel reliability (i.e., standardized alpha), the formulas show no apparent commonalities or regularities. Some formulas subtract a quantity from 1, and others do not. Some formulas have $σ_{X}^{2}$ as the denominator, whereas others have different quantities in their denominators. It appears as if these formulas are unrelated to each other.

Table 5.

Conventional Formula Expressions of Reliability Coefficients.

	Unidimensional		Multidimensional
	Split-Half	General	General
Parallel	$\frac{2 ρ_{12}}{1 + ρ_{12}}$	$\frac{k {\bar{ρ}}_{i j}}{1 + (k - 1) {\bar{ρ}}_{i j}}$	(Not yet published)
Tau-equivalent	$1 - \frac{σ_{X_{1} - X_{2}}^{2}}{σ_{X}^{2}}$	$\frac{k}{k - 1} (1 - \frac{\sum_{i = 1}^{k} σ_{i}^{2}}{σ_{X}^{2}})$	$1 - \frac{\sum_{p = 1}^{m} σ_{p}^{2} (1 - α_{p})}{σ_{X}^{2}}$
Congeneric	$\frac{σ_{12}}{π_{1} π_{2} σ_{X}^{2}}$	$\frac{{(\sum_{i = 1}^{k} λ_{i})}^{2}}{{(\sum_{i = 1}^{k} λ_{i})}^{2} + \sum_{i = 1}^{k} σ_{e_{i}}^{2}}$	$\frac{1^{'} c c^{'} 1 + 1^{'} A A^{'} 1}{σ_{X}^{2}}$

Note: $σ_{X}^{2}$ is observed test score variance, k is the number of items, ${\bar{ρ}}_{i j}$ is the average product-moment correlation between items, λ_i in the unidimensional models is the factor loading of item i, and m is the number of group factors. Other notations are explained in Appendices A and B.

Currently, the formula of a reliability coefficient is usually expressed in the form used by the author of the original article. This convention is author friendly but not reader friendly. The expression chosen by the author and the expression readers find easiest to understand differ for three reasons. First, each formula was derived using a different method and interpreted in a different way, and the derivational method and interpretation influenced how the formula was expressed.

Second, authors are rewarded for the distinctiveness of their work, not for consistency. Academia prefers studies that differ from previous studies; studies whose results are similar to existing findings are not highly regarded. Researchers therefore have an incentive to express formulas in new versions and to propose new interpretations.

Third, authors have preferred formula expressions that are computationally simple. Many reliability formulas were published in an era during which they could be calculated only by hand instead of with computers. Because of social inertia, computationally simpler formulas have been chosen even after computer-based computational processes became common. Thus, the most famous versions of formulas are typically those that require the fewest calculations (Falk & Savalei, 2011).

These three problems suggest a direction for the proposed systematic formula expression. First, the system should follow a consistent set of principles. Second, it should connect apparently disparate formulas to the degree that users can easily recognize their common features. Third, it should aim for ease of understanding rather than ease of calculation. Now that all calculations are performed by computers, ease of calculation is no longer an important consideration for users.

Table 6 summarizes the formula expressions that the current study proposes, which follow a consistent set of principles. Specifically, this study prefers nonmatrix expressions to matrix expressions. The denominator is always $σ_{X}^{2}$ or its equivalent, and the numerator is always equivalent to $σ_{T}^{2}$. Sums are presented in the order true score + error or general factor + group factor + error. This study does not reduce fractions to their lowest terms. For example, in the formula expression of parallel reliability, $\frac{k^{2} {\bar{ρ}}_{i j}}{k^{2} {\bar{ρ}}_{i j} + k (1 - {\bar{ρ}}_{i j})}$, $k^{2} {\bar{ρ}}_{i j}$ represents $σ_{T}^{2}$ and $k (1 - {\bar{ρ}}_{i j})$ represents $σ_{E}^{2}$.

Table 6.

Systematic Formula Expressions of Reliability Coefficients.

	Unidimensional		Multidimensional
	Split-Half	General	General
Parallel	$\frac{4 ρ_{12}}{4 ρ_{12} + 2 (1 - ρ_{12})}$	$\frac{k^{2} {\bar{ρ}}_{i j}}{k^{2} {\bar{ρ}}_{i j} + k (1 - {\bar{ρ}}_{i j})}$	$\frac{k^{2} {\bar{ρ}}_{d} + n k ({\bar{ρ}}_{s} - {\bar{ρ}}_{d})}{k^{2} {\bar{ρ}}_{d} + n k ({\bar{ρ}}_{s} - {\bar{ρ}}_{d}) + k (1 - {\bar{ρ}}_{s})}$
Tau-equivalent	$\frac{4 σ_{12}}{σ_{X}^{2}}$	$\frac{k^{2} {\bar{σ}}_{i j}}{σ_{X}^{2}}$	$\frac{k^{2} {\bar{σ}}_{d} + n k ({\bar{σ}}_{s} - {\bar{σ}}_{d})}{σ_{X}^{2}}$
Congeneric	$\frac{{(λ_{1} + λ_{2})}^{2}}{σ_{X}^{2}}$	$\frac{{(\sum_{i = 1}^{k} λ_{i})}^{2}}{σ_{X}^{2}}$	Bifactor reliability: $\frac{(\sum_{i = 1}^{k} λ_{i F})^{2} + \sum_{p = 1}^{m} (\sum_{i = 1}^{k} λ_{i p})^{2}}{σ_{X}^{2}}$
			Correlated factors reliability: $\frac{\sum_{p = 1}^{m} \sum_{q = 1}^{m} φ_{p q} (\sum_{i = 1}^{k} λ_{i p}) (\sum_{i = 1}^{k} λ_{i q})}{σ_{X}^{2}}$
			Second-order factor reliability: $\frac{(\sum_{p = 1}^{m} \sum_{i = 1}^{k} λ_{i p} γ_{p})^{2} + \sum_{p = 1}^{m} (\sum_{i = 1}^{k} λ_{i p} {(1 - γ_{p}^{2})}^{1 / 2})^{2}}{σ_{X}^{2}}$

Note: ${\bar{σ}}_{i j}$ is the average covariance between items, λ_iF is the general factor loading of item i, λ_ip is the factor loading of item i on the pth subtest construct (i.e., G_p, O_p, or D_p), n is the number of items per group factor, ${\bar{ρ}}_{d}$ is the average product-moment correlation between items that have different group factors, ${\bar{ρ}}_{s}$ is the average product-moment correlation between items that have the same group factor, ${\bar{σ}}_{d}$ is the average covariance between items that have different group factors, ${\bar{σ}}_{s}$ is the average covariance between items that have the same group factor, and m is the number of group factors. Other notations are explained in Appendices A and B.

The systematic formula expressions also show intuitively when a reliability coefficient can take a value below zero. By definition (i.e., $ρ_{X X^{'}} = \frac{σ_{T}^{2}}{σ_{X}^{2}} = 1 - \frac{σ_{E}^{2}}{σ_{X}^{2}}$), reliability always lies between zero and one. Contrary to the popular belief that a reliability estimate is likewise bounded by zero and one, an estimate can be negative if the assumptions of the estimator are not satisfied. Table 6 reveals why. The numerators of the congeneric model-based coefficients are squared sums of loadings (or, for correlated factors reliability, a quadratic form in the factor correlation matrix), so these coefficients are always nonnegative. In contrast, the numerators of the parallel and tau-equivalent model-based coefficients are built from average correlations or covariances, which can be negative.
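A familiar way for tau-equivalent reliability to become negative is a negatively keyed item that has not been recoded. The following minimal Python sketch uses a hypothetical three-item covariance matrix to illustrate this: the average interitem covariance in the numerator is negative, so the coefficient falls below zero, and recoding the item (which flips the signs of its covariances with the other items) restores a positive value. The matrix and the function names are illustrative assumptions, not data from this study.

```python
def tau_equivalent_reliability(cov):
    """k^2 * (average interitem covariance) / (sum of all matrix elements)."""
    k = len(cov)
    total = sum(map(sum, cov))                          # sigma_X^2
    off_diag = total - sum(cov[i][i] for i in range(k))
    return k ** 2 * (off_diag / (k * (k - 1))) / total


def reverse_score(cov, item):
    """Covariance matrix after recoding one item: its covariances change sign."""
    k = len(cov)
    return [[cov[i][j] * (-1 if (i == item) != (j == item) else 1)
             for j in range(k)] for i in range(k)]


# Hypothetical matrix: the third item is negatively keyed and not yet recoded.
cov = [[1.0, 0.5, -0.6],
       [0.5, 1.0, -0.6],
       [-0.6, -0.6, 1.0]]
print(round(tau_equivalent_reliability(cov), 4))
print(round(tau_equivalent_reliability(reverse_score(cov, 2)), 4))
```

A congeneric numerator, being a squared sum of loadings, cannot turn negative in this way, although a reversed item would still lower its value.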

This study derives two hierarchical omega coefficients from the multidimensional parallel and multidimensional tau-equivalent models. McDonald (1985, 1999) proposed two definitions of reliability that are applicable to multidimensional models, and Zinbarg, Revelle, Yovel, and Li (2005) expressed McDonald’s proposal more explicitly, distinguishing ω (relabeled ω_t in Revelle & Zinbarg, 2009) from ω_H (or ω_h). ω_H is called hierarchical omega or omega hierarchical. The McDonald–Zinbarg hierarchical omega includes only the variance due to the general factor in the numerator (i.e., $ω_{H} = \frac{σ_{\text{General}}^{2}}{σ_{X}^{2}}$), whereas omega includes the variance due to both the general factor and the group factors (i.e., $ρ_{B F} = ω = \frac{σ_{\text{General}}^{2} + σ_{\text{Group}}^{2}}{σ_{X}^{2}}$). Although obtaining bifactor or second-order factor hierarchical omega requires parameter estimates from multidimensional SEM models, estimating parallel or tau-equivalent hierarchical omega does not rely on SEM procedures. Previous studies have strongly recommended the use of hierarchical omega (Revelle & Zinbarg, 2009; Zinbarg et al., 2005; Zinbarg et al., 2007; Zinbarg, Yovel, Revelle, & McDonald, 2006). If the concept of hierarchical omega is useful for bifactor models, it should be equally useful for multidimensional tau-equivalent and multidimensional parallel models. Table 7 displays the formulas of all four coefficients.

Table 7.

Four Coefficients of Hierarchical Omega.

	Description	Formula
Parallel	Parallel hierarchical omega (ω_HP )	$\frac{k^{2} {\bar{ρ}}_{d}}{k^{2} {\bar{ρ}}_{d} + n k ({\bar{ρ}}_{s} - {\bar{ρ}}_{d}) + k (1 - {\bar{ρ}}_{s})}$
Tau-equivalent	Tau-equivalent hierarchical omega (ω_HT )	$\frac{k^{2} {\bar{σ}}_{d}}{σ_{X}^{2}}$
Second-order factor	Second-order factor hierarchical omega (ω_HSOF )	$\frac{(\sum_{p = 1}^{m} \sum_{i = 1}^{k} λ_{i p} γ_{p})^{2}}{σ_{X}^{2}}$
Bifactor	Bifactor hierarchical omega (ω_HBF )	$\frac{(\sum_{i = 1}^{k} λ_{i F})^{2}}{σ_{X}^{2}}$

Note: The notations are explained in Table 4, and the derivations are listed in Appendix B.
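To make the bifactor formulas in Tables 6 and 7 concrete, the following minimal Python sketch computes bifactor reliability (ω) and bifactor hierarchical omega (ω_HBF) from a set of loadings. The loadings are hypothetical standardized values chosen for illustration, not estimates from this study; in practice they would come from a fitted bifactor model. Following the advice to use the fitted covariance matrix, σ_X² is taken as the sum of the model-implied covariance matrix, assuming orthogonal factors with unit variances.

```python
def bifactor_omegas(general, group, error_vars):
    """Return (omega, omega_H) for an orthogonal bifactor model.

    general    : k general-factor loadings (lambda_iF)
    group      : list of m lists, each with k group-factor loadings (0 = no loading)
    error_vars : k error variances
    """
    k = len(general)
    # Model-implied covariance matrix with all factor variances fixed to 1.
    implied = [[general[i] * general[j]
                + sum(g[i] * g[j] for g in group)
                + (error_vars[i] if i == j else 0.0)
                for j in range(k)] for i in range(k)]
    var_x = sum(map(sum, implied))                  # fitted sigma_X^2
    var_general = sum(general) ** 2                 # (sum_i lambda_iF)^2
    var_group = sum(sum(g) ** 2 for g in group)     # sum_p (sum_i lambda_ip)^2
    return (var_general + var_group) / var_x, var_general / var_x


# Hypothetical standardized loadings: six items, two group factors of three items.
general = [0.6] * 6
group = [[0.4, 0.4, 0.4, 0, 0, 0],
         [0, 0, 0, 0.4, 0.4, 0.4]]
errors = [1 - 0.6 ** 2 - 0.4 ** 2] * 6
omega, omega_h = bifactor_omegas(general, group, errors)
print(round(omega, 4), round(omega_h, 4))
```

The gap between the two values is the share of test score variance attributable to the group factors.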

This study classifies the reliability of multidimensional models into two hierarchical levels: the reliability of the measurement model as a whole (test reliability) and the reliability of the subtest constructs (subtest reliability). Offering both test reliability and subtest reliability estimates may provide more complete information about the measurement. Table 8 shows the formulas of subtest reliability. The subtest reliability coefficients of the multidimensional parallel, multidimensional tau-equivalent, and correlated factors models are minor modifications of parallel reliability, tau-equivalent reliability, and congeneric reliability, respectively. The subtest reliability coefficients of the bifactor and second-order factor models have no analogous unidimensional reliability coefficients. Brunner et al. (2012) extended the concept of hierarchical omega to the subtest level; however, they used the term to denote the ratio of the variance due to the group factor to the subtest score variance (i.e., % group factor in Table 8), which differs from the way Zinbarg et al. (2005) defined hierarchical omega at the test level (i.e., % general factor). Because using the term hierarchical omega at the subtest level can be confusing, this study employs generic terms.

Table 8.

Formulas of Subtest Reliability in Multidimensional Models.

	Subtest Reliability	% General Factor	% Group Factor
Multidimensional Parallel ($_{p}ρ_{MP}$)	$\frac{n^{2} {\bar{ρ}}_{X_{i} \in G_{p}}}{n^{2} {\bar{ρ}}_{X_{i} \in G_{p}} + n (1 - {\bar{ρ}}_{X_{i} \in G_{p}})}$	—	—
Multidimensional Tau-equivalent ($_{p}ρ_{MT}$)	$\frac{n^{2} {\bar{σ}}_{X_{i} \in G_{p}}}{σ_{X_{i} \in G_{p}}^{2}}$	—	—
Bifactor ($_{p}ρ_{BF}$)	$\frac{(\sum_{i \in G_{p}} λ_{i F})^{2} + (\sum_{i \in G_{p}} λ_{i p})^{2}}{σ_{X_{i} \in G_{p}}^{2}}$	$\frac{(\sum_{i \in G_{p}} λ_{i F})^{2}}{σ_{X_{i} \in G_{p}}^{2}}$	$\frac{(\sum_{i \in G_{p}} λ_{i p})^{2}}{σ_{X_{i} \in G_{p}}^{2}}$
Second-order factor ($_{p}ρ_{SOF}$)	$\frac{(\sum_{i \in O_{p}} λ_{i p} γ_{p})^{2} + (\sum_{i \in O_{p}} λ_{i p} {(1 - γ_{p}^{2})}^{1 / 2})^{2}}{σ_{X_{i} \in O_{p}}^{2}}$	$\frac{(\sum_{i \in O_{p}} λ_{i p} γ_{p})^{2}}{σ_{X_{i} \in O_{p}}^{2}}$	$\frac{(\sum_{i \in O_{p}} λ_{i p} {(1 - γ_{p}^{2})}^{1 / 2})^{2}}{σ_{X_{i} \in O_{p}}^{2}}$
Correlated factors ($_{p}ρ_{CF}$)	$\frac{{(\sum_{i \in D_{p}} λ_{i p})}^{2}}{σ_{X_{i} \in D_{p}}^{2}}$	—	—

Systematic Use

Although tau-equivalent reliability (i.e., alpha) is the most popular reliability coefficient among organizational researchers, methodologists view it far more critically. Previous studies are practically unanimous in declaring that there must be an alternative to the current practice of indiscriminately using this coefficient, although little consensus exists about exactly which alternative technique should replace alpha (Bentler, 2009; Green & Yang, 2009b; Hunt & Bentler, 2015; Osburn, 2000; Revelle & Zinbarg, 2009; Schmidt, Le, & Ilies, 2003; Sijtsma, 2009b; van der Ark et al., 2011). The current study does not propose a specific reliability estimator as an alternative; rather, it delineates a system composed of multiple methods.

This study opposes both the unconditional use and the unconditional rejection of tau-equivalent reliability. Criticizing alpha’s unconditional use is different from advocating that another coefficient should replace it and be used universally. To discourage the blind use of tau-equivalent reliability, previous studies have criticized it in rather strong language. For example, Peters (2014) claimed that we should abandon alpha because it is “a fatally flawed estimate of its reliability” (p. 56). This study encourages the use of tau-equivalent reliability when the data meet the tau-equivalence condition. What this study disapproves of is the notion of a single best reliability coefficient that is appropriate for all types of data sets, which implicitly assumes that the objective function is one-dimensional.

No reliability coefficient in the system is inherently superior or inferior to another; the coefficients simply assume different measurement models. The evaluation criteria for a scientific model are at least two-dimensional: a good model should explain as much of the data as possible using as few elements as possible. SEM measurement models involve a trade-off between goodness of fit (i.e., chi-square) and parsimony (i.e., degrees of freedom), and a less parsimonious model should be preferred only if it fits significantly better. Unidimensional models are nested within multidimensional models, and tau-equivalent models are nested within congeneric models. When comparing two competing SEM models, one of which is nested within the other, we typically use the chi-square difference test to assess significance.
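For readers who wish to carry out this test outside SEM software, the following minimal Python sketch computes the p value of a chi-square difference test from the chi-square statistics and degrees of freedom of two nested models. It assumes normal-theory maximum likelihood statistics (robust, scaled statistics require a scaled difference test, which is not shown), and the fit values in the usage line are hypothetical. The df values in that line match four items with Var(T) = 1: the tau-equivalent model estimates 5 parameters from 10 variances and covariances (df = 5), and the congeneric model estimates 8 (df = 2).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square_df > x) for a positive integer df,
    using the standard closed-form series (no SciPy needed)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: start from df = 1 and add one term per two extra degrees of freedom.
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def chi_square_difference(chisq_restricted, df_restricted, chisq_free, df_free):
    """Difference test for two nested models; returns (delta, delta_df, p)."""
    delta = chisq_restricted - chisq_free
    delta_df = df_restricted - df_free
    return delta, delta_df, chi2_sf(delta, delta_df)


# Hypothetical fit statistics: tau-equivalent (restricted) vs congeneric (free).
delta, delta_df, p = chi_square_difference(25.4, 5, 8.1, 2)
print(f"delta chi-square = {delta:.1f}, delta df = {delta_df}, p = {p:.4f}")
```

A small p value indicates that the restricted (tau-equivalent) model fits significantly worse, which would favor congeneric reliability under the guideline described in this section.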

Recommendations for use

Figure 4 shows a guideline for choosing a reliability coefficient. The chi-square difference test serves as the main statistical criterion at every step. How to identify the dimensionality of data (i.e., STEP 1) is an important but controversial issue. Numerous methods have been proposed to test unidimensionality (Hattie, 1985). Exploratory factor analysis or CFA can be used to examine the dimensionality of data. This study will introduce a gadget that enables users to perform CFA without SEM software.

Figure 4.

A guideline for choosing a reliability coefficient.

Organizational researchers should combine theoretical considerations with statistical criteria when deciding which measurement model to use. A researcher’s judgment is especially important when choosing among multidimensional models (i.e., STEP 3). A bifactor model or a second-order factor model has a hierarchy that consists of a common construct and subtest constructs, whereas a correlated factors model is composed only of subtest constructs. A key question is whether the researcher has a theoretical interest in a common construct and whether that construct has theoretical underpinnings (Brunner et al., 2012). The answer is affirmative in many cases because a common construct is what scale developers originally intended to measure and what typical researchers are most interested in (Cho & Kim, 2015; Reise, 2012). The use of correlated factors reliability is questionable because using the total test score of a correlated factors model is not recommended (McDonald, 1999). Brunner et al. (2012) advocated using only subtest reliability coefficients when a correlated factors model is selected. Reporting correlated factors reliability may provide additional information about the measurement; however, relying on it exclusively is not recommended.

Examples and a Calculator

Illustrative Examples

Understanding a formula promotes its frequent use. Organizational researchers rarely use multidimensional reliability coefficients, even though they usually study multifaceted phenomena and analyze multidimensional data. This lack of use probably stems from limited awareness and understanding of multidimensional reliability coefficients. The terms awareness and understanding have different meanings here. For example, up to this point, this study has likely made readers aware of the names and formulas of multidimensional reliability coefficients but has not necessarily helped them understand their meaning. A good way to understand a complex formula is to apply it to a simple numerical example. This study presents a unidimensional example and five multidimensional examples to make the formulas more comprehensible to readers.

Let us start with a comparison of tau-equivalent reliability and congeneric reliability. Table 9 omits the upper triangle of the covariance matrices, as typical SEM software packages do. When calculating the estimate of congeneric reliability, readers should be careful to use the sum of the fitted or implied covariance matrix (i.e., ${\hat{σ}}_{X}^{2}$) instead of the sum of the observed covariance matrix (i.e., $σ_{X}^{2}$). The fitted covariance matrix is computed from the parameter estimates: each off-diagonal element is ${\hat{λ}}_{i} {\hat{λ}}_{j}$, and each diagonal element is ${\hat{λ}}_{i}^{2} + {\hat{σ}}_{e_{i}}^{2}$. For example, 4.42 ≈ 1.96 ⋅ 2.25 and 10.00 ≈ 1.96² + 6.13; the small discrepancies arise because the displayed estimates are rounded.

Table 9.

A Computation of Tau-equivalent Reliability and Congeneric Reliability.

Observed Covariance Matrix
	X1	X2	X3	X4
X1	10
X2	4	11
X3	5	6	12
X4	7	8	9	13
$\sum_{\text{diagonal}} = 10 + 11 + 12 + 13 = 46$
$\sum_{\text{off-diagonal}} = 2 \cdot \sum_{\text{sub-diagonal}} = 2 \cdot (4 + 5 + 6 + 7 + 8 + 9) = 78$
$\sum_{\text{total}} = \sum_{\text{diagonal}} + \sum_{\text{off-diagonal}} = 124$
Proposed formula expression: ${\hat{ρ}}_{T} = \hat{α} = 4^{2} \cdot (78 / (4 \cdot 3)) / 124 = .8387$
Conventional formula expression: ${\hat{ρ}}_{T} = \hat{α} = (4 / 3) \cdot (1 - 46 / 124) = .8387$
Fitted/Implied Covariance Matrix
	X1	X2	X3	X4
X1	10.00
X2	4.42	11.00
X3	4.98	5.71	12.00
X4	6.98	7.99	9.01	13.00
$\sum_{\text{total}} = 2 \cdot (\sum_{\text{sub-diagonal}} + \sum_{\text{diagonal}}) - \sum_{\text{diagonal}} = 124.23$
Factor Loadings and Errors
	${\hat{λ}}_{i}$	${\hat{σ}}_{e_{i}}^{2}$
X1	1.96	6.13
X2	2.25	5.92
X3	2.53	5.56
X4	3.55	.37
∑	10.30	18.01
$(\sum {\hat{λ}}_{i})^{2}$	106.22
Proposed formula expression: ${\hat{ρ}}_{C} = 106.22 / 124.23 = .8550$
Conventional formula expression: ${\hat{ρ}}_{C} = 106.22 / (106.22 + 18.01) = .8550$
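The computations in Table 9 can be reproduced with a short Python sketch. It takes the rounded loadings and error variances reported in Table 9 as given (estimating them requires SEM software, which is not shown) and builds the fitted covariance matrix from them. Because the inputs are rounded, the congeneric estimate comes out near .855 rather than exactly .8550. The function names are hypothetical.

```python
def tau_equivalent_reliability(cov):
    """k^2 * (average interitem covariance) / (sum of the observed matrix)."""
    k = len(cov)
    total = sum(map(sum, cov))
    off_diag = total - sum(cov[i][i] for i in range(k))
    return k ** 2 * (off_diag / (k * (k - 1))) / total


def congeneric_reliability(loadings, error_vars):
    """(sum of loadings)^2 divided by the sum of the fitted covariance matrix."""
    k = len(loadings)
    fitted = [[loadings[i] * loadings[j] + (error_vars[i] if i == j else 0.0)
               for j in range(k)] for i in range(k)]
    return sum(loadings) ** 2 / sum(map(sum, fitted))


# Observed covariance matrix of Table 9, with the upper triangle filled in.
observed = [[10, 4, 5, 7],
            [4, 11, 6, 8],
            [5, 6, 12, 9],
            [7, 8, 9, 13]]
# Rounded estimates from Table 9, taken as given.
loadings = [1.96, 2.25, 2.53, 3.55]
error_vars = [6.13, 5.92, 5.56, 0.37]

alpha = tau_equivalent_reliability(observed)
rho_c = congeneric_reliability(loadings, error_vars)
print(round(alpha, 4), round(rho_c, 4))
```

Dividing by the sum of the observed matrix (124) instead of the fitted matrix would mix observed and model-implied quantities, which is the mistake the text warns against.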

Table 10 presents a decomposition of the covariance matrix of multidimensional parallel data. Readers may find multidimensional parallel reliability difficult to understand because it is new and has a complex form. Its formula, $\frac{k^{2} {\bar{ρ}}_{d} + n k ({\bar{ρ}}_{s} - {\bar{ρ}}_{d})}{k^{2} {\bar{ρ}}_{d} + n k ({\bar{ρ}}_{s} - {\bar{ρ}}_{d}) + k (1 - {\bar{ρ}}_{s})}$, has a denominator that consists of three parts, which correspond to the sums of all elements of B, C, and D in Table 10. $k^{2} {\bar{ρ}}_{d}$ is proportional to the variance due to the general factor ($σ_{\text{General}}^{2}$). In the example, the number of items (k) is 6, and the average product-moment correlation between items that have different group factors is .3, which gives $6^{2} \cdot .3 = 10.8$. $n k ({\bar{ρ}}_{s} - {\bar{ρ}}_{d})$ is proportional to the variance due to group factors ($σ_{\text{Group}}^{2}$). In Table 10, each group factor has n = 3 items, the average product-moment correlation between items that have the same group factor is .5, and the sum of all elements of C is $3 \cdot 6 \cdot (.5 - .3) = 3.6$. $k (1 - {\bar{ρ}}_{s})$ is proportional to the variance due to errors ($σ_{E}^{2}$), and the corresponding value in Table 10 is $6 \cdot (1 - .5) = 3$. Substituting these values into the formula gives ${\hat{ρ}}_{M P} = (10.8 + 3.6) / (10.8 + 3.6 + 3) = .8276$.

Table 10.

A Computation Example of Multidimensional Parallel Reliability.

	A Observed Test Score Variance ($σ_{X}^{2} = 17.4$)						B The Variance Due to the General Factor ($σ_{\text{General}}^{2} = 6^{2} \cdot .3 = 10.8$)
	X1	X2	X3	X4	X5	X6	X1	X2	X3	X4	X5	X6
X1	1	.5	.5	.3	.3	.3	.3	.3	.3	.3	.3	.3
X2	.5	1	.5	.3	.3	.3	.3	.3	.3	.3	.3	.3
X3	.5	.5	1	.3	.3	.3	.3	.3	.3	.3	.3	.3
X4	.3	.3	.3	1	.5	.5	.3	.3	.3	.3	.3	.3
X5	.3	.3	.3	.5	1	.5	.3	.3	.3	.3	.3	.3
X6	.3	.3	.3	.5	.5	1	.3	.3	.3	.3	.3	.3
	C The Variance Due to Group Factors ( $σ_{G r o u p}^{2} = 3 \cdot 6 \cdot (.5 - .3) = 3.6$ )						D The Variance Due to Errors ( $σ_{E}^{2} = 6 \cdot (1 - .5) = 3$ )
	X1	X2	X3	X4	X5	X6	X1	X2	X3	X4	X5	X6
X1	.2	.2	.2	0	0	0	.5	0	0	0	0	0
X2	.2	.2	.2	0	0	0	0	.5	0	0	0	0
X3	.2	.2	.2	0	0	0	0	0	.5	0	0	0
X4	0	0	0	.2	.2	.2	0	0	0	.5	0	0
X5	0	0	0	.2	.2	.2	0	0	0	0	.5	0
X6	0	0	0	.2	.2	.2	0	0	0	0	0	.5

Note: If tau-equivalent reliability (i.e., alpha) is misapplied to the above data, it underestimates the reliability by .0414 ( ${\hat{ρ}}_{T} = \hat{α} = (6^{2} \cdot ((.3 \cdot 9 + .5 \cdot 6) / 15)) / 17.4 = .7862$ ).
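The following Python sketch rebuilds the Table 10 correlation matrix and reproduces the sums of B, C, and D, the multidimensional parallel reliability, and alpha. It reads $n$ as the number of items per group factor. That reading is an assumption, but it is the one under which $n k = 18$ matches the 18 nonzero cells of C. The helper names are hypothetical.

```python
def table10_correlations():
    """Table 10, matrix A: two group factors of three items each;
    r = .5 within a group factor and r = .3 between group factors."""
    groups = [0, 0, 0, 1, 1, 1]
    return [[1.0 if i == j else (0.5 if groups[i] == groups[j] else 0.3)
             for j in range(6)] for i in range(6)]


def mp_components(k, n, r_s, r_d):
    """Sums of matrices B, C, and D in Table 10."""
    general = k ** 2 * r_d          # B: variance due to the general factor
    group = n * k * (r_s - r_d)     # C: variance due to group factors
    error = k * (1 - r_s)           # D: variance due to errors
    return general, group, error


def coefficient_alpha(cov):
    """Tau-equivalent reliability: k/(k-1) * (1 - trace / sum of all elements)."""
    k = len(cov)
    total = sum(sum(row) for row in cov)
    trace = sum(cov[i][i] for i in range(k))
    return k / (k - 1) * (1 - trace / total)


R = table10_correlations()
general, group, error = mp_components(k=6, n=3, r_s=0.5, r_d=0.3)
observed_total = sum(sum(row) for row in R)
rho_mp = (general + group) / (general + group + error)

print(round(general, 4), round(group, 4), round(error, 4), round(observed_total, 4))
# 10.8 3.6 3.0 17.4
print(round(rho_mp, 4), round(coefficient_alpha(R), 4))
# 0.8276 0.7862
```

The three components add up to the observed total of 17.4. The gap between the two printed coefficients is the underestimation reported in the note.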

Matrices A and B of Table 11 illustrate the commonalities and differences between multidimensional tau-equivalent reliability and related reliability coefficients. First, multidimensional tau-equivalent reliability yields the same values as stratified alpha, and its computation is easier to understand. Second, tau-equivalent reliability (i.e., alpha) underestimates the reliability when it is applied to multidimensional data. Third, the hierarchical omega values computed from the two data sets differ greatly, even though their reliabilities are equal.

Table 11.

A Computation Example of Multidimensional Tau-equivalent Reliability.

	A Multidimensional (Strong General Factor and Weak Group Factors)						B Multidimensional (Weak General Factor and Strong Group Factors)
	X1	X2	X3	X4	X5	X6	X1	X2	X3	X4	X5	X6
X1	8	5	5	4	4	4	10	8	8	1	1	1
X2	5	8	5	4	4	4	8	12	8	1	1	1
X3	5	5	7	4	4	4	8	8	12	1	1	1
X4	4	4	4	8	5	5	1	1	1	12	8	8
X5	4	4	4	5	8	5	1	1	1	8	10	8
X6	4	4	4	5	5	9	1	1	1	8	8	10
${\hat{ρ}}_{M T} = (6^{2} \cdot 4 + 3 \cdot 6 \cdot (5 - 4)) / 180 = .9$ ${\hat{ω}}_{H T} = (6^{2} \cdot 4) / 180 = .8$ ${\hat{ρ}}_{T} = \hat{α} = (6^{2} \cdot 4.4) / 180 = .88$ ${\hat{α}}_{Stratified} = 1 - (53 \cdot .1509 + 55 \cdot .1818) / 180 = .9$ $_{1} {\hat{ρ}}_{M T} = (3^{2} \cdot 5) / 53 = .8491$ $_{2} {\hat{ρ}}_{M T} = (3^{2} \cdot 5) / 55 = .8182$							${\hat{ρ}}_{M T} = (6^{2} \cdot 1 + 3 \cdot 6 \cdot (8 - 1)) / 180 = .9$ ${\hat{ω}}_{H T} = (6^{2} \cdot 1) / 180 = .2$ ${\hat{ρ}}_{T} = \hat{α} = (6^{2} \cdot 3.8) / 180 = .76$ ${\hat{α}}_{Stratified} = 1 - (82 \cdot .1220 + 80 \cdot .1) / 180 = .9$ $_{1} {\hat{ρ}}_{M T} = (3^{2} \cdot 8) / 82 = .8780$ $_{2} {\hat{ρ}}_{M T} = (3^{2} \cdot 8) / 80 = .9$
	C Unidimensional (Strong General Factor and No Group Factors)						D Unidimensional (Weak General Factor and No Group Factors)
	X1	X2	X3	X4	X5	X6	X1	X2	X3	X4	X5	X6
X1	10	4	4	4	4	4	20	1	1	1	1	1
X2	4	10	4	4	4	4	1	25	1	1	1	1
X3	4	4	8	4	4	4	1	1	25	1	1	1
X4	4	4	4	10	4	4	1	1	1	25	1	1
X5	4	4	4	4	12	4	1	1	1	1	25	1
X6	4	4	4	4	4	10	1	1	1	1	1	30
${\hat{ρ}}_{T} = \hat{α} = {\hat{ω}}_{H T} = (6^{2} \cdot 4) / 180 = .8$							${\hat{ρ}}_{T} = \hat{α} = {\hat{ω}}_{H T} = (6^{2} \cdot 1) / 180 = .2$
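The following Python sketch recomputes the Table 11 coefficients from the four matrices. Each matrix is rebuilt from its diagonal and its constant within-group and between-group covariances. The first two matrices give multidimensional tau-equivalent reliability, hierarchical omega, alpha, and stratified alpha. The unidimensional Matrices C and D are included to show that all four coefficients coincide when the within-group and between-group covariances are equal. The function names, and treating items 1 to 3 and 4 to 6 as the two subscales, are illustrative assumptions that follow the block structure of the matrices.

```python
GROUPS = (0, 0, 0, 1, 1, 1)
SUBSCALES = [(0, 1, 2), (3, 4, 5)]


def build(diag, within, between):
    """Covariance matrix with a given diagonal and constant within-/between-group covariances."""
    k = len(diag)
    return [[diag[i] if i == j else (within if GROUPS[i] == GROUPS[j] else between)
             for j in range(k)] for i in range(k)]


def total(cov, items):
    """Sum of all covariance elements among `items` (variance of their sum)."""
    return sum(cov[i][j] for i in items for j in items)


def alpha(cov, items):
    k = len(items)
    trace = sum(cov[i][i] for i in items)
    return k / (k - 1) * (1 - trace / total(cov, items))


def stratified_alpha(cov):
    error = sum(total(cov, s) * (1 - alpha(cov, s)) for s in SUBSCALES)
    return 1 - error / total(cov, range(len(cov)))


def mt_and_omega(cov, c_s, c_d, k=6, n=3):
    """Multidimensional tau-equivalent reliability and hierarchical omega."""
    var_x = total(cov, range(k))
    general = k ** 2 * c_d
    group = n * k * (c_s - c_d)
    return (general + group) / var_x, general / var_x


matrices = {  # name: (covariance matrix, within-group cov, between-group cov)
    "A": (build([8, 8, 7, 8, 8, 9], 5, 4), 5, 4),
    "B": (build([10, 12, 12, 12, 10, 10], 8, 1), 8, 1),
    "C": (build([10, 10, 8, 10, 12, 10], 4, 4), 4, 4),
    "D": (build([20, 25, 25, 25, 25, 30], 1, 1), 1, 1),
}
results = {}
for name, (cov, c_s, c_d) in matrices.items():
    rho_mt, omega_ht = mt_and_omega(cov, c_s, c_d)
    results[name] = tuple(round(v, 4) for v in
                          (rho_mt, omega_ht, alpha(cov, range(6)), stratified_alpha(cov)))
    print(name, *results[name])
# A 0.9 0.8 0.88 0.9
# B 0.9 0.2 0.76 0.9
# C 0.8 0.8 0.8 0.8
# D 0.2 0.2 0.2 0.2
```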

Let us further discuss the relationships among reliability, hierarchical omega, and unidimensionality based on Table 11. First, unidimensionality should be established before reliability is calculated. Many users do the reverse. They calculate reliability to establish unidimensionality, mistakenly believing that a high value of a unidimensional reliability coefficient indicates unidimensionality (Cortina, 1993; Green, Lissitz, & Mulaik, 1977; Hattie, 1985; Schmitt, 1996). Table 11 refutes this misconception. The tau-equivalent reliability estimates of Matrices A (.88) and B (.76) both exceed the commonly used cutoff of .70 (Lance, Butts, & Michels, 2006), even though both were computed from multidimensional data.

Second, hierarchical omega and unidimensionality are distinct concepts. A glance at Table 11 may give readers the misleading impression that the level of hierarchical omega reflects the degree of unidimensionality or homogeneity. However, whereas hierarchical omega is a matter of degree, dimensionality is a yes-or-no issue (Zinbarg et al., 2006): data are either unidimensional or multidimensional. Unidimensionality should be distinguished from the degree to which total test scores reflect a common construct (Reise, 2012). For example, Matrix A is multidimensional, but most of its variance is due to a general factor. Matrix D is unidimensional, but only a small fraction of its variance originates from a general factor.

Third, hierarchical omega is not a substitute for reliability but a complement to it. Although hierarchical omega originated from an alternative definition of reliability derived from multidimensional models, its characteristics differ from those of other reliability coefficients. For example, Matrices A and B of Table 11 show that the values of hierarchical omega are substantially smaller than those of the other reliability coefficients. The various reliability coefficients share the common definition $1 - \frac{σ_{E}^{2}}{σ_{X}^{2}}$, whereas hierarchical omega is defined as $1 - \frac{σ_{Group}^{2} + σ_{E}^{2}}{σ_{X}^{2}}$. That is, hierarchical omega treats the variance due to group factors in the same way as the variance due to errors. Neither reliability nor hierarchical omega offers complete information about the data. For example, knowing that the multidimensional tau-equivalent reliability is .9 does not tell us whether it was calculated from A or B. Likewise, the value of hierarchical omega alone cannot discriminate between A and C or between B and D. Hierarchical omega reveals an aspect of the data that reliability does not show, and using the two in combination gives a fuller picture of the data.
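Plugging the Table 11 components into the two definitions makes the contrast concrete. For Matrix A, $σ_{General}^{2} = 6^{2} \cdot 4 = 144$ and $σ_{Group}^{2} = 3 \cdot 6 \cdot (5 - 4) = 18$. For Matrix B, $σ_{General}^{2} = 6^{2} \cdot 1 = 36$ and $σ_{Group}^{2} = 3 \cdot 6 \cdot (8 - 1) = 126$. In both matrices, $σ_{E}^{2} = 180 - 162 = 18$.

$$
\begin{aligned}
\text{A:}\quad & 1 - \frac{σ_{E}^{2}}{σ_{X}^{2}} = 1 - \frac{18}{180} = .9, & 1 - \frac{σ_{Group}^{2} + σ_{E}^{2}}{σ_{X}^{2}} &= 1 - \frac{18 + 18}{180} = .8,\\
\text{B:}\quad & 1 - \frac{σ_{E}^{2}}{σ_{X}^{2}} = 1 - \frac{18}{180} = .9, & 1 - \frac{σ_{Group}^{2} + σ_{E}^{2}}{σ_{X}^{2}} &= 1 - \frac{126 + 18}{180} = .2.
\end{aligned}
$$

The two matrices have the same error variance and therefore the same reliability. Only the share of group-factor variance differs, and that is what hierarchical omega picks up.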

Table 12 shows a computation of bifactor reliability. The parameter estimates are ${\hat{λ}}_{1F} = {\hat{λ}}_{6F} = {\hat{λ}}_{31} = {\hat{λ}}_{42} = 1$, ${\hat{λ}}_{2F} = {\hat{λ}}_{5F} = {\hat{λ}}_{11} = {\hat{λ}}_{21} = {\hat{λ}}_{52} = 2$, ${\hat{λ}}_{3F} = {\hat{λ}}_{62} = 3$, ${\hat{λ}}_{4F} = 4$, and ${\hat{λ}}_{12} = {\hat{λ}}_{22} = {\hat{λ}}_{32} = {\hat{λ}}_{41} = {\hat{λ}}_{51} = {\hat{λ}}_{61} = 0$. No proportionality constraint holds (e.g., $1 : 2 : 3 \neq 2 : 2 : 1$). These values lead to ${\hat{ρ}}_{BF} = (169 + 61) / 260 = .8846$, $_{1} {\hat{ρ}}_{BF} = (36 + 25) / 75 = .8133$, $_{2} {\hat{ρ}}_{BF} = (49 + 36) / 101 = .8416$, and ${\hat{ω}}_{HBF} = 169 / 260 = .65$. The value of the tau-equivalent reliability (i.e., alpha) is $1.2 \cdot (1 - 88 / 260) = .7938$, which underestimates the reliability by .0908.

Table 12.

A Computation Example of Bifactor Reliability.

	A Observed Test Score Variance $σ_{X}^{2} = 260$							B Variances Due to a General Factor ${(1 + 2 + 3 + 4 + 2 + 1)}^{2} = 169$
	X1	X2	X3	X4	X5	X6		1	2	3	4	2	1
X1	10	6	5	4	2	1	1	1·1	1·2	1·3	1·4	1·2	1·1
X2	6	12	8	8	4	2	2	2·1	2·2	2·3	2·4	2·2	2·1
X3	5	8	15	12	6	3	3	3·1	3·2	3·3	3·4	3·2	3·1
X4	4	8	12	22	10	7	4	4·1	4·2	4·3	4·4	4·2	4·1
X5	2	4	6	10	14	8	2	2·1	2·2	2·3	2·4	2·2	2·1
X6	1	2	3	7	8	15	1	1·1	1·2	1·3	1·4	1·2	1·1
	C Variances Due to Group Factors ${(2 + 2 + 1)}^{2} + {(1 + 2 + 3)}^{2} = 61$						D Variances Due to Errors $σ_{E}^{2} = 30$
	2	2	1	1	2	3		X1	X2	X3	X4	X5	X6
2	2·2	2·2	2·1	0	0	0	X1	5	0	0	0	0	0
2	2·2	2·2	2·1	0	0	0	X2	0	4	0	0	0	0
1	1·2	1·2	1·1	0	0	0	X3	0	0	5	0	0	0
1	0	0	0	1·1	1·2	1·3	X4	0	0	0	5	0	0
2	0	0	0	2·1	2·2	2·3	X5	0	0	0	0	6	0
3	0	0	0	3·1	3·2	3·3	X6	0	0	0	0	0	5
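The following Python sketch rebuilds Table 12 from the bifactor estimates listed above. It takes the error variances from the diagonal of D. It first checks that the estimates reproduce the observed matrix A, then computes the whole-scale and subscale bifactor reliabilities, hierarchical omega, and alpha. The variable and function names are hypothetical.

```python
GENERAL = [1, 2, 3, 4, 2, 1]        # loadings on the general factor (lambda_iF)
GROUP = [2, 2, 1, 1, 2, 3]          # loading on each item's own group factor
MEMBERSHIP = [0, 0, 0, 1, 1, 1]     # group factor of each item
ERRORS = [5, 4, 5, 5, 6, 5]         # error variances (diagonal of D)

TABLE_12_A = [
    [10, 6, 5, 4, 2, 1],
    [6, 12, 8, 8, 4, 2],
    [5, 8, 15, 12, 6, 3],
    [4, 8, 12, 22, 10, 7],
    [2, 4, 6, 10, 14, 8],
    [1, 2, 3, 7, 8, 15],
]


def implied_covariance():
    """General-factor part + same-group part + error variance on the diagonal."""
    k = len(GENERAL)
    return [[GENERAL[i] * GENERAL[j]
             + (GROUP[i] * GROUP[j] if MEMBERSHIP[i] == MEMBERSHIP[j] else 0)
             + (ERRORS[i] if i == j else 0)
             for j in range(k)] for i in range(k)]


def components(items):
    """(general, group, error) variance of the unit-weighted sum over `items`."""
    general = sum(GENERAL[i] for i in items) ** 2
    factors = {MEMBERSHIP[i] for i in items}
    group = sum(sum(GROUP[i] for i in items if MEMBERSHIP[i] == f) ** 2 for f in factors)
    error = sum(ERRORS[i] for i in items)
    return general, group, error


def bifactor_reliability(items):
    g, s, e = components(items)
    return (g + s) / (g + s + e)


cov = implied_covariance()
assert cov == TABLE_12_A  # the estimates reproduce the observed matrix exactly

g, s, e = components(range(6))
trace = sum(cov[i][i] for i in range(6))
alpha = 6 / 5 * (1 - trace / (g + s + e))
print(g, s, e)  # 169 61 30
print(round(bifactor_reliability(range(6)), 4), round(bifactor_reliability(range(3)), 4),
      round(bifactor_reliability(range(3, 6)), 4), round(g / (g + s + e), 4),
      round(alpha, 4))  # 0.8846 0.8133 0.8416 0.65 0.7938
```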

Table 13 shows a computation of second-order factor reliability. The parameter estimates are ${\hat{λ}}_{11} = {\hat{λ}}_{31} = {\hat{λ}}_{52} = {\hat{λ}}_{62} = 5$, ${\hat{λ}}_{21} = {\hat{λ}}_{42} = 10$, ${\hat{λ}}_{12} = {\hat{λ}}_{22} = {\hat{λ}}_{32} = {\hat{λ}}_{41} = {\hat{λ}}_{51} = {\hat{λ}}_{61} = 0$, ${\hat{γ}}_{1} = .6$, ${\hat{γ}}_{2} = .8$, ${(1 - {\hat{γ}}_{1}^{2})}^{1/2} = .8$, ${(1 - {\hat{γ}}_{2}^{2})}^{1/2} = .6$, and $σ_{X}^{2} = {\hat{σ}}_{X}^{2} = 1400$. The numbers in the margins of matrices B and C are derived from the parameter estimates. For example, 3 in B is ${\hat{λ}}_{11} \cdot {\hat{γ}}_{1} = 5 \cdot .6 = 3$, and 4 in C is ${\hat{λ}}_{11} \cdot {(1 - {\hat{γ}}_{1}^{2})}^{1/2} = 5 \cdot .8 = 4$. A proportionality constraint holds: $3 : 6 : 3 = 4 : 8 : 4$ and $8 : 4 : 4 = 6 : 3 : 3$. These values lead to ${\hat{ρ}}_{SOF} = (784 + 400) / 1400 = .8457$, $_{1} {\hat{ρ}}_{SOF} = (144 + 256) / 496 = .8065$, $_{2} {\hat{ρ}}_{SOF} = (256 + 144) / 520 = .7692$, and ${\hat{ω}}_{HSOF} = 784 / 1400 = .56$. The value of the tau-equivalent reliability (i.e., alpha) is .7577, which underestimates the reliability by .088.

Table 13.

A Computation Example of Second-order Factor Reliability.

	A Observed Test Score Variance $σ_{X}^{2} = 1400$							B Variances Due to a Second-Order Factor ${(3 + 6 + 3 + 8 + 4 + 4)}^{2} = 784$
	Y1	Y2	Y3	Y4	Y5	Y6		3	6	3	8	4	4
Y1	60	50	25	24	12	12	3	3·3	3·6	3·3	3·8	3·4	3·4
Y2	50	126	50	48	24	24	6	6·3	6·6	6·3	6·8	6·4	6·4
Y3	25	50	60	24	12	12	3	3·3	3·6	3·3	3·8	3·4	3·4
Y4	24	48	24	130	50	50	8	8·3	8·6	8·3	8·8	8·4	8·4
Y5	12	24	12	50	60	25	4	4·3	4·6	4·3	4·8	4·4	4·4
Y6	12	24	12	50	25	80	4	4·3	4·6	4·3	4·8	4·4	4·4
	C Variances Due to Disturbances ${(4 + 8 + 4)}^{2} + {(6 + 3 + 3)}^{2} = 400$						D Variances Due to Errors $σ_{E}^{2} = 216$
	4	8	4	6	3	3		Y1	Y2	Y3	Y4	Y5	Y6
4	4·4	4·8	4·4	0	0	0	Y1	35	0	0	0	0	0
8	8·4	8·8	8·4	0	0	0	Y2	0	26	0	0	0	0
4	4·4	4·8	4·4	0	0	0	Y3	0	0	35	0	0	0
6	0	0	0	6·6	6·3	6·3	Y4	0	0	0	30	0	0
3	0	0	0	3·6	3·3	3·3	Y5	0	0	0	0	35	0
3	0	0	0	3·6	3·3	3·3	Y6	0	0	0	0	0	55
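The following Python sketch reproduces the Table 13 components from the second-order estimates. It computes the margins of B as $\hat{λ} \cdot \hat{γ}$ and the margins of C as $\hat{λ} \cdot (1 - \hat{γ}^{2})^{1/2}$, then derives the reliabilities, hierarchical omega, and alpha. The error variance of Y6 is taken as 55. That value is implied both by the diagonal of A ($80 - 5^{2}$) and by the stated total $σ_{E}^{2} = 216$. The names used are hypothetical.

```python
import math

LOADINGS = [5, 10, 5, 10, 5, 5]      # first-order loadings
FACTOR = [0, 0, 0, 1, 1, 1]          # first-order factor of each item
GAMMA = [0.6, 0.8]                   # second-order loadings gamma_1, gamma_2
ERRORS = [35, 26, 35, 30, 35, 55]    # error variances; Y6: 80 - 5**2 = 55


def margins():
    """Margins of B (lambda * gamma) and C (lambda * sqrt(1 - gamma^2))."""
    second = [LOADINGS[i] * GAMMA[FACTOR[i]] for i in range(6)]
    disturbance = [LOADINGS[i] * math.sqrt(1 - GAMMA[FACTOR[i]] ** 2) for i in range(6)]
    return second, disturbance


def sof_parts(items):
    """Sums of B, C, and D restricted to `items`."""
    second, disturbance = margins()
    b = sum(second[i] for i in items) ** 2
    c = sum(sum(disturbance[i] for i in items if FACTOR[i] == f) ** 2
            for f in {FACTOR[i] for i in items})
    d = sum(ERRORS[i] for i in items)
    return b, c, d


def reliability(items):
    b, c, d = sof_parts(items)
    return (b + c) / (b + c + d)


b, c, d = sof_parts(range(6))
trace = sum(l ** 2 for l in LOADINGS) + sum(ERRORS)  # diagonal of A sums to 516
alpha = 6 / 5 * (1 - trace / (b + c + d))
print(round(b, 4), round(c, 4), d)  # 784.0 400.0 216
print(round(reliability(range(6)), 4), round(reliability(range(3)), 4),
      round(reliability(range(3, 6)), 4), round(b / (b + c + d), 4),
      round(alpha, 4))  # 0.8457 0.8065 0.7692 0.56 0.7577
```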

Table 14 shows a computation of correlated factor reliability. The parameter estimates are ${\hat{λ}}_{11} = {\hat{λ}}_{21} = {\hat{λ}}_{42} = {\hat{λ}}_{62} = 2$, ${\hat{λ}}_{31} = 4$, ${\hat{λ}}_{52} = 3$, ${\hat{λ}}_{12} = {\hat{λ}}_{22} = {\hat{λ}}_{32} = {\hat{λ}}_{41} = {\hat{λ}}_{51} = {\hat{λ}}_{61} = 0$, ${\hat{φ}}_{11} = {\hat{φ}}_{22} = 1$, and ${\hat{φ}}_{12} = {\hat{φ}}_{21} = .5$. The numbers in the margins of matrices B and C are derived from the parameter estimates. For example, 1.5 in C is ${\hat{φ}}_{12} \cdot {\hat{λ}}_{52} = .5 \cdot 3 = 1.5$. These values lead to ${\hat{ρ}}_{CF} = (113 + 56) / 200 = .8450$, $_{1} {\hat{ρ}}_{CF} = 64 / 76 = .8421$, and $_{2} {\hat{ρ}}_{CF} = 49 / 68 = .7206$. The value of the tau-equivalent reliability (i.e., alpha) is .7680, which underestimates the reliability by .077.

Table 14.

A Computation Example of Correlated Factor Reliability.

	A Observed Test Score Variance $σ_{X}^{2} = 200$							B Variances Only Due to Lambdas ${(2 + 2 + 4)}^{2} + {(2 + 3 + 2)}^{2} = 113$
	X1	X2	X3	X4	X5	X6		2	2	4	2	3	2
X1	8	4	8	2	3	2	2	2·2	2·2	2·4	0	0	0
X2	4	8	8	2	3	2	2	2·2	2·2	2·4	0	0	0
X3	8	8	20	4	6	4	4	4·2	4·2	4·4	0	0	0
X4	2	2	4	10	6	4	2	0	0	0	2·2	2·3	2·2
X5	3	3	6	6	16	6	3	0	0	0	3·2	3·3	3·2
X6	2	2	4	4	6	10	2	0	0	0	2·2	2·3	2·2
	C Variances Due to Correlations and Lambdas $0.5 \cdot 2 \cdot (2 + 2 + 4) \cdot (2 + 3 + 2) = 56$						D Variances Due to Errors $σ_{E}^{2} = 31$
	2	2	4	2	3	2		X1	X2	X3	X4	X5	X6
1	0	0	0	1·2	1·3	1·2	X1	4	0	0	0	0	0
1	0	0	0	1·2	1·3	1·2	X2	0	4	0	0	0	0
2	0	0	0	2·2	2·3	2·2	X3	0	0	4	0	0	0
1	1·2	1·2	1·4	0	0	0	X4	0	0	0	6	0	0
1.5	1.5·2	1.5·2	1.5·4	0	0	0	X5	0	0	0	0	7	0
1	1·2	1·2	1·4	0	0	0	X6	0	0	0	0	0	6
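The following Python sketch reproduces the Table 14 decomposition from the correlated-factor estimates. B is the loadings-only part, C is the cross-factor part weighted by ${\hat{φ}}_{12}$, and D holds the errors, with error variances read from the diagonal of D. It then computes the whole-scale and subscale reliabilities and alpha from the model-implied matrix. All names are hypothetical.

```python
LOADINGS = [2, 2, 4, 2, 3, 2]
FACTOR = [0, 0, 0, 1, 1, 1]
PHI = [[1.0, 0.5], [0.5, 1.0]]    # factor correlation matrix
ERRORS = [4, 4, 4, 6, 7, 6]       # diagonal of D


def implied_covariance():
    k = len(LOADINGS)
    return [[LOADINGS[i] * LOADINGS[j] * PHI[FACTOR[i]][FACTOR[j]]
             + (ERRORS[i] if i == j else 0) for j in range(k)] for i in range(k)]


def cf_parts(items):
    """(B, C, D): loadings-only, cross-factor (correlation-weighted), and error sums."""
    factors = sorted({FACTOR[i] for i in items})
    sums = {f: sum(LOADINGS[i] for i in items if FACTOR[i] == f) for f in factors}
    b = sum(sums[f] ** 2 * PHI[f][f] for f in factors)
    c = sum(PHI[f][g] * sums[f] * sums[g] for f in factors for g in factors if f != g)
    d = sum(ERRORS[i] for i in items)
    return b, c, d


def reliability(items):
    b, c, d = cf_parts(items)
    return (b + c) / (b + c + d)


cov = implied_covariance()
total = sum(map(sum, cov))
trace = sum(cov[i][i] for i in range(6))
alpha = 6 / 5 * (1 - trace / total)
print(cf_parts(range(6)), total)  # (113.0, 56.0, 31) 200.0
print(round(reliability(range(6)), 4), round(reliability(range(3)), 4),
      round(reliability(range(3, 6)), 4), round(alpha, 4))  # 0.845 0.8421 0.7206 0.768
```

Within a single subscale the cross-factor term is empty, so each subscale reliability reduces to the loadings-only part over that subscale's total.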

RelCalc: A Calculator That Computes Reliability Coefficients

A key to ending the blind use of tau-equivalent reliability (i.e., alpha) is making its alternatives easier to use. Thus far, this study has offered several remedies for the common misuse of reliability coefficients. It now offers a quick fix for the last, but not least, problem. Popular statistical packages such as SPSS and SAS calculate tau-equivalent reliability automatically. In contrast, commonly used SEM software packages, except EQS (Bentler, 2006), do not produce SEM-based reliability estimates. Users of such programs must calculate a reliability coefficient by hand from its formula, which is inconvenient and error-prone. This may be one reason why organizational researchers who use SEM rely on tau-equivalent reliability instead of SEM-based reliability coefficients. If they could compute congeneric reliability with a few mouse clicks, they would likely use it far more often. I developed a calculator to overcome this obstacle. The history of reliability coefficients shows that publishing a method without a name tends to produce uninformative or confusing names. I therefore call this calculator RelCalc. It is free to use and distribute.²

RelCalc is a Microsoft Excel spreadsheet consisting of two modules. The first module is designed to help users choose the unidimensional reliability coefficient that fits their data. Graham (2006) gave an excellent tutorial on choosing between tau-equivalent reliability and congeneric reliability. Following his advice, however, requires readers to fully understand the statistical procedures used to examine the tau-equivalency assumption. As the results of the first section show, this teach-them-how-to-fish approach has had little effect on how typical organizational researchers use reliability coefficients. This study adopts a give-them-a-fish approach and introduces a program that automates the required statistical procedure.

The first module examines whether the data meet the conditions of parallelism or tau-equivalency. It then calculates three unidimensional reliability coefficients from a covariance matrix entered by the user. This idea originated from an observation by Miles (2005). Maximum likelihood estimation, the most commonly used estimation method in SEM, is an optimization problem: finding the parameter values that minimize a discrepancy function. Microsoft Excel offers an optimization tool for solving such problems.
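The following sketch illustrates the optimization idea outside Excel. It is not RelCalc's implementation, and it assumes NumPy and SciPy are available. It fits a one-factor (congeneric) model by minimizing the normal-theory maximum likelihood discrepancy $F = \ln|\Sigma| + \mathrm{tr}(S\Sigma^{-1}) - \ln|S| - p$ with `scipy.optimize.minimize`, then computes congeneric reliability. The starting values, bounds, and the test matrix (generated from hypothetical loadings) are illustrative choices, and the sketch does not test the parallel or tau-equivalent constraints.

```python
import numpy as np
from scipy.optimize import minimize


def ml_discrepancy(params, S):
    """Normal-theory ML discrepancy for a one-factor model:
    F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p."""
    p = S.shape[0]
    lam, theta = params[:p], params[p:]
    sigma = np.outer(lam, lam) + np.diag(theta)
    sign, logdet_sigma = np.linalg.slogdet(sigma)
    if sign <= 0:  # reject candidates that are not positive definite
        return 1e10
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(sigma)) - logdet_s - p


def fit_congeneric(S):
    """Minimize the discrepancy over loadings and (positive) error variances."""
    p = S.shape[0]
    start = np.concatenate([np.sqrt(np.diag(S) / 2), np.diag(S) / 2])
    bounds = [(None, None)] * p + [(1e-6, None)] * p
    result = minimize(ml_discrepancy, start, args=(S,), method="L-BFGS-B",
                      bounds=bounds, options={"ftol": 1e-12, "gtol": 1e-10})
    return result.x[:p], result.x[p:]


def congeneric_reliability(lam, theta):
    return lam.sum() ** 2 / (lam.sum() ** 2 + theta.sum())


# Hypothetical covariance matrix generated from known parameters.
true_lam = np.array([1.0, 2.0, 3.0, 4.0])
true_theta = np.array([1.0, 1.0, 2.0, 2.0])
S = np.outer(true_lam, true_lam) + np.diag(true_theta)

lam_hat, theta_hat = fit_congeneric(S)
print(round(congeneric_reliability(lam_hat, theta_hat), 3))  # close to 100/106
```

Because $S$ here is exactly model-implied, the minimum discrepancy is zero and the estimates should recover the generating loadings up to sign. The sign does not affect $(\sum \hat{λ})^{2}$.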

The second module of RelCalc computes all the multidimensional reliability coefficients included in this study (i.e., those in Tables 6, 7, and 8). Unlike the first module, the second module does not estimate the parameters of measurement models. Users must copy and paste parameter estimates obtained from SEM software.

Conclusion

Reliability coefficients should be understood as building blocks of a single system rather than as a collection of unrelated methods. Although various formulas have been proposed to estimate the reliability of unidimensional data, they all start from a single formula: the ratio of true score variance to test score variance. Because the variance of a sum equals the sum of all elements in the covariance matrix of its components, every reliability coefficient is ultimately a statement about how that covariance matrix is decomposed. This study demonstrated such decompositions of the covariance matrix for SEM-based reliability coefficients. We do not need a dozen seemingly unrelated names or a dozen seemingly unrelated formula expressions for reliability coefficients. We need only a single principle that connects them all.
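The identity on which the conclusion rests can be checked numerically on any data set: the variance of a unit-weighted sum equals the sum of all elements of the items' covariance matrix. The following Python sketch uses made-up scores for illustration. Both quantities use the $n - 1$ denominator.

```python
from statistics import mean, variance

# Hypothetical item scores: rows are respondents, columns are items.
scores = [
    [3, 4, 2],
    [5, 5, 4],
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
]


def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator, matching statistics.variance."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


items = list(zip(*scores))  # one tuple of scores per item
k = len(items)
sum_of_cov_elements = sum(sample_covariance(items[i], items[j])
                          for i in range(k) for j in range(k))
variance_of_total = variance([sum(row) for row in scores])
print(round(sum_of_cov_elements, 6), round(variance_of_total, 6))  # 13.7 13.7
```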

Footnotes

Appendix A

Appendix B

Acknowledgments

I am deeply grateful to Adam Meade, associate editor, and three anonymous reviewers for their invaluable guidance and constructive comments. I also appreciate helpful support from Kyung Su Liu, Seonghoon Kim, Yanyun Yang, Richard Zinbarg, Peter M. Bentler, and Jiheon Kim. All errors are my sole responsibility.

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Research Grant of Kwangwoon University in 2015.

Notes

References

Angoff, W. H. (1953). Test reliability and effective test length. Psychometrika, 18(1), 1–14.

Armor, D. J. (1974). Theta reliability and factor scaling. In H. L. Costner (Ed.), Sociological methodology (pp. 17–50). San Francisco, CA: Jossey-Bass.

Bentler, P. M. (2006). EQS 6 structural equations program manual. Encino, CA: Multivariate Software.

Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency reliability. Psychometrika, 74(1), 137–143.

Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: John Wiley.

Brown (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3(3), 296–322.

Brunner, Nagy, & Wilhelm (2012). A tutorial on hierarchically structured constructs. Journal of Personality, 80(4), 796–846.

Cho & Kim (2015). Cronbach’s coefficient alpha: Well known but poorly understood. Organizational Research Methods, 18(2), 207–230.

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78(1), 98–104.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334.

Cronbach, L. J. (1978). Citation classics. Current Contents, 13, 263.

Cronbach, L. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64(3), 391–418.

Dunn, T. J., Baguley, & Brunsden (2013). From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. British Journal of Psychology, 105(3), 399–412.

Falk, C. F., & Savalei (2011). The relationship between unstandardized and standardized alpha, true reliability, and the underlying measurement model. Journal of Personality Assessment, 93(5), 445–453.

Feldt, L. S. (1975). Estimation of the reliability of a test divided into two parts of unequal length. Psychometrika, 40(4), 557–561.

Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105–146). New York, NY: American Council on Education and Macmillan.

Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd Ltd.

Flanagan, J. C. (1937). A proposed procedure for increasing the efficiency of objective tests. Journal of Educational Psychology, 28(1), 17–21.

Graham, J. M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability: What they are and how to use them. Educational and Psychological Measurement, 66(6), 930–944.

Green, S. B., Lissitz, R. W., & Mulaik, S. A. (1977). Limitations of coefficient alpha as an index of test unidimensionality. Educational and Psychological Measurement, 37(4), 827–838.

Green, S. B., & Yang (2009a). Commentary on coefficient alpha: A cautionary tale. Psychometrika, 74(1), 121–135.

Green, S. B., & Yang (2009b). Reliability of summed item scores using structural equation modeling: An alternative to coefficient alpha. Psychometrika, 74(1), 155–167.

Green, S. B., & Yang (2015). Evaluation of dimensionality in the assessment of internal consistency reliability: Coefficient alpha and omega coefficients. Educational Measurement: Issues and Practice, 34(4), 14–20.

Guttman (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4), 255–282.

Haertel, E. H. (2006). Reliability. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 65–110). Westport, CT: American Council on Education and Praeger.

Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate data analysis. Upper Saddle River, NJ: Pearson.

Hattie (1985). Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9(2), 139–164.

Hoyt (1941). Test reliability estimated by analysis of variance. Psychometrika, 6(3), 153–160.

Hunt, T. D., & Bentler, P. M. (2015). Quantile lower bounds to reliability based on locally optimal splits. Psychometrika, 80(1), 182–195.

Jöreskog, K. G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36(2), 109–133.

Kamata, Turhan, & Darandari (2003, April). Estimating reliability for multidimensional composite scale scores. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2(3), 151–160.

Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria. Organizational Research Methods, 9(2), 202–220.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Lucke, J. F. (2005). “Rassling the hog”: The influence of correlated item error on internal consistency, classical reliability and congeneric reliability. Applied Psychological Measurement, 29(2), 106–125.

McDonald, R. P. (1978). Generalizability in factorable domains: “Domain validity and generalizability”: 1. Educational and Psychological Measurement, 38(1), 75–79.

McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale, NJ: Lawrence Erlbaum.

McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum.

Miles, J. N. V. (2005). Confirmatory factor analysis using Microsoft Excel. Behavior Research Methods, 37(4), 672–676.

Miller, M. B. (1995). Coefficient alpha: A basic introduction from the perspectives of classical test theory and structural equation modeling. Structural Equation Modeling, 2(3), 255–273.

Mosier, C. I. (1941). A short cut in the estimation of split-halves coefficients. Educational and Psychological Measurement, 1(1), 407–408.

Novick, M. R., & Lewis (1967). Coefficient alpha and the reliability of composite measurements. Psychometrika, 32(1), 1–13.

Osburn, H. G. (2000). Coefficient alpha and related internal consistency reliability coefficients. Psychological Methods, 5(3), 343–355.

Padilla, M. A., & Divers (2013). Coefficient omega bootstrap confidence intervals: Nonnormal distributions. Educational and Psychological Measurement, 73(6), 956–972.

Peters, G. J. Y. (2014). The alpha and the omega of scale reliability and validity. European Health Psychologist, 16(2), 56–69.

Peterson, R. A., & Kim (2013). On the relationship between coefficient alpha and composite reliability. Journal of Applied Psychology, 98(1), 194–198.

Rae (2008). A note on using alpha and stratified alpha to estimate the reliability of a test composed of item parcels. British Journal of Mathematical and Statistical Psychology, 61(2), 515–525.

Rajaratnam, Cronbach, L. J., & Gleser, G. C. (1965). Generalizability of stratified-parallel tests. Psychometrika, 30(1), 39–56.

Raju, N. S. (1970). New formula for estimating total test reliability from parts of unequal length. Proceedings of the 78th Annual Convention of APA, 5, 143–144.

Raju, N. S. (1977). A generalization of coefficient alpha. Psychometrika, 42(4), 549–565.

Raykov (2012). Scale construction and development using structural equation modeling. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 472–492). New York, NY: Guilford.

Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47(5), 667–696.

Revelle (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1), 57–74.

Revelle & Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: Comments on Sijtsma. Psychometrika, 74(1), 145–154.

Rulon, P. J. (1939). A simplified procedure for determining the reliability of a test by split-halves. Harvard Educational Review, 9, 99–103.

Schmid & Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1), 53–61.

Schmidt, F. L., & Ilies (2003). Beyond alpha: An empirical examination of the effects of different sources of measurement error on reliability estimates for measures of individual differences constructs. Psychological Methods, 8(2), 206–224.

Schmitt (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8(4), 350–353.

Sijtsma (2009a). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107–120.

Sijtsma (2009b). Reliability beyond theory and into practice. Psychometrika, 74(1), 169–173.

Spearman (1910). Correlation calculated from faulty data. British Journal of Psychology, 3(3), 271–295.

Tang & Cui (2012, April). A simulation study for comparing three lower bounds to reliability. Paper presented at the annual meeting of the American Educational Research Association, Vancouver, Canada.

Ten Berge, J. M. F., & Zegers, F. E. (1978). A series of lower bounds to the reliability of a test. Psychometrika, 43, 575–579.

van der Ark, L. A., van der Palm, D. W., & Sijtsma (2011). A latent class approach to estimating test-score reliability. Applied Psychological Measurement, 35(5), 380–392.

Werts, C. E., Rock, D. R., Linn, R. L., & Jöreskog, K. G. (1978). A general method of estimating the reliability of a composite. Educational and Psychological Measurement, 38(4), 933–938.

Yang & Green, S. B. (2011). Coefficient alpha: A reliability coefficient for the 21st century? Journal of Psychoeducational Assessment, 29(4), 377–392.

Yung, Y. F., Thissen, & McLeod, L. D. (1999). On the relationship between the higher-order factor model and the hierarchical factor model. Psychometrika, 64(2), 113–128.

Zinbarg, R. E., Revelle, & Yovel (2007). Estimating ω_h for structures containing two group factors: Perils and prospects. Applied Psychological Measurement, 31(2), 135–157.

Zinbarg, R. E., Revelle, & Yovel (2005). Cronbach’s α, Revelle’s β, and McDonald’s ω_H: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1), 123–133.

Zinbarg, R. E., Yovel, Revelle, & McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale’s indicators: A comparison of estimators for ω_h. Applied Psychological Measurement, 30(2), 121–144.