A Comment on the Estimation of the Reliability of Multidimensional Marketing Constructs: A Store Personality Scale Application

Abstract

Notwithstanding the prevalence of multidimensional constructs in marketing research, very few reliability analyses have taken into account the multidimensional structure of their empirical data. In this article, the author argues that if a scale can be divided into two or more distinct though related dimensions, its internal consistency should be evaluated by coefficients designed for such cases. This article presents stratified alpha as a viable alternative to the common practices of computing alpha for the whole scale and averaging alpha values. It also illustrates, using D’Astous and Lévesque’s store personality scale, that the discrepancy between stratified alpha and either alpha or the mean of the sub-scale alphas might be substantial. Thus, it would seem more scientifically sound to compute and report stratified alpha.

Keywords

Multidimensional constructs, reliability, internal consistency, alpha, stratified alpha, store personality

Introduction

Despite some criticisms (see, e.g., Bergkvist & Rossiter, 2007; Drolet & Morrison, 2001; Rossiter, 2011), the use of multi-item scales is commonplace in marketing research. To confirm this, one has only to glance through the extensive handbooks of marketing scales by Bearden, Netemeyer and Haws (2011) and Bruner, Hensel and James (2005).

Within the well-known marketing measure development literature, reliability has always been a critical issue, since reliability is a necessary (although not sufficient) condition for validity (Churchill, 1979; Netemeyer, Bearden & Sharma, 2003; Peter, 1979, 1981). Basically, reliability can be seen as the correlation of a scale with a hypothetical one that truly measures what it is supposed to measure (Lord & Novick, 1968, pp. 60–61). Construct validity, in turn, refers to the degree to which a measure assesses the construct it purports to measure (Peter, 1981, p. 134).

Marketing researchers increasingly deal with multifaceted constructs (e.g., store personality or service quality) and have to develop or borrow scales that tap two or more interrelated dimensions or underlying factors (i.e., multidimensional scales). A construct is said to be multidimensional (vs unidimensional) if its items show statistical evidence of tapping more than one (vs only one) distinct though related underlying factor (Netemeyer et al., 2003, p. 9). When developing or simply using multidimensional scales, Churchill (1979, p. 69) explicitly recommends, in his highly influential article, that ‘the reliability of the total construct would not be measured through coefficient alpha’. Churchill (1979) also states that the reliability of the entire scale should be gauged through what Nunnally (1967, pp. 226–235) calls ‘the formula for the reliability of linear combination of measures’, a formula that the author labels here, to render unto Caesar that which is Caesar’s, stratified alpha (Cronbach, 1951; Cronbach, Schönemann & McKie, 1965; Rajaratnam, Cronbach & Gleser, 1965).

Notwithstanding the prevalence of multidimensional constructs in marketing research (Bearden et al., 2011; Bruner et al., 2005) and Churchill’s (1979) early directives, one notes with dismay that, in the euphoria of scale development, very few reliability analyses have taken into account the multidimensionality of their empirical data. A sampling of articles published in recent years in highly regarded marketing journals indicates that in most cases, marketing researchers content themselves with reporting alpha for each dimension (e.g., Brakus, Schmitt & Zarantonello, 2009; Das, 2014; Das, Guin & Datta, 2013; Jha & Bhattacharyya, 2013; Koçak, Abimbola & Özer, 2007; Willems & Swinnen, 2011). Many other researchers lapse into the common and disquieting habit of computing alpha for both the dimensions and the entire scale (e.g., Richins, 2004; Sung, Choi, Ahn & Song, 2015). A few others report averages of alpha values (e.g., Smith, MacKenzie, Yang, Buchholz & Darley, 2007). Even fewer provide estimates of stratified alpha (e.g., Bearden, Money & Nevins, 2006) or of another multidimensional reliability index (e.g., Finn & Kayandé, 2004). Thus, it seems that information about a very important psychometric property of multidimensional marketing constructs (i.e., reliability) is most of the time either missing or improperly estimated (i.e., using alpha or averaging alpha values). The upshot is inaccurate and vacuous reliability estimates that mislead researchers and readers alike. Worse, as Lee and Hooley (2005) aptly state, it is quite plausible that we (i.e., marketing researchers) compute and report alpha as a mechanistic ritual without fully understanding its psychometric and statistical underpinnings.

Within this article and consistent with Churchill (1979), the author argues that if a scale can be divided into two or more distinct yet related dimensions or sub-scales, the scale’s internal consistency must, at least for the sake of scientific rigour, be evaluated by coefficients designed for such cases. Though marketing researchers have at their disposal a full array of reliability estimators that take into account the multidimensionality of their measures (see, e.g., Heise & Bohrnstedt, 1970; Li, Rosenthal & Rubin, 1996; McDonald, 1985, 1999; Werts, Rock, Linn & Jöreskog, 1978), the author suggests that stratified alpha is an attractive alternative for at least two reasons. The first reason is accuracy of estimation: simulation-based studies by Osburn (2000) and Kamata, Turhan and Darandari (2003) clearly designate stratified alpha as the most dependable formula for assessing the reliability of a multidimensional measure. In the former study, Osburn (2000) compares alpha to related internal consistency reliability coefficients and shows that, in his simulated multidimensional data, stratified alpha is exactly equal to the true reliability (Osburn, 2000, pp. 350–352). In the latter, Kamata et al. (2003) evaluate stratified alpha against Li et al.’s (1996) maximal reliability and McDonald’s (1999) coefficient omega and point out that ‘stratified alpha appeared to be the most reliable procedure among the three alternative methods’ (Kamata et al., 2003, p. 15). The second reason is simplicity of calculation: unlike other coefficients that involve sophisticated mathematical derivations, stratified alpha is well within the reach of almost every researcher. Indeed, as this investigation shows later, the three pieces of information required to compute stratified alpha (the sub-scale variances, the sub-scale alphas and the total-scale variance) can be effortlessly obtained from popular statistics packages such as SPSS.

The primary goal of this article, therefore, is to instruct marketing researchers in stratified alpha’s significance and characteristics in an easily understandable format. This investigation complements studies by Osburn (2000), Kamata et al. (2003) and Rae (2007) as it uses real-life data rather than simulations and hypothetical data. This research is also different from a recent study by Rae (2008) as it offers a practical point of view instead of being purely theoretical (i.e., mathematical derivation).

The remainder of this report is organized as follows. The second section provides a brief literature review on the concepts of reliability and internal consistency. The next section deals with Cronbach’s (1951) alpha and its underlying assumptions. The fourth section focuses on stratified alpha. The fifth section reports an empirical illustration using D’Astous and Lévesque’s (2003) store personality scale. The article closes with a summary, recommendations, avenues for further research and limitations.

Literature Review

Reliability

No discussion of reliability would be complete without evoking the classical true score model (Lord & Novick, 1968, Chapter 3) or, in Lumsden’s (1976, p. 254) words, the ‘venerable T model’. Within the framework of classical test theory (CTT), the observed score x_i of an item i is decomposed into a true score t_i and an error score e_i.

x_{i} = t_{i} + e_{i}

(1)

For a scale composed of k items i (i = 1, …, k), the composite observed score X is equal to its composite true score T plus its composite error score E.

X = T + E

(2)

where X = x₁ + x₂ + … + x_k, T = t₁ + t₂ + … + t_k and E = e₁ + e₂ + … + e_k.

With this model, it is assumed that (i) the measurement error scores for a respondent are uncorrelated with that individual’s true scores; (ii) the measurement error scores are expected to sum to zero over the population of respondents; and (iii) the item error scores are mutually uncorrelated. Recall that the true score is the latent construct being measured (presuming no practice or changing motivation effects).¹ In CTT, it follows that the composite observed variance $σ_{X}^{2}$ is equal to the sum of the composite true variance $σ_{T}^{2}$ and the composite error variance $σ_{E}^{2}$, that is,

σ_{X}^{2} = σ_{T}^{2} + σ_{E}^{2} = σ_{C I}^{2} + σ_{S E}^{2} + σ_{R E}^{2} .

(3)

In Equation (3), CI denotes the construct of interest, SE the systematic error and RE the random error (Streiner, 2003, p. 100). Reliability can then be defined as ‘a measure of the degree of true-score variation relative to observed score variation’ (Lord & Novick, 1968, p. 61), that is, $\frac{σ_{T}^{2}}{σ_{X}^{2}}$. Thus, higher reliabilities indicate that a smaller portion of observed variance is due to errors in measurement.

\frac{σ_{T}^{2}}{σ_{X}^{2}} = 1 - \frac{σ_{E}^{2}}{σ_{X}^{2}}

(4)

According to Lord and Novick (1968, p. 61), a reliability coefficient can also be defined ‘as the squared correlation between observed score[s] and true score[s]’.

ρ_{X T}^{2} = \frac{σ_{T}^{2}}{σ_{X}^{2}} = \frac{σ_{T}^{2}}{σ_{T}^{2} + σ_{E}^{2}}

(5)

It follows from Equation (5) that the smaller $σ_{E}^{2}$ is relative to $σ_{T}^{2}$, the closer $ρ_{XT}^{2}$ will be to 1 (in the sense that all the observed score variance can be ascribed to variation in the true scores). The quantity $ρ_{XT}^{2}$ is customarily referred to as the ‘true’ or ‘ideal’ reliability (e.g., Cronbach, Rajaratnam & Gleser, 1965; Graham, 2006). However, because true scores are unobservable, $ρ_{XT}^{2}$ exists only in theory, and various methods for estimating it from either single or multiple administrations have been proposed.
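In a simulation, unlike in real data, the true scores are known, so Equations (4) and (5) can be checked directly. The Python sketch below is illustrative only. It assumes normally distributed, mutually independent true and error scores with σ_T = 1 and σ_E = 0.5, so the population reliability is 1/(1 + 0.25) = 0.8. The helper names are hypothetical. Both the variance ratio and the squared correlation between X and T should come out close to 0.8. They differ slightly in a finite sample because the sample covariance between T and E is not exactly zero.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    # Population (divide-by-n) covariance; the denominator cancels in the ratios below.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def simulate_reliability(n=100_000, sd_true=1.0, sd_error=0.5, seed=1):
    """Simulate X = T + E and return (sigma_T^2 / sigma_X^2, rho_XT^2)."""
    rng = random.Random(seed)
    t = [rng.gauss(0.0, sd_true) for _ in range(n)]
    e = [rng.gauss(0.0, sd_error) for _ in range(n)]  # drawn independently of T
    x = [ti + ei for ti, ei in zip(t, e)]
    var_t, var_x = cov(t, t), cov(x, x)
    variance_ratio = var_t / var_x                     # Equation (4), left-hand side
    rho_xt_squared = cov(x, t) ** 2 / (var_x * var_t)  # squared correlation, Equation (5)
    return variance_ratio, rho_xt_squared


ratio, rho2 = simulate_reliability()
print(f"variance ratio = {ratio:.3f}, squared correlation = {rho2:.3f}")
```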

Internal Consistency as a Reliability Method

Since Charles Spearman introduced the concept more than a century ago, many definitions of reliability have been offered in the literature. Most of these definitions are either theoretical (i.e., the proportion of true score variance to total variance) or descriptions of how reliability evidence is obtained. Methodologically speaking, reliability can be defined as the extent to which a sample’s patterns of responses to items are consistent or repeatable across items (i.e., internal consistency), forms of a scale intended to measure the same construct (i.e., alternate form reliability), measurement occasions (i.e., test–retest reliability) or raters (i.e., inter-rater agreement) (Helms, Henze, Sass & Mifsud, 2006, p. 632). Among reliability methods, internal consistency is by far the most widely used for gauging the reliability of the scores on a measurement scale (Hogan, Benjamin & Brezinski, 2000). One reason for this is that it is the only reliability method that does not require two or more administrations of the scale, two or more forms of the scale or two or more raters. It therefore imposes a much lighter data collection burden than test–retest, alternate form or inter-rater reliability. In internal consistency reliability, a single scale administered to a single sample of respondents on a single occasion is used to judge how consistent the responses are across the different items measuring the same construct. As Charter (2003, p. 291) points out, an ‘internal consistency coefficient is a measure of the “here-and-now on-the spot” reliability’. Several coefficients have been proposed for assessing internal consistency reliability (cf. Feldt & Charter, 2003; Osburn, 2000; Zinbarg, Revelle, Yovel & Li, 2005), and there have been calls for alpha’s replacement, the most prominent being those by Rossiter (2002), Bergkvist and Rossiter (2007) and Sijtsma (2009). Nevertheless, Cronbach’s (1951) coefficient alpha holds a nearly unassailable position within marketing research (Bruner & Hensel, 1993; Churchill, 1979; Duhachek, Coughlan & Iacobucci, 2005; Peterson, 1994; Voss, Stem & Fotopoulos, 2000).

Coefficient Alpha and Essential Tau-equivalency

Also known as the generalized version of Kuder and Richardson’s formula 20 (or KR-20 for short) or as the equivalent of Guttman’s third lower bound λ₃, Cronbach’s (1951) coefficient alpha is usually expressed as follows (Cronbach, 1951, p. 299, Equation 2):

α = \frac{k}{k - 1} (1 - \frac{\sum_{i = 1}^{k} σ_{i}^{2}}{σ_{X}^{2}}),

(6)

where k is the number of items i in the scale (i = 1, …, k), $σ_{i}^{2}$ is the variance of the ith item and $σ_{X}^{2}$ is the variance of the entire scale. This variance can be reconstructed as the sum of the item variances plus the sum of the item covariances over all pairs with i ≠ j, that is, $σ_{X}^{2} = \sum_{i = 1}^{k} σ_{i}^{2} + \sum_{i = 1}^{k} \sum_{j \neq i} σ_{ij}$. Substituting this expression into Equation (6) yields Equation (7).

α = \frac{k}{k - 1} \left(1 - \frac{\sum_{i = 1}^{k} σ_{i}^{2}}{\sum_{i = 1}^{k} σ_{i}^{2} + \sum_{i = 1}^{k} \sum_{j \neq i} σ_{ij}}\right)

(7)

In Equation (7), the leading factor k/(k − 1) tends to 1 as k increases. The denominator of the second term has two components: the first is the same as the numerator, and the second is the sum of the covariances of all pairs of items i and j. Thus, the more the items covary relative to the sum of their variances, the smaller the ratio and the higher alpha will be. As such, alpha ‘represents the proportion of a scale’s total variance that is attributable to a common source—that common source being the true score of the latent construct being measured’ (Netemeyer et al., 2003, p. 49).
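Equation (7) translates directly into code. The Python sketch below computes alpha from an item covariance matrix stored as a nested list. The function name `cronbach_alpha` is ours and does not come from any package. The usage example uses hypothetical covariance values: item variances are held at 1 while the common inter-item covariance grows, so the ratio in Equation (7) shrinks and alpha rises.

```python
def cronbach_alpha(cov):
    """Coefficient alpha from a k x k item covariance matrix (Equations 6 and 7)."""
    k = len(cov)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = sum(cov[i][i] for i in range(k))
    # Sum of covariances over all ordered pairs i != j (each pair counted twice).
    item_covariances = sum(cov[i][j] for i in range(k) for j in range(k) if i != j)
    total_variance = item_variances + item_covariances  # sigma_X^2
    return k / (k - 1) * (1 - item_variances / total_variance)


# Three items with unit variances; only the common inter-item covariance changes.
for c in (0.2, 0.5, 0.8):
    cov = [[1.0 if i == j else c for j in range(3)] for i in range(3)]
    print(f"covariance {c}: alpha = {cronbach_alpha(cov):.3f}")
```

For this design, alpha reduces to 3c/(1 + 2c), so the loop prints 0.429, 0.750 and 0.923.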

Alpha could also be written as follows (Cronbach, 1951, p. 304, Equation 16):

α = \frac{k^{2} {\bar{σ}}_{i j}}{σ_{X}^{2}},

(8)

where ${\bar{σ}}_{ij}$ is the mean inter-item covariance. Written in this form, it is clear that α and $ρ_{XT}^{2}$ (cf. Equation 5) have $σ_{X}^{2}$ in common. However, $σ_{T}^{2}$ could only be computed from the sum of an infinite set of all the theoretical true items that could possibly be created to measure the latent true variable, whereas alpha is based on the actual scale items (Komaroff, 1997). This is why item sampling and purification matter so much (Churchill, 1979; Rossiter, 2002).
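The step from Equation (6) to Equation (8) is short. The double sum of covariances runs over the k(k − 1) ordered pairs with i ≠ j, so it equals k(k − 1) times the mean inter-item covariance. Substituting this into Equation (6) gives:

\sum_{i = 1}^{k} \sum_{j \neq i} σ_{ij} = k(k - 1) {\bar{σ}}_{ij} \quad \Rightarrow \quad α = \frac{k}{k - 1} \cdot \frac{σ_{X}^{2} - \sum_{i = 1}^{k} σ_{i}^{2}}{σ_{X}^{2}} = \frac{k}{k - 1} \cdot \frac{k(k - 1) {\bar{σ}}_{ij}}{σ_{X}^{2}} = \frac{k^{2} {\bar{σ}}_{ij}}{σ_{X}^{2}}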

A great deal of additional work on the characteristics of alpha followed Cronbach’s (1951) seminal article. Most relevant to our discussion is the work of Novick and Lewis (1967; see also Lord & Novick, 1968, pp. 87–90). Within the CTT framework,² they derived the necessary and sufficient condition under which alpha is equal to the true reliability of a composite measurement (i.e., $ρ_{XT}^{2}$). That condition is commonly known in psychometrics as the essential τ-equivalence model. Later, Lord and Novick (1968, pp. 47–50) and Jöreskog (1971) identified a number of other classical measurement models that may be useful in estimating reliability. Each of these models requires the data to meet different conditions. Curiously, none of the three major meta-analyses on the use of alpha in the marketing literature considered these models (cf. Bruner & Hensel, 1993; Churchill & Peter, 1984; Peterson, 1994). Much the same could be said of more recent articles by Voss et al. (2000) and Duhachek et al. (2005). At the risk of some oversimplification, Table 1 presents a hierarchy of these classical measurement models, from the least to the most restrictive.

Table 1.

Some Classical Test Theory Measurement Models

Measurement model	Mathematical expression	Assumptions
The congeneric model	$x_{i} = a_{i} + b_{i} t + e_{i}$	Each item measures the same latent variable, with possibly different scales, with possibly different degrees of precision and with possibly different amounts of error.
The essentially tau-equivalent model	$x_{i} = a_{i} + t + e_{i}$	All items measure the same latent variable, on the same scale, but with possibly different degrees of precision, and with possibly different amounts of error.
The tau-equivalent model	$x_{i} = t + e_{i}$	All items measure the same latent variable, on the same scale, with the same degree of precision, but with possibly different amounts of error.
The parallel model	$x_{i} = t + e_{i}$, with $σ_{e_{1}}^{2} = … = σ_{e_{k}}^{2}$	All items measure the same latent variable, on the same scale, with the same degree of precision and with the same amount of error.

Source: Compiled by the author.

The congeneric model (Jöreskog’s composite reliability model) is thus the most general and least restrictive model for use in reliability estimation. If all multiplicative constants b_i in the congeneric model are set to 1, item true scores are measured on the same scale, or have the same standard deviation. This gives the essentially τ-equivalent model, also called Cronbach’s alpha model. If all additive constants a_i are then set to 0, item true scores not only have the same variance but are also measured with the same degree of precision, or have the same mean. This gives the τ-equivalent model. Finally, if all error variances are set equal to one another (i.e., $σ_{e_{1}}^{2} = σ_{e_{2}}^{2} = … = σ_{e_{k}}^{2}$), we arrive at the parallel model, also called the Spearman–Brown prophecy formula model.
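The practical consequence of this hierarchy, namely the Novick and Lewis condition, can be checked with population covariance matrices instead of sample data. The Python sketch below assumes a unit-variance latent variable and uncorrelated errors. Its loadings and error variances are hypothetical, and the helper names are ours. The intercepts a_i do not affect variances and are therefore omitted. With equal loadings (essential τ-equivalence), alpha should equal true reliability. With unequal loadings (congeneric items), alpha should fall below it.

```python
def implied_covariance(loadings, error_variances, factor_variance=1.0):
    """Covariance matrix implied by x_i = a_i + b_i t + e_i with uncorrelated errors."""
    k = len(loadings)
    return [[loadings[i] * loadings[j] * factor_variance
             + (error_variances[i] if i == j else 0.0)
             for j in range(k)] for i in range(k)]


def cronbach_alpha(cov):
    k = len(cov)
    total = sum(sum(row) for row in cov)  # sigma_X^2
    item_variances = sum(cov[i][i] for i in range(k))
    return k / (k - 1) * (1 - item_variances / total)


def true_reliability(loadings, error_variances, factor_variance=1.0):
    # sigma_T^2 = (sum of b_i)^2 * var(t); sigma_X^2 adds the summed error variances.
    true_var = sum(loadings) ** 2 * factor_variance
    return true_var / (true_var + sum(error_variances))


errors = [0.5, 0.6, 0.7, 0.8]  # unequal error variances are allowed in both models
for label, b in [("essentially tau-equivalent", [1.0, 1.0, 1.0, 1.0]),
                 ("congeneric", [0.4, 0.8, 1.2, 1.6])]:
    cov = implied_covariance(b, errors)
    print(f"{label}: alpha={cronbach_alpha(cov):.3f}, "
          f"true reliability={true_reliability(b, errors):.3f}")
```

Both cases share the same summed loadings and hence the same true reliability (0.860). Alpha matches that value in the first case but drops to 0.803 in the second.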

It is noteworthy that all these classical measurement models assume that all observed variables (i.e., items) measure a single latent true variable. So alpha, an estimate of reliability based on the essentially τ-equivalent model, assumes unidimensionality rather than measuring it (Cortina, 1993; Gerbing & Anderson, 1988; Green, Lissitz & Mulaik, 1977; Lee & Hooley, 2005). Suppose the assumption of essential τ-equivalency among items is violated because different true scores underlie the scale items (i.e., a multidimensional construct). Several studies then concur that coefficient alpha provides a downwardly biased estimate of true reliability (Cronbach et al., 1965; Kamata et al., 2003; Komaroff, 1997; Raykov, 1997, 1998; Osburn, 2000; Rae, 2007, 2008). To obtain a less biased estimator of reliability under multidimensionality, one can resort to stratified alpha, a modification of coefficient alpha originally proposed by Lee J. Cronbach and his colleagues (Cronbach et al., 1965; Rajaratnam et al., 1965).

Stratified Scales and Stratified Alpha

In psychological measurement, there are situations where a test (i.e., scale) has been constructed by clustering items into strata. The items within each stratum are concerned with a single ability (i.e., construct), share a single response format (e.g., summated or Likert scale, semantic differential or dichotomous) or share a dependence on a specific body of material, such as a reading passage or a graph. Such strata are commonly referred to as sub-scales, sub-factors or dimensions in the marketing literature. To estimate the reliability of such a scale, Cronbach and his colleagues (Cronbach et al., 1965; Rajaratnam et al., 1965) proposed stratified alpha.

Stratified alpha (i.e., α_S) could be expressed as follows (Feldt & Qualls, 1996; Kamata et al., 2003; Nunnally, 1967; Osburn, 2000; Rae, 2007, 2008; Rajaratnam et al., 1965):

α_{S} = 1 - \frac{\sum_{i = 1}^{s} σ_{X_{i}}^{2} (1 - α_{i})}{σ_{X_{T}}^{2}}

(9)

or alternatively as

α_{S} = 1 - \frac{\sum_{i = 1}^{s} σ_{X_{i}}^{2} - \sum_{i = 1}^{s} σ_{X_{i}}^{2} α_{i}}{σ_{X_{T}}^{2}},

(10)

where s is the number of strata or sub-scales within the scale (i = 1, …, s), α_i is the reliability estimate (alpha) for the ith sub-scale, $σ_{X_{i}}^{2}$ is the observed score variance of the ith sub-scale and $σ_{X_{T}}^{2}$ is the observed score variance of the total scale. The number of items k_i may in principle vary from sub-scale to sub-scale; that is, k₁ = k₂ = … = k_s is not required (Cronbach et al., 1965, p. 292). When there is only one sub-scale, stratified alpha and coefficient alpha are equal: with s = 1, $σ_{X_{1}}^{2} = σ_{X_{T}}^{2}$, so Equation (9) reduces to α_1. Stratified alpha can be regarded as an estimated generalizability coefficient (GC), since Rajaratnam et al. (1965) derived it from generalizability theory (GT; Cronbach et al., 1963). In GT terminology, a GC is the ratio of universe score variance to expected observed score variance.³ For stratified alpha, the universe is divided into a number of strata, each containing an infinite number of distinguishable types of items, and a predetermined number of items is drawn at random from the strata to form a scale (Rae, 2008). The remainder of this section provides a selective review of previous evidence on stratified alpha.

Using hypothetical data, Cronbach et al. (1965, pp. 306–309) were among the first to demonstrate that stratified alpha can be substantially greater than alpha, especially when the sub-scales differ in content and/or difficulty. Further results come from Feldt and Qualls (1996), who showed mathematically that the negative bias in alpha for multidimensional scales grows with the number of sub-scales and shrinks as the number of items within sub-scales increases. That is, the greater the number of sub-scales or dimensions, the greater the bias in alpha, and the greater the number of items within sub-scales, the smaller the bias in alpha. Feldt and Qualls (1996) also showed that stratified alpha will always be less than alpha if the average item covariance between sub-scales is greater than the average item covariance within sub-scales. This last point was recently reinforced by Rae (2008).
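Both directions described above can be checked directly from Equation (9). The Python sketch below implements stratified alpha from an item covariance matrix (function names are ours). It applies the function to two hypothetical four-item, two-stratum covariance matrices with item variances of 1.5. In the first matrix, items covary more within strata (1.0) than between them (0.2). In the second, the reverse holds (0.5 within, 0.9 between). The final assertion checks that, with a single stratum, Equation (9) collapses to coefficient alpha.

```python
def total_variance(cov):
    # Variance of the unweighted item sum = sum of every entry of the covariance matrix.
    return sum(sum(row) for row in cov)


def cronbach_alpha(cov):
    k = len(cov)
    item_variances = sum(cov[i][i] for i in range(k))
    return k / (k - 1) * (1 - item_variances / total_variance(cov))


def stratified_alpha(cov, strata):
    """Equation (9): 1 - sum_i var(X_i) * (1 - alpha_i) / var(X_T)."""
    error_part = 0.0
    for idx in strata:
        sub = [[cov[i][j] for j in idx] for i in idx]  # covariance matrix of one sub-scale
        error_part += total_variance(sub) * (1 - cronbach_alpha(sub))
    return 1 - error_part / total_variance(cov)


def block_cov(within, between, item_var=1.5):
    """Hypothetical 4-item scale: items 0-1 form stratum A, items 2-3 form stratum B."""
    stratum = [0, 0, 1, 1]
    return [[item_var if i == j else (within if stratum[i] == stratum[j] else between)
             for j in range(4)] for i in range(4)]


strata = [[0, 1], [2, 3]]
for within, between in [(1.0, 0.2), (0.5, 0.9)]:
    cov = block_cov(within, between)
    print(f"within={within}, between={between}: "
          f"alpha={cronbach_alpha(cov):.3f}, stratified alpha={stratified_alpha(cov, strata):.3f}")

# With one stratum containing every item, Equation (9) reduces to coefficient alpha.
cov = block_cov(1.0, 0.2)
assert abs(stratified_alpha(cov, [[0, 1, 2, 3]]) - cronbach_alpha(cov)) < 1e-12
```

Under these assumptions, the first matrix gives alpha = 0.644 and stratified alpha = 0.828. The second gives alpha = 0.807 and stratified alpha = 0.737, consistent with Feldt and Qualls’s (1996) condition.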

The article by Osburn (2000) reports a simulation study comparing various internal consistency reliability estimates closely related to coefficient alpha. In some of the simulated data sets, the items were multidimensional rather than unidimensional. In one of these multidimensional data sets, Osburn (2000) grouped four items into two sub-scales of two items each. The results show that stratified alpha equalled true reliability irrespective of whether the items were parallel, τ-equivalent or congeneric. It did so whatever the degree of heterogeneity in the two-factor data, that is, whatever the correlation between the two factors (see Osburn, 2000, p. 349, Table 1). By contrast, for the two-factor data, coefficient alpha underestimated true reliability, and this underestimation worsened considerably as heterogeneity increased. In the most heterogeneous case⁴ (i.e., a correlation of only 0.20 between the two factors), coefficient alpha was only 0.234 as opposed to a true reliability of 0.703. Next, Osburn (2000) grouped eight items into two sub-scales of four items each. The results mirror those for the previous data set. Stratified alpha precisely equalled true reliability irrespective of parallelism, τ-equivalency or congenericity among the item scores and irrespective of the degree of heterogeneity (see Osburn, 2000, p. 351, Table 2). Alpha underestimated true reliability, increasingly so as heterogeneity increased. In the most heterogeneous case (i.e., a correlation of only 0.20 between the two factors), coefficient alpha was only 0.204 in contrast to a true reliability of 0.613. This demonstrates the coefficient’s inappropriateness in situations where the assumption of unidimensionality is violated. For the two multidimensional data sets, Osburn (2000) also showed that most of the studied reliability coefficients underestimate true reliability, the exceptions being stratified alpha and Li et al.’s (1996) maximal reliability. However, whereas stratified alpha provides a precise and accurate reliability estimate, maximal reliability overestimates true reliability.
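The pattern Osburn (2000) reports for two-factor data can be reproduced qualitatively, though not numerically since his covariance matrices are not given here. The Python sketch below uses a hypothetical four-item scale. Items 0 and 1 load 1.0 on factor 1 and items 2 and 3 load 1.0 on factor 2. Error variances are 0.5 and uncorrelated, and the factor correlation r is varied. Because the errors are assumed uncorrelated, the true-score variance equals the total variance minus the summed error variances, which supplies the true reliability to compare against. The function names are ours.

```python
def total_variance(cov):
    return sum(sum(row) for row in cov)


def cronbach_alpha(cov):
    k = len(cov)
    return k / (k - 1) * (1 - sum(cov[i][i] for i in range(k)) / total_variance(cov))


def stratified_alpha(cov, strata):
    error_part = 0.0
    for idx in strata:
        sub = [[cov[i][j] for j in idx] for i in idx]
        error_part += total_variance(sub) * (1 - cronbach_alpha(sub))
    return 1 - error_part / total_variance(cov)


def two_factor_cov(r, loading=1.0, error_var=0.5):
    """Items 0-1 load on factor 1, items 2-3 on factor 2; unit factor variances, corr = r."""
    factor = [0, 0, 1, 1]
    return [[loading ** 2 * (1.0 if factor[i] == factor[j] else r)
             + (error_var if i == j else 0.0) for j in range(4)] for i in range(4)]


ERROR_VAR = 0.5
strata = [[0, 1], [2, 3]]
for r in (1.0, 0.6, 0.2):
    cov = two_factor_cov(r, error_var=ERROR_VAR)
    true_rel = 1 - 4 * ERROR_VAR / total_variance(cov)  # sigma_T^2 / sigma_X^2
    print(f"r={r}: alpha={cronbach_alpha(cov):.3f}, "
          f"stratified={stratified_alpha(cov, strata):.3f}, true={true_rel:.3f}")
```

Alpha falls from 0.889 to 0.644 as r drops from 1.0 to 0.2. Stratified alpha stays equal to true reliability (0.889, 0.865 and 0.828) because, by construction, the items within each stratum are essentially τ-equivalent.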

Table 2.

Means and Standard Deviations for the Store Personality Scale Items

No.	Item	M	SD
1	Trustworthy	3.74	1.073
2	Sincere	3.51	0.960
3	Reliable	3.84	0.996
4	Honest	3.26	1.108
5	True	3.49	0.927
6	Snobbish	4.45	0.673
7	High class	4.28	0.833
8	Stylish	4.34	0.702
9	Chic	4.33	0.756
10	Elegant	4.28	0.821
11	Selective	3.79	1.089
12	Welcoming	3.99	0.941
13	Daring	3.99	0.885
14	Dynamic	4.50	0.594
15	Enthusiastic	4.06	0.853
16	Leader	4.30	0.734
17	Lively	4.22	0.698
18	Friendly	3.88	0.951
19	Congenial	3.93	0.914
20	Hardy	3.91	0.975
21	Solid	3.94	1.017
22	Reputable	4.36	0.629
23	Well organized	3.95	0.884
24	Thriving	4.39	0.728
25	Irritating	2.17	1.093
26	Annoying	2.09	1.063
27	Outmoded	2.24	1.112
28	Superficial	2.05	1.038

Source: Author’s findings.

Note: M is mean and SD is standard deviation.

Kamata et al.’s (2003) simulation results corroborate Osburn’s (2000) findings in that coefficient alpha was shown to substantially underestimate true reliability. This negative bias is larger when the correlation between sub-scales (or dimensions) is low. Hence, according to Kamata et al. (2003, p. 10), ‘it is not appropriate to use coefficient alpha as an estimate of the reliability of a multidimensional composite scale score, unless the correlation between dimensions is high’. Using hypothetical data and a variety of simulation models, Kamata et al. (2003) also evaluated the performance of stratified alpha against McDonald’s (1970, 1999) omega and Li et al.’s (1996) maximal reliability. Among other things, their results showed very small discrepancies between stratified alpha and true reliability (typically in the fourth decimal place). They concluded that McDonald’s (1999) omega ‘provides a good estimate of true reliability, but one has to be aware that it may overestimate the true reliability’ and that ‘stratified alpha appeared to be the most reliable procedure among the three alternative methods’ (Kamata et al., 2003, p. 15).

However, according to Rae (2007, p. 182), Kamata et al.’s (2003) findings are not entirely surprising, for the simple reason that their simulated data sets are essentially τ-equivalent within each sub-scale. In fact, Rae (2007) demonstrated that, for stratified alpha to equal true reliability, the items within each sub-scale must be essentially τ-equivalent. If one or more sub-scales have congeneric items, stratified alpha will always be a lower bound to reliability (i.e., stratified alpha will in turn underestimate true reliability). Rae (2007) also brought correlated measurement errors into the discussion. He showed that if items within sub-scales are essentially τ-equivalent and errors of measurement are positively (vs negatively) correlated, stratified alpha will overestimate (vs underestimate) true reliability to some extent. Although this might be seen as a caveat for stratified alpha, Rae (2007, p. 183) argued that ‘[t]his is not a particular weakness of stratified alpha’ because many other reliability estimators suffer from the same drawback (cf. Lucke, 2005; Raykov, 1998).
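Rae’s (2007) lower-bound result can be illustrated in the same way. In the hypothetical Python sketch below, both versions of the scale share the same factor structure (factor correlation 0.2, uncorrelated error variances of 0.5). They also share the same summed loadings per factor, so their true reliability is identical. Only the within-sub-scale loadings differ: they are equal (essentially τ-equivalent) in one version and unequal (congeneric, 0.5 and 1.5) in the other. The function names are ours.

```python
def total_variance(cov):
    return sum(sum(row) for row in cov)


def cronbach_alpha(cov):
    k = len(cov)
    return k / (k - 1) * (1 - sum(cov[i][i] for i in range(k)) / total_variance(cov))


def stratified_alpha(cov, strata):
    error_part = 0.0
    for idx in strata:
        sub = [[cov[i][j] for j in idx] for i in idx]
        error_part += total_variance(sub) * (1 - cronbach_alpha(sub))
    return 1 - error_part / total_variance(cov)


def loaded_two_factor_cov(loadings, factor_of, r, error_var):
    """Covariance of x_i = b_i * F_{factor_of[i]} + e_i; unit factor variances, corr = r."""
    k = len(loadings)
    return [[loadings[i] * loadings[j] * (1.0 if factor_of[i] == factor_of[j] else r)
             + (error_var if i == j else 0.0) for j in range(k)] for i in range(k)]


factor_of = [0, 0, 1, 1]
strata = [[0, 1], [2, 3]]
error_var = 0.5
results = {}
for label, b in [("essentially tau-equivalent", [1.0, 1.0, 1.0, 1.0]),
                 ("congeneric", [0.5, 1.5, 0.5, 1.5])]:
    cov = loaded_two_factor_cov(b, factor_of, 0.2, error_var)
    true_rel = 1 - len(b) * error_var / total_variance(cov)
    results[label] = (cronbach_alpha(cov), stratified_alpha(cov, strata), true_rel)
    print(f"{label}: alpha={results[label][0]:.3f}, "
          f"stratified={results[label][1]:.3f}, true={true_rel:.3f}")
```

True reliability is 0.828 in both versions. With essentially τ-equivalent sub-scales, stratified alpha recovers it (0.828; alpha is 0.644). With congeneric sub-scales, stratified alpha drops to 0.655, still above alpha (0.529), so in this example it is the better of the two lower bounds.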

Of all the evidence reviewed above, two findings warrant special attention. The first is that, just as for alpha, essentially τ-equivalent items (within each sub-scale) are required for stratified alpha to equal true reliability. The second concerns the range of stratified alpha’s behaviour. At best, with essentially τ-equivalent items and uncorrelated measurement errors, stratified alpha is a precise estimator of true reliability. With congeneric items, it becomes a lower bound to reliability that is generally better than alpha for the entire scale. Correlated measurement errors, however, can push it above or below true reliability (Rae, 2007).

Objective

This article’s main aim is to instruct marketing researchers about stratified alpha’s significance, characteristics and calculation.

Methodology

Measure

The data analyzed here were collected using a refined, 28-item version of D’Astous and Lévesque’s (2003) store personality scale. D’Astous and Lévesque’s (2003) original scale contains 34 items and also exists in a 20-item reduced version. The refined version used here was obtained after reconsidering the face validity of the store personality scale, that is, whether the items reflect what they are intended to measure (Hardesty & Bearden, 2004, p. 99). D’Astous and Lévesque (2003) define store personality as ‘the mental representation of a store on dimensions that typically capture an individual’s personality’ (p. 57). In line with the ongoing debate on the definition of brand personality (see Aaker, 1997; Azoulay & Kapferer, 2003; Geuens, Weijters & De Wulf, 2009), store personality is here conceptualized as the set of human personality traits that are both applicable and relevant to a store.

Driven by face validity considerations and following Hardesty and Bearden’s (2004) recommendations, a qualitative purification process (including expert judgements and a pretest) was conducted. First, a French version of the original 34-item store personality scale, taken directly from D’Astous, Hadj Saïd and Lévesque (2002, p. 121), was presented to three marketing academics with backgrounds in measurement and retailing. To determine whether each item should be retained for further analyses, the three ‘expert judges’ were given the above-mentioned definition of store personality. They were instructed to indicate the probability of using each of the 34 items to describe a store. The items were rated on the following scale: (i) definitely probable, (ii) somewhat probable and (iii) not at all probable. Items that were not rated as probable by at least two of the three judges, such as ‘conservative’, ‘genuine’ and ‘imposing’, were dropped, resulting in a 31-item solution. The removal of these items was then debated in a discussion group with the expert judges. From this discussion, an additional consensus emerged regarding the item ‘upscale’, which was considered more representative of social class than of a personality trait. Therefore, the item ‘upscale’ was discarded. Second, a pretest sample of nine adult respondents was obtained through face-to-face interviews. The two primary purposes of this pretest were to detect ambiguous items and to identify any problem with the French version of the intended questionnaire. From the pretest respondents’ comments, the items ‘loud’ and ‘conscientious’ appeared to be extremely ambiguous and were therefore deleted. The remaining pool of 28 items, which was used for the main study data collection, is presented in Table 2.

Data Collection

Data collection took place inside two retail stores (store intercepts) located in high- and middle-class suburbs of a major city. A trained interviewer intercepted respondents (i.e., customers or mere visitors) at the main exit/entrance of each superstore after their shopping trip. They were handed the questionnaire, instructed to think of each store as having its own personality and invited to fill in the self-administered survey sheet. Store personality items were evaluated on a five-point bipolar scale analogous to the one adopted by D’Astous and Lévesque (2003). Each item is an adjective with five ordered answer categories (1 = ‘not at all descriptive of the store’ to 5 = ‘completely descriptive of the store’). Completing the questionnaire took between 15 and 20 minutes. Over 2 weeks of data collection, 200 self-administered questionnaires were gathered from each superstore. Eight survey sheets containing missing data were discarded, resulting in 392 questionnaires usable for analysis. Consistent with D’Astous and Lévesque (2003), we pooled the data collected from both stores into a single database. The total sample of 392 respondents contains more females (271) than males (121). About 58 per cent of the respondents reported being single. The majority (70.154 per cent) of the respondents were aged 25 or older. In sum, the data to be analyzed contain the scores of 392 respondents on 28 items. Table 2 presents item means (M) and standard deviations (SD).

Analysis

Dimensionality Detection Procedures

In this study, three different criteria for determining the number of factors to retain were employed. The Kaiser–Guttman rule (i.e., eigenvalues greater than one), the customary factor retention criterion in marketing research, was the first. However, because many studies have shown that the Kaiser–Guttman criterion tends to overestimate the number of factors to retain (see Patil, Singh, Mishra & Donavan, 2008; Zwick & Velicer, 1986), Horn’s parallel analysis (PA; Horn, 1965) and Velicer’s minimum average partial correlation test (MAP; Velicer, 1976) were chosen as two additional dimensionality detection criteria. Parallel analysis and MAP are currently regarded as the most accurate criteria for determining the correct number of factors to retain (cf. Patil et al., 2008).
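To make the parallel analysis criterion concrete, here is a minimal NumPy sketch. It is not O’Connor’s (2000) SPSS syntax used in the study; it assumes, as is common, that eigenvalues come from the full correlation matrix (the same eigenvalues the Kaiser–Guttman rule inspects) and that the random data are standard normal with the same numbers of respondents and items. The function and parameter names are illustrative.

```python
import numpy as np

def parallel_analysis(data, n_sims=500, percentile=95, seed=0):
    """Horn's parallel analysis on an (n_respondents, n_items) score array.

    Returns the number of roots to retain, the real-data eigenvalues and the
    chosen percentile of eigenvalues from random normal data of the same shape.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the observed correlation matrix, largest first
    real = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    random_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.standard_normal((n, p))
        random_eigs[s] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    # Retain roots in order while the real eigenvalue beats its random counterpart
    n_retain = 0
    for observed, cutoff in zip(real, threshold):
        if observed <= cutoff:
            break
        n_retain += 1
    return n_retain, real, threshold
```

Counting `real > 1` in the returned eigenvalues gives the Kaiser–Guttman count, so both criteria can be compared on the same output, as in Table 3.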

The exploratory factor analysis (EFA) employed here can be described as follows. Given that the author’s aim is to discover the common factors underlying the responses to the 28 items rather than to reduce the data, principal axis factoring (PAF) was chosen as the extraction method instead of principal component analysis. An oblique rotation (PROMAX) was preferred to an orthogonal one since few, if any, dimensions in reality are completely orthogonal, especially within the same scale (Iacobucci, 1994, 2001; Preacher & MacCallum, 2003).
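For readers who want to see what iterated principal axis factoring does, the sketch below starts from squared multiple correlations as initial communalities, places them on the diagonal of R, extracts the leading eigenvectors and repeats until the communalities stabilise. It returns unrotated loadings only: the PROMAX rotation used in the study is not implemented here, and the function name, tolerance and iteration cap are illustrative assumptions rather than SPSS defaults.

```python
import numpy as np

def principal_axis_factoring(R, n_factors, max_iter=500, tol=1e-8):
    """Iterated principal axis factoring of a correlation matrix R (unrotated sketch).

    Returns the (p x n_factors) loading matrix and the final communalities.
    """
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlations, 1 - 1/diag(R^-1)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)             # R with communalities on the diagonal
        vals, vecs = np.linalg.eigh(reduced)
        top = np.argsort(vals)[::-1][:n_factors]  # indices of the largest eigenvalues
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2
```

Because only the common variance (the communalities) enters the diagonal, the extracted factors target shared variance, which is the reason given above for preferring PAF to principal components.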

This EFA procedure was performed using SPSS 12.0. The SPSS syntax for the PA and MAP factor retention criteria was taken from O’Connor (2000, pp. 399–401).⁵
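O’Connor’s (2000) SPSS syntax is not reproduced in the article. As an illustration only, the sketch below implements the squared-correlation version of Velicer’s MAP test in NumPy: it partials out the first m principal components (m = 0 uses the raw correlations), averages the squared off-diagonal partial correlations and picks the m with the smallest average, which is the quantity reported in the MAP row of Table 3. The early stop when no residual variance remains is a safeguard added here, and the function name is hypothetical.

```python
import numpy as np

def velicer_map(R):
    """Velicer's (1976) MAP test on a correlation matrix (sketch).

    Returns the number of components with the smallest average squared
    partial correlation, plus the sequence of averages for m = 0, 1, 2, ...
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]                    # largest eigenvalues first
    loadings = vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))
    off_diag = ~np.eye(p, dtype=bool)
    avg_sq = [np.mean(R[off_diag] ** 2)]              # m = 0: zero-order correlations
    for m in range(1, p):
        A = loadings[:, :m]
        partial_cov = R - A @ A.T                     # partial out the first m components
        d = np.diag(partial_cov)
        if np.any(d <= 1e-12):                        # no residual variance left
            break
        partial_corr = partial_cov / np.sqrt(np.outer(d, d))
        avg_sq.append(np.mean(partial_corr[off_diag] ** 2))
    avg_sq = np.array(avg_sq)
    return int(np.argmin(avg_sq)), avg_sq
```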

Checking Essential Tau-equivalency

Because essential τ-equivalency assumes that items are measured on the same scale (i.e., same response format), examining item SDs may be useful (Graham, 2006). A simple yet effective strategy, described by Feldt and Charter (2003), was applied here to investigate whether the data fit the assumptions of essential τ-equivalency. Feldt and Charter (2003) recommended examining the ratio of the largest item standard deviation (SDL) to the smallest item standard deviation (SDS). If the ratio SDL/SDS is between 1 and 1.30 (i.e., the largest item SD does not exceed the smallest by more than 30 per cent), alpha might be an appropriate internal consistency reliability index. Larger ratios indicate that the data do not conform to the essentially τ-equivalent model and that the researcher should consider other coefficients.
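The screen amounts to a single ratio per scale. The sketch below applies it to the largest and smallest item SDs reported in Table 6; the function name and the choice to treat ratios at or below 1.30 as acceptable are a reading of the rule stated above, not code from Feldt and Charter (2003). Ratios computed from the rounded SDs can differ from Table 6 in the third decimal.

```python
def feldt_charter_screen(item_sds, threshold=1.30):
    """Return (SDL/SDS, passes) for a collection of item standard deviations.

    passes is True when the largest SD exceeds the smallest by at most
    30 per cent (ratio <= threshold).
    """
    if min(item_sds) <= 0:
        raise ValueError("standard deviations must be positive")
    ratio = max(item_sds) / min(item_sds)
    return ratio, ratio <= threshold

# Largest and smallest item SDs per sub-scale, as reported in Table 6
table6 = {
    "Sophistication": (0.833, 0.673),
    "Congeniality": (0.951, 0.853),
    "Genuineness": (1.108, 0.927),
    "Unpleasantness": (1.093, 1.038),
    "Vivaciousness": (0.734, 0.698),
    "Entire scale": (1.108, 0.673),
}
for name, sds in table6.items():
    ratio, ok = feldt_charter_screen(sds)
    print(f"{name:15s} SDL/SDS = {ratio:.3f} {'ok' if ok else 'exceeds 1.30'}")
```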

Results

Dimensionality

As expected, the criteria used in this study to determine the number of factors to retain do not fully converge. Table 3 shows the first 10 eigenvalues from the real data, the corresponding 95th percentile eigenvalues from 500 random correlation matrices and the average squared partial correlations for roots 0 to 10. While the Kaiser–Guttman criterion suggests retaining the seven factors with eigenvalues greater than one, both PA and MAP suggest five.

Table 3.

Factor Retention Criteria

Root	0	1	2	3	4	5	6	7	8	9	10
Real Data Eigenvalues	—	9.808	3.238	2.148	1.884	1.645	1.068	1.024^a	0.843	0.817	0.648
PA	—	1.602	1.505	1.439	1.384	1.335^b	1.296	1.258	1.216	1.178	1.145
MAP	0.126	0.045	0.039	0.039	0.035	0.031^c	0.033	0.033	0.037	0.040	0.043

Source: Author’s findings.

Notes: (1) PA = parallel analysis (95th percentile of eigenvalues from 500 random data sets); MAP = minimum average partial correlation test (average squared partial correlations).

(2) ^a The last factor to retain according to the Kaiser–Guttman criterion (eigenvalue > 1).

(3) ^b The last random-data eigenvalue smaller than the corresponding real-data eigenvalue.

(4) ^c The smallest average squared partial correlation.

After fixing the number of factors to extract at five and carrying out an iterative purification process through a series of PAF analyses, 20 adjectives remained. The Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy was 0.822, and Bartlett’s test of sphericity yielded an approximate chi-square statistic of χ²(190) = 6438.266 (p < 0.001), indicating that the data set is adequate for an EFA. The communalities and the factor loadings for the PROMAX-rotated five-factor solution are presented in Table 4. The communality estimates ranged from 0.420 to 0.906. Each item loaded heavily (i.e., primary loading ≥ 0.50) on one of the five extracted factors, with primary loadings ranging from 0.533 to 1.033.⁶ These results suggest that five factors underlie the 20 retained items listed in Table 4. The first four factors can be interpreted as sophistication, congeniality,⁷ genuineness and unpleasantness. Because the items ‘leader’, ‘lively’ and ‘thriving’ load most highly on the fifth factor, it was labelled vivaciousness.
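The KMO and Bartlett statistics follow standard formulas. The sketch below is a plain NumPy version written for illustration (not the SPSS routine used in the study) that computes both from a correlation matrix R and the sample size n; with p = 20 items the degrees of freedom are p(p − 1)/2 = 190, matching the value reported above. A p-value can then be obtained from the chi-square distribution, for example with `scipy.stats.chi2.sf(chi2, df)`. Function names are hypothetical.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity: returns the approximate chi-square and its df."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    sign, logdet = np.linalg.slogdet(R)    # log|R|, numerically safer than log(det(R))
    if sign <= 0:
        raise ValueError("R must be positive definite")
    chi2 = -((n - 1) - (2 * p + 5) / 6.0) * logdet
    return chi2, p * (p - 1) // 2

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                 # anti-image (partial) correlations
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)
```

KMO approaches 1 when partial correlations are small relative to zero-order correlations, which is what makes a correlation matrix a good candidate for factoring.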

Table 4.

The 20 Retained Items from PAF Analysis with PROMAX Rotation

	Item	Factor Loading					Communality
		1	2	3	4	5
1	Trustworthy	0.076	–0.137	0.799	–0.053	0.027	0.610
2	Sincere	–0.081	0.070	0.945	0.031	0.061	0.906
4	Honest	–0.054	0.018	0.834	0.164	0.035	0.660
5	True	0.089	–0.028	0.816	–0.109	–0.052	0.746
6	Snobbish	0.688	–0.061	0.068	0.010	0.076	0.501
7	High class	0.744	0.013	0.075	0.025	0.051	0.627
8	Stylish	1.033	–0.058	–0.067	0.080	–0.034	0.872
9	Chic	0.840	0.103	–0.015	–0.039	0.031	0.870
10	Elegant	0.862	0.062	–0.018	–0.020	0.009	0.826
12	Welcoming	0.202	0.707	–0.083	–0.096	–0.072	0.675
13	Daring	0.166	0.650	0.155	–0.072	–0.046	0.736
15	Enthusiastic	–0.001	0.554	–0.006	–0.082	0.218	0.467
16	Leader	0.102	0.196	0.016	0.155	0.533	0.420
17	Lively	0.033	0.153	0.039	0.105	0.739	0.632
18	Friendly	–0.112	0.788	0.036	–0.061	0.067	0.610
19	Congenial	–0.044	0.974	–0.100	0.090	0.065	0.823
24	Thriving	0.005	–0.147	0.034	–0.383	0.699	0.710
25	Irritating	0.181	–0.197	0.010	0.790	–0.046	0.631
26	Annoying	–0.061	–0.097	–0.103	0.794	0.309	0.724
28	Superficial	–0.093	0.187	0.144	0.699	–0.213	0.606

Source: Author’s findings.

Notes: (1) Factor loadings greater than 0.50 are set in bold.

(2) Loadings between 0.20 and 0.50 are italicized.

One of the advantages of oblique rotation is that factor correlations are estimated. As Table 5 reveals, the inter-factor correlations ranged from non-significant (0.039) to moderate (0.600, p < 0.001). As expected, the unpleasantness factor was negatively correlated with the other four factors. The mean absolute inter-factor correlation was 0.337.

Table 5.

Factor Correlations Matrix

	Factor	1	2	3	4	5
1	Sophistication	1.00
2	Congeniality	0.600**	1.00
3	Genuineness	0.319**	0.460**	1.00
4	Unpleasantness	–0.452**	–0.292**	–0.202**	1.00
5	Vivaciousness	0.444**	0.305**	0.039^ns	–0.264**	1.00

Source: Author’s findings.

Notes: **p < 0.001 and ns = non-significant correlation.
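The way the mean inter-factor correlation was averaged is easy to check against Table 5. The quick computation below over the ten off-diagonal values suggests the reported figure is the mean of absolute values (about 0.338 from the rounded entries, against 0.337 reported, the gap presumably due to rounding); the signed mean is only about 0.096 because of the negative unpleasantness correlations.

```python
# Lower triangle of the Table 5 factor correlation matrix, read row by row
factor_corrs = [
    0.600,                        # congeniality with sophistication
    0.319, 0.460,                 # genuineness with factors 1-2
    -0.452, -0.292, -0.202,       # unpleasantness with factors 1-3
    0.444, 0.305, 0.039, -0.264,  # vivaciousness with factors 1-4
]
mean_abs = sum(abs(r) for r in factor_corrs) / len(factor_corrs)
mean_signed = sum(factor_corrs) / len(factor_corrs)
print(f"mean |r| = {mean_abs:.4f}, signed mean = {mean_signed:.4f}")
```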

Essential Tau-equivalency

Results from Feldt and Charter’s (2003) procedure indicate that the assumption of essential τ-equivalence among items within each of the five sub-scales is tenable. Because the SDL/SDS ratios for the sub-scales ranged from 1.051 to 1.237, all below the 1.30 threshold, alpha can be used to gauge the internal consistency of the five sub-scales. The SDL/SDS ratio for the scale as a whole was 1.646, indicating that alpha should not be used to estimate the internal consistency reliability of the entire scale (Table 6).

Table 6.

Checking Essential Tau-equivalency

	Factor	SDL	SDS	SDL/SDS
1	Sophistication	0.833	0.673	1.237
2	Congeniality	0.951	0.853	1.114
3	Genuineness	1.108	0.927	1.195
4	Unpleasantness	1.093	1.038	1.052
5	Vivaciousness	0.734	0.698	1.051
	Entire scale	1.108	0.673	1.646

Source: Author’s findings.

Notes: SDL—largest item standard deviation and SDS—smallest item standard deviation.

Alpha Coefficients for the Sub-scales and Alpha for the Entire Scale

Using SPSS 12, the alpha coefficients for the sub-scale scores were 0.927 for sophistication, 0.891 for congeniality, 0.899 for genuineness, 0.781 for unpleasantness and 0.736 for vivaciousness (average $\bar{α}$ = 0.846). All alpha coefficients except that for vivaciousness exceed 0.77, the mean alpha reported in Peterson’s (1994, p. 385) meta-analysis. The reader should keep in mind that internal consistency reliability depends as much on the sample tested as on the scale. Table 7 presents the number of items within each sub-scale, their respective mean inter-item covariances, alpha coefficients and observed score variances.

Table 7.

Alpha Coefficients and Alpha for the Entire Scale

	Factor	$k_i$	$\bar{σ}_{ij}$	$α_i$	$σ^{2}_{X_i}$
1	Sophistication	5	0.414	0.927	11.155
2	Congeniality	5	0.513	0.891	14.386
3	Genuineness	4	0.717	0.899	12.767
4	Unpleasantness	3	0.615	0.781	7.094
5	Vivaciousness	3	0.250	0.736	3.059
	Entire scale	20	0.158	0.829	76.222

Source: Author’s findings.

Notes: $k_i$ is the number of items in the ith sub-scale, $\bar{σ}_{ij}$ is the mean inter-item covariance, $α_i$ is the estimate of reliability for the ith sub-scale and $σ^{2}_{X_i}$ is the observed score variance for the ith sub-scale.

Using Equation (8), alpha for the entire store personality scale score was 0.829. This value shows that a scale can comprise as many as five dimensions and still yield an alpha above 0.80. Thus, a single underlying factor is not a prerequisite for high alpha values (Cortina, 1993; Green et al., 1977).
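Equation (8) is not reproduced in this section. Assuming it is the usual summary form of alpha, α = k²σ̄_ij/σ²_X (algebraically equivalent to Cronbach’s formula, because σ²_X equals the sum of the item variances plus k(k − 1)σ̄_ij), the Table 7 entries suffice to recompute every alpha. The function name below is illustrative; with the rounded table values it reproduces the reported coefficients to within about 0.001.

```python
def alpha_from_summary(k, mean_cov, total_var):
    """Alpha from k items, mean inter-item covariance and total score variance.

    Since total_var = sum(item variances) + k(k - 1) * mean_cov, Cronbach's
    k/(k - 1) * (1 - sum(item variances) / total_var) reduces to
    k**2 * mean_cov / total_var.
    """
    return k * k * mean_cov / total_var

# (k_i, mean inter-item covariance, observed score variance) from Table 7
table7 = {
    "Sophistication": (5, 0.414, 11.155),
    "Congeniality": (5, 0.513, 14.386),
    "Genuineness": (4, 0.717, 12.767),
    "Unpleasantness": (3, 0.615, 7.094),
    "Vivaciousness": (3, 0.250, 3.059),
    "Entire scale": (20, 0.158, 76.222),
}
for name, (k, c, v) in table7.items():
    print(f"{name:15s} alpha = {alpha_from_summary(k, c, v):.3f}")
```

The entire-scale row makes the point of the paragraph above visible: a small mean covariance (0.158) multiplied by k² = 400 still yields an alpha above 0.80.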

Stratified Alpha

Using Equation (10) and the estimates in Table 7, the value of stratified alpha is given by the following:

\begin{array}{l} α_{S} = 1 - \frac{[12.767 (1 - 0.899)] + [11.155 (1 - 0.927)] + [14.386 (1 - 0.891)] + [3.059 (1 - 0.736)] + [7.094 (1 - 0.781)]}{76.222} \\ = 1 - \frac{6.033}{76.222} \approx 0.920 \end{array}

The ratio of stratified alpha to alpha for the entire scale score is 1.109, indicating that stratified alpha is approximately 11 per cent higher than alpha. Hence, for this particular data set, the discrepancy between alpha and stratified alpha is substantial. The same conclusion holds for the ratio $α_S/\bar{α}$ = 0.920/0.846 = 1.087 (i.e., $α_S$ is 8.7 per cent higher than $\bar{α}$).
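The arithmetic above can be reproduced in a few lines. The sketch below applies the stratified alpha formula, one minus the summed sub-scale error variances σ²_Xi(1 − α_i) divided by the total score variance, to the rounded Table 7 entries; the function name is illustrative. With these rounded inputs the result is about 0.921 and the mean of the five alphas is about 0.847, marginally above the reported 0.920 and 0.846, presumably because the article worked from unrounded values.

```python
def stratified_alpha_from_summary(subscales, total_var):
    """Stratified alpha: 1 - sum_i var_i * (1 - alpha_i) / total_var."""
    error_var = sum(var_i * (1.0 - alpha_i) for var_i, alpha_i in subscales)
    return 1.0 - error_var / total_var

# (observed sub-scale variance, sub-scale alpha) taken from Table 7
subscales = [
    (11.155, 0.927),  # sophistication
    (14.386, 0.891),  # congeniality
    (12.767, 0.899),  # genuineness
    (7.094, 0.781),   # unpleasantness
    (3.059, 0.736),   # vivaciousness
]
alpha_s = stratified_alpha_from_summary(subscales, total_var=76.222)
alpha_total = 0.829                       # alpha for the entire scale (Table 7)
alpha_mean = sum(a for _, a in subscales) / len(subscales)
print(f"stratified alpha = {alpha_s:.3f}")
print(f"ratio to alpha = {alpha_s / alpha_total:.3f}, "
      f"ratio to mean alpha = {alpha_s / alpha_mean:.3f}")
```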

Other Reliability Indexes

For the sake of illustration, two additional multidimensional reliability coefficients were also computed (see Appendix A). Heise and Bohrnstedt’s (1970) omega was Ω = 0.939 and McDonald’s (1985, p. 217, equation 7.3.9) omega was ω = 0.957. Thus, for this particular data set, stratified alpha is only about 2 per cent lower than Heise and Bohrnstedt’s (1970) Ω and about 4 per cent lower than McDonald’s (1985) ω; that is, α_S < Ω < ω. This ordering shows that stratified alpha provides a fairly conservative estimate of multidimensional reliability compared with McDonald’s (1985) coefficient ω, a point also noted by Kamata et al. (2003) in their simulation study.

Additionally, Rossiter (2002, p. 322) suggests that a Revelle’s (1979) beta of at least 0.50 is required to infer that a second-order general factor underlying the dimensions or sub-scales accounts for at least 50 per cent of the item score variance (see also Revelle, 1979, p. 68). Revelle’s beta was computed and obtained a value of β = 0.54, indicating that 54 per cent of the item-level variance is due to a general second-order factor, that is, store personality.

Conclusion

Marketing researchers often have to deal with multidimensional constructs. When they develop a new multidimensional scale or simply use a pre-established one, readers, and especially reviewers, expect them to report a reliability index. The overall message of this article is that coefficient alpha (or an average of alpha values) should not be that index. Throughout this article, the author has argued for one coefficient (i.e., stratified alpha), which seems to be an extremely useful option when a measure possesses some degree of multidimensionality.

To avoid any misunderstanding, it must be stressed that the author does not wish to encourage blind adherence to this index. Although the assumptions of the essentially τ-equivalent model may often correspond to the type of data that marketing researchers collect (i.e., same response format within a scale), using stratified alpha should not be pro forma but should rather reflect an informed decision about which classical test theory measurement model one’s data best fit. Researchers should keep in mind that essential τ-equivalency among items within sub-scales is required for stratified alpha to be an unbiased estimator of reliability. If the congeneric model fits the collected data better, multidimensional reliability coefficients, such as those provided by Werts et al. (1978) and McDonald (1985, 1999), would probably be better alternatives to stratified alpha. As a matter of good reliability assessment practice, researchers are encouraged to always check whether their data are essentially τ-equivalent and to use the appropriate measurement model and index to estimate reliability.

Good statistical practice would also dictate that point estimation be supplemented by interval estimation (Duhachek et al., 2005). Several techniques for interval estimation of alpha have been proposed and recently reviewed by Duhachek and Iacobucci (2004). Some of these methods may also be helpful for obtaining a standard error and confidence interval for stratified alpha, which makes confidence interval estimation of stratified alpha a topic that merits detailed study in future investigations.
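As one generic possibility, not a method evaluated in this article, a nonparametric percentile bootstrap resamples respondents and recomputes stratified alpha each time. The sketch below assumes raw item scores in an (n, k) array and sub-scales given as lists of column indices, each with at least two items; all function names are hypothetical. How well such intervals cover the true value for stratified alpha has not been studied here, which is exactly the open question raised above.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_respondents, k_items) array, k >= 2."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

def stratified_alpha_raw(items, groups):
    """Stratified alpha from raw scores; groups lists each sub-scale's column indices."""
    total_var = items.sum(axis=1).var(ddof=1)
    error = 0.0
    for cols in groups:
        sub = items[:, cols]
        sub_var = sub.sum(axis=1).var(ddof=1)
        error += sub_var * (1.0 - cronbach_alpha(sub))  # error variance of the sub-scale
    return 1.0 - error / total_var

def bootstrap_stratified_alpha(items, groups, n_boot=1000, level=0.95, seed=0):
    """Point estimate and percentile bootstrap interval (respondents resampled)."""
    rng = np.random.default_rng(seed)
    n = items.shape[0]
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                # resample rows with replacement
        boot[b] = stratified_alpha_raw(items[idx], groups)
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(boot, [tail, 100.0 - tail])
    return stratified_alpha_raw(items, groups), lower, upper
```

With a single group containing every item, `stratified_alpha_raw` reduces to ordinary alpha, a convenient sanity check.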

This study has a number of limitations. The first is that this article has focused squarely on a single aspect of scale development (and usage), namely internal consistency reliability. The second is that the general point argued in this article is only relevant to one kind of multidimensional construct, in which the relationships flow from the first-order latent factors (or dimensions) to their indicators (i.e., items) and in which these first-order factors are themselves indicators of an underlying second-order construct. In Jarvis, MacKenzie and Podsakoff’s (2003) terminology, this type of construct is termed a reflective first-order and reflective second-order construct. A third limitation is that this study relies on a single scale administered to a single sample of respondents to support its argument. A final limitation is that, with real data, true reliability is unknown. Real data may also contain correlations inflated or attenuated by systematic bias.


Acknowledgements

The author is grateful to GBR’s anonymous referees for their extremely useful suggestions to improve the quality of the article. Usual disclaimers apply.

Appendix A. Heise and Bohrnstedt’s (1970) Ω

Heise and Bohrnstedt (1970) proposed Ω as a reliability index in the context of factor analysis, based on the sample correlation matrix R. The basic equation of factor analysis is

(A1)

R = F F^{'} + U^{2},

where F is the factor matrix and U² is a diagonal matrix of the unique variances. The factors are assumed to be orthogonal. The reliability measure for a scale weighted by a vector a = [a₁, a₂, ..., a_k] is given by

(A2)

Ω = \frac{a (R - U^{2}) a^{'}}{a R a^{'}},

where R − U² is the reduced correlation matrix, that is, R with communalities in the diagonal. When the communalities are known, Ω is claimed to equal the reliability of the composite exactly (Heise & Bohrnstedt, 1970, p. 117).
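Given a correlation matrix and communalities (for example from a principal axis solution), Equation (A2) takes only a few lines. The sketch below uses unit weights by default and builds R − U² as R with the communalities on its diagonal, as described above; the function name is hypothetical.

```python
import numpy as np

def heise_bohrnstedt_omega(R, communalities, weights=None):
    """Omega = a (R - U^2) a' / (a R a') for a weighted composite (sketch).

    R - U^2 is R with its unit diagonal replaced by the communalities;
    unit weights are used when no weight vector a is supplied.
    """
    R = np.asarray(R, dtype=float)
    reduced = R.copy()
    np.fill_diagonal(reduced, np.asarray(communalities, dtype=float))
    a = np.ones(R.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    return float(a @ reduced @ a) / float(a @ R @ a)
```

For a single factor with loadings λ and unit weights, the result reduces to (Σλ)² / [(Σλ)² + Σ(1 − λ²)], the familiar omega for a unidimensional composite.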


References

Aaker, J.L. (1997). Dimensions of brand personality. Journal of Marketing Research, 34(3), 347–356.

Azoulay, & Kapferer, J.-N. (2003). Do brand personality scales really measure brand personality? Journal of Brand Management, 11(2), 143–155.

Baumgartner, & Steenkamp, J.-B. (2006). An extended paradigm for measurement analysis of marketing constructs applicable to panel data. Journal of Marketing Research, 43(3), 431–442.

Bearden, W.O., Money, R.B., & Nevins, J.L. (2006). A measure of long-term orientation: Development and validation. Journal of the Academy of Marketing Science, 34(3), 456–467.

Bearden, W.O., Netemeyer, R.G., & Haws, K.L. (2011). Handbook of marketing scales: Multi-item measures for marketing and consumer behavior research (3rd ed.). Thousand Oaks, CA: SAGE Inc.

Bergkvist, & Rossiter, J.R. (2007). The predictive validity of multiple-item versus single-item measures of the same constructs. Journal of Marketing Research, 44(2), 175–184.

Brakus, J.J., Schmitt, B.H., & Zarantonello (2009). Brand experience: What is it? How is it measured? Does it affect loyalty? Journal of Marketing, 73(3), 52–68.

Bruner, G.C., & Hensel, P.J. (1993). Multi-item scale usage in marketing journals: 1980 to 1989. Journal of the Academy of Marketing Science, 21(4), 339–344.

Bruner, G.C., Hensel, P.J., & James, K.E. (2005). Marketing scales handbook: A compilation of multi-item measures (vol. 4). Chicago, IL: American Marketing Association.

Charter, R.A. (2003). A breakdown of reliability coefficients by test type and reliability method, and the clinical implications of low reliability. Journal of General Psychology, 130(3), 290–304.

Churchill, G.A. (1979). A paradigm for developing better measures of marketing constructs. Journal of Marketing Research, 16(1), 64–73.

Churchill, G.A., & Peter, J.P. (1984). Research design effects on the reliability of rating scales: A meta-analysis. Journal of Marketing Research, 21(4), 360–375.

Cortina, J.M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78(1), 98–104.

Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334.

Cronbach, L.J., Rajaratnam, & Gleser, G.C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16(November), 137–163.

Cronbach, L.J., Schönemann, & McKie (1965). Alpha coefficients for stratified-parallel tests. Educational and Psychological Measurement, 25(2), 291–312.

D’Astous, Hadj Saïd, & Lévesque (2002). Conception et test d’une échelle de mesure de la personnalité des magasins [Design and test of a scale for measuring store personality]. Paper presented at the 18th International Conference of the French Marketing Association, Lille, France. Retrieved 10 June 2009, from http://www.afm-marketing.org

D’Astous, & Lévesque (2003). A scale for measuring store personality. Psychology & Marketing, 20(5), 455–469.

Das (2014). Store personality and consumer store choice behaviour: An empirical examination. Marketing Intelligence & Planning, 32(3), 375–394.

Das, Guin, K.K., & Datta (2013). Impact of store personality antecedents on store personality dimensions: An empirical study of department retail brands. Global Business Review, 14(3), 471–486.

Drolet, A.L., & Morrison, D.G. (2001). Do we really need multiple-item measures in service research? Journal of Service Research, 3(3), 196–204.

Duhachek, Coughlan, A.T., & Iacobucci (2005). Results on the standard error of the coefficient alpha index of reliability. Marketing Science, 24(2), 294–301.

Duhachek, & Iacobucci (2004). Alpha’s standard error (ASE): An accurate and precise confidence interval estimate. Journal of Applied Psychology, 89(5), 792–808.

Feldt, L.S., & Charter, R.A. (2003). Estimating the reliability of a test split into two parts of equal or unequal length. Psychological Methods, 8(1), 102–109.

Feldt, L.S., & Qualls, A.L. (1996). Bias in coefficient alpha arising from heterogeneity of test content. Applied Measurement in Education, 9(3), 277–286.

Feldt, L.S., Woodruff, D.J., & Salih, F.A. (1987). Statistical inferences for coefficient alpha. Applied Psychological Measurement, 11(1), 93–103.

Finn, & Kayandé, U. (1997). Reliability assessment and optimization of marketing measurement. Journal of Marketing Research, 34(May), 262–275.

Finn, & Kayandé, U. (2004). Scale modification: Alternative approaches and their consequences. Journal of Retailing, 80(1), 37–52.

Gerbing, D.W., & Anderson, J.C. (1988). An updated paradigm for scale development incorporating unidimensionality and its assessment. Journal of Marketing Research, 25(May), 186–192.

Geuens, Weijters, & De Wulf (2009). A new measure of brand personality. International Journal of Research in Marketing, 32(1), 97–107.

Graham, J.M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability: What they are and how to use them? Educational and Psychological Measurement, 66(6), 930–944.

Green, S.B., Lissitz, R.W., & Mulaik, S.A. (1977). Limitations of coefficient alpha as an index of test unidimensionality. Educational and Psychological Measurement, 37(4), 827–838.

Hardesty, D.M., & Bearden, W.O. (2004). The use of expert judges in scale development: Implications for improving face validity of measures of unobservable constructs. Journal of Business Research, 57(2), 98–107.

Heise, D.R., & Bohrnstedt, G.W. (1970). Validity, invalidity and reliability. In E.F. Borgatta & G.W. Bohrnstedt (Eds), Sociological methodology (pp. 104–129). San Francisco, CA: Jossey-Bass.

Helms, J.E., Henze, K.T., Sass, T.L., & Mifsud, V.A. (2006). Treating Cronbach’s alpha reliability coefficients as data in counseling research. The Counseling Psychologist, 34(5), 630–660.

Hogan, T.P., Benjamin, & Brezinski, K.L. (2000). Reliability methods: A note on the frequency of use of various types. Educational and Psychological Measurement, 60(4), 523–531.

Horn, J.L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(June), 179–185.

Iacobucci (1994). Classic factor analysis. In R.P. Bagozzi (Ed.), Principles of marketing research (pp. 279–316). Cambridge, MA: Blackwell.

Iacobucci (2001). Factor analysis. Journal of Consumer Psychology, 10(1/2), 75–76.

Jarvis, MacKenzie, S.B., & Podsakoff, P.M. (2003). A critical review of construct indicators and measurement model misspecification in marketing and consumer research. Journal of Consumer Research, 30(September), 199–218.

Jha, & Bhattacharyya, S.K. (2013). Learning orientation and performance orientation: Scale development and its relationship with performance. Global Business Review, 14(1), 43–54.

Jöreskog, K.G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36(2), 109–133.

Kamata, Turhan, & Darandari (2003). Estimating reliability for multidimensional composite scale scores. Paper presented at the annual meeting of the American Educational Research Association, April, Chicago, IL. Retrieved 3 January 2010, from http://www.coe.fsu.edu/AERA/Kamata2.pdf

Koçak, Abimbola, & Özer (2007). Consumer brand equity in a cross-cultural replication: An evaluation of a scale. Journal of Marketing Management, 23(1/2), 157–173.

Komaroff (1997). Effect of simultaneous violations of essential tau-equivalence and uncorrelated error on coefficient alpha. Applied Psychological Measurement, 21(4), 337–348.

Lee, & Hooley (2005). The evolution of ‘classical mythology’ within marketing measure development. European Journal of Marketing, 39(3/4), 365–385.

, Rosenthal, & Rubin, D.B. (1996). Reliability of measurement in psychology: From Spearman–Brown to maximal reliability. Psychological Methods, 1(1), 98–107.

Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Lucke, J.F. (2005). ‘Rassling the hog’: The influence of correlated item error on internal consistency, classical reliability, and congeneric reliability. Applied Psychological Measurement, 29(2), 106–125.

Lumsden (1976). Test theory. Annual Review of Psychology, 27(1), 251–280.

McDonald, R.P. (1970). The theoretical foundations of common factor analysis, principal factor analysis and alpha factor analysis. British Journal of Mathematical Psychology, 23(1), 1–21.

McDonald, R.P. (1985). Factor analysis and related methods. Hillsdale, NJ: Erlbaum.

McDonald, R.P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.

Netemeyer, R.G., Bearden, W.O., & Sharma (2003). Scaling procedures: Issues and applications. Thousand Oaks, CA: SAGE Publications.

Novick, M.R., & Lewis (1967). Coefficient alpha and the reliability of composite measurements. Psychometrika, 32(1), 1–13.

Nunnally, J.C. (1967). Psychometric theory. New York, NY: McGraw-Hill.

O’Connor, B.P. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer’s MAP test. Behavior Research Methods, Instruments, & Computers, 32(3), 396–402.

Osburn, H.G. (2000). Coefficient alpha and related internal consistency reliability coefficients. Psychological Methods, 5(3), 343–355.

Patil, V.H., Singh, S.N., Mishra, & Donavan, T.D. (2008). Efficient theory development and factor retention criteria: Abandon the ‘eigenvalue greater than one’ criterion. Journal of Business Research, 61(2), 162–170.

Peter, J.P. (1979). Reliability: A review of psychometric basics and recent marketing practices. Journal of Marketing Research, 16(1), 6–17.

Peter, J.P. (1981). Construct validity: A review of basic issues and marketing practices. Journal of Marketing Research, 18(2), 133–145.

Peterson, R.A. (1994). A meta-analysis of Cronbach’s coefficient alpha. Journal of Consumer Research, 21(September), 381–391.

Preacher, K.J., & MacCallum, R.C. (2003). Repairing Tom Swift’s electric factor analysis machine. Understanding Statistics, 2(1), 13–43.

Rae (2007). A note on using stratified alpha to estimate the composite reliability of a test composed of interrelated nonhomogeneous items. Psychological Methods, 12(2), 177–184.

Rae (2008). A note on using alpha and stratified alpha to estimate the reliability of a test composed of item parcels. British Journal of Mathematical and Statistical Psychology, 61(2), 515–525.

Rajaratnam, Cronbach, L.J., & Gleser, G.C. (1965). Generalizability of stratified-parallel tests. Psychometrika, 30(1), 39–56.

Raykov (1997). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement, 21(2), 173–184.

Raykov (1998). Coefficient alpha and composite reliability with interrelated nonhomogeneous items. Applied Psychological Measurement, 22(4), 375–385.

Revelle (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1), 57–74.

Richins, M.L. (2004). The material values scale: Measurement properties and the development of a short form. Journal of Consumer Research, 31(June), 209–219.

Rossiter, J.R. (2002). The C-OAR-SE procedure for scale development in marketing. International Journal of Research in Marketing, 19(4), 305–335.

Rossiter, J.R. (2011). Marketing measurement revolution: The C-OAR-SE method and why it must replace psychometrics. European Journal of Marketing, 45(11), 1561–1588.

Sijtsma (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107–120.

Smith, R.E., MacKenzie, S.B., Yang, Buchholz, L.M., & Darley, W.K. (2007). Modeling the determinants and effects of creativity in advertising. Marketing Science, 26(6), 819–833.

Streiner, D.L. (2003). Starting at the beginning: An introduction to coefficient alpha and internal consistency. Journal of Personality Assessment, 80(1), 99–103.

Sung, Choi, S.M., Ahn, & Song, Y.-A. (2015). Dimensions of luxury brand personality: Scale development and validation. Psychology & Marketing, 32(1), 121–132.

Velicer, W.F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3), 321–327.

Voss, K.E., Stem, D.E., & Fotopoulos (2000). A comment on the relationship between coefficient alpha and scale characteristics. Marketing Letters, 11(2), 177–191.

Werts, C.E., Rock, D.R., Linn, R.L., & Jöreskog, K.G. (1978). A general method of estimating the reliability of a composite. Educational and Psychological Measurement, 38(4), 933–938.

Willems, & Swinnen (2011). Am I cheap? Testing the role of store personality and self-congruity in discount retailing. The International Review of Retail, Distribution and Consumer Research, 21(5), 513–539.

Zinbarg, R.E., Revelle, Yovel, & Li (2005). Cronbach’s α, Revelle’s β, and McDonald’s ω_h: Their relation with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1), 1–11.

Zwick, W.R., & Velicer, W.F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99(3), 432–442.