Factorial Invariance of the Statistical Anxiety Rating Scale across Sex and Students' Classification

Abstract

The Statistical Anxiety Rating Scale (STARS) was used to measure statistics anxiety across 423 graduate and undergraduate students from a midsized university in the western United States. Students' responses were analyzed using confirmatory factor analysis (CFA) to assess the validity of scores from the proposed six-factor model, which was well-fitting, according to various adjunct fit indexes. Students' responses were then examined using multigroup CFA to explore factorial invariance across sex and student classification (i.e., undergraduates and graduates). The model was found to be factorially invariant across sex, but not across student classification, possibly meaning graduate and undergraduate students ascribed different meaning to some items. If one ignores the test of factorial invariance, between-groups statistical tests can be unduly influenced by measurement artifacts, sometimes erroneously identifying statistically significant mean differences when there are none.

Many university students taking statistics classes report statistics anxiety (Onwuegbuzie, 2004). Statistics anxiety has been researched for several decades (Earley & Mertler, 2002), using numerous tests. One of the most popular tests is the Statistical Anxiety Rating Scale (STARS; Cruise, Cash, & Bolton, 1985). The STARS was developed to measure statistics anxiety using a sample of 423 students (graduate and undergraduate) across a variety of academic disciplines. Statistics anxiety has been defined in several ways, but commonly cited ones are attributable to Cruise, et al. (1985), Zeidner (1990), and Onwuegbuzie, Da Ros, and Ryan (1997). Onwuegbuzie, et al. defined statistics anxiety as “a state-anxiety reaction to any situation in which a student is confronted with statistics in any form and at any time” (p. 28). Cruise, et al.'s definition stated “the feelings of anxiety encountered when taking a statistics course or doing statistical analysis; that is, gathering, processing, and interpreting” (p. 92), whereas Zeidner defined statistics anxiety as “a performance characterized by extensive worry, intrusive thoughts, mental disorganization, tension, and physiological arousal … when exposed to statistics content, problems, instructional situations, or evaluative contexts, and is commonly claimed to debilitate performance in a wide variety of academic situations by interfering with the manipulation of statistics data and solution of statistics problems” (p. 319).

Prior estimates show statistics anxiety is experienced by the majority of graduate students at uncomfortable levels (Onwuegbuzie, 2004). Of major concern is that performance in a statistics class and magnitude of statistics anxiety are negatively related (Zeidner, 1990; Onwuegbuzie & Seaman, 1995; Fitzgerald & Jurs, 1996). This is alarming, considering the important role statistics plays in quantitative research (Birenbaum & Eylath, 1994). Graduate students should be able to readily interpret statistical findings in scholarly publications (Birenbaum & Eylath, 1994). Knowledge of statistics and applying statistical techniques are ever more critical in all academic disciplines (Baloğlu, 2003; Mji, 2009). Although research methods are commonly taught separately from statistics in graduate programs, research methods per se are not typically part of the undergraduate curriculum, so statistics courses may be students' only formal introduction to research methods. Statistics anxiety is not the principal concern, but rather the outcomes related to anxiety (Mji, 2009). Anxious students may have difficulty learning and using statistics.

There are many explanations for possible sources of statistics anxiety. These are often broken down into three categories: environmental, dispositional, and situational (Baloğlu, 2003). Environmental aspects might include sex, age, ethnicity, academic major, and previous mathematics experiences (Baloğlu, 2003), which may be described as the biases one brings into the statistics course (Onwuegbuzie, et al., 1997). Some prior research indicated women experience difficulty in quantitative areas (Royse & Romph, 1992) and that women experienced higher statistics anxiety (Zeidner, 1990; Onwuegbuzie & Seaman, 1995); however, other research has indicated no such significant differences (Cruise & Wilkins, 1980; Baloğlu, 2003). Unlike environmental factors, situational factors associated with greater statistics anxiety have been reported during enrollment in statistics class. These factors include exposure to statistical definitions, the instructor (Onwuegbuzie, et al., 1997), lack of feedback from the instructor (Zeidner, 1991), and the general nature of the statistics class. Dispositional factors which might be related to statistics anxiety include learning styles (Onwuegbuzie, 1998), general attitude toward statistics (Harvey, Plake, & Wise, 1985), and perceptions of statistics (Zeidner, 1991). For instance, because many students dread being required to take a statistics course, they often take these courses at the end of their degree program (Onwuegbuzie & Wilson, 2003; Zeidner, 1991). Waiting until the end of their academic careers before enrolling in a required statistics class means not actively applying statistics during their academic training (Onwuegbuzie, et al., 1997). In addition, many students perceive statistics as the most difficult class they have to take (Schacht & Stewart, 1990) and perform worse than in other classes (Onwuegbuzie, Slate, Paterson, & Watson, 2000).

There is also evidence of higher statistics anxiety among graduate students than undergraduates (Harvey, et al., 1985; Benson & Bandalos, 1989). On the other hand, in a separate study, Benson (1989) reported statistics anxiety did not differ statistically significantly between undergraduate and graduate students. Higher statistics anxiety also was correlated negatively with the length of the class (Bell, 2001), suggesting shorter courses may be linked to higher anxiety.

Psychometrics of the STARS

Factorial invariance (also known as equivalence of measurement) allows the assumption that a construct holds the same meaning for the different groups tested. It is proposed that by ignoring a test of factorial invariance, researchers essentially ignore a fundamental measurement assumption when conducting between-group comparisons. Then, a comparison could be statistically significant or not only as a result of a measurement artifact, namely nonequivalence of measurement between groups. The implications of ignoring this issue in the context of the STARS should be examined. Even researchers not interested in statistics anxiety may have interest in the broad implications of measurement nonequivalence, which, in the extreme, could put into question one's statistical conclusions.
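The nested levels of invariance tested later in the article can be stated compactly. The block below is a sketch in generic multigroup ordinal CFA notation; the symbols (a latent response y*, loading matrix Λ, thresholds τ, group index g) are illustrative assumptions rather than notation taken from the original study.

```latex
% Generic ordinal multigroup CFA; all symbols are illustrative assumptions.
% y* is the latent continuous response underlying an observed 5-point rating y.
\begin{align*}
y^{*}_{ig} &= \Lambda_g \,\eta_{ig} + \varepsilon_{ig}, \qquad g \in \{1, 2\} \\
y_{ijg} &= c \quad \text{if } \tau_{jg,c} < y^{*}_{ijg} \le \tau_{jg,c+1} \\[4pt]
\text{Configural:}\quad & \Lambda_1 \text{ and } \Lambda_2 \text{ share the same pattern of fixed and free loadings} \\
\text{Metric:}\quad & \Lambda_1 = \Lambda_2 \\
\text{Scalar:}\quad & \Lambda_1 = \Lambda_2 \ \text{ and } \ \tau_{j1,c} = \tau_{j2,c} \ \text{ for all items } j \text{ and categories } c
\end{align*}
```

Each level adds equality constraints to the previous one, which is why a failure at the metric level makes a scalar test uninformative.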

Baloğlu (2002) attempted to confirm the six-factor model of STARS using a sample of 221 undergraduate college students. All factor loadings were significant (p < .05). Model fit was also assessed using several fit indices, including the goodness-of-fit index (GFI; Jöreskog & Sörbom, 1982), comparative fit index (CFI; Bentler, 1990), and root mean square error of approximation (RMSEA; Steiger, 1990). All measures of fit indicated the six-factor model was not a good fit for the data, so support for construct validity of the STARS was minimal in this group. Hanna, Shevlin, and Dempster (2008) conducted confirmatory factor analyses, testing one-, four-, and six-factor models with 849 undergraduate psychology students in the United Kingdom. Appropriate fit indices, including RMSEA, CFI, and standardized root mean square residual (SRMR), indicated the one-factor model fit the data poorly. Both the four- and six-factor models exhibited reasonable model fit, with the six-factor model fitting the data the best.

Dauphinee, Schau, and Stevens (1997) conducted a confirmatory factor analysis of the Survey of Attitudes Toward Statistics (SATS) in a group of 991 undergraduate students. An invariance analysis across sex suggested that the SATS model was equivalent across sex. Maximum likelihood confirmatory factor analysis was conducted using LISREL 7. Although this was then the only known method of performing an invariance analysis, this method has now been judged to be inaccurate for use with ordinal data. Instead, a weighted least squares mean- and variance-adjusted (WLSMV) estimator must be used to accommodate ordinal data (Lubke & Muthén, 2004; Millsap & Yun-Tein, 2004). This estimator is generally robust under ordinal (non-normal) data conditions (Flora & Curran, 2004; Hutchinson, Raymond, & Black, 2008).

Clearly, statistics anxiety has been studied for decades, but no studies have first assessed whether the validity of scores from the STARS is equivalent across different subgroups. This is a particularly important issue because absence of measurement equivalence implies that group responses are not meaningfully comparable (Byrne, Shavelson, & Muthén, 1989; Vandenberg & Lance, 2000). Absence of measurement equivalence prohibits valid score comparison across different subgroups because the comparison is essentially “misleading and illegitimate” (Hui & Triandis, 1985, p. 134). Without measurement equivalence, notable mean differences in statistics anxiety across different groups may be attributable to measurement artifacts rather than real differences in perception of statistics anxiety (Hutchinson, et al., 2008). If there is no measurement equivalence across comparison groups, it is possible that prior research results are invalid because the assumption of equivalent groups was incorrect. The current study brings to the forefront the importance of measurement equivalence when making group comparisons. For instance, if subpopulations interpret the meaning of the STARS items differently, then no accurate comparison of groups can be meaningfully made.

The research on statistics anxiety is fairly extensive, but there are clear weaknesses in the literature. Although the STARS has been a popular scale for measuring statistics anxiety, there has been no psychometric research on whether STARS is measuring statistics anxiety equivalently for all students. Perhaps previous studies have been published under the assumption that STARS does measure statistics anxiety equivalently for all students. This assumption may be incorrect. Therefore, the rationale for the present study was to assess whether measurement is equivalent among different groups of students who have typically been compared.

Hypothesis 1. Statistics anxiety will exhibit factorial invariance across sex.

Hypothesis 2. Statistics anxiety will exhibit factorial invariance across students' year of study.

Method

Participants

Upon IRB approval, students at a midsized university in the Western part of the USA were recruited from intact classrooms to complete the STARS and a demographics questionnaire. There were 423 participants (293 women, Mage = 26.6 yr., SD = 10.5; 130 men, Mage = 24.9 yr., SD = 9.6). Both undergraduate students (120 freshmen, 64 sophomores, 51 juniors, 23 seniors) and graduate students (53 master's, 112 doctoral) were recruited to take the survey. Students were enrolled in introductory undergraduate statistics and psychology classes, or in intermediate and advanced graduate statistics and research methods classes. Students across various academic disciplines were represented in this sample, including behavioral sciences (n = 89), business (n = 14), education (n = 41), health sciences (n = 82), performing and visual arts (n = 6), social sciences (n = 31), and other (n = 8).

Measures

The survey included the STARS instrument followed by demographic questions (see Appendix). Each of the 51 items in the STARS (Cruise & Wilkins, 1980) was rated on a 5-point scale. For the first 23 items measuring test and class anxiety, interpretation anxiety, and fear of asking for help, participants rated their anxiety using anchors of 1: No anxiety and 5: Strong anxiety. A sample item is “Studying for an examination in a statistics course”. For the next 28 items measuring worth of statistics, computational self-concept, and fear of statistics teachers, participants were asked to rate their agreement using anchors of 1: Strongly agree and 5: Strongly disagree. A sample item is “Statistics is worthless to me since it is empirical and my area of specialization is abstract”. For Items 1 to 23, higher scores on each item correspond to higher anxiety; and for Items 24 to 51, higher scores on each item correspond to more positive attitudes.

Onwuegbuzie (1999) reported an estimate of internal reliability of scores on the STARS of .78 on the Worth of Statistics subscale and .84 on the Test and Class Anxiety subscale with a median of .8 for 225 African-American participants at a small suburban college of education in a mid-southern state. Baloğlu (2002) reported internal consistency reliability coefficients on the STARS scores ranging from .64 on the Fear of Statistics Teachers subscale to .94 on the Worth of Statistics subscale for 246 college students.

The survey also comprised a series of demographic questions, including sex, age, academic major, classification (i.e., freshman, sophomore, junior, senior, master's, doctoral), and mathematics background (i.e., statistics course in high school, undergraduate statistics course in college, graduate statistics courses, high school algebra, college algebra, trigonometry, and calculus). For mathematics background, students could check all that applied.

Procedure

Participants in undergraduate introductory statistics courses were given a hard copy of the survey and a separate sheet of paper with the URL to access the online version. The class was given the option of either taking the survey in paper format or online; all chose to complete the hard copy. Participants in another class were provided a hard copy of the survey without an option to complete online. Other classes were only provided with the URL to access the online survey. The online survey format was also used as a follow-up method for collecting data—an e-mail was sent out to introductory statistics and psychology students with the URL of the survey. In all, 69 participants completed the paper survey and 354 completed the online survey.

The human participants consent form was on the first page of the paper survey packet. Participants were explicitly told to tear off the consent form to keep for their own records. The next six pages comprised the STARS. A cash prize entry form was attached as the last page; participants were told to detach this page from the survey packet and fill in their name, phone number, and e-mail address if they were interested in having a chance to win the $20 cash prize. The cash prize entry forms were kept in a separate stack from the rest of the survey. Entry forms were shuffled to minimize possible matching of entry form to participants' corresponding surveys.

Those participants who chose to complete the survey online were first presented a screen encompassing the human participant consent form. At the bottom of the screen were two options: I consent and I decline. Declining to consent directed the participant to the university's website. Consenting directed the participant to the survey. Online, if a participant wished to enter into the $20 cash prize drawing, a text box was available to provide the necessary contact information (i.e., name, phone number, e-mail address). If the participant felt their anonymity could be compromised by entering contact information into the text box, an e-mail address was also provided for separate transmission.

Before data collection began, an application for exempt review was submitted to and subsequently approved by the institutional review board at the university at which the study was conducted. Depending on the feasibility of the researcher personally being able to distribute paper surveys, some students were only given the option to take the online version. For all introductory statistics students, a follow-up email was sent to indicate that they could take the survey online if they had not taken it in class. The primary purpose of this follow-up was to solicit a response from those students who did not attend class on the day the paper survey was distributed.

Results

Preliminary Analyses

Data were entered into PASW 18 (IBM, 2010) and preliminary descriptive (e.g., means, standard deviations, frequencies) and internal consistency reliability analyses (e.g., Cronbach's alpha) were run. Cronbach's alpha was .96 for the total score. Reliability coefficients for the six subscales, shown in Table 1, are relatively consistent with estimates found in prior studies.

TABLE 1
Cronbach's Alpha by Subpopulation

Scale	Women (n = 293)	Men (n = 130)	Undergraduate (n = 258)	Graduate (n = 165)	Total (N = 423)
Test and Class Anxiety	.87	.88	.87	.90	.88
Interpretation Anxiety	.88	.87	.89	.88	.88
Fear of Asking for Help	.84	.83	.82	.87	.84
Worth of Statistics	.95	.95	.94	.92	.95
Fear of Statistics Teacher	.80	.76	.80	.77	.79
Computational Self-concept	.87	.86	.86	.87	.86
Total Scale Score	.96	.96	.96	.96	.96
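For readers who want to reproduce reliability coefficients of the kind reported in Table 1, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name and the toy responses are hypothetical and are not the study's data; the original analysis was run in PASW, not Python.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per participant)."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across participants
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the participants' total scores
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point ratings from five participants on three items
toy_responses = [
    [1, 2, 1],
    [3, 3, 4],
    [2, 2, 3],
    [5, 4, 5],
    [4, 5, 4],
]
print(f"alpha = {cronbach_alpha(toy_responses):.3f}")
```

Applying the function separately to each subgroup's rows (e.g., women only, graduates only) would yield one column of a table laid out like Table 1.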

Baseline Model

A six-factor confirmatory factor analysis model was specified with all observed variables being ordered categorical and estimated with Mplus (Version 5.21; Muthén & Muthén, 2010), using the WLSMV estimator. Overall model fit and component fit were examined using several measures of fit, including an appropriate χ² test employing the WLSMV estimator, RMSEA, Tucker-Lewis index (TLI; Tucker & Lewis, 1973), and CFI.

The six-factor model in the current study fit well. Although the χ² test statistic was statistically significant, as shown in Table 2, several adjunct fit indexes provided evidence supporting the six-factor model. The CFI was .943 and the TLI was .940 (each with a range of 0 to 1.0), where the closer the measure is to 1.0 the better the model fit. The RMSEA was .059, where a lower number is more desirable and indicates better fit (RMSEA values can range from 0 to infinity and the RMSEA is considered a measure of “badness of fit”). These fit values were compared to Hu and Bentler's (1999) guidelines for cutoff criteria and indicate adequate model fit. The RMSEA was higher than the suggested cutoff value of .05 for close fit (Browne & Cudeck, 1993); however, Browne and Cudeck also stated that an RMSEA value less than .08 is indicative of reasonable fit. The CFI was slightly lower than .95, which is the cutoff criterion suggested by Hu and Bentler (1999) and indicates model fit may not be ideal. In regard to component fit, all parameter estimates (standardized factor loadings) were statistically significant (p < .01) and in the expected direction, which is indicative of good fit.

TABLE 2
Baseline Model Fit for Six-factor STARS Model

Model	χ²	df	p	TLI	CFI	RMSEA
Initial model (51 items)	2965.58	1209	<.001	.940	.943	.059
Revised model (49 items)	2561.61	1112	<.001	.947	.950	.056
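As a rough check on the RMSEA values in Table 2, the sketch below applies the common single-group textbook formula, RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))), to the reported χ², df, and N = 423. This is an assumption about the computation: the χ² here is WLSMV-adjusted, and software may use N rather than N − 1 in the denominator, so agreement with the published values is approximate rather than guaranteed.

```python
import math


def rmsea(chi_square, df, n):
    """Textbook single-group RMSEA from a model chi-square, its df, and sample size n."""
    # Excess of chi-square over its degrees of freedom, floored at zero
    excess = max(chi_square - df, 0.0)
    return math.sqrt(excess / (df * (n - 1)))


# Values taken from Table 2 (N = 423)
initial = rmsea(2965.58, 1209, 423)
revised = rmsea(2561.61, 1112, 423)
print(f"initial model: {initial:.3f}, revised model: {revised:.3f}")
```

Both results round to the values shown in Table 2, which suggests the reported indices are consistent with this formula.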

Model modification indices (MI) were examined to identify areas of misfit; each index provides an estimate of the decrease in χ² for the overall model if a given parameter were freed for estimation (Brown, 2006). Two items from the Agreement subscale produced extreme modification index values on the factor loadings. Item 11, “Since I have never enjoyed math I do not see how I can enjoy statistics,” was set to load on the factor Computational Self-concept, and Item 24, “Statistical figures are not fit for human consumption,” was set to load on the factor Worth of Statistics. Given possible multiple factor loadings suggested by the modification indices for these two items, they were removed from the model and the CFA was re-run. The fit of the new model was slightly better; Table 2 summarizes the fit statistics for the two versions of the baseline model. A baseline model is the first step for subsequent model comparisons.

Invariance Results

Invariance by sex.—Before testing for invariance across sex, the six-factor baseline model was separately fit for women and men to assess whether the model provided good fit for each group separately. Adjunct fit indexes suggested good global model fit for both sexes (Table 3). In regard to component fit, all parameter estimates were statistically significant for both sexes and indicative of good fit.

TABLE 3
Invariance Tests of Six-factor STARS Model by Sex and Student Classification

Model	χ²	df	Δχ²*	Δdf*	p	TLI	CFI	RMSEA
Sex
Baseline Men	1478.04	1112			<.0001	.955	.957	.050
Baseline Women	2181.14	1112			<.001	.944	.947	.057
Configural	3124.37	1956			<.001	.952	.955	.053
Metric	3164.07	1996	60.11	40	.02	.953	.955	.053
Scalar	3245.60	2119	153.95	123	.03	.957	.956	.050
Education Level
Baseline Undergraduate	1978.40	1112			<.001	.939	.942	.055
Baseline Graduate	1533.96	1112			<.001	.952	.955	.048
Configural	3166.65	2044			<.001	.950	.952	.051
Metric	3248.82	2085	142.51	41	<.001	.949	.951	.051

*Based on WLSMV-adjusted calculations.

The first step of a factorial invariance analysis is to assess configural invariance, which requires identical models to be specified for both men and women with the added stipulation that parameters are separately estimated for the men and women. As shown in Table 3, despite the χ² test being statistically significant, the CFI, TLI, and RMSEA are suggestive of good fit. Therefore, the six-factor model seems to be suitable for both men and women.

The second step of the invariance analysis, metric invariance (Vandenberg & Lance, 2000), tested whether the decrement in model fit would be statistically significant if the factor loadings were constrained equal for the men and women. The χ² difference test was not statistically significant, p = .02 (applying a Bonferroni adjustment and using an alpha of .01 to account for numerous statistical tests), indicating items appear to be functioning similarly for both groups. Therefore, the invariance testing continued.

The test for scalar invariance (Vandenberg & Lance, 2000) was the third step in the invariance analyses, where it was tested whether the item thresholds were invariant across men and women. The χ² difference test was not statistically significant (Table 3), meaning scalar invariance held, so the invariance testing proceeded to the test of latent means differences.
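The decision rule in these steps can be illustrated by converting each Δχ² and Δdf in Table 3 into a p value and comparing it with the adjusted alpha of .01. The sketch below is only that conversion: it assumes the tabled Δχ² values are already WLSMV-adjusted (they cannot be obtained by subtracting the model χ² values), and the helper `chi2_sf` is a hypothetical standard-library implementation of the chi-square upper-tail probability, not the software routine used in the study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion for the lower regularized incomplete gamma function
        term = 1.0 / a
        total = term
        n = a
        for _ in range(1000):
            n += 1.0
            term *= z / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for the upper regularized incomplete gamma
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(log_prefactor)


ALPHA = 0.01  # adjusted alpha stated in the text

# (adjusted delta chi-square, delta df) as reported in Table 3
difference_tests = {
    "sex, metric": (60.11, 40),
    "sex, scalar": (153.95, 123),
    "classification, metric": (142.51, 41),
}
for label, (delta_chi2, delta_df) in difference_tests.items():
    p = chi2_sf(delta_chi2, delta_df)
    verdict = "invariance rejected" if p < ALPHA else "invariance retained"
    print(f"{label}: p = {p:.4f} -> {verdict}")
```

Under these assumptions the two sex comparisons stay above the .01 threshold and the classification comparison falls well below it, matching the pattern of conclusions reported in the text.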

Latent mean differences between men and women for both the Test and Class Anxiety factor and the Interpretation Anxiety factor were statistically significant (Table 4), indicating that women were reporting higher anxiety in those areas. Differences between latent means were not statistically significant for the men and women on Fear of Asking for Help, Worth of Statistics, Fear of Statistics Teacher, or Computational Self-concept (Table 4).

TABLE 4
Tests of Latent Means Between Sexes

Factor	Latent Mean	p
Test and Class Anxiety	−0.405	<.001
Interpretation Anxiety	−0.422	<.001
Fear of Asking for Help	−0.252	.04
Worth of Statistics	−0.021	.85
Fear of Statistics Teacher	0.170	.19
Computational Self-concept	−0.062	.60

Note.—A negative latent mean indicates that women exhibited higher anxiety or more agreement.

Invariance by students' classification.—Separate CFA models were tested for undergraduate and graduate students. According to the adjunct fit indexes (Table 3), the baseline model fit adequately for undergraduates; however, both CFI and TLI were slightly lower than desirable. The CFA model for graduate students fit well (Table 3). Estimates of all parameters were statistically significant for both groups.

Configural invariance was observed, despite a statistically significant χ² value (Table 3), indicating support for the tenability of the same pattern of fixed and free factor loadings across undergraduate and graduate students. Metric invariance, however, did not hold across this subpopulation, with the χ² difference test being statistically significant (Table 3). In general, lack of metric invariance indicates that some items may have behaved differently across undergraduate students and graduate students, even though the overall factor structure held across the two groups.

Because factorial invariance failed at the metric level of testing, scalar invariance was not tested. Instead, post hoc tests were performed to study the nature of between-group differences for the latent variables. Chi-square difference tests were conducted for the six latent variables (Table 5). From the statistical tests, one may infer both configural and metric invariance were evident for two factors, Test and Class Anxiety and Interpretation Anxiety. The factor Fear of Asking for Help passed the test for configural, metric, and scalar invariance. However, the last three factors (Worth of Statistics, Fear of Statistics Teachers, Computational Self-concept) were noninvariant on the factor loadings (metric invariance), which may suggest that undergraduate and graduate students interpreted the STARS items in different ways.

TABLE 5
Post Hoc Model Fit for Student Classification

Model	χ²	df	Δχ²	Δdf	p	TLI	CFI	RMSEA
Test and Class Anxiety
Configural	204.41	41			<.0001	.95	.97	.14
Metric	194.21	48	8.32	7	.31	.96	.97	.12
Scalar	216.31	70	46.63	22	.002	.98	.97	.10
Interpretation Anxiety
Configural	213.44	71			<.0001	.95	.96	.10
Metric	225.60	80	20.32	9	.010	.96	.96	.09
Scalar	342.65	108	123.03	28	<.0001	.95	.94	.10
Fear of Asking for Help
Configural	88.08	5			<.0001	.94	.98	.28
Metric	98.47	8	5.36	3	.15	.96	.97	.23
Scalar	97.96	18	9.34	10	.50	.99	.98	.15
Worth of Statistics
Configural	616.05	181			<.0001	.97	.97	.11
Metric	669.54	195	65.64	14	<.0001	.97	.97	.11
Fear of Statistics Teacher
Configural	32.40	11			.0007	.97	.99	.10
Metric	46.12	15	14.26	4	.0065	.97	.98	.10
Computational Self-concept
Configural	112.86	11			<.0001	.91	.95	.21
Metric	119.21	15	15.29	4	.0041	.93	.95	.18
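The sequential logic behind the post hoc results can be sketched as a small decision rule: a factor is tested for scalar invariance only if metric invariance is retained, and a difference test counts as a failure when p falls below the alpha of .01. The code below is an illustrative reading of Table 5, not the authors' procedure. The function name is hypothetical, p values reported as "<.0001" are entered as the upper bound 0.0001, and the rule treats p = .010 as not below .01 (an assumption that matches the text's conclusion for Interpretation Anxiety).

```python
ALPHA = 0.01  # adjusted alpha stated in the text

# (metric p, scalar p) from Table 5; None means the step was not tested.
# Values reported as "<.0001" are entered as the upper bound 0.0001.
TABLE5_P_VALUES = {
    "Test and Class Anxiety": (0.31, 0.002),
    "Interpretation Anxiety": (0.010, 0.0001),
    "Fear of Asking for Help": (0.15, 0.50),
    "Worth of Statistics": (0.0001, None),
    "Fear of Statistics Teacher": (0.0065, None),
    "Computational Self-concept": (0.0041, None),
}


def highest_invariance_level(metric_p, scalar_p, alpha=ALPHA):
    """Return the most restrictive invariance level retained for one factor."""
    if metric_p < alpha:
        # Constraining loadings significantly worsened fit: stop at configural
        return "configural"
    if scalar_p is None or scalar_p < alpha:
        # Loadings equal, but thresholds untested or noninvariant
        return "metric"
    return "scalar"


for factor, (metric_p, scalar_p) in TABLE5_P_VALUES.items():
    print(f"{factor}: {highest_invariance_level(metric_p, scalar_p)}")
```

Read this way, only Fear of Asking for Help reaches the scalar level, two factors stop at the metric level, and three stop at the configural level, which is the pattern summarized in the paragraph preceding Table 5.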

Discussion

Cruise and other authors using the STARS suggested that the construct of statistics anxiety consists of six factors: Worth of Statistics, Interpretation Anxiety, Test and Class Anxiety, Computational Self-concept, Fear of Asking for Help, and Fear of Statistics Teachers. A CFA confirmed the tenability of this six-factor structure. The WLSMV estimator was used because of its robustness in handling ordinal data (Flora & Curran, 2004).

Use of the WLSMV estimator does not seem to be described in studies examining rating scale data from scales purporting to measure statistics anxiety. Results of studies not employing the WLSMV estimator were based on state-of-the-art estimators at the time; however, new evidence in the form of a more appropriate estimator could bring the prior results into question. For instance, Dauphinee, et al. (1997) analyzed rating scale data from participants completing the Survey of Attitudes Toward Statistics (SATS), employing the maximum likelihood estimator to conduct a multigroup CFA across sex, and found the measurement model to be equivalent across both groups. However, it is conceivable that employing the WLSMV estimator, which would have produced unconfounded estimates of thresholds and factor loadings (Lubke & Muthén, 2004; Millsap & Yun-Tein, 2004), may have led to different conclusions.

In the current study, the factorial invariance tests employed the WLSMV estimator in the multigroup CFA and indicated that the six-factor measurement model was equivalent across sex. The presence of metric invariance indicates that the individual items from the STARS function similarly for men and women. Scalar invariance indicates the thresholds do not differ by sex.

Because the model by sex was invariant across the thresholds, tests of latent means were performed for each of the six factors. Two sex differences were identified as statistically significant: Women had higher mean ratings on Test and Class Anxiety and Interpretation Anxiety. Further implications of invariance of the thresholds include the validity of between-group comparisons made for sex. If the researcher compares observed means (or latent means) on two groups, an independent-samples t test would be meaningful and readily interpretable as a true mean difference. However, in the absence of metric invariance, researchers should be cautious in interpreting the meaning of such differences.

On the other hand, invariance tests across undergraduate and graduate students indicated that these two groups may not have ascribed the same meaning to all items on the STARS, i.e., metric noninvariance was present. This finding precluded any meaningful between-group comparisons of these two groups (Vandenberg & Lance, 2000). In other words, the STARS should not be used if the purpose is to compare mean scores of undergraduate and graduate students.

Although some research supports higher scores on statistics anxiety among graduate students than for undergraduate students (e.g., Benson & Bandalos, 1989; Harvey, et al., 1985), tests for factorial invariance were not conducted. Therefore, it is unknown whether the results are meaningful for the samples in those studies.

Conclusions

Although discussions of equivalence of measurement or factorial invariance may not be found in many statistics texts, researchers should be aware of the effect of imprecise measurement in statistical analyses. In particular, lack of factorial invariance of metrics (thresholds) should preclude between-groups comparisons, even for an independent-samples t test. For instance, should one find a statistically significant difference in observed (or latent) sample means without first conducting a multigroup CFA to test for the presence of factorial invariance, there can be no certainty that this difference reflects a real difference in the trait of interest. It is possible the difference is an artifact of measurement, reflecting nonequivalence in the sample.

Factorial invariance should be viewed as an assumption in between-groups statistical tests, because testing it strengthens the validity of statistical conclusions based on comparison of group scores on a trait. Before assuming that certain subpopulations of interest are invariant, that assumption must first be tested to ensure such comparisons are valid.

Limitations

The sample of students in the current study came from one university, which may limit the generalizability of results to other populations. Invariance tests by academic discipline were not conducted in the current study, as group sample sizes were not sufficient. Future research should focus on obtaining larger and more diverse samples by academic concentration, as statistics anxiety might manifest differently for physical science majors as opposed to liberal arts majors, for example.

Scale	Women	Men	Undergraduate	Graduate	Total
Test and Class Anxiety	.87	.88	.87	.90	.88
Interpretation Anxiety	.88	.87	.89	.88	.88
Fear of Asking for Help	.84	.83	.82	.87	.84
Worth of Statistics	.95	.95	.94	.92	.95
Fear of Statistics Teacher	.80	.76	.80	.77	.79
Computational Self-concept	.87	.86	.86	.87	.86
Total Scale Score	.96	.96	.96	.96	.96

Model	χ²	df	p	TLI	CFI	RMSEA
Initial model (51 items)	2965.58	1209	<.001	.940	.943	.059
Revised model (49 items)	2561.61	1112	<.001	.947	.950	.056

Model	χ²	df	Δχ²*	Δdf*	p	TLI	CFI	RMSEA
Sex
Baseline Men	1478.04	1112			<.0001	.955	.957	.050
Baseline Women	2181.14	1112			<.011	.944	.947	.057
Configural	3124.37	1956			<.001	.952	.955	.053
Metric	3164.07	1996	60.11	40	.02	.953	.955	.053
Scalar	3245.60	2119	153.95	123	.03	.957	.956	.050
Education Level
Baseline Undergraduate	1978.40	1112			<.001	.939	.942	.055
Baseline Graduate	1533.96	1112			<.001	.952	.955	.048
Configural	3166.65	2044			<.001	.950	.952	.051
Metric	3248.82	2085	142.51	41	<.001	.949	.951	.051
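
The p values attached to the Δχ² tests above can be checked from the reported Δχ² and Δdf alone. The reported Δχ² values are not the simple differences between the model χ² values, which is typical of adjusted difference tests used with robust estimators. The sketch therefore takes Δχ² and Δdf as given and only evaluates the upper tail of the χ² distribution. The helper `chi2_sf` is a hypothetical name; it implements the regularized incomplete gamma function with a standard series and continued-fraction expansion using only the Python standard library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion for the lower regularized incomplete gamma P(a, z).
        term = 1.0 / a
        total = term
        n = a
        for _ in range(10000):
            n += 1.0
            term *= z / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for the upper tail Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)


# (label, reported delta chi-square, reported delta df), copied from the table above.
tests = [
    ("Sex, metric", 60.11, 40),
    ("Sex, scalar", 153.95, 123),
    ("Education level, metric", 142.51, 41),
]
p_values = {label: chi2_sf(d_chi, d_df) for label, d_chi, d_df in tests}
for label, p in p_values.items():
    print(f"{label}: p = {p:.4f}")
```

The same function can be applied to the subscale-level tests by substituting their Δχ² and Δdf values.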

Factor	Latent Mean	p
Test and Class Anxiety	−0.405	<.001
Interpretation Anxiety	−0.422	<.001
Fear of Asking for Help	−0.252	.04
Worth of Statistics	−0.021	.85
Fear of Statistics Teacher	0.170	.19
Computational Self-concept	−0.062	.60

Model	χ²	df	Δχ²	Δdf	p	TLI	CFI	RMSEA
Test and Class Anxiety
Configural	204.41	41			<.0001	.95	.97	.14
Metric	194.21	48	8.32	7	.31	.96	.97	.12
Scalar	216.31	70	46.63	22	.002	.98	.97	.10
Interpretation Anxiety
Configural	213.44	71			<.0001	.95	.96	.10
Metric	225.60	80	20.32	9	.010	.96	.96	.09
Scalar	342.65	108	123.03	28	<.0001	.95	.94	.10
Fear of Asking for Help
Configural	88.08	5			<.0001	.94	.98	.28
Metric	98.47	8	5.36	3	.15	.96	.97	.23
Scalar	97.96	18	9.34	10	.50	.99	.98	.15
Worth of Statistics
Configural	616.05	181			<.0001	.97	.97	.11
Metric	669.54	195	65.64	14	<.0001	.97	.97	.11
Fear of Statistics Teacher
Configural	32.40	11			.0007	.97	.99	.10
Metric	46.12	15	14.26	4	.0065	.97	.98	.10
Computational Self-concept
Configural	112.86	11			<.0001	.91	.95	.21
Metric	119.21	15	15.29	4	.0041	.93	.95	.18
