Abstract
In two studies (ns = 312 and 1,149) with 9th- to 12th-grade students in pre–International Baccalaureate (IB) and IB Diploma programs, we evaluated the reliability, factor structure, measurement invariance, and criterion-related validity of the scores from the School Attitude Assessment Survey–Revised (SAAS-R). Reliabilities of the five SAAS-R subscale scores were good (αs > .80) for pre-IB (Grades 9-10) and IB students (Grades 11-12). Study 1 model fit indices for the five-factor SAAS-R model from confirmatory factor analyses showed greater misfit than those previously reported by McCoach and Siegle. In contrast, Study 2 fit indices for the five-factor model with pre-IB and IB students were similar to values reported by McCoach and Siegle. Tests of measurement invariance in Study 2 using multigroup confirmatory factor analysis identified three items within the Motivation/Self-Regulation subscale that differed in their item intercepts (i.e., uniform differential item functioning), with pre-IB students endorsing these items more strongly than IB students. Based on these results, along with evidence of criterion-related validity as reflected in the moderate statistical relations between the SAAS-R subscales and students’ GPAs, the SAAS-R shows promise as a research tool that can be used to examine the psychological factors associated with pre-IB and IB students’ academic achievement.
The School Attitude Assessment Survey–Revised (SAAS-R; McCoach & Siegle, 2003a), a revision of McCoach’s School Attitude Assessment Survey (2002), is a 35-item survey instrument designed to measure five psychological factors associated with students’ academic achievement. The five factors are Academic Self-Perceptions, Attitudes Toward Teachers, Attitudes Toward School, Goal Valuation, and Motivation/Self-Regulation. The SAAS-R, which uses a 7-point response scale (1 = strongly disagree to 7 = strongly agree), has been used with diverse groups of students, including academically gifted achievers and underachievers (e.g., Matthews & McBee, 2007), with the five factors serving various roles in research studies (e.g., outcome variable, predictor, mediator). A summary of studies that have used the SAAS-R is available online as supplemental materials (Appendix A; all supplemental materials are available online at http://gcq.sagepub.com/supplemental).
McCoach and Siegle (2003a), the developers of the measure, provided initial psychometric support of the SAAS-R using confirmatory factor analysis (CFA) and comparative analyses that tested differences on the means of the five SAAS-R factors between gifted high achievers and gifted underachievers. Suldo, Shaffer, and Shaunessy (2008) added to the psychometric support for the SAAS-R using exploratory and confirmatory factor analyses, correlational analyses of theoretically related variables (e.g., school climate, school satisfaction, academic self-efficacy, in-school conduct, time spent on homework), and comparative analyses that tested differences on the means of the five SAAS-R factors among high school students with low, average, and high achievement. Recently, Davie (2012) used CFA and provided support for the five-factor model underlying the SAAS-R, finding acceptable model fit (i.e., comparative fit index [CFI] = .94, Tucker–Lewis index [TLI] = .94, root mean square error of approximation [RMSEA] = .05, and standardized root mean square residual [SRMR] = .04) in a sample of 847 high school students (gifted high achievers, gifted underachievers, and non-gifted low achievers).
Based on these initial results, the SAAS-R has shown promise as a research tool for evaluating the psychological factors associated with students’ academic achievement. This tool is much needed as researchers aim to better understand the experiences of various student populations including high-achieving high school students in specialized programs such as the International Baccalaureate (IB) Diploma program. The IB program, developed in the late 1960s, is designed for high school juniors and seniors but schools often offer students in grades 9 and 10 a structured pre-IB curriculum (International Baccalaureate Organization [IBO], 2013a). The IB program is unique in its holistic approach to education and its emphasis on the well-rounded, civically minded, and globally conscious learner (IBO, 2013e). Like Advanced Placement classes and dual enrollment courses, the IB Diploma program is one of the accelerated programming options offered to gifted learners in secondary settings (Colangelo, Assouline, & Gross, 2004), though the IB program was not initially designed for this population exclusively. While there is a paucity of research documenting the effectiveness of the IB program in meeting gifted learners’ cognitive and affective needs, Hertberg-Davis and Callahan (2008) reported that many IB students felt both challenged and overwhelmed by the rigorous coursework.
The IB Diploma program has grown worldwide and in the United States. The Americas region of IB, including North, Central, and South America, offered approximately 800 programs in 2000 and more than 2,300 in 2012 (IBO, 2013b). In 2013, 800 IB Diploma programs were offered in the United States (IBO, 2013d); the largest concentration of IB Diploma programs is in the southeastern states, with the most IB Diploma programs offered in Florida (Perna et al., 2011). Many colleges recognize the IB Diploma and award college credits accordingly, while others give additional weight in admission decisions to IB Diploma completers or on the basis of end-of-course exam scores. For instance, in Florida, students who graduate with an IB Diploma may begin college at public universities with sophomore standing (IBO, 2013c).
As researchers begin extending the use of the SAAS-R to various subgroups of high-achieving students, such as those in IB programs, it is critical that the psychometric properties of the scores from the SAAS-R are evaluated and reported when the instrument is used in different contexts with different populations. One important psychometric property of an instrument is its factor structure, which represents the theoretical dimensions underlying the instrument and the grouping of items that reflect these dimensions or factors. Evaluation of an instrument’s factor structure has played a central role in the construct validation process (DiStefano & Hess, 2005). Reliability (e.g., Cronbach’s alpha) is another important psychometric property that needs to be assessed. Finally, for the SAAS-R, evidence of criterion-related validity is particularly relevant because the instrument was designed to differentiate “academically able achievers from academically able underachievers” (McCoach & Siegle, 2003a, p. 425).
The American Educational Research Association (AERA), American Psychological Association (APA), and the National Council on Measurement in Education (NCME) collectively assert in the Standards for Educational and Psychological Testing (2014) that it is inappropriate to assume that measurement properties such as the factor structure, reliability, and criterion-related validity that were identified during the development of an instrument will necessarily generalize to other populations and contexts, and as a result call for a reexamination of measurement quality when an instrument is used in a new context or with different groups. This call is particularly important given that studies that have examined the psychometric properties of various psychological and educational measures have often found differences in score reliability (Vacha-Haase & Thompson, 2011) and factor structures (Sass, 2011) when measures are administered under different conditions, with these differences potentially compromising the researchers’ statistical analyses (e.g., multiple regression, ANOVA) in terms of attenuated relationships and biased parameter estimates.
Despite calls for more extensive reporting of psychometric information (e.g., see Kieffer, Reese, & Vacha-Haase’s, 2010, call in the context of giftedness research), many researchers have not examined the measurement properties of instruments when used with their research samples and instead have relied on prior results reported during the original development of the measure (Thompson & Vacha-Haase, 2000). In the 38 studies that have used the SAAS-R (see supplemental materials), only two (Davie, 2012; Suldo et al., 2008) examined the factor structure of the measure, and only Davie (2012) conducted a CFA at the item-level (Davie’s sample was 847 high school students at one school). Suldo et al. used exploratory factor analysis to examine the 35-item SAAS-R at the item level and then conducted a CFA of 15 item parcels created from the 35 items (e.g., from the seven Academic Self-Perceptions items, three composite scores or parcels were created by averaging groups of items consisting of three, two, and two Academic Self-Perceptions items) to evaluate the five-factor model underlying the SAAS-R. Interestingly, of the 38 studies that used the SAAS-R, 24 (63%) calculated Cronbach’s alpha reliabilities for the data collected in the study while the remaining studies either relied on the reliabilities reported by McCoach and Siegle (2003a) or did not mention reliability. The limited reliability reporting for the SAAS-R is consistent with Vacha-Haase and Thompson’s (2011) review of 12 years of reliability generalization studies, which found that 54.6% of the 12,994 primary studies that were part of their review did not mention reliability and among those that did mention reliability, 15.7% relied on reliability estimates provided in test manuals or prior articles. Warne, Lazo, Ramos, and Ritter (2012) found similar results in their review of statistical methods used in five gifted education research journals between 2006 and 2010. Warne et al. (2012) reported that only 53.3% of the quantitative and mixed methods articles provided reliability estimates for their own data (Cronbach’s alpha was the most widely reported reliability estimate). The limited reporting of measurement details goes counter to the guidelines provided by AERA (Duran et al., 2006) for reporting empirical research, which state that “sufficient detail should be provided to make clear that measures are being used appropriately, have suitable dependability (reliability) properties, and are interpreted properly for the groups studied” (p. 36).
Aligned with AERA’s Standards for Reporting on Empirical Social Science Research in AERA Publications (Duran et al., 2006) and the Standards for Educational and Psychological Testing (2014) that call for a reexamination of measurement quality when an instrument is used in a new context or with new samples of students, we used data from two studies to evaluate the factor structure, measurement invariance, reliabilities, and criterion-related validity of the scores from the SAAS-R. We examined the factor structure using CFA. In addition to examining the factor structure of the SAAS-R separately for the pre-IB and IB groups, we evaluated the equivalence or invariance of the psychometric properties of the SAAS-R (e.g., item intercepts, item factor loadings, item residuals) across the pre-IB and IB groups. Although the issue of measurement invariance has received increased attention outside the field of gifted education (Dimitrov, 2010; Millsap, 2011), Warne et al.’s (2012) review of articles in five gifted journals identified only five studies that conducted tests of invariance (e.g., Peters & Gentry, 2010). We located only one study that examined measurement invariance for the SAAS-R. McCoach and Siegle (2003b) focused on one subscale from the SAAS-R (Academic Self-Perceptions) and tested the equivalence of the item factor loadings for gifted high school students (Grades 9-12) and ninth-grade students from a general school population. They found statistically significant differences on two of the seven factor loadings.
Evaluation of measurement invariance is important because if the psychometric properties of items (e.g., item factor loadings, which represent the relation between an item and a factor; item intercepts, which represent the predicted item response when the latent variable is zero) are different across groups, comparisons of mean scores on the latent variables may not be valid. In addition to playing an important part in the validation process, evaluation of measurement invariance may also provide substantive information about how pre-IB and IB groups may interpret and conceptualize the attitudinal items on the SAAS-R. Evidence of very different factor structures for these groups of students may suggest that the meaning of the items is fundamentally different between these groups. Analyses at the individual item level can help identify subtle differences between these groups that might not be seen if analyses focused only on the subscale means of the SAAS-R. These statistical analyses may provide a unique window into the educational experiences of pre-IB and IB students that go beyond simply comparing mean differences on the subscales (e.g., ANOVA).
Last, we evaluated the reliability of the five SAAS-R subscale scores using Cronbach’s alpha, and for criterion-related validity, we examined the relationships between the five SAAS-R factors and students’ grade point averages (GPAs) obtained from school records (details about these analyses are provided in the statistical analysis section). All analyses were conducted in two samples (Studies 1 and 2) of 11th- and 12th-grade high school students in IB programs and in two samples (Studies 1 and 2) of 9th and 10th graders in a pre-IB curriculum offered at IB Diploma-granting schools.
Method
This research on the SAAS-R is part of a larger project examining stress and coping of students pursuing accelerated high school curricula (see Shaunessy-Dedrick, Suldo, Roth, & Fefer, 2014; Suldo, Dedrick, Shaunessy-Dedrick, Fefer, & Ferron, 2014; Suldo, Dedrick, Shaunessy-Dedrick, Roth, & Ferron, 2014). The methods and results from two studies of the SAAS-R from the larger project are reported below.
Participants
Study 1
Participants consisted of 312 students (161 pre-IB 9th and 10th graders, and 151 IB 11th and 12th graders) from three public high schools recruited from three school districts in one state in the southeastern part of the United States. Students identified as IB for the current study had to be officially admitted by the school to the IB program (which, in these schools, includes pre-IB students in Grades 9 and 10, and IB Diploma students in Grades 11 and 12). Consideration for admission to the pre-IB program was based on competitive applications ranked in part according to statewide assessment scores and GPA in Grades 7 and 8. Only students who had received written parental permission and had provided signed assent participated in this study. Sample sizes of student participants from the three schools were 85, 102, and 125. The distribution of IB and pre-IB students did not differ by school, χ2(2, N = 312) = 4.29, p = .12.
The sample was primarily female (61.2%) and White (55.3%). Twelve percent of the students were eligible for free or reduced price lunch. Seventy-five percent of the fathers and 73.3% of the mothers of the participants had completed college or beyond. There were no statistically significant differences (p > .05) between pre-IB and IB students on any of the demographic characteristics in Table 1.
Table 1. Demographic Characteristics of the High School Students in Pre-IB and IB (International Baccalaureate) Programs for Studies 1 and 2.
Note. Percentages may not sum to 100 due to rounding. For Studies 1 and 2, pre-IB and IB students did not differ significantly on the demographic characteristics included in the table based on the results of chi-square analyses (p > .01).
Study 2
Participants consisted of 1,149 students (589 pre-IB 9th and 10th graders, and 560 IB 11th and 12th graders) from 10 public high schools recruited from five school districts in one state in the southeastern United States. Two of the schools that were used in Study 1 participated in Study 2. Different samples of students were used in Study 2 from these two schools (i.e., no students participated in both Studies 1 and 2). As in Study 1, students were considered for IB program admission based on prior achievement. Of the 10 IB sites, eight required a pre-IB curriculum and two other sites (within the same school district) offered the IB Middle Years Program (IBO, 2013f) for students in Grades 9 and 10. Only students who had received written parental permission and had provided signed assent participated. Sample sizes of student participants from the 10 schools ranged from 78 to 169 (M = 115). There was a statistically significant difference in the distribution of IB and pre-IB students by school, χ2(9, N = 1,149) = 43.05, p < .001. The largest difference was between one school in which the percentage of participants who were in IB was 64% (36% in pre-IB) and a second school in which 37% were in IB (63% in pre-IB).
Participants were primarily female (59.4%) and White (49.2%). Sixty-five percent of the fathers and 71.0% of the mothers of the participants had completed college or beyond. There were no statistically significant differences (p > .01) between pre-IB and IB students on any of the demographic variables in Table 1.
Procedure
Procedures used to recruit schools and student participants were similar across the two studies. After obtaining approval from the school districts and the university institutional review board, each participating school sent parent consent forms via students in two classrooms of IB students per grade level; schools, in consultation with cooperating teachers, selected the classes. Four of the five districts’ research policies allowed for student incentives for participation (movie ticket passes or $10 iTunes gift cards). A research team of graduate students led by two faculty members who served as principal investigator (PI) and co-PI for the project collected data in spring 2011 and spring 2012 (for Studies 1 and 2, respectively). All research assistants received training to ensure standardization across data collection. Prior to questionnaire completion, researchers read the student assent form aloud and students completed the assent form before proceeding to the questionnaire. Student participants completed a 14-page questionnaire for Study 1 and a 16-page questionnaire for Study 2, which included demographic items (e.g., gender, grade level), the SAAS-R, and several psychological measures. Questionnaires were administered in groups of approximately 10 to 120 students (typically 50 to 60) during the school day. Completion of the survey packet took approximately 45 minutes. All measures had been piloted with IB students from other schools prior to administration.
Students’ unweighted cumulative high school GPAs were obtained directly from the school districts (Study 1) or calculated by the research team based on transcripts from one semester (Spring 2012) obtained from the school districts (Study 2). GPAs were recorded on a 4-point scale.
School Attitude Assessment Survey–Revised
The SAAS-R is a 35-item self-report questionnaire that measures five factors: Academic Self-Perceptions (7 items), Attitudes Toward Teachers (7 items), Attitudes Toward School (5 items), Goal Valuation (6 items), and Motivation/Self-Regulation (10 items). The response scale ranges from 1 (strongly disagree) to 7 (strongly agree). McCoach and Siegle (2003a) used an iterative process to extend and revise the original School Attitude Assessment Survey (McCoach, 2002). Confirmatory factor analyses of the final 35-item version of the SAAS-R in a combined sample of 645 students (146 11th and 12th graders attending a summer program for talented students, 200 9th grade students in an urban high school, and 299 high school students of varying achievement levels) indicated acceptable fit of the five-factor model, χ2(550, N = 537) = 1581.7 (CFI = .91, TLI = .92, RMSEA = .06, SRMR = .06). Correlations of the five factors ranged from .27 (Academic Self-Perceptions and Attitudes Toward School) to .74 (Goal Valuation and Motivation/Self-Regulation).
For Study 1, we used the original wording on the SAAS-R. For Study 2, we inserted the word IB before the words classes or teachers for the seven items in the Attitudes Toward Teachers subscale (e.g., “My IB classes are interesting”; “I relate well to my IB teachers”). We made these changes based on feedback from students in Study 1 regarding some ambiguity surrounding whether the term teacher referred to those who taught courses in their IB program or other non-IB courses available in their school (e.g., electives, additional college-level courses).
Statistical Analysis
CFA was used to evaluate the five-factor model underlying the SAAS-R separately for pre-IB and IB students in each study. Confirmatory factor analyses were based on the matrix of Pearson product moment correlations of the 35 SAAS-R items and were conducted in Mplus 7.11 (Muthén & Muthén, 1998-2012) using robust maximum likelihood estimation in which standard errors and the chi-square test statistic are robust to nonnormality (see online supplemental materials, Appendix B, Tables B1 and B2, for measures of skewness and kurtosis for the 35 items of the SAAS-R for Studies 1 and 2, respectively). Full-information maximum likelihood estimation within Mplus was used to handle missing data. The amount of missing data was minimal in each study. In Study 1, 94.4% of the pre-IB students had complete data on the 35-item SAAS-R (missing data per item ranged from 0% to <1%), and 95% of the IB students had complete data (missing data per item ranged from 0% to 1.3%). In Study 2, 92% of the pre-IB students had complete data (missing data for the items ranged from 0% to <1%); 95% of the IB students had complete data on the SAAS-R with missing data ranging from 0% to 1.1% for the items. One factor loading within each factor (reference indicator) was fixed to one to statistically identify the model.
Fit of the models within each group (i.e., pre-IB and IB) was evaluated using the Satorra–Bentler (SB) scaled (mean-adjusted) chi-square test, Bentler’s (1992) normed CFI, TLI, SRMR, and the RMSEA. Hu and Bentler’s (1999) cutoff values of greater than or equal to .95 for the CFI and TLI, SRMR less than .08, and RMSEA less than .06 were used as general indicators of acceptable fit of the models. Evaluation of model fit is complex because of the multiple factors that may affect the fit indices (e.g., sample size, model complexity, magnitude of the correlations of the baseline model; see Sun, 2005), and although guidelines have been presented in the literature, many have questioned universal cutoff values and have proposed the use of multiple indices and human judgment that combines theoretical and statistical criteria (Chen, Curran, Bollen, Kirby, & Paxton, 2008). For example, Brown (2006) has suggested that values of the CFI and TLI in the .90 to .95 range may indicate acceptable fit if other fit measures provide evidence of good model fit.
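The cutoff logic described above can be sketched in a few lines of Python. This is only an illustration of the screening rules quoted in the text, not the procedure used in the analyses; the class and function names (`FitIndices`, `check_hu_bentler`, `in_brown_range`) are hypothetical, and the example values are the Study 1 IB-group indices reported in the Results.

```python
from dataclasses import dataclass


@dataclass
class FitIndices:
    """Container for the four descriptive fit indices (hypothetical helper)."""
    cfi: float
    tli: float
    rmsea: float
    srmr: float


def check_hu_bentler(fit: FitIndices) -> dict:
    """Flag each index against the Hu and Bentler (1999) cutoffs quoted in the text."""
    return {
        "cfi": fit.cfi >= 0.95,
        "tli": fit.tli >= 0.95,
        "srmr": fit.srmr < 0.08,
        "rmsea": fit.rmsea < 0.06,
    }


def in_brown_range(value: float) -> bool:
    """True when a CFI or TLI falls in the .90 to .95 band discussed by Brown (2006)."""
    return 0.90 <= value < 0.95


# Study 1 IB-group values as reported in the Results section.
study1_ib = FitIndices(cfi=0.83, tli=0.82, rmsea=0.08, srmr=0.09)
flags = check_hu_bentler(study1_ib)
```

As the paragraph above stresses, such flags are a starting point for judgment rather than a pass/fail decision; a CFI in the .90 to .95 band, for example, would still need to be weighed against the other indices.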
Measurement invariance of the five-factor SAAS-R model between the pre-IB and IB groups was tested within each study using multigroup confirmatory factor analysis. Multigroup confirmatory factor analysis involves testing a series of hierarchically ordered models of increasing restrictiveness. The first model tested was the least restrictive invariance model in which the same five factors underlying the SAAS-R were associated with the same items across the pre-IB and IB groups with no equality constraints imposed on the factor loadings, item intercepts, item unique variances, or factor variances and covariances across groups. This model (Model 1) provides a test of configural invariance. Metric invariance (Model 2) addresses whether or not the unstandardized factor loadings are the same across the groups. If the factor loadings representing the relationships between the items and the latent variables (e.g., Academic Self-Perceptions) are different across groups, this suggests that the items may have different meanings across groups. This lack of equivalence, indicating the absence of measurement invariance, is often referred to as differential item functioning (DIF), specifically nonuniform DIF (i.e., item response differences between groups vary across the levels of the latent variable). Next, we explored the presence of scalar invariance (Model 3). Scalar invariance was evaluated by examining group differences in the item intercepts (i.e., the extent to which students endorse an item). A lack of equivalence in the intercepts is an indication of uniform DIF (i.e., after equating the groups on a latent variable, one group’s item responses differ in the same direction from the other group’s responses across all levels of the latent variable). After conducting tests of configural, metric, and scalar invariance, we examined invariance of the item uniqueness (residuals) parameters (Model 4). 
Finally, we examined structural invariance in terms of the equality of the factor variances (Model 5) and factor covariances (Model 6).
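One conventional way to write the model behind this sequence is shown below. The notation is an illustrative assumption (it does not appear in the original analyses): x is the response of student i in group g to item j, ξ is the student's score on the factor that item j measures, τ is the item intercept, λ the factor loading, θ the item unique variance, and Φ the factor variance-covariance matrix.

```latex
% Measurement model for item j, student i, group g (g = pre-IB, IB)
x_{ijg} = \tau_{jg} + \lambda_{jg}\,\xi_{ig} + \varepsilon_{ijg},
\qquad \operatorname{Var}(\varepsilon_{ijg}) = \theta_{jg},
\qquad \operatorname{Cov}(\boldsymbol{\xi}_{g}) = \boldsymbol{\Phi}_{g}

% Equality constraints added cumulatively across the six models
\begin{array}{ll}
\text{Model 1 (configural)} & \text{same item-factor pattern, no equality constraints} \\
\text{Model 2 (metric)}     & \lambda_{j,\text{pre-IB}} = \lambda_{j,\text{IB}} \\
\text{Model 3 (scalar)}     & \tau_{j,\text{pre-IB}} = \tau_{j,\text{IB}} \\
\text{Model 4 (uniqueness)} & \theta_{j,\text{pre-IB}} = \theta_{j,\text{IB}} \\
\text{Model 5 (factor variances)}   & \operatorname{diag}(\boldsymbol{\Phi}_{\text{pre-IB}}) = \operatorname{diag}(\boldsymbol{\Phi}_{\text{IB}}) \\
\text{Model 6 (factor covariances)} & \boldsymbol{\Phi}_{\text{pre-IB}} = \boldsymbol{\Phi}_{\text{IB}}
\end{array}
```

Read this way, unequal loadings change the slope relating the item to the factor (nonuniform DIF), whereas unequal intercepts shift the expected response by a constant at every level of the factor (uniform DIF).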
The strategy used to examine the various levels of measurement invariance was to evaluate the SB scaled chi-square change (ΔSB χ2) for statistical significance relative to the change in degrees of freedom (Δdf) for the models being compared. A scaled chi-square difference is required because “a difference between two SB χ2 values is not distributed as χ2” (Brown, 2006, p. 385). These tests were supplemented by comparing the changes in the CFI, TLI, RMSEA, and SRMR to the guidelines presented by Chen (2007). When the changes in these fit indices for the more restrictive model met Chen’s guidelines (ΔCFI < .01, ΔTLI < .01, ΔRMSEA < .015, and ΔSRMR < .03, for the factor loadings, and ΔCFI < .01, ΔTLI < .01, ΔRMSEA < .015, and ΔSRMR < .01, for the item intercepts), we concluded that the hypothesis of invariance was tenable (i.e., do not reject the null hypothesis of equality).
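The two decision rules in this paragraph can be sketched as follows. The scaled difference follows the standard Satorra–Bentler formula, which needs each model's scaling correction factor (reported by robust ML software alongside the scaled chi-square). The function names and all numeric inputs are hypothetical, and taking absolute changes in the fit indices is a simplifying assumption.

```python
# Chen's (2007) limits as quoted in the text, keyed by the constraint being tested.
CHEN_LIMITS = {
    "loadings": {"cfi": 0.01, "tli": 0.01, "rmsea": 0.015, "srmr": 0.03},
    "intercepts": {"cfi": 0.01, "tli": 0.01, "rmsea": 0.015, "srmr": 0.01},
}


def sb_scaled_difference(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler scaled chi-square difference.

    t0, df0, c0: scaled chi-square, degrees of freedom, and scaling correction
                 factor of the more restrictive (nested) model.
    t1, df1, c1: the same quantities for the less restrictive model.
    Returns the scaled difference statistic and its degrees of freedom.
    """
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)  # scaling factor for the difference test
    trd = (t0 * c0 - t1 * c1) / cd            # difference of unscaled statistics, rescaled
    return trd, df0 - df1


def invariance_tenable(less_restrictive, more_restrictive, step):
    """True when every absolute change in fit stays below Chen's limit for this step."""
    limits = CHEN_LIMITS[step]
    return all(
        abs(more_restrictive[name] - less_restrictive[name]) < limit
        for name, limit in limits.items()
    )


# Hypothetical inputs for a metric (Model 2) versus configural (Model 1) comparison.
trd, ddf = sb_scaled_difference(1200.0, 1130, 1.20, 1150.0, 1100, 1.19)
configural = {"cfi": 0.920, "tli": 0.915, "rmsea": 0.050, "srmr": 0.060}
metric = {"cfi": 0.918, "tli": 0.914, "rmsea": 0.050, "srmr": 0.065}
metric_ok = invariance_tenable(configural, metric, "loadings")
```

The returned statistic would then be referred to a chi-square distribution with `ddf` degrees of freedom. Simply subtracting the two scaled chi-square values would not give a valid test, which is the point of the Brown (2006) quotation above.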
Following the examination of the factor structure (dimensionality) of the SAAS-R, we calculated Cronbach’s alphas for each of the SAAS-R subscales by IB group and study. Finally, we used structural equation modeling to evaluate criterion-related validity by examining the relationships between the five SAAS-R factors (predictor variables) and the outcome variable of students’ GPA obtained from students’ records. As an additional analysis to evaluate the criterion-related validity of the SAAS-R scores, we followed the approach used by McCoach and Siegle (2003a), which involved comparing the observed scores on the five SAAS-R factors for academically able achievers and underachievers.
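For readers less familiar with the reliability coefficient used here, the sketch below computes Cronbach's alpha from item-level responses with the usual formula, α = k/(k − 1) × (1 − Σ item variances / variance of the total score). It assumes complete data for every respondent, and the response matrix is made up for illustration; it is not SAAS-R data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows, each a list of item scores.

    Assumes every row has the same number of items and no missing values.
    """
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five students to a three-item subscale (1 to 7 scale).
example = [
    [7, 6, 7],
    [5, 5, 6],
    [3, 4, 3],
    [6, 6, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(example)
```

Computing alpha separately for each subscale, group, and study, as described above, amounts to calling the function once per subset of rows and columns.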
Results
Descriptive Analyses
Study 1
Based on the descriptive statistics for the 35 SAAS-R items by IB group (pre-IB and IB), the two most strongly endorsed items by the pre-IB group were Items 15 (M = 6.65, SD = 0.82, on a 7-point scale) and 18 (M = 6.64, SD = 0.69). Both items are within the Goal Valuation factor. For the IB group the same two items were the most strongly endorsed (M = 6.65, SD = 0.74, and M = 6.54, SD = 0.90, respectively). Several of the items within the Academic Self-Perceptions and Goal Valuation factors exhibited severe departures from normality (e.g., Item 15 within Goals).
Descriptive comparisons of the pre-IB and IB groups on the individual items revealed that the effect sizes (Cohen’s d) ranged from −0.32 (Item 16 within Attitudes Toward Teachers), where IB students more strongly endorsed the item, to 0.48 (Item 23 within Attitudes Toward School), where pre-IB students more strongly endorsed the item. The mean effect size for the 35 items was 0.01 (median = 0.01), with only 3 of the 35 items having an effect size (absolute value) of 0.35 or greater (Item 7, 0.40; Item 19, 0.39; and Item 23, 0.48). All three of these items were in the Attitudes Toward School subscale and favored pre-IB students (see Table B1 in Appendix B in online supplemental materials).
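As an illustration of how an item-level effect size of this kind can be obtained from summary statistics, the sketch below computes Cohen's d with a pooled standard deviation. The pooled-SD form is an assumption (the exact variant is not stated in the text), as is the use of the full group sizes for the item; the input values are the Item 18 means and standard deviations reported above for Study 1.

```python
from math import sqrt


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups using the pooled standard deviation.

    A positive value means group 1 scored higher than group 2.
    """
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd


# Item 18, Study 1: pre-IB (group 1) versus IB (group 2).
# Group sizes are the full samples; the item-level n may be slightly smaller.
d_item18 = cohens_d(6.64, 0.69, 161, 6.54, 0.90, 151)
```

With pre-IB entered as the first group, positive values correspond to stronger endorsement by pre-IB students, matching the sign convention used in the text.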
Study 2
The pattern of descriptive results for Study 2 was similar to that of Study 1. The two items most strongly endorsed by both the pre-IB and IB groups were two Goal Valuation items (15 and 18). Effect sizes (Cohen’s d) ranged from −0.12 (Item 5 within Academic Self-Perceptions), where IB students more strongly endorsed the item, to 0.34 (Item 6 within Attitudes Toward School), where pre-IB students more strongly endorsed the item. The mean effect size for the 35 items was 0.10 (median = 0.11), which was comparable to that in Study 1.
Confirmatory Factor Analysis
Study 1
Model fit indices for the confirmatory factor analyses of the five-factor measurement model are presented in Table 2 with a summary of parameter estimates (standardized loadings, intercorrelations between factors) presented in Tables 3 and 4 (see online supplemental materials, Appendix C, for all standardized and unstandardized parameter estimates for all models). Model fit indices for both the pre-IB and IB groups were generally below the acceptable values and suggested greater misfit than what McCoach and Siegle (2003a) found with their sample of 537 high school students. Within the current sample, greater model misfit was present for the IB group, SBχ2(550, N = 151) = 1043.84, p < .0001 (CFI = .83, TLI = .82, RMSEA = .08, SRMR = .09).
Table 2. Fit Indices for the School Attitude Assessment Survey–Revised Using Robust Maximum Likelihood Estimation by Study and International Baccalaureate (IB) Group Along With Values Reported by McCoach and Siegle (2003a) and Davie (2012).
Note. CFI = comparative fit index; TLI= Tucker–Lewis index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual.
Numbers in parentheses represent the 90% confidence interval for the RMSEA.
Confidence interval not reported for the RMSEA.
Table 3. Standardized Factor Pattern Coefficients (Loadings) for the Five-Factor School Attitude Assessment Survey–Revised (SAAS-R) Using Robust Maximum Likelihood Estimation by International Baccalaureate (IB) Group and Study Along With Values Reported by McCoach and Siegle (2003a).
Note. ASP = Academic Self-Perceptions; ATT = Attitudes Toward Teachers; ATS = Attitudes Toward School; Goals = Goal Valuation; MOT/S-R = Motivation/Self-Regulation.
Table 4. Interfactor Correlations for School Attitude Assessment Survey–Revised by Study and International Baccalaureate (IB) Group Along With Values Reported by McCoach and Siegle (2003a).
Note. ASP = Academic Self-Perceptions; ATT = Attitudes Toward Teachers; ATS = Attitudes Toward School; Goals = Goal Valuation; MOT/S-R = Motivation/Self-Regulation.
Pre-IB (n = 161 and 589 for Studies 1 and 2, respectively).
IB (n = 151 and 560 for Studies 1 and 2, respectively).
McCoach and Siegle (2003a); n = 537.
All the standardized pattern coefficients (factor loadings) for the SAAS-R items were statistically significantly different from zero (p < .05). Average loadings by subscale for the pre-IB group ranged from .59 for the Motivation/Self-Regulation subscale to .81 for the Attitudes Toward School subscale. For the IB group the average loadings by subscale ranged from .65 for the Attitudes Toward Teachers subscale to .85 for the Attitudes Toward School subscale.
The correlations between the five factors were all statistically significant (p < .05; see Table 4). The largest correlation was between Goals and Motivation/Self-Regulation for the pre-IB (.74) and IB (.66) groups. McCoach and Siegle’s largest correlation also was between Goals and Motivation/Self-Regulation (r = .74). McCoach and Siegle’s correlations between the five factors were generally stronger (M = .48, median = .41) compared with those in the IB group (M = .37, median = .34) and the pre-IB group (M = .40, median = .32).
Modification indices, which are estimates of how much the chi-square would change if a parameter that has been fixed to zero were freely estimated, were used to examine sources of misfit in the model. Based on the modification indices, a major source of misfit for both the pre-IB and IB groups involved covariances between the measurement errors for pairs of items. For the pre-IB and IB groups there were 15 and 26 pairs of error covariance terms, respectively, that reflected substantial misfit. In theory, the measurement errors for the items should be random and therefore not covary with each other; however, when there are similarities in item content and language for pairs of items, the item errors may covary. In view of the problems associated with exploratory, post hoc model modification (e.g., capitalization on chance; overfitting models that do not replicate; potential biasing effect on other parameter estimates; Brown, 2006), we only added error covariance terms that were conceptually meaningful: Items 2 and 5 (Academic Self-Perceptions), Items 1 and 34 (Attitudes Toward Teachers), and Items 30 and 33 (Motivation/Self-Regulation). These item pairs were contained within the same SAAS-R factor, had very similar wording, and were replicated in Study 2. Even with these three additional terms added to the models, the models did not reach an acceptable level of fit in either group (a precondition for evaluating measurement invariance) and therefore we did not conduct invariance testing within Study 1 (see Study 2 for invariance testing).
Study 2
Model fit indices for the confirmatory factor analyses of the five-factor measurement model are presented in Table 2 with parameter estimates (standardized loadings, intercorrelations between factors) presented in Tables 3 and 4 (see Appendix C in online supplemental materials for all standardized and unstandardized parameter estimates for all models). Model fit indices for the IB group were equal to or slightly better than those in the pre-IB group. Values for the CFI (.92), RMSEA (.05), and SRMR (.06) for the IB group also were slightly better than those reported by McCoach and Siegle (2003a). The TLI for the IB group (.91) was slightly lower than McCoach and Siegle’s value (.92).
All the standardized pattern coefficients (factor loadings) for the SAAS-R items were significantly different from zero (p < .05). Average loadings by subscale for the pre-IB group ranged from .69 for the Motivation/Self-Regulation subscale to .86 for the Attitudes Toward School subscale; for the IB group the average loadings by subscale ranged from .73 for the Academic Self-Perceptions and Motivation/Self-Regulation subscales to .88 for the Attitudes Toward School subscale.
The correlations between the five factors were all statistically significant (p < .05; see Table 4). The largest correlation was between Goals and Motivation/Self-Regulation for the pre-IB (.63) and IB (.68) groups. McCoach and Siegle’s largest correlation also was for Goals and Motivation/Self-Regulation (r = .74). McCoach and Siegle’s correlations (M = .48, median = .41) were similar to those of the pre-IB group (M = .48, median = .47) but slightly stronger than those of the IB group (M = .42, median = .35).
We examined the modification indices for the pre-IB and IB models and determined that a major source of misfit involved covariances between errors for three pairs of items (Items 2 and 5, Items 1 and 34, and Items 30 and 33). The wording for these three pairs of items was very similar, a factor that often contributes to error covariances (Brown, 2006). We included these three error covariance terms into the model because these modifications were conceptually meaningful. Inclusion of these error covariances produced a statistically significant improvement in the fit of the models (ΔSB χ2 = 120.05, Δdf = 3, p < .0001, for pre-IB; ΔSB χ2 = 73.46, Δdf = 3, p < .0001, for IB; see Table 5 for the fit statistics for the pre-IB and IB groups). Minimal changes in the parameter estimates (standardized loadings, intercorrelations between factors) were observed for the modified models.
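The chi-square difference tests reported above can be checked against a chi-square reference distribution. The following Python sketch is illustrative only: the helper name `chi2_sf` is hypothetical, it relies on closed-form series that hold for integer degrees of freedom, and it takes the reported scaled difference values as given rather than recomputing the Satorra–Bentler scaling from model output.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Hypothetical helper for illustration; standard library only.
    """
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: erfc(sqrt(x/2))
    #         + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(half))
    term = math.exp(-half) * math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return total


# Reported scaled chi-square differences for adding the three error covariances.
for group, delta_chi2 in [("pre-IB", 120.05), ("IB", 73.46)]:
    p = chi2_sf(delta_chi2, 3)
    print(f"{group}: delta chi2 = {delta_chi2}, df = 3, p < .0001 is {p < 0.0001}")
```

Under these assumptions both differences fall far below the .0001 threshold, consistent with the reported p values.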
Table 5. Invariance Tests for the School Attitude Assessment Survey-Revised by IB Group (Study 2).
Note. ASP = Academic Self-Perceptions; ATT = Attitudes Toward Teachers; ATS = Attitudes Toward School; Goals = Goal Valuation; MOT/S-R = Motivation/Self-Regulation. SB χ2 = Satorra–Bentler scaled chi-square.
See Table 2 for fit statistics for the pre-IB and IB groups.
Three pairs of error covariances (2 and 5; 1 and 34; 30 and 33) were added to the model within each group.
Change in fit represents improved fit.
p < .001.
After fitting the five-factor model with the three error covariance terms for the pre-IB and IB groups separately, we used multigroup CFA to evaluate measurement invariance (i.e., equality of the item factor loadings, item intercepts, and uniquenesses) and structural invariance (i.e., equality of factor variances and factor covariances). Table 5 summarizes the sequence of invariance tests that were conducted. Overall, there was support for configural invariance (equal form, Model 1B), SB χ2 (1,094, N = 1,149) = 2554.54, p < .0001 (CFI = .93, TLI = .92, RMSEA = .05, SRMR = .05). In the next stage of invariance testing, the results for metric invariance (equal factor loadings) were mixed with the change in chi-square (ΔSB χ2 = 60.72, Δdf = 30, p < .001) indicating statistically significant differences in the factor loadings between the groups, whereas the changes in the alternative measures of fit indicated that the hypothesis of equal loadings was tenable (e.g., ΔCFI = .001). Based on recommendations by Cheung and Rensvold (1999), we followed up the overall metric invariance test to examine invariance of the factor loadings at the individual factor level using the change in chi-square (ΔSB χ2) relative to the change in degrees of freedom (Δdf) for the more restrictive model. Table 5 summarizes the invariance tests for the factor loadings conducted separately for each of the five SAAS-R factors. There were statistically significant differences for the factor loadings for two of the factors: Attitudes Toward School (ΔSB χ2 = 20.30, Δdf = 4, p < .001) and Motivation/Self-Regulation (ΔSB χ2 = 27.52, Δdf = 9, p < .01). Follow-up comparisons for the individual items within these two factors using a Bonferroni correction to adjust the alpha level for the multiple comparisons (.05/13 = .004) did not yield any statistically significant differences in the item loadings based on the change in chi-square (p > .004). 
Because no specific item loadings were found to be statistically significantly different across groups, all loadings were constrained to be equal for the remaining invariance tests.
Next, we evaluated the equality of the item intercepts or scalar invariance (i.e., the constraint of equal intercepts was added to the constraint of equal loadings). Similar to metric invariance, the results were mixed. The change in chi-square indicated that there were statistically significant differences in the intercepts between the groups, whereas the changes in the fit indices suggested that the hypothesis of equal intercepts was tenable. Invariance tests using the change in chi-square at the individual factor level revealed statistically significant differences for the item intercepts for two of the factors: Academic Self-Perceptions (ΔSB χ2 = 13.26, Δdf = 6, p < .05) and Motivation/Self-Regulation (ΔSB χ2 = 34.71, Δdf = 9, p < .0001; see Table 5). Follow-up comparisons for the individual items within these two factors using a Bonferroni correction (.05/15 = .003) identified three items within the Motivation/Self-Regulation factor (Items 4, 24, 26; see online supplemental materials, Appendix B, for item content) where the intercepts for the pre-IB group (4.83, 5.92, 5.19) were significantly higher (p < .003) than the intercepts for the IB group (4.51, 5.70, 4.90). The largest difference was 0.32 on the 7-point response scale and was for Item 4, “I check my assignments before I turn them in.”
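One way to read these intercept differences is through the linear measurement equation, in which the expected item response equals the intercept plus the loading times the factor score. With loadings constrained to be equal, the group difference in expected response is the intercept difference at every factor level, which is what makes the differential item functioning uniform. The sketch below uses the reported intercepts; the loading of 1.0 and the factor scores are hypothetical placeholders, not estimates from the study.

```python
# Reported intercepts for the three flagged Motivation/Self-Regulation items.
pre_ib = {4: 4.83, 24: 5.92, 26: 5.19}
ib = {4: 4.51, 24: 5.70, 26: 4.90}

HYPOTHETICAL_LOADING = 1.0            # assumption; held equal across groups
HYPOTHETICAL_SCORES = (-1.0, 0.0, 1.0)  # assumption; arbitrary factor levels


def expected_response(intercept, loading, factor_score):
    """Linear CFA measurement equation: intercept + loading * factor score."""
    return intercept + loading * factor_score


# Intercept difference (pre-IB minus IB), rounded to the reported precision.
gaps = {item: round(pre_ib[item] - ib[item], 2) for item in pre_ib}

# With equal loadings the gap in expected responses is the same at every level.
uniform = all(
    abs(
        expected_response(pre_ib[item], HYPOTHETICAL_LOADING, eta)
        - expected_response(ib[item], HYPOTHETICAL_LOADING, eta)
        - gaps[item]
    ) < 1e-9
    for item in pre_ib
    for eta in HYPOTHETICAL_SCORES
)

largest_item = max(gaps, key=gaps.get)
print(gaps, "largest gap: Item", largest_item, "uniform:", uniform)
```

The largest gap in this sketch is for Item 4, matching the 0.32 difference noted in the text.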
For the remaining invariance tests, the intercepts for Items 4, 24, and 26 were allowed to vary across groups. We tested the equality of the item residual variances (uniquenesses) and found no significant differences between the pre-IB and IB groups (ΔSB χ2 = 49.80, Δdf = 35, p > .05). A test of the equality of the three error covariances between groups also was not statistically significant (ΔSB χ2 = 1.64, Δdf = 3, p > .05).
In the last stage of testing, we evaluated structural invariance by testing the equality of the variances of the five SAAS-R factors and the covariances of these factors. Changes in the chi-square indicated statistically significant differences in the factor variances (ΔSB χ2 = 15.56, Δdf = 5, p < .01), whereas the changes in the fit indices suggested that the hypothesis of equal factor variances was tenable. When we conducted individual tests of the factor variances using a Bonferroni correction (.05/5 = .01), we found statistically significant differences in the variance of the Attitudes Toward School factor (ΔSB χ2 = 8.11, Δdf = 1, p < .01) with greater variability for the IB students on this factor. We next tested the equality of the covariances between the factors and found significant differences between groups based on the change in chi-square (ΔSB χ2 = 26.39, Δdf = 10, p < .01); minimal change in the alternative fit indices was observed, suggesting that the hypothesis of equal factor covariances was tenable. When we conducted follow-up comparisons of the 10 factor covariances using a Bonferroni correction (.05/10 = .005), we found no statistically significant differences between the pre-IB and IB groups on the factor covariances (all ps > .02).
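The Bonferroni-adjusted alpha levels used in the follow-up comparisons are simple divisions of the nominal .05 level by the number of comparisons. The short Python sketch below reproduces that arithmetic; the function names are hypothetical, and the p value of .02 is used only as the lower bound reported for the factor covariances, not as an exact result.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha under a Bonferroni correction."""
    return alpha / n_comparisons


def is_significant(p_value, alpha, n_comparisons):
    """True when p falls below the Bonferroni-adjusted alpha."""
    return p_value < bonferroni_alpha(alpha, n_comparisons)


# Numbers of follow-up comparisons described in the text.
follow_ups = {
    "item loadings": 13,
    "item intercepts": 15,
    "factor variances": 5,
    "factor covariances": 10,
}

adjusted = {name: bonferroni_alpha(0.05, m) for name, m in follow_ups.items()}
for name, value in adjusted.items():
    print(f"{name}: adjusted alpha = {value:.3f}")

# All covariance p values were reported as > .02, so none can fall below .005.
print("covariance difference flagged:", is_significant(0.02, 0.05, 10))
```

The design choice here is conservative: dividing alpha by the number of comparisons controls the familywise error rate at the cost of statistical power for each individual test.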
In summarizing the tests of measurement invariance, we determined that at the item level the unstandardized factor loadings did not differ significantly between the pre-IB and IB groups. We identified three items within the Motivation/Self-Regulation factor where the intercepts for the pre-IB group were significantly higher than those from the IB group. Variability in the Attitudes Toward School factor was greater for the IB group.
Cronbach’s Alpha Reliability
Study 1
Table 6 contains Cronbach’s alpha reliabilities for the pre-IB and IB groups for Studies 1 and 2, along with the values reported by McCoach and Siegle (2003a) for their sample of 645 high school students of varying achievement levels. Reliabilities for the scores for the five factors in Study 1 were similar for the pre-IB and IB groups and ranged from .83 (Attitudes Toward Teachers for the IB group) to .93 (Attitudes Toward School for the IB group). McCoach and Siegle’s reported reliabilities ranged from .86 to .91.
Table 6. Cronbach’s Alpha Reliabilities for the Five Factors of the School Attitude Assessment Survey–Revised (SAAS-R) by Study and International Baccalaureate (IB) Group Along With Values Reported by McCoach and Siegle (2003a).
Note. ASP = Academic Self-Perceptions; ATT = Attitudes Toward Teachers; ATS = Attitudes Toward School; Goals = Goal Valuation; MOT/S-R = Motivation/Self-Regulation.
Study 2
Cronbach’s alpha reliabilities in Study 2 were generally higher than those in Study 1 and ranged from .87 (Academic Self-Perceptions) to .93 (Attitudes Toward School) for the pre-IB group and from .87 (Academic Self-Perceptions) to .94 (Attitudes Toward School) for the IB group.
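For readers who wish to estimate reliability for their own data, Cronbach’s alpha can be computed from the item variances and the variance of the summed scale score. The Python sketch below is a minimal illustration of that formula; the function name and the response matrix are hypothetical and are not data from either study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses on a 7-point scale (6 respondents x 4 items).
responses = [
    [5, 6, 5, 6],
    [3, 4, 4, 3],
    [6, 7, 6, 7],
    [2, 2, 3, 2],
    [4, 5, 5, 4],
    [7, 6, 7, 7],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

In practice the calculation would be run separately for each subscale and each group, because alpha is a property of scores in a particular sample rather than of the instrument itself.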
Relationships Between the Five Factors from the SAAS-R and Students’ Grade Point Average
Study 1
Structural equation modeling was used to examine the relationships between the five SAAS-R factors (predictor variables) and the outcome variable of students’ cumulative GPAs obtained from students’ records. These relationships, which were used to evaluate the criterion-related validity of the SAAS-R scores, were obtained by adding GPA to the five-factor modified CFA model (i.e., included three pairs of error covariances). Pre-IB students’ GPAs ranged from 2.38 to 4.00 (M = 3.57, SD = 0.40). In interpreting the results of these models it is important to keep in mind that the CFA models in Study 1 had less than acceptable fit.
For the pre-IB group, Academic Self-Perceptions (r = .33), Goal Valuation (r = .32), and Motivation/Self-Regulation (r = .41) had statistically significant (p < .001) positive correlations with GPA. The five factors explained 23.2% (p < .001) of the variance in students’ GPAs. When controlling for the other SAAS-R factors in the model, the unstandardized regression coefficients for Academic Self-Perceptions (b = 0.13, p < .05) and Motivation/Self-Regulation (b = 0.13, p < .05) were statistically significant (see Table 7).
Table 7. Structural Equation Modeling Results of the Five Factors of the School Attitude Assessment Survey–Revised (SAAS-R) Predicting Students’ GPAs by Study and International Baccalaureate Group.
Note. ASP = Academic Self-Perceptions; ATT = Attitudes Toward Teachers; ATS = Attitudes Toward School; Goals = Goal Valuation; MOT/S-R = Motivation/Self-Regulation. b = unstandardized regression coefficient. SE = standard error for the unstandardized regression coefficient. All models included three pairs of error covariances (2 and 5; 1 and 34; 30 and 33).
*p < .05. **p < .01. ***p < .001.
IB students’ GPAs ranged from 2.47 to 4.00 (M = 3.53, SD = 0.37). For the IB students, all five SAAS-R factors had statistically significant positive correlations with GPA. The five factors explained 23.9% (p < .001) of the variance in students’ GPAs. After controlling for the other SAAS-R factors in the model, the unstandardized regression coefficients for Academic Self-Perceptions (b = 0.19, p < .01) and Motivation/Self-Regulation (b = 0.10, p < .01) were statistically significant (see Table 7).
As an additional analysis to evaluate the criterion-related validity of the SAAS-R scores, we followed the approach used by McCoach and Siegle (2003a), which involved comparing the observed scores on the SAAS-R factors for academically able achievers and underachievers. Able achievers in Study 1 were defined using McCoach and Siegle’s criterion as those students with a GPA of 3.75 or higher, and underachievers were defined as those with a GPA of 2.99 or lower (McCoach and Siegle, 2003a, used 2.5 or lower in their study for underachievers, but because of the smaller sample sizes in Study 1, we used a less stringent cutoff to have a sufficient number of students in the underachiever group). Independent t tests for the pre-IB students revealed statistically significant differences between the groups for the Academic Self-Perceptions, Goal Valuation, and Motivation/Self-Regulation factors (see Table 8 for results of the independent t tests along with the values for Cohen’s d effect size). For the IB students, the two achievement groups differed significantly on all factors except for Attitudes Toward School with effect sizes ranging from 0.35 (Attitudes Toward School) to 1.52 (Academic Self-Perceptions). The average of the five effect sizes for the pre-IB group (M = 0.53, SD = 0.40, range = −0.02 to 0.98) was lower than McCoach and Siegle’s (2003a) average effect size (M = 0.89, SD = 0.36, range = 0.41 to 1.57); the IB group’s average effect size was slightly higher (M = 0.93, SD = 0.47) than the average of McCoach and Siegle’s (2003a) effects.
Table 8. Effect Sizes and Independent Samples t Tests for the Five Factors From the School Attitude Assessment Survey–Revised (SAAS-R) Between High Achievers and Underachievers for Pre-IB and IB Students (Study 1).
Note. ASP = Academic Self-Perceptions; ATT = Attitudes Toward Teachers; ATS = Attitudes Toward School; Goals = Goal Valuation; MOT/S-R = Motivation/Self-Regulation. Response scale ranged from 1 (strongly disagree) to 7 (strongly agree). High achievement in Study 1 was defined as a GPA of 3.75 or higher; underachievement was defined as a GPA of 2.99 or lower. Effect sizes reported by McCoach and Siegle (2003a) were 0.46 (ASP), 0.78 (ATT), 0.67 (ATS), 1.23 (Goals), and 1.29 (MOT/S-R). McCoach and Siegle defined the achievers as gifted students “in the top 10% of their class or had at least a 3.75 GPA. Gifted underachievers were in the bottom half of their high school class or had a GPA at or below 2.5” (p. 425).
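The achievement-group comparison can be sketched in two steps: classify students with the Study 1 GPA cutoffs, then compute Cohen’s d from the two groups’ subscale scores. The Python example below is a hedged illustration; the student records are hypothetical, and the pooled-standard-deviation form of d is an assumption, because the text does not state which variant was used.

```python
from math import sqrt
from statistics import mean, variance

HIGH_CUTOFF = 3.75  # Study 1 criterion for high achievers
LOW_CUTOFF = 2.99   # Study 1 criterion for underachievers


def classify(gpa):
    """Return 'high', 'under', or None for students between the cutoffs."""
    if gpa >= HIGH_CUTOFF:
        return "high"
    if gpa <= LOW_CUTOFF:
        return "under"
    return None


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical (GPA, subscale mean score) records; not data from the study.
students = [
    (3.90, 6.1), (3.80, 5.8), (3.75, 6.4), (3.95, 5.9),
    (2.80, 5.0), (2.99, 5.3), (2.60, 4.6), (2.90, 5.1),
    (3.40, 5.5),  # falls between the cutoffs and is excluded
]
high = [score for gpa, score in students if classify(gpa) == "high"]
under = [score for gpa, score in students if classify(gpa) == "under"]
print(f"n high = {len(high)}, n under = {len(under)}, d = {cohens_d(high, under):.2f}")
```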
Study 2
As in Study 1, GPA was added to the five-factor CFA model (three error covariances were included) for the SAAS-R. Pre-IB students’ semester GPAs ranged from 0.33 to 4.00 (M = 3.32, SD = 0.65) and IB students’ semester GPAs ranged from 1.00 to 4.00 (M = 3.28, SD = 0.56). Correlations between the five SAAS-R factors and students’ GPAs were statistically significant (ps < .001) for all factors for both pre-IB and IB students. Attitudes Toward School showed the weakest relation to students’ GPAs for the pre-IB and IB students (rs = .16 and .15, respectively). Motivation/Self-Regulation had the strongest relation to students’ GPAs for both groups (rs = .42 and .41, respectively). In the structural equation model, the five factors explained 19.3% and 19.4% (p < .001) of the variance in pre-IB and IB students’ GPAs, respectively. After controlling for the other SAAS-R factors in the model, the unstandardized regression coefficients for Academic Self-Perceptions and Motivation/Self-Regulation factors were statistically significant (see Table 7).
Similar to Study 1, we used the approach by McCoach and Siegle (2003a) to compare the high achievers with the underachievers as a source of criterion-related validity evidence for the five SAAS-R scores. As in Study 1, high achievers were defined as those students with GPAs of 3.75 or higher, but underachievers were defined as those with a GPA of 2.50 or lower (McCoach & Siegle, 2003a, used 2.5 or lower in their study). Independent t tests for the pre-IB and IB groups revealed statistically significant differences between the groups for all five factors (see Table 9 for results of the independent t tests along with the values for Cohen’s d effect size). The averages of the five effect sizes for the pre-IB (M = 0.89, SD = 0.31, range = 0.60 to 1.32) and IB groups (M = 0.92, SD = 0.44) were similar to McCoach and Siegle’s (2003a) average effect (M = 0.89, SD = 0.36, range = 0.41 to 1.57).
Table 9. Effect Sizes and Independent Samples t Tests for the Five Factors From the School Attitude Assessment Survey–Revised (SAAS-R) Between High Achievers and Underachievers for Pre-IB and IB Students (Study 2).
Note. ASP = Academic Self-Perceptions; ATT = Attitudes Toward Teachers; ATS = Attitudes Toward School; Goals = Goal Valuation; MOT/S-R = Motivation/Self-Regulation. Response scale ranged from 1 (strongly disagree) to 7 (strongly agree). High achievement in Study 2 was defined as a GPA of 3.75 or higher; underachievement was defined as a GPA of 2.5 or lower. Effect sizes reported by McCoach and Siegle (2003a) were 0.46 (ASP), 0.78 (ATT), 0.67 (ATS), 1.23 (Goals), and 1.29 (MOT/S-R). McCoach and Siegle defined the achievers as gifted students “in the top 10% of their class or had at least a 3.75 GPA. Gifted underachievers were in the bottom half of their high school class or had a GPA at or below 2.5” (p. 425).
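The average effect size attributed to McCoach and Siegle (2003a) can be recovered from the five subscale values listed in the table note. The brief Python check below assumes the sample standard deviation (n − 1 denominator), which is an assumption about how the summary was computed.

```python
from statistics import mean, stdev

# Effect sizes reported by McCoach and Siegle (2003a), as listed in the note.
reported_d = {"ASP": 0.46, "ATT": 0.78, "ATS": 0.67, "Goals": 1.23, "MOT/S-R": 1.29}

values = list(reported_d.values())
avg = mean(values)
sd = stdev(values)  # sample SD (n - 1 denominator); assumed convention
print(f"M = {avg:.2f}, SD = {sd:.2f}")  # prints "M = 0.89, SD = 0.36"
```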
Discussion
As researchers continue to explore the psychological factors associated with students’ academic achievement, there is a need for measurement instruments that produce reliable and valid scores for various student populations including those in specialized programs such as the IB Diploma program. The SAAS-R represents an important addition to the measurement tools available to researchers and its importance is reflected in its increasing use in the field. Despite this increasing use there has not been a corresponding increase in researchers’ efforts to evaluate the psychometric properties of the SAAS-R when used with their own data. Given that problems with score reliability and validity can compromise statistical analyses such as multiple regression and ANOVA (Vacha-Haase & Thompson, 2011), there is a need to better understand and evaluate the psychometric properties of measures like the SAAS-R. To address this need, we evaluated the reliability, factor structure, measurement invariance, and criterion-related validity of the scores from the SAAS-R for two samples of students in pre-IB and IB programs. In interpreting the results of the present two studies, it is important to note that the schools and students were from one southeastern state and were not randomly selected. Only students who had parent permission and who completed signed assent forms participated. Data also were collected at one point in time and, therefore, it is not possible to evaluate the stability of the results over time.
Overall the reliabilities of the scores of the five SAAS-R factors were good, with all values from the pre-IB and IB groups across the two studies greater than .80. The importance of estimating reliability for one’s own scores and reporting these values, rather than relying on the values in a test manual, is underscored by the finding of considerable variation in the reliability estimates across the various groups and conditions in the current two studies. The greatest variation in the reliability estimates across the subscales was for Attitudes Toward Teachers with Cronbach’s alphas ranging from .83 (IB in Study 1) to .92 (pre-IB and IB groups in Study 2). This is the only subscale that was modified in Study 2 based on students’ feedback from Study 1 that there was some confusion about the frame of reference for answering questions about their teachers. The modification involved clarifying that student respondents should consider their IB teachers and their IB classes when answering the seven questions within the Attitudes Toward Teachers subscale (e.g., I relate well to my IB teachers). Clarifying the frame of reference for answering the SAAS-R items (the first item on the SAAS-R was one of the seven items that was changed and may have provided the proper frame of reference for all of the SAAS-R items) may have resulted in the improved reliabilities and model fit exhibited in Study 2. It should be noted that in addition to changing the wording for seven items in Study 2, the sample was much larger in Study 2 (n = 1,149 vs. n = 312) and came from a more heterogeneous group of schools; the diversity of schools in Study 2 resulted in slightly more variability in the students’ demographic characteristics (see Table 1). Additional research is needed to determine if this improved reliability and model fit with the wording change are replicated with different samples of IB students. 
These results also suggest that ongoing efforts to use qualitative approaches like cognitive interviewing (Willis, 1999) are recommended to understand how students comprehend and interpret the content of questionnaire items.
The variability in the reliability estimates for the five subscales is even greater when one considers the values from studies that have used the SAAS-R (see online supplemental materials, Appendix A; e.g., values for the Attitudes Toward Teachers ranged from .68 to .92). Reporting information about score reliability is critical for interpreting the results of statistical analyses (e.g., low reliability may attenuate statistical relations between variables). The reported reliabilities also can be used by researchers conducting reliability generalization studies that are designed to describe and explain the variability in score reliability across groups and conditions with the goal of improving the measurement process. In the present review of studies that used the SAAS-R, 63% calculated Cronbach’s alpha reliabilities for the data collected in their studies, which was higher than what Warne et al. (2012) found (53.3%) in articles published recently in gifted education journals.
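The attenuating effect of low reliability can be illustrated with the classical correction-for-attenuation relation, in which an observed correlation equals the true correlation multiplied by the square root of the product of the two score reliabilities. The Python sketch below applies this to the lowest and highest Attitudes Toward Teachers alphas noted above; the true correlation of .40 and the perfectly reliable criterion are hypothetical assumptions, and treating alpha as the reliability coefficient is itself a simplification.

```python
from math import sqrt


def attenuated_r(true_r, rel_x, rel_y=1.0):
    """Expected observed correlation under classical test theory."""
    return true_r * sqrt(rel_x * rel_y)


TRUE_R = 0.40  # hypothetical true correlation between subscale and criterion

for alpha in (0.68, 0.92):  # range of Attitudes Toward Teachers alphas in Appendix A
    print(f"alpha = {alpha:.2f}: expected observed r = {attenuated_r(TRUE_R, alpha):.2f}")
```

Under these assumptions, the same underlying relation would appear weaker in a sample where the subscale scores are less reliable, which is one reason sample-specific reliability estimates matter for interpreting results.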
While information about the reliability of scores from an instrument is important, this information does not test whether the factor structure underlying the instrument is consistent with newly collected data. Factor structures may change over time and vary across different subgroups of students, and therefore it is critical to evaluate the extent to which the measurement model underlying the instrument fits the data that have been collected. To date, there have been a limited number of factor analyses of the SAAS-R. In Study 1, the fit indices of the five-factor model for the SAAS-R were generally below the acceptable values, suggesting greater misfit than McCoach and Siegle’s (2003a) model. In contrast, in Study 2 the fit of the same five-factor model to the data from a slightly modified version of the SAAS-R for the pre-IB and IB students was similar to McCoach and Siegle’s (2003a) model. The variation in model fit when an instrument is used with different samples under different conditions underscores the importance of conducting and reporting the results of CFAs so that researchers have a better sense of the robustness of the factor structure and can gauge the meaningfulness of the statistical comparisons using the scores from the instrument. In addition, reporting the results of CFAs can provide the original test developers and other researchers with information about the strengths and weaknesses of the measure and potential ways of improving the measure. In the present study, there were areas of model misfit that were consistent across groups and studies. For example, there was misfit that involved covariation of the errors of similarly worded pairs of items (e.g., 2. “I am intelligent”/5. “I am smart in school”; 30. “I spend a lot of time on my schoolwork”/33. “I put a lot of effort into my schoolwork”; 1. “My classes are interesting”/34. “I like my classes”). 
If these sources of misfit (covariances for these three pairs of items) are replicated in future research, one solution would be to remove one of the two similarly worded items in each pair. For exploratory purposes, we removed one item from each pair of items that exhibited correlated errors to determine the effect on reliability and model fit for the Study 2 data. Based on discussion within the research team, we decided to remove Items 1, 2, and 30. The largest drop in reliability was for Academic Self-Perceptions for the IB group, which decreased from .87 to .84. Overall model fit was acceptable for the pre-IB and IB groups, respectively: SB χ2 (454, N = 589) = 1115.32, p < .0001 (CFI = .92, TLI = .92, RMSEA = .05, SRMR = .05) and SB χ2 (454, N = 560) = 1057.24, p < .0001 (CFI = .93, TLI = .93, RMSEA = .05, SRMR = .05). These results suggest that it would be reasonable to drop these three items if the measure were used with IB students. If researchers decide not to drop these items, it is important that they examine the SAAS-R using a latent variable framework (rather than observed variables as reflected in subscale scores) so that measurement error and correlated errors can be taken into account when examining relationships between the SAAS-R scores and relevant outcomes.
Because the SAAS-R is a relatively new instrument, the equivalence (invariance) of its psychometric properties (i.e., item factor loadings, item intercepts, residuals for items) has not been examined for different groups of students. The present study’s tests of measurement invariance served two purposes. First, these tests evaluated if the psychometric properties of the SAAS-R items were functioning similarly across the pre-IB and IB groups as a way of evaluating potential biases in the instrument’s scores. Second, these invariance tests provided another approach to exploring how two groups of students (pre-IB and IB) at different developmental stages (9th-12th grade) and with different curricular experiences perceive themselves, their teachers, and schools. Overall, the results from the measurement invariance tests showed that the structure of students’ attitudes and the relationship of the items to the five factors underlying the SAAS-R were similar for pre-IB and IB students. The biggest difference between the groups showed up on the intercepts of three items in the Motivation/Self-Regulation factor where, after controlling for students’ level of Motivation/Self-Regulation, pre-IB students more strongly endorsed these items (i.e., uniform DIF). These items (“I check my assignments before I turn them in”; “I complete my schoolwork regularly”; “I am organized about my schoolwork”) may represent a secondary dimension or factor within the Motivation/Self-Regulation factor that reflects student conscientiousness. While there is little agreement on what steps should be taken if non-invariance is detected, one option for handling DIF is to delete the non-invariant items from the analyses (Cheung & Rensvold, 1999; Millsap, 2011; Sass, 2011). When these three items were deleted, the pre-IB group was 0.15 standard deviations higher on the latent mean for Motivation/Self-Regulation compared with the IB group. 
A second option for handling DIF is to retain these items and use caution in comparing the latent means. In the present study, the differences in the item intercepts were relatively small so the intercepts were retained and allowed to be freely estimated in each group. Retaining these three items and comparing the latent means for Motivation/Self-Regulation produced nearly identical results. Additional measurement invariance studies are needed to determine if these results will be replicated.
Although the value of measurement invariance studies is widely recognized, there are challenges associated with implementing these studies. With multiple groups, larger sample sizes are needed and with instruments like the SAAS-R that contain many items measuring several factors, the number and complexity of statistical comparisons increase substantially. The fact that there is not one universally accepted approach to conducting invariance tests or for evaluating the practical significance of differences on the various item characteristics (e.g., intercepts) adds to the complexity.
Finally, support for the criterion-related validity of the scores from the SAAS-R subscales was provided by the significant, positive relationships between these factors and students’ GPAs. Consistent with McCoach and Siegle’s (2003a) finding of a large effect for the Motivation/Self-Regulation subscale between gifted high achievers and underachievers, the present two studies found that one of the strongest relations was between this subscale and students’ GPAs. Academic Self-Perceptions and Goal Valuation, two additional self-directed beliefs reflecting students’ levels of cognitive engagement with school, also showed moderate to strong relations with students’ GPAs. The SAAS-R factors assessing more other-directed beliefs reflecting affective engagement, Attitudes Toward School and Attitudes Toward Teachers, tended to yield smaller associations.
Overall the results from the current two studies highlight the importance of evaluating the psychometric properties of a measure each time it is administered. Results from Study 2 focusing on the reliability and validity of scores from a modified SAAS-R provide strong support for the use of the SAAS-R as a research tool to examine the psychological factors associated with pre-IB and IB students’ academic achievement. Results from Study 1, which were based on a smaller and more homogeneous sample of pre-IB and IB students, were not as strong in terms of reliability and model fit. Additional research, using national samples with larger and more diverse groups of pre-IB and IB students, is needed to determine if these results will be replicated. Ongoing validation work also is needed to evaluate how the five factors underlying the SAAS-R relate to other theoretically relevant variables such as students’ academic engagement and self-efficacy. For researchers using the SAAS-R with other populations and under different conditions, it is critical that reliability and validity of the scores are evaluated for the researchers’ own data and reported in the literature so that variation in the psychometric properties of the SAAS-R can be described, and factors associated with this variation (e.g., sampling error) can be explored.
Footnotes
Authors’ Note
The opinions expressed are those of the authors and do not represent views of the Institute of Education Sciences or the U.S. Department of Education, the funding agency.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported in this article was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A100911 to The University of South Florida.