Cross-Level Measurement Invariance in School and Classroom Environment Surveys

Abstract

Measures of classroom and school environments are central to policy efforts that assess school and teacher quality. These measures are often formed by aggregating individual survey responses to form group-level measures, and assume an invariant measurement model holds across the individual and group levels. This article explores the tenability of this assumption by applying multilevel factor analysis to two well-known surveys: the Working Conditions Survey, which assesses school environments, and the Tripod Classroom Environment Survey. The examples illustrate the consequences of using common factor analytic methods that assume cross-level invariance, and demonstrate how distorted perceptions of factorial structure can influence inferences about the relationship between working conditions and teacher mobility.

Keywords

educational policy, school/teacher effectiveness, learning environments, factor analysis, research methodology

As school districts strive to create comprehensive programs to appraise teaching quality and teacher performance, measures of school and classroom environments have become increasingly important. Information about classroom and school environments is used in a variety of policy contexts and for a variety of purposes. First, it is used for teacher and school evaluation. Mihaly, McCaffrey, Staiger, and Lockwood (2013) described how such measures can be used to “support decisions for tenure, retention, and compensation” (p. 5). Memphis bases 5% of a teacher evaluation on student surveys. By the fall of 2013, 10% of teacher evaluation in Chicago public schools will be based on student surveys (Butrymowicz, 2012). In New York City, teacher and parent surveys about the school environment can account for up to 15% of a school’s score on its annual Progress Report (New York City School Survey, n.d.). Schools with low progress report scores risk being closed.

Second, information about school and classroom environments is often used to predict important outcomes, such as student achievement and teacher retention. A report from the Measuring Effective Teaching (MET) project showed that classroom environment data predict teachers’ value-added (VAM) scores (Bill & Melinda Gates Foundation, 2010). Loeb, Darling-Hammond, and Luczak (2005) discussed how separating the effects of student demographic factors from the effects of working conditions is an important step in developing policy interventions to improve teacher retention. A better understanding of how targeted improvements in working conditions may improve retention is particularly critical for schools serving high-poverty, low-achieving student populations, where teacher turnover rates may be as high as 50% (Ingersoll, 2001).

Data about school and classroom environments are often collected by administering surveys to teachers and students who function as “raters” of the environments in which they work and study. Individual ratings are aggregated to form group-level variables, and inferences are then made about group qualities (Bliese, 2000; Chan, 1998).

While the use of this type of aggregated group-level variable is intuitive and appealing, the validity of inferences based on these aggregated variables entails a complex and nuanced set of assumptions. Substantively, it is assumed that the aggregates refer to the same constructs as the individual responses. Statistically, it is assumed that there is cross-level invariance in the measurement model (e.g., Bliese, 2000); that is, there is invariance in the measurement structure across the individual (within-group) level and the between-group level.

Cross-level measurement invariance imposes strict constraints on the measurement model that may not be met in empirical data (Zyphur, Kaplan, & Christian, 2008). There is a long history of methodological research on this issue (e.g., Cronbach, 1976; Harnqvist, 1978; Longford & Muthén, 1992; Reise, Ventura, Nuechterlein, & Kim, 2005). In much of the applied education research and policy literature, however, cross-level invariance is assumed, rather than explored. This article aims to make two distinct contributions to the education policy literature: (a) provide two clear, policy-relevant illustrations of the importance of correctly modeling between-classroom and between-school variables, and (b) demonstrate how inferences about the impact of school working conditions on a teacher’s planned movement may differ depending on whether or how cross-level noninvariance is modeled. In doing so, this article expands on recent work (D’Haenens, Van Damme, & Onghena, 2010; Marsh et al., 2012; Marsh et al., 2009; Zyphur et al., 2008) that calls attention to the importance of finding empirical evidence to support the cross-level invariance assumption.

Theoretical Framework

Many surveys of school and classroom environments assume a specific measurement model. At the within-group level, there is assumed to be measurement error among the survey items, so that variance among the items is caused by unobserved (latent) differences among individual students or teachers.

At the between-group level, it is assumed that students or teachers are objective raters of the environments in which they study or work, and that variance between raters within the same classroom or school is attributable to sampling error and represents “noise.” However, averaging over individual raters, variance between schools represents actual variance in the quality of working conditions, or variance between classrooms represents true variance in the quality of classrooms.

Under these assumptions, it is appropriate to use a latent trait model where the group qualities themselves are conceived of as effects-indicated latent variables (Bollen & Lennox, 1991; Marsh et al., 2009). In an effects-indicated model, it is assumed that a latent variable causes variance in the indicators. In the case of school or classroom environment surveys, it is assumed that unobserved, latent aspects of the school or classroom environment cause variance between individual schools or classrooms. This is sometimes referred to as reflective aggregation (e.g., Marsh et al., 2009).

It is important to distinguish an effects-indicated model from another possible model, a so-called composite model where indicators are formed by making linear combinations of indicator variables (e.g., Bollen & Bauldry, 2011). An example of this sort of composite indicator would be socioeconomic status (SES). A set of indicators of SES can be used as a weighted combination to describe an individual student’s SES. It is not a claim that an individual student has a latent SES that causes variance among the indicators. Individual student SES indices can then be aggregated to form a school-level variable (e.g., Raudenbush & Bryk, 2002). Composite variables of this type implicitly impose cross-level measurement invariance, because there is only a single indicator for each individual (Marsh et al., 2009). Survey-based indicators of school and classroom environments are rarely conceived of as composite variables in this way, and so this type of model and this type of cross-level measurement invariance are not the focus of this article.

Statistical Background

In the reflective aggregation model typically underlying school and classroom environment surveys, the assumption that group means refer to the same constructs as individual responses implies a two-level measurement model with cross-level factorial invariance (Marsh et al., 2009). Specifically, the factor structure is assumed to be configurally and metrically invariant (Meredith, 1993), meaning that at both levels the same number of factors is found, the same items load onto the same factors, and the strength of association between these items and the underlying factors is the same. The basic factor model (e.g., Bollen, 1989) can be expressed as

y = Λ η + ε, (1)

where y is a p-variate vector of observed scores measuring η; η is an m × 1 vector of latent variable scores on m factors, assumed to be normally distributed with 0 expectation; $Λ$ is a p × m matrix of factor loadings; and ε is a p × 1 vector of residuals, which are assumed to be identically and independently distributed.
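For reference, the covariance structure implied by this model can be written out. The symbols $Ψ$ (the m × m covariance matrix of η) and $Θ$ (the p × p covariance matrix of ε) do not appear in the article and are introduced here only for illustration; the expression also assumes that η and ε are uncorrelated.

```latex
\Sigma = \operatorname{Cov}(y) = \Lambda \Psi \Lambda^{\top} + \Theta
```

When the residuals are mutually independent, $Θ$ is diagonal, so all covariance among the observed items is carried by the single loading matrix $Λ$.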

Factor analytic procedures based on this model assume that the observations are independent. When individuals are associated with groups (teachers with schools, students with classrooms), this independence assumption is likely to be violated. There are several models that account for the fact that observations are nested in groups (e.g., Lee, 1990; McDonald & Goldstein, 1989; Muthén, 1991, 1994; Rabe-Hesketh, Skrondal, & Zheng, 2007). Several of these formulations are based on a score decomposition model articulated by Cronbach and Webb (1975):

y_{i j} = y_{j} + (y_{i j} - y_{j}), (2)

where y_ij is a p-variate vector of observed scores for individual i in group j. y_ij can be decomposed into independent between-groups (y_j) and within-groups (y_ij - y_j) components. Using this decomposition, a multilevel factor model can be expressed:

y_{i j} = Λ_{B} η_{B} + Λ_{W} η_{W} + u_{j} + ε_{i j} . (3)

There are two random effects here—a between-group random effect u_j and a within-group random effect ε_ij. There are also two sets of factor loadings ( $Λ_{B}$ and $Λ_{W}$ ) and two latent variables (η_B and η_W). If there is measurement invariance and the factor loadings are invariant across levels (i.e., $Λ_{B} = Λ_{W} = Λ$ ), Equation 3 can be rewritten as

y_{i j} = Λ (η_{B} + η_{W}) + u_{j} + ε_{i j} = Λ η_{i j} + u_{j} + ε_{i j} . (4)

In this way, the latent trait of individual i in group j can be expressed as a sum of independent between- and within-latent components: $η_{i j} = η_{B} + η_{W}$. With cross-level measurement invariance, the latent variable can be conceived of as a decomposition, just as the observed variable is in Equation 2 (Marsh et al., 2009). However, in the more general case where $Λ_{B} \neq Λ_{W}$ , these simplifications are not possible.
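The observed-score decomposition of Cronbach and Webb (1975) can be illustrated numerically. The following is a minimal sketch on simulated toy data (the group sizes, item count, and variable names are all hypothetical, not taken from either survey). It splits each score vector into a group-mean component and a within-group deviation, and checks that the two components are orthogonal in the sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 5 groups, 4 raters per group, 3 items.
n_groups, n_per_group, n_items = 5, 4, 3
group = np.repeat(np.arange(n_groups), n_per_group)
y = rng.normal(size=(n_groups * n_per_group, n_items))

# Between component: each rater's row is replaced by the group mean vector y_j.
group_means = np.vstack([y[group == j].mean(axis=0) for j in range(n_groups)])
between = group_means[group]

# Within component: deviation of each rater from the group mean, y_ij - y_j.
within = y - between

# The two components add back up to the observed scores.
reconstructed = between + within

# In the sample, the components are orthogonal: their cross-product is zero.
cross_product = between.T @ within
```

Because the within-group deviations sum to zero inside every group, the cross-product matrix vanishes up to floating-point error, which is the sample counterpart of the independence assumed for the two components.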

The Assumption of Cross-Level Invariance in Education Policy Literature

Methodological research on cross-level measurement invariance has a long tradition. Cronbach (1976) and Harnqvist (1978) showed that factorial structures could vary across levels. Cronbach (1976) cautioned that a researcher might need,

one set of factors for his between-groups theory and another set of factors for his within-groups theory. To be sure, he may find that the two sets of constructs coincide, but that is a possibility to be evaluated, not assumed. (p. 203)

Longford and Muthén (1992) stated that “the focus of a substantive analysis may be on the within-group, the between-group factor structure, on the structure at both levels, or on the comparison of the factor structures” (p. 582). Longford and Muthén also noted that between-groups phenomena may be completely unrelated to within-groups phenomena. Other more recent methodologically focused work (e.g., Reise et al., 2005; Zyphur et al., 2008) has shown that even if the number of factors is the same across levels, item loadings may not be the same across levels, and items may load on different factors across levels of analysis.

There are far fewer examples in the applied education and education policy literature that explicitly investigate the assumption of cross-level measurement invariance (notable exceptions include D’Haenens et al., 2010; Holfve-Sabel & Gustafsson, 2005; Toland & De Ayala, 2005). As testament to the fact that this gap between theory and practice persists, Marsh et al. (2012) noted that, “despite the clear resolution of this methodological issue for more than a quarter of a century, it is still an area of ongoing confusion in the educational literature” (p. 111). D’Haenens et al. (2010) claimed that they were “unaware of any educational effectiveness studies applying [multilevel exploratory factor analysis]” (p. 212) to investigate potential differences in factorial structure across levels of analysis.

In fact, it is still far more common to find researchers using methods that assume cross-level invariance than it is to find researchers using modeling approaches that are appropriate for the data and research questions at hand. This is particularly true for the analysis of school and classroom environment variables that are constructed by aggregating survey responses. Two of these methods, (a) factor analysis on the disaggregated covariance matrix and (b) factor analysis on the unweighted group means, are discussed in more detail in the following section, and several recent policy-relevant examples are provided.

Cross-Level Invariance Implied by Single-Level Factor Analyses

Single-level factor analyses that do not model the between- and within-factor structures of hierarchically structured data de facto impose invariance constraints on the factor structure (Zyphur et al., 2008). This is because when a single-level factor analysis is conducted, either on the disaggregated responses or on the group means, only one $Λ$ matrix is estimated, and the η matrix can take only one form.

Perhaps the most common single-level approach used in school climate research is the factor analysis of the total disaggregated covariance or disaggregated correlation matrix. Validation studies of the South Carolina School Climate survey (DiStefano et al., 2008), the Working Conditions Survey (WCS; Moir, 2009), and the student survey of the classroom environment included as part of Georgia’s Race to the Top grant application (Balch, 2012) all find validity evidence for inferences regarding aggregated school and classroom environment variables based on the results of exploratory factor analysis (EFA) conducted on the disaggregated covariance or correlation matrices. There are also many examples from the policy literature on school and classroom environments. Ladd (2011) examined the relationship between teacher working conditions and teacher retention with working conditions variables that were derived from an EFA on the disaggregated correlation matrix. Ryan and Patrick (2001) used a similar approach to investigate the relationship between classroom environment and student motivation and engagement. For situations where individuals are nested in groups, conducting factor analyses on this matrix conflates within and between sources of variance, can bias parameter estimates (Preacher, Zyphur, & Zhang, 2010), and can lead to substantively misleading inferences about relationships between indicators, or about relationships with external variables (Reise et al., 2005).

Another commonly used single-level approach is to conduct a single-level factor analysis on the covariance or correlation matrix of the unweighted group means. Hoy and Clover (1986) used this approach to develop a measure of elementary school climate. There are two issues with factor analyses based on this matrix. First, each group is given the same weight, regardless of the number of individuals in that group. Second, the elements of this matrix also reflect between- and within-variance sources (Muthén, 1994) and can lead to misleading inferences about relationships between indicators, or about relationships with external variables.
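The sense in which the disaggregated matrix conflates the two sources of variance can be made concrete. The sketch below uses simulated data with hypothetical, deliberately unequal group sizes. It verifies the standard identity that the total sum-of-squares-and-cross-products (SSCP) matrix equals the pooled within-group SSCP plus the size-weighted between-group SSCP, and it shows that the unweighted mean of group means differs from the grand mean when group sizes differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical unbalanced design: four groups of different sizes, 3 items.
sizes = np.array([6, 10, 25, 40])
group = np.repeat(np.arange(sizes.size), sizes)
group_effect = rng.normal(size=(sizes.size, 3))
y = group_effect[group] + rng.normal(size=(sizes.sum(), 3))

grand_mean = y.mean(axis=0)
means = np.vstack([y[group == j].mean(axis=0) for j in range(sizes.size)])

# Total SSCP: the basis of a factor analysis of the disaggregated data.
dev_total = y - grand_mean
sscp_total = dev_total.T @ dev_total

# Pooled within-group SSCP.
dev_within = y - means[group]
sscp_within = dev_within.T @ dev_within

# Between-group SSCP, with each group weighted by its size.
dev_between = means - grand_mean
sscp_between = (dev_between * sizes[:, None]).T @ dev_between

# The unweighted mean of group means ignores group size, so it generally
# differs from the grand mean of all individual responses.
unweighted_mean = means.mean(axis=0)
```

A single-level analysis of `sscp_total` therefore fits one loading matrix to a sum of two matrices that may have different structures.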

Implications for Assuming Cross-Level Invariance in Policy and Practice

One of the most pervasive uses of factor analysis in policy research is to justify the formation of linear composites. This practice is sometimes called rank reduction, and is described in many sources (e.g., Alwin, 1973; Bollen & Lennox, 1991; Cronbach, 1976). Note that the linear composite that results from rank reduction is distinct from a composite of the sort described in Bollen and Bauldry (2011) and referenced above. In the case of rank reduction, a linear composite is used as a proxy for a latent variable. It is still an underlying assumption that the composite has “conceptual unity” (Bollen & Bauldry, 2011, p. 4), and that variance in the indicators is caused by a common underlying latent variable.

In studies of school and classroom environments, the rank reduction process often takes one of two forms: (a) Unit-weighted linear composites are formed based on the results of a factor analysis, and these individual scores are then averaged together to form a school- or classroom-level variable (e.g., Balch, 2012; Ladd, 2011); and (b) factor scores are formed based on the factor analysis, and these factor scores are aggregated to the group level (e.g., DiStefano et al., 2008). In each of these examples, rank reduction is justified based on the results of factor analyses conducted on the disaggregated covariance or correlation matrix.

However, by assuming cross-level measurement invariance in this way, there is a strong possibility that the approach used in these studies could result in the formation of unsupported linear composites, and could result in obscured or spurious information about prediction and correlation among policy-relevant constructs.

For example, this approach could result in identifying the wrong number of factors, or in associating items with the wrong factors altogether. It may be, for example, that items in a survey of the classroom environment distinguish two within-class latent variables, such as student engagement and instructional rigor. But it is also conceivable that at the classroom level, classrooms that are engaging are also rigorous, and that classrooms vary in these two traits fairly equally (Muthén & Asparouhov, 2011). Thus, at the classroom level, there is only one broadly defined academic factor. Researchers and policy makers who assume cross-level invariance risk assuming they are working with two distinct dimensions of classroom quality, when in fact, they are not.
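This scenario can be mimicked with a small simulation. The sketch below is illustrative only: the loadings, variances, sample sizes, and the four-item layout are assumptions chosen for clarity, not estimates from the Tripod data. Two uncorrelated student-level factors generate the within-class variance, while a single classroom-level factor generates the between-class variance. Note that correlations among class means are used here as a rough stand-in for the between-level matrix; they still contain some within-class variance and are not the maximum likelihood estimates a multilevel analysis would use.

```python
import numpy as np

rng = np.random.default_rng(2)

n_classes, n_students = 300, 20
group = np.repeat(np.arange(n_classes), n_students)

# Items 0-1 are hypothetical "engagement" items, items 2-3 "rigor" items.
loadings_within = np.array([[0.8, 0.0],
                            [0.8, 0.0],
                            [0.0, 0.8],
                            [0.0, 0.8]])
loadings_between = np.array([0.5, 0.5, 0.5, 0.5])  # one classroom factor

# Within level: two uncorrelated student-level factors plus item residuals.
eta_within = rng.normal(size=(n_classes * n_students, 2))
resid_within = rng.normal(scale=0.6, size=(n_classes * n_students, 4))

# Between level: a single classroom-level factor plus small item residuals.
eta_between = rng.normal(size=n_classes)
resid_between = rng.normal(scale=0.1, size=(n_classes, 4))
class_part = np.outer(eta_between, loadings_between) + resid_between

y = class_part[group] + eta_within @ loadings_within.T + resid_within

# Correlations among class means vs. among within-class deviations.
means = np.vstack([y[group == j].mean(axis=0) for j in range(n_classes)])
corr_between = np.corrcoef(means, rowvar=False)
corr_within = np.corrcoef(y - means[group], rowvar=False)

# Average engagement-by-rigor correlation at each level.
cross_within = corr_within[:2, 2:].mean()
cross_between = corr_between[:2, 2:].mean()
```

Under these assumptions the engagement and rigor items are nearly uncorrelated within classrooms but strongly correlated across classroom means, so the two dimensions that are distinct for students collapse into one at the classroom level.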

One gap in the literature is a clear illustration of how violations of the assumption of cross-level measurement invariance would influence policy conclusions. Such an illustration changes the issue of invariance from one of methodological interest to one of policy importance. Although Marsh et al. (2009) called attention to the importance of testing the cross-level invariance assumption, the empirical example used throughout the article does not illustrate the consequences that may arise from assuming cross-level invariance. And, while Zyphur et al. (2008) presented a case where there is evidence for factorial noninvariance (different patterns of loadings), that study did not present any cases where the number of factors differs across levels, and it did not illustrate how differences in the number of factors across levels may influence policy-relevant considerations, such as the determination of relationships with external variables.

The purpose of the present study is (a) to illustrate cross-level measurement noninvariance using two empirical examples, and (b) to demonstrate the possible consequences that may arise for policy and practice when invariance is assumed. The first example comes from the WCS (Moir, 2009), which is a survey administered to measure aspects of school working conditions. The second comes from the Tripod Classroom Environment Survey (Ferguson, 2010), which is administered to measure aspects of classroom environment. These two surveys provide particularly salient examples for several reasons. First, both surveys are widely used to inform school policy decisions in the United States. Second, both surveys have an aggregated unit-of-analysis. For the WCS, the unit-of-analysis is the school; for the Tripod, the unit-of-analysis is the classroom. Finally, in both surveys, it is an explicit measurement claim that variance between raters (teachers or students, respectively) constitutes error variance, and that variance between schools or classrooms represents true variance in environmental qualities. Using these two surveys, the following research questions were addressed:

Research Question 1: What is the multilevel factorial structure of the WCS? What is the multilevel factorial structure of the Tripod Classroom Environment Survey? Is there empirical evidence of cross-level measurement invariance in either case?

Research Question 2: What are the consequences of ignoring the multilevel structure and conducting a factor analysis on the disaggregated data? Would the resulting inferences about policy-relevant dimensions of school and classroom environments differ from those based on a full multilevel analysis?

Research Question 3: What are the consequences of ignoring the multilevel structure and conducting a factor analysis on the unweighted group means? Would the resulting inferences about policy-relevant dimensions of school and classroom environments differ from those based on a full multilevel analysis?

Research Question 4: How do inferences about the importance of school leadership in the intended departure of teachers differ depending on whether or how cross-level noninvariance is modeled?

Method

Sample and Data Sources

The WCS

This survey was designed to assess teaching conditions at the school level. The sample data come from the 2008 survey, administered to teachers and principals in K–12 public and charter schools across the state of North Carolina. For this analysis, only surveys completed by teachers were considered, and because of some evidence that factorial structure may differ across levels of schooling (Ladd, 2011), only elementary schools were used. This resulted in a data set with 42,155 individual teacher cases in 1,267 schools. Although the average school size is approximately 33 teachers, schools in this analysis range from 6 to 75 teachers. This analysis focuses on a set of 36 survey items (Table 1) that were designed to measure five theoretical dimensions of the school environment: Time (adequacy of time for planning and teaching); Distributed Leadership (teacher involvement in setting school- and classroom-level policy, including decisions about curriculum, instruction, professional development, and other school policies; this factor is similar to the “Expanded Roles” factor defined by Ladd, 2011); School Leadership (support for teachers, shared vision, and a trusting environment); Professional Development (sufficiency of funds and resources to support professional development); and Facilities and Resources (availability of resources, safety and cleanliness of facilities). There are two scales used in the survey. One has 5 points (1 = strongly disagree and 5 = strongly agree) and is used for every item in the Time, School Leadership, Professional Development, and Facilities and Resources dimensions. The other scale also has 5 points (1 = no role at all and 5 = the primary role) and is used for the Distributed Leadership items.

Table 1

Descriptive Statistics for the Working Conditions Survey

Name	Item text	M	SD	ICC
TIME1^a	Reasonable class size	3.42	1.36	.15
TIME2^a	Time to collaborate	3.34	1.36	.15
TIME3^a	Protected from interfering duties	3.31	1.34	.10
TIME4^a	Minimal paperwork	3.27	1.30	.14
TIME5^a	Sufficient noninstructional time	2.86	1.37	.15
FACR1^b	Access to instructional materials	3.95	1.15	.13
FACR2^b	Access to instructional technology	4.01	1.18	.15
FACR3^b	Access to communications technology	4.05	1.14	.17
FACR4^b	Access to office equipment	3.82	1.26	.19
FACR5^b	Sufficient Internet	4.08	1.11	.14
FACR6^b	Adequate professional space	3.86	1.22	.13
FACR7^b	Clean environment	4.02	1.18	.26
FACR8^b	Safe environment	4.31	0.98	.18
DLDR1^c	Role in selecting resources	3.38	1.02	.14
DLDR2^c	Role in devising teaching technique	3.64	1.02	.13
DLDR3^c	Role in setting grading standards	3.26	1.14	.07
DLDR4^c	Role in determining professional development	2.65	1.08	.12
DLDR5^c	Role in hiring new teachers	1.91	1.02	.27
DLDR6^c	Role in student discipline	2.93	1.13	.10
DLDR7^c	Role in budget decisions	2.10	1.04	.19
DLDR8^c	Role in school improvement planning	3.22	1.06	.13
SLDR1^d	Atmosphere of trust and respect	3.64	1.29	.22
SLDR2^d	Clear expectations	3.99	1.14	.20
SLDR3^d	Student conduct rules enforced	3.62	1.32	.23
SLDR4^d	Support for teachers disciplinary efforts	3.83	1.24	.22
SLDR5^d	Leadership supports teachers	3.87	1.21	.20
SLDR6^d	Faculty and staff have shared vision	3.93	1.08	.17
SLDR7^d	Evaluations handled appropriately	4.18	1.06	.17
SLDR8^d	Evaluations handled consistently	4.12	1.09	.17
SLDR9^d	Useful feedback	4.09	1.09	.16
SLDR10^d	Leadership concerns addressed	3.67	1.18	.18
PROF1^e	Sufficient funds for professional development	3.46	1.29	.15
PROF2^e	Opportunities for learning	3.77	1.13	.11
PROF3^e	Adequate time for professional development	3.67	1.17	.11
PROF4^e	Sufficient instructional technology training	3.52	1.22	.12
PROF5^e	Professional development is useful	3.80	1.07	.10

Note. ICC = intraclass correlation.

^a Time.

^b Facilities and resources.

^c Distributed leadership.

^d School leadership.

^e Professional development.

A large set of teacher-level covariates was derived from this survey and used in the analysis of intended teacher departure. These include indicators of a teacher’s race, gender, teaching experience, education (i.e., whether a teacher has an advanced degree), whether a teacher was trained through an alternative certification pathway, and whether the teacher is certified by the National Board for Professional Teaching Standards. In addition, the outcome variable, indicating a teacher’s intent to leave a school, was constructed from this survey. The item reads, “Which best describes your professional intentions in the next 2 years?” and the answer options are 1 = continue teaching at my current school, 2 = continue teaching in my current district, 3 = continue teaching in this state, 4 = leave teaching for another position in education, 5 = leave teaching for personal reasons, 6 = retire from teaching, and 7 = leave teaching for another reason. The item was recoded into a binary variable, with Option 1 recoded as 0, and Options 2 to 7 recoded as 1. Thus, the outcome variable indicated whether a teacher intended to leave a school in the next 2 years.
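The recoding rule is simple enough to state in code. This is a minimal sketch using a made-up vector of responses (the values and variable names are hypothetical; the article does not describe the software used for this step).

```python
import numpy as np

# Hypothetical responses to the seven-option professional intentions item.
responses = np.array([1, 2, 7, 1, 4, 6, 3, 1])

# Option 1 (continue teaching at current school) -> 0; Options 2 to 7 -> 1.
leave = (responses != 1).astype(int)
```

The mean of the resulting indicator is the proportion of teachers who report an intention to leave their current school.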

Several school-level variables were constructed from statewide administrative data. These include an indicator of whether a school was in one of the four largest metropolitan areas in the state, indicators of school demographics (percentage of students who are Black, percentage of students who are Hispanic, and percentage of students on free or reduced lunch), school-mean teacher experience, student–teacher ratio, and an indicator of whether a school hit its target for expected academic growth. A full list of covariates used in the analysis (and descriptive statistics) is available in Table 3.

Table 2

Descriptive Statistics for the Tripod Survey

Item	Item text	M	SD	ICC
CAPT1^a	This class does not keep my attention—I get bored.	3.37	1.31	.10
CAPT2^a	My teacher makes learning enjoyable.	3.61	1.27	.18
CAPT3^a	My teacher makes lessons interesting.	3.64	1.17	.24
CAPT4^a	I like the ways we learn in this class.	3.75	1.22	.15
CARE1^b	My teacher in this class makes me feel that he or she really cares about me.	3.62	1.13	.14
CARE2^b	My teacher seems to know if something is bothering me.	3.17	1.25	.11
CARE3^b	My teacher really tries to understand how students feel about things.	3.69	1.11	.15
CHAL1^c	My teacher asks questions to be sure we are following along when he or she is teaching.	4.11	1.14	.07
CHAL2^c	My teacher asks students to explain more about answers they give.	3.78	1.02	.10
CHAL3^c	In this class, my teacher accepts nothing less than our full effort.	3.93	1.01	.10
CHAL4^c	My teacher doesn’t let people give up when the work gets hard.	3.93	1.08	.12
CHAL5^c	My teacher wants us to use our thinking skills, not just memorize things.	4.07	1.01	.09
CHAL6^c	My teacher wants me to explain my answers—Why I think what I think.	3.92	1.01	.11
CHAL7^c	In this class, we learn a lot almost every day.	3.91	1.03	.15
CHAL8^c	In this class, we learn to correct our mistakes.	3.96	1.01	.12
CLAR1^d	If you don’t understand something, my teacher explains it another way.	4.00	1.03	.14
CLAR2^d	My teacher knows when the class understands, and when we do not.	3.74	1.01	.11
CLAR3^d	When he or she is teaching us, my teacher thinks we understand even when we don’t.	3.34	1.18	.06
CLAR4^d	My teacher has several good ways to explain each topic that we cover in this class.	3.89	1.04	.17
CLAR5^d	My teacher explains difficult things clearly.	3.81	1.05	.14
CONF1^e	My teacher wants us to share our thoughts.	3.85	1.16	.12
CONF2^e	Students get to decide how activities are done in this class.	2.64	1.16	.11
CONF3^e	My teacher gives us time to explain our ideas.	3.70	1.04	.13
CONF4^e	Students speak up and share their ideas about class work.	3.72	1.10	.13
CONF5^e	My teacher respects my ideas and suggestions.	4.03	1.01	.11
CONS1^f	My teacher takes the time to summarize what we learn each day.	3.58	1.12	.12
CONS2^f	My teacher checks to make sure we understand what he or she is teaching us.	4.07	0.98	.13
CONS3^f	We get helpful comments to let us know what we did wrong on assignments.	3.86	1.06	.12
CONS4^f	The comments that I get on my work in this class help me understand how to improve.	3.80	1.09	.13
CONT1^g	Student behavior in this class is under control.	3.72	1.23	.16
CONT2^g	I hate the way that students behave in this class.	3.55	1.27	.16
CONT3^g	Student behavior in this class makes the teacher angry.	3.30	1.28	.20
CONT4^g	Student behavior in this class is a problem.	3.62	1.26	.21
CONT5^g	My classmates behave the way my teacher wants them to.	3.43	1.14	.24
CONT6^g	Students in this class treat the teacher with respect.	3.85	1.11	.26
CONT7^g	Our class stays busy and doesn’t waste time.	3.69	1.11	.22

Note. ICC = intraclass correlation.

^a Captivating.

^b Caring.

^c Challenging.

^d Clarifying.

^e Conferring.

^f Consolidating.

^g Controlling.

The Tripod Classroom Environment Survey

The Tripod Survey is designed to assess seven dimensions of teaching practice, often referred to as the “Seven C’s”: Caring, Captivating, Conferring, Clarifying, Challenging, Controlling, and Consolidating. This version of the Tripod Survey contains 36 items (Table 2) and was administered in an urban school district in California in 2010. All items have 5-point scales (1 = totally untrue and 5 = totally true). The sample used in this analysis contained 6,386 students in 349 classrooms. The average classroom size was approximately 18 students, and the range was from 5 to 33 students. For illustrative purposes, the Tripod Survey is treated as a two-level survey in this analysis, and independence between classrooms within the same school is assumed. In actuality, however, classrooms are clustered within schools, and ignoring this clustering may also influence the inferences about the measurement model.

Table 3

Covariates Used in Regression Analysis: Working Conditions Survey

Category	M	SD	ICC
Outcome
Leave	0.35	.48	.08
Teacher characteristics
Black teacher	0.10	.31	.19
Hispanic or Latino teacher	0.01	.11	.03
Other teacher	0.03	.18	.10
Male teacher	0.06	.24	.01
Teacher experience
2 to 3 years	0.12	.33	.02
4 to 6 years	0.15	.36	.01
7 to 10 years	0.16	.37	.01
11 to 20 years	0.26	.44	.01
>20 years	0.24	.43	.03
Has a graduate degree	0.33	.47	.02
Nationally Board Certified	0.12	.32	.03
Trained in a master’s program	0.18	.39	.02
Alternative training program	0.06	.24	.02
School characteristics
% Black or African American students	0.27	.23	—
% Hispanic or Latino students	0.13	.12	—
% free/reduced lunch	0.56	.25	—
% teachers 0 to 3 years experience	0.24	.11	—
% teachers 4 to 10 years experience	0.29	.09	—
% teachers 11+ years experience	0.47	.13	—
AYP met in 2007	0.36	.48	—
Growth met in 2007	0.24	.43	—
Log school membership	6.32	.40	—
New administrator	0.25	.43	—
Geographic indicators
Wake County LEA	0.11	.31	—
Guilford County LEA	0.05	.21	—
Cumberland County LEA	0.04	.19	—
Winston-Salem/Forsyth LEA	0.05	.21	—
Charlotte-Mecklenburg LEA	0.07	.26	—
Sample size (teachers/schools)	42,155/1,267

Note. ICC = intraclass correlation; AYP = adequate yearly progress; LEA = local education agency.

Analytic Approach

Multilevel exploratory factor analysis (MEFA)

To address the first research question, this article follows the MEFA procedure described by Van de Vijver and Poortinga (2002) and Reise et al. (2005), which is based on a procedure first outlined by Muthén (1994). (a) The item intraclass correlations (ICCs) were inspected to determine the amount of variance at the between-group level and thus whether a multilevel factor analysis is warranted. Muthén noted that if all ICCs are close to zero, a multilevel factor analysis may not be warranted. (b) Maximum likelihood estimates of the within-group correlation matrix and the between-group correlation matrix were obtained using Mplus version 6.11 (Muthén & Muthén, 2010). (c) EFA was then conducted on these two matrices separately. Factors were extracted using minres factor analysis. Oblique (oblimin) rotation was used so that the factors were free to correlate.¹
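Step (a) can be illustrated with a minimal sketch. It uses the one-way ANOVA estimator of ICC(1), which is a substitute for (not a reproduction of) the maximum likelihood estimates obtained from Mplus in this article; the function name `item_icc` and the toy data are hypothetical.

```python
import numpy as np

def item_icc(scores, groups):
    """ICC(1) for a single item from a one-way random-effects ANOVA.

    Assumption: this ANOVA estimator stands in for the ML estimates
    used in the article; the two need not agree exactly.
    """
    scores = np.asarray(scores, dtype=float)
    labels, idx = np.unique(np.asarray(groups), return_inverse=True)
    J, N = labels.size, scores.size
    n_j = np.bincount(idx)                       # group sizes
    group_means = np.bincount(idx, weights=scores) / n_j
    ss_between = np.sum(n_j * (group_means - scores.mean()) ** 2)
    ss_within = np.sum((scores - group_means[idx]) ** 2)
    ms_between = ss_between / (J - 1)
    ms_within = ss_within / (N - J)
    # Effective group size, which adjusts for unbalanced groups
    n_adj = (N - np.sum(n_j ** 2) / N) / (J - 1)
    var_between = (ms_between - ms_within) / n_adj
    return var_between / (var_between + ms_within)

# Toy example: responses are identical within each group, so all
# variance lies between groups.
icc = item_icc([1, 1, 2, 2, 3, 3], [0, 0, 1, 1, 2, 2])
```

Applying such a function to each item column in turn would give the item-level ICCs that are inspected before deciding whether a multilevel analysis is warranted.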

In conventional EFA, there is a long and rich literature on methods for determining the number of factors to retain (e.g., Fabrigar, Wegener, MacCallum, & Strahan, 1999; Floyd & Widaman, 1995; Ford, MacCallum, & Tait, 1986). There is relatively little research on factor retention issues in MEFA. Some studies, however, have suggested that features of the between-group correlation matrix may result in the extraction of too many factors if maximum likelihood–based approaches to factor selection are used (e.g., Briggs & MacCallum, 2003; Browne, MacCallum, Kim, Andersen, & Glaser, 2002; Schmitt, 2011). Thus, this article uses parallel analysis (Horn, 1965) to determine the number of factors to retain. Parallel analysis is a simulation-based approach. It compares the eigenvalues of the collected data with the eigenvalues of randomly generated “noise” data that have the same structure. The basic logic is that eigenvalues associated with substantive factors should be larger than the eigenvalues extracted from randomly generated data (Hayton, Allen, & Scarpello, 2004). Studies (e.g., Crawford & Koopman, 1973; Humphreys & Montanelli, 1975; Schmitt, 2011) have consistently shown that parallel analysis provides trustworthy estimates of the number of factors to retain in EFA. D’Haenens et al. (2010) used parallel analysis on the within- and between-groups correlation matrices. All parallel analyses were conducted using the paran package in R (Dinno, 2012).
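The logic of parallel analysis can be sketched as follows. This is a generic illustration of Horn's procedure written in Python, not the paran implementation used in this article; the function name, the use of normally distributed noise, and the 95th-percentile threshold are assumptions made for the example.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Compare observed eigenvalues with eigenvalues from random noise.

    Assumptions: noise is standard normal with the same number of rows
    and columns as `data`, and the retention threshold is a percentile
    of the simulated eigenvalues.
    """
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rng = np.random.default_rng(seed)
    simulated = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.standard_normal((n, p))
        eig = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        simulated[s] = np.sort(eig)[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)
    # Retain leading eigenvalues up to the first that fails to beat noise
    exceeds = observed > threshold
    n_retain = p if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, threshold
```

In a multilevel application, the same comparison would be carried out separately for the within-group and between-group correlation matrices, with the simulated sample size chosen to match each level.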

To address the second and third research questions, two additional EFAs were conducted. The first was based on the total (disaggregated) covariance matrix:

S_{T} = \frac{\sum_{j = 1}^{J} \sum_{i = 1}^{N_{j}} (y_{i j} - y_{\cdot \cdot}) {(y_{i j} - y_{\cdot \cdot})}^{'}}{N - 1} = \hat{\Sigma}_{T},

where y_ij is the p-variate vector of observed scores for individual i in group j, y_·· is the p-variate vector of item grand means, N_j is the number of individuals in group j, N is the total sample size, and J is the number of groups. The second was an analysis on the unweighted group-mean covariance matrix,

S_{B} = \frac{\sum_{j = 1}^{J} (y_{j} - y_{\cdot \cdot}) {(y_{j} - y_{\cdot \cdot})}^{'}}{J - 1},

where y_j is a p-variate vector of means for group j. Both of these matrices were rescaled to correlation matrices for the EFAs. These analyses investigate whether different factor structures would be extracted in these commonly misspecified cases. To apply a consistent and objective criterion to all of the analyses, decisions about how many factors to extract were based on parallel analysis.
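The two matrices defined above can be computed directly from a data matrix and a vector of group labels. The following is a minimal sketch that follows the formulas as printed (group means are centered at the grand mean of all individual responses); the function names are hypothetical.

```python
import numpy as np

def total_and_group_mean_cov(Y, groups):
    """Return S_T (total covariance) and S_B (unweighted group-mean covariance)."""
    Y = np.asarray(Y, dtype=float)
    labels, idx = np.unique(np.asarray(groups), return_inverse=True)
    N, p = Y.shape
    J = labels.size
    grand_mean = Y.mean(axis=0)
    dev = Y - grand_mean
    S_T = dev.T @ dev / (N - 1)
    # Group means: sum rows within each group, then divide by group size
    sums = np.zeros((J, p))
    np.add.at(sums, idx, Y)
    group_means = sums / np.bincount(idx)[:, None]
    gdev = group_means - grand_mean
    S_B = gdev.T @ gdev / (J - 1)
    return S_T, S_B

def cov_to_corr(S):
    """Rescale a covariance matrix to a correlation matrix."""
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)
```

Each group contributes one row of deviations to S_B regardless of its size, which is what makes the matrix "unweighted."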

To address the fourth research question, a linear probability ordinary least squares (OLS) regression model was used, similar to that used in previous research (Ladd, 2011). In that model, the predicted outcome for teacher i in school j can be expressed as

{LEAVE}_{i j} = f (X_{i j}, S_{j}, {WC}_{j}) .

That is, an individual’s intended departure is modeled as a function of characteristics of the individual teacher, X_ij; characteristics of the school, S_j; and school-level working conditions variables, WC_j. All regressions were weighted by the number of responses in each school, and standard errors were clustered at the school level. Factor scores were used to represent working conditions variables. Factor scores were estimated from a factor model using all loadings greater in magnitude than .3, and all cross-loadings were modeled explicitly. For the model assuming cross-level measurement invariance, school-level scores were formed by averaging over individual factor scores. For the model incorporating noninvariance, school scores were formed by using between-level factor scores from a multilevel factor analysis.
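The estimation strategy described above (a weighted linear probability model with school-clustered standard errors) can be sketched in a few lines. This is an illustrative implementation only: the function name is hypothetical, the sandwich estimator omits any small-sample correction, and the software actually used for the reported estimates is not specified here.

```python
import numpy as np

def weighted_lpm_clustered(y, X, weights, clusters):
    """Weighted least squares with cluster-robust (sandwich) standard errors.

    Assumptions: an intercept is added automatically, and no
    finite-sample adjustment is applied to the covariance matrix.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    XtW = X.T * w                                # weight each observation
    bread = np.linalg.inv(XtW @ X)
    beta = bread @ (XtW @ y)
    resid = y - X @ beta
    # Sum weighted score contributions within each cluster (school)
    scores = X * (w * resid)[:, None]
    labels, idx = np.unique(np.asarray(clusters), return_inverse=True)
    cluster_scores = np.zeros((labels.size, X.shape[1]))
    np.add.at(cluster_scores, idx, scores)
    meat = cluster_scores.T @ cluster_scores
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))
```

In this setting, `y` would hold the 0/1 intended-departure indicator, `X` the teacher, school, and working conditions variables, `weights` the number of responses in each teacher's school, and `clusters` the school identifiers.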

Other approaches for modeling this relationship include hierarchical linear models, hierarchical generalized linear models (e.g., Raudenbush & Bryk, 2002), and multilevel latent variable models (e.g., Marsh et al., 2009). These models can potentially result in different inferences about the relationship between school working conditions and intended teacher departure. For simplicity we use OLS in this study, as the focus is not on differences across statistical models, but on how inferences may differ depending on whether or how cross-level noninvariance is modeled.

Using linear composites as proxies for latent variables may also have consequences on the inferences that are made and may bias parameter estimates. The extent of this bias is a function of measurement error and sampling error (e.g., Lüdtke, Marsh, Robitzsch, & Trautwein, 2011; Marsh et al., 2009; Preacher et al., 2010; Raudenbush & Sadoff, 2008). Although factor scores (or other error-corrected variables) are often used in regression analyses to address the issue of measurement error, procedures in this vein (e.g., Croon & van Veldhoven, 2007; Raudenbush & Sadoff, 2008) are less efficient than multilevel structural equation models (Lüdtke et al., 2011). Under general conditions, the use of factor scores will also yield biased parameter estimates (Skrondal & Laake, 2001). A thorough investigation of these issues is beyond the scope of the current study.

Results

What Is the Multilevel Factorial Structure of These Two Surveys? Is There Empirical Evidence to Support the Assumption of Cross-Level Measurement Invariance in Either Case?

ICCs range from around .07 to around .27 for the WCS (Table 1), and from around .06 to .26 for the Tripod Survey (Table 2). Although this shows that individual responses within clusters share a nontrivial amount of similarity, there is also variability in terms of how much variance of each item is accounted for at the group level. Some items function better as indicators of group-level phenomena than others. Overall, in both surveys, ICCs of this size provide sufficient evidence that a MEFA is warranted. In fact, this range of ICCs is consistent with past research (Marsh et al., 2012).

For the WCS, parallel analysis suggested the extraction of six factors at the within level and five factors at the between level. Within schools, the factor structure includes a Time factor, a Facilities and Resources factor, a School Leadership factor, a Teacher Evaluation factor, a Distributed Leadership factor, and a Professional Development factor. There are no significant cross-loadings. The Teacher Evaluation factor consists of three items—SLDR7, SLDR8, and SLDR9—inquiring about the handling of performance evaluation. The two strongest loading items in the School Leadership factor focus on aspects of student discipline (SLDR3: “The school leadership consistently enforces rules for student conduct”; SLDR4: “The school leadership support teachers’ efforts to maintain discipline in the classroom”). The Distributed Leadership factor identified in the within-level analysis contains items about the roles teachers play in establishing classroom, curricular, and administrative policy.

The between-school structure of the WCS differs from the within-school (teacher-level) structure (Table 4). There is considerably more cross-loading. In total, there are seven items that load onto more than one factor. This indicates that the factor structure may be less well defined at the school level than at the teacher level. MEFA results in D’Haenens et al. (2010) also show more significant cross-loading at the group level. The Time factor and the Teacher Evaluation factors are similarly constituted at the school level as they are at the teacher level. However, the Professional Development items no longer load onto a distinguishable factor. These items now load with seven of the Facilities and Resources items. This suggests that schools that provide adequate facilities and resources also provide adequate professional development (Muthén & Asparouhov, 2011). Thus, at the school level, there is a broadly defined resources factor, where quality professional development is conceived of as a school-wide resource. This is reasonable, as PROF1 reads “Sufficient funds and resources are available to allow teachers to take advantage of professional development activities.”

Table 4

Rotated Factor Loadings for the Working Conditions Survey: Multilevel Analysis

	Within schools						Between schools
	Factor						Factor
Item	1	2	3	4	5	6	1	2	3	4	5
TIME1	−.01	.01	.11	−.04	−.01	.46	−.16	.17	.00	.43	.16
TIME2	−.04	−.06	−.01	.00	.09	.73	−.02	.00	−.08	.95	.05
TIME3	.01	.05	.07	.03	−.06	.64	.06	.14	.17	.64	.07
TIME4	.11	.17	.03	.05	−.08	.50	.08	.18	.44	.45	−.09
TIME5	.02	.00	−.01	−.01	.03	.75	.05	.00	.10	.88	.00
FACR1	.07	.01	.51	.02	.06	.05	.08	.12	.05	−.03	.70
FACR2	−.01	−.03	.75	−.02	.04	−.02	−.02	.00	−.02	−.07	.86
FACR3	.01	.00	.74	.01	.01	−.02	−.07	−.02	.14	−.03	.70
FACR4	.06	.07	.53	.00	.01	.06	.04	.05	.30	−.07	.47
FACR5	−.01	−.01	.61	.01	.01	.01	−.12	.02	.04	−.06	.69
FACR6	.02	−.01	.50	.02	.03	.13	−.05	.06	−.01	.11	.62
FACR7	−.01	.16	.38	.03	−.01	.07	−.06	.25	−.08	−.01	.45
FACR8	−.01	.21	.39	.07	−.03	.06	.00	.45	.01	−.02	.39
DLDR1	.64	−.03	.10	.03	−.02	.00	.13	.00	.60	−.01	.21
DLDR2	.57	−.01	.11	.06	−.08	.02	.13	.00	.63	.17	−.01
DLDR3	.57	−.05	.06	.01	−.04	.01	−.05	.06	.55	.27	.05
DLDR4	.65	−.01	−.03	−.04	.13	.01	.11	.00	.46	.19	.23
DLDR5	.58	−.02	−.07	−.03	.01	.03	−.08	−.03	.46	.12	.03
DLDR6	.57	.15	.01	−.04	−.02	.00	−.03	.43	.35	.16	.11
DLDR7	.65	−.01	−.05	−.04	.03	.03	.00	.04	.51	−.01	.25
DLDR8	.61	.03	.00	.05	.03	−.01	.12	.08	.48	.12	.16
SLDR1	.13	.41	.02	.19	.05	.07	.29	.27	.52	−.05	.05
SLDR2	.05	.55	.03	.17	.07	.00	.28	.59	.16	−.01	.07
SLDR3	−.01	.88	.01	−.06	.01	.02	.02	1.00	−.08	.05	.02
SLDR4	−.02	.88	.02	−.02	.00	.01	.02	.95	.03	.03	.00
SLDR5	.08	.59	.01	.20	.03	.04	.25	.52	.37	.02	−.06
SLDR6	.10	.36	.03	.22	.14	.02	.23	.37	.38	−.02	.17
SLDR7	−.01	−.02	.01	.92	−.01	.01	1.00	.00	.00	.00	−.01
SLDR8	.00	−.02	.00	.92	−.01	.01	1.00	−.02	.01	.01	.01
SLDR9	.01	.09	.00	.73	.07	.00	.86	.10	.00	.03	.06
SLDR10	.20	.37	−.02	.18	.16	.06	.33	.26	.46	.07	.06
PROF1	.05	−.01	.07	−.02	.63	−.03	.03	−.10	.03	.10	.75
PROF2	.04	.04	−.02	.03	.60	.09	.20	.07	.00	.40	.44
PROF3	−.03	−.02	−.02	.00	.82	.05	.17	−.01	−.04	.29	.60
PROF4	.00	.02	.12	.00	.63	−.02	.14	.00	−.02	.09	.74
PROF5	.05	.07	.04	.06	.63	−.03	.17	.12	.04	.17	.60
Cross-loadings	0	0	0	0	0	0	1	4	5	2	2

Note. All loadings greater than |.3| are shown in bold. Strongest loadings for each item are shaded. For oblique rotations, standardized factor loadings can be greater than 1 (Jöreskog, 1999). Complete item text is available in Table 1.
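The cross-loading counts in the last row of the table follow from a simple rule: an item cross-loads if more than one of its loadings exceeds .3 in magnitude. A minimal sketch of that rule is given below, using three between-school rows from the table above; the function name is hypothetical, and because it operates on rounded loadings, borderline values of exactly .30 may be classified differently than in the published counts.

```python
import numpy as np

def cross_loading_summary(loadings, cutoff=0.3):
    """Count items with more than one salient loading, overall and per factor."""
    salient = np.abs(np.asarray(loadings, dtype=float)) > cutoff
    cross_items = salient.sum(axis=1) > 1
    per_factor = salient[cross_items].sum(axis=0)
    return int(cross_items.sum()), per_factor

# Between-school loadings for TIME4, SLDR3, and PROF2 (Table 4)
rows = [
    [0.08, 0.18, 0.44, 0.45, -0.09],   # TIME4: factors 3 and 4
    [0.02, 1.00, -0.08, 0.05, 0.02],   # SLDR3: factor 2 only
    [0.20, 0.07, 0.00, 0.40, 0.44],    # PROF2: factors 4 and 5
]
n_items, per_factor = cross_loading_summary(rows)
# n_items is 2; per_factor is [0, 0, 1, 2, 1]
```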

The most interesting differences concern the School Leadership and Distributed Leadership factors from the teacher-level analysis. These factors are differently constituted at the school level. While at the teacher level the School Leadership and Distributed Leadership factors were clearly distinguished, with no cross-loading items, at the school level this is not true. The strongly loading items about discipline in the School Leadership factor associate with other items about school safety, including FACR8 (“Teachers and staff work in a school environment that is safe”) and discipline policy (DLDR6 asks about the role teachers play in “Establishing and implementing policies and student discipline”). There are also items from the School Leadership factor concerning trust and mutual respect that associate closely with the Distributed Leadership items at the school level (SLDR1: “There is an atmosphere of trust and mutual respect within the school”; SLDR6: “The faculty and staff have a shared vision”; SLDR10: “The school leadership makes a sustained effort to address teacher concerns about leadership issues”). This makes for a broader conceptualization of Distributed Leadership at the school level, and this pattern of loadings makes sense, as Harris (2004) noted that “collaboration and collegiality are at the core of distributed leadership” (p. 15). Harris (2003) noted that for distributed or dispersed leadership to work, a high degree of trust is essential. The cross-loaded items suggest that there is some conceptual overlap between the Distributed Leadership items and other aspects of School Leadership. These two factors correlate at approximately .61.

For the Tripod Survey, parallel analysis suggests five factors at the within level, and two factors at the between level. For the within-classroom factorial structure, 21 of the first 29 items load onto a single factor (Table 5). These items deal with a broad range of the academic and emotional dimensions of classroom environment, but the strongest loading items are about understanding: CARE3, “My teacher really tries to understand how students feel about things”; CONS2, “My teacher checks to make sure we understand what he or she is teaching us”; and CLAR1, “If you don’t understand something, my teacher explains it another way.”

Table 5

Rotated Factor Loadings for the Tripod Survey: Multilevel Analysis

	Within classrooms					Between classrooms
	Factor					Factor
Item	1	2	3	4	5	1	2
CAPT1	.32	.18	.38	−.16	−.18	0.74	.24
CAPT2	.01	.89	−.01	.05	−.15	0.87	.07
CAPT3	.43	.33	−.02	.18	−.18	0.87	.07
CAPT4	.01	.84	.03	.03	−.02	0.84	.14
CARE1	.73	.05	−.03	.01	−.16	1.05	−.19
CARE2	.58	.04	−.09	.08	−.16	1.06	−.38
CARE3	.80	−.01	−.04	.00	−.08	1.04	−.18
CHAL1	.02	.68	.00	−.08	.28	0.76	.14
CHAL2	.28	.19	−.02	.06	.38	0.58	.27
CHAL3	.66	−.06	.00	.04	.08	0.84	.12
CHAL4	.57	.03	.00	.11	.14	0.83	.19
CHAL5	.30	.28	.01	.01	.33	0.73	.31
CHAL6	.54	−.04	−.04	.15	.28	0.65	.28
CHAL7	.40	.14	.00	.20	.09	0.68	.26
CHAL8	.44	.11	−.01	.16	.19	0.78	.19
CLAR1	.84	−.03	.05	−.08	.02	0.93	.05
CLAR2	.69	.01	−.03	.02	−.02	0.97	.01
CLAR3	.29	.02	.36	−.28	−.03	0.71	.30
CLAR4	.57	.13	.02	.18	−.02	0.82	.24
CLAR5	.67	.03	.04	.04	−.07	0.91	.10
CONF1	.02	.60	.03	−.04	.27	0.69	.18
CONF2	.13	.12	−.23	.32	−.15	0.91	−.34
CONF3	.47	.09	−.03	.18	.14	0.94	.02
CONF4	.24	.23	.03	.13	.22	0.81	.11
CONF5	.66	.04	.04	.01	.05	0.88	.06
CONS1	.55	.04	−.06	.16	.03	0.96	−.10
CONS2	.81	−.03	.05	−.06	.06	0.95	.02
CONS3	.51	.10	.00	.09	.17	0.82	.17
CONS4	.70	.04	−.02	.01	.00	0.89	.09
CONT1	−.05	.19	.14	.34	.16	0.28	.78
CONT2	−.07	−.02	.67	.06	.02	−0.09	1.01
CONT3	−.01	.04	.66	.04	−.04	0.21	.79
CONT4	−.03	−.01	.80	.07	.02	0.06	.94
CONT5	.08	.00	.13	.67	−.03	0.38	.70
CONT6	.22	−.02	.26	.43	−.04	0.45	.64
CONT7	.17	.07	.13	.49	.07	0.40	.67
Cross-loadings	3	1	1	0	1	7	7

Note. All loadings greater than |.3| are shown in bold. Strongest loadings for each item are shaded. For oblique rotations, factor loadings can be greater than 1 (Jöreskog, 1999). Complete item text is available in Table 2.

The Controlling items load distinctly onto two separate factors at the within-classroom level. One of those factors deals with positive aspects of classroom discipline, for example, “Students in this class treat the teacher with respect” (CONT6). The other deals with negative aspects, for example, “Student behavior in this class is a problem” (CONT4). Two other items load onto the factor dealing with negative aspects: CAPT1 (“This class does not keep my attention—I get bored”) and CLAR3 (“When he or she is teaching us, my teacher thinks we understand even when we don’t”). These items also deal with negative dimensions of the classroom environment.

The between-classroom-level analysis (Table 5) shows two factors—one of which is dominated by items relating to the academic and emotional support of a classroom, and one of which is dominated by items related to classroom management (the Control items). This suggests that at a classroom level, teachers vary in their ability to provide academic and emotional support and to manage behavior in the classroom. Some teachers are adept at providing academic support, but less adept at managing the classroom, and vice versa (Muthén & Asparouhov, 2011). There is substantial cross-loading at the between level, with seven items loading onto both factors.

What Are the Consequences of Ignoring the Multilevel Structure and Conducting a Factor Analysis on the Disaggregated Data?

For the WCS, parallel analysis suggested extracting six factors from the total, disaggregated correlation matrix. The pattern of factor loadings, and their relative magnitude, is consistent with the within-school factor structure that was suggested by the multilevel factor analysis (Table 6).

Table 6

Rotated Factor Loadings: Disaggregated Analysis

	Working Conditions Survey							Tripod Survey
	Factor							Factor
Item	1	2	3	4	5	6	Item	1	2	3	4	5
TIME1	−.02	.03	.12	−.05	.00	.46	CAPT1	.32	.20	.40	−.18	−.20
TIME2	−.04	−.06	−.01	−.01	.09	.76	CAPT2	.01	.90	.00	.05	−.15
TIME3	.02	.06	.07	.04	−.05	.65	CAPT3	.43	.37	.00	.17	−.21
TIME4	.13	.18	.02	.06	−.08	.50	CAPT4	.01	.84	.04	.03	.00
TIME5	.03	−.01	−.01	.00	.02	.78	CARE1	.77	.05	−.03	.00	−.14
FACR1	.08	.03	.51	.03	.08	.04	CARE2	.63	.04	−.12	.05	−.16
FACR2	−.02	−.03	.76	−.01	.06	−.02	CARE3	.83	−.02	−.03	−.01	−.08
FACR3	.03	−.02	.74	.01	.01	−.01	CHAL1	.04	.67	.00	−.08	.32
FACR4	.10	.07	.54	.02	−.01	.05	CHAL2	.32	.18	−.02	.07	.40
FACR5	−.02	−.02	.63	.00	.01	.01	CHAL3	.69	−.07	.01	.05	.09
FACR6	.00	.00	.54	.01	.02	.14	CHAL4	.62	.03	.02	.11	.13
FACR7	−.04	.18	.41	.02	−.01	.06	CHAL5	.33	.29	.03	.02	.33
FACR8	−.01	.26	.41	.06	−.05	.05	CHAL6	.60	−.07	−.03	.14	.29
DLDR1	.68	−.03	.09	.03	−.01	−.01	CHAL7	.43	.16	.01	.19	.08
DLDR2	.62	−.01	.09	.07	−.10	.04	CHAL8	.49	.12	.00	.13	.18
DLDR3	.59	−.05	.04	.00	−.03	.03	CLAR1	.85	−.02	.07	−.07	.00
DLDR4	.65	−.01	−.04	−.03	.15	.01	CLAR2	.72	.01	−.02	.03	−.03
DLDR5	.60	−.04	−.05	−.04	.00	.04	CLAR3	.30	.04	.40	−.28	−.03
DLDR6	.56	.18	.00	−.05	.01	.01	CLAR4	.58	.15	.05	.18	−.04
DLDR7	.65	.01	−.03	−.04	.05	.01	CLAR5	.69	.04	.04	.05	−.07
DLDR8	.61	.04	.00	.05	.04	.00	CONF1	.05	.58	.04	−.02	.29
SLDR1	.16	.41	.03	.22	.04	.06	CONF2	.20	.15	−.25	.31	−.19
SLDR2	.05	.58	.03	.19	.08	.00	CONF3	.55	.10	−.02	.14	.11
SLDR3	−.02	.90	.02	−.05	.02	.02	CONF4	.29	.25	.03	.12	.18
SLDR4	−.01	.90	.03	−.02	.00	.01	CONF5	.70	.03	.05	−.01	.05
SLDR5	.10	.59	−.01	.22	.03	.05	CONS1	.61	.04	−.08	.13	.01
SLDR6	.13	.38	.03	.23	.13	.02	CONS2	.84	−.03	.06	−.06	.05
SLDR7	−.01	−.01	.01	.93	−.01	.01	CONS3	.56	.11	.01	.07	.15
SLDR8	.00	−.02	.01	.94	.00	.01	CONS4	.72	.04	.00	.02	.01
SLDR9	.00	.09	.00	.75	.08	.00	CONT1	.04	.21	.19	.39	.14
SLDR10	.22	.36	−.02	.21	.15	.06	CONT2	.09	−.01	.73	.07	.02
PROF1	.05	−.02	.09	−.02	.64	−.03	CONT3	.01	.04	.70	.06	−.05
PROF2	.05	.05	−.02	.04	.60	.12	CONT5	.10	.02	.17	.68	−.03
PROF3	−.02	−.01	−.03	.00	.82	.06	CONT5	.10	.02	.17	.68	−.03
PROF4	.00	.02	.15	.00	.63	−.02	CONT6	.23	.01	.30	.46	−.05
PROF5	.06	.07	.03	.06	.65	−.02	CONT7	.19	.08	.18	.50	.06
Cross-loadings	0	0	0	0	0	0		4	1	3	1	2

For the Tripod Survey, parallel analysis suggested extracting five factors (Table 6). The factor structure is also similar to the within-structure in the multilevel analysis, both in terms of the pattern of loadings and their relative magnitude.

Importantly, in both the WCS and the Tripod Survey, analysis of S_T results in a factorial structure that is inconsistent with either the between-classroom level or the between-school level of the corresponding multilevel analysis. This is consistent with other findings (e.g., D’Haenens et al., 2010; Holfve-Sabel & Gustafsson, 2005; Reise et al., 2005) and provides a clear illustration of the methodological consequences of assuming cross-level invariance (e.g., Julian, 2001; Marsh et al., 2012).

What Are the Consequences of Ignoring the Multilevel Structure and Conducting a Factor Analysis on the Unweighted Group Means?

For the WCS, parallel analysis suggested five factors. The patterns of association between the items (Table 7) differ from those at the between level of the multilevel analysis (Table 4). In particular, some Professional Development items associate more strongly with Time items, and some associate more strongly with Facilities and Resources items. In addition, the Teacher Evaluation factor is slightly less distinct. Overall, there is far more cross-loading than in the between level of the multilevel analysis, showing that an analysis of the unweighted group means distorts the factor structure and makes it more difficult to identify.

Table 7

Rotated Factor Loadings: Group-Means Analysis

	Working Conditions Survey					Tripod Survey
	Factor					Factor
Item	1	2	3	4	5	Item	1	2
TIME1	.12	.09	.16	−.19	.41	CAPT1	.41	.45
TIME2	.00	.00	.00	−.05	.88	CAPT2	.78	.08
TIME3	.01	.15	.16	.03	.63	CAPT3	.76	.16
TIME4	−.13	.34	.24	.06	.45	CAPT4	.76	.10
TIME5	−.04	.15	.05	−.02	.82	CARE1	.97	−.17
FACR1	.61	.11	.10	.11	.03	CARE2	.88	−.30
FACR2	.82	.03	.01	.00	−.01	CARE3	.93	−.12
FACR3	.70	.21	.00	−.06	−.05	CHAL1	.66	.03
FACR4	.47	.36	.08	.05	−.09	CHAL2	.59	.10
FACR5	.68	.05	.03	−.08	−.03	CHAL3	.81	.04
FACR6	.63	.05	.08	−.06	.08	CHAL4	.61	.26
FACR7	.48	−.05	.28	−.06	−.02	CHAL5	.72	.16
FACR8	.43	.08	.44	−.02	−.06	CHAL6	.74	.06
DLDR1	.12	.77	−.03	.12	−.05	CHAL7	.75	.07
DLDR2	−.03	.79	.00	.08	.04	CHAL8	.77	.05
DLDR3	.00	.66	.00	−.03	.16	CLAR1	.92	−.02
DLDR4	.11	.58	−.01	.09	.20	CLAR2	.92	−.06
DLDR5	.00	.61	−.03	−.11	.06	CLAR3	.27	.42
DLDR6	.03	.46	.38	−.04	.11	CLAR4	.85	.13
DLDR7	.14	.61	.04	.02	.00	CLAR5	.90	−.01
DLDR8	.10	.57	.07	.14	.05	CONF1	.47	.29
SLDR1	.03	.28	.38	.35	.02	CONF2	.63	−.19
SLDR2	.05	.02	.63	.31	.05	CONF3	.84	−.01
SLDR3	.03	−.06	.97	.00	.04	CONF4	.70	.13
SLDR4	.02	.02	.94	.01	.02	CONF5	.78	.11
SLDR5	−.06	.19	.59	.29	.06	CONS1	.88	−.13
SLDR6	.12	.21	.42	.30	.06	CONS2	.92	−.04
SLDR7	−.02	.04	.02	.96	−.02	CONS3	.58	.26
SLDR8	.00	.05	.01	.95	−.01	CONS4	.85	.00
SLDR9	.03	.01	.12	.84	.03	CONT1	.44	.50
SLDR10	.03	.28	.34	.37	.13	CONT2	.14	.93
PROF1	.52	−.02	−.12	.14	.32	CONT3	.14	.79
PROF2	.28	−.03	.07	.23	.50	CONT4	.01	.94
PROF3	.39	−.11	−.03	.23	.52	CONT5	.58	.44
PROF4	.57	−.07	.00	.20	.29	CONT6	.50	.53
PROF5	.37	−.03	.10	.23	.39	CONT7	.53	.46
Cross-loadings	4	3	6	4	4		5	5

For the Tripod Survey, parallel analysis suggests the extraction of two factors (Table 7). The structure suggested by the analysis of the group-mean correlation matrix is fairly similar to that of the between level of the multilevel analysis. There is still one large factor; however, the control items no longer load as distinctively onto a separate factor.

In summary, in the WCS and Tripod Surveys, analysis of the unweighted group means has the effect of distorting the perceived factorial structure, and leads to inferences that are not consistent with either the within or between level of analysis. This is consistent with theoretical results discussed elsewhere (Preacher et al., 2010). Conceptually, this distortion makes sense. There are at least two distinct sources of bias that are present in this group-means analysis. First, differences in group size are not accounted for, and this may distort the correlation matrix. Second, the correlation matrix of group means contains between and within sources of variance (Muthén, 1994), and to the extent that the between- and within-correlation matrices have different structures, this will also have the effect of distorting inferences about the factorial configuration.
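The second source of bias can be made explicit with a short derivation, offered as a sketch under simplifying assumptions that are not part of the original analysis: balanced groups of common size n, and a decomposition of each response into a grand mean \mu, a group component b_j with covariance \Sigma_B, and an independent individual component w_ij with covariance \Sigma_W. The symbols \mu, b_j, w_ij, \Sigma_B, and \Sigma_W are introduced only for this illustration.

y_{i j} = \mu + b_{j} + w_{i j}, \qquad y_{j} = \frac{1}{n} \sum_{i = 1}^{n} y_{i j} = \mu + b_{j} + \frac{1}{n} \sum_{i = 1}^{n} w_{i j},

\mathrm{Cov}(y_{j}) = \Sigma_{B} + \frac{1}{n} \Sigma_{W} .

Under these assumptions, the covariance matrix of the group means reflects \Sigma_B plus a 1/n share of \Sigma_W rather than \Sigma_B alone, so S_B approximates a blend of the two structures. With unequal group sizes, the within-group share differs from group to group (1/N_j in place of 1/n), which corresponds to the first source of bias noted above.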

How Would a Multilevel Factor Analysis Alter the Policy Conclusions About the Importance of School Leadership in the Intended Departure of Teachers?

The previous analyses provide clear, policy-relevant illustrations of how the assumption of cross-level invariance may be unjustified in empirical data sets. In addition, the previous analyses suggest that in many studies that assume cross-level measurement invariance, inferences about relationships with external variables may be distorted. In other words, evidence of cross-level measurement noninvariance may lead to different substantive inferences and policy conclusions. For example, Ladd (2011) concluded, “Among the working conditions factors, the dominant factor, by far, is the quality of leadership” (p. 256). This conclusion was based on an analysis that assumes cross-level measurement invariance. Would conclusions similar to Ladd’s hold if the cross-level noninvariance found in the MEFA were explicitly modeled?

Based on the 2008 WCS data, when cross-level measurement invariance is assumed, and the group means of the factors found by analyzing the disaggregated correlation matrix S_T are used as predictors, School Leadership still emerges as the dominant working conditions factor (Table 8). Even with a large number of conditioning covariates, the coefficient of School Leadership suggests that the quality of leadership “protects against intended teacher departures” (Ladd, 2011, p. 248). Thus, when assuming cross-level measurement invariance, the substantive inferences and policy conclusions are consistent with those found by Ladd (2011).

Table 8

Linear Probability Models: Working Conditions Survey

	Invariance assumed		Noninvariance
	Coefficient	SE	Coefficient	SE
Working conditions
Time	−.005	.008	.001	.006
Facilities	−.016	.007*	−.012	.007
Distributed leadership	.000	.007	−.066	.009***
School leadership	−.086	.011***	−.021	.007**
Teacher evaluation	.003	.008	−.001	.007
Professional development	.009	.008	NA	NA
Teacher characteristics
Black or African American teacher	.067	.012***	.065	.012***
Hispanic or Latino teacher	.009	.025	.009	.025
Other teacher	.068	.018***	.067	.018***
Male teacher	.062	.013***	.062	.013***
Teacher experience
2 to 3 years	.062	.015***	.062	.015***
4 to 6 years	.055	.014***	.055	.014***
7 to 10 years	.024	.015	.023	.015
11 to 20 years	−.032	.014*	−.032	.014*
>20 years	.007	.015	.007	.015
Has a graduate degree	.042	.009***	.042	.009***
Nationally Board Certified	.015	.009†	.015	.009†
Trained in a master’s program	.000	.011	.000	.011
Alternative training program	−.009	.013	−.008	.013
School characteristics
% Black or African American students	.028	.006***	.028	.006***
% Hispanic or Latino students	.002	.005	.002	.005
% free/reduced lunch	−.001	.007	−.002	.006
% teachers 0 to 3 years experience	.633	.827	.797	.819
% teachers 4 to 10 years experience	.513	.690	.651	.683
% teachers 11+ years experience	.735	.998	.935	.988
AYP met in 2007	.009	.009	.007	.009
Growth met in 2007	−.006	.009	−.006	.009
Log of school membership	.008	.006	.007	.005
New administrator	.023	.009*	.023	.009*
Geographic indicators
Wake County LEA	.026	.014†	.029	.014*
Guilford County LEA	−.054	.016***	−.055	.016***
Cumberland County LEA	−.011	.022	−.016	.021
Winston-Salem/Forsyth LEA	−.054	.015***	−.053	.015***
Charlotte-Mecklenburg LEA	.006	.025	−.002	.025
R²	.066		.067

Note. NA = not applicable; AYP = adequate yearly progress; LEA = local education agency.

†Significant at .1 level. *Significant at .05 level. **Significant at .01 level. ***Significant at .001 level.

However, when the same linear probability model is estimated with the factors that emerge from the model that incorporates cross-level measurement noninvariance, a different set of conclusions is reached. Importantly, the Distributed Leadership factor emerges as the dominant factor in “protecting against intended departures” (Table 8).

This is consistent with existing research that shows that teacher reports of job satisfaction are positively related to shared responsibility and collaboration in schools (The MetLife Survey of the American Teacher, 2009) and that distributed leadership is positively related to school improvement (e.g., Muijs & Harris, 2003). This finding also marks a critical difference from the analysis based on the assumption of cross-level invariance. In that analysis, the coefficient of the Distributed Leadership factor is not statistically significant.

The factor-score correlations offer further evidence of how multilevel factor analysis can alter policy conclusions (Table 9). First, factor-score correlation patterns can differ; for example, the associations between a school’s Time score and the other working conditions dimensions are systematically weaker for the model that incorporates cross-level measurement noninvariance than for the model that assumes cross-level measurement invariance. Second, there is some variation in the strength of association between factor scores across the two methods (i.e., the relationship between a factor derived from the MEFA and the corresponding factor derived from the single-level EFA). The weakest correlation (.87) is for a school’s Distributed Leadership score. While this is still a substantial relationship, it is also sufficiently low to yield different substantive conclusions about the importance of Distributed Leadership in “protecting against intended teacher departures,” depending on which analytical approach is used.

Table 9

Factor-Score Correlations: Multilevel and Disaggregated Analyses

	Time	Facilities	Distributed leadership	School leadership	Teacher evaluation	Professional development
Time	.97	.72	.75	.74	.62	.80
Facilities	.68	.95	.67	.64	.56	.77
Distributed leadership	.64	.69	.87	.81	.69	.74
School leadership	.54	.57	.83	.92	.88	.73
Teacher evaluation	.52	.61	.87	.75	.99	.66
Professional development	—	—	—	—	—	—

Note. All correlations are significant at .001. Lower triangle = correlations of factor scores obtained from multilevel analysis. Upper triangle = correlations of factor scores obtained from disaggregated analysis, assuming cross-level measurement invariance. Diagonal = factor score correlations across the two methods.

Overall, the analysis based on an assumption of invariance suggests that Distributed Leadership is not an important predictor of planned teacher departure, whereas the analysis that incorporates cross-level noninvariance suggests that Distributed Leadership is an important consideration. This contrast illustrates the potential inferential consequences of assuming invariance.

Summary

Although the importance of testing cross-level measurement invariance has been well known in methodological research for nearly a quarter century, analytic methods that assume cross-level invariance are still widely used in the educational policy literature, particularly with regard to school and classroom climate variables that are based on aggregated survey responses. It is common to find studies that make policy recommendations based on single-level factor analyses that ignore the clustered, hierarchical structure of the data, and use linear composites to create individual scores. This article used two examples to investigate whether there is empirical evidence to support the assumption of cross-level measurement invariance, and whether using factor analytic techniques that assume cross-level invariance would influence the analysis of empirical data. The results reflect some general patterns that are worth noting here.

There Can Be Significant Differences in Factorial Structure Across Levels

In these two empirical examples, fewer factors were found at the between-group level than the within-group level of analysis. In the case of the WCS, the multilevel analysis suggested six within-school factors and five between-school factors. In the case of the Tripod Survey, the differences in factorial structure are even greater. While there is support for five factors at the within-classroom level, there is only support for two factors at the between-classroom level.

This exploratory analysis may, as Cronbach (1976) suggested, lead to the articulation of a specific (and independent) theory for constructs that exist and are distinguishable between groups (school or classroom). For example, in the WCS, there are five dimensions of school climate that are distinguishable based on aggregated survey responses. For the Tripod Survey, there are two dimensions of classroom environment that are distinguishable based on aggregated survey responses. This, in fact, is consistent with other factor analyses conducted on the Tripod data, which found that the items from the “Five Support C’s” (Conferring, Consolidating, Captivating, Caring, Clarifying) and Challenge load onto one factor as an “amorphous group” (Ferguson, 2010, p. 6).

Analysis of the Total Correlation Matrix Can Distort Perception of the Between-Level Factorial Structure

The factor analyses of the total correlation matrix did not reliably recover the between-level structure for either survey. In both cases, the structure that was identified bore a strong resemblance to the within structure identified in the multilevel analysis. Because fewer factors were identified at the between level, relying on the total correlation matrix can lead to an individualistic fallacy (Alker, 1969), in which phenomena that occur between individuals are assumed to occur between groups.
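One way to see why the total matrix tends to mirror the within-level structure is the standard sums-of-squares identity, under which the total covariance matrix is a weighted blend of the pooled-within and between-group covariance matrices. The Python sketch below is illustrative only: the function name `decompose_covariance`, the simulated data, and the sample sizes are assumptions introduced here and are not taken from the analyses reported in this article.

```python
import numpy as np

def decompose_covariance(X, groups):
    """Return (total, pooled-within, between) covariance matrices.

    X is an (N, p) array of item responses; groups holds N group labels.
    """
    X = np.asarray(X, dtype=float)
    labels, idx = np.unique(np.asarray(groups), return_inverse=True)
    N, p = X.shape
    G = labels.size
    sizes = np.bincount(idx)

    # Group means, one row per group.
    sums = np.zeros((G, p))
    np.add.at(sums, idx, X)
    means = sums / sizes[:, None]

    within_dev = X - means[idx]            # respondent minus own group mean
    between_dev = means - X.mean(axis=0)   # group mean minus grand mean

    S_pw = within_dev.T @ within_dev / (N - G)
    S_b = (between_dev * sizes[:, None]).T @ between_dev / (G - 1)
    S_t = np.cov(X, rowvar=False)
    return S_t, S_pw, S_b

# Simulated example (hypothetical data): 50 groups of 20 respondents, 4 items.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(50), 20)
X = rng.normal(size=(50, 4))[groups] + rng.normal(size=(1000, 4))

S_t, S_pw, S_b = decompose_covariance(X, groups)
N, G = X.shape[0], 50
w_within = (N - G) / (N - 1)    # about 0.95 with these sample sizes
w_between = (G - 1) / (N - 1)   # about 0.05
blend = w_within * S_pw + w_between * S_b
print(np.allclose(S_t, blend))  # the identity holds up to floating-point rounding
```

When there are many more respondents than groups, the weight on the pooled-within matrix is close to 1, so a factor analysis of the total matrix is dominated by within-group covariation.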

Analysis of the Group-Mean Correlation Matrix Can Distort Perception of the Between-Level and Within-Level Factorial Structures

The factor analysis on the unweighted group-mean correlation matrix yielded results that were not consistent in factorial structure with any of the other analyses. While this analysis did suggest five factors for the WCS, the patterns of loadings differed from those in both the disaggregated analysis and the multilevel analysis. In the case of Tripod, two factors were identified, but again the patterns of association were not consistent with the between level of the multilevel analysis.
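A short derivation may help explain why observed group means need not reproduce the between-level structure. It is a sketch under stated assumptions rather than a result reported in this article: a standard two-level decomposition with independent between and within components, and a balanced design with n respondents per group. The symbols \Sigma_B and \Sigma_W are notation introduced here for the between-level and within-level covariance matrices.

```latex
\begin{align}
y_{ig} &= \mu + \eta_g + \varepsilon_{ig}, \qquad
\operatorname{Cov}(\eta_g) = \Sigma_B, \quad
\operatorname{Cov}(\varepsilon_{ig}) = \Sigma_W, \\
\bar{y}_g &= \frac{1}{n} \sum_{i=1}^{n} y_{ig}
\quad\Longrightarrow\quad
\operatorname{Cov}(\bar{y}_g) = \Sigma_B + \frac{1}{n}\,\Sigma_W .
\end{align}
```

Under these assumptions, the covariance matrix of the observed group means carries a share of the within-level covariance, and that share grows as groups get smaller. A factor analysis of group means can therefore reflect a mixture of the two levels rather than the between-level structure alone.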

Inferences About Relationships Between School or Classroom Climate and Policy-Relevant Variables May Differ Under the Assumption of Cross-Level Invariance

Linear probability models based on invariance yielded substantively different inferences than those based on noninvariance. Specifically, Distributed Leadership, which did not emerge as an important factor in “protecting against intended departure” in the model assuming cross-level invariance, was the most important factor in the model based on cross-level noninvariance. The identification of shared leadership as an important working conditions factor could lead to a different set of policy recommendations and could inform a different set of interventions.
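For readers less familiar with the term, a linear probability model is an ordinary least squares regression of a binary outcome (here, intended departure coded 0 or 1) on the predictors, so each slope is read as a change in the probability of the outcome. The sketch below is a minimal illustration with made-up numbers. The function name `fit_linear_probability_model` is hypothetical, and the sketch ignores clustering of teachers within schools and omits standard errors, so it should not be read as the specification used in this article.

```python
import numpy as np

def fit_linear_probability_model(scores, outcome):
    """OLS of a 0/1 outcome on predictor scores.

    Returns the intercept followed by one slope per column of `scores`.
    """
    scores = np.asarray(scores, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    design = np.column_stack([np.ones(len(outcome)), scores])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

# Made-up example: one factor score and a binary outcome for four teachers.
toy_scores = np.array([[0.0], [1.0], [2.0], [3.0]])
toy_outcome = np.array([0, 0, 1, 1])
toy_coef = fit_linear_probability_model(toy_scores, toy_outcome)
print(toy_coef)  # intercept -0.1, slope 0.4
```

Fitting the same model twice, once with factor scores from an analysis that assumes invariance and once with scores from a multilevel analysis, is the kind of comparison that can yield the differing inferences described above.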

Conclusion

The results of this study have direct implications and raise important questions for applied research and policy. Factor analysis is commonly used for rank reduction. Based on the results of a factor analysis, linear composites are created that act as proxies for factors and that may be interpreted directly or included in a range of predictive or inferential statistical analyses. In this kind of analysis, depending on which correlation matrix was analyzed, there may be evidence for completely different linear composites. These composites differ not only in the number of included items but also in the way they would be defined and articulated. This means that, depending on which factor analysis was conducted, different qualities of school or classroom environment would be defined, and entirely different sets of relationships would be explored.
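The dependence of composites on the chosen factor solution can be made concrete with a small sketch. It assumes a common convention in which each item is assigned to the factor on which it loads most strongly, provided the loading clears a cutoff, and composites are unit-weighted item means. The cutoff of 0.4, the function names, and both loading matrices are illustrative assumptions, not values from this article.

```python
import numpy as np

def assign_items(loadings, threshold=0.4):
    """Map each factor index to the items whose strongest loading is on it."""
    loadings = np.abs(np.asarray(loadings, dtype=float))
    strongest = loadings.argmax(axis=1)
    salient = loadings.max(axis=1) >= threshold
    return {
        k: [i for i in range(loadings.shape[0]) if salient[i] and strongest[i] == k]
        for k in range(loadings.shape[1])
    }

def unit_weighted_composites(X, assignment):
    """Average the assigned items for each factor (unit weights)."""
    X = np.asarray(X, dtype=float)
    return {k: X[:, items].mean(axis=1) for k, items in assignment.items() if items}

# Two hypothetical solutions for the same four items.
two_factor = [[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]]
one_factor = [[0.7], [0.6], [0.8], [0.5]]

print(assign_items(two_factor))  # {0: [0, 1], 1: [2, 3]}
print(assign_items(one_factor))  # {0: [0, 1, 2, 3]}

responses = np.array([[1, 2, 3, 4], [4, 4, 2, 2]])
print(unit_weighted_composites(responses, assign_items(two_factor)))
```

The same item responses produce two composites under the first solution and a single composite under the second, so any downstream analysis would be examining differently defined constructs.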

Improperly constructed linear composites make appropriate theory testing difficult if not impossible, with important implications that are not only methodological but also eminently practical. If an intervention targeted at improving retention is found not to have the desired effects, for example, it would be impossible to disentangle “theory failure” from “implementation failure” (Raudenbush & Sadoff, 2008). In other words, it would be impossible to determine if an intervention designed to improve teacher retention failed because it was ill-conceived and based on a faulty model of teacher mobility, or if it failed because the theory was sound but the intervention was implemented poorly. In the first case, policy should address the articulation of a better theory of teacher mobility. In the second case, policy should address mechanisms to support proper implementation.

Footnotes

Acknowledgements

The author is grateful to Joan Herman, Jia Wang, and Noelle Griffin for their support; and to José-Felipe Martinez, Li Cai, and Peter Bentler for their valuable advice and feedback. The author is also grateful to the North Carolina Education Research Data Center for part of the data used in this research.

Author’s Note

The findings and opinions expressed in this report are those of the author and do not necessarily reflect the positions or policies of the Bill and Melinda Gates Foundation or the U.S. Department of Education.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research for this article was supported in part by Grant 52306 from the Bill and Melinda Gates Foundation with funding to the National Center for Research on Evaluation, Standards, and Student Testing (CRESST). Part of this research was made possible by a predoctoral advanced quantitative methodology training Grant (#R305B080016) awarded to University of California, Los Angeles (UCLA) by the Institute of Education Sciences of the U.S. Department of Education.

Author

JONATHAN SCHWEIG is a doctoral student at the University of California, Los Angeles. His research focuses on multilevel modeling, teacher evaluation, and the measurement of classroom environments and processes.

References

Alker H. R. (1969). A typology of ecological fallacies. In Dogan (Ed., Stein, Series Ed.), Quantitative ecological analysis in the social sciences (pp. 69–86). Cambridge, MA: MIT Press.

Alwin D. F. (1973). The use of factor analysis in the construction of linear composites in social research. Sociological Methods and Research, 2, 191–214.

Balch R. T. (2012). The validation of a student survey on teacher practice (Unpublished doctoral dissertation). Graduate School of Vanderbilt University, Nashville, TN.

Bill & Melinda Gates Foundation. (2010). Learning about teaching: Initial findings from the measures of effective teaching project. Seattle, WA: Author.

Bliese P. D. (2000). Within-group agreement, non-independence, and reliability: Implications for data aggregation and analyses. In Klein K. J., Kozlowski S. W. J. (Eds.), Multilevel theory, research, and methods in organizations: Foundations, extensions, and new directions (pp. 349–381). San Francisco, CA: Jossey-Bass.

Bollen K. A. (1989). Structural equations with latent variables. New York, NY: Wiley.

Bollen K. A., Bauldry (2011). Three Cs in measurement models: Causal indicators, composite indicators, and covariates. Psychological Methods, 16, 265–284.

Bollen K. A., Lennox (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110, 305–314.

Briggs N. E., MacCallum R. C. (2003). Recovery of weak common factors by maximum likelihood and ordinary least squares estimation. Multivariate Behavioral Research, 38, 25–56.

Browne M. W., MacCallum R. C., Kim C. T., Andersen B. L., Glaser (2002). When fit indices and residuals are incompatible. Psychological Methods, 7, 403–421.

Butrymowicz S. A. (2012, May 14). Student surveys for children as young as 5 years old may help rate teachers. The Washington Post. Retrieved from http://www.washingtonpost.com/local/education/student-surveys-may-help-rate-teachers/2012/05/11/gIQAN78uMU_story.html

Chan (1998). Functional relations among constructs in the same content domain at different levels of analysis: A typology of composition models. Journal of Applied Psychology, 83, 234–246.

Crawford C. B., Koopman (1973). A note on Horn’s test for the number of factors in factor analysis. Multivariate Behavioral Research, 8, 117–125.

Cronbach L. J. (with Deken J. E., Webb N.). (1976, July). Research on classrooms and schools: Formulation of questions, design and analysis. Occasional Paper of the Stanford Evaluation Consortium, Stanford University, Stanford, CA.

Cronbach L. J., Webb (1975). Between-class and within-class effects in a reported aptitude × treatment interaction: Reanalysis of a study by G. L. Anderson. Journal of Educational Psychology, 67, 717–724.

Croon M. A., van Veldhoven M. J. (2007). Predicting group-level outcome variables from variables measured at the individual level: A latent variable multilevel model. Psychological Methods, 12, 45–57.

D’haenens, Van Damme, Onghena (2010). Multilevel exploratory factor analysis: Illustrating its surplus value in educational effectiveness research. School Effectiveness and School Improvement, 21, 209–235.

Dinno (2012). paran: Horn’s Test of Principal Components/Factors [R package version 1.5.1]. Retrieved from http://CRAN.R-project.org/package=paran

DiStefano, Monrad D. M., May R. J., Smith, Gay, Mindrila, Rawls (2008, March). Parent, student, and teacher perceptions of school climate: Investigations across organizational levels. Paper presented at the annual meeting of the American Educational Research Association, New York, NY.

Fabrigar L. R., Wegener D. T., MacCallum R. C., Strahan E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272–299.

Ferguson (2010, October 14). Student perceptions of teaching effectiveness. Discussion brief from the National Center for Teacher Effectiveness and the Achievement Gap Initiative, Harvard University, Cambridge, MA.

Floyd F. J., Widaman K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7, 286–299.

Ford J. C., MacCallum R. C., Tait (1986). The application of exploratory factor analysis in applied psychology: A critical review and analysis. Personnel Psychology, 39, 291–314.

Harnqvist (1978). Primary mental abilities at collective and individual levels. Journal of Educational Psychology, 70, 706–716.

Harris (2003). Teacher leadership and school improvement. In Harris, Day, Hopkins, Hadfield, Hargreaves, Chapman (Eds.), Effective leadership for school improvement (pp. 72–83). London, England: Routledge.

Harris (2004). Distributed leadership and school improvement: Leading or misleading? Educational Management Administration & Leadership, 32, 11–24.

Hayton J. C., Allen D. G., Scarpello (2004). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7, 191–205.

Holfve-Sabel, Gustafsson (2005). Attitudes towards school, teacher, and classmates at classroom and individual levels: An application of two-level confirmatory factor analysis. Scandinavian Journal of Educational Research, 49, 187–202.

Horn J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179–185.

Hoy W. K., Clover S. I. R. (1986). Elementary school climate: A revision of the OCDQ. Educational Administration Quarterly, 22, 93–110.

Humphreys L. G., Montanelli R. G. Jr. (1975). An investigation of the parallel analysis criterion for determining the number of common factors. Multivariate Behavioral Research, 10, 193–205.

Ingersoll (2001). Teacher turnover and teacher shortages: An organizational analysis. American Educational Research Journal, 38, 499–534.

Jöreskog K. G. (1999). How large can a standardized coefficient be? Retrieved from http://www.ssicentral.com/lisrel/techdocs/HowLargeCanaStandardizedCoefficientbe.pdf

Julian M. W. (2001). The consequences of ignoring multilevel data structures in nonhierarchical covariance modeling. Structural Equation Modeling, 8, 325–352.

Ladd (2011). Teachers’ perceptions of their working conditions: How predictive of planned and actual teacher movement? Educational Evaluation and Policy Analysis, 33, 235–261.

Lee S. Y. (1990). Multilevel analysis of structural equation models. Biometrika, 77, 763–772.

Loeb, Darling-Hammond, Luczak (2005). How teaching conditions predict teacher turnover in California schools. Peabody Journal of Education, 80, 44–70.

Longford N. T., Muthén B. O. (1992). Factor analysis for clustered observations. Psychometrika, 57, 581–597.

Lüdtke, Marsh H. W., Robitzsch, Trautwein (2011). A 2 × 2 taxonomy of multilevel latent contextual models: Accuracy and bias trade-offs in full and partial error-correction models. Psychological Methods, 16, 444–467.

Marsh H. W., Lüdtke, Nagengast, Trautwein, Morin A. J., Abduljabbar A. S., Köller (2012). Classroom climate and contextual effects: Conceptual and methodological issues in the evaluation of group-level effects. Educational Psychologist, 47, 106–124.

Marsh H. W., Lüdtke, Robitzsch, Trautwein, Asparouhov, Muthén (2009). Doubly-latent models of school contextual effects: Integrating multilevel and structural equation approaches to control measurement and sampling error. Multivariate Behavioral Research, 44, 764–802.

McDonald R. P., Goldstein (1989). Balanced versus unbalanced designs for linear structural relations in two-level data. British Journal of Mathematical and Statistical Psychology, 42, 215–232.

Meredith (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543.

The MetLife Survey of the American Teacher: Collaborating for Student Success. (2009). Retrieved from https://www.metlife.com/assets/cao/contributions/foundation/american-teacher/MetLife_Teacher_Survey_2009_Part_1.pdf

Mihaly, McCaffrey D. F., Staiger D. O., Lockwood J. R. (2013). A composite estimator of effective teaching. Seattle, WA: Bill & Melinda Gates Foundation.

Muijs, Harris (2003). Teacher leadership—Improvement through empowerment? An overview of the literature. Educational Management Administration & Leadership, 31, 437–448.

Muthén B. O. (1991). Multilevel factor analysis of class and student achievement components. Journal of Educational Measurement, 28, 338–354.

Muthén B. O. (1994). Multilevel covariance structure analysis. Sociological Methods & Research, 22, 376–398.

Muthén B. O., Asparouhov (2011). Beyond multilevel regression modeling: Multilevel analysis in a general latent variable framework. In Hox, Roberts J. K. (Eds.), Handbook of advanced multilevel analysis (pp. 15–40). New York, NY: Taylor & Francis.

Muthén B. O., Muthén L. K. (2010). Mplus (Version 6.11) [Computer software]. Los Angeles, CA: Author.

New Teacher Center. (2009). Validity and reliability of the North Carolina teacher working conditions survey. Santa Cruz, CA: Author.

New York City School Survey. (n.d.). Retrieved from http://schools.nyc.gov/Accountability/tools/survey/default.htm

Preacher K. J., Zyphur M. J., Zhang (2010). A general multilevel SEM framework for assessing multilevel mediation. Psychological Methods, 15, 209–233.

Rabe-Hesketh, Skrondal, Zheng (2007). Multilevel structural equation modeling. In Lee S.-Y. (Ed.), Handbook of latent variable and related models (pp. 209–227). Amsterdam, Netherlands: Elsevier.

Raudenbush S. W., Bryk A. S. (2002). Hierarchical linear models (2nd ed.). Newbury Park, CA: SAGE.

Raudenbush S. W., Sadoff (2008). Statistical inference when classroom quality is measured with error. Journal of Research on Educational Effectiveness, 1, 138–154.

Reise R. P., Ventura, Nuechterlein K. H., Kim K. H. (2005). An illustration of multilevel factor analysis. Journal of Personality Assessment, 84, 126–136.

Ryan A. M., Patrick (2001). The classroom social environment and changes in adolescents’ motivation and engagement during middle school. American Educational Research Journal, 28, 437–460.

Schmitt T. A. (2011). Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment, 29, 304–321.

Skrondal, Laake (2001). Regression among factor scores. Psychometrika, 66, 563–575.

Toland M. D., De Ayala R. J. (2005). A multilevel factor analysis of students’ evaluations of teaching. Educational and Psychological Measurement, 65, 272–296.

Van de Vijver F. J., Poortinga Y. H. (2002). Structural equivalence in multilevel research. Journal of Cross-Cultural Psychology, 33, 141–156.

Zyphur, Kaplan, Christian (2008). Assumptions of cross-level measurement and structural invariance in the analysis of multilevel data: Problems and solutions. Group Dynamics: Theory, Research, and Practice, 12, 127–140.