Abstract
Teacher data-driven decision making (DDDM) is a professional practice of great prominence in the current K-12 education system. Moreover, teacher self-efficacy and anxiety around DDDM represent important measurement targets in both research and practice. This study consequently examined the validity, reliability, and invariance of data collected via the Data-Driven Decision-Making Efficacy and Anxiety (3D-MEA) Inventory among U.S. in-service (N = 365) and pre-service (N = 457) teachers. The 3D-MEA is intended to measure four dimensions of self-efficacy related to DDDM, as well as anxiety related to DDDM. Multi-group confirmatory factor analyses established structural measurement invariance, which supports meaningful and effectively interchangeable interpretation of 3D-MEA scores with both in-service and pre-service teachers. Reliability estimates were also high in both populations. Limitations and future directions are discussed.
At the same time as the world’s technological capacity to gather, store, and share information grew exponentially, the federal No Child Left Behind (NCLB) legislation greatly expanded the collection of academic data on students in the U.S. (NCLB, 2002). Both the culture of accountability in education and access to data of many kinds have also expanded in response to other federal legislation and initiatives, including the Individuals with Disabilities Education Improvement Act (IDEIA, 2004), the American Recovery and Reinvestment Act (ARRA, 2009), the Race to the Top competition, and the Every Student Succeeds Act (ESSA, 2015). While technology and policy have enabled or prioritized examination and use of education data, the day-to-day responsibility for analyzing, interpreting, and using these data to make decisions that improve teaching and learning has predominantly fallen on classroom teachers (Mandinach & Gummer, 2016).
Unfortunately, teachers’ data literacy and capacity to engage in data-driven decision making (DDDM) have not necessarily kept pace with the growth in education data technology or access to education data themselves (Means et al., 2010, 2011). Online diagnostic assessments, student learning and intervention platforms, extant data warehouses, standards-based official assessments, and the like provide diverse, immense, and sometimes continuous streams of data to educators at different time points and on different time scales. The COVID-19 pandemic, and the resultant use of new digital systems and tools for remote learning, which often produce their own data as well, have further amplified the need for teachers to be proficient in using data to adapt their teaching to the needs of their students (Dorn et al., 2020; Kaffenberger, 2021).
Prior research indicates that effective and responsible engagement in DDDM is difficult for both in-service teachers (i.e., practicing teachers) and pre-service teachers (i.e., those training to become teachers) (Dunn et al., 2013; Wayman & Jimerson, 2014). Two noted factors that can influence how effectively educators use data to make decisions are teacher self-efficacy for DDDM and teacher anxiety for DDDM. Teacher self-efficacy for DDDM is a teacher’s belief in their ability to use DDDM in their classroom, and teacher anxiety for DDDM is the stress and anxiety a teacher experiences when engaging in this practice (Authors, 2016; Dunn et al., 2013). Given the challenges of in-service and pre-service teacher self-efficacy and anxiety around DDDM, and the continued emphasis on this practice, proper measurement of these constructs in both populations is essential in research and practice.
Dunn et al. (2013) developed the Data-Driven Decision-Making Efficacy and Anxiety (3D-MEA) Inventory as a self-report instrument to measure these important constructs and their sub-domains using 20 items and an agreement response format. Past research on the 3D-MEA found strong evidence for the quality of the instrument with both pre-service teachers and in-service teachers (Authors, 2018; 2020; Dunn et al., 2013). While evidence from separate evaluations of the 3D-MEA with these two populations is highly favorable, the current study more formally investigates the functioning of the instrument across pre-service teachers and in-service teachers. It is possible that the 3D-MEA may not function equivalently for in-service teachers and pre-service teachers given that the former as a whole are definitionally more experienced in the education field, whereas the latter are novices by definition; such differences in experience might affect these groups’ respective interpretations of, and thus responses to, 3D-MEA items. Also, as noted by Authors (2020), several 3D-MEA items reference the respondent’s “district,” which may complicate the item response process and introduce measurement error for respondents who have no district affiliation, including pre-service teachers.
Accordingly, the main purpose of this study is to provide evidence of the reliability, validity, and invariance of 3D-MEA scores among pre-service teachers and in-service teachers. Specifically, the present study addressed the following research questions: (1) To what extent do 3D-MEA inventory scores exhibit the same factorial measurement structure for pre-service teachers and in-service teachers? (2) To what extent do 3D-MEA inventory scores exhibit internal consistency (reliability) among both pre-service teachers and in-service teachers?
The ability to estimate self-efficacy or anxiety related to DDDM accurately among both pre-service and in-service teachers is critical in both research and practice. These estimates can inform decisions about the scope or nature of learning opportunities, scaffolds, or supports to offer either of these populations to advance their DDDM capacity. There are also situations in which a measure of such constructs might be used simultaneously with both pre-service and in-service teachers, necessitating cross-population invariance in the functioning of the instrument. Researchers or teacher educators may be interested in comparing these two populations vis-a-vis their DDDM self-efficacy or anxiety or sub-domains thereof. For example, researchers may ask whether traditionally younger, pre-service teachers have higher self-efficacy with respect to technology-related DDDM tasks than in-service teachers. To understand such differences well, the instrument needs to be psychometrically invariant for these groups. Others might be interested in tracking developmental changes in DDDM self-efficacy or anxiety during the transition from pre-service teacher to in-service teacher, which also necessitates measurement invariance. The establishment of measurement invariance would help allay concerns in any such studies that observed differences or changes were not real but instead artifactual.
Theoretical Framework and Literature Review
The Nature of Self-Efficacy and Anxiety
Research and theory on the nature of self-efficacy and anxiety in general, and on self-efficacy and anxiety related to teaching and data-driven decision making in particular, informed our analysis. Bandura’s (1977) social cognitive theory was the initial basis for the development of the 3D-MEA instrument (Dunn et al., 2013). Bandura’s (1977, 1994) theory described a cognitive system wherein one’s ability to be successful at new tasks is partly determined by their self-efficacy and anxiety for those tasks.
Teacher self-efficacy beliefs generally were defined by Tschannen-Moran and Hoy (2001, p. 783) as “a judgement of his or her capabilities to bring about desired outcomes of student engagement and learning.” Research indicates that teachers with a strong sense of self-efficacy are more likely to implement new practices (such as DDDM) than teachers with more limited self-efficacy (Guskey, 1988). Conversely, concerns, feelings of anxiety, and stress can hinder a teacher’s ability to take up new innovations or implement new practices. Because of the significant impact they can have on teacher practices, researchers of teacher practices, including DDDM practices, remain invested in exploring task-specific teacher self-efficacy and anxiety.
Definitions of data-driven decision-making efficacy and anxiety latent variables.
Note. DDDM = data-driven decision making.
Data-Driven Decision-Making Efficacy and Anxiety Reliability and Validity Evidence
Prior evidence on the psychometric properties of the 3D-MEA also informed this research. Four published papers have specifically examined the psychometric properties of the data derived from the 3D-MEA. In general, these four studies emphasized internal consistency-related reliability evidence and internal score structure-based validity evidence. Several smaller-scale substantive studies have also reported data that technically constitute or may be considered reliability and/or validity evidence (e.g., Authors, 2017; Green et al., 2016), but given their small scale, we emphasize in this section the four larger-scale studies specifically aimed at investigating the 3D-MEA vis-à-vis score reliability and validity.
Dunn et al. (2013) reported considerable reliability and validity evidence for the 3D-MEA in the paper in which the instrument was first published. In this first psychometric evaluation of the 3D-MEA, the authors used data from 1728 in-service teachers in the U.S. Pacific Northwest region. They first used exploratory factor analysis and then cross-validated the findings in a separate sample using confirmatory factor analysis, concluding that a five-factor model (with four self-efficacy dimensions and one anxiety dimension) accounted well for the observed 3D-MEA data. Authors (2018) then independently conducted a confirmatory factor analysis of 3D-MEA data collected from 365 Illinois in-service teachers, finding a good fit of the same five-factor model and high score reliability.
Two groups of researchers have also formally investigated research questions about 3D-MEA score validity and reliability with pre-service teachers. Dunn et al. (2020) used exploratory factor analysis to examine the internal score structure of 3D-MEA data collected from 602 pre-service teachers over multiple semesters from a course in a single teacher-education institution. The authors hypothesized that the relative inexperience of pre-service teachers with DDDM would likely mean that self-efficacy would be a unidimensional construct for this population (rather than four separate DDDM self-efficacy dimensions), and this is indeed what they found.
Around the same time, Authors (2020) used confirmatory factor analysis to test the original five-factor structure of 3D-MEA scores with a somewhat smaller, but still adequately sized, and potentially much more heterogeneous sample of pre-service teachers. The authors found an excellent fit of the five-factor model in both training and validation sub-samples in their study. While these findings diverge from those of Dunn et al. (2020), this may be due to differences in sampling. Authors’ (2020) study featured pre-service teachers from at least 38 states and 132 pre-service institutions, and thus may have provided a more heterogeneous sample for testing the 3D-MEA’s score structure.
The findings of prior studies have provided favorable evidence for the hypothesized 3D-MEA score structure and score reliability among pre-service teachers and in-service teachers. Despite these separate 3D-MEA psychometric evaluations with these two populations, however, the literature lacks a formal test of the 3D-MEA scores’ invariance across these two important populations with which it might be used jointly (e.g., to compare, for measurement of change). Consequently, the present study aimed to provide evidence of the reliability, validity, and invariance of 3D-MEA scores across both pre-service teachers and in-service teachers.
Method
Data Source
The present study secondarily analyzed data collected from pre-service and in-service teachers between Spring 2014 and Spring 2018. We leveraged data gathered in the context of a research program related to pre- and in-service teachers’ data literacy, including various survey research, intervention, and psychometric studies (e.g., Authors, 2018, 2020). Some of the intervention studies during which data were gathered involved both pretest and posttest data collection, and when data from both time points were available, the relevant pretest data were used. All data were collected via Qualtrics survey software. The authors obtained Institutional Review Board approval for all studies from which these data were sourced, as well as for the present secondary analysis.
Instrumentation
Item descriptive statistics.
Notes. PST = pre-service teachers. IST = in-service teachers. Response format was: 1 = Strongly disagree, 2 = Disagree, 3 = Neither agree nor disagree, 4 = Agree, and 5 = Strongly agree.

Fitted confirmatory factor analytic model. Notes. Graphical representation of the five-factor model estimated from Model 4. The confirmatory factor model featured five inter-correlated latent variables, each inferred from between three and six indicator variables.
Participants
The present study focused on 3D-MEA score properties in two populations: pre-service teachers and in-service teachers. The pre-service teacher sample comprised 457 pre-service teachers from 132 different teacher-education institutions and 38 different states in the U.S. Based on available data, the pre-service teacher sample was 92% female and 8% male. The pre-service participants reported the following race/ethnicity distribution: 87% White, 5% two or more races, 4% Black or African American, 3% Asian, 1% American Indian or Alaska Native, and 0.23% Native Hawaiian or other Pacific Islander; and 10% identified as Hispanic/Latino. The mean age of the pre-service teachers was 24.89 years old (SD = 6.78). The pre-service teachers reported that they intended to teach in various school levels: 86% grades K-5 elementary, 5% grades 6–8 middle school, 5% grades 9–12 high school, 4% multiple school levels, and 0.44% pre-kindergarten. A large majority of the pre-service teacher sample thus reported intending to teach elementary students.
The in-service teacher sample comprised 365 participants from 76 different schools in one Midwestern U.S. state (Illinois). Based on available data, the in-service teachers’ gender distribution was as follows: 79% female, 21% male. The in-service teachers reported their races and ethnicities as 98% White, 1% two or more races, 1% Asian, 0.33% American Indian or Alaska Native, and 0.33% Black or African American; and 3% identified as Hispanic/Latino. The mean age of the in-service teacher sample was 39.78 years old (SD = 11.27). The in-service sample represented teachers of various levels, including pre-kindergarten (1%) teachers, grades K-5 elementary (32%) teachers, grades 6–8 middle school (40%) teachers, grades 9–12 high school (23%) teachers, and teachers of multiple school levels (3%). The average years of teaching experience was 13.40 (SD = 8.97).
The sizes of the pre-service and in-service teacher samples were deemed adequate based on evidence available from prior simulation studies related to factor analytic techniques and invariance testing therein. However, the reader will note that precision in such studies is complex and conditional on myriad factors such as the size of item communalities and the extent of factor overdetermination (Cheung & Rensvold, 2002; MacCallum et al., 1999; Meade & Bauer, 2007).
Analytic Approach
The current study sought to investigate whether the 3D-MEA instrument had the same measurement properties between pre-service teachers and in-service teachers. Therefore, multigroup confirmatory factor analysis (MGCFA) was employed as the primary analytic tool. All analyses were conducted in R, primarily using the packages lavaan, semTools, and semPlot (Epskamp et al., 2019; Jorgensen et al., 2016; Rosseel, 2012). We began by examining the nature and scope of missing data, and also assessed univariate and multivariate normality, respectively, using Shapiro–Wilk and Mardia’s tests (Mardia, 1970) to aid in the selection of an estimator.
We then conducted three preliminary analyses, specifically single-group confirmatory factor analyses (CFAs) with: (i) pre-service teacher data only, (ii) in-service teacher data only, and (iii) the combined pre-service teacher and in-service data. To examine the fit of a given model, we examined a variety of fit indices such as the RMSEA, SRMR, and CFI and interpreted them in relation to fitness thresholds specified from the relevant literature (Browne & Cudeck, 1993; Byrne & Stewart, 2006; Hirschfeld & Brachel, 2014; Hu & Bentler, 1998; 1999; Schumacker & Lomax, 1996). We also used chi-square goodness-of-fit tests, though we emphasized fit indices given the sample sizes of the present study.
Next, we formally tested the invariance of model parameters across these two populations using MGCFA (Cheung & Rensvold, 2002; Jöreskog & Sörbom, 1981; Meredith, 1993). This involved comparing the fit of successive multi-group CFA analyses in which parameters were increasingly constrained equal across groups: Model 0 = parameters unconstrained across groups (i.e., configural invariance); Model 1 = factor loadings constrained equal across groups (i.e., metric invariance); Model 2 = factor loadings and item intercepts constrained equal across groups (i.e., scalar invariance); Model 3 = factor loadings, item intercepts, and residual item variances/covariances constrained equal across groups (i.e., strict invariance); and Model 4 = item factor loadings, item intercepts, residual item variances/covariances, and factor variances/covariances constrained equal across groups (i.e., structural invariance). To evaluate invariance, we used both likelihood-ratio difference tests and changes in the comparative fit index (ΔCFIs) to formally compare the fit of the increasingly constrained models (Byrne & Stewart, 2006; Hirschfeld & Brachel, 2014). Given the sensitivity of chi-square tests to sample size, however, we emphasized ΔCFIs in evaluating invariance.
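The ΔCFI comparison described above can be illustrated with a minimal Python sketch. This is not the authors' analysis code (the study used R); the function name `compare_nested_cfis` and all CFI values below are hypothetical and serve only to show how a drop in CFI between successive nested models is checked against the .01 cut-off.

```python
from typing import List, Tuple

def compare_nested_cfis(models: List[Tuple[str, float]], cutoff: float = 0.01):
    """Return (model name, change in CFI, invariance supported?) for each successive pair."""
    results = []
    for (_, prev_cfi), (name, cfi) in zip(models, models[1:]):
        # Round to avoid floating-point noise right at the cutoff.
        delta = round(cfi - prev_cfi, 4)
        # A drop in CFI larger than the cutoff counts against invariance.
        results.append((name, delta, -delta <= cutoff))
    return results

# Hypothetical CFI values for illustration only (not the study's estimates).
hypothetical_cfis = [
    ("configural", 0.958),
    ("metric", 0.956),
    ("scalar", 0.951),
    ("strict", 0.950),
    ("structural", 0.948),
]

for name, delta, supported in compare_nested_cfis(hypothetical_cfis):
    print(f"{name}: dCFI = {delta:+.3f}, invariance supported: {supported}")
```

Each model is compared only with the immediately preceding, less constrained model, mirroring the stepwise sequence from Model 0 to Model 4.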
The final analyses were the computation of the coefficient alphas (Cronbach, 1951) and omegas (McDonald, 2013) to examine the reliability of 3D-MEA scores in each group. We also estimated and report latent factor inter-correlations.
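For readers less familiar with these reliability coefficients, the following Python sketch shows how coefficient alpha and omega can be computed for a single subscale. It is an illustration under stated assumptions rather than the procedure used in the study: the response data, loadings, and residual variances are hypothetical, the omega formula assumes one factor with unit variance and uncorrelated residuals, and the function names are our own.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Coefficient alpha from complete cases; each row holds one respondent's k item scores."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def mcdonald_omega(loadings, residual_variances):
    """Omega for one factor, assuming unit factor variance and uncorrelated residuals."""
    common = sum(loadings) ** 2
    return common / (common + sum(residual_variances))

# Hypothetical responses to a three-item subscale (five respondents).
responses = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [3, 3, 3], [1, 2, 2]]
alpha = cronbach_alpha(responses)

# Hypothetical standardized loadings; residual variance = 1 - loading squared.
loadings = [0.8, 0.7, 0.9]
residuals = [1 - l ** 2 for l in loadings]
omega = mcdonald_omega(loadings, residuals)

print(f"alpha = {alpha:.3f}, omega = {omega:.3f}")
```

Alpha uses only observed item and total-score variances, whereas omega is computed from the fitted factor model, so the two can differ when loadings are unequal.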
Results
Missing Data Analysis
Item-level missing data were negligible, ranging from 0.12% to 0.73% of cases. Little’s (1988) Missing Completely at Random (MCAR) test was used to understand the nature of the missingness. The test provided insufficient evidence to reject the null hypothesis that the missing values are MCAR; in other words, missingness did not appear to be related to other values in the dataset.
Distributional Assumption Analysis
The data were checked for univariate and multivariate normality (i.e., skewness and kurtosis) before conducting CFAs and MGCFAs. Univariate normality (Shapiro–Wilk) tests for all items (p < .001) and Mardia’s multivariate skewness (b1 = 5853.26, p < .001) and multivariate kurtosis (b2 = 77.76, p < .001) tests were statistically significant. Due to this evidence of distributional non-normality, a robust maximum likelihood estimator was used in the present study.
Descriptive Statistical Analysis
The 20 3D-MEA item response means (M), standard deviations (SD), skewness statistics (S3), and kurtosis statistics (K4) for both the pre-service and in-service teacher samples are shown in Table 2. Skewness statistics were lower than two and kurtosis statistics did not exceed seven for both pre-service and in-service teachers’ responses. Thus, despite statistical evidence of non-normality, skewness and kurtosis did not appear excessive in magnitude (Curran et al., 1996).
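The screening rule described above can be sketched in Python as follows. The sketch assumes moment-based skewness and excess kurtosis (the text does not state which estimators were used), the item responses are hypothetical, and the function names are illustrative; the cut-offs of two and seven are those cited in the text.

```python
def skew_kurtosis(xs):
    """Moment-based skewness and excess kurtosis (a normal distribution gives 0 for both)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def within_guidelines(xs, max_skew=2.0, max_kurt=7.0):
    """Apply the cut-offs cited in the text: |skewness| < 2 and |kurtosis| < 7."""
    skew, kurt = skew_kurtosis(xs)
    return abs(skew) < max_skew and abs(kurt) < max_kurt

# Hypothetical responses to one five-point item.
item_responses = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5]
skew, kurt = skew_kurtosis(item_responses)
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}, "
      f"within guidelines: {within_guidelines(item_responses)}")
```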
Single-Group Confirmatory Factor Analysis
Summary of fit for all confirmatory factor models.
Notes. CFA = confirmatory factor analysis. PST = pre-service teachers. IST = in-service teachers. Model 0 = parameters unconstrained across groups. Model 1 = factor loadings constrained equal across groups. Model 2 = factor loadings and item intercepts constrained equal across groups. Model 3 = factor loadings, item intercepts, and residual item variances/covariances constrained equal across groups. Model 4 = item factor loadings, item intercepts, residual item variances/covariances, and factor variances/covariances constrained equal across groups. Individual model chi-square and chi-square difference statistics are robust, using the Satorra-Bentler (2001) adjustment.
a Chi-square statistic adjusted for non-normality using the Satorra–Bentler (2001) method.
b Calculated using the Satorra–Bentler-adjusted (2001) chi-square statistic.
c Calculated using the methodology outlined in Brosseau-Liard et al. (2012) and/or Brosseau-Liard and Savalei (2014).
*p < .05, **p < .01, ***p < .001.
The fit of the five-factor model was also preliminarily assessed among pre-service teachers and in-service teachers separately using single-group CFA. Results showed that the five-factor model fit well in both groups. The chi-square test was statistically significant for the pre-service teacher sample data, χ2(160, N = 457) = 398.17, p < 0.001; however, the fit indices indicated that the five-factor model fit well for pre-service teachers: CFI = 0.96, RMSEA = 0.055 (90% CI: 0.047–0.064), and SRMR = 0.04. The chi-square test was also statistically significant for the in-service teacher sample data, χ2(160, N = 365) = 317.13, p < 0.001; however, the fit indices indicated that the five-factor model fit well for in-service teachers: CFI = 0.96, RMSEA = 0.061 (90% CI: 0.051–0.071), and SRMR = 0.04. Given these results, we proceeded with formal invariance testing using multi-group CFA.
Multi-Group Confirmatory Factor Analysis
The configural invariance of the five-factor measurement model between pre-service teachers and in-service teachers was examined first, as shown in Table 3. Unsurprisingly, given the results of the single-group CFAs, the results suggested that the five-factor 3D-MEA measurement model fit the data well for both pre-service and in-service teachers according to the corresponding CFI, RMSEA, and SRMR fit indices. While the chi-square test of fit was significant, χ2(320, N = 822) = 644.98, p < .001, fit indices were as follows: CFI = 0.96, RMSEA = 0.058 (90% CI: 0.052–0.064), and SRMR = 0.04.
The metric invariance of the five-factor measurement model between pre-service teachers and in-service teachers was then examined as a second step. In this model, the factor loadings were constrained to be equal, meaning that the relationships between latent factors and items were held equal across groups. The results showed good model fit. While the chi-square fit test was again significant, χ2(335, N = 822) = 669.65, p < .001, fit indices indicated good model fit: CFI = 0.95, RMSEA = 0.057 (90% CI: 0.051–0.064), and SRMR = 0.05. The non-significant corrected chi-square difference test, Δχ2(15, N = 822) = 22.88, p = .09, indicates that the configural invariance model and metric invariance model fit equally well. In addition, the ΔCFI did not exceed .01, indicating only a trivial degradation in fit relative to the configural invariance model.
For the third step of the MGCFA, scalar invariance was investigated, in which the factor structure, factor loadings, and item intercepts were constrained to be the same for both pre-service and in-service teachers. Overall, the results showed that this model fit well in absolute terms: χ2(350, N = 822) = 733.64, p < .001, CFI = 0.95, RMSEA = 0.060 (90% CI: 0.054–0.066), and SRMR = 0.05. The corrected chi-square difference test, Δχ2(15, N = 822) = 75.70, p < .001, indicated possible non-invariance. However, the ΔCFI again did not exceed the .01 cut-off, indicating that any differences in these parameters across groups were likely trivial.
Next, we examined strict measurement invariance by comparing the fit of the strict invariance model (with factor loadings, item intercepts, and residual item variances/covariances constrained equal across groups) and the scalar invariance model (in which residual item variances/covariances were not constrained equal). While the chi-square difference test between models suggested non-invariance, Δχ2(20, N = 822) = 31.50, p < .05, the ΔCFI was trivial, implying strict invariance for practical purposes. Finally, structural invariance was examined via a likelihood-ratio test and the ΔCFI (criterion: ΔCFI < .01) comparing the structural invariance and strict invariance models. Both the likelihood-ratio test, Δχ2(15, N = 822) = 10.44, p = .79, and the ΔCFI (−0.002) implied structural invariance of 3D-MEA scores for pre-service and in-service teachers.
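The degrees of freedom reported across the invariance sequence can be reconstructed with simple parameter counting, sketched below in Python. The sketch rests on assumptions not stated in the text: marker-variable identification, a mean structure, no residual covariances, and latent means freed in the second group at the scalar step. The function names are illustrative. Under these assumptions the counts agree with the reported values (160 per group, 320, 335, and 350, followed by differences of 20 and 15).

```python
def single_group_df(n_items: int, n_factors: int) -> int:
    """Degrees of freedom for a simple-structure CFA with a mean structure."""
    # Observed moments: item variances/covariances plus item means.
    moments = n_items * (n_items + 1) // 2 + n_items
    # Free parameters under marker-variable identification (assumption).
    free_loadings = n_items - n_factors
    intercepts = n_items
    residual_variances = n_items
    factor_vars_covs = n_factors * (n_factors + 1) // 2
    return moments - (free_loadings + intercepts + residual_variances + factor_vars_covs)

def invariance_dfs(n_items: int = 20, n_factors: int = 5, n_groups: int = 2) -> dict:
    """Cumulative degrees of freedom for each step of the invariance sequence."""
    extra_groups = n_groups - 1
    added_constraints = {
        "configural": 0,
        "metric": (n_items - n_factors) * extra_groups,   # free loadings equated
        "scalar": (n_items - n_factors) * extra_groups,   # intercepts equated, latent means freed
        "strict": n_items * extra_groups,                 # residual variances equated
        "structural": n_factors * (n_factors + 1) // 2 * extra_groups,  # factor (co)variances equated
    }
    df = n_groups * single_group_df(n_items, n_factors)
    result = {}
    for step, added in added_constraints.items():
        df += added
        result[step] = df
    return result

dfs = invariance_dfs()
print(single_group_df(20, 5))
print(dfs)
```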
Reliability Analysis
Reliability estimates.
Notes. IST = in-service teachers; PST = pre-service teachers; Access = self-efficacy for data identification and access; Technology = self-efficacy for data technology use; Interpretation = self-efficacy for data analysis and interpretation; Instruction = self-efficacy for the application of data to instruction.
Inter-Factor Correlational Analysis
Inter-factor correlations.
Notes. Access = self-efficacy for data identification and access; Technology = self-efficacy for data technology use; Interpretation = self-efficacy for data analysis and interpretation; Instruction = self-efficacy for the application of data to instruction.
Discussion
In many statistical studies that involve comparing populations, the desired study outcome is to detect a difference in model parameters between those populations (e.g., a mean difference between the two populations). However, in invariance studies the hoped-for outcome is instead no difference in the psychometric model parameters across groups. In this study, this is effectively what we found: no meaningful differences in the psychometric model parameters for pre-service and in-service teachers. Specifically, we found evidence for structural invariance, the highest level of invariance. This means that the 3D-MEA item factor loadings, item intercepts, and item error variances were equal for both in-service and pre-service teachers, as were the latent factor variances/covariances.
With the estimation of Model 4 in which factor variances and covariances were also constrained equal across groups, we noticed a slight uptick in the SRMR fit index (SRMR = .08). The SRMR fit index is known to be particularly sensitive to misspecification of factor variances and/or covariances (Chen, 2007; Hu & Bentler, 1998) and this increase may imply non-equivalence in these parameters. However, this phase of invariance testing pertains to “structural” invariance rather than “measurement” invariance (Dimitrov, 2010) and other model results did imply structural measurement invariance.
In lay terms, these invariance analyses clearly suggest that the 3D-MEA measures the relevant constructs similarly in these two important populations. Another way to think about these findings is that 3D-MEA scores mean the same thing across these two key populations (Byrne & Stewart, 2006). At the same time, we also provided evidence that 3D-MEA score reliability was similarly high for both in-service and pre-service teacher populations, indicating similar levels of random measurement error in these two groups.
The implications of our findings pertain most directly to the measurement concerns of researchers. Structural 3D-MEA invariance across these two populations affords researchers the opportunity to credibly use this instrument with both populations. In particular, this structural invariance evidence helps substantiate the appropriateness of using the 3D-MEA to draw mean comparisons in DDDM self-efficacy and anxiety between pre-service and in-service teachers. It also establishes one necessary condition for sound measurement of changes in DDDM self-efficacy and anxiety over time from the pre-service stage to the in-service stage. This invariance also means that researchers can similarly interpret correlations between this measure and other measures administered to pre-service or in-service populations. In the absence of invariance, observed differences or changes, or differences in correlations with external measures, could be artifacts of measurement error. Such analyses may be necessary in various kinds of studies conducted by researchers involving these populations, including studies that ask developmental, theory-building, or intervention research questions.
These affordances may also benefit practitioners (e.g., higher education personnel, K-12 district staff, and professional development providers) who might use the 3D-MEA for various purposes with both of these populations. These include identifying pre-service or in-service teacher needs, or evaluating professional learning opportunities served up to both pre-service and in-service teachers. High-quality information derived during such efforts may in turn help improve teacher development mechanisms for DDDM, DDDM practices themselves, and potentially student learning and development outcomes.
Our findings about the invariance and reliability of 3D-MEA scores weigh in favor of effectively interchangeable use of this instrument with both U.S. in-service and pre-service teachers. However, the present study had a number of key limitations and delimitations. Chief among the study’s limitations is the sampling design, which yielded fairly diverse but not necessarily representative samples of all U.S. in-service and pre-service teachers. For example, our in-service and pre-service samples were limited in terms of the states from which the participants were drawn and/or the school levels in which they work. Additionally, our PST sample was disproportionately female (92% vs. 75% nationally) and White (87% vs. 79% nationally), and our IST sample was disproportionately White (98% vs. 79% nationally) relative to the U.S. teacher population generally (Taie & Goldring, 2020). Our findings should certainly be replicated independently with larger and more representative samples of in-service and pre-service teachers.
Also, our study only provided select forms of reliability and validity evidence concerning the 3D-MEA, generally the forms of validity evidence provided by prior studies of this instrument (i.e., validity evidence based on score internal structure). While sporadic data do exist from substantive studies that could be considered validity evidence (e.g., evidence of correlations with other measures), other forms of validity evidence should be formally gathered as well to more comprehensively validate inferences derived from this instrument. For example, researchers should deploy methodologies such as think-aloud protocols to collect validity evidence based on test-taker response processes, and statistical modeling to collect validity evidence based on score relations with other variables (AERA et al., 2014), especially objective measures of teacher data literacy and implementation of data use practices. These limitations notwithstanding, our findings do present compelling evidence of what the study set out to establish: the invariance of 3D-MEA scores across two critical teacher sub-populations.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
