Abstract
With the proliferation of test-based accountability policies, educators and students alike are under pressure to improve test performance. However, little is known regarding the stress experienced by educators in response to these policies. The purpose of this article is to describe the initial development and validation of a new measure of stress associated with high-stakes testing. Psychometric properties were examined within a sample of 8,084 educators in a southeastern state in the United States. An exploratory and confirmatory factor analysis of the Educator Test Stress Inventory supported a bifactor model of teacher test stress, with one general factor of Total Teacher Stress and two narrow factors of Sources of Stress and Manifestations of Stress. This study is an important first step in establishing a reliable and valid measure of teacher stress and better understanding the impact of high-stakes testing and educational accountability policies. Implications for the assessment and intervention of teacher stress are discussed.
Introduction
School districts across the United States use high-stakes testing as a way to measure school efficacy, teacher effectiveness, and student achievement. In combination with the accountability movement, high-stakes tests have been associated with both intended and unintended consequences, such as increased student achievement and dropout rates, and narrowing of student intervention services (Koretz & Hamilton, 2006). The psychosocial impacts of these policies have extended to educators. More recent policy changes directed at teachers, such as the use of student test performance to determine tenure and/or merit pay decisions, necessitate an examination of the stress and anxiety that teachers may experience in response to the implementation of high-stakes testing and test-based accountability.
Teacher stress, school climate, and job satisfaction have received much attention from researchers and policy makers over the last two decades (Collie, Shapka, & Perry, 2012; Shann, 1998). Since the implementation of No Child Left Behind, many teachers have experienced job dissatisfaction, perceived job insecurity, and pressure to raise test scores (Donnelly & Sadler, 2009; Heath, 2007). Of particular relevance is that teachers may experience more anxiety related to testing than their students (Mulvenon, Stegman, & Ritter, 2005). Moreover, teacher stress and anxiety may increase student test anxiety and subsequently lower test performance (Klassen, Usher, & Bong, 2010). Increased student anxiety may result from teachers’ use of “fear appeals,” that is, messages intended to convey the importance, timing, and consequences of failing forthcoming tests (Putwain & Best, 2011). Although fear appeals may be well intentioned (e.g., motivating students to succeed on tests), teachers may inadvertently increase student anxiety related to testing situations (Putwain & Roberts, 2009). However, the influence of test-based accountability policies on teacher test stress and subsequent use of fear appeals is all but unknown. This is principally due to the lack of evidence-based assessments of educator stress related to test-based accountability policies.
There are several widely used assessments for student test anxiety, such as the Test Anxiety Inventory (Spielberger, 1980), the Test Anxiety Scale (TAS; Lowe, Grumbein, & Raad, 2011), and the FRIEDBEN Test Anxiety Scale (Friedman & Bendas-Jacob, 1997; von der Embse, Kilgus, Segool, & Putwain, 2013). Similarly, there are several measures for assessing general teacher stress, including the Teacher Stress Inventory (TSI; Fimian, 1988) and the Wilson Stress Profile for Teachers (WSPT; Luh, Olejnik, Greenwood, & Parkay, 1991). Research has suggested that test-based accountability has increased educator stress; however, these studies have typically relied on measures of general job stress that do not assess high-stakes test stress (Berryhill, Linney, & Fromewick, 2009; Richards, 2012). These limitations have prevented a reliable and systematic evaluation of the prevalence of educator test stress.
Currently, there are no psychometrically defensible assessments of stress specifically related to high-stakes testing. Such an assessment could (a) provide valuable information for researchers examining the effects of test-based accountability policies on teachers’ mental health, (b) measure context-specific stress distinct from other sources of stress (e.g., classroom behaviors, workload), and (c) help practitioners to select and evaluate evidence-based interventions (Jennings, Frank, Snowberg, Coccia, & Greenberg, 2013) by identifying the sources and manifestations of educator stress. Given the fundamental changes in education brought forth by test-based accountability policies, there is a need for a brief, reliable, and valid assessment of teacher stress related to the testing experience.
Conceptualization of Teacher Stress
Social-cognitive theorists (e.g., Lazarus & Folkman, 1984) describe stress as being the result of an individual’s appraisal of his or her situation as threatening relative to his or her personal and social resources. Teacher stress has been defined as a negative emotional experience that is a function of job-related pressures and the individual’s ability to cope (Kyriacou, 2001). Teacher stress has also been operationalized more clinically in terms of anxiety and depression, and by measuring physiological indicators of stress such as salivary cortisol, blood pressure, and heart rate (Roeser et al., 2013). Teacher stress may be a result of inadequate time required to prepare for high-stakes testing (i.e., conceptualized as a limited personal/social resource; Berryhill et al., 2009), decreased personal interaction with students (i.e., role constraint), lack of familiarity with the curriculum and inappropriate or developmentally inadequate test instructions (i.e., role ambiguity; Lasky, 2005), and an eroding sense of professionalism for teachers (i.e., role conflict; Berryhill et al., 2009).
Overall, teacher stress has been related to negative teacher outcomes such as burnout, depression, poor performance, absenteeism, and teacher attrition (Klassen et al., 2010). Furthermore, stress has repeatedly demonstrated an inverse relationship with teacher job satisfaction, contributing to a lower perceived quality of life for some teachers (Klassen et al., 2010; Kyriacou, 2001). In general, high-stakes testing has been shown to be stressful for teachers across several studies (e.g., Berryhill et al., 2009; Fantuzzo et al., 2012; Richards, 2012), suggesting that teacher stress may increase with the corresponding rise in the utilization of high-stakes testing for teacher job evaluations. However, research has typically relied on measures of general job stress that may correspond to multiple stressors (e.g., poor salary, classroom management difficulty, integration of new lesson plans). These limitations highlight the need for measurement of stress specifically related to high-stakes testing and accountability policies.
Measurement of Teacher Stress
Teacher stress has been measured in a myriad of ways throughout the literature. One of the best-known assessments is the TSI (Fimian & Fastenau, 1990), a measure consisting of 49 items with adequate reliability (α = .75 to .88). The scale measures teacher self-reported stress on a 5-point scale (5 indicates major stress) and consists of 10 subscales. Subscales include “Professional Investment,” “Behavioral Manifestations,” “Emotional Manifestations,” “Work-Related Stressors,” and various physiological manifestations of stress such as gastronomical and cardiovascular (Fimian & Fastenau, 1990). Constructs are organized into “manifestations” and “sources” of teacher stress. The TSI is nearly 25 years old and has not been updated since the implementation of high-stakes testing policies. Moreover, the length of the scale may preclude the brief measurement of teacher stress. Multiple, brief assessments may reveal fluctuations in stress throughout the school year and help to determine when to provide teacher stress intervention (Gold et al., 2010; Roeser et al., 2013). A context-specific assessment offers a distinct advantage over general, state anxiety measures (e.g., The State–Trait Anxiety Inventory [STAI]; Spielberger & Vagg, 1984) by measuring stress specific to the presenting stimuli or situation; modifying the wording of existing state anxiety assessments may change psychometric properties, leading to a lack of measurement invariance (Byrne, Shavelson, & Muthen, 1989). There exists a need for a reliable and valid measure of educator test stress to be developed within a large and diverse sample.
Purpose of This Study
The purpose of this study was to evaluate the reliability and construct-related validity of a new assessment of educator stress related to high-stakes testing, the Educator Test Stress Inventory (ETSI). This study is an important first step for examining (a) the prevalence of educator stress, (b) predictors of both educator stress and teaching practices (i.e., use of fear appeals), and (c) changes over time in educator stress as related to policy shifts (e.g., use of student test performance in the evaluation of teacher tenure and job performance). A more nuanced and explicit understanding of educator stress may ultimately lead to best practices in providing systemic supports and enhancing both teachers’ wellness and students’ mental health.
Based on the past literature, three specific variables influenced the development of the ETSI including the following: (a) sources of teacher test stress (e.g., perceived pressure from administrators), (b) manifestations of teacher test stress (e.g., disorganization, physiological symptoms), and (c) a general factor of test stress. The ETSI measures anxiety specific to testing (rather than perceived consequences including ranking or funding) such that sources identify causes of and manifestations measure symptoms of test-related stress (Fimian, 1984). The general test stress factor expands upon earlier conceptualizations (e.g., Fimian, 1984) by measuring a single, general stress factor beyond that which is accounted for by narrow content factors while controlling for multidimensionality (DiStefano, Greer, & Kamphaus, 2013).
The first purpose of this study was to refine the ETSI with exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). It was hypothesized that analyses would support a usable assessment composed of a small number of items. A second purpose was to evaluate the technical adequacy of the ETSI through an evaluation of concurrent validity. It was hypothesized that the ETSI scales would be moderately correlated with established measures of state anxiety (i.e., situational anxiety such as a testing experience) and trait anxiety (i.e., pervasive anxiety).
Method
Participants
A total of 63,122 public school employees across a southeastern state were recruited to participate in a brief (15-20 min) online survey. The final participant group included 6,788 teachers, 78 administrators, 122 counselors/social workers, and 1,063 other educational professionals (e.g., school psychologists, paraprofessionals) who completed the survey for a total of 8,084 participants. The total response rate was approximately 13%. Participants were primarily female (81.8%, N = 6,610). Teaching experience ranged from 1 to 48 years, while participant ages ranged from 21 to 73 years. Additional demographic characteristics are reported in Table 1. Demographic characteristics of participants were similar to educators across the state.
Table 1. Participant Demographic Characteristics.
% = the percentage of participants.
Missing data within each category resulted in total percentages less than 100%.
“My district evaluates annual job performance based upon student test performance.”
“My district bases teacher salary/bonus pay on student test performance.”
Instruments
ETSI
The ETSI (Appendix A) is a new multidimensional assessment of educator stress and anxiety related to high-stakes testing and educational accountability policies. The ETSI was developed for use with public school educators ranging from kindergarten to 12th grade. The final version of the ETSI includes 11 items measuring two factors of educator stress (Fimian, 1988) and a general test stress factor. The ETSI includes a Sources of Stress (ETSI-S) subscale that measures sources of stress and anxiety (i.e., pressure from administrators, parents) and consists of 5 items (score range of 5-25, M = 17.25, standard deviation [SD] = 4.43), and a Manifestations of Stress (ETSI-M) subscale that measures the physiological and emotional symptoms of stress (e.g., “I feel anxious during testing,” “I experience a pounding chest/increased heart rate during testing”) and consists of 6 items (score range of 6-30, M = 14.94, SD = 5.01). The Total subscale consists of 11 items (score range 11-55, M = 32.18, SD = 8.51) and is a measure of overall teacher stress and anxiety related to testing. Participants self-report on a 5-point Likert-type scale ranging from “very much disagree” to “very much agree.”
A systematic content validation process was used for item development (Haynes, Richard, & Kubany, 1995). A review of the teacher stress literature and existing assessments (e.g., TSI, TAS, WSPT) led to the creation of an initial pool of 20 items that was subsequently sent to eight content area experts. Items were evaluated on a scale of 1 to 5 (with 1 indicating least important and 5 indicating most important) for readability, clarity, and importance of each item relative to the proposed factor structure. Several items were revised and those that were rated as most relevant to the proposed factor structure and consistent with theory were considered for inclusion resulting in a total of 15 items.
STAI
The STAI (Spielberger, 1989) was used to assess construct-related convergent validity of the ETSI (Marteau & Bekker, 1992). The STAI is a 40-item assessment that measures both state anxiety and general (i.e., trait) anxiety (Spielberger, Gorsuch, Lushene, Vagg, & Jacobs, 1983). The STAI is available in more than 30 languages and has been used in more than 3,000 research studies (Spielberger, 1989). The median alpha coefficient for the trait scale was .90, while the state alpha coefficient was .93 (Spielberger et al., 1983).
Procedures
Recruitment was completed via a database of publicly available email contact information provided by the State Department of Education. The university Institutional Review Board approved all study procedures. Consent was obtained prior to completion of the online survey. Each participant was assigned a unique code, and no identifying information was collected. Participants were offered an opportunity to win one of three US$100 gift cards to a national retail store if they provided an email address (not connected with the survey response). Reminders were automatically sent after 1 week if the survey was not complete. A survey of test attitudes, teaching practices, and school climate was distributed along with the ETSI, STAI, and a demographic form. The majority of surveys were completed between 12 and 17 min. Demographic variables included age, gender, years taught, grade(s), and subject(s) taught. Data collection took place over 2 weeks in the spring of 2013, approximately 3 weeks prior to the annual statewide test.
Data Analysis
EFA and CFA are recommended when developing a new assessment (Fabrigar, Wegener, MacCallum, & Strahan, 1999). An alternative bifactor model was examined as part of the CFA. A bifactor model posits a “general” factor that reflects the common variance among all items (Holzinger & Swineford, 1937). A random number generator was used to assign the 8,084 participants to either the EFA sample (N = 4,042) or the CFA sample (N = 4,042). Chi-square tests indicated no significant differences between the two samples regarding multiple demographic variables (e.g., gender, years teaching). The direct oblimin procedure was used for factor rotation, and the principal axis factoring (PAF) method was used to extract factors, given its relative insensitivity to measurement error, lack of distributional assumptions, increased likelihood of yielding a converged solution, and decreased likelihood of factor overextraction (Taylor & Pastor, 2007). Three methods were used to determine the number of factors to extract: parallel analysis (PA; Horn, 1965), the minimum average partials test (MAP; Velicer, 1976; Velicer, Eaton, & Fava, 2000), and scree plot analysis (Cattell, 1966). Factor loadings were examined to identify which items loaded onto each of the hypothesized factors and to identify whether a simple structure was achieved.
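The random assignment of participants to the EFA and CFA samples can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure the authors ran: the helper name `split_half`, the fixed seed, and the use of integer participant identifiers are all introduced here for the example.

```python
import random


def split_half(participant_ids, seed=0):
    """Randomly split identifiers into two halves (hypothetical helper).

    The seed is an assumption; it only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# 8,084 complete cases, as reported for this study.
efa_ids, cfa_ids = split_half(range(8084))
print(len(efa_ids), len(cfa_ids))
```

Shuffling once and cutting at the midpoint guarantees two non-overlapping samples of equal size, which per-case random assignment would not.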
A CFA was conducted in Mplus version 7.1 (Muthén & Muthén, 1998-2013) with the CFA sample. The CFA, through which factors were extracted using maximum likelihood, only considered items retained through the initial EFA. Model fit was measured with the chi-square goodness-of-fit test, the Tucker–Lewis Index (TLI; Tucker & Lewis, 1973), Comparative Fit Index (CFI; Bentler, 1990), and root-mean-square error of approximation (RMSEA; Steiger & Lind, 1980). Hu and Bentler’s (1999) suggested criteria for model fit were followed, including a non-statistically significant chi-square test, CFI and TLI ≥ .95, and RMSEA ≤ .06. Modification indices were reviewed to determine whether model fit would be improved with alterations consistent with the hypothesized factor structure and theory. Finally, the concurrent validity of the two subscales of the ETSI was evaluated through bivariate correlation with the State Anxiety (STAI-S) and Trait Anxiety (STAI-T) subscales of the STAI. Correlational magnitude was considered using Cohen’s (1988) criteria, with .10 to .29 representing small, .30 to .49 medium, and ≥ .50 large correlations.
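The fit criteria listed above can be expressed as a simple check. The sketch below is illustrative only: the function name `check_fit` is hypothetical, the .05 significance level for the chi-square test is an assumption (the text states only that the test should be non-significant), and the input values are invented rather than taken from Table 3.

```python
def check_fit(chi_square_p, cfi, tli, rmsea, alpha=0.05):
    """Compare fit statistics with the cutoffs described in the text.

    alpha = .05 is an assumed significance level for the chi-square test.
    """
    return {
        "chi_square_nonsignificant": chi_square_p >= alpha,
        "cfi": cfi >= 0.95,
        "tli": tli >= 0.95,
        "rmsea": rmsea <= 0.06,
    }


# Invented values, for illustration only.
result = check_fit(chi_square_p=0.001, cfi=0.96, tli=0.94, rmsea=0.05)
print(result)
```

Returning one flag per criterion, rather than a single pass/fail value, reflects that the chi-square test is often significant in large samples even when the other indices are acceptable.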
Results
A total of 9,226 educators responded to the email inquiry. Complete item responses were available for 8,084 participants, with incomplete data for 1,142 (12.39%). A majority of the cases with missing data (n = 812) did not record any responses to the entire survey (i.e., the participant opened the online survey link, clicked agree to participate, and did not input any additional data). Listwise deletion was used because the overall percentage of missing data was relatively small (Kline, 2011) and because the subsequent deletion of cases was unlikely to significantly impact statistical power.
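Listwise deletion, as used here, retains only cases with a response to every analyzed item. A minimal sketch follows; the function name `listwise_delete`, the field names, and the toy records are hypothetical and do not reflect the study's data files.

```python
def listwise_delete(records, required_fields):
    """Keep only records with a non-missing value for every required field."""
    return [
        record for record in records
        if all(record.get(field) is not None for field in required_fields)
    ]


# Toy records: respondent 2 skipped item_1, respondent 3 never reached item_2.
responses = [
    {"id": 1, "item_1": 4, "item_2": 5},
    {"id": 2, "item_1": None, "item_2": 3},
    {"id": 3, "item_1": 2},
]
complete = listwise_delete(responses, ["item_1", "item_2"])
print([record["id"] for record in complete])
```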
EFA
Scree plot analysis supported extraction of two factors, as did the results from both original and revised versions of the MAP test, which suggested that the smallest average squared partial correlation (.021) and fourth power partial correlation (.001) were found after removing the effect of the first two factors. PA findings differed, suggesting the extraction of seven factors. Yet, the two-factor solution was preferred, given support across three discrete factor retention analyses. These two factors were extracted from the dataset using PAF in seven iterations, requiring 10 iterations for the rotation to converge. A review of the pattern matrix was indicative of relatively simple structure, with most items loading on a single factor. This conclusion was supported by a review of the structure matrix, which indicated that the majority of items were more highly correlated with one particular factor. See Table 2 for a review of both pattern and structure coefficients.
Table 2. Pattern and Structure Matrices.
Note. ETSI = Educator Test Stress Inventory. Extraction method: principal axis factoring. Rotation method: Oblimin with Kaiser normalization. Rotation converged in 10 iterations. Factor 1 = ETSI-M (Manifestations of Stress), Factor 2 = ETSI-S (Sources of Stress).
ETSI items were reviewed relative to exclusionary criteria. Findings supported the removal of four items due to one or more of the following reasons: weak inter-item correlations, low communalities, low factor loadings, multidimensionality, or some combination thereof. Those specifically identified for removal included Items 4, 6, 7, and 11. The two resulting factors were consistent with the model proposed by Fimian (1984), corresponding to the ETSI-M and ETSI-S, respectively.
CFA
Results of the initial EFA informed two subsequent CFAs. The two-factor models at the basis of both analyses were established in accordance with procedures used by DiStefano et al. (2013) and then compared in an evaluation of relative fit. The first of these models represented a correlated factor structure, wherein each item was loaded on one of the two factors, ETSI-S and ETSI-M, which were permitted to covary. Initial model fit was poor (see Table 3). Standardized residual covariances were analyzed to identify possible model misspecification. Results supported permitting within-factor covariances between error terms associated with (a) Items 1 and 3, (b) Items 9 and 10, and (c) Items 13 and 14; these modifications were considered acceptable due to commonality of model-implied factor and conceptual similarity of item content (e.g., “I feel pressure from parents to raise test scores,” “I feel pressure from administrators to raise test scores”). A second CFA of the correlated factor model was then conducted while permitting these modifications (see Figure 1). All items were found to load statistically significantly on their model-implied factor, with standardized pattern coefficients ranging between .52 and .86. This suggested each item loaded onto its corresponding factor, yielding support for the hypothesized factor structure. Appropriateness of the factor model was further evaluated via review of multiple model fit statistics (see Table 3).
Table 3. Model Fit Indices for the Educator Test Stress Inventory.
Note. df = degrees of freedom; RMSEA = root-mean-square error of approximation; CI = confidence interval; CFI = Comparative Fit Index; TLI = Tucker–Lewis Index; SRMR = standardized root-mean residual.

Figure 1. Two-Factor Model for the ETSI.
A second set of CFAs considered a bifactor structure, wherein each item loaded on one of the two aforementioned narrow factors (i.e., ETSI-S and ETSI-M), as well as a general “Total” factor (ETSI-T). Modification indices resulting from an initial bifactor CFA supported allowing the same three error covariances specified with the correlated factor models, as well as covariance between the narrow factors. A second bifactor CFA, which specified each of these covariances, was then conducted (see Figure 2). While pattern coefficients ranged between .44 and .93 for ETSI-T, values fell between .11 and .46 for ETSI-M and .27 and .60 for ETSI-S. This pattern of findings suggested that, although the items were predominantly statistically significant predictors of both narrow and general factors, they tended to be better measures of the broad educator stress construct than of its constituent latent variables. Each of the remaining fit statistics supported the appropriateness of the bifactor model (RMSEA = .040 [90% confidence interval (CI) = 0.035, 0.045], CFI = .991, TLI = .983, standardized root-mean residual [SRMR] = .015). A χ2 difference test of nested models was conducted to evaluate the relative performance of the bifactor and correlated factor models. The statistically significant finding indicated that the bifactor structure afforded superior fit and should therefore be preferred as a model for ETSI interpretation, χ2(11) = 466.42, p < .001.

Figure 2. Bifactor Model for the ETSI.
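The significance of the reported χ2 difference test, χ2(11) = 466.42, can be verified with the upper-tail probability of the chi-square distribution. The sketch below uses closed-form expressions for integer degrees of freedom and only the Python standard library; the function name `chi2_sf` is introduced here for illustration, and a statistics package (e.g., `scipy.stats.chi2.sf`) would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: start from the df = 1 result and add correction terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


# Difference statistic and degrees of freedom as reported in the text.
p_value = chi2_sf(466.42, 11)
print(p_value < 0.001)
```

The check confirms only that the reported statistic is consistent with p < .001; it does not reproduce the model estimation itself.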
Internal Consistency Reliability
Alpha coefficients were calculated to evaluate the internal consistency of the emerging narrow and general factors. The two narrow factors and the general factor were all associated with adequate internal consistency (Robinson, Shaver, & Wrightsman, 1991), with alpha coefficients equaling .85, .82, and .89 for ETSI-M, ETSI-S, and ETSI-T, respectively.
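For reference, coefficient alpha is computed from the number of items, the item variances, and the variance of the summed score. The following sketch shows the standard formula applied to toy data; the function name `cronbach_alpha` and the responses are hypothetical and are not ETSI data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    n_items = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(n_items)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in rows])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Toy responses from four hypothetical respondents to three items.
toy = [[1, 2, 1], [2, 3, 3], [4, 4, 5], [5, 5, 4]]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Alpha approaches 1 as the items covary more strongly, which is why it is read as an index of item homogeneity within each scale.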
Bivariate Correlation
Each resulting ETSI factor was compared with the STAI-S and STAI-T subscales in an evaluation of the concurrent criterion-related validity of each ETSI factor. Three ETSI scores were calculated for each participant by first reverse scoring all negatively worded items and then deriving the sum of item scores within each particular factor. Summed scale scores could then range between 6 and 30 for ETSI-M, 5 and 25 for ETSI-S, and 11 and 55 for ETSI-T.
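The scoring procedure described above (reverse scoring, then summing within each factor) can be sketched as follows. The item keys, the item-to-subscale mapping, and the choice of which item is reverse scored are placeholders, because that information is not reproduced in this section; only the 5-point response range and the subscale lengths (6 and 5 items) come from the text.

```python
SCALE_MIN, SCALE_MAX = 1, 5

# Hypothetical item keys: the real mapping and the set of negatively worded
# items are not given here, so these are placeholders for illustration.
MANIFESTATION_ITEMS = ["m1", "m2", "m3", "m4", "m5", "m6"]
SOURCE_ITEMS = ["s1", "s2", "s3", "s4", "s5"]
REVERSED_ITEMS = {"m6"}


def score_etsi(responses):
    """Return summed ETSI-M, ETSI-S, and ETSI-T scores for one respondent."""
    def value(item):
        raw = responses[item]
        if not SCALE_MIN <= raw <= SCALE_MAX:
            raise ValueError(f"{item} is outside the 1-5 response range")
        # Reverse scoring maps 1<->5 and 2<->4 and leaves 3 unchanged.
        return SCALE_MIN + SCALE_MAX - raw if item in REVERSED_ITEMS else raw

    manifestations = sum(value(item) for item in MANIFESTATION_ITEMS)
    sources = sum(value(item) for item in SOURCE_ITEMS)
    return {
        "ETSI-M": manifestations,
        "ETSI-S": sources,
        "ETSI-T": manifestations + sources,
    }


# A hypothetical respondent who answers 4 ("Agree") to every item.
respondent = {item: 4 for item in MANIFESTATION_ITEMS + SOURCE_ITEMS}
scores = score_etsi(respondent)
print(scores)
```

With six and five items on a 1 to 5 scale, the sums are bounded by the ranges stated in the text: 6 to 30, 5 to 25, and 11 to 55.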
Correlational analyses were conducted twice, including once within the EFA sample and once within the CFA sample. See Table 4 for a summary of findings. All ETSI–STAI correlations were statistically significant, falling in the moderate (r > .30) or high (r > .50) ranges (Cohen, 1988). Notably, correlations were high for the broad ETSI-T subscale, which was found to be the strongest predictor of all three STAI subscales (i.e., STAI-S, STAI-T, and STAI-Total). The narrow ETSI-M and ETSI-S subscales were similar in terms of correlational magnitude, with neither subscale consistently outperforming the other in terms of correspondence with STAI subscales.
Table 4. Pearson Product–Moment Correlation Coefficients.
Note. ETSI = Educator Test Stress Inventory; ETSI-M = Manifestations of Stress; ETSI-S = Sources of Stress; STAI = State–Trait Anxiety Inventory; STAI-S = STAI-State; STAI-T = STAI-Trait.
Correlation is statistically significant at the p < .01 level.
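The magnitude labels applied to the correlations in Table 4 follow the Cohen (1988) cutoffs cited in the Data Analysis section. A minimal sketch of that classification is given below; the function name `cohen_magnitude`, the "negligible" label for values below .10, and the treatment of values at exactly .50 as large are assumptions, and the example correlations are invented rather than taken from Table 4.

```python
def cohen_magnitude(r):
    """Label a correlation using the cutoffs cited from Cohen (1988)."""
    size = abs(r)  # the sign does not affect magnitude
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # assumed label for values below .10


# Invented correlations, not values from Table 4.
labels = [cohen_magnitude(r) for r in (0.08, 0.25, -0.42, 0.61)]
print(labels)
```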
Discussion
This study describes the development of a new assessment of educator test stress. There exists a need for an assessment to examine the psychosocial impacts of test-based accountability policies that will lead to identification and intervention in supporting educator wellness. The ETSI is an important first step to examine stress related to high-stakes testing. Results from an EFA and CFA provide support for a bifactor model of educator test stress, including ETSI-S, ETSI-M, and Total subscales. These results were consistent with the hypothesized structure and the theoretical underpinnings of educator stress. Each subscale had strong internal consistency, suggesting relative homogeneity among items. The ETSI may therefore be said to identify both sources and manifestations of test stress, providing initial support for use in evaluating the psychosocial impacts of test-based accountability policies.
Data support the concurrent validity of the ETSI-S and ETSI-M subscales of the ETSI relative to the STAI-S and STAI-T subscales of the STAI. Both ETSI-M and ETSI-S were moderately correlated with STAI-S and STAI-T, with slightly stronger correlations with the STAI-S subscale. The Total subscale on the STAI was moderately correlated with the Total subscale on the ETSI. These data offer preliminary evidence of correspondence with established instruments (e.g., STAI), thus supporting the ETSI’s utility in measuring a specific type of stress.
Implications for Practice
Descriptive results indicated a number of respondents reporting a high degree of stress related to the testing experience; 28% of participants experienced significantly “high” anxiety as measured on the STAI (as defined by 1 SD greater than the mean; Davey, Harley, & Elliott, 2013). Primary sources of test stress came from administrator pressure (75% indicating agree or strongly agree) and parents (36% indicating agree or strongly agree). These data suggest the ETSI may provide important information related to sources and manifestations of teacher stress and the potential need for intervention (e.g., Gold et al., 2010). Teacher test stress interventions have typically relied on general measures of job stress compared with more targeted assessment for student test anxiety interventions (Roeser et al., 2013; von der Embse, Barterian, & Segool, 2013). The ETSI may provide a more accurate indicator of intervention effectiveness, specifically evaluating educator stress before a high-stakes examination.
The ETSI was developed to periodically (within and across school years) evaluate teacher stress related to testing and the corresponding influence of educational policies (e.g., changes in teacher tenure, use of standardized testing for merit pay) across time. The assessment of teacher stress and wellness is important, given its connection to attrition, job satisfaction, turnover, and negative school climate (Collie et al., 2012; Fantuzzo et al., 2012; Klassen et al., 2010; Roeser et al., 2013). In addition, teachers who experience high levels of test stress may be more likely to use fear appeals intended to motivate their students to perform well on the test (author, under review), yet have the opposite effect of raising student test anxiety and lowering test performance (Putwain & Best, 2011; von der Embse & Witmer, 2014; von der Embse & Hasson, 2012).
Pending additional psychometric support, the ETSI may also be used in conjunction with assessments of school climate. School climate is a powerful factor that can promote resilience or become a risk factor for members of the school community (Freiberg & Stein, 1999). Research has shown that teachers’ perceptions of school climate significantly contribute to their sense of stress (Skaalvik & Skaalvik, 2009), teaching efficacy (Pas & Bradshaw, 2012), and job satisfaction (Taylor & Tashakkori, 1995). Teacher test stress may play an important role in overall school climate and subsequent well-being, motivation, teaching efficacy, and job satisfaction. School climate assessments, such as the Delaware School Climate Survey-Staff (Bear, Yang, Pell, & Gaskins, 2014), provide valuable information related to social support (e.g., Teacher–Student Relations) and structure (e.g., School Safety) within the school. Data from the ETSI could supplement school climate assessment by contributing information specific to teachers’ well-being.
Limitations and Future Research
There are several limitations to the ETSI, as well as this study. First, the ETSI does not measure all aspects of teacher stress. While moderately correlated with a general measure of anxiety (i.e., the STAI), the ETSI is designed for use with stress related to high-stakes testing (as opposed to classroom management stress). In addition, scales such as the TSI are more comprehensive (utilizing 49 items) in evaluating sources and manifestations of teacher stress. These types of items were not included in the development of the ETSI for the sake of brevity and because the ETSI is intended for use multiple times throughout the school year. Teacher test stress may change dramatically throughout the year (e.g., rising as the test approaches), hence the recommendation for regular use. Despite the brevity of the ETSI scales, internal consistency ranged from .81 to .89. Regarding validity, evidence is limited to factorial validity, face validity, and convergent validity (the latter demonstrated through significant correlations with the STAI). Future research will be needed to examine predictive validity (e.g., relationship to job satisfaction, school climate, use of fear appeals), concurrent validity (relationship to additional measures of teacher stress), and test–retest reliability (e.g., measurement of teacher stress multiple times throughout the school year) of the ETSI.
Other limitations include a participant sample from one southeastern state. Given the wide variability in how individual states use test scores, it is important not to generalize the test stress experienced by educators in one state to educators in other states. Future research is needed to validate the use of the ETSI under different accountability conditions and uses of test scores between states. While response rates were moderate for survey research (13%; Hopkins & Gullickson, 1992), there is a possibility of response bias if respondents were not reflective of the general teaching population. Those who were suffering from symptoms of high test stress may have been more likely to respond to the survey.
Finally, replications are needed prior to the use of the ETSI within schools. The ETSI was not designed to “diagnose” teachers with high levels of stress; information could instead be used to provide a summary of sources or manifestations of test stress within the school to inform systematic supports (Gold et al., 2010; Roeser et al., 2013). Indeed, many current interventions for teachers (e.g., those utilizing a mindfulness approach) do not use evidence-based assessments for summative or formative evaluations of efficacy. Future research may evaluate the utility of the ETSI to determine intervention effectiveness.
In sum, the ETSI is the first assessment to evaluate high-stakes test stress with a large and diverse sample. This study provides initial evidence supporting the factorial stability and convergent validity of the ETSI. The ETSI holds promise for schools and researchers interested in assessing the test stress experienced by teachers, as high-stakes testing and accountability policies permeate education.
Appendix A
Please answer the following questions on a scale of (1) Strongly Disagree, (2) Disagree, (3) Neither Agree nor Disagree, (4) Agree, and (5) Strongly Agree.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
