Abstract
The purpose of this study is to validate the widely adopted Teachers’ Sense of Efficacy Scale (TSES) for the East Asian context. The researchers seek to find out whether the TSES is valid, reliable, and appropriate for measuring teacher efficacy in China, Korea, and Japan. A total of 489 teachers from the three countries participated in the study. Confirmatory factor analysis (CFA) models were used to test the factorial structures of both the 24-item long form and the 12-item short form of the TSES, and three-factor models were tested for both forms. The results showed that three-factor models did not fit well for all three country groups for the long form. The short form models’ fit indices for all three countries were acceptable. The researchers proposed an 11-item three-factor model by removing one item from the original TSES short form and further tested the revised 11-item TSES for the three Asian groups. Results suggested that the 11-item model provided good model-data fit. In the following measurement invariance test, the 11-item TSES was used as the baseline model. The results from the test of the measurement invariance of the revised 11-item TSES suggested that this scale can be used in cross-cultural studies of teacher efficacy in the East Asian context. The instrument may yield results that can be compared across the three cultures.
In recent decades, numerous studies on teacher efficacy have been conducted by educational researchers. Teacher efficacy is found to be a critical factor that contributes to teacher effectiveness and teacher retention (e.g., Ashton, 1984; Cho & Shim, 2013; Coladarci, 1992; Hong, 2010; A. W. Hoy & Spero, 2005; Martin, Sass, & Schmitt, 2012).
Most teacher efficacy research has its theoretical base in Bandura’s theory of self-efficacy (Bandura, 1977, 1986). According to Bandura (1986), self-efficacy is defined as “people’s judgments of their capabilities to organize and execute courses of action required to attain designated types of performance” (p. 391). Bandura (1977) postulated the self-efficacy concept as a powerful mediator and argued that “people’s level of motivation, affective states, and actions are based more on what they believe than on what is objectively true” (p. 2) because self-efficacy beliefs help determine what individuals do with the knowledge and skills they have. Thus, self-efficacy influences individuals’ choice of activities or goals, as well as how much effort they expend, how long they persevere in the face of difficulties, and their resilience to failures (Bandura, 1993; Pajares, 1996). Bandura (1997) also viewed self-efficacy as a universal construct that holds true across different cultures.
In relation to teacher motivation and teacher effectiveness, research has found that teachers with high efficacy beliefs are more likely to show greater effort, persistence, resilience, teaching quality, and involvement in teaching innovation (e.g., Hansen, 2006; Lavelle, 2006; Tschannen-Moran & MacFarlane, 2011; Tschannen-Moran & Woolfolk Hoy, 2007), whereas teachers with low self-efficacy are more likely to experience stress and burnout (e.g., Brouwers & Tomic, 2000; Chwalisz, Altmaier, & Russell, 1992; Leiter, 1992; Rabinowitz, Kushnir, & Ribak, 1996; Schwarzer & Hallum, 2008).
One of the underlying assumptions of self-efficacy theory is reciprocal determinism, which implies that self-efficacy beliefs and behaviors interact with the social environment to influence each other (Bandura, 1986; Pajares, 2000). Thus, as Schunk and Pajares (2009) noted, self-efficacy beliefs are “cognitive, goal-referenced, relatively context-specific, and future-oriented judgments of competence that are malleable due to their task dependence” (p. 39). Tschannen-Moran and Woolfolk Hoy (2007) also argued that teacher efficacy is context specific.
A number of studies have looked into teachers’ self-efficacy in relation to specific contexts such as subject matter, teaching task, school environment, school leadership, and socioeconomic status of the community (e.g., Ashton, Buhr, & Crocker, 1984; W. K. Hoy & Woolfolk, 1993; Lee, Dedick, & Smith, 1991; Webb & Ashton, 1987). However, human psychology is culturally situated and socially constructed (Bruner, 1996), and a positive sense of self seems to be more important to individuals in countries that promote and foster individuality and self-reliance than in countries that value collectivism (Schooler, 1990). Therefore, cultural influences cannot be overlooked in teacher self-efficacy research, and it is necessary that teacher efficacy be researched through a broader, cross-cultural, and international perspective.
There are a few existing teacher efficacy instruments (e.g., Gibson & Dembo, 1984; W. K. Hoy & Woolfolk, 1990), but the Teachers’ Sense of Efficacy Scale (TSES; Tschannen-Moran & Woolfolk Hoy, 2001) is by far the most widely recognized instrument due to its reported construct validity and reliability. Several studies have been conducted to validate the TSES and to find out whether it applies to different cultural contexts and adequately measures teacher self-efficacy in different countries. For example, Klassen et al. (2009) explored the validity of the TSES across five countries: Canada, Cyprus, the United States, Korea, and Singapore. They conducted multigroup confirmatory factor analysis (CFA) and found evidence of reliability and measurement invariance across the five countries. In another study, Tsigilis, Koustelios, and Grammatikopoulos (2010) used a within-construct and between-construct approach to examine the validity and reliability of the Greek form of the TSES (factorial validity, internal consistency reliability, temporal stability, and correlation with external criteria). The results indicated that the TSES is appropriate for assessing teachers’ efficacy within the Greek educational context. Nie, Lau, and Liau (2012) revised the TSES, further tested its reliability, factorial validity, and predictive validity, and suggested that the scale is valid for use with Singaporean teachers. However, research on cross-cultural scale validation with a special focus on East Asian countries with a collectivist orientation is limited.
The purpose of this study is to validate the widely adopted TSES for the East Asian context with data collected from three East Asian countries including China, Korea, and Japan. All three countries are collectivist societies and also under the influence of Confucianism. The researchers want to find out whether TSES holds validity and reliability and is appropriate for use to measure teacher efficacy in these three countries. By doing this research, we want to contribute to the existing body of research on this vitally important topic of teacher efficacy.
Method
Participants
All participating schools were located in the metropolitan areas of the capital cities of the three countries. The research team contacted collaborators who were educators or teacher educators in each city and asked them to help identify a few public elementary, junior high/middle, and high schools in each area. Afterward, the researchers contacted the principal of each school to ask for permission to collect data from the teachers in his or her school. The numbers of participating schools in China, Korea, and Japan were four, six, and seven, respectively.
Once permission was given, the researchers went to each school to make a short presentation to the teachers about the purpose and procedures of the study and asked for volunteers. Only the surveys completed by the volunteers were collected and used in this study. The response rate was 90%. Table 1 presents the demographic information of participants for each country.
Table 1. Participants’ Demographic Information.
Note. Values in the parentheses indicate percentage.
Measures
TSES, developed by Tschannen-Moran and Woolfolk Hoy (2001), was used to collect data to measure the self-efficacy of teachers in the three countries. Tschannen-Moran and Hoy developed two forms of TSES: a long form and a short form. Both forms include three subscales: efficacy for instructional strategies, efficacy for classroom management, and efficacy for student engagement. The long form includes 24 items with 8 items in each subscale, and the short form includes 12 items from the long form with 4 items in each subscale. All the items are measured on a 9-point Likert-type scale from 1 (nothing) to 9 (a great deal). Both forms have very similar psychometric values and therefore are considered almost identical in terms of their effectiveness in measuring teacher self-efficacy. The complete set of psychometric properties of both forms of the instrument is reported in Tschannen-Moran and Woolfolk Hoy (2001).
The research team translated the original instrument, which is written in English, into Chinese, Korean, and Japanese through a translation and back-translation procedure (Sperber, Devellis, & Boehlecke, 1994). The research team consisted of researchers highly proficient in spoken and written Chinese, Korean, and Japanese. The team members who were proficient in a given target language translated the original English version into that language and then back-translated it. The inconsistencies between the back-translated version and the original English version were discussed until the research team reached an agreement. In addition, the translated version in each target language was reviewed and compared with the original English version by experts who are proficient in the target language and who understand the rigor of empirical research. The experts also reviewed the inconsistencies between the translated version and the back-translated version. Afterward, the translated instrument was sent to the collaborators, who were also educators or teacher educators in each country, for further input. The instrument also went through pilot testing with a small group of teachers in each country to ensure that the items were easily understood and culturally appropriate.
Procedures
Individual members of the research team contacted the principals of the recommended schools in each country and asked for permission to collect data from the teachers in those schools. After obtaining permission, the team members visited each school and made a presentation to the teachers on the purpose and procedures of the study, along with instructions on how to fill out the survey. All teachers who volunteered to participate were asked to fill out the survey anonymously. The research team collected the surveys and entered them into a database for further analysis.
Analysis and Results
Given that the factor structure of the TSES has been well studied, CFA models were used to test the factorial structures of both the long and short forms of the TSES for the Chinese, Korean, and Japanese data. Three-factor models were tested for both long and short forms because the three-factor structure was established in Tschannen-Moran and Woolfolk Hoy’s (2001) research and further validated in several earlier studies (Klassen et al., 2009; Nie et al., 2012; Tsigilis et al., 2010). All the CFA models were estimated using the computer program Mplus 6.1. As the TSES item scores were not normally distributed (all the scaling correction factors were more than 1.2), the maximum likelihood mean adjusted (MLM) estimator was applied for all CFA model estimation in this study (Byrne, 2012; Muthén & Muthén, 2012).
The model is considered a good fit when comparative fit index (CFI) and Tucker–Lewis index (TLI) values are more than .95, root mean square error of approximation (RMSEA) is less than .05, and standardized root mean square residual (SRMR) is less than .08 (Hu & Bentler, 1999). The model is considered acceptable when CFI and TLI values are more than .90 and RMSEA is less than .08 (Browne & Cudeck, 1993). The fit index values of the three-factor CFA models for the three groups are presented in Table 2.
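The cutoffs quoted above can be expressed as a small decision rule. The following is a minimal sketch, not the authors' procedure: the function name `classify_fit` and the label "poor" for models meeting neither set of criteria are assumptions made for illustration. The example call uses the fit values reported for the configural model (Model 1) later in the article.

```python
def classify_fit(cfi, tli, rmsea, srmr):
    """Label a CFA solution using the cutoffs quoted in the text.

    The label "poor" is a hypothetical name for models that meet
    neither set of criteria; the article does not name this case.
    """
    # Good fit (Hu & Bentler, 1999): CFI and TLI > .95, RMSEA < .05, SRMR < .08
    if cfi > 0.95 and tli > 0.95 and rmsea < 0.05 and srmr < 0.08:
        return "good"
    # Acceptable fit (Browne & Cudeck, 1993): CFI and TLI > .90, RMSEA < .08
    if cfi > 0.90 and tli > 0.90 and rmsea < 0.08:
        return "acceptable"
    return "poor"


# Fit values reported in the article for the configural model (Model 1)
print(classify_fit(cfi=0.986, tli=0.981, rmsea=0.043, srmr=0.039))  # good
```

Such cutoffs are conventions rather than strict tests, so a rule like this is best read as a summary of the criteria and not as a substitute for inspecting the full set of indices.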
Table 2. Model-Data Fit From CFA for Each Sample (MLM Estimation).
Note. The difference in the sample sizes for long form, short form, and Asian short form was due to the missing data for some items. CFA = confirmatory factor analysis; MLM = maximum likelihood mean adjusted; TLI = Tucker–Lewis index; CFI = comparative fit index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual.
The results showed that the TSES long form three-factor model did not fit well for all three country groups. The short form models’ fit indices for all three countries were acceptable.
The modification index (MI) suggested that freely estimating the loadings of one item in the efficacy for classroom management subscale (“How well can you establish a classroom management system with each group of students?”) on the two other factors (efficacy for instructional strategies and efficacy for student engagement) would significantly improve the model-data fit for all three countries. Although the use of MI in model modification in structural equation modeling (SEM) is controversial, it provides some useful information on cross-loading or overlapping content (Byrne, 2012). The MI showed that this item may cross-load on the other two factors and overlap in content (error term correlation) with the item “How much can you do to calm a student who is disruptive or noisy?” Also considering that group work was not the norm in regular classroom teaching in these three Asian countries, we proposed an 11-item three-factor model by removing this particular item. We further tested the revised 11-item TSES for the three Asian groups. The fit index values of the 11-item model for the three groups are presented in Table 2. Results suggested that the 11-item model provided good model-data fit.
Measurement Invariance Test for the 11-Item TSES
In the following measurement invariance test, the 11-item TSES was used as the baseline model with 4 items loaded on efficacy for student engagement, 4 items loaded on efficacy for instructional strategies, and 3 items loaded on efficacy for classroom management (please refer to Table 4 for more detailed information). All 11 items were the same as in the widely used short form (i.e., the 12-item TSES), with the above-mentioned item removed. The 11-item TSES was used because its content coverage was similar to that of the original 12-item TSES, and it also had very good model-data fit for all three Asian countries. Following the recommendations of Byrne (2012) and Milfont and Fischer (2010), we conducted a series of tests for multigroup equivalence using CFA, and the MLM estimator was used for all the CFA tests. The likelihood ratio (LR) χ² statistics used for model comparisons were computed based on the log likelihood values and scaling correction factors obtained with the MLM estimator. The formulas for calculating the LR χ² statistics are described on the Mplus website (http://statmodel.com/chidiff.shtml).
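As an illustration of the scaled LR computation referred to above, the sketch below follows the log likelihood-based difference formula as it is commonly stated for Mplus output: a difference-test scaling correction cd = (p0·c0 − p1·c1)/(p0 − p1), and a test statistic TRd = −2(L0 − L1)/cd, where L, c, and p are the log likelihood, scaling correction factor, and number of free parameters of the nested (0) and comparison (1) models. The function name and the input values are hypothetical; they are not results from this study.

```python
def scaled_lr_chi2(ll_nested, c_nested, p_nested,
                   ll_comparison, c_comparison, p_comparison):
    """Scaled LR chi-square difference from log likelihoods.

    ll_* : model log likelihood
    c_*  : scaling correction factor reported with the log likelihood
    p_*  : number of free parameters
    The nested model is the more constrained one (fewer free parameters).
    """
    if p_nested == p_comparison:
        raise ValueError("models must differ in number of free parameters")
    # Difference-test scaling correction
    cd = (p_nested * c_nested - p_comparison * c_comparison) / (p_nested - p_comparison)
    # Scaled LR statistic and its degrees of freedom
    trd = -2.0 * (ll_nested - ll_comparison) / cd
    df_diff = p_comparison - p_nested
    return trd, df_diff


# Hypothetical inputs for illustration only (not values from this study)
trd, df_diff = scaled_lr_chi2(-1000.0, 1.2, 10, -990.0, 1.3, 14)
print(round(trd, 3), df_diff)
```

The resulting statistic is referred to a χ² distribution with `df_diff` degrees of freedom. In practice cd can occasionally turn out negative, in which case the statistic is not interpretable and an alternative test is needed.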
Three models were used to test measurement equivalence. Model 1 is the configural model. In this model, CFA was run for three groups together but there was no constraint for any parameters across groups. Model 1 fits the data well (scaling correction factor = 1.385, CFI = .986, TLI = .981, RMSEA = .043, SRMR = .039), indicating invariance of factorial structure among the three groups. In Model 2, equality constraints were imposed on the factor loadings across three groups. The model comparison test was conducted by comparing Model 1 (configural model) and Model 2 (metric equivalence model, equality constraints on factor loadings). The results are presented in Table 3 and the results showed that there was no significant difference in fit between Model 1 and Model 2. These results suggest the factor loadings were equal across Chinese, Korean, and Japanese groups. In Model 3, equality constraints were imposed on both factor loadings and intercepts across three groups. The model comparison test was conducted by comparing Model 2 (equality constraints on factor loadings) and Model 3 (equality constraints on both factor loadings and intercepts). The model comparison results are also presented in Table 3, and the results showed that there were significant differences between Model 2 and Model 3, thus indicating that the intercepts were not equivalent across Chinese, Korean, and Japanese groups. Factor loadings and Cronbach’s αs are presented in Table 4, and factor correlations are presented in Table 5.
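The sequence of nested comparisons described above can be summarized as a simple decision ladder. This is a sketch for illustration only: the function name and the level labels ("configural", "metric", "scalar") are conventional terms chosen here, and each input flag stands for the outcome of the corresponding model test.

```python
def highest_invariance_level(configural_fits,
                             metric_diff_significant,
                             scalar_diff_significant):
    """Return the highest level of invariance supported by the tests.

    configural_fits         : Model 1 fits the data adequately
    metric_diff_significant : Model 1 vs. Model 2 difference is significant
    scalar_diff_significant : Model 2 vs. Model 3 difference is significant
    """
    if not configural_fits:
        return "none"
    if metric_diff_significant:
        return "configural"   # same structure, loadings differ
    if scalar_diff_significant:
        return "metric"       # equal loadings, intercepts differ
    return "scalar"           # equal loadings and intercepts


# Pattern reported in this study: Model 1 fits, Model 1 vs. 2 is not
# significant, Model 2 vs. 3 is significant.
print(highest_invariance_level(True, False, True))  # metric
```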
Table 3. Model-Data Fit From Multiple-Sample CFA.
Note. CFA = confirmatory factor analysis; LR = likelihood ratio.
Table 4. Confirmatory Factor Analysis Results for Factor Loading on the 11-Item TSES.
Note. Factor loadings reported in this table were based on Model 2 (equality constraints on factor loadings). Unstandardized factor loadings were the same across the three groups because of the equality constraints on the unstandardized loadings, whereas standardized factor loadings vary because of the group-specific variances used in their calculation. TSES = Teachers’ Sense of Efficacy Scale; ESE = efficacy for student engagement; ECM = efficacy for classroom management; EIS = efficacy for instructional strategies.
Table 5. Correlations Among the Factors.
Note. Factor correlations reported in this table were based on Model 2 (equality constraints on factor loadings).
Discussion
This study aims to further revise and validate Tschannen-Moran and Woolfolk Hoy’s (2001) teacher efficacy scale (both the 24-item long form and the 12-item short form) in three East Asian cultural contexts. The results from the test of the measurement invariance of the revised 11-item TSES suggest that this scale can be used in cross-cultural studies of teacher efficacy in the East Asian context, and in China, Korea, and Japan in particular. The instrument may yield results that can be compared across the three cultures.
Factorial Structure of Teacher Efficacy Scale
CFA suggests that the 24-item long form TSES did not fit data collected from the three countries. The 12-item short form TSES fitted much better than the 24-item long form scale. However, although the factorial structure of the 12-item short form TSES is stable and consistent across three countries, our analysis found that 1 item (“How well can you establish a classroom management system with each group of students?”) does not fit well. This item is about teachers establishing a classroom management system to work with each group of students. Considering the school context in all three countries where classroom instruction still follows the “one-size-fits-all” model and grouping students for differentiated instruction is not a common practice, it is understandable that this item does not fit well. Therefore, the 11-item scale was proposed based on the 12-item short form TSES, and CFA was further conducted to test the model-data fit. It showed good model-data fit across the three groups.
Invariance Across Three Asian Cultures
The results support configural equivalence: the factor structures are the same across the three groups. Results from the test for metric equivalence of the revised instrument show that the factor loadings were equal across the Chinese, Korean, and Japanese samples, thus permitting comparisons of regression coefficients and covariances across groups.
The results from the test for intercept equivalence of the revised instrument show that the intercepts were not equal across the Chinese, Korean, and Japanese samples. This means that respondents across groups shared the same scale metric but not the same scale origin; that is, respondents with the same score on the latent factors do not necessarily have the same expected score on the observed variables. Therefore, given the nonequivalent intercepts, the scale should be used with caution when researchers want to compare factor means across countries, because intercept invariance is required to compare factor means (Milfont & Fischer, 2010).
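A small numerical sketch may help show why nonequivalent intercepts complicate mean comparisons. In the standard CFA measurement model, the expected score on an item equals the item intercept plus the loading times the latent mean. The values below are hypothetical and are not estimates from this study; the function name is likewise an assumption.

```python
def expected_item_mean(intercept, loading, latent_mean):
    """Expected observed item score under a linear CFA measurement model."""
    return intercept + loading * latent_mean


# Hypothetical groups with the same loading and the same latent mean
# but different item intercepts (values are illustrative only).
group_a = expected_item_mean(intercept=6.5, loading=0.8, latent_mean=0.0)
group_b = expected_item_mean(intercept=7.0, loading=0.8, latent_mean=0.0)

# The observed gap reflects the intercept difference alone,
# not any difference in the underlying efficacy factor.
print(group_b - group_a)  # 0.5
```

Under these assumed values, the two groups differ in observed item means even though their latent means are identical, which is why observed or factor mean differences cannot be attributed to the construct when intercepts are not invariant.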
Limitations and Future Directions
A number of limitations should be noted. First, although the data were collected from three East Asian countries that are under the influence of Confucianism, there may also be variability across the three countries and within each culture, especially Chinese culture. China is a large and fast-developing country. Given that its regions differ vastly in economic conditions and ethnic diversity, there may be significant differences in their adoption of modern educational strategies typically used in Western developed countries such as the United States, Canada, and the United Kingdom.
Second, the current scale shows factor loading invariance across the three countries. However, the intercepts are not equivalent across them. Future research may need to investigate why the intercepts are not equivalent across the countries and how this may affect the interpretation of cross-cultural research findings.
Third, according to Tschannen-Moran and Woolfolk Hoy (2001), the 24-item long form and the 12-item short form TSES are comparable in their psychometric properties; however, this study only supported the use of the short form in the context of the three East Asian countries. The content validity of the long form needs to be examined closely so that only items appropriate for school and classroom contexts in the three Asian countries are kept to effectively measure teacher self-efficacy in East Asian contexts. The current findings should be generalized to other cultural contexts with caution. Future research may also consider examining whether a measurement-invariant scale for teacher efficacy can be developed across Asian and Western cultures, because cross-cultural studies between Asian and Western contexts may yield valuable educational insights.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The first author wants to acknowledge the financial support from the University of Oklahoma through a VPR Summer Research Grant for this research. Other authors received no financial support for the research, authorship, and/or publication of this article.
