Abstract
This study presents an exemplar for psychometric evaluation and modification of established measures when applied to new populations. Specifically, we describe the use of two subscales (Career Decision-Making Self-Efficacy Scale and Math/Science Self-Efficacy Scale) from the Middle School Self-Efficacy Scale (MSSES) as outcome measures in an intervention study of high school students. Several researchers have utilized the MSSES with high school students since it was developed by Fouad, Smith, and Enochs 20 years ago, but few studies have examined it for construct validity with a high school sample, even though the measure was designed for a middle school population. Our findings demonstrated that the MSSES required modification for high school students in order to meet the standards of reliability and validity in a counseling intervention study. The discussion focuses on implications for career counseling and research, limitations of the findings, and suggestions for future research.
Career counseling researchers recommend that counseling research not be published unless researchers have “minimal indications of the reliability and validity of the instrument on which the research is based” (Zytowski & Betz, 1972, p. 78) and recommend construct validity as the most feasible method for establishing new and existing instruments in counseling research (Oliver, 1979). In particular, only valid and reliable outcome scores can indicate the effectiveness of a planned intervention in experimental research. Throughout the research community, investigators agree that using an existing measure in a contextually different manner (e.g., with a new population) requires a pilot study of reliability and validity before employing said measure in a new study (Kimberlin & Winterstein, 2008; Switzer, Wisniewski, Belle, Dew, & Schultz, 1999). Modifications to an existing measure may be necessary if the original validity study for an outcome measure employed a different sample than the current research (Stewart, Thrasher, Goldberg, & Shea, 2012).
The purpose of this research is to present an exemplar validity study of an established measure in career counseling, the Middle School Self-Efficacy Scale (MSSES; Fouad, Smith, & Enochs, 1997) using a sample of high school students. Results of this study were used to modify the MSSES for an intervention study using a similar sample (Falco & Summers, 2017). The following sections describe the need for measures specific to social cognitive career theory (SCCT), foundations for developing the MSSES with previous reliability/validity information, the importance of testing validity on existing measures in new populations, and the need for assessment studies whenever modifications on existing measures are made.
SCCT
Self-efficacy refers to individuals’ perceptions about their capabilities for learning or performing tasks within specific domains. Since Bandura (1977, 1997) introduced the construct of self-efficacy, researchers have explored its role in various contexts including career development. In social cognitive theory, self-efficacy is said to influence behaviors and environments and, in turn, is influenced by them (Bandura 1986, 1997). Students with strong self-efficacy are more likely to set goals and create adaptive learning environments for themselves. Likewise, self-efficacy can be influenced by the outcomes of such behaviors (goal progress, achievement) and by input from the environment (e.g., feedback from teachers, social comparisons with peers). Bandura (1997) theorized that people acquire their self-efficacy beliefs from four sources: interpretations of performances, vicarious (modeled) experiences, social (verbal) persuasion, and physiological indexes (emotional arousal). It is generally agreed that even very young children differentiate their beliefs of competence and task value in different domains of functioning (e.g., Eccles et al., 1993; Marsh, Craven, & Debus, 1991). Studies with middle and high school students often assess students’ motivational orientations toward specific academic domains, with an understanding that they hold more or less differentiated perceptions toward these areas. High school students have more academic experience, which can help them better attune to the demands and possibilities of each domain, which would in turn contribute to finer differentiation between domains. In particular, as a result of their heavier concern with future college majors and career choices, high school students are believed to hold more differentiated task-value beliefs compared with middle school students (Bong, 2001).
Bandura’s (1997) self-efficacy construct, situated within social cognitive theory, is considered to have broad and important implications for career development theory and counseling (Betz, 2000). It has generated a great deal of research on career behavior in large part because of its explanatory potential. For example, Hackett and Betz (1981) first used self-efficacy theory, within a career development context, to explain women’s avoidance of math and science careers. Their subsequent studies (e.g., Betz & Hackett, 1981, 1983) oriented researchers to the ways in which self-efficacy can be used to understand, more broadly, individuals’ vocational behaviors.
Similar to Bandura’s (1986) social cognitive theory, SCCT (Lent, Brown, & Hackett, 1994) posits that self-efficacy is developed through learning experiences that interact with person/environment variables such as gender, ethnicity, social support, and barriers. Within the SCCT, individuals’ choices to pursue or avoid certain academic coursework and careers can be understood as the interplay between their self-efficacy beliefs, outcome expectations, and interests. SCCT posits that career choice behavior is influenced by outcome expectancies, interests, and career self-efficacy. The theory proposes an interactional influence of external/environmental factors and individual/cognitive variables on individuals’ career development. Within this model, one’s background influences one’s learning experiences which influence self-efficacy. Self-efficacy shapes one’s interest and outcome expectations which, ultimately, influence career choice (Lent, Brown, & Hackett, 2000).
The concept of self-efficacy has important theoretical, empirical, and practical applications because of the nomological network in which it is embedded (Cronbach & Meehl, 1955). Specifically, self-efficacy expectations have at least three behavioral outcomes (approach or avoidance, performance, and persistence), and it is these outcomes that make perceived self-efficacy an important explanatory construct (Betz, 2000; Klassen & Usher, 2010). By focusing on the relationship between cognitions, behaviors, and other environmental factors that, presumably, are malleable and responsive to intervention, SCCT provides researchers and practitioners a heuristic for both understanding and assessing the core constructs. Because of its precise nomological network, SCCT provides a mechanism for (1) identifying specific variables (such as self-efficacy) for intervention and (2) identifying specific outcomes (such as career choice behaviors) for assessment.
In this vein, SCCT has received much attention in the literature because of its prominent role in the implementation of career development interventions and the assessment of such interventions (Betz, 2004; Betz & Luzzo, 1996). Several studies evaluating career or vocational guidance interventions have demonstrated increases in self-efficacy by addressing one or more of the sources of self-efficacy originally proposed by Bandura (1997) including verbal persuasions, previous performance accomplishments, vicarious or observational learning, and emotional arousal (see Betz & Schifano, 2000; Falco & Summers, 2017; Hackett & Betz, 1981; Luzzo, Hasper, Albert, Bibby, & Martinelli, 1999; Sullivan & Mahalik, 2000).
However, as Betz and Hackett (2006) caution, researchers and practitioners interested in the application of SCCT must fully understand its meaning and implications, particularly with regard to measuring its constructs. This suggests that any measure attempting to assess constructs within the SCCT must consider the domain specificity of the behaviors being examined while also considering traditional methods of evaluation such as factor structure, internal consistency, and construct validity based on such concepts as Cronbach and Meehl’s (1955) nomological network. It is important to keep in mind that a score is valid if what was intended to be measured was in fact measured. Validity refers to the degree to which evidence supports the inferences that are drawn from the measurement instruments or procedures (e.g., interventions) themselves. Therefore, a particular score derived from an established measure may be valid for one purpose or population but have little or no validity for another (Gullickson & Howard, 2009; Yarbrough, Shulha, Hopson, & Caruthers, 2010). For the purposes of our own intervention study (Falco & Summers, 2017), we wished to use Fouad, Smith, and Enochs’s (1997) MSSES to investigate change over time and between an experimental and control group of high school girls. Girls were particularly salient in our evaluation of the MSSES because, compared to boys, their self-efficacy for learning STEM content and pursuing STEM careers tends to decline in adolescence (Bandura, Barbaranelli, Caprara, & Pastorelli, 2001). Our intervention was specifically designed to support and increase girls’ sense of self-efficacy for STEM (Falco & Summers, 2017). However, the MSSES had not been evaluated for construct validity among high school samples, even though several studies have used the measure with similar samples. The following section provides a review of the MSSES and a rationale for conducting further validity testing with an established measure.
The MSSES
The MSSES (Fouad et al., 1997) is widely used in career counseling interventions for adolescents and was originally developed to assess a career-related self-efficacy intervention for Hispanic and Latino middle school students. The instrument consists of 46 scale-response items total with two subscales (24 items) designed to specifically measure aspects of self-efficacy: career decision-making self-efficacy (CDMSE), a process variable (12 items), and math/science (STEM; Science, Technology, Engineering, and Math) self-efficacy (MSSE), a content variable (12 items). Responses are obtained using a 5-point Likert-type scale asking students to rate the degree to which they agree or disagree with a series of statements ranging from strongly agree (1) to strongly disagree (5).
When Fouad and her colleagues (1997) first developed and established the MSSES, items for CDMSE were modified from the CDMSE Scale (CDMSES; Taylor & Betz, 1983). Items in the Math and Science Self-Efficacy subscale followed the format used in the Math Tasks subscale of the MSSES (Betz & Hackett, 1983; Lent, Lopez, & Bieschke, 1993). A reliability and validity analysis was first conducted separately for the process items (that included the CDMSE) and the content items (that included the MSSES) to determine their initial factor structure. Fouad and her colleagues concluded that all 12 items from the CDMSE formed a distinct factor, while only the math items from the MSSES held up under the scrutiny of a validity test. Second, remaining items of the process and content scales were combined to determine whether they were distinct constructs from one another, and the results concluded this to be true. Criterion validity was established by calculating subscale means and applying the instrument to an intervention. Fouad and her colleagues claim that their study demonstrated adequate reliability and validity of their instrument and that these scales measure outcomes of intervention programs designed to promote career decision-making and math/science career awareness among middle school students, particularly for female and minority students. It appears as though the three recommended steps for establishing construct validity of their instrument were followed using structural validity techniques (Clark & Watson, 1995; Cronbach & Meehl, 1955): “(a) articulating a set of theoretical concepts and their interrelations, (b) developing ways to measure the hypothetical constructs proposed by the theory, and (c) empirically testing the hypothesized relations among constructs and their observable manifestations” (Clark & Watson, 1995, p. 310).
Since the publication of the MSSES, other researchers have used it in a myriad of studies with varied selection of items and samples (see Table 1). By conducting a search of published articles and dissertations citing Fouad et al. (1997), we found that 10 studies used some version of the CDMSE: 6 were for middle school samples, 1 was a middle–high school sample, and 3 were a high school sample. We found six studies used some version of the MSSES: four were a middle school sample, one was a high school sample, and one was a college sample. Only one study used both the CDMSE and the MSSES for a middle school sample. These studies are summarized in Table 1, which provides a description of the items, samples, reliability estimates, and validity evidence. We chose to use the MSSES because it was designed to measure STEM career self-efficacy using SCCT as a framework. Our intervention was designed to support STEM career self-efficacy of high school girls incorporating the four sources of self-efficacy while also addressing perceived barriers, including gender issues in career development (Falco & Summers, 2017). Low self-efficacy and career indecision, together, may create important psychological barriers to girls’ choice and persistence in career decision-making, especially for traditionally male-dominated occupations in STEM. At the time of the intervention, the MSSES was established and widely used, and it seemed to be the best measure available to tap these constructs (MSSE and career decision self-efficacy [CDSE]) for adolescents.
Summary of Research Using Career Decision-Making Self-Efficacy (CDSE), Math/Science Self-Efficacy (MSSE), or Both From the Middle School Self-Efficacy Scale.
As evident from Table 1, most of the published studies and dissertations found reliability evidence for their samples, but few conducted a validity analysis on self-efficacy scores from the MSSES, even when their sample demographics differed from that of Fouad’s. According to the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014), Standard 1.4 states, “If a test score is interpreted for a given use in a way that has not been validated, it is incumbent on the user to justify the new interpretation for that use, providing a rationale and collecting new evidence, if necessary” (p. 24).
However, many of the studies in Table 1 simply cited the evidence from the Fouad et al. study as evidence of validity. Even Fouad and Guillen (2006) recommend that because there have been many adaptations of the Fouad et al. (1997) instrument without replication of the studies, the reliability and validity of the adapted instruments need to be further evaluated.
Evaluating Validity of Existing Measures for New Populations
Consistent with the concept of measurement evolution is the perspective that the validity of the measure is not a property of a test or measure, but of a measure tested under a particular set of conditions (Messick, 1995; Sechrest, 2005). This perspective holds that because validity pertains to understanding the meaning of scores, construct validity can only be established incrementally based on the accumulation of evidence on how the measure relates to other measures (Sechrest, 2005). Well-designed modifications thus contribute to enhancing the validity of measures in new and diverse populations. According to Zumbo (2006), “construct validity involves generalizing from our behavioral or social observations to the conceptualization of our behavioral or social observations in the form of the construct” (p. 49). Tests of construct validity are strong when they are based on well-articulated theory and well-planned empirical tests. When trying to establish (or reestablish) construct validity of a measure, factor analytic techniques are useful in determining whether a group of items hypothesized to assess a construct actually do cluster together when they are analyzed with items from other scales and whether items within a measure describe a unified construct (Cronbach & Meehl, 1955). Specific procedural recommendations for establishing construct validity in survey research are as follows (Bohrnstedt, 2010, p. 377): (1) do an exploratory factor analysis (EFA) of all items in an initial pool; (2) retain enough factors (m) to explain the covariation among the items using fit statistics as a guide; (3) when m > 1, examine both the rotated and unrotated solutions to determine whether factors beyond the first are substantively meaningful or unwanted “nuisance” factors; (4) remove items that are poorly related to every factor or that clearly represent more than one domain; and (5) refactor the remaining items using confirmatory factor analysis (CFA) to verify that they are congeneric or near congeneric.
If a measure does not perform in the above steps as expected, it behooves researchers to make modifications to the measure to meet the standards of validity and reliability while maintaining theoretical integrity of the instrument. According to Stewart, Thrasher, Goldberg, and Shea (2012), a tremendous amount of information would be gained on how various modifications affect the reliability and validity of measures in new populations, as well as point to new strategies and methods for test and measurement assessment, if assessment studies of measure modifications were published. Stewart and his colleagues suggest that researchers should provide details of the modification and its assessment in a separate methods paper by reporting: (1) features of the original measure that required modification, (2) source of information on the basis for modifications, (3) specific type of modification made, and (4) how the modified measure was tested for psychometric adequacy and results.
The recommendation in measurement research is to test for the applicability of existing measures for a new sample by following the aforementioned process (EFA followed by CFA), as well as testing factor invariance across groups (Sass, 2011; Van de Schoot, Lugtig, & Hox, 2012). Examining measurement invariance involves evaluation of the latent variable model underlying a set of test scores and testing for numerical equality across groups. Latent variables are the explicit definitions of psychological constructs (Byrne, Shavelson, & Muthén, 1989; Widaman & Reise, 1997). When researchers are concerned only with the extent to which an instrument is equivalent across independent samples, measurement equivalence generally focuses solely on the invariant operation of the items and, in particular, on the factor loadings (Byrne, 2012). Should a researcher be interested in subsequently testing for latent factor mean differences, then tests for measurement equivalence must include a test for the equality of the observed variable intercepts, as such equality is assumed in tests for factor mean differences. This was an important consideration in our analysis because many researchers listed in Table 1 have tested for gender differences using the MSSES, but it has not been established that the measure is valid for boys and girls using factor loadings invariance and/or testing for latent factor mean differences. Therefore, our research set out to test the structural validity of the MSSES with a sample of high school students in hopes that these results would be applicable in a separate intervention study with a similar sample (Falco & Summers, 2017).
Method
Participants
Paper and pencil surveys were administered to 368 tenth graders attending a medium-sized, public high school in southeastern Arizona (47% male; 31% White, 27% Latino/a, 24% Asian American, 1% Native American, 1% African American, and 13% mixed race or Other). Data were collected anonymously by the school counselor. The sample was randomly divided into two groups, so that we could run exploratory factor analyses on a sample separate from the CFAs (see Table 2 for descriptive data of each sample).
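As an illustration only, a random division of a sample into an exploratory and a confirmatory subsample might be sketched as follows. The function name, the seed, and the use of sequential participant IDs are hypothetical, and the even split shown here does not reproduce the subsample sizes reported in this study (the exploratory subsample had n = 182).

```python
import random

def split_sample(participant_ids, seed=0):
    """Shuffle IDs reproducibly and cut the list into two halves.

    Hypothetical helper: the article does not describe how the
    random division was implemented.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)   # fixed seed so the split can be repeated
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

# 368 surveys, identified here by made-up sequential IDs 1..368
efa_ids, cfa_ids = split_sample(range(1, 369), seed=2017)
print(len(efa_ids), len(cfa_ids))
```

Keeping the two subsamples disjoint is the point of the design: the items retained in the exploratory step are then confirmed on respondents who did not inform that step.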
Sample Demographics.
Measures
CDMSE
Fouad et al. (1997) adapted their measure of career decision-making for middle school students from the items originally developed for adult populations by Taylor and Betz (CDMSE: Taylor & Betz, 1983; CDMSES–Short Form: Betz, Klein & Taylor, 1996), particularly as they relate to the “process items” in Part I of the Fouad et al. (1997) scale. Items from this and all other subscales begin with the prompt, “Please indicate the degree to which you agree or disagree that you could do each statement below by writing the appropriate number to the right of each statement” using the following rating system: 1 = very high ability, 2 = high ability, 3 = uncertain, 4 = low ability, and 5 = very low ability (item values were reversed after scoring such that a higher value indicated higher self-efficacy). We specifically administered the 12 self-efficacy items of the CDMSE measure, which had an overall reliability estimate of α = .77 for our initial sample (n = 182). See Table 3 for a complete list of items.
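The reverse scoring and the internal consistency estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions: the response matrix is made up (it is not study data), the function names are hypothetical, and the coefficient is the standard Cronbach's α, k/(k − 1) × (1 − Σ item variances / variance of the total score).

```python
from statistics import variance

def reverse_score(response, low=1, high=5):
    """Flip a rating so that a higher value means higher self-efficacy."""
    return low + high - response

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                       # number of items
    item_columns = list(zip(*rows))        # one tuple of scores per item
    item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var / total_var)

# Made-up ratings from five respondents on three items (1 = very high ability)
raw = [[1, 2, 1], [2, 2, 3], [4, 5, 4], [5, 4, 5], [3, 3, 2]]
scored = [[reverse_score(x) for x in row] for row in raw]
alpha = cronbach_alpha(scored)
print(round(alpha, 3))
```

Because every item is reversed in the same way, reversal changes the direction of the scores but not the value of α.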
Descriptive Statistics, Standardized Loadings for Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA), and Solution Uniqueness for the Career Decision-Making Self-Efficacy Scale.
Note: Gray shadings indicate items retained for the current study. *p < .05; **p < .01; ***p < .001
MSSE
Fouad et al. (1997) wrote these items in a format similar to that of the items from the Math Tasks subscale of the MSSES (Betz & Hackett, 1983; Lent et al., 1993), and the subscale was intended to measure MSSE beliefs for middle school–related tasks. Students were asked to rate their level of agreement/disagreement with each item using the following rating system: 1 = strongly agree, 2 = agree, 3 = uncertain, 4 = disagree, and 5 = strongly disagree. We specifically administered the 12 self-efficacy items of the MSSE, which had an overall reliability estimate of α = .86 for our initial sample (n = 182). See Table 4 for a complete list of items.
Descriptive Statistics, Standardized Loadings for Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA), and Solution Uniqueness for the Math/Science Self-Efficacy Scale.
Note: Gray shadings indicate items retained for the current study. *p < .05; **p < .01; ***p < .001
Analyses
CFAs of an instrument are most appropriately applied to measures that have been fully developed and their factor structures validated (Byrne, 2012). However, Fouad et al.’s original factor structure was tested and recommended for middle school students who were predominantly Latino/a, even though it has been used widely for other samples (see Table 1). For our purposes, we wanted to validate the measure for 10th-grade students attending public high school in a midsize southwestern city, since our intervention study was conducted with this population (see Falco & Summers, 2017). Therefore, we decided to begin with an EFA for self-efficacy items on a separate sample before conducting a CFA, a method commonly used in scale development research to establish construct validity (Worthington & Whittaker, 2006). Once the factor structure was established with CFA, we followed up with a test of structural invariance (including latent factor mean invariance) by comparing the equivalence of responses for boys and girls, then selected items with structural validity for girls only, since this was the population used in our intervention (Falco & Summers, 2017).
Results
Exploratory Factor Analysis
From the full sample of 368 tenth graders, we randomly selected approximately half (n = 182; 86 boys and 96 girls) to conduct two exploratory factor analyses, the first for the 12 items from the CDMSE and the second for the 12 items from the MSSE. For each scale, we tested one- and two-factor solutions in Mplus version 8 (Muthén & Muthén, 2012) based on the number of eigenvalues >1 (Worthington & Whittaker, 2006). The data were normally distributed for each of the two scales and many items appeared to be correlated; thus, we used maximum likelihood estimation, applied oblimin rotation, and analyzed each model for goodness of fit (see Tables 3 and 4). The two-factor model fit the data best for both the CDMSE and the MSSE, with a significant χ2 difference test between the one-factor and two-factor models: CDMSE, Δχ2(11) = 240.191, p < .001; MSSE, Δχ2(11) = 255.410, p < .001. Item 8 from the CDMSE did not load significantly on either factor, so it was not included in subsequent analyses. Items 4 and 5 from the MSSE cross-loaded significantly on both factors, so they were also excluded from subsequent analyses. On initial examination, the exploratory factor structure for our sample differs from that of Fouad et al. (1997), who found in their CFA that all 12 CDMSE items loaded on one factor and that only the math items (6 items) from the MSSE loaded on one factor (Fouad et al. did not start with an EFA).
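The χ2 difference tests reported above can be checked with a short sketch. It assumes the usual degrees-of-freedom formula for a maximum likelihood EFA with p items and m factors, df = [(p − m)² − (p + m)] / 2, and computes the upper-tail χ2 probability from a closed-form series for integer degrees of freedom. The function names are hypothetical; this is not the Mplus procedure itself.

```python
import math

def efa_df(p, m):
    """Degrees of freedom of an ML exploratory factor model (p items, m factors)."""
    return ((p - m) ** 2 - (p + m)) // 2

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # odd df: two-sided normal tail plus a finite correction series
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)   # builds chi^(2r-1) / (1*3*...*(2r-1))
        total += term
    return tail + 2 * pdf * total

delta_df = efa_df(12, 1) - efa_df(12, 2)   # one- vs. two-factor model, 12 items
p_cdmse = chi2_sf(240.191, delta_df)       # reported CDMSE difference
p_msse = chi2_sf(255.410, delta_df)        # reported MSSE difference
print(delta_df, p_cdmse < .001, p_msse < .001)
```

With 12 items, moving from one to two factors costs 11 degrees of freedom, which matches the Δχ2(11) reported for both scales.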
CFA With Tests for Invariance
Using the items retained from the EFA, we conducted separate CFAs for the CDMSE and the MSSE, with two factors each. For the CDMSE, we named the factors “efficacy for self-appraisal” for Items 1–6 and “efficacy for exploring options” for Items 7 and 9–12. For the MSSE, we named the factors “math/science grades self-efficacy” for Items 1–3 and “math/science content self-efficacy” for Items 6–12. All items loaded significantly (see Tables 3 and 4), and there were no suggested constraints to be released for boys and girls. Neither model met the criteria for good fit, namely a comparative fit index (CFI) ≥ .90, a Tucker–Lewis index (TLI) ≥ .90, a standardized root mean square residual (SRMSR) ≤ .08, a root mean square error of approximation (RMSEA) ≤ .05, and a nonsignificant χ2 (Browne & Cudeck, 1992; Hu & Bentler, 1999; Kline, 2015): CDMSE, χ2(86) = 229.788, p < .001, CFI = .804, TLI = .795, RMSEA = .134 (90% confidence interval [CI] = [.114, .156]), SRMSR = .099; MSSE, χ2(106) = 200.656, p < .001, CFI = .867, TLI = .862, RMSEA = .098 (90% CI [.077, .119]), SRMSR = .133. Additionally, the latent mean of the factor we labeled “self-appraisal” from the CDMSE was significantly higher for boys than for girls (standardized estimate of latent mean difference = 3.513, p < .001). Upon examination of each item’s contribution to solution uniqueness, we noticed that the values for Items 1 and 6 from the CDMSE were somewhat low for girls (R2 = .157 and R2 = .134, respectively), and the value for Item 2 from the MSSE was low for both boys (R2 = .144) and girls (R2 = .149). Although these items loaded significantly and were estimated as equivalent (i.e., invariant) between boys and girls, we wanted to use the best and most parsimonious set of items for our own study (which included only girls) and to describe our process in case other researchers decide to use modified versions of the MSSES in the future (Worthington & Whittaker, 2006).
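To make the fit evaluation concrete, the sketch below recomputes RMSEA from the reported χ2 values and screens the reported indices against the cutoffs named above, reading the SRMSR and RMSEA cutoffs as upper bounds. Several things here are assumptions rather than facts from the text: the confirmatory subsample size of 186 is inferred as 368 − 182 and is not stated in the article; the formula √G × √[max(χ2 − df, 0) / (df × N)] is a common multiple-group form of RMSEA (some programs use N − 1 instead of N); and the function names are hypothetical.

```python
import math

def rmsea(chi2, df, n, groups=1):
    """Point estimate of RMSEA, scaled by sqrt(groups) for multigroup models."""
    return math.sqrt(groups) * math.sqrt(max(chi2 - df, 0.0) / (df * n))

def meets_cutoffs(cfi, tli, rmsea_value, srmr):
    """Screen fit indices against the cutoffs listed in the text."""
    return {
        "CFI": cfi >= .90,
        "TLI": tli >= .90,
        "RMSEA": rmsea_value <= .05,
        "SRMSR": srmr <= .08,
    }

N_CFA = 186  # ASSUMPTION: 368 total minus the 182 used for the EFA

cdmse_rmsea = rmsea(229.788, 86, N_CFA, groups=2)    # boys and girls
msse_rmsea = rmsea(200.656, 106, N_CFA, groups=2)

cdmse_fit = meets_cutoffs(.804, .795, cdmse_rmsea, .099)
msse_fit = meets_cutoffs(.867, .862, msse_rmsea, .133)
print(round(cdmse_rmsea, 3), cdmse_fit)
print(round(msse_rmsea, 3), msse_fit)
```

Under these assumptions the recomputed values round to the RMSEA estimates reported above, and neither model passes any of the four cutoffs.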
Deciding on Measures for Our Intervention Research
Because our related intervention study was conducted with only girls, we ran a single CFA with just the girls from the second sample, using a modified version of the “exploring options” factor of the CDMSE (retaining the 5 items that had strong R2 values) and a modified version of the “content self-efficacy” factor of the MSSE. Results indicated that the model fit the data well (Hu & Bentler, 1999; Marsh, Hau, & Wen, 2004), and we were confident using this version of the measure in our intervention study of girls attending the same school as our validation sample: χ2(43) = 103.570, p < .001, CFI = .921, TLI = .900, RMSEA = .085 (90% CI [.064, .106]), SRMSR = .071.
Discussion
It is recommended by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (2014) that when researchers use an instrument outside of its original intention or for a different population, a justification for validity must be made. Although the MSSES demonstrated evidence of construct validity for middle school students when it was first developed (Fouad et al., 1997), it has been modified and/or used for dissimilar populations without evidence of validity in several studies. For our own purposes, we wanted to demonstrate that the subscales we used from the MSSES were valid for the diverse population in our intervention study (see Falco & Summers, 2017). By conducting a separate validity study, we hope to provide other researchers with an example of this process. Our results contribute to the career development literature by providing new information on how modifications of the widely used MSSES affect the psychometric qualities of scores derived from the measure, having implications for its use in intervention studies. Therefore, our discussion is organized into two main areas of focus. First, we address the general importance of evaluating and reporting psychometric properties of modified instruments in counseling outcome research, then we address the specific findings of the present study, including the implications for practice and further research.
As Zytowski and Betz (1972) point out, counseling research is comprised of a number of discrete elements including theory, hypothesis development, design, participants, controls, and measurement. Our need to use and modify the MSSES (Fouad et al., 1997) arose from the need to evaluate the effectiveness of a career development intervention to improve CDMSE and STEM self-efficacy for high school girls. The intervention design utilized SCCT (Lent et al., 1994), and we hypothesized that CDMSE and STEM self-efficacy would increase for girls who participated in the small group intervention compared with girls in a no-treatment control group. In order to evaluate the effectiveness of the intervention for improving these specific outcomes, it was important that we measure them as accurately as possible. We chose the MSSES because it is an established and widely used measure, specifically developed for assessing the effect of intervention programs designed to promote career decision-making and math and science career awareness. Coupled with a need to measure the impact of such interventions in more precise and meaningful ways (Oliver, 1979; Whiston & Quimby, 2009; Zytowski & Betz, 1972), results of career intervention studies can neither advance theory nor improve practice unless they include a more complete picture of both the process of the intervention and the outcome(s) used to gauge the effectiveness of the intervention.
Because validity is context specific (Switzer et al., 1999), instruments that are valid for one purpose in one context may not be valid in another context. As such, it stands to reason that validity should be viewed as a process of accumulating evidence that supports the meaningfulness of a measure rather than as a discrete end point at which validity is “proven” (Stewart & Ware, 1992). An important point made by Zytowski and Betz (1972) was that information on reliability and validity should be reported routinely on all instruments used in counseling research. It is most efficient to use already developed measures for which reliability and validity have been established, but researchers and practitioners must be aware that, in order to obtain comparable results, it is necessary to use the instrument in the same way and for the same populations as the original researcher(s). Frequently, career counseling outcome studies use only portions of established instruments, something we found in our own review of the MSSES (Fouad et al., 1997; see Table 1). Unless the reliability and validity of a modified instrument can be established, there is no assurance that it measures the same construct or is as reliable as the original instrument. There can also be no assurance that the intervention being evaluated is having the desired effect on the outcome of interest.
For our sample of high school students, we found that the structure of both the CDMSE and Math/Science Self-Efficacy subscales differed considerably from the structure proposed by Fouad et al. (1997). First, we conducted an EFA to see whether the factor structure that Fouad et al. proposed for the middle school sample held for our high school sample, even though Fouad et al. did not conduct an EFA in their study. The CDMSE subscale appeared to be measuring two constructs for our sample, which we labeled “self-appraisal” and “exploring options.” Perhaps this difference is evident among high school students because they are closer to making professional development decisions than middle school students and are able to make subtle distinctions between evaluating what they are capable of and what needs to be done. In addition, the Math/Science Self-Efficacy items did not divide into science- and math-related factors but rather into Performance and Content scales, each of which contained both math and science items. These findings remain consistent with theory but suggest that, for our study sample, the items in the scale are measuring task specificity related to math and science. This has important implications because, as Betz (2000) notes, uncovering this distinction helps researchers and practitioners better understand the nature of people’s choices. It also sheds light on the nature of the decision-making process for those interested in designing career development interventions.
Second, once we discerned the appropriate structure of the CDMSE and Math/Science Self-Efficacy subscales for high school students, we continued with a CFA and examined structural invariance of the instruments between boys and girls with a second sample. Results showed that although all the factor loadings were significant and no constraints needed to be released, a few items had lower contributions to the overall solution uniqueness, particularly for girls. The overall fit of both the CDMSE and Math/Science Self-Efficacy models was also below what is recommended as acceptable, but this might have been due to a small sample size (Byrne, 2012; Marsh et al., 2004). Additionally, the RMSEA may be considered acceptable for smaller data sets in educational settings (Browne & Cudeck, 1992), and χ2 is known to be sensitive to larger correlations (Kline, 2015; Tanaka, 1993). Finally, we found latent mean differences between boys and girls for the “Self-Appraisal” subscale of the CDMSE, which would therefore be appropriate for examining differences between boys and girls in a future study. For our own purposes, we ran a second CFA with only girls to ensure that the subscales we wanted to use were valid and that the model had adequate fit.
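To make the role of sample size in these fit indices concrete, the sketch below computes the RMSEA point estimate from a model χ2, its degrees of freedom, and the sample size, using the common formula RMSEA = sqrt(max(χ2 − df, 0) / (df × (N − 1))). This is an illustration under stated assumptions rather than the analysis reported here: some software divides by N instead of N − 1, the function name `rmsea` is hypothetical, and the input values are invented, not taken from our data.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square.

    Uses sqrt(max(chi_square - df, 0) / (df * (n - 1))).
    Assumption: the (n - 1) convention; some programs use n instead.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    # A chi-square at or below its degrees of freedom yields RMSEA = 0.
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Hypothetical values: the same chi-square and df at two sample sizes.
small_sample = rmsea(100.0, 50, 201)
large_sample = rmsea(100.0, 50, 401)
print(round(small_sample, 3), round(large_sample, 3))
```

Holding χ2 and df fixed, the estimate shrinks as N grows, which is one reason fit values obtained from small samples are usually interpreted with caution.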
Klassen and Usher (2010) have cited numerous measurement issues in self-efficacy research and have called for a stocktaking of the directions and domains of current self-efficacy research, so that measurement problems can be addressed. Evaluating established measures of self-efficacy for validity among different populations may be part of this stocktaking process (Switzer et al., 1999). We suspect that the differences between Fouad et al.’s results and ours arise because children’s ability-related beliefs and values become more negative in many ways as they get older, at least through early adolescence. This would explain why the high school students differentiated between more specific types of CDMSE and math–science self-efficacy beliefs than middle school students did.
Older adolescents tend to believe they are less competent in many activities and often value those activities less. These differences are more pronounced in certain activity areas. The negative changes in adolescents’ achievement-related beliefs and values have been explained in two major ways. First, adolescents become much better at understanding and interpreting the evaluative feedback they receive and engage in more social comparison with their peers. As a result of these processes, many adolescents become more accurate or realistic in their self-assessments, so that their beliefs become relatively more negative (see Stipek & Mac Iver, 1989, for a thorough discussion of how children’s processing of evaluative information changes). Second, the school environment changes in ways that make evaluation more salient and competition between students more likely, thus lowering some children’s achievement beliefs (Pajares, 1996; Urdan & Pajares, 2006).
Although Fouad and her colleagues were careful to design a measure congruent with self-efficacy theory, and were careful to validate scores derived from their new measure using original guidelines for establishing construct validity (Clark & Watson, 1995; Cronbach & Meehl, 1955), methods for invariance testing were not widely known or utilized at the time their measure was published. Furthermore, many researchers using the MSSES in their own investigations did not demonstrate validity evidence for their own samples (Table 1). This is particularly important when selecting items apart from the established subscale and/or using the measure for dissimilar samples. Future researchers and/or practitioners may prefer a more parsimonious survey based on our analysis of the Fouad et al. scale. However, we suggest that additional research be conducted on varied middle school and high school samples to further explore the psychometric properties of this measure for future use.
Conclusion
Research findings suggest that CDSE is an important variable associated with making and implementing career decisions, and this has pointed to the need for the development and evaluation of counseling interventions designed to increase CDSE and related behaviors (Bergeron & Romano, 1994; Betz & Luzzo, 1996). Hackett and Betz (1992) explained, “there is a compelling need to determine the usefulness of self-efficacy theory in enhancing career development and broadening career choices” (p. 241). Lent and Brown (2006) claim that adequate tests of any theory are predicated on the availability of reliable and valid measures of the theory’s core constructs. In other words, theorists, researchers, and practitioners each have a vested interest in utilizing measures that are precise and valid. Without sound measures, it is difficult if not impossible to establish whether theory-discrepant findings are due to problems with the theory, flaws in operationalizing it, or inadequacies in enacting it in the form of an intervention (or all of the above). Widely used measures of self-efficacy within the career development literature tend to utilize well-established measurement techniques for scale construction and initial validation. However, as our findings demonstrate, evaluating construct validity for established measures is critical when researchers and/or practitioners are interested in using such measures for dissimilar samples, particularly to assess the outcomes of an intervention.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
