Abstract
Many of the Holland-based interest assessments differ in the number of answer options they present to clients, with some providing clients more sensitivity with which they can indicate their level of interest. Following anecdotal client comments, a study was developed to determine whether significant changes in inventory results occurred based on the number of answer options presented, while test items remain consistent. Two versions of the Self-Directed Search (SDS)–Fifth Edition were presented to 553 participants across two subsamples (312 Mechanical Turk and 241 college students). The published version of the SDS that presents clients with two response options was used as well as an altered version presenting clients with five-answer options. The internal consistency and profile stability across versions were explored. Statistically significant differences in internal consistency were found. Moderate-to-high profile stability for individuals and across test versions was detected. Implications for future research and practice are discussed.
The Self-Directed Search (SDS) was originally developed by John Holland in 1970 and used to measure personality or career interests using his realistic, investigative, artistic, social, enterprising, and conventional (RIASEC) categorization (Brown, 2001). The SDS is now in its fifth edition (Holland & Messer, 2013a). The fifth edition involved an extensive revision process that also attempted to stay true to the original inventory developed by Holland. The revision from the fourth to fifth edition included maintaining the same self-scoring format and interpretive features of previous versions, updating item content, and improving item reliability and validity. A standardization sample of 1,739 people matched to U.S. census demographics was used to explore the reliability and validity of the new version (Holland & Messer, 2013b). The revision team maintained the 2-point response format used through the majority of the SDS, despite their competitor’s decision to move the Strong Interest Inventory (SII) from a 3- to a 5-point answer format in their 2004 revision (Donnay, Morris, Schaubhut, & Thompson, 2004). The response option change on the SII, as well as evidence that test takers prefer more response options to express themselves (Preston & Colman, 2000), prompted the current research which sought to examine the reliability of SDS scores as a result of five-answer options as compared to two.
Interest Measurement With the SDS
There is a long history of measurement of vocational interests dating back to the early 1920s (Hansen, 1994) and since this time, a number of measures of vocational interests have been developed (Hansen, 2005). Despite a number of interest inventories being available, the SII and SDS have persisted as the more popular measures used in research and practice (Larson, Bonitz, & Pesch, 2013). Research by Savickas, Taber, and Spokane (2002) comparing the SDS with four other measures of interests found that SDS and SII scores were the best measures of Holland’s RIASEC themes, likely explaining their popularity.
As in previous versions, the SDS fifth edition at its most basic level yields a three-letter Holland code based on the RIASEC categorization for each test taker. It is assumed most readers are familiar with the RIASEC typology, but more information can be found in Holland (1997) or Reardon and Lenz (1998) for those looking to learn more. The SDS fifth edition includes five subscales including an occupational daydreams section in which test takers can list those occupations they have thought or dreamed about. The RIASEC code (e.g., SAE and RIC) is derived from the other four subscales. The activities and occupation subscales assess the test taker’s interest in a variety of activities and occupations associated with each RIASEC area. The competencies and self-estimate subscales assess a test taker’s skills as they compare themselves to their peers. The more “likes” a test taker provides for occupations or activities associated with an RIASEC area or the more competently they rate themselves in an RIASEC area, the higher the score they will receive in that domain. Additional scoring and psychometric details of the SDS subscales can be found in the Instruments section. In the currently published version, SDS test takers can only answer like or dislike to an activity presented or respond yes or no to a presented occupation or competence in an activity. There is not currently an option to respond neutrally or with varying grades of like, dislike, agreement, or disagreement.
In contrast, the SII (Donnay et al., 2004), a popular interest inventory which also measures Holland’s RIASEC themes, allows test takers to use a bipolar response scale with five-answer options (e.g., strongly like, like, indifferent, dislike, and strongly dislike) for each item. Again, this answer option format reflects a change from the previous version, which included three-answer options (like, unsure, and dislike). The SII developers’ stated rationale for increasing the answer options in their latest revision included attempts to increase the internal consistency of the scales and accuracy of measurement, while using fewer items (Donnay et al., 2004). Data from the manual confirm that the new scales, with five response options, have improved internal consistency over the prior version (Donnay et al., 2004). However, increases in the internal consistency of SII scales may be attributable to changes in the response options, the revision of outdated wording, the deletion of dated and low endorsement rate items, or some combination of these (Donnay et al., 2004). Thus, the effect on internal consistency of increasing the response options alone is unknown.
Other than this research on the effect of response options on the SII reported in the manual, few other researchers have investigated whether changes in response options affect the reliability of vocational interest data. Only one published study was found that explicitly examined this issue. In 2013, Jones and Loe examined the differences in 2, 3, 6, and 10-point response options on the CogStyle, an ipsative measure of RIASEC, and ultimately concluded that more response options did not provide an obvious benefit over the 2-point response option. Because Jones and Loe did not administer the various response options to the same individuals, conclusions about test–retest reliability or the changes in resulting Holland code could not be absolutely tested. Additionally, an ipsative measure of Holland-related interest codes is not typical in practice settings, with many interest inventories providing comparison between an individual’s score and a similar reference group. Therefore, from a practical perspective, the current study extends the findings of Jones and Loe to provide more information about the practical implications of different response options on a commonly used measure of RIASEC-based interests.
In a 2009 study, Sampson, Shy, Hartley, Reardon, and Peterson altered the instructions to the SDS–fourth edition to allow test takers to place a question mark by any item they had difficulty deciding how to answer and deemed the occurrence of question marks as item response indecision. Essentially, these researchers created a three-answer option version of the SDS. Forty-six percent of their participants used the question mark in answering the SDS, indicating that a small majority did not find it necessary to have an additional answer option. The researchers did not take the use of the question mark into account when scoring the measure. Instead, they produced two Holland codes for each test taker presenting with item response indecision. One code was based on the question-marked items being answered and scored as yes/like, and the other code was based on question-marked items being answered and scored as no/dislike. Changes in Holland code across these two scoring approaches were assessed, and in 18% of cases, some change occurred in the resulting Holland code. While Sampson et al.’s study addresses some of the important practical concerns about changes in SDS answer options, the current study extends this work by taking the additional answer options into account when scoring the measure. Additionally, five-answer options provide a more direct comparison to the SII’s response options.
The Effect of Varied Response Options on Reliability
Cox (1980) discusses the impact of the number of response options on subject-centered scales, such as those used in interest inventories. For subject-centered approaches, items are aggregated, thus lessening the impact of varying response options on internal consistency. However, Cox illustrates that the formula for α (Cronbach, 1951), a measure of internal consistency, is affected by (a) the variance in responses to each item, (b) the number of items that are aggregated to form the scale, and (c) covariance or interitem correlations. Therefore, as Cox notes, α may be increased when there is higher variance among responses to items, which can be achieved by increasing the number of response options for an item, but also when more items are aggregated, or when the covariance (i.e., interitem correlation) among items is higher.
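The three influences on α that Cox describes can be read directly off the standard formula, α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores). The following is a minimal sketch in Python, assuming complete data and using a hypothetical function name (cronbach_alpha); it is offered only as an illustration and is not the procedure used by any of the cited authors.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents.

    scores: list of rows, one row per respondent, each row holding that
    respondent's score on every item (all rows the same length).
    """
    k = len(scores[0])                      # number of items aggregated
    items = list(zip(*scores))              # transpose to one column per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    # Higher inter-item covariance inflates total_variance relative to the
    # summed item variances, which pushes alpha toward 1.
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Illustrative data only: three perfectly consistent items, then two
# items that are unrelated to each other.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]
print(cronbach_alpha(consistent), cronbach_alpha(unrelated))
```

In this sketch, the number of items enters through k, and the covariance among items enters through the variance of the total scores; the number of response options has no term of its own and can only act through the item variances and covariances.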
A study by Jenkins and Taber (1977) investigated this issue. Using a Monte Carlo design, they examined the impact of increasing the covariance, number of items, and number of response options on α and test–retest reliability. They found that increasing the number of response options explained 2% of the variance in α coefficients, while in contrast, increasing covariance among items explained 50% and increasing the number of items explained 28% of the variance in α. In addition, Jenkins and Taber found that 13%, 25%, and 8% of the variance in test–retest correlations were explained by increasing covariance, the number of items, and number of response categories, respectively. Interactions between these factors accounted for little variance in α or test–retest correlations. Based on these data, it can be concluded that increasing the number of response options may have minimal impact on internal consistency, but increasing response options may have a larger impact on test–retest reliability. However, few studies examining the effect of response options on reliability account for the impact of covariance or length of scale, which may explain why little consensus has been reached on the best number of response options to use in psychological measurement (Beckstead, 2014; Chang, 1994). Furthermore, as mentioned, very little research has examined the effects of different response options on the reliability of interest scores.
The Current Study
The main goal of the current study was to assess the impact of varying the number of response options (two response options vs. five response options) on the reliability of SDS scores. Specifically, the first goal of our study was to investigate the effect of varying the response options on internal consistency of SDS scales. Based on the literature associated with the impact of response option increases on internal consistency (e.g., Jenkins & Taber, 1977), we expected that increasing the number of response options on the SDS would increase the internal consistency of scales, but likely these differences would not be statistically significant (Hypothesis 1).
A second goal of the current study was to examine the intraindividual stability of SDS profiles as a result of increasing the number of response options to five. Despite the high level of stability of interests at the population level, examination of individuals’ stability of interests over time suggests wide variation in the stability of one’s results (Lubinski, Benbow, & Ryan, 1995; Rottinghaus, Coon, Gaffey, & Zytowski, 2007; Swanson & Hansen, 1988), with test–retest correlations ranging from the low −.20s to high .90s across individuals. Moreover, examination of profile stability over much shorter periods, 2–6 weeks, suggests some instability in some individuals’ interests (Berk & Fekken, 1990; Holland & Messer, 2013b). Thus, to minimize potential error influencing the stability of individuals’ results due to time between assessment points, we chose to collect individuals’ data on both response versions at one point in time, so that differences in the relative ordering of one’s interests can be attributed to varying the number of response options and not to changes in interests over time or experiences between assessment points. We assumed that when individuals responded to both versions of the SDS in succession, little difference would be found in the stability of individuals’ profiles (Hypothesis 2). Furthermore, because researchers have found that interests tend to be less stable for young adults (Low, Yoon, Roberts, & Rounds, 2005), we expected that any potential differences in profiles between response versions would likely be greater in a college student sample than an adult sample (Hypothesis 3).
To better inform users of interest assessments, we were also concerned about practical implications of varying the response options on the SDS. Specifically, we were interested in examining whether an individual’s Holland code changed as a result of increasing the number of response options, as in practice, counselors rely heavily on an individual’s Holland code, or relative ordering of the RIASEC scales, in interpreting an individual’s results (Holland & Messer, 2013a; Zytowski, 2012). While no other research has examined the effect of varied response options on one’s Holland code, one small study reported in the SDS manual (Holland & Messer, 2013a) examined the consistency between SDS Holland codes, as measured with items with two response options, and SII Holland codes, whose items are measured using five response options. Among 51 participants, an exact match in Holland code across instruments was found for only one participant. Most frequently, it was found that just the first letter matched (e.g., AIS and ARC; 37%), the first and second letter matched (e.g., RIC and RIE; 13.7%), or the first and second SDS code matched any two letters in the SII Holland code (e.g., RIC, CRE; 19.6%). Because comparisons were made between results of two different instruments, the low rate of matching Holland codes may be attributable to differences in response scales (e.g., the SDS assesses both interests and self-assessed skills while the SII focuses solely on assessing interests in deriving the RIASEC code), differences in items assessing each RIASEC theme, or both. Therefore, we assumed that when the same items were used and only the response options varied, the rate of agreement in resulting Holland codes would be high (Hypothesis 4) and likely much higher than that found by Holland and Messer (2013a).
Method
Participants
Amazon Mechanical Turk nonstudent adult participants
Mechanical Turk (MTurk) is an online participant recruitment system managed by Amazon.com. MTurk solicits individuals to complete human intelligence tasks, such as transcribing videos or describing photos, for a nominal fee (Buhrmester, Kwang, & Gosling, 2011). For the current study, MTurk participants were compensated US$5 for completion of the survey. Demographics for MTurk workers suggest that they are diverse in terms of gender, ethnicity, socioeconomic status, and age (Mason & Suri, 2012; Paolacci, Chandler, & Ipeirotis, 2010). Researchers investigating the quality of data obtained from MTurk workers have found the data to be as reliable as (Buhrmester et al., 2011) or even more reliable than (Paolacci et al., 2010) data from university subject pools. Furthermore, Rouse (2015) found that internal consistency of personality scales was increased when items to check for attentiveness were included in the survey. Initially, 376 participants engaged in the survey on the MTurk system. Data were deleted from the analysis subsample for any participant who engaged in the survey but provided too little data for analysis or whose data appeared to be a duplicate from the same participant (n = 27 total). To ensure participants were attending to item content, two validity items (e.g., Please choose answer dislike and If you are paying attention, mark like for this item) were included in the survey. Any participant answering either of these items incorrectly was not included in analysis (n = 37). This resulted in data for 312 MTurk participants retained for analyses. This subsample’s demographic characteristics are detailed in Table 1. The MTurk sample was particularly diverse regarding geographic location (northeast–17%, southeast–29%, midwest–25%, southwest–6%, and west–23%) and career fields endorsed (e.g., marketing & sales, IT, arts, health, and education).
Participant Demographics.
Note. MTurk = Mechanical Turk.
College student participants
College students were recruited for participation through a midsized southeastern university’s online research system, Sona Systems, which is managed by the university’s psychology department. Students were eligible to receive course credit for psychology classes offering extra credit for Sona Systems-based research participation. Initially, 366 college student participants engaged in the survey on Sona Systems. Data for any participant who engaged in the survey but provided too little data for analysis were deleted from the analysis subsample (n = 34). Again, to ensure participants attended to items, two validity items (e.g., Please choose answer dislike and If you are paying attention, mark like for this item) were included in the survey. Any participant answering either of these items incorrectly was not included in analysis (n = 92). This resulted in usable data for 241 student participants. This subsample’s demographic characteristics are detailed in Table 1.
Instrument
The SDS–fifth edition (Holland & Messer, 2013a) is a self-report measure designed to measure career interest based on Holland’s theory of vocational personalities and work environments (Holland, 1997). The 264-item measure includes four scorable subscales: activities, competencies, occupations, and self-estimates. Each subscale includes 14 items for each of Holland’s RIASEC types, except for the self-estimates subscale, which includes 2 items per RIASEC area. The SDS also includes an occupational daydream section, in which clients may list their occupational aspirations. Examples of items from the SDS include “Work with tools” (activities and realistic item) and “I can file paperwork” (competencies and conventional item). Participants are asked to respond to activities items with either like or dislike and to competencies and occupations items with either yes or no. Additionally, items in the self-estimates section require participants to provide a number from 1 (low) to 7 (high) regarding their perceived ability as compared to same-aged peers. Scores are determined by adding the number of self-reported likes, yes’s, and self-estimate ability ratings. A high score in any of the RIASEC types indicates a strong interest in that type. A summary code is created by ordering the three highest scoring RIASEC types from highest score to lowest score.
An alternate version of the SDS was created for the current study. The altered version retained the original items but changed the scoring method from a 2-point scale to a 5-point scale for all subscales with the exception of the retention of the original 1–7 rating scale for the self-estimates subscale. The answer choices for the altered version included strongly like, like, indifferent, dislike, and strongly dislike (occupations and activities subscales) and no confidence at all, very little confidence, moderate confidence, much confidence, and complete confidence (competencies subscale).
Psychometric properties of the SDS–fifth edition were examined for the current study. Internal consistency for all RIASEC subscales was above .70 within the current sample and is detailed in the Results section. Internal consistency reliability is above .70 across all subscales (i.e., occupations, activities, and competencies) with some lower exceptions for the self-estimates subscales (Holland & Messer, 2013b). Holland and Messer (2013b) report test–retest reliability across RIASEC summary scale scores using a sample of 49 individuals between 12 and 69 years of age. These results produced correlations ranging from .87 to .96 after a 2–4 month delay (Holland & Messer, 2013b). For evidence of convergent validity, scores on the SDS were compared to scores on the SII and the Occupational Information Network (O*NET) Interest Profiler (Lewis & Rivkin, 1999), with RIASEC correlations ranging from .24 to .71 and .31 to .80, respectively (Holland & Messer, 2013b). For evidence of predictive validity, the standardization sample was asked to identify themselves as either satisfied or dissatisfied with their careers. Forty-one percent of those who identified as satisfied had an occupation with the same high point code as they had received on the SDS. In contrast, only 21% of those who identified as unsatisfied were found to have matching high point codes. Correlations between the SDS print, desktop software, and Internet versions ranged from .85 to .98, suggesting that there is little difference between the versions (Holland & Messer, 2013b). This is of particular interest since the current study collected all data via online administration of the SDS.
Procedure
Following permission from Psychological Assessment Resources, the publisher of the SDS–fifth edition, and our university’s internal review board approval, the SDS–fifth edition was converted to an online format using the Qualtrics online survey system. The survey link created by the Qualtrics system was provided in our MTurk advertisement for that subsample and in our Sona Systems advertisement for the college student subsample. The MTurk and student participants were administered both versions (i.e., 2- and 5-point answer scale) of the SDS in succession but in varying order. The 2-point answer scale version utilized the answer format currently published for the SDS–fifth edition (i.e., like and dislike). The 5-point answer scale version utilized an altered format with a 5-point Likert-type scale (i.e., strongly like, like, indifferent, dislike, and strongly dislike) similar to that used in the most recent version of the SII. The occupational daydreams section was not administered, and the scoring for the self-estimates was maintained as it is in the published version (i.e., 1–7 rating scale) across both versions. MTurk participants received US$5 for successfully completing the survey, and student participants received one extra credit point, consistent with departmental research subject policy.
Data analyses
Participant responses on each version were summed for each letter and ranked from greatest to least to provide Holland codes for each version of the SDS–fifth edition. The 2-point scale was coded as 0 for dislike or no responses and 1 for like or yes responses, as outlined in the SDS manual. For the modified 5-point SDS scales, scoring followed the 5-point response scale used to score the SII (Donnay et al., 2004): strongly dislike or no confidence at all was coded as −2, dislike or very little confidence as −1, indifferent or moderate confidence as 0, like or much confidence as +1, and strongly like or complete confidence as +2. Self-estimates were scored 1–7 for both versions, according to the instructions in the published version of the SDS. Tied letter ranks were resolved by the researchers using the method outlined in the SDS–fifth edition manual. The tie-resolving process outlined in the manual indicates that if a tie occurs, the letter with the highest score in the occupations subscale should be ranked first. Continued ties are resolved by reviewing scores on the activities subscale first and then the competencies subscale. If ties remain unresolved after these steps, RIASEC order takes precedence. Participants received two separate Holland codes, one for the 2-point scale and one for the 5-point scale.
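The scoring and tie-resolution steps described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the scoring program actually used: the dictionary names, function names, and subscale keys ('activities', 'competencies', 'occupations', 'self_estimates') are hypothetical, and the example totals are invented.

```python
RIASEC = "RIASEC"

# Response codings described in the text; the names are illustrative.
TWO_POINT = {"dislike": 0, "no": 0, "like": 1, "yes": 1}
FIVE_POINT = {
    "strongly dislike": -2, "no confidence at all": -2,
    "dislike": -1, "very little confidence": -1,
    "indifferent": 0, "moderate confidence": 0,
    "like": 1, "much confidence": 1,
    "strongly like": 2, "complete confidence": 2,
}


def subscale_total(responses, coding):
    """Sum the coded responses for one RIASEC letter within one subscale."""
    return sum(coding[r] for r in responses)


def holland_code(totals):
    """Return the three-letter code from per-subscale letter totals.

    totals: dict mapping subscale name -> dict mapping letter -> score.
    Ties on the summed score are broken by occupations, then activities,
    then competencies, and finally by RIASEC order.
    """
    def sort_key(letter):
        summed = sum(sub[letter] for sub in totals.values())
        return (
            -summed,
            -totals["occupations"][letter],
            -totals["activities"][letter],
            -totals["competencies"][letter],
            RIASEC.index(letter),
        )
    return "".join(sorted(RIASEC, key=sort_key)[:3])


# Invented totals: I and S tie at 25 overall; S has the higher
# occupations score, so S is ranked ahead of I.
example = {
    "activities":     {"R": 5, "I": 7, "A": 2, "S": 7, "E": 1, "C": 0},
    "competencies":   {"R": 5, "I": 6, "A": 2, "S": 5, "E": 1, "C": 0},
    "occupations":    {"R": 5, "I": 4, "A": 2, "S": 5, "E": 1, "C": 0},
    "self_estimates": {"R": 5, "I": 8, "A": 4, "S": 8, "E": 3, "C": 2},
}
print(holland_code(example))
```

Encoding the tie-break sequence as a single sort key keeps the precedence order explicit and makes it straightforward to apply the same rule to both response versions.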
Data were analyzed separately for the MTurk and student subsamples. Descriptives, frequency tables, Cronbach’s α, and Spearman rank-ordered correlations were calculated to address the primary research questions. The hypothesis of no difference between the two versions was tested nonparametrically by examining the strength of association between versions, specifically whether each participant’s first-letter Holland code on the 2-point version corresponded to the first-letter Holland code on the 5-point version. Additional descriptive analyses and frequency counts were performed to explore letter matching in specific ranks as well as how often the Holland letter codes matched across the two versions.
Results
Frequency of use of the different response options for the two versions was examined first. For the 2-point version, across items in the activities, occupations, and competencies sections, the sample used the like and dislike response options at similar average rates (49.61% and 50.39% of 252 items, respectively). For the five-answer options on the altered version of the SDS, average use of the options was as follows: strongly like (14.83%), like (25.59%), indifferent (21.38%), dislike (21.66%), and strongly dislike (17.83%). When responses on the 5-point scale were examined by individual use, all respondents used the indifferent and dislike options at least one time (range 1–159 and 1–165, respectively). However, 3.4% of the sample did not use the strongly like option (range 0–188), while the percentage of no use was 0.4 (range 0–182) for the like option and 3.8 (range 0–216) for the strongly dislike option.
Internal Consistency
Ratings from each Holland theme were evaluated using Cronbach’s α to assess consistency over the subscales within each type of response version. Table 2 shows the standardized α coefficients for the internal consistency reliability analysis for both response versions for each of the six RIASEC themes. For the full sample (N = 553), the standardized α coefficients for all of the Holland themes for the 2-point version were relatively high, with values ranging from 0.77 (conventional) to 0.87 (realistic and artistic). Likewise, the α coefficients for all of the Holland themes for the 5-point version for the full sample also fell in the high range, with values ranging from 0.79 (conventional) to 0.89 (realistic and artistic). Among subsamples, MTurk participants, for the majority of scales, had higher reliability coefficients than the student subsample across versions and Holland themes.
Reliability Coefficients and α-Paired Test of Differences Between Reliability Coefficients of the Two Response Versions of the SDS.
Note. SDS = Self-Directed Search; R = realistic; I = investigative; A = artistic; S = social; E = enterprising; C = conventional.
*Significant at p < .05 (two tailed). **Significant at p < .01 (two tailed).
The α-paired test was used to test for the equality of αs across the two scale versions within the same sample (Abd-El-Fattah & Hassan, 2011). Significant differences between the reliability coefficients of Holland’s letters across the 2-point and 5-point response versions by sample type are presented in Table 3. For the entire sample, the reliability coefficients for every theme, with the exception of C, differed significantly between versions at the p < .01 threshold. Specifically, the 2-point scale version and the 5-point scale version produce reliability coefficients that are statistically distinct, although the item content remains the same. For the MTurk subsample, all t-values were statistically significant for each Holland theme by scale version at the p < .01 threshold. However, in the student sample, statistically significant differences in α coefficients were found only for Scales I, S, and E at the p < .05 level. These findings are consistent with Cox’s (1980) and Jenkins and Taber’s (1977) conclusions that increasing the number of response options for items may slightly increase αs, but the slight difference has minimal impact on internal consistency. Overall, the α coefficients indicate the 5-point version had higher consistency than the 2-point version. Because the majority of the α levels did differ significantly, Hypothesis 1, which assumed no statistical significance would be found, was not supported.
Percent Match in Holland Codes Across 2-Point and 5-Point Versions of the SDS.
Note. SDS = Self-Directed Search; MTurk = Mechanical Turk.
Profile Stability
First, profile stability for individual test takers was assessed by examining the intraindividual consistency in the rank ordering of the RIASEC themes across the two SDS versions. This was done by calculating, for each individual, the Spearman’s ρ correlation between the rank orders of the RIASEC themes on the two versions of the SDS. Correlations ranged from −.03 to 1.0 across the total sample, with a median correlation of .94. The range of correlations was .03–1.0 for the MTurk sample and −.03–1.0 for the student sample. To examine the mean correlation across the sample, individual correlations were transformed using Fisher’s z transformation (Fisher, 1921). A number of individuals had a correlation of 1 (n = 138, 24.95% of the total sample), or perfect agreement in the rank ordering of the RIASEC themes across the two SDS versions, which presented a problem in transforming scores. In the calculation of Fisher’s z (see Fisher, 1921 for the formula), a perfect correlation results in a fraction with a denominator of 0, and thus Fisher’s z cannot be calculated. Due to this, data for these individuals, with a perfect correlation between versions, were removed (n = 70 MTurk and n = 68 student) before calculating the average intraindividual correlation between SDS versions. The resulting average correlation among remaining participants was .87 (median r = .89), suggesting high profile agreement between versions for individuals and supporting Hypothesis 2. More specifically, for the MTurk sample, the mean correlation was .86 (n = 242) and the median correlation was .94. For the student sample (n = 173), the mean correlation was .89, and the median r = .94. Because the mean correlation for the student sample was higher than that of the MTurk sample, Hypothesis 3, which assumed lower stability among a college student sample, was not supported.
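The profile stability calculation can be illustrated with a short Python sketch: Spearman’s ρ is computed between one person’s six RIASEC scores on the two versions, and the individual correlations are then averaged through Fisher’s z = 0.5 × ln((1 + r) / (1 − r)). The function names are hypothetical, and back-transforming the mean z with tanh is an assumption of this sketch; the text does not state how the averaged values were converted back to the correlation metric.

```python
import math


def ranks(values):
    """Average ranks (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def fisher_z(r):
    # The denominator (1 - r) is 0 when r = 1, so z is undefined there.
    return 0.5 * math.log((1 + r) / (1 - r))


def mean_correlation(rs, tol=1e-12):
    """Average correlations via Fisher's z, dropping |r| = 1 cases."""
    usable = [r for r in rs if abs(r) < 1 - tol]
    z_mean = sum(fisher_z(r) for r in usable) / len(usable)
    return math.tanh(z_mean)  # assumed back-transformation


# Invented scores for one person: only the top two themes swap order.
two_point = [6, 5, 4, 3, 2, 1]
five_point = [5, 6, 4, 3, 2, 1]
print(round(spearman_rho(two_point, five_point), 2))
```

With only six themes, a single swap of two adjacent ranks already lowers ρ from 1 to about .94, which helps explain why perfect correlations are common in such data and why they must be handled before applying the transformation.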
Secondly, consistency in the resulting Holland codes between versions was examined across the sample. Comparisons between Holland codes across versions were calculated. For the MTurk participants, the first letter of the Holland code matched across the 2-point and 5-point response versions for 75.3% of the subsample, while for the student participants, the match occurred for 77.2% of the subsample. The percent match between the first two letters was 53.5% for the MTurk sample and 53.7% for the student sample. Understandably, lower agreement between versions was found for the full three-letter Holland code (36.9% MTurk and 39.8% student). Because the correlations between codes across the two versions were in the moderate-to-high range and first-letter matches exceeded 75%, Hypothesis 4, which assumed a high rate of agreement in Holland codes across versions, was supported.
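The match percentages reported above amount to simple position-wise comparisons of the two codes each participant received. A minimal Python sketch follows, assuming that "first two letters" means the same two letters in the same order; the function name and the example codes are invented for illustration.

```python
def match_rates(code_pairs):
    """Percent agreement between paired three-letter Holland codes.

    code_pairs: list of (two_point_code, five_point_code) tuples.
    Matches are position-wise: same letters in the same order.
    """
    n = len(code_pairs)

    def pct(length):
        hits = sum(a[:length] == b[:length] for a, b in code_pairs)
        return 100 * hits / n

    return {"first": pct(1), "first_two": pct(2), "all_three": pct(3)}


# Invented codes for four hypothetical participants.
pairs = [("RIA", "RIA"), ("RIA", "RIC"), ("SAE", "SEA"), ("CES", "ECS")]
print(match_rates(pairs))
```

Because each stricter criterion is a subset of the looser one, agreement can only stay the same or fall as more letters are required to match, which is the pattern seen in both subsamples.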
Because a large part of the mission of this research was to understand how various response options would affect the practical use of the SDS, we considered an additional analysis used by Lubinski, Benbow, and Ryan (1995) that was not directly addressed in our hypotheses. Matches between the first letter code on the 2-point version and an adjacent code on the 5-point version, given the hexagonal structure of RIASEC, were calculated for those not having an exact match between first letter codes on the two versions of the SDS. For instance, adjacent codes for the first letter code of A would be I or S, and for a first letter code of E, adjacent codes would be either C or S. For the MTurk sample, adjacent matches for the first letter code ranged from 1.6% (conventional) to 19.2% (enterprising). Adjacent matches for the first letter among the student sample ranged from 0.0% (realistic) to 23.5% (enterprising). Finally, in the remaining cases, the first letter codes on the 2-point and 5-point versions were neither exact matches nor adjacent matches. These nonadjacent matches ranged from 4.8% (social) to 22.2% (conventional) for the MTurk sample and 6.3% (artistic) to 25.7% (investigative) for the student sample. These results can be seen in Table 4.
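The exact, adjacent, and nonadjacent categories follow from treating RIASEC as a closed hexagon in which each theme borders the one before and after it, with R and C also adjacent. A brief Python sketch of that classification is given below; the function name is hypothetical.

```python
RIASEC = "RIASEC"


def classify_first_letter(two_point_letter, five_point_letter):
    """Classify a pair of first letters as exact, adjacent, or nonadjacent."""
    i = RIASEC.index(two_point_letter)
    j = RIASEC.index(five_point_letter)
    distance = (i - j) % 6          # steps around the hexagon
    if distance == 0:
        return "exact"
    if distance in (1, 5):          # one step in either direction
        return "adjacent"
    return "nonadjacent"


for pair in [("A", "I"), ("E", "C"), ("R", "C"), ("R", "S"), ("I", "I")]:
    print(pair, classify_first_letter(*pair))
```

Working modulo 6 handles the wrap-around between C and R without a special case.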
Table 4. Percent Match Between the First Letter Code Across 2-Point and 5-Point Versions of the SDS by Holland Theme.
Note. Exact matches reflect the letter code was the same across SDS versions (e.g., R = R). Adjacent matches reflect a match between an adjacent realistic, investigative, artistic, social, enterprising, and conventional theme (e.g., R = I or C, I = R or A, etc.). Nonadjacent matches were instances where neither an exact nor adjacent match was made. SDS = Self-Directed Search; MTurk = Mechanical Turk.
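The exact, adjacent, and nonadjacent categories defined in the note can be sketched as a distance on the RIASEC hexagon. This is an illustrative sketch only: it assumes the themes sit in the order R-I-A-S-E-C around the hexagon, with C wrapping back to R, and the function name `classify_first_letter` is hypothetical.

```python
HEXAGON = "RIASEC"  # assumed circular order of the six themes


def classify_first_letter(letter_a, letter_b):
    """Classify two first-letter codes as 'exact', 'adjacent', or 'nonadjacent'."""
    i, j = HEXAGON.index(letter_a), HEXAGON.index(letter_b)
    # Shortest number of steps around the six-sided hexagon.
    distance = min((i - j) % 6, (j - i) % 6)
    if distance == 0:
        return "exact"
    if distance == 1:
        return "adjacent"
    return "nonadjacent"


print(classify_first_letter("A", "I"))  # adjacent
print(classify_first_letter("R", "C"))  # adjacent (wraps around the hexagon)
print(classify_first_letter("R", "S"))  # nonadjacent (opposite themes)
```

Using the circular distance rather than the plain difference in positions is what makes R and C count as neighbors.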
Discussion
After years of using the SDS and SII in practice and training, the authors noted several instances of clients and trainees verbalizing a preference for the more sensitive response options of the SII. Clients would make comments such as, “I like being able to indicate when I like something a lot or just a little bit.” This, along with notes of this same phenomenon in the literature (Preston & Colman, 2000), prompted the current study exploring the effects of response options on actual results of a Holland-based interest measure. To explore the effects of response option variation within one measure, participants were administered two versions of the SDS. They were given the newest version of the SDS (Holland & Messer, 2013a) in its published form with two response options as well as a 5-point format using response options similar to those of the 2004 version of the SII (Donnay et al., 2004).
Initially, we examined the claim that clients want more response options to use. Our data indicate that both samples used every response option on the 5-point scale frequently, including the option to indicate indifference or uncertainty about an item. Unlike Sampson et al. (2009), who found that 54% of their sample did not use an indifferent response on the SDS–fourth edition when given the option, we found that all participants used this option at least once and, on average, used it about 21% of the time. In addition, our sample frequently used the options to endorse strongly liking or strongly disliking an item, suggesting that test takers do make use of additional response options.
Next, the internal consistency of the two versions of the SDS was investigated. Results showed the internal consistency of the 5-point version was statistically significantly higher than that of the 2-point version. Yet, the practical significance of this finding should be discussed. The greatest difference in a RIASEC scale’s internal consistency across versions was .04. All α levels across both versions and both samples were well within the acceptable range, as all were above .75. While the associated hypothesis, which did not expect a significant difference in α levels, was rejected, there is little practical significance in the resulting αs. This finding appears to be consistent with our earlier discussion of the effect of varied response options on reliability and with Jenkins and Taber’s (1977) research indicating that increases in a measure’s response options account for about 2% of the change in internal consistency.
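For readers who want to see how an α coefficient of this kind is obtained, the following is a minimal sketch of the standard Cronbach's α formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). It is not the authors' analysis code, it does not reproduce the significance test comparing α across versions, and the response matrix below is hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_columns = list(zip(*rows))  # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: four respondents, three items on one scale.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

The same function applies to dichotomous (0/1) items and to 5-point items, so the two response formats can be compared on a common footing.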
Cronbach (1950) understood that an increase in answer options may tend to raise internal consistency, but that this may not be meaningful if validity of the measure is not also proportionally affected by this change. This phenomenon, known as the attenuation paradox (Loevinger, 1954), suggests that attempts to increase the internal consistency of a scale, such as retaining only highly correlated items, at a certain point may reduce its construct validity (Clark & Watson, 1995). In other words, what is the purpose of increasing answer options if the practical use of the measure does not benefit? This may be relevant to our current findings that showed a statistically significant increase in internal consistency, but the actual increase is unlikely to affect accuracy of results or tendency of a practitioner to choose the measure over similar measures.
In evaluating the practical significance of altering the response format of the SDS, we examined consistency in individuals’ rank-ordered RIASEC profiles as well as consistency in obtaining the same three-letter Holland code across test versions. Examination of correlations between rank-ordered profiles across the two versions suggested that for many people, both versions of the SDS produced identical rankings of the six RIASEC themes, consistent with our expectations. Approximately 25% of the sample had identical profiles across the two versions of the SDS. For the remaining individuals not having perfect agreement between versions, the mean correlation for both the MTurk and student samples was high, as was hypothesized, given that both versions were administered in succession and given evidence of high consistency in rank-order scores for interests overall (Low et al., 2005).
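A within-person profile correlation of this kind can be sketched as follows. The sketch assumes a Spearman rank correlation computed over one person's six RIASEC scale scores on each version; the text does not state the exact coefficient used, so Spearman is an assumption here, and the function names and scores are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (lowest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical scale scores for one person, in R, I, A, S, E, C order.
two_point_scores = [10, 25, 30, 40, 20, 5]
five_point_scores = [12, 30, 28, 45, 22, 8]
rho = spearman(two_point_scores, five_point_scores)
print(round(rho, 3))
```

In this hypothetical profile only the I and A themes trade places, so the correlation stays high even though the three-letter code would change, which mirrors the gap between profile stability and exact code agreement.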
Although only a handful of studies have examined profile consistency of scores at the intraindividual level, our results are consistent with prior findings in that the range of these correlations highlights that many individuals’ scores are highly consistent over time, while others may be less consistent (Rottinghaus et al., 2007; Swanson & Hansen, 1988). For our study, given that both versions were administered in succession, these profile correlations suggest that having additional response options may slightly alter some individuals’ profiles, as perfect agreement (i.e., r = 1) was not found for every participant. This is additionally supported by the consistency in three-letter Holland codes across versions, where there was high agreement on the first letter of the Holland code but much less agreement on the full three-letter Holland code in both subsamples.
Additional examination of the consistency in first letter codes found that in many instances, if an exact match was not made between versions, a match was made with an adjacent Holland theme. This is consistent with the findings of Sampson and colleagues (2009), who demonstrated that any changes in Holland code when considering a third answer option typically resulted in an adjacent match or a switching of letters from first to second place in the code. In some instances, however, individuals’ highest letter codes across the two versions were neither exact nor adjacent matches, indicating that their results changed substantially across versions. Moreover, results for individuals with certain dominant RIASEC themes appeared more likely to have fluctuated across versions, such as those with a first letter code of C in the MTurk sample and those with a first letter code of I in the student sample. Similarly, Lubinski et al. (1995) examined consistency in first letter codes in a sample of gifted youth whose interests were assessed at 13 years of age and again at approximately 28 years of age. They found that the largest proportion of nonadjacent matches at the follow-up were for those with dominant investigative or conventional interests. Thus, it may be that interests in these two areas are less stable than other themes but also that changes in response options may affect the endorsement of these areas of interest, where individuals are less likely to score as highly in these themes when they are able to respond to items with increased sensitivity.
We found that profile consistency, examined via rank-ordered themes, was higher among MTurk participants than among student participants, which is similar to Low, Yoon, Roberts, and Rounds’s (2005) finding that stability slightly increases with age. However, examination of the consistency in exact Holland codes found that the student participants’ highest theme matched across SDS versions slightly more often than the MTurk participants’ highest theme did (77.2% vs. 75.3%, respectively), suggesting that while the older, nonstudent adults’ relative ordering of all six RIASEC themes is more similar across versions, college students have slightly more consistency in their first letter code across versions and less consistency in their overall profile. Furthermore, as noted in Table 4, the rate of first letter code matches varied across the six themes, suggesting that some RIASEC themes were less affected than others by changing the number of response options.
We also examined the consistency of resulting Holland codes across versions and found a fairly high level of consistency in first letter codes (75.3% MTurk and 77.2% student). This is substantially higher than the findings of Holland and Messer (2013a), who compared the consistency of codes between the SDS and the SII. Holland and Messer found that only 37% of first letter codes were the same across the two different measures of Holland-based career interests. It appears that the number of response options has much less impact when the item content remains the same. Therefore, the differences in Holland codes in the Holland and Messer study could likely be better attributed to differences in SDS and SII content than to the response options available to test takers. What this does not tell us is whether, when the Holland code changes, one of the codes is a more accurate representation of the client’s interests. Is the intellectual burden of considering five-answer options worth any potential for gained accuracy? Does having more answer options actually improve the accuracy of a client’s results or just occasionally (i.e., about 25% of the time) change them in a manner that may or may not be more accurate?
Strengths and Limitations
We see several strengths in the present study that should be highlighted along with acknowledging the study’s limitations. The participant samples are associated with several strengths. Interest inventories are very commonly used with college students, making the inclusion of this sample important and practical. Nevertheless, college students are a very specific population, and this particular study sampled from only one university, so these findings should be generalized with some caution. Sampling through the MTurk system allows us to better generalize our findings to a broader group of individuals that includes people from every region of the country across a variety of occupations. Yet, our samples still consist of people with a higher education level and do not fully represent many minority groups. In addition, our college sample was heavily female, reflecting the demographics of the Sona Systems participant pool. While gender differences in interests are well documented (Su, Rounds, & Armstrong, 2009) and there is evidence that the interaction of ethnicity and socioeconomic status may impact interest results (Slaney & Brown, 1983), these factors were not examined in our study.
Furthermore, we presented each version of the SDS in immediate succession to examine stability of the results over the two versions. While this was done to reduce possible error due to a time delay or experience, this may have introduced other error. Cohen and Swerdlick (2010) note that sources of error for test–retest reliability estimates can include experience, practice, memory, fatigue, and motivation. Moreover, Campbell et al. (1999) found that test–retest correlations over a brief 1-week delay were highest when psychological assessments were administered via computer versus paper and pencil, or across modalities, suggesting that practice effects may be greater for computerized assessments.
Implications and Future Research
Although the increase in internal consistency found in this study may not seem practically significant, perhaps there are other benefits to the psychometrics of the SDS with an increase in response options. Future research could explore how increased response options affect the validity of the measure and client preference for response option number or type. Following research on the validity differences in 4-point and 6-point scales, Chang (1994) concluded that increasing the number of response options may improve validity only if doing so does not increase error, whereby individuals systematically use some response options over others (e.g., consistently not using one response option or using strongly disagree and disagree interchangeably).
Furthermore, qualitative research on test takers’ preferences for differing numbers of response options on the SDS could be beneficial. Preston and Colman (2000) collected information on participants’ reactions to scales with differing numbers of response options, ranging from 2 to 11, and found that scales with 7 or 10 response options were rated highest overall on ease of use, quickness of use, and allowing one to express his or her feelings. As stated, the inspiration for the current study came from client and trainee comments about the preference for five-answer options over two. Our data support the idea that most test takers utilize additional response options. Perhaps there is inherent or therapeutic value (Finn, 2007; Poston & Hanson, 2010) in providing clients what they prefer to increase their motivation to engage in the assessment process.
Further research could clarify whether some clients benefit from increased response options while others are overburdened by more choices. Perhaps overwhelmed clients or those with little knowledge of their options could benefit from a simpler answer scale (e.g., yes/no or like/dislike), while clients with more experience, more knowledge, and less complexity in their presenting issues could appreciate and benefit from increased sensitivity in answer options through a 5-point or longer scale.
Conclusion
Our findings documented that increasing the number of response options on the SDS does have implications for the measure psychometrically and for the resultant SDS scale scores for some clients. In particular, increasing the response options may alter a minority of individuals’ Holland codes or highest letters. Whether an increase in answer options produces a more accurate Holland code is still unclear. The positive effects on internal consistency are unlikely to have a practical impact on practitioner selection of the SDS or accuracy of results. As mentioned, on the SDS, as with most interest inventories, the summary Holland code is used to facilitate exploration of different vocational options (Holland & Messer, 2013a; Zytowski, 2012); thus, changes in these assessment results may result in different occupations being considered for the client. Results seem to support the idea that practitioners can feel reassured that the most critical factor in choosing their Holland-based interest measure should not be the answer choice options available on that measure but the appropriateness of that measure for their client (e.g., reading level and cost). Furthermore, given the benefits of incorporating clients’ preferences into the assessment process, it may be beneficial to use interest assessments with more response options for clients who express the desire to have a wider range of options available.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project was funded through a financial grant from Psychological Assessment Resources.
