Test Review: Shipley-2 Manual

Abstract

The Shipley-2 (Shipley, Gruber, Martin, & Klein, 2009) is a revised and restandardized version of the Shipley Institute of Living Scale (Shipley, 1940), a brief yet robust measure of cognitive functioning and impairment. Like its predecessor, the Shipley-2 assesses two distinct aspects of cognitive ability. The first is crystallized ability, which is acquired through education and experience. The second is fluid ability, which is the capacity to use logic to learn and acquire new information or to solve problems. According to Shipley et al. (2009), “These two aspects were proven to provide a well-grounded perspective in cognitive ability that has held up to scientific scrutiny and practical application” (p. 3). The Shipley-2 can also reveal cognitive impairment by uncovering discrepancies between crystallized and fluid abilities. Unlike the previous version, the Shipley-2 has its own large, nationally representative normative sample. It can therefore be used with confidence in a broad array of clinical and educational applications without reference to scores from other instruments. Its age range has also been extended, so it can be administered individually or in groups to persons aged 7 through 89. The test may not be appropriate for individuals whose first language is not English or who have uncorrected visual impairment. Although the test is self-administered, administration must be monitored by a person familiar with and competent in psychological or educational testing. It can be administered and scored in 25 min. The results should be interpreted together with information from concurrent or previous assessments, detailed interviews and history taking, and observations.

Specific Description and Scoring

A standard administration of the Shipley-2 includes the Vocabulary scale, which measures crystallized skills. It also includes either the Abstraction scale or the Block Patterns scale, both of which measure fluid reasoning skills. Vocabulary paired with Abstraction yields Composite A, which draws more heavily on verbal skills. Vocabulary paired with Block Patterns yields Composite B, which draws more heavily on nonverbal skills. The authors provide guidelines for choosing between the Abstraction and Block Patterns scales. The Shipley-2 is not a speeded test, but each scale has its own time limit. The scales can be administered in any order and in any combination, except that Abstraction and Block Patterns cannot be combined to form a composite score. Test takers are also asked to provide demographic information, such as age and educational level.

The Vocabulary scale includes 40 items. For each item, the respondent chooses the one word out of four options that is closest in meaning to a given word. The Abstraction scale presents 25 sequence-completion items. The Block Patterns scale consists of 12 block patterns, and several blocks are queried within the patterns, giving 26 items in total. The Vocabulary and Abstraction scales are provided as AutoScore forms: responses marked on the outside of the form are transferred by carbon paper to the Scoring Worksheet inside. The Block Patterns scale is provided as a form that is scored with an Answer Key. The examinee earns 1 point for each correct answer and 0 points for each incorrect or blank item. Raw scores for each scale are transferred to the Profile Sheet. They are then converted to standard scores (M = 100, SD = 15), percentiles, age equivalents, and confidence intervals. The Profile Sheet also yields a composite score, which reflects overall cognitive ability. Finally, it yields the Impairment Index, which represents the discrepancy between vocabulary and abstract thinking.
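The following minimal Python sketch makes the scoring rule concrete. It uses a hypothetical four-item answer key and applies the 1/0 raw-score rule described above, with blanks represented as `None`. The conversion to the M = 100, SD = 15 metric is shown as a simple linear z-score transformation using made-up normative values (`norm_mean`, `norm_sd`). That step is an illustrative assumption only: the Shipley-2 converts raw scores through the normative tables in the manual, not through a formula like this one.

```python
from typing import Optional, Sequence


def raw_score(responses: Sequence[Optional[str]], key: Sequence[str]) -> int:
    """Award 1 point per correct answer; wrong or blank (None) items earn 0."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


def linear_standard_score(raw: int, norm_mean: float, norm_sd: float) -> float:
    """Map a raw score onto the M = 100, SD = 15 metric via a z score.

    Hypothetical illustration: the real Shipley-2 uses normative lookup
    tables from the manual, not this linear formula.
    """
    z = (raw - norm_mean) / norm_sd
    return 100 + 15 * z


# Hypothetical four-item key and one examinee's answers (None = left blank).
key = ["b", "d", "a", "c"]
answers = ["b", "a", None, "c"]
print(raw_score(answers, key))  # 2 (two correct, one wrong, one blank)
print(linear_standard_score(30, norm_mean=27.0, norm_sd=6.0))  # 107.5
```

A blank earns the same 0 points as a wrong answer, so the raw score is simply the number of matches with the key.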

Test Materials and Stimuli

The Shipley-2 kit includes an examiner’s manual, the Vocabulary AutoScore form, the Abstraction AutoScore form, and the Block Patterns form. A two-sided Profile Sheet for Composites A and B is attached as the first page of the Vocabulary AutoScore form. The well-organized manual gives test users a full account of administration, scoring, and interpretation procedures, supported by examples and case studies from various settings. It also offers detailed interpretive guidance on response validity, differential cognitive functioning, item analysis, and retesting over time. In addition, it describes multiple strategies for using and interpreting the Shipley-2, depending on the referral, the setting, and the user’s needs. Computerized administration and scoring are also available.

Technical Adequacy

Standardization Sample

A total of 2,826 individuals were selected for the normative sample and divided into two groups (adults and children) for statistical analysis. The authors selected participants to match U.S. Census percentages for gender, race, parents’ educational level, geographic region, and age, so that the sample would be representative of the U.S. population. Data were collected at 23 sites in 16 states. During the standardization sessions, the time limit was 10 min for the Vocabulary and Block Patterns scales and 12 min for the Abstraction scale.

Reliability

Internal consistency and test-retest stability were used to evaluate the reliability of Shipley-2 scores. The authors used the split-half method to estimate internal consistency for the three cognitive scales (Vocabulary, Abstraction, and Block Patterns) and the two composites (Composite A and Composite B). Results were reported separately for adults and children. Adolescents aged 17 to 19 years were included in both groups, but the individuals in the adult and child samples were not the same.
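The split-half logic can be illustrated with a short sketch. This review does not say how the items were divided, so the odd-even split below is an assumption, and the 0/1 item data are invented. The sketch correlates the two half-test totals. Because each half is only half as long as the full scale, it then steps that correlation up with the Spearman-Brown formula, r_full = 2 * r_half / (1 + r_half).

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores: List[List[int]]) -> float:
    """Odd-even split-half reliability, stepped up with Spearman-Brown.

    item_scores[p][i] is examinee p's 0/1 score on item i (hypothetical data).
    """
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    return 2 * r_half / (1 + r_half)  # correct for doubling the test length


# Five hypothetical examinees answering four items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(round(split_half_reliability(data), 2))  # 0.75
```

With these made-up data, the half-test correlation is about .60 and the corrected estimate is about .75. The Spearman-Brown step assumes the two halves are parallel, and with real data, different ways of splitting the items can give somewhat different estimates.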

For adults, both composites showed high reliability. Estimates for Composite A ranged from .88 to .97 (median = .91), and estimates for Composite B ranged from .91 to .95 (median = .93). Internal consistencies for the individual scales were generally lower across age groups. Vocabulary estimates ranged from .85 to .92 (median = .90), and Block Patterns estimates ranged from .88 to .94 (median = .91). The Abstraction scale had the lowest internal consistency, ranging from .66 to .91 (median = .77). Overall, these results show adequate internal consistency across the adult age groups (17-89).

In the child sample (ages 7-19), internal consistencies were somewhat lower. Estimates for Composite A ranged from .82 to .91 (median = .87), while those for Composite B ranged from .78 to .94 (median = .89). As in the adult sample, the individual scales had lower reliability estimates than the composites. Vocabulary ranged from .81 to .89 (median = .84), Block Patterns from .69 to .94 (median = .85), and Abstraction from .70 to .80 (median = .77). In general, these results support the internal consistency of Shipley-2 scores, with the exception of some age groups on the individual scales.

The authors analyzed test-retest stability in three age categories: young children, teens, and adults. They reported correlations between scores from two sessions administered 1 to 2 weeks apart. Test-retest coefficients ranged from .74 to .94 for the scales across the three categories and from .76 to .94 for the two composites. The young children group had the lowest test-retest coefficients for both scale and composite scores. In general, these results support the stability of Shipley-2 scores over short intervals.

The standard error of measurement (SEM) was also reported for the adult and child samples. The SEM indicates how closely an individual’s obtained score is likely to approximate his or her “true” score. SEM values ranged from 2.64 to 7.94 in the adult sample and from 3.62 to 8.31 in the child sample. The higher SEM values in the child sample indicate greater measurement error for scores on some scales, which follows from the lower reliability estimates in that sample. On the basis of this reliability information, it can be argued that the Shipley-2 is more appropriate for adults than for children.
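Under classical test theory, the SEM follows from the reliability coefficient and the score standard deviation: SEM = SD * sqrt(1 - r_xx). The sketch below applies this textbook formula on the standard-score metric (SD = 15). It then builds a band of plus or minus 1.96 SEM around an obtained score. It illustrates the general relationship only and is not necessarily the exact procedure the manual used to compute its SEMs and confidence intervals.

```python
from math import sqrt
from typing import Tuple


def sem(reliability: float, sd: float = 15.0) -> float:
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)


def confidence_interval(score: float, reliability: float, z: float = 1.96,
                        sd: float = 15.0) -> Tuple[float, float]:
    """Symmetric band of +/- z SEMs around the obtained score."""
    half_width = z * sem(reliability, sd)
    return score - half_width, score + half_width


print(round(sem(0.91), 2))  # 4.5
print(round(sem(0.77), 2))  # 7.19
lo, hi = confidence_interval(106, 0.91)
print(f"{lo:.2f} to {hi:.2f}")  # 97.18 to 114.82
```

Plugging in medians reported in this review, a reliability of .91 gives an SEM of 4.5 points. A reliability of .77 (the Abstraction median) gives an SEM of about 7.2 points. This shows why less reliable scales produce wider confidence intervals around the same obtained score.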

Validity

Content validity, construct validity, concurrent validity, and group discriminative validity were evaluated to provide evidence of validity for the Shipley-2. Content validity concerns how adequately the test samples the cognitive ability domain. Research on the original Shipley tasks indicated that they measure crystallized and fluid reasoning abilities well (Phay & York, 1990). Content validity was also considered when new items were developed for the Shipley-2. For the Vocabulary scale, new items similar to the existing ones were written to assess increasingly difficult words. The validity of block design tasks similar to the Block Patterns scale has already been investigated for other tests of cognitive ability, such as the Wechsler scales. To demonstrate developmental change, the authors provided means and standard deviations of Shipley-2 scale scores by age. The results showed that scale scores increase year by year.

Construct validity of the Shipley-2 was evaluated in two ways. The first was an examination of the structural characteristics of the items and scales using interscale correlations, item-to-total correlations, factor analysis, and item response theory (IRT) analysis. The correlations between the Vocabulary and Abstraction scales were .49 (adults) and .48 (children). The correlations between the Vocabulary and Block Patterns scales were lower, at .38 (adults) and .34 (children). These results are consistent with the expectation that Block Patterns depends less on verbal skills than Abstraction does. Item-to-total correlations indicated that the items of each scale correlated well with their own scale, which supports the structure of the three Shipley-2 scales.

The authors also conducted exploratory factor analyses of each Shipley-2 scale using raw scores. The items of each scale loaded highly on a single factor, and the scales loaded highly on separate factors. These results indicated that the Shipley-2 measures overall cognitive ability as well as three distinct aspects of it. The items and scales were also evaluated within an IRT framework, using the Rasch model to determine whether each scale had an appropriate difficulty gradient. For most items, the item order matched the difficulty gradient. Each scale also covers a wide range of ability, so the test can serve as an adequate measure for people at different ability levels.

The second approach was to evaluate concurrent validity by comparing Shipley-2 scores with scores from similar tests. The authors found that the Shipley-2 showed strong agreement with other tests such as the Wechsler Adult Intelligence Scale, Third Edition (WAIS-III) and the Wide Range Achievement Test 3 (WRAT 3).
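The sketch below is for readers less familiar with the Rasch model. It shows the model's item response function, P(correct) = 1 / (1 + exp(-(theta - b))), where theta is person ability and b is item difficulty on the same logit scale. It also includes a simple check of whether items presented in sequence have non-decreasing difficulty, which is one way to read the "difficulty gradient" finding. The difficulty values are hypothetical, and the sketch does not estimate Rasch parameters from response data.

```python
from math import exp
from typing import List, Sequence


def rasch_p(theta: float, b: float) -> float:
    """Rasch (1PL) probability that a person of ability theta passes an item of difficulty b."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def out_of_order_items(difficulties: Sequence[float]) -> List[int]:
    """Return 1-based positions of items that are easier than the item just before them."""
    return [i + 1 for i in range(1, len(difficulties))
            if difficulties[i] < difficulties[i - 1]]


# Hypothetical item difficulties (logits) in administration order.
difficulties = [-2.0, -1.1, -0.4, -0.6, 0.3, 1.5]
print(rasch_p(0.0, 0.0))  # 0.5: ability equals difficulty
print(out_of_order_items(difficulties))  # [4]: item 4 is easier than item 3
```

When ability equals item difficulty, the probability of success is exactly .5. Items flagged by the ordering check are candidates for repositioning within the scale.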

Group discriminative validity was evaluated by comparing the performance of a clinical sample with that of the standardization sample. The clinical sample consisted of adults and children with cognitive deficits. One-sample t tests were used to examine whether the clinical sample’s mean standard scores differed significantly from the standardization mean of 100. The differences were statistically significant for all scales, with large effect sizes for every scale except Block Patterns, which showed a moderate effect size. These results suggest that the Shipley-2 can distinguish individuals with a cognitive deficit from those without one.
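The group comparison can be sketched with the Python standard library. The sketch computes a one-sample t statistic against the standardization mean of 100 and Cohen's d as the standardized mean difference. The clinical scores below are invented. Using the sample standard deviation for d is an assumption, because the manual may have standardized the difference differently, for example by the normative SD of 15.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def one_sample_t_and_d(scores: Sequence[float],
                       pop_mean: float = 100.0) -> Tuple[float, float]:
    """Return (t, Cohen's d) for a one-sample comparison with pop_mean."""
    n = len(scores)
    m = mean(scores)
    s = stdev(scores)  # sample SD, n - 1 denominator
    t = (m - pop_mean) / (s / sqrt(n))
    d = (m - pop_mean) / s
    return t, d


clinical = [82, 90, 76, 88, 85, 79, 94, 86]  # hypothetical standard scores
t, d = one_sample_t_and_d(clinical)
print(f"t({len(clinical) - 1}) = {t:.2f}, d = {d:.2f}")  # t(7) = -7.22, d = -2.55
```

By Cohen's conventional benchmarks (about 0.2 small, 0.5 medium, and 0.8 large), |d| here is large. For a p value, `scipy.stats.ttest_1samp(clinical, 100)` returns the same t statistic together with its two-sided p value.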

Commentary and Recommendations

The Shipley-2 was developed to assess cognitive functioning and impairment, and the test is aligned with its stated theoretical model. The authors claimed that the standardization sample was representative of the U.S. population at the time of the study, although participants were not randomly sampled. The test appears to measure what it is intended to measure, and to do so reliably.

The Shipley-2 has several strengths. It is a robust measure of crystallized and fluid cognitive abilities. It is easy to administer, score, and interpret, and it uses accessible reading levels to reduce the influence of language. Its main advantages over the previous version are the expanded age range, a new standardization sample, and updated norms. The manual is straightforward and includes examples and case studies from various settings that help minimize administration errors. Administration time is relatively short, and administration is flexible. The test can be given in groups or individually, timed or untimed, as any single scale or combination of scales, and in paper-and-pencil or computerized format. In particular, the choice between the Abstraction and Block Patterns scales allows verbal and nonverbal skills to be evaluated separately. The test is sensitive in detecting examinees with potential cognitive difficulties. Its scales generally show high internal consistency and low standard errors of measurement. Although the scales of the Shipley-2 are interrelated, each one measures a unique component of cognitive ability.

Despite these strengths, the Shipley-2 has some weaknesses. The test is not appropriate for individuals who do not speak English as a first language or who have uncorrected visual impairment. Confirmatory factor analysis, rather than exploratory factor analysis, would have been a more appropriate way to examine the unidimensional structure of each scale and of the overall factor. The Shipley-2 does not function differently for males and females. Ethnicity may still be a concern when administering the test, however, because ethnic differences exist on some tests of cognitive ability. Socioeconomic status (SES) should also be considered because of its moderating effect on the relationship between test scores and ethnicity. Further differential item functioning (DIF) and measurement invariance analyses are needed to determine whether the test and its items function in the same way for all examinees.

The Shipley-2 is the second edition of a test that has been widely used since 1940 to assess cognitive functioning and impairment. The original scale has proven its value by withstanding the test of time, and the Shipley-2 adds considerable strength through its new features.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Phay, A. J., & York, A. C. (1990). Shipley Institute of Living Scale: Part 2-Assessment of intelligence and cognitive deterioration. Medical Psychotherapy, 3, 17-35.

Shipley, W. C. (1940). A self-administering scale for measuring intellectual impairment and deterioration. Journal of Psychology, 9, 371-377.

Shipley, W. C., Gruber, C. P., Martin, T. A., & Klein, A. M. (2009). Shipley-2 manual. Los Angeles, CA: Western Psychological Services.