Abstract

Test Description
The Raven’s 2 Progressive Matrices Clinical Edition (Raven’s 2; Raven, Rust, Chan, & Zhou, 2018), published by NCS Pearson, is an individually administered nonverbal assessment of general cognitive ability for individuals aged 4 through 90 years. It was developed to measure educative ability, defined as the ability to think clearly and solve complex problems (p. 1). The test draws on cognitive functions such as perception of and attention to visual detail, inductive reasoning, fluid intelligence, broad visual intelligence, classification and spatial ability, simultaneous processing, and working memory. The measure is appropriate for clinical or research purposes and is particularly suitable when time is limited, when only a general estimate of cognitive ability is required, or when screening in a group format. It can be administered and interpreted by individuals who hold Pearson’s Level B qualification, such as psychologists and professionals in related fields with at least a master’s-level education who are familiar with educational and psychological testing standards.
The kit includes an administration manual, a scoring template, 25 answer sheets, and a test booklet. The manual provides general administration and interpretation guidelines. The test is available in English and can be administered in three forms: paper (PF), digital short (DSF), and digital long (DLF).
Measure Development
Development of the Raven’s 2 built on the original Raven’s Progressive Matrices. Because most of the original stimuli had entered the public domain, with a resulting inflation of performance, the developers sought to update the item bank and increase test security. The revision therefore focused on developing new items and updating norms over a 3-year period, and it yielded enhanced reliability and validity, an extended age range, and greater ease of use.
Administration and Scoring
General Description and Structure
The Raven’s 2 consists of 24 (DSF) to 60 (PF) visual items, each depicting a matrix of colored geometric shapes arranged in one of four layouts (1 × 1, 2 × 2, 3 × 3, or 1 × 6). Each matrix contains one empty cell and a set of response options. The examinee views the matrix, determines the relation among the pictured elements, extrapolates from that relation to infer what the empty cell should contain, and selects a response from the options.
Administration
The Raven’s 2 must be administered in person but may be delivered on paper or digitally (via Pearson’s Q-global platform), and each format requires different materials. The manual implies that the DLF is preferred because it draws unique item sets from an expansive bank of parallel items. The DLF also reduces examiner workload and potential measurement error arising from guessing and testing time. The DSF shortens administration time but yields reduced measurement precision. The PF is appropriate when an internet connection is not available or when examinees are uncomfortable with an online testing format.
Administration requires the examiner to provide standardized verbal instructions and/or feedback (substantially reduced from previous versions of the measure) while the examinee views each matrix and indicates a response. Regardless of format, an administration is considered valid if at least 16 items have been completed by the time the discontinue criteria are met or the time limit is reached (20 to 45 minutes, depending on the examinee’s age and the administration format). An examinee may be retested after an interval of two weeks to six months; the digital forms are preferable for retesting because their expansive item bank reduces the potential for practice effects.
Because the Raven’s 2 is a nonverbal measure of cognitive ability, all administration formats use universal nonverbal icons that cue necessary actions for both examiners and examinees. In both digital and paper administrations, the “Wait & Listen” icon (red, yellow, or green) directs examinees to stop and listen while examiners read the demonstration, sample, expanded, or test item directions. Similarly, examinees may use an “Assist” icon to indicate that they need help. In either format, examiners may record responses for examinees with special needs who are unable to record their own.
Scoring
Scoring is completed either with the transparent overlay included in the kit and the normative tables in the manual or via Q-global, Pearson’s web-based scoring and reporting platform, on a fee-per-report basis. Based on the item set administered and the examinee’s age, raw scores are used to derive ability scores, which are then converted to standard scores with an accompanying confidence interval. Percentile ranks, normal curve equivalents, stanines, and age equivalents can also be derived.
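The derived scores listed above are related to one another by standard normal-curve conversions. The sketch below is a minimal illustration under stated assumptions, not the manual’s procedure: it assumes a standard-score metric with a mean of 100 and a standard deviation of 15, assumes normally distributed scores (the manual’s normative tables are empirical and may differ), and uses hypothetical function names. The confidence interval is centered on the obtained score using the standard error of measurement, SEM = SD × √(1 − r); some manuals center it on an estimated true score instead.

```python
import math

# Assumed standard-score metric (hypothetical for this sketch).
MEAN, SD = 100.0, 15.0


def z_score(ss):
    """Convert a standard score to a z score under the assumed metric."""
    return (ss - MEAN) / SD


def percentile_rank(ss):
    """Percentile rank from the normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    z = z_score(ss)
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def nce(ss):
    """Normal curve equivalent: mean 50, SD 21.06."""
    return 50.0 + 21.06 * z_score(ss)


def stanine(ss):
    """Stanine: half-SD bands centered on the mean, clipped to 1..9."""
    z = z_score(ss)
    return max(1, min(9, math.floor(2.0 * z + 5.5)))


def confidence_interval(ss, reliability, z_crit=1.96):
    """Obtained-score interval: ss +/- z_crit * SEM, with SEM = SD * sqrt(1 - r)."""
    sem = SD * math.sqrt(1.0 - reliability)
    return ss - z_crit * sem, ss + z_crit * sem


if __name__ == "__main__":
    ss = 115  # one SD above the assumed mean
    print(round(percentile_rank(ss), 1))  # 84.1
    print(round(nce(ss), 2))              # 71.06
    print(stanine(ss))                    # 7
    low, high = confidence_interval(ss, reliability=0.90)
    print(round(low, 1), round(high, 1))  # 105.7 124.3
```

The interval widens as reliability falls, which is one reason the shorter DSF, with its lower reliability, yields less precise scores than the DLF or PF.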
Technical Adequacy
Test Standardization
Standardization data were collected from December 2016 to August 2017 via digital administration. The sample of 2,275 examinees was stratified by age, education, race/ethnicity, geographic region, and gender to match the U.S. general population according to 2015 Census data. The manual states that 35% of examinees completed an individual administration and 65% completed a group administration, with an average of three members per group. Desktop, laptop, and tablet devices were used during digital standardization. Items for the PF were selected using item difficulties from Rasch concurrent calibration and an item response theory (IRT) analysis.
Reliability
Internal consistency
Reliability coefficients assess the degree to which items within a scale measure the same construct. Whereas most tests rely on traditional internal consistency estimates, the Raven’s 2 uses IRT-based marginal reliability because each examinee is administered only a subset of items from the item bank. Dimitrov (2003) reported that IRT-based marginal reliabilities provide a measure of internal consistency similar to coefficient alpha, here estimated across distinct yet parallel item sets drawn from the bank. The manual reports very stable marginal reliability estimates across randomly assigned item sets, with standard deviations ranging from .002 to .006 across the digital and paper forms. The DLF and PF demonstrated good (.80s) or excellent (.90s) reliability across all age groups, and the DSF demonstrated good (.80) reliability across all age groups.
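Marginal reliability can be illustrated with a small calculation. The sketch below assumes one common formulation, in which reliability equals one minus the ratio of the average error variance (the mean squared standard error of the ability estimates) to the observed variance of those estimates. The function name and example values are hypothetical, and the exact estimator used in the manual is not described here. Because each examinee’s standard error comes from the items that examinee actually received, the index can be computed even when examinees see different item sets.

```python
from statistics import fmean, pvariance


def marginal_reliability(theta_hat, se):
    """Estimate IRT marginal reliability as 1 - mean(SE^2) / Var(theta_hat).

    theta_hat: ability estimates, one per examinee (e.g., in logits)
    se: standard error of each examinee's ability estimate
    """
    if len(theta_hat) != len(se):
        raise ValueError("theta_hat and se must have the same length")
    if len(theta_hat) < 2:
        raise ValueError("need at least two examinees")
    observed_var = pvariance(theta_hat)        # variance of the estimates
    mean_error_var = fmean(s * s for s in se)  # average error variance
    return 1.0 - mean_error_var / observed_var


if __name__ == "__main__":
    # Hypothetical ability estimates and standard errors for five examinees.
    thetas = [-1.5, -0.5, 0.0, 0.5, 1.5]
    ses = [0.3, 0.3, 0.3, 0.3, 0.3]
    print(round(marginal_reliability(thetas, ses), 2))  # 0.91
```

The population variance (`pvariance`) is used here because the estimates are treated as the full set being summarized; with large samples the choice between population and sample variance makes little difference.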
Test–retest stability
A sample of participants completed the Raven’s 2 twice, with an average interval of 36 days, to examine temporal stability. Stability coefficients for the PF, DSF, and DLF were calculated for the full sample and for four age bands (4–10, 11–16, 17–54, and 55–90 years). Corrected stability coefficients for the full sample all exceeded .80 (PF = .86, DSF = .82, DLF = .87). Corrected coefficients also exceeded .80 in the 4–10 band (PF = .88, DSF = .85, DLF = .89), the 11–16 band (PF = .86, DSF = .81, DLF = .85), and the 17–54 band (PF = .89, DSF = .84, DLF = .89). In the 55–90 band, corrected coefficients ranged from .78 to .84 (PF = .82, DSF = .78, DLF = .84).
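Test manuals often report stability coefficients corrected for differences between the variability of the retest sample and that of the normative sample. The correction actually applied to the Raven’s 2 is not specified here; as an assumption, the sketch below uses a widely used correction for restriction of range (Thorndike’s Case 2), with hypothetical function and parameter names and illustrative values.

```python
import math


def correct_for_variability(r, sample_sd, norm_sd):
    """Correct a correlation for range restriction (Thorndike Case 2).

    r: observed correlation in the sample
    sample_sd: standard deviation of scores in the sample
    norm_sd: standard deviation in the reference (normative) population
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    if sample_sd <= 0 or norm_sd <= 0:
        raise ValueError("standard deviations must be positive")
    k = norm_sd / sample_sd  # ratio of unrestricted to restricted SD
    return r * k / math.sqrt(1.0 - r * r + r * r * k * k)


if __name__ == "__main__":
    # Hypothetical values: a retest sample slightly less variable than the norms.
    print(round(correct_for_variability(0.80, sample_sd=13.5, norm_sd=15.0), 2))  # 0.83
```

When the sample and normative standard deviations are equal the coefficient is unchanged, and when the sample is more variable than the norms the corrected value is lower than the observed one.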
Validity
The Raven’s 2 is intended to measure educative and general cognitive ability nonverbally. Evidence that the measure accurately represents intelligence was examined through special group studies, relations with other variables and measures, and content-oriented analyses.
Content validity
The Raven’s 2 measures an individual’s cognitive abilities through a progressive series of matrices. Strong content-oriented evidence therefore requires that the test adequately represent the various cognitive abilities that make up intelligence. Solving the items draws on several domains of cognitive functioning and educative ability, such as executive function, spatial awareness, inductive reasoning, and classification (Raven et al., 2018). Both internal and external experts were employed to establish content validity and to minimize irrelevant response factors and other extraneous influences (Raven et al., 2018). By minimizing the influence of irrelevant factors such as color distractors, unnecessary memory load, random guessing, fatigue, and visual acuity, the Raven’s 2 evaluates an individual’s cognitive abilities with little interference from unrelated processes.
Construct validity
The relation between Raven’s 2 scores and performance on similar measures of cognitive functioning informed the validity of the measure. Specifically, correlations were examined between the Raven’s 2 and the Kaufman Brief Intelligence Test—Second Edition (KBIT-2; Kaufman, 2004), the Naglieri Nonverbal Ability Test—Third Edition (NNAT3; Naglieri, 2015), and the Wide Range Achievement Test—Fifth Edition (WRAT5; Wilkinson & Robertson, 2017). Correlations with the KBIT-2 provided both convergent and divergent evidence. Correlations with the KBIT-2 nonverbal measures were strong (PF = .75, DSF = .70, DLF = .73), whereas correlations with the KBIT-2 verbal measures were only moderate (PF = .50, DSF = .43, DLF = .48), as expected. Correlations with the NNAT3 were strong (PF = .77, DSF = .70, DLF = .70). Correlations with the five WRAT5 subtests were only moderate (PF = .43–.59, DSF = .46–.62, DLF = .45–.62), as expected given that the two tests evaluate different constructs (intelligence vs. achievement). Additional information on correlations between the Raven’s 2 and previous editions of the Raven’s Progressive Matrices can be found in the manual.
Special groups studies
Special groups, including individuals with intellectual disability (ID), English language learners (ELLs), and intellectually gifted (GT) individuals, were matched to normative sample peers on age, gender, ethnicity, and education level to examine test-criterion validity. The GT sample scored significantly higher than the matched normative sample on all three forms (p < .01), whereas individuals diagnosed with an ID scored significantly lower on all three forms (p < .01). The ELL group scored higher than the matched normative sample on all three forms, but these differences were not statistically significant, suggesting that the Raven’s 2 is an acceptable measure of cognitive ability for ELLs. Taken together, the special group results support the Raven’s 2 as a valid measure of general cognitive ability for individuals who are gifted, have an ID, or are ELLs.
Commentary and Recommendations
The Raven’s 2 possesses several features that make it a useful nonverbal measure of cognitive ability. Its principal strength is that it measures cognitive ability nonverbally, which is valuable for special populations and for individuals with limited English proficiency. Another major feature is its wide age range, which allows examiners to estimate general cognitive ability across a broad range of individuals. In addition, building on the original Raven’s Progressive Matrices, the Raven’s 2 has an expansive item bank, which limits exposure of items in the public domain and allows distinct yet parallel items to be presented in each administration. With respect to administration, the Raven’s 2 is easy to use, requires minimal training, and is quick to administer. Moreover, the digital formats substantially reduce examiner workload. Finally, there is emerging evidence that the Raven’s 2 may be particularly useful for evaluating the cognitive abilities of special populations, such as individuals with autism spectrum disorder, for whom performance on the original Raven’s appears to be a better indicator of intelligence than performance on other, more comprehensive, assessment measures (Barbeau, Soulières, Dawson, Zeffiro, & Mottron, 2013; Dawson, Soulières, Gernsbacher, & Mottron, 2007; Nader et al., 2015; Soulières, Dawson, Gernsbacher, & Mottron, 2011).
While the Raven’s 2 is a useful measure of nonverbal cognitive ability, it is not without limitations. The most apparent is that it samples a limited range of cognitive domains. Although it addresses visual-spatial and fluid reasoning, it is restricted to the abilities tapped by matrix tasks and does not assess other key cognitive factors such as working memory, processing speed, or the additional factors captured by tests based on more comprehensive factor models. Consequently, it may yield a poor estimate of overall cognitive ability for individuals whose abilities in those other domains diverge from their matrix reasoning performance. Given this limitation, the manual states that the Raven’s 2 is not recommended for making diagnoses or educational placement decisions in isolation from other, more comprehensive, cognitive data. Similarly, the Raven’s 2 should not replace comprehensive measures of general cognitive ability such as the Wechsler tests.
Overall, the Raven’s 2 provides a quick, effective, and efficient estimate of general cognitive ability. Its construct coverage, large item bank, and short completion time make it useful in both clinical and research settings, and it may be particularly useful for measuring the intelligence of special populations. Despite some limitations, it remains a useful and recommendable measure.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
