Abstract
This comment responds to Maul’s (2012) article evaluating the validity evidence and argument for the Mayer–Salovey–Caruso Emotional Intelligence Test (MSCEIT) as a measure of emotional intelligence (EI). We suggest that Maul’s standards for establishing validity evidence are unrealistically high and may not be met by other established psychometric tests. As an example, we show that the validity evidence for Raven’s Progressive Matrices (RPM) is of a similar standard to that for the MSCEIT.
Maul (2012) evaluates the validity of the Mayer–Salovey–Caruso Emotional Intelligence Test (MSCEIT) battery for assessing emotional intelligence (EI), using criteria loosely based on generalizability theory and modern test standards. While Maul’s report card for the MSCEIT was not especially positive, we believe his evaluative standards were exceptionally high, setting a bar that few existing psychometric tests could clear. In particular, trait EI assessments would clearly not meet these standards (see Roberts, Schulze, & MacCann, 2008), a glaring omission if the aim was a fair evaluation of the MSCEIT relative to other EI measures. To illustrate the severity of these standards, we evaluate Raven’s Progressive Matrices (RPM; Raven, Raven, & Court, 2004), the most commonly used assessment of fluid reasoning ability in intelligence research, against the four standards Maul applied to the MSCEIT: (a) the scoring system must be adequate for inferring observed scores from task performance; (b) observed scores must generalize to universe scores; (c) universe scores must extrapolate to target scores; and (d) scores on the target domain must be interpretable as reflecting the construct measured.
First, the RPM scoring system may not be fully adequate for moving from observed task performance to observed scores. For example, the RPM does not award partial credit, even though some distractors are both empirically and conceptually more difficult than others. Second, RPM observed scores may not generalize to universe scores. For example, variability across testing occasions reflects the amount of prior item exposure, not just variability in universe scores (Bors & Vigneau, 2001). In addition, there is evidence of differential item functioning on RPM items, whereby the relationship between observed scores and universe scores differs by group membership (Abad, Colom, Rebollo, & Escorial, 2004). Third, RPM universe scores may not extrapolate to RPM target scores, at least in terms of adequate sampling of test content: only matrix reasoning is sampled, and only with visual material rather than verbal, auditory, or other modalities. Fourth, structural analyses of RPM items suggest multiple distinct factors representing different constructs, a finding at odds with both the test’s theoretical underpinnings and the usual interpretation of its observed scores (DeShon, Chan, & Weissbein, 1995).
Our goal with this example was not to discredit the RPM as a test of fluid reasoning. In fact, we believe that the RPM measures fluid reasoning well enough to be a useful research instrument, despite its limitations. Rather, our goal was to show that failing to meet such exacting psychometric standards is not unique to the MSCEIT and could equally characterize almost any psychometric test used by researchers. In short, we believe that Maul’s (2012) argument throws the baby out with the bathwater. We acknowledge that the MSCEIT has many distinctive measurement features (e.g., proportion-based scoring, rating the extent to which a quality is present in a stimulus), each with associated advantages and disadvantages. Nevertheless, there is evidence that at least three of the four MSCEIT branches show structural validity, generalizability, and incremental prediction of valued and relevant outcomes (e.g., Fan, Jackson, Yang, Tang, & Zhang, 2010; Joseph & Newman, 2010; Mayer, Roberts, & Barsade, 2008). Like Maul, we believe that the facilitation branch may be poorly defined and operationalized (a view that has been around for some time; e.g., Matthews, Zeidner, & Roberts, 2007).
In sum, while we acknowledge some psychometric shortcomings of the MSCEIT, we do not believe they are as dire or as far-reaching as Maul (2012) suggests. Indeed, accepting his critique wholesale would throw entire areas of educational and psychological research into disarray (including research on assessing emotion more generally). Moreover, we believe there are many viable alternatives to the MSCEIT that can supplement or augment EI research, so measurement issues with the MSCEIT need not overshadow the entire field of EI.
Footnotes
Author note:
All statements expressed in this article are the authors’ and do not necessarily reflect the official opinions or policies of the authors’ respective institutions.
