Abstract
As a result of the upcoming federal reauthorization of the Individuals With Disabilities Education Improvement Act (IDEA), practitioners and researchers have begun vigorously debating what constitutes evidence-based assessment for the identification of specific learning disability (SLD). This debate has resulted in strong support for a method that appraises an individual’s profile of cognitive test scores for the purposes of determining cognitive processing strengths and weaknesses, commonly referred to as patterns of strengths and weaknesses (PSW). Following the Fuchs and Deshler model, questions regarding the psychometric and conceptual integrity of the PSW model are addressed. Despite the strong claims made by many PSW proponents, the findings of this review demonstrate the need for additional information to determine whether PSW is a viable alternative to existing eligibility models and worthy of large-scale adoption for SLD identification. Implications for public policy and future SLD research are also discussed.
There has been a vigorous debate within the field of school psychology regarding the best procedures for determining eligibility for special education and related services under the category of specific learning disability (SLD) since the passage of the final regulations for the current iteration of the Individuals With Disabilities Education Improvement Act (IDEA) in 2004 (e.g., Hale, Naglieri, Kaufman, & Kavale, 2004; Kavale & Flanagan, 2007). Prior to IDEA (2004), federal regulations emphasized the primacy of the ability–achievement discrepancy model for the identification of SLD. Public Law 94-142 (i.e., the Education for All Handicapped Children Act of 1975) mandated SLD eligibility determination: based on (1) whether a child does not achieve commensurate with his or her age and ability when provided with appropriate educational experiences, and (2) whether the child has a severe discrepancy between achievement and intellectual ability in one or more of seven areas relating to communication skills and mathematical abilities. These concepts are to be interpreted on a case by case basis by the qualified evaluation team members. The team must decide that the discrepancy is not primarily the results of (1) visual, hearing, or motor handicaps; (2) mental retardation; (3) emotional disturbance; or (4) environmental, cultural, or economic disadvantage. (United States Office of Education, 1977, 42, p. 65082)
In contrast to previous legislation, IDEA (2004) permitted state education agencies (SEAs) to select between the discrepancy method and several alternatives by specifying that state adopted SLD eligibility criteria “must not [emphasis added] require the use of a severe discrepancy between intellectual ability and achievement . . . ” (IDEA, 34 C. F. R. § 300.307(a)(1)). The regulations further state that SEAs “must [emphasis added] permit the use of a process based on the child’s response to scientific, research-based intervention” (IDEA, 34 C. F. R. § 300.307(a)(2)) and “may [emphasis added] permit the use of other alternative research-based procedures for determining whether a child has a specific learning disability” (IDEA, 34 C. F. R. § 300.307(a)(3)). The former alternative to the discrepancy method is a process commonly referred to as response-to-intervention (RTI), whereas the latter alternative opened the door for other research-based procedures to be used in SLD eligibility determinations.
The paradigm shift away from sole use of the discrepancy method was likely the result of a confluence of several factors spanning three decades (Aaron, 1997), including psychometric evidence of poor diagnostic convergence between different discrepancy formulas (e.g., Francis et al., 2005; Mellard, Deshler, & Barth, 2004; Reynolds, 1984) and research findings suggesting that individuals with discrepancies were able to benefit significantly from direct academic interventions (e.g., McMaster, Fuchs, Fuchs, & Compton, 2005; Vellutino, Scanlon, & Lyon, 2000). Although the discrepancy model has lost credibility, the federal regulations codified in IDEA (2004) that permit alternative methods for determining SLD eligibility have provided momentum for continued debate over the role of cognitive testing in the identification of SLD.
Ahearn (2008) surveyed SEA representatives regarding SLD identification procedures after the enactment of IDEA (2004) and reported that six states required use of RTI while simultaneously prohibiting the discrepancy method for making SLD determinations (i.e., Colorado, Delaware, Georgia, Indiana, Iowa, and West Virginia), 10 states permitted use of all three methods described in federal regulations (i.e., Alabama, Arkansas, Florida, Kansas, Michigan, Nebraska, New Hampshire, Ohio, Oregon, and South Carolina), and 26 states allowed either the use of RTI or the discrepancy method for the identification of SLD (the Arizona SEA representative declined to respond to the survey). Consequently, the discrepancy method and RTI may be the most widely used procedures for diagnosing SLD in school-based settings nationwide.
However, alternatives to the discrepancy model for SLD identification have also been proposed that emphasize the role of strengths and weaknesses in cognitive processing as measured by individually administered standardized intelligence tests, collectively referred to as the patterns of strengths and weaknesses (PSW) approach (e.g., Hale, Flanagan, & Naglieri, 2008; Reynolds & Shaywitz, 2009). Johnson, Humphrey, Mellard, Woods, and Swanson (2010) reported moderate to large effect sizes in cognitive processing differences between students identified as SLD and typically achieving peers in a meta-analysis of 32 empirical studies. The authors concluded that measures of cognitive processing should be included in the evaluation and identification of SLD on the basis of their results. Moreover, a recent survey of 58 professional experts with experience in SLD assessment, regarding procedural best practices for diagnosing SLD, resolved that an “approach that identifies a pattern of psychological processing strengths and weaknesses, and achievement deficits consistent with this pattern of processing weaknesses, makes the most empirical and clinical sense” (Hale et al., 2010, p. 225).
Federal regulations require SLD diagnostic procedures to be “research-based.” Consequently, the PSW approach to cognitive test score interpretation must be accompanied by adequate evidence of reliability and validity (American Educational Research Association [AERA], American Psychological Association [APA], & The National Council on Measurement in Education [NCME], 2014). Given the nature and importance of the task(s) for which PSW is being prescribed, additional information is needed to determine the degree to which it is a viable alternative to existing eligibility models. The purpose of the present review is therefore to address a series of questions regarding the psychometric and conceptual composition of the PSW model following the model of Fuchs and Deshler (2007):
How is a cognitive weakness defined?
Are PSW models diagnostically valid?
Are factor-based scores from cognitive tests suitable for individual decision making?
Do school psychologists have adequate training to implement PSW with integrity?
How Is a Cognitive Weakness Defined?
To date, several models have been proposed that operationalize PSW, including (a) the Concordance/Discordance Model (C/DM; Hale & Fiorello, 2004), (b) the Cattell-Horn-Carroll Operational Model (CHC; Flanagan, Alfonso, & Mascolo, 2011), and (c) the Discrepancy/Consistency Model (D/CM; Naglieri, 2011). It is beyond the scope of the present review to provide a detailed account of the procedures involved for each PSW method. However, it is noteworthy that each of these PSW models shares at least three core assumptions as it relates to the diagnosis of SLD despite underlying differences in theoretical orientation: (a) evidence of cognitive weaknesses must be present, (b) an academic weakness must also be established, and (c) there must be evidence of spared cognitive–achievement abilities (Flanagan & Alfonso, 2015).
Problems Associated With Operationalizing a Weakness in PSW Models
Lack of a uniform method for defining PSW
Establishing a cognitive processing weakness is the sine qua non of the PSW model. Yet, there is considerable disagreement regarding exactly how processing weaknesses should be operationalized (Flanagan, Fiorello, & Ortiz, 2010; Flanagan, Ortiz, Alfonso, & Dynda, 2006; Hale & Fiorello, 2004; Hale, Wycoff, & Fiorello, 2011; Naglieri, 1999). A survey of contemporary SLD diagnostic research reveals that there is little consensus on exactly how PSW should be defined and that different diagnostic models of SLD identification are likely to identify different students (Lyon & Weiser, 2013). Some models (e.g., CHC) use a normative psychometric approach wherein an individual’s cognitive–achievement scores are compared with the performance of a sample of same aged peers from the general population. In the CHC model, a weakness is defined as performance that falls below the average range (e.g., below the range of standard scores from 90 to 110), whereas a deficit is defined as performance that falls one standard deviation or more below the mean (e.g., standard scores at or below 85). Despite the weakness/deficit distinction, Flanagan and colleagues (2011) suggest that practitioners may utilize both thresholds for determining whether a particular score is indicative of an intraindividual weakness. Practitioners are also encouraged to utilize clinical judgment to determine whether strict adherence to the aforementioned thresholds is appropriate. To wit, Some children who struggle academically may not demonstrate academic weaknesses or deficits on standardized, norm-referenced tests of achievement . . . Therefore, it is important not to assume that a child with a standard score of 90 in broad reading is okay. (Flanagan et al., 2011, p. 245)
However, it is important to highlight that probability distribution theory determines that the use of such a high cut-off (e.g., 90) is likely to result in approximately 25% of the population being identified as having a cognitive weakness on any given measure (Decker, Schneider, & Hale, 2012), which is larger than the estimated proportion of school-aged children and adolescents diagnosed with SLD nationwide (Kena et al., 2014).
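The approximate 25% figure can be checked directly. The following is a minimal sketch, assuming that standard scores are normally distributed with a mean of 100 and a standard deviation of 15 (the conventional metric for such scores, although it is not stated explicitly above); the function name `proportion_below` is illustrative only.

```python
from statistics import NormalDist

# Assumption: standard scores follow a normal distribution (M = 100, SD = 15).
STANDARD_SCORES = NormalDist(mu=100, sigma=15)


def proportion_below(cutoff: float) -> float:
    """Expected proportion of the population scoring below `cutoff` on one measure."""
    return STANDARD_SCORES.cdf(cutoff)


weakness_rate = proportion_below(90)  # threshold described above as a "weakness"
deficit_rate = proportion_below(85)   # threshold described above as a "deficit"
print(f"Below 90: {weakness_rate:.1%}; at or below 85: {deficit_rate:.1%}")
```

Under these assumptions, roughly one in four examinees falls below 90 on any single measure, and the chance of observing at least one such score can only rise as more measures are inspected.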
In contrast, the C/DM model utilizes an intraindividual parametric approach in which a cognitive weakness is denoted by a statistically significant difference between a cognitive strength score and a cognitive weakness score, as evaluated with the standard error of the difference (SED) statistic. However, the SED formula is heavily influenced by the reliability of the constituent measures. Thus, if the reliability coefficients of each of two reference indices are .90 or higher, differences of only 3 to 5 standard score points are required for a cognitive weakness to be indicated. As a result, Hale and Fiorello (2004) encourage practitioners to seek convergent evidence from multiple data sources to make eligibility determinations. This advice is prudent given that the use of SED is likely to result in a high initial rate of false positive decisions, because many composite and subtest scores have corresponding alpha (reliability) coefficients that exceed .90.
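The dependence of the SED on score reliability can be illustrated with a short sketch. The exact computation used in the C/DM model is not given above, so the code assumes the conventional formula for two scores on a common metric, SED = SD × sqrt(2 − r_xx − r_yy), and a standard deviation of 15; the function names and the z multiplier are illustrative assumptions.

```python
import math


def standard_error_of_difference(sd: float, r_xx: float, r_yy: float) -> float:
    """Assumed conventional formula: SED = SD * sqrt(2 - r_xx - r_yy)."""
    return sd * math.sqrt(2.0 - r_xx - r_yy)


def critical_difference(sd: float, r_xx: float, r_yy: float, z: float = 1.96) -> float:
    """Difference needed for significance when the SED is multiplied by z
    (z = 1.96 corresponds to a two-tailed .05 level; this multiplier is an assumption)."""
    return z * standard_error_of_difference(sd, r_xx, r_yy)


# Hypothetical reliabilities, applied to both scores, on an SD = 15 metric.
for r in (0.90, 0.95, 0.98):
    sed = standard_error_of_difference(15, r, r)
    crit = critical_difference(15, r, r)
    print(f"r = {r:.2f}: SED = {sed:.2f}, critical difference = {crit:.2f}")
```

The loop shows how quickly the required difference shrinks as reliability rises, which is the property that makes highly reliable composites prone to flagging small differences as significant.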
Failure to report PSW base rates
Flanagan and Alfonso (2015) have recently revised the CHC model in the form of the Dual Discrepancy/Consistency (DDC) model. The DDC model explicitly uses the Cross-Battery Assessment System (X-BASS; Flanagan, Ortiz, & Alfonso, 2015) proprietary software program for calculating cognitive strengths and weaknesses. According to Flanagan and Alfonso, the X-BASS program uses elements of previously released software programs associated with the cross-battery assessment model (e.g., Flanagan, Ortiz, & Alfonso, 2013). The X-BASS program can be used to derive statistically significant (p ≤ .05) critical values for cognitive weaknesses—users input their battery of cognitive–achievement scores and specify a priori which scores correspond to required elements of the DDC model. If the observed score from the specified cognitive weakness exceeds the critical value derived from the program algorithm, it is considered to demarcate a statistically significant cognitive weakness. The X-BASS also includes an additional computational element for synthesizing inputted scores into a pseudo general ability composite with a corresponding index that reports the likelihood that an individual presents with otherwise spared cognitive abilities aside from the observed cognitive weakness. While the X-BASS software potentially provides users with a more statistically rigorous method for determining strengths and weaknesses associated with the DDC model, additional psychometric information (e.g., base rates in clinical and non-clinical populations) is needed to determine the diagnostic value of the scores provided by the program.
Poor stability of observed difference scores
The D/CM model utilizes a hybrid approach in which a cognitive weakness is assessed using a combination of intraindividual (i.e., ipsative analysis) and normative appraisals. Ipsative analyses (Davis, 1959) involve making intraindividual comparisons by subtracting each individual cognitive score from the mean of the profile of scores; the resulting difference score is then determined to be significant if it exceeds a priori statistical thresholds. In contrast to other PSW methods, ipsative analysis of cognitive test scores has been thoroughly investigated (e.g., Glutting, Watkins, & Youngstrom, 2003; Watkins, 2000). The resulting difference score is a transformation of the original score. Consequently, interpretation is difficult because the test score has been removed from its norm-referenced anchor point, and the underlying psychometric properties of the transformed score are unknown (Glutting et al., 2003). Ipsative profiles have also consistently demonstrated poor stability and diagnostic accuracy at the subtest and composite levels (e.g., McDermott, Fantuzzo, & Glutting, 1990; McDermott, Fantuzzo, Glutting, Watkins, & Baggaley, 1992; McDermott & Glutting, 1997; Watkins, 2000). For example, Watkins (2000) investigated the degree to which Naglieri’s (2000) relative weakness (i.e., ipsative assessment), cognitive weakness (i.e., at least one factor score below a cut-score and a relative weakness present), and cognitive and academic weakness (i.e., both relative and cognitive weakness present concurrently with a low normative academic achievement test score) measured by the Wechsler Intelligence Scale for Children–Third Edition (WISC-III; Wechsler, 1991) and Cognitive Assessment System (CAS; Naglieri & Das, 1997) could accurately distinguish between students placed in special education programs and students placed in regular education programs.
Results indicated that only 24% to 51% of students in special education displayed a relative weakness, 17% to 32% displayed a cognitive weakness, and 14% to 28% displayed a cognitive and academic weakness, suggesting that these cognitive profiles are not commonly observed in students with disabilities. Furthermore, rates of false positive cases were similar, indicating that relative weaknesses, cognitive weaknesses, and cognitive and academic weaknesses are also frequently observed in the cognitive profiles of students without disabilities (i.e., 27%–42%, 10%–13%, and 4%–6%, respectively). Yet, ipsative analysis at the composite score level continues to be recommended for the identification of SLD (e.g., Dehn, 2014; Naglieri, 2011) based on the belief that the development of more advanced cognitive measures that are based on theoretically derived models of intellectual abilities (e.g., CHC) allows users to overcome the aforementioned shortcomings of ipsative and similar types of intraindividual profile analysis (Flanagan et al., 2013). Additional psychometric examinations of the potential diagnostic accuracy of ipsative profiles generated from recently revised cognitive measures (e.g., CAS2, WISC-V) would permit more relevant inferences as to the potential efficacy of the D/CM method.
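The ipsative and hybrid definitions discussed above can be made concrete with a brief sketch. It is not the D/CM procedure itself: the deviation threshold, the normative cut-score, the sign convention (score minus profile mean), the requirement that the same score meet both criteria, and the example scores are all hypothetical placeholders chosen for illustration.

```python
from statistics import mean


def ipsative_deviations(profile: dict[str, float]) -> dict[str, float]:
    """Difference between each score and the examinee's own profile mean
    (sign convention assumed: negative values fall below the personal mean)."""
    profile_mean = mean(profile.values())
    return {name: score - profile_mean for name, score in profile.items()}


def classify_weaknesses(
    profile: dict[str, float],
    relative_threshold: float = 10.0,  # hypothetical deviation needed for a relative weakness
    normative_cut: float = 85.0,       # hypothetical normative cut-score
) -> dict[str, dict[str, bool]]:
    """Flag each score as a relative weakness and/or a cognitive weakness.

    Assumption: a cognitive weakness requires the same score to be both a
    relative weakness and below the normative cut-score.
    """
    deviations = ipsative_deviations(profile)
    results = {}
    for name, score in profile.items():
        relative = deviations[name] <= -relative_threshold
        cognitive = relative and score < normative_cut
        results[name] = {"relative": relative, "cognitive": cognitive}
    return results


# Hypothetical composite scores for one examinee (scale names are illustrative).
profile = {"Planning": 102, "Attention": 98, "Simultaneous": 105, "Successive": 80}
flags = classify_weaknesses(profile)
for name, flag in flags.items():
    print(name, flag)
```

Because every deviation is expressed relative to the examinee's own mean, the deviations sum to zero, which is one way to see that the transformed scores no longer carry their norm-referenced meaning.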
Are PSW Models Diagnostically Valid?
There is scientific consensus that SLD is a neurological disorder with an etiology demarcated by specific neurocognitive deficits that impair an individual’s ability to benefit from traditional academic instruction without significant psychoeducational remediation (Reynolds & French, 2005). Although proponents of PSW suggest that such methods provide a mechanism for determining the specific cognitive deficits responsible for an individual’s achievement difficulties as well as critical information for designing individualized treatment plans that take into consideration the dynamic nature of cognitive–achievement relationships (e.g., Hajovsky, Reynolds, Floyd, Turek, & Keith, 2014; S. B. Kaufman, Reynolds, Liu, Kaufman, & McGrew, 2012; McGrew & Wendling, 2010), this information is not sufficient for establishing the diagnostic utility of these procedures (McFall & Treat, 1999; Wiggins, 1973).
PSW as a Diagnostic Sign
Determining the accuracy of PSW as a diagnostic sign requires the computation of sensitivity and specificity statistics from a 2 × 2 contingency table that cross-tabulates decisions made from PSW with those from a gold standard diagnosis (a sample diagnostic contingency array is provided in Table 1). Sensitivity is the proportion of individuals with SLD who exhibit significant PSW (i.e., true positive decisions). Specificity represents the proportion of individuals without SLD who are not marked with psychoeducational profiles that contain significant PSW (i.e., true negative decisions). Diagnostic accuracy is often represented as the rate of “hits” (i.e., true positive and true negative decisions) in a selected population. Stuebing, Fletcher, Branum-Martin, and Francis (2012) investigated the diagnostic accuracy of several PSW models using simulated WISC-IV assessment data and reported high diagnostic specificity (i.e., true negative decisions) across all models with only 1% to 2% of the population meeting SLD criteria. However, this high specificity came at the cost of an inflated Type II error rate (i.e., false negative decisions), calling into question the ability of PSW methods to improve upon the diagnostic accuracy of existing models (e.g., discrepancy model, RTI) for correctly identifying children and adolescents with SLD. The authors concluded that the inflated error was likely the result of transforming a continuous scale of measurement into a binary model in which scores at or below a cut-point are considered to be indicative of a disorder. Problems associated with artificially dichotomizing continuous data have long been known (Cohen, 1983; MacCallum, Zhang, Preacher, & Rucker, 2002). However, it is important to note that multidisciplinary evaluation teams must make “yes/no” eligibility decisions according to IDEA (2004) and some degree of decision error will undoubtedly occur when cut-scores are used to diagnose SLD as a result.
In consideration of this problem, it is important to highlight that all cognitive–achievement measures contain measurement error and thus observed scores are bound to fluctuate around arbitrary cut-off points with repeated diagnostic testing (Macmann, Barnett, Lombard, Belton-Kocher, & Sharpe, 1989). According to Fletcher, Stuebing, Morris, and Lyon (2013), the imposition of a different classification model (e.g., PSW) may shift the focus to underlying causal variables but does nothing to address these more fundamental psychometric concerns that apply to all diagnostic models that utilize these data.
Table 1. Sample PSW Diagnostic Contingency Table.
Note. PSW = patterns of strengths and weaknesses.
Condition is known specific learning disability diagnoses.
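The statistics defined above follow directly from the four cells of a 2 × 2 diagnostic contingency table. The sketch below assumes the usual cell labels (true positive, false positive, false negative, true negative); the counts are invented solely to illustrate how a rarely met PSW criterion can pair high specificity with low sensitivity and are not taken from any study cited here.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity, and overall hit rate from a 2 x 2 table.

    tp: SLD present, PSW present    fp: SLD absent, PSW present
    fn: SLD present, PSW absent     tn: SLD absent, PSW absent
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # proportion of SLD cases showing PSW
        "specificity": tn / (tn + fp),   # proportion of non-SLD cases without PSW
        "hit_rate": (tp + tn) / total,   # proportion of all decisions that are correct
    }


# Hypothetical counts for 1,000 examinees, 50 of whom have SLD.
stats = diagnostic_accuracy(tp=10, fp=20, fn=40, tn=930)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the hit rate looks impressive even though most individuals with SLD are missed, which is why the hit rate alone can be a misleading summary when the condition is uncommon.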
More explicit investigations of individual PSW models have only recently begun to emerge within the technical literature. C/DM procedures have been used by researchers in a series of recent studies examining the potential diagnostic utility of the model for identifying SLD in a multitude of academic domains. Kubas et al. (2014) used C/DM diagnostic procedures with a referred sample of 283 children and adolescents from the United States and Canada. Results indicated that specific cognitive processing subtypes distinguished between those identified as having math SLD via the C/DM procedure and those who were not, with different cognitive skills mediating math performance across groups. Similar results have been replicated with respect to use of the C/DM model for diagnosing SLD in written expression (Fenwick et al., 2015) and for the CHC model in reading (Feifer, Nader, Flanagan, Fitzer, & Hicks, 2014). However, a recent investigation by Miciak, Taylor, Denton, and Fletcher (2015) found that the utilization of different assessment measures in the C/DM model resulted in poor classification agreement (κ = .29), suggesting that test selection in PSW is not arbitrary and may result in inconsistent diagnostic decisions across practitioners and educational agencies. Moreover, Miciak, Fletcher, Stuebing, Vaughn, and Tolar (2014) administered multiple cognitive measures designed to measure CHC-related broad abilities to non-responders of a Tier 2 reading intervention program to test the reliability and validity of C/DM and CHC methods. Diagnostic overlap in which an individual met SLD criteria according to both PSW models fluctuated between 13.6% and 62.1% across different iterations of the models, indicating that model choice likely affects diagnostic decisions.
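Classification agreement of the kind reported above is commonly summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The following is a minimal sketch for two yes/no classifications of the same examinees; the counts are hypothetical and the function name is illustrative.

```python
def cohens_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two binary classifications of the same cases.

    both_yes: identified by methods A and B   a_only: identified by A only
    b_only:   identified by B only            both_no: identified by neither
    """
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_only) / n
    b_yes = (both_yes + b_only) / n
    # Agreement expected if the two methods classified cases independently.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts: two test batteries applied to the same 50 examinees.
kappa = cohens_kappa(both_yes=20, a_only=5, b_only=10, both_no=15)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the two batteries agree on 70% of cases, yet kappa is considerably lower because half of that agreement would be expected by chance alone.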
Absence of Aptitude–Treatment Interaction
As previously mentioned, in addition to its potential promise as an identification model, Hale et al. (2010) suggested “processing assessment could also lead to more effective individualized interventions for children who do not respond adequately to intensive interventions in an RTI approach” (p. 229), implying the presence of an aptitude–treatment interaction. This assumes that unique broad cognitive abilities differentially predict deficits in specific academic abilities. However, latent variable modeling studies have provided inconsistent evidence in support of this contention. Multiple investigations have reported that broad cognitive abilities have direct effects on reading and math achievement beyond the general factor (Benson, 2008; Floyd, Keith, Taub, & McGrew, 2007; Hajovsky et al., 2014; Taub, Keith, Floyd, & McGrew, 2008; Vanderwood, McGrew, Flanagan, & Keith, 2001), but the predictive effects associated with those abilities are relatively small (Beaujean, Parkin, & Parker, 2014; Glutting, Watkins, Konold, & McDermott, 2006; Oh, Glutting, Watkins, Youngstrom, & McDermott, 2004; Parkin & Beaujean, 2012). Whereas there is emerging evidence (e.g., Compton, Fuchs, Fuchs, Lambert, & Hamlett, 2012; Fuchs, Hale, & Kearns, 2011) that individuals with SLD may present with discrepant cognitive profiles when compared with normal controls, these cognitive correlates rarely mediate academic intervention outcomes (Fletcher et al., 2011; Miciak, Stuebing, et al., 2014; Stuebing et al., 2014).
Are Factor-Based Scores From Cognitive Tests Suitable for Individual Decision Making?
The interpretation of subtest score profiles for diagnostic decision making has long been advocated in the technical literature despite suggestions that PSW is a new and revolutionary approach to cognitive test interpretation and SLD identification (Flanagan et al., 2006; Hale et al., 2011). More than 70 years ago, Rapaport, Gil, and Schafer (1945) proposed an interpretive framework that provided clinicians with a step-by-step process for analyzing intraindividual cognitive strengths and weaknesses based on the belief that individual variations in cognitive test performance served as evidence for the presence of a variety of clinical disorders, and a multitude of related approaches have subsequently been developed (e.g., A. S. Kaufman, 1994; Naglieri, 2000; Prifitera & Dersh, 1993).
However, Gnys, Willis, and Faust (1995) characterized the belief that subtest scatter distinguishes individuals with SLD from those without disabilities as an illusory correlation—the false belief that two variables are related (Chapman & Chapman, 1967) due to the preponderance of evidence indicating the poor ability of subtest scatter to accurately discriminate between clinical and non-clinical subgroups (e.g., Kavale & Forness, 1984; Macmann & Barnett, 1997; Watkins, 2000). Nevertheless, Pfeiffer, Reddy, Kletzel, Schmelzer, and Boyer (2000) surveyed 354 nationally certified school psychologists regarding their use and perceptions of profile analysis for the diagnosis of SLD and reported approximately 70% of respondents believed the information obtained from profile analysis was clinically meaningful and 89% of respondents declared that they used profile analysis for making diagnostic decisions. Yet, Fletcher et al. (2013) argued, “It is ironic that methods of this sort [PSW] continue to be proposed when the basic psychometric issues are well understood and have been documented for many years” (p. 40). Hale and colleagues (2010) counter this critique by arguing that previously documented shortcomings of profile analysis do not hold for contemporary PSW models as a result of the development of more sophisticated measurement instruments and advances in cognitive and neuropsychological theory (e.g., McGrew, 2009). Although associated genetic and neuroimaging studies are important, they are not instructive for evaluating the clinical utility of the PSW models.
Poor Reliability of Latent Factor Scores
PSW models encourage clinicians to interpret factor-based scores (e.g., broad ability indexes and composites) as primarily indicating orthogonal broad cognitive abilities. However, investigators have reported that many measures of specific abilities contain large proportions of g variance and relatively little unique variance that can be attributed to the abilities purported to be estimated by those measures (e.g., Canivez, 2011; Canivez & Watkins, 2010; Dombrowski & Watkins, 2013; Styck & Watkins, 2014; Watkins, 2006). Given the multidimensional nature of intelligence, additional research is needed to provide clinicians with a procedure for disentangling the many sources of construct irrelevant variance that contaminate measures of broad abilities (Canivez, in press; Reise, Bonifay, & Haviland, 2013). In the absence of such a procedure, the generation of reliable and valid inferences from cognitive profile data can at best be presently described as aspirational (Watkins, 2000). According to Beaujean et al. (2014), multidimensionality is not the problem per se; the problem occurs when interpretations of individual cognitive abilities and their related composites “fails to recognize that Stratum II factors derived from higher order models are not totally independent of g’s influence” (p. 800). As a result, it may be more useful to re-conceptualize broad cognitive abilities as different “flavors” of g (Carroll, 1993) in contrast to more discrete indicators of unique abilities.
Not surprisingly, complementary studies assessing the incremental predictive effects of cognitive test scores have consistently found that observed factor-based scores rarely account for meaningful proportions of variance in achievement scores after controlling for the strong predictive effects of the general intelligence score (e.g., Canivez, 2013; Glutting et al., 2006; McGill, 2015; McGill & Busse, 2015). Frazier and Youngstrom (2007) suggest that the inability of factor-based broad ability measures to consistently provide incremental predictive effects beyond the full scale IQ (FSIQ) score may be due to a variety of reasons including but not limited to (a) poorly defined measures, (b) measures that lack adequate measurement specificity, and (c) potential overfactoring of cognitive test batteries. For example, independent factor analytic investigations using more conservative procedures for factor extraction and variance partitioning (e.g., Canivez, 2008; Dombrowski, 2013; Strickland, Watkins, & Caterino, 2015) have generally failed to replicate the reported structure in many contemporary test manuals. This is not to suggest that there are not unique cognitive abilities apart from general intelligence that may be useful for practitioners to examine when determining whether an individual has SLD. However, practitioners must be mindful of the limitations of working with observed-level data when interpreting such information (Schneider, 2013).
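The logic of an incremental validity analysis can be sketched with the standard two-predictor formula for R², which needs only the three correlations among the achievement score, the general intelligence score, and one factor-based score. This is a simplified illustration rather than the hierarchical regression procedure of any study cited above, and the correlations are hypothetical.

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """R^2 for a criterion regressed on two predictors, from their correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def incremental_r_squared(r_y_g: float, r_y_factor: float, r_g_factor: float) -> float:
    """Variance in achievement explained by a factor score beyond the general score.

    r_y_g:      correlation of achievement with the general intelligence score
    r_y_factor: correlation of achievement with the factor-based score
    r_g_factor: correlation between the general and factor-based scores
    """
    full_model = r_squared_two_predictors(r_y_g, r_y_factor, r_g_factor)
    general_only = r_y_g**2
    return full_model - general_only


# Hypothetical correlations chosen only to illustrate the computation.
delta = incremental_r_squared(r_y_g=0.70, r_y_factor=0.50, r_g_factor=0.60)
print(f"Incremental R^2 = {delta:.3f}")
```

In this example a factor score that correlates .50 with achievement adds only about one percentage point of explained variance once the general score is entered first, because most of its predictive power is shared with the general score.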
Lack of Information Regarding Long-Term Stability of Observed Factor Scores
Another potential threat to valid factor-based score interpretation, as advocated in PSW models, involves the stability of these scores over test–retest intervals. Although the results of test–retest studies are often reported in technical and interpretive manuals for cognitive measures, the results of these studies often do not provide users with information as to the long-term stability of cognitive test scores. For example, in the technical manual for the Kaufman Assessment Battery for Children–Second Edition (KABC-II; A. S. Kaufman & Kaufman, 2004), adjusted average stability coefficients for the CHC-related cognitive composites ranged from .76 (Visual Processing) to .95 (Crystallized Ability) for ages 7 to 18. Although these coefficients generally meet established standards for evidence of temporal stability (e.g., Wasserman & Bracken, 2013), it is important to highlight that these coefficients were obtained over a relatively short re-test interval (12–56 days) with a relatively small sample of participants (n = 205). Although short-term stability is important, demonstrating the long-term stability (e.g., 1–3 years) of cognitive scores is critical as these scores provide the basis for potential long-term placements in special education and related intervention programs (Cronbach & Snow, 1977).
Recently, Watkins and Smith (2013) reported that 29% to 44% of WISC-IV composite scores (i.e., Verbal Comprehension Index, Perceptual Reasoning Index, Processing Speed Index, and Working Memory Index) for a sample of 344 students referred for special education evaluations demonstrated differences ≥ 10 standard score points across a mean test–retest interval of 2.84 years. These results suggest that an individual’s pattern of cognitive scores is not stable over time. Future research is needed to replicate this work with other measures of cognitive ability. It also is important to note that scores from intelligence tests are influenced by error that is unique to each testing situation and is not stable over time (see Figure 1). In addition, these effects may be exacerbated when different examiners assess the same individual across re-examination periods (see McDermott, Watkins, & Rhoad, 2014). Many PSW models (e.g., C/DM) encourage practitioners to engage in ongoing data-based decision making over time to protect against the well-known psychometric deficiencies of point-in-time profile analysis. However, more research is needed to determine whether the individual strengths and weaknesses generated from PSW assessment are sufficiently stable to permit the diagnostic inferences required in current federal and state regulations.
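The stability index described above, the share of examinees whose composite score changes by 10 or more points between testings, is simple to compute from paired scores. The sketch below uses invented scores for five examinees; the function name and data are illustrative assumptions and do not reproduce any published analysis.

```python
def proportion_changed(time1: list[float], time2: list[float], threshold: float = 10) -> float:
    """Proportion of examinees whose score differs by at least `threshold` points at retest."""
    if len(time1) != len(time2) or not time1:
        raise ValueError("time1 and time2 must be non-empty and of equal length")
    changed = sum(1 for first, second in zip(time1, time2) if abs(second - first) >= threshold)
    return changed / len(time1)


# Hypothetical composite scores for the same five examinees at two evaluations.
initial_scores = [100, 95, 88, 110, 102]
retest_scores = [92, 106, 90, 99, 103]
rate = proportion_changed(initial_scores, retest_scores)
print(f"{rate:.0%} of scores changed by 10 or more points")
```

Applying the same function to each composite separately would show whether instability is concentrated in particular indexes, which matters when a single index serves as the designated strength or weakness.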

Figure 1. Sources of influence on an individual’s performance on a cognitive measure.
Do School Psychologists Have the Training Necessary to Implement a PSW Model With Integrity?
In contrast to actuarial methods (i.e., the discrepancy model), PSW emphasizes the clinical judgment of the assessor in searching for patterns of strengths and weaknesses and identified areas of low achievement (Schultz, Simpson, & Lynch, 2012). According to Flanagan, Alfonso, and Ortiz (2012), “SLD identification is complex and requires a great deal of empirical and clinical knowledge on the part of practitioners” (p. 666). Thus, school psychologists must have advanced training and preparation in the theory of cognitive abilities, causal cognitive–achievement relationships, and advanced psychometrics and test interpretation for PSW to be implemented with fidelity in educational settings (Fiorello, Hale, & Wycoff, 2012; Weiner, 1989). However, Decker, Hale, and Flanagan (2013) argued that school psychology training programs do not adhere to evidence-based assessment practices due to (a) an overemphasis of interpreting the FSIQ score, (b) a gradual decline in the breadth and scope of the cognitive assessment training sequence as a result of RTI implementation, and (c) poor training in linking cognitive test results to evidence-based interventions. As a result, Decker et al. conclude that the cognitive assessment training sequence in many school psychology programs must be overhauled if the promise of cognitive testing is to be realized with contemporary measures, regardless of the method being used for SLD identification.
These criticisms are especially prescient given the multitude of diagnostic errors that are possible when engaging in interpretive procedures with multiple sources of data that place a premium on clinical judgment (Dawes, Faust, & Meehl, 1989; Garb, 2005; Watkins, 2009), notwithstanding standardized administration and scoring errors commonly identified on intelligence test protocols (Styck & Walsh, 2015). According to Gambrill (2005), clinical predictions often overlook a number of confounding causes, such as “the play of chance and misleading effects of small biased samples” (p. 453). A consistent theme across all of the PSW models examined in the present review was the need for users to integrate multiple sources of data to enhance the ecological and treatment validity of diagnostic decisions. Accordingly, Fiorello and colleagues (2012) suggest that “CHT [C/DM] avoids many of the difficulties of this process [profile analysis] by confirming or disconfirming hypotheses with further data collection, including further testing of psychological processes beyond a single cognitive test” (p. 487). Prescriptive statements such as these in education are rarely justified and require adherence to high standards of empirical evidence (Marley & Levin, 2011). It is not clear how such default statements provide users with the appropriate guidance for avoiding errors in clinical judgment given that strengths and weaknesses are endemic in the population. As stated by Flanagan et al. (2011), “Most individuals have statistically significant strengths and weaknesses in their cognitive ability and processing profiles . . . Therefore, statistically significant variation in cognitive and neuropsychological functioning in and of itself must not be used as de facto evidence of SLD” (p. 242).
While we agree with Flanagan and colleagues that rigid adherence to any one specific interpretive heuristic is likely to lead to method bias (Cook & Campbell, 1979), additional research is needed to determine how users are to effectively integrate multiple PSW data sources in a manner that facilitates improved diagnostic decision making.
Discussion
Similar to Dombrowski and Gischlar (2014), we argue that it is beneficial and important to evaluate proposed models of SLD identification in relationship to established ethical codes (American Psychological Association, 2004; National Association of School Psychologists, 2010), psychometric standards (e.g., AERA, APA, & NCME, 2014), and relevant educational laws (i.e., IDEA, 2004). Fiorello, Flanagan, and Hale (2014) claim that PSW (a) is the only empirically supported SLD identification model that comports with the statutory definition of SLD; (b) when used in concert with an RTI-based pre-referral intervention system, is more likely to result in correct identification of SLD; and (c) provides users with information relevant for developing individualized interventions. Although evidence for the efficacy of the PSW model is presently accumulating, we believe that wholesale endorsement of these claims at the present time is premature.
The purpose of this review was to raise a series of psychometric and conceptual questions that have yet to be addressed within the empirical literature regarding the PSW model. This information is vital to establishing evidence-based procedures for SLD treatment and assessment given the futility of previous attempts at validating aptitude–treatment interaction (ATI) (Fletcher et al., 2013). Most of the research cited by PSW proponents has investigated relationships between specific cognitive variables and achievement (e.g., McGrew & Wendling, 2010) or the degree to which SLD subtypes can be identified using these procedures (e.g., Carmichael, Fraccaro, Miller, & Maricle, 2014; Fiorello, Hale, & Snyder, 2006). As stated by Miciak, Fletcher, et al. (2014), “Evidence for the existence of distinct disability subtypes is not ipso facto evidence for the reliability, validity, or utility of PSW methods for LD identification” (p. 23). Nor does it obviate the fundamental measurement issues, common to all diagnostic models, which complicate SLD classification (Macmann et al., 1989). In sum, at this early stage in research on the PSW model, additional empirical evidence is needed to determine the degree to which PSW is a more robust and valid alternative to existing procedures, or simply a better mousetrap.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
