Abstract
Individuals trained in the use of cognitive tests should be able to complete an assessment without making administrative, scoring, or recording errors. However, an examination of 295 Wechsler protocols completed by graduate students and practicing school psychologists revealed that errors are the norm, not the exception. The most common errors included failure to administer sample items, incorrect calculation of raw scores, failure to record responses verbatim, and failure to query. Significant differences were found between specific error frequencies of students and practitioners. Adequate training in administering the Wechsler scales is clearly essential. Based on the outcome of this study, it is recommended that programs training students to administer cognitive assessments provide ample feedback, and that practicing psychologists maintain best practices and take part in continuing education regarding cognitive assessments.
With the widespread use of IQ testing, it is important that the administration and scoring of cognitive assessments be correct. Assessment errors may impact special education classification, educational decisions, social security benefits, and even death penalty determination. University programs bear the responsibility for adequately training examiners (Alfonso & Pratt, 1997). Research shows that graduate students studying assessment make mistakes throughout the learning process (e.g., Alfonso, Oakland, LaRocca, & Spanakos, 2000; Belter & Piotrowski, 2001), and that certified, trained professionals continue to make scoring errors while they practice (e.g., Slate & Jones, 1993; Slate, Jones, Murray, & Coulter, 1993).
This study evaluated administration and scoring errors made by students and practicing school psychologists to determine error type and frequency on the Wechsler Intelligence Scale for Children–Fourth Edition (WISC-IV; Wechsler, 2003a) and the Wechsler Adult Intelligence Scale–Fourth Edition (WAIS-IV; Wechsler, 2008a).
The Wechsler series is the most commonly taught cognitive assessment in graduate programs (Alfonso et al., 2000; Belter & Piotrowski, 2001; Cody & Prieto, 2000). This is not surprising given the scales’ historical reign as the most popular measure of intelligence, a pattern not expected to change soon (Kaufman, 2009). Several studies examining administration errors by graduate students on the Wechsler series found that failure to record response, incorrect point assignment, and failure to query were among the most frequent errors made (Alfonso, Johnson, Patinella, & Rader, 1998; Belk, LoBello, Ray, & Zachar, 2002; Conner & Woodall, 1983; Loe, Kadlubek, & Marks, 2007; Patterson, Slate, Jones, & Steger, 1995; Slate & Jones, 1990; Warren & Brown, 1972). These errors appear to be consistent across all Wechsler tests (e.g., WISC, WAIS) and different editions (Ramos, Alfonso, & Schermerhorn, 2009; Warren & Brown, 1972).
Slate, Jones, Coulter, and Covert (1992) and Slate and Jones (1993) examined Wechsler protocols completed by practicing school psychologists and found numerous errors on all protocols, the most common error being awarding too many points rather than too few when scoring responses. Those studies suggest that more practice did not necessarily result in fewer errors.
Brazelton, Jackson, Buckhalt, Shapiro, and Byrd (2003) and Erdodi, Richard, and Hopwood (2009) examined adherence to “manualized” scoring (precisely following scoring rules in the test manual). Brazelton et al. (2003) found that those who reported administering more than 100 Wechsler tests during their career made fewer errors than those who had administered 10 or fewer. They also reported a relationship between scoring errors and professional position, with those working in schools as school psychologists or psychometrists making fewer errors than those in other positions, such as licensed psychologists. Erdodi et al. (2009), however, found that practitioners with more clinical experience actually made more errors. They suggest that as practitioners gain more clinical experience, they often use memory to score a test rather than relying on the manual.
This study examines the number and type of errors by the experience level of testers and identifies the most common errors and the most problematic subtests. It is the first study to examine student and practitioner errors together using the same criteria, and it uses a much larger sample than similar studies.
Method
The WISC-IV (Wechsler, 2003a) and WAIS-IV (Wechsler, 2008a) were the most current editions available at the time of data collection. Despite changes from the WISC-IV to the WISC-V (Wechsler, 2014), the overall methods for establishing basals and ceilings, calculation of raw score, conversion to standard score, and scoring verbal items remain the same. Therefore, the information in this study continues to be relevant even as the Wechsler series continues to be revised.
Data Collection
Graduate student data
The graduate student data came from 255 of the study’s 295 protocols, which were completed in cognitive assessment courses taught at a university in the northeastern United States. One course, taught each fall, was for PhD students in an American Psychological Association (APA) accredited clinical psychology program. Another, taught every spring, was for master’s/certificate students in a National Association of School Psychologists (NASP) approved school psychology program. None of the students had prior experience with cognitive assessments. Students administered at least four Wechsler assessments during the semester. Both courses met once a week for 2.5 hr during a 15-week semester, had the same instructor, and had the same teaching assistant (TA). Overall, 3 years of data from both courses were analyzed.
The course instructor (third author), who is well published in the field of psychoeducational assessment, taught the courses in all semesters. The TA (first author) was also the same across all semesters and, during data collection, was a practicing school psychologist who had passed advanced graduate-level courses on cognitive assessment.
An error checklist (see Supplemental Material), developed by the instructor and TA, included all agreed-upon errors on the Wechsler scales. At the beginning of each academic year, the instructor and TA examined five random student protocols to evaluate agreement in error determination and change the checklist as necessary, but no changes to the checklist were deemed necessary during data collection. Changes to the checklist due to the revision of the WISC-IV to WISC-V have been made but were not included in this article because the measures analyzed are only the WISC-IV and WAIS-IV.
As part of the courses, each student was required to read the test manuals and conduct multiple practice administrations of the assessments. During each weekly class, the instructor highlighted specific administration procedures for the assessments. Students administered all subtests of each test and computed all scores by hand, using the procedures and tables in the manuals. Students were required to submit completed protocols for review. Each protocol was reviewed by the TA, marked for specific errors, and returned to the instructor. The instructor then checked the protocols for agreement of identified errors. Any discrepancy found was discussed and agreed upon before tabulating the errors using the checklist. Protocols were returned to the student, and the instructor reviewed errors during class.
Students were not allowed to administer another test until their protocols had been scored and returned to them with feedback. As their errors were corrected and feedback given, the expectation was that students would learn from their mistakes, learn how to administer the assessments, and understand the importance of carefully reading and following the test manuals so these skills could be generalized to other assessments. Lectures included general and specific testing procedures and addressed specific questions raised by students. All students were required to submit a videotaped administration, which was reviewed by the instructor, who provided feedback on any specific administration errors observed. The videotaped administration allowed identification and correction of errors not identifiable using a protocol alone (e.g., failure to scramble blocks on Block Design or positioning the protocol where the examinee could see it).
The individual errors identified by the checklist from each protocol were recorded onto a spreadsheet. Records of the errors were kept by the course instructor, and the de-identified database of error information was made available to the researcher. Participants could not be identified directly or linked in any way through identifiers. The students were instructed to administer the assessments to individuals who did not display apparent intellectual, physical, language, or sensory limitations; were not likely to be referred for evaluation; and did not require specialized testing accommodations.
Practitioner data
In total, 40 WISC-IV protocols, administered by practicing school psychologists, were gathered from a large urban public-school district in the northeastern United States. These cognitive assessments were typically done as part of eligibility determination for special education. All identifying information for examiner/examinee was removed prior to protocol examination and recording in a database. Descriptive information such as gender, ethnicity, years of experience, or number of assessments per year was not recorded in the database to protect practitioners’ anonymity. Both the instructor and the TA reviewed each practitioner protocol for errors and reached agreement on every identified error.
Definition of Error
Adherence to the test-manual instructions is essential for obtaining accurate scores on standardized assessments. Failure to adhere to the manuals’ rules and procedures can greatly affect raw scores, scaled scores, index scores, and the Full Scale Intelligence Quotient (FSIQ), which in turn limits how useful the data are for helping the individuals tested.
Errors found in cognitive assessments can be described by type and by source, as defined below. In addition, errors can be described as general errors (those that apply across various cognitive assessment batteries) or as test-specific (WAIS-IV/WISC-IV). Both general and test-specific errors were examined in this study.
Type of error
Administration errors are those related to the specific administration rules or procedures delineated by the individual test manual. For example, for establishing a basal on the WISC-IV Vocabulary subtest, the manual states that if the first two items administered do not receive perfect scores, the examiner must administer the preceding items in reverse order until two consecutive perfect scores are obtained. Examiners who neglect to administer the items in reverse order when clearly instructed to do so would be making an administration error.
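A reviewer’s check for this kind of administration error can be sketched in a few lines of Python. The function name, its arguments, and the example scores are hypothetical. The sketch encodes only the simplified rule as stated above and does not capture every basal rule in the manual.

```python
def missed_reversal(first_two_scores, perfect_score, reverse_items_given):
    """Flag a protocol where reversal was required but not carried out.

    first_two_scores: scores on the first two items administered
    perfect_score: maximum score for an item on this subtest (assumed)
    reverse_items_given: number of preceding items recorded in reverse order
    """
    # Reversal is needed when either of the first two items is not perfect.
    needs_reversal = any(score != perfect_score for score in first_two_scores)
    return needs_reversal and reverse_items_given == 0


# Hypothetical protocol entries, not data from the study.
print(missed_reversal([2, 1], perfect_score=2, reverse_items_given=0))  # True
print(missed_reversal([2, 2], perfect_score=2, reverse_items_given=0))  # False
```

The check only tells a reviewer that no reversal was recorded. It cannot tell whether the reversal that was recorded went far enough, which would require the item-by-item scores.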
Scoring errors are any other errors that result in incorrect totals for raw, standard, or index scores. Examples included assigning an incorrect score to a response, incorrectly tabulating a raw score, incorrect conversion of raw to standard score, and using incorrect scores (e.g., using Sum of Scaled scores instead of Index score) for analysis.
Recording errors are errors resulting from a failure to record specific information or to record verbatim responses. For example, failure to record completion time for timed items is a recording error that can result in incorrect raw scores. Failure to record complete responses was counted as a recording error. The WISC-IV Administration and Scoring Manual (Wechsler, 2003b) recommends and the WAIS-IV Administration and Scoring Manual (Wechsler, 2008b) requires verbatim recording of verbal responses; and accurate evaluation of responses and calculation of scores require verbatim transcriptions. It is impossible for the examiner or the reviewer to confirm the accuracy of item scores without a verbatim transcript of the response.
Even with verbatim transcriptions of responses, it is impossible to identify all possible examiner errors. For example, there are specific rules for querying certain ambiguous and incomplete responses, for repeating items on most subtests, and for providing certain prompts. The examiner is required to use the notations “(Q),” “(R),” and “(P)” whenever providing permitted or required queries, repetitions, and prompts, and is forbidden to give additional help or to indicate whether a specific response was correct. However, a reviewer cannot tell from the protocol whether the examiner offered unauthorized assistance, because examiners do not record such violations of standardization rules. Similarly, there is no way to be sure that responses and completion times were recorded accurately or that the examiner read instructions to the examinee verbatim as required.
Source of errors
Manualized errors are errors caused by deviations from the specific instructions in the test’s manual. All manualized errors can be traced to a specific page within the manual. For example, the WAIS-IV Administration and Scoring Manual (Wechsler, 2008b) says to administer sample items on 12 subtests. Failure to administer the sample items is direct noncompliance with the manual and is considered a manualized error.
Arithmetic errors are errors in basic math. For example, incorrectly summing item raw scores is an arithmetic error, as is failure to calculate examinee test age correctly.
Best practices errors result from a failure to adhere to administration or recording procedures that will make clinical judgment and scoring choices clear to other individuals examining the protocol, for example, failing to record examinee responses verbatim even when not required. Because of the difference between the WISC-IV and WAIS-IV manuals, we counted failure to record responses verbatim as best practices rather than manualized errors. Standardized tests are created so that children can be administered the same test in the same fashion no matter who tests them and precisely as examinees were tested to create the test norms. One way to provide evidence for a test administration’s validity and for examiners to check their own scoring is to record verbatim responses that can show that the examiner scored the response correctly.
Online Appendix A lists general errors coded by type and source. Online Appendix B (see Supplemental Material) lists all errors by subtest for the Wechsler tests coded by type and source.
For the purpose of this study, only initial errors were counted; calculation errors that resulted directly from initial errors were not counted. For example, if the examiner incorrectly recorded a raw score, this incorrect raw score could result in a subsequent scaled or standard score error; however, only the incorrect raw score was counted as an error. All subtests were examined for the following analyses.
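The counting rule above can be illustrated with a small Python sketch. The record structure, field names, and error codes are hypothetical and are not taken from the study’s checklist or spreadsheet. The sketch assumes that each logged error notes whether it followed directly from an earlier error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorRecord:
    code: str                        # hypothetical error label
    caused_by: Optional[str] = None  # code of the initial error, if any


def count_initial_errors(records):
    """Count only errors that did not result directly from an earlier error."""
    return sum(1 for record in records if record.caused_by is None)


# Hypothetical protocol: one raw-score error and the scaled-score error it causes.
protocol_errors = [
    ErrorRecord("incorrect_raw_score"),
    ErrorRecord("incorrect_scaled_score", caused_by="incorrect_raw_score"),
]
print(count_initial_errors(protocol_errors))  # 1
```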
Results
A total of 295 WISC-IV and WAIS-IV protocols from master’s students (n = 141), PhD students (n = 114), and practitioners (n = 40) were examined. The mean numbers of errors per protocol for the master’s students, PhD students, and practitioners were 4.56, 5.57, and 4.55, respectively.
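As a quick arithmetic check on these figures, the group sizes can be summed and a pooled mean computed. The pooled mean is not reported in the study. It is derived here from the rounded group means, so it is only approximate, and the variable names are illustrative.

```python
# Group sizes and mean errors per protocol as reported above.
groups = {
    "master's students": (141, 4.56),
    "PhD students": (114, 5.57),
    "practitioners": (40, 4.55),
}

total_protocols = sum(n for n, _ in groups.values())
student_protocols = groups["master's students"][0] + groups["PhD students"][0]

# Weighted (pooled) mean; approximate because the group means are rounded.
pooled_mean = sum(n * mean for n, mean in groups.values()) / total_protocols

print(total_protocols)        # 295
print(student_protocols)      # 255
print(round(pooled_mean, 2))  # 4.95
```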
General errors were first examined by type and source. When errors were grouped by either type or source, a clear pattern emerged. Table 1 shows the number and percentage of errors committed by new and experienced testers, broken down by type and source. Regardless of the examiners’ experience level, more than half of the errors (55%–59%) were administrative in nature. For the other types of errors (scoring and recording), a difference emerged between the new and experienced testers. Both master’s and PhD students made more scoring errors (28% and 26%, respectively) than recording errors (16%). Practitioners showed the opposite pattern, with more recording errors (24%) than scoring errors (17%).
Table 1. Frequency (Number and Percentage) of General Errors by Type and Source From Wechsler Protocols Completed by New or Experienced Testers.
Note. Percentages are rounded to the nearest whole number. NE = number of errors; n = number of protocols available for error review.
When the source of the general errors was examined, manualized errors were most prominent, between 72% and 78% of source errors regardless of level of training and experience. The remaining sources of errors (arithmetic and best practices) were fairly evenly split for the master’s and PhD student examiners (16% vs. 12% and 13% vs. 12%, respectively). However, practitioners made far more best-practices errors (18%) than arithmetic errors (4%).
All protocols were examined for the specific errors made, regardless of the type and source of errors. In total, 24 specific errors were defined (see Online Appendix A), and frequencies of those errors for examiners were tabulated. Table 2 shows the percentage of protocols in each of the three groups that contained each of the top 10 general errors.
Table 2. Frequency (%) and Rank Order (RO) of Specific General Errors Made by New or Experienced Testers on Wechsler Subtests.
Master’s and PhD students had similar patterns of error commission. In both groups, the top four errors were the same. Failure to administer sample/practice/teaching items (40% and 44%, respectively), incorrect calculation of raw score (39%, 38%), failure to record responses verbatim (37%, 42%), and failure to query the examinee when instructed by the manual (37%, 44%) were all problematic areas for individuals learning to administer cognitive assessments.
For the practitioners, a different pattern emerged. Incorrect calculation of raw score, while the second and fourth most common error for the master’s and PhD groups, respectively, was the ninth most common error for the practitioners. Instead, failure to record responses verbatim (58%, Rank Order 1), failure to query when instructed by the manual (55%, Rank Order 2), failure to administer sample/practice/teaching items (43%, Rank Order 3), inappropriate start points (40%, Rank Order 4), and failure to query items with specific query criteria (33%, Rank Order 5) were the most common errors for the practitioners. This pattern of practitioners making fewer arithmetic errors and more manualized errors is consistent with prior studies (Belk et al., 2002; Loe et al., 2007; Slate & Jones, 1990).
This study counted the absence of recorded sample items as an error. Without in-vivo observation of the administration, however, it cannot be determined whether the items were administered but not recorded or were not administered at all. Either way, the issue warrants identification and investigation. Sample and demonstration items are specifically part of the manualized administration, and failure to administer them could directly impact the test-taker’s performance on subsequent items.
Table 3 presents the percentage of occurrence and cumulative percentages of specific errors made by new or experienced testers on Wechsler subtests. Regardless of the experience of the examiner, errors are the norm. More than half of the protocols, regardless of examiners’ education levels, contained between 1 and 4 errors (66% of master’s students, 68% of PhD students, and 58% of practitioners). Overall, protocols completed by master’s students had between 0 and 13 errors (Median 2), protocols completed by PhD students had between 0 and 22 errors (Median 2), and protocols completed by practicing psychologists had between 0 and 14 errors (Median 3). While approximately 10% of the protocols of the master’s and PhD students were error-free, only one (3%) of the protocols completed by practitioners was totally error-free.
Table 3. Frequency (%) and Cumulative Percentage of Specific Errors on Wechsler Subtests by Experience Level.
Note. All percentages were rounded to the nearest whole number, so percentages do not add up precisely to 100; n = the number of Wechsler (WISC-IV and WAIS-IV) protocols examined. New testers were the master’s and PhD students. WISC-IV = Wechsler Intelligence Scale for Children–Fourth Edition; WAIS-IV = Wechsler Adult Intelligence Scale–Fourth Edition.
To further examine where errors were made on the Wechsler tests, errors on each individual subtest were tabulated. Table 4 presents the percentage of protocols with at least one error made by master’s-level students, PhD-level students, and practitioners on specific core Wechsler subtests. Errors occurred on all 10 core subtests, although the percentage of protocols with errors varied widely across subtests. Picture Concepts (7%) and Coding (2%) had the lowest percentages of protocols with errors, whereas Block Design had the highest (56%), and Similarities, Vocabulary, and Comprehension each had 43% or 44%. Of the 10 core Wechsler subtests examined, three (Block Design, Picture Concepts, and Matrix Reasoning) showed significant differences in the number of protocols with errors based on the examiner’s experience level (χ² = 7.186, p = .03; χ² = 42.268, p < .001; χ² = 18.371, p < .001, respectively). For these three subtests, the practitioners made errors on significantly more protocols than did the less experienced examiners. On Block Design, approximately 50% of the protocols completed by students had at least one error, whereas 73% of the practitioner protocols had at least one error. On Picture Concepts, only 3% and 4% of the protocols completed by master’s and PhD students had at least one error, compared with 33% of the practitioner protocols. On Matrix Reasoning, 17% and 25% of the protocols completed by master’s and PhD students had at least one error, whereas 50% of the practitioner protocols had at least one error. One additional subtest, Digit Span, showed a significant difference among the groups (χ² = 8.290, p = .016), with a larger percentage of PhD student protocols containing errors (30%) than master’s student protocols (15%) or practitioner protocols (23%).
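The chi-square statistics above can be sanity-checked with a short, standard-library Python sketch. Two assumptions are not stated in the text: that each test compares three groups on error versus no error (2 degrees of freedom), and that the Picture Concepts counts can be back-calculated from the rounded percentages and group sizes (about 3% of 141, 4% of 114, and 33% of 40). The function names are illustrative.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic for a rows x columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def p_value_df2(statistic):
    """Upper-tail p value for chi-square with 2 degrees of freedom.

    For df = 2 the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-statistic / 2)


# Picture Concepts: [protocols with an error, protocols without], per group.
# ASSUMPTION: counts are back-calculated from rounded percentages.
picture_concepts = [
    [4, 137],   # master's students, about 3% of 141
    [5, 109],   # PhD students, about 4% of 114
    [13, 27],   # practitioners, about 33% of 40
]

chi2 = chi_square_independence(picture_concepts)
print(round(chi2, 2))  # 42.27

# p values implied by the reported statistics, assuming df = 2.
for reported in (7.186, 42.268, 18.371, 8.290):
    print(reported, p_value_df2(reported))
```

Under these assumptions, the recomputed Picture Concepts statistic is close to the reported 42.268, and the statistics 7.186 and 8.290 correspond to p values near .03 and .016. The values for 42.268 and 18.371 both fall below .001.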
Table 4. Percentage of Protocols With at Least One Error on Specific Wechsler Subtests by Experience Level.
Note. All percentages are rounded to the nearest whole number.
Discussion
Results of this investigation demonstrate that errors on Wechsler cognitive assessments are common, regardless of the examiner’s educational level, the test battery used, or the subtest examined. It is clear that errors persist, and in some cases increase, regardless of the training received or the experience gained after finishing graduate-level educational programs. It may be a bit more understandable for inexperienced graduate students to make errors because they are learning the tests and not using them in some official capacity. It is, however, very distressing to see that practitioners in the field, presumably well trained and doing assessments specifically to make educational decisions, make so many errors. In this study, errors were classified by type and source. A simpler view, however, would distinguish real errors from best-practice errors. For example, an incorrect calculation of a raw score is a real error, because the correct procedure is clearly spelled out in the test manual. In contrast, failure to record responses verbatim is a best-practice error because failure to record responses does not necessarily have a significant impact on the final scores, although best practice would suggest that it should be done. The WAIS-IV (Wechsler, 2008b) and WISC-V (Wechsler, 2014) both unequivocally state that it must be done, while the WISC-IV (Wechsler, 2003b) is less clear. Professionals should strive for best practice to ensure defensible results. Although not as clearly spelled out in the test manuals, best-practice errors can greatly reduce the validity of test results. As noted earlier, the practical implications of this type of error could include assigning an incorrect point value to a difficult-to-score response that was not recorded verbatim or having difficulty defending one’s results in a legal situation. Failure to record responses verbatim makes it impossible for examiners to double-check their scoring or solicit peer supervision later on.
Given the nature of this study, it is impossible to determine the actual numerical effect of such errors. For example, if an examiner made an error in basal or ceiling, there is no way to calculate whether items not administered would have been passed or failed, thus changing the raw-score-to-scaled-score conversion. Therefore, it is impossible to determine what the examinee’s raw, scaled, and Index scores “should have” been, had the assessment been error-free.
Analyzing the raw, scaled, or Index score change as impacted by error is an area for further study. However, simply analyzing the “correct” scoring of an inherently flawed administration gives limited additional information as to how much these errors distort the score from that of a perfectly administered assessment. The calculation of any potentially “correct” result is impossible.
Practitioner Error
The results obtained from this study confirm that errors made on standardized cognitive assessments by graduate students and practitioners are the norm rather than the exception. Consistent with prior research, failure to record responses verbatim was the most common error for practitioners (Alfonso et al., 1998; Belk et al., 2002; Loe et al., 2007). Surprisingly, the results also revealed that on several subtests (Block Design, Picture Concepts, and Matrix Reasoning) the experienced testers made errors on more protocols than the students did. This finding has specific implications for the field of standardized assessment. Although it may be difficult to reach and persuade them, practitioners should be cautioned not to rely only on their memory for the scoring of verbal responses, and they should be admonished to double-check their administration procedures, tabulation of raw scores, and adherence to standardized procedures and best practices, to maintain accurate scoring. Practitioners administering the WISC-IV not only failed to record responses verbatim, but frequently used incorrect start points, failed to administer practice items, and failed to query the examinee when required, all errors that could alter examinees’ scores. The practitioners did not make as many arithmetic errors as graduate students, but made errors that tended to minimize testing time. For example, practitioners often disregarded the correct start points, instead beginning the test with more difficult items. Practitioners may have inferred that the examinee knew the initial items and may not have wanted to take the time for these “obvious” answers. Although this reasoning may seem logical, it goes against the administration directions in the manual, clearly violates standardization procedures, and deprives the examinee of possibly helpful practice. Another common practitioner error was to give point values to responses without querying where the manual makes clear a query is needed. This is possibly due to examiners relying on their memory to award points rather than actively looking at a manual during administration, or possibly an attempt to save time. Overall, this pattern points to a failure to administer items with accuracy and best practices, and is consistent with Erdodi et al. (2009), who also suggested that practitioners may utilize their memory rather than using the manual to score responses. As they gain experience, some practitioners may also come to believe that their nonstandardized administration practices are superior to test-manual requirements. Practitioners have completed required courses in assessment administration and ostensibly know how to correctly administer assessments; therefore, we presume that errors made are by choice or carelessness and not unfamiliarity with materials. Regardless of the cause of these errors, continued training after certification may assist in minimizing these bad habits in favor of error-free administration. It would be helpful if state licensing boards and departments of education as well as national and state psychological organizations required continuing education in test administration and scoring.
Student Error
The most common errors found for students learning to administer the WISC-IV and WAIS-IV were as follows: failure to administer sample items, incorrect calculation of raw score, failure to query the examinee, and incorrect materials/instruction. Most of these errors appear to be due to failure to read and learn the manual carefully. Individuals administering cognitive assessments must rely on the manual as well as the test record, as both provide clearly marked sample-item information for every subtest. The manual also clearly delineates the appropriate materials to use in testing (such as a red pencil for the WISC-IV Cancellation subtest) as well as querying rules.
Although incorrect raw score calculation was often an arithmetic error, there also seemed to be confusion about whether items that were not administered should be included in the overall raw score. Individuals learning to administer cognitive assessments should consult the test manual for the correct method of summing raw scores. Rectifying these errors can go a long way toward reducing manualized error.
Unlike the practitioners, students are in the process of completing their first assessment course and thus we presume that errors are due to unfamiliarity with materials and not to carelessness or a reliance on memory.
Wechsler Intelligence Scales
Some specific aspects of the WISC-IV and WAIS-IV may contribute to examiner error. The Wechsler series contains subtests that are often very different from each other, frequently with differing basal and ceiling rules. This diversity may cause errors if the tester erroneously generalizes rules from one subtest to another. Lengthy basals and ceilings may make administration of a subtest seem long and frustrating for both the experienced examiner and the test taker, leading practitioners to attempt to shorten administration time. The WISC-V has made basal and ceiling rules shorter and more consistent than on the WISC-IV.
A positive aspect of the current Wechsler series is the designated areas to record examinee responses (larger than in previous editions). This space allows examiners to more easily complete a verbatim recording.
Interestingly, the newest iteration (the WISC-V) claims interrater reliability of the Verbal Comprehension subtests to be .97 to .99. This claim warrants further study, as this finding is not within the overall trend found by previous research (Alfonso et al., 1998; Brazelton et al., 2003; Erdodi et al., 2009; Slate & Jones, 1993; Slate et al., 1992), and the error pattern found on the Wechsler tests examined in this study suggests the Verbal Comprehension subtests are highly prone to variability in administration and scoring. One wonders if the test publishers have special methods for training raters that could be used by university training programs.
Recommendations for Assessment Courses
Trainers should place even greater emphasis on the need to maintain standardized assessment consistent with instructions in test manuals and best practices. Students need to be warned about the danger of gradual examiner drift away from standardized administration and scoring and must be persuaded that normative scores from standardized tests are meaningless if standardized procedures are not followed. Trainers should use specific examples to discuss the serious consequences of even small errors in test scores and consequent misclassifications of examinees. The most dramatic examples would be death penalty cases in which IQ scores help decide between life imprisonment and execution (e.g., Atkins v. Virginia, 2002; Hall v. Florida, 2014; Hensl, 2011). Questions about these issues should be included in final examinations.
Recommendations for Practitioners
The results of this study revealed that only 3% of practitioner-completed protocols and only 10% of student-completed protocols were error-free. Given the significant impact that incorrectly completed protocols can have on an individual’s overall profile, this is highly problematic. Practitioners are urged to participate in continuing education opportunities in assessment, to continuously strive for error-free completion of protocols, and not to rely on memory to score responses. In addition, practitioners should always check whether a response warrants querying, and follow the standardization procedures outlined in test manuals, even when doing so lengthens administration time. Examiners can be taught to draw a neat line through the “(Q)” and the subsequent response, and to ignore that response, if they later discover the query was not called for. School psychologists may also benefit from occasionally consulting with colleagues to double-check the scoring of protocols, perhaps through a redacted-protocol-exchange system. Not only would this practice help practitioners become aware of their own errors, but the responsibility of checking another psychologist’s protocol for accuracy would also require significant review of the test manual. Above all, practitioners need to understand and accept the importance of accurate, standardized test administration (Lee, Reynolds, & Willson, 2003).
Recommendations for Future Research
This study did not examine the impact of examiner error on derived standard scores. Some errors, such as omitting sample items, do not directly affect item scores but might depress performance on an entire subtest. Also, errors that occurred as a direct result of a previous error were deliberately not counted. Future research in this area is recommended, particularly on how errors can have a compounding effect. As assessments are updated to current versions, new manuals are published and new test procedures are put in place. Continued research on cognitive assessment errors is therefore recommended, as errors are clearly the norm rather than the exception, as demonstrated by this and all previous studies. Certain descriptive variables, such as gender and ethnicity, were not collected. It is possible that the differences between graduate students and practitioners were affected by these potentially confounding variables. Other descriptive variables regarding the practitioners, such as years in practice and average number of assessments conducted per year, were also not collected. These variables might shed more light on the reasons practitioners in this study appear more susceptible to certain types of errors.
Supplemental Material
Supplemental material, Appendices_for_JPA-18-0038.R1 for Wechsler Administration and Scoring Errors Made by Graduate Students and School Psychologists by Erika Oak, Kathleen D. Viezel, Ron Dumont and John Willis in Journal of Psychoeducational Assessment
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material is available for this article online.
References
