Test Review: Kaufman, A. S., & Kaufman, N. L. (2014). Kaufman Test of Educational Achievement, Third Edition

Test Description

General Description

The Kaufman Test of Educational Achievement, Third Edition (KTEA-3) is a revised and updated comprehensive academic achievement test (Kaufman & Kaufman, 2014). Authored by Drs. Alan and Nadeen Kaufman and published by Pearson, the KTEA-3 remains an individual achievement test normed for individuals of ages 4 through 25 years, or for those in grades prekindergarten (PK) through 12 and above. Based on a clinical model of academic skills assessment in the broad areas of reading, mathematics, and written and oral language, the KTEA-3 follows the Cattell–Horn–Carroll (CHC) or Information Processing theoretical assessment approach. Detailed information regarding the composites’ structure and rationale for changes to subtest inclusion and/or exclusion is provided. Updates assess learning disabilities according to the Individuals With Disabilities Education Improvement Act (IDEIA; 2004) or the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; American Psychiatric Association [APA], 2013) criteria. Norm-referenced for diagnostic and classification purposes, the KTEA-3 offers criterion-referenced pattern analyses of errors or individual strengths and weaknesses to facilitate intervention planning.

Test administrators are expected to hold a graduate degree or a bachelor’s degree with specific training in standardized test administration. The KTEA-3 includes updated norms, items, instructions, and graphics, along with expanded subtest floors and ceilings, and its 19 subtests are selected based on the examinee’s educational level. Formatted similarly to previous editions, the core battery includes two reading, two math, and two written language subtests, and four new subtests provide in-depth assessment of reading-related skills and academic fluency.

Depending on the examinee’s educational level, administration time ranges between 24 and 85 min. Specific prompts, sample items, and teaching items individualize administration and ensure that low scores are not due to the subject’s failure to understand instructions. The KTEA-3 continues to use item blocks, based on educational level, for the Reading Comprehension, Listening Comprehension, Written Expression, and Oral Expression subtests, with instructions for establishing each basal. A recommended administration order is given for only two pairs of subtests (i.e., Letter and Word Recognition before Word Recognition Fluency, and Nonsense Word Decoding before Decoding Fluency), so that the examiner can verify that the examinee has the requisite skills to complete the latter subtest in each pair. See Table 1 for a summary of KTEA-3 subtests and composite structure.

Table 1.

Summary of KTEA-3 Structure.

Composite	Grade(s)	Contributing composite/subtest scores
Academic Skills Battery	PK-12+	Reading, math, and written language
Reading	PK-K	LWR
	1-12+	LWR + RC
Math	PK	MCA
	K-12+	MCA + MC
Written language	PK	WE
	K-12+	WE + SP
Reading-related composites
Sound-symbol	1-12+	PP + NWD
Decoding	1-12+	LWR + NWD
Reading fluency	1-12+	WRF + DF + SRF
Oral composites
Oral language	PK-12+	OE + LC + AF
Oral fluency	PK-12+	AF + ONF
Cross-domain composites
Comprehension	PK-12+	RC + LC
Expression	PK-12+	WE + OE
Orthographic processing	1-12+	WRF + SP + LNF
Academic fluency	3-12+	DF + MF + WF

Note. KTEA-3 = Kaufman Test of Educational Achievement, Third Edition; PK = prekindergarten; K = kindergarten; LWR = Letter and Word Recognition; RC = Reading Comprehension; MCA = Math Concepts and Applications; MC = Math Computation; WE = Written Expression; SP = Spelling; PP = Phonological Processing; NWD = Nonsense Word Decoding; WRF = Word Recognition Fluency; DF = Decoding Fluency; SRF = Silent Reading Fluency; OE = Oral Expression; LC = Listening Comprehension; AF = Associational Fluency; ONF = Object Naming Facility; LNF = Letter Naming Facility; MF = Math Fluency; WF = Writing Fluency.
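The grade-dependent composite structure in Table 1 can be read as a simple lookup. The Python sketch below is illustrative only: the names `GRADE_ORDER`, `COMPOSITES`, and `contributing_scores` are hypothetical and not part of any publisher software, only the three core composites are encoded, and "12+" is treated as the top grade band.

```python
# Hypothetical encoding of part of Table 1 (core composites only).
GRADE_ORDER = ["PK", "K"] + [str(g) for g in range(1, 13)] + ["12+"]

# Each composite maps to a list of (first grade, last grade, subtests) bands.
COMPOSITES = {
    "Reading": [("PK", "K", ["LWR"]), ("1", "12+", ["LWR", "RC"])],
    "Math": [("PK", "PK", ["MCA"]), ("K", "12+", ["MCA", "MC"])],
    "Written language": [("PK", "PK", ["WE"]), ("K", "12+", ["WE", "SP"])],
}


def contributing_scores(composite, grade):
    """Return the subtests that contribute to a composite at a given grade."""
    position = GRADE_ORDER.index(grade)
    for first, last, subtests in COMPOSITES[composite]:
        if GRADE_ORDER.index(first) <= position <= GRADE_ORDER.index(last):
            return subtests
    raise ValueError(f"{composite} is not available at grade {grade}")
```

For instance, `contributing_scores("Reading", "K")` returns only LWR, whereas the same call for grade 1 and above adds RC, mirroring the two Reading rows of the table.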

Specific Subtest Descriptions

Reading

Letter and Word Recognition

The examinee identifies and pronounces letters and words.

Nonsense Word Decoding

For measuring sound-symbol awareness, nonsense words are read aloud.

Reading Comprehension

Early items involve matching a symbol or word(s) to a corresponding picture, reading a one-step instruction, and performing the action. Later items require the examinee to read a passage and answer comprehension questions.

Reading Vocabulary

Early items require the examinee to select one word from an array of three, synonymous to a given target word. On remaining items, the examinee reads a sentence and provides a synonym for target words.

Word Recognition Fluency

Word Recognition Fluency measures basic word reading fluency across two 15-s trials.

Decoding Fluency

In two 15-s trials, the examinee reads a list of nonsense words as quickly as possible.

Silent Reading Fluency

The examinee reads as many simple sentences as possible and indicates whether or not they are true or false within 2 min.

Mathematics

Math Concepts and Applications

The examinee applies math concepts, principles, and procedures to real-life situations.

Math Computation

The examinee writes solutions to math calculation problems.

Math Fluency

In an array of mixed operation problems, the examinee answers as many as possible in 60 s.

Writing

Written Expression

Level 1 items require tracing, copying, and writing letters, words, and a sentence from dictation. Levels 2 and above require writing dictated sentences, completing sentence stems, combining sentences, editing capitalization and punctuation, and writing an essay.

Spelling

Early items measure letter-sound awareness; later items require the examinee to spell dictated words.

Writing Fluency

The examinee writes a sentence about a target picture, completing as many items as possible in 5 min.

Oral Language

Listening Comprehension

After listening to a sentence or recorded passage, the examinee answers comprehension questions asked by the examiner.

Oral Expression

The examinee orally provides a complete sentence describing photographs, using target word(s) or specified phrases.

Associational Fluency

Given a semantic category, the examinee provides as many words as possible in 60 s.

Language Processing

Phonological Processing

The examinee manipulates sounds within words.

Object Naming Facility

Using a full-page card, the examinee names as many pictured objects as quickly as possible.

Letter Naming Facility

Using a card with a mix of upper and lowercase letters, the examinee names the letters as quickly as possible.

Test Manuals and Scoring System

The Administration and Scoring Manuals provide detailed instructions for the use and interpretation of subtests and composites. The 690-page Technical and Interpretive Manual may be accessed using Adobe Acrobat or another portable document format (PDF) reader. The manual is organized using bookmarks and appendices, so users can click the appropriate section to jump directly to the desired page or table.

The KTEA-3 offers two scoring options: Q-interactive, Pearson’s web-based platform, or hand scoring with the Technical and Interpretive Manual available on the included flash drive. The discrepancy analysis of strengths and weaknesses for learning disability evaluations, however, is not available for hand scoring. Q-interactive scoring options require an annual site license fee in addition to a flat rate charge per user and per test. Also, the Written Expression, Reading Comprehension, Listening Comprehension, Word Recognition Fluency, and Oral Expression subtests require the user to use a “weighted raw score” (available in the manual if not using Q-global) to derive the subtests’ standard scores. Hand scoring forms must be printed out for subtest, composite, and intraindividual comparisons.

The KTEA-3 has a three-tiered scoring system, with a global Academic Skills Battery composite, broad academic area composites, and specific subtest or skill scores. Age-based norms are divided into narrower intervals for younger ages and wider intervals after age 14 to allow greater differentiation of individual performance. Grade-based normative information was obtained for fall and spring administrations, with interpolated performance for winter norms, allowing for more precise measurement. Each standard score has a mean of 100 and a standard deviation of 15. Qualitative descriptors for examinee performance are based on a 10-point or a 15-point classification system, allowing the user to match descriptors across cognitive and other achievement measures. In addition, age and grade equivalents, percentile ranks, normal curve equivalents, growth scale values, and stanines are available.
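To illustrate how several of these derived scores relate to the standard score metric (mean 100, standard deviation 15), the following Python sketch applies the usual normal-curve conversions. It is an approximation under the assumption of normally distributed scores, and the function name `describe_standard_score` is hypothetical; actual KTEA-3 derived scores come from the publisher's norm tables, which may differ from these theoretical values.

```python
import math
from statistics import NormalDist

MEAN, SD = 100, 15  # standard score metric stated in the text


def describe_standard_score(score):
    """Approximate z score, percentile rank, NCE, and stanine for a standard score."""
    z = (score - MEAN) / SD
    percentile = NormalDist().cdf(z) * 100  # area below z under the normal curve
    nce = 50 + 21.06 * z  # normal curve equivalent: mean 50, SD 21.06
    # Stanines have mean 5 and SD 2 and are limited to the range 1-9.
    stanine = min(9, max(1, math.floor(2 * z + 5.5)))
    return {"z": z, "percentile": percentile, "nce": nce, "stanine": stanine}
```

Under these assumptions, a standard score of 115 sits one standard deviation above the mean, which corresponds to roughly the 84th percentile and a stanine of 7.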

Test Materials and Stimuli

The KTEA-3’s materials are similar to those of its predecessor. The Administration Manual, Scoring Manual, and Technical and Interpretive Manual are all available for separate purchase as printed, bound books; however, the manuals, audio files, scoring keys, hand scoring forms, letter checklist, qualitative observations form, and error analysis forms are all on the included flash drive. Audio files may be transferred to and/or played from a computer, laptop, or smartphone, or they may be copied to a CD. Administration requires two test stimulus easels, the examiner’s record form, a subject response booklet (for Math Fluency, Silent Reading Fluency, Writing Fluency, Spelling, Math Computation, and Written Expression Level 1), and response booklets for Written Expression Levels 2, 3, or 4.

Test Construction and Item Analysis

Start points, reversal rules, and discontinue rules were established using Rasch calibration parameters and the results of the national standardization process (Kaufman & Kaufman, 2014). The KTEA-3 offers two parallel forms, Forms A and B, and scores on both forms were calibrated using equating strategies, including common person, equipercentile, combining grade-appropriate item sets, and special linking studies. For the Reading Comprehension, Listening Comprehension, Written Expression, Oral Expression, and Word Recognition Fluency subtests, vertical scaling using item response theory (IRT) concurrent calibration was conducted individually for each form to determine start and stop points. Vertical scaling involves linking the total raw scores from different item sets (within subtests). Next, parallel form equating was completed using the equipercentile method and the entire grade-norm sample. As a result, item order is not strictly based on difficulty. Instead, these subtests contain item sets with decision points to continue or discontinue testing.

Item analyses were completed using IRT, specifically a Rasch analysis in conjunction with the software program WINSTEPS (Linacre, 2005). Items were dropped from several subtests due to gender and ethnicity biases (e.g., words and terminology) based on expert reviewers’ recommendations. Other methods for detecting and reducing bias included removing items that demonstrated poor fit and/or correlations with the other subtest items as explained by IRT.

Standardization Sample

Standardization data were collected over approximately one calendar year (August 2012-July 2013). Two sample data sets were derived, age norms (n = 2,050) and grade norms (n = 2,600), with data for grade norms collected in the fall and spring to obtain expected skill levels at those times of the year. Age and grade norms were calibrated using data from the students at a specific grade level who were within the expected age range for each grade. This procedure yielded approximately 1,300 students in the fall (grade-based) normative sample and 1,300 students in the spring normative sample. Likewise, there were 1,025 females and 1,025 males in the age-norm samples. The KTEA-3 normative sample was stratified and matched the population in the United States, based on the U.S. Census Bureau’s American Community Survey 2012 1-year period estimates (Ruggles et al., 2010; although citation is 2010, reported census data are from 2012). Parent and subject education levels, ethnicity, and regional origins closely matched the U.S. population estimates. Six special groups are included in the representative sample: individuals with specific learning disorders in reading, written expression, and mathematics; language disorder; mild intellectual disability; attention-deficit/hyperactivity disorder (ADHD); and academically gifted.

Adequacy of Reliability Estimates

Internal consistency reliability for both forms was estimated using the split-half method with the Spearman–Brown formula. Tests containing interdependent items, however, such as the Reading Comprehension and Written Expression item sets, were analyzed as one unit within the same half test (Kaufman & Kaufman, 2014). Split-half reliability coefficients were adequate overall (.54 ≤ r ≤ .99), and mean coefficients for composite scores ranged from the .70s to .90s (.72 ≤ r ≤ .98). For the Academic Skills Battery (ASB) composite score, reliability is very high (r ≥ .97). Across three grade-level groupings (i.e., PK-2, 3-6, 7-12), 306 examinees were administered parallel forms between 1 and 14 days apart (mean test interval 7.5 days, SD = 5.9). For the composites, alternate form reliability ranged from .76 to .96, with the most variability for the reading understanding item set across grade groups (.76 ≤ r ≤ .91). For the Oral Expression and Written Expression subtests, the interrater agreement for both forms fell between 90% and 95%. For Reading Comprehension, Listening Comprehension, Writing Fluency, and Associational Fluency, interrater agreement in scores was greater than 98%.
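As a minimal sketch of how a split-half estimate is stepped up with the Spearman–Brown formula, the Python code below correlates two half-test totals and projects the result to full test length. The odd-even split and the function names (`pearson_r`, `spearman_brown`, `split_half_reliability`) are assumptions for illustration; the review does not state how the publisher formed the halves beyond keeping interdependent items together.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(r_half, length_factor=2):
    """Project a part-test correlation to a test length_factor times as long."""
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


def split_half_reliability(item_scores):
    """Estimate reliability from per-examinee item scores (assumed odd-even split)."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))
```

With the default doubling factor, a half-test correlation of .80 corresponds to an estimated full-length reliability of about .89, which is why corrected coefficients exceed the raw half-test correlations.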

Validity Estimates

Validity data for the KTEA-3 reflect multiple statistics, including effect sizes, t values, and p values, with a reported significance level of α = .05. Intercorrelations between the subtests and corresponding composite standard scores range from the .70s to .80s for reading, math, written language, sound-symbol, decoding, and reading fluency. Intercorrelations for the oral language and oral fluency composites are lower, generally in the .40s and .50s. Confirmatory factor analysis (CFA) revealed correlations between the KTEA-3’s four factors (i.e., mathematics, reading, written language, oral language) and nine subtests (.50 ≤ r ≤ .92). High correlations within the oral language, reading, and written language factors suggest a degree of dependence within these achievement domains, specifically for language development. As a fifth factor, reading fluency was similarly intercorrelated (.51 ≤ r ≤ .93).

Concurrent validity for the KTEA-3 was evidenced through administration of additional achievement batteries, including the Kaufman Test of Educational Achievement, Second Edition (KTEA-II; Kaufman & Kaufman, 2004), Wechsler Individual Achievement Test, Third Edition (WIAT-III; Pearson, 2009), Woodcock–Johnson Tests of Achievement, Third Edition (WJ-III Ach; Woodcock, McGrew, & Mather, 2001), and the Clinical Evaluation of Language Fundamentals, Fourth Edition (CELF-IV; Semel, Wiig, & Secord, 2003). All corresponding subtests and composites were correlated, yielding moderate to high correlations between subtests across batteries. Scores on the KTEA-3 maintained the highest correlations with the WIAT-III scores (.18 ≤ r ≤ .95) and lowest with the CELF-IV (.47 ≤ r ≤ .64).

Correlations between the KTEA-3 and measures of cognitive ability, including the Kaufman Assessment Battery for Children, Second Edition (KABC-II; Kaufman & Kaufman, 2004) and Differential Ability Scales, Second Edition (DAS-II; Elliott, 2007), were based on age norms. Employing a variability correction (Cohen, Cohen, West, & Aiken, 2003), global scores of cognitive ability demonstrated high correlations with the KTEA-3 ASB, with moderate to strong correlations between core academic and oral language composites (.65 ≤ r ≤ .75; Kaufman & Kaufman, 2014). As expected, correlations between the KTEA-3 and measures of cognitive ability are lower in comparison with correlations with alternate achievement measures due to the dichotomy between academic achievement and cognitive ability as individual constructs.

Across special groups, participants demonstrated expected performance across all domains and composites, suggesting that the KTEA-3 holds strong validity and clinical utility. Limitations of the special group data include small sample sizes and the use of recruited participants rather than randomly selected examinees. Because diagnostic criteria were not evaluated in a uniform manner and interrater reliability was not assessed, participants may not be representative of the diagnostic category as a whole.

Conclusion

The most recent addition to the Kaufman series, the KTEA-3 is a revision of the authors’ broad-based academic skills assessment, yielding three core academic composites and 10 supplemental composites. Two parallel, nonoverlapping forms are available for measuring academic progress while minimizing practice effects. The manual offers differing theoretical perspectives to guide the assessment. One possible approach draws on CHC theory, using the KTEA-3 subtests to assess eight broad and 20 narrow CHC abilities. Alternatively, examiners may employ an information processing approach, using the KTEA-3 subtests to identify an individual’s underlying pattern of strengths and weaknesses. To accomplish this task, the administration manual provides a detailed table summarizing the demands of each subtest in terms of input received, process measured, and output required. Revisions align with both IDEIA’s (2004) specific learning disability categories and the updated DSM-5 (APA, 2013) learning disorder criteria. Designed to measure academic achievement for grades PK through 12, or individuals between the ages of 4 and 25, the KTEA-3 assesses the population intended by the authors with expanded content.

The standardization sample for the KTEA-3 was stratified to closely match the population based on a recent U.S. Census Bureau survey (Ruggles et al., 2010; although citation is 2010, reported census data are from 2012). Normative samples were broadly stratified based on age, grade, parent education level, ethnicity, geographic region, and special group designation. In addition, educational level was collected and controlled for participants between the ages of 19 and 25.

Item analyses evaluated both item difficulty and discrimination to reduce test bias. Except for oral fluency, composite reliabilities ranged from good (.80s) to excellent (.90s). The majority of the subtests also yielded reliability coefficients ranging from good to excellent, with the exception of Oral Expression (discussed previously). According to the publishers, lower reliabilities resulted from a limited ceiling for upper grades and older individuals, and the subtest was not developed to differentiate among examinees with above average oral expression skills. Thus, the Oral Expression subtest cannot be recommended for use with individuals above elementary ages. Construct validity was adequately established and verified using multiple methods, including intercorrelational comparisons, factor analyses, special group studies, and expert reviews.

Benefits to users include simplified administration procedures and instructions. In addition, both manuals (technical and interpretive), audio files, and supplemental forms and checklists are provided on an accompanying flash drive, allowing for more efficient navigation and portability. Similarly, the audio files may be played by any compatible electronic device. The use of prompts and teaching items ensures that low scores are not the result of misunderstanding the instructions. Decision points in the Reading Comprehension, Listening Comprehension, Written Expression, and Oral Expression subtests ensure that the final item set is not too simple, giving higher achieving examinees the chance to attempt more challenging items than possible using traditional ceilings. The expansion of the KTEA-3, including academic fluency and reading vocabulary measures, enhances the examiner’s diagnostic capabilities.

Two scoring options are available: an online platform, Q-interactive, or hand scoring. Q-interactive scoring options require the site to pay for an annual site license along with a fee based on the number of users and tests, which is potentially costly for some agencies. Although the hand scoring option is less expensive, it requires printing multiple forms for subtest, composite, and intraindividual comparisons. In addition, the hand scoring option does not yield the discrepancy analysis of strengths and weaknesses needed in the evaluation of learning disability. Although an error analysis is offered, describing students’ academic skills and recommending interventions, these results are not likely to provide enough specificity, in comparison with criterion-referenced tests, to plan specific instruction.

Overall, the KTEA-3 serves as a useful addition to currently available options for academic achievement assessment. High-quality materials, updated graphics, and new content appeal to examinees. New subtests and added supplemental composites increase diagnostic utility. As with any evaluation tool, results should be interpreted with careful consideration of both the test’s strengths and technical limitations.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Arlington, VA: American Psychiatric Publishing.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. Mahwah, NJ: Lawrence Erlbaum.

Elliott, C. D. (2007). Differential ability scales: Introductory and technical handbook (2nd ed.). San Antonio, TX: PsychCorp.

Individuals With Disabilities Education Improvement Act, 20 U.S.C. § 1400 (2004).

Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman Assessment Battery for Children (2nd ed.). Circle Pines, MN: American Guidance Service.

Kaufman, A. S., & Kaufman, N. L. (with Breaux, K. C.). (2014). Technical & interpretive manual: Kaufman Test of Educational Achievement (3rd ed.). Bloomington, MN: NCS Pearson.

Linacre, J. M. (2005). WINSTEPS Rasch measurement computer program. Chicago, IL: Winsteps.com.

Pearson. (2009). Wechsler Individual Achievement Test, Third Edition. San Antonio, TX: Author.

Ruggles, S., Alexander, J. T., Genadek, K., Goeken, R., Schroeder, M. B., & Sobek, M. (2010). Integrated public use microdata series (Version 5.0) [Machine-readable database]. Minneapolis: University of Minnesota.

Semel, E., Wiig, E. H., & Secord, W. A. (2003). Clinical Evaluation of Language Fundamentals, Fourth Edition. San Antonio, TX: NCS Pearson.

Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III. Itasca, IL: Riverside Publishing.