Do Achievement Factors Change Across Development? An Investigation With the KTEA-3

Abstract

Theories of reading and writing development suggest that the factor structure of achievement batteries could change across development. As a result, it is important to test achievement batteries for invariance across development. The purpose of these analyses is to determine whether the factor structure of reading, writing, and oral language measures is invariant across grade ranges in the Kaufman Test of Educational Achievement, 3rd Edition. Results suggest that multiple interpretational models demonstrate measurement and structural invariance. Implications for practice are discussed.

Keywords

KTEA-3, confirmatory factor analysis, psychoeducational assessment, reading, measurement invariance, writing, language and literacy

Models of reading and writing development suggest that relationships between component reading/writing skills vary across development (e.g., Berninger, 1999; Hoover & Gough, 1990). Despite this variability, achievement batteries’ measurement models may not account for these developmental changes. Because the constructs measured by achievement batteries may change across development, it is important to establish age/grade-based measurement invariance.

Reading and writing competence depend on language skills together with decoding and spelling, respectively. The simple view of reading conceptualizes reading comprehension as the product of word reading and listening comprehension skills (Hoover & Gough, 1990). Catts, Hogan, and Adlof (2005) reported that these skills account for more than 70% of comprehension variance. Gough, Hoover, and Peterson (1996) initially demonstrated how the effects of these skills change across development. In their meta-analysis, word reading’s correlation with reading comprehension decreased from .61 in first grade to .39 in college, whereas listening comprehension’s correlation increased from .41 to around .60. Garcia and Cain (2014) confirmed this developmental shift in a meta-analysis of 110 different studies. They reported that language skills begin to be better comprehension predictors than word reading around age 10, when students begin to demonstrate fluent decoding. Similarly, Berninger (1999) stressed that transcription skills (spelling/handwriting) demonstrate effects on essay composition performance, alongside language skills related to idea generation. The magnitude of these effects changes across development, because age correlates positively with decoding and spelling. In a series of cross-sectional studies, Berninger (1999) and her colleagues (Berninger, Cartwright, Yates, Swanson, & Abbott, 1994; Berninger, Whitaker, Feng, Swanson, & Abbott, 1996; Berninger et al., 1992) reported a decreasing influence for transcription skill on both composition fluency and quality across ages. Transcription accounted for 66% and 25% of fluency and quality variance, respectively, in early grades, but only 16% and 18% in junior high school students. Collectively, these findings suggest that decoding/comprehension may represent an appropriate reading composite at young ages, but not in older students. Writing may demonstrate the same change as students get older.
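The multiplicative form of the simple view can be made concrete with a small sketch. The function name and the 0-to-1 proficiency scale are illustrative assumptions; they are not part of Hoover and Gough's (1990) presentation or of the KTEA-3 scoring.

```python
def simple_view_of_reading(decoding: float, listening_comprehension: float) -> float:
    """Reading comprehension as the product of two component skills.

    Both inputs are hypothetical proficiency values scaled from 0 to 1.
    """
    for name, value in (("decoding", decoding),
                        ("listening_comprehension", listening_comprehension)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1, got {value}")
    return decoding * listening_comprehension


# A strong listener who cannot yet decode comprehends no text,
# because the relation is a product rather than a sum.
beginning_reader = simple_view_of_reading(0.0, 0.9)

# Once decoding is fluent, comprehension tracks listening skill alone.
fluent_reader = simple_view_of_reading(1.0, 0.6)
```

Under this form, decoding limits comprehension most when it is weak, which is consistent with the declining word reading correlations reported across grades.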

However, other analyses suggest that these developmental effects may not change the structure of achievement batteries across age/grade levels. Independent exploratory factor analyses (EFAs) of achievement batteries often do not demonstrate clean separations between decoding/spelling measures and reading comprehension/writing tasks. For instance, higher order analyses with the two most recent Woodcock–Johnson batteries described broad reading/writing and language/knowledge factors (Dombrowski, 2015; Dombrowski, McGill, & Canivez, 2018; Dombrowski & Watkins, 2013). However, the relatively large age ranges in these analyses may mask changes in age groups.

This lack of separation between decoding/spelling and language skills may occur because the skill areas are not necessarily independent of each other. Strong decoding skills allow readers to allocate more cognitive resources to comprehension (Garcia & Cain, 2014; Vellutino, Tunmer, Jaccard, & Chen, 2007). Similarly, transcription skills may constrain language effects on writing composition (Hayes & Berninger, 2009). When young writers are allowed to dictate their thoughts, removing the constraints of transcription skills, they tend to produce better text as more working memory is available for composing (Berninger, 1999; De La Paz & Graham, 1995). Along these lines, researchers using structural equation modeling reported indirect effects of language/crystallized intelligence skills on reading comprehension, mediated through decoding skills (Floyd, Meisinger, Gregg, & Keith, 2012; Hajovsky, Reynolds, Floyd, Turek, & Keith, 2014).

Factor Structure of the Kaufman Test of Educational Achievement, 3rd Edition (KTEA-3)

The purpose of these analyses is to determine whether these developmental shifts occur within the KTEA-3 (Kaufman & Kaufman, 2014a) to a degree that affects measurement invariance across grade levels. Part of the KTEA-3’s construct validity evidence included a series of confirmatory factor analyses (CFAs; Kaufman & Kaufman, 2014b). Those analyses modeled one to four factors fitted to a sample of examinees between the ages of 6 and 25. A four-factor model best fit the normative sample. The oral/written language portion of this model (see Figure 1) included an oral language factor (listening comprehension, oral expression, associational fluency), a reading factor (word reading and comprehension), and a written language factor (spelling and written expression). To fit the sample well, this model also required residual correlations between spelling/word reading, reading/listening comprehension, and written/oral expression measures. These residual correlations reflect supplemental composites that are also included in the KTEA-3. Reading/listening comprehension measures reflect the comprehension composite, whereas written/oral expression skills represent the expression composite. A model based on these supplemental composites has not yet been tested.

Figure 1.

KTEA-3 latent variable models.

Because the manual analysis included such a large age range, it might mask the aforementioned developmental changes. The manual model may fit younger age groups well, when reading comprehension is highly related to word reading, but misrepresent the academic skills of older examinees. Alternatively, the supplemental composites may not describe the skills of younger students as well as they might for older students, because listening and reading comprehension skills are more distinct in earlier grades (Garcia & Cain, 2014).

To determine whether these developmental trends affect the KTEA-3 factor structure, these analyses test measurement and structural invariance of KTEA-3 factors across grades. Tests of measurement invariance can determine whether the KTEA-3 factor structure accurately assesses its constructs to the same degree across grade levels, whereas tests of structural invariance can demonstrate whether aspects of measured constructs might differ across grade levels (Keith, 2015). These analyses included three different models. The first model was described in the KTEA-3 manual, whereas the second represents its inverse. In the second model, the manual model’s residual covariances are modeled as factors, whereas its factors are modeled as residual covariances. Finally, because the inverse model operationalizes the KTEA-3 supplemental composites slightly differently than the manual, a model with supplemental composites defined consistent with the manual is also included.

Method

Participants

Participants included examinees from the KTEA-3 age-based standardization sample in grades kindergarten through 12 (n = 1,727). The sample was stratified on demographic variables consistent with the U.S. Census Bureau’s 2012 estimate. Detailed information is provided in the battery’s technical manual (Kaufman & Kaufman, 2014b). Approximately half of these participants completed Form A (n = 865) and half completed Form B (n = 862), which were linked by equating studies during the standardization process (Kaufman & Kaufman, 2014b). The K-12 data set was subdivided into four grade bands: K-2 (n = 482), Grades 3 to 5 (n = 457), Grades 6 to 8 (n = 384), and Grades 9 to 12 (n = 404). These ranges were selected to create relatively large samples of participants in groups generally consistent with the structure of schooling in the United States (e.g., elementary, middle, high school).

Measures

These analyses included the reading, writing, and language measures in the core battery (Kaufman & Kaufman, 2014b). The Letter and Word Recognition subtest measures examinees’ ability to recognize letters as well as regular and irregular words. It demonstrates an average split-half reliability of .97 across pre-K through 12. The Reading Comprehension subtest requires young examinees to match words and short sentences to pictures. It also requires examinees to read passages and answer literal and inferential questions. It demonstrates an average split-half reliability of .88. The Written Expression subtest demonstrates an average split-half reliability of .86. It requires examinees to respond to a number of prompts for various writing skills presented in a storybook format. It also requires examinees to write an essay. The Spelling measure requires examinees to write words with regular and irregular patterns and has a split-half reliability of .95. Listening Comprehension requires examinees to listen to passages of formal speech and answer comprehension questions. Its format is similar to the Reading Comprehension task. Listening Comprehension demonstrates an average split-half reliability of .85. The Oral Expression measure requires examinees to describe a photograph and may specify target words examinees must include in their description. It demonstrates an average split-half reliability of .81. The Associational Fluency subtest requires examinees to quickly name category exemplars. It displays an average split-half reliability of .62. The Nonsense Word Decoding subtest measures examinees’ ability to read words that have no meaning, but conform to regular phonics patterns in the English language. It demonstrates an average split-half reliability of .96.

Model Development, Analyses, and Evaluation

These analyses included three models (Figure 1). The first was the same as the language-based portions of the four-factor model provided in the test manual (Kaufman & Kaufman, 2014b, Figure 2.1). The second represented its inverse. This model specified the manual model’s residual covariances as factors, and its factors as residual covariances. The Associational Fluency subtest was specified as part of the Expression factor, though it is important to note that it is not part of that composite as defined by the test manual (Kaufman & Kaufman, 2014b). The third factor included Spelling and Letter and Word Recognition and was labeled Decoding (though this is not the same as the Decoding composite in the test manual, which does not include Spelling). Because it uses the same subtests, the inverse model can be compared with the manual model, but it does not exactly reflect the Decoding and Expression composites provided by the KTEA-3. Thus, the third model removes the Associational Fluency subtest and replaces Spelling with the Nonsense Word Decoding subtest. These modifications allow for a model that includes the Comprehension, Expression, and Decoding composites provided by the battery.
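The relationship between the manual and inverse models can be sketched as data. The example below writes each specification in lavaan-style model syntax (`=~` for loadings, `~~` for covariances) and checks that every residual covariance in one model pairs two subtests that share a factor in the other. Subtest abbreviations follow Tables 2 and 3. The Python helper names are hypothetical, and the generated syntax is an illustration rather than the authors' actual script.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Factors = Dict[str, List[str]]
Pairs = List[Tuple[str, str]]

MANUAL_FACTORS: Factors = {
    "READ": ["RC", "LWR"],
    "WRITE": ["WE", "SP"],
    "LANG": ["LC", "OE", "AF"],
}
MANUAL_RESIDUAL_COVARIANCES: Pairs = [("RC", "LC"), ("WE", "OE"), ("LWR", "SP")]

INVERSE_FACTORS: Factors = {
    "COMP": ["RC", "LC"],
    "EXP": ["WE", "OE", "AF"],
    "DECODE": ["LWR", "SP"],
}
INVERSE_RESIDUAL_COVARIANCES: Pairs = [("RC", "LWR"), ("WE", "SP"), ("LC", "OE")]


def to_lavaan_syntax(factors: Factors, residual_covariances: Pairs) -> str:
    """Render a specification as lavaan-style model syntax."""
    loading_lines = [f"{factor} =~ {' + '.join(subtests)}"
                     for factor, subtests in factors.items()]
    covariance_lines = [f"{first} ~~ {second}"
                        for first, second in residual_covariances]
    return "\n".join(loading_lines + covariance_lines)


def same_factor_pairs(factors: Factors) -> Set[FrozenSet[str]]:
    """All pairs of subtests that load on a common factor."""
    return {frozenset(pair)
            for subtests in factors.values()
            for pair in combinations(subtests, 2)}


manual_syntax = to_lavaan_syntax(MANUAL_FACTORS, MANUAL_RESIDUAL_COVARIANCES)
inverse_syntax = to_lavaan_syntax(INVERSE_FACTORS, INVERSE_RESIDUAL_COVARIANCES)

# Each model's residual covariances become within-factor pairs in the other.
manual_is_inverted = all(frozenset(pair) in same_factor_pairs(INVERSE_FACTORS)
                         for pair in MANUAL_RESIDUAL_COVARIANCES)
inverse_is_inverted = all(frozenset(pair) in same_factor_pairs(MANUAL_FACTORS)
                          for pair in INVERSE_RESIDUAL_COVARIANCES)
print(manual_syntax)
```

Associational Fluency is the only subtest without a residual covariance in either model, which is why its placement on the Expression factor departs from the manual's composite definition.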

Data were analyzed with the lavaan package (Rosseel, 2012) in R (R Development Core Team, 2015), and included age-based standard scores. As an initial test of equivalency across grade levels, the subtest means, variances, and covariances were constrained to be equal across grade ranges, similar to Box’s M test (Keith, 2015). Next, the models were fitted to each grade range individually. Then a multigroup model was estimated to test configural invariance. Subsequent models sequentially constrained (a) unstandardized factor loadings, (b) manifest variables’ intercepts, and (c) residuals and residual covariances. To assess structural invariance, (a) factor variances, (b) covariances, and (c) means were constrained to be equal sequentially.
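The sequence of constraints can be checked by counting degrees of freedom. The sketch below assumes standard counting rules that the text does not state explicitly: one marker loading per factor fixed to 1.00, factor means fixed to zero in single-group models, and latent means freed in the non-reference groups once intercepts are constrained. The class and function names are hypothetical. Under these assumptions the counts reproduce the df values reported in Tables 1 and 4 through the residual step.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CfaSpec:
    n_indicators: int
    n_factors: int
    n_residual_covariances: int

    @property
    def free_loadings(self) -> int:
        # One marker loading per factor is fixed to 1.00.
        return self.n_indicators - self.n_factors

    def single_group_df(self) -> int:
        p, k = self.n_indicators, self.n_factors
        observed_moments = p * (p + 1) // 2 + p   # variances, covariances, means
        free_parameters = (
            self.free_loadings
            + p                                   # residual variances
            + self.n_residual_covariances
            + k                                   # factor variances
            + k * (k - 1) // 2                    # factor covariances
            + p                                   # intercepts
        )
        return observed_moments - free_parameters


def invariance_ladder_df(spec: CfaSpec, n_groups: int) -> Dict[str, int]:
    """Cumulative df when each step constrains all parameters of one type."""
    p, k = spec.n_indicators, spec.n_factors
    extra_groups = n_groups - 1
    added_constraints = [
        ("configural", 0),
        ("metric", spec.free_loadings * extra_groups),
        # p intercepts equated, k latent means freed, per extra group
        ("intercepts", (p - k) * extra_groups),
        ("residuals", (p + spec.n_residual_covariances) * extra_groups),
        ("latent variances", k * extra_groups),
        ("latent covariances", k * (k - 1) // 2 * extra_groups),
        ("latent means", k * extra_groups),
    ]
    df = n_groups * spec.single_group_df()
    ladder: Dict[str, int] = {}
    for step, added in added_constraints:
        df += added
        ladder[step] = df
    return ladder


manual_spec = CfaSpec(n_indicators=7, n_factors=3, n_residual_covariances=3)
supplemental_spec = CfaSpec(n_indicators=6, n_factors=3, n_residual_covariances=2)
manual_ladder = invariance_ladder_df(manual_spec, n_groups=4)
supplemental_ladder = invariance_ladder_df(supplemental_spec, n_groups=4)
```

The structural steps in the published tables build on partially invariant models with some constraints released, so their df are lower than this fully constrained ladder by the number of released parameters.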

Fit statistics were interpreted consistent with Keith’s (2015) guidelines. They include a chi-square test, where a high p value suggests adequate model fit; the root mean square error of approximation (RMSEA), where values lower than .08 reflect an adequate fit and lower than .05 reflect an excellent fit; the comparative fit index (CFI), where values higher than .95 reflect appropriate fit; and the standardized root mean square residual (SRMR), where values less than .08 suggest an appropriate fit. Nested models can be compared via a Δχ² as well as a ΔCFI, where a value less than .01 reflects no significant change in fit. Nonnested models can be compared via the Akaike information criterion (AIC), which favors the model with the lower value.
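These cutoffs can be expressed as a small decision rule. The sketch below encodes only the thresholds stated above. The function name and the label "inadequate" for values that miss a cutoff are assumptions added for illustration. The example values are those reported for the moment-matrices test in Table 1.

```python
from typing import Dict


def describe_fit(rmsea: float, cfi: float, srmr: float) -> Dict[str, str]:
    """Label each index using the cutoffs described in the text."""
    if rmsea < 0.05:
        rmsea_label = "excellent"
    elif rmsea < 0.08:
        rmsea_label = "adequate"
    else:
        rmsea_label = "inadequate"   # label assumed; the text names no category here
    return {
        "RMSEA": rmsea_label,
        "CFI": "appropriate" if cfi > 0.95 else "inadequate",
        "SRMR": "appropriate" if srmr < 0.08 else "inadequate",
    }


# Moment-matrices row of Table 1: RMSEA = .078, CFI = .956, SRMR = .081
moment_matrices_fit = describe_fit(rmsea=0.078, cfi=0.956, srmr=0.081)
print(moment_matrices_fit)
```

Indexes can disagree for the same model, so the labels are best read together rather than as a single pass/fail verdict.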

Results

Data Screening and Missing Values

In the K-12 sample, subtests’ skewness (–.49 to .13) and kurtosis (.25 to .90) values indicated that each distribution was approximately normal. Standard deviations (SDs) ranged from 14.58 to 15.49. There were 12 missing values in the K-12 sample (0.007%). Little’s (1988) MCAR test, as implemented by the BaylorEdPsych package (Beaujean, 2012), indicated that data were missing completely at random. Missing data were handled via full information maximum likelihood (FIML), which makes use of all available data from each participant when estimating model parameters (Beaujean, 2014).

Model Testing

Table 1 lists fit statistics for the test of covariance equivalence across grade levels. Although strong fit values would indicate that the KTEA-3 measures similar constructs to a similar degree across grade ranges, this initial test for model fit was equivocal. The chi-square test was significant, a sign of poor model fit, and though chi-square may be influenced by the large sample size, the SRMR was also greater than the .08 cutoff for adequate model fit. In contrast, at .078 and .956, the RMSEA and CFI suggested an adequate, though not excellent, fit.

Table 1.

Manual and Inverse Model Fit Statistics.

Name	n	χ²	df	p	Δχ²	Δdf	p (Δχ²)	RMSEA	p (RMSEA)	RMSEA 90% CI lower	RMSEA 90% CI upper	SRMR	CFI	ΔCFI	AIC
Moment matrices	1,727	379.598	105	.000				.078	.000	0.069	0.086	.081	.956		93,705.747
Manual model
1. K-2	482	33.309	8	.000				.081	.032	0.054	0.110	.025	.988		25,912.551
2. Grades 3-5	457	7.079	8	.528				.000	.947	0.000	0.051	.014	1.000		24,667.774
3. Grades 6-8	384	4.138	8	.844				.000	.987	0.000	0.034	.009	1.000		20,915.354
4. Grades 9-12	404	27.886	8	.000				.078	.061	0.048	0.111	.023	.984		22,048.884
Configural	1,727	72.414	32	.000				.054	.321	0.038	0.071	.018	.993		93,544.563
Metric	1,727	87.858	44	.000	15.444	12	.218	.048	.566	0.033	0.063	.026	.993	.000	93,536.007
Intercepts	1,727	109.501	56	.000	21.643	12	.042	.047	.628	0.034	0.060	.029	.991	.002	93,533.650
Residuals	1,727	266.562	86	.000	157.060	30	.000	.070	.000	0.060	0.079	.041	.971	.020	93,630.711
Partial residuals^a	1,727	187.507	77	.000	78.006	21	.000	.058	.110	0.057	0.068	.039	.982	.009	93,569.656
Latent variances	1,727	232.289	86	.000	44.783	9	.000	.063	.016	0.053	0.073	.080	.976	.006	93,596.438
Latent covariances	1,727	282.724	95	.000	50.435	9	.000	.068	.001	0.059	0.077	.075	.970	.006	93,628.874
Latent means	1,727	296.075	104	.000	13.350	9	.147	.065	.002	0.057	0.074	.080	.969	.001	93,624.224
Inverse model
1. K-2	482	34.374	8	.000				.083	.026	0.055	0.112	.030	.988		25,913.615
2. Grades 3-5	457	9.017	8	.341				.017	.884	0.000	0.059	.013	.999		24,669.712
3. Grades 6-8	384	6.809	8	.557				.000	.931	0.000	0.054	.013	1.000		20,918.025
4. Grades 9-12	404	27.312	8	.001				.077	.069	0.047	0.110	.022	.984		22,048.309
Configural	1,727	77.512	32	.000				.057	.212	0.041	0.074	.020	.993		93,549.661
Metric	1,727	123.225	44	.000	45.713	12	.000	.065	.037	0.051	0.078	.043	.987	.006	93,571.374
Intercepts	1,727	138.719	56	.000	15.493	12	.216	.058	.121	0.046	0.071	.044	.987	.000	93,562.868
Residuals	1,727	287.140	86	.000	148.420	30	.000	.074	.000	0.064	0.083	.055	.968	.019	93,651.289
Partial residuals^b	1,727	197.214	77	.000	58.496	21	.000	.060	.053	0.050	0.071	.048	.981	.006	93,579.363
Latent variances	1,727	224.936	86	.000	27.721	9	.001	.061	.030	0.051	0.071	.066	.978	.003	93,589.085
Latent covariances	1,727	275.640	95	.000	50.704	9	.000	.066	.002	0.057	0.076	.075	.971	.007	93,621.789
Latent means	1,727	295.788	104	.000	20.148	9	.017	.065	.002	0.057	0.074	.080	.969	.002	93,623.938

Note. RMSEA = root mean square error of approximation; CFI = comparative fit index; AIC = Akaike information criterion; SRMR = standardized root mean square residual.

^a Released variance constraints on Reading Comprehension, Letter and Word Recognition, and Spelling subtests across grade ranges.

^b Released variance constraints on Listening Comprehension, Reading Comprehension, and Letter and Word Recognition subtests across grade ranges.

Manual model

The manual model fit varied across grade ranges according to the fit indexes in Table 1. It fit the third- to fifth- and sixth- to eighth-grade ranges extremely well, and the K-2 and Grades 9 to 12 ranges adequately. Table 2 contains the factor loadings, variances, and covariances for the manual model across grade levels. A review of coefficients suggested a high level of consistency across groups. The residual variances of Letter and Word Recognition, Spelling, and Reading Comprehension appeared to vary across grade ranges, however, based on a review of unstandardized coefficients.

Table 2.

Manual Model Factor Loadings, Covariances, and Variances Across Grade Ranges.

	Grades K-2			Grades 3-5			Grades 6-8			Grades 9-12
	Est	SE	Std	Est	SE	Std	Est	SE	Std	Est	SE	Std
READ to RC	1.00		0.93	1.00		.88	1.00		.78	1.00		.84
READ to LWR	0.91	0.03	0.89	.94	.05	.79	.99	.07	.77	.91	.07	.75
WRITE to WE	1.00		0.83	1.00		.81	1.00		.84	1.00		.84
WRITE to SP	1.06	0.05	0.90	1.04	.06	.86	.96	.07	.78	.91	.06	.79
LANG to LC	1.00		0.68	1.00		.63	1.00		.64	1.00		.72
LANG to OE	1.07	0.09	0.74	.97	.09	.69	.87	.10	.62	.79	.09	.58
LANG to AF	0.74	0.08	0.52	.75	.09	.52	.70	.09	.50	.75	.09	.51
RC with LC	7.53	4.96	0.12	17.20	6.40	.22	43.55	8.76	.39	28.45	8.35	.33
WE with OE	11.71	5.92	0.13	14.43	6.08	.15	5.25	7.06	.06	8.33	6.80	.09
LWR with SP	16.47	4.41	0.33	39.60	5.79	.59	43.52	7.90	.48	30.91	6.50	.36
READ with WRITE	190.88	14.71	0.94	134.10	11.42	.91	129.36	12.86	.90	124.84	12.15	.84
READ with LANG	119.98	12.12	0.80	110.91	12.21	.92	104.96	13.95	.91	126.11	14.97	.92
WRITE with LANG	104.65	11.31	0.74	103.43	11.80	.80	106.55	12.42	.88	111.65	11.58	.88
RC	31.94	5.05	0.13	39.72	5.56	.23	90.16	9.86	.40	69.33	10.10	.30
LWR	49.90	5.11	0.22	73.87	6.61	.38	91.09	9.81	.41	106.58	10.31	.44
WE	86.85	7.41	0.33	80.42	8.00	.34	62.56	9.40	.29	57.04	8.03	.29
SP	49.14	6.22	0.19	62.07	7.63	.27	90.38	9.93	.39	67.98	7.42	.38
LC	122.21	10.71	0.54	161.37	13.49	.60	141.16	13.10	.59	106.29	11.58	.48
OE	101.44	10.01	0.46	112.66	10.43	.53	119.34	11.06	.62	145.07	11.97	.67
AF	159.54	11.55	0.73	159.96	11.76	.73	142.36	11.51	.75	190.14	14.90	.74
READ	214.88	16.43	1.00	136.97	12.40	1.00	135.73	16.66	1.00	162.55	18.07	1.00
WRITE	190.48	17.65	1.00	158.34	16.00	1.00	151.72	16.94	1.00	137.46	14.82	1.00
LANG	104.72	14.36	1.00	106.36	16.33	1.00	97.49	16.10	1.00	116.59	16.73	1.00

Note. Est = unstandardized estimate; Std = standardized; READ = Reading Composite; RC = Reading Comprehension; LWR = Letter and Word Recognition; WRITE = Writing Composite; WE = Written Expression; SP = Spelling; LANG = Language Composite; LC = Listening Comprehension; OE = Oral Expression; AF = Associational Fluency. Bolded values represent p > .05.

Fit statistics for the tests of invariance are also included in Table 1. The model demonstrated configural, metric, and intercept invariance across grade ranges, based on the pattern of fit and ΔCFI. Of course, intercept invariance would be expected, given that the KTEA-3 is normed so that the mean of each age/grade level is the same. The model was not invariant when subtest residuals and residual covariances were constrained across grade ranges. Because the residual variances of Letter and Word Recognition, Spelling, and Reading Comprehension appeared to differ across grade ranges, they were released to test a partially invariant model. Releasing these constraints achieved partial residual invariance (see online supplement for additional results). It is important to highlight that some methodologists do not consider residual invariance to be a critical part of measurement invariance testing (Keith, 2015; Widaman & Reise, 1997). Models constraining latent factor variances, covariances, and means all demonstrated acceptable fit.
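The Δχ² columns in Table 1 can be recomputed from adjacent rows. The sketch below uses closed-form expressions for the chi-square upper-tail probability with integer degrees of freedom, so that only the Python standard library is needed. The function names are hypothetical, and this is a check on the reported values rather than the routine used in the original analyses.

```python
import math
from typing import Tuple


def chi_square_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum of chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    normal_density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * normal_density * total


def chi_square_difference(restricted: Tuple[float, int],
                          less_restricted: Tuple[float, int]) -> Tuple[float, int, float]:
    """Return (delta chi-square, delta df, p) for two nested models."""
    delta_chi2 = restricted[0] - less_restricted[0]
    delta_df = restricted[1] - less_restricted[1]
    return delta_chi2, delta_df, chi_square_sf(delta_chi2, delta_df)


# Manual-model rows from Table 1, as (chi-square, df).
configural = (72.414, 32)
metric = (87.858, 44)
latent_covariances = (282.724, 95)
latent_means = (296.075, 104)

metric_test = chi_square_difference(metric, configural)
means_test = chi_square_difference(latent_means, latent_covariances)
print(round(metric_test[2], 3), round(means_test[2], 3))
```

The Δχ² test is sensitive to the large sample size, which is one reason the ΔCFI criterion carries weight in the invariance decisions reported here.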

Inverse model

The inverse model fit also varied across grade ranges according to fit indexes in Table 1. The pattern was the same as with the manual model. It fit well in the third- to fifth-grade and sixth- to eighth-grade ranges, and adequately in Grades K-2 and 9 to 12. A comparison of the manual and inverse models’ AIC values indicates that the manual model fit better in the K-2, 3 to 5, and 6 to 8 grade ranges, whereas the inverse model fit better in the ninth- to 12th-grade range. Based on the coefficients provided in Table 3, Listening Comprehension’s loading on the Comprehension factor appeared to vary substantially across grade ranges in this model. The residual variances for Reading Comprehension, Listening Comprehension, and Letter and Word Recognition also appeared to vary, based on a review of unstandardized coefficients.
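The AIC comparison can be reproduced directly from the single-group rows of Table 1. The sketch below selects the model with the lower AIC in each grade band. The variable and function names are hypothetical.

```python
from typing import Dict

# Single-group AIC values from Table 1.
AIC_BY_GRADE_BAND: Dict[str, Dict[str, float]] = {
    "K-2": {"manual": 25912.551, "inverse": 25913.615},
    "3-5": {"manual": 24667.774, "inverse": 24669.712},
    "6-8": {"manual": 20915.354, "inverse": 20918.025},
    "9-12": {"manual": 22048.884, "inverse": 22048.309},
}


def preferred_model(aic_by_model: Dict[str, float]) -> str:
    """AIC favors the model with the lower value."""
    return min(aic_by_model, key=aic_by_model.get)


preferred = {band: preferred_model(aics)
             for band, aics in AIC_BY_GRADE_BAND.items()}

# Positive values mean the manual model has the lower AIC.
aic_gap = {band: round(aics["inverse"] - aics["manual"], 3)
           for band, aics in AIC_BY_GRADE_BAND.items()}
print(preferred)
print(aic_gap)
```

The gap is smaller than three AIC points in every grade band, so the preference in either direction is slight.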

Table 3.

Inverse Model Factor Loadings, Covariances, and Variances Across Grade Ranges.

	Grades K-2			Grades 3-5			Grades 6-8			Grades 9-12
	Est	SE	Std	Est	SE	Std	Est	SE	Std	Est	SE	Std
COMP to RC	1.00		.97	1.00		.96	1.00		.88	1.00		.86
COMP to LC	0.53	0.05	.53	.79	0.06	.61	.85	0.07	.73	.91	0.06	.80
EXP to WE	1.00		.84	1.00		.76	1.00		.79	1.00		.77
EXP to OE	0.67	0.05	.63	.81	0.06	.66	.73	0.07	.61	.82	0.08	.60
EXP to AF	0.51	0.05	.48	.63	0.06	.50	.57	0.07	.48	.75	0.08	.51
DECODE to LWR	1.00		.99	1.00		.92	1.00		.87	1.00		.80
DECODE to SP	1.01	0.04	.88	1.04	0.05	.87	.98	0.06	.84	.87	0.07	.80
RC with LWR	–4.72	6.20	–.22	–5.32	4.58	–.26	9.28	6.25	.18	19.27	6.98	.26
WE with SP	27.26	6.24	.39	33.65	5.91	.45	20.36	6.91	.28	26.12	6.50	.37
LC with OE	43.00	7.46	.30	23.43	7.71	.16	2.14	7.35	.02	–5.04	7.15	–.05
COMP with EXP	190.22	14.70	.90	133.42	11.40	.89	128.62	12.86	.83	123.85	12.11	.88
COMP with DECODE	200.90	15.26	.94	134.28	11.38	.82	125.82	14.06	.73	128.93	14.68	.80
EXP with DECODE	173.58	13.92	.88	125.31	11.44	.80	127.18	12.82	.84	111.79	11.79	.84
RC	16.04	10.16	.07	13.44	8.33	.08	50.32	11.05	.22	61.93	9.30	.27
LC	161.59	10.86	.72	166.98	12.08	.62	112.43	11.04	.47	81.77	8.92	.37
WE	81.63	9.50	.29	99.72	9.33	.42	79.21	9.81	.37	78.10	9.20	.40
OE	131.63	9.46	.60	121.73	9.50	.57	121.97	10.21	.63	138.87	11.40	.64
AF	166.71	11.36	.77	165.15	11.62	.75	146.52	11.32	.77	190.75	14.62	.74
LWR	29.69	6.70	.13	30.60	6.76	.16	54.91	9.70	.24	87.62	11.25	.37
SP	60.37	7.30	.23	56.07	7.71	.24	66.88	9.67	.29	64.72	8.50	.36
COMP	230.23	18.79	1.00	163.29	14.32	1.00	175.89	19.02	1.00	170.32	17.89	1.00
EXP	196.39	18.94	1.00	138.45	15.74	1.00	135.06	16.41	1.00	116.05	14.44	1.00
DECODE	199.64	16.06	1.00	164.63	14.33	1.00	169.93	18.08	1.00	152.19	18.57	1.00

Note. Est = unstandardized estimate; Std = standardized; COMP = Comprehension Composite; RC = Reading Comprehension; LC = Listening Comprehension; EXP = Expression Composite; WE = Written Expression; OE = Oral Expression; AF = Associational Fluency; DECODE = Decoding Composite; LWR = Letter and Word Recognition; SP = Spelling. Bolded values represent p > .05.

Model fit statistics for the inverse model’s tests of invariance also appear in Table 1. Like the manual model, this model demonstrated configural, metric, and intercept invariance. Despite the apparent variability in Listening Comprehension’s factor loadings across grade ranges, constraining them to be equal within the metric invariance model did not degrade fit, according to the ΔCFI. The inverse model did not demonstrate invariance of residuals and residual covariances. Releasing the residual variances of Reading Comprehension, Listening Comprehension, and Letter and Word Recognition across groups resulted in partial residual invariance. Models constraining latent factor variances, covariances, and means all demonstrated acceptable fit. Interestingly, comparing the fit measures of the manual and inverse models suggests that the manual model demonstrated a slightly better fit for measurement invariance tests, whereas the inverse model displayed a better fit for structural tests, though these differences appear minimal. See the online supplement for model factor loadings.

Supplemental model

The fit statistics for the supplemental model are listed in Table 4, and model coefficients are listed in Table 5. The Comprehension factor is the same as in the inverse model, and Listening Comprehension demonstrated the same apparent factor loading variability across grade levels. Listening Comprehension’s residual variance also demonstrated the greatest apparent variability across grades. In this model, the residual covariance between Letter and Word Recognition and Reading Comprehension (i.e., the Reading factor in the manual model) was nonsignificant in all grade ranges.

Table 4.

Supplemental Model Fit Statistics.

Data	n	χ²	df	p	Δχ²	Δdf	p (Δχ²)	RMSEA	p (RMSEA)	RMSEA 90% CI lower	RMSEA 90% CI upper	SRMR	CFI	ΔCFI	AIC
Moment matrices	1,727	434.276	105	.000				.085	.000	0.077	0.094	.087	.956		91,355.486
1. Grades K-2	482	15.061	4	.005				.076	.121	0.038	0.118	.017	.994		21,084.093
2. Grades 3-5	457	3.692	4	.449				.000	.838	0.000	0.068	.008	1.000		21,160.624
3. Grades 6-8	384	9.451	4	.051				.060	.310	0.000	0.110	.013	.995		17,895.985
4. Grades 9-12	404	.780	4	.941				.000	.990	0.000	0.015	.005	1.000		18,790.823
Configural	1,727	28.985	16	.024				.043	.637	0.016	0.068	.011	.998		78,931.530
Metric	1,727	73.974	25	.000	44.989	9	.000	.067	.051	0.050	0.085	.044	.991	.007	78,958.514
Intercept	1,727	107.550	34	.000	33.576	9	.000	.071	.012	0.056	0.086	.046	.986	.005	78,974.091
Residuals	1,727	191.497	58	.000	83.947	24	.000	.073	.001	0.062	0.085	.053	.975	.011	79,010.038
Partial^a	1,727	171.448	55	.000	63.897	21	.000	.070	.003	0.058	0.082	.051	.978	.008	78,978.996
Latent variance	1,727	200.688	64	.000	29.240	9	.000	.070	.001	0.059	0.081	.067	.974	.004	79,007.228
Latent covariance	1,727	298.996	73	.000	98.308	9	.000	.085	.000	0.075	0.095	.076	.958	.016	79,087.536
Partial^b	1,727	240.457	70	.000	39.769	6	.000	.075	.000	0.065	0.086	.074	.968	.006	79,034.997
Latent means	1,727	255.676	79	.000	15.219	9	.085	.072	.000	0.062	0.082	.079	.967	.001	79,032.216

Note. RMSEA = root mean square error of approximation; CFI = comparative fit index; AIC = Akaike information criterion; SRMR = standardized root mean square residual.

^a Released variance constraints on listening comprehension variance across grade ranges.

^b Released constraints of covariance between comprehension and decoding across grade ranges.

Table 5.

Supplemental Model Factor Loadings, Covariances, and Variances.

	Grades K-2			Grades 3-5			Grades 6-8			Grades 9-12
	Est	SE	Std	Est	SE	Std	Est	SE	Std	Est	SE	Std
COMP to RC	1.00		.83	1.00		.95	1.00		.89	1.00		.87
COMP to LC	.52	0.05	.53	.80	.06	.62	.83	.07	.72	.88	.06	.79
EXP to WE	1.00		.81	1.00		.78	1.00		.81	1.00		.80
EXP to OE	.69	0.05	.62	.79	.06	.65	.70	.07	.60	.78	.08	.59
DECODE to NWD	1.00		.83	1.00		.84	1.00		.82	1.00		.79
DECODE to LWR	1.21	0.06	.98	1.00	.05	.93	1.15	.07	.88	1.30	.08	.96
RC with LWR	–6.75	6.21	–.87	.92	4.23	.04	11.16	6.01	.23	9.13	5.78	.30
LC with OE	47.04	7.73	.32	22.75	7.71	.16	5.12	7.84	.04	.77	7.09	.01
COMP with EXP	189.96	14.69	.91	134.95	11.43	.89	130.60	12.90	.82	123.74	12.16	.83
COMP with DECODE	167.60	14.06	.88	128.00	11.26	.78	106.39	11.79	.69	108.27	12.06	.71
EXP with DECODE	144.07	13.41	.87	125.54	12.35	.81	112.81	11.89	.83	85.07	10.41	.66
RC	8.55	11.38	.04	17.18	8.17	.10	46.33	11.48	.21	55.56	9.32	.24
LC	163.06	10.94	.72	165.01	12.06	.62	116.62	11.21	.49	84.07	8.95	.38
WE	94.99	10.84	.34	94.57	10.36	.40	74.09	11.31	.35	69.82	11.46	.36
OE	135.44	9.83	.61	123.75	9.68	.58	124.17	10.25	.64	140.86	11.73	.65
NWD	67.71	7.00	.31	68.37	7.84	.29	65.13	7.88	.33	80.86	8.82	.38
LWR	7.10	7.48	.03	26.63	6.66	.14	50.02	9.22	.22	17.19	11.30	.07
COMP	237.84	19.56	1.00	159.51	14.17	1.00	179.18	19.36	1.00	176.41	18.02	1.00
EXP	182.34	19.02	1.00	143.71	16.66	1.00	140.23	17.61	1.00	124.85	16.45	1.00
DECODE	151.63	15.53	1.00	168.33	16.40	1.00	131.27	14.79	1.00	131.50	15.37	1.00

Note. Est = unstandardized estimate; Std = standardized; COMP = comprehension composite; RC = reading comprehension; LC = listening comprehension; EXP = expression composite; WE = written expression; OE = oral expression; DECODE = decoding composite; NWD = nonsense word decoding; LWR = letter/word recognition. Bolded values represent p > .05.

The supplemental model demonstrated configural, metric, and intercept invariance. As with the other models, its residuals and residual covariances were not invariant across grade levels, based on the ΔCFI. Partial residual invariance required releasing the equality constraints on the residual variance of Listening Comprehension. Structurally, this model did not demonstrate invariance in covariances between factors. Releasing constraints on the covariance between Comprehension and Decoding allowed for an acceptable model fit. For this component of the model, the K-2 grade range demonstrated the greatest association between the two constructs, whereas the sixth- to eighth-grade range displayed the smallest association (see online supplement).
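The ΔCFI column in Table 4 follows a simple rule: each model is compared with the most recent model that was retained, so a partially invariant model is judged against the same reference as the fully constrained model it replaces. The sketch below reproduces that column from the reported CFI values. The function name and step labels are hypothetical.

```python
from typing import List, Tuple

DELTA_CFI_CUTOFF = 0.01  # a change below .01 is treated as no meaningful loss of fit

# CFI values for the supplemental model, in the order reported in Table 4.
SUPPLEMENTAL_CFI: List[Tuple[str, float]] = [
    ("configural", 0.998),
    ("metric", 0.991),
    ("intercept", 0.986),
    ("residuals", 0.975),
    ("partial residuals", 0.978),
    ("latent variance", 0.974),
    ("latent covariance", 0.958),
    ("partial latent covariance", 0.968),
    ("latent means", 0.967),
]


def evaluate_ladder(steps: List[Tuple[str, float]]) -> List[Tuple[str, float, bool]]:
    """Compare each step with the last retained model; return (label, delta CFI, retained)."""
    reference_cfi = steps[0][1]
    results = []
    for label, cfi in steps[1:]:
        delta = round(reference_cfi - cfi, 3)
        retained = delta < DELTA_CFI_CUTOFF
        results.append((label, delta, retained))
        if retained:
            # Only a retained model becomes the reference for the next step.
            reference_cfi = cfi
    return results


supplemental_results = evaluate_ladder(SUPPLEMENTAL_CFI)
rejected_steps = [label for label, _, retained in supplemental_results if not retained]
print(rejected_steps)
```

Only the fully constrained residual and latent covariance steps exceed the cutoff, which matches the two places where constraints were released.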

Discussion

Developmental effects within reading and writing domains (e.g., Berninger, 1999; Hoover & Gough, 1990) suggest that factors measured by achievement batteries could vary across age/grade ranges. For instance, because decoding is more strongly associated with comprehension in younger students, and language skills are more strongly associated with comprehension in older students, loading of reading comprehension and word reading performance (or reading and listening comprehension) on a common factor could change across grade levels. The purpose of these analyses was to determine whether there were developmental effects within the KTEA-3 by assessing measurement and structural invariance across grade levels for three different models. These models included the oral/written language portions of the manual model, its conceptual inverse, and a model that included the KTEA-3 comprehension, expression, and decoding supplemental factors.

Although there were potentially small fit differences across grade ranges, collectively, these results suggest that the measurement properties of the KTEA-3 are generally invariant across grade ranges. Consistent with the reviewed changes in reading and writing development, the inverse model demonstrated a slightly better fit in the ninth- to 12th-grade range, and the supplemental model fit that grade range extremely well. However, when equality constraints were added across grade levels, this difference in model fit did not produce noninvariance. Generally, the only developmental differences observed in these analyses involved subtests’ residuals. As Keith (2015) explained, residual invariance is not considered critical in measurement invariance.

Structurally, both the manual and inverse models were also invariant. The factors described in these models demonstrated the same amount of variance across grade levels and are associated with each other to the same degree across groups. Factor means were also invariant, though this should be expected, as the KTEA-3 is normed explicitly so that mean scores are the same across age/grade. The supplemental model demonstrated variability in the relationship between the decoding and comprehension factors across grade ranges.

Implications for Practice

Clinicians can be comfortable using the KTEA-3 across grade levels and interpreting both its core academic composites and its supplemental cross-domain comprehension, expression, and decoding composites. These results indicate that the composites reflect the same skills across grade levels.

These analyses suggest that both sets of composites might be useful to interpret in clinical practice, but they also indicate that interpretational caution is needed. The small fit difference between the models and the need for correlated residuals indicate that subtests could contain systematic variance for multiple abilities. Reading Comprehension performance may reflect both a general reading ability and a comprehension ability. Spelling may reflect both general writing skill and decoding. These additional abilities, reflected in the supplemental composites, underscore Berninger and colleagues’ (2006) insight that the distinction between language, reading, and writing is artificial. Clinicians may find it challenging to determine the degree to which these abilities affect examinee performance on a measure. A model of reading assessment, such as that offered by Kilpatrick (2015), might capitalize on multiple abilities.
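The idea that one subtest can carry variance from two abilities can be sketched with a small calculation. For a standardized subtest loading on two correlated factors, the explained variance is the sum of the squared loadings plus a joint term that depends on the factor correlation. The loadings, the factor correlation, and the `explained_variance` helper below are hypothetical illustrations, not estimates from these analyses.

```python
def explained_variance(loading_a: float, loading_b: float, factor_corr: float) -> dict:
    """Decompose a standardized subtest's variance across two correlated factors."""
    part_a = loading_a ** 2                              # tied to factor A alone
    part_b = loading_b ** 2                              # tied to factor B alone
    joint = 2 * loading_a * loading_b * factor_corr      # shared through the factor correlation
    total = part_a + part_b + joint
    return {
        "factor_a": part_a,
        "factor_b": part_b,
        "joint": joint,
        "explained": total,
        "residual": 1.0 - total,
    }


# Hypothetical values for a subtest such as Reading Comprehension
# (NOT taken from the article): general reading = A, comprehension = B.
parts = explained_variance(loading_a=0.50, loading_b=0.35, factor_corr=0.80)
for label, value in parts.items():
    print(f"{label}: {value:.4f}")
```

In this made-up case, much of the explained variance cannot be assigned to either ability alone, which is one way to picture why it is hard to judge how far each ability drives an examinee’s score.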

Limitations and Future Directions

These findings should be interpreted in light of a number of limitations. First, only three models were tested. Other models may also provide a strong fit to the KTEA-3 standardization sample. For instance, the large correlations between latent factors suggest the presence of a higher order factor. Second, these models include only a subset of the KTEA-3 subtests. These results might change if additional subtests, such as Reading Vocabulary, were also included in the model. These additional subtests were omitted because they were not part of the manual model. Third, results may differ in other samples, such as a sample of students with disabilities. If these analyses were replicated with students with word reading disabilities, fit differences between these models may be greater across development. Fourth, as an anonymous reviewer hypothesized, grouping grade bands based on the raw scores where item types change, or where there are large differences between one grade and another, might alter these results. These natural breaks in raw scores may be masked when converted to standard scores.
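The point about a higher order factor can be made concrete with a small sketch. If a single higher order factor fully accounted for the correlations among three first-order factors, each correlation would equal the product of the two factors’ higher order loadings, and the loadings could be recovered from the correlations. The correlations, the factor labels, and the `higher_order_loadings` helper below are hypothetical, not estimates from the KTEA-3 standardization sample.

```python
from math import sqrt

# Hypothetical latent factor correlations (NOT values from the article).
phi = {
    ("reading", "written"): 0.72,
    ("reading", "oral"): 0.63,
    ("written", "oral"): 0.56,
}


def higher_order_loadings(phi: dict) -> dict:
    """Solve corr(i, j) = loading_i * loading_j for three first-order factors."""
    r_rw = phi[("reading", "written")]
    r_ro = phi[("reading", "oral")]
    r_wo = phi[("written", "oral")]
    return {
        "reading": sqrt(r_rw * r_ro / r_wo),
        "written": sqrt(r_rw * r_wo / r_ro),
        "oral": sqrt(r_ro * r_wo / r_rw),
    }


loadings = higher_order_loadings(phi)
for factor, value in loadings.items():
    print(f"{factor}: higher order loading = {value:.2f}")
```

With only three first-order factors this solution is just-identified, so it reproduces the correlations exactly and cannot by itself test whether a higher order factor is warranted; it only shows how large factor correlations translate into large higher order loadings.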

Next steps for research could include alternative analyses of the abilities measured by the KTEA-3 and replication in diverse groups. Because the KTEA-3 includes multiple new subtests, many of which were not included in the manual factor analysis, and because these subtests appear to measure multiple abilities, an exploratory factor analysis (EFA) may provide additional insight into the abilities measured by the KTEA-3.

Conclusion

The KTEA-3 is a strong measure of academic achievement. The results presented here suggest that a subsection of its reading, writing, and oral language measures may reflect multiple abilities, though their measurement is invariant across grade ranges. Clinicians can feel comfortable that the interpretation of these scores is similar across development. This information could be bolstered by additional exploratory analyses that include the full battery of subtests.

Footnotes

Author’s Note

Jason R. Parkin, Department of Educational Psychology, University of Washington. Standardization data from the Kaufman Test of Educational Achievement, 3rd Edition (KTEA-3). Copyright © 2014 NCS Pearson, Inc. Used with permission. All rights reserved.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplemental Material

Supplementary material is available for this article online.

References

Beaujean, A. A. (2012). BaylorEdPsych: R package for Baylor University Educational Psychology quantitative courses (R package Version 0.5). Retrieved from http://CRAN.R-project.org/package=BaylorEdPsych

Beaujean, A. A. (2014). Latent variable modeling using R: A step-by-step guide. New York, NY: Routledge.

Berninger, V. W. (1999). Coordinating transcription and text generation in working memory during composing: Automatic and constructive processes. Learning Disability Quarterly, 22, 99-112.

Berninger, V. W., Abbott, R. D., Jones, Wolf, B. J., Gould, Anderson-Youngstrom, . . . Apel (2006). Early development of language by hand: Composing, reading, listening and speaking connections; Three letter-writing modes; and fast mapping in spelling. Developmental Neuropsychology, 29, 61-92.

Berninger, V. W., Cartwright, Yates, Swanson, H. L., & Abbott (1994). Developmental skills related to writing and reading acquisition in intermediate grades: Shared and unique variance. Reading and Writing: An Interdisciplinary Journal, 6, 161-196.

Berninger, V. W., Whitaker, Feng, Swanson, H. L., & Abbott (1996). Assessment of planning, translating, and revising in junior high writers. Journal of School Psychology, 34, 23-52.

Berninger, V. W., Yates, Cartwright, Rutberg, Remy, & Abbott (1992). Lower-level developmental skills in beginning writing. Reading and Writing: An Interdisciplinary Journal, 4, 257-280.

Catts, Hogan, T. P., & Adlof, S. M. (2005). Developmental changes in reading and reading disabilities. In Catts & Kamhi (Eds.), Connections between language and reading disabilities (pp. 25-40). Mahwah, NJ: Lawrence Erlbaum.

De La Paz & Graham (1995). Dictation: Applications to writing for students with learning disabilities. Advances in Learning and Behavioral Disorders, 9, 227-247.

Dombrowski, S. C. (2015). Exploratory bifactor analysis of the WJ-III Achievement at School Age via the Schmid-Leiman orthogonalization procedure. Canadian Journal of School Psychology, 30, 34-50. doi:10.1177/0829573514560529

Dombrowski, S. C., McGill, R. J., & Canivez, G. L. (2018). Hierarchical exploratory factor analyses of the Woodcock-Johnson IV full test battery: Implications for CHC application in school psychology. School Psychology Quarterly, 33, 235-250.

Dombrowski, S. C., & Watkins, M. W. (2013). Exploratory and higher order factor analysis of the WJ-III full test battery: A school age analysis. Psychological Assessment, 25, 442-455. doi:10.1037/a0031335

Floyd, Meisinger, Gregg, & Keith (2012). An explanation of reading comprehension across development using models from Cattell-Horn-Carroll theory: Support for integrative models of reading. Psychology in the Schools, 49, 725-743. doi:10.1002/pits.21633

Garcia, J. R., & Cain (2014). Decoding and reading comprehension: A meta-analysis to identify which reader and assessment characteristics influence the strength of the relationship in English. Review of Educational Research, 84, 74-111. doi:10.3102/0034654313499616

Gough, P. B., Hoover, W. A., & Peterson, C. L. (1996). Some observations on a simple view of reading. In Cornoldi & Oakhill (Eds.), Reading comprehension difficulties: Processes and intervention (pp. 1-13). Mahwah, NJ: Lawrence Erlbaum.

Hajovsky, Reynolds, M. R., Floyd, R. G., Turek, J. J., & Keith, T. Z. (2014). A multigroup investigation of latent cognitive abilities and reading achievement relations. School Psychology Review, 43, 385-406.

Hayes, R. J., & Berninger, V. W. (2009). Relationships between idea generation and transcription: How act of writing shapes what children write. In Bazerman, Krut, Lunsford, McLeod, Null, Rogers, & Stansell (Eds.), Traditions of writing research (pp. 166-180). New York, NY: Taylor & Francis.

Hoover, W. A., & Gough, P. B. (1990). The simple view of reading. Reading and Writing: An Interdisciplinary Journal, 2, 127-160.

Kaufman, A. S., & Kaufman, N. L. (2014a). Kaufman Test of Educational Achievement, Third Edition. Bloomington, MN: NCS Pearson.

Kaufman, A. S., & Kaufman, N. L. (with Breaux, K. C.). (2014b). Technical and interpretive manual—Kaufman Test of Educational Achievement, Third Edition. Bloomington, MN: NCS Pearson.

Keith, T. Z. (2015). Multiple regression and beyond: An introduction to multiple regression and structural equation modeling (2nd ed.). New York, NY: Routledge.

Kilpatrick, D. A. (2015). Essentials of assessing, preventing, and overcoming reading difficulties. Hoboken, NJ: John Wiley.

Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198-1202.

R Development Core Team. (2015). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

Rosseel (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48, 1-36. Retrieved from http://www.jstatsoft.org/article/view/v048i02

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3-32.

Widaman, K. F., & Reise, S. P. (1997). Exploring the measurement invariance of psychological instruments: Applications in the substance use domain. In Bryant, K. J., Windle, & West, S. G. (Eds.), The science of prevention: Methodological advances from alcohol and substance abuse research (pp. 281-324). Washington, DC: American Psychological Association.
