Reproducing the Wechsler Intelligence Scale for Children–Fifth Edition

Abstract

One way to increase the reproducibility of research is for authors to describe their data analytic procedures in enough detail that others can replicate the results. The publishers of the Wechsler Intelligence Scale for Children–Fifth Edition (WISC-V) do not follow this practice when reporting their confirmatory factor analysis results. Consequently, scholars have been frustrated in their attempts to replicate the results in the WISC-V technical manual. I explain how the WISC-V publishers set the scale of their latent variables and demonstrate how to replicate the WISC-V models using the R statistical program.

Keywords

Wechsler Intelligence Scale for Children, WISC-V, research replication, latent variable scaling, effects coding

There has been an increasing call for psychology research to be more reproducible (Nosek et al., 2015; Open Science Collaboration, 2015). One recommendation is that “the materials, data, and analysis scripts should be made available in addition to the final article so that other researchers can reproduce the reported findings or test alternative explanations” (Asendorpf et al., 2013, p. 113). This suggestion applies not only to scholarship published in peer-reviewed articles but also to commercially available tests. Standard 7.4 of the Standards for Educational and Psychological Testing (4th edition) states, “Test documentation should summarize test development procedures, including descriptions and the results of the statistical analyses that were used in the development of the test. . .” (American Educational Research Association, American Psychological Association [APA], & National Council on Measurement in Education, 2014, p. 126).

In the technical manual for the Wechsler Intelligence Scale for Children–Fifth Edition (WISC-V; Wechsler, 2014a), the publishers did not report some essential information about their confirmatory factor analysis (CFA). Specifically, they did not indicate how they set the scales for their latent variables. Had they used traditional latent scale-setting methods, the omission would be understandable: there is no expectation that publishers report every detail of every data analysis, especially details that knowledgeable readers can be assumed to know. The WISC-V publishers did not use traditional scale-setting methods, however, which has caused considerable frustration for those wishing to replicate the WISC-V’s models (e.g., Canivez & Watkins, 2016). Consequently, those conducting independent analyses of the WISC-V scores have been unable to reproduce the models’ degrees of freedom (df), a crucial component in checking that latent variable models are specified correctly (Loehlin, 2004).

While being able to reproduce a model’s df might seem like a trivial issue, it is not. The df give an indication of a model’s parsimony and are used in many fit measures’ calculations. Thus, if a test publisher uses CFA as part of its argument for the scores’ validity, it is essential for individuals examining this validity evidence to be able to “follow which model parameters are free, fixed, or constrained to a value for identification or for another purpose” (Boomsma, Hoyle, & Panter, 2012, pp. 343-344). Hoyle and Isherwood (2013) thought this issue was so important that they added the criterion of being able to derive a model’s df to their latent variable model supplement of APA’s journal article reporting standards (JARS; APA Publications and Communications Board Working Group on JARS, 2008).

Effects Coding

The WISC-V publishers used the effects-coding method to set the scale of their latent variables. Little, Slegers, and Card (2006) developed this method for scaling a latent variable to be analogous to effects coding in ANOVA. For a single latent variable, it requires that the loadings have an average value of one or, equivalently, that the loadings sum to the number of unique indicator variables.¹ This constraint is shown in Equation 1.

\sum_{i = 1}^{p_{r}} \lambda_{i_{r}} = p_{r}, \qquad (1)

where r indexes one specific latent variable, \lambda_{i_{r}} is the loading of the ith indicator variable on the rth latent variable, and p_r is the number of indicator variables for the rth latent variable. Using this constraint scales the latent variance to be the average of the indicator variables’ variances. Little et al. argued that this method provides an optimal balance across a latent variable’s possible indicators to establish its scale.
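
The constraint in Equation 1 can be sketched in lavaan syntax. The example below is only an illustration under stated assumptions: it uses the HolzingerSwineford1939 data set that ships with lavaan (not WISC-V data), and the factor names and loading labels (a1, b1, c1, and so on) are arbitrary choices made here. It fits the same three-factor model twice, once with lavaan's default marker-variable scaling and once with one effects-coding constraint per latent variable.

```r
library(lavaan)

# Marker-variable scaling (lavaan default): first loading of each factor fixed to 1
marker_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Effects coding (Equation 1): NA* frees the first loading, every loading is
# labeled, and each factor's loadings are constrained to sum to its number
# of indicators (3 here)
effects_model <- '
  visual  =~ NA*x1 + a1*x1 + a2*x2 + a3*x3
  textual =~ NA*x4 + b1*x4 + b2*x5 + b3*x6
  speed   =~ NA*x7 + c1*x7 + c2*x8 + c3*x9
  a1 + a2 + a3 == 3
  b1 + b2 + b3 == 3
  c1 + c2 + c3 == 3
'

fit_marker  <- cfa(marker_model,  data = HolzingerSwineford1939)
fit_effects <- cfa(effects_model, data = HolzingerSwineford1939)

# Compare chi-square and df across the two scaling methods
fitMeasures(fit_marker,  c("chisq", "df"))
fitMeasures(fit_effects, c("chisq", "df"))
```

Because each latent variable carries exactly one identification constraint in both versions, the two fits should report the same df and, within numerical tolerance, the same χ². Only the metric of the loadings and latent variances differs.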

Modified Effects-Coding Method Used for WISC-V Models

Using effects coding should affect only the scale of the latent variable; it should not affect model fit. Model fit measures, including df, should be identical under effects coding and under more traditional latent variable scaling methods (Little, Card, Slegers, & Ledford, 2007). Canivez and Watkins (2016) could not replicate the results in the WISC-V technical manual because they consistently obtained more df than the manual reports, which indicates that the WISC-V publishers modified Little et al.’s constraint (i.e., Equation 1).

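The variant of effects coding the WISC-V publishers used places all the loadings in the model under a single constraint, rather than constraining each latent variable’s loadings separately. This is shown in Equation 2.

\sum_{i = 1}^{p^{'}} \lambda_{i} = p^{'}, \qquad (2)

where p′ is the number of indicator variables summed across all latent variables in the entire model, that is, the total number of loadings. For Model 5e in the appendix, p′ = 23: an indicator that loads on more than one latent variable is counted once per loading, and the first-order latent variables are counted as indicators of the higher order latent variable.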

To date, the WISC-V publishers have not stated in any of their released documentation for the instrument that they used effects coding. By exploring a variety of latent variable scaling methods, however, I was able to replicate the results from the WISC-V technical manual using the constraint shown in Equation 2. To demonstrate that the constraint shown in Equation 2 is the one used in the WISC-V technical manual (Wechsler, 2014b), I provide R syntax (R Development Core Team, 2015) in the appendix to fit the WISC-V CFA model preferred by the publisher (Model 5e). I use the lavaan package (Rosseel, 2012) and fit the model using the all-ages summary statistics provided in the WISC-V technical manual (Wechsler, 2014b).² In Table 1, I compare my results with those provided in the technical manual. The χ² and model fit values are not identical because I used summary statistics and maximum likelihood estimation, whereas the WISC-V publishers used individual test scores and weighted least squares estimation. Nonetheless, the values are very close and, more importantly, the df are identical.
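
The df of 92 can be checked by hand by counting parameters in the Model 5e syntax given in the appendix. The sketch below does that arithmetic in base R. The counts are read off the appendix syntax, and the "traditional" figure rests on an assumption made here for illustration: that each of the six latent variables would receive exactly one identification constraint (for example, a marker loading fixed to one).

```r
# Unique variances and covariances among the 16 subtests
n_indicators <- 16
n_moments <- n_indicators * (n_indicators + 1) / 2

# Loadings per latent variable, counted from the appendix syntax
first_order_loadings  <- c(F1 = 5, F2 = 2, F3 = 4, F4 = 4, F5 = 3)
second_order_loadings <- 5
n_loadings <- sum(first_order_loadings) + second_order_loadings

# Residual variances of the subtests, plus variances of the five
# first-order disturbances and of g
n_residual_variances <- 16
n_latent_variances   <- 6
n_parameters <- n_loadings + n_residual_variances + n_latent_variances

# Modified effects coding: one constraint for the whole model
df_modified <- n_moments - (n_parameters - 1)

# Assumed traditional scaling: one constraint per latent variable
df_traditional <- n_moments - (n_parameters - 6)

c(moments = n_moments, parameters = n_parameters,
  df_modified = df_modified, df_traditional = df_traditional)
```

Under these counts there are 136 moments and 45 parameters, so the single constraint in Equation 2 leaves 44 free parameters and 92 df, whereas six separate constraints would leave 39 free parameters and 97 df. This agrees with independent analysts obtaining more df than the manual reports.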

Table 1.

Results From Fitting Model 5e From the WISC-V Technical Manual and the Modified Effects-Coding Scaling.

	Technical Manual^a	Modified Effects Coding
Estimator	Weighted least squares	Maximum likelihood
Data used	Individual test scores	Covariances
χ²	353.00	362.58
Degrees of freedom	92	92
Comparative fit index	.98	.98
Tucker–Lewis non-normed fit index	.98	.98
Root mean square error of approximation	.04	.04
Akaike information criterion	441	451^b
Bayesian information criterion	692	701^c

Note. WISC-V = Wechsler Intelligence Scale for Children–Fifth Edition.

^a Results are from the All Ages group for Model 5e (Wechsler, 2014b, p. 82).

^b Calculated via χ² + 2 × number of free parameters to match the technical manual’s calculation.

^c Calculated via χ² + ln(n) × number of free parameters to match the technical manual’s calculation.
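
The information criteria in Table 1 can be recomputed from the formulas in the table notes. The sketch below assumes 44 free parameters (45 model parameters minus the single constraint, a count inferred here from the appendix syntax rather than stated in the technical manual) and n = 2,200, the sample size used in the appendix.

```r
# Chi-square values from Table 1
chisq <- c(manual = 353.00, replication = 362.58)

n_free <- 44    # assumption: 45 parameters minus 1 constraint
n_obs  <- 2200  # sample size used in the appendix syntax

# Formulas from the Table 1 notes
aic <- chisq + 2 * n_free
bic <- chisq + log(n_obs) * n_free

round(rbind(AIC = aic, BIC = bic))
```

Rounded to whole numbers, these give 441 and 451 for the Akaike information criterion and 692 and 701 for the Bayesian information criterion, matching Table 1.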

Cautions on Using Modified Effects Coding

Despite its use by the WISC-V publishers, the modified effects-coding scaling should be employed with caution because it could produce several problems. First, it provides a parameterization of the latent variables that is not equivalent to more traditional scaling methods, which is contrary to Little et al.’s (2007) intention. Second, it changes the interpretation of the latent variance’s scale to be the average of all variables’ variances, not just those of the indicators for a specific latent variable. Because the WISC-V publishers’ preferred model involves a higher order latent variable, all the variables’ metrics are in a manifest-latent hybrid scale that does not have a simple interpretation. Third, while it will not necessarily influence the χ² values, the df difference will change the values of fit measures that use df (or the number of estimated parameters) in their calculation. The magnitude of this change depends on the particular model, the sample size, and how a given fit measure uses the df. Until the modified effects-coding method is better understood, it should likely be used only when replicating the CFA models from the WISC-V technical manual.
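
To illustrate the third point, the sketch below computes the root mean square error of approximation for the same χ² value at two df values. It is an illustration under assumptions, not a reanalysis: the χ² is held fixed at the Table 1 value even though the two parameterizations are not equivalent, the df of 97 assumes one identification constraint per latent variable, and the formula shown is one common version (some programs use n rather than n − 1).

```r
# One common RMSEA formula; some programs use n instead of n - 1
rmsea <- function(chisq, df, n) {
  sqrt(max(chisq - df, 0) / (df * (n - 1)))
}

chisq_value <- 362.58  # from Table 1, held fixed for illustration
n_obs <- 2200

rmsea_modified    <- rmsea(chisq_value, df = 92, n = n_obs)
rmsea_traditional <- rmsea(chisq_value, df = 97, n = n_obs)  # assumed df

round(c(modified = rmsea_modified, traditional = rmsea_traditional), 4)
```

With fewer df the index is slightly larger, although in this case both values still round to .04. With a smaller sample or a model with fewer df overall, the same five-df shift could matter more.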

Clinically, the sequelae of using the modified effects-coding scaling are only speculative at this point because there has not been any research on it. At one extreme, the scaling method used (and the resulting difference in df) could be mere minutiae, of import only to psychometrically inclined clinicians and scholars. At the other extreme, because the modified effects-coding scaling method affects the model’s df, it could lead to selection of a non-optimal model, which could then lead to an inaccurate interpretation of whatever the test scores are measuring. As an example, for a given model the modified effects coding will produce fewer df than traditional scaling techniques. This means that for a given χ² value, the p values will be smaller when using modified effects coding. Consequently, using the modified effects coding could lead to rejecting latent variable models that actually fit the data well. In any case, future research can now examine these issues because independent clinicians and scholars should now be able to replicate the WISC-V validation CFA models.
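
The p value example can be made concrete with base R's pchisq function. The χ² value below is hypothetical and chosen arbitrarily for illustration, and the df of 97 assumes one identification constraint per latent variable; neither number comes from the technical manual.

```r
chisq_value <- 115  # hypothetical chi-square value, for illustration only

# Upper-tail p values for the same chi-square at the two df values
p_modified    <- pchisq(chisq_value, df = 92, lower.tail = FALSE)
p_traditional <- pchisq(chisq_value, df = 97, lower.tail = FALSE)  # assumed df

c(modified = p_modified, traditional = p_traditional)
```

The p value at 92 df is the smaller of the two, so a model evaluated under the modified effects coding is more likely to be rejected at any fixed significance level.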

Appendix

R Syntax to Reproduce the Results of Selected Confirmatory Factor Analytic Models from the Wechsler Intelligence Scale for Children–Fifth Edition (WISC-V) Technical Manual.

# load the lavaan package
library(lavaan)

# import unique WISC-V correlations from Table 5.1 of the technical manual
wisc5.values <-
c(.68,.65,.71,.59,.60,.56,.46,.47,.47,.38,.48,.51,.48,.40,.60,.45,.45,.45,.39,.47,.47,.46,.49,.48,.38,.47,.50,.47,.39,.42,.40,.35,.34,.39,.35,.33,.54,.53,.55,.46,.46,.46,.45,.50,.37,.47,.46,.46,.42,.42,.42,.44,.43,.34,.55,.39,.38,.36,.36,.35,.36,.38,.35,.30,.43,.51,.48,.49,.47,.43,.38,.39,.43,.40,.33,.54,.65,.49,.23,.21,.20,.24,.31,.20,.24,.19,.19,.31,.28,.25,.29,.28,.25,.29,.26,.34,.28,.29,.23,.23,.32,.32,.27,.28,0.58,.11,.10,.13,.14,.19,.13,.13,.11,.11,.15,.11,.09,.11,.30,.33)

# create full correlation matrix (values are read as the lower triangle, row by row)
wisc5.cor <- lav_matrix_lower2full(wisc5.values, diagonal = FALSE)
diag(wisc5.cor) <- 1

# name the variables
wisc5.var <-
c("SI","VC","IN","CO","BD","VP","MR","FW","PC","AR","DS","PS","LN","CD","SS","CA")
dimnames(wisc5.cor) <- list(wisc5.var, wisc5.var)

# import WISC-V standard deviations, which are all 3
wisc5.sd <- rep(3, 16)

# convert correlations to covariances
wisc5.cov <- cor2cov(wisc5.cor, wisc5.sd)

# specify WISC-V Model 5e using modified effects coding
# NA* frees the first loading of each latent variable; the letters label the loadings
wisc5.model5e <- '
F1 =~ NA*SI + a*SI + b*VC + c*IN + d*CO + e1*AR
F2 =~ NA*BD + h*BD + i*VP
F3 =~ NA*MR + j*MR + k*FW + l*PC + e2*AR
F4 =~ NA*AR + e3*AR + f*DS + m*PS + g*LN
F5 =~ NA*CD + n*CD + o*SS + p*CA
g =~ NA*F1 + q*F1 + r*F2 + s*F3 + t*F4 + u*F5
# modified effects-coding constraint: all 23 loadings sum to 23
a+b+c+d+e1+e2+e3+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u == 23
'

# fit model and produce parameter estimates and model fit values
wisc5.fit5e <- cfa(wisc5.model5e, sample.cov = wisc5.cov, sample.nobs = 2200)
summary(wisc5.fit5e, fit.measures = TRUE, standardized = TRUE)

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing (4th ed.). Washington, DC: Author.

APA Publications and Communications Board Working Group on Journal Article Reporting Standards. (2008). Reporting standards for research in psychology: Why do we need them? What might they be? American Psychologist, 63, 839-851. doi:10.1037/0003-066X.63.9.839

Asendorpf, J. B., Conner, M., De Fruyt, F., De Houwer, J., Denissen, J. J. A., Fiedler, K., . . . Wicherts, J. M. (2013). Recommendations for increasing replicability in psychology. European Journal of Personality, 27, 108-119. doi:10.1002/per.1919

Beaujean, A. A. (2014). Latent variable modeling using R: A step-by-step guide. New York, NY: Routledge.

Boomsma, A., Hoyle, R. H., & Panter, A. T. (2012). The structural equation modeling research report. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 341-358). New York, NY: Guilford Press.

Canivez, G. L., & Watkins, M. W. (2016). Review of the Wechsler Intelligence Scale for Children-Fifth Edition: Critique, commentary, and independent analyses. In A. S. Kaufman, S. E. Raiford, & D. L. Coalson (Eds.), Intelligent testing with the WISC-V (pp. 683-702). Hoboken, NJ: Wiley.

Hoyle, R. H., & Isherwood, J. C. (2013). Reporting results from structural equation modeling analyses in Archives of Scientific Psychology. Archives of Scientific Psychology, 1, 14-22. doi:10.1037/arc0000004

Little, T. D., Card, N. A., Slegers, D. W., & Ledford, E. C. (2007). Representing contextual effects in multiple-group MACS models. In T. D. Little, J. A. Bovaird, & N. A. Card (Eds.), Modeling ecological and contextual effects in longitudinal studies (pp. 121-147). Mahwah, NJ: Lawrence Erlbaum.

Little, T. D., Slegers, D. W., & Card, N. A. (2006). A non-arbitrary method of identifying and scaling latent variables in SEM and MACS models. Structural Equation Modeling: A Multidisciplinary Journal, 13, 59-72. doi:10.1207/s15328007sem1301_3

Loehlin, J. C. (2004). Latent variable models: An introduction to factor, path, and structural equation analysis (4th ed.). Mahwah, NJ: Lawrence Erlbaum.

Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., . . . Yarkoni, T. (2015). Promoting an open research culture: Author guidelines for journals could help to promote transparency, openness, and reproducibility. Science, 348, 1422-1425. doi:10.1126/science.aab2374

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349. doi:10.1126/science.aac4716

R Development Core Team. (2015). R: A Language and Environment for Statistical Computing (Version 3.2) [Computer program]. Vienna, Austria: R Foundation for Statistical Computing.

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1-36. Retrieved from http://www.jstatsoft.org/v48/i02/

Wechsler, D. (2014a). Wechsler Intelligence Scale for Children (5th ed.). Bloomington, MN: PsychCorp.

Wechsler, D. (2014b). WISC-V technical and interpretative manual. Bloomington, MN: PsychCorp.
