Abstract

Latent variable modeling refers to a class of models that includes factor analysis, structural equation modeling (SEM), growth curve modeling, item response theory (IRT), and latent class analysis (LCA)—all staple techniques in educational and psychological research. Increasingly these techniques are treated in a common “latent variable” framework. However, statistics textbooks that take this integrated approach, such as Skrondal and Rabe-Hesketh (2004), may be somewhat inaccessible for applied researchers and graduate students in these fields and do not contain examples to use as a starting point. A textbook that provides a bird’s-eye view of these techniques and shows how to apply them in R (R Core Team, 2014) is therefore a very welcome addition to the literature.
Latent Variable Modeling With R (Finch & French, 2015) aims to be such a contribution. According to the publisher’s website, https://www.routledge.com/products/9780415832458, the book is: intended for use in graduate or advanced undergraduate courses in latent variable modeling, factor analysis, structural equation modeling, item response theory, measurement, or multivariate statistics taught in psychology, education, human development, and social and health sciences, researchers in these fields also appreciate this book’s practical approach. The book provides sufficient conceptual background information to serve as a standalone text. Familiarity with basic statistical concepts is assumed but basic knowledge of R is not.
The book is organized by model, covering exploratory factor analysis (EFA; Chapter 2) and confirmatory factor analysis (CFA; Chapter 3), SEM (Chapters 4 through 6), growth curve modeling (Chapter 7), mixture modeling (Chapter 8), and IRT (Chapters 9 and 10). The first chapter introduces a few R commands, whereas the last chapter aims to demonstrate how to simulate data for Monte Carlo and power studies. In each chapter, the model is explained conceptually, an applied example using data from psychology and education is analyzed using various R packages, and the output is discussed in some detail in a didactic manner. Data, as well as Microsoft Word files containing R code and output, are provided on the publisher’s website, https://www.routledge.com/products/9780415832458.
Specific Content of the Book
The first chapter shows how data can be read into R and how to remove missing values. The reader is not shown explicitly how to read in the SPSS data files provided on the website, although the appropriate function is mentioned.
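As a rough illustration of these two steps, the sketch below drops incomplete rows from a toy data frame and then shows one way an SPSS file could be read. The file name is hypothetical, and the use of read.spss from the foreign package is an assumption: this review does not say which function the book mentions.

```r
# Toy data frame with missing values (made up for illustration)
df <- data.frame(
  id    = 1:4,
  score = c(10, NA, 7, 3),
  age   = c(21, 35, NA, 40)
)

# Listwise deletion: keep only rows without any missing value
complete <- na.omit(df)
nrow(complete)

# Hypothetical SPSS file name; the block is skipped if the file is absent
spss_file <- "example_data.sav"
if (file.exists(spss_file) && requireNamespace("foreign", quietly = TRUE)) {
  # to.data.frame = TRUE returns a data frame instead of a list
  spss_df <- foreign::read.spss(spss_file, to.data.frame = TRUE)
  str(spss_df)
}
```

Listwise deletion is only one way to handle missing data and is used here purely because it matches the step described above.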
Chapter 2 discusses EFA, using factanal and fa from the psych library (Revelle, 2014). The model is introduced using linear algebra. Methods to determine the likely number of factors, including parallel analysis, are demonstrated.
Chapter 3 is about CFA and uses lavaan (Rosseel, 2012; Rosseel, Oberski, Byrnes, Vanbrabant, & Savalei, 2013). Again linear algebra is used to introduce the model, and various fit measures are discussed. An example is then analyzed. The authors have a preference for the diagonally weighted least squares (DWLS) estimation method rather than the more standard robust maximum likelihood (ML); in my view, this choice is somewhat idiosyncratic but defensible.
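To make the estimator choice concrete, the following minimal sketch fits the same CFA in lavaan with robust ML and with DWLS. It is not the book's example: it assumes the HolzingerSwineford1939 data set and the three-factor model that ship with lavaan.

```r
library(lavaan)

# Three-factor CFA on the example data shipped with lavaan
# (an assumption for illustration; not the data analyzed in the book)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Same model, two estimators
fit_mlr  <- cfa(model, data = HolzingerSwineford1939, estimator = "MLR")
fit_dwls <- cfa(model, data = HolzingerSwineford1939, estimator = "DWLS")

# Compare a few fit measures side by side
measures <- c("chisq", "df", "cfi", "rmsea")
round(cbind(MLR  = fitMeasures(fit_mlr,  measures),
            DWLS = fitMeasures(fit_dwls, measures)), 3)
```

Both fits have the same degrees of freedom because the model is identical; only the discrepancy function, and hence the test statistic and standard errors, differ.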
Chapter 4 introduces SEM. Path models with latent variables are fit, and model comparison is demonstrated didactically.
Chapter 5 discusses multiple group SEM and invariance testing. Contrary to most of the literature on invariance testing (e.g., Davidov, Schmidt, & Billiet, 2011; Steenkamp & Baumgartner, 1998), the authors use the term “invariance” to mean “metric invariance,” and “fully invariant” in this book includes all latent variable means and (co)variances. Generally, the discussion of the model comparisons and their interpretation is clear.
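For readers unfamiliar with the more common terminology, the sketch below fits the usual sequence of configural, metric (equal loadings), and scalar (equal loadings and intercepts) models in lavaan. It is an illustration under assumptions, not the book's code: the data set and grouping variable are the HolzingerSwineford1939 example shipped with lavaan.

```r
library(lavaan)

# Example model and data from lavaan (assumed for illustration)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Configural: same structure, all parameters free in each group
fit_config <- cfa(model, data = HolzingerSwineford1939, group = "school")

# Metric: loadings constrained equal across groups
fit_metric <- cfa(model, data = HolzingerSwineford1939, group = "school",
                  group.equal = "loadings")

# Scalar: loadings and intercepts constrained equal across groups
fit_scalar <- cfa(model, data = HolzingerSwineford1939, group = "school",
                  group.equal = c("loadings", "intercepts"))

# Likelihood ratio tests between successive nested models
lavTestLRT(fit_config, fit_metric, fit_scalar)
```

Each step adds equality constraints to the previous model, so the three models are nested and can be compared sequentially.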
Chapter 6 discusses models with feedback loops (“nonrecursive models” in SEM parlance) using
Chapter 7 on growth curve modeling is concise and clear. Model estimation and interpretation are demonstrated as well as average growth curve plotting using
Chapter 8 discusses latent class models using
Chapters 9 and 10 discuss parametric and nonparametric IRT using the R packages
The final chapter is a nice idea: It discusses how to perform Monte Carlo studies for the models in the other chapters. This may help readers perform simulation studies and power analysis. Simulations are exemplified for an SEM, IRT, and LCA model. The simulation code for SEM models uses self-programmed R code rather than existing packages for SEM simulation. The simulation code for
Cautionary Notes
The setup of the book is sensible, and it certainly contains useful information for those interested in latent variable modeling in R. Unfortunately, however, this first edition of the book is marred by a number of issues. Since some readers will benefit from guidance regarding these issues, some of them are listed in this section.
Throughout
The R packages used in the book are, for the most part, not cited. Readers who wish to use these packages can find information on proper citation under https://stat.ethz.ch/R-manual/R-patched/library/utils/html/citation.html
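The citation function documented at that address can be called directly from R. The short sketch below uses the stats package because it is always installed; the lavaan call is guarded since that package may not be present.

```r
# Citation for R itself
citation()

# Citation for a specific package; "stats" ships with every R installation
cit <- citation("stats")
print(cit, style = "text")

# BibTeX entry for use in a reference manager
toBibtex(cit)

# Citation for an add-on package, shown only if it is installed
if (requireNamespace("lavaan", quietly = TRUE)) {
  print(citation("lavaan"))
}
```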
Chapter 1
There is a syntax error on Page 3 (quotes are missing).
Chapter 2
There is some confusion as to whether Σ is the correlation or the covariance matrix on Page 12. Another syntax error can be found on Page 19 (variable names cannot start with numbers in R). In addition, the output formatting is jumbled so that the reader cannot tell which numbers belong to which columns (one example is on p. 34).
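The naming rule behind the Page 19 error is easy to check. The following sketch uses made-up variable names, not those from the book, to show the parse failure and two common workarounds.

```r
# A name that starts with a digit is not syntactically valid,
# so this assignment fails to parse
result <- tryCatch(
  parse(text = "1st_item <- 5"),
  error = function(e) "syntax error"
)
result

# make.names() converts strings to valid names, prefixing "X" where needed
bad_names  <- c("1st_item", "2nd_item", "score")
good_names <- make.names(bad_names)
good_names

# Backticks allow a non-syntactic name to be used as is
`1st_item` <- 5
`1st_item` + 1
```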
Chapter 3
On Page 53, models with different numbers of factors are compared using a χ² difference test, although the χ² distribution is not the correct reference distribution for this comparison (e.g., Andrews, 2001). Equation 3.4 is missing a bracket.
Chapter 4
The model is introduced in Equation 4.1 as η = Bγ + ζ rather than the usual η = Bη + Γξ + ζ (e.g., Bollen, 1989). Thus, it omits any effects between endogenous latent variables (ηs), although such effects are actually used in later example models. I also found the use of γ for the latent exogenous variable confusing since this usually denotes a regression coefficient. Page 75 incorrectly asserts that “mathematically a covariance that is bidirectional behaves in the same fashion as a direct path that is unidirectional,” which is not true in general: It is possible to generate nonequivalent models by replacing a bidirectional path with a unidirectional one in SEM. I also found it potentially confusing for students that the term “essentially identical [model fit]” is used to indicate both that two models are equivalent (have identical fit) and that one model does not fit much worse than another (has similar fit).
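For reference, the two forms of the structural model discussed for Equation 4.1 can be set side by side. The symbol definitions in the comments follow the conventional notation of Bollen (1989) and are supplied here as an aid to the reader; they are not quoted from the book.

```latex
% Structural model as given in Equation 4.1 of the book (as reported in this review)
\eta = B\gamma + \zeta

% Conventional form (e.g., Bollen, 1989)
%   \eta   : latent endogenous variables
%   \xi    : latent exogenous variables
%   B      : coefficients for effects among the endogenous variables
%   \Gamma : coefficients for effects of exogenous on endogenous variables
%   \zeta  : disturbances
\eta = B\eta + \Gamma\xi + \zeta
```

The term Bη in the conventional form is what carries effects between endogenous latent variables, and it has no counterpart in the book's version.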
Chapter 5
A notational inconsistency is introduced: x instead of y is now used as the observed indicator, and later on in the same chapter x is changed to mean a covariate (Equation 5.5 on p. 101). On Page 85, the notation makes it seem that the random variable ε is confused with its variance (it is the variance that is restricted to be equal, not the variable itself). On Page 93, a model is tested with equal intercepts, residual variances, and latent variable (co)variances but free loadings, which is not a standard approach (e.g., Davidov et al., 2011; Schmitt & Kuljanin, 2008; Vandenberg & Lance, 2000).
Chapter 6
Equations 6.5 and 6.7 have a notational inconsistency (B is assigned a new meaning twice). The definition of a nonrecursive model as a model in which “some of the relationships between the latent variables are bidirectional” (glossary on p. 314) is not that employed by other textbooks on SEM (e.g., Bollen, 1989), since this definition excludes cyclical models without bidirectional relationships such as
which are generally also considered nonrecursive.
Chapter 8
Equation 8.1, which introduces the model, has incorrect subscripts (these should not be 1, 2, … , j but the values of the indicator variables). Equation 8.2 is missing the conditioning on the covariate z and uses the idiosyncratic notation
Chapter 9
Equation 9.1 has an error (j should be J), as does Equation 9.7 (it gives Pearson’s χ² but omits the square). There are several notational inconsistencies in the equations; for example, p sometimes is the number of parameters and sometimes a probability, and subscripts are omitted or introduced.
Chapter 10
Formatting issues make some output difficult to read (e.g., p. 241).
Chapter 11
Unfortunately, the authors appear to have been unaware of the existence of the
Conclusion
Latent variable modeling comprises an important set of techniques for a wide range of fields, including educational and behavioral statistics. A clear textbook demonstrating how such models can be fit in open-source software is therefore a great idea. The authors have done a good job of selecting the methods to be discussed and of providing some easy-to-follow explanations of R output. The book may be most appropriate for more experienced researchers who already know the models behind the techniques and are merely seeking to learn how to apply them in R. Graduate students and others seeking to learn about these techniques should beware of the issues with the current edition of the book listed in the previous section.
Acknowledgment
Thanks are due to Minjeong Jeon and Jesper Tijmstra for their comments on earlier versions of this review.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Netherlands Organization for Scientific Research (NWO) [Veni grant number 451-14-017].
