Abstract
Two- and three-level designs in educational and psychological research can involve entire populations of Level-3 and possibly Level-2 units, such as schools and educational districts nested within a given state, or neighborhoods and counties in a state. Such designs are of increasing relevance in empirical research owing to the growing popularity of large-scale studies in these and cognate disciplines. The present note discusses a readily applicable procedure for point and interval estimation of the proportions of second- and third-level variances in such multilevel settings, which may also be employed in model choice considerations regarding ensuing analyses for response variables of interest. The method is developed within the framework of the latent variable modeling methodology, is readily utilized with widely used software, and is illustrated with an example.
Multilevel modeling has witnessed an impressive growth in popularity across the educational and psychological sciences over the past several decades (e.g., Rabe-Hesketh & Skrondal, 2012). This may well be explained at least in part by the increasing number of studies in these and cognate disciplines that collect hierarchical data characterized by observations nested within higher order units (e.g., Hox et al., 2018). Two- and three-level models, which currently represent the most widely used classes of hierarchical models, account for the resulting lack of observation independence within clusters and offer a number of analytic benefits that cannot be matched by single-level modeling approaches (e.g., Goldstein, 2011). An important question that often arises in such multilevel designs concerns the extent to which variability between Level-2 and across Level-3 units is associated with observed variability at the lowest level of the data hierarchy, which usually consists of the units of analysis on which one or more response variables are observed (e.g., Raykov et al., 2016; see also Raykov, 2010).
Contemporary educational and psychological study designs often make it possible to collect data from all Level-3 and potentially all Level-2 units in multilevel settings, in particular those where not all Level-1 units are included but are instead sampled because the populations of studied subjects are unmanageably large (e.g., Carlson et al., 2020; Francis & Darity, 2021). For instance, studies of mathematics ability frequently obtain data from all educational districts in a state and possibly all schools within each district, while not all students are administered a given measure (or measures) due to time and related logistic constraints, or because information for district-level decisions is needed within shorter timeframes than required for exhaustive study of the complete student body. Similarly, social surveys often aim to collect data from all neighborhoods in a county and possibly all counties in a state, whereas a census of the entire population is not feasible and is likely to be prohibitive in terms of cost and time involved (see also the “Conclusion” section; e.g., Menold & Zuell, 2016). Under those circumstances, the question about the proportion of second- and/or third-level variance in Level-1 outcome variability takes on particular importance. The reason is that it then becomes possible to evaluate higher order variance precisely, since the entire pertinent population of units at those levels is observed. This opportunity is of special interest to empirical scholars as it can also improve model choice in studies with hierarchical data.
The present note addresses these variance proportion queries by discussing a latent variable modeling-based procedure for evaluating the percentages of observed variance at the third and second levels in three-level designs characterized by exhaustive observation of Level-3 and possibly Level-2 units and sampling at Level 1. The aim of the remaining discussion is thereby to extend the earlier work by Raykov (2010) and Raykov et al. (2016), which focused on designs with sampling at all three levels, to settings where population data are available (a) at both Level 2 and Level 3 or (b) only at Level 3. The resulting point and interval estimates supply important information about the extent to which second-level and third-level variability across all pertinent population units contributes to the variance of examined measures in the Level-1 population under investigation, such as students, respondents, patients, or clients. The outlined procedure is illustrated using an example.
Background and Notation
The following discussion is based on a generic three-level setting, for example, students nested within schools that in turn are clustered within educational districts, or respondents nested within neighborhoods clustered within counties (cf. Raykov, 2010; Raykov et al., 2016). Suppose $y_{ijk}$ denotes the score on an outcome variable, say a certain ability test, of the $i$th Level-1 unit (e.g., student) in the $j$th Level-2 unit (school) from the $k$th Level-3 unit (educational district; $i = 1, \ldots, n_{jk}$; $j = 1, \ldots, J_k$; $k = 1, \ldots, K$; $n_{jk}, J_k, K > 1$). The (fully) unconditional three-level model, by analogy to the popular (fully) unconditional two-level model (e.g., Goldstein, 2011), represents the decomposition of the response score into a grand mean and corresponding first-, second-, and third-level mean deviations. This model is defined as follows:
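$$y_{ijk} = \pi_{0jk} + e_{ijk}, \tag{1}$$
$$\pi_{0jk} = \beta_{00k} + r_{0jk}, \tag{2}$$
and
$$\beta_{00k} = \gamma_{000} + u_{00k}. \tag{3}$$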
In Equations (1) through (3), $e_{ijk}$ denotes the deviation of the outcome measure $y_{ijk}$ from its mean $\pi_{0jk}$ in its $(j, k)$th Level-2 unit; $r_{0jk}$ is the deviation of this mean from that of the $k$th Level-3 unit, $\beta_{00k}$; and $u_{00k}$ stands for the discrepancy between the latter and the grand mean $\gamma_{000}$ on the dependent variable of interest. As usual, we assume that the mean deviations $e_{ijk}$, $r_{0jk}$, and $u_{00k}$ are uncorrelated with each other as well as normally distributed with mean 0 and corresponding variances $\sigma^2$, $\tau_\pi$, and $\tau_\beta$ (e.g., Rabe-Hesketh & Skrondal, 2012).¹ The associated three-level mixed model then results by substitution as
$$y_{ijk} = \gamma_{000} + u_{00k} + r_{0jk} + e_{ijk}, \tag{4}$$
$i = 1, \ldots, n_{jk}$; $j = 1, \ldots, J_k$; $k = 1, \ldots, K$, where $\gamma_{000}$ is its fixed effect and the remaining terms on its right-hand side (RHS) are the respective Level-3, Level-2, and Level-1 random effects (from left to right).
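To make the decomposition concrete, the following minimal Python sketch simulates scores as a grand mean plus uncorrelated, normally distributed Level-3, Level-2, and Level-1 deviations, as described above. The function name, the balanced design (equal numbers of units per cluster), and all numerical values are illustrative assumptions rather than part of the original study.

```python
import random
import statistics


def simulate_three_level(K, J, n, gamma000, tau_beta, tau_pi, sigma2, seed=0):
    """Simulate y_ijk = gamma000 + u_00k + r_0jk + e_ijk for a balanced design.

    K, J, n: numbers of Level-3 units, Level-2 units per Level-3 unit, and
    Level-1 units per Level-2 unit (balance is an assumption of this sketch).
    """
    rng = random.Random(seed)
    rows = []  # each row is (k, j, y)
    for k in range(K):
        u = rng.gauss(0.0, tau_beta ** 0.5)        # Level-3 random effect
        for j in range(J):
            r = rng.gauss(0.0, tau_pi ** 0.5)      # Level-2 random effect
            for _ in range(n):
                e = rng.gauss(0.0, sigma2 ** 0.5)  # Level-1 residual
                rows.append((k, j, gamma000 + u + r + e))
    return rows


# Hypothetical parameter values chosen only for illustration.
rows = simulate_three_level(K=300, J=10, n=10, gamma000=50.0,
                            tau_beta=2.0, tau_pi=1.0, sigma2=7.0)
scores = [y for _, _, y in rows]
total_var = statistics.pvariance(scores)
print(round(statistics.mean(scores), 2), round(total_var, 2))
```

Because the three random effects are uncorrelated, the overall variance of the simulated scores should be close to the sum of the three variance components (here 2 + 1 + 7 = 10), up to sampling error.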
As indicated earlier, this note is concerned with the query of how to point and interval estimate the proportion of variance in the observed measure $y$ stemming from variability between Level-2 and across Level-3 units, when all of the population units are observed at the latter or at both of these levels, unlike at Level 1 where the units are only sampled from the corresponding population due to various limiting constraints. For instance, as in the earlier example with students nested within schools within educational districts, the remainder of the note deals with the case in which all districts (within a given state, say) and possibly also all schools within each district are available for examination, with the response variable of interest measured on random samples of students, subjects, patients, or clients within each of the Level-2 units.
Proportion of Third- and Second-Level Variances in a Response Measure With Completely Observed Level-2 and/or Level-3 Populations
Using Equation (4) and following Raykov et al. (2016), the population proportion of second-level variance in the observed outcome is defined as
$$\kappa = \frac{\tau_\pi}{\sigma^2 + \tau_\pi + \tau_\beta} \tag{5}$$
and referred to as “proportion of second-level variance” (PSLV), with all quantities appearing in the RHS of Equation (5) being population parameters. When it is appropriate to use the maximum likelihood (ML) method of estimation, an ML point estimator of the PSLV is obtained by substituting the ML estimators of the parameters involved in the RHS of Equation (5), due to the ML invariance property (e.g., Casella & Berger, 2002).
Similarly, as in Raykov (2010), the population proportion of third-level variance (PTLV) in the observed response, denoted $\delta$, is defined as
$$\delta = \frac{\tau_\beta}{\sigma^2 + \tau_\pi + \tau_\beta} \tag{6}$$
and by analogy its point estimator results by substitution of the pertinent ML estimators when ML use is appropriate. Procedures for interval estimation of the PTLV and PSLV, based on an initial monotone transformation approach (Browne, 1982), are outlined and discussed in detail in Raykov (2010) and Raykov et al. (2016), respectively, where the relevant software applications are also described. This approach was used in Raykov (2010) and Raykov et al. (2016) in a three-level setting where samples from all three levels were available. However, as demonstrated below, the same transformation approach to interval estimation is also applicable in the setting of concern here, which differs from that earlier research in the complete examination of Level-3 and possibly Level-2 unit populations while sampling only from the Level-1 population of units of analysis.
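As a small numerical illustration of the two indices, the sketch below computes the PSLV and PTLV by substitution, assuming each index is the pertinent higher-level variance divided by the sum of the three variance components. The function name and the input values are hypothetical.

```python
def variance_proportions(sigma2, tau_pi, tau_beta):
    """Return (PSLV, PTLV) as shares of the total outcome variance.

    sigma2, tau_pi, tau_beta: Level-1, Level-2, and Level-3 variances.
    """
    total = sigma2 + tau_pi + tau_beta
    if total <= 0.0:
        raise ValueError("total variance must be positive")
    return tau_pi / total, tau_beta / total


# Hypothetical variance components, for illustration only.
pslv, ptlv = variance_proportions(sigma2=7.0, tau_pi=1.0, tau_beta=2.0)
print(pslv, ptlv)
```

When maximum likelihood estimates of the three variances are supplied as inputs, the returned values are the corresponding substitution estimates, in line with the invariance property mentioned above.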
Point Estimation
For the aims of this note, the relevant counterparts of the PSLV and PTLV indices in Equations (5) and (6) are readily obtained for the case where all pertinent units are observed in at least one of the higher order levels. To emphasize the difference from what may currently be viewed as usual multilevel designs in educational and psychological research, with proper samples from every relevant population, we denote in this article the respective population variances at the third and second levels as $t_\beta$ and $t_\pi$ when all of their units are available for observation and examination. In such cases, the respective population PTLV index of concern in the following discussion, denoted $\delta(2,3)$, is
$$\delta(2,3) = \frac{t_\beta}{\sigma^2 + t_\pi + t_\beta} \tag{7}$$
and the relevant population PSLV index, denoted $\kappa(2,3)$, is
$$\kappa(2,3) = \frac{t_\pi}{\sigma^2 + t_\pi + t_\beta}. \tag{8}$$
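Similarly, when only the units at Level 3 are observed exhaustively, the corresponding population PTLV and PSLV indices of importance for the remainder are
$$\delta(3) = \frac{t_\beta}{\sigma^2 + \tau_\pi + t_\beta} \tag{9}$$
and
$$\kappa(3) = \frac{\tau_\pi}{\sigma^2 + \tau_\pi + t_\beta}. \tag{10}$$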
We next observe that the RHS of Equation (4) contains three latent variables: the Level-3, Level-2, and Level-1 random effects $u_{00k}$, $r_{0jk}$, and $e_{ijk}$, respectively. For this reason, that equation can be viewed as defining a latent variable model (e.g., B. O. Muthén, 2002). As such, it can be fitted using latent variable modeling software, for instance the popular package Mplus (L. K. Muthén & Muthén, 2021; cf. Raykov et al., 2016). In three-level studies with all units observed at Levels 2 and 3, the resulting estimates of the second- and third-level variances are then in fact the values of the population quantities $t_\pi$ and $t_\beta$. By direct substitution of the latter, point estimates of the PTLV and PSLV indices $\delta(2,3)$ and $\kappa(2,3)$ are obtained from Equations (7) and (8), respectively. All of these activities are automated in a latent variable modeling application, for instance with Mplus, by defining these indices as external parameters representing the right-hand sides of their defining Equations (7) and (8), respectively. (See Appendix A for the Mplus source code needed to attain this aim, and the next section for an example.) Likewise, in three-level studies where only the Level-3 units are exhaustively observed, use of the same generic approach with substitution only of the Level-3 variance $t_\beta$ furnishes point estimates of the PTLV and PSLV indices $\delta(3)$ and $\kappa(3)$ as defined in Equations (9) and (10), which are the relevant ones in that case (cf. Raykov et al., 2016). An empirical example of the discussed procedure for point estimation of these PTLV and PSLV indices for three-level studies with exhaustive observation of all higher order units is provided in the following illustration section.
Interval Estimation
Interval estimates of the PTLV and PSLV indices in Equations (7) through (10) that are the concern of this article are rendered in three steps. First, the population response variances at each level where all units are observed are furnished as described in the immediately preceding subsection. Second, these population values are substituted into the right-hand sides of the pertinent ones among Equations (7) through (10), and the same estimation procedure is utilized again, using a correspondingly restricted version of the initially fitted three-level model (see the next section for a specific application). The benefit of this second fit lies in the provision of standard errors for the respective PTLV or PSLV parameter estimates. These standard errors, which are critical for achieving the aims of the present note, would not be obtained in conventional (default) applications of the software employed in three-level modeling (see also the next section); in addition, these standard applications do not provide confidence intervals for the PSLV and PTLV indices of concern here. In the third and final step, with the corresponding pairs of point estimates and standard errors for the PTLV or PSLV indices obtained in this manner, use of the initial monotone transformation approach mentioned above (e.g., Raykov, 2010; Raykov et al., 2016) renders a confidence interval at any prespecified confidence level (e.g., 95%) for any of these indices. This interval estimation method is automated in the R function supplied in Appendix B.
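The third step can be sketched as follows. This is not the Appendix B function; it is a minimal Python illustration that assumes the logit as the monotone transformation (a common choice for quantities bounded between 0 and 1) together with a first-order delta-method standard error. The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def ci_proportion_logit(estimate, se, level=0.95):
    """Confidence interval for a proportion via a logit transformation.

    estimate: point estimate of the variance proportion, strictly in (0, 1).
    se: its standard error on the original (proportion) scale.
    """
    if not 0.0 < estimate < 1.0:
        raise ValueError("estimate must lie strictly between 0 and 1")
    if se <= 0.0 or not 0.0 < level < 1.0:
        raise ValueError("se must be positive and level must be in (0, 1)")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)     # two-sided normal quantile
    logit = math.log(estimate / (1.0 - estimate))   # transformed estimate
    se_logit = se / (estimate * (1.0 - estimate))   # delta-method standard error
    # Build the interval on the logit scale, then map it back to (0, 1).
    lower = 1.0 / (1.0 + math.exp(-(logit - z * se_logit)))
    upper = 1.0 / (1.0 + math.exp(-(logit + z * se_logit)))
    return lower, upper


# Hypothetical point estimate and standard error, for illustration only.
lower, upper = ci_proportion_logit(estimate=0.10, se=0.02)
print(round(lower, 3), round(upper, 3))
```

A practical advantage of building the interval on a transformed scale is that its endpoints cannot fall outside the admissible range (0, 1), which a symmetric interval on the original scale does not guarantee for small proportions.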
We illustrate next the outlined approach to point and interval estimation of the proportions of second- and third-level variance in designs with exhaustive observation of second- and third-level units.
Illustration on Data
In order to demonstrate the applicability and utility of the discussed estimation method, we use adapted data from a three-level study (duToit & duToit, 2001; the data set employed below can be obtained from the authors on request). In it, consider a mathematics ability measure administered to n = 3,134 students randomly sampled from the 50 U.S. states in the nine geographic regions of the country. With these features, the study is one where all Level-2 units (states) and all Level-3 units (regions) are observed. Studies of this type are becoming increasingly relevant in educational research, for instance when ability-related data are collected from sampled students from all schools within all districts in a state of interest.
To point and interval estimate the PTLV and PSLV indices $\delta(2,3)$ and $\kappa(2,3)$ of concern, as defined in Equations (7) and (8), we apply the three-step procedure outlined in the previous section. Accordingly, we first fit the three-level model defined in Equations (1) through (3) using the robust maximum likelihood method in order to accommodate some limited violations of normality (L. K. Muthén & Muthén, 2021). In Appendix A, the Mplus source code needed for this step is provided; its model constraint section renders statistically valid standard errors of the PSLV and PTLV indices, which subsequently make their interval estimation possible (see below for details). Because the fitted three-level model is saturated, it is associated with perfect fit (to the covariance and mean structure). Its estimates of the four parameters of relevance here are presented in Table 1.²
Table 1. Grand Mean and Random Effect Estimates for the Three-Level Model Fitted at Step 1 of the Outlined Estimation Procedure.
Note. The symbol “—” is used to indicate that a respective standard error is not applicable or relevant (see main text).
As seen from Table 1 and based on the earlier discussion, the third-level variance, $\tau_\beta$, is determined as .212 for the studied population; similarly, the second-level variance, $\tau_\pi$, is determined as .045. We stress that .212 and .045 are correspondingly the population values, denoted previously $t_\beta$ and $t_\pi$, of the third- and second-level variances of the outcome measure of concern. As indicated earlier, this follows from the fact that all Level-2 and Level-3 units in the population of interest are exhaustively observed in the study under consideration (see also Endnote 2).
In the second step of the outlined estimation method, we fit the same three-level model in Equations (1) through (3) while fixing the third- and second-level variances at .212 and .045, respectively, following the relevant Equations (7) and (8). (See Note 2 to Appendix A, which also provides the Mplus source code needed for that analysis.) This leads to the PTLV point estimate .219 (.012) and the PSLV point estimate .046 (.002; standard errors stated in parentheses).
In the third step of the procedure, using the point estimates and standard errors just obtained, we interval estimate the PTLV and PSLV indices with the earlier mentioned initial transformation approach, which as indicated before is implemented in the R function “ci.pslv” in Appendix B (cf. Raykov et al., 2016). In this way, we obtain a 95% confidence interval (CI) for the PSLV, $\kappa(2,3)$, as (.042, .050). This finding suggests that a practically highly plausible interval of values for the PSLV in the population of interest ranges between 4.2% and 5.0%. In the same way, the 95% CI for the PTLV, $\delta(2,3)$, is found to be (.189, .236). This suggests that a practically highly plausible interval of values for the PTLV in the studied population ranges between 18.9% and 23.6%. We interpret these findings as suggesting that in the relevant U.S. student population there is (a) notable between-state variance and (b) marked cross-region variance. These variances, along with (c) the substantial within-state variance (estimated at 73.5% = 100% − 4.6% − 21.9% in this study), make up the observed overall student variance in the mathematics ability measure of concern.
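The reported figures can be checked for internal consistency with a few lines of arithmetic. The Python sketch below assumes that the PTLV equals the Level-3 variance divided by the total variance, and it back-calculates the Level-1 variance from the reported values .212, .045, and .219; that back-calculated value is an inference from rounded numbers, not a figure reported in the text.

```python
# Values reported in the illustration.
t_beta = 0.212  # Level-3 (region) population variance
t_pi = 0.045    # Level-2 (state) population variance
ptlv = 0.219    # reported PTLV point estimate

# Implied total variance, assuming PTLV = t_beta / total variance.
total_var = t_beta / ptlv

# Back-calculated Level-1 variance (an inference, limited by rounding).
sigma2 = total_var - t_beta - t_pi

pslv = t_pi / total_var            # implied PSLV
within_share = sigma2 / total_var  # implied within-state share

print(round(sigma2, 3), round(pslv, 3), round(within_share, 3))
```

Under these assumptions, the implied PSLV rounds to the reported .046 and the implied within-state share rounds to the reported 73.5%, so the three proportions add up to one as they should.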
Conclusion
This note dealt with point and interval estimation of the proportions of second- and third-level variances in observed outcome scores in multilevel designs characterized by exhaustive observation of all Level-3 and possibly all Level-2 units, and sampling from Level-1 populations such as students, patients, respondents, or clients. Following a trend of increasing popularity of large-scale studies in the educational and behavioral sciences (e.g., Carlson et al., 2020; Francis & Darity, 2021; Menold & Zuell, 2016), the article offered a readily and widely applicable method for evaluation of these proportions. The discussed procedure provided point estimates and large-sample confidence intervals for the pertinent variance proportions, yielding respective ranges of plausible values for the population percentages (after multiplication of their endpoints by 100) of second- and third-level variances in observed outcome differences for such empirical investigations. The note complements earlier research by Raykov (2010) and Raykov et al. (2016) by providing an analytic procedure that is neither contained in nor of relevance to those prior sources. That earlier research was concerned exclusively with the setting where sampling was conducted from the populations of units at all three levels, rather than with exhaustive examination of the Level-3 and possibly Level-2 populations as in this article. The present method is applicable when only the Level-3 population is completely studied, or when both the Level-2 and Level-3 populations are completely available. In this connection, it is worth emphasizing that irrespective of which higher-level population(s) is exhaustively observed, the confidence intervals obtained with the approach in this article are relevant for the entire population of Level-1 units (students, patients, respondents, or clients), which is the ultimate target of inference for the respective proportion of explained variance indices.
Several limitations of the discussed estimation procedure need to be noted. As assumed at the outset, the method is best applied with normally distributed outcome measures. It may be argued that when used with robust maximum likelihood, the procedure possesses some robustness with regard to limited deviations from normality, particularly regarding the resulting confidence intervals. We encourage future research addressing this robustness issue in more detail. Similarly, the underlying estimation procedure is best used with large samples with respect to the number of units at any level where the pertinent population is not exhaustively studied, and hence with large overall study sizes, because it is based on ML estimation, which itself is grounded in asymptotic statistical theory (e.g., Casella & Berger, 2002). Additional research is needed to develop guidelines or procedures regarding sample size requirements for sampled populations, so that the underlying asymptotic theory attains practical relevance. Further, in settings where a researcher intended to study an entire population of Level-2 and/or Level-3 units but did not achieve this goal due to various logistic reasons and constraints, and ultimately managed only to obtain random sample(s) from them, it is recommended to use instead the applicable methods in Raykov (2010) or Raykov et al. (2016), which presume sampling from all three respective populations of units. Moreover, the underlying statistical estimation approach assumes a limited sampling fraction at Level 1 (ratio of sample to population size), which implies that the method discussed in this note is best used with sampling fractions up to 5% (cf. Cochran, 1977, p. 23). We would conjecture that the procedure would yield largely trustworthy results also with somewhat higher sampling fractions, but such a claim needs to be examined separately and should not be considered dependable until evidence and conditions are provided for its validity, which goes beyond the confines of this article. In empirical studies where the sampling fraction is well within the double digits in the percentage metric, we would argue that corresponding corrections for finite populations may well need to be introduced in the method and resulting estimates, which is similarly a topic for future research and outside the scope of this article (e.g., Heeringa et al., 2017).
Last but not least, since this article is concerned with the three-level empirical setting where the Level-1 population is randomly sampled rather than exhaustively examined, both the point and interval estimates of the PSLV and PTLV indices will be numerically affected (a) by the particular sample of Level-1 units and thus (b) by the proportion of lowest-level units of analysis that is sampled. With increasing sampling proportion, however, the instability of these estimates will diminish, as follows from basic sampling principles (e.g., Cochran, 1977).
In conclusion, this note offers educational and psychological researchers a readily applicable means for point and interval estimation of the proportions of intermediate- and highest-level variability in three-level designs with complete observation of all units at one or both of these levels and sampling at the lowest level. Such designs are becoming increasingly popular in the educational and behavioral disciplines following a marked trend of growing interest in large-scale studies.
Footnotes
Appendix A
Appendix B
Acknowledgements
We are grateful to M. duToit and G. A. Marcoulides for valuable discussions on multilevel modeling, as well as to two anonymous referees for their critical comments on an earlier version of the article, which contributed substantially to its improvement.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
