Abstract
Given the natural hierarchical structure in school-setting data, multilevel modeling (MLM) has been widely employed in education research using a number of different statistical software packages. The purpose of this article is to review a recent feature of Stat-JR, the statistical analysis assistants (SAAs) embedded in Stat-JR (Version 1.0.5), with regard to their use for MLM. In this article, we review the features of Stat-JR’s SAAs and illustrate how to implement SAAs, using one of the Stat-JR interfaces to analyze multilevel models for the 1982 High School and Beyond data set. Results from Stat-JR SAA are compared with the results using HLM7.01 software. We also discuss recommendations and implications for future users of SAAs.
Given the natural hierarchical structure in school-setting data, multilevel modeling (MLM) has been widely employed in education research using a number of different statistical software packages. The purpose of this article is to review a recent feature of Stat-JR, the statistical analysis assistants (SAAs) embedded in Stat-JR (Version 1.0.5), introduced in November 2017, with regard to their use for MLM. Simply speaking, SAAs perform full statistical analyses based on users’ responses to a limited number of predefined questions about the user’s data set and intended analysis (e.g., Do you want to include any continuous predictors as candidates for inclusion in the model?). SAAs can produce annotated reports of the data analysis results, including context-specific text for interpreting tables and figures. In this article, we review the features of Stat-JR’s SAAs and illustrate how to implement SAAs, using one of the Stat-JR interfaces to analyze multilevel models for the 1982 High School and Beyond (HS&B) data set. We also discuss recommendations and implications for future users of SAAs.
Background
Since the seminal work of Raudenbush and Bryk (2002), hierarchical linear modeling (HLM), MLM (Goldstein, 2011), and random effects modeling (Laird & Ware, 1982) have become standard approaches for the analysis of educational data when the data structure is nested (e.g., students are nested within schools). With the development of these data analysis techniques, many statistical software packages, such as HLM, MLwiN, SPSS Mixed, SAS Proc Mixed, Stata meglm, and LME-R, have been developed and made available to researchers for analyzing complex relationships between contextual factors and individual outcomes. While the various software packages provide more options for multilevel analysis, it is challenging for researchers to switch from one program to another. When switching to a new program—for example, when working on a new project with different collaborators who use different software—researchers must go through the entire process of learning the new software, from how to conduct the analysis to how to interpret the result outputs. This can be tedious and time-consuming work, depending on how adept a researcher is at learning different software languages.
Stat-JR was designed to be an easy-to-use statistical tool that helps users interoperate with different statistical software packages to implement advanced statistical applications (Browne, Charlton, Michaelides, et al., 2017b). Specifically, Stat-JR allows researchers to run analyses in different software packages without first learning how to use each of them. The concept of Stat-JR originated with Jon Rasbash (“JR”); it was designed as a user interface that links an imported data set to a variety of statistical software packages, including R, Stata, SPSS, SAS, and MLwiN, through statistical analysis templates. Templates are the main building blocks in Stat-JR; they are segments of computer code that perform various operations such as drawing a graph, computing descriptive statistics, or fitting a particular statistical model. Because the Stat-JR system is written in the Python language (Browne, Charlton, Michaelides, et al., 2017a), users who are familiar with Python programming can edit the existing templates provided by the system or even create new templates for conducting data analysis.
In this review article, we focus on how beginners can use existing templates for MLM analysis. Thus, we do not assume prior experience in Python programming. Since Stat-JR is relatively new software for most readers of the Journal of Educational and Behavioral Statistics, we will first briefly overview the Stat-JR interfaces and will then introduce the new features of SAAs in the most recent version of Stat-JR to perform an MLM analysis.
Stat-JR 1.0.5
Since the release of the beta version of Stat-JR 0.1 in 2012, a number of updates and functions have been added to the program, leading to the currently available Version 1.0.5. Stat-JR 1.0.5 offers three different user interfaces: Template Reading and Execution Environment (TREE), Logging and Execution of Analysis Flows (LEAF), and Documents With Embedded Execution and Provenance (DEEP).
Stat-JR TREE
This is an interface for implementing a predesigned template (Browne, Charlton, Michaelides, et al., 2017b). Users can link a data set to an existing template that is included in Stat-JR or that has previously been created by other users, in order to conduct analytical operations. Users provide input by answering questions such as Which continuous variable is your response? in a menu-driven point-and-click interface (see Figure 1) that is simple to use and requires no prior programming knowledge.

Figure 1. Starting prompt for Stat-JR Template Reading and Execution Environment with drop-down list.
Users can specify the statistical package (e.g., SPSS) to be used for performing the data analysis if that statistical package has been installed on their local computer. After a user answers the questions, TREE will perform the data analysis on the preassigned statistical package. Statistical codes for running the intended analysis can also be created for different statistical packages, and users can edit existing templates to modify the analysis for their specific use. TREE is regarded as the basic interface for the Stat-JR system insofar as it allows users to work with one template at a time.
Stat-JR LEAF
Although a template is a basic unit for data manipulation or model fitting, it is possible to connect multiple templates in Stat-JR and integrate them into a workflow to perform a series of desired analyses. Stat-JR LEAF provides an environment for creating, modifying, and performing workflows for data analysis (Browne, Parker, Charlton, Michaelides, & Moreau, 2017). Using the Blockly programming system, as shown in Figure 2, users can visualize the program commands in blocks to build their own workflow for data analysis with LEAF. Each block in Figure 2 represents a particular section of the code (e.g., select a data set, set the variables, call templates) for conducting a data analysis.

Figure 2. Starting prompt for Stat-JR Logging and Execution of Analysis Flows using Blockly programming.
Stat-JR DEEP
Beyond templates and workflows for statistical analyses, the DEEP eBook system was introduced in Stat-JR as part of the Economic and Social Research Council (ESRC)-funded e-STAT project (see http://www.bristol.ac.uk/cmm/research/estat/objectives.html). The DEEP eBook system is intended to provide a more user-friendly environment by combining the narratives of traditional books and the interaction of software packages (Michaelides, Yang, Browne, Charlton, & Parker, 2017). The DEEP eBooks allow users to implement their desired statistical analysis through an interactive book-reading process. More specifically, an eBook has the form of a book with several pages that can be browsed using the DEEP interface. When browsing an eBook in DEEP, users are asked several questions related to their intended analysis in the first few pages of the eBook. The users’ responses to these questions determine the analytical strategy used and the results reported in later pages of the eBook, that is, the contents of the book are based on the choices that users make, such as the data set and variables to be used in the analysis. In this article, we use the DEEP interface to review the features of SAAs. We will provide more details on using DEEP in a later section where we illustrate its application.
All three Stat-JR interfaces are accessed through a default Web browser (e.g., Chrome, Firefox), for which the user is prompted when starting the program. As shown in Figure 3, the Web browser is opened together with a black command window showing the default settings for the program. This command window logs the data analysis activity by displaying the underlying Python code. In Figure 3, warning prompts such as “WARNING:root:Failed to load package WinBUGS (WinBUGS not found)” are not necessarily problematic; they indicate that Stat-JR failed to find a third-party statistical package in the default directory. If a warning message appears for a package that the user intends to interoperate with, the directory needs to be updated in the Settings.

Figure 3. Command window for Stat-JR interfaces.
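The warning format quoted above lends itself to a quick check of which external packages were not found. The following is a minimal Python sketch, not part of Stat-JR: the function name `missing_packages` is hypothetical, the pattern is inferred only from the single warning quoted in the text, and the second line of the sample log (OpenBUGS) is invented for illustration.

```python
import re

# Pattern inferred from the warning quoted in the text, e.g.
# "WARNING:root:Failed to load package WinBUGS (WinBUGS not found)".
# Other Stat-JR messages may be formatted differently (assumption).
WARNING_PATTERN = re.compile(r"^WARNING:root:Failed to load package (\S+)")


def missing_packages(log_text):
    """Return package names from 'Failed to load package' warnings, in order."""
    names = []
    for line in log_text.splitlines():
        match = WARNING_PATTERN.match(line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


# The first line is the warning quoted in the article; the second is hypothetical.
sample_log = """WARNING:root:Failed to load package WinBUGS (WinBUGS not found)
WARNING:root:Failed to load package OpenBUGS (OpenBUGS not found)
"""
print(missing_packages(sample_log))
```

If the package that the user intends to use appears in the returned list, its directory would need to be updated in the Settings, as described above.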
Stat-JR offers two options for choosing a model estimation engine for running applications: either an in-house eStat engine or a third-party external package. The eStat engine is a built-in algebra system that runs with Markov chain Monte Carlo estimation procedures. Use of a third-party external engine is the unique feature of Stat-JR, which provides “interoperability” with other software. To use an external package, the third-party package must first be installed on the user’s machine, and the software must be appropriately located in the Stat-JR interfaces. The directory can be updated in the Settings for all three interfaces. (A list of packages supported by Stat-JR systems is available at http://www.bristol.ac.uk/cmm/software/statjr/downloads/additionalsoft.html)
SAAs
The SAAs are a new feature of Stat-JR 1.0.5 that provides annotated reports of data analysis results embedded in the Stat-JR templates (Browne, Charlton, Parker, et al., 2017). As their name suggests, SAAs serve as assistants for conducting statistical analyses; they do this by asking the user a series of questions about the user’s problem and data set. Then the SAAs execute the analysis and produce an annotated output to describe the analysis results. One of the challenging tasks in learning new statistical software is learning how to understand and interpret result outputs since new users will be unfamiliar with the output files’ format and/or content. Given that interoperability with other software packages is one of the primary functions of the Stat-JR interfaces, assistance in interpreting the result outputs is an essential element of Stat-JR. SAAs are available in all three interfaces, from a single template under the TREE interface (e.g., SAAex1_2 for mean comparisons or correlations between two variables) to more complex analyses using an eBook under DEEP (e.g., the SAA for many N level multilevel models for MLM) as well as in the LEAF workflow.
In this article, we illustrate the use of SAAs in a Stat-JR DEEP eBook for analyzing a series of multilevel models. For this example, we assume that readers have previous experience using HLM7.01 (Raudenbush, Bryk, & Congdon, 2013) but need to switch to MLwiN, which they have not used before. MLwiN is a specialized software package for fitting multilevel models (Charlton, Rasbash, Browne, Healy, & Cameron, 2017) and is one of the external software packages supported by Stat-JR systems. We will demonstrate how to use Stat-JR SAAs to operate MLwiN in order to conduct the MLM analyses. Following an illustration of how to use a DEEP eBook with the assistance of SAAs, we compare the model results produced by the SAAs with the HLM7.01 results. For readers who are interested in learning MLwiN in more detail, the MLwiN User’s Manual provides step-by-step instructions on how to analyze the multilevel models (Rasbash, Steele, Browne, & Goldstein, 2017).
Illustrative Example: HS&B Data Set
The Data
We analyze the data from the 1982 HS&B survey to illustrate the features of Stat-JR’s SAAs that we discuss. The HS&B survey includes a sample of U.S. public and Catholic high schools. Following Raudenbush and Bryk (2002), we used a subsample including information about 7,185 students (i) nested within 160 schools (90 public and 70 Catholic; j). The average school size was 45 students. The analytical variables included two student-level variables, math achievement (outcome; mean = 12.75, standard deviation [SD] = 6.88) and socioeconomic status (SES; mean = 0.00; SD = 0.78), as well as one school-level variable, sector (0 = public, 1 = Catholic; mean = 0.44). Considering the cluster effect of the data, multilevel models were applied for data analysis. For our data set, we include only the complete cases without considering missing data.
Stat-JR’s DEEP and an eBook
We analyzed data in Stat-JR’s DEEP via the Chrome browser. Both Chrome and Firefox (but not Internet Explorer) are recommended for implementing Stat-JR’s SAAs (Browne, Charlton, Parker, et al., 2017). Users can refer to the Stat-JR 1.0.5 user guide for system installation and configuration and eBook importation (Browne, Charlton, Michaelides, et al., 2017a). An eBook has the form of a single zip file, which can be saved on the user’s hard disk or a flash drive. Users need to import the zip file for an eBook into the eBook system to operate the eBook functions. For our illustration, we used an eBook created by William Browne and Chris Charlton called combined.zip. When imported into DEEP, the eBook appears as SAA for many N level multilevel models, which uses MLwiN as the estimation engine. This eBook generates 12 pages of reports on the MLM analysis results. Table 1 shows the subtitles for the pages, which contain detailed annotations for the analysis results. Users can find combined.zip in the C:\Program Files\StatJR\ebooks folder.
Table 1. Subheadings of the DEEP Interface eBook of SAA for many N level multilevel models
Importing Data to Stat-JR
We suggest that users first import their data set using the TREE or LEAF interface before using the data set with the DEEP interface. Stat-JR allows data sets in Stata format (i.e., with the extension .dta) and in .txt format to be imported. A data set in Stata format can be imported in two ways. First, researchers can import the data set into the TREE or LEAF interface via Dataset > Upload (menu options in the black bar at the top of the browser window), which will upload the data set to the temporary memory cache. Second, a Stata format data set can be directly saved in the folder C:\Users\YourName\.statjr\datasets, and then users select Debug > Reload datasets in the TREE or LEAF interface. For a .txt file, researchers can use the template LoadTextFile in the TREE interface to save the data set to the temporary memory cache.
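The second import route (saving a Stata file directly into the datasets folder) can be scripted. Below is a minimal Python sketch under stated assumptions: the helper name `copy_to_statjr_datasets` is hypothetical, and the default target simply follows the `C:\Users\YourName\.statjr\datasets` pattern given above by using the current user's home directory. Selecting Debug > Reload datasets in TREE or LEAF is still required afterwards.

```python
import shutil
from pathlib import Path


def copy_to_statjr_datasets(dta_file, datasets_dir=None):
    """Copy a Stata .dta file into the Stat-JR datasets folder; return the new path.

    datasets_dir is optional; when omitted, the folder is assumed to be
    <home>/.statjr/datasets, mirroring the path pattern quoted in the text.
    """
    source = Path(dta_file)
    if source.suffix.lower() != ".dta":
        raise ValueError("expected a Stata file with the .dta extension")
    if datasets_dir is not None:
        target_dir = Path(datasets_dir)
    else:
        target_dir = Path.home() / ".statjr" / "datasets"
    # Create the folder if it does not exist yet, then copy the file into it.
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir / source.name))
```

For example, `copy_to_statjr_datasets("hsb.dta")` would place a copy of a hypothetical file `hsb.dta` in the current user's datasets folder.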
Analytical Models
Following Raudenbush and Bryk (2002), four hierarchical models were specified to fit the data using the combined.zip eBook consecutively based on their significance in univariable fitting. We answer No to including random slopes and interactions. We are then offered a choice of the likelihood ratio test or the Wald test for comparing models. We choose the likelihood ratio test (the Wald test works out p values directly from a particular model, while the likelihood ratio test needs to compare pairs of models; see Browne, Charlton, Parker, et al., 2017, for more information). After the above questions are answered, summary statistics (i.e., number of observations, mean, SD, and median) for the outcome variable are provided on the next page, with a histogram showing the distribution of the response (shown in Figure 4). While the user may need to make a subjective decision about transforming the variable, the SAA provides advice for diagnosing the normality of the outcome variable. In this example analysis, the skewness value is −0.181, and the SAA states, “Here the statistical significance may be to some degree due to the large sample size as from a practical perspective values of skew less than 2 in magnitude are not considered too big a skew.” For the case of normal responses, we answer No to logging the response variable and then click Submit to run the model.

Figure 4. (A, B) Images of the specification for the one-way analysis of variance model on pages 1 and 2 of the eBook.
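The difference between the two testing options offered by the SAA can be illustrated with a short Python sketch. This is not the SAA's or MLwiN's own code: the function names are hypothetical, the likelihood ratio version covers only 1 or 2 degrees of freedom so that it needs nothing beyond the standard library, and any boundary adjustment that software may apply when testing variance components is ignored.

```python
import math


def wald_p_value(estimate, std_error):
    """Two-sided p value from a single fitted model: z = estimate / SE."""
    z = estimate / std_error
    # For a standard normal z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def lr_test_p_value(deviance_reduced, deviance_full, df):
    """p value from comparing the deviances of two nested models.

    Only df = 1 and df = 2 are handled, using closed forms of the
    chi-square upper tail probability.
    """
    lr = deviance_reduced - deviance_full
    if lr < 0:
        raise ValueError("the fuller model should not have a larger deviance")
    if df == 1:
        return math.erfc(math.sqrt(lr / 2.0))
    if df == 2:
        return math.exp(-lr / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")
```

The Wald function needs only one model's estimate and standard error, whereas the likelihood ratio function needs the deviances of two fitted models, which matches the distinction drawn in the parenthetical note above.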
The results of the one-way analysis of variance (ANOVA) model are reported on page 5 of the eBook (Figure 5). First, model fit statistics (i.e., deviance, likelihood ratio [LR], and p value) are given in the report. While most MLM software generates these model-fit-related indices, the SAA also explains how to read the model fit results and recommends the best fitting model based on the LR test. Next, parameter estimates from the selected model are given at the bottom of the page; these are the intercept (γ00), school variance (τ00), and level-1 variance (σ2) in the current example. The intraclass correlation coefficient (ICC) is called the variance partitioning coefficient (VPC) in this eBook, presumably following MLwiN, and the value is automatically calculated and reported under the parameter estimates table.

Figure 5. Results of one-way analysis of variance model.
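For a two-level random intercept model, the VPC (ICC) is the share of the total variance that lies between schools, τ00 / (τ00 + σ2). A minimal Python sketch of that calculation follows. The function name is hypothetical, and the input values are illustrative numbers chosen to give a ratio near the 0.18 reported in this article; they are not copied from the eBook output or from Table 2.

```python
def variance_partition_coefficient(tau00, sigma2):
    """Return tau00 / (tau00 + sigma2) for a two-level random intercept model."""
    if tau00 < 0 or sigma2 <= 0:
        raise ValueError("tau00 must be >= 0 and sigma2 must be > 0")
    return tau00 / (tau00 + sigma2)


# Illustrative inputs only (assumed values, not taken from the SAA report).
vpc = variance_partition_coefficient(tau00=8.61, sigma2=39.15)
print(round(vpc, 2))
```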
Regression with means-as-outcomes model
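In this model, the school means are predicted by a school-level predictor, the sector (Wj; 1 = Catholic, 0 = public):
$$Y_{ij} = \beta_{0j} + r_{ij}, \qquad \beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j},$$
where γ01 is the regression coefficient of Wj and u0j is the school-level residual with variance τ00.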

Figure 6. Images of including a school-level predictor Sector into the model.
SAAs encourage users to explore the characteristics of the predictors included in the model. For example, descriptive statistics (e.g., percentage of each category, mean, SD) of the predictor sector are given on pages 3 and 4, with further test results regarding whether there is a mean difference in the response between public and Catholic schools. In this analysis, the mean difference is 2.806, with the Catholic schools having the larger sample mean. The SAA first presents the results of the model without any predictors on page 5, which is identical to the one-way ANOVA model in the previous section. On page 6, the SAA reports the statistical significance of each predictor of interest within a random intercept model. Only statistically significant predictors will be taken forward into the next stage of modeling. Sector is a statistically significant predictor (p < .001) and thus will be carried forward. The full results are presented on page 8. The parameter estimates of the school-level predictor sector (γ01), the intercept (γ00), the school variance (τ00), and Level-1 variance (σ2) are reported at the bottom of the page. In this example, the predictive power of the school-level predictor sector is statistically significant (p < .001) (see Figure 7).

Figure 7. Results of regression with means-as-outcomes model.
Random-coefficient model
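This model considers the student-level predictor SES (Xij):
$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}, \qquad \beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + u_{1j},$$
where γ10 and γ00 are the average regression slope and the intercept across schools, respectively, and u0j and u1j are the school-level random effects with variances τ00 and τ11 and covariance τ01.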
The specification of the random-coefficient model is very similar to the specification of the one-way ANOVA model. In order to include the student-level predictor SES in the model, we answer Yes to including continuous predictors as candidates, and we select SES for the continuous predictor in the following question. Finally, we answer Yes to request the test for the random slope of SES (see Figure 8).

Figure 8. Images of including a student-level predictor socioeconomic status (SES) into the model and requesting the test for the random slope of SES.
Descriptive statistics (mean, SD, and median) for the SES are reported on page 3, followed by the Pearson’s correlation and Spearman rank correlation between the response and SES on page 4. On page 6, SES shows a statistically significant predictive power (p < .001), and thus, it will be taken forward into the next stage of modeling. The full results of the model with random intercept and slope are presented on page 10, titled Adding random slopes (Figure 9). In this example, the predictive power of SES is statistically significant (p < .001), but the random slope of SES (τ11) is not statistically significant (p = .104). Based on these results, at the bottom of the page, the SAA suggests that τ11 and τ01 be constrained to 0 in the final model.

Figure 9. Results of random-coefficient model.
Intercept-and-slope-as-outcomes model
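Continuing the previous model, this model further explores whether the slopes or intercepts of the student variable SES vary between public and Catholic schools by regressing slopes and intercepts on the sector Wj:
$$\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j},$$
where γ11 is the difference in the slope of SES between Catholic and public schools.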
Following the procedures shown in Figures 1, 6, and 8, we can easily include a school-level predictor sector and a student-level predictor SES in the model. Furthermore, in order to estimate γ11 in the model, we answer Yes to testing for interactions, as shown in Figure 10.

Figure 10. Image of the question about testing the interaction effect.
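The reason γ11 is requested through the interaction question becomes clear once the level-2 equations are substituted into the level-1 equation. The derivation below is a sketch in standard two-level notation; the symbol X_{ij} for SES, and the level-1 residual r_{ij}, are assumed here following the usual Raudenbush and Bryk (2002) conventions.

```latex
\begin{align*}
Y_{ij} &= \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}, \\
\beta_{0j} &= \gamma_{00} + \gamma_{01} W_j + u_{0j}, \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} W_j + u_{1j}, \\
\Rightarrow\; Y_{ij} &= \gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij}
  + \gamma_{11} W_j X_{ij} + u_{0j} + u_{1j} X_{ij} + r_{ij}.
\end{align*}
```

In the combined form, γ11 multiplies the product of sector and SES, so estimating it amounts to adding a cross-level interaction term to the model.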
The SAA considers all possible pairwise interactions (including quadratic terms) between the significant predictors within a random-intercept model. Since both sector and SES are statistically significant, the interaction of these two predictors is tested. The first table in Figure 11 shows that the interaction between sector and SES is statistically significant (p < .001). In the second table, the SAA further considers adding the quadratic term of SES (SES × SES) to the model; however, the results show that the quadratic term of SES is not statistically significant, and thus, the SAA concludes that there is no need to include the quadratic term of SES in the final model.

Figure 11. Results of testing interaction effect.
The full results are presented on page 10 of the eBook with the subtitle Adding random slopes, as shown in Figure 12. In this example, the random slope of SES (τ11) is not statistically significant (p = 1.0). Therefore, at the bottom of the page, the SAA excludes τ11 and τ01 from the final model.

Figure 12. Results of intercept- and slope-as-outcomes model.
Comparison of Results From Stat-JR’s SAA and HLM
After analyzing the four basic HLM models with MLwiN via Stat-JR DEEP SAAs, we also analyzed the same models using HLM 7.01 with the program's default estimator (i.e., a restricted maximum likelihood [REML] estimator) and compared the model results. Table 2 presents the parameter estimates, standard errors (SEs), and p values based on the outputs from the Stat-JR DEEP SAAs and from HLM7.01. As shown in Table 2, the SAAs and HLM produced approximately identical estimates of the fixed effects across the four models. For random effects, the SAAs provided point estimates of variance components and corresponding SEs, while HLM reported point estimates and p values. The SAAs and HLM had approximately identical estimates of τ00 and σ2 in the one-way ANOVA model, resulting in an identical ICC of 0.18 (recall that ICC is called the VPC in the SAA). For the remaining three models, however, the SAAs produced relatively smaller estimates of τ00 and τ11 but similar or relatively larger σ2 in comparison to HLM. In particular, the SAAs and HLM yielded different point estimates and different statistical significance results for τ11 in the random-coefficient model and the intercept-and-slope-as-outcomes model. As previously mentioned, τ11 in the random-coefficient model indicates unconditional variance in the slope. The SAA output showed a nonstatistically significant τ11 (0.40) in the random-coefficient model, suggesting that the slope of students’ SES (
Table 2. Parameter estimates and standard errors from four HLM models using Stat-JR 1.0.5 and HLM 7.01
Note. N/S = not statistically significant; df = degrees of freedom.
a Model 1 = One-way ANOVA; Model 2 = Regression with means-as-outcomes model; Model 3 = Random-coefficient model; Model 4 = Intercept- and slopes-as-outcomes model.
Discussion
In this article, we reviewed the new Stat-JR feature of SAAs for multilevel data analysis in the DEEP interface. Given the fast development of computing power and resources, new software for statistical applications is constantly being released. While it certainly is a great benefit to have more choices for software, it is challenging for many substantive researchers to learn new software under various circumstances. Stat-JR’s interoperability function is useful for switching to a new program without needing to learn how to use the new software. Since the SAAs provide an interpretation of the analysis results, users can easily understand the meaning of the estimated parameters and adopt the findings in their research.
The SAA feature in the DEEP interface can be especially useful for instructors who teach advanced statistics courses for graduate students without strong statistical backgrounds. From the authors’ experiences as instructors of statistics courses for many years, we know that some students struggle with complex model specifications and interpretation of the outputs of the data analysis produced by statistical software packages. Since SAAs in the DEEP eBook provide annotated results for data analysis with suggestions for the next step, they can guide students who are struggling with interpreting the results and who need some assistance regarding further data analysis to be conducted. Instructors can design their own eBooks to teach their statistics courses and develop the course material. The eBooks for an introductory statistics curriculum that interoperates with SPSS are available on the Stat-JR website: http://www.bristol.ac.uk/cmm/software/statjr/downloads/
While the Stat-JR SAA feature has great potential to serve as an assistant for statistical analysis, there are some limitations in the current version (1.0.5) of the SAAs. First and foremost, Stat-JR currently runs only under the Windows operating system. To use Stat-JR on a Linux or Mac machine, it must be run through a virtual machine or a Windows emulator. Second, since this feature is still very new, there are not yet many available eBooks for DEEP interface users to utilize. Although users can search for and import new eBooks from myexperiment.org (linked to the DEEP interface; Figure 13), an external website for sharing program code and/or packages, including those for Stat-JR, with other users, to date no DEEP eBooks are available on the website. Also, given that eBooks can be freely developed and shared with other users, it will be essential to cross-validate the developed contents to verify that the annotated interpretation is accurate. We expect that more eBooks will be available for operation with more external software packages in future versions of the program.

Figure 13. Import eBook in Documents With Embedded Execution and Provenance interface.
Another issue is the inconsistency of analysis results across software packages. As shown in our illustrative example, the SAA and HLM results differed in their parameter estimates of random effects in three conditional models (a regression with means-as-outcomes model, a random-coefficient model, and an intercept-and-slope-as-outcomes model). Because the current DEEP eBook with SAAs utilizes MLwiN as an external estimating engine for the multilevel data analyses, the discrepancy between the parameter estimates of random effects might result from the different estimators adopted by MLwiN and HLM (McCoach et al., 2018). Specifically, in this example, the Iterative Generalized Least Squares (IGLS) via Maximum Likelihood (ML) estimator was used in the SAA, while the REML estimator was applied in HLM. The different estimation approaches behind IGLS and REML can lead to differences in random effects parameter estimates. A comparison of MLwiN’s and HLM’s estimators and their random effects null hypothesis testing approaches is beyond the scope of this review article. Nevertheless, applied researchers need to be aware that the statistical conclusions can differ across statistical packages. Therefore, relevant information (e.g., estimator, version of the package) about the adopted statistical package should be reported so that audiences can replicate or validate the analysis. Given the wider variety of packages available to applied researchers today, further studies are needed to systematically compare the analytical results derived by different packages (e.g., McCoach et al., 2018).
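As a rough analogy for why ML and REML can give different variance estimates, consider the simplest case of a single sample with an unknown mean: the ML variance estimate divides the sum of squares by n, while the REML estimate divides by n − 1 to account for the estimated mean. The Python sketch below shows only this single-level analogy; it is not the IGLS or REML algorithm used by MLwiN or HLM, and the function name and data are hypothetical.

```python
import statistics


def ml_and_reml_variance(values):
    """Return (ML, REML) variance estimates for one sample with unknown mean."""
    ml = statistics.pvariance(values)    # sum of squares divided by n
    reml = statistics.variance(values)   # sum of squares divided by n - 1
    return ml, reml


# Hypothetical data for illustration only.
ml, reml = ml_and_reml_variance([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(ml, reml)
```

The ML estimate is the smaller of the two, which is in the same direction as the relatively smaller τ00 and τ11 estimates from the ML-based SAA output noted earlier, although the multilevel case involves more than this simple correction.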
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
