Abstract
The use of advanced quantitative methods within mixed methods research has been investigated in a limited capacity. In particular, hierarchical linear models are a popular approach to account for multilevel data, such as students within schools, but their use and value as the quantitative strand in a mixed methods study remain unknown. This article examines the role of hierarchical linear modeling in mixed methods research with emphasis on design choice, priority, and rationales. The results from this systematic methodological review suggest that hierarchical linear modeling does not overshadow the contributions of the qualitative strand. Our study contributes to the field of mixed methods research by offering recommendations for the use of hierarchical linear modeling as the quantitative strand in mixed methods studies.
Advanced quantitative methods are often missing in the quantitative strand of mixed methods research (MMR; Ross & Onwuegbuzie, 2014). The analysis for the quantitative strand typically involves basic analyses, such as descriptive statistics, regression methods, and t tests or analysis of variance (Ross & Onwuegbuzie, 2014; Wells et al., 2016). Although basic analyses do not necessarily weaken the quality of a mixed methods study, research questions tend to be less complex, conclusions and inferences can be limited, and the analysis may not provide a comprehensive or detailed understanding of the phenomena (Ross & Onwuegbuzie, 2014). One particular challenge that mixed methods researchers may face is failing to account for hierarchical data in their analyses, which may have adverse effects on the study findings and on the inferences subsequently drawn from those findings.
Education, health, and social science research data on individuals are often hierarchically structured due to influences from higher-level contexts. For example, elementary children (Level 1) within the same classroom (Level 2) tend to be more similar than children across different classrooms. When nested data structures are present, the assumption of independence among units is untenable, which renders traditional analytic techniques inappropriate (e.g., regression; Heck & Thomas, 2015). The consequences of single-level analyses, such as regression, with hierarchical data are well documented and include underestimated variances and standard errors as well as increased Type I error rates (Heck & Thomas, 2015). In more practical terms, researchers risk inaccurate results and erroneous conclusions when single-level analyses are chosen for nested data. The ubiquity of hierarchical data in education and social sciences highlights the need for quantitative methods that can account for such data structures.
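To make the standard-error consequence concrete, the following minimal sketch uses the widely cited design-effect approximation for equal-sized clusters, 1 + (m - 1) x ICC, where m is the cluster size and ICC is the intraclass correlation. The function names, the ICC of 0.10, the cluster size of 25, and the total sample of 1,000 are illustrative assumptions and are not drawn from any study discussed in this article.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Approximate variance inflation due to clustering (equal cluster sizes assumed)."""
    if cluster_size < 1 or not 0.0 <= icc <= 1.0:
        raise ValueError("cluster_size must be >= 1 and icc must lie in [0, 1]")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_total: float, cluster_size: float, icc: float) -> float:
    """Number of independent observations carrying the same information."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical values chosen only for illustration.
deff = design_effect(cluster_size=25, icc=0.10)
n_eff = effective_sample_size(n_total=1000, cluster_size=25, icc=0.10)
se_inflation = math.sqrt(deff)  # factor by which naive standard errors are too small

print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.1f}")
print(f"standard error inflation factor: {se_inflation:.2f}")
```

Under these assumed values, a single-level analysis would treat 1,000 students as independent even though they carry roughly the information of 294 independent observations, so naive standard errors would be too small by a factor of about 1.8.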
Consider a scenario where a researcher is interested in students’ mathematics achievement. In typical regression analyses, the research question might be “how can socioeconomic status predict changes in students’ mathematics achievement in high school?” For a regression analysis, the predictor of socioeconomic status would be defined at the student level (e.g., Level 1). However, socioeconomic status is usually gathered based on income at the parent/guardian level (e.g., Level 2) or can be obtained as a school-level variable (e.g., Level 3). Using a basic regression analysis, the phenomenon of mathematics achievement is only explained based on student-level variables and fails to account for other potentially explanatory variables such as parental education level, teacher job satisfaction, or school type (e.g., private, public, charter). Regardless of mixed methods design type (e.g., convergent, exploratory, explanatory), integrating quantitative results that may not accurately reflect the phenomenon with a qualitative strand might lead to faulty inferences.
Hierarchical linear modeling (HLM; also called multilevel modeling) has become a popular analytical choice to account for hierarchical data as it provides a regression equation for each hierarchical level (Hox, 1998). In addition to accounting for nested data, HLMs can accommodate nonnormal outcome distributions (binary, ordinal, counts; Heck & Thomas, 2015), relatively small sample sizes (McNeish & Stapleton, 2016), and missing data (Enders & Peugh, 2004). HLM can address a variety of research questions (Peugh, 2010), which range from level-one effects (e.g., what predicts changes in bullying perpetration for students in three grades?; Guerra et al., 2011), to level-two effects (e.g., does student teachers’ level of detail-focus affect their tacit knowledge-building?; Friend Wise et al., 2009), to cross-level effects (e.g., how do key practices of teaching writing impact student achievement?; Jesson et al., 2018). When implemented appropriately, hierarchical linear models can provide greater insight into research questions than basic statistical methods may achieve.
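As a hedged illustration of what "a regression equation for each hierarchical level" can look like, a two-level random-intercept model for student i in school j might be written as follows. The notation follows common multilevel modeling conventions and is an assumption for illustration, not a specification taken from any reviewed study: X is a student-level predictor, W is a school-level predictor, and r and u are the Level 1 and Level 2 residuals.

```latex
\begin{align*}
\text{Level 1:}\quad Y_{ij} &= \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}, \qquad r_{ij} \sim N(0, \sigma^2) \\
\text{Level 2:}\quad \beta_{0j} &= \gamma_{00} + \gamma_{01} W_{j} + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00}) \\
\beta_{1j} &= \gamma_{10}
\end{align*}
```

In this sketch the intercept varies across schools while the slope is held fixed; allowing the slope to vary as well would add a residual term to the last equation.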
By using HLM for the hypothetical study described above, the research question could be refined to “how can parent socioeconomic status (Level 2) and school type (Level 3) predict changes in students’ mathematics achievement (Level 1) in high school?” Parent, teacher, and/or school influences can then be included in the analysis to enhance the explanation of students’ mathematics achievement. Hence, using HLM to account for nested data structures provides more precise estimates of any treatment or grouping effects. In short, mixed methods researchers can better explain and understand the phenomenon by integrating more accurate quantitative results with their qualitative findings.
Present Study
Unlike traditional systematic reviews that synthesize existing evidence about a particular treatment, intervention, or practice to determine its efficacy, MMR systematic methodological reviews typically focus on summarizing the methods used within a particular discipline, field of study, or topic area (Plano Clark & Ivankova, 2016). Topics such as bullying (Hong & Espelage, 2012), marketing (Harrison & Reilly, 2011), and health communication (Voorhees & Howell Smith, 2020) have been subjects of methodological reviews. However, there is a growing pool of reviews that focus on methodological issues in MMR studies across a variety of content areas. Some reviews focus on the use of particular qualitative methods within MMR studies such as grounded theory (Guetterman et al., 2019), case study (Guetterman & Fetters, 2018), community-based participatory research (DeJonckheere et al., 2018), and transformative approaches (Sweetman et al., 2010). Other reviews consider broader methodological issues such as sampling (Collins et al., 2007), integration (Boeije et al., 2013), and the added value of conducting MMR (Molina-Azorín, 2011). Only two systematic methodological reviews have focused on the use of advanced quantitative methods in MMR. Ross and Onwuegbuzie (2014) proposed a “quantitative analysis complexity continuum” that classified hierarchical linear models as one of the most rigorous types of quantitative analyses. They assessed MMR in the field of mathematics education research based on their continuum and found frequently missed opportunities where researchers could have adopted more advanced quantitative analytical methods. Plano Clark et al. (2015) studied the use of longitudinal quantitative designs across MMR in the health sciences. 
They found that key methodological details were missing, unclear, and at times contradictory, which limited the utility of these examples for guiding researchers in implementing advanced quantitative analyses within a mixed methods framework.
The use of hierarchical linear models has yet to be examined in the context of MMR (MMR + HLM); therefore, this systematic methodological review aims to address that oversight. With a better understanding of how MMR studies have implemented HLM, researchers can be better equipped to describe the nature of complex phenomena while appropriately accounting for nested data structures. The investigation of HLMs is particularly important for educational and social science research because nested data are ubiquitous in such fields and accounting for them improves the validity of results. The aim of this review is to investigate the use of HLMs and how such models can contribute to MMR. To achieve this purpose, the research questions of our systematic review were as follows:
What mixed methods designs (convergent, explanatory, or exploratory) are implemented in mixed methods studies when hierarchical linear models (HLMs) constitute the quantitative strand?
Which strand is given priority in such mixed methods studies?
What are the rationales for including HLMs as the quantitative strand?
How are HLMs integrated with the qualitative strand of mixed methods studies?
Using existing published mixed methods studies, we summarize practices of hierarchical linear models and provide recommendations for enhancing the utility of HLMs in MMR. It is important to note that our systematic methodological review of mixed methods HLMs differs from the “multilevel mixed design” discussed in Headley and Plano Clark (2020) and Teddlie and Tashakkori (2009) in that our focus lies in the application of HLM in the quantitative analysis rather than the practice of integrating quantitative and qualitative findings at different levels of analysis.
Method
We conducted a systematic methodological review to rigorously study the methods and procedures of published mixed methods articles. A systematic review involves three phases: planning, conducting, and reporting the review (Kitchenham, 2004). Although the steps in conducting a review can vary between fields and methodologists, the main elements of a systematic methodological review remain the same: identification of relevant studies, selection of primary studies, quality assessment, data extraction, and data synthesis (Kitchenham, 2004; Petticrew & Roberts, 2006). Because we were interested in understanding the methodological choices made by researchers, our synthesis focused on the methods and not the content of the studies in our review.
Identification of Studies
Our initial search of published articles was conducted within three search engines: PsycINFO, ERIC, and EBSCOHost. These three search engines were selected because they represent popular databases in psychology (PsycINFO) and education (ERIC) and provide a comprehensive list of published articles across multiple interdisciplinary databases (EBSCOHost). Because all databases were selected when searching through EBSCOHost, there were some articles that were found in duplicate across the three search engines. The search time frame was not limited in order to find as many primary studies as possible, and as such, we included articles that were published through the beginning of 2019.
For the purpose of our review, MMR refers to a research design involving the collection, analysis, and explicit integration of quantitative and qualitative methods within a single study (Creswell & Plano Clark, 2018; Plano Clark & Ivankova, 2016; Tashakkori & Teddlie, 1998); hence, we only included articles that described an empirical study that in some way mixed or combined quantitative and qualitative strands. Although some methodological reviews limited their searches to studies that used the term “mixed methods” (e.g., Hong & Espelage, 2012), we did not require authors to self-identify as “mixed methods” to be included in the analysis. Our second inclusion criterion was that HLM was used for the quantitative strand. Finally, we only considered studies that were published in peer reviewed journals. Our search terms included
variations for hierarchical linear modeling (hierarchical linear model, hierarchical linear modeling, HLM, multilevel model, multilevel modeling, MLM)
variations for MMR (mixed method, mixed methods, mixed-methods, quantitative and qualitative, as recommended by Creswell & Plano Clark, 2018).
Any full-text article containing these terms was selected during the initial round.
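The two groups of search terms listed above can be combined into a single Boolean query, as in the following minimal sketch. The term lists are taken from the text; the helper names and the generic quoted-phrase OR/AND syntax are assumptions for illustration, since each database has its own query syntax and the exact strings submitted are not reported here.

```python
# Search term variations as listed in the text.
HLM_TERMS = [
    "hierarchical linear model",
    "hierarchical linear modeling",
    "HLM",
    "multilevel model",
    "multilevel modeling",
    "MLM",
]
MMR_TERMS = [
    "mixed method",
    "mixed methods",
    "mixed-methods",
    "quantitative and qualitative",
]


def or_group(terms):
    """Quote each term and join the variations with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*term_groups):
    """Require at least one term from every group by joining groups with AND."""
    return " AND ".join(or_group(group) for group in term_groups)


query = build_query(HLM_TERMS, MMR_TERMS)
print(query)
```

Keeping the variations in lists makes it straightforward to reuse an identical set of terms across databases, with only the joining syntax adapted to each search interface.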
We searched PsycINFO first, followed by ERIC and EBSCOHost, repeating the search in each database to ensure all relevant articles were retrieved. We did not conduct a blind double screening but did collaborate in reviewing the search results. Our initial search identified 222 potential articles. We screened the full text of initial results from each database and removed the following types of publications from further review: duplicate publications, dissertations and theses, books and book reviews, and editorial or methodological articles. After excluding nonempirical publications, the screening process yielded 36 potential studies.
Study Selection Criteria
We then conducted a formal review of the remaining 36 full-text articles to fully vet them against our inclusion criteria. The inclusion criteria specified that an article would be included in our review when (a) the quantitative strand had a hierarchical linear model analysis, (b) the article included the collection and analysis of qualitative data, and (c) there was evidence of integration between quantitative and qualitative strands. We examined the complete text of each article, identifying the studies that met our inclusion criteria. We excluded an additional 11 articles from review during this round for not meeting the inclusion criteria (e.g., lacking evidence of integration). Our final sample included 25 peer-reviewed mixed methods empirical studies that used HLM. See Figure 1 for a flow chart of the study selection process.

Figure 1. Flow chart of study selection process.
Quality Assessment
Traditionally, the next step in a systematic review is to evaluate the quality of the identified studies to determine if they are of sufficient methodological quality to warrant their inclusion in further analysis. Because we were interested in understanding the methods themselves in MMR + HLM studies, and our pool of MMR + HLM studies was relatively small, we did not conduct a quality assessment and retained all 25 available studies to demonstrate the wide range of information reported and to investigate these common practices.
Data Extraction
Data extraction is the process of obtaining information from each primary study (Kitchenham, 2004). To extract the data, the lead researcher (KB) developed a code list for collecting key information from each article based on the research questions and overall purpose of the review. The code list identified three main sections of information to be collected: article details, HLM methods, and MMR methods.
Article Details
We extracted the following article details: first author’s last name, article title, publication year, journal discipline or field, first author’s discipline, and first author’s country.
Hierarchical Linear Modeling Methods
Our coding of HLM information included the following: author-explicit rationale or description (if available) for conducting HLM, number of hierarchical levels and type of units at each level (e.g., students [Level 1] nested within schools [Level 2]), type of HLM (i.e., cross-sectional or longitudinal research), and sample size at each level.
Mixed Methods Research Methods
We extracted the following MMR information from the studies in our review: rationale for conducting a mixed methods study, basic MMR design, relationship of quantitative and qualitative samples, priority of the quantitative and qualitative strands, and point of integration where quantitative and qualitative strands were mixed. We provide additional information about our extraction rules for these elements.
There are a variety of reasons that researchers may consider when deciding to conduct a mixed methods study. While there are several typologies put forth in the mixed methods literature, we drew on the synthesis of Plano Clark and Ivankova (2016) when classifying the rationales provided in each study in our pool. These include complementarity or providing a more comprehensive understanding, development or building one strand from the other, triangulation or direct comparison of results, weakness minimization or using the strengths of one strand to compensate for the weaknesses in the other, and social justice.
In terms of MMR design, we extracted the three basic designs that have emerged from the literature: Convergent (Concurrent Quan + Qual) design, Explanatory (Sequential Quan → Qual) design, and Exploratory (Sequential Qual → Quan) design (Creswell & Plano Clark, 2018; Plano Clark & Ivankova, 2016). It is important to carefully consider the MMR design for a study because data collection, analysis, and reporting of results will be affected by this decision.
Researchers may also assign a priority to the quantitative strand, qualitative strand, or assign equal priority to both. Although priority has been debated in the mixed methods literature (Plano Clark & Ivankova, 2016), the relative weight or emphasis of a particular strand, including equal weight, helps communicate the researcher’s perspectives regarding the relative importance of each strand in the overall mixed methods design; therefore, we extracted priority information. When coding the priority of a study, we considered (a) explicit priority stated by researchers, (b) the overall intent of the study, (c) the research questions, (d) the level of detail presented for each strand, (e) the overall quality of the strand, and (f) the contributions of the results for each strand to the overall inferences generated by the study.
When integrating an HLM strand and a qualitative strand within a mixed methods study, researchers must carefully consider and justify their sampling decisions. Researchers must decide (a) what sampling strategies (e.g., random, purposeful) will be used to draw the quantitative and qualitative samples, (b) whether the sampling strategies will differ between the quantitative and qualitative samples, (c) whether the qualitative sample will be derived as a subset of the quantitative sample and, if so, from what level(s) (e.g., student, teacher, school) the qualitative sample will be collected, and (d) how the quantitative and qualitative samples will be evaluated as being representative of one another. For our review, we coded the relationship between the quantitative and qualitative samples, as well as the corresponding HLM level of the qualitative participants.
Finally, integration is the cornerstone of mixed methods studies, where researchers combine the quantitative and qualitative findings to create one cohesive picture of the phenomenon. Fetters et al. (2013) described four types of integration in mixed methods studies which we extracted: building (one strand informs the data collection of the other), connecting (sampling links the two strands together), merging (the two strands are mixed in analysis), and embedding (multiple points of data collection and analysis).
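The extraction elements described above could be captured in a structured record such as the one sketched below. This is only one possible representation: the class names, field names, and the example record are hypothetical and do not reproduce the coding protocol in the appendix, while the category values mirror the rationales, basic designs, priorities, and integration types named in the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Design(Enum):
    CONVERGENT = "convergent"
    EXPLANATORY = "explanatory"
    EXPLORATORY = "exploratory"


class Priority(Enum):
    QUANTITATIVE = "quantitative"
    QUALITATIVE = "qualitative"
    EQUAL = "equal"


class Rationale(Enum):
    COMPLEMENTARITY = "complementarity"
    DEVELOPMENT = "development"
    TRIANGULATION = "triangulation"
    WEAKNESS_MINIMIZATION = "weakness minimization"
    SOCIAL_JUSTICE = "social justice"


class Integration(Enum):
    BUILDING = "building"
    CONNECTING = "connecting"
    MERGING = "merging"
    EMBEDDING = "embedding"


@dataclass
class ExtractionRecord:
    """One coded study; all field names are hypothetical."""
    first_author: str
    publication_year: int
    hlm_levels: List[str]  # unit at each level, Level 1 first
    longitudinal: bool
    design: Design
    priority: Priority
    rationales: List[Rationale] = field(default_factory=list)
    integration: List[Integration] = field(default_factory=list)
    hlm_rationale: Optional[str] = None  # author-stated rationale, if available
    qual_sample_level: Optional[int] = None  # HLM level of qualitative participants


# Invented placeholder record, not a study from the review.
record = ExtractionRecord(
    first_author="Example",
    publication_year=2015,
    hlm_levels=["students", "schools"],
    longitudinal=False,
    design=Design.CONVERGENT,
    priority=Priority.EQUAL,
    rationales=[Rationale.COMPLEMENTARITY],
    integration=[Integration.MERGING],
    qual_sample_level=1,
)
print(record.first_author, len(record.hlm_levels), record.design.value)
```

Using enumerations for the closed categories keeps coders from entering inconsistent labels, while the optional fields allow for articles that do not report an element explicitly.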
Using the coding protocol (see the appendix), we independently recorded information from the complete text of all 25 articles in MAXQDA 2018 (VERBI Software, 2017). In instances where the articles did not provide explicit information for the coding protocol, we inferred these elements based on supporting evidence within the article. After the initial coding, we independently reviewed each other’s coding, such that all codes were reviewed by all three researchers. Any discrepancies between the researchers were discussed until an agreement was reached. Initially, we struggled with interrater agreement when coding for basic MMR design. We agreed on only 13 (52%) of the designs inferred from our independent coding. However, when we looked more closely, we saw that 10 (40%) of the studies were coded consistently as “explanatory” by one researcher but “convergent” by the other researcher. The former researcher was basing coding decisions on the intent or purpose of integration, while the latter researcher was applying a stricter, procedure-based definition. These 10 studies were convergent in that data analysis of the quantitative strand did not build to data collection of the qualitative strand. However, the designs were also explanatory in that the qualitative data served to explain the quantitative results, often through explicit sequential data analysis strategies. These studies provide a clear example of Plano Clark et al.’s (2015) observation that the basic designs are “insufficient to describe the wide array of possibilities for how the quantitative and qualitative strands can relate to each other within these [longitudinal] designs” (p. 314). We decided to code these studies as convergent/explanatory hybrids, thus bringing our interrater agreement to 92% (23 of 25 studies). We coded the remaining two studies as multiphase as they both had explanatory, exploratory, and convergent design features woven throughout their multiple phases. Hence, our final interrater agreement was 100%.
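Simple percent agreement of the kind reported for the initial design coding can be computed as in the sketch below. The function name and the label vectors are hypothetical: the vectors are constructed only so that 13 of 25 codes match, with 10 explanatory-versus-convergent mismatches as described above, and the labels assigned to the matching studies and to the two remaining mismatches are invented for illustration.

```python
def percent_agreement(coder_a, coder_b):
    """Share of items (in percent) to which two coders assigned the same label."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must rate the same number of items")
    if not coder_a:
        raise ValueError("at least one item is required")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical label vectors for 25 studies (assumed, not the actual codes).
coder_a = ["convergent"] * 13 + ["explanatory"] * 10 + ["explanatory"] * 2
coder_b = ["convergent"] * 13 + ["convergent"] * 10 + ["exploratory"] * 2

initial_agreement = percent_agreement(coder_a, coder_b)
print(f"initial agreement: {initial_agreement:.0f}%")
```

Percent agreement does not correct for chance agreement; it is shown here only because it is the statistic reported in the text.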
Data Synthesis
Next, the research team synthesized all extracted information for the three main blocks of information (i.e., article details, HLM, and mixed methods). In our systematic methodological review, the information captured through data extraction was synthesized in order to explore patterns, practices, and limitations among MMR + HLM studies. Because systematic reviews help summarize information on a specific issue within a given field (e.g., Gopalakrishnan & Ganeshkumar, 2013), we examined the prevalence of each code (quantitative) and the details or vividness of extracted information (qualitative). The quantitative data focused on counts of each study element, whereas the qualitative data focused on identifying the patterns in the information. The lead researcher prepared an initial report of the findings to the other two researchers who provided feedback regarding the general patterns and relevant trends to consider. We continued to discuss the patterns and interpretations until we agreed on the accuracy of the results.
Results
Article Details
There was a steady increase in the number of published MMR studies with hierarchical linear analyses (six studies published between 2005 and 2012; 19 studies published from 2013 to 2018), suggesting that HLM may be growing in utility in the mixed methods community. Thirteen studies (52%) were conducted by lead authors affiliated with institutions in the United States, three (12%) in the United Kingdom, three (12%) in Canada, two (8%) in Italy, one (4%) each in Australia, China, the Netherlands, and New Zealand. The disciplines of the primary authors were diverse including education (n = 13; 52%), social sciences (n = 7; 28%), health (n = 4; 16%), and other areas such as tourism (n = 1; 4%), and a nonprofit research organization (n = 1; 4%). There were studies that identified more than one discipline for the primary author; therefore, our sum is greater than 100%. Two studies provided no information regarding first author’s discipline. In addition, we examined the discipline or field of the journal that published the study, which included education (e.g., urban, distance, doctoral studies; n = 11; 44%), psychology (e.g., psychotherapy, developmental; n = 4; 16%), health sciences (n = 3; 12%), social research (n = 2; 8%), and other fields such as mixed methods or quantitative and qualitative journals, nursing, language, and travel research (n = 5; 20%). Given that HLM originated from sociology and has been widely popular in education and behavioral sciences, the high prevalence of MMR + HLM studies in education and psychology aligns with current and historical trends in research.
Hierarchical Linear Model Findings
Rationales
Authors of MMR + HLM studies in our review provided several reasons for choosing HLM for their quantitative analysis. The most common reason was to account for nested (clustered, hierarchical) data with interdependencies (n = 17; 68%). As mentioned in some studies (Chahine et al., 2016; Cho et al., 2013; Friend Wise et al., 2009; Misri et al., 2013), traditional analytic options cannot accommodate nested data, thus adding value to the use of HLM. Wong (2015) specifically noted that “conventional aggregation and disaggregation approaches would overestimate the parameters and undermine the true relationship” (p. 682) for individual and cultural variables that exist on different levels. Hence, HLM “is important to the study of social behaviors, as they often exist in multiple levels of analysis” (Wong, 2015, p. 682).
A common pattern found among the longitudinal studies was HLM’s ability to study trajectories of individuals over time (n = 6; 24%), such that when repeated measures are nested within individuals, HLM can capture longitudinal intra-individual differences. There were four (20%) studies interested in partitioning the variance into between and within sources using HLM. For example, Sammons et al. (2005) partitioned the variance in child outcomes (e.g., cognitive and social behavior) which could be attributed to differences between individual children (within-center variance) and between centers. Similarly, in Muijs’s (2015) study, HLM “divided the variance in outcomes between the different hierarchical levels” (p. 569), such that student characteristics explained around 18% of the total variance at the school level. It is understandably important for researchers to separate the systematic variance from error variance. See Table 1 for examples of the rationales for using HLM. Table 2 presents the rationale for each of our reviewed studies.
Table 1. Rationales for Using Hierarchical Linear Modeling (HLM) in Mixed Methods Research (MMR).
Table 2. Mixed Methods Research Designs, Hierarchical Linear Model Levels, and Rationales.
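The variance partitioning rationale described above is commonly summarized with the intraclass correlation, sketched here for a two-level model with no predictors. The symbols are assumptions following common multilevel notation rather than notation from the reviewed studies: the between-group (Level 2) variance is written as tau_00 and the within-group (Level 1) variance as sigma squared.

```latex
\[
\rho = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}
\]
```

Here rho is the proportion of total outcome variance that lies between higher-level units (e.g., centers or schools), and 1 - rho is the proportion that lies within them.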
Specification of Hierarchical Levels
The majority of the studies in our review (n = 21; 84%) clearly specified the number of levels in their model. Although the remaining four studies were less detailed, the levels could be inferred based on the available information. A two-level model, such as students nested within schools (e.g., Croninger et al., 2012; Halvorsen et al., 2009; Harmey & Rodgers, 2017), was the most common model in our review of MMR + HLM studies, which is consistent with the broader research literature. Table 2 also indicates the number of levels for each of our reviewed studies.
Type of Model and Sample Size
The sample size in each study varied depending on the type of data (i.e., cross-sectional vs. longitudinal) and the number of levels in the model. The cross-sectional studies with two levels (n = 17; 68%) had level-two sample sizes ranging from six to 319 units with an average of 102.71 units at Level 2. The median number of level-two units, however, was 62 units, indicating that most two-level studies (n = 13; 52%) adhered to the general recommendation of at least 30 units for the highest level. As an exception, Friend Wise et al. (2009) analyzed 58 preservice teachers [Level 1] within six discussion groups [Level 2] using a simple hierarchical model that could be estimated with a small level-two sample size. One cross-sectional study identified three levels of analysis with 606 students nested within 25 teachers from eight schools (Jesson et al., 2018). Their hierarchical analyses included simple models (i.e., unconditional and baseline models) and required fewer higher-level units as a result.
For the longitudinal studies (n = 6; 24%), the number of time points [Level 1] varied from two to six with a median of three time points. The sample sizes for higher levels in the model were similar to those of the cross-sectional studies, ranging from 24 to 204 units (with an average of 112). Among the longitudinal HLM studies, half of the studies were classified as two-level models (e.g., time nested within women) and the other half were classified as three-level models (e.g., time nested within students and schools). Consistent with the broader research literature, longitudinal HLMs were not as common among MMR + HLM studies as cross-sectional studies, which may reflect the additional expense of collecting longitudinal data and the additional programming knowledge needed to run longitudinal HLMs.
An important detail to note is that six studies in our review used secondary, publicly available data (e.g., Early Childhood Longitudinal Study, U.S. Department of Education) for their quantitative data source; these include the MMR + HLM studies in our pool with Level 1 sample sizes over 2,000 (i.e., Chahine et al., 2016; Guerra et al., 2011; Halvorsen et al., 2009). For the corresponding qualitative strand, all but one of these studies used their own private data, meaning that the sample of qualitative participants was not drawn from the quantitative sample. While the use of secondary data can minimize the financial costs of collecting the large samples needed in HLMs, mixed methods researchers need to address sample integration legitimation (Onwuegbuzie & Johnson, 2006) and account for any potential impact on research findings when quantitative and qualitative samples differ. Wao and Onwuegbuzie (2011) specifically addressed this issue in their study of doctoral students’ time to degree, noting, “Because participants in the qualitative phase comprised a subset of the quantitative sample, the assessment of inferences was less problematic; that is, the sample integration threat (i.e., the extent to which the relationship between the quantitative and qualitative sampling designs yields quality meta-inferences) was minimized (Onwuegbuzie & Johnson, 2006)” (p. 122).
Mixed Method Research Findings
Rationales
Based on Plano Clark and Ivankova’s (2016) summary of five typical rationales for conducting MMR, the most common rationale (n = 18; 72%) was to obtain a more complete understanding, or complementarity. Croninger et al. (2012) cited complementarity as their rationale for using MMR; they designed quantitative and qualitative strands to “complement each other—that is, to measure ‘overlapping but distinct facets’ of teaching in classes and schools that participated in the study” (p. 8). Triangulation, or directly comparing results, was used in four (16%) studies. Louick et al. (2016) stated that the aim of their study, which focused on struggling readers, was to “triangulate standardized, longitudinal reading performance, reading motivation survey data, and semi-structured motivation interviews . . . ” (p. 260). Minimizing the weaknesses of one methodology with the strengths of the other was the rationale in three (12%) studies. Sammons et al. (2005) stated that their research design was influenced by “both pragmatic and philosophical arguments that suggest mixed methods can offer complementary strengths and minimize weaknesses associated with reliance on only one paradigm” (p. 221). Using one strand to develop the other was the rationale in two (8%) studies. Wong (2015) used a development rationale in designing his instrument about cultural heritage transmission through documentary film-making in Hawaii based on in-depth interviews with stakeholders. None of the studies in our review used a social justice rationale. There were four (16%) studies that provided more than one reason for selecting a mixed methods approach; therefore, our total does not sum to 100%.
Design Type
Using an extended version of the basic designs proposed by Plano Clark and Ivankova (2016), 10 (40%) studies employed a convergent/explanatory hybrid design, 9 (36%) employed a convergent design, 3 (12%) employed an explanatory design, 2 (8%) employed multiphase designs, and 1 (4%) employed an exploratory design.
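The prevalence figures for design type can be reproduced from the counts with a few lines of code, as in this minimal sketch. The counts are those reported above; the variable and function names are hypothetical.

```python
# Design counts as reported in the text.
DESIGN_COUNTS = {
    "convergent/explanatory hybrid": 10,
    "convergent": 9,
    "explanatory": 3,
    "multiphase": 2,
    "exploratory": 1,
}


def prevalence(counts):
    """Convert raw counts to percentages of the total number of studies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one study is required")
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


total_studies = sum(DESIGN_COUNTS.values())
percentages = prevalence(DESIGN_COUNTS)

for label, pct in percentages.items():
    print(f"{label}: n = {DESIGN_COUNTS[label]} ({pct:.0f}%)")
```

Because each study was assigned exactly one design, the counts sum to the 25 reviewed studies and the percentages sum to 100%, unlike the rationale codes, for which a study could receive more than one code.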
Convergent/explanatory hybrid examples
As mentioned previously, convergent/explanatory hybrid studies were identified by the independence of quantitative and qualitative data collection, but with an express intent to use qualitative findings to explain quantitative results in some way. Guerra et al. (2011) specifically allocated qualitative data to “further collaborate, contextualize, and expand findings from the surveys” (p. 298) to better understand student perceptions about childhood bullying. Although HLM results identified normative beliefs as the strongest predictor for increased bullying in schools, qualitative focus group data revealed “how these characteristics contribute to bullying and victimization can vary for different youth and in different developmental contexts” (Guerra et al., 2011, p. 306). In Croninger et al.’s (2012) study, HLM results indicated that students within majority poverty classes (Level 2) had lower levels of achievement. The qualitative follow-up provided support that students in majority poverty classes were more reliant on teachers “to mediate the curriculum and provide multiple representations of mathematics” (p. 24). In their microgenetic study of early writing behaviors, Harmey and Rodgers (2017) conducted quantitative analyses first to determine the amount of growth each student obtained in their intervention and then reanalyzed their video data qualitatively for details that might explain why each student obtained the amount of growth that they did. Hence, quantitative results provided important contextual information for their qualitative analysis.
Convergent examples
In a convergent design, the quantitative and qualitative strands are independent of one another but provide a more complete picture when integrated. Halvorsen et al. (2009) combined results from complementary quantitative and qualitative strands “with the express purpose of minimizing biases and weaknesses inherent in each design” (p. 188). Using a large, national data set, the authors found that school [Level 2] contextual characteristics (e.g., teacher control, supportive principals) positively affected teacher responsibility toward student learning. The qualitative analysis of teacher interviews, which were collected independently, corroborated and provided a richer description of this HLM finding. Similarly, Levy et al. (2016) found that students’ political interest increased over the 2012 election season as a function of living within certain communities [Level 2], where community characteristics (e.g., collective responses to political issues and candidates) had a moderate effect on student political interest. Analysis of qualitative interviews conducted throughout the election season identified that increased public attention to politics within each community was one related cause (Levy et al., 2016).
Explanatory examples
An explanatory design begins with the quantitative strand then supplements with a follow-up qualitative strand. Hauserman et al. (2013) began their explanatory study with a survey for teachers to identify transformational leadership behaviors of their principals. Using HLM, the authors identified teachers whose principals had the highest and lowest leadership behaviors for follow-up interviews. While the quantitative strand did not yield any significant results, the qualitative findings identified specific transformational leadership behaviors that were most important to teachers. Sammons et al. (2005) also used the results from their quantitative measure of preschool program quality to construct a sampling frame from which to draw a stratified random sample of sites for follow-up qualitative case studies. The case studies provided in-depth profiles that could then be compared.
Exploratory examples
An exploratory design begins with an initial qualitative strand and then uses a follow-up quantitative strand to test or generalize the findings. Wong’s (2015) exploratory study is an example of using HLM for scale development that considers the impact of cultural-level variables [Level 2] on individuals’ interest and purchase intentions [Level 1]. This study included a qualitative strand to “explore the complexity and challenges of cultural heritage transmission” (p. 673) and a quantitative follow-up study to “develop scale items using inputs from the first study” (p. 676). Although less common in the literature, Wong (2015) demonstrated the applicability of HLM in an exploratory design to corroborate the qualitative findings in an objective, statistical manner.
Multiphase examples
Although no longer considered one of the basic MMR designs, multiphase designs are characterized by use of both concurrent and sequential phases within a single study or across a program of research (Creswell & Plano Clark, 2011). Jesson et al. (2018) used a multiphase approach in their study of teaching writing in low-income schools. The study began with quantitative analysis of student achievement data, which the researchers used to purposively select teachers for a qualitative case study to investigate patterns of instruction. The case study findings were used to generate hypotheses about effective teaching practices. The researchers then conducted HLM to test their theory of “how key practices of teaching writing [Level 2] impacted students’ achievement [Level 1]” (Jesson et al., 2018, p. 20). Brown et al. (2013) conducted exploratory interviews with homeless men regarding gender roles and sexual behaviors. The findings were used to develop a quantitative structured interview which informed a second round of analysis of the qualitative data to explore potential new themes. Researchers concluded the study by conducting HLM of the structured interview data informed by the revised qualitative findings. These sophisticated studies with multiple points of integration illustrate the potential for MMR + HLM designs to contribute to high-quality inferences.
Priority
Given the advanced nature of hierarchical linear models as a quantitative approach, we hypothesized that most HLM studies would have a quantitative priority. However, only five (20%) studies had a quantitative priority, with predominant interest in the HLM analysis and interpretation. Cho et al. (2013) included the qualitative strand to explain the findings from the HLM analysis but emphasized the quantitative strand in their discussion section. Misri et al. (2013) geared much of their results and discussion toward HLM analysis, with only one paragraph to discuss qualitative findings.
Eighteen (72%) MMR + HLM studies were classified as equal priority. Although these studies included HLM in the quantitative analysis, the qualitative strand was not neglected or ignored in favor of sophisticated quantitative methods. Lavelli and Fogel (2013) incorporated qualitative and quantitative analyses in a complementary manner that “helped us to see how each new developmental period in a dyad’s communication depends upon the specifics of what occurred earlier . . . ” (p. 2267). Lumino et al.’s (2017) study included “the joint use of network measures, statistical models, and narratives [that] allows taking into account the quantitative and qualitative dimensions of network relationships of low-income single mothers” (p. 784). Generally speaking, the equal priority studies dedicated approximately equal attention to both quantitative and qualitative strands by describing analysis and results of both strands as well as emphasizing each strand’s contribution in the discussion.
The final two MMR + HLM studies (8%) had a qualitative priority, such that the qualitative analysis, findings, and discussion dominated the focus of the article. Sischo et al. (2015) self-identified that the “qualitative data were given priority over the quantitative data” (p. 479), emphasizing the importance of the qualitative insights that would have been unobtainable with only quantitative methods. Wao and Onwuegbuzie (2011) also self-identified that the qualitative phase “was given more weight with respect to addressing the research question” (p. 118) with HLM complementing the qualitative results.
Sampling
The studies in our review described a variety of sampling approaches. A clear majority (n = 19; 76%) used the same sample for the quantitative and qualitative strands. Of these, three (12%) studies included additional participants in the qualitative strand beyond those in the quantitative strand. Only one (4%) study used a subset of the quantitative participants for the qualitative strand. Five (20%) studies used different quantitative and qualitative samples.
In terms of the corresponding HLM levels, only six (24%) studies included participants from more than one level in their qualitative strand. Most studies (n = 18; 72%) sampled qualitative participants from Level 2 of the HLM. Nine (36%) studies sampled qualitative participants from Level 1, and only one (4%) study sampled from Level 3. Interestingly, four (16%) of the studies included participants in their qualitative strand who were not included in their quantitative strand.
Integration
Six (24%) studies in our review used a building approach to integration, where one strand grows from the foundation of the other. Wong (2015) used this approach to build a quantitative survey from in-depth qualitative interviews. Other studies used a building approach when conducting sequential analysis. Cho et al. (2013) developed their qualitative analysis to focus on specific questions that were raised by their quantitative analysis of TOEFL iBT™ integrated writing tasks. The connecting approach is used in sequential designs when the results of one strand guide the participant selection process of another strand. Three (12%) studies in our review used a connecting integration approach. Lavelli and Fogel (2013) connected their strands by using their quantitative findings on mother–infant face-to-face communication to purposively select videos to reanalyze qualitatively. In our review, 18 (72%) studies integrated their strands by merging them after data collection, either during data analysis or in the discussion. Most studies that used a merging approach integrated during the discussion. However, Desborough et al. (2016) transformed the findings from their grounded theory strand into variables that were used in their quantitative analysis of nursing care on patient satisfaction. Three (12%) studies in our review used an embedding approach, integrating their strands at multiple points. Jesson et al. (2018) used quantitative student achievement data to select teachers for a qualitative case study, which then informed hypotheses that were tested with a wider sample of teachers to identify effective practices for teaching writing and student achievement data. Additional elements of integration include how integration is conveyed in published articles. All 25 (100%) studies in our review limited their presentation of integration to their narratives; none of the studies provided a joint visual display of their integration.
Table 3 provides the MMR design, rationales, priority, and integration for each of our reviewed studies.
Table 3. Mixed Methods Designs, Rationales, Priority, and Integration.
Note. Rationale based on Plano Clark and Ivankova (2016).
Discussion
MMR + HLM studies are uniquely situated to contribute high-quality, valid findings that can accurately inform policy and practice. MMR can contribute to a study’s validity by minimizing inherent biases of one method with the strength of the other method (e.g., Halvorsen et al., 2009), as well as by providing corroborating evidence of the results of one strand with the findings of the other (e.g., Hommes et al., 2014). Likewise, HLM strengthens a study’s validity by correctly accounting for hierarchically nested structures in the data, such as children nested within teachers or teachers nested within schools. This is especially important in education and social science settings, where hierarchically structured data dominate empirical research (Raudenbush & Bryk, 2002). When data are hierarchical, contextual effects from higher units must be accounted for to avoid inaccurate estimation and to strengthen valid inferences. By correctly accounting for nested data structures in a statistical model, researchers ensure more accurate findings and conclusions. Using hierarchical linear models within a mixed methods study not only strengthens the quantitative strand by accounting for nested data structures and partitioning the variance into systematic and error sources but also provides the additional support of the qualitative strand, which can provide context, details, and explanation to the study’s findings. This approach avoids compounding measurement errors that could lead to uninterpretable inferences in a convergent design, limit the ability of a qualitative follow-up to explain quantitative results in an explanatory design, or incorrectly validate a theoretical model or framework in an exploratory design.
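The variance partitioning mentioned above can be written compactly. The block below gives the standard textbook two-level random-intercept (unconditional) model in the notation commonly associated with Raudenbush and Bryk (2002). It is an illustrative sketch, not a model reported by any of the reviewed studies, and the symbols (individual i within cluster j, within-cluster variance σ², between-cluster variance τ₀₀) are assumptions introduced here.

```latex
% Level 1: individual i nested within cluster j
Y_{ij} = \beta_{0j} + r_{ij}, \qquad r_{ij} \sim N(0, \sigma^{2})

% Level 2: cluster intercepts vary around the grand mean
\beta_{0j} = \gamma_{00} + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00})

% Intraclass correlation: share of total variance lying between clusters
\rho = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}
```

An intraclass correlation near zero indicates little clustering, whereas larger values signal that a single-level analysis would understate standard errors.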
Contributions to Mixed Methods Literature: Recommendations for MMR + HLM Studies
Our study contributes to mixed methods literature by extending Headley and Plano Clark’s (2020) refined definition of multilevel MMR designs. Our systematic methodological review of the use of HLM, a particular quantitative multilevel analysis, in published empirical MMR studies both (a) describes the current use of HLM in mixed methods studies and (b) offers the following recommendations for future MMR + HLM studies: consider more exploratory MMR + HLM studies; allow research problems and questions to determine priority; plan for appropriate sample sizes; address sample integration legitimation when using different samples; and obtain necessary training to conduct advanced quantitative analyses.
Consider More Exploratory MMR + HLM Designs
Of the 25 MMR + HLM studies in our review, only one (4%) employed an exploratory design where the quantitative HLM strand supplemented the initial qualitative strand. Given the low frequency of exploratory HLM studies, it may be reasonable to speculate that hierarchical linear models are not as sought after for theory and/or instrument development. While the limited number of exploratory designs may support the notion that the quantitative strand holds less flexibility with expanding or testing qualitative findings, it also provides a clear opportunity for other researchers to develop exploratory HLM studies that would illustrate the potential of this approach.
Allow Research Problem and Questions to Determine Priority
Another contribution of this review to mixed methods literature is the finding that MMR + HLM studies do not necessarily have a quantitative priority. The concept of priority was of particular interest in our study because there is little information on priority when advanced quantitative methods are implemented. Because HLM is considered a complex quantitative method (Ross & Onwuegbuzie, 2014), it seemed reasonable to assume that mixed methods studies with HLM would have a quantitative priority. The results of our review, however, do not support this assumption. Of the 25 MMR + HLM studies included in this review, only five (20%) had a quantitative priority; the clear majority (n = 18; 72%) had an equal priority, leaving two (8%) studies with qualitative priority.
The nonquantitative priority of most MMR + HLM studies in our review supports the view that MMRs can use advanced quantitative methods (e.g., HLMs, longitudinal models, structural equation models) without losing sight of the qualitative strand when desired. Many studies in our review illustrated the effective use of HLMs while emphasizing the qualitative strand (Sischo et al., 2015; Wao & Onwuegbuzie, 2011) or emphasizing the importance of both strands (e.g., Croninger et al., 2012; Levy et al., 2016). The results suggest that the presence of hierarchical linear models does not require a quantitative priority.
Plan for Appropriate Sample Sizes
Although the issue is not unique to HLM, MMRs interested in using HLM should pay considerable attention to sample size requirements. For proper estimation of hierarchical linear models, it is important to specify the number of levels in the model, the sample size at the highest level, and the complexity of the model. Sample size requirements are usually based on the highest level in the model (Maas & Hox, 2005), with general methodological guidelines recommending at least 30 units at the highest level (Hox, 1998). For example, when interested in the effects of school type (private, public, charter) on student mathematics achievement, MMRs should plan on collecting data from at least 30 schools. More complex models (e.g., longitudinal or three-level models) may require additional participants for the model to produce valid results and inferences. One way to achieve adequate sample sizes in HLM analyses is to use secondary data sets, such as the Programme for International Student Assessment, Add Health, or Trends in International Mathematics and Science Study. MMRs using secondary data need to carefully plan how they will address issues of sample integration legitimation (Onwuegbuzie & Johnson, 2006) if their qualitative participants are from a different, even if related or similarly derived, pool.
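As a rough planning aid, the guideline of at least 30 highest-level units can be paired with a design effect calculation. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, clusters are assumed to be of equal size, and the design effect formula (1 + (m − 1) × ICC) is the standard Kish approximation rather than anything prescribed by the studies reviewed here. The intraclass correlation of 0.10 in the usage note is an arbitrary illustrative value.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int, cluster_size: float, icc: float) -> float:
    """Total sample size deflated by the design effect."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)


def meets_level2_guideline(n_clusters: int, minimum: int = 30) -> bool:
    """Check the rule of thumb of at least 30 highest-level units (default is an assumption)."""
    return n_clusters >= minimum


if __name__ == "__main__":
    # Hypothetical plan: 30 schools, 20 students per school, ICC = 0.10.
    print(meets_level2_guideline(30))
    print(round(effective_sample_size(30, 20, 0.10), 1))
```

With these illustrative inputs, 600 students carry roughly the information of about 207 independent observations, which shows why the number of clusters, rather than the total number of individuals, tends to drive planning.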
MMRs interested in using HLM also need to consider the level(s) from which the qualitative sample will be derived. Because multiple levels of analysis are used in HLM, as the number of HLM levels increases, so does the opportunity for qualitative data collection, and therefore qualitative sample sizes may be larger than usual. For example, will the qualitative sample focus only on the student level, where student perspectives regarding the phenomenon can be captured? Will the qualitative sample focus only on the school level, where school contexts can be further explored? Qualitative data collection at only one level would most likely not increase sample size requirements. Alternatively, the qualitative sample could include both the student and school levels, where student perspectives and school context can be examined, requiring data collection at two levels and an overall larger qualitative sample size. Although many of these decisions will be guided by the research questions, the presence of HLM in the quantitative strand requires MMRs to consider the relative benefits and costs of collecting qualitative data at each level or focusing efforts on one level; these important sampling decisions will affect how integration can be achieved and the quality of the overall inferences.
Address Sample Integration Legitimation When Using Different Samples
Having access to the same sample in both strands may increase reliability of the research findings but could prove problematic when collecting a large quantitative sample needed for multiple levels. When researchers choose to use a subset of participants from the quantitative strand for the qualitative strand, or a different sample from the same population, the opportunity to generalize both the quantitative and qualitative findings is compromised (Onwuegbuzie & Johnson, 2006). The extent to which the qualitative sample represents the quantitative sample has a direct impact on the validity of integrating the results. The threat to legitimation posed by sample integration issues is further compounded in HLM studies if all levels of participants are not included in the qualitative strand, or when participants not included in the HLM are included in the qualitative strand. Hence, researchers need to carefully consider their sampling choices in MMR + HLM studies.
Obtain Necessary Training to Conduct Advanced Quantitative Analyses
We offer one final contribution to the mixed methods literature by suggesting guidelines and resources for training mixed methods scholars in how to conduct rigorous HLM analyses. Because HLM is an advanced quantitative method, researchers require training in designing, analyzing, and interpreting the models. Previous research has suggested that an intermediate-level proficiency in statistics is recommended for readers to adequately comprehend articles with this level of analyses (Wells et al., 2016), further implying that the same level of proficiency would also be required for researchers who desire to work with such advanced models. In order to interpret and report hierarchical model results adequately, researchers need an understanding of data structures, available centering methods, models of fixed versus random effects, error terms, and model fit indices (Nezlek, 2012).
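To make one of these concepts concrete, the following minimal sketch contrasts the two centering methods most often discussed for Level 1 predictors: grand-mean centering and group-mean (within-cluster) centering. It uses only the Python standard library; the function names and the toy data are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from statistics import mean


def grand_mean_center(values):
    """Subtract the overall mean from every observation."""
    overall = mean(values)
    return [v - overall for v in values]


def group_mean_center(values, groups):
    """Subtract each observation's own cluster mean (within-cluster centering)."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_means[g] for v, g in zip(values, groups)]


if __name__ == "__main__":
    # Toy data: six students in two hypothetical schools, A and B.
    scores = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
    schools = ["A", "A", "A", "B", "B", "B"]
    print(grand_mean_center(scores))
    print(group_mean_center(scores, schools))
```

Grand-mean centering preserves between-school differences in the predictor, whereas group-mean centering removes them, so the two choices lead to different interpretations of the Level 1 coefficient.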
Because many doctoral programs do not provide this level of methodological coursework, tutorials and online courses are available that provide additional training and examples of best practices (Aguinis et al., 2013; Centre for Multilevel Modelling, 2017). Several statistical packages capable of estimating these models are also available for a wide range of skill levels, including SPSS, HLM, SAS, Mplus, Stata, and R. Software reviews and step-by-step tutorials are available for many of these packages (Centre for Multilevel Modelling, 2017; McCoach et al., 2018).
Please see Table 4 for a checklist of guidelines or considerations when using HLM in mixed methods studies.
Table 4. Checklist of Main Considerations When Using Hierarchical Linear Modeling (HLM) in Mixed Methods Research (MMR) Studies.
Limitations and Future Research
The majority of studies in our systematic review did not explicitly identify the mixed methods design or priority, which required us to infer this information. The overall weight and emphasis of each strand, as well as the general conclusions obtained from the research, were among the factors we considered when assigning a design and priority. We encourage future MMR + HLM studies to use the terminology or design notation (e.g., QUAL quan) in studies mixing quantitative and qualitative data to clearly note the intended design and priority, which would limit the amount of inference required in systematic reviews such as this. Generalization of our findings is limited by the small number of studies reviewed (n = 25) and the lack of statistical analyses of our results. The steady increase in MMR + HLM studies may diminish this limitation for future research.
Because our systematic review was intended as an exploratory means of gathering preliminary information about MMR + HLM studies, future research could explore the methodological and statistical issues associated with implementing HLMs, such as failing to control for covariates (Hommes et al., 2014) or correlated independent variables (Cho et al., 2013). Additional issues, including participant recruitment, reliability and validity of data, and the degree of qualitative and quantitative integration, are also important elements for future research. Future research could also benefit from considering the effect of the MMR design on the quantitative and qualitative findings, respectively. In addition, although we focused on hierarchical linear models as the quantitative strand in the overall mixed methods design, researchers and practitioners may be interested in multilevel mixed methods designs that collect quantitative and qualitative data from multiple levels using multilevel theory (Headley & Plano Clark, 2020; Teddlie & Tashakkori, 2009). The multilevel mixed methods design, rather than using HLM as the quantitative analysis, may be better suited to answer some MMR questions.
In summary, the results of our systematic review highlight the capabilities of hierarchical linear models beyond accounting for nested data: they not only improve the estimation of effects but also distinguish systematic from error variance and help answer more complex research questions. It is also important to emphasize that the inclusion of advanced quantitative methods can be associated with an equal or qualitative priority for the research study, which further broadens the applicability of hierarchical linear models in the field of MMR.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
