Abstract
This article discusses the basic assumptions and practices of meta-analysis and describes options and innovations for implementing this tool. Meta-analysis represents a family of techniques with different assumptions and procedures. I discuss some of the ongoing debates and limitations of the methods that continue to receive attention. In an era of “evidence-based” applications and educational material, meta-analysis (in all its forms) represents the best way to reduce Type II error and identify Type I error. Use of the technique facilitates the formulation and evaluation of theoretical arguments as well as identifying the means to optimally generate future research efforts. The process of meta-analysis takes the scientist back to the future by reminding the community of the original premises that guided the formation of the statistical process.
The introduction of meta-analysis produced a fundamental change in the thinking and procedures of the social sciences. Prior to the introduction of meta-analysis, a number of scholars challenged even the possibility of a scientific approach to the study of communication (e.g., O’Keefe, 1975). The alternative proposed by scholars involved some version of situational, conditional, or interpretive approaches that assumed that the generation of generalizable statements remained impossible. One example is the edited book by Cushman and McPhee (1980) dealing with the supposed inconsistency of research connecting attitudes to behaviors. The book has a series of chapters, each of which proposes a theoretical or research “solution” to eliminate the inconsistency. The shift starting in the 1980s to meta-analysis created a means of evaluating the underlying factual assumption of research finding inconsistency. The answer to the Cushman and McPhee book is found in the review by Hale, Householder, and Greene (2002) of more than a dozen meta-analyses demonstrating a consistent finding of a high correspondence between attitudes and behaviors. Essentially, the “solutions” offered become unnecessary because meta-analyses demonstrate that the inconsistency never really existed.
As a technique, the basis of meta-analysis is more than 100 years old (Pearson, 1904). However, as an accepted and established procedure, the method is probably less than 30 years old (Chalmers, Hedges, & Cooper, 2002). Much of the literature and examples provided in this article existed prior to 1940. But like some fantasy novel, the “magic” of the methods used to derive results became forgotten in this early period. Essentially, the development of computers led to a focus on measurement and factor analysis which diverted or changed the trajectory of the social sciences. For example, the book, The Measurement of Meaning (Osgood, Suci, & Tannenbaum, 1957), illustrates the focus on factor analysis and the belief that accurate measurement can resolve underlying issues of inconsistency among findings. Meta-analysis represents to a large degree the restoration of theoretical argument as opposed to the development of measurement methods as a solution to inconsistency in research findings.
The procedures and alternatives for the conduct of a meta-analysis remain fluid and evolving as various communities establish normative rules for what constitutes an acceptable set of standards for the conduct and reporting of results. The process continues as dynamic and emergent with a constant generation of alternative and newer procedures and suggestions for various applications to the problems of social science data analysis (Allen & Dilbeck, 2018). This article considers the basis for the procedures, outlines the process of the technique, and then describes the limitations of the process. The focus is on the variety of procedures, innovations, and options emerging for those practicing the technique.
Meta-analysis provides for many possible applications and involves a host of potential options for the conduct of the procedure. The most fundamental assumption of all the applications and procedural options requires a process that (a) reduces the level of Type II error (False Negatives), (b) permits the identification of Type I errors (False Positives), and (c) considers the impact and contribution of moderator variables that change the average or expected set of relations between/among variables.
Problem of Social Sciences: Type II Error
The challenge of identifying and handling Type II error (False Negative) still plagues the social sciences. Using significance tests to evaluate the existence of a potential relationship generates the following set of outcomes (see Figure 1 for a display): (a) reality and the significance test results for the investigation concur that a relationship exists, indicating agreement; (b) in reality a relationship exists but the significance test indicates no significant relationship; (c) the relationship in reality does not exist and the significance test reports nonsignificance, which indicates agreement; and (d) in reality no relationship exists but the outcome of the significance test indicates a relationship.

Figure 1. Description of research outcomes.
Clearly, the outcomes for (a) and (c) indicate concurrence between empirical reality and the significance test used in the investigation to evaluate the existence of the relationship. Under these conditions the result of the investigation accurately indicates an understanding of the empirical reality. However, the other two outcomes, (b) and (d), generate inconsistency, and each is labeled as a different type of error. Type I error (indicated in (d)) represents the false positive: the significance test in the report says a significant relationship exists when in reality no such relationship exists. Type II error (indicated in (b)) represents the false negative: the significance test reports no significant relationship when a relationship actually does exist.
The level of Type I error, sometimes referred to as “alpha” error, is set by the investigator. Most social science investigators select the “p < .05” value for significance. The value represents the phrase, “the probability of a chance significance finding is 5%.” This statement claims that if in fact no relationship exists, 5% of significance tests will, by chance, conclude that a significant relationship does exist. No amount of methodological or statistical sophistication is available to identify the errors because the errors are random.
The level of Type II error represents a combination of three factors: (a) one minus the level of the Type I error (essentially, the levels of Type I and Type II error are inversely related), (b) the size of the effect, and (c) the size of the sample. Selecting a 5% level of Type I error provides for a maximum value of Type II error of 95% (1.00 − 0.05). This maximum value becomes reduced as the effect size and sample size increase. Larger effects are easier to detect than small effects (the classic approach to understanding the “power” of a statistic; see Cohen, 1987).
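The way these three factors combine can be illustrated with a short calculation. The following is a minimal sketch, not a procedure taken from the sources cited above: it assumes a two-tailed test of a correlation and uses the Fisher z approximation, and the function name, effect sizes, and sample sizes are hypothetical choices made only for illustration.

```python
from math import atanh, sqrt
from statistics import NormalDist


def type_ii_error(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate Type II error for a two-tailed test of H0: rho = 0.

    Assumes the Fisher z approximation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3). Illustrative, not exact.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = atanh(r) * sqrt(n - 3)  # expected test statistic if r is the true effect
    power = (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)
    return 1 - power


# Hypothetical effect sizes and sample sizes, chosen only for illustration.
for r in (0.0, 0.1, 0.3):
    for n in (50, 200, 800):
        print(f"r={r:.1f} n={n:4d} Type II error={type_ii_error(r, n):.2f}")
```

Under these assumptions, an effect of zero returns the 95% ceiling described above, and the value falls as either the effect size or the sample size grows.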
The net impact becomes a Type I error of about 5% but an average Type II error of about 50% in most of the social sciences (Schmidt & Hunter, 2015). Type I error can be reduced by lowering the p value (using .01 or .001). However, since Type I error is inversely related to Type II error, a reduction in Type I error increases Type II error. Type II error is reduced by (a) increasing the Type I error level (something viewed as undesirable) and (b) increasing the size of the sample. While meta-analysis permits a number of possibilities (e.g., assessment of both moderating and mediating processes; see Hayes & Rockwood, this issue), the primary justification and heart of the meta-analysis procedure involves the combining of sample sizes by averaging effects of different samples to produce a statistical estimate of effect/association that carries the implications of the combined sample size. The net result becomes a reduction in the level of Type II error. Meta-analysis, by combining samples, narrows the confidence interval around any statistical estimate, increasing the accuracy of any statistical parameter.
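The narrowing of the confidence interval can be sketched numerically. The example below is an illustration under stated assumptions rather than a method from the cited sources: it uses the Fisher z interval for a correlation, treats ten hypothetical studies of 80 participants each as if they formed one combined sample of 800, and the function name is invented for the example.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def correlation_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for a correlation using the Fisher z transformation."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    center = atanh(r)
    half_width = z_crit / sqrt(n - 3)
    return tanh(center - half_width), tanh(center + half_width)


# Hypothetical values: one study of 80 people versus ten such studies combined.
single = correlation_ci(0.20, 80)
pooled = correlation_ci(0.20, 10 * 80)

print("single study:", tuple(round(x, 3) for x in single))
print("combined    :", tuple(round(x, 3) for x in pooled))
```

With these hypothetical numbers the single-study interval includes zero, which would be reported as a nonsignificant finding, while the interval based on the combined sample does not.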
Variations in Meta-Analysis Procedures
Literature Search
The literature search in a meta-analysis involves a unique approach to finding and obtaining materials. Unlike a typical literature review using a narrative or integrative approach, a meta-analysis requires a very well-defined, public, and systematic set of parameters for inclusion of materials (Cooper, 1989; Cooper & Hedges, 1994). In addition, the search parameters and methods require a high degree of precision and articulation since the process of the literature search constitutes a part of the procedure of the technique, equivalent to a sampling procedure in most empirical investigations.
A variety of techniques become available when defining and articulating the literature that becomes included in the meta-analysis. For example, a meta-analysis may establish parameters that define the type of literature searched. The most critical element of defining the literature search becomes the explicitness of the method used to generate material included in the investigation. There exists no “gold” standard to employ for the search since the possibility of including all possible relevant manuscripts probably remains an illusion. Instead, the goal requires an articulation of how the search was conducted and the rules for inclusion/exclusion of relevant texts.
Disagreement with a literature search or the inclusion/exclusion rules may indicate less about the competence of the meta-analysis and more about the available resources or the theoretical and/or methodological standards adopted by the particular investigators. One of the main issues is the degree to which various meta-analyses encompassing the same literature, using different and divergent criteria, generate consistency in observed outcomes. Employing different standards or techniques for literature inclusion does not necessarily lead to divergent outcomes. Meta-analyses using different sets of criteria may still generate very similar conclusions (for one example of a topic with multiple meta-analyses producing very similar results, see the sequential strategies meta-analyses on door-in-the-face and foot-in-the-door; Dillard, Hunter, & Burgoon, 1984; Feeley, Anker, & Aloe, 2012; Fern, Monroe, & Avila, 1986; O’Keefe & Hale, 1998, 2001; Pascual & Guéguen, 2005). A recent concern among scholars is that early meta-analyses may be out of date and require updating as new data sets become available. A meta-analysis conducted in 1985 may spur a new generation of studies that requires a new data collection and analysis that compares the new effect with the older summary (see, e.g., message sidedness; Allen, 1991, 1998; Eisend, 2006; O’Keefe, 1999).
A main challenge in meta-analysis is determining how to set up the boundaries of a literature search. These considerations include the following: (a) diversity and size of the underlying literature, (b) published versus unpublished manuscripts, (c) access in terms of language, and (d) restriction to particular stimuli or measures. These strategies are not exhaustive or inclusive; ingenuity and practicality are required in creating and defining the literature under investigation.
Innovative Techniques to Break up Large Bodies of Literature
The need to break up what may be a massive and diverse literature becomes evident in some large areas of investigations that go back a century or longer (see, e.g., the more than 20,000 manuscripts generated by the search term “social support”). Something as simple as searching for manuscripts that explore the impact of gain or loss message strategies in a persuasive message can generate hundreds of studies. One way to limit the literature search is to focus on studies exploring a specific context, such as health care (diet, surgery, colonoscopy, etc.). The ongoing set of meta-analyses by O’Keefe and Jensen (2006, 2008, 2009, 2011), O’Keefe and Nan (2012), and O’Keefe and Wu (2012) are good examples. One limiting strategy involves conducting a separate meta-analysis on each message content area and then building a picture that, if consistent, creates an inductive set of claims for the issue. Essentially, the logic holds that if the same outcomes occur across a set of meta-analyses then the expectation exists that a similar outcome will be observed when applied to the next new example. The reason for breaking up the literature becomes pragmatic since each separate community possesses a unique set of journals and each audience benefits from the analysis targeted at the particular content. The researcher may not have the luxury of accessing resources for a comprehensive analysis within a reasonable time frame. Breaking up the literature into manageable and digestible units creates the ability to address the practical issues that are important to a specific community while building toward a more general, unified theoretical model.
The question of whether a manuscript exists as a refereed journal publication or in some other form usually represents a marker variable dealing with the question of peer review and the quality of the investigation. Often, scientists assume that journal articles, when peer reviewed, constitute a means of ensuring quality in the methods and reporting. Arguments exist about whether the potential “bias” of the publication process may distort the research findings. Sometimes journals want to publish “significant” findings (Begg, 1994; Levine, Asada, & Carpenter, 2009). There are a number of comparisons of published and unpublished work (Allen, Hunter, & Donohue, 1989; Levine et al., 2009) and the differences are generally small. The reason may reflect the assumption about a set of manuscripts where only the best are published and the rest remain unpublished (Rosenthal, 1979). The motivation and opportunity for publication may not be equal across areas of research. For example, publication of a dissertation is more likely in areas where publication in a journal is expected and related to employment than in disciplines where the dissertation represents an accomplishment related to professional advancement independent of publication (e.g., Clinical Psychology).
One of the recent issues involves the increase in non-English publication venues. Investigators in the United States often restrict manuscripts to those available in English (when manuscripts may exist in many languages). The choice of which languages to include may or may not play a role in the report, depending on the availability of data in various languages. However, the more significant problem is being sensitive to cultural issues that may play a role in manuscript selection. One strategy is to engage scholars from multiple locations and/or cultural orientations with each conducting a search in the relevant language (Spanish, French, German, Chinese, etc.) or cultural orientation (e.g., religion or race) and then treating each of these issues as a source of moderation. The issue of comparing manuscripts generated using multiple languages has received some consideration (see Bradford, Allen, Casey, & Emmers-Sommer, 2002) but remains an underdeveloped aspect of meta-analysis. The potential influence of culture may act as a hidden moderator that influences generalization.
Another strategy for selecting literature for inclusion involves defining a specific methodology or measure as a basis for inclusion. When a scale dominates a literature (like the Beck Depression Inventory in clinical psychology), the restriction on the basis of measurement may prove useful. In some cases, the restriction plays an important role in making meaningful comparisons (see Allen, Emmers-Sommer, et al., 2007, for a comparison of physiological and psychological responses to pornography). The author must find a means for rationally organizing what may exist as a vast literature base.
Conversion/Correction of Statistical Data
A meta-analysis always encounters the challenge of taking statistical information from a study that uses one metric and converting it to a common metric. The conversion process generates a means of direct comparison and eventual averaging across investigations. The assumption is that most statistical representations of effects, associations, or differences represent arbitrary choices on the part of the investigator. Essentially, a scholar can represent the information using almost any statistical method, so the choice of the particular representation is arbitrary (Grissom & Kim, 2003; Schmidt & Hunter, 2015).
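As an illustration of what such a conversion looks like in practice, the sketch below turns several commonly reported statistics into a correlation coefficient. It is a minimal example under stated assumptions, not the procedure of any particular source cited here: the formulas are the standard textbook conversions, the d conversion assumes equal group sizes, the F and chi-square conversions assume a single degree of freedom for the effect, and the direction of the effect for F and chi-square must be supplied by the coder because those statistics carry no sign. All function names are hypothetical.

```python
from math import sqrt


def r_from_t(t: float, df: float) -> float:
    """Convert a t statistic (with its degrees of freedom) to r."""
    return t / sqrt(t * t + df)


def r_from_d(d: float) -> float:
    """Convert a standardized mean difference d to r (assumes equal group sizes)."""
    return d / sqrt(d * d + 4)


def r_from_f(f: float, df_error: float, sign: int = 1) -> float:
    """Convert an F statistic with 1 numerator degree of freedom to r."""
    return sign * sqrt(f / (f + df_error))


def r_from_chi2(chi2: float, n: int, sign: int = 1) -> float:
    """Convert a 1-df chi-square on n cases to r (the phi coefficient)."""
    return sign * sqrt(chi2 / n)


# Hypothetical reported statistics, each converted to the common metric r.
print(round(r_from_t(2.0, 96), 3))
print(round(r_from_f(4.0, 96), 3))
print(round(r_from_d(0.5), 3))
print(round(r_from_chi2(4.0, 100), 3))
```

Because F with one numerator degree of freedom equals the square of t, the first two conversions return the same value, which is one small demonstration of the claim that the reported statistic is an arbitrary choice among mathematically translatable forms.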
This statistical thinking runs counter to the usual assumption, promulgated by Stevens (1946), that the level of measurement determines the type of mathematical computations permissible. The application of meta-analysis violates this strict set of principles, which have been challenged frequently by Allen, Titsworth, and Hunt (2009). The underlying logic of meta-analysis is the assumption that distributions ultimately derive from the same underlying statistical premises and that different manifestations become mathematically translatable from form to form (see, e.g., Fleiss, 1994, for measuring effect sizes from categorical data).
The impact of various artifacts derived from selection biases and scaling issues, when present, systematically reduces the size of the observed relationship. The effects of these reductions, when calculated, provide a means for correcting or eliminating the source of the error in the analysis. Generally, the observed relationship (e.g., a correlation) is smaller than the actual true correlation between the conceptual elements; only regression to the mean systematically increases the size of the effect. What this means is that the observed effect may represent a serious underestimate of the actual effect, particularly when the impact of the artifact is large. Correction for the artifacts provides a more accurate estimate of the actual size of the underlying effect (Ghiselli, Campbell, & Zedeck, 1981; Hunter & Schmidt, 1994; Schmidt & Hunter, 2015).
Since the impact of the artifacts lacks uniformity across a set of investigations, the averaging of any estimates in a meta-analysis involves comparing apples with oranges or kiwis with bananas. Some estimates generate a small downward bias, while in other studies the impact becomes substantial. Averaging across the studies presents a systematic underestimation of effects because any tests for variability among the estimates fail to consider the impact or existence of this source of variability. The best solution is to simply reduce or eliminate the systematic influence of the artifacts on the estimation of the effect.
Trend to Psychometric Forms of Meta-analysis
The correction, prior to averaging, for various artifacts is termed “psychometric” meta-analysis (Ones, Viswesvaran, & Schmidt, 2017). This term indicates that various measurement and design artifacts have been corrected in the estimation of the average effect size. The impact of the various artifacts (e.g., attenuated measurement) has a well-defined mathematical relationship with a known impact and defined formula for correction (Mendoza & Mumford, 1987). The investigator should be mindful of the bias created by various artifacts and of the corrections for those artifacts to make a choice about how best to handle them. The formulas are in fact very old and date back to the 1920s and 1930s, but only recently have they been applied more consistently in primary studies and in meta-analyses.
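One of the best-known corrections of this kind is the classical correction for attenuation due to measurement unreliability, in which the observed correlation is divided by the square root of the product of the two reliabilities. The sketch below illustrates only that single artifact, with hypothetical reliability values; it is offered as an example of the type of formula involved and not as the full set of corrections used in psychometric meta-analysis.

```python
from math import sqrt


def correct_for_attenuation(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Estimate the correlation between perfectly reliable measures.

    rel_x and rel_y are the reliabilities of the two measures
    (for example, coefficient alpha), each in the interval (0, 1].
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must be greater than 0 and at most 1")
    return r_observed / sqrt(rel_x * rel_y)


# Hypothetical study: observed r = .30, both scales with reliability .80.
print(round(correct_for_attenuation(0.30, 0.80, 0.80), 3))
```

The corrected value is never smaller in magnitude than the observed value, which reflects the point that measurement artifacts reduce the size of the observed relationship.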
Averaging of Effects
The goal of a meta-analysis becomes the averaging of effects across the various investigations to estimate the central or expected value of the relationship under consideration (Schmidt & Hunter, 2015). The central tendency or average represents a process of taking the separate estimates provided by each study and then averaging the relationships to create one overall estimate. The procedure typically involves a process employing some form of sample weighting that uses the contribution from the variance of the distribution (Hedges & Olkin, 1985).
The usual choice concerns how to weight the data when averaging the effects. The two most popular choices involve weighting by sample size or weighting by contribution to the variance. Each procedure has advantages and disadvantages with regard to elements of the process (Allen, 2009; Anker, Reinhart, & Feeley, 2010; Cooper & Hedges, 1994; Hedges & Olkin, 1985; Johnson, Mullen, & Salas, 1995; Raudenbush, 1994; Schulze, 2004). However, the average effect, regardless of the procedure used, should generate approximately the same statistical association. While important differences exist with regard to the estimation of confidence intervals and evaluating the impact of sources of moderation, the estimation of the average effect should not generate large differences.
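The claim that the two weighting schemes usually yield similar averages can be checked with a small example. The sketch below is illustrative only: the four correlations and sample sizes are hypothetical, the function names are invented, and the variance-based version assumes the common practice of averaging Fisher z values with weights of n − 3 (the inverse of the sampling variance of z) before transforming back to r.

```python
from math import atanh, tanh


def mean_r_sample_size_weighted(rs: list[float], ns: list[int]) -> float:
    """Average correlations directly, weighting each by its sample size."""
    return sum(n * r for r, n in zip(rs, ns)) / sum(ns)


def mean_r_inverse_variance_weighted(rs: list[float], ns: list[int]) -> float:
    """Average Fisher z values weighted by n - 3, then convert back to r."""
    weights = [n - 3 for n in ns]
    z_bar = sum(w * atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return tanh(z_bar)


# Hypothetical set of four studies.
rs = [0.10, 0.25, 0.30, 0.18]
ns = [60, 120, 200, 90]

print(round(mean_r_sample_size_weighted(rs, ns), 3))
print(round(mean_r_inverse_variance_weighted(rs, ns), 3))
```

For these hypothetical data the two averages differ only in the third decimal place, while the procedures would still differ in how confidence intervals and moderator tests are computed.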
Recent Moves Toward Bootstrapping Approaches of Meta-Analysis
One possible variation on either method involves the use of bootstrapping to estimate an average effect across a set of studies. Bootstrapping involves taking a series of random samples and then averaging each sample and eventually averaging all the random samples taken (often something like 5,000 such samples may be taken). The impact of bootstrapping involves the assumption that taking the average of the various distributions produces an average effect with less bias than relying on the entire set of data (Bollen & Stine, 1989; Cooper & Hedges, 1994).
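A minimal sketch of the resampling idea follows. It makes several assumptions that are not specified in the text: studies are resampled with replacement, each resample is averaged without weights, a fixed random seed is used so the result is repeatable, and a percentile interval is reported alongside the bootstrap average. The effect sizes and the function name are hypothetical.

```python
import random
from statistics import mean


def bootstrap_mean_effect(effects, n_resamples=5000, seed=0):
    """Return the bootstrap average effect and a 95% percentile interval.

    Each resample draws len(effects) study estimates with replacement
    and records their unweighted mean.
    """
    rng = random.Random(seed)
    k = len(effects)
    resample_means = sorted(
        mean(rng.choices(effects, k=k)) for _ in range(n_resamples)
    )
    lower = resample_means[int(0.025 * n_resamples)]
    upper = resample_means[int(0.975 * n_resamples) - 1]
    return mean(resample_means), (lower, upper)


# Hypothetical correlations from eight studies.
effects = [0.10, 0.25, 0.30, 0.18, 0.22, 0.35, 0.05, 0.28]
estimate, interval = bootstrap_mean_effect(effects)
print(round(estimate, 3), tuple(round(x, 3) for x in interval))
```

With only eight studies the number of distinct resamples is limited, so a set this small serves only to show the mechanics of the procedure.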
The downside of bootstrapping involves the issue of whether sufficient data points exist to permit the generation of enough different combinations of studies to warrant the use of the technique. Correlations between sample size and the size of the estimate may leave the analysis unlikely to fully eliminate the fear of bias that motivated the original use of the technique. Recently, Kaufmann and Wittmann (2016) proposed that bootstrapping works best when combined with statistical artifact correction.
Assessment of Variability in Observed Effects
The assessment of observed variability in the set of effects plays an important role in understanding how to interpret the underlying association or effect under consideration. The need for meta-analysis exists because of the assumption that if all measurement and methodological choices were perfect then the data distributions of the relevant studies would be affected only by random sampling error (Mullen, 1989). Since not all measurement and methodological choices are perfect, it is necessary to average across observed effects. The primary consideration in this averaging process involves understanding the underlying distributional properties. For example, if a normal distribution of estimates is assumed then any difference among the outcomes of individual investigations reflects sampling error and no other source of variability (Rosenthal, 1984). The amount of variability in the observed effects is then compared with the level of variability expected due to random sampling error (Schmidt & Hunter, 2015). The comparison is the basis for establishing evidence for the existence of potential sources of moderation in the observed statistical relationship.
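The comparison can be sketched for a set of correlations. The example below is an assumption-laden illustration in the spirit of the variance-comparison approach associated with Schmidt and Hunter: it uses the sample-size-weighted variance of the observed correlations and the common approximation (1 − r̄²)² / (N̄ − 1) for the variance expected from sampling error, where N̄ is the average sample size. The two data sets and the function name are hypothetical.

```python
def observed_and_expected_variance(rs, ns):
    """Return (observed variance, variance expected from sampling error)."""
    total_n = sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    observed = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    mean_n = total_n / len(ns)
    expected = (1 - r_bar ** 2) ** 2 / (mean_n - 1)
    return observed, expected


ns = [60, 120, 200, 90]

# Hypothetical set A: estimates that differ only modestly.
obs_a, exp_a = observed_and_expected_variance([0.10, 0.25, 0.30, 0.18], ns)
# Hypothetical set B: estimates that diverge widely.
obs_b, exp_b = observed_and_expected_variance([-0.20, 0.50, 0.10, 0.60], ns)

print(f"A: observed={obs_a:.4f} expected={exp_a:.4f}")
print(f"B: observed={obs_b:.4f} expected={exp_b:.4f}")
```

In set A the observed variability is no larger than sampling error alone would produce, whereas in set B it is several times larger, which is the pattern that motivates a search for moderator variables.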
Examining the distribution for outliers is another important consideration when evaluating variability. Outliers may occur randomly or represent some systematic source of influence. Winsorizing or trimming of the data can prove useful, although each carries some element of limitation. The problem with outliers becomes the challenge of establishing which element represents a random versus a systematic effect. In some cases, the outlier may be systematic and identifiable. Druckman (1994) provides an example of categorizing and creating a system for interpreting outliers in studies of negotiation. In other cases, the outlier may reflect unique circumstances. For example, in Allen, Bourhis, et al.’s (2002) study analyzing student satisfaction with distance learning, one study was identified as an outlier (Köymen, 1992). That study presented data from students who suffered from a large, crowded, dusty, hot lecture hall without any air conditioning. Not surprisingly, the students taking the same course from comfortable locations using distance learning technology reported more satisfying instruction than students in the face-to-face, hot lecture-hall setting. In this case, the estimate fell outside the expected values but could be excluded for identifiable rather than random reasons. Authors should look carefully for systematic causes before attributing differences to random factors, which occur with a certain percentage of statistical probability.
When the level of variability is significantly greater than expected due to sampling error, then some additional source of variability must be considered. The question of how to account for that source of variability is important in meta-analysis. The normal term for such variability sources is the moderator variable. The question becomes what different subject values or design elements might differentiate divergent outcomes (see Baron & Kenny, 1986; Hayes & Rockwood, this issue, for discussions of the distinction between moderator and mediator variables).
For example, does the type of measurement affect the underlying effect observed? Measurement differences can involve the particular self-report scale used in the investigation as well as a comparison between types of measurement (e.g., self-report, observer evaluation, or physiological device; see Allen, 1989). Questions of gender, race, location, and age, as well as other demographic distinctions, provide a basis for determining whether some distinction exists.
Study features also play an important role (Wortman, 1994). Was the investigation a survey or an experiment? For studies involving therapy, how long were the sessions, how many were examined, and what was the time interval for the follow-up measurement after completion? When examining media effects, the definition and type of stimulus may make a significant difference when considering the influence of media violence (for example, cartoon violence, “gory” versus “bloodless” violence, sexual violence, the target of the violence such as shooting a criminal versus an innocent child, or violence with or without prolonged suffering). Theoretical frameworks may prove useful for coding moderators. No exhaustive list of potential sources of moderation exists; any theoretical or methodological issue can moderate any observed relationship. The problem with examining moderating influences becomes the infinite number of such sources and the need to consider the level of Type I error that may exist when the number of moderator variables becomes very large (Hedges & Olkin, 1985).
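The growth of Type I error with the number of moderator tests can be made concrete. The sketch below assumes, purely for illustration, that the moderator tests are independent and each uses the same alpha level; the Bonferroni adjustment shown is one simple and conservative response among several, and the function names are hypothetical.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


for n_tests in (1, 5, 10, 20):
    print(
        n_tests,
        round(familywise_error(0.05, n_tests), 3),
        round(bonferroni_alpha(0.05, n_tests), 4),
    )
```

Under these assumptions, testing ten moderators at the .05 level carries roughly a 40% chance of at least one chance finding.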
Testing/Evaluating the Average Effects
The final steps in the meta-analysis process often involve a comparison of multiple average effects generated from a moderator analysis. One central question involves whether the researcher believes that the average effect truly represents the trend across the studies. Answering this question begins by determining whether the effect is fixed or random. A fixed-effect approach assumes that a “true” average exists, permitting the making of claims about the average association across studies. For example, a fixed approach would make a claim that a high-fear message is more persuasive than a low-fear message (Witte & Allen, 2000). Conversely, a random approach to the interpretation of averages makes a more context-bound claim rather than assuming a “true” average. For example, using the same fear appeals research, a random-effects approach would claim that some levels of high fear are more persuasive than low-fear messages (Hunter, Hamilton, & Allen, 1989). The random-effects approach assumes that no true average estimate exists, which limits the claim to certain contexts and restricts the kinds of advice provided on the basis of the existing data. Such distinctions involve philosophical rather than empirical issues, but the choice of taking a fixed versus random approach radically alters the value of the findings to social science research (Allen, 2009).
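In computational terms, the two approaches differ in whether variability between studies enters the weights. The sketch below is one common way of operationalizing the distinction and is not presented as the procedure used in the studies cited above: it assumes Fisher z values with weights of n − 3 and the DerSimonian-Laird moment estimator of between-study variance. The data and the function name are hypothetical, and the standard errors returned are in Fisher z units.

```python
from math import atanh, sqrt, tanh


def fixed_and_random_effects(rs, ns):
    """Fixed- and random-effects averages of correlations (DerSimonian-Laird)."""
    if len(rs) < 2:
        raise ValueError("at least two studies are required")
    zs = [atanh(r) for r in rs]
    w = [n - 3 for n in ns]  # inverse sampling variance of Fisher z
    sum_w = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum_w
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)  # between-study variance
    w_star = [1 / (1 / wi + tau2) for wi in w]
    z_random = sum(wi * zi for wi, zi in zip(w_star, zs)) / sum(w_star)
    return {
        "fixed": (tanh(z_fixed), 1 / sqrt(sum_w)),
        "random": (tanh(z_random), 1 / sqrt(sum(w_star))),
        "tau2": tau2,
    }


# Hypothetical, widely divergent study results.
result = fixed_and_random_effects([-0.20, 0.50, 0.10, 0.60], [60, 120, 200, 90])
print(result)
```

When the studies diverge, the random-effects standard error is larger than the fixed-effect one, so claims about the average become correspondingly more cautious; when the estimated between-study variance is zero, the two averages coincide.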
A more conventional approach related to multiple regression/analysis of variance makes the assumption that homogeneity exists within levels of a moderator, and heterogeneity exists between levels of a moderator (Hall & Rosenthal, 1991). What should happen is that the standard of homogeneity of variability within a level/cell and heterogeneity between cells provides a complete examination or justification for acceptance of the moderator analysis (Borenstein, Hedges, Higgins, & Rothstein, 2009). The next section considers the potential for additional follow-up statistical procedures when the goal involves generating a more comprehensive or theoretically driven evaluation. Many of the elements represent the emerging views about the flexibility and application of meta-analysis. The views continue to emerge as scholars develop a means to use these approaches.
Variable Purposes of a Meta-Analysis
Evaluate a Relationship Between Variables
One of the most common forms of meta-analysis becomes the examination of how one concept relates to another concept. Essentially, one variable becomes measured and a theoretical assumption states that a change in value predicts a change in the level of another variable. If no relationship exists, or exists in the opposite direction, the underlying justification for acceptance of the theory becomes suspect.
For example, suppose a scientist advocates for uncertainty reduction theory as providing a valuable explanation for understanding initial interactions (Berger & Calabrese, 1975). A central tenet of the theory is that self-disclosure of information reduces uncertainty. If 150 investigations that examine the relationship between self-disclosure and level of uncertainty exist and are examined in a meta-analysis, then it becomes possible to evaluate the central tenet of the theory. Whether the underlying correlation supports the predicted relationship, supports a conclusion in the opposite direction, or finds no relationship, the impact of the conclusion on the claim of the theory remains important. If a central tenet of a theory receives support, then continued acceptance of the theory becomes a rational exercise. If not, then continued support for the theory becomes more difficult to justify.
If moderating conditions exist for the relationship, then the theory may be partially supported, but require modification. The point of the meta-analysis becomes establishing some mechanism to evaluate the empirical correspondence between existing data and theoretical claims. For some theoretical claims, the establishment of an average effect may provide enough evidence to warrant continued acceptance or rejection of a particular portion of a theory (Wachter & Straf, 1990).
Evaluate the Impact of a Stimulus
The question of how a stimulus impacts a set of participants plays a central role in social science research when considering the impact of an intervention or the impact of a message. The answer informs ongoing social controversies where groups ask, for example, whether regulating media content is justifiable based on research findings. Similarly, when seeking to learn about the impact of an intervention or treatment, it is useful to look across studies to determine longitudinal impacts (Turkiewicz & Allen, 2014; Turkiewicz, Allen, Venetis, & Robinson, 2014; Venetis, Robinson, Turkiewicz, & Allen, 2009).
Meta-analysis allows groups to use data to inform such questions as whether media violence, sexuality in the media, gender stereotypes, homophobia, racial images, music, or other content plays a central role in determining important outcomes (Allen, Bourhis, Tenzek, & Bauman, 2014; Allen, D’Alessio, & Burrell, 2011; Preiss, Gayle, Burrell, Allen, & Bryant, 2007; Timmerman et al., 2008). The analysis requires an examination of both cross-sectional, or immediate, effects within an experimental design and longitudinal effects, measured typically in years, using surveys (Herrett-Skjellum & Allen, 1996).
In addition, it is important to determine whether effects are uniform across subjects. For example, a meta-analysis dealing with the impact of Earvin “Magic” Johnson’s announcement of his positive test for HIV found that this startling event resulted in increased anxiety about vulnerability to the disease among adult heterosexuals (because he was heterosexual, an athlete, and African American), but reduced anxiety among children (because he also pointed out to children that the disease was not transmitted by casual contact; Casey et al., 2003). This case points to a clear example of how audiences may have opposite reactions to the same news item, and how meta-analysis can provide an historical record of important events (Emmers-Sommer & Allen, 1999).
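A small sketch can show why checking for uniformity matters. The numbers below are hypothetical and are not taken from Casey et al. (2003); they are chosen only to mimic the pattern described, with a positive effect on anxiety among adults and a negative effect among children. The record layout and the helper name are assumptions made for the example.

```python
def weighted_mean_effect(records):
    """Sample-size-weighted mean effect for a list of study records."""
    total_n = sum(rec["n"] for rec in records)
    return sum(rec["r"] * rec["n"] for rec in records) / total_n

# Hypothetical studies: r is the association between exposure to the
# announcement and anxiety about vulnerability; n is the sample size.
studies = [
    {"audience": "adult", "r": 0.25, "n": 200},
    {"audience": "adult", "r": 0.15, "n": 100},
    {"audience": "adult", "r": 0.20, "n": 100},
    {"audience": "child", "r": -0.20, "n": 150},
    {"audience": "child", "r": -0.30, "n": 50},
    {"audience": "child", "r": -0.25, "n": 200},
]

overall = weighted_mean_effect(studies)
adult = weighted_mean_effect([s for s in studies if s["audience"] == "adult"])
child = weighted_mean_effect([s for s in studies if s["audience"] == "child"])

print(f"overall mean effect: {overall:+.4f}")
print(f"adult mean effect:   {adult:+.4f}")
print(f"child mean effect:   {child:+.4f}")
```

With these illustrative values the overall average sits close to zero even though both subgroups show clear effects in opposite directions, which is why an average reported without a moderator analysis can mislead.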
One informative example about the impact of meta-analysis can be observed in the controversies associated with the impact of media on various audiences. The impact of messages on an audience often creates a sense of urgency and concern on the part of policy makers about the need to implement media controls. Examination of virtually any meta-analysis examining media effects points to media having a significant influence on media consumers for both prosocial and antisocial messages. The impact of meta-analysis raises the stakes when considering an intervention because data often point to “best practices” that can be implemented to blunt potentially negative impacts or can even be used to advocate for media censorship. The long-term implications of the meta-analysis findings remain an underdeveloped part of the domain, particularly in the era of “evidence-based” policy formulation (Cook et al., 1992; Lipsey & Wilson, 2001; Pettitti, 2000).
Current Trend of Testing a Theoretical System
As indicated above, meta-analysis is often used to evaluate a theoretical system or causal model. The same possibilities that exist for primary data analysis exist within the context of a meta-analysis, although the actual statistical process requires consideration of the unique features of the available data. Becker (Becker, 2001; Becker & Schram, 1994) points to the value of a theoretically driven application of meta-analysis. Models or theoretical systems may evaluate how a data distribution can depart from an expected value (see D’Alessio & Allen, 2000), test the veracity of curvilinear models (Kim, Allen, & Preiss, 2019), or examine models employing rating and categorical systems (Maier, Allen, & Burrell, 2014). The underlying math and application continue to receive development and articulation.
Examples exist for the use of multiple regression, analysis of variance, causal modeling (Allen, Dilbeck, et al., 2014; Rosenthal et al., 2000; Salazar, Gonzalez, Duysters, Sabidussi, & Allen, 2016; Turkiewicz et al., 2014), longitudinal analysis, and multilevel modeling (Houwelingen, Arends, & Stijen, 2002). One potential limitation to using these approaches is access to all the estimates necessary to produce the variance/covariance matrices or the correlation matrices. The question may also arise about whether ordinary least squares or maximum likelihood estimation provides a better approach. Generating the various goodness-of-fit indexes is sometimes difficult when using secondary data analysis within a meta-analysis (Allen, 2017; Shadish, 1996).
It is important to note that any primary data analytic tool can be used in a meta-analysis to address multiple sources of variability (Fried, Shirom, Gilboa, & Cooper, 2008). The challenge is finding a means of generating the equivalent test when additional multiple sources of variability exist (second-order sampling error, moderator variables, differential measurement, and artifact existence). While meta-analysis presents some unique opportunities, the lack of universal data configurations often means that the procedures require some degree of modification or adaptation to the unique circumstances of the existing data (Jak, 2015).
Developing Meta-Analysis as a Guide for Instructional Content
One of the most underrated and long-term applications of meta-analysis is in the field of education. The content of textbooks and instruction in the sciences presumes that research conclusions change instruction to reflect the improved state of knowledge. One impact of meta-analysis is assessing the consistency from textbook to textbook on empirical facts. The implications for textbook content become obvious in terms of assuring accuracy and consistency. Several meta-analyses have found widespread inconsistency, largely because most textbooks simply select one study out of many to represent a finding (Allen & Preiss, 1990, 1998, 2002, 2006; Allen, Preiss, & Burrell, 2007). The impact of meta-analysis over time should be to standardize or create consistent explanations across textbooks. Published research collections already exist that accumulate meta-analyses (Allen, Preiss, Gayle, & Burrell, 2002; Burrell, Allen, Gayle, & Preiss, 2014; Gayle, Preiss, Burrell, & Allen, 2006).
The impact of meta-analysis becomes the eventual establishment of a foundational knowledge about pedagogical practice. The acceptance of claims as empirically true, supported by available data, begins the process of establishing conclusions that become represented as facts. One recent implication of meta-analysis becomes the movement of the social sciences into a process that ultimately represents an empirically verifiable and replicable set of claims with a long history of verified empirical testing. The recent development of the “scholarship of teaching and learning” directly involves meta-analysis as a means of improving instruction. Meta-analysis is slowly emerging as the preferred tool for the formulation and evaluation of educational approaches. Combined with its acceptance as the authoritative approach to data accumulation, the link that meta-analysis provides between the content knowledge of a subject and the evidence for best practice probably represents the most recent and important innovation.
Limitations of Meta-Analysis
Ethnographic Trap
The term “ethnographic trap” describes an error in understanding relating to the application of meta-analysis findings. Meta-analysis usually operates at the conceptual or construct level for variables. For example, meta-analyses of fear appeals indicate that high-fear messages are more persuasive than messages using low-fear appeals. However, the meta-analysis fails to provide information on how to generate fear, so the direction for how to take advantage of the results remains limited. No amount of statistical information overcomes or solves that particular limitation.
Different audiences react to the same fear message content differently. For example, older persons may react more strongly to a message dealing with a threat to social security or Medicare benefits. Not surprisingly, younger persons experience less fearful reactions about these benefits but react with greater anxiety about messages dealing with college tuition increases or threats to self-esteem. Gender also plays a role; men are not generally terrified of breast cancer risks and women usually do not respond strongly to messages about prostate cancer risks. Application of the general finding requires a knowledge of the language community and the different orientations they demonstrate.
Limited Sample/Stimulus Variability
Part of the challenge of any meta-analysis becomes the adequacy and variability of the available data pool. Studies may exist in large numbers but fail to contain enough variability to permit adequate generalization to a wide variety of circumstances and contexts. In most cases, the question is not the adequacy of the available data to provide an accurate estimate. Instead the question is whether the sample included in the meta-analysis contains sufficient data to permit generalization. Consider the meta-analysis by Dindia and Allen (1992) that reports whether men or women provide higher levels of self-disclosure. With over 200 investigations and more than 23,000 combined participants, the data pool still remains limited. The authors note in the conclusion that all data were collected in North America so that no studies incorporated participants from other continents and other cultures. The question becomes whether any conclusion about gender differences becomes generalizable given these limitations. If cultural influences exist, then the lack of data from Asia or Africa represents a serious limitation to the findings. The authors of any meta-analysis become limited by the available data in terms of generalizing the results. The meta-analysis may be 100% accurate but fail to generalize given the lack of variability in the available sample.
What the meta-analysis should provide is a commentary on the relevant limitations. The function of meta-analysis in this case becomes an agenda-setting function by pointing out how to design the 201st study to advance the understanding of the issue (Eagly & Wood, 1994). The limitations discussion in a meta-analysis should immediately point to the most fruitful way to generate the next set of studies. The most valuable contribution becomes the ongoing ability to update and direct ongoing research efforts. The limitations of meta-analysis, like this one, serve as scientific and methodological innovations because meta-analysis can directly provide a justification for additional empirical evidence from a survey of the entire literature.
Unequal Sample Size Distributions
Part of the problem with results becomes the question of the size of each data pool in subgroups. Unlike an experimental design with random assignment, the size of each of the groups may vary radically. The question becomes how to make holistic claims from a meta-analysis using very limited data pools. Recently, Bloch (2014) recognized that many of the individual estimates within a meta-analysis rely on limited samples, which restricts the certainty of the claims.
Consider the Allen et al. (1989) meta-analysis comparing the impact of various therapies on reductions in public speaking anxiety. There exist seven potential therapies: (a) skills training, (b) cognitive modification, (c) systematic desensitization, (d) combination of skills training and cognitive modification, (e) combination of skills training and systematic desensitization, (f) combination of cognitive modification and systematic desensitization, and (g) a combination of all three therapies. When examining their Table 2 on page 61, which summarizes the sample size for each group, we see the following: (a) 3,516, (b) 382, (c) 1,499, (d) 246, (e) 1,299, (f) 142, and (g) 20. The problem is that some of the cells, particularly (g), have a very small sample size with a great deal of sampling error for that particular effect (triple combination), while other combinations also demonstrate small sample sizes (142, 246). Thus, even though the report includes 97 manuscripts and combines 169 separate experiments, the combined sample size for some of the elements of the underlying analysis remains small. Unlike an investigation where random assignment remains an option, meta-analysis can only report existing investigations because the distribution operates beyond the investigator’s control.
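To give a rough sense of how much sampling error these cell sizes imply, the sketch below takes the seven sample sizes quoted above and computes the half-width of an approximate 95% confidence interval on the Fisher z scale, 1.96 / sqrt(n - 3). Two simplifying assumptions are made for illustration only: each cell is treated as a single pooled sample, and the effect is treated as a correlation. The original report may have used a different metric and weighting, so the values indicate relative precision rather than reproduce any published interval.

```python
import math

# Combined sample sizes per therapy cell, as quoted in the text.
cell_sizes = {
    "skills training": 3516,
    "cognitive modification": 382,
    "systematic desensitization": 1499,
    "skills + cognitive": 246,
    "skills + desensitization": 1299,
    "cognitive + desensitization": 142,
    "all three": 20,
}

def ci_half_width(n, z_crit=1.96):
    """Half-width of an approximate 95% CI for a Fisher z value from n cases.

    Assumes one pooled sample of size n (an illustrative simplification).
    """
    return z_crit / math.sqrt(n - 3)

half_widths = {name: ci_half_width(n) for name, n in cell_sizes.items()}

# List the cells from least to most precise.
for name, width in sorted(half_widths.items(), key=lambda item: -item[1]):
    print(f"{name:<28} n = {cell_sizes[name]:>5}  half-width = {width:.3f}")
```

Under these assumptions the interval for the triple combination is more than ten times wider than the interval for skills training alone, which shows concretely why claims resting on that cell deserve caution.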
The obvious implication of this limitation is the need to provide direction on how to design the next investigation. Any theoretical configuration or argument claiming support from a meta-analysis may have some elements resting on a very shaky or limited foundation. Adding meaningfully to the data pool that already contains 169 separate investigations is the goal. The meta-analysis provides clarity and identifies the area of greatest impact for a future investigation. Systematic and articulated reviews provide a focus that carefully circumscribes the limitations of particular claims, something not typically found in traditional reviews. Focusing in on each claim and the justification for each separate claim provides a serious innovation in understanding the existing literature.
Alternative Theoretical Interpretations
Any new theory becomes constrained because it must account for new facts. Even when operating within the confines of one theory, any data or set of data points consistent with one theory may remain consistent with alternative theories including those generated in the future. Rather than viewing the results of any meta-analysis as “proof” for a theory, the meta-analysis should be viewed more as demonstrating consistency with the theory. Since the number of potential alternatives remains infinite, the choice for accepting a particular explanation represents a combination of parsimony, history, and popular preference.
The key is that any theory must provide a means of fidelity such that it explains or accounts for the data. A theory with a fundamental tenet or assumption of empirical reality that is widely inconsistent with existing data, summarized by a meta-analysis, becomes difficult to accept (meta-analysis can directly test and compare theoretical models; see Ruppel et al., 2017; Song et al., 2014). However, establishing an empirical claim with a great deal of data does not necessarily validate a particular theoretical position. Instead, the argument is that a theoretical position whose formulation is consistent with empirical data provides one possible explanation for the existing data. Essentially, theoretical consistency with existing data constitutes a necessary condition for acceptance but may not constitute a sufficient condition. Alternative formulations may prove better, more heuristic, simpler, or of much greater utility as explanations for empirical reality.
The upshot of this thinking is remaining open to the existence and the development of alternatives that would account for any given finding. The dynamic view of the interplay between data and theory represents an ongoing need for reassessment and the mutual influence of theory and data on each other. Meta-analysis operates as a limiting condition for future theories. Formulations must be consistent with existing accumulations.
Conclusion
One challenge to understanding meta-analysis becomes the ability to accept innovative thinking about statistical relations and procedures. A surprising feature of the technique is that meta-analysis takes the scholar closer to the origins of statistical analysis and assumptions underlying the process of science. The process of meta-analysis takes the scientist back to the future by reminding the community of the original premises that guided the formation of the statistical process.
Many of the underlying statistical assumptions are found in the work of statisticians from about 1895 to about 1940, after which the development of the computer and factor analysis distracted or changed the social sciences. The focus in the philosophy of science on alternatives to more traditional views created another distraction and an attempt to replace the underlying scientific foundations. The failure to fully recognize the impact of both Type II error and various artifacts in measurement and design contributed to a kind of depression when consistent findings across empirical investigations failed to emerge. The inconsistency among results generated frustration and disillusionment with the ability of social science to generate results that could be treated as facts. The inconsistency contributed to a focus on methodological artifact as an explanation as well as situational, contextual, and individual differences (a form of intersectionality) as a means of explaining the inconsistent outcomes.
The problem simply was that the inconsistency among findings really has three sources: (a) systematic and differential artifact effects, (b) moderating and mediating influences of variables, and (c) random sampling error. By working to eliminate the artifact effect and reduce sampling error, meta-analysis can reveal whether any averages really are affected by moderating/mediating influences. Without the elimination of artifact and random error prior to such examinations, the entire process lacks the ability to create a foundation capable of success.
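One common way to separate random sampling error from the other sources is a bare-bones variance decomposition: compare the observed variance of the correlations with the variance expected from sampling error alone, and treat only the remainder as a candidate for moderator or artifact explanations. The sketch below is a minimal illustration under stated assumptions. The data are hypothetical, the function name is invented for the example, the expected sampling variance uses the familiar large-sample approximation (1 - r̄²)² / (n̄ - 1), and no correction for measurement artifacts is attempted.

```python
def variance_decomposition(studies):
    """Split the observed variance in correlations into sampling error and residual.

    studies: list of (r, n) pairs.
    Returns (mean_r, observed_var, sampling_var, residual_var).
    """
    total_n = sum(n for _, n in studies)
    mean_n = total_n / len(studies)
    # Sample-size-weighted mean correlation.
    mean_r = sum(r * n for r, n in studies) / total_n
    # Sample-size-weighted variance of the observed correlations.
    observed_var = sum(n * (r - mean_r) ** 2 for r, n in studies) / total_n
    # Variance expected from random sampling error alone (approximation).
    sampling_var = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    # Whatever is left over may reflect moderators or uncorrected artifacts.
    residual_var = max(observed_var - sampling_var, 0.0)
    return mean_r, observed_var, sampling_var, residual_var

# Hypothetical small studies whose results look inconsistent at first glance.
studies = [(0.10, 50), (0.35, 50), (0.22, 50), (0.28, 50), (0.05, 50)]

mean_r, observed_var, sampling_var, residual_var = variance_decomposition(studies)

print(f"mean r            = {mean_r:.3f}")
print(f"observed variance = {observed_var:.5f}")
print(f"sampling variance = {sampling_var:.5f}")
print(f"residual variance = {residual_var:.5f}")
```

In this illustration the correlations range from .05 to .35, yet the variance expected from sampling error with 50 participants per study exceeds the observed variance, so nothing remains for a moderator to explain. A nonzero residual would instead justify the search for moderating or mediating influences.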
The impact of meta-analysis becomes less about the statistical and computational procedures (which provide a variety of options), and more about looking across studies and comparing results. Meta-analysis allows the scientist to work toward the long-term generation of conclusions that can be treated as fact (Miller & Pollock, 1994). Meta-analysis simply makes the world of empirical results less random. Dubin (1978) defines a theory as something that organizes variables, taking the random world and creating conceptualizations that identify processes and specify relationships to explain the connections between elements. Without the use of meta-analysis to eliminate random sampling error and identify the systematic influence of artifact, scholarly efforts at identifying and organizing the world to create meaningful connections simply become a “fool’s errand.” Meta-analysis does not create theoretical thinking. It simply makes theoretical thinking capable of being tested. Meta-analysis raises the stakes of social science efforts because now facts are at stake.
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
