What If We Were Texas Sharpshooters? Predictor Reporting Bias in Regression Analysis

Abstract

The author analyzes reporting biases in regression analyses. The consequences of researchers’ strategy to select significant predictors and omit nonsignificant predictors from regression analyses are examined, focusing on how this strategy—labeled the Texas sharpshooter (TS) approach—creates a predictor reporting bias (PRB) in primary studies and research syntheses. PRB was demonstrated in simulation studies when correlation coefficients from several primary regression studies with an underlying TS approach were aggregated in meta-analyses. Several important findings are noted. First, meta-analytical effect sizes of true effects can be overestimated because smaller, nonsignificant findings are omitted from regression models. Second, suppression effects of correlated predictor variables create biased effect size estimations for variables that are not related to the outcome. Finally, existing small effects are concealed, and between-study heterogeneity can be overestimated. Results show that PRB is contingent on sample size. While PRB is substantial in studies with small sample sizes (N < 100), it is negligible when large sample sizes (N > 500) are analyzed. Preconditions and remedies for reporting biases in regression analyses are discussed.

Keywords

reporting bias, meta-analysis, regression, funnel plots, publication bias

The fabled Texas sharpshooter fires a shotgun at a barn, then paints the target around the most significant cluster of bullet holes in the wall. Based on this scenario, the Texas sharpshooter fallacy describes a false conclusion that occurs whenever ex-post explanations are presented to interpret a random cluster in some data (Gawande, 1999). In science, this happens whenever researchers define their hypotheses after analyzing the data. These researchers can be fooled by randomness and suggest cause-and-effect relationships that might have occurred purely by chance (Gilovich, Vallone, & Tversky, 1985; Milloy, 1995). This approach, sometimes referred to as HARKing (i.e., hypothesizing after results are known), number crunching, selective reporting, omitting loser hypotheses, data dredging, grubbing, or fishing expedition, has often been discussed as a potential bias in empirical research in the social sciences and their related fields (Kerr, 1998; Leamer, 1983; Selvin & Stuart, 1966; Turner, Matthews, Linardatos, Tell, & Rosenthal, 2008). Despite this criticism, experts have found it to be a common occurrence in contemporary research. For example, Kerr (1998) found that, among behavioral researchers from different fields of psychology, certain forms of HARKing were as common as the prescribed hypothetico-deductive approach. Similarly, Bedeian, Taylor, and Miller (2010) surveyed management researchers and reported that the development of hypotheses after results are known was the most prevalent form of research misconduct in their study. Although considerable consensus exists among researchers that the conduct of science prescribes a hypothetico-deductive approach for quantitative empirical research, literature examining whether the Texas sharpshooter approach creates a bias in published research findings is lacking.

Previous research has analyzed different forms of reporting bias that occur when the publication and diffusion of study results are contingent on the nature and direction of the results (McGauran et al., 2010; Sterne et al., 2011). Among the different forms of reporting biases, outcome reporting bias and publication bias have gained the most attention (McGauran et al., 2010; Song et al., 2010). Outcome reporting bias occurs when researchers report a selection of statistically significant outcomes within published studies, but omit other nonsignificant outcomes (Copas & Shi, 2001; Hutton & Williamson, 2000; Williamson, Gamble, Altman, & Hutton, 2005). For example, researchers might measure an outcome at two points in time, but report only the one that is significant in their final analyses.

Whereas outcome reporting bias is the selective nonreporting of one or more outcome variables in a study, publication bias is the nonreporting of an entire study (Song et al., 2010). It emerges when the publication probability of a study is contingent on the statistical significance of its results (Dickersin, Min, & Meinert, 1992; Ioannidis, 2005; Scargle, 2000; Stern & Simes, 1997). If most significant results are published, while at the same time a greater number of nonsignificant findings disappear in researchers’ file drawers, an overrepresentation of significant findings in published articles will emerge. Consequently, effect sizes could be overestimated in the literature (Chalmers et al., 1987; Villar, Carroli, & Belizán, 1995). In a recent attempt to assess this file drawer effect in organizational behavior, human resource management, and related fields, Dalton, Aguinis, Dalton, Bosco, and Pierce (2012) compared the number of significant and nonsignificant findings in correlation matrixes that were found in a large number of nonexperimental studies from these fields. They reported a very similar ratio of significant correlations in published and nonpublished studies, concluding that “the file drawer problem does not produce an inflation bias and does not pose a serious threat to the validity of meta-analytically derived conclusions” (p. 222). However, the authors failed to consider that a great body of studies in organizational research and related areas employs beta coefficients from multivariate regression analyses—but not bivariate correlations—for hypothesis testing (Chatterjee & Hadi, 2006, p. 3; Cohen, Cohen, West, & Aiken, 2003, p. 2). In fact, evidence suggests that meta-analyses in management research mostly use effect sizes that were also part of multivariate models in the primary studies (e.g., regression analysis and structural equation modeling; Aguinis, Dalton, Bosco, Pierce, & Dalton, 2011).

Outcome reporting bias and publication bias are well understood; however, the effect of predictor selection in multivariate settings has yet to be analyzed. Reporting bias in general is seen as a serious threat to the validity of findings in scientific discourse but is nonetheless an underresearched phenomenon in management research (Kepes, Banks, McDaniel, & Whetzel, 2012). Based on this research gap, the purpose of this article is to explore reporting biases generated by selective predictor reporting in regression analyses. I simulate the Texas sharpshooter (TS) approach—a predictor selection procedure based on statistical significance in regression analyses—to examine whether and how this approach biases research findings. The study is based on the following connected assumptions: (a) primary researchers analyze data in regression analyses, (b) they tend to drop predictor variables that are not significant, (c) hence zero-order correlations are not reported for those predictors that were dropped from regression analyses, and (d) zero-order correlations from many primary regression studies are later aggregated in meta-analyses. Using simulation, I demonstrate that this selection of the most important (i.e., significant) beta coefficients by discarding nonsignificant predictors in primary regression studies carries over to correlation coefficients in meta-analyses and produces spurious effects or substantive misestimations of true effects that might, ultimately, threaten the validity of meta-analytical findings.

I first discuss previous research on reporting biases. These considerations are the necessary foundation for the detailed examination of the suggested predictor reporting bias (PRB). I then show theoretically and by means of Monte Carlo simulations that researchers’ TS approach results in PRB. In the general discussion, I review statistical preconditions of PRB, discuss limitations, and present remedies.

Evidence and Consequences of Reporting Bias

According to Sterne et al. (2011), “Reporting biases arise when the dissemination of research findings is influenced by the nature and direction of results” (p. 303). Reporting bias can take various forms, depending on what the findings affect: publication or nonpublication (publication bias), the selection of reported outcomes (outcome reporting bias), speed of publication (time lag bias), multiple publication of research findings (multiple publication bias), publication probability in journals that are easy to access (location bias), publication in a particular language (language bias), or the number of citations (citation bias; McGauran et al., 2010; Sterne, Egger, & Moher, 2008; Sterne et al., 2011).

These different forms of reporting bias can be either on a study level or on an outcome level (Kepes et al., 2012; McGauran et al., 2010). Reporting bias on a study level affects entire studies, which includes, for example, publication bias and language bias. On an outcome level, biases might occur because of selective reporting of outcomes in a study (outcome reporting bias) or selective reporting of analyses (McGauran et al., 2010). The purpose of this article is an analysis of researchers’ strategy to perform several analyses and then choose the one with most significant findings for publication. This article thus explores a form of selective reporting of analyses. While the literature has acknowledged selective reporting per se as a potential problem in empirical research, this specific form of selective reporting in regression analyses has not been researched before.

Note that the above definition considers publication bias as one form of reporting bias; there are other conceptualizations that view publication bias as the superordinate concept. Then, outcome reporting bias, for example, is a form of publication bias (e.g., Kepes et al., 2012; Rothstein, Sutton, & Borenstein, 2005). I use the former conceptualization, because it better allows me to distinguish between researchers’ behavior when reporting results and the subsequent process of (non)publication and dissemination of their findings. The term publication bias, on the contrary, implies that potential biases arise only in the publication process.

The presence of reporting bias is difficult to show for single studies. However, for research synthesis (e.g., meta-analyses), previous research indicates the presence of some type of reporting bias in published research, mostly focusing on publication bias (Easterbrook, Berlin, Gopalan, & Matthews, 1991; Kraemer, Gardner, Brooks, & Yesavage, 1998; Lipsey & Wilson, 1993; Rothstein et al., 2005; Song et al., 2010; Sutton, Duval, Tweedie, Abrams, & Jones, 2000; Turner et al., 2008) or outcome reporting bias (Hahn, Williamson, & Hutton, 2002; Simmons, Nelson, & Simonsohn, 2011).

The most common tool in meta-analyses to inspect reporting bias is a funnel plot, which plots effect sizes against sample size or standard error (Aguinis, Pierce, Bosco, Dalton, & Dalton, 2011; Egger, Smith, Schneider, & Minder, 1997; Light & Pillemer, 1984). The rationale is that, in smaller studies, effect sizes must be large to become significant. If only significant findings are published, but nonsignificant findings disappear in the researchers’ file drawers, an asymmetry occurs in the funnel plot. Let us, for example, analyze the relationship between two measured phenomena (X and Y) that have a positive linear relationship. The sample size in the studies varies between 50 and 500, while the correlation is set to r_XY = .10, representing a small effect size (Cohen, 1988). Figure 1 depicts a funnel plot based on 100 simulated studies (all simulations were coded in the R language for statistical computing; R Development Core Team, 2008; the code is available from the author on request).

Figure 1. A simulated example of publication bias.

Following Sterne and Egger’s (2001) recommendations, the vertical axis depicts each study’s standard error because the plot then corresponds to a symmetrical funnel if no bias exists in the data. Note that the largest studies have the smallest standard errors; thus, the vertical axis is inverted with a standard error of zero at the top (Sterne & Egger, 2001). Each dot represents one study. The gray dots show studies with a nonsignificant finding while the black dots indicate that the correlation is significant in that study when alpha is set to 5% (two-tailed). The dashed lines mark the region where study findings are not significant at the 5% level (i.e., pseudo–confidence level at 95%). If all dots are taken into account, the typical pyramid shape evidenced is characteristic of an unbiased set of studies in a funnel plot. However, in an extreme case of publication bias, only studies with significant findings are published, and the gray dots do not appear in the literature. An inspection of only the black dots in the scatter plot clearly shows an asymmetry because it seems that larger studies (i.e., those with small standard errors) exhibit smaller correlation coefficients. This asymmetry is an indicator of reporting bias (for other reasons of funnel plot asymmetry, see Egger et al., 1997; Sterne et al., 2011).

A meta-analysis that does not account for these effects and includes only published findings will yield an average effect size that is larger than the true underlying effect of r_XY = .10. In this example, the unweighted mean effect size is r = .17 for published studies (i.e., black dots) and r = .09 if all studies are included. The difference between these two effect sizes reflects the extent of reporting bias.
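The selection mechanism behind this example can be sketched in a few lines. The following Python sketch is not the author's R code; it assumes a Fisher z test for significance, an arbitrary seed, and hypothetical function names (`simulate_r`, `is_significant`, `publication_bias_demo`). It simulates 100 studies with sample sizes between 50 and 500 and a true correlation of .10, then compares the unweighted mean correlation of all studies with that of the significant ("published") studies only. The exact means will differ from the .17 and .09 reported above because the random draws differ.

```python
import math
import random
from statistics import NormalDist, mean


def simulate_r(rng, n, rho):
    """Sample correlation from n bivariate-normal cases with true correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, alpha=0.05):
    """Two-tailed Fisher z test (an assumption; the paper does not name its test)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def publication_bias_demo(k=100, rho=0.10, seed=2008):
    """Return (mean r of all studies, mean r of significant studies, number significant)."""
    rng = random.Random(seed)
    all_r, published_r = [], []
    for _ in range(k):
        n = rng.randint(50, 500)  # discrete uniform sample size
        r = simulate_r(rng, n, rho)
        all_r.append(r)
        if is_significant(r, n):
            published_r.append(r)
    return mean(all_r), mean(published_r), len(published_r)
```

Calling `publication_bias_demo()` returns the two means and the number of significant studies; the gap between the means is the reporting bias described in the text.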

Statistical tools for detecting funnel plot asymmetry are based on the idea that the studies’ effect sizes are associated with standard errors or sample sizes. For example, a regression line fitted in the subset of black dots yields a positive slope coefficient for the standard errors (Egger et al., 1997) because higher standard errors are associated with higher correlation coefficients (z = 3.72, p < .001). This is an indicator of reporting bias, as there should be no significant relationship in an unbiased set of studies. Accordingly, no significant relationship is found in the complete set of studies (black and gray dots in Figure 1; z = –0.01, ns).

The next section builds on this link among sample size, effect size, and statistical significance to describe a mechanism in multiple regression analysis that produces biased effect size estimations.

Predictor Reporting Bias in Regression Analyses

Multivariate regression analyses offer researchers some options when choosing the “best” statistical model. Recently, Simmons et al. (2011) explicitly modeled researchers’ degrees of freedom in the analysis and reporting of statistical results in psychological research. They demonstrated that the number of false positives nearly doubled when researchers chose only significant findings in studies with two dependent variables (i.e., outcome reporting bias). We can transfer this logic from outcome variables to predictors in regression analyses. Let us, for example, think of researchers who are interested in the effects of team diversity on team performance (for an overview on team diversity, see Horwitz & Horwitz, 2007; Shore et al., 2011). In a field study, they might gather data from a large number of teams. The subsequent regression analyses could reveal a significantly positive impact of gender diversity and functional diversity on team performance, but no effects of age diversity or other types of diversity. These researchers rank the publication probability highest for the link among gender diversity, functional diversity, and team performance and then present an article on these effects. They follow a TS approach, as other aspects of team diversity are discarded.

Meanwhile, other researchers might study team diversity and find effects in multiple regressions for different types of team diversity on team performance. Ultimately, these studies can be combined to gain reliable estimates of population effect sizes (Henson & Roberts, 2006; Hunter, Schmidt, & Jackson, 1982). To aggregate findings from many studies, meta-analyses have been suggested as the “near-perfect vehicle for disclosure and replicability” (Dalton & Dalton, 2008, p. 141) because their main goal is the estimation of an overall effect size as calculated from many studies (Hedges & Olkin, 1985). Yet how does the TS approach of single studies (e.g., on team diversity) translate into effect size estimations in meta-analytical findings? To apply a TS approach, at least two predictor variables are necessary, as the TS approach includes that—after omitting one or more variables from regression analyses—the (nonempty) adjusted statistical model is estimated. Multiple regressions include a mechanism that affects the choice of variables when the TS approach is applied, thereby systematically biasing research findings. I first describe this mechanism using two predictor variables as an example. The extent of this effect will subsequently be demonstrated in various Monte Carlo simulations.

The relationship between two predictor variables and their impact on the beta coefficient estimation of each other’s variable was defined by Cohen et al. (2003, p. 68),

$$\hat{\beta}_{1} = \frac{r_{x_1 y} - r_{x_2 y} \, r_{x_1 x_2}}{1 - r_{x_1 x_2}^{2}} \tag{1a}$$

and

$$\hat{\beta}_{2} = \frac{r_{x_2 y} - r_{x_1 y} \, r_{x_1 x_2}}{1 - r_{x_1 x_2}^{2}}, \tag{1b}$$

where $\hat{\beta}_{1}$ and $\hat{\beta}_{2}$ are beta coefficients for X₁ and X₂, respectively. Note that the regression coefficient $\hat{\beta}_{2}$ is negative whenever $r_{x_2 y} < r_{x_1 y} \, r_{x_1 x_2}$ and positive whenever $r_{x_2 y} > r_{x_1 y} \, r_{x_1 x_2}$ (Friedman & Wall, 2005). Let us assume that a researcher studies the impact of two predictor variables (X₁ and X₂) on an outcome (Y). Let us further assume that X₁ is a positive predictor ($r_{X_1 Y}$ = .30), X₁ and X₂ have a positive zero-order correlation ($r_{X_1 X_2}$ = .30), and no linear relationship exists between X₂ and Y ($r_{X_2 Y}$ = 0). Then, the expected standardized regression coefficient for X₂ in a linear regression with two predictors based on Equation 1b is

$$\hat{\beta}_{2} = \frac{0 - 0.3 \times 0.3}{1 - 0.3^{2}} \approx -0.10.$$
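The arithmetic in Equations 1a and 1b is easy to check numerically. The short Python sketch below uses a hypothetical helper name, `standardized_betas`, which is not from the original article. It evaluates both equations for the example values and makes it easy to try other correlation patterns.

```python
def standardized_betas(r_x1y, r_x2y, r_x1x2):
    """Expected standardized betas for two correlated predictors (Equations 1a, 1b)."""
    denom = 1 - r_x1x2 ** 2
    beta1 = (r_x1y - r_x2y * r_x1x2) / denom  # Equation 1a
    beta2 = (r_x2y - r_x1y * r_x1x2) / denom  # Equation 1b
    return beta1, beta2


# Example from the text: r_x1y = .30, r_x2y = 0, r_x1x2 = .30
beta1, beta2 = standardized_betas(0.30, 0.0, 0.30)
print(round(beta1, 3), round(beta2, 3))
```

For these inputs the expected coefficient of X₂ is about –0.10, whereas setting `r_x1x2` to zero returns the zero-order correlations unchanged.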

Although X₂ is not directly related to Y, the expected value of the regression coefficient (i.e., the partial relationship) is negative because of the potential suppression effect shown in Equations 1a and 1b. The inclusion of X₁ in the regression increases the predictive validity of X₂. Without X₁, the expected beta coefficient of X₂ would be zero. Hence, X₁ is a suppressor variable for X₂ in this setting (Conger, 1974). The beta coefficients become significant whenever the absolute value of the corresponding t value exceeds a critical value,

$$t = \frac{\hat{\beta}_{i} - \beta_{0}}{SE_{\hat{\beta}}}, \tag{2}$$

where $SE_{\hat{\beta}}$ is the standard error of the estimate of the regression coefficient. $\beta_{0}$ is the specified value that $\hat{\beta}$ is tested against and is mostly set to $\beta_{0} = 0$ to examine whether $\hat{\beta}_{i}$ significantly deviates from zero. Because of sampling error, a realization of the point estimate $\hat{\beta}_{2}$ in an empirical sample will vary around its expected value of $\hat{\beta}_{2} \approx -0.10$, with a standard deviation of $SE_{\hat{\beta}}$. For example, if $SE_{\hat{\beta}} = 0.07$, the absolute t value must be larger than 1.984 (α = .05, two-tailed; N = 100) to be statistically significant (Cohen et al., 2003, p. 86). Inserting these values in Equation 2 and solving for $\hat{\beta}_{i}$ shows that the regression coefficient is significantly positive whenever $\hat{\beta}_{2} > 0.14$ but significantly negative whenever $\hat{\beta}_{2} < -0.14$. As the expected $\hat{\beta}_{2}$ is –0.10, it is more likely that $\hat{\beta}_{2}$ will be significantly negative than significantly positive.
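To give a rough sense of how lopsided these two probabilities are, the following Python sketch treats the sampling distribution of the estimated coefficient as normal with mean –0.10 and standard deviation 0.07. The normal approximation and the function name `significance_odds` are assumptions made for illustration; the exact distribution is a scaled t distribution, so the numbers are indicative only.

```python
from statistics import NormalDist


def significance_odds(expected_beta=-0.10, se=0.07, t_crit=1.984):
    """Threshold from Equation 2 and approximate tail probabilities on each side."""
    threshold = t_crit * se  # |beta| must exceed this value to be significant
    sampling = NormalDist(mu=expected_beta, sigma=se)  # normal approximation
    p_negative = sampling.cdf(-threshold)  # P(beta-hat < -threshold)
    p_positive = 1 - sampling.cdf(threshold)  # P(beta-hat > +threshold)
    return threshold, p_negative, p_positive


threshold, p_neg, p_pos = significance_odds()
print(f"threshold={threshold:.3f}, P(sig. negative)={p_neg:.3f}, P(sig. positive)={p_pos:.5f}")
```

Under this approximation, a significantly negative coefficient occurs in somewhat less than 30% of samples, while a significantly positive one occurs in well under 1%.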

Researchers applying a TS approach will keep only significant predictors in the model. The probability of dropping X₂ because it is nonsignificant in a regression is lower for realizations that exhibit a negative correlation between X₂ and Y, as $\hat{\beta}_{2}$ is related to $r_{X_1 Y}$ (see Equation 1b). Ultimately, an aggregation of significant regression results will yield a negative mean effect size for X₂ with regard to Y, although the “true” underlying relationship is $r_{X_2 Y}$ = 0 (i.e., no relationship exists). This effect of the TS approach is key to understanding results from the following Monte Carlo simulations. If predictor variables in regressions are correlated, expected beta coefficients of variables with no direct relationship with the outcome can differ from zero. Although estimated correlation coefficients are dispersed around the true correlation coefficient, published correlation coefficients will mainly be negative in the chosen setting because a negative correlation coefficient has a greater chance of becoming significant in a multiple regression. As such, a meta-analysis might then produce a negative effect for X₂ on the outcome despite the fact that no underlying relationship exists (i.e., $r_{X_2 Y}$ = 0).

The following Monte Carlo simulations show the potential impact of a TS approach on aggregated research findings in regression analyses. I start with a simple example that only includes two predictor variables in a regression and subsequent meta-analyses (Simulation Study 1). The second set of simulations employs a more complex regression model with eight predictor variables and compares results from different forms of a TS approach (Simulation Study 2). The third set of simulations (Simulation Study 3) tests whether the most common tools for detecting publication bias are able to detect biases in meta-analyses when primary researchers apply a TS approach. Simulation Study 4 shows results when relevant simulation parameters (e.g., sample size and predictor constellations) are systematically varied. Finally, Simulation Study 5 reports cumulative findings to simulate shifts in effect sizes over time.

Simulation Study 1: A Simple Example of the Predictor Reporting Bias

The TS approach describes a researcher’s strategy of dropping predictors with nonsignificant findings in multivariate analyses. This selection of predictor variables might create a bias in published research findings, which is here labeled PRB. PRB is defined as the extent to which published research findings deviate from true relationships due to predictor selection in regression analyses. In the following simulations, I analyze the degree to which researchers’ TS approach creates PRB.

The setup for the simulations is shown in Figure 2. On the left-hand side, the general research process with an underlying TS approach is outlined. The right-hand side depicts the corresponding simulation process used in later Monte Carlo simulations. In the first step of the research process, primary researchers collect data in one research domain. The result is a data matrix that contains data from all assessed variables. Second, the TS approach suggests that primary researchers analyze data and drop those variables that do not yield significant results in regression analyses. They try out several regression models before choosing the most promising one (e.g., the model with the highest rate of significant findings). Third, primary researchers report correlation matrixes in their publications, but some variables are not included because they were nonsignificant in the previous regression analyses. For example, if eight predictor variables are assessed but only five have significant beta coefficients, the other three variables do not appear in the regression model and, hence, are also dropped from the published correlation matrix. Fourth, meta-analytic researchers use correlation coefficients from primary studies. However, in their meta-analyses, they can analyze only the correlation coefficients that were reported in primary studies. The estimated meta-analytic measure of effect size is based only on published correlation coefficients. Thus, the difference between the estimated effect size in the meta-analysis and the true underlying relationship indicates the extent of PRB.

Figure 2. Flow chart of a research process following the TS approach (left) and the corresponding procedure in Monte Carlo simulations (right).

This research process was translated into the following settings in Monte Carlo simulations (see the right-hand side of Figure 2). First, true relationships had to be defined. These are unknown in empirical research because the ultimate goal of any empirical study is a precise estimation of these relationships and their causal structure. Next, N cases were generated for all predictor variables and outcome Y, based on the defined relationships. For the first simple example, the simulation was based on the previously described settings with $r_{X_1 Y} = r_{X_1 X_2} = .30$ and $r_{X_2 Y} = 0$. All variables were normally distributed (X₁, X₂, Y ∼ N[0, 1]). Sample size N varied between 50 and 100, following a discrete uniform distribution (i.e., all sample sizes between 50 and 100 have the same probability).

In Step 2, Y was regressed on all predictors in an ordinary least squares (OLS) regression, and ${\hat{β}}_{1}$ and ${\hat{β}}_{2}$ were estimated for the predictors X₁ and X₂, respectively. Simulating a TS approach, the predictor with the highest nonsignificant p value (i.e., the “least significant”) was removed from the regression, and a new regression without that predictor was estimated. This step was repeated until only significant predictors remained in the regression (as indicated by the loop in Step 2 in Figure 2). In other words, it was simulated that a primary researcher started with all predictors and then successively dropped nonsignificant predictors until only significant predictors remained in the statistical model.

Third, the results of the full model with all predictors and the model from the TS approach were saved for later comparison. A total of k full and k TS matrices was generated in this way, where k is the number of simulated studies conducted by hypothetical researchers applying a TS approach. In a last step, effect sizes for the predictor variables on Y were estimated in meta-analyses for the set of full models and the corresponding set of models from the TS approach. Following common practice in management research, effect sizes in meta-analyses were based on correlation coefficients and not beta coefficients from multiple regressions (Aguinis, Dalton, et al., 2011; Dalton & Dalton, 2008; Hunter & Schmidt, 2004).
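The procedure described above can be sketched compactly for the two-predictor case. The Python code below is an illustration under stated assumptions, not the author's R implementation: the critical t value is approximated with a Cornish-Fisher expansion instead of an exact t quantile, the meta-analytic estimate is a simple sample-size-weighted mean correlation rather than the full Hunter and Schmidt procedure in metafor, and all function names are hypothetical.

```python
import math
import random


def t_crit(df, z=1.959964):
    """Approximate two-tailed 5% critical t value (Cornish-Fisher expansion)."""
    return z + (z**3 + z) / (4 * df) + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2)


def corr(a, b):
    """Pearson correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def draw_study(rng, n):
    """Sample correlations (r_x1y, r_x2y, r_x1x2) from one simulated study."""
    b = -0.09 / math.sqrt(0.91)      # makes the population r_x2y exactly 0
    c = math.sqrt(1 - 0.09 - b * b)  # keeps Var(Y) = 1
    x1, x2, y = [], [], []
    for _ in range(n):
        z1, z2, z3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x1.append(z1)
        x2.append(0.3 * z1 + math.sqrt(0.91) * z2)  # population r_x1x2 = .30
        y.append(0.3 * z1 + b * z2 + c * z3)        # population r_x1y = .30
    return corr(x1, y), corr(x2, y), corr(x1, x2)


def ts_select(r1y, r2y, r12, n):
    """Backward elimination on significance; returns the retained predictors."""
    denom = 1 - r12**2
    b1 = (r1y - r2y * r12) / denom  # Equation 1a
    b2 = (r2y - r1y * r12) / denom  # Equation 1b
    r_squared = b1 * r1y + b2 * r2y
    se = math.sqrt((1 - r_squared) / ((n - 3) * denom))  # same SE for both betas
    t_abs = {1: abs(b1) / se, 2: abs(b2) / se}
    weakest = min(t_abs, key=t_abs.get)  # smallest |t| = highest p value
    if t_abs[weakest] > t_crit(n - 3):
        return {1, 2}  # both significant: keep the full model
    keep = 2 if weakest == 1 else 1
    r = r1y if keep == 1 else r2y
    t_single = abs(r) * math.sqrt((n - 2) / (1 - r * r))  # simple regression
    return {keep} if t_single > t_crit(n - 2) else set()


def run_ts_meta(k=1000, seed=1):
    """Return {predictor: (studies retained, N-weighted mean r with Y)}."""
    rng = random.Random(seed)
    kept = {1: [], 2: []}  # per predictor: list of (n, r) from TS models
    for _ in range(k):
        n = rng.randint(50, 100)
        r1y, r2y, r12 = draw_study(rng, n)
        for j in ts_select(r1y, r2y, r12, n):
            kept[j].append((n, r1y if j == 1 else r2y))
    result = {}
    for j, pairs in kept.items():
        total_n = sum(n for n, _ in pairs)
        mean_r = sum(n * r for n, r in pairs) / total_n if pairs else float("nan")
        result[j] = (len(pairs), mean_r)
    return result
```

Calling `run_ts_meta()` returns, for each predictor, the number of simulated studies in which it survived the selection and the weighted mean of its reported zero-order correlations with Y. Replacing the `ts_select` call with the constant set `{1, 2}` gives the full-model benchmark.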

The top of Table 1 presents meta-analytic results for the full models; the complete set of correlation matrixes from all studies was aggregated in this first example. Heterogeneity between studies was estimated according to Hunter and Schmidt’s (2004) work using the metafor package (Viechtbauer, 2010) in the R software for statistical computing. Aggregating the correlation coefficients from all k = 100 studies, with a total sample size of 7,456, yielded estimated effect sizes of $\bar{r}_{X_1 Y}$ = .31 (p < .001) and $\bar{r}_{X_2 Y}$ = –.01 (ns). These estimates are very close to the true values that I defined for this simulation ($r_{X_1 Y}$ = .30 and $r_{X_2 Y}$ = 0). Thus, an aggregation of the full models reflected the underlying simulation settings very well. Not surprisingly, a meta-analysis that included all studies without any omitted variables gave very precise point estimates of the underlying “true” effects.

Table 1. Meta-Analytic Results for Full Models and Models From a TS Approach.

Predictor	k	N	True r	$\bar{r}$	95% CI Lower	95% CI Upper
Full models (k = 100)
X₁	100	7,456	.30	.31	.29	.33
X₂	100	7,456	.00	–.01	–.04	.01

TS approach ($k_{\max}$ = 100)
X₁	83	6,258	.30	.33	.31	.35
X₂	15	1,115	.00	–.17	–.23	–.11

TS approach ($k_{\max}$ = 5,000)
X₁	3,853	291,792	.30	.35	.347	.353
X₂	678	51,997	.00	–.15	–.16	–.14

Note: k = number of studies; N = total sample size; $\bar{r}$ = estimated effect size; 95% CI lower/upper = lower/upper 95% confidence interval; TS = Texas sharpshooter. Heterogeneity was estimated according to Hunter and Schmidt (2004). Fixed effect estimations and other heterogeneity estimators yielded similar results; for example, in a TS approach with a larger k (lower part of the table): $\bar{r}_{X_2 Y}$ (Hunter & Schmidt, 2004) = $\bar{r}_{X_2 Y}$ (fixed effect) = $\bar{r}_{X_2 Y}$ (restricted maximum likelihood) = $\bar{r}_{X_2 Y}$ (empirical Bayes) = –.151.

Results from the subset of models from the TS approach are shown in the middle of Table 1. For X₁, only 83 studies were included because X₁ was not significant in 17 of the simulated studies. Note that the effect size of X₁ was slightly overestimated ($\bar{r}_{X_1 Y}$ = .33, p < .001) because X₁ was discarded in 17 studies with a nonsignificant beta coefficient. This usually happens when the correlation between X₁ and Y is low, as the chance of a predictor becoming significant in a multivariate regression is contingent on the zero-order correlation between predictor and criterion. Thus, an aggregation of the remaining correlation coefficients will overestimate the effect.

The point estimate for the effect size of X₂ was $\bar{r}_{X_2 Y}$ = –.17 (p < .001), based on correlation coefficients from 15 studies. The true direct relationship between X₂ and Y was defined as zero. However, the average correlation coefficient that I obtained from studies following the TS approach can be classified as a small to medium negative effect (Cohen, 1988). This example of the consequences of a TS approach can be derived from Equations 1a and 1b. The expected beta coefficient of X₂ is negative, resulting in a higher likelihood of a significantly negative partial relationship between X₂ and Y in a regression analysis. If zero-order correlations are aggregated only from the selected (i.e., significant) relationships between X₂ and Y, the result is a negative effect size, although no true underlying effect exists. This effect remained similar in size when the number of studies was increased from 100 to 5,000 (see lower part of Table 1 with $k_{\max}$ = 5,000). In conclusion, two forms of PRB are present in this example. The first is an overestimation for X₁ ($\bar{r}_{X_1 Y}$ = .35), which is similar to the file drawer effect that would also occur in bivariate settings when only X₁ is measured. The second is a downwardly biased estimate for X₂ ($\bar{r}_{X_2 Y}$ = –.15). The second effect is contingent on the multivariate setting; that is, the consistently negative effect size for X₂ would disappear if X₁ was not part of the regression analyses.

This setting was relatively simple as it only included two predictor variables in a regression; however, most regression analyses use a larger set of variables. In the following simulation studies, I alter the simulations to arrive at more realistic settings that better reflect regression analyses in management research.

Simulation Study 2: Predictor Reporting Bias in a Simulated Research Domain

Let us first define a research domain in which preceding theoretical work has suggested a causal model with eight potential predictors (X₁, X₂, …, X₈) for an outcome variable (Y). Researchers are now interested in approximating the true relationships of each predictor; therefore, they start exploring the research domain in empirical studies. Although unknown to the hypothetical researchers, the true relationships are specified in the simulation settings. Let us assume that intercorrelations of r = .20 exist among predictor variables ($r_{X_i X_j}$ = .20). Furthermore, true correlation coefficients between $X_i$ and Y are set to .30, .20, .10, or .00 to reflect different strengths in the true relationships between predictors and outcome (see top of Table 2). It is now again of interest to examine how the true relationships are mirrored in aggregated research findings. Simulations were conducted following the TS approach outlined in Figure 2. Results from full models were omitted because they produced unbiased and precise estimates.
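With more than two predictors, Equations 1a and 1b generalize to the linear system $R_{xx}\,\beta = r_{xy}$, where $R_{xx}$ is the predictor intercorrelation matrix and $r_{xy}$ the vector of predictor-outcome correlations. This is the standard result for standardized regression coefficients. The Python sketch below solves this system for the settings just described, using a small Gaussian elimination routine so that no external library is needed. The variable and function names are illustrative and not taken from the article.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


# True correlations of X1..X8 with Y and the .20 intercorrelations among predictors
r_xy = [0.3, 0.3, 0.2, 0.2, 0.1, 0.1, 0.0, 0.0]
r_xx = [[1.0 if i == j else 0.2 for j in range(8)] for i in range(8)]

expected_betas = solve(r_xx, r_xy)
print([round(beta, 3) for beta in expected_betas])
```

Under these settings the expected standardized betas are .25 for X₁ and X₂, .125 for X₃ and X₄, zero for X₅ and X₆, and –.125 for X₇ and X₈. The suppression logic from the two-predictor example therefore carries over: predictors with no true relationship to Y have negative expected betas, and the weak true predictors have expected betas of zero.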

Table 2.

Meta-Analytic Results From Simulated Regressions With Eight Predictor Variables Following Different TS Approaches.

Simulation settings: $\begin{pmatrix} Y \\ X_{1} \\ X_{2} \\ X_{3} \\ X_{4} \\ X_{5} \\ X_{6} \\ X_{7} \\ X_{8} \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.3 & 0.3 & 0.2 & 0.2 & 0.1 & 0.1 & 0.0 & 0.0 \\ 0.3 & 1 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 \\ 0.3 & 0.2 & 1 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 \\ 0.2 & 0.2 & 0.2 & 1 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 \\ 0.2 & 0.2 & 0.2 & 0.2 & 1 & 0.2 & 0.2 & 0.2 & 0.2 \\ 0.1 & 0.2 & 0.2 & 0.2 & 0.2 & 1 & 0.2 & 0.2 & 0.2 \\ 0.1 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 & 1 & 0.2 & 0.2 \\ 0.0 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 & 1 & 0.2 \\ 0.0 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 & 1 \end{pmatrix}\right)$
	k	N	True r	$\overset{ˉ}{r}$	95% CI Lower	95% CI Upper
TS Approach 1a: Successively delete least significant effects (see Figure 2)
X₁	637	49,060	.30	.36	.351	.366
X₂	625	48,260	.30	.37	.358	.373
X₃	238	18,335	.20	.33	.318	.344
X₄	200	15,026	.20	.34	.321	.350
X₅	71	5,440	.10	.06	.004	.107
X₆	60	4,526	.10	.10	.045	.154
X₇	178	13,434	.00	–.13	–.150	–.116
X₈	175	13,467	.00	–.14	–.155	–.122
TS Approach 1b: Successively delete least significant effects with a set of fixed variables
X₁	1,000 (fixed)	74,599	.30	.31	.303	.316
X₂	543 (TS)	42,100	.30	.37	.362	.379
X₃	1,000 (fixed)	74,599	.20	.21	.200	.215
X₄	199 (TS)	15,042	.20	.32	.306	.335
X₅	1,000 (fixed)	74,599	.10	.10	.092	.106
X₆	46 (TS)	3,451	.10	.16	.091	.219
X₇	1,000 (fixed)	74,599	.00	.00	–.009	.007
X₈	184 (TS)	14,202	.00	–.14	–.155	–.123
TS Approach 2: Successively add predictors with highest shares of variance explained
X₁	595	45,690	.30	.36	.355	.371
X₂	606	46,888	.30	.36	.353	.368
X₃	219	16,903	.20	.31	.300	.327
X₄	241	18,180	.20	.33	.314	.340
X₅	71	5,177	.10	.16	.119	.209
X₆	74	5,345	.10	.16	.114	.207
X₇	162	12,598	.00	–.13	–.147	–.113
X₈	168	12,760	.00	–.14	–.153	–.119
TS Approach 3: Choose best model from all possible regressions
X₁	619	46,499	.30	.36	.356	.372
X₂	576	43,296	.30	.35	.346	.362
X₃	251	18,843	.20	.31	.295	.321
X₄	258	19,297	.20	.30	.292	.317
X₅	97	6,915	.10	.13	.098	.171
X₆	88	6,289	.10	.14	.094	.177
X₇	183	13,556	.00	–.12	–.137	–.104
X₈	227	16,856	.00	–.11	–.129	–.099

Note: k _max = 1,000 (number of simulated studies); r _XiXj = .20; studies’ sample sizes ranged from 50 to 100; k = number of studies used in meta-analysis; N = total sample size; True r = predetermined effect size; $\overset{ˉ}{r}$ = estimated effect size; 95% CI lower/upper = lower/upper 95% confidence interval; TS = Texas sharpshooter.

In TS Approach 1a, the researcher starts with the full set of predictors (i.e., Y is regressed on X₁ to X₈), removes nonsignificant effects stepwise from the regression analysis, and publishes the reduced set of significant predictors as well as the corresponding correlations. The upper part of Table 2 (labeled TS Approach 1a) shows the simulated meta-analytic findings based on k _max = 1,000 studies, with the studies’ sample sizes varying between 50 and 100. For X₁, the true relationship with the dependent variable was set to r _X1Y = .30. In 637 out of 1,000 studies, X₁ significantly predicted Y. The estimated effect size $\overset{ˉ}{r}$ was slightly above the true value ( ${\overset{ˉ}{r}}_{X1Y} = .36$ ). Settings for X₂ were identical to those for X₁, which yielded the same expected point estimates. The small differences between X₁ and X₂ stemmed from sampling error. For X₃ and X₄, the true correlation was set to r _X3Y = r _X4Y = .20; only 238 and 200 out of 1,000 studies showed significant results, and the estimated effect sizes $\overset{ˉ}{r}$ were .33 and .34, respectively. Note that this substantive overestimation of the true effect size occurred because only relatively large positive beta coefficients of X₃ and X₄ were significant.
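The stepwise deletion just described can be sketched in a few lines of Python. This is an illustration of the general idea rather than the original simulation code: the function names are hypothetical, NumPy is assumed, and, to avoid further dependencies, p values are based on a normal approximation to the t distribution, which is slightly liberal for sample sizes between 50 and 100.

```python
import math

import numpy as np


def slope_p_values(X, y):
    """Two-sided p values for the OLS slopes of y on X (intercept included).

    Assumption: a normal approximation replaces the t distribution.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - p - 1)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))
    z = coef[1:] / se[1:]
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])


def backward_ts(X, y, alpha=0.05):
    """Drop the least significant predictor until all remaining ones are significant."""
    kept = list(range(X.shape[1]))
    while kept:
        p = slope_p_values(X[:, kept], y)
        worst = int(np.argmax(p))
        if p[worst] < alpha:
            break  # every remaining predictor is significant
        kept.pop(worst)
    return kept  # column indices of the "publishable" predictors


# Toy data (hypothetical): only the first of three predictors is related to y.
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 3))
y = 3.0 * X[:, 0] + rng.standard_normal(80)
kept = backward_ts(X, y)
print(kept)
```

In a simulated meta-analysis, only the zero-order correlations of the predictors in `kept` would enter the aggregation for that study.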

However, in the case of X₅ and X₆, no positive bias occurred in effect size estimations because the expected beta coefficients for these two effects were reduced, as shown in Equations 1a and 1b for the two predictors. Consequently, some studies exhibited a significantly positive effect of X₅ and X₆ on Y while another set of studies indicated a significantly negative relationship. Finally, Table 2 gives results for X₇ and X₈, neither of which had a relationship with Y in the simulation settings (i.e., r _X7Y = r _X8Y = .00). Similar to the simpler setting in the previous section, the simulation again created a spurious relationship between the predictor and the outcome, with estimated effect sizes of –.13 and –.14 for X₇ and X₈, respectively.
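The expected beta coefficients behind this pattern can be computed directly from the simulation settings. The sketch below assumes that Equations 1a and 1b are the usual standardized two-predictor solutions and uses their standard matrix generalization, in which the vector of expected standardized betas equals the inverse predictor correlation matrix times the vector of predictor–outcome correlations. NumPy is assumed, and the variable names are illustrative.

```python
import numpy as np

# Settings of Simulation Study 2: true correlations with Y and r_XiXj = .20.
r_xy = np.array([0.3, 0.3, 0.2, 0.2, 0.1, 0.1, 0.0, 0.0])
R_xx = np.full((8, 8), 0.20)
np.fill_diagonal(R_xx, 1.0)

# Expected standardized betas: solve R_xx @ beta = r_xy.
beta = np.linalg.solve(R_xx, r_xy)

for i, (r, b) in enumerate(zip(r_xy, beta), start=1):
    print(f"X{i}: true r = {r:.2f}, expected beta = {b:.3f}")
```

Under these settings, the expected betas are .25 for X₁ and X₂, .125 for X₃ and X₄, approximately zero for X₅ and X₆, and –.125 for X₇ and X₈. This is in line with the results above: significant betas for X₅ and X₆ fall on both sides of zero, whereas those for X₇ and X₈ are predominantly negative.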

In sum, the simulation results showed that PRB manifested in two ways. First, existing effects were overestimated because smaller effects were dropped (X₁ to X₄ in the simulation). Second, correlations among predictor variables created spurious relationships where no effect existed (X₇ and X₈ in the simulation) and reduced existing small effects (e.g., for X₅ and X₆), because these predictors were correlated with variables that were more strongly associated with the outcome.

In this simulation setting, all eight predictors were subject to the TS approach. Kerr (1998) described this approach as “pure HARKing.” Yet in most research domains, a fixed set of control variables must be part of any regression to test for new effects, which might make it difficult to generalize the reported simulation results. The middle part of Table 2, labeled TS Approach 1b, shows results from simulations when four out of eight predictors are fixed parts of the regression analyses. These can be thought of as variables that have previously been shown to be important in that research domain. Hence, the TS approach is applied only to the remaining four variables. According to Kerr (1998), this approach is a weaker form of pure HARKing. As can be seen in Table 2, unbiased point estimates emerged for the fixed variables (i.e., X₁, X₃, X₅, and X₇). The biases for those variables that were allowed to be deleted if nonsignificant (i.e., X₂, X₄, X₆, and X₈) were very similar to the results obtained when all eight predictors could be removed (see TS Approach 1a in the upper part of Table 2). Thus, in this setting, a fixed set of predictors does not substantially alter the (mis)estimated effect sizes of those predictors that can be targeted by a TS approach.

These simulations were based on the TS approach depicted in Figure 2. In essence, it assumes that researchers start with a full set of predictors and successively remove predictors with the highest p values until only significant effects are left. Two other TS approaches were simulated to test whether PRB is contingent on this specific strategy. First, researchers might estimate regressions for the different predictors and initially only add the strongest predictor in the regression equation before successively adding the second strongest predictor, the third strongest predictor, and so forth to the set of predictors in the regression analysis, until no further significant predictor can be added. By following this alternative TS approach, researchers start with an empty set of predictors and then add predictors that explain a maximum of additional variance in the outcome. Results from this TS approach are shown in the lower part of Table 2, labeled TS Approach 2. Most important, significantly negative mean correlations for X₇ and X₈ also occurred in subsequent meta-analyses (r _X7Y = –.13 and r _X8Y = –.14). Overall, the extent of PRB was similar to the bias in TS Approach 1.

In yet another TS approach, researchers might test all the different combinations of predictor variables and then choose the set with the highest number of significant beta coefficients for publication. For example, with eight predictor variables, 2⁸ – 1 = 255 different nonempty sets of predictors exist. Following this TS approach, simulations were conducted in which the outcome was regressed on all possible sets of predictors. From this complete set of possible regressions, a subset was chosen that contained regressions with only significant predictors. This subset was further reduced by choosing the regression(s) with the maximum number of significant effects. If this subset contained more than one result, the regression explaining the highest percentage of variance in the outcome variable (i.e., the highest R ²) was chosen. Results from this third TS approach are shown at the bottom of Table 2 under the label TS Approach 3. Again, both forms of PRB were present. First, existing effects were overestimated. For example, the difference between the true effect and the estimated effect size ranged between Δr = .35 – .30 = .05 for X₂, and Δr = .31 – .20 = .11 for X₃. Beta coefficients that were not significant in the regression were discarded in the simulation, because they were too close to zero. Thus, the remaining correlation coefficients from studies with significant beta values were, on average, higher than the true effects. Second, for X₇ and X₈, negative relationships were found because of suppression effects (r _X7Y = –.12 and r _X8Y = –.11), although no true relationships existed. Here, the expected beta coefficients were negative (see Equations 1a and 1b). It follows that beta coefficients for X₇ and X₈ had a higher probability of becoming significantly negative than significantly positive. Thus, variables with negative beta coefficients (and thus mostly negative correlation coefficients) had a higher probability of being included in meta-analyses.
Note that the two forms of PRB can point in different directions. While the first form inflates existing effects, the direction of the second is contingent on the expected beta coefficient, which can be derived from the pattern of correlations among predictors and outcome.
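The selection rule of this third approach (only significant predictors, then the largest number of predictors, then the highest R²) can be sketched as follows. The code is a hedged illustration, not the original implementation: function names are hypothetical, NumPy is assumed, and p values again rely on a normal approximation to the t distribution.

```python
import itertools
import math

import numpy as np


def fit(X, y):
    """OLS with intercept; returns (p values of the slopes, R squared).

    Assumption: p values use a normal approximation to the t distribution.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - p - 1)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))[1:]
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in coef[1:] / se])
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return pvals, float(r2)


def all_subsets(n_predictors):
    """All nonempty subsets of predictor indices (2**n - 1 of them)."""
    idx = range(n_predictors)
    return [c for size in range(1, n_predictors + 1)
            for c in itertools.combinations(idx, size)]


def best_subset_ts(X, y, alpha=0.05):
    """Among models with only significant slopes, prefer more predictors, then higher R squared."""
    candidates = []
    for subset in all_subsets(X.shape[1]):
        pvals, r2 = fit(X[:, list(subset)], y)
        if np.all(pvals < alpha):
            candidates.append((len(subset), r2, subset))
    return max(candidates)[2] if candidates else ()


print(len(all_subsets(8)))  # number of nonempty subsets of eight predictors

# Toy data (hypothetical): two real predictors and one unrelated predictor.
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 3))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.standard_normal(80)
chosen = best_subset_ts(X, y)
print(chosen)
```

The exhaustive search grows exponentially with the number of predictors, which is unproblematic for eight predictors but quickly becomes costly for larger sets.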

Overall, these additional findings indicate that PRB is not an artifact of the TS approach outlined in Figure 2; indeed, it is also inherent in other approaches when nonsignificant predictors are dropped in regression analyses. Note that the following simulations are based on TS Approach 1 because it is the simplest approach and, as follows from Table 2, PRB is not substantially different for the three TS approaches (the stability of this finding was confirmed in further simulations that compared different TS approaches; details are available from the author).

Simulation Study 3: Testing Tools to Detect PRB in Regression Analyses

Funnel plots with an asymmetric shape are an indicator for the presence of reporting bias (Egger et al., 1997; Sterne et al., 2011). In this section, I explore whether common statistical tools are useful when PRB is present in regression analyses and look at indicators of between-study heterogeneity. The general settings for these simulations were identical to TS Approach 1a in Table 2. The mean number of primary-level effects used for meta-analyses in management research and related fields is k = 18 effect sizes (Aguinis, Dalton, et al., 2011). To arrive at more realistic settings, the number of simulated studies was reduced from k _max = 1,000 to 100 primary-level studies, resulting in a lower expected number of “publishable” (i.e., significant) effects. The top of Table 3 shows results for a full set of 100 studies. Here, the estimated effect sizes $\overset{ˉ}{r}$ are precise and unbiased point estimates of the true effects.

Table 3.

Meta-Analytic Results From Simulated Regressions Following the TS Approach With Eight Predictor Variables and Varying Intercorrelation, Sample Size, and Alpha Level.

	k	N	True r	$\overset{ˉ}{r}$	95% CI lower	95% CI upper	Z _s.e.	Z _n	Kendall’s τ	k _trim	${\overset{ˉ}{r}}_{trim}$	Q	I²
Full model with 100 simulated studies (k = k _max = 100)
X₁	100	7,535	.30	.31	.29	.33	–7.73***	–0.37	–.35***	+31	.36	130.01*	23.07
X₂	100	7,535	.30	.31	.28	.33	–6.84***	0.55	–.37***	+28	.35	115.60	13.49
X₃	100	7,535	.20	.20	.18	.22	–4.84***	0.28	–.32***	+27	.24	102.19	2.14
X₄	100	7,535	.20	.21	.19	.23	–4.82***	0.76	–.31***	+24	.24	97.73	0.00
X₅	100	7,535	.10	.10	.08	.13	–2.43*	0.90	–.17*	+25	.15	76.81	0.00
X₆	100	7,535	.10	.10	.08	.12	–2.52*	–0.21	–.13	+30	.15	90.80	0.00
X₇	100	7,535	.00	–.01	–.04	.01	0.32	–0.29	.06	+18	–.05	90.50	0.00
X₈	100	7,535	.00	–.02	–.05	.00	–0.16	0.93	–.05	+0	—	93.97	0.00
TS approach with 100 simulated studies (k _max = 100)
X₁	59	4,602	.30	.37	.34	.39	–5.22***	2.12*	–.35***	+18	.41	62.94	6.26
X₂	60	4,624	.30	.37	.34	.39	–3.00**	–0.90	–.22*	+17	.40	39.11	0.00
X₃	24	1,815	.20	.32	.27	.36	–1.20	0.46	–.16	+8	.35	10.80	0.00
X₄	22	1,668	.20	.32	.28	.37	–1.70	–0.83	–.23	+0	—	15.58	0.00
X₅	4	350	.10	.07	–.12	.26	–3.33**	1.34	–.67	+0	—	14.22**	71.78
X₆	5	350	.10	.19	–.02	.39	–2.49*	–0.22	–.60	+0	—	21.72***	76.66
X₇	19	1,509	.00	–.15	–.20	–.10	–0.69	–0.22	–.06	+0	—	6.95	0.00
X₈	16	1,288	.00	–.17	–.22	–.11	–0.60	0.10	–.35	+0	—	6.63	0.00
TS approach with 500 simulated studies (k _max = 500)
X₁	303	23,204	.30	.37	.36	.38	–7.31***	0.10	–.26***	+88	.41	212.52	0.00
X₂	296	22,556	.30	.37	.36	.38	–8.44***	0.69	–.31***	+92	.41	234.03	0.00
X₃	113	8,659	.20	.33	.31	.35	–1.31	–0.69	–.04	+23	.35	73.12	0.00
X₄	105	8,046	.20	.32	.31	.34	–1.99*	1.03	–.14*	+28	.35	45.13	0.00
X₅	41	3,025	.10	.12	.05	.18	–2.84**	–1.93	–.24*	+7	.18	174.79***	76.52
X₆	43	3,128	.10	.14	.07	.21	–1.52	0.29	–.19	+6	.19	186.69***	76.95
X₇	83	6,396	.00	–.14	–.17	–.12	0.36	–0.03	.03	+0	—	63.76	0.00
X₈	82	6,340	.00	–.13	–.16	–.11	–0.46	0.18	–.04	+0	—	46.15	0.00

Note: r _XiXj = .20; studies’ sample sizes ranged from 50 to 100; k = number of studies used in meta-analysis; N = total sample size; True r = predetermined effect size; $\overset{ˉ}{r}$ = estimated effect size; 95% CI lower/upper = lower/upper 95% confidence interval; Z _s.e./Z _n = Z value of regression test for funnel plot asymmetry with standard errors/sample size as predictor; Kendall’s τ = rank correlation coefficient; k _trim = number of studies added in the trim-and-fill method; ${\overset{ˉ}{r}}_{trim}$ = estimated effect size in the trim-and-fill method; Q = Q statistic (significance of between-study heterogeneity); I² = I² index (amount of between-study heterogeneity).

*p < .05. **p < .01. ***p < .001.

The most common tools for detecting funnel plot asymmetry are regression analyses (Egger et al., 1997), rank correlations (Begg & Mazumdar, 1994), and the trim-and-fill method (Duval & Tweedie, 2000). Their usefulness in detecting and/or correcting for file drawer effects has been demonstrated elsewhere (Macaskill, Walter, & Irwig, 2001; Moreno et al., 2009; Rendina-Gobioff, 2006; Sterne et al., 2011). Here, I analyze if they are useful for detecting PRB when researchers choose a TS approach in regression analyses.

Sterne and Egger (2001) recommended the use of the standard error for funnel plots and corresponding regression tests. In a regression test, the degree of funnel plot asymmetry is measured by the intercept from a regression of standardized effect estimates against the reciprocal of the standard error (Egger et al., 1997). An intercept that significantly deviates from zero, as indicated by the corresponding Z value (Z _s.e.), supports the assumption of funnel plot asymmetry. However, note that the point estimate of the standard error in a sample is negatively related to the estimated R ² in that sample (Cohen et al., 2003, p. 86; Deeks, Macaskill, & Irwig, 2005). It follows that studies with a higher effect size tend to have lower standard errors. Hence, the standard error is a negative predictor of effect size, even if no bias is present. As a consequence, Z _s.e. is related to the true effect size, even in the full model where no reporting bias is present. For example, Z _s.e. values for X₁ and X₂ in the full model are –7.73 and –6.84, respectively. To take this effect into account, Table 3 also contains Z values from regression tests that use the sample size as predictor; here, asymmetry is indicated by Z _n. For Z _n, no such distortion occurs, and the Z _n values in the full model correctly indicate that no significant sign of reporting bias exists. Rank correlations in meta-analyses are based on Kendall’s tau (i.e., a rank correlation between the estimated effect sizes and their variances). They suffer from the same problem as Z _s.e. values. Hence, Kendall’s tau is significantly negative for positive true effect sizes.
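The dependence between effect size and standard error can be made visible in a small simulation without any reporting bias. The sketch below assumes NumPy and a common large-sample approximation for the standard error of a correlation, SE = (1 – r²)/√(n – 1); the text does not state which formula was used in the original analyses, so this choice, the seed, and the number of studies are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # hypothetical seed
rho, k = 0.30, 2000              # true correlation and number of unbiased studies
cov = np.array([[1.0, rho], [rho, 1.0]])

n = rng.integers(50, 101, size=k)  # sample sizes between 50 and 100
r = np.empty(k)
for i, n_i in enumerate(n):
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=int(n_i))
    r[i] = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]

# Assumed large-sample standard error of r; it shrinks as |r| grows.
se = (1.0 - r**2) / np.sqrt(n - 1)

corr_r_se = float(np.corrcoef(r, se)[0, 1])
corr_r_n = float(np.corrcoef(r, n)[0, 1])
print(f"corr(r, se) = {corr_r_se:.2f}, corr(r, n) = {corr_r_n:.2f}")
```

Although every simulated study is retained, the observed correlations are negatively related to their estimated standard errors, whereas they are essentially unrelated to sample size. This is the reason why a test based on sample size is less prone to false alarms in this setting.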

Results from the trim-and-fill method are shown in Figure 3. The top panel shows contour-enhanced funnel plots from four variables (X₁, X₃, X₅, and X₇) of the full model from Table 3. The contour lines correspond to milestones of statistical significance (pseudo–confidence levels at 90%, 95%, and 99%; Sterne et al., 2011). For example, the top left funnel plot contains 100 black dots, one for each study result of the X₁–Y relationship. Again, a distortion emerges due to the use of the standard error with an underlying true relationship, and a negative trend is evident for the black dots (note that scale values increase from bottom to top). The black loops in Figure 3 are the imputed studies that are needed to reach symmetry in the funnel plot (for details, see Duval & Tweedie, 2000). Obviously, the method does not perform very well because, for each of the four variables, studies are imputed despite the fact that the full models do not suffer from reporting bias (i.e., all simulated studies are included). Table 3 also gives values for the trim-and-fill-adjusted effect size estimates ( ${\overset{ˉ}{r}}_{trim}$ ). Note that these values cannot be used as indicators of reporting bias per se, but a comparison with true effect sizes shows that values for the trim-and-fill-adjusted effect size estimates deviate even more from the true correlation coefficients than the bias-uncorrected estimates from meta-analyses ( $\overset{ˉ}{r}$ ). Thus, the trim-and-fill method gave more biased effect size estimates than the bias-uncorrected effect size estimations. The reason for this counterintuitive finding is the previously mentioned effect that studies with larger effects have lower standard errors. This creates an asymmetric funnel plot even if no reporting bias is present. The trim-and-fill method then (incorrectly) fills in hypothetical studies to approximate symmetry in the funnel plot.

Figure 3.

Contour-enhanced funnel plots.

Results from a model with an underlying TS approach with 100 simulated studies are included in the middle rows of Table 3. The corresponding contour-enhanced funnel plots are depicted in the bottom panel of Figure 3. The same data were used as in the top panel, but a TS approach was simulated that only included variables if they were significant predictors of Y in a regression analysis. Thus, for each of the four predictors, the black dots in the bottom panel were a subset of those in the top panel. The trim-and-fill method, regression tests, and rank correlations did not perform very well: reporting bias was present for all variables in this setting, yet no method consistently detected it. One reason might be the relatively small k, which varied from 4 for X₅ to 60 for X₂, resulting in low statistical power to detect reporting bias (Sterne et al., 2011). At the bottom of Table 3, results from a simulated TS approach with 500 studies are shown. Even with this larger number of studies, the tools for detecting reporting bias produced unsatisfactory results. For example, more than 80 studies were included in the simulated meta-analyses for X₇ and X₈, and a substantial PRB was present (r _X7Y = –.14 and r _X8Y = –.13), but no evidence of reporting bias was given by any of the statistical tools.

A second objective of Simulation Study 3 was an assessment of between-study heterogeneity. When using the Q statistic, the heterogeneity between studies in the full model was significant at the 5% level for only one of the eight predictors (Q_X1 = 130.01, p < .05; see Table 3). Q indicates the presence versus the absence of heterogeneity (Huedo-Medina, Sánchez-Meca, Marín-Martínez, & Botella, 2006). The overall level of heterogeneity was relatively low. This correctly reflected the simulation settings, which did not assume any substantial between-study variability. In addition, Table 3 reports the I² index, indicating the extent of heterogeneity (Higgins & Thompson, 2002). It is larger than zero whenever Q > (k – 1) and reflects the estimated percentage of the total variability due to true heterogeneity (Huedo-Medina et al., 2006). For the full model, I² varied between zero and 23.07%, which can be interpreted as no to low heterogeneity (Higgins & Thompson, 2002; Higgins, Thompson, Deeks, & Altman, 2003). This provides a good estimation of the true underlying settings.
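For readers who want to see how the I² index follows from Q, here is a minimal sketch of the textbook definition by Higgins and Thompson (2002), with k – 1 degrees of freedom. The function name is hypothetical, and the input values are made up for illustration; they are not taken from Table 3, and software packages may differ in details of the computation.

```python
def i_squared(q, k):
    """I^2 in percent from the Q statistic and the number of studies k.

    Textbook definition: 100 * (Q - df) / Q with df = k - 1, truncated at zero.
    """
    df = k - 1
    if q <= df:
        return 0.0
    return 100.0 * (q - df) / q


# Made-up values for illustration only.
print(round(i_squared(30.0, 21), 1))  # Q exceeds df: positive I^2
print(i_squared(15.0, 21))            # Q below df: I^2 is truncated at zero
```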

Results of heterogeneity estimations from a simulated TS approach revealed that heterogeneity was also low. Exceptions were variables X₅ and X₆, for which some studies had significantly positive and others had significantly negative beta coefficients. For example, the funnel plot for X₅ in Figure 3 is hollow; two correlation coefficients are clearly positive and two are clearly negative. Consequently, Q and I² indicate a high level of heterogeneity for X₅ and X₆.

In sum, these first results indicated that tools used to detect publication bias did not perform well when studies from a simulated TS approach were aggregated. This is in some contrast to previous research that found greater support for the quality of these tools (Deeks et al., 2005; Egger et al., 1997; Sterne & Egger, 2001). Three main reasons can, in combination, explain this divergence. First, larger effect sizes create lower standard errors, which resulted in asymmetric funnel plots even in settings without reporting biases. Second, the number of studies used for most simulations reported above was relatively low. They reflected realistic meta-analytical settings in management research (Aguinis, Dalton, et al., 2011), but the corresponding statistical power for tests of funnel plot asymmetry was low. Third, the variance of sample sizes in primary studies was relatively low. For most settings, sample sizes varied between 50 and 100. Further simulations showed that the tools performed better when larger studies were also included (e.g., sample sizes between 50 and 1,000). A more detailed analysis in various settings goes well beyond the scope of this article, but further studies might examine the performance of these tools in other settings typically encountered in management research. An important extension would be to account for true heterogeneity between studies. In these simulations, the underlying effects were the same for all studies that were aggregated in a meta-analysis. But studies might differ with regard to their true effect sizes, that is, there is true heterogeneity in the sample (Sterne et al., 2011). For example, an effect might be different for men and women. Then, studies with predominantly men or women will have different true effect sizes, as gender moderates the relationship that is researched in the meta-analysis (Banks, Kepes, & Banks, 2012).

Simulation Study 4: Exploring the Texas Sharpshooter Bias in Various Settings

Simulation Studies 2 and 3 demonstrated the extent of PRB for one specific setting. I now vary important simulation parameters to examine the impact of (a) intercorrelation among predictors, (b) sample size, and (c) alpha level on the magnitude of the described biases. Table 4 shows condensed results with various simulation settings all based on the TS approach shown in Figure 2. In each of the simulations, only one parameter was altered; other settings were held constant (i.e., r _XiXj = .20; sample sizes between 50 and 100; alpha = .05). The ratio k/k _max indicates the proportion of studies with significant beta coefficients. For example, a value of k/k _max = .79 means that this predictor was significant in 79% of all final regression models.

Table 4.

Meta-Analytic Results From Simulated Regressions Following the TS Approach With Eight Predictor Variables and Varying Intercorrelation, Sample Size, and Alpha Level.

		Varying Intercorrelation Among Predictors (r _XiXj; sample size: 50-100; alpha = .05, two-tailed)								Varying Sample Size (r _XiXj = .20; alpha = .05, two-tailed)						Varying Alpha (r _XiXj = .20; sample size: 50-100)
		0.00		0.10		0.30		0.40		50-200		50-500		50-1,000		.01		.10
	True r	k/k _max	$\overset{ˉ}{r}$	k/k _max	$\overset{ˉ}{r}$	k/k _max	$\overset{ˉ}{r}$	k/k _max	$\overset{ˉ}{r}$	k/k _max	$\overset{ˉ}{r}$	k/k _max	$\overset{ˉ}{r}$	k/k _max	$\overset{ˉ}{r}$	k/k _max	$\overset{ˉ}{r}$	k/k _max	$\overset{ˉ}{r}$
X₁	.30	.79	.34	.66	.36	.60	.37	.63	.36	.79	.33	.93	.31	.96	.30	.39	.40	.71	.36
X₂	.30	.80	.34	.64	.36	.59	.37	.61	.36	.77	.32	.91	.30	.96	.30	.38	.39	.70	.35
X₃	.20	.49	.28	.29	.31	.18	.33	.16	.33	.30	.28	.53	.23	.73	.21	.10	.37	.30	.31
X₄	.20	.49	.28	.28	.31	.19	.33	.19	.32	.33	.28	.54	.23	.74	.21	.11	.38	.32	.31
X₅	.10	.16	.24	.09	.25	.08	.06	.09	.01	.05	.15	.07	.11	.05	.10	.02	.20	.11	.15
X₆	.10	.17	.24	.09	.24	.07	.03	.08	.08	.06	.14	.07	.10	.07	.11	.03	.24	.12	.12
X₇	.00	.06	.01	.11	–.13	.24	–.10	.36	–.08	.30	–.08	.52	–.03	.73	–.01	.05	–.19	.28	–.10
X₈	.00	.07	.04	.09	–.13	.26	–.11	.33	–.08	.27	–.09	.52	–.03	.73	–.01	.03	–.20	.28	–.10

Note: True r = predetermined effect size; k/k _max = proportion of studies used in meta-analysis; $\overset{ˉ}{r}$ = estimated effect size; r _XiXj = zero-order correlations among predictor variables; alpha = alpha level for significance of beta coefficients in regression analyses; based on 1,000 simulated studies.

First, the magnitude of zero-order correlations among predictors was varied between r _XiXj = .00 and r _XiXj = .40. In each simulation, all expected correlations among predictor variables were set to one of these values to reflect different degrees of intercorrelation. Sample size varied between 50 and 100, and alpha level was set to 5%. The results indicated that, when no linear relations exist among predictors (i.e., r _XiXj = .00), an overestimation of true effect sizes occurred, as is evident for X₁ to X₆ (see Table 4). For example, the estimated effect size for X₅ was r _X5Y = .24, although the true effect was set to .10.

For X₇ and X₈ in the setting with r _XiXj = .00, the mean correlation coefficient did not significantly differ from zero ( ${\overset{ˉ}{r}}_{X7Y} = .01$ and ${\overset{ˉ}{r}}_{X8Y} = .04$ ). With an increasing intercorrelation among predictors, two trends are noteworthy. First, k/k _max decreased for existing positive effects due to intercorrelations among predictor variables. The more highly the predictors were correlated, the more variance they shared and, hence, the less likely they were to explain a statistically significant additional amount of variance in the outcome variable of a multivariate regression. Second, the probability increased that predictors with no relationship with the outcome (i.e., X₇ and X₈) were significant in regression analyses. For example, k/k _max for X₇ increased from .06 for r _XiXj = .00 to .36 for r _XiXj = .40. The mean effect size for those variables showed a u-shaped form. No significant mean r occurred when no intercorrelation existed among predictors (r _XiXj = .00); the mean r then rose in absolute size to r = –.13 and r = –.15 for r _XiXj = .10 and r _XiXj = .20, and subsequently dropped again for higher levels of intercorrelation. Thus, in this setting, PRB is most substantial for relatively low levels of correlation among predictor variables.

Table 4 also complements the relatively small sample sizes of the previous simulations with three wider ranges. In the previous setting, sample sizes ranged between a minimum of 50 and a maximum of 100. In the current simulations, primary studies had maximum sample sizes of 200, 500, or 1,000. Importantly, PRB became less pronounced as average sample sizes increased. In fact, with sample sizes varying between N = 50 and N = 1,000 (i.e., a mean sample size of about N = 525), neither a substantive overestimation of existing effects nor a noteworthy bias in nonexisting effects occurred. Thus, PRB is negligible when primary studies have large sample sizes. This reflects findings from previous analyses that focused on publication bias (Egger et al., 1997; Macaskill et al., 2001; Sterne et al., 2011).

Furthermore, Table 4 contains results from simulations when alpha levels were changed from an initial value of .05 to .01 or .10. For alpha = .01 (two-tailed), absolute values of beta coefficients must be larger to become significant; therefore, both forms of PRB were more pronounced. For example, estimates of the nonexisting effects of X₇ and X₈ were r _X7Y = –.19 and r _X8Y = –.20 in simulated meta-analyses. If the alpha level was set to 10% (two-tailed), smaller beta coefficients also became significant (i.e., more predictors were included in “publishable” final regression models), as was reflected in higher values for k/k _max and a lesser extent of PRB, compared with alpha = .05 (see Table 2). Overall, these additional simulations indicate that PRB was most severe when the alpha level was low and sample sizes and correlations between predictors were small to medium.
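One way to build intuition for the roles of sample size and alpha level is to look at the smallest correlation that can reach significance at all. The following sketch deliberately simplifies the problem: it considers a single zero-order correlation rather than a beta coefficient in a multiple regression, and it uses a normal approximation to the t distribution. Both are assumptions made for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Smallest |r| that is significant (two-tailed) in a sample of size n.

    Derived from t = r * sqrt(n - 2) / sqrt(1 - r**2), with the critical
    t value approximated by the standard normal quantile (assumption).
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z / math.sqrt(z * z + n - 2)


for n in (50, 100, 500, 1000):
    print(n, round(critical_r(n, 0.05), 3), round(critical_r(n, 0.01), 3))
```

The threshold falls as the sample size grows and rises as alpha is lowered. Under this simplification, small samples and strict alpha levels let only large observed effects pass the significance filter, which is consistent with the pattern in Table 4.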

Table 5 shows additional results for sets of predictors in simulated research domains that differ in number and combinations of true effect sizes. Intercorrelations among predictors varied between r _XiXj = .00 and .70. First, a research domain was simulated with one important predictor with a large effect size (r _X1Y = .50) and seven predictors that were not related to the outcome (Setting 5.1). If intercorrelations among predictor variables were present, PRB for X₂ to X₈ varied between $\overset{ˉ}{r}$ = –.14 when correlations among predictors were set to r _XiXj = .10 and $\overset{ˉ}{r}$ = –.03 for r _XiXj = .70. In Setting 5.2, a large negative effect was simulated (r _X1Y = –.50); again, seven predictors were added that were not related to the outcome. Results were similar to the previous simulation, only with a changed algebraic sign. For example, the mean effect size for X₂ to X₈ is $\overset{ˉ}{r}$ = +.14 when r _XiXj = .10. This is intuitively appealing because, based on Equations 1a and 1b, a positive beta coefficient for X₂ is expected whenever $r_{y2} > r_{y1} \cdot r_{12}$ . In Setting 5.3, a research domain with four medium effects (X₁ to X₄; r _XiY = .30) and four nonexisting effects (X₅ to X₈; r _XiY = .00) was defined. Again, negative effect sizes occurred for variables that were not correlated with the outcome. For example, if correlations among predictor variables were set to r _XiXj = .10, the mean effect size for X₅ to X₈ was r = –.14. Finally, in Setting 5.4, results are reported for settings with 20 predictors and effect sizes varying from r = .00 to r = .30. PRB was similar in magnitude to what was found in settings with fewer predictor variables.
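The sign reversal between Settings 5.1 and 5.2 can be traced in a short worked example. It assumes that Equations 1a and 1b are the usual standardized two-predictor solutions, and it considers only X₁ and one unrelated predictor X₂, so the magnitudes are illustrative and differ from those in the eight-predictor simulations:

$$\beta_{2} = \frac{r_{y2} - r_{y1} \cdot r_{12}}{1 - r_{12}^{2}}$$

With $r_{y2} = 0$, $r_{12} = .10$, and $r_{y1} = .50$ (as in Setting 5.1), this gives $\beta_{2} = (0 - .05)/(1 - .01) \approx -.05$. With $r_{y1} = -.50$ (as in Setting 5.2), it gives $\beta_{2} = (0 + .05)/(1 - .01) \approx +.05$. The expected beta of the unrelated predictor thus takes the opposite sign of the dominant predictor's effect whenever the two predictors are positively correlated.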

Table 5.

Meta-Analytic Results for Different Sets of Predictor Variables With Varying Intercorrelations Among Predictors.

			Varying Intercorrelation Among Predictors (r _XiXj)
			.00		.10		.20		.30		.40		.50		.60		.70
Setting	Predictors	True r	k/k_max	$\bar{r}$	k/k_max	$\bar{r}$	k/k_max	$\bar{r}$	k/k_max	$\bar{r}$	k/k_max	$\bar{r}$	k/k_max	$\bar{r}$	k/k_max	$\bar{r}$	k/k_max	$\bar{r}$
Setting 5.1	X₁	.50	>.99	.52	>.99	.52	>.99	.52	>.99	.51	>.99	.52	>.99	.50	>.99	.49	>.99	.49
Setting 5.1	X₂–X₈	.00	.05	–.01	.08	–.12	.12	–.14	.17	–.12	.23	–.09	.29	–.07	.39	–.06	.58	–.03
Setting 5.2	X₁	–.50	>.99	–.52	>.99	–.52	>.99	–.52	>.99	–.52	>.99	–.52	>.99	–.49	>.99	–.49	>.99	–.50
Setting 5.2	X₂–X₈	.00	.05	.00	.07	.14	.12	.14	.18	.12	.23	.09	.30	.08	.39	.05	.57	.02
Setting 5.3	X₁–X₄	.30	.86	.34	.66	.35	.59	.36	.57	.36	.61	.36	.69	.33	.79	.32	.94	.30
Setting 5.3	X₅–X₈	.00	.06	–.01	.14	–.14	.22	–.11	.31	–.10	.41	–.07	.54	–.05	.70	–.02	.91	.00
Setting 5.4	X₁–X₅	.30	.95	.32	.62	.36	.55	.36	.55	.36	.62	.35	.70	.32	.85	.31	>.99	.30
Setting 5.4	X₆–X₁₀	.20	.74	.23	.24	.30	.18	.31	.17	.30	.18	.29	.18	.27	.24	.25	.49	.22
Setting 5.4	X₁₁–X₁₅	.10	.31	.17	.08	.10	.09	.05	.11	.04	.13	.03	.14	.01	.18	.04	.41	.08
Setting 5.4	X₁₆–X₂₀	.00	.08	.01	.22	–.10	.31	–.09	.40	–.07	.50	–.05	.62	–.04	.81	–.02	>.99	.00

Note: Sample size = 50–100; alpha = .05 (significance level for beta coefficients in the regression analyses); True r = predetermined effect size; k/k_max = proportion of studies used in the meta-analysis; $\bar{r}$ = estimated effect size; r _XiXj = zero-order correlations among predictor variables. Results are based on 1,000 simulated studies.
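The sign pattern for unrelated predictors in Table 5 can be illustrated with population values alone. The following minimal sketch assumes the standard relation between standardized regression weights and correlations (weights = inverse predictor correlation matrix times the predictor-outcome correlations) and uses the closed-form inverse of a matrix in which all predictors share one intercorrelation. The function name `expected_betas` is hypothetical. The sketch involves no sampling error and no selection on significance, so it shows only the direction of the expected weights, not the size of the bias reported in the table.

```python
def expected_betas(r_xy, rho):
    """Population standardized betas when all predictors intercorrelate at rho.

    Assumes an equicorrelated predictor matrix R = (1 - rho) * I + rho * J,
    whose inverse is (I - rho / (1 + (p - 1) * rho) * J) / (1 - rho).
    """
    p = len(r_xy)
    if p < 2:
        raise ValueError("at least two predictors are required")
    if not -1.0 / (p - 1) < rho < 1.0:
        raise ValueError("rho must keep the predictor matrix positive definite")
    # Part of each weight that is absorbed by the correlated predictors.
    shared = rho * sum(r_xy) / (1.0 + (p - 1) * rho)
    return [(r - shared) / (1.0 - rho) for r in r_xy]


# Setting 5.1: one large effect, seven unrelated predictors, r_XiXj = .10
setting_5_1 = expected_betas([0.50] + [0.0] * 7, rho=0.10)
# Setting 5.2: same design with a large negative effect
setting_5_2 = expected_betas([-0.50] + [0.0] * 7, rho=0.10)
print(round(setting_5_1[1], 3), round(setting_5_2[1], 3))
```

Under these assumptions, the expected weight of an unrelated predictor is negative when the strong predictor has a positive effect and positive when that effect is negative, and it is exactly zero when the predictors are uncorrelated.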

Simulation Study 5: Convergence of the PRB

In the scientific process, not all studies are conducted simultaneously. Time can therefore serve as an ordering dimension for each set of simulated studies, which helps to examine when and how convergence toward the “truth” occurs (i.e., whether the discrepancy between published empirical results and true underlying relationships decreases as the number of studies increases). To this end, I simulated cumulative meta-analyses based on the idea that the accumulation of knowledge starts with a single study and that more and more studies are then added to the growing body of literature (Egger & Smith, 1997). Figure 4 shows results from a simulation with the same settings as reported in Table 2. The x-axis indicates the number of studies (k) that have been conducted so far. The y-axis indicates the weighted mean correlation between predictor variables and outcome (Y) as an indicator of effect size. Figure 4 demonstrates that effect sizes from the full models converge toward the true values. For example, the curve for r = .20 in the full model, which uses all data, approaches r = .20 as the number of studies increases. The curve for r = .20 in the model from the TS approach also converges, but not toward its true value; instead, it remains at about r = .33. A similar discrepancy between true and estimated effect size also occurs for r = .00 in the TS approach, but not in the full model. Here, PRB is persistent and does not diminish as more studies are conducted: Even if more empirical studies that follow the TS approach are published, the extent of PRB remains stable.

Figure 4.

Estimated cumulative effect sizes for full models and TS models when the number of studies increases.
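The logic of a cumulative meta-analysis under the two reporting regimes can be sketched in a few lines. The example below is a simplified stand-in, not the simulation code behind Figure 4: it assumes only two predictors (X1 with a true correlation of .50 and X2 unrelated to the outcome), an intercorrelation of .10, sample sizes drawn uniformly between 50 and 100, and a normal approximation (1.96) to the t critical value. All function names and parameter defaults are hypothetical.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def simulate_study(n, r1, r2, rho, rng):
    """Return the sample r(X2, Y) and the t value of beta2 for one study."""
    # Population weights that reproduce the target correlations r1, r2, rho.
    b1 = (r1 - r2 * rho) / (1 - rho ** 2)
    b2 = (r2 - r1 * rho) / (1 - rho ** 2)
    resid_sd = math.sqrt(1 - (b1 * r1 + b2 * r2))
    x1s, x2s, ys = [], [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = rho * x1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        x1s.append(x1)
        x2s.append(x2)
        ys.append(b1 * x1 + b2 * x2 + resid_sd * rng.gauss(0, 1))
    r_12, r_y1, r_y2 = corr(x1s, x2s), corr(x1s, ys), corr(x2s, ys)
    # Sample standardized weights and the t value of beta2 (df = n - 3).
    beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
    beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    r_squared = beta1 * r_y1 + beta2 * r_y2
    se_beta2 = math.sqrt((1 - r_squared) / ((n - 3) * (1 - r_12 ** 2)))
    return r_y2, beta2 / se_beta2


def cumulative_means(k, r1=0.50, r2=0.00, rho=0.10, t_crit=1.96, seed=1):
    """Cumulative N-weighted mean r(X2, Y): all studies vs. TS-reported only."""
    rng = random.Random(seed)
    full_num = full_den = ts_num = ts_den = 0.0
    full_curve, ts_curve = [], []
    for _ in range(k):
        n = rng.randint(50, 100)
        r_y2, t_value = simulate_study(n, r1, r2, rho, rng)
        full_num += n * r_y2
        full_den += n
        if abs(t_value) > t_crit:  # TS rule: X2 is reported only if significant
            ts_num += n * r_y2
            ts_den += n
        full_curve.append(full_num / full_den)
        # None until the first study reports X2 under the TS rule.
        ts_curve.append(ts_num / ts_den if ts_den else None)
    return full_curve, ts_curve


full_curve, ts_curve = cumulative_means(500)
print("full model:", round(full_curve[-1], 3))
print("TS approach:", None if ts_curve[-1] is None else round(ts_curve[-1], 3))
```

With these assumed settings, the full-model curve should settle near zero, whereas the TS curve should settle at a negative value, because the unrelated predictor enters the synthesis mainly in studies where its beta happens to be significantly negative. Adding more studies narrows the curve but does not move it toward the true value.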

Discussion

The goal of this study was to analyze the impact of PRB in regression analyses. The study was based on the assumption that primary researchers often follow a TS strategy, which means that they tend to drop variables with nonsignificant regression coefficients in the analytical process. Simulation results showed that two types of PRB occur in meta-analytic findings when primary researchers adopt a TS approach. First, effect sizes are overestimated because nonsignificant findings are not reported in primary studies and, hence, are absent from subsequent meta-analyses. Similar effects have been demonstrated in other studies (Easterbrook et al., 1991; Sutton et al., 2000; Turner et al., 2008). Second, PRB occurs due to suppression effects of correlated predictor variables, which create biased effect size estimations for predictors with low or no true relationship with the outcome. A main contribution of the current study lies in the identification of this second type of PRB, which can be present in regression analyses.

In the following, I first look at the prevalence and the importance of PRB in management research. I then discuss statistical preconditions of PRB and possible reasons for researchers to apply a TS approach, which then creates PRB. Subsequently, I suggest remedies to reduce or avoid PRB.

How Widespread Is the TS Approach?

PRB is the result of researchers’ TS approach, that is, excluding nonsignificant predictors in regression analyses. Simulation results in this article were based on the assumption that all researchers adopt this approach. Not all researchers engage in such practices, however, and it is difficult to assess the actual share of researchers who proceed in the described way. Several authors have tried to capture the prevalence of questionable research practices, and their findings can be used to estimate this share. Kerr (1998) reported results from his own study with 156 behavioral scientists on the prevalence of the classical hypothetico-deductive approach and three forms of hypothesizing after results are known. He found that participating researchers reported different forms of HARKing to be as common as the hypothetico-deductive approach. John, Loewenstein, and Prelec (2012) asked 2,155 academic psychologists about the perceived prevalence of 10 questionable research practices. About 40% of respondents admitted that they have occasionally decided whether to exclude data after looking at the impact of doing so on the results. The authors argue that these raw admission rates almost certainly underestimate the true prevalence. They offer an alternative estimate derived from the admission estimates, which yields prevalence rates as high as 100% for 4 of the 10 research practices, including data exclusion after looking at the results. Likewise, Bedeian et al. (2010) asked 384 researchers from 104 PhD-granting management departments about the perceived prevalence of 11 different behaviors of research misconduct. A startling 91.9% of the management researchers surveyed indicated that they were aware of faculty engaging in the development of hypotheses after results were known, making it the most prevalent form of research misconduct in their study. Overall, it thus seems that forms of a TS approach are widespread among researchers in the social sciences, but the true prevalence might be concealed, mainly because researchers might not answer honestly when asked about these socially undesirable practices.

How Severe Is PRB in Management Research?

The extent to which PRB distorts published results in management research is affected by (a) the number of researchers who apply a TS approach, which then results in PRB, and (b) the absolute magnitude of the bias, as demonstrated in the simulations. If all researchers choose a TS approach, PRB will be at its maximum. Based on the empirical findings outlined in the previous section, it is reasonable to assume that every second researcher might follow a TS approach and choose variables based on their impact on the dependent variable in multiple regressions. In that case, results reported in 50% of the primary studies might be misleading because they withhold some variables and thus do not offer the full model with possibly different effects. Consequently, under these assumptions, PRB in meta-analyses would be half as high as the maximum value reported in the simulations. Publication bias adds to this effect, as studies with an underlying TS approach report a higher percentage of supported hypotheses and might therefore have a higher publication probability.

The absolute magnitude of PRB in the simulations was contingent on sample size, number of predictors, chosen alpha level, and correlations among predictors and outcome. For example, in the multivariate setting used in Table 4, the spurious effect can be as large as $\bar{r}_{XY}$ = –.20, although no true underlying effect was present (i.e., true r _XY = .00). Sample size of primary studies was shown to play a central role in the extent to which PRB distorts results. PRB is a severe problem in studies with small sample sizes (N < 100); it constitutes a somewhat lesser threat with medium sample sizes (N < 200), and it is negligible when sample sizes are large (N > 500). For example, predictors that were not related to the outcome (X₇ and X₈ in Table 4) showed an average correlation of r = –.09 when sample sizes varied between 50 and 200 and r = –.03 in settings with sample sizes between 50 and 500.
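One way to see why the bias shrinks with sample size is to look at the smallest estimate that can pass a significance filter. The sketch below is an illustration and not part of the reported simulations: it uses a zero-order correlation rather than a regression weight and approximates the t critical value by 1.96. The helper name `min_significant_r` is hypothetical.

```python
import math


def min_significant_r(n, crit=1.96):
    """Smallest |r| that reaches two-sided significance with n observations.

    Inverts t = r * sqrt(n - 2) / sqrt(1 - r**2) for r. The default
    crit = 1.96 is a normal approximation to the t critical value at
    alpha = .05 (an assumption made for simplicity).
    """
    if n < 3:
        raise ValueError("n must be at least 3")
    return crit / math.sqrt(crit ** 2 + n - 2)


for n in (50, 100, 200, 500):
    print(n, round(min_significant_r(n), 3))
```

Under this approximation, an estimate selected at N = 50 must be roughly three times as large in absolute value as one selected at N = 500, so selection on significance leaves less room for distortion in large samples.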

An exact estimation of PRB rests on knowledge about the number of researchers who choose a TS approach and the true underlying relationships in the research domain. For example, if 50% of researchers apply a TS approach and use medium sample sizes (50 < N < 200), the expected bias for all variables that are not related to the outcome would be .5 × (–.09) = –.045 in the setting described above. That is, the expected effect size in meta-analyses would amount to r = –.045, although there are no underlying effects. However, neither the prevalence of a TS approach nor the exact relationships among variables in a research domain are usually known to meta-analytical researchers. Moreover, statistical tools to detect PRB perform poorly when meta-analyses include only a relatively small number of primary studies. Thus, it is not an easy task for meta-analytical researchers to estimate the existence or severity of PRB. I return to this problem in the discussion of remedies to reduce or avoid PRB.
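The proportional rule used in this example can be written as a one-line calculation. The sketch below only restates the arithmetic from the text; the function name `expected_meta_bias` is hypothetical, and the rule is a simplification that does not model how often studies with a TS approach report the predictor at all.

```python
def expected_meta_bias(ts_share, max_bias):
    """Scale the bias observed when everyone uses a TS approach by the TS share.

    ts_share: assumed proportion of researchers applying a TS approach (0 to 1).
    max_bias: mean effect size found in a simulation where all studies use TS.
    """
    if not 0.0 <= ts_share <= 1.0:
        raise ValueError("ts_share must lie between 0 and 1")
    return ts_share * max_bias


# Values from the text: 50% TS researchers, bias of -.09 for 50 < N < 200
print(round(expected_meta_bias(0.5, -0.09), 3))  # -0.045
```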

Last, it is important to note that distortions due to PRB are contingent on the research domain. While its effect is worst in domains with predominantly small sample sizes and many degrees of freedom for researchers when building the statistical models, it is negligible in primary studies with large sample sizes or a predefined set of variables in the statistical analyses. PRB might thus be most prevalent when data gathering is very resource-intensive. In such cases, researchers might find smaller sample sizes acceptable and might also try to get the most out of the data, that is, apply a TS approach. It is not my intention to single out any particular research domain. However, for an illustrative example, let us look at the seminal meta-analysis by Barrick and Mount (1991) on the relationship between personality dimensions and job performance. In their analyses, they distinguish five occupational groups. For these groups, mean sample sizes are n = 107 for professionals, n = 100 for police, n = 192 for managers, n = 105 for sales, and n = 169 for skilled workers (own calculations). The mean reported correlation coefficients between each personality dimension and job performance varied from r = .03 (openness to experience) to r = .13 (conscientiousness). If we assume that several primary researchers chose only those personality dimensions that were significantly related to job performance in their regression analyses and discarded others, PRB would have had a noticeable impact on the meta-analytical results.

Statistical Preconditions of PRB

PRB requires degrees of freedom in the analytical procedure. An obvious statistical precondition is a multivariate setting in which one or more predictors are not necessarily part of the final statistical model. In addition, a bias stemming from a TS approach is most severe if predictor variables are correlated. Simulation results have shown that very low levels of correlation among predictor variables (e.g., r = .10; see Table 4) are sufficient for PRB to seriously affect estimations of effect sizes in meta-analyses. Finally, at least one predictor must be positively or negatively related to the outcome variable, which—in combination with correlated predictors—changes the expected beta value of other predictors (see Equations 1a and 1b). All three statistical determinants are important preconditions for PRB. However, it is reasonable to assume that these preconditions are met in almost every research domain in which regression studies are frequently conducted.

Reasons for Applying a TS Approach

PRB is contingent on a researcher’s deliberate choice of a TS approach, usually applied to ensure that a higher share of hypothesized effects is supported, thereby spuriously strengthening study findings and perceived study quality. Researchers utilize this strategy for different reasons. First, publication probability depends on how interesting study findings are. Presenting one’s own findings in the most desirable light could tip the scale toward acceptance in the publication process.

Second, confirmation bias and hindsight bias might affect researchers’ cognitions during the analytical process. Confirmation bias “connotes the seeking or interpreting of evidence in ways that are partial to existing beliefs, expectations, or a hypothesis in hand” (Nickerson, 1998, p. 175). Hindsight bias, or the I-knew-it-all-along phenomenon, refers to a biased representation of events or facts after the event (Blank, Musch, & Pohl, 2007). Both biases might lead researchers to believe that the final model from the TS approach represents their initial thoughts and assumptions better than a factually accurate model. The final statistical model is the product of a longer trial-and-error process in which variables are added and discarded. A researcher might find a (partially) spurious pattern in the data but is unwilling to consider its randomness. Furthermore, journal editors and reviewers in their function as gatekeepers in the publication process can also contribute to PRB whenever they suggest discarding unsupported hypotheses, thereby urging the author to remove the now redundant variable from the analyses.

The TS approach suggests that a nonsignificant predictor is deliberately removed from a data set. Such behavior is not necessarily intended to cover up weak findings. Researchers must choose carefully which control variables to include in the model, as redundant controls reduce statistical power. A common recommendation in the literature is to avoid control variables that are uncorrelated with the dependent variable, unless the variable might be a suppressor (Atinc, Simmering, & Kroll, 2012; Becker, 2005; Edwards, 2008; Spector & Brannick, 2011). This appropriate rationale, if improperly applied to a researcher’s data set, parallels the TS approach because, in both cases, variables are excluded if their effect is small (i.e., nonsignificant in a regression analysis). In an empirical assessment of control variables in published micro and macro research, Atinc and colleagues (2012) noted critically that very often no proper basis for the inclusion of control variables is offered, although control variables often account for more variance than the main effects. This observation does not provide direct evidence for the use of a TS approach, but it at least shows that researchers have opportunities to follow this approach and include only those variables that best fit their statistical models. Becker (2005) suggested running and reporting primary results with and without the control variables. In addition, a thorough explanation as to why control variables are included reduces the researchers’ degrees of freedom in successfully employing a TS approach for control variables (Atinc et al., 2012; Becker, 2005).

Possible Remedies to Reduce or Avoid PRB

Remedies to reduce or avoid PRB can target the primary study or take the form of statistical tools in meta-analyses. The bias occurs as a result of the combination of relatively small sample sizes and the omission of “uninteresting” variables from statistical models. Both aspects offer starting points for decreasing PRB. First, scientists must be aware that following a TS approach is a form of research misconduct (Bedeian et al., 2010). It must not be seen as a trivial matter that the scientific method starts with hypothesis formulation and ends with data analysis and interpretation, not the other way around. Primary studies that report only selected variables hold back important information that is needed for an unbiased assessment of the field of research.

Second, the problem of small sample sizes is related to the discussion of low statistical power, and some recommendations that have been put forth to avoid problems of low power are also valid for PRB (Maxwell, 2004; Sedlmeier & Gigerenzer, 1989). Of course, the most obvious way is for researchers to increase studies’ sample sizes, but because of complex or costly designs, this approach is not always feasible. For example, if we look at research on team outcomes in organizations, the average sample size in published research is about N = 65 groups per primary study (Biemann & Heidemeier, 2012). As a practical approach to obtaining sufficient sample sizes whenever costs are prohibitively high for single researchers, Maxwell (2004) suggested utilizing collaborative multisite studies, which enable researchers to gather larger data sets that are less prone to reporting bias in general.

Third, increasing awareness of the consequences of dropping variables from the final statistical model is another effective means for avoiding PRB. Incentives for researchers to employ the TS approach could be removed, and the link between significance of results and publication probability weakened. One obvious remedy is that reviewers and journal editors should base their evaluation of a paper’s quality on the methods used, not the results section (Kraemer et al., 1998). However, researchers and editors might find themselves in a “moral wiggle room” (Dana, Weber, & Kuang, 2007). Both parties are probably aware of reporting bias and the inadequacy of the TS approach, but (a) competition exists among researchers and among journals for various kinds of resources (Glick, Miller, & Cardinal, 2007), (b) nonsignificant results are perceived to be less interesting to the reader, (c) methodological literature on the potential dangers of selective reporting is lacking, and (d) low transparency exists in the process of variable selection in quantitative management research. It may follow that researchers and editors will display a lenient attitude toward a TS approach. Enhanced research standards that increase transparency could therefore counter individuals’ behavior that might otherwise impede scientific progress (Ioannidis, 2005). Researchers could report more than just the final model, showing some “failed” analyses as well, which will help draw researchers’ and readers’ attention to potential problems of the TS approach.

Likewise, Bentler (2007) suggested that “an author should submit a separate statement that verifies, for each major model, that (a) every parameter in the model is purely a priori, and if not, (b) details on all model modifications that were made” (p. 826). In addition, in an effort to increase transparency and reproducibility, journals might require data and code to be available online for other researchers to validate findings and suggest alternative models. However, most researchers are still unwilling to provide other researchers with data from their published articles (Wicherts, Borsboom, Kats, & Molenaar, 2006). Another interesting approach to reducing reporting bias was chosen by a group of editors from top-tier medical journals (e.g., The Lancet and Journal of the American Medical Association). To be considered for publication in these journals, drug research sponsored by pharmaceutical companies must be registered before the study is conducted (De Angelis et al., 2004).

In medical research, researchers can register prospective meta-analyses with the goal of engaging in joint efforts within the research community to include the best data available and improve the quality of meta-analytic findings. It is unlikely that an adapted approach will be chosen in management research in the near future, but top journals might establish a policy that all variables be listed and all study data be available as a precondition for publication. Such a step would increase transparency in the research process and might reduce reporting bias.

Finally, the simulation results demonstrated that common meta-analytical tools such as regression tests, rank correlations, and the trim-and-fill method are not appropriate means for identifying PRB when only a small or medium number of studies is synthesized. The degree to which PRB affects meta-analytic results can be understood only in combination with patterns of relationships that include other predictor variables. For example, the threat of PRB is greater if a predictor is sampled in a research domain with correlated predictors and other variables that are strong predictors of the outcome variable. A first estimation of the dangers of PRB could therefore contain an analysis of results from multivariate analyses or the inspection of a full correlation matrix in a given research domain rather than only looking at direct relationships of predictors with the meta-analysis’ outcome variable. However, more elaborate methodological approaches are required, and further research in this area is encouraged. In addition, the simulations only explored PRB when correlation coefficients were aggregated from primary studies that used OLS regressions. It might be interesting to empirically examine whether correlation coefficients in studies without multivariate follow-up analyses differ from studies with regression analyses, as only the latter might be affected by PRB. Further studies might also look at PRB in other multivariate analyses (e.g., multilevel models and structural equation modeling) or when meta-analyses are not based on correlation coefficients.

Summing up the above considerations, it seems that existing statistical tools to detect PRB might not perform very well in settings that are frequently encountered in management research. One reason could be that they were not specifically designed to detect PRB in meta-analyses. Future research might therefore aim at developing tailored tools for PRB. A promising approach could be the adoption of random effects selection models, which would then offer an assessment of PRB (Hedges & Vevea, 1996). However, the most straightforward remedies are those directed at improving primary studies. Here, a first step must be to raise awareness of possible problems due to PRB and other reporting biases. Currently, researchers view selective reporting of studies that worked as a defensible practice (John et al., 2012).

Conclusion

Researchers often examine data in search of significant effects. Although this approach is widespread in management research, a detailed analysis of its consequences is missing from the literature. The current study attempted to draw attention to selective reporting of predictor variables in regression analyses. The results showed that this approach (labeled the Texas sharpshooter approach) might create spurious predictor relationships in regression analyses in research domains where mostly small and medium samples are available. This can be interpreted as a warning to researchers, reviewers, and editors who perceive selective reporting of analyses to be a flawed but acceptable practice.

Footnotes

Acknowledgments

I thank Michael Cole, Heike Heidemeier, Daniela Noethen, Thorsten Semrau, and Dirk Sliwka for their helpful comments.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Aguinis

Dalton

D. R.

Bosco

F. A.

Pierce

C. A.

Dalton

C. M.

(2011). Meta-analytic choices and judgment calls: Implications for theory building and testing, obtained effect sizes, and scholarly impact. Journal of Management, 37(1), 5–38.

Aguinis

Pierce

C. A.

Bosco

F. A.

Dalton

D. R.

Dalton

C. M.

(2011). Debunking myths and urban legends about meta-analysis. Organizational Research Methods, 14(2), 306–331.

Atinc

Simmering

M. J.

Kroll

M. J.

(2012). Control variable use and reporting in macro and micro management research. Organizational Research Methods, 15, 57–74.

Banks

G. C.

Kepes

Banks

K. P.

(2012). Publication bias the antagonist of meta-analytic reviews and effective policymaking. Educational Evaluation and Policy Analysis, 34(3), 259–277.

Barrick

M. R.

Mount

M. K.

(1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44(1), 1–26.

Becker

T. E.

(2005). Potential problems in the statistical control of variables in organizational research: A qualitative analysis with recommendations. Organizational Research Methods, 8(3), 274–289.

Bedeian

A. G.

Taylor

S. G.

Miller

A. N.

(2010). Management science on the credibility bubble: Cardinal sins and various misdemeanors. Academy of Management Learning and Education, 9(4), 715–725.

Begg

C. B.

Mazumdar

(1994). Operating characteristics of a rank correlation test for publication bias. Biometrics, 50(4), 1088–1101.

Bentler

P. M.

(2007). On tests and indices for evaluating structural models. Personality and Individual Differences, 42(5), 825–829.

10.

Biemann

Heidemeier

(2012). Does excluding some groups from research designs improve statistical power? Small Group Research, 43(4), 387–409.

11.

Blank

Musch

Pohl

R. F.

(2007). Hindsight bias: On being wise after the event. Social Cognition, 25(1), 1–9.

12.

Chalmers

T. C.

Levin

Sacks

H. S.

Reitman

Berrier

Nagalingam

(1987). Meta-analysis of clinical trials as a scientific discipline. I: Control of bias and comparison with large co-operative trials. Statistics in Medicine, 6(3), 315–325.

13.

Chatterjee

Hadi

A. S.

(2006). Regression analysis by example. New York, NY: John Wiley.

14.

Cohen

(1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.

15.

Cohen

West

S. G.

Aiken

L. S.

(2003). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.

16.

Conger

A. J.

(1974). A revised definition for suppressor variables: A guide to their identification and interpretation. Educational and Psychological Measurement, 34(1), 35–46.

17.

Copas

Shi

J. Q.

(2001). A sensitivity analysis for publication bias in systematic reviews. Statistical Methods in Medical Research, 10(4), 251–265.

18.

Dalton

D. R.

Aguinis

Dalton

C. M.

Bosco

F. A.

Pierce

C. A.

(2012). Revisiting the file drawer problem in meta-analysis: An assessment of published and non-published correlation matrices. Personnel Psychology, 65(2), 221–249.

19.

Dalton

D. R.

Dalton

C. M.

(2008). Meta-analyses. Some very good steps toward a bit longer journey. Organizational Research Methods, 11(1), 127–147.

20.

Dana

Weber

R. A.

Kuang

J. X.

(2007). Exploiting moral wiggle room: Experiments demonstrating an illusory preference for fairness. Economic Theory, 33(1), 67–80.

21.

De Angelis

Drazen

J. M.

Frizelle

F. A.

Haug

Hoey

Horton

R.,

… Van Der Weyden,

M. B.

(2004). Clinical trial registration: A statement from the International Committee of Medical Journal Editors. Annals of Internal Medicine, 141(6), 477.

22.

Deeks

J. J.

Macaskill

Irwig

(2005). The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. Journal of Clinical Epidemiology, 58, 882–893.

23.

Dickersin

Min

Y. I.

Meinert

C. L.

(1992). Factors influencing publication of research results. Journal of the American Medical Association, 267(3), 374–378.

24.

Duval

Tweedie

(2000). Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56(2), 455–463.

25.

Easterbrook

P. J.

Berlin

J. A.

Gopalan

Matthews

D. R.

(1991). Publication bias in clinical research. The Lancet, 337(8746), 867–872.

26.

Edwards

J. R.

(2008). To prosper, organizational psychology should … overcome methodological barriers to progress. Journal of Organizational Behavior, 29(4), 469–491.

27.

Egger

Smith

G. D.

(1997). Meta-analysis: Potentials and promise. British Medical Journal, 315(7119), 1371–1374.

28.

Egger

Smith

G. D.

Schneider

Minder

(1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315(7109), 629–634.

29.

Friedman

Wall

(2005). Graphical views of suppression and multicollinearity in multiple linear regression. American Statistician, 59(2), 127–136.

30.

Gawande

(1999). The cancer-cluster myth. New Yorker, 9, 34–37.

31.

Gilovich

Vallone

Tversky

(1985). The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology, 17(3), 295–314.

32.

Glick

W. H.

Miller

C. C.

Cardinal

L. B.

(2007). Making a life in the field of organization science. Journal of Organizational Behavior, 28(7), 817–835.

33.

Hahn

Williamson

P. R.

Hutton

J. L.

(2002). Investigation of within-study selective reporting in clinical research: Follow-up of applications submitted to a local research ethics committee. Journal of Evaluation in Clinical Practice, 8(3), 353–359.

34.

Hedges

L. V.

Olkin

(1985). Statistical methods for meta-analysis. New York, NY: Academic Press.

35.

Hedges

L. V.

Vevea

J. L.

(1996). Estimating effect size under publication bias: Small sample properties and robustness of a random effects selection model. Journal of Educational and Behavioral Statistics, 21(4), 299–332.

36.

Henson

R. K.

Roberts

J. K.

(2006). Use of exploratory factor analysis in published research. Educational and Psychological Measurement, 66(3), 393–416.

37.

Higgins

J. P. T.

Thompson

S. G.

(2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21, 1539–1558.

38.

Higgins

J. P. T.

Thompson

S. G.

Deeks

J. J.

Altman

D. G.

(2003). Measuring inconsistency in meta-analyses. British Medical Journal, 327(7414), 557–560.

39.

Horwitz

S. K.

Horwitz

I. B.

(2007). The effects of team diversity on team outcomes: A meta-analytic review of team demography. Journal of Management, 33, 987–1015.

40.

Huedo-Medina

T. B.

Sánchez-Meca

Marín-Martínez

Botella

(2006). Assessing heterogeneity in meta-analysis: Q statistic or I² index? Psychological Methods, 11(2), 193–206.

41.

Hunter

J. E.

Schmidt

F. L.

(2004). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.

42.

Hunter

J. E.

Schmidt

F. L.

Jackson

G. B.

(1982). Meta-analysis: Cumulating research findings across studies. Beverly Hills, CA: Sage.

43.

Hutton

J. L.

Williamson

P. R.

(2000). Bias in meta–analysis due to outcome variable selection within studies. Journal of the Royal Statistical Society: Series C (Applied Statistics), 49(3), 359–370.

44.

Ioannidis

J. P.

(2005). Why most published research findings are false. PLoS Medicine, 2(8), 696.

45.

John

L. K.

Loewenstein

Prelec

(2012). Measuring the prevalence of questionable research practices with incentives for truth-telling. Psychological Science, 23, 524–532.

46.

Kepes

Banks

G. C.

McDaniel

Whetzel

D. L.

(2012). Publication bias in the organizational sciences. Organizational Research Methods, 15(4), 624-662.

47.

Kerr

N. L.

(1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196–217.

48.

Kraemer

H. C.

Gardner

Brooks

J. O., III

Yesavage

J. A.

(1998). Advantages of excluding underpowered studies in meta-analysis: Inclusionist versus exclusionist viewpoints. Psychological Methods, 3(1), 23–31.

49.

Leamer

E. E.

(1983). Let's take the con out of econometrics. American Economic Review, 73(1), 31–43.

50.

Light

R. J.

Pillemer

D. B.

(1984). Summing up: the science of reviewing research. Cambridge, MA: Harvard University Press.

51.

Lipsey

M. W.

Wilson

D. B.

(1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48(12), 1181–1209.

52.

Macaskill

Walter

S. D.

Irwig

(2001). A comparison of methods to detect publication bias in meta-analysis. Statistics in Medicine, 20(4), 641–654.

53.

Maxwell

S. E.

(2004). The persistence of underpowered studies in psychological research: Causes, consequences, and remedies. Psychological Methods, 9(2), 147–163.

54.

McGauran

Wieseler

Kreis

Schüler

Y. B.

Kölsch

Kaiser

(2010). Reporting bias in medical research—A narrative review. Trials, 11, 37.

55.

Milloy

S. J.

(1995). Science without sense: The risky business of public health research. Washington, DC: Cato Institute.

56. Moreno, S., Sutton, A., Ades, A., Stanley, T., Abrams, K., Peters, J., & Cooper, N. (2009). Assessment of regression-based methods to adjust for publication bias through a comprehensive simulation study. BMC Medical Research Methodology, 9(1), 2.

57. Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175–220.

58. R Development Core Team. (2010). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org

59. Rendina-Gobioff (2006). Detecting publication bias in random effects meta-analysis: An empirical comparison of statistical methods (Unpublished doctoral dissertation). University of South Florida, Tampa, FL.

60. Rothstein, H. R., Sutton, A. J., & Borenstein (2005). Publication bias in meta-analysis. New York, NY: John Wiley.

61. Scargle, J. D. (2000). Publication bias: The “file-drawer problem” in scientific inference. Journal of Scientific Exploration, 14(2), 94–106.

62. Sedlmeier & Gigerenzer (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105(2), 309–316.

63. Selvin, H. C., & Stuart (1966). Data-dredging procedures in survey analysis. American Statistician, 20(3), 20–23.

64. Shore, L. M., Randel, A. E., Chung, B. G., Dean, M. A., Ehrhart, K. H., & Singh (2011). Inclusion and diversity in work groups: A review and model for future research. Journal of Management, 37(4), 1262–1289.

65. Simmons, J. P., Nelson, L. D., & Simonsohn (2011). False-positive psychology. Psychological Science, 22(11), 1359–1366.

66. Song, Parekh, Hooper, Loke, Y. K., Ryder, Sutton, A. J., … Harvey (2010). Dissemination and publication of research findings: An updated review of related biases. Health Technology Assessment, 14(8), 1–220.

67. Spector, P. E., & Brannick, M. T. (2011). Methodological urban legends: The misuse of statistical control variables. Organizational Research Methods, 14, 279–286.

68. Stern, J. M., & Simes, R. J. (1997). Publication bias: Evidence of delayed publication in a cohort study of clinical research projects. British Medical Journal, 315(7109), 640–645.

69. Sterne, J. A., & Egger (2001). Funnel plots for detecting bias in meta-analysis: Guidelines on choice of axis. Journal of Clinical Epidemiology, 54(10), 1046–1055.

70. Sterne, J. A., Egger, & Moher (2008). Addressing reporting biases. In Higgins, J. P., & Green (Eds.), Cochrane handbook for systematic reviews of interventions (pp. 297–333). Chichester, UK: John Wiley.

71. Sterne, J. A. C., Sutton, A. J., Ioannidis, J. P. A., Terrin, Jones, D. R., Lau, … Higgins, J. P. (2011). Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. British Medical Journal, 343(7818), 302–307.

72. Sutton, A. J., Duval, S. J., Tweedie, R. L., Abrams, K. R., & Jones, D. R. (2000). Empirical assessment of effect of publication bias on meta-analyses. British Medical Journal, 320(7249), 1574–1577.

73. Turner, E. H., Matthews, A. M., Linardatos, Tell, R. A., & Rosenthal (2008). Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine, 358(3), 252–260.

74. Viechtbauer (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48.

75. Villar, Carroli, & Belizán, J. M. (1995). Predictive ability of meta-analyses of randomised controlled trials. The Lancet, 345(8952), 772–776.

76. Wicherts, J. M., Borsboom, Kats, & Molenaar (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61(7), 726–728.

77. Williamson, P. R., Gamble, Altman, D. G., & Hutton, J. L. (2005). Outcome selection bias in meta-analysis. Statistical Methods in Medical Research, 14(5), 515–524.