Out of Shape: The Implications of (Extremely) Nonnormal Dependent Variables

Abstract

Organizational researchers have increasingly noted the problems associated with nonnormal dependent variable distributions. Most of this scholarship focuses on variables with positive values and long tails, such as employee performance, capital expenses, and assets. However, scholars frequently test organizational theories using dependent variables that include negative values, which is perhaps most prominently the case as it relates to measures of firm performance. Over the course of two studies, we investigate the implications of such nonnormally distributed dependent variables in organizational research. In Study 1, we examine the nonnormality of firm performance measures and uncover extreme levels of skewness and kurtosis that vary substantially across measures, samples, and years. We also illustrate that many transformations scholars use to address nonnormality are ineffective. In Study 2, we create simulations that seek to mirror these distributions, and we find that such extreme nonnormality reduces efficiency and increases Type II errors with most statistical approaches. Our analyses also reveal the effectiveness of quantile regression when modeling dependent variables that exhibit the nonnormal distributions often found in organizational research.

Keywords

bootstrapping, extreme cases, Monte Carlo simulations, OLS regression, outliers, quantile regression

Introduction

Over the past decade, organizational researchers have highlighted the nonnormal distributions of several prominent variables and the corresponding impact on statistical models (Beck et al., 2014; Becker et al., 2019; O’Boyle & Aguinis, 2012). By and large, this research underscores that the distribution of the dependent variable influences the normality of the error term in a regression, such that nonnormally distributed dependent variables might necessitate unconventional estimation approaches (Cohen et al., 2003; Wooldridge, 2020). As Wooldridge (2020, p. 174) describes, “if the errors are random draws from some other distribution than the normal … this is a potentially serious problem because our inference hinges on being able to obtain critical values or p-values.” Organizational scholars often promote variable transformations (e.g., log, square root, etc.) as the primary means to resolve these issues (Crawford et al., 2015; Rönkkö et al., 2022). For the most part, this research on nonnormality focuses on variables with distributions that are right skewed and bounded by zero, such as employee performance, total assets, and revenues (e.g., Becker et al., 2019; Joo et al., 2017).

There exists considerably less knowledge, however, about the effects of nonnormally distributed variables that include negative values. Such variables preclude several widely used transformations (e.g., log and square root) and require approaches that either ignore the distribution of the variable or employ problematic adjustments to eliminate negative values (Becker et al., 2019; Rönkkö et al., 2022). At the same time, organizational scholars often test theories using dependent variables that can take negative values, such as stock market reactions (e.g., Graffin et al., 2016), strategic change (e.g., Zhu et al., 2020), initial public offering underpricing (e.g., Filatotchev & Bishop, 2002), and competitive repertoires (e.g., Connelly et al., 2017).

Although the lack of clarity about nonnormal variables with negative values applies to all content domains, it may have the most profound impact on firm performance, the preeminent dependent variable in strategy research (Henderson et al., 2012). As Gans and Ryall (2017) describe, “Understanding persistent heterogeneity in firm performance is, perhaps, the central objective in the field of strategy.” Uncovering potentially problematic distributional properties may therefore have dramatic implications for an array of theories (e.g., agency theory, resource-based view, and upper echelons) and contexts (e.g., acquisitions, alliances, and boards of directors).

Owing to the ambiguity in the literature about the potentially problematic nature of dependent variables distributed like firm performance, we structure our research around two broad research questions. First, just how nonnormal is firm performance across different measures and over time? To investigate this question, in Study 1 we explore the shape of several performance variables over time and across samples. Research indicates that performance measures exhibit skewness and kurtosis (Henderson et al., 2012), but our results extend this knowledge to demonstrate that this nonnormality is dramatic and varies substantively across different measures, samples, and years. Moreover, we find that many transformations used by scholars (e.g., log, winsorizing, etc.) to resolve nonnormality are largely ineffective for variables with distributions like firm performance measures.

With this challenge in mind, the results of Study 1 inform our second research question: How might distributions such as those associated with firm performance influence statistical results? In Study 2, we design simulations whereby we generate dependent variables to mimic the distributions of the variables in Study 1. We examine the efficiency of ordinary least squares (OLS) against techniques designed to account for nonnormally distributed dependent variables—OLS with log transformations, winsorizing, inverse hyperbolic sine (IHS) transformations, robust standard errors, and bootstrapping. Ultimately, we find that all of these oft-adopted techniques to address nonnormality produce inefficient estimates and subsequent Type II errors. We also compare these results to those of quantile regression, which some researchers posit may help alleviate potential issues with nonnormal distributions (Li, 2015). Our simulations reveal that quantile regression represents an attractive analytical technique when analyzing distributions associated with firm performance.

Our research offers several contributions. First, we confirm that the distributions of performance measures are indeed nonnormal, but we find they are nonnormal in different ways. We uncover that skewness and kurtosis vary dramatically across measures, subsamples, transformations, and timeframes. We report a wide array of skewness values, including approximately −67 (ROA), 46 (ROE), and 124 (EPS). The values for kurtosis, which receives considerably less attention than skewness in strategy research, are even more drastic and typically exceed 1,000 (or even 10,000), which is a far cry from the kurtosis value of 3 for normally distributed variables. We also note that a vast majority of the variance in our variables is within firms over time and not between firms.

Second, we show that many of the popular transformations used by scholars to normalize distributions are ineffective in creating normal distributions, upholding efficient parameter estimation, and eliminating Type II errors. Specifically, in Study 1, we illustrate that the performance measures remain nonnormally distributed even after transformations like logging or winsorizing the variables. Further, in Study 2, we demonstrate that these transformations do not produce notably more efficient estimates, particularly compared against quantile regression. Taken together, our results suggest that the techniques currently used by scholars to resolve nonnormality issues may prove less effective than commonly thought.

Third, we extend prior work on variable distributions by demonstrating how different types of nonnormal distributions undermine causal inference. Our simulations illustrate that dependent variables with nonnormal distributions induce substantial Type II errors (not rejecting the null hypothesis when it should be rejected) when using linear models, which vary based on the type and extent of nonnormality. These challenges remain even when using widespread techniques such as variable transformations, robust standard errors, and bootstrapping. In contrast, we find that quantile regression outperforms all of these remedies in terms of model efficiency and Type II errors.

Nonnormal Distributions of Continuous Variables

Organizational research has historically focused on variables with normal distributions (Crawford et al., 2015). Simply stated, a normal distribution involves observations that (a) cluster around a stable group mean and (b) disperse out into symmetrical tails with an identifiable variance (O’Boyle & Aguinis, 2012). In recent years, though, scholars have highlighted the importance of nonnormal distributions, which are characterized by departures from a stable mean and finite variance. Indeed, management researchers have examined the nonnormality of job performance in various contexts, including employees, firms, politicians, and other stakeholders more broadly (Beck et al., 2014; O’Boyle & Aguinis, 2012). For instance, Joo, Aguinis, and Bradley (2017) study the performance of individuals, providing a taxonomy of these nonnormal distributions. Similarly, Andriani and McKelvey (2009) offer an overview of other social and organizational phenomena that are nonnormally distributed. Together, these studies illustrate that variables used to study organizations are often not normally distributed.

Measuring (Non)Normality: Skewness and Kurtosis

Statisticians use the idea of moments to describe the normality of distributions (Cox, 2010). Management researchers commonly reference the first two moments of each variable, the central tendency (i.e., mean) and variability (i.e., standard deviation), when summarizing statistical results. Skewness and kurtosis, which are based on the third and fourth moments of a variable's distribution, have received relatively less attention. To gain a sense of how skewness and kurtosis are calculated, it is first important to understand that the rth central moment of a variable y, for a sample of size n, can be calculated as:

m_{r} = \frac{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{r}}{n} .

(1)

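Using this equation, setting r equal to 2 results in the sample variance (with divisor n). Because the equation centers each observation on the mean, setting r equal to 1 returns zero by construction; the sample mean itself is the first raw (uncentered) moment.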

Skewness refers to the symmetry or asymmetry of a distribution (i.e., whether the distribution has a longer tail on one side or the other). Although distributional asymmetry is somewhat of “a vague concept” (Cox, 2010, p. 483), positive (negative) skew occurs when the right (left) side of the distribution has a longer tail. A normal distribution is associated with zero skew, which indicates symmetrical tails on both sides of the distribution. Using some of the most popular measures in psychology, Bishara and Hittner (2017) summarized datasets and estimated that the skewness of these measures ranged from approximately −3 to 3. Adapting Equation 1 for the third moment provides a measure of skewness known as g₁:

g_{1} = \frac{m_{3}}{m_{2}^{3 / 2}} .

(2)

Kurtosis reflects the extremity of a distribution's tails relative to a normal distribution (DeCarlo, 1997; Westfall, 2014). Wright and Herrington (2011) use the concepts of a distribution's head, shoulders, and tails to describe different types of distributions. According to this framework, leptokurtic describes distributions with higher peaks, lower shoulders, and more values in the tails; platykurtic describes distributions with more values in the shoulders, lower peaks, and fewer values in the tails (Wright & Herrington, 2011). DeCarlo (1997) points out that kurtosis can arise because a distribution is nonnormal (heavy tails) or because the distribution is normal but includes outliers. Normal distributions have a kurtosis value of 3, although some kurtosis measures subtract 3 so that they are centered at 0 (i.e., excess kurtosis). Bishara and Hittner (2017) estimated that kurtosis for popular psychology measures ranged from approximately −2 to 40; negative values are possible only because these estimates are on the excess-kurtosis scale. Adapting Equation 1 for the fourth moment provides a measure of kurtosis known as g₂:

g_{2} = \frac{m_{4}}{m_{2}^{2}} .

(3)

It is important to note that skewness and kurtosis both represent measures of a distribution's shape and are therefore related; kurtosis cannot be less than skewness squared (Bishara & Hittner, 2017; Wright & Herrington, 2011). In this sense, it is mathematically impossible to have skewed distributions that do not also have some degree of kurtosis. At the same time, kurtosis can (and often does) exceed the squared value of skewness. Similarly, both skewness and kurtosis are dimensionless, meaning the metrics themselves are inherently standardized regardless of the underlying variable (Cox, 2010). This allows for direct comparisons of normality or shapes across different measures and variables.
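To make Equations 1 through 3 concrete, the following minimal Python sketch computes the central moments, g₁, and g₂ for a small sample. The data values and function names are hypothetical and chosen only for illustration; the divisor is n, as in Equation 1.

```python
def central_moment(values, r):
    """r-th central moment m_r from Equation 1 (divides by n, not n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** r for y in values) / n


def skewness_g1(values):
    """g1 = m_3 / m_2^(3/2), as in Equation 2."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis_g2(values):
    """g2 = m_4 / m_2^2, as in Equation 3 (not the excess-kurtosis version)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical data: a symmetric core, then the same core plus one extreme loss
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
with_outlier = symmetric + [-50.0]

for name, data in [("symmetric", symmetric), ("with_outlier", with_outlier)]:
    g1, g2 = skewness_g1(data), kurtosis_g2(data)
    # The last column checks the bound that kurtosis is at least skewness squared
    print(f"{name}: g1 = {g1:.2f}, g2 = {g2:.2f}, g2 >= g1^2: {g2 >= g1 ** 2}")
```

A single extreme observation is enough to move both statistics, which is one reason the two measures tend to rise together in samples with outliers.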

Consequences of Nonnormality

Statisticians note that the distribution of dependent variables is particularly relevant for linear estimators, such as OLS regression, and the numerous techniques that build on its estimation approach (Wooldridge, 2020). One primary assumption of OLS regression—and more sophisticated analytical techniques with similar assumptions, such as two-stage modeling, multilevel models, and panel data models—involves the requirement that the error term follows a normal distribution (Wooldridge, 2020). If the error term is not normally distributed, p-values are not interpretable in the conventional sense because the parameter estimates do not follow a normal sampling distribution. In other words, t-statistics and p-values are intended to provide some insight on the likelihood of a relationship existing in a population but researchers cannot interpret this likelihood if the error term is not normally distributed (Kennedy, 2008; Wooldridge, 2020).

For the most part, research on the impact of nonnormally distributed residuals focuses on the implications for standard errors rather than coefficients. This is due to the fact that when sample sizes are sufficiently large, linear modeling is able to estimate relatively unbiased coefficients despite nonnormal disturbances (Wooldridge, 2020). Standard errors, though, are far less accurate and fluctuate from their true values depending on the nature of the disturbance (Kennedy, 2008). Nonnormal residuals induce Type II errors due to inflated standard errors, a concept referred to as inefficiency (Kennedy, 2008; Wooldridge, 2020). Researchers in a variety of disciplines recognize this fact and tailor their estimation procedures accordingly. For instance, work in sociology (Osgood et al., 1988), political science (Katz & King, 1999), psychology (e.g., Cohen et al., 2003), economics (e.g., Leamer, 1983), and several others have accounted for the fact that residuals are not normally distributed. Scholars usually do this by examining the distribution of their dependent variables, reasoning that if the dependent variable is sufficiently nonnormally distributed, the residuals estimated using linear modeling likely follow suit.

Accounting for Nonnormal Distributions

Given the potential efficiency problems (inflation of standard errors) associated with nonnormal residuals, O’Boyle and Aguinis (2012, p. 105) suggest that “deviations from normality are seen as ‘data problems’ that must be ‘fixed.’” One fix that organizational researchers often employ involves transforming variables to reduce the impact of nonnormality in linear models. These transformations induce “inconstant changes in the units or scale of a variable … with the intent of creating a new distribution that is more normal” (Becker et al., 2019, p. 831).

In their review of such transformations in top organizational journals, Becker et al. (2019) found that approximately 40% of all studies employed at least one transformation. Specifically, they determined that the most popular transformation was the log transformation (comprising almost 90% of all transformations), but they also reported the use of other techniques such as winsorizing, square root transformations, and dropping outliers. Similarly, Rönkkö et al. (2022) reviewed articles published in top-tier journals and found that 66% of empirical articles using regression-type models used at least one transformation, with the log transformation being the most pervasive. Rönkkö et al. (2022) highlighted that scholars may avoid such transformations by using generalized linear models (GLMs) with nonlinear link functions.

In addition to these two recent studies (i.e., Becker et al., 2019; Rönkkö et al., 2022), we examined published research in two top management journals (i.e., Academy of Management Journal or AMJ, and Strategic Management Journal or SMJ) over the course of the past decade (i.e., 2010, 2015, and 2020) to help better understand how scholars tend to address potential issues stemming from nonnormality. The results in Table 1 confirm these trends over time. We observe that a remarkable portion of the empirical articles¹ published in these journals over the past decade rely on the dependent variable or model transformations that we emphasize in this manuscript. Indeed, 70% (over 60%) of the studies published in AMJ (SMJ) account for nonnormality by transforming the model or dependent variables themselves. Similarly, the usage of transformations appears to have increased dramatically over time. To this point, the extent to which scholars have employed transformations in AMJ (SMJ) in 2020 was nearly triple (double) that of 2010, with all but two empirical studies employing transformations in AMJ in 2020.

Table 1.

Content Overview of Nonnormality in Published Research.

	2010		2015		2020		Total
Academy of Management Journal	Count	% of Empirical	Count	% of Empirical	Count	% of Empirical	Count	% of Empirical
Empirical articles in our analyses	50	100%	56	100%	49	100%	155	100%
Articles with transformations	16	32%	45	80%	47	96%	108	70%
Winsorized	1	2%	0	0%	4	8%	5	3%
Log	1	2%	3	5%	5	10%	9	6%
Inverse hyperbolic sine	1	2%	0	0%	0	0%	1	1%
Robust/Cluster SE	11	22%	27	48%	36	73%	74	48%
Bootstrap	3	6%	19	34%	17	35%	39	25%
Quantile regression	1	2%	0	0%	0	0%	1	1%
Strategic Management Journal	Count	% of Empirical	Count	% of Empirical	Count	% of Empirical	Count	% of Empirical
Empirical articles in our analyses	64	100%	103	100%	73	100%	240	100%
Articles with transformations	28	44%	59	57%	61	84%	148	62%
Winsorized	2	3%	10	10%	4	5%	16	7%
Log	10	16%	18	17%	15	21%	43	18%
Inverse hyperbolic sine	0	0%	0	0%	1	1%	1	0%
Robust/Cluster SE	25	39%	40	39%	55	75%	120	50%
Bootstrap	0	0%	8	8%	3	4%	11	5%
Quantile regression	0	0%	0	0%	1	1%	1	0%

Note: We analyze empirical articles that feature at least one variable that could potentially assume a nonnormal distribution. Accordingly, the counts depicted in the rows “Empirical articles in our analyses” do not include qualitative research, meta-analyses, theory-exclusive manuscripts, or editorials. “Articles with transformations” depicts whether an article featured at least one dependent variable or model transformation of the ones we examine in our research, not the total number of transformations. To this end, it is possible for the number of different specific types of transformations to exceed the total number of articles with transformations, as any given article could have featured more than one.

Table 1 also reveals several different techniques scholars have employed to account for nonnormal distributions over the past decade—that is, robust standard errors, bootstrapping, logging, winsorization, quantile regression, and inverse hyperbolic sine. All these procedures are well documented, so our aim here is to provide a brief overview.

Winsorization. Winsorization involves transforming the data by replacing any values that fall above a pre-specified ceiling or below a pre-specified floor with the ceiling or floor value (Aguinis et al., 2013). For instance, Cheng et al. (2014) winsorize several performance indicators (e.g., market-to-book ratio, cash flow ratio, etc.) by replacing any values above the 99th (or below the 1st) percentile in their sample with values that match the 99th (1st) percentile. Similarly, Flammer and Ioannou (2021) winsorize capital expenditure ratios and R&D intensity at the 95th percentile, replacing all observations with values above the 95th percentile (below the 5th percentile) with values exactly at the 95th (or 5th) percentile. Although there is not extensive guidance about what specific levels are appropriate for winsorization, our experience suggests that scholars typically select the 99th or 95th percentiles.

Despite the popularity of winsorizing, it is fallible since it replaces values of observations that may be empirically problematic but conceptually salient. As Aguinis et al. (2013, p. 271) contend, “outliers can also be of substantive interest and studied as unique phenomena that may lead to novel theoretical insights.” To this point, the three most profitable firms in Execucomp in 2020—Apple, Microsoft, and Berkshire Hathaway—had net incomes of US$57.4 billion, US$44.3 billion, and US$42.5 billion, respectively. Winsorizing transforms all these values (as well as those of every other firm in the top 1%) to the net income of the 99th percentile, which is US$7.5 billion. While we use net income in this example, this transformation dilutes top (and bottom) performers regardless of the measure. Although this dilution may not always present problems, this example illustrates the potential challenges that may result from winsorizing.
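The mechanics of winsorizing can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure used in any cited study: the percentile rule (linear interpolation between order statistics) is one of several conventions, and the net income figures are hypothetical.

```python
def percentile(sorted_values, p):
    """Percentile via linear interpolation between order statistics (one convention)."""
    k = (len(sorted_values) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def winsorize(values, lower_pct=1.0, upper_pct=99.0):
    """Replace values beyond the chosen percentiles with the percentile values."""
    ordered = sorted(values)
    floor = percentile(ordered, lower_pct)
    ceiling = percentile(ordered, upper_pct)
    return [min(max(v, floor), ceiling) for v in values]


# Hypothetical net incomes (US$ millions): 99 ordinary firms and one very large firm
net_income = [float(i) for i in range(1, 100)] + [57_400.0]
capped = winsorize(net_income, 1.0, 99.0)

print("largest value before:", max(net_income))
print("largest value after: ", max(capped))
```

The sample size is unchanged, but the largest firm becomes indistinguishable from any other firm at the ceiling, which is the dilution described above.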

Log transformations. Log transformations are usually applied to account for outliers by changing the original scale of the variable via taking its natural log value. As a result, the functional form relating independent variables to the original dependent variable is also altered, complicating the interpretation of results (Pek et al., 2018). In other words, log transformations can reduce the influence of outlier values, but this transformation also changes the interpretation of the transformed variable (Rönkkö et al., 2022). In addition, the log transformation cannot accommodate variables that include negative values. Researchers often add a constant to all values to make the variable positive (Villadsen & Wulff, 2021), which further obfuscates the interpretation of the estimates (Becker et al., 2019). This is especially the case if the variable does not follow a log-normal distribution, in which case a log transformation may fail to reduce nonnormality and can even exacerbate the problem (Feng et al., 2013).

Inverse hyperbolic sine (IHS) transformations. Although relatively uncommon in management research (Rönkkö et al., 2022), the IHS transformation is often used by economists to avoid the problems associated with log transformations (Aihounton & Henningsen, 2021). Unlike log transformations, IHS transformations do not require any manipulations of the original data in order to accommodate negative values. This transformation involves reconfiguring the distribution of the variable into something that resembles normality, often to address right-skewed dependent variables (Carboni, 2012). The results of regressions using IHS-transformed variables depend on the units of measurement associated with the transformed variables, thereby potentially complicating causal inference.
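The contrast between a shifted log and the IHS can be illustrated with a short Python sketch. The shift rule ln(y − min(y) + 1) is an assumption made here for illustration, since the choice of constant is left to the researcher; the input values are the minimum, 1st, 50th, and 99th percentiles, and maximum of untransformed net income reported in Table 2.

```python
import math


def shifted_log(values):
    """ln(y - min(y) + 1): add a constant so the smallest value maps to ln(1) = 0."""
    shift = 1.0 - min(values)
    return [math.log(v + shift) for v in values]


def ihs(values):
    """Inverse hyperbolic sine, asinh(y) = ln(y + sqrt(y^2 + 1)); defined for negative y."""
    return [math.asinh(v) for v in values]


# Net income in US$ millions (selected order statistics from Table 2)
ni_millions = [-99_289.0, -1_431.7, 53.91, 7_495.0, 104_821.0]

print("shifted log:   ", [round(v, 2) for v in shifted_log(ni_millions)])
print("IHS, millions: ", [round(v, 2) for v in ihs(ni_millions)])

# Unit dependence: the same amounts expressed in US$ billions
ni_billions = [v / 1000.0 for v in ni_millions]
print("IHS, billions: ", [round(v, 2) for v in ihs(ni_billions)])
```

Two features stand out. Because the shift constant is dictated by the single most negative observation, the shifted log compresses nearly all remaining values into a very narrow band. The IHS preserves the sign of each value, but rescaling the variable from millions to billions changes the transformed values in a nonlinear way, which is the unit dependence noted above.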

Robust standard errors. Scholars also adjust standard errors to help account for issues associated with nonnormally distributed residuals (Baum, 2006; Chou et al., 1991). Doing so generally involves sophisticated transformations of the standard errors from a linear model, usually via accessible commands in statistical software (Baum, 2006; Chou et al., 1991). Research indicates that incorporating robust standard errors reduces inefficiencies and Type II error from nonnormal residuals (Chou et al., 1991), in addition to several other related benefits (Curran et al., 1996). Strategy researchers routinely adopt this transformation, for instance when examining dependent variables like diversification (e.g., Zhou, 2011), corporate social responsibility (e.g., Hubbard et al., 2017), and firm performance (e.g., Quigley et al., 2019).
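As a rough illustration of what such an adjustment does, the sketch below compares the conventional OLS standard error of a slope with a heteroskedasticity-consistent (HC0, or "sandwich") standard error in a single-regressor model. The cited studies do not necessarily use this estimator; HC0 is assumed here because it is the simplest member of the family, and the data are hypothetical.

```python
import math


def ols_simple(x, y):
    """Fit y = a + b*x by OLS; return (intercept, slope, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    residuals = [yi - a - b * xi for xi, yi in zip(x, y)]
    return a, b, residuals


def slope_standard_errors(x, y):
    """Return (conventional SE, HC0 robust SE) for the OLS slope."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    _, _, residuals = ols_simple(x, y)
    # Conventional: one common error variance for every observation
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)
    conventional = math.sqrt(sigma2 / sxx)
    # HC0: each squared residual is weighted by its own squared deviation in x
    meat = sum(((xi - x_bar) ** 2) * e ** 2 for xi, e in zip(x, residuals))
    hc0 = math.sqrt(meat) / sxx
    return conventional, hc0


# Hypothetical data with one extreme outcome at the largest x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 30.0]

_, slope, _ = ols_simple(x, y)
conventional_se, robust_se = slope_standard_errors(x, y)
print(f"slope = {slope:.3f}, conventional SE = {conventional_se:.3f}, HC0 SE = {robust_se:.3f}")
```

Note that the coefficient itself is untouched; only the estimate of its uncertainty changes.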

Bootstrapping. Bootstrapping does not adjust the data, instead creating a large number of artificial samples by randomly drawing observations, with replacement, from the original sample (Pek et al., 2018). The target statistic is calculated in each artificial sample, and the resulting collection of estimates serves as an empirical sampling distribution for that statistic. With this approach, the nonnormal distribution of the residuals is treated as “non-informative and a nuisance to be addressed” (Pek et al., 2018, p. 6). That said, bootstrapping relies on an assumption of independent observations, which is often violated with the types of data management scholars examine (e.g., panel data).
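A minimal sketch of a percentile bootstrap follows, assuming independent observations and using a sample mean as the target statistic. The function name, the number of resamples, the seed, and the return values are all hypothetical choices made for illustration.

```python
import random


def bootstrap_percentile_ci(values, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample with replacement, recompute, take quantiles."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


# Hypothetical annual returns with one extreme loss
returns = [0.04, 0.06, 0.05, 0.03, 0.07, 0.05, 0.02, 0.06, 0.04, -1.50]
low, high = bootstrap_percentile_ci(returns, mean)
print(f"sample mean = {mean(returns):.3f}, 95% percentile interval = ({low:.3f}, {high:.3f})")
```

Because each draw treats observations as exchangeable, a sketch like this would not be appropriate for panel data without resampling at the level of the firm or another cluster.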

Quantile regression. Quantile regression is a nonparametric technique that estimates the marginal effect of the independent variable at different points of the distribution of the dependent variable (Cameron & Trivedi, 2010; Li, 2015). This is in stark contrast to the overwhelming majority of estimators (e.g., OLS regression) that focus on a single value reflecting the average relationship across all values of the dependent variable (Angrist & Pischke, 2009). Accordingly, the difference between OLS regression and quantile regression is in many ways analogous to the difference between a sample mean (OLS) and a sample median (quantile regression). While a sample mean minimizes the sum of squared residuals, the median minimizes the sum of absolute residuals (Koenker & Hallock, 2001).

Cameron and Trivedi (2010) highlighted several advantages of quantile regression. First, as compared to OLS, quantile regression is less sensitive to outliers, in part because it was created to help account for skewed distributions (Li, 2015). Second, whereas OLS requires a normally distributed error term, quantile regression avoids assumptions about the parametric distribution of regression error terms. Third, while quantile regression permits the examination of the effects of regressors at different points along the distribution of the dependent variable, it also estimates parameters analogous to OLS except reflecting the median effect instead of the mean.
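The mean versus median analogy can be made concrete with a constant-only model, in which the "regression" reduces to choosing a single value that minimizes a loss function. The sketch below is not a full quantile regression (which adds regressors and is typically solved by linear programming); it only illustrates the check loss that quantile regression minimizes, using hypothetical ROA values.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for quantile tau; tau = 0.5 is half the absolute residual."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(values, candidate, tau):
    return sum(check_loss(v - candidate, tau) for v in values)


def total_squared_loss(values, candidate):
    return sum((v - candidate) ** 2 for v in values)


# Hypothetical ROA values with one extreme negative observation
roa = [0.02, 0.03, 0.04, 0.05, 0.06, 0.07, -103.0]

# In a constant-only model, a minimizer of the check loss lies at one of the data points
best_median = min(roa, key=lambda c: total_check_loss(roa, c, 0.5))
best_q90 = min(roa, key=lambda c: total_check_loss(roa, c, 0.9))
sample_mean = sum(roa) / len(roa)

print("minimizer of absolute loss (tau = 0.5):", best_median)
print("minimizer of check loss (tau = 0.9):   ", best_q90)
print("minimizer of squared loss (the mean):  ", round(sample_mean, 3))
```

The extreme observation drags the squared-loss solution far from the bulk of the data, while the check-loss solutions remain among the typical values.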

Summary

Almost without exception, research on nonnormality in organizational contexts focuses on variables that either range from zero to positive infinity or require potentially problematic scaling to create positive variables (e.g., Becker et al., 2019; Joo et al., 2017). At the same time, we lack an understanding of the effectiveness of the techniques summarized in this section when nonnormal distributions include negative values. In the following two studies, we therefore investigate nonnormality in the context of firm performance, which is one of the most prominent variables in strategy research and which scholars have suggested follows a nonnormal distribution (Henderson et al., 2012). In the first study, we examine the nonnormality of various performance measures, and in the second study, we leverage our findings from Study 1 to investigate the efficacy of techniques to address problems with nonnormal distributions.

Study 1: The Shape and Normality of Firm Performance

Strategy research seeks to answer the questions: “What drives the performance of an organization? Why do some organizations succeed while others fail? And what, if anything, can managers actually do about it” (Makadok et al., 2018, p. 1530). Accordingly, a vast body of empirical scholarship examines the antecedents of firm performance, with as many as three-fourths of all published strategy articles featuring it as the dependent variable (Hamann et al., 2013; Richard et al., 2009). Scholars typically operationalize firm performance using accounting (e.g., ROA, ROE, ROS, and margins) and/or stock market (e.g., shareholder returns, market-to-book ratio, and analyst evaluations) indicators (Dalton et al., 1998; Hamann et al., 2013). Researchers have also recently expanded this to include social approval assets and external perceptions of the firm as another measure of success (Gamache & McNamara, 2019; Zavyalova et al., 2016).

Despite the salience of firm performance in strategy research, there remains mixed evidence regarding how firm activities influence subsequent performance. To this point, Makadok et al. (2018, p. 1530) argue that “the strategic management field has proven incredibly inconclusive” about what actually drives performance heterogeneity across research domains. Representing one potential reason for this conflict, Miller et al. (2013) conducted a meta-analysis and reported “remarkably weak” correlations among various measures of firm performance, particularly across categories (i.e., stock market and accounting measures).

In Study 1, we thus examine how the distributional properties of performance variables may help explain these inconsistencies. To this end, we explore how skewness and kurtosis can quantify the nonnormality of a variety of commonly used performance measures, as well as the efficacy of transformations to normalize firm performance variables. Taken together, in Study 1, we examine the (non)normality of firm performance by asking:

RQ1: How do the skewness and kurtosis of firm performance variables vary across measures, subsamples, and transformations?

Methodology of Study 1

Sample. We downloaded Compustat data from 2000 to 2020 and constructed the following performance measures: net income (NI), ROA, ROE, EPS, market-to-book (M/B),² stock returns, and Tobin's Q (Dalton et al., 1998; Henderson et al., 2012). To ensure consistency across the firms in our sample, we limited the Compustat data to only those observations also included in Execucomp in any given year.³ Taken together, our final sample comprises 45,141 firm-year observations, although the sample size pertaining to any particular variable may fluctuate due to missing data.

Outcomes. We calculated statistics related to the first four moments for each variable—mean, standard deviation, skewness, and kurtosis, respectively. We also calculated minimum and maximum values for each variable, in addition to the 1st, 50th (i.e., median), and 99th percentiles.

Results of Study 1

Table 2, which details our outcomes for the variables included in our sample, illustrates a number of findings that warrant attention. Notably, skewness and kurtosis varied considerably across measures. For example, ROA was negatively skewed, while EPS and M/B were positively skewed. Similarly, the kurtosis for EPS (18,760.59) was nearly 40 times the kurtosis for NI (489.37). Overall, Table 2 illustrates the incredible divergences in skewness and kurtosis across several variables that are intended to reflect the same construct of firm performance.

Table 2.

Distribution Parameters of Performance Measures.

Net Income
Transformation	N	Mean	SD	Skew	Kurtosis	Min	1st Pctl	50th Pctl	99th Pctl	Max
None	45,055	367.30	2284.20	4.94	489.37	−99289.00	−1431.70	53.91	7495.00	104,821.00
Winsorized	45,055	326.20	1071.20	4.63	28.14	−1431.71	−1431.70	53.91	7495.00	7495.00
Log	45,055	11.51	0.06	−147.02	25,619.47	0.00	11.49	11.51	11.58	12.23
Inverse hyperbolic sine	45,055	3.26	4.41	−1.12	3.19	−12.20	−7.96	4.68	9.62	12.25
Return on Assets
Transformation	N	Mean	SD	Skew	Kurtosis	Min	1st Pctl	50th Pctl	99th Pctl	Max
None	45,043	0.01	0.76	−66.85	9476.61	−103.00	−0.73	0.04	0.29	46.45
Winsorized	45,043	0.02	0.13	−2.97	16.23	−0.73	−0.73	0.04	0.29	0.29
Log	45,043	4.64	0.02	−186.98	37579.74	0.00	4.64	4.64	4.65	5.01
Inverse hyperbolic sine	45,043	0.02	0.18	−5.69	123.43	−5.33	−0.68	0.04	0.29	4.53
Return on Equity
Transformation	N	Mean	SD	Skew	Kurtosis	Min	1st Pctl	50th Pctl	99th Pctl	Max
None	45,042	0.07	9.50	45.79	9295.37	−790.61	−2.78	0.10	2.33	1156.00
Winsorized	45,042	0.06	0.50	−1.44	18.94	−2.78	−2.78	0.10	2.33	2.33
Log	45,042	6.67	0.03	−180.59	35952.60	0.00	6.67	6.67	6.68	7.57
Inverse hyperbolic sine	45,042	0.07	0.48	−0.83	37.04	−7.37	−1.75	0.10	1.58	7.75
Earnings Per Share
Transformation	N	Mean	SD	Skew	Kurtosis	Min	1st Pctl	50th Pctl	99th Pctl	Max
None	44,356	3.74	251.83	123.61	18760.59	−10873.00	−10.85	1.20	12.74	41243.00
Winsorized	44,356	1.36	2.96	−0.07	8.49	−10.85	−10.85	1.20	12.74	12.74
Log	44,356	9.29	0.05	−189.49	38980.70	0.00	9.29	9.29	9.30	10.86
Inverse hyperbolic sine	44,356	0.81	1.27	−0.81	5.47	−9.99	−3.08	1.01	3.24	11.32
Market to Book
Transformation	N	Mean	SD	Skew	Kurtosis	Min	1st Pctl	50th Pctl	99th Pctl	Max
None	41,521	4.19	34.59	111.49	16834.63	0.00	0.32	2.12	26.73	5603.07
Winsorized	41,521	3.27	3.80	3.82	20.73	0.32	0.32	2.12	26.73	26.73
Log	41,521	1.26	0.61	1.93	11.33	0.00	0.28	1.14	3.32	8.63
Inverse hyperbolic sine	41,521	1.62	0.74	1.42	7.87	0.00	0.32	1.50	3.98	9.32
Returns
Transformation	N	Mean	SD	Skew	Kurtosis	Min	1st Pctl	50th Pctl	99th Pctl	Max
None	40,056	14.32	1734.60	185.71	35951.04	−1.00	−0.84	0.06	5.15	337836.80
Winsorized	40,056	0.16	0.73	4.03	26.34	−0.84	−0.84	0.06	5.15	5.15
Log	40,056	0.75	0.42	8.60	130.82	0.00	0.15	0.72	1.97	12.73
Inverse hyperbolic sine	40,056	0.12	0.63	5.20	56.12	−0.88	−0.76	0.06	2.34	13.42
Tobin's Q
Transformation	N	Mean	SD	Skew	Kurtosis	Min	1st Pctl	50th Pctl	99th Pctl	Max
None	42,931	3.06	21.68	−102.23	11905.30	−2797.93	0.70	2.43	16.28	275.83
Winsorized	42,931	3.26	2.58	2.61	11.41	0.70	0.70	2.43	16.28	16.28
Log	42,931	7.94	0.04	−182.17	35122.44	0.00	7.94	7.94	7.94	8.03
Inverse hyperbolic sine	42,931	1.70	0.71	−1.89	24.28	−8.63	0.66	1.62	3.48	6.31

Note: The sample sizes in Table 2 reflect all firm-year observations covered in both the Compustat and Execucomp databases for which at least one performance measure was available and the “SPcode” indicator in Execucomp was not missing. The sample sizes diverge slightly across the different measures because of missing data for each respective indicator.

Complementing these statistics, in Figure 1, we present kernel density plots to illustrate the distributions. The horizontal axis for each plot represents the actual range of values in the sample for each variable, and the vertical axis represents the density of observations featuring the value. Figure 1 reveals that ROA and Tobin's Q are negatively skewed and have high peaks, whereas ROE, EPS, and M/B have high peaks and are positively skewed. This figure also shows that NI has the least skewness and one of the lowest levels of kurtosis. In fact, the extreme nonnormality makes the central regions of the distributions in these plots difficult to visualize.

Figure 1.

Performance measure distributions: full sample.

We also examined the distributions of the performance variables across different index subsamples (S&P 500, MidCap 400, and SmallCap 600). We found substantial variation in both skewness and kurtosis for the same variable across samples. The distributions of ROE for each subsample are shown in Figure 2.⁴ As depicted, the magnitude of kurtosis for ROE ranged from 1533.83 for MidCap firms to 9295.37 for the full sample. Skewness varied not only in magnitude but also in direction across samples: ROE skewness was negative (−73.90) for the SmallCap subsample and positive (46.44) for the S&P 500 subsample.

Figure 2.

Trends in skewness & kurtosis of ROE across subsamples.

We also show in Table 2 that transforming the variables does not appear to result in normal distributions. For instance, the winsorized values (1st and 99th percentiles) of net income remained nonnormal in terms of both skewness (4.63) and kurtosis (28.14). The same holds true for ROA, whose kurtosis actually increased from 9476.61 to 37579.74 after we shifted the variable by the absolute value of its most negative observation (plus one) and took the log. These general trends were fairly consistent across all measures of firm performance.⁵ Similarly, in unreported supplemental analyses, we found that these patterns persisted even when adjusting the measures for industry performance or standardizing based on industry membership.
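The mechanics behind this pattern can be seen in a short base R sketch of the three transformations. The helper names (winsorize, shifted_log, ihs) are hypothetical, and the shift used for the log (the absolute value of the sample minimum plus one) is inferred from the zero minimum in the Log rows of Table 2; the input values are the ROA summary points from the untransformed row of that table.

```r
# Illustrative helpers; the names are hypothetical.
winsorize <- function(v, lower = 0.01, upper = 0.99) {
  bounds <- quantile(v, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(v, bounds[1]), bounds[2])     # clamp values to the percentile bounds
}
shifted_log <- function(v) {
  log(v + abs(min(v, na.rm = TRUE)) + 1)  # shift so the minimum maps to log(1) = 0
}
ihs <- function(v) asinh(v)               # log(v + sqrt(v^2 + 1)); defined for negatives

# ROA summary points taken from the untransformed row of Table 2.
roa_points <- c(Min = -103.00, P1 = -0.73, P50 = 0.04, P99 = 0.29, Max = 46.45)
round(shifted_log(roa_points), 2)
round(ihs(roa_points), 2)
```

After the shift, the 1st through 99th percentiles of ROA all land between roughly 4.64 and 4.65 while the minimum maps to 0. The bulk of the distribution is compressed into a very narrow band and the single extreme loss remains far away from it, which is consistent with the larger kurtosis reported for the logged variable.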

We also examined skewness and kurtosis for the different performance measures over time. Figure 3 shows how skewness and kurtosis of ROA varied in both magnitude and direction over time. For instance, skewness (kurtosis) for ROA in the full sample was approximately 45 (2050) in 2007 and −20 (515) in the following year. These findings are particularly compelling when coupled with the intraclass correlation coefficients (ICCs) associated with firms in our study (we do not report these ICCs for the sake of brevity). The ICCs for all of the variables reveal that an overwhelming majority of the variance in each measure exists within firms over time and not between firms.⁶ In fact, the average amount of between-firm variance for the variables in our study is approximately 7%, meaning almost all (i.e., about 93%) of the variation we observe is within firms over time.
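For readers who wish to compute a comparable between-firm variance share on their own panels, the sketch below uses one common estimator, ICC(1) from a one-way ANOVA. It is an assumption that this estimator resembles the one behind the figures above, since the exact procedure is not described here; the helper name icc1 and the toy data are hypothetical.

```r
# icc1() is a hypothetical helper: one-way ANOVA estimate of ICC(1).
icc1 <- function(y, firm) {
  firm <- factor(firm)
  tab <- anova(lm(y ~ firm))
  ms_between <- tab["firm", "Mean Sq"]
  ms_within  <- tab["Residuals", "Mean Sq"]
  k <- length(y) / nlevels(firm)        # average number of observations per firm
  (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
}

# Made-up panel: two firms observed twice each.
toy <- data.frame(firm = c("A", "A", "B", "B"), roa = c(1, 3, 5, 7))
between_share <- icc1(toy$roa, toy$firm)
c(between = between_share, within = 1 - between_share)
```

In this toy panel the between-firm mean square is 16 and the within-firm mean square is 2, so the estimate is 14/18, or about 0.78. A value near 0.07, by contrast, would indicate that most of the variance lies within firms over time.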

Figure 3.

Trends in skewness & kurtosis of ROA in the full sample across years.

Discussion of Study 1

Research suggests that firm performance variables are not always normally distributed (Henderson et al., 2012; Makino & Chan, 2017), but in Study 1 we extend this scholarship in two crucial ways. First, our findings highlight that there is no simple dichotomy that distinguishes normal from nonnormal distributions. Instead, our results indicate that skewness and kurtosis vary considerably across different performance measures, subsamples, transformations, and time. In fact, some variables are skewed positively while others are skewed negatively. In contrast to the skewness range of approximately −3 to 3 of some of the most prominent psychology measures (Bishara & Hittner, 2017), we found that the absolute value of skewness often exceeded 50 for our performance variables. Similarly, while Bishara and Hittner (2017) note that kurtosis in psychology measures ranged from −1 to 40, we found that it often exceeded 1,000 (and commonly exceeded 10,000). Our results thus suggest that performance measures are not just nonnormal, but exceedingly nonnormal compared to measures in other related fields.

Second, and perhaps more importantly, our findings suggest that many performance measures remain nonnormally distributed even after different transformations that are popular in organizational research. In particular, while winsorizing the variables slightly reduced nonnormality, the values of both skewness and kurtosis for the winsorized versions of each measure still indicate extreme nonnormality (for both ratio and nonratio measures). Moreover, log transformations at times increased the degree of the nonnormality, and some variables even changed the overall direction of their skewness after applying the transformation. These findings illustrate that the transformations organizational scholars routinely employ do not meaningfully address the incredibly nonnormal distributions endemic to popular firm performance measures.

Taken together, the outcomes from Study 1 indicate that there is no single way to describe the nonnormality of the distributions of performance measures, making it challenging to offer specific recommendations or diagnoses about techniques to account for a particular variable. Accordingly, we do not know what our results might mean for researchers who use such nonnormally distributed variables as dependent variables in statistical models. With that in mind, in Study 2 we created simulations to better understand the effectiveness of popular techniques to address nonnormality—OLS, winsorization, log and IHS transformations, robust standard errors, bootstrapping, and quantile regression—in empirical estimation when variables follow distributions like those we uncovered in this study.

Study 2: Modeling Dependent Variables with Extremely Nonnormal Distributions

The results of Study 1 reveal that measures of firm performance are associated with extreme levels of skewness and kurtosis. As we described previously, research in statistics (Cohen et al., 2003) and econometrics (Wooldridge, 2020) has consistently documented that nonnormal disturbances, which stem from nonnormally distributed dependent variables, tend to inflate standard errors and induce inefficiency. Our focus in Study 2, then, involves examining whether this remains the case (or is especially true) with the remarkably nonnormal distributions we uncovered in Study 1, as well as understanding the effectiveness of the techniques organizational scholars typically adopt to address this issue. Stated plainly, we investigate the following research question:

RQ2: How do dependent variables with high levels of skewness and/or kurtosis, as well as techniques to account for nonnormal distributions, influence parameter estimation?

Methodology of Study 2

We designed simulations to help examine the relative efficacy of techniques to account for nonnormally distributed error terms. Our goal with these simulations is to compare parameter estimates—paying careful attention to the standard errors, which provide insight into the efficiency of the estimator—across the transformations, adjustments, and modeling techniques we described previously within the same type of nonnormality.

Data generation process. The simulations in this study were created and run using R, an open-source statistical software package (R Core Team, 2022). Each condition involved 10,000 simulation iterations. For each iteration, we generated a sample of 1,000 observations. We should note, however, that the sample size itself does not play an independent role in deriving our results. In other words, the sample size is one of several components we specified to help create realistic statistical significance levels that are consistent with extant research on simulations (Busenbark et al., 2022; Certo et al., 2016; Semadeni et al., 2014). Although increasing the sample size in isolation improves efficiency, our results remain identical if we increase or decrease the sample size in tandem with changing coefficients.

The purpose of this simulation was to generate a dependent variable, y, that followed a nonnormal distribution and was associated with an independent variable of interest. As we describe later, we then substantively varied the distribution of y to help mimic the nature of nonnormality in the dependent variables from Study 1. Creating y first involved generating a random independent variable, x, which followed a normal distribution with a mean of 0 and a variance of 1. We then used Equation 4 to create y as a function of our independent variable, x, as well as an error term, e, which was generated to reflect different nonnormal distributions:

y_{i} = 0.05 x_{i} + e_{i}    (4)

Intuitively, the key to understanding this data generation process is that the distribution of y is largely a function of the distribution of e, which is consistent with other research on the topic that has employed different distributions to emulate nonnormality (de Winter et al., 2016; Westfall, 2014; Wright & Herrington, 2011).
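A minimal base R sketch of this data generation process follows. The function generate_sample and the error generator heavy_tailed are hypothetical names, and the Student t error with 3 degrees of freedom is only a convenient stand-in for a nonnormal error term; it is not the skewed generalized t distribution used in the study.

```r
set.seed(12345)  # seed reported in the note to Table 3

# generate_sample() is a hypothetical helper that implements Equation 4.
generate_sample <- function(n, draw_error = rnorm, beta = 0.05) {
  x <- rnorm(n, mean = 0, sd = 1)       # independent variable: standard normal
  e <- draw_error(n)                    # error term: any generator taking n
  data.frame(x = x, y = beta * x + e)   # y_i = 0.05 x_i + e_i
}

heavy_tailed <- function(n) rt(n, df = 3)  # stand-in error (assumption), not the SGT
dat <- generate_sample(1000, heavy_tailed)
summary(dat$y)
```

Because the coefficient on x is small relative to the spread of e, the shape of y is driven almost entirely by whichever error generator is supplied.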

To generate our nonnormal error terms, we specified several conditions based on the skewed generalized t distribution using the sgt package in R (Davis, 2015). This distributional package is attractive for our purposes because it allows us to simulate distributions with negative and positive values while also varying both skewness and kurtosis (McDonald & Michelfelder, 2017). The command in R includes five parameters: mu (i.e., mean), sigma (i.e., variance), lambda (i.e., skewness), and two parameters (p and q) to denote kurtosis. Because the values for the parameters do not correspond exactly with skewness and kurtosis statistics, we employed an iterative process to identify our final conditions that mirror the distributions from Study 1.

We created six conditions to better understand how dependent variables with different (both normal and nonnormal) distributions might influence OLS regression. In Table 3, we display each condition and its corresponding simulation code. Condition 1 is our baseline, which represents a normal distribution; Condition 2 represents moderate (positive) skew and moderate kurtosis; Condition 3 represents moderate (negative) skew and moderate kurtosis; Condition 4 represents no skew and high kurtosis; Condition 5 represents high (positive) skew and high kurtosis; Condition 6 represents high (negative) skew and high kurtosis.⁷ Table 4 displays the descriptive statistics and kernel density plots across all 10,000 iterations for each condition.

Table 3.

Simulation Conditions.

Condition	Description	R Code
1	Normal distribution (skewness = 0, kurtosis = 3)	f1 <- function (n) {rsgt(n, mu = 0, sigma = 1, lambda = 0, p = 2, q = Inf, mean.cent = FALSE, var.adj = FALSE)}
2	Positive moderate skewness, moderate kurtosis	f2 <- function (n) {rsgt(n, mu = 0, sigma = 2, lambda = 0.8, p = 2, q = 2, mean.cent = FALSE, var.adj = FALSE)}
3	Negative moderate skewness, moderate kurtosis	f3 <- function (n) {rsgt(n, mu = 0, sigma = 2, lambda = -0.8, p = 2, q = 2, mean.cent = FALSE, var.adj = FALSE)}
4	No skewness, high kurtosis	f4 <- function (n) {rsgt(n, mu = 0, sigma = .1, lambda = 0, p = 1.15, q = 1.4, mean.cent = FALSE, var.adj = FALSE)}
5	High positive skewness, high kurtosis	f5 <- function (n) {rsgt(n, mu = 0, sigma = .2, lambda = 0.8, p = 1.25, q = 1.25, mean.cent = FALSE, var.adj = FALSE)}
6	High negative skewness, high kurtosis	f6 <- function (n) {rsgt(n, mu = 0, sigma = .2, lambda = -0.8, p = 1.25, q = 1.25, mean.cent = FALSE, var.adj = FALSE)}

Note: We used the seed number 12345 to establish the starting point for the pseudo-random number generation process.
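One way to carry out the kind of iterative check described for these conditions is to draw a large sample from a candidate generator and inspect its sample skewness and kurtosis. The sketch below is illustrative only: sample_shape and check_condition are hypothetical helper names, the excess-kurtosis convention is an assumption, and the sgt call (copied from Condition 2 in Table 3) runs only if the sgt package happens to be installed.

```r
# Hypothetical helpers for inspecting the shape of a candidate error generator.
sample_shape <- function(v) {
  z <- (v - mean(v)) / sqrt(mean((v - mean(v))^2))
  c(skewness = mean(z^3), excess_kurtosis = mean(z^4) - 3)
}
check_condition <- function(draw, n = 1e5) sample_shape(draw(n))

set.seed(12345)
normal_shape <- check_condition(rnorm)   # baseline: both values should be near 0
normal_shape

# Condition 2 from Table 3, evaluated only when the sgt package is available.
if (requireNamespace("sgt", quietly = TRUE)) {
  f2 <- function(n) sgt::rsgt(n, mu = 0, sigma = 2, lambda = 0.8, p = 2, q = 2,
                              mean.cent = FALSE, var.adj = FALSE)
  print(check_condition(f2))
}
```

With heavy-tailed generators the sample kurtosis can change noticeably from one draw to the next, so repeating the check across several seeds gives a better sense of a candidate condition than any single draw.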

Table 4.

Summary Statistics and Simulation Results of Ordinary Least Squares (OLS) and Quantile Regression Analysis.

Condition 1: Normal Distribution
Mean	SD	Skewness	Kurtosis	Min	Max	1st Pctl	99th Pctl
0.000	0.708	0.000	−0.024	−3.701	3.889	−1.631	1.633
					Simulated y Distribution
Model	B	SE	PerSig	DfBeta
Original	0.050	0.022	60.510%	0.000
Winsorized	0.049	0.022	60.310%	0.000
Log	0.016	0.007	59.110%	0.000
Inverse hyperbolic sine	0.043	0.019	59.880%	0.000
Robust standard errors	0.050	0.022	60.570%	–
Bootstrap	0.050	0.022	60.560%	–
10th percentile	0.050	0.037	28.440%	–
50th percentile	0.050	0.028	43.030%	–
90th percentile	0.050	0.037	27.650%	–
Condition 2: Moderate Positive Skewness and Moderate Kurtosis
Mean	SD	Skewness	Kurtosis	Min	Max	1st Pctl	99th Pctl
2.262	2.514	2.475	11.086	−14.225	217.152	−0.591	11.179
					Simulated y Distribution
Model	B	SE	PerSig	DfBeta
Original	0.049	0.080	10.220%	0.031
Winsorized	0.048	0.072	10.890%	0.000
Log	0.012	0.014	15.570%	0.000
Inverse hyperbolic sine	0.028	0.028	17.520%	0.000
Robust standard errors	0.049	0.078	10.260%	–
Bootstrap	0.049	0.078	10.510%	–
10th percentile	0.050	0.035	30.680%	–
50th percentile	0.049	0.076	10.780%	–
90th percentile	0.047	0.205	8.340%	–

Condition 3: Moderate Negative Skewness and Moderate Kurtosis
Mean	SD	Skewness	Kurtosis	Min	Max	1st Pctl	99th Pctl
−2.262	2.516	−2.493	11.334	−255.038	19.145	−11.178	0.593
					Simulated y Distribution
Model	B	SE	PerSig	DfBeta
Original	0.051	0.080	9.710%	0.033
Winsorized	0.050	0.072	10.440%	0.000
Log	0.002	0.005	7.290%	0.000
Inverse hyperbolic sine	0.029	0.028	17.290%	0.000
Robust standard errors	0.051	0.079	9.990%	–
Bootstrap	0.051	0.078	10.180%	–
10th percentile	0.053	0.206	8.100%	–
50th percentile	0.051	0.076	11.020%	–
90th percentile	0.051	0.036	30.570%	–
Condition 4: No Skewness and High Kurtosis
Mean	SD	Skewness	Kurtosis	Min	Max	1st Pctl	99th Pctl
0.000	0.606	0.044	153.975	−1058.808	949.569	−1.241	1.240
					Simulated y Distribution
Model	B	SE	PerSig	DfBeta
Original	0.050	0.019	67.040%	0.039
Winsorized	0.049	0.010	99.730%	0.000
Log	0.006	0.003	52.770%	0.000
Inverse hyperbolic sine	0.049	0.010	99.270%	0.000
Robust standard errors	0.050	0.017	72.650%	–
Bootstrap	0.050	0.017	72.620%	–
10th percentile	0.050	0.019	71.200%	–
50th percentile	0.050	0.003	100.000%	–
90th percentile	0.050	0.019	71.200%	–
Condition 5: High Positive Skewness and High Kurtosis
Mean	SD	Skewness	Kurtosis	Min	Max	1st Pctl	99th Pctl
0.604	1.811	11.774	189.455	−1939.055	5044.999	−0.164	6.111
					Simulated y Distribution
Model	B	SE	PerSig	DfBeta
Original	0.049	0.057	15.320%	0.245
Winsorized	0.049	0.030	38.020%	0.000
Log	0.020	0.009	59.550%	0.000
Inverse hyperbolic sine	0.044	0.017	72.770%	0.000
Robust standard errors	0.049	0.051	18.450%	–
Bootstrap	0.049	0.051	18.270%	–
10th percentile	0.050	0.004	100.000%	–
50th percentile	0.050	0.013	96.740%	–
90th percentile	0.049	0.089	11.570%	–
Condition 6: High Negative Skewness and High Kurtosis
Mean	SD	Skewness	Kurtosis	Min	Max	1st Pctl	99th Pctl
−0.603	1.818	−11.909	193.743	−11917.276	635.541	−6.065	0.165
					Simulated y Distribution
Model	B	SE	PerSig	DfBeta
Original	0.049	0.058	15.220%	0.242
Winsorized	0.049	0.030	38.190%	0.000
Log	0.002	0.004	7.040%	0.000
Inverse hyperbolic sine	0.044	0.017	73.130%	0.000
Robust standard errors	0.049	0.051	17.910%	–
Bootstrap	0.049	0.051	18.120%	–
10th percentile	0.051	0.089	11.660%	–
50th percentile	0.050	0.013	96.900%	–
90th percentile	0.050	0.004	100.000%	–

Models. For each of the six conditions, we compared the effectiveness of multiple statistical techniques that scholars apply to help resolve issues associated with nonnormally distributed residuals. Original refers to OLS regression on our original data without transformation. Winsorized refers to OLS while using winsorized values of y at the 1st and 99th percentiles. Log and Inverse hyperbolic sine refer to OLS using log and IHS transformations, respectively. To account for negative values, we add the absolute value of the sample minimum plus one to the dependent variable before applying the log transformation. Robust standard errors refers to OLS with robust standard errors, and Bootstrap refers to models with case bootstrapping. For our examinations of nonparametric quantile regression, we used R's quantreg package to compute our parameter estimates (Koenker, 2022), and we focus on the results for the 10th, 50th, and 90th percentiles (a decision we revisit and discuss later in this article).
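The OLS-based specifications can be sketched for a single simulated sample in base R as follows. This is an illustration under stated assumptions rather than the study's code: the t-distributed error is a stand-in for the SGT conditions, the robust standard error uses the HC0 sandwich formula (the specific robust estimator is not stated in the text), the bootstrap uses 500 resamples, and the names winsorize, slope, robust_se, and boot_se are hypothetical.

```r
set.seed(12345)
n <- 1000
x <- rnorm(n)
y <- 0.05 * x + rt(n, df = 3)           # stand-in heavy-tailed error (assumption)
dat <- data.frame(x = x, y = y)

winsorize <- function(v, lower = 0.01, upper = 0.99) {
  b <- quantile(v, probs = c(lower, upper), names = FALSE)
  pmin(pmax(v, b[1]), b[2])
}
slope <- function(fit) coef(summary(fit))["x", c("Estimate", "Std. Error", "Pr(>|t|)")]

# Original, Winsorized, Log (shifted by |min| + 1), and IHS specifications.
fits <- list(
  Original   = lm(y ~ x, data = dat),
  Winsorized = lm(winsorize(y) ~ x, data = dat),
  Log        = lm(log(y + abs(min(y)) + 1) ~ x, data = dat),
  IHS        = lm(asinh(y) ~ x, data = dat)
)
results <- t(sapply(fits, slope))

# Robust (HC0) standard error for the Original model, computed by hand.
X <- model.matrix(fits$Original)
u <- residuals(fits$Original)
bread <- solve(crossprod(X))
meat <- crossprod(X * u)                # sum of u_i^2 * x_i x_i'
robust_se <- sqrt(diag(bread %*% meat %*% bread))[["x"]]

# Case bootstrap: resample rows with replacement and refit OLS.
boot_slopes <- replicate(500, {
  idx <- sample.int(n, replace = TRUE)
  coef(lm(y ~ x, data = dat[idx, ]))[["x"]]
})
boot_se <- sd(boot_slopes)

round(results, 3)
round(c(robust = robust_se, bootstrap = boot_se), 3)
```

Note that the robust and bootstrap variants leave the OLS coefficient unchanged and alter only the standard error, whereas the log and IHS variants change the scale of y and therefore the coefficient itself.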

Outcome variables. Following research that employs simulations (e.g., Certo et al., 2020; Semadeni et al., 2014), we retained the mean coefficients (B) and standard errors (SE) estimated by each technique across 10,000 iterations per condition. We expect the coefficients will remain similar across different techniques, whereas the standard errors may fluctuate and thus reduce efficiency. To assess efficiency, we calculated the percentage of the 10,000 iterations in which the coefficient was statistically significant at the p < .05 level (PerSig) (Busenbark et al., 2022). Finally, we also retained DfBeta, which represents the average number of outlier observations (within the 1,000 observations per iteration) that substantively influenced the OLS parameter estimates.⁸
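The PerSig and DfBeta outcomes can be approximated with a short base R loop. The sketch below is a scaled-down illustration (500 iterations rather than 10,000) with hypothetical helper names; the normal error uses the SD of 0.708 reported for Condition 1 in Table 4, the t-distributed error is a stand-in rather than the SGT, and the DFBETAS cutoff of 1 is an assumed convention that may differ from the criterion used in the study.

```r
set.seed(12345)

# simulate_persig() and count_influential() are hypothetical helper names.
simulate_persig <- function(draw_error, n_iter = 500, n = 1000, beta = 0.05) {
  significant <- logical(n_iter)
  for (i in seq_len(n_iter)) {
    x <- rnorm(n)
    y <- beta * x + draw_error(n)
    p_value <- coef(summary(lm(y ~ x)))["x", "Pr(>|t|)"]
    significant[i] <- p_value < 0.05
  }
  100 * mean(significant)               # PerSig: percent of significant iterations
}

count_influential <- function(fit, cutoff = 1) {
  sum(abs(dfbetas(fit)[, "x"]) > cutoff)  # cutoff of 1 is an assumed convention
}

normal_error <- function(n) rnorm(n, sd = 0.708)  # SD reported for Condition 1
heavy_error  <- function(n) rt(n, df = 2)         # stand-in error, not the SGT

persig <- c(Normal = simulate_persig(normal_error),
            Heavy  = simulate_persig(heavy_error))
persig

x <- rnorm(1000)
fit <- lm(I(0.05 * x + normal_error(1000)) ~ x)
count_influential(fit)
```

With normal errors, the slope's standard error is roughly 0.708 / sqrt(1000), or about 0.022, so the PerSig value from this loop should fall near the 60% reported for the Original model in Condition 1, up to Monte Carlo error.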

Results of Study 2

Table 4 displays the results of our simulations. We observe a relatively stable B at or near values of 0.050 across Conditions 1–6, which is consistent with the idea that nonnormal residuals do not bias coefficient estimates. The slight exceptions to this involve the log and IHS transformations, which fundamentally change the values of the variables and confer different interpretations of the coefficients. Consequently, the B values deviated from the true value in the log and IHS models across the conditions.

We assessed efficiency, which may indeed fluctuate due to nonnormality, via SE and PerSig. Because the error term changes in each condition, it is important to compare these efficiency metrics across the models within, and not between, each condition. Our findings for Condition 1, which represents a normal distribution, indicate that estimates from OLS regression in its Original form and all of the other OLS-based models (i.e., the transformations, robust standard errors, and bootstrapping) are approximately equally efficient. Indeed, the PerSig values for each of these models are almost identical at about 60%. By contrast, quantile regression appears less efficient, with PerSig ranging between approximately 28% (10th and 90th percentiles) and 43% (50th percentile). Taken together, these results reinforce that OLS is the most efficient estimator when all assumptions are upheld (Kennedy, 2008), and that quantile regression underperforms OLS in terms of efficiency when the assumptions of OLS are met.

In Conditions 2–6, we introduced error terms with various nonnormal distributions, which are summarized in Table 4. In every condition we simulated, we observe inefficiencies associated with six variants of OLS modeling (Original, Winsorized, Log, IHS, Robust standard errors, and Bootstrap) compared to quantile regression. To this point, the PerSig values for all nonquantile models are notably lower than their counterparts at various percentiles in quantile regression. For example, in Condition 6, the PerSig of the Original model was 15.22%, whereas nearly all the percentiles in the quantile regression (except the 10th percentile) outpace the OLS estimation with no transformations or adjustments.

On average across Conditions 2–6, the six different OLS specifications estimated statistically significant parameters in about 25% of the iterations. In contrast, across the same conditions, the three different points of quantile regression produced statistically significant parameters in about 44% of the iterations on average. Given that the B outcomes were consistent across the models and the conditions, this means that researchers seeking to test hypotheses would fail to reject a false null hypothesis (i.e., commit a Type II error) more often with OLS specifications than with quantile regression.

Discussion of Study 2

Nonnormally distributed error terms reduce the efficiency of OLS (and thus other linear models), and our results help to understand the relative effectiveness of techniques used to mitigate this problem. Not surprisingly, OLS performs the most efficiently when the dependent variable follows a normal distribution, such as in our baseline Condition 1. In this condition, transformations did not improve the efficiency of OLS in any meaningful way because the error terms were normally distributed. And although the log and IHS models produced different coefficient estimates owing to the data transformation, the efficiency remained almost identical. At the same time, quantile regression was less efficient than OLS and its related approaches when the residuals adhered to the strict assumptions of normality.

The relative efficacy of OLS (including its several variants using popular techniques to address nonnormality) and quantile regression inverts when the dependent variable displays different levels of skewness and kurtosis. In Conditions 2 and 3, which feature moderate levels of skewness and kurtosis, quantile regression was more efficient than the several OLS specifications. In Conditions 5 and 6, which feature high levels of skewness and kurtosis, quantile regression was again considerably more efficient than the OLS specifications. Even our bootstrapping models estimated parameters and displayed levels of efficiency much more in tune with the OLS models than with quantile regression. In other words, although bootstrapping represented the only estimation technique that did not transform the data or estimates in some way, it also did not substantively enhance efficiency, at least not to the same degree as did quantile regression.

Of particular note is Condition 4, in which we isolate the effects of kurtosis, a characteristic of nonnormality that receives far less attention in the literature than skewness. In this case, nearly all of the techniques to address the nonnormality outperformed OLS in terms of efficiency, but quantile regression was particularly effective. For the first time to our knowledge, our simulation reveals that nonparametric estimation like quantile regression enhances efficiency in the absence of skewness but in the presence of kurtosis. This finding is particularly important considering that Li's (2015) article on quantile regression references skewness (or a close variant) 23 times but does not mention kurtosis. Moreover, given the extreme kurtosis values uncovered in Study 1, we think this is an important contribution to research examining firm performance as a dependent variable.

Figure 4 illustrates the intuition underlying quantile regression by displaying the standard errors estimated by quantile regression at different percentiles of the dependent variable (across our conditions). In particular, Figure 4 underscores the association between the nature of skewness and corresponding standard errors. For instance, in Condition 2, where the distribution of y has a long tail that extends to the right (Table 4), the standard error increases toward the right tail and decreases on the opposite side of the distribution where the probability mass is highest. The opposite pattern holds true for Condition 3, where the distribution of y has a long tail that extends to the left. We observe similar patterns for Conditions 5 and 6, which also include positive and negative skewness. Taken together, the lines in Figure 4 demonstrate that quantile regression is most efficient when estimating relationships near the median (or the highest probability mass) value for the dependent variable.

Figure 4.

Standard errors estimated by quantile regression across percentiles and conditions.

Discussion and Conclusion

Our work offers several contributions to research examining the impact of nonnormality on causal inference as well as empirical research examining the nonnormality of firm performance. We extend work on the distribution of firm performance (Henderson et al., 2012; Makino & Chan, 2017) by showing that both skewness and kurtosis vary dramatically across measures, samples, years, and transformations. Stated simply, there is no single way to describe the nonnormality of firm performance—an idea that to our knowledge has not been discussed in strategy research. Moreover, our results reveal levels of skewness and kurtosis that are well beyond the levels discussed in other disciplines (Bishara & Hittner, 2017; Cohen et al., 2003; Osgood et al., 1988). We also document the ineffectiveness of many techniques that organizational scholars use to resolve such extreme nonnormality. Specifically, our results from Study 1 suggest that even after using log transformations and winsorization, performance variables often remain nonnormally distributed.

To understand how the types of distributions reported in Study 1 influence causal inference, in Study 2 we designed simulations to examine the effectiveness of analytical techniques that scholars use to analyze nonnormally distributed dependent variables. We find that the use of linear regression with such dependent variables results in larger standard errors, which increase confidence intervals, p-values, and Type II errors. Moreover, these challenges remain even after employing transformations used to address nonnormality. With this in mind, we can only imagine how many manuscripts are in the proverbial “file drawer” as a result of such nonnormal variables. Our results may at least partially explain the conclusion that strategy research remains inconclusive about the drivers of performance heterogeneity (Makadok et al., 2018).

Our results also demonstrate that quantile regression is better suited than OLS to study such nonnormally distributed dependent variables. The outcomes in Study 2 highlighted that quantile regression was more efficient than OLS, winsorization, log transformations, IHS transformations, robust standard errors, and bootstrapping in all conditions in which the dependent variable had a nonnormal distribution. In addition to being more efficient, quantile regression also provides information about the effect of the independent variable at additional points of the distribution of the dependent variable, such as the 10th or 90th percentile. Moreover, researchers can enjoy the benefits of quantile regression without transforming the values (and the potential meaning) of the dependent variable. This is especially advantageous given that firms in the top or bottom tails of any particular performance distribution are likely theoretically interesting.

Taken together, the results of our two studies present a counterintuitive perspective for researchers examining firm performance. Undoubtedly, scholars in strategy are particularly interested in uncovering common characteristics and practices of the most highly performing companies (Henderson et al., 2012). Study 2 suggests that quantile regression is well-suited for analyzing skewed data, but counterintuitively, its advantages relative to OLS are particularly pronounced when analyzing data at the end of the distribution opposite the long tail. In other words, the advantages of quantile regression relative to OLS when examining high-performing firms increase when the data are negatively skewed. Study 1 indicates this is generally the case with strategy research.

Recommendations for Addressing Nonnormal Dependent Variables

The implications from the outcomes in Studies 1 and 2 are clear: Firm performance measures, even when transformed, exhibit remarkable degrees of nonnormality (i.e., Study 1), and the techniques organizational scholars routinely employ to account for nonnormality prove impotent at best and deleterious at worst (i.e., Study 2). What remains less clear is precisely how researchers confronted with nonnormal dependent variables may proceed in their own work. In this section, we integrate research on transformations and nonparametric models with the implications from our studies to provide recommendations for researchers examining dependent variables with extreme nonnormality.

Step 1: Scrutinize the data and report relevant statistics. Consistent with Becker and colleagues’ (2019) recommendation for nonlinear transformations, the first step in analyzing data with continuous dependent variables involves examining variable distributions. This might start in the data preparation phase of the research process, where Beck et al. (2014) suggest scholars consult kernel density plots for their dependent variables. While most journals in organizational research require that authors report the means and standard deviations of variables, we contend it is also productive for referees and readers at large to review information about skewness and kurtosis. To this end, we encourage authors to consider also reporting details about skewness and kurtosis in their tables alongside the mean and standard deviation values.

Step 2: Compare transformed variables against the raw data. After establishing nonnormality, researchers might proceed by using transformations such as winsorization and log transformations (Becker et al., 2019). It is possible, for instance, that there may exist problematic outliers that are not conceptually or empirically salient and could benefit from transforming the data (Joo et al., 2017). Researchers will never uncover whether this is the case without at least comparing untransformed variables with their transformed counterparts. At the same time, our findings from Studies 1 and 2 are more in stride with cautions against transformations from extant research (Rönkkö et al., 2022). Accordingly, we encourage scholars to revisit the procedures in the first step to examine the normality of the transformed variable(s).

Step 3: Employ quantile regression. Although Rönkkö et al. (2022) highlight the usefulness of generalized linear modeling (GLM) when dependent variables have nonnormal distributions, this technique does not accommodate the types of variables we studied (i.e., those ranging from positive infinity to negative infinity with extreme levels of skewness and/or kurtosis). With this in mind, we suggest that scholars examining variables similar to those we feature in our research consider quantile regression. Our results in Study 2 suggest that the efficiency of quantile regression drastically exceeds that of OLS and related techniques. Unless the dependent variable is normally distributed, it is difficult to rationalize using techniques that examine average effects as opposed to specific percentiles or median effects (e.g., quantile regression). At the very least, we believe it is instructive to compare the parameter estimates (namely efficiency) at the median of quantile regression against its mean analog in OLS (Li, 2015). Comparing the results to parametric estimators (e.g., OLS) will provide insight into whether quantile regression influences efficiency.
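The comparison between the median of quantile regression and its mean analog in OLS can be sketched in a few lines of R. The example assumes the quantreg package is installed and uses a t-distributed error with 2 degrees of freedom purely as a heavy-tailed stand-in; the choice of bootstrapped standard errors for the quantile model is also an assumption, as the text does not specify how those standard errors were computed.

```r
library(quantreg)  # assumed to be installed

set.seed(12345)
n <- 1000
x <- rnorm(n)
dat <- data.frame(x = x, y = 0.05 * x + rt(n, df = 2))  # stand-in error, not the SGT

# Mean effect from OLS.
ols <- coef(summary(lm(y ~ x, data = dat)))["x", c("Estimate", "Std. Error")]

# Median effect from quantile regression with bootstrapped standard errors.
med_fit <- rq(y ~ x, tau = 0.5, data = dat)
med <- summary(med_fit, se = "boot")$coefficients["x", c("Value", "Std. Error")]

comparison <- rbind(OLS = ols, Median = med)
colnames(comparison) <- c("B", "SE")
round(comparison, 3)
```

When the two coefficients are similar but the median model's standard error is markedly smaller, the comparison suggests that heavy tails, rather than a different underlying relationship, are driving the imprecision of OLS.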

Another primary advantage of quantile regression involves understanding how the effect of an independent variable may vary along different points of the distribution of the dependent variable (Koenker & Hallock, 2001; Li, 2015). To this point, Beck et al. (2014) contend that, at least in their context of employee performance, it is valuable to account for the range of values in case seemingly problematic areas of the distribution are theoretically rich or practically salient. A logical question stemming from this recommendation, however, involves deciding which quantiles to examine and report. Although we reported the results at the 10th, 50th, and 90th percentiles, we could just as easily have examined the 1st and 99th percentiles or any other pairing. With this in mind, in Figure 4, we graph the standard errors from the 10th through the 90th percentile in each of our six conditions. As we demonstrate in Figure 4, the nature of skewness and kurtosis plays a vital role in determining which percentiles yield more efficient parameter estimates.

At the same time, we do not have specific recommendations about which particular percentiles to report from quantile regression, as the most appropriate level appears to be contingent on the nature of the data (see Figure 4). This is perhaps why there remains debate in other disciplines about best practices for reporting results from a quantile regression (Petscher & Logan, 2014). We do, however, encourage researchers to consult the standard errors across all of the percentiles, just as we illustrate in Figure 4 (e.g., Andriani & McKelvey, 2009). And as we pointed out in the previous subsection, we suggest that the median (i.e., 50th percentile) should be included in each quantile regression since it represents a natural analog to comparable approaches that focus on average effects.
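The kind of percentile-by-percentile inspection described here can be sketched as follows. This Python example is a hedged illustration rather than the procedure behind Figure 4: the two error distributions (a left-skewed negated exponential and a symmetric, heavy-tailed Laplace), the sample size, and the replication count are assumptions chosen for the sketch. For each condition, it reports the across-replication standard deviation of the quantile-regression slope at the 10th through 90th percentiles, again using an exact brute-force fit that is practical only for small samples.

```python
import random
import statistics
from itertools import combinations

PERCENTILES = list(range(10, 100, 10))  # 10th, 20th, ..., 90th


def quantile_slopes(x, y, percentiles=PERCENTILES):
    """Exact quantile-regression slopes (one predictor) for several percentiles.

    A minimizer of the check loss can be chosen to pass through two
    observations, so every pair of points is evaluated as a candidate line.
    """
    best = {p: (float("inf"), 0.0) for p in percentiles}
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        above = below = 0.0  # summed positive and (absolute) negative residuals
        for a, b in zip(x, y):
            r = b - intercept - slope * a
            if r > 0:
                above += r
            else:
                below -= r
        for p in percentiles:
            tau = p / 100
            loss = tau * above + (1 - tau) * below  # check (pinball) loss
            if loss < best[p][0]:
                best[p] = (loss, slope)
    return {p: slope for p, (_, slope) in best.items()}


def se_profile(draw_error, reps=100, n=30, seed=7):
    """Across-replication SD of the slope estimate at each percentile."""
    rng = random.Random(seed)
    draws = {p: [] for p in PERCENTILES}
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [0.5 * a + draw_error(rng) for a in x]
        for p, slope in quantile_slopes(x, y).items():
            draws[p].append(slope)
    return {p: statistics.stdev(v) for p, v in draws.items()}


# Assumed conditions: long left tail vs. symmetric heavy tails.
left_skewed = se_profile(lambda rng: -rng.expovariate(1.0))
symmetric = se_profile(lambda rng: rng.expovariate(1.0) * rng.choice((-1, 1)))

print("percentile  left-skewed  symmetric")
for p in PERCENTILES:
    print(f"{p:>10}  {left_skewed[p]:11.3f}  {symmetric[p]:9.3f}")
```

Under these assumed conditions, the most precisely estimated percentile differs with the shape of the distribution, which is why inspecting the full profile is more informative than fixing the reported percentiles in advance.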

Future Research

Quantile regression. Our findings about the benefits associated with quantile regression highlight the importance of re-examining studies (both published and unpublished) that feature performance as a dependent variable. Effectively accounting for nonnormality may allow researchers to develop or even shift consensus on key facets of some of the most important theoretical perspectives in organizational research. Scholars could supplement both findings and nonfindings by re-examining the same data with quantile regression, particularly since it provides more information about relationships than models that report only average effects. As Mosteller and Tukey (1977, p. 266) point out, “Just as the mean gives an incomplete picture of a single distribution, so the regression curve gives a corresponding incomplete picture for a set of distributions.” In this way, quantile regression allows researchers to acknowledge heterogeneity and model the potential for relationships to vary along the distribution of the dependent variable. After all, it is hard to disagree that “Data about … Ma & Pa stores in one tail don’t offer much useful information about … Wal-Mart in the opposite tail” (Andriani & McKelvey, 2009, p. 1067).

Simulations. We are also hopeful that our research will benefit future scholarship that employs simulations. Organizational scholars have used simulations to examine the effects of multicollinearity (Kalnins, 2018), endogeneity (Semadeni et al., 2014), and sample selection (Certo et al., 2016), among others. But to our knowledge, all of this work models normally distributed variables. Given our results, future research could examine the extent to which implications from these simulation-based studies hold true with nonnormal distributions.

Another area of future research involves extending the data generation process we employed to account for a multilevel context. As we noted previously, we calculated ICCs for each of the performance variables in Table 2 to better understand the amount of between-firm relative to within-firm variance in each variable. These ICCs revealed that almost all of the variance in the constructs is within-firm over time rather than across firms. Future researchers might further examine this issue and determine the multilevel components of skewness and kurtosis. This would, in turn, potentially allow for simulations that investigate multilevel approaches to quantile regression.
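For readers who want to examine the between-firm and within-firm split in their own panels, the following Python sketch computes an intraclass correlation from a one-way analysis of variance. It assumes the ICC(1) formula for a balanced panel, which may differ from the exact estimator used for Table 2, and the simulated panel (200 firms, 10 years, a small firm effect relative to year-to-year noise) is hypothetical.

```python
import random
import statistics


def icc1(panel):
    """ICC(1) from a one-way ANOVA on a balanced panel.

    `panel` maps each firm to a list of its yearly values; every firm must
    have the same number of observations.
    """
    groups = [list(values) for values in panel.values()]
    n_firms = len(groups)
    k = len(groups[0])  # observations (years) per firm
    if any(len(g) != k for g in groups):
        raise ValueError("icc1 expects a balanced panel")
    grand_mean = statistics.fmean([v for g in groups for v in g])
    firm_means = [statistics.fmean(g) for g in groups]
    # Mean squares between firms and within firms.
    msb = k * sum((m - grand_mean) ** 2 for m in firm_means) / (n_firms - 1)
    msw = sum(
        (v - m) ** 2 for g, m in zip(groups, firm_means) for v in g
    ) / (n_firms * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical panel in which most variance is within firms over time.
rng = random.Random(3)
panel = {}
for firm in range(200):
    firm_effect = rng.gauss(0, 0.2)  # stable between-firm component
    panel[firm] = [firm_effect + rng.gauss(0, 1.0) for _ in range(10)]

print(f"ICC(1) = {icc1(panel):.3f}")
```

A value near zero indicates that most of the variance lies within firms over time, whereas a value near one indicates that it lies between firms.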

Conclusion

The primary takeaway from our research is this: When examining performance as a dependent variable, or another measure that exhibits such extreme nonnormality, research using the statistical models that strategy scholars tend to favor may be doomed from the start. Given that scholars rarely empirically test relationships involving firm performance using nonparametric techniques, it is possible that the methodologies commonly employed by strategy scholars have undermined knowledge accumulation and the ability to replicate results. Our research works to right the ship by offering some insight into how to determine when nonnormality represents an issue for causal inference and how nonparametric models can resolve the resulting problems.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Author Biographies

S. Trevis Certo is the Jerry and Mary Anne Chapman Professor of Business in the Department of Management and Entrepreneurship at the W. P. Carey School of Business at Arizona State University. His research interests include corporate governance, top management teams, firm performance, and research methodology.

Kristen Raney is a PhD candidate in the Department of Management and Entrepreneurship at the W. P. Carey School of Business at Arizona State University. Her research interests include stakeholder relationship management, social activism, corporate governance, and research methods.

Latifa Albader is a doctoral candidate in the Department of Management and Entrepreneurship at the W. P. Carey School of Business at Arizona State University. Her research interests include corporate governance, top management teams, shareholder activism, and research methods.

John R. Busenbark is an Assistant Professor in the Management & Organization Department of the Mendoza College of Business at the University of Notre Dame. His primary research interests include corporate governance and research methods.

References

Aguinis, Gottfredson, R. K., & Joo (2013). Best-practice recommendations for defining, identifying, and handling outliers. Organizational Research Methods, 16(2), 270-301. https://doi.org/10.1177/1094428112470848

Aihounton, G. B. D., & Henningsen (2021). Units of measurement and the inverse hyperbolic sine transformation. The Econometrics Journal, 24(2), 334-351. https://doi.org/10.1093/ectj/utaa032

Andriani & McKelvey (2009). Perspective-from Gaussian to Paretian thinking: Causes and implications of power laws in organizations. Organization Science, 20(6), 1053-1071. https://doi.org/10.1287/orsc.1090.0481

Angrist, J. D., & Pischke, J.-S. (2009). Mostly harmless econometrics: An empiricist's companion. Princeton University Press. https://doi.org/10.1111/j.1475-4932.2011.00742.x

Baum, C. F. (2006). An introduction to modern econometrics using Stata. Stata Press.

Beck, J. W., Beatty, A. S., & Sackett, P. R. (2014). On the distribution of job performance: The role of measurement characteristics in observed departures from normality. Personnel Psychology, 67(3), 531-566. https://doi.org/10.1111/peps.12060

Becker, T. E., Robertson, M. M., & Vandenberg, R. J. (2019). Nonlinear transformations in organizational research: Possible problems and potential solutions. Organizational Research Methods, 22(4), 831-866. https://doi.org/10.1177/1094428118775205

Bishara & Hittner (2017). Confidence intervals for correlations when data are not normal. Behavior Research Methods, 49(1), 294-309. https://doi.org/10.3758/s13428-016-0702-8

Busenbark, J. R., Yoon, H. E., Gamache, & Withers (2022). Omitted variable bias: Examining management research with the impact threshold of a confounding variable (ITCV). Journal of Management, 48(1), 17-48. https://doi.org/10.1177/01492063211006458

Cameron, A. C., & Trivedi, P. K. (2010). Microeconometrics using Stata. Stata Press.

Carboni, O. A. (2012). An empirical investigation of the determinants of R&D cooperation: An application of the inverse hyperbolic sine transformation. Research in Economics, 66(2), 131-141. https://doi.org/10.1016/j.rie.2012.01.002

Certo, S. T., Busenbark, J. R., Kalm, & LePine, J. A. (2020). Divided we fall: How ratios undermine research in strategic management. Organizational Research Methods, 23(2), 211-237. https://doi.org/10.1177/1094428118773455

Certo, S. T., Busenbark, J. R., Woo, H. S., & Semadeni (2016). Sample selection bias and Heckman models in strategic management research. Strategic Management Journal, 37(13), 2639-2657. https://doi.org/10.1002/smj.2475

Cheng, Ioannou, & Serafeim (2014). Corporate social responsibility and access to finance. Strategic Management Journal, 35(1), 1-23. https://doi.org/10.2139/ssrn.1847085

Chou, C. P., Bentler, P. M., & Satorra (1991). Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: A Monte Carlo study. British Journal of Mathematical and Statistical Psychology, 44(2), 347-357. https://doi.org/10.1111/j.2044-8317.1991.tb00966.x

Cohen, West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Lawrence Erlbaum Associates.

Connelly, B. L., Tihanyi, Ketchen, D. J., Carnes, C. M., & Ferrier, W. J. (2017). Competitive repertoire complexity: Governance antecedents and performance outcomes. Strategic Management Journal, 38(5), 1151-1173. https://doi.org/10.1002/smj.2541

Cox, N. J. (2010). Speaking Stata: The limits of sample skewness and kurtosis. The Stata Journal, 10(3), 482-495. https://doi.org/10.1177/1536867X1001000311

Crawford, G. C., Aguinis, Lichtenstein, Davidsson, & McKelvey (2015). Power law distributions in entrepreneurship: Implications for theory and research. Journal of Business Venturing, 30(5), 696-713. https://doi.org/10.1016/j.jbusvent.2015.01.001

Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1(1), 16. https://doi.org/10.1037/1082-989x.1.1.16

Dalton, D. R., Daily, C. M., Ellstrand, A. E., & Johnson, J. L. (1998). Meta-analytic reviews of board composition, leadership structure, and financial performance. Strategic Management Journal, 19(3), 269-290. https://doi.org/10.1002/(sici)1097-0266(199803)19:3<269::aid-smj950>3.0.co;2-k

Davis (2015). Skewed generalized T distribution tree.

DeCarlo (1997). On the meaning and use of kurtosis. Psychological Methods, 2(3), 292-307. https://doi.org/10.1037/1082-989x.2.3.292

de Winter, Gosling, Potter, & Harlow (2016). Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data. Psychological Methods, 21(3), 273-290. https://doi.org/10.1037/met0000079

Feng & Wang, X. M. (2013). Log transformation: Application and interpretation in biomedical research. Statistics in Medicine, 32(2), 230-239. https://doi.org/10.1002/sim.5486

Filatotchev & Bishop (2002). Board composition, share ownership, and ‘underpricing’ of UK IPO firms. Strategic Management Journal, 23(10), 941-955. https://doi.org/10.1002/smj.269

Flammer & Ioannou (2021). Strategic management during the financial crisis: How firms adjust their strategic investments in response to credit market disruptions. Strategic Management Journal, 42(7), 1275-1298. https://doi.org/10.1002/smj.3265

Gamache, D. L., & McNamara (2019). Responding to bad press: How CEO temporal focus influences the sensitivity to negative media coverage of acquisitions. Academy of Management Journal, 62(3), 918-943. https://doi.org/10.5465/amj.2017.0526

Gans & Ryall (2017). Value capture theory: A strategic management review. Strategic Management Journal, 38(1), 17-41. https://doi.org/10.1002/smj.2592

Graffin, Haleblian, & Kiley, J. T. (2016). Ready, AIM, acquire: Impression offsetting and acquisitions. Academy of Management Journal, 59(1), 232-252. https://doi.org/10.5465/amj.2013.0288

Hamann, P. M., Schiemann, Bellora, & Guenther, T. W. (2013). Exploring the dimensions of organizational performance: A construct validity study. Organizational Research Methods, 16(1), 67-87. https://doi.org/10.1177/1094428112470007

Henderson, A. D., Raynor, M. E., & Ahmed (2012). How long must a firm be great to rule out chance? Benchmarking sustained superior performance without being fooled by randomness. Strategic Management Journal, 33(4), 387-406. https://doi.org/10.1002/smj.1943

Hubbard, T. D., Christensen, D. M., & Graffin, S. D. (2017). Higher highs and lower lows: The role of corporate social responsibility in CEO dismissal. Strategic Management Journal, 38(11), 2255-2265. https://doi.org/10.1002/smj.2646

Joo, Aguinis, & Bradley, K. J. (2017). Not all nonnormal distributions are created equal: Improved theoretical and measurement precision. Journal of Applied Psychology, 102(7), 1022-1053. https://doi.org/10.1037/apl0000214

Kalnins (2018). Multicollinearity: How common factors cause type 1 errors in multivariate regression. Strategic Management Journal, 39(8), 2362-2385. https://doi.org/10.1002/smj.2783

Katz, J. N., & King (1999). A statistical model for multiparty electoral data. American Political Science Review, 93(1), 15-32. https://doi.org/10.2307/2585758

Kennedy (2008). A Guide to Econometrics (2nd ed.). Blackwell.

Koenker (2022). quantreg: Quantile regression. R package version 5.94.

Koenker & Hallock (2001). Quantile regression. Journal of Economic Perspectives, 15(4), 143-156. https://doi.org/10.1257/jep.15.4.143

Leamer, E. E. (1983). Let's take the con out of econometrics. The American Economic Review, 73(1), 31-43. https://www.jstor.org/stable/1803924

Li (2015). Moving beyond the linear regression model: Advantages of the quantile regression model. Journal of Management, 41(1), 71-98. https://doi.org/10.1177/0149206314551963

Makadok, Burton, & Barney (2018). A practical guide for making theory contributions in strategic management. Strategic Management Journal, 39(6), 1530-1545. https://doi.org/10.1002/smj.2789

Makino & Chan, C. M. (2017). Skew and heavy-tail effects on firm performance. Strategic Management Journal, 38(8), 1721-1740. https://doi.org/10.1002/smj.2632

McDonald & Michelfelder (2017). Partially adaptive and robust estimation of asset models: Accommodating skewness and kurtosis in returns. Journal of Mathematical Finance, 7(1), 219-237. https://doi.org/10.4236/jmf.2017.71012

Miller, C. C., Washburn, N. T., & Glick, W. H. (2013). The myth of firm performance. Organization Science, 24(3), 948-964. https://doi.org/10.1287/orsc.1120.0762

Mosteller & Tukey (1977). Data analysis and regression: A second course in statistics. Addison-Wesley.

O’Boyle & Aguinis (2012). The best and the rest: Revisiting the norm of normality of individual performance. Personnel Psychology, 65(1), 79-119. https://doi.org/10.1111/j.1744-6570.2011.01239.x

Osgood, D. W., Johnston, L. D., O'Malley, P. M., & Bachman, J. G. (1988). The generality of deviance in late adolescence and early adulthood. American Sociological Review, 53(1), 81-93. https://doi.org/10.2307/2095734

Pek & Wong, A. C. M. (2018). How to address non-normality: A taxonomy of approaches, reviewed, and illustrated. Frontiers in Psychology, 9(2104), 1-17. https://doi.org/10.3389/fpsyg.2018.02104

Petscher & Logan, J. A. (2014). Quantile regression in the study of developmental sciences. Child Development, 85(3), 861-881. https://doi.org/10.1111/cdev.12190

Quigley, T. J., Hambrick, D. C., Misangyi, V. F., & Rizzi, G. A. (2019). CEO selection as risk-taking: A new vantage on the debate about the consequences of insiders versus outsiders. Strategic Management Journal, 40(9), 1453-1470. https://doi.org/10.1002/smj.3033

R Core Team (2022). R: A language and environment for statistical computing.

Richard, P. J., Devinney, T. M., Yip, G. S., & Johnson (2009). Measuring organizational performance: Towards methodological best practice. Journal of Management, 35(3), 718-804. https://doi.org/10.1177/0149206308330560

Rönkkö, Aalto, Tenhunen, & Aguirre-Urreta, M. I. (2022). Eight simple guidelines for improved understanding of transformations and nonlinear effects. Organizational Research Methods, 25(1), 48-87. https://doi.org/10.1177/1094428121991907

Semadeni, Withers, M. C., & Certo, S. T. (2014). The perils of endogeneity and instrumental variables in strategy research: Understanding through simulations. Strategic Management Journal, 35(7), 1070-1079. https://doi.org/10.1002/smj.2136

Villadsen, A. R., & Wulff, J. N. (2021). Statistical myths about log-transformed dependent variables and how to better estimate exponential models. British Journal of Management, 32(3), 779-796. https://doi.org/10.1111/1467-8551.12431

Westfall (2014). Kurtosis as peakedness, 1905-2014. R.I.P. The American Statistician, 68(3), 191-195. https://doi.org/10.1080/00031305.2014.917055

Wooldridge, J. M. (2020). Introductory econometrics: A modern approach. South-Western/Cengage.

Wright & Herrington (2011). Problematic standard errors and confidence intervals for skewness and kurtosis. Behavior Research Methods, 43(1), 8-17. https://doi.org/10.3758/s13428-010-0044-x

Zavyalova, Pfarrer, M. D., Reger, R. K., & Hubbard, T. D. (2016). Reputation as a benefit and a burden? How stakeholders’ organizational identification affects the role of reputation following a negative event. Academy of Management Journal, 59(1), 253-276. https://doi.org/10.5465/amj.2013.0611

Zhou, Y. M. (2011). Synergy, coordination costs, and diversification choices. Strategic Management Journal, 32(6), 624-639. https://doi.org/10.1002/smj.889

Zhu & Shen (2020). Why do some insider CEOs make more strategic changes than others? The impact of prior board experience on new CEO insiderness. Strategic Management Journal, 41(10), 1933-1951. https://doi.org/10.1002/smj.3183