Abstract
Many outcomes of interest to management and strategy researchers are in the form of fractions, proportions, or percentages. We review 10 years of research in seven leading strategy and management journals. Instead of implementing best-practice techniques, such as fractional logistic regression, management scholars rely primarily on linear regression, log-odds regression, or the Tobit model. Following up on our review, we present re-estimations of two published papers to show how best-practice methods yield substantially different results than the most commonly used methods. Using simulations, we confirm the results from the reproduced examples in a broader context. Finally, we present a worked example that researchers can lean on when they deal with fractional outcomes.
Introduction
“Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.” (Box and Draper, 1987: 74)
Many outcomes of interest to management and strategy researchers are fractional in nature. Such outcomes indicate the fraction that an element of interest constitutes in relation to a whole. Examples include market share as indicated by a firm’s sales as a fraction of all sales in a market (Gómez and Maícas, 2011; Huesch, 2013), innovation indicated by the percentage of a firm’s sales attributable to new products or services (Mihalache et al., 2012), or firm sustainability measured as the percentage of resource use stemming from renewable sources (Weigelt and Shittu, 2016). Fractional outcomes are attractive to researchers because they are easy to understand and facilitate substantive interpretation of effect sizes that are meaningful and comparable across organizations. Fractional outcomes also make relatively unusual research settings more approachable via easy-to-communicate outcomes of interest. Examples of this include sports (percentage of games won (Fonti and Maoret, 2016); percentage divestment of pitching capacity (Moliterno and Wiersema, 2007)) and commercial aviation (proportion of flights arriving at least 15 minutes late (Prince and Simon, 2015)).
With an abundance of potential applications, it is no surprise that studies with fractional outcomes are common in management and strategy journals. We identify 300 articles in the last 10 years of research in seven of the most prominent journals (Academy of Management Journal, Journal of International Business Studies, Journal of Management, Management Science, Organization Science, Research Policy, and Strategic Management Journal) and find that around 5% of all published articles include one or more quantitative analyses with a fractional outcome.
Fractional outcomes are by definition bounded between 0 and 1 and indicate the size of a part relative to the whole. For fractional outcomes, the part can never exceed the size of the whole, and neither the part nor the whole can take negative values. These characteristics make fractional outcomes different from ratios such as return on assets (ROA), which can take on negative values or values well above one. However, the bounded nature of fractional outcomes also means that researchers are presented with special statistical challenges and requirements. Failure to meet these special requirements may lead to flawed and erroneous conclusions.
To handle the distinct nature of fractional dependent variables, Papke and Wooldridge (1996) introduced fractional logit regression (FLR). Unfortunately, our review of existing research documents that FLR is very rarely used by management scholars, even though previous studies demonstrate the advantages of using FLR or similar models on data examples from different fields (Cook et al., 2008; Cox, 1996; Kieschnick and McCullough, 2003; Maddala, 1991; Papke and Wooldridge, 1996, 2008; Ramalho and da Silva, 2009; Zhao et al., 2001). Instead, linear regression models (LRM), linear regression with log-odds transformation (LOR), or the Tobit model (TM) are used extensively according to our review of strategy and management research. Researchers need to know about the consequences of failing to meet best practice when studying fractional outcomes, the existence of a better alternative, and the possibilities of theory development by proper estimation of fractional outcomes. Editors and reviewers also need to be aware of the consequences of misspecifying fractional outcomes to ensure trust in reported results and to mitigate the risk that authors try different models in search of “significant” results (Bettis, 2012).
Our article makes several important contributions that illuminate the severe consequences of failing to meet best practices when estimating fractional outcomes. First, we show that current studies with fractional dependent variables very often fail to meet best practices. In our review of research in top journals, we find that only around 6% of the articles studying a fractional outcome use FLR. We re-estimate models from two published articles to show that the consequences of this misspecification can be dire and include substantially different effect sizes and changing significance levels.
Second, given the variety of methods used, we are interested in how misspecifications are likely to affect management and strategy research. To do this, we use simulations with data characteristics typically faced by management researchers. Our review of existing research shows that while researchers very often deal with a fractional dependent variable that has a mean close to zero, they often face very different degrees of variability. If a study, for instance, is interested in management intensity measured as the fraction of managers to the total number of employees, the mean will be low, as will variability (no firms have 80% managers). In contrast, a study of export intensity may still have a low mean, but variability is higher as some firms export almost all their products. By exploring these data scenarios, we present the first study that seeks to explore whether the applied models are still useful in the context of management research or potentially lead researchers to draw wrong conclusions. This addresses a current gap in the research methodology literature. No existing studies have systematically compared different models while taking into account data variability and pile-up at or close to zero.
Third, Certo et al. (2020) recently explored the dangers of using (unbounded) ratios (such as ROA) as the dependent variable. As described, a fraction is a special case of a ratio. Using a simulation, we show how FLR overcomes the issues identified by Certo et al. (2020) for fractional outcomes and argue that researchers should not be afraid to engage with research outcomes of this nature. Fourth, we provide a practical example illustrating how management and strategy research should estimate fractional outcomes.
Our results clearly show that FLR should be the preferred choice for most of the types of fractional outcomes analyzed in strategy and management research so it is disconcerting to only see it performed in around 6% of articles in our top journals. Our work adds to recent studies in the strategy and management literature investigating methodological issues (Certo et al., 2016; Certo and Semadeni, 2006; Haans et al., 2016; Harrison et al., 2017; Shaver, 2005). This line of research is important in our efforts to build trustworthy cumulative science. Uncertainty about methodological choices paves the way for more researcher degrees of freedom and the possibility of an implicit or explicit hunt for significant results (Simmons et al., 2011). Knowledge of best-research practices among researchers, reviewers, and editors is an effective way to mitigate such problems and create stronger trust in published research results.
Our article proceeds in the following way: first, we report our literature review on practices currently used to estimate fractional outcomes in top journals of our field. Next, we briefly present the theory behind the most popular approaches and their applicability (or lack thereof) to fractional outcomes. We then provide an empirical example indicating how the results of published papers are different when estimated with FLR. We proceed with three simulation studies further identifying potential issues when misspecifying fractional outcomes and a worked example providing a best-practice framework for researchers to use. We end with a discussion.
Literature review
As a first step to better understand the use of fractional outcomes in management and strategy research, we performed a systematic review of existing studies. We focused on seven of the most prominent journals in the field: Academy of Management Journal, Journal of International Business Studies, Journal of Management, Management Science, Organization Science, Research Policy, and Strategic Management Journal. These journals, besides being highly respected, also cover the related subfields of management, organization, strategy, innovation, and international business. They are all committed to publishing leading research and have the highest reviewing standards. We searched all journal articles published in these seven journals in a 10-year period from 2007 to 2016 (excluding editorials, research notes, etc.). For each article, we identified whether the dependent variable in any statistical analysis was fractional. Below, we discuss the boundaries of the definition of fractional outcomes.
The definition of fractional outcomes
As mentioned in the introduction, fractional dependent variables are defined to be bounded between zero and one. This boundedness implies a nonconstant effect of the covariates, which is essentially what makes the statistical challenges arise (Papke and Wooldridge, 1996). It is important to realize that this definition excludes variables that appear to be in fractional form but are actually not bounded between zero and one.
Several common ratios used in management research do not qualify as fractional variables. The main reason is that ratios compare two different quantities. Returns are a different quantity than assets and one common way of comparing one to the other is by computing a ratio. As returns may be positive or negative, ROA has no theoretical upper or lower limits. Some common ratios in management research such as the book to market ratio may have a lower bound at zero, but no upper bound. While neither the book nor market value can be negative, there is no theoretical bound on how much the book value can be larger than the market value.
Fractional outcomes are conceptually different from ratios. Fractional outcomes describe how large a quantity is in comparative relation to a whole. Because the part considered is a part of this whole, it cannot be larger than the whole. For instance, it is impossible to win more games than you play (Fonti and Maoret, 2016) or produce more than the entirety of a system in-house (Novak and Stern, 2009). Furthermore, both the part considered and the whole are constrained to be non-negative (winning or playing less than zero games is impossible). These two facts combined ensure that a fractional outcome is bounded naturally between zero and one. As our focus is on fractional outcomes, we only included studies with a dependent variable matching the given definition.
Overview of studies including fractional outcomes
When a study with a proper fractional dependent variable was identified, we identified the estimation approach used and noted relevant descriptive statistics about the data.
Table 1 shows that, in total, 300 articles include at least one analysis with a fractional outcome. The articles are spread across the period without any visible trend toward more or less use of fractional outcomes. According to our count, in these seven journals, fractional outcomes are found in roughly 4.94% of all articles.
Table 1. Fractional outcome studies in seven top journals.
Regression strategies
Figure 1 breaks the studies down by estimation type used for the main analysis.

Figure 1. Regression models for fractional outcomes.
In a later section, we describe these approaches’ advantages and disadvantages when modeling fractional outcomes. Here, we focus on their prevalence. Different forms of linear regression represent the most common estimation technique. Linear regression is used in 165 articles and is the most popular option in all years. Most studies do not note the bounded nature of the outcome. Some researchers acknowledge the bounded nature of their outcome variable and perform a transformation of the dependent variable before running the regression. The most commonly used transformations are log-odds and log transformations. Such transformations, however, run into trouble when the dependent variable includes zero values. In these cases, researchers typically correct the data by arbitrarily adding a very small decimal value to the zeroes before conducting the transformation. A total of 40 studies use a transformed dependent variable in their main analyses. A third modeling strategy that is relatively widely used is the TM, which is applied in 71 studies. This model is also used when authors acknowledge the bounded nature of the dependent variable and seek to address the issue using an estimation respecting censored outcomes.
A small minority of articles (17 in total) include an FLR analysis that explicitly addresses the fractional nature of the dependent variable. An additional four studies report running FLR models as robustness tests. Interestingly, the use of this type of model did not increase over time. Finally, a few studies use other modeling approaches such as Poisson regression.
We interpret these data to indicate that no clear consensus has emerged about the estimation of fractional outcomes in management and strategy research. Indeed, a range of approaches is used. Often, models are very sophisticated, yet they fail at a much more fundamental level through improper modeling of the dependent variable. Before we show how this can fundamentally affect the conclusions that can be drawn from the data, we briefly review the theory underlying the different approaches currently used to study fractional outcomes in top journals.
Regression models for fractional outcomes
Linear and log-odds regression
LRMs are the most popular choice for analyzing fractional dependent variables in top management journals. For this type of outcome, however, LRMs run into a problem: they do not account for the fact that the outcome is bounded between 0 and 1. Thus, it is possible for predictions based on the estimated model to lie outside the unit interval. This makes the downsides of using LRMs to predict proportional outcomes analogous to the downsides of the linear probability model for binary outcomes (Ramalho et al., 2011). The bounds require a nonlinear effect of the regressors, which can be understood as a gradual tailing off of the effect strength near the boundary conditions. However, the constant effect implied by LRMs cannot reflect the true relationship because a constant effect would result in impossible predictions outside the unit interval.
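The out-of-bounds problem can be illustrated with a minimal sketch in Python. The data are hypothetical: a fractional outcome whose conditional mean follows a logistic curve, fitted by ordinary least squares with a single regressor. The function name `ols_fit`, the grid of x values, and the prediction points are illustrative assumptions and are not taken from any of the reviewed studies.

```python
import math

def ols_fit(xs, ys):
    """Closed-form OLS with one regressor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: the outcome lies strictly inside (0, 1) by construction.
xs = [i * 0.5 for i in range(-6, 7)]           # -3.0, -2.5, ..., 3.0
ys = [1.0 / (1.0 + math.exp(-x)) for x in xs]  # logistic conditional mean

intercept, slope = ols_fit(xs, ys)

# The constant slope implied by the linear model pushes predictions
# past the bounds once x moves beyond the range of the bulk of the data.
pred_low = intercept + slope * (-4.0)
pred_high = intercept + slope * 4.0
print(f"prediction at x = -4: {pred_low:.3f}")
print(f"prediction at x = +4: {pred_high:.3f}")
```

Although every observed outcome is a valid fraction, the fitted line yields a negative prediction at the low end and a prediction above one at the high end.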
To overcome the issue of potentially predicting outside the bounds of the outcome, management researchers have employed various strategies. A common approach is to use the logit transformation and model the log-odds ratio as a linear function. We refer to this approach as log-odds regression (LOR). The logit transformation ensures that the model predictions lie between 0 and 1 but creates two other problems. First, the transformation complicates the interpretation of the coefficients. Without strong independence assumptions, it is not possible to get predictions on the original percentage scale (Papke and Wooldridge, 2008). This is a substantial drawback as the expected value of the fractional response is what we are usually interested in.
Second, we cannot use the LOR if our dependent variable contains zeros or ones. Our literature review suggests that management researchers in such cases often opt for ad hoc adjustments by adding or subtracting an arbitrarily small constant (e.g. 0.01) from the dependent variable. While this solution is practical, it becomes undesirable to change extreme values if they represent a large percentage of the data (Liu and Eugenio, 2016). It hardly seems preferable to (arbitrarily) change data when regression approaches exist that easily handle values at the extremes (Baum, 2008; Papke and Wooldridge, 1996). Furthermore, even if we convince ourselves that we can defend changing the extreme values, the problem remains of how to recover the expected value of the outcome without further assumptions.
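Both problems with the log-odds transformation can be seen in a short sketch. The outcome values and the candidate constants below are hypothetical and chosen only for illustration; `log_odds` and `inv_log_odds` are helper names introduced here.

```python
import math

def log_odds(y):
    """Logit transformation of a fraction; undefined at y = 0 and y = 1."""
    return math.log(y / (1.0 - y))

def inv_log_odds(z):
    """Inverse logit, mapping the log-odds scale back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Problem 1: the transformation is undefined at the boundary.
try:
    log_odds(0.0)
    zero_is_defined = True
except ValueError:
    zero_is_defined = False

# The ad hoc fix replaces zero with a small constant, but the transformed
# value depends heavily on which (arbitrary) constant is chosen.
adjusted = {eps: log_odds(eps) for eps in (0.01, 0.001, 0.0001)}
for eps, value in adjusted.items():
    print(f"zero replaced by {eps}: log-odds = {value:.3f}")

# Problem 2: back-transforming an average taken on the log-odds scale
# does not recover the average on the original fractional scale.
outcomes = [0.05, 0.10, 0.60]  # hypothetical fractions
true_mean = sum(outcomes) / len(outcomes)
mean_log_odds = sum(log_odds(y) for y in outcomes) / len(outcomes)
back_transformed = inv_log_odds(mean_log_odds)
print(f"mean of fractions:          {true_mean:.3f}")
print(f"back-transformed mean logit: {back_transformed:.3f}")
```

Moving the constant from 0.01 to 0.0001 roughly doubles the magnitude of the transformed zero, so the choice of constant alone can move the regression line. The second part shows why the expected value on the original scale cannot simply be read off a model fitted on the log-odds scale.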
The Tobit model
In the face of fractional data with some proportion of zeroes, management researchers often turn to the TM. This model is commonly applied to censored data, that is, when observations on the dependent variable are censored if they are smaller or larger than some threshold (Cameron and Trivedi, 2005). Unfortunately, censored dependent variables are often confused with so-called corner solution responses (Wooldridge, 2010). Corner solution responses not only exhibit a continuous distribution over a range of values but also take on values at one or two focal points. With zero and one as the two focal points, a fractional outcome may be considered a type of corner solution. However, fractional outcomes are certainly not censored but instead defined only over the interval [0, 1] (Maddala, 1991). When regarding fractional outcomes as a corner solution, the TM may be appropriate for fractional outcomes when certain conditions are met (Wulff and Villadsen, in press), yet we do not observe the TM used this way in our review.
First, a concern about using the TM on fractional data is that it is built on assumptions that are likely to be violated when dealing with fractional data. The TM assumes normality and homoscedasticity of the error term in the latent variable model. These assumptions are likely to be violated for fractional outcomes (as indicated by Figure 2) as the conditional variance is a function of the conditional mean (Cook et al., 2008). Violation of the assumptions is serious as it leads to inconsistency of the Tobit maximum likelihood estimator (Cameron and Trivedi, 2010).

Figure 2. Distributions of fractional outcomes commonly occurring in management research, illustrated by beta distributions: (a) mean = 0.02, (b) mean = 0.09, (c) mean = 0.24, and (d) mean = 0.50.
Second, the TM cannot represent the true data generation process, unless we use a two-limit TM (Ramalho et al., 2010). The two-limit TM requires the fractional outcome to contain values at both zero and one. For example, with values at zero, but not at one, a TM with a proper upper bound cannot be estimated.
Third, when using the TM to model fractional outcomes, extra care needs to be taken when interpreting its results (Wulff and Villadsen, in press). We found that researchers applying the TM on fractional data routinely report coefficients without computing the appropriate marginal effects. In the TM, a coefficient represents the marginal effect only in the latent variable case. It is unclear why we should be interested in the latent variable mean since observations outside the unit interval do not exist (Wooldridge, 2010). In strategy and management, observations at the extremes may be a natural consequence of individual choices by managers or firms. Some firms do not acquire any patents, and the number of acquired patents cannot be negative. If we analyze the latent variable mean, we are indirectly pretending that there exist negative patent counts, but that these are substituted for zeroes because we do not observe them.
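To make the gap between a Tobit coefficient and a marginal effect concrete, the following sketch uses the standard textbook result for a two-limit Tobit with limits 0 and 1: the marginal effect on the expected observed outcome equals the coefficient multiplied by the probability that the latent variable falls inside the limits. The coefficient, the error SD, and the values of the linear index are hypothetical, and the helper names are introduced here for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_limit_tobit_scale(xb, sigma, lower=0.0, upper=1.0):
    """Probability that the latent outcome lies strictly between the limits.

    Multiplying a Tobit coefficient by this factor gives the marginal effect
    on the expected observed outcome at linear index xb.
    """
    return normal_cdf((upper - xb) / sigma) - normal_cdf((lower - xb) / sigma)

beta = 0.20   # hypothetical Tobit coefficient
sigma = 0.30  # hypothetical SD of the latent error term

scales = {}
for xb in (0.05, 0.50, 0.95):  # hypothetical values of the linear index
    scales[xb] = two_limit_tobit_scale(xb, sigma)
    print(f"index {xb:.2f}: scale = {scales[xb]:.3f}, "
          f"marginal effect = {beta * scales[xb]:.3f}")
```

The scale factor is always below one and shrinks as the index approaches either limit, so reporting the raw coefficient overstates the effect on the observed fraction, most strongly for observations near the boundaries.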
Fractional logistic regression
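FLR is an appropriate technique for modeling fractional outcomes. Papke and Wooldridge (1996) suggested imposing a functional form for the conditional mean of the fractional outcome, E(y | x) = G(xβ), where G(·) is the logistic function G(z) = exp(z)/[1 + exp(z)], so that predicted values always lie strictly between zero and one.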
Another benefit of using an FLR approach is that the necessary tools for interpretation are the same as for the widely used logit model. Due to the inherent nonlinearity of the model, researchers should compute marginal effects and preferably graph these or the model predictions (Hoetker, 2007). Interpretation should therefore not present itself as a hurdle, at least not to management researchers who regularly use the logit model for binary outcomes. As stated above, this model is effortlessly estimated in modern statistical software (code for the worked example is available in the Supplemental Material). For instance, as of Stata 14, fractional models can be fit using the fracreg command, which is fully compatible with margins.
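For readers who want to see the mechanics, the sketch below fits a fractional logit with one regressor by maximizing the Bernoulli quasi-log-likelihood, sum of y·log G(xβ) + (1 − y)·log[1 − G(xβ)], with Newton steps. It is a dependency-free illustration under stated assumptions, not a substitute for established routines such as fracreg: the data are hypothetical and noise-free, the function name `fit_fractional_logit` is introduced here, and no standard errors are computed (in practice, robust standard errors would be reported).

```python
import math

def logistic(z):
    """Logistic function G(z), mapping the real line into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_fractional_logit(xs, ys, iterations=25):
    """Newton-Raphson for the Bernoulli quasi-likelihood with a logit mean.

    Model: E(y | x) = G(b0 + b1 * x). The outcome y may be any value in
    [0, 1], including exact zeros and ones, because the score only
    involves the residual y - G(.).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # score (gradient of the quasi-log-likelihood)
        h00 = h01 = h11 = 0.0   # negative Hessian, X'WX with W = mu(1 - mu)
        for x, y in zip(xs, ys):
            mu = logistic(b0 + b1 * x)
            w = mu * (1.0 - mu)
            g0 += y - mu
            g1 += (y - mu) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data whose conditional mean is G(0.5 + 1.0 * x).
xs = [i * 0.5 for i in range(-4, 5)]
ys = [logistic(0.5 + 1.0 * x) for x in xs]

b0_hat, b1_hat = fit_fractional_logit(xs, ys)
predictions = [logistic(b0_hat + b1_hat * x) for x in xs]
print(f"intercept = {b0_hat:.4f}, slope = {b1_hat:.4f}")
```

Because the conditional mean passes through the logistic function, every prediction stays inside the unit interval, and no adjustment of boundary observations is needed before estimation.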
By construction, the FLR model puts the maximum marginal effect at y = 0.5 and has it decrease symmetrically toward both boundaries. This may not be the best way to describe the data in some settings. For instance, an extra million dollars in R&D may have a strong effect on the proportion of innovation sales for firms with only little innovation (2%) compared with highly innovative firms (98%). When the effects are asymmetrical, we should consider link functions such as the log-log or complementary log-log that take this into account. As we show in the worked example at the end of this article, we can use different measures and statistics to help us decide which function to prefer.
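The symmetry of the logit link and the asymmetry of the alternatives can be checked numerically. The sketch below evaluates the slope of the conditional mean with respect to the linear index, expressed as a function of the predicted mean μ. It assumes the common definitions of the links: logit with G(z) = 1/(1 + exp(−z)), log-log with G(z) = exp(−exp(−z)), and complementary log-log with G(z) = 1 − exp(−exp(z)). The marginal effect of a regressor is its coefficient times this slope. The function names are introduced here for illustration.

```python
import math

def logit_slope(mu):
    """dG/dz for the logit link, written in terms of the mean mu."""
    return mu * (1.0 - mu)

def loglog_slope(mu):
    """dG/dz for the log-log link, G(z) = exp(-exp(-z))."""
    return -mu * math.log(mu)

def cloglog_slope(mu):
    """dG/dz for the complementary log-log link, G(z) = 1 - exp(-exp(z))."""
    return -(1.0 - mu) * math.log(1.0 - mu)

for mu in (0.02, 0.50, 0.98):
    print(f"mean {mu:.2f}: logit {logit_slope(mu):.4f}, "
          f"log-log {loglog_slope(mu):.4f}, "
          f"cloglog {cloglog_slope(mu):.4f}")
```

Under the logit link, the slope peaks at a mean of 0.5 and is identical at 2% and 98%. Under the log-log link it is larger at 2% than at 98%, which matches the R&D example above; the complementary log-log link implies the reverse pattern.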
Empirical examples
To show the consequences of estimating fractional outcomes with suboptimal techniques, we contacted the authors of published studies to try to re-analyze their data with FLR. We contacted 32 authors but were only able to obtain data from two studies. The remaining authors were either not able to share their data (e.g. because of confidentiality issues) or did not respond. We are grateful to the authors who generously shared their data and do not think these studies represent the most serious cases. Indeed, it has been shown that studies conducted by authors who share data are more likely to replicate (Wicherts et al., 2011). However, as we show below, even in these two well-conducted studies, the use of FLR yields different interpretations of the results.
Reproduction 1: proportion of foreign sales
The first study we reproduce is Laursen et al. (2012), who investigate the effect of social capital on the proportion of foreign sales. The authors posit that when social capital reaches a certain threshold, the positive effect of social capital switches to negative, thus trapping firms into operating only in their home regions. Table 2 contains the reproduced coefficients and average partial effects (APEs) from the original paper accompanied by the results from FLR.
Table 2. Reproducing results from two published studies using fractional outcomes.
FLR: fractional logit regression; APE: average partial effects.
The cell color indicates where the reproduced results differ from the FLR estimation: one color marks a change in significance level compared with the original study, a second marks a substantial change in magnitude, and a third marks a change in both significance level and magnitude. Models are estimated as in the code supplied by the authors. Control variables are included but omitted for simplicity. Robust standard errors are in parentheses.
*p < 0.05; **p < 0.01; ***p < 0.001.
Laursen et al. (2012) use a TM to model their fractional outcome, which has a mean of 0.33, a standard deviation (SD) of 0.30, 21% of the observations at zero, and below 1% at one. The authors use fractional regression as a robustness check, so the main results should not vary much from the reported TM. Our concern with the Tobit involves the typical use of a linear interpretation and the computation of marginal effects, especially away from the mean. The data provide a useful illustration of how interpretation of the Tobit may differ from FLR. Reproducing the results from Model III in their paper, we see that effects at the median value of the independent variable are similar (as the authors’ robustness check would also indicate) but very different at high or low values. The APE at the 5th percentile of social capital estimated by the Tobit is approximately twice as large as the one estimated by FLR (Table 2). This is a substantial difference. Furthermore, at high levels of social capital, we observe a change in the significance level. While the TM APE is significant at the 5% level, the FLR APE is significant at the 1% level. In our simulations below, we show that a linear interpretation of the Tobit, like the one employed by the authors, is likely to result in an overestimation of the relationship for low levels of social capital. In total, our re-estimation does not lead us to question the authors’ main hypothesis tests but shows how the TM can lead to problematic interpretation of results, especially away from the mean.
Reproduction 2: press freedom and response rate
The second reproduction is based on Jensen et al. (2010), who investigate the relationship between firm survey nonresponse and press freedom. The authors suggest that in politically repressive environments firms use nonresponse as a self-protection mechanism. At the country level, survey nonresponse has a mean of 0.95, an SD of 0.07, and 13% of the observations take on the value 1.
Like Laursen et al. (2012), Jensen et al. (2010) use a TM with a linear interpretation, but they do not report having used a fractional regression as a robustness check. Comparing the significance levels of the coefficients, the TM coefficient is significant at the 5% level, while FLR suggests significance at the 10% level. The TM underestimates the APE for countries with little press freedom and overestimates the APE for countries with a high degree of press freedom. The TM estimates the effect of press freedom to be constant across different values of nonresponse, whereas FLR predicts the APE to decline as nonresponse increases. This is a substantial difference that might imply a different theoretical relation between the variables. The differences in APE are also nontrivial. At the 5th percentile, the Tobit APE is underestimated by a factor of 1.6, while it is overestimated by a factor of 2 at the 95th percentile. In addition, the TM APE is significant at the 5% level, while the FLR APE is significant at the 0.1% level.
The consequences of improper estimation of fractional outcomes
Our re-estimation of these rather well-executed studies indicates that the consequences of misestimating fractional outcomes are far from trivial. To investigate (a) when problems with wrongful estimations are likely to arise and (b) how big they are likely to be, in the next section, we conduct three simulations using data characteristics that our review showed were typical for management and strategy research. This enables us to understand better when we should be particularly concerned about the results of existing studies. Before describing the simulations, we discuss the fractional data characteristics typically encountered by researchers.
Data characteristics
To understand which kinds of fractional data management researchers often encounter, we collected and summarized means and SDs from the studies included in our review. For 110 variables, descriptive information about the dependent variable was missing. After contacting the authors, we received information on six more variables and ended with means and SDs for 72% of the studies in our review.
Analyzing the collected information, we discovered some interesting tendencies. For 10% of the fractional dependent variables, the mean was below 0.02; for 25%, it was below 0.09; for 50%, it was below 0.24; and for 75% it was below 0.50. Although many researchers encountered dependent variables with similar means, we found a lot of variation in the SDs for the different mean levels. Thus, management researchers seem to typically encounter fractional data with low means with very different levels of variation.
The mean and variance of fractional variables often have a special relationship. For means close to the boundaries, the variance is generally smaller than around the middle. The variable is bounded, which makes location, variation, and skew less intuitively separated than for the unbounded scale of the normal distribution (Smithson and Verkuilen, 2006). For instance, a lower mean could imply skewness if close enough to the zero boundary. To gain some intuition about the common fractional distributions in management research, we create four figures (Figure 2(a) to (d)). To generate the distributions, we use a beta distribution. The beta distribution recognizes many of the special features of fractional variables and is thus often used to describe the behavior of proportions (Ferrari and Cribari-Neto, 2004; Paolino, 2001). Represented by a beta density, each figure illustrates the consequence of data spread for the shape of the distribution of fractional data.
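One way to connect a reported mean and SD to a beta density is the method of moments, sketched below. This is an illustration of the general relationship and not necessarily the procedure used to draw Figure 2; the function name is introduced here, and the SD of 0.05 paired with a mean of 0.02 is a hypothetical value.

```python
def beta_shape_from_moments(mean, sd):
    """Return beta shape parameters (a, b) matching a given mean and SD.

    A beta distribution with mean m can have a variance of at most
    m * (1 - m), so means near the boundaries only allow small SDs.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    variance = sd ** 2
    max_variance = mean * (1.0 - mean)
    if not 0.0 < variance < max_variance:
        raise ValueError("SD is not attainable for this mean")
    common = max_variance / variance - 1.0
    return mean * common, (1.0 - mean) * common

# Symmetric case: mean 0.50 with an SD of 0.19 gives a = b.
a_mid, b_mid = beta_shape_from_moments(0.50, 0.19)
print(f"mean 0.50, SD 0.19: a = {a_mid:.3f}, b = {b_mid:.3f}")

# Low mean with a hypothetical SD of 0.05: a < 1, so the density piles up
# near zero and is strongly right-skewed.
a_low, b_low = beta_shape_from_moments(0.02, 0.05)
print(f"mean 0.02, SD 0.05: a = {a_low:.3f}, b = {b_low:.3f}")
```

The variance bound in the function makes the mean-variance link explicit: with a mean of 0.02, the SD can never reach 0.14, whereas a mean of 0.5 admits SDs up to (but not including) 0.5.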
The figures show that it is likely that management researchers often encounter right-skewed data when working with fractional outcomes. Figure 2(a) shows that for a mean of 0.02, a medium or high SD makes the distribution highly right-skewed. Distributions with a very low mean and high SD are likely to occur, for instance, in scenarios where the vast majority of firms have no or very few patent citations (Vasudeva et al., 2015), but with a fair share of outliers. Management researchers also face outcomes with a low mean and low SD. This may happen in cases where all observations have very similar low percentage values, for example, percentage of shareholder votes withheld (Hillman et al., 2011) or the proportion of customer failures per bank (Sasson, 2008). Such variables are likely to be normally distributed as depicted in Figure 2(a).
Figure 2(b) and (c) show that for means around 0.09 and 0.24, management researchers are likely to face right-skewed data of various extremities for the most common SD levels. In management research, these variables may be characterized by events that happen more frequently and to a larger extent than for means very close to zero. For instance, a recurring variable with a mean around 0.09 was a firm’s foreign sales relative to total sales (Fernhaber et al., 2009; Lee and Weng, 2013). Examples of variables with a mean around 0.24 included the share of patents acquired (Arora et al., 2014) and percentage of workers who quit or were dismissed (Batt and Colvin, 2011). Several firms have a substantial percentage of workers who quit or were dismissed, but because many firms may have a very low percentage, such variables are likely to exhibit considerable skewness.
Figure 2(d) illustrates that for means around 0.5, the shapes may be very different depending on the SD. Low and medium SDs generate normal-looking shapes, while a high SD results in a bimodal-looking distribution with more mass in the tails. A normal-looking shape with a low SD is likely in scenarios where there is a strong tendency for values to clump up close to the middle. This may occur in some rating scenarios such as when asking participants to rate the positive behavior of others and then computing a ratio (Mero et al., 2007) or computing the percentage of patients who rated their physician as “excellent” (Hekman et al., 2010). A “flatter” normal shape with an SD of 0.19 occurs when outliers are more likely. This could be likely in sports scenarios such as measuring the percentage of games won (Holcomb et al., 2009; Sirmon et al., 2008). Finally, for this mean level, researchers may encounter a bimodal shape. For instance, if asking certain groups of respondents for what proportion of the paid work in the household they were responsible (Livingston, 2014), they may distribute themselves mostly in either the no-work participation or sole wage-earner category.
That right-skewed distributions often occur in management research may be caused by the presence of zero values (Ramalho et al., 2011). As mentioned above, we suspect that especially the outcomes illustrated in Figure 2(a) to (c) are likely to contain some proportion of zeroes. For instance, a firm may not engage in bribery at all (Birhanu et al., 2016) or have no patent citations (Vasudeva et al., 2015). While authors often report minimum and maximum values, we found that they very rarely report the number or proportion of zeroes (or ones) in their data. The exceptions to this rule provide us with a hint that the proportion of zeroes in management research may vary substantially. Authors report as low as 2% (Eggers, 2012) or 10% (Kumar and Lim, 2008) zeroes, while others report 54% (Weigelt and Miller, 2013) or even as high as 76% (Weigelt and Shittu, 2016). Although the proportions of zeroes can clearly be substantial in some cases, the lack of information about these indicates that researchers and reviewers pay little attention to boundary values.
As documented above, management researchers often deal with data characterized by different variability and thus skewness. That the outcomes typically have low means may indicate that having some proportion of zeroes is rather common. The fact that these proportions are rarely reported indicates little concern about how excessively boundary values occur.
Three simulation studies
In light of the inconsistent modeling approaches of fractional outcomes uncovered in our review, we designed a series of Monte Carlo simulations. Our goal was to gauge the performance of four regression models in two different data scenarios often encountered in management research. We seek to illuminate a more general image of the challenges that were made visible in the replicated studies above. Study 1 examines the role of the variability of the fractional outcome when analyzing fractional data. Study 2 investigates how the proportion of zero values impacts regression model results. Together, these studies help us understand the degree to which inferior estimation techniques may affect the conclusions drawn from data and better execute future studies of fractional outcomes in management. Finally, Study 3 addresses the concerns recently raised by Certo et al. (2020) about the dangers of using ratios as dependent variables. We explore whether the problems of modeling ratios also apply to correctly modeled fractional outcomes. Details on the data generation process and general procedure are available in the Supplemental Material.
Study 1: the consequences of variability without boundary values
In Study 1, we investigated the role of fractional data variability for the model performance of LRM, TM, LOR, and FLR. Study 1 focuses on a scenario with no boundary values. According to our review, such data are relatively common in management research. For example, Cennamo and Santalo (2013) measured platform market share as a given video game console’s unit sales in a given month relative to total unit sales of active consoles in that month. Their fractional outcome had a minimum value of 0.01 and a maximum value of 0.71 and thus no values at the boundaries. Another example is Groysberg et al. (2011), who measured client-rated group effectiveness as the percentage of clients who rated a given research department as being one of the best 10 in a year. In no case in the authors’ sample did none or all of the clients rate a department as such (minimum = 0.01, maximum = 0.59).
As uncovered in our review, management researchers use data characterized by very different variability. To understand how different levels of variability impact model results for the most common data types in management research, we varied the SDs for different means. By manipulating the parameters of a beta distribution, we simulated fractional outcomes with the most common means as described by the 10th, 25th, 50th, and 75th percentiles. For each mean, we varied the SDs to reflect a low and a high level of variability. These levels were based on the SDs at the 25th and 75th percentiles for each mean level uncovered in our review. 7 In the Supplemental appendix, we explain the simulation conditions in more detail.
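The moment matching behind such a design can be sketched in a few lines. The following is a minimal illustration, not the authors' data generation process (which is described in their Supplemental Material): it assumes that the target mean and SD are mapped to the two shape parameters of a beta distribution by the standard method-of-moments formulas. The function names are hypothetical, and the example values (mean 0.5, SD 0.19) are taken from the distribution shapes discussed earlier.

```python
import random
import statistics


def beta_shape_from_moments(mean, sd):
    """Return (alpha, beta) of a beta distribution with the given mean and SD."""
    variance = sd ** 2
    max_variance = mean * (1.0 - mean)  # upper limit for any variable on [0, 1]
    if not 0.0 < mean < 1.0 or not 0.0 < variance < max_variance:
        raise ValueError("need 0 < mean < 1 and 0 < sd^2 < mean * (1 - mean)")
    nu = max_variance / variance - 1.0  # nu equals alpha + beta
    return mean * nu, (1.0 - mean) * nu


def simulate_fractional_outcome(mean, sd, n, seed=0):
    """Draw n beta-distributed fractions with the requested mean and SD."""
    rng = random.Random(seed)
    alpha, beta = beta_shape_from_moments(mean, sd)
    return [rng.betavariate(alpha, beta) for _ in range(n)]


# Assumed example: a symmetric outcome with mean 0.5 and SD 0.19.
alpha, beta = beta_shape_from_moments(0.5, 0.19)
draws = simulate_fractional_outcome(mean=0.5, sd=0.19, n=20000, seed=1)
print(round(statistics.fmean(draws), 3), round(statistics.pstdev(draws), 3))
```

The guard clause reflects a property of any variable bounded on the unit interval: its variance cannot exceed mean × (1 − mean), so a low mean limits how large the SD can be.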
Results
Figure 3 illustrates the results of Study 1. As expected, all methods have wider prediction bands for outcomes with higher SDs. Their ability to predict the true relationship is very different. While LOR predicts the true relationship very well for low SDs, it is highly sensitive to fractional outcomes with high SDs. This is apparent for all investigated mean levels. The predictions for low levels of x are highly variable for a mean of 0.02 with 90% of predictions in the interval from 0.02 to 0.31. For mean levels of 0.09, 0.24, and 0.50, LOR overshoots substantially for low values of x, while undershooting as x approaches its mean.

Figure 3. Study 1: the top and bottom rows contain predictions for low and high SD-to-mean ratios, respectively.
There is no noticeable difference between the LRM and TM for any scenarios. This makes sense as the fractional data in Study 1 contain no boundary values. The LRM and TM predict a substantial portion of negative values for the right-skewed distributions. In all scenarios, FLR is very close to the true relationship and tracks its nonlinear behavior very well with some mild underestimation for high proportions.
In the Supplemental appendix, we provide additional information about the estimators’ bias at the mean for fractional outcomes with typical means and SDs. As the SD increases relative to the mean, LOR estimates increasingly biased marginal effects with higher standard errors. The LRM and TM are very good estimators of the marginal effect at the mean (MEM), showing only minimal bias and good efficiency. In almost all scenarios, FLR hits the true MEM and is overall the best estimator of the true MEM.
Discussion
Our results have important implications for management research analyzing fractional dependent variables. First, our results suggest that logit transformation of the fractional outcome followed by ordinary least squares (OLS) estimation results in severe prediction error if the fractional data have high variance. This is consistent with Paolino’s (2001) observation that LOR has a notably higher prediction error than OLS. However, we find that for low-variance outcomes, we may be less concerned about management research using LOR. Under such conditions, LOR is even rivaling the performance of FLR.
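One well-known mechanism that is consistent with this pattern is the retransformation problem. Because the inverse logit is nonlinear, back-transforming the mean of logit(y) does not recover the mean of y, and the gap widens as the variance grows. The sketch below illustrates only this general point and does not reproduce the simulations reported here. The logit-normal outcome, the parameter values, and all function names are assumptions made for illustration.

```python
import math
import random
import statistics


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def retransformation_gap(mu, sigma, n=20000, seed=0):
    """Mean of y minus the back-transformed mean of logit(y).

    y is generated as inv_logit(z) with z ~ Normal(mu, sigma), so logit(y)
    has mean mu by construction.
    """
    rng = random.Random(seed)
    y = [inv_logit(rng.gauss(mu, sigma)) for _ in range(n)]
    naive = inv_logit(statistics.fmean(logit(v) for v in y))
    return statistics.fmean(y) - naive


# Same location on the log-odds scale, low versus high variability.
gap_low = retransformation_gap(mu=-2.0, sigma=0.2)
gap_high = retransformation_gap(mu=-2.0, sigma=2.0)
print(gap_low, gap_high)
```

With little variability the two quantities nearly coincide, which fits the observation that LOR does well for low-variance outcomes. With high variability the back-transformed value falls well short of the actual mean.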
We find that in high and medium variance cases with no boundary values, researchers are much better off sticking with the LRM. The LRM and the TM are excellent estimators of the relationship around the mean of our predictor variable even for high variance outcomes. This is consistent with Papke and Wooldridge’s (2008) assertion that the LRM does a good job of estimating the marginal effect in many nonlinear contexts. The LRM and the TM do still, however, produce a considerable proportion of negative predictions on right-skewed data even if the outcome variance is low. Overall, this is consistent with our replication results which showed problems to be much more serious at the tails compared with median values.
The scholars in our review often motivate the use of LOR out of concern for out-of-bounds predictions. We show that under medium and high variance conditions, an LOR trades the out-of-bounds problem for a bias problem. As management researchers often base their conclusions on coefficient interpretation, this is arguably a poor trade. The good news is that management researchers do not need to choose the lesser of two evils. Our results show that the FLR tracks the true relationship very well with little bias even in high-variance conditions.
Study 2: the consequences of boundary values at zero
In Study 1, we demonstrated that LOR performs poorly for variable fractional data, especially at the lower boundary. Study 2 examines the influence of the proportion of zeroes 8 on the performance of our selected regression models. While our review suggests that zero values commonly occur, management scholars treat these very differently. An LRM simply treats zero values as any other value and thus ignores its property as the lower boundary of the outcome. The LOR acknowledges the bounded nature of the outcome yet forces the researcher to add a small constant to the zeroes as the log-odds function is not defined for boundary values. 9 The TM risks confusion about interpreting zeroes as censored observations that could potentially have been lower, if we had observed them. Finally, the FLR makes no changes to the zeroes while using a functional form that acknowledges the boundedness of the outcome (Figure 4).
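The contrast between LOR and FLR in their treatment of zeroes can be made concrete with a small sketch. It assumes a single predictor and fits the fractional logit by Newton-Raphson on the Bernoulli quasi-log-likelihood. This is a toy implementation for illustration, not the estimation routine used in the simulations, and the data, coefficient values, and function names are hypothetical.

```python
import math


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    return math.log(p / (1.0 - p))  # undefined for p = 0 or p = 1


def fit_fractional_logit(x, y, iterations=25):
    """Newton-Raphson for E(y | x) = G(b0 + b1 * x) with G the logistic function."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = inv_logit(b0 + b1 * xi)
            resid = yi - mu          # score contribution; works for yi = 0
            w = mu * (1.0 - mu)      # weight in the (negative) Hessian
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: noise-free fractions, then the same data with some zeroes.
x = [i / 4 - 2 for i in range(17)]
y_clean = [inv_logit(-1.0 + 0.8 * xi) for xi in x]
y_zeros = [0.0 if i % 4 == 0 else yi for i, yi in enumerate(y_clean)]

b0_clean, b1_clean = fit_fractional_logit(x, y_clean)
b0_zeros, b1_zeros = fit_fractional_logit(x, y_zeros)

try:
    [log_odds(yi) for yi in y_zeros]
    lor_defined = True
except ValueError:
    lor_defined = False  # the log-odds transform fails on exact zeroes

print(b0_clean, b1_clean, b0_zeros, b1_zeros, lor_defined)
```

The quasi-likelihood score only involves y − G(xb), so zeroes enter the estimation unchanged, whereas the log-odds transform cannot be computed without first altering them.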

Figure 4. Study 2: the top and bottom rows contain predictions for low and high proportions of zeroes, respectively.
Results
Figure 4 presents the results of our simulations in Study 2. In this study, the LRM and TM produce different predictions. As the proportion of zeroes increases, the relationship as estimated by the TM becomes increasingly negative. This leads to a lower tendency to undershoot higher proportions but a higher tendency to undershoot lower proportions. The TM is more prone than the LRM to predict negative values, and this tendency increases as the proportion of zeroes grows. For a mean of 0.24, the TM starts to overshoot high proportions and even predicts values well above 1.
For 5% zeroes, LOR generally performs almost as well as FLR and much better than the LRM and TM. For 35% zeroes, the variability of the LOR predictions increases, especially when predicting high values. For a mean outcome of 0.02, LOR undershoots substantially for data containing 35% zeroes but is still closer to the true relationship than the LRM. Again, FLR comes very close in all scenarios, and its predictions are less affected by an increasing proportion of zeroes.
Discussion
Based on our review, we suspect that management researchers often encounter fractional variables with some proportion of zeroes. Our results show that the proportion of zeroes can have profound implications for model predictions. The TM reacts to a higher proportion of zeroes by making more out-of-bounds predictions. This is ironic, as the primary motivation among management scholars for using the TM is to cope with the boundedness of the outcome. Our simulations show that management researchers are likely to make matters much worse than simply sticking with the LRM.
Our review showed that management scholars primarily relied on the TM’s coefficients when interpreting their results. In a TM, the coefficients represent the marginal effect on the latent variable mean. Our results show that the higher the proportion of zeroes, the larger the difference between the TM’s estimated marginal effect and the true marginal effect. It is important to note that the problem is the same if the proportion of ones is high. This should invoke doubt about the validity of past management research when authors analyze data with even a modest proportion of zeroes or ones and interpret their results with respect to the latent variable mean.
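The distinction between the two marginal effects can be written down directly for the textbook Tobit model. The sketch below assumes censoring from below at zero only (a two-limit Tobit on the unit interval would need an additional term) and uses hypothetical parameter values. It relies on the standard result that the effect on the observed, censored mean equals the coefficient scaled by the probability of being uncensored.

```python
from statistics import NormalDist


def tobit_marginal_effects(beta, sigma, xb):
    """Marginal effects of one predictor in a Tobit model censored at zero.

    beta  : Tobit coefficient of the predictor
    sigma : SD of the latent error term
    xb    : linear index x'b at which the effect is evaluated
    """
    latent = beta                                    # effect on E(y* | x)
    censored = beta * NormalDist().cdf(xb / sigma)   # effect on E(y | x)
    return latent, censored


# Hypothetical values: at xb = 0, half of the latent distribution lies below zero.
latent_me, censored_me = tobit_marginal_effects(beta=0.10, sigma=0.20, xb=0.0)
print(latent_me, censored_me)
```

The more mass the latent distribution has below the censoring point, the further the raw coefficient departs from the effect on the observed outcome.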
LOR was also affected by the proportion of zeroes but to a lesser extent. In many of our scenarios, the LOR accurately estimated the true MEM. However, it clearly struggled with bias for an outcome with a mean of 0.24 and had general difficulties predicting high enough values. Like the conclusion in Study 1, management researchers would generally be much better off sticking with the LRM instead of trying to fix the boundedness issue through the TM or LOR. Our results suggest that in the presence of zeroes, these models are likely to produce biased estimates, especially the TM. FLR produced the most accurate predictions while being unaffected by the proportion of zeroes. Again, FLR consistently outperformed the other models in almost every scenario.
Study 3: the consequences of variable dispersion
In Study 3, 10 we investigate the consequence of variable dispersion on the estimates of FLR. In a recent study, Certo et al. (2020) use simulations to illustrate how dispersion in the denominator of a ratio can severely distort results from the LRM when that ratio is used as a dependent variable. It is unclear whether this concern also holds for fractional outcomes. Dispersion is measured as the coefficient of variation (COV), which is defined as a variable’s SD divided by its mean. In Goldfarb et al. (2015), for instance, the denominator total sales in the fractional outcome percent sales that are difficult to pronounce has a COV of 1.24. Our goal with Study 3 is to assess whether the concerns raised by Certo et al. (2020) hold for FLR, the best performing model in our simulations above.
Results
Figure 5 contains the results from Study 3. As is evident from the figure, the mean of the estimated beta coefficient is essentially equal to the true value. Importantly, this consistency does not depend on the COV. Thus, results from the FLR are not distorted when the denominator used to generate the fractional outcome suffers from even severe dispersion above 1.5 or below 0.5. The results do not depend on the mean of the fractional outcome itself.

Figure 5. Study 3: the plots contain smoothed trend lines of the average estimated fractional logit coefficients across coefficient of variation values from 1000 trials.
Discussion
Study 3 demonstrates that the concerns raised by Certo et al. (2020) do not apply to fractional outcomes. Our simulation mimics the special properties of fractional outcomes. While Certo et al. (2020) generate the numerator and denominator as if they were separate variables, 11 we rely on a binomial process where the numerator is generated as a part of the denominator. This creates a simulated fractional outcome that is, in fact, a true fractional outcome as it is bounded strictly between zero and one (Papke and Wooldridge, 1996).
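A binomial data generating process of this kind might look as follows. This is a sketch under stated assumptions, not the authors' simulation code: the denominators are drawn from a lognormal distribution parameterized by a target mean and COV, the success probability follows a logistic function of one standard normal predictor, and all names and parameter values are hypothetical. The COV of 1.24 is borrowed from the Goldfarb et al. (2015) example above.

```python
import math
import random
import statistics


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_binomial_fraction(n_obs, b0, b1, denom_mean, denom_cov, seed=0):
    """Return rows (x, denominator, fraction) with a binomial numerator."""
    rng = random.Random(seed)
    # Lognormal parameters that give the requested mean and COV (before rounding).
    s2 = math.log(1.0 + denom_cov ** 2)
    m = math.log(denom_mean) - s2 / 2.0
    rows = []
    for _ in range(n_obs):
        x = rng.gauss(0.0, 1.0)
        denom = max(1, round(rng.lognormvariate(m, math.sqrt(s2))))
        p = inv_logit(b0 + b1 * x)
        # The numerator counts successes among the denominator's units, so it
        # can never exceed the denominator.
        numer = sum(rng.random() < p for _ in range(denom))
        rows.append((x, denom, numer / denom))
    return rows


rows = simulate_binomial_fraction(
    n_obs=1000, b0=-1.2, b1=0.5, denom_mean=50, denom_cov=1.24, seed=3
)
denoms = [d for _, d, _ in rows]
fractions = [f for _, _, f in rows]
print(statistics.pstdev(denoms) / statistics.fmean(denoms), statistics.fmean(fractions))
```

Because the numerator is generated as part of the denominator, every simulated fraction stays within the unit interval regardless of how dispersed the denominator is.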
According to our results, researchers using FLR to model fractional outcomes do not need to worry about their results being distorted by dispersion in the denominator. This result is of great importance to strategic management researchers who often model fractional outcomes. If researchers respect the bounded nature of fractional outcomes by using FLR, our results suggest that studies of fractional outcomes should not be discouraged based on suspicions about dispersion. To be clear, we only intend to explore fractional outcomes and not other types of ratios where Certo et al. (2020) offer valuable analyses and advice.
Worked example: proportion of innovative sales
In this section, we present an empirical example to illustrate the use of FLR. The fractional logit model is easy to run with pre-specified code in several software packages. However, model evaluation and interpretation of results may be different from management researchers’ usual practices, so we will focus on those elements here. Relevant code can be found in the Supplemental appendix. We use data from 2009 on 168 German firms’ innovativeness from the Management, Organization, and Innovation Survey, which is a joint initiative of the European Bank for Reconstruction and Development and the World Bank Group (EBRD-World Bank, 2010). Innovativeness is measured as the proportion of sales over the past 3 years that is attributable to new products and services (e.g. Leiponen and Helfat, 2010). This fractional outcome has a mean of 0.22, an SD of 0.18, and just 3 (1.79%) and 2 (1.19%) observations at the zero and one boundary, respectively. The proportion of innovation sales is an appropriately bounded fractional outcome, as it is impossible for firms to attribute more than all or less than none of their sales to new products and services.
In our example, we focus on two key predictor variables used in the literature: firm size and firm age (Berchicci, 2013). We expect that both variables are logarithmically related to the proportion of innovation sales. Larger firms have higher access to financial and human resources which enhances innovativeness. However, eventually, the positive effect of being larger wears off due to inertia (Mihalache et al., 2012). In other words, as firm size increases, the positive effect of firm size on innovativeness decreases. Similarly, we expect the negative effect of age on innovativeness due to reluctance to pursue innovation to decrease as firms grow older (Hottenrott and Lopes-Bento, 2014). To facilitate this logic, we log-transform both predictor variables before our analysis.
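Below, we illustrate the two main steps in fractional modeling. We estimate an FLR with the following conditional mean specification
$$E(y_i \mid \mathbf{x}_i) = G(\mathbf{x}_i \boldsymbol{\beta}) \qquad (1)$$
where $y_i$ is the proportion of innovation sales of firm $i$, $\mathbf{x}_i$ contains the predictors (including log firm size and log firm age), and $G(z) = \exp(z) / (1 + \exp(z))$ is the logistic function.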
Step 1: model specification and evaluation
The first step in FLR modeling is to check the conditional mean specification of our FLR model in equation (1). This can be done using a robust version of Ramsey’s (1969) RESET test using two powers (Ramalho and Ramalho, 2012) and the generalized goodness of functional form (GGOFF) test (Ramalho et al., 2014). The bottom of Table 3 contains the test statistics for said tests. At an alpha of 5%, neither the RESET nor the goodness-of-fit (GOF) tests can reject the null hypothesis that the FLR is correctly specified.
Table 3. Example of predicting percent innovative sales using fractional logistic regression.
APE: average partial effect; GOFF: goodness of functional form.
APE for factor levels East, South, and West is the discrete change from the base level, which is North. Robust standard errors are in parentheses for coefficients and APEs, while exact p-values are in parentheses for the LM model fit statistics.
*p < 0.05; **p < 0.01; ***p < 0.001.
Next, we test the logit specification against the probit, loglog, cloglog, and cauchit functional forms. We estimate equation (1) using these link functions for
Step 2: interpretation of results
For interpretation, management scholars should reach into their toolbox from generalized linear models such as the binary logit (Hoetker, 2007). Researchers should avoid direct interpretation of coefficients and instead rely on APEs and graphical illustrations of predicted proportions (Wulff, 2015). Table 3 also contains the APEs for each predictor variable and Figure 6 displays the predicted proportions of innovation sales at different values for size (left) and age (right). As firms grow larger, the proportion of their sales attributable to innovations rises quickly. However, around 1000 employees, the positive relationship starts to wear off, and the sampling variability increases considerably. Similar insights can be derived from the plotted relationship between firm age and innovativeness. Young start-ups have around twice the predicted proportion of innovation sales compared with 40-year-old firms.
This same interpretation is possible to make based on the APEs (Table 3). APEs are interpreted on the expected value of the fractional outcome. For instance, increasing log(age) by one unit results in an average decrease in the innovation sales proportion of 0.04. If researchers wish to draw statistical inferences based on the p-values, this is also possible based on the derived t-statistics.
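For a continuous predictor in a fractional logit, the APE is the sample average of the coefficient multiplied by the logistic density evaluated at each observation's linear index. The sketch below shows that computation. The coefficients and data rows are hypothetical placeholders, not the estimates reported in Table 3, and the function name is an assumption.

```python
import math


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def average_partial_effect(coefs, intercept, rows, j):
    """APE of continuous predictor j: mean of b_j * G(xb) * (1 - G(xb))."""
    total = 0.0
    for row in rows:
        xb = intercept + sum(b * v for b, v in zip(coefs, row))
        mu = inv_logit(xb)                  # predicted proportion for this row
        total += coefs[j] * mu * (1.0 - mu)  # partial effect for this row
    return total / len(rows)


# Hypothetical coefficients for (log size, log age) and three made-up firms.
coefs = [0.30, -0.20]
intercept = -1.50
firms = [(3.0, 2.0), (5.5, 3.2), (7.0, 4.1)]

ape_size = average_partial_effect(coefs, intercept, firms, j=0)
ape_age = average_partial_effect(coefs, intercept, firms, j=1)
print(ape_size, ape_age)
```

Since G(xb)(1 − G(xb)) never exceeds 0.25, an APE in a fractional logit is at most a quarter of the coefficient in absolute value, which is one reason raw coefficients should not be read as effects on the proportion.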

Figure 6. Predicted proportions of innovation sales for different values of size (left) and age (right).
In short, management researchers should first make sure that their FLR is correctly specified using the RESET and GGOFF tests and then compare it with alternative specifications using p-tests and model fit statistics such as R² and AIC. Once an acceptable model has been chosen, management researchers can intuitively interpret their fractional model results in terms of the predicted proportions, facilitating effective communication to their readers.
Discussion
Our article opened with a version of George Box’s famous aphorism “all models are wrong, but some are useful” (Box and Draper, 1987: 424). Taken together, our results demonstrate when inappropriate modeling approaches may produce results that are so flawed that they are not useful. This is especially true when the researcher is interested in predictions away from the average, which strategy and management researchers inherently are when we try to understand sources of success and failure. That a large part of existing research based on suboptimal estimation choices makes adequate predictions around the mean is only a weak consolation, given that the stated objective of such research is never limited to average firms or organizations. Future theoretical development relies on confidence in existing empirical evidence, which we call into question with our analyses. Below, we discuss in broad terms when we as management scholars should be more or less concerned about the conclusions that are drawn from inappropriate modeling of fractional dependent variables.
Areas of concern
Our simulations demonstrate that, in some cases, employing LOR or the TM can lead to nontrivial bias in estimating marginal effects and seriously flawed model predictions. When combined with a focus on the latent variable, the TM’s weakness lies in the proportion of zeroes. Even in the face of only 5% zeroes, the TM’s estimates are biased, and this bias only becomes larger as the proportion of zeroes increases. In our simulations, we found an overestimation of the true MEM by 90% for 35% zeroes. On top of this, the TM’s predictions become worse as the extent of negative predictions increases. It is important to repeat that these problems are only concerning the latent variable mean. We expect that marginal effects concerning the censored mean are much less biased. However, as management researchers almost exclusively interpret the TM with respect to the latent variable mean, the issues found in our article are very concerning.
Our simulations demonstrate that the greatest problems of LOR are when confronted with medium- or high-variance outcomes in scenarios without boundary values. As we initially showed, high variance often implies skewed distributions for fractional variables. LOR produces seriously erroneous predictions and substantially biased MEMs for SD-to-mean ratios above 1.
Our literature review showed that 34% of fractional outcome papers use either the TM or regression with a transformed dependent variable. Unfortunately, outcomes with high SD-to-mean ratios are anything but a rarity in management research, and we suspect that some proportion of zeroes is often present. This raises the question of whether we need to reexamine prior research to see how robust previous results are to appropriate methods (Bettis et al., 2016). At the very least, we recommend that future studies clearly report descriptive statistics describing the characteristics of the fractional outcome, including the (untransformed) mean, SD, and the number of values at the boundaries.
Despite its out-of-bounds prediction issue, we found that the LRM in many cases does surprisingly well when we compare it with the best-practice approaches. We found that the LRM does a good job of estimating marginal effects yet becomes problematic with an increased outcome variance relative to the mean or an increased proportion of zeroes. It is somewhat reassuring that the LRM’s estimate of the MEM may be biased but still comes close enough to be considered useful. Yet, as marginal effects around the mean are rarely the objective of the studies, the LRM remains a suboptimal choice of model for fractional outcomes. However, we assess that past research that uses the LRM to draw inferences around the mean should not be on top of our list of concerns.
Often management researchers are interested in using their models to make predictions away from the mean. We find that the LRM often struggles with making such predictions and in many cases produces out-of-bounds predictions. For management research, this is particularly worrying when researchers test hypotheses including interactions. The challenges of interpreting interactions in nonlinear models discussed in the management literature (e.g. Wiersema and Bowen, 2009) are very similar to those posed when modeling fractional outcomes. Even following the best-practice recommendations of graphically interpreting linear models that include interactions (e.g. Dawson, 2014) does not remedy the shortcomings of the LRM: predictions become increasingly erroneous as we move away from the mean of the predictor variable. In our review, we identified several papers that based their interpretation on linear interaction plots even though their models clearly made out-of-bounds predictions. If we want to take the bounded nature of our fractional outcome seriously, we need to revisit cases where linear models are used to test interaction hypotheses.
Finally, we want to reiterate how Study 3 addresses a recent concern about ratios as dependent variables (Certo et al., 2020). We show that for correctly estimated fractional outcomes, researchers should not be worried, which emphasizes the importance of adding FLR to the toolbox of social scientists.
Conclusion
In this article, we present the first review of the use of fractional outcome models in strategy and management research. Our simulations help clarify the conditions that make otherwise popular regression models problematic in the way they are currently applied in management research. We have focused on current applications and FLR as an easily applied alternative that would yield more precise and trustworthy research results. We want to note that our review of the estimation of fractional outcomes is far from exhaustive. For instance, as we briefly mention, the TM has desirable properties when boundary values different from 0 and 1 are relevant. Likewise, two-part models offer interesting opportunities when the number of zeroes is substantial and the zeroes are caused by different processes than those that determine the amount (e.g. the decision to export may or may not be different from the decision of how much to export once you have decided to do so). Future studies will hopefully elaborate on how these and other methods open up new research findings and novel possibilities for theorizing.
We expect and hope that studies of fractional outcomes will remain relevant and popular in the future. Importantly, the types of models we recommend in this article are easily estimated using modern statistical software. Thus, we hope that management researchers will follow our recommendations to keep our discipline among the leading fields when it comes to adopting state-of-the-art statistical models.
Supplemental Material
Supplemental material, SAQ_supplement_mat for Are you 110% sure? Modeling of fractions and proportions in strategy and management research by Anders R Villadsen and Jesper N Wulff in Strategic Organization
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
