Umpire Home Bias in Major League Baseball

Abstract

This paper studies whether Major League Baseball umpires displayed home bias in their pitch calls, using data on pitch call accuracy from the 2010–2019 seasons to isolate evaluator bias from player performances. The main findings are consistent with umpire home bias, as home batters on average received more called balls on actual ball pitches and fewer called strikes on actual strike pitches, which work in their favor. The bias is not entirely explained by umpire, player, or stadium characteristics, nor is it attributable to umpiring inconsistencies.

Keywords

evaluation bias, referee bias, sports, Major League Baseball

Introduction

This paper evaluates whether home bias exists among Major League Baseball (MLB) umpires in their pitch calls. Umpire home bias is an example of a principal-agent problem, where the interests of the evaluators (i.e., the umpires) from showing favoritism toward the home team exceed the rewards from the league for being objective evaluators.¹ Such bias leads to inaccurate performance evaluations as they no longer measure the true productivity of the evaluated individuals (Parsons et al., 2011).² However, identifying umpire bias requires assessing umpires’ productivity, which can be challenging for several reasons. First, an objective benchmark to assess umpire productivity may not exist when the officiating rules require subjective interpretations. Furthermore, disentangling umpire performance from player performance may be difficult. For example, an umpire calling more fouls against a team may reflect his or her bias against the team, but it is also possible that the players commit more fouls in the first place. Finally, umpiring calls may in turn affect the decisions by the players, as players may adjust their strategies based on the observed calls.

The main contribution of this paper is to use the accuracy of pitch calls, which can be evaluated by comparing the actual pitch locations versus the pitch calls, to address the aforementioned issues and provide additional evidence on umpiring bias. In the context of MLB, call accuracy can be objectively evaluated since ball and strike zones are well defined in the rulebook (Tainsky et al., 2015),³ and the pitch locations can be accurately tracked. Call accuracy also controls for the impact of pitcher and batter performances relative to call instances. While the number of ball and strike calls would also depend on the pitcher's and the batter's abilities, the accuracy of pitch calls (conditional on the actual location of the pitches) would only reflect the umpire's ability to distinguish between balls and strikes.

Figures 1 and 2 demonstrate the degree of inaccurate calls by comparing the difference between actual pitch calls and actual pitch locations from the 2010–2019 seasons. About two-thirds of all pitch calls were balls and one-third of pitch calls were strikes. However, among all of the actual pitches thrown, slightly less than a quarter of them would have actually fallen inside the strike zone while the rest should have been called as balls. The top panel of Table 1 highlights the frequency of erroneous calls. Among all actual strikes and all actual balls thrown, umpires made inaccurate calls at a rate of 9.12% and 14.43%, respectively. In this paper, I measure umpire home bias as whether these inaccurate calls disproportionately favored the home team over the away team.

Figure 1.

Percentage of called strikes and balls.

Figure 2.

Percentage of actual strikes and balls.

Table 1.

Umpire Pitch Calls and Pitch Errors—Home and Away Batters.

	Actual Strikes	Actual Balls
All batters
Correct calls (% of actual pitches thrown)	90.88	85.57
Incorrect calls (% of actual pitches thrown)	9.12	14.43
Actual pitches thrown	913,488	2,862,207
Home batters
Correct calls (% of actual pitches thrown)	90.79	85.82
Incorrect calls (% of actual pitches thrown)	9.21	14.18
Actual pitches thrown	447,867	1,411,181
Away batters
Correct calls (% of actual pitches thrown)	90.96	85.32
Incorrect calls (% of actual pitches thrown)	9.04	14.68
Actual pitches thrown	465,621	1,451,026

Why might umpires favor the home team over the away team? On one hand, umpire performances are subject to reviews from the league that employs them. Starting from 2001, MLB has employed strike zone monitoring technology (QuesTec) to objectively evaluate pitch locations and pitch calls. More scrutiny from the league should reduce the incentive for umpiring bias if umpires are rewarded and punished according to their performance; however, MLB rarely imposes severe punishments for underperforming umpires (Bradbury, 2019). On the other hand, MLB games are often attended by a predominantly one-sided audience. The fans may increase the social pressure faced by the umpires, causing them to award more favorable calls to the home team (Endrich & Gesche, 2020). Thus, in the absence of punishment mechanisms, umpires may internalize the social pressure from the crowds and make calls disproportionately benefitting home teams (Garicano et al., 2005). Evaluating which of the monitoring forces is stronger (the league or the home crowds) is an empirical question that this paper attempts to answer.

To evaluate umpire home bias, I use pitch-level data for every pitch called in every MLB game from the 2010–2019 seasons. The findings suggest that umpire home bias exists in MLB, since home batters received more favorable calls from the umpires than away batters did. In particular, per game, umpires called about 0.07 fewer actual strikes as strikes and about 0.7 more actual balls as balls for home batters than for away batters. The results are robust to the inclusion of stadium and player fixed effects and are not entirely explained by umpiring inconsistencies. Furthermore, the errors favored the home team even more when attendance increased and in high-leverage situations. Interestingly, while umpire experience decreased the overall error rates, it did not reduce the instances of home bias. Umpires with more experience are still more likely to give favorable calls to home batters relative to away batters.

Eliminating the influence of umpiring errors on the outcome of sporting events has been an ongoing debate in sports. On one hand, umpires have been an integral part of such events since the inception of the professional leagues. On the other hand, the erroneous calls may have a significant impact on the game outcome itself. As an example of an effort to remove such influence, all Triple-A Minor League Baseball games starting from the 2023 season are utilizing robot umpires in pitch calls in an attempt to “eliminate the individual and sometimes inconsistent strike zones that vary from umpire to umpire, and with it the possibility that a game can turn on a bad ball/strike call” (Associated Press, 2023, p. 6). The findings in this paper contribute to the debate by providing an estimate of the marginal benefits of removing human umpires from making pitch calls, in particular for the away team. While the results suggest the degree of umpire home bias is relatively modest, they also suggest such bias is exacerbated by forces that may increase the pressure umpires face. As a result, home teams would have a smaller advantage in winning ballgames if human umpires were to be removed from making pitch calls.

This paper contributes to existing studies on umpiring bias by providing supporting evidence in the context of MLB pitch calls, while at the same time addressing the empirical issues in measuring umpire performance. Studies have shown that soccer referees tend to award home teams more stoppage time (Garicano et al., 2005; Rickman & Witt, 2008), more penalty kicks when trailing (Dohmen, 2008), as well as fewer fouls and cards against them (Bryson et al., 2021; Buraimo et al., 2012). Referee home biases have also been found in hockey penalties (Guerette et al., 2021), basketball turnovers (Price et al., 2012), and boxing points (Balmer et al., 2005).⁴ The main departure of this paper is the empirical strategy utilized. In this paper, I focus on call accuracy to measure umpiring bias rather than the call instances used in most of the existing studies. The strategy is similar to that of Gong (2022), who used the accuracy of foul calls in National Basketball Association (NBA) games before and during the COVID-19 pandemic to test whether referee bias toward the home team is weakened by the absence of crowds.

This paper is part of a small but growing group of studies that use baseball to study human behavior. As noted in Parsons et al. (2011) and Bradbury (2019), the advantages of using baseball data include the sheer number of observations available, a well-defined and accurate measure of incentives, and a clear distinction between the agents responsible for a decision. Studies using data from MLB have found evidence of discrimination in performance evaluations (Kim & King, 2014; Parsons et al., 2011), on the effects of monitoring on shirking (Bradbury, 2019; Mills, 2017b), on the impact of temperature on labor productivity (Fesselmeyer, 2021), and on the relationship between performance evaluation quality and evaluator's productivity (Mills, 2017a). The findings in this paper complement these studies by identifying the extent of umpiring preference for home batters. To the best of my knowledge, this paper is among the first to estimate such bias in professional baseball.

The structure of this paper is as follows: the second section describes the data, the third section outlines the methodology, the fourth section reports and discusses the results, and the fifth section concludes.

Data

The main data in this paper come from Skillalytics, which collects pitch-by-pitch data from MLB Statcast (Gilliland, 2022). Statcast tracks the location of every pitch in MLB games, recorded as an X-coordinate (horizontal location) and a Z-coordinate (vertical location), as well as the coordinates for the top and the bottom of the strike zone for each batter. Each pitch can be identified as an actual strike or an actual ball by combining the pitch location with the strike zone coordinates. Pitch types (e.g., slider, fastball), pitch characteristics (such as velocity and spin rates), and pitch outcomes (e.g., called pitches or ball in play) are also recorded in the data. The dataset also contains information regarding the pitcher-batter matchup for each pitch. From the same dataset, I obtained the ball and strike counts before the pitch, the names of the pitcher and the batter and their pitching/batting hands, the home and away teams, and the date and time of each game. Finally, I merged the pitch-by-pitch data with a game-level dataset, Baseballr, that contains information on the inning and the score for both teams for every pitch (Petti & Gilani, 2022). Baseballr also draws its game-level information from MLB Statcast.

Skillalytics, along with the MLB Statcast data it draws on, is well suited to the topic under investigation for a few reasons. First, Skillalytics contains the exact location of each pitch as it crosses home plate, as well as the strike zone for every at-bat. This allows each pitch to be identified as an actual strike that enters the strike zone, or an actual ball that does not. Second, the actual pitch call for every pitch is recorded. This allows umpiring performance to be observed, as the pitch call can be compared with the actual pitch location. Finally, every pitch in every MLB game is recorded in the dataset. The sheer volume of the data (more than 7.3 million pitches thrown and more than 3.7 million pitches called in the 2010–2019 seasons) reduces the possibility that any observed preferences in pitch calls are driven by statistical outliers.

I focus on the pitches where umpires needed to make a pitch call, calling the pitch as either a strike or a ball.⁵ Comparing the pitch call with the actual pitch location, a call can be either: (a) an actual strike correctly called as a strike, (b) an actual strike incorrectly called as a ball, (c) an actual ball correctly called as a ball, or (d) an actual ball incorrectly called as a strike. While both Scenarios (b) and (d) represent erroneous calls, Scenario (b) would benefit the batter at the expense of the pitcher, and Scenario (d) would benefit the pitcher at the expense of the batter.
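The four scenarios above can be sketched as a small classification routine. This is only an illustration, not the paper's actual data-processing code: the field names are hypothetical, the horizontal extent of the zone is assumed to be the 17-inch width of home plate centered at zero, and the radius of the ball is ignored.

```python
from dataclasses import dataclass

# Assumption: the zone spans half of the 17-inch plate width on either side
# of the center line, expressed in feet. Ball radius is ignored.
HALF_PLATE_FT = 17 / 12 / 2


@dataclass
class Pitch:
    x: float       # horizontal location at the plate, feet from center (hypothetical name)
    z: float       # vertical location at the plate, feet above ground (hypothetical name)
    sz_top: float  # top of this batter's strike zone, feet
    sz_bot: float  # bottom of this batter's strike zone, feet
    call: str      # umpire's call: "strike" or "ball"


def is_actual_strike(p: Pitch) -> bool:
    """True if the pitch location falls inside the assumed rectangular zone."""
    return abs(p.x) <= HALF_PLATE_FT and p.sz_bot <= p.z <= p.sz_top


def scenario(p: Pitch) -> str:
    """Return the scenario label (a)-(d) used in the text."""
    if is_actual_strike(p):
        return "a" if p.call == "strike" else "b"  # (b) favors the batter
    return "c" if p.call == "ball" else "d"        # (d) favors the pitcher
```

Under these assumptions, a pitch over the middle of the plate that is called a ball falls under Scenario (b), while a pitch well off the plate that is called a strike falls under Scenario (d).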

Table 1 reports the frequency of pitch call errors for both away and home at-bats. Among all called pitches, there were about 0.91 million pitches thrown that were actual strikes and 2.86 million that were actual balls from the 2010–2019 seasons. Among the actual strikes, umpires called 90.88% of pitches correctly as strikes and incorrectly called 9.12% of them as balls. Umpires were less accurate with their calls on actual balls, with only 85.57% of them called correctly as balls and 14.43% of them incorrectly called as strikes. The splits for home versus away batters were consistent with umpire home bias. Umpires incorrectly called 9.21% of actual strikes as balls for home batters and only 9.04% for away batters, which benefitted home batters at the expense of away pitchers. Similarly, the erroneous calls for actual balls thrown also disproportionately benefitted home batters. Among these pitches, umpires incorrectly called 14.68% of them as strikes for away batters while they incorrectly called only 14.18% for home batters.
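As a quick consistency check on Table 1, the pooled error rates should equal the pitch-weighted averages of the home and away rates. The short sketch below reproduces that arithmetic from the reported figures; the function and variable names are illustrative only.

```python
# Error rates (%) and pitch counts copied from Table 1.
TABLE_1 = {
    "actual_strikes": {"home": (9.21, 447_867), "away": (9.04, 465_621)},
    "actual_balls": {"home": (14.18, 1_411_181), "away": (14.68, 1_451_026)},
}


def pooled_rate(home, away):
    """Pitch-weighted average of the home and away error rates."""
    (rate_h, n_h), (rate_a, n_a) = home, away
    return (rate_h * n_h + rate_a * n_a) / (n_h + n_a)


def home_minus_away(home, away):
    """Raw gap in error rates, in percentage points."""
    return home[0] - away[0]


for label, split in TABLE_1.items():
    pooled = pooled_rate(split["home"], split["away"])
    gap = home_minus_away(split["home"], split["away"])
    print(label, round(pooled, 2), round(gap, 2))
```

The pooled values round to the 9.12% and 14.43% reported for all batters. The raw gaps (more actual strikes called as balls for home batters, fewer actual balls called as strikes) both point in the home batters' favor, before any controls are applied.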

Game-level data is obtained from Retrosheet, which collects information from MLB box scores for each game. There are a total of more than 24,600 games in the pitch-by-pitch data. Attendance figures and the name of the home-plate umpire are available for more than 99% of these games. Umpire-level data, including the years served as an MLB umpire and the number of games as home-plate umpire in each year, is also obtained from Retrosheet. Table 2 presents the summary statistics of pitches called per game as well as other control variables. There were about 152 pitches called per game on average, with about 37 of the pitches as actual strikes and 115 of the pitches as actual balls. Umpires called 50 strikes and 102 balls per game, with an average of 19.9 erroneous calls per game. Left-handed batters accounted for 43.6% of all called pitches, while left-handed pitchers accounted for 27.7% of all called pitches. Umpires on average had 16 years of experience and about 464 games of experience as home-plate umpire. Finally, among all the games in the sample, fan attendance was at 70.3% of stadium capacity on average.

Table 2.

Summary Statistics.

	Mean	St. Dev.
Actual strikes per game	36.88	9.89
Actual balls per game	114.84	20.45
Called strikes per game	50.21	12.16
Called balls per game	101.51	18.37
Called pitches per game	151.71	25.97
Incorrect strike calls per game	3.37	2.56
Incorrect ball calls per game	16.53	5.94
Incorrect calls per game	19.90	6.66
Percent of left-handed batters	0.436	0.496
Percent of left-handed pitchers	0.277	0.200
Number of games as home-plate umpire	463.85	282.14
Years of experience as home-plate umpire	16.31	9.03
Attendance (as percent of capacity)	0.703	0.225

Methodology

To measure whether umpires perform differently between home and away batters, I begin with the following specification:

\[
I(\text{correct call} \mid \text{pitches thrown})_{i} = \beta_{0} + \beta_{1} HB_{i} + \beta_{2} X_{i} + U_{j} + V_{gt} + \epsilon_{i}, \tag{1}
\]

where the dependent variable is an indicator equal to one if the umpire makes a correct call on pitch i and zero otherwise, HB_i is an indicator equal to one if the home team is at bat and zero otherwise, X_i is a vector of control variables, U_j is a fixed effect for home-plate umpire j, V_gt are time dummies representing the month and the year the game was played to control for time-specific factors potentially affecting pitch call accuracy, and ϵ_i is the error term. The control variables include the throwing and the batting stances of the pitcher and the batter, height of the strike zone, pitch types,⁶ ball and strike counts, inning number indicators, and score differences when the pitch is thrown. These variables could potentially impact how umpires visualize the strike zone and the pitch location, which may also affect pitch call accuracy.

The main coefficient of interest, β₁, can be interpreted as umpire productivity against home batters relative to away batters (e.g., Lopez & Mills, 2019), since the strike zones can be accurately defined for each batter. A positive β₁ implies umpires perform better when home batters are at the plate, since the pitch calls are more accurate against home batters relative to away batters. However, different types of call errors may benefit either the home batters (at the expense of away pitchers) or the away batters (at the expense of home pitchers). Therefore, differences in umpiring performance do not necessarily imply umpiring bias.

To test whether the errors benefit one team over the other, I first divide the called pitches into two subsamples: actual strikes thrown and actual balls thrown. Then, for each subsample, I test whether the erroneous calls favor home batters over away batters, following the same specification as Equation (1):

\[
I(\text{called balls} \mid \text{actual balls})_{i} = \beta_{0} + \beta_{1} HB_{i} + \beta_{2} X_{i} + V_{gt} + U_{j} + \epsilon_{i}, \tag{2}
\]

\[
I(\text{called strikes} \mid \text{actual strikes})_{i} = \beta_{0} + \beta_{1} HB_{i} + \beta_{2} X_{i} + V_{gt} + U_{j} + \epsilon_{i}. \tag{3}
\]

Both Equations (2) and (3) are estimated with a linear probability model and with a probit model.
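To make the linear probability specification concrete, the sketch below estimates a stripped-down version on simulated data: a single home-batter indicator plus umpire fixed effects, fitted by demeaning within umpire (the within transformation). Everything here is an assumption for illustration, including the simulated coefficient values. The controls X_i, the year-month dummies, and the robust standard errors used in the paper are omitted.

```python
import random
from collections import defaultdict


def simulate_calls(n=200_000, n_umpires=5, beta_home=0.05, seed=0):
    """Simulate (correct_call, home_batter, umpire_id) rows with made-up parameters."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        home = rng.randint(0, 1)
        umpire = rng.randrange(n_umpires)
        umpire_effect = -0.02 + 0.01 * umpire  # hypothetical umpire fixed effects
        p_correct = 0.85 + beta_home * home + umpire_effect
        correct = 1 if rng.random() < p_correct else 0
        rows.append((correct, home, umpire))
    return rows


def within_umpire_lpm(rows):
    """OLS slope on the home indicator after removing umpire-specific means."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # umpire -> [sum y, sum home, count]
    for correct, home, umpire in rows:
        s = sums[umpire]
        s[0] += correct
        s[1] += home
        s[2] += 1
    num = den = 0.0
    for correct, home, umpire in rows:
        y_sum, h_sum, count = sums[umpire]
        y_dev = correct - y_sum / count
        h_dev = home - h_sum / count
        num += h_dev * y_dev
        den += h_dev * h_dev
    return num / den


beta1_hat = within_umpire_lpm(simulate_calls())
print(round(beta1_hat, 3))
```

With one regressor, demeaning within umpire gives the same slope as including a full set of umpire dummies, which keeps the example free of matrix algebra. The simulated effect is deliberately much larger than the estimates in the paper so that it is recovered clearly in a modest sample.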

In both equations, the direction of the β₁ coefficient determines whether the inaccurate calls systematically favor home batters over away batters. If umpires are more accurate in calling actual balls as balls for home batters, the coefficient for β₁ in Equation (2) will be positive, and the inaccurate ball calls will favor home batters over away batters. On the other hand, if umpires are calling fewer actual strikes as strikes for home batters, which implies a negative β₁ in Equation (3), the inaccuracy in strike calls also will favor home batters.

The underlying assumptions behind the methodology are that (a) an unbiased umpire will adopt the official definition of a strike zone to make pitch calls and (b) an unbiased umpire will on average perform similarly against both home and away batters based on the official strike zone. In practice, however, it is difficult to test assumption (a) as the exact strike zone an umpire uses in a game is not directly observable, and thus any observed umpiring preferences may be driven by inconsistent strike zones. Nevertheless, improvements in pitch tracking technology, such as the earlier QuesTec and the later Zone Evaluation systems, would allow the league and its umpires to evaluate umpiring performance and improve on it.⁷ On the other hand, while the estimation equations have controlled for pitcher-, batter-, time-, and umpire-specific factors that can affect umpiring performance, there may be other unobserved factors that potentially contribute to performance differences. While both (a) and (b) need to be assumed to interpret the observed preference for home batters as umpire home bias, the results in this paper can at least be interpreted as evidence consistent with the bias. I will also address the inconsistency issue by performing additional robustness checks that focus on borderline pitches and will present the findings in the “Results” section.

If umpire rulings favor home batters, is the bias affected by social pressure or umpiring experience? I examine two sources of social pressure: the leverage situation in a game, and fan attendance. The leverage situation refers to the importance of ball and strike calls for the win probability of a game. I consider the following situations where pitch calls are more relevant for game outcomes: the pitch calls on 2-strike or 3-ball counts, where the next strike or ball call will potentially end the at-bat, and the calls in tied or 1-run games in the 7th inning and after, where each additional out or man on base will have a greater impact on win probabilities. Fan attendance refers to paid attendance as a percentage of stadium capacity. Umpiring experience is measured as either the years of home-plate umpiring experience or the number of games the umpire has served as the home-plate umpire.
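The situational variables described above can be written as simple indicator functions. The sketch below follows the definitions in the text; the function and argument names are hypothetical and do not come from the underlying datasets.

```python
def high_leverage_count(balls: int, strikes: int) -> bool:
    """2-strike or 3-ball counts: the next call can end the at-bat."""
    return strikes == 2 or balls == 3


def close_game_late(inning: int, home_score: int, away_score: int) -> bool:
    """Tied or 1-run games in the 7th inning and after."""
    return inning >= 7 and abs(home_score - away_score) <= 1


def attendance_share(paid_attendance: int, stadium_capacity: int) -> float:
    """Paid attendance as a fraction of stadium capacity."""
    return paid_attendance / stadium_capacity
```

For example, a 3-2 count satisfies the first indicator, and a 4-3 game in the 8th inning satisfies the second.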

To test whether these factors contribute to umpire home bias, I include an interaction term between these variables and the home batter dummy in the following specifications:

\[
\begin{aligned}
I(\text{pitch called} = \text{pitch thrown})_{i} ={}& \beta_{0} + \beta_{1} HB_{i} + \beta_{2} I(\text{leverage})_{i} \\
&+ \beta_{3} HB_{i} \times I(\text{leverage})_{i} + \beta_{4} X_{i} + V_{gt} + U_{j} + \epsilon_{i},
\end{aligned}
\tag{4}
\]

\[
\begin{aligned}
I(\text{pitch called} = \text{pitch thrown})_{i} ={}& \beta_{0} + \beta_{1} HB_{i} + \beta_{2} Att_{g} \\
&+ \beta_{3} HB_{i} \times Att_{g} + \beta_{4} X_{i} + V_{gt} + U_{j} + \epsilon_{i},
\end{aligned}
\tag{5}
\]

\[
\begin{aligned}
I(\text{pitch called} = \text{pitch thrown})_{i} ={}& \beta_{0} + \beta_{1} HB_{i} + \beta_{2} UE_{g} \\
&+ \beta_{3} HB_{i} \times UE_{g} + \beta_{4} X_{i} + V_{gt} + U_{j} + \epsilon_{i},
\end{aligned}
\tag{6}
\]

where I(leverage)_i is an indicator equal to one if the pitch occurred in a high-leverage situation and zero otherwise, Att_g is the attendance of game g, and UE_g is the experience of the home-plate umpire making the pitch calls in game g. In Equations (4) to (6), the coefficient for the interaction term β₃ captures the additional effect of these factors on umpire home bias. If social pressure and experience exacerbate the degree of home bias, β₃ should have the same sign as the average home bias coefficient β₁. More specifically, a positive β₃ among actual balls and a negative β₃ among actual strikes are evidence suggesting these factors increase the extent of home bias, whereas the opposite signs would suggest otherwise.
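The reading of the interaction term can be summarized in a few lines. In Equations (4) to (6), the home-batter effect at a given value of the moderating variable is β₁ + β₃ × (moderator), and the sign rule in the text depends on the subsample. The sketch below encodes both; the function names and the coefficient values in the example are made up for illustration and are not estimates from this paper.

```python
def home_effect(beta1: float, beta3: float, moderator: float) -> float:
    """Home-batter effect on the call probability at a given moderator value."""
    return beta1 + beta3 * moderator


def exacerbates_home_bias(beta3: float, subsample: str) -> bool:
    """Sign rule from the text: positive for actual balls, negative for actual strikes."""
    if subsample == "actual_balls":
        return beta3 > 0
    if subsample == "actual_strikes":
        return beta3 < 0
    raise ValueError("subsample must be 'actual_balls' or 'actual_strikes'")


# Hypothetical coefficients, for illustration only.
print(home_effect(beta1=0.010, beta3=0.020, moderator=0.5))
print(exacerbates_home_bias(0.020, "actual_balls"))
```

For an indicator moderator such as I(leverage), the effect is β₁ outside high-leverage situations and β₁ + β₃ inside them.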

Results and Discussion

Main Results

Table 3 shows the main results on whether home bias exists in MLB umpires’ calls. The estimates from the linear probability model are presented in columns (1) and (2). Column (1) suggests that umpires on average called fewer actual strikes as strikes for home batters than for away batters, which is more favorable for the former. Compared with away batters, home batters were 0.19 percentage points less likely to have an actual strike correctly called as a strike. In terms of pitches, home batters on average received 0.07 fewer correctly called strikes per game. The evidence in column (2) also favors home batters. Compared with away batters, home batters received 0.61 percentage points more ball calls on actual balls, or about 0.7 more actual balls correctly called as balls per game. Columns (3) and (4) report the results from the probit model, and the marginal effects from the probit model are presented in Table 4. The marginal effects suggest that umpires on average called 0.18 percentage points fewer actual strikes as strikes and 0.61 percentage points more actual balls as balls for home batters, as compared with away batters. The magnitude of the effects is comparable with the linear probability estimates. Among the control variables, both the pitching and batting stances and the ball and strike counts have statistically significant associations with call accuracy, whereas the pitch types (not reported) do not.
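The per-game figures quoted above can be reproduced, at least approximately, by multiplying the linear probability coefficients in Table 3 by the average number of actual strikes and actual balls per game in Table 2. That this is the multiplier behind the reported figures is an inference from the reported numbers rather than something stated explicitly in the text.

```python
# Coefficients from Table 3, columns (1) and (2); per-game means from Table 2.
LPM_HOME_ACTUAL_STRIKES = -0.00185
LPM_HOME_ACTUAL_BALLS = 0.00608
ACTUAL_STRIKES_PER_GAME = 36.88
ACTUAL_BALLS_PER_GAME = 114.84


def calls_per_game(coefficient: float, pitches_per_game: float) -> float:
    """Convert a change in call probability into a change in calls per game."""
    return coefficient * pitches_per_game


strike_shift = calls_per_game(LPM_HOME_ACTUAL_STRIKES, ACTUAL_STRIKES_PER_GAME)
ball_shift = calls_per_game(LPM_HOME_ACTUAL_BALLS, ACTUAL_BALLS_PER_GAME)
print(round(strike_shift, 2), round(ball_shift, 2))
```

Rounded to two decimals, these come out to roughly 0.07 fewer called strikes and 0.7 more called balls per game, in line with the figures in the text.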

Table 3.

Umpire Home Bias: Baseline Estimates.

Estimation	(1) Actual strikes LPM	(2) Actual balls LPM	(3) Actual strikes probit	(4) Actual balls probit
Home batter	−0.00185^∗∗	0.00608^∗∗∗	−0.0118^∗∗	0.0277^∗∗∗
Home batter	(−3.12)	(14.43)	(−3.11)	(14.56)
Left-handed batter	−0.00340^∗∗∗	−0.0133^∗∗∗	−0.0214^∗∗∗	−0.0583^∗∗∗
Left-handed batter	(−5.51)	(−30.28)	(−5.42)	(−29.64)
Left-handed pitcher	−0.00329^∗∗∗	−0.00141^∗∗	−0.0213^∗∗∗	−0.00799^∗∗∗
Left-handed pitcher	(−4.92)	(−2.93)	(−5.01)	(−3.67)
Height of strike zone	0.0250^∗∗∗	0.00383^∗	0.155^∗∗∗	0.0134^∗
Height of strike zone	(11.72)	(2.53)	(11.38)	(1.99)
0-1 count	−0.0846^∗∗∗	0.105^∗∗∗	−0.463^∗∗∗	0.457^∗∗∗
0-1 count	(−62.37)	(160.60)	(−74.12)	(143.97)
0-2 count	−0.169^∗∗∗	0.166^∗∗∗	−0.777^∗∗∗	0.929^∗∗∗
0-2 count	(−45.04)	(262.34)	(−61.82)	(171.56)
1-0 count	0.00130	−0.00233^∗∗	0.00651	−0.0109^∗∗∗
1-0 count	(1.59)	(−2.67)	(1.07)	(−3.56)
1-1 count	−0.0586^∗∗∗	0.0802^∗∗∗	−0.345^∗∗∗	0.326^∗∗∗
1-1 count	(−40.77)	(103.57)	(−47.43)	(93.33)
1-2 count	−0.143^∗∗∗	0.149^∗∗∗	−0.686^∗∗∗	0.768^∗∗∗
1-2 count	(−48.33)	(229.87)	(−64.56)	(171.53)
2-0 count	0.0166^∗∗∗	−0.0321^∗∗∗	0.123^∗∗∗	−0.105^∗∗∗
2-0 count	(14.25)	(−22.07)	(12.43)	(−22.17)
2-1 count	−0.0419^∗∗∗	0.0531^∗∗∗	−0.262^∗∗∗	0.203^∗∗∗
2-1 count	(−22.66)	(47.26)	(−26.13)	(43.40)
2-2 count	−0.104^∗∗∗	0.126^∗∗∗	−0.544^∗∗∗	0.589^∗∗∗
2-2 count	(−36.67)	(157.94)	(−47.40)	(123.06)
3-0 count	0.0434^∗∗∗	−0.107^∗∗∗	0.410^∗∗∗	−0.318^∗∗∗
3-0 count	(41.22)	(−41.94)	(30.07)	(−44.10)
3-1 count	−0.00373	0.0179^∗∗∗	−0.0323^∗	0.0682^∗∗∗
3-1 count	(−1.86)	(9.76)	(−2.32)	(10.16)
3-2 count	−0.0781^∗∗∗	0.103^∗∗∗	−0.436^∗∗∗	0.440^∗∗∗
3-2 count	(−23.41)	(84.28)	(−29.41)	(68.13)
Observations	910,925	2,765,572	910,925	2,765,572

t statistics calculated with robust standard errors in parentheses.

Inning, pitch type, and year-month indicators are controlled for in all specifications.

∗ p < 0.05.

∗∗ p < 0.01.

∗∗∗ p < 0.001.

Table 4.

Umpire Home Bias: Marginal Effects From Probit Estimates.

	(1) Actual strikes	(2) Actual balls
Home batter	−0.0018^∗∗ (−3.11)	0.0061^∗∗∗ (14.56)
Observations	910,925	2,765,572

z statistics calculated with robust standard errors in parentheses.

∗ p < 0.05.

∗∗ p < 0.01.

∗∗∗ p < 0.001.
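As a rough cross-check on how the probit coefficients in Table 3 relate to the marginal effects in Table 4, a probit coefficient can be scaled by the standard normal density evaluated at the index implied by the baseline call rate. The sketch below is a back-of-the-envelope approximation evaluated at the sample-wide accuracy rates from Table 1. It is not the marginal effect calculation used in the paper, which may average over observations.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def marginal_effect_at_rate(probit_coef: float, baseline_rate: float) -> float:
    """Approximate marginal effect: coefficient times the normal density at the baseline index."""
    index = STD_NORMAL.inv_cdf(baseline_rate)  # index value implied by the baseline rate
    return probit_coef * STD_NORMAL.pdf(index)


# Probit coefficients from Table 3; baseline accuracy rates from Table 1.
me_strikes = marginal_effect_at_rate(-0.0118, 0.9088)
me_balls = marginal_effect_at_rate(0.0277, 0.8557)
print(round(me_strikes, 4), round(me_balls, 4))
```

Both approximations land close to the Table 4 values of −0.0018 and 0.0061, which is consistent with the statement that the probit and linear probability magnitudes are comparable.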

I also consider additional confounding factors potentially affecting umpiring accuracy. First, I address the possibility that the dimensions of sporting venues, namely the physical distance between the field and the stands, can affect umpiring decisions (Buraimo et al., 2012; Dawson & Dobson, 2010). I include a stadium dummy for each MLB stadium as well as any neutral-site venue that hosted MLB games. Second, I address the possibility of the “Matthew effect” where player status affects umpiring decisions (Kim & King, 2014). I control for the effect by including pitcher and batter dummies. Table 5 presents the linear probability model estimation results with additional fixed effects included.⁸ The results suggest umpire home bias is not entirely explained by stadium dimensions or Matthew effects, since the coefficients in the table have the same signs as the ones from the main results.

Table 5.

Umpire Home Bias: Including Other Fixed Effects.

	(1)	(2)	(3)	(4)	(5)	(6)
	Actual strikes	Actual balls	Actual strikes	Actual balls	Actual strikes	Actual balls
Home batter	−0.0018^∗∗	0.0062^∗∗∗	−0.0009	0.0057^∗∗∗	−0.0009	0.0057^∗∗∗
Home batter	(−3.11)	(14.58)	(−1.58)	(13.37)	(−1.57)	(13.39)
Umpire FE	Yes	Yes	Yes	Yes	Yes	Yes
Stadium FE	Yes	Yes	No	No	Yes	Yes
Pitcher and Batter FE	No	No	Yes	Yes	Yes	Yes
Observations	910,925	2,765,572	910,691	2,765,438	910,691	2,765,438

t statistics calculated with robust standard errors in parentheses.

Inning, pitch type, and year-month indicators are controlled for in all specifications.

∗ p < 0.05.

∗∗ p < 0.01.

∗∗∗ p < 0.001.

Next, I evaluate whether high-leverage situations affect umpire calls. Table 6 looks at situations when the next pitch may potentially result in a strike out or a walk, and when the scores are close in late innings, defined as tied or 1-run games in the 7th inning and afterwards. Columns (1) and (2) suggest that in high-leverage counts (2-strike or 3-ball counts), umpires are less accurate in their strike calls and more accurate in their ball calls. The interaction terms suggest high-leverage counts have a mixed effect on umpire home bias. While home batters would receive even fewer strike calls that favored them, they would receive slightly fewer ball calls that benefitted them as well. On the other hand, the evidence from high-leverage at-bats is consistent with umpire home bias. Columns (3) and (4) suggest that home batters were more likely to receive favorable ball calls in high-leverage at-bats, which would have worked in their favor, and while they were less likely to receive unfavorable strike calls than away batters, the difference is not statistically significant.

Table 6.

Umpire Home Bias in High-Leverage Situations.

	(1) Actual strikes	(2) Actual balls	(3) Actual strikes	(4) Actual balls
Home batter	−0.0012	0.0067^∗∗∗	−0.0017^∗∗	0.0058^∗∗∗
Home batter	(−1.88)	(12.53)	(−2.75)	(12.81)
I(3-ball or 2-strike)	−0.0755^∗∗∗ (−21.75)	0.104^∗∗∗ (80.63)
Home batter × I(3-ball or 2-strike)	−0.0055^∗∗ (−2.74)	−0.0021^∗ (−2.57)
I(Close game late inning)			−0.0076^∗∗∗ (−5.19)	0.0039^∗∗∗ (3.90)
Home batter × I(Close game late inning)			−0.0014	0.0027^∗
Home batter × I(Close game late inning)			(−0.73)	(2.13)
Observations	910,925	2,765,572	910,925	2,765,572

t statistics calculated with robust standard errors in parentheses.

Inning, pitch type, and year-month indicators are controlled for in all specifications.

∗ p < 0.05.

∗∗ p < 0.01.

∗∗∗ p < 0.001.

Finally, I examine the role of attendance and umpire experience in umpire home bias. Table 7 presents the results. As columns (1) and (2) indicate, higher attendance increased the probability that a strike was accurately called but reduced the probability that a ball was accurately called. The interaction terms, however, reaffirm the preference toward home batters. In particular, an increase in attendance raised the likelihood that home batters, relative to away batters, received correct ball calls. Since home batters on average were already more likely to receive such calls, higher attendance is associated with an increase in umpire home bias. Columns (3) to (6) look at the impact of umpiring experience on umpire home bias. Having more years or games of experience as a home-plate umpire is associated with improved accuracy on actual strikes, while the association with accuracy on actual balls is not statistically significant. However, the interaction terms are small and statistically insignificant, implying that home batters’ likelihood of receiving more favorable calls did not decrease with umpiring experience, which suggests umpire home bias is not significantly related to experience.

Table 7.

Effect of Attendance and Umpire Experience on Umpire Home Bias.

	(1)	(2)	(3)	(4)	(5)	(6)
	Actual strikes	Actual balls	Actual strikes	Actual balls	Actual strikes	Actual balls
Home batter	−0.0026^∗∗	0.0088^∗∗∗	−0.0023	0.0068^∗∗∗	−0.0013	0.0079^∗∗
Home batter	(−2.95)	(13.81)	(−1.15)	(4.79)	(−0.37)	(3.09)
Attendance	0.0047^∗∗∗ (4.18)	−0.0093^∗∗∗ (−11.44)
Home batter × Attendance	−0.0019	0.0065^∗∗∗
Home batter × Attendance	(−1.17)	(5.70)
ln (Year experience)			0.0104^∗∗∗	−0.0022
ln (Year experience)			(6.41)	(−1.92)
Home batter × ln (Year experience)			0.00016	−0.00027
Home batter × ln (Year experience)			(0.21)	(−0.51)
ln (Game experience)					0.0074^∗∗∗	−0.00086
ln (Game experience)					(6.60)	(−1.11)
Home batter × ln (Game experience)					−0.00009	−0.00031
Home batter × ln (Game experience)					(−0.15)	(−0.71)
Observations	907,562	2,754,946	910,925	2,765,572	910,925	2,765,572

t statistics calculated with robust standard errors in parentheses. Inning, pitch type, and year-month indicators are controlled for in all specifications.

∗ p < 0.05.

∗∗ p < 0.01.

∗∗∗ p < 0.001.

An implicit assumption behind umpire home bias is that an unbiased umpire will call balls and strikes based on the official definition of strike zones, and thus, if umpires do not call the pitches accurately, they are biased in favor of one team over the other. However, umpires may have their own perceptions of strike zones and call the pitches accordingly. This implies the calls that seemingly benefit one team over the other may be driven by umpires being “consistently inconsistent” in their strike zone definitions rather than being biased in their calls. To illustrate this possibility with an example, suppose the lower boundary of an umpire's strike zone is consistently below the lower boundary of the official strike zone (i.e., the umpire is “generous” in calling low pitches as strikes). If away batters face more of these low pitches than home batters, the umpire will incorrectly call too many actual balls as strikes according to the official strike zone, even though his calls are consistent with his definition of the strike zone.

While an umpire's perceived strike zone is not directly observable to outsiders, an unbiased umpire should still make the same call for all pitches that cross the plate at the same coordinates. Therefore, as long as the average pitch locations faced by home and away batters are identical, strike zone inconsistency should not affect the likelihood that either group of batters receives the correct call. Table 8 shows the average X- (width) and Z- (height) coordinates of pitches faced by home and away batters. For each of the strike zone areas 1 to 9 and ball zone areas 11 to 14,⁹ the average X- and Z-coordinates of the pitches faced by home and away batters were nearly identical. There is thus little evidence that home and away batters faced systematically different pitch locations.

Table 8.

Pitch Coordinates: Home Vs. Away Batters.

	Home Batter		Away Batter
	X-Coord. Mean	Z-Coord. Mean	X-Coord. Mean	Z-Coord. Mean
Strike Zone
Zone 1	−0.478 (0.136)	3.069 (0.258)	−0.478 (0.136)	3.079 (0.257)
Zone 2	−0.00007 (0.137)	3.065 (0.260)	−0.0007 (0.136)	3.077 (0.259)
Zone 3	0.475 (0.136)	3.066 (0.255)	0.475 (0.136)	3.077 (0.256)
Zone 4	−0.475 (0.136)	2.493 (0.238)	−0.475 (0.137)	2.501 (0.240)
Zone 5	0.003 (0.136)	2.487 (0.240)	0.002 (0.137)	2.494 (0.241)
Zone 6	0.478 (0.136)	2.489 (0.238)	0.477 (0.136)	2.497 (0.237)
Zone 7	−0.472 (0.135)	1.921 (0.222)	−0.471 (0.136)	1.921 (0.223)
Zone 8	0.004 (0.137)	1.909 (0.233)	0.003 (0.136)	1.912 (0.224)
Zone 9	0.477 (0.136)	1.914 (0.221)	0.475 (0.136)	1.916 (0.221)
Ball Zone
Zone 11	−1.098 (0.560)	3.426 (0.656)	−1.101 (0.561)	3.433 (0.655)
Zone 12	0.977 (0.528)	3.384 (0.645)	0.974 (0.525)	3.391 (0.646)
Zone 13	−1.005 (0.558)	1.557 (0.644)	−1.006 (0.557)	1.565 (0.647)
Zone 14	1.016 (0.548)	1.442 (0.674)	1.013 (0.546)	1.442 (0.677)

Standard deviation in parentheses.
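A comparison of this kind can be sketched as a grouped summary. The following is a minimal illustration using only the Python standard library; the record fields (`zone`, `home_batter`, `x`, `z`) and the sample values are hypothetical and are not the variable names or data used in the paper.

```python
from collections import defaultdict
from statistics import mean, stdev

def zone_summary(pitches):
    """Group pitches by (zone, batter side) and return the mean and SD of each coordinate."""
    groups = defaultdict(list)
    for p in pitches:
        groups[(p["zone"], p["home_batter"])].append((p["x"], p["z"]))
    summary = {}
    for key, coords in groups.items():
        xs, zs = zip(*coords)
        summary[key] = {
            "n": len(coords),
            "x_mean": mean(xs), "x_sd": stdev(xs),  # stdev needs at least two pitches
            "z_mean": mean(zs), "z_sd": stdev(zs),
        }
    return summary

# Hypothetical records for a single zone.
pitches = [
    {"zone": 5, "home_batter": True, "x": 0.10, "z": 2.40},
    {"zone": 5, "home_batter": True, "x": -0.10, "z": 2.60},
    {"zone": 5, "home_batter": False, "x": 0.05, "z": 2.45},
    {"zone": 5, "home_batter": False, "x": -0.05, "z": 2.55},
]

summary = zone_summary(pitches)
for (zone, home), s in sorted(summary.items()):
    side = "home" if home else "away"
    print(zone, side, round(s["x_mean"], 3), round(s["z_mean"], 3))
```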

Nonetheless, I perform two additional tests to ensure the results are not entirely driven by umpiring inconsistency. First, I limit the sample to borderline pitches, defined as pitches that cross the plate within 1 inch of the official strike zone boundaries, and re-run the main regressions to check whether the results remain robust. In the second test, I apply a nearest-neighbor matching technique among the borderline pitches. Matching is a statistical technique that compares mean outcomes between two groups (a treatment group that receives a program and a control group that does not) after pairing observations with similar characteristics, so that, as in a randomized experiment, those characteristics are on average the same in both groups. In the context of this paper, nearest-neighbor matching implies comparing the calls on pitches that share similar X- and Z-coordinates.¹⁰ If, relative to away batters, home batters received more favorable calls on average for pitches with similar coordinates, the evidence is consistent with umpire home bias. The main assumption behind both tests is that umpires should give the same call to similar pitches regardless of whether their perceived strike zones are identical to the official strike zone. For the matching test, the sample is also limited to borderline pitches.
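The two steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the estimation procedure used in the paper: the strike zone is a fixed hypothetical rectangle (in feet), matching is one-to-one with replacement on plain Euclidean distance in the X- and Z-coordinates, and the field names and sample pitches are invented for the example.

```python
import math

def distance_to_zone_edge(x, z, left, right, bottom, top):
    """Distance from a pitch to the nearest boundary of a rectangular zone."""
    if left <= x <= right and bottom <= z <= top:
        # Inside the zone: distance to the closest of the four edges.
        return min(x - left, right - x, z - bottom, top - z)
    # Outside the zone: Euclidean distance to the rectangle.
    dx = max(left - x, 0.0, x - right)
    dz = max(bottom - z, 0.0, z - top)
    return math.hypot(dx, dz)

def is_borderline(pitch, zone, tol=1.0 / 12.0):
    """True if the pitch is within `tol` feet (1 inch by default) of the zone boundary."""
    left, right, bottom, top = zone
    return distance_to_zone_edge(pitch["x"], pitch["z"], left, right, bottom, top) <= tol

def matched_difference(pitches):
    """Mean of (home call - matched away call), pairing each home-batter pitch
    with the closest away-batter pitch by location."""
    home = [p for p in pitches if p["home_batter"]]
    away = [p for p in pitches if not p["home_batter"]]
    diffs = []
    for h in home:
        nearest = min(away, key=lambda a: math.hypot(a["x"] - h["x"], a["z"] - h["z"]))
        diffs.append(h["called_strike"] - nearest["called_strike"])
    return sum(diffs) / len(diffs)

ZONE = (-0.83, 0.83, 1.5, 3.5)  # hypothetical (left, right, bottom, top) in feet
pitches = [
    {"x": 0.80, "z": 2.5, "home_batter": True, "called_strike": 0},
    {"x": 0.81, "z": 2.5, "home_batter": False, "called_strike": 1},
    {"x": -0.85, "z": 2.0, "home_batter": True, "called_strike": 0},
    {"x": -0.86, "z": 2.0, "home_batter": False, "called_strike": 0},
    {"x": 0.00, "z": 2.5, "home_batter": True, "called_strike": 1},  # middle of the zone
]

borderline = [p for p in pitches if is_borderline(p, ZONE)]
print(len(borderline), matched_difference(borderline))
```

A negative matched difference means home batters received fewer called strikes than away batters on pitches in nearly the same location, which is the direction consistent with home bias.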

Table 9 presents the results from both tests. As columns (1) and (2) show, limiting the sample to borderline pitches reduces the observations to around 180,000 actual strikes and 360,000 actual balls. However, home batters were still more likely to receive favorable calls from the umpires. The coefficients suggest home batters received fewer strike calls among actual strikes and more ball calls among actual balls, although only the latter estimate is statistically significant. Column (3) presents the nearest-neighbor matching result. Among the borderline pitches that were closest to each other, home batters on average received 0.8 percentage points fewer strike calls than away batters. This difference gave home batters an advantage over away batters. In sum, the results remain consistent with umpire home bias and suggest strike zone inconsistency does not entirely explain the observed bias.

Table 9.

Umpire Home Bias: Borderline Pitches and Matching.

Dependent Variable	(1) Actual Strikes	(2) Actual Balls	(3) Called Strikes
Estimation method	LPM	LPM	Matching
Home batter	−0.00253	0.0103∗∗∗	−0.0082∗∗∗
	(−1.31)	(6.48)	(−8.2)
Observations	179,217	358,679	538,470

t statistics calculated with robust standard errors in parentheses.

Inning, pitch type, and year-month indicators are controlled for in all specifications.

∗ p < 0.05. ∗∗ p < 0.01. ∗∗∗ p < 0.001.
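The significance markers can be cross-checked against the reported t statistics. The sketch below assumes a standard normal approximation for the two-sided p-value, which is reasonable at these sample sizes but is an assumption of this illustration rather than a statement of how the paper computed its p-values. The function names are hypothetical.

```python
import math

def two_sided_p(t_stat):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

def stars(p_value):
    """Map a p-value to the significance markers used in the tables."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""

# t statistics reported for Home batter in Table 9, columns (1) to (3).
for t in (-1.31, 6.48, -8.2):
    p = two_sided_p(t)
    print(f"t = {t:>6}: p = {p:.4g} {stars(p)}")
```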

Discussion

The findings of umpire home bias are consistent with the referee bias observed in soccer and other sports, as summarized in Dohmen and Sauermann (2016). On the other hand, it is somewhat surprising that umpire home bias persists even in an environment where relatively objective measures of umpire performance exist and umpiring bias should therefore be relatively easy to identify. Furthermore, while monitoring technologies such as QuesTec have been shown to improve umpiring performance by providing feedback to umpires (Mills, 2017b), the evidence in this paper suggests improved performance does not necessarily translate into reduced bias. Monitoring technologies were implemented throughout MLB during the sample period, yet the results suggest there is still a modest, yet statistically significant, umpiring preference in favor of the home team.

The existence of evaluation bias suggests that the employers of evaluators may need to provide extra incentives, whether performance bonuses or penalties, to increase evaluators’ efforts in reducing bias. It also justifies a deeper understanding of the potential factors driving evaluation bias. The impact of leverage or fan attendance on call accuracy has been well documented in Archsmith et al. (2021) and Smith and Groetzinger (2010). The results in this paper suggest fan attendance had the strongest impact on umpiring bias among the factors examined. This is consistent with the social pressure hypothesis, where a one-sided fanbase pressures the umpire to award more favorable calls to the home team.

On the other hand, the impact of umpiring experience on call accuracy is somewhat ambiguous in theory. Human capital accumulation theory (e.g., Mincer, 1974) suggests that umpire productivity should be positively related to umpiring experience, and therefore that more experienced umpires should be less affected by home crowds in their decisions. However, older and more experienced umpires may also develop longer-run relationships with the teams they referee, which may result in a systematic bias favoring these teams (Hlasny & Kolaric, 2017). While the results suggest umpire performance improves with experience, which is consistent with the human capital accumulation channel, there is not enough evidence to suggest that umpiring bias declines with experience.

Several policy implications can be drawn from the results. First, an improvement in evaluator performance does not necessarily reduce evaluation bias. The results in this paper suggest the degree of home bias is similar between experienced and inexperienced umpires. This implies the same factors that may pressure inexperienced umpires to make favorable calls for home teams may still affect experienced umpires. Therefore, instead of relying on experience, evaluators will need to recognize the sources of bias and eliminate their impact to reduce evaluation bias. Second, to reduce evaluation bias, there needs to be an unbiased party to monitor the evaluators. When the monitors themselves are biased, as with a one-sided crowd at a baseball game, any evaluation bias will be exacerbated.

Conclusion

Evaluators are crucial in assessing worker productivity. It is therefore important to ensure evaluators are unbiased in their evaluations. However, assessing the performance of evaluators can be challenging. In this paper, I provide some evidence regarding evaluators’ bias and potential explanations for it using umpiring data from MLB. The results are consistent with umpiring home bias, as the call errors disproportionately favored home batters over away batters. Social pressure that increased the cost of making unfavorable calls also contributed to the home bias. Umpires were less likely to make unfavorable mistakes against home batters when games were well attended. Furthermore, home bias did not disappear with more experience. Umpires with more home-plate umpiring experience had similar rates of home bias as umpires with less experience.

There are a few possible directions for future research. First, given that leagues stand to benefit from unbiased evaluators, it is important to evaluate whether productivity-increasing policies also reduce evaluation bias. As the results of this study indicate, improvements in evaluation productivity do not always guarantee a reduction in evaluation bias. Second, the results also warrant a deeper investigation into the implicit reasons behind the bias. As the famous example in Price and Wolfers (2010) showed, once NBA referees became aware of the racial bias in foul calls, they no longer displayed any preferences in their subsequent calls. Understanding the reasons behind home bias and providing incentives to reduce it could help mitigate the principal-agent problem in umpiring.

Footnotes

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

Author Biography

Mike Hsu is an Assistant Professor of Economics at Valparaiso University. His research interests include sports economics and international trade. He received his PhD in Economics from the University of Houston.

References

Archsmith; Heyes, A. G.; Neidell; Sampat, B. N. (2021). The dynamics of inattention in the (baseball) field. NBER Working Paper, (w28922).

Associated Press (2023). Robo umps reach Triple-A, but MLB rollout still uncertain. Retrieved August 1, 2023, from https://apnews.com/article/robo-umps-abs-triplea-ccc901dc69c6101fb6a793e5fe867a77

Balmer, N. J.; Nevill, A. M.; Lane, A. M. (2005). Do judges enhance home advantage in European Championship boxing? Journal of Sports Sciences, 23(4), 409–416. https://doi.org/10.1080/02640410400021583

Bradbury, J. C. (2019). Monitoring and employee shirking: Evidence from MLB umpires. Journal of Sports Economics, 20(6), 850–872. https://doi.org/10.1177/1527002518808350

Bryson; Dolton; Reade, J. J.; Schreyer; Singleton (2021). Causal effects of an absent crowd on performances and refereeing decisions during COVID-19. Economics Letters, 198, 109664. https://doi.org/10.1016/j.econlet.2020.109664

Buraimo; Simmons; Maciaszczyk (2012). Favoritism and referee bias in European soccer: Evidence from the Spanish league and the UEFA Champions League. Contemporary Economic Policy, 30(3), 329–343. https://doi.org/10.1111/j.1465-7287.2011.00295.x

Cueva (2020). Animal spirits in the beautiful game. Testing social pressure in professional football during the COVID-19 lockdown. Center for Open Science.

Dawson; Dobson (2010). The influence of social pressure and nationality on individual decisions: Evidence from the behaviour of referees. Journal of Economic Psychology, 31(2), 181–191. https://doi.org/10.1016/j.joep.2009.06.001

Dohmen; Sauermann (2016). Referee bias. Journal of Economic Surveys, 30(4), 679–695. https://doi.org/10.1111/joes.12106

Dohmen, T. J. (2008). The influence of social forces: Evidence from the behavior of football referees. Economic Inquiry, 46(3), 411–424. https://doi.org/10.1111/j.1465-7295.2007.00112.x

Endrich; Gesche (2020). Home-bias in referee decisions: Evidence from “ghost matches” during the Covid19-pandemic. Economics Letters, 197, 109621. https://doi.org/10.1016/j.econlet.2020.109621

Fesselmeyer (2021). The impact of temperature on labor quality: Umpire accuracy in Major League Baseball. Southern Economic Journal, 88(2), 545–567. https://doi.org/10.1002/soej.12524

Garicano; Palacios-Huerta; Prendergast (2005). Favoritism under social pressure. Review of Economics and Statistics, 87(2), 208–216. https://doi.org/10.1162/0034653053970267

Gilliland (2022). Skillalytics MLB umpire rating system. https://github.com/skillalytics/mlb-umpire-rating-system

Gong (2022). The effect of the crowd on home bias: Evidence from NBA games during the COVID-19 pandemic. Journal of Sports Economics, 23(7), 950–975. https://doi.org/10.1177/15270025211073337

Guerette; Blais; Fiset (2021). The absence of fans removes the home advantage associated with penalties called by national hockey league referees. PLOS One, 16(8), e0256568. https://doi.org/10.1371/journal.pone.0256568

Hlasny; Kolaric (2017). Catch me if you can: Referee–team relationships and disciplinary cautions in football. Journal of Sports Economics, 18(6), 560–591. https://doi.org/10.1177/1527002515588955

Kim, J. W.; King, B. G. (2014). Seeing stars: Matthew effects and status bias in major league baseball umpiring. Management Science, 60(11), 2619–2644. https://doi.org/10.1287/mnsc.2014.1967

Lopez, M. J.; Mills, B. M. (2019). Opportunistic shirking behaviour during unpaid overtime. Applied Economics Letters, 26(7), 608–612. https://doi.org/10.1080/13504851.2018.1488048

Mahalanobis, P. C. (1936). On the generalized distance in statistics. National Institute of Science of India.

Major League Baseball (2022). Strike zone. https://www.mlb.com/glossary/rules/strike-zone

Mills, B. M. (2017a). Policy changes in Major League Baseball: Improved agent behavior and ancillary productivity outcomes. Economic Inquiry, 55(2), 1104–1118. https://doi.org/10.1111/ecin.12396

Mills, B. M. (2017b). Technological innovations in monitoring and evaluation: Evidence of performance impacts among Major League Baseball umpires. Labour Economics, 46, 189–199. https://doi.org/10.1016/j.labeco.2016.10.004

Mincer, J. A. (1974). Schooling, experience, and earnings. National Bureau of Economic Research.

Parsons, C. A.; Sulaeman; Yates, M. C.; Hamermesh, D. S. (2011). Strike three: Discrimination, incentives, and evaluation. American Economic Review, 101(4), 1410–1435. https://doi.org/10.1257/aer.101.4.1410

Petti; Gilani (2022). baseballr: Acquiring and analyzing baseball data. https://billpetti.github.io/baseballr/, https://github.com/BillPetti/baseballr.

Price; Remer; Stone, D. F. (2012). Subperfect game: Profitable biases of NBA referees. Journal of Economics & Management Strategy, 21(1), 271–300. https://doi.org/10.1111/j.1530-9134.2011.00325.x

Price; Wolfers (2010). Racial discrimination among NBA referees. Quarterly Journal of Economics, 125(4), 1859–1887. https://doi.org/10.1162/qjec.2010.125.4.1859

Rader, B. G.; Winkle, K. J. (2008). Baseball’s great hitting barrage of the 1990s (and beyond) reexamined. NINE: A Journal of Baseball History and Culture, 17(1), 70–96. https://doi.org/10.1353/nin.0.0015

Reade, J. J.; Schreyer; Singleton (2021). Stadium attendance demand during the COVID-19 crisis: Early empirical evidence from Belarus. Applied Economics Letters, 28(18), 1542–1547. https://doi.org/10.1080/13504851.2020.1830933

Rickman; Witt (2008). Favouritism and financial incentives: A natural experiment. Economica, 75(298), 296–309. https://doi.org/10.1111/j.1468-0335.2007.00605.x

Singleton; Bryson; Dolton; Reade, J. J.; Schreyer (2021). Economics lessons from sports during the Covid-19 pandemic. In Research Handbook on Sport and COVID-19.

Smith, E. E.; Groetzinger, J. D. (2010). Do fans matter? The effect of attendance on the outcomes of major league baseball games. Journal of Quantitative Analysis in Sports, 6(1). https://doi.org/10.2202/1559-0410.1192

Tainsky; Mills, B. M.; Winfree, J. A. (2015). Further examination of potential discrimination among MLB umpires. Journal of Sports Economics, 16(4), 353–374. https://doi.org/10.1177/1527002513487740