Abstract
This study introduces a nested bivariate binomial (BVB) regression model under a Bayesian logistic regression framework to evaluate offensive efficiency in football, with application to the 2023 Brazilian Série A league. The BVB model characterizes offensive performance as a two-stage process: (i) the number of shots on target given total shot attempts, and (ii) the number of goals scored given shots on target. Two financial covariates, club investment during the 2023 season and estimated market value, are incorporated at both stages to assess their association with team efficiency. The analysis is based on data from 20 teams across the first 12 rounds of the season, a window chosen to limit mid-season variability and transfer effects. Results indicate that higher market value is positively associated with scoring efficiency, whereas recent financial investment does not exhibit a significant effect. These findings constitute an initial investigation into the influence of financial variables on offensive performance in football. By introducing the BVB model within a sports analytics framework, the study provides preliminary evidence of the explanatory and predictive utility of financial indicators, particularly market value, in modeling sporting outcomes.
Introduction
Football (also known as soccer) stands as the most widely followed sport globally, engaging billions of fans across diverse cultural and geographic contexts. Its global prominence is not only evident in its vast audience but also in the extensive economic systems that underpin the professional game. Modern football clubs operate as multifaceted enterprises, allocating substantial resources to player acquisitions, salaries, training infrastructure, and strategic planning. As a result, the sport increasingly reflects a complex interplay between financial investment and on-field performance, with club success often contingent upon effective financial management and resource allocation.
According to the McKinsey & Company report “The Value Pitch: The Importance of Team Value Management”, football clubs should regard their player rosters as portfolios of assets rather than mere cost centers. The report’s analysis of the 69 most valuable European clubs over five years demonstrates that activities such as developing contracted players, promoting youth talent, and optimizing player trading contributed approximately €13 billion of the total €17.2 billion increase in team market value. Importantly, this internally generated value frequently exceeded that derived from net investment in new acquisitions. For more than 70% of these clubs, including prominent teams like Liverpool and Real Madrid, effective asset management was the primary driver of value creation.
Building upon this strategic perspective, a substantial and growing body of academic research has examined the intricate relationships between financial inputs and sporting performance in football. For instance, Carmichael et al. 1 investigated the English Premier League and found that sustained competitiveness is strongly linked to prudent financial management, particularly the alignment between wage expenditure and revenue generation. In parallel, Dimitropoulos and Scafarto 2 analyzed the effects of UEFA’s Financial Fair Play regulations on Italian football, illustrating how budgetary discipline influences both club financial health and competitive outcomes.
Complementary to these studies, the authors of 3 proposed a composite evaluation framework that integrates business, financial, and sporting dimensions to assess the overall performance of French football clubs, advocating a multidimensional approach to understanding success. Similarly, Guzmán and Morrow 4 employed data envelopment analysis to evaluate technical efficiency among English Premier League teams, revealing key factors that affect productivity and resource utilization.
Further studies have expanded the analysis to include governance and organizational factors. For example, Ruta et al. 5 demonstrated that institutional governance structures critically influence club performance by shaping strategic decision-making processes. Meanwhile, Sakinc et al. 6 examined the phenomenon of soft budget constraints in football, cautioning that expectations of external financial support may induce strategic mismanagement and undermine sustainability. Wicker et al. 7 emphasized the complexity of linking monetary incentives to player effort and productivity, reinforcing the nuanced relationship between financial inputs and output quality. Moreover, Wilson et al. 8 showed that ownership models in the English Premier League significantly impact sporting outcomes, underscoring the role of governance and stakeholder engagement in performance.
From an applied forecasting and betting market perspective, the author of 9 proposed a predictive model, reported to be profitable, for the over/under betting market in professional football. The study focuses on the over/under 2.5 goals market, a popular form of total goals betting, and evaluates the profitability of statistical forecasts when applied to real betting odds. Using historical match data from various European leagues, the author develops a logistic regression model that incorporates a range of covariates (including recent team performance, home advantage, and head-to-head statistics) to estimate the probability that a match will exceed 2.5 total goals.
Based on the context above, this study applies a nested bivariate binomial (BVB) regression model to data from the 2023 season of the Brazilian Série A league. Originally introduced by Crowder and Sweeting, 10 the BVB model is well-suited for representing two hierarchically dependent binomial processes, thereby allowing the modeling of sequential outcomes commonly observed in sports analytics. In the context of offensive efficiency, we define a two-stage process: (i) the number of shots on target (i.e., goal attempts with accurate direction) given the total number of shots taken, and (ii) the number of goals scored given the number of shots on target.
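The two-stage process described above can be illustrated with a short simulation. This is a minimal sketch rather than the authors' implementation: the function name, the probabilities 0.35 and 0.30, and the 150 shots per team are hypothetical values chosen only for illustration.

```python
import numpy as np

def simulate_nested_binomial(shots, p_on_target, p_goal, rng):
    """Draw (shots on target, goals) from the two-stage nested process."""
    shots = np.asarray(shots)
    # Stage 1: shots on target out of total shots attempted.
    on_target = rng.binomial(shots, p_on_target)
    # Stage 2: goals out of shots on target (nested within stage 1).
    goals = rng.binomial(on_target, p_goal)
    return on_target, goals

rng = np.random.default_rng(2023)
shots = np.full(20000, 150)  # hypothetical: 150 shots for each simulated team
on_target, goals = simulate_nested_binomial(shots, 0.35, 0.30, rng)

# The nesting guarantees goals <= on_target <= shots, and the marginal
# goal rate per shot is the product of the two stage probabilities.
print(on_target.mean() / 150)  # close to 0.35
print(goals.mean() / 150)      # close to 0.35 * 0.30 = 0.105
```

Because the second draw uses the first-stage count as its number of trials, the two counts are positively dependent by construction, which is the feature the BVB model is meant to capture.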
This study also extends the BVB framework by incorporating financial covariates at both stages to examine whether club-level economic factors are associated with offensive efficiency. Specifically, we include two financial indicators: (1) total investment by the club during the 2023 season, measured in millions of Brazilian reais, and (2) the estimated market value of the club, measured in millions of euros. The total investment reflects short-term financial commitments related to player acquisitions and infrastructure, whereas market value captures longer-term assessments of player quality, squad stability, and perceived future potential. However, it is important to acknowledge that these financial covariates do not fully conform to the conventional definition of covariates as pre-treatment or exogenous variables observed prior to the outcomes of interest. In the present context, both financial indicators are measured contemporaneously with the performance data and may therefore be influenced by—or evolve in parallel with—the very outcomes they are intended to explain. Accordingly, the model should be interpreted primarily as an explanatory or retrospective framework, rather than one designed for predictive inference. The inclusion of these variables aims to assess whether cross-club financial disparities are systematically associated with variation in offensive efficiency during the early stages of the season.
The contributions of this study are threefold. First, it applies the nested bivariate binomial (BVB) model to capture team-level nested proportions in a sports analytics context, specifically modeling sequential offensive actions in football. Second, the study extends this framework by incorporating financial covariates at both stages of the model to examine the association between club-level economic factors and offensive efficiency. Third, it provides a comparative evaluation of model performance with and without financial covariates, using Bayesian model selection criteria to assess the explanatory value added by economic variables. To the best of our knowledge, this is the first application of a nested BVB model incorporating covariates within football analytics, providing novel insights into the interplay between financial investment and on-field outcomes.
This paper is organized as follows: Section “Materials and Methods” presents the theoretical background of the BVB model, detailing its formulation, assumptions, and mathematical properties. Section “Inference Methods” details the Bayesian inferential framework employed for parameter estimation, including prior specification, computational algorithms, and model validation techniques used to assess goodness-of-fit. Section “Results” presents the application of the methodology to data from the Brazilian Série A league’s 2023 season and discusses the interpretation of the results. Finally, Section “Concluding Remarks” closes the manuscript.
Materials and methods
Study data
The analysis is based on data from the Brazilian Série A football league, comprising performance and financial indicators for 20 teams over the first 12 rounds of the 2023 season (Table 1). This early-season window was intentionally selected to ensure consistency in the number of matches played across all clubs, thereby minimizing potential biases arising from unequal scheduling or evolving team circumstances. By focusing on this initial phase, the study seeks to assess whether financial disparities among clubs are already associated with differences in offensive efficiency, prior to the full realization of cumulative performance outcomes and the potential endogeneity introduced by league standings and results-based adjustments. Furthermore, this restriction helps to mitigate the influence of mid-season confounders—such as player transfers, tactical reconfigurations, injuries, fixture congestion, and accumulated fatigue—that could obscure or distort the underlying relationships between financial structure and on-field performance over the course of a full season.
Table 1. Dataset for the 20 football clubs in the Brazilian Série A league (2023).
Model formulation
To analyze the Brazilian Série A league data, we employed the nested bivariate binomial (BVB) distribution introduced by Crowder and Sweeting. 10 Because this model accommodates two hierarchically dependent binomial processes, we focused on a two-step process in offensive efficiency:
(i) The number of goal kicks (i.e., shots on target) out of the total number of kicks attempted, denoted by
(ii) The number of goals scored out of the number of goal kicks, denoted by
Marginally,
Conditionally on
In this way, defining the bivariate random vector
The moment generating function (MGF) of
In this way, from Equations (3) and (4), the correlation coefficient for the BVB is given by:
To incorporate covariate information into the analysis, we extend this formulation to a regression framework, incorporating a set of common covariates into both levels of the nested BVB model. Specifically, let
Inference methods
In this section, we describe the Bayesian inferential procedure in two stages. First, we present the inference for the model without covariates, which establishes a baseline specification and allows assessment of the model structure under minimal assumptions. Next, we extend the model to incorporate financial covariates through logistic regression formulations. All models are estimated using a fully Bayesian framework implemented via Markov Chain Monte Carlo (MCMC) methods, and model comparison is conducted through the Deviance Information Criterion (DIC).
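As an illustration of the covariate stage, the sketch below fits a two-stage logistic model by random-walk Metropolis on synthetic data. It is a hedged example, not the authors' code: the sampler choice, the N(0, 10^2) priors, the step size, the single standardized covariate, and all "true" coefficient values are assumptions made here for illustration. It relies on the fact that, given the nesting, the likelihood factorizes into a binomial term for shots on target and a binomial term for goals.

```python
import numpy as np

def log_posterior(theta, X, shots, on_target, goals, prior_sd=10.0):
    """Log posterior, up to a constant, of the two-stage logistic model."""
    k = X.shape[1]
    eta1 = X @ theta[:k]   # stage 1 linear predictor (on target | shots)
    eta2 = X @ theta[k:]   # stage 2 linear predictor (goals | on target)
    # Binomial log-likelihood with logit link: y*eta - n*log(1 + exp(eta)).
    ll1 = np.sum(on_target * eta1 - shots * np.logaddexp(0.0, eta1))
    ll2 = np.sum(goals * eta2 - on_target * np.logaddexp(0.0, eta2))
    # Independent zero-mean normal priors (assumed hyperparameters).
    log_prior = -0.5 * np.sum(theta**2) / prior_sd**2
    return ll1 + ll2 + log_prior

def metropolis(X, shots, on_target, goals, n_iter=20000, step=0.05, seed=0):
    """Random-walk Metropolis sampler over all regression coefficients."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2 * X.shape[1])
    current = log_posterior(theta, X, shots, on_target, goals)
    draws = np.empty((n_iter, theta.size))
    for t in range(n_iter):
        proposal = theta + step * rng.standard_normal(theta.size)
        candidate = log_posterior(proposal, X, shots, on_target, goals)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        draws[t] = theta
    return draws

# Synthetic data for 20 hypothetical teams with one standardized covariate.
rng = np.random.default_rng(1)
n_teams = 20
X = np.column_stack([np.ones(n_teams), rng.standard_normal(n_teams)])
shots = rng.integers(120, 200, size=n_teams)
true_beta = np.array([-0.6, 0.0])    # hypothetical stage 1 coefficients
true_gamma = np.array([-0.9, 0.3])   # hypothetical stage 2 coefficients
on_target = rng.binomial(shots, 1.0 / (1.0 + np.exp(-(X @ true_beta))))
goals = rng.binomial(on_target, 1.0 / (1.0 + np.exp(-(X @ true_gamma))))

draws = metropolis(X, shots, on_target, goals)
posterior = draws[5000:]  # discard burn-in
print(posterior.mean(axis=0))                          # posterior means
print(np.percentile(posterior, [2.5, 97.5], axis=0))   # 95% credible intervals
```

A coefficient whose 95% credible interval excludes zero would be read as evidence of an association at that stage. In practice one would also check convergence (for example with multiple chains and trace plots) before interpreting the summaries.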
Likelihood function
Given a dataset consisting of
These estimators represent the overall empirical proportions across all observations. Under standard regularity conditions and for large
Bayesian inference
In this section, we consider a fully Bayesian framework to estimate the parameters
For both prior choices, posterior summaries such as means, medians, credible intervals, and highest posterior density (HPD) regions can be derived analytically from the respective Beta distributions. Now, to incorporate covariate information into the BVB model, we extend the parameterization of
Model discrimination criterion
As a model comparison criterion, we considered the Deviance Information Criterion (DIC), which is suitable for comparing models estimated via posterior simulation. 13
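For reference, the standard DIC is computed from posterior draws as the posterior mean deviance plus the effective number of parameters, where the latter is the mean deviance minus the deviance at the posterior mean. The sketch below applies this to a no-covariate nested binomial model with one common probability per stage. It is an illustration under stated assumptions: the Beta(1, 1) priors, the four-team data set, and the function names are hypothetical and are not taken from the study.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma)

def log_binom_coef(n, k):
    """Elementwise log of the binomial coefficient C(n, k)."""
    return _lgamma(n + 1) - _lgamma(k + 1) - _lgamma(n - k + 1)

def deviance(p1, p2, shots, on_target, goals):
    """Minus twice the log-likelihood of the no-covariate nested model."""
    ll = (log_binom_coef(shots, on_target)
          + on_target * np.log(p1) + (shots - on_target) * np.log1p(-p1)
          + log_binom_coef(on_target, goals)
          + goals * np.log(p2) + (on_target - goals) * np.log1p(-p2))
    return -2.0 * np.sum(ll)

def dic_baseline(shots, on_target, goals, a=1.0, b=1.0, n_draws=5000, seed=0):
    """DIC from conjugate Beta posterior draws (assumed Beta(a, b) priors)."""
    rng = np.random.default_rng(seed)
    # Conjugate updates: successes and failures pooled over teams.
    p1 = rng.beta(a + on_target.sum(), b + (shots - on_target).sum(), n_draws)
    p2 = rng.beta(a + goals.sum(), b + (on_target - goals).sum(), n_draws)
    dev = np.array([deviance(u, v, shots, on_target, goals)
                    for u, v in zip(p1, p2)])
    d_bar = dev.mean()                                   # mean deviance
    d_hat = deviance(p1.mean(), p2.mean(), shots, on_target, goals)
    p_d = d_bar - d_hat                                  # effective parameters
    return d_bar + p_d, p_d

# Hypothetical counts for four teams (not the study's data).
shots = np.array([150, 140, 160, 155])
on_target = np.array([52, 45, 60, 50])
goals = np.array([15, 12, 20, 14])

dic, p_d = dic_baseline(shots, on_target, goals)
print(dic, p_d)  # p_d should be close to 2, the number of free parameters
```

Lower DIC values indicate a better trade-off between fit and complexity, so the same computation applied to each candidate model gives a direct basis for comparison.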
The DIC is used here to evaluate competing Bayesian specifications—namely, models without covariates (using distinct prior choices) and models incorporating financial covariates through logistic regression. In this case, let
Results
To illustrate the proposed methodology, we use the Brazilian Série A league data introduced in Table 1. Figure 1 displays scatter plots of the performance metrics against the covariates: (i) the number of goal kicks (i.e., shots on target) out of the total number of kicks attempted by each team, and (ii) the number of goals scored out of the number of goal kicks. These quantities are examined in relation to the clubs’ investment in the 2023 season (measured in million Brazilian reais) and their estimated market value (measured in million euros).

Figure 1. Scatter plots of the number of goal kicks (i.e., shots on target) relative to the total number of kicks attempted, and the number of goals scored relative to the number of goal kicks for each team, plotted against the covariates: investment in 2023 (in million Brazilian reais) and club market value (in million euros).
As illustrated in Figure 1, no discernible relationship is observed between either the 2023 financial investment or the club’s market value and the proportion of goal kicks (defined as shots on target) relative to the total number of attempts. On the other hand, the proportion of goals from successful goal kicks (
Consistent with the descriptive observations in Figure 1, our initial analysis involved fitting the BVB model without incorporating the financial covariates investment in 2023 (measured in million Brazilian reais) and club market value (measured in million euros). Under this baseline specification, the random variable
Table 2. Estimates of parameters (posterior means).
The parameter
To further investigate the influence of financial factors on offensive efficiency within a model-based framework, we extended the baseline BVB model by incorporating the financial covariates—total investment in 2023 and club market value—at both stages, as detailed in Section “Materials and Methods”. As prior distributions, we adopted independent normal priors with hyperparameters specified as
Table 3. Posterior summaries for the BVB model in the presence of covariates.
The posterior summaries presented in Table 3 indicate that neither total investment in 2023 nor club market value exerts a statistically significant effect on the probability of successful goal kicks (the 95% credible intervals for the regression parameters include zero). However, for the goal-scoring stage, the coefficient associated with club market value (
Building on the previous findings, it is instructive to situate these results within the broader context of recent financial and organizational transformations among Brazilian Série A clubs. A salient example is the case of Botafogo, which experienced a pronounced revival under the ownership of U.S. investor John Textor. Following a period marked by financial distress, Botafogo successfully restructured its operations, culminating in triumphs in both the Campeonato Brasileiro Série A and the Copa Libertadores in 2024. 14 This resurgence is widely attributed to substantial capital injections and the implementation of a novel managerial framework, enabled by the 2021 legislative reform permitting Brazilian football clubs to transition into limited liability companies. 15
The positive and statistically significant association identified in the BVB model between club market value and the likelihood of scoring from successful goal kicks (
Concluding remarks
This study investigated the performance dynamics of Brazilian Série A clubs through the application of a nested bivariate binomial (BVB) model, focusing on the association between financial indicators (specifically club market value and investment) and offensive efficiency. The model allowed for the examination of how these financial factors relate to two sequential outcomes: the number of successful goal kicks (
In contrast, no statistically significant association was observed between financial investment and the probability of successful goal kicks (
Footnotes
Acknowledgments
The authors thank the editor and referees for their valuable suggestions that led to significant improvements to the manuscript.
Ethical considerations
This article does not contain any studies with human or animal participants.
Consent to participate
There are no human participants in this article and informed consent is not required.
Consent for publication
Not applicable.
Declaration of conflicting interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding statement
The author(s) received no financial support for the research, authorship, and/or publication of this article.
