Abstract
By modelling the results of sport matches as a set of paired fixed effect linear models, this article shows that traditional scoring outputs can be used to do inference on parameters related to the net relative strength or weakness of teams within a league. As hypothesis testing methods, we propose both a normal-based and a non-parametric permutation-based approach. As an extension to the round-robin design of the ranking methodology recently proposed by Arboretti Giancristofaro et al. (2014) and Corain et al. (2016), the results of pairwise testing are then exploited to provide a ranking of teams within a league. Through an extensive Monte Carlo simulation study, we investigated the properties of the proposed testing and ranking methodology and assessed its validity under different random distributions. In its simplest univariate version, the proposed methodology allows us to infer on the teams' average net scoring within a league, while in its more intriguing multivariate layout it is suitable for detecting any team-related global dominance using a wide set of performance indicators. Finally, by using traditional basketball box scores, we present an application to the Italian Basket League.
Introduction and motivation
A round-robin tournament is a sport competition in which each competitor plays, one at a time, against all other competitors. Most sports leagues play a double round-robin tournament in which teams meet twice, but single, triple and quadruple round-robin tournaments also occur (Rasmussen and Trick, 2008). Besides sports, the round-robin design is widely used by psychologists to investigate social relations models and social network issues (Kenny et al., 2006). With the goal of comparing several foods, beverages, etc., the round-robin design is also used in sensory science, exploiting the sensory evaluations provided by a set of panellists on all possible pairs of the investigated items (Meilgaard et al., 2006).
Statistical models for the round-robin design are distinctive in that they are suited to handling pairs of dependent responses; in other words, they are specifically designed for the analysis of paired comparison data (Cattelan, 2012). By referring to mixed effect models, psychometricians have extensively investigated social networks using the variance components approach, where subjects within a network are handled as a random factor (Nestler, 2016). It is worth noting that the variance components approach is not actually suitable for data sport analytics, where team effects must act as fixed parameters in order to detect possible significant differences among competitors. In spite of the close affinity between the round-robin design and the running of team- and individual-sports tournaments, inference on the parameters of the round-robin design has not been deeply investigated in the data sport analytics literature.
Since the scores of sport teams fluctuate over time and are partially affected by natural random variability, suitable modelling and inferential methods for performance indicators are needed for both descriptive and explanatory purposes. After modelling the results of sport matches as a set of paired fixed effect linear models, and based on the concept of multivariate stochastic dominance, the aim of this work is to propose a reference framework for modelling, testing and ranking on multivariate scoring sport data.
The present article is organized as follows: Section 2 provides a literature review on modelling for data sport outcomes; Section 3 outlines the adopted modelling for round-robin design and provides the description of two proposed procedures aimed at doing inference on a set of multivariate parameters of interest. In Section 4, the main results of a simulation study are shown and discussed, while Section 5 presents an application to basketball and finally Section 6 deals with conclusions, final remarks and some suggestions for future perspectives. In the Appendix, some additional details on the real case study application are provided.
Modelling for data sport outcomes
There is a considerable literature on modelling the scores of the two opposing teams (or players) for data sport analytics purposes. Harville (2003) proposed a linear model for the expected difference in score in each game of basketball or football as a difference in team effects plus or minus a home court/field advantage. In order to fit the related linear model, the author proposed a modified least squares estimation method and used the team estimates both to rank the teams within a league (for use in the selection or seeding procedures) and to predict the outcomes of postseason games. Such a ranking system was also considered by Stefani (1980), Stern (1992), Stern (1995) and Harville and Smith (1994). An alternative approach models the scores of the two opponents separately. Karlis and Ntzoufras (2003) proposed applying a bivariate Poisson distribution with a dependence parameter between the number of goals scored by the two teams, and then adapted the model in order to inflate the probabilities in the case of a draw. Adam (2016) tried to extend the bivariate Poisson approach by means of a generalized linear model where the scores were modelled as the joint probability of a Poisson distribution, representing the total number of goals, and a binomial distribution, representing the goals of one team given that total number of goals.
By mainly focusing on the probability of winning or losing sports matches, a second relevant strand of research in data sport outcome modelling has been inspired by the Bradley–Terry model (Bradley and Terry, 1952). As pointed out by Chan (2011), linear models for paired comparisons, the Bradley–Terry model and the Thurstone–Mosteller model in particular, have been widely used in sports for ranking and rating purposes. Cattelan et al. (2013) introduced a dynamic extension of the Bradley–Terry model for paired comparison data to model the outcomes of sporting contests, allowing for time-varying abilities. By combining the Bradley–Terry model and a non-linear model that uses an exponential distribution to describe longitudinal changes in scale values, Usami (2017) proposed a procedure for Bayesian longitudinal paired comparison data analysis to rank sumo players. An alternative approach to modelling the outcomes of matches in terms of win–draw–loss has been proposed by Goddard and Asimakopoulos (2004) via an ordered probit model. By specifying the probability of the match outcome as a function of the difference in abilities of the two teams, an ordered probit model was also adopted by Koning (2000).
Finally, thanks to the availability of more insightful sports data, also provided by emerging player tracking systems that enable a richer quantitative characterization of performance, a third research area for data sport outcome modelling has developed around the concepts of stochastic processes and machine learning techniques. Franks et al. (2015) combined spatial and spatio-temporal processes, matrix factorization techniques and hierarchical regression models with player tracking data to model and advance the state of defensive analytics in basketball. Cervone et al. (2016) proposed a framework for using optical player tracking data to estimate, in real time, the expected number of points obtained by the end of a possession (EPV), derived from a stochastic process model for the evolution of a basketball possession. They model this process at multiple levels of resolution, differentiating between continuous, infinitesimal movements of players and discrete events such as shot attempts and turnovers. Vracar et al. (2016) presented a methodology for modelling and forecasting a basketball match between two distinct teams as a sequence of team-level play-by-play in-game events, where the authors assumed a Markov property and modelled state transitions with a logistic regression model. By exploiting play-by-play basketball logs, Chen and Fan (2016) proposed a functional data analysis approach to model the observed score difference, viewed as the realization of a latent continuous intensity process. This approach has several useful features, such as the ability to define and numerically characterize momentum in basketball games.
Testing and ranking on round-robin design for data sport analytics
Let us consider a tournament designed as a round-robin championship where the competitors are supposed to challenge one another pairwise. More formally, suppose that a round-robin design involves a sports tournament with
The competition between two opponents produces a pair of outcomes, one for each competitor, in the form of scalars or, more often, vectors. Since a tournament generally takes place over a given time span, the outcomes for each competitor can also be viewed as longitudinal multivariate observations of performance indicators.
By referring to the literature of social relations models for round-robin design often known as dyadic data (Kenny et al., 2006), when opponents
Without loss of generality, let us assume that for each
In order to model the performances of the two
where
In order to focus our attention on a new parametrization highlighting the net effect due to attack plus defense, let us consider the net performance
Note that expressions (1) and (2) are actually fixed effect multivariate multi-way ANOVA models that, in each univariate component and using an R-like coding, can also be expressed as
Y ∼ Home+Team+Opponent+Home*Team+Home*Opponent+Covar_1+…+Covar_q
and
ΔY ∼ Home+Team+Opponent+Home*Team+Home*Opponent+ΔCovar_1+…+ΔCovar_q.
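As a rough illustration of how a univariate model of this kind could be fit, the following Python sketch uses ordinary least squares on a deliberately simplified version of the ΔY formula. It is an assumption of this sketch, not the specification above, that the interactions and covariates are dropped and that the Team and Opponent effects collapse into a single net-strength parameter per team entering with opposite signs. The function names (`double_round_robin`, `design_matrix`, `fit_net_strengths`) are hypothetical.

```python
import itertools
import numpy as np

def double_round_robin(n_teams):
    """All ordered (home, away) pairs: each pair of teams meets once at each venue."""
    return list(itertools.permutations(range(n_teams), 2))

def design_matrix(matches, n_teams):
    """Columns: home-court intercept, then one net-strength column per team.

    Simplifying assumption: delta_y = home + alpha[home team] - alpha[away team] + error.
    """
    X = np.zeros((len(matches), 1 + n_teams))
    for row, (home, away) in enumerate(matches):
        X[row, 0] = 1.0          # home-court effect
        X[row, 1 + home] = 1.0   # home team's net strength enters with a plus sign
        X[row, 1 + away] = -1.0  # away team's net strength enters with a minus sign
    return X

def fit_net_strengths(matches, delta_y, n_teams):
    """Least squares fit; returns (home effect, net strengths summing to zero).

    The team columns are collinear (adding a constant to every alpha changes
    nothing), so lstsq returns the minimum-norm solution, whose team
    coefficients sum to zero.
    """
    X = design_matrix(matches, n_teams)
    coef, *_ = np.linalg.lstsq(X, np.asarray(delta_y, dtype=float), rcond=None)
    return coef[0], coef[1:]

# Illustrative use on simulated score differences (all values are made up).
rng = np.random.default_rng(0)
matches = double_round_robin(8)
alpha_true = np.linspace(-1.0, 1.0, 8)
delta_y = design_matrix(matches, 8) @ np.concatenate([[0.5], alpha_true])
delta_y = delta_y + rng.standard_normal(len(matches))
home_hat, alpha_hat = fit_net_strengths(matches, delta_y, 8)
```

The sum-to-zero solution expresses each team's net strength relative to the league mean, which is the kind of contrast that the testing problems discussed in this section are concerned with.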
When the univariate response represents just the points made and granted,
It is worth noting also that inference on the net relative strength or weakness of each team within a league is nothing but testing on the
We expressed the multivariate null and alternative hypotheses using Roy's Union-Intersection principle (Roy, 1953; Pesarin and Salmaso, 2010), where we take into account the possible interaction between team and home–away effect (by using the
Moreover, the alternative hypothesis was also broken down into two separate hypotheses, denoted by an upper-right ‘
Anyway, although this should rarely happen in real problems, since a team could be above the mean of the league for some univariate performances and at the same time below it for others, it is worth noting that
By exploiting the multivariate one-sided alternatives in expression (4), we may provide a league's ranking using the ranking methodology proposed by Arboretti Giancristofaro et al. (2014). In fact, by suitably combining information from directional multivariate
Let
using either (a) a normal-based or (b) a non-parametric permutation-based approach, that is,
(a) Assuming the normality of random errors in (1), that is,
(1.a).
(2.a).
where
(b) Without assuming any specific random distribution, within a non-parametric framework,
(1.b).
(2.b).
where
As for the multivariate
(1). Kost's method for combining dependent
(2). A suitable non-parametric combining function, such as Fisher's or Tippett's combining function (Pesarin and Salmaso, 2010; Bonnini et al., 2014).
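The non-parametric combination route in point (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the procedure actually implemented for the article: it assumes that a matrix of partial test statistics is already available, with the observed values in the first row and the permuted values in the remaining rows, and that large values are significant. The function names are hypothetical. Fisher's function is taken as minus twice the sum of the log significance levels and Tippett's as the maximum of one minus the significance levels.

```python
import numpy as np

def partial_p_values(stats):
    """Significance level of every row of `stats` within its own column.

    stats: (B + 1, K) array; row 0 holds the observed partial statistics and
    rows 1..B the permuted ones. Large values are taken as significant.
    """
    stats = np.asarray(stats, dtype=float)
    # ge[b, k] = number of rows whose k-th statistic is >= stats[b, k]
    ge = (stats[None, :, :] >= stats[:, None, :]).sum(axis=1)
    return ge / stats.shape[0]

def fisher(p):
    """Fisher's combining function: large values are significant."""
    return -2.0 * np.log(p).sum(axis=-1)

def tippett(p):
    """Tippett's combining function: driven by the smallest partial p-value."""
    return (1.0 - p).max(axis=-1)

def combined_p_value(stats, combine):
    """Combined p-value, using the permutation rows as the null distribution."""
    t = combine(partial_p_values(stats))
    return float(np.mean(t >= t[0]))

# Toy input: 3 permutations of K = 2 partial statistics (values are made up).
toy = np.array([[5.0, 5.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
p_fisher = combined_p_value(toy, fisher)
p_tippett = combined_p_value(toy, tippett)
```

Because the same permutation rows are used for every partial statistic, the dependence among the univariate tests is carried into the null distribution of the combined statistic without having to be modelled explicitly.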
Depending on whether the covariate effects are truly active or null, the combined multivariate permutation tests can be considered, respectively, an approximate or an exact testing solution for problems (3) and (4). The normal-based solution must always be viewed as approximate, and its behaviour under finite samples, as well as its robustness against violations of the normality assumption, should be evaluated via a simulation study (see the next section).
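To give a concrete flavour of the permutation-based route in its simplest form, the Python sketch below tests whether a set of responses is symmetric around zero by randomly flipping the signs of whole observation vectors. It is only an illustration under simplifying assumptions: it ignores covariates and any constraints on the permutations, it uses column sums as partial statistics, and the function names are hypothetical.

```python
import numpy as np

def sign_flip_statistics(x, n_perm=999, seed=0):
    """Observed and sign-flipped column sums of x.

    x: (n, K) array of responses assumed symmetric around 0 under the null.
    Returns a (n_perm + 1, K) array whose first row is the observed statistic.
    The same random sign is applied to all K components of an observation,
    so the dependence among the univariate responses is preserved.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, x.shape[0]))
    permuted = signs @ x  # each row: column sums after one random sign flip
    return np.vstack([x.sum(axis=0), permuted])

def one_sided_p_values(stats):
    """Partial p-values for the 'greater than zero' alternative, one per column."""
    return (stats >= stats[0]).mean(axis=0)

# Illustrative use on made-up data with a positive shift in both components.
rng = np.random.default_rng(1)
x = rng.standard_normal((30, 2)) + 0.8
p_partial = one_sided_p_values(sign_flip_statistics(x))
```

The matrix returned by `sign_flip_statistics` has the observed statistics in its first row and one row per permutation, which is the shape typically needed when partial tests are later merged by a combining function.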
As a final remark, we highlight that under the non-parametric combination setting, the testing problems (3) and (4) can be viewed, respectively, as a multivariate one-sample sign problem and a two-sample multivariate paired data problem (Pesarin and Salmaso, 2010; Bonnini et al., 2014) with constrained permutations under a two-way layout (Corain and Salmaso, 2007a). In fact, in the first case, we seek evidence against the symmetric distribution around 0 of all univariate responses, while in the second problem, once the
More formally, let us consider all the two occasions where both opponents
In order to investigate the properties of the proposed testing and ranking methods under finite samples of sizes comparable to real situations in data sport analytics, we performed a Monte Carlo simulation study where we independently simulated the outcomes of a tournament consisting of 112 simulated matches as a result of a double round-robin with
The simulation study was designed to cover six different settings, defined as combinations of the following configurations:
(1). Two possible scenarios of bivariate mean vectors, as graphically displayed in Figure 1; note that each scenario has some competitors that are equal in performance, that is, in the simulated tournament we are jointly under both the null and the alternative hypothesis.
(2). Three types of multivariate distributions for random errors: (a) Normal, (b) Student's t with 3 degrees of freedom, as an example of a heavy-tailed distribution, (c) g-and-h distributions with g = (0.5,0.5) and h = (0,0), as an example of a moderately skewed distribution (Kowalchuk and Headrick, 2000).
Figure 2 provides a graphical representation of each bivariate distribution, along with the simulated performances under scenario 1.
In order to keep our simulation more realistic, we also set up a home-court effect equal to 0.5 and two covariate effects with an average effect size equal to 0.25.
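The three error distributions can be generated along the following lines. This Python sketch is an assumption-laden illustration, not the simulation code used for the study: it draws the two components independently (the dependence structure used in the study is not reproduced here), it applies the standard g-and-h transformation of a standard normal variate, and the function names are hypothetical.

```python
import numpy as np

def g_and_h(z, g, h):
    """g-and-h transform of standard normal draws z.

    g controls skewness and h controls tail heaviness; g = 0 and h = 0
    return z unchanged.
    """
    z = np.asarray(z, dtype=float)
    core = z if g == 0 else (np.exp(g * z) - 1.0) / g
    return core * np.exp(h * z**2 / 2.0)

def simulate_errors(kind, n, rng, dim=2):
    """Draw n error vectors of length dim with independent components (assumption)."""
    if kind == "normal":
        return rng.standard_normal((n, dim))
    if kind == "t3":
        return rng.standard_t(3, size=(n, dim))
    if kind == "g-and-h":
        # g = 0.5 and h = 0 in both components; the median is 0 but the mean is not
        return g_and_h(rng.standard_normal((n, dim)), g=0.5, h=0.0)
    raise ValueError(f"unknown error type: {kind}")

rng = np.random.default_rng(2024)
errors = {kind: simulate_errors(kind, 112, rng) for kind in ("normal", "t3", "g-and-h")}
```

With h = 0 the transformed variable is bounded below by -1/g, which is what produces the right skew of setting (c); whether the errors were re-centred before being added to the mean vectors is not stated above, so no centring is applied here.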
Bivariate mean alpha values by competitor and scenario
Simulated errors and simulated team performance from scenario 1 by type of error
Although the team performance was simulated according to expression (1) with
As far as the specific combination method we used to calculate the combined-multivariate
Legend for rejection rates results in Figures 4 and 5
Note that the solid blue and orange lines represent the rejection rates under the true null hypothesis. This null hypothesis and each of the remaining four selected alternatives are all tested against both multivariate one-sided alternatives.
Significant testing on alpha: Rejection rates test by method and type of multivariate distribution
First of all, we note that under the null hypothesis of equality to 0, both procedures properly respect the nominal levels, while under the alternatives, as alpha moves away from 0, the rejection rates approach 1 more quickly. The normal-based method appears somewhat more powerful than the permutation-based procedure, and also shows robust behaviour with respect to the type of random error.
Pairwise testing on alpha: Rejection rates test by method and type of multivariate distribution
As for the pairwise testing, we note that under the null hypothesis of pairwise equality both procedures appear slightly biased but, for the most commonly used significance levels, they properly respect the nominal levels. Under the alternatives and as expected, as the distance between pairs of alpha gets larger, the rejection rates approach 1 more quickly. The permutation-based method appears somewhat more powerful than the normal-based procedure, and both methods behave robustly with respect to the type of random error. By setting the significant
Power comparison, at the 5% significance level, on significance and pairwise combined tests by testing method and type of multivariate distribution
Finally, it is interesting to investigate how both pairwise testing methods behave when applied to the ranking methodology, that is, when they are used to estimate the true ordering among teams (reported in Figure 1 for both scenarios). In this regard, for each simulated tournament we calculated the Spearman correlation index between the estimated and the true ranking: the larger the correlation, the better the ranking method. As a benchmark, after assuming that the first bivariate simulated response represents the match score, we also considered
A simulated Win/Loss (W/L) criterion, whose ranks, at the end of the simulated tournament, may be viewed as a benchmark for the two proposed ranking methods; The least squares ranking of
Ranking performance comparison, at the 5% significance level, by scenario, ranking method and type of multivariate distribution
As suggested by Figure 7, the comparison among Spearman's rho values by ranking method offers some interesting hints. Both testing methods (dark blue and red boxes) appear to be reliable tools, both in mean and in variance, for estimating the true underlying ranking among teams. As expected, the W/L criterion (yellow box), which is a ranking approach descriptive in nature, is much less reliable, especially under scenario 1 when some teams are supposed to perform equally. Under scenario 1, the ranking results provided by the least squares alpha estimates method (orange box) are slightly more reliable than the W/L criterion but not better than either of the two pairwise testing rankings. Note also that the performance of this last method improves under scenario 2. The normal-based testing approach shows a slightly better performance than the permutation-based one. This somewhat surprising result is probably explained by the intrinsic ability of the t-test to produce significance values much closer to zero: since the ranking method requires a suitable adjustment correction, values closer to zero give the normal-based approach an overall larger power.
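The agreement measure used in this comparison can be computed as in the following Python sketch, which obtains Spearman's rho as the Pearson correlation of average ranks so that tied positions (teams ranked ex aequo) are handled. The function names and the example rankings are hypothetical and are not taken from the simulation results.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(a, b):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    return float(np.corrcoef(ra, rb)[0, 1])

# Made-up example: a true ranking with two tied teams against an estimated one.
true_ranking = [1, 1, 3, 4]
estimated_ranking = [1, 2, 3, 4]
rho = spearman_rho(true_ranking, estimated_ranking)
```

Using average ranks matters here because the true ordering in the simulated scenarios contains teams that perform equally, so a version of the index that assumes no ties would not be appropriate.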
In order to highlight the practical use of the proposed methodology, we present its application to a real case study in basketball, namely to the Italian Basket A League (first division,
Univariate ANOVA table summary results by performance indicator
The analysis of residuals did not highlight any issue. Residuals were specifically checked for violations of the time independence assumption (see the analysis of residuals plots in the Appendix). In Figure 9, table results of both normal-based and permutation-based univariate and multivariate

Using a heat-map-like representation, Figures 10 and 11 display the multivariate normal- and permutation-based
Heat-map of multivariate pairwise one-sided normal p-values (from hypothesis testing as in (4))
Heat-map of multivariate pairwise one-sided permutation p-values (from hypothesis testing as in (4))
In accordance with the simulation results, both pairwise normal-based and permutation-based combined directional
Ranking analysis results (10% significance level) by pairwise testing
As a final remark, it is interesting to underline that the final outcomes of the play-offs and play-outs, neither reported nor analysed here, are basically in line with our findings. In fact, the semi-finals were reached by MIO, VE, AV and TN, and eventually VE won the championship by defeating TN in the finals.
The main goal of the article was to propose a new approach to inference on parameters related to the net relative strength or weakness of teams within a league, based on modelling the results of sport matches within a round-robin setting with a set of paired fixed effect linear models. Even if our reference modelling for data sport performance has been widely used in the literature (Harville, 2003), the novelty of our approach lies in how inference is done on the set of parameters of interest; in particular, we take into account two main issues:
(1). The team ranking is obtained by hypothesis testing and not simply by ordering any point estimate of suitable parameters.
(2). Our testing and ranking approach is multivariate in nature, so that we may jointly consider not just one but a (possibly large) set of outcomes/performances.
As a result of point 1, our approach is able to handle tied teams, that is, a truly ex-aequo condition, which seems to be a realistic situation in data sport analytics. As an advantage from point 2, we can extract more information from sport outcomes than is usually done by univariate models. Both of the hypothesis testing methods we proposed appeared to be reliable tools, in particular showing good behaviour irrespective of the shape of the underlying multivariate random distribution. In general, since we are not assuming normality of random errors as done by the traditional testing approaches for linear modelling, the methodology we proposed provides a flexible tool, less demanding in terms of underlying assumptions, to infer on the presence of possible multivariate dominances among a set of several multivariate populations, that is, teams or, as a future extension, sets of players or single players within a team.
From the practical point of view, in its simplest univariate version the proposed methodology allows us to infer on the teams' average net scoring within a league, while in its more intriguing multivariate layout it is suitable for detecting any team-related global dominance using a wide set of performance indicators. In future, the proposed approach can be extended not only to numerical but also to ordered categorical performance indicators (Arboretti Giancristofaro et al., 2012b,c).
Finally, we presented a real application to basketball in order to highlight that the proposed methodology can be effective in addressing real problems in sport performance analytics. In this connection, we suggest that future research should be directed towards a more in-depth pre-analysis to carefully select the relevant list of sport performances and covariates, with the goal of making our analysis more predictive in terms of consistency with sport outcomes and the related ranking rules and rationales. Finally, taking inspiration from what is done in statistical process control (Corain and Salmaso, 2013, 2014), an innovative multivariate control chart could be proposed as a way to monitor the teams' performance over time during the tournament.
Appendix
where
where
Team label reference
Performance and covariate scatter plots
Analysis of residuals
Acknowledgments
The authors would like to thank the anonymous reviewers for their helpful and constructive comments that greatly contributed to improving the quality of the manuscript. The authors contributed equally to this manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
