Abstract
Although the full potential of observational methodology is realized through diachronic analyses, synchronic analyses can be used to investigate associations between categorical variables. Log-linear modeling is an appropriate method for investigating associations between three or more dimensions using multidimensional contingency tables. We provide a practical example of how we used log-linear analysis to study efficiency in a men’s basketball competition played by Spain’s top teams using a model containing three dimensions (and their respective categories): position of last pass before a shot, position of shot, and result of shot. The best-fit and most parsimonious model (i.e., the model that provided the best explanation of the observed frequencies in the contingency table and that contained the fewest effects) was a conditional independence model in which last pass position and shot position were associated independently of the categories in the shot result dimension and the interaction between shot position and shot result was not affected by the categories in the last pass dimension. Estimation and subsequent interpretation of the significant parameters in the selected model showed how log-linear modeling can provide basketball coaches with practical insights within an observational methodology study.
Introduction
This study has a twofold objective. Our first aim is methodological in nature and consists of illustrating the potential of log-linear analysis applied to observational methodology 1 by providing a simple, practical example of how this synchronic technique can be used to study efficiency in sport. Our second aim, which is substantive in nature, is to shed light on successful and unsuccessful play in elite men’s basketball by studying associations between the position of the last pass before a shot (last pass position), the area of the court from where the shot was taken (shot position), and the outcome (favorable vs. unfavorable) of the shot (shot result).
Efficiency in basketball has been analyzed by studying the impact of different game-related variables on shot outcomes (e.g., basket, missed basket, foul).2–5 In studies of this type, it is essential to know the exact position of the shot, not only to analyze biomechanical factors, which have an obvious impact on scoring success,6–8 but also to study tactics related to timing and positioning.9,10 Shooting success has also been analyzed from the perspective of where the actions leading up to a shot occur,11–13 with particular emphasis on the position of the last pass.14–16
To satisfy the methodological aim of this study and present a conceptually simple application of log-linear analysis, we included just three dimensions in our model, as the more dimensions there are, the more complicated it is to interpret results. The specific aim was to investigate synchronic associations between last pass position, shot position, and shot result.
Log-linear analysis
The distribution of observed frequencies in a contingency table is influenced by the size of the sample and interactions between the categorical data within the different dimensions. This influence is known as effect, and it determines the distribution of the cells in the table.
The number of effects depends on the number of study dimensions (usually called ‘variables’ in experimental and quasiexperimental studies). As we studied three variables (last pass position, shot position, and shot result), the distribution of observed frequencies could potentially be influenced by eight effects in total, as the number of possible effects is 2³, where the exponent corresponds to the number of dimensions. 17 These effects are described below:
Overall or grand-mean effect, which reflects the size of the sample (first effect).
First-order or main effects, reflecting the distribution of each of the variables in the contingency table (second, third, and fourth effects).
Second-order or association effects, reflecting the interactions between pairs of variables (fifth, sixth, and seventh effects).
Third-order or interaction effect, reflecting the interaction among the three variables (eighth effect).
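The eight effects listed above correspond to the subsets of the three dimensions, with the empty subset standing for the grand-mean effect. A minimal sketch of this enumeration follows; the variable names are illustrative and are not taken from the study's software.

```python
from itertools import combinations

# Dimension names taken from the text.
dimensions = ("last pass position", "shot position", "shot result")

# Every subset of the dimensions is one candidate effect; the empty subset
# stands for the overall (grand-mean) effect.
effects = [
    subset
    for order in range(len(dimensions) + 1)
    for subset in combinations(dimensions, order)
]

for effect in effects:
    label = " x ".join(effect) if effect else "grand mean"
    print(f"order {len(effect)}: {label}")

print(len(effects))  # 2 ** 3 = 8
```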
The contribution of these effects can be statistically expressed using log-linear analysis. Log-linear models are additive models based on the calculation of natural logarithms (ln) of expected frequencies, which constitute the parameters of the model. The value of these frequencies depends on the sum of effects attributable to sample size and the relationship between the categorical variables.
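In a three-way contingency table, the combination of effects generates 19 different models for explaining the distribution of observed frequencies. The simplest model is the equiprobability or grand-mean model, where the only effect is that attributable to sample size (overall or grand-mean effect). The most complex model is the saturated model, which includes all possible effects.18,19 The equation for the saturated model for a three-way contingency table is

$$\ln F_{ijk} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_k^{C} + \lambda_{ij}^{AB} + \lambda_{ik}^{AC} + \lambda_{jk}^{BC} + \lambda_{ijk}^{ABC}$$

where $F_{ijk}$ is the expected frequency in cell $(i, j, k)$, $\lambda$ is the grand-mean effect, and A, B, and C denote the three dimensions (here, last pass position, shot position, and shot result, respectively).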
The purpose of log-linear analysis is to identify the model that provides the best explanation of the distribution of frequencies (arranged in the cells of the contingency table) corresponding to the categories/codes in each dimension. The choice of model is determined by two factors: fit and simplicity (parsimony). 20 A model, for instance, will be selected if it provides an adequate fit to the data, i.e., if no significant differences are observed between the expected and the observed frequencies. Once this goodness of fit has been confirmed, the model must meet the criterion of parsimony: it must be the simplest (i.e., the model containing the fewest effects) of all the candidate models. 21 Once the best model has been chosen, the next step is to obtain the parameter estimates. This is done by quantifying the effect that each dimension exerts on the frequencies of the different categories it contains and the effect of the interactions between the dimensions with respect to the cell counts in the contingency table. 22 The third and final step is to interpret the results.
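The figure of 19 candidate models for a three-way table can be checked by brute force. The sketch below assumes that a candidate model always contains the grand-mean effect and is hierarchical (an effect can be present only if all of its lower-order relatives are present). The text does not spell out how the 19 models are derived, so this is one reading that reproduces the count, with A, B, and C as placeholder labels for the three dimensions.

```python
from itertools import chain, combinations

variables = ("A", "B", "C")  # placeholder labels for the three dimensions


def nonempty_subsets(items):
    """Return all non-empty subsets of items as frozensets."""
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


# The 7 effects other than the grand mean: A, B, C, AB, AC, BC, ABC.
effects = nonempty_subsets(variables)


def is_hierarchical(model):
    """True if every lower-order relative of each effect is also in the model."""
    return all(
        sub in model
        for effect in model
        for sub in nonempty_subsets(tuple(effect))
    )


# Every model contains the grand mean, so only the other 7 effects vary.
candidate_models = chain.from_iterable(
    combinations(effects, size) for size in range(len(effects) + 1)
)
hierarchical_models = [
    set(model) for model in candidate_models if is_hierarchical(set(model))
]

print(len(hierarchical_models))  # 19
```

The empty set in this list is the equiprobability model and the set with all seven effects is the saturated model.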
Method
We applied an intersessional/intrasessional, nomothetic, multidimensional observational methodology design. 23 It was intersessional because we studied seven matches in an elite basketball competition, and intrasessional, because we performed a frame-by-frame analysis of the behaviors in all the matches in the observation sample. The study was nomothetic because we observed different players on different teams, each playing different rivals, and multidimensional because we studied different dimensions of behavior dictated by the observation instrument. These behaviors can be classified as proxemic (relating to where they occurred) and gestural (relating to the different technical-tactical offensive actions).
Participants
The participants were the players on the teams that won the quarter-finals, semi-finals, and final of the 2012 Copa del Rey, which is a basketball competition involving the top teams in the Spanish men's ACB league. The sample consisted of 456 sequences ending in a shot by one of the winning teams. The breakdown of sequences per match and hence per winning team (shown first in each pair) is as follows: 66 for Caja Laboral-Gipuzkoa, 62 for FC Barcelona-Lucentum, 69 for Banca Cívica-Unicaja, 68 for Real Madrid CF-Fuenlabrada, 60 for FC Barcelona-Caja Laboral, 63 for Real Madrid CF-Banca Cívica, and 68 for Real Madrid-FC Barcelona.
The research project was approved by a scientific committee at the University of La Rioja in accordance with the Ethical Principles of Psychologists, the Code of Conduct of the American Psychological Association, and the guidelines of the Ethics Committee of the Spanish Association of Psychologists.
Observation instrument
Table 1. Observation instrument.
The observation instrument was loaded into the data annotation and coding program Match Vision Studio v.3 (Figure 1). 25 The sequences analyzed comprised the shot itself and a maximum of five actions immediately preceding the shot (maximum of six rows for each sequence in the dataset). For each action, the observer noted down where and when it happened. The resulting data were therefore concurrent, time-based type IV data. 26
Figure 1. Screenshot showing data annotated for a moment of the match in Match Vision Studio v.3.
Data quality control: Interobserver agreement and generalizability of results
Interobserver agreement
We analyzed interobserver agreement to check the reliability of the datasets. The data were annotated by two observers duly trained according to the procedure described by Anguera. 27 The first observer annotated all the sequences for all the matches, and the second observer annotated 10% of the sequences in each match. Interobserver agreement was measured by calculating Cohen’s kappa statistics 28 in GSEQ (v. 5.1) following the recommendations of Bakeman and Quera. 29 A kappa statistic of over 0.81 was obtained for all the datasets in the observation sample, confirming the reliability of the data used in the subsequent analyses. The kappa statistics calculated for the respective matches were 0.81 for Caja Laboral-Gipuzkoa, 0.82 for FC Barcelona-Lucentum, 0.91 for Banca Cívica-Unicaja, 0.82 for Real Madrid CF-Fuenlabrada, 0.82 for FC Barcelona-Caja Laboral, 0.81 for Real Madrid CF-Banca Cívica, and 0.84 for Real Madrid-FC Barcelona. The team observed (i.e., the winner of the match) is shown first.
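As an illustration of the agreement statistic, Cohen's kappa compares the observed proportion of agreement with the proportion expected by chance from each observer's marginal code frequencies. The sketch below is not the GSEQ procedure used in the study; the function name and the two short code lists are hypothetical, with codes borrowed from the sequence-completion categories (Mk, Ms, FR, Bl).

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two observers coding the same events."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both observers must code the same non-empty set of events.")
    n = len(codes_a)
    # Proportion of events on which the two observers agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement from the product of the marginal proportions.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / n ** 2
    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of ten shots by two observers.
observer_1 = ["Mk", "Ms", "Mk", "FR", "Ms", "Mk", "Bl", "Ms", "Mk", "Ms"]
observer_2 = ["Mk", "Ms", "Mk", "FR", "Ms", "Mk", "Ms", "Ms", "Mk", "Ms"]

kappa = cohens_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the observers agree on nine of ten events, and chance agreement is 0.37, so kappa is (0.90 − 0.37) / (1 − 0.37).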
Generalizability of results
We performed a generalizability theory analysis 30 in SAGT v. 1.0 31 to estimate the generalizability coefficients for the general linear model corresponding to the design categories/matches, with match as the instrumentation facet. The results showed that 96.1% of the variability was accounted for by the categories facet, 0.4% by the matches facet, and 3.4% by the interaction between these facets. The relative generalizability coefficient obtained, e2 = 0.995, shows a high level of generalizability, reflecting the homogeneity of the dataset (matches/team observed).
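The reported coefficient can be reproduced approximately from the variance percentages above. The sketch assumes the usual relative generalizability coefficient for a two-facet crossed design, in which the interaction component is divided by the number of levels of the instrumentation facet (here, the seven matches). The text does not state the formula used by SAGT, so this is a plausibility check rather than a description of that software.

```python
def relative_g_coefficient(var_differentiation, var_interaction, n_levels):
    """Relative generalizability coefficient for a two-facet crossed design.

    var_differentiation: variance of the facet being differentiated (categories)
    var_interaction: variance of the categories x matches interaction
    n_levels: number of levels of the instrumentation facet (matches)
    """
    relative_error = var_interaction / n_levels
    return var_differentiation / (var_differentiation + relative_error)


# Percentages of variability reported in the text; the ratio is unaffected
# by whether variances are given as percentages or raw components.
coefficient = relative_g_coefficient(96.1, 3.4, 7)
print(round(coefficient, 3))  # 0.995
```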
Log-linear analysis
The three dimensions used in the log-linear analysis were last pass position, shot position, and shot result.
The analysis was performed according to the steps described previously. Of the 19 models generated, the model with the best fit was selected through backward elimination in SPSS v. 19.0. This procedure simplifies models through hierarchical stepwise backward elimination of nonsignificant effects. Hierarchical elimination means that if a model includes higher order parameters, then all the lower-order parameters must necessarily be included. Beginning with the saturated model, nonsignificant higher order interactions are progressively eliminated until the simplest model (the one with the fewest effects) offering the same goodness of fit is reached. 22
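A single backward-elimination step can be sketched outside SPSS with iterative proportional fitting (IPF), a standard way of obtaining expected frequencies for hierarchical log-linear models. The example below is an assumption-laden illustration, not the SPSS procedure: the 2 × 3 × 2 table is invented, the function names are hypothetical, and the decision rule simply compares the change in the likelihood-ratio statistic G² with the chi-square critical value for one degree of freedom.

```python
import numpy as np


def fit_by_ipf(observed, margins, max_iter=200, tol=1e-9):
    """Expected frequencies for a hierarchical model via IPF.

    margins lists the axis groups whose observed totals the model must
    reproduce, e.g. [(0, 1), (1, 2)] for a model with AB and BC terms.
    Assumes every fitted margin is strictly positive.
    """
    fitted = np.full(observed.shape, observed.sum() / observed.size, dtype=float)
    for _ in range(max_iter):
        previous = fitted.copy()
        for axes in margins:
            summed_out = tuple(ax for ax in range(observed.ndim) if ax not in axes)
            target = observed.sum(axis=summed_out, keepdims=True)
            current = fitted.sum(axis=summed_out, keepdims=True)
            fitted = fitted * target / current
        if np.abs(fitted - previous).max() < tol:
            break
    return fitted


def likelihood_ratio_g2(observed, fitted):
    """Likelihood-ratio statistic G2 = 2 * sum(obs * ln(obs / fitted))."""
    mask = observed > 0
    return 2.0 * np.sum(observed[mask] * np.log(observed[mask] / fitted[mask]))


# Hypothetical counts: axis 0 = A (2 levels), axis 1 = B (3), axis 2 = C (2).
observed = np.array(
    [
        [[20, 10], [12, 14], [8, 12]],
        [[15, 9], [10, 13], [6, 11]],
    ],
    dtype=float,
)

# Model with all two-way terms versus the same model without the AC term.
fitted_two_way = fit_by_ipf(observed, [(0, 1), (0, 2), (1, 2)])
fitted_cond_indep = fit_by_ipf(observed, [(0, 1), (1, 2)])

g2_two_way = likelihood_ratio_g2(observed, fitted_two_way)
g2_cond_indep = likelihood_ratio_g2(observed, fitted_cond_indep)

# The two models differ by (I - 1)(K - 1) = 1 degree of freedom here.
g2_change = g2_cond_indep - g2_two_way
CRITICAL_CHI2_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05
drop_ac_term = g2_change < CRITICAL_CHI2_DF1
print(f"G2 change = {g2_change:.3f}; drop AC term: {drop_ac_term}")
```

When the change in G² is nonsignificant, the simpler model is retained and the procedure moves on to the next candidate term.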
Parameter estimation can be performed using the general log-linear analysis procedure in SPSS, which quantifies the magnitude of each effect in the selected model. Prior to this, however, it is necessary to perform dummy coding using a reference category for each dimension in the model. In our case, these categories were the outer central corridor for last pass position, the paint for shot position, and favorable results for shot result. The resulting parameters are expressed as natural logarithms, and as such, the exponential function (or anti-logarithm) of each estimated parameter is the equivalent of an odds or odds ratio for the corresponding categories in the dimensions. These parameters quantify the magnitude of each of the effects.
Results
Considering the practical relevance of our findings, both for researchers and basketball coaches, we have graphically depicted our results for shot position, shot result (favorable or unfavorable), and completion of sequence in Figure 2.
Figure 2. Favorable and unfavorable results for shots and completion of sequence categories by court area. (a) The figure on the left shows the percentage of shots taken from the different areas of the court. The figure on the right shows the percentage of shots with a favorable result in each of these areas (rates over 50% are shown in green while those under 50% are shown in red). (b) Distribution of sequence completion categories in the different areas of the court. Categories with favorable results are shown in green while those with unfavorable results are shown in red. A1 indicates basket and foul; Bl, block; FR, foul received; Mk, basket; Ms, missed basket.
In the log-linear analysis, the simplest hierarchical model resulting from the backward elimination process was the conditional independence model, 19 which contains two pairwise associations that share one dimension, with all three dimensions generating categorical data. Specifically, the interaction between last pass position and shot position was independent of the shot result categories, and the interaction between shot position and shot result was not affected by the last pass position categories. The Pearson chi-square (goodness of fit) test showed that the model provided a good fit to the study data (χ² = 34.329; p = 0.794), as the differences between the expected and observed frequencies were nonsignificant. 19 The equation for the selected model can thus be expressed as

$$\ln F_{ijk} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_k^{C} + \lambda_{ij}^{AB} + \lambda_{jk}^{BC}$$

where $F_{ijk}$ is the expected frequency in cell $(i, j, k)$ and A, B, and C denote last pass position, shot position, and shot result, respectively.
Table 2. Estimation of significant parameters in selected model.
The results of the log-linear analysis can be interpreted using the exp (λ) values shown in the fourth column of Table 2. The values are interpreted differently depending on whether they are higher or lower than 1. Values above 1 indicate an increased likelihood of a given category occurring compared with the reference category. The information in the first row of Table 2 thus tells us that the last pass before a shot is 1.799 times more likely to come from the paint than from the outer central corridor. Values below 1, by contrast, indicate a decreased likelihood of a given category occurring compared with the reference category. In this case, row 2 tells us that a shot taken from the outer left corridor is only 0.205 times as likely as a shot taken from the paint. The interpretation of values below 1 can be simplified by using the inverse of exp (λ): 1/exp (λ). This inverse value shows how much more likely the reference category is to occur than the category being analyzed. Using the same example as above, the inverse of 0.205 is 4.878, which means that a player is 4.878 times more likely to take a shot from the paint than from the outer left corridor.
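The arithmetic behind this interpretation is short enough to sketch directly. The two exp (λ) values are those quoted above from Table 2; the helper names are hypothetical and the wording of the printed sentences is only one possible phrasing.

```python
import math


def inverse_odds(exp_lambda):
    """Return 1 / exp(lambda), the odds in favor of the reference category."""
    return 1.0 / exp_lambda


def describe(exp_lambda, category, reference):
    """Phrase an exp(lambda) value relative to its reference category."""
    if exp_lambda >= 1:
        return f"{category} is {exp_lambda:.3f} times as likely as {reference}"
    return f"{reference} is {inverse_odds(exp_lambda):.3f} times as likely as {category}"


# exp(lambda) values quoted in the text.
last_pass_paint = 1.799   # reference: outer central corridor
shot_outer_left = 0.205   # reference: paint

# The estimated parameter itself is the natural logarithm of exp(lambda).
lambda_last_pass_paint = math.log(last_pass_paint)

print(describe(last_pass_paint, "a last pass from the paint", "one from the outer central corridor"))
print(describe(shot_outer_left, "a shot from the outer left corridor", "a shot from the paint"))
print(round(inverse_odds(shot_outer_left), 3))  # 4.878
```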
Discussion
As stated by Anguera and Hernández-Mendo, 32 the choice of analytical technique in an observational methodology study is determined mainly by the design of the study 23 but also by the observation instrument used and the nature of the data. Furthermore, and in accordance with Anguera and Izquierdo, 33 the true potential of observational methodology can be unlocked by undertaking diachronic analyses of concurrent time-based data (type IV data according to Bakeman 26 ). The three main diachronic analytical techniques used in observational methodology 34 are T-pattern detection (using the THEME software package),35,36 lag sequential analysis, 26 and polar coordinate analysis. 37
However, despite the enormous potential offered by these diachronic techniques in observational methodology, synchronic analyses are sometimes sufficient to answer the research question(s) being investigated.38–40 The most common type of synchronic statistical techniques that search for associations between dimensions containing categorical data are chi-square-type tests. 41 These tests, however, are limited as they can only compare two sets of variables or dimensions (two-way contingency tables). When more than two variables are involved, this limitation is typically overcome by performing various pairwise analyses in different contingency subtables. 42 The emergence of log-linear analysis permitted the investigation of various types of higher order interactions between dimensions comprising categorical variables. 41 Log-linear analysis involves examining multidimensional contingency tables and is therefore appropriate for studies investigating three or more dimensions.43,44
The strength of log-linear analysis lies in its ability to quantify the individual influence of dimensions (through their different categories) on frequencies as well as the combined effect of several variables on the magnitude present in the cells of the contingency table.17,20 Specifically, log-linear analysis can a) provide an overall picture of the effects of variables on the distribution of observed frequencies by presenting a model that formally depicts the relationship between these variables and b) provide a narrower picture by showing the influence of interactions between dimensions on frequencies in the contingency table, i.e., by showing in which cells a given parameter is relevant or not. This is possible because in log-linear analysis there are no dependent or independent variables. All the variables are symmetric and their purpose is to explain the distribution of frequencies. When the aim is to analyze the strength of associations between a dependent variable and a series of independent variables, other techniques, such as logistic regression analysis19,39 or data envelopment analysis (DEA),45,46 are more appropriate.
We have illustrated the use of log-linear analysis with a practical example designed to shed light on factors influencing efficiency in basketball. We used a simplified three-way model representing three dimensions (and their corresponding categories): last pass position, shot position, and shot result. We then selected the most parsimonious model capable of explaining the distribution of cell counts in the three-way model. 47 The model, which was selected through backward elimination, was a conditional independence model, i.e., a model containing two variables that were associated independently of a third variable.
The first-order effects showed that a pass leading up to a shot was more likely to come from the paint than from the outer central corridor. This observation is consistent with the findings of Fernández et al., 24 who reported that the fewest passes were made in this outer area. Lamas, De Rose, Santana, Rostaiser, Negretti, and Ugrinowitsch 48 proposed using the outer central corridor and the paint, respectively, to create space inside and outside the zone in order to penetrate towards the basket rather than passing the ball for a shot. Our results do not coincide with those of authors15,49,50 who found that the most effective sequence was a pass to a shooter from the outer area. Nor do they coincide with those of authors49,51,52 who reported that the central corridor was the best place for distributing the ball and making the last pass.
Continuing with our discussion of first-order effects, we found that shots were more likely to be taken from inside the paint than from the outer left corridor, the outer central corridor, the middle of the right corridor, the middle of the left corridor, or the middle central area. Our identification of the paint as the most likely place for a shot is consistent with findings from studies that have analyzed overall shot positions.11,14,24,53 Different results, however, have been observed in studies that have analyzed specific game situations. Shots following a direct block, for example, are mostly taken in the outer central corridor, 54 while over half of the shots taken in the first and last five minutes of a match are taken from the outer left corridor. 55 Figure 3(a) shows a graphic representation of the first-order effects.
Figure 3. Graphic representation of first- and second-order effects. (a) First-order effects. The figure on the left shows that last passes are more likely to come from the paint (light green) than from the outer central corridor (dark red). The figure on the right shows that shots are more likely to be taken in the paint (light green) than in the outer left corridor, the outer central corridor, the middle of the right corridor, the middle of the left corridor, or the middle of the central corridor (dark red). (b) Second-order effects of the association between last pass position and shot position showing the likelihood of a shot according to where the pass came from. (c) Second-order effects of the association between shot position and shot result showing the position of shots with a greater likelihood of an unfavorable outcome (dark red).
Finally, we saw that shots were more likely to have a favorable outcome than an unfavorable outcome, supporting findings reported for elite senior basketball.11,24
The second-order effects for the interaction between last pass position and shot position show that when the ball is passed from the outer right corridor, the shot is more likely to be taken from inside the paint than from the outer right corridor. These results are consistent with those of a study 52 that found that passes directly preceding a shot came more frequently from a player to the right of the shooter.
The second-order effects for the interaction between last pass position and shot position also show that shots are more likely to be taken from inside the paint when the player receives the ball from the outer left corridor than from either the outer right corridor or the middle of the right corridor. Passes from the outer left corridor to the paint have been linked to successful pre-shot actions, including a switch of the ball from outside to inside the three-point line (or vice versa), 15 an inside pass, 56 and the creation of a wide distance between the passer and the shooter. 52 Our results for these second-order effects are also consistent with findings of Muñoz et al., 53 who observed that sequences featuring inside passes leading to a shot near the basket were effective. Fernández et al. 24 also showed that shots following an inside pass from the outer right or left area were associated with favorable outcomes.
The second-order effects for the interaction between last pass position and shot position also show that shots from inside the paint are more likely to occur when the pass is made from the paint than from either the outer right corridor, the outer central corridor, or the middle of the right corridor. Passes within the paint have been associated with scoring success in previous studies of inside play 49 and of actions consisting of one-on-one situations starting outside the three-point line and culminating in a shot taken close to the basket.24,53
Finally, our results show that shots are more likely to be taken from inside the paint than from the outer right corridor or the middle of the right corridor when the player making the pass is in the middle of the right corridor. Passes from this area of the court to the paint are uncommon,24,57 but the option of switching the ball from the right corridor to the central corridor and then into the paint, where the chances of scoring are highest, is tactically advisable.11,14,24,53 Figure 3(b) shows a graphic representation of the second-order effects of the association between last pass position and shot position.
The second-order effects for the interaction between shot position and shot result show that shots are more likely to have an unfavorable outcome when they are taken from the outer right corridor, the outer left corridor, the outer middle area, the middle of the right corridor, or the middle of the left corridor. These findings are consistent with results showing that outside shots are less effective than inside shots.11,14,24,53,57 Figure 3(c) shows a graphic representation of the second-order effects of the association between shot position and shot result.
In conclusion, we have presented a practical example from elite basketball to show how log-linear analysis can be applied to examine a multidimensional contingency table and identify significant associations between aspects of offensive play in the form of last pass position, shot position, and shot result.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors gratefully acknowledge the support of two Spanish government projects (Ministerio de Economía y Competitividad): 1) La actividad física y el deporte como potenciadores del estilo de vida saludable: Evaluación del comportamiento deportivo desde metodologías no intrusivas [Grant number DEP2015-66069-P, MINECO/FEDER, UE]; 2) Avances metodológicos y tecnológicos en el estudio observacional del comportamiento deportivo [PSI2015-71947-REDP, MINECO/FEDER, UE].
