Game theory in regression modeling: A brief review on Shapley Value regression

Abstract

This work describes developments in multiple regression aimed at building models that are resistant to multicollinearity, have meaningful and robust solutions for individual parameters, are convenient for interpreting results, and are good for prediction. A tool from cooperative game theory, Shapley Value analysis, has been tried for estimating regression coefficients and the relative usefulness of the predictors in a model. This approach has been checked and successfully applied in various real-life data analysis projects for commercial companies. It is useful for decision makers in economics, management, marketing research, and other practical fields.

Keywords

Regression modeling; resistance to multicollinearity; Shapley Value regression

1. Introduction

This brief review describes work in multiple regression analysis for cases where a researcher needs to know the comparative importance of the individual predictors in the model. However, such analysis can be difficult to perform because of multicollinearity among the regressors, which produces inflated coefficients and negative contributions to the multiple determination from presumably useful regressors. To solve this problem, one of the most famous tools of cooperative (non-antagonistic) game theory has been tried. It was introduced by Lloyd Shapley (1953), who together with his follower Alvin Roth (1988) won the Nobel Prize in economics in 2012.

The Shapley Value (SV) uses a finite combinatorial formula to assign a unique distribution of the total surplus among all the players who generate it in their coalition. In lay terms, the SV allocates the total value of the game to each player by evaluating all possible coalitions that the player can join. The value for the $i$-th player can be defined as

$\displaystyle\textit{SV}_{i}=\sum\limits_{S}\gamma_{n(S)}\left[v(S)-v(S-\{i\})\right]$

where the sum runs over all possible subsets of players $S$. The function $v(\cdot)$ is called the characteristic function and defines the value of a coalition. The value of the $i$-th player is defined via the averaged increments, over all subsets, between the values $v(S)$ and $v(S-\{i\})$ of the game with and without the $i$-th player. In other words, it is the marginal value of adding the player to any possible set of other players. The weights are defined as

$\displaystyle\gamma_{n(S)}=\frac{(s-1)!\,(n-s)!}{n!}$

where $s$ is the number of players in the subset $S$ and $n$ is the total number of players, so each increment is weighted by a factor that reflects the number of subsets of that size that are possible. More on SV can be found in (Fragnelli & Sánchez-Soriano, 2020).
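The SV formula and its weights can be evaluated directly by enumerating coalitions. The following Python sketch is an illustration only: the function name `shapley_values` and the toy game `toy_value` are assumptions introduced here and do not come from the cited works.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Return {player: Shapley Value} for a characteristic function v.

    v takes a frozenset of players and returns the value of that coalition.
    """
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        # Enumerate every coalition S that contains player i.
        for k in range(len(others) + 1):
            s = k + 1  # size of S, counting player i
            gamma = factorial(s - 1) * factorial(n - s) / factorial(n)
            for rest in combinations(others, k):
                S = frozenset(rest) | {i}
                total += gamma * (v(S) - v(S - {i}))
        values[i] = total
    return values


# Hypothetical three-player game: a coalition is worth the square of its size.
def toy_value(coalition):
    return float(len(coalition) ** 2)


sv = shapley_values(["A", "B", "C"], toy_value)
print(sv)
```

Because the toy game is symmetric, every player should receive the same value, and the values should add up to the value of the full coalition, $v(\{A,B,C\})=9$. Full enumeration visits $2^{n-1}$ coalitions per player, so this direct approach is practical only for a moderate number of players.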

Employing the coefficient of multiple determination $R^{2}$ as the value function, it is possible to define the relative importance of the predictors and, from them, to estimate the parameters of the SV regression (SVR). This approach was proposed in (Lipovetsky & Conklin, 2001), which demonstrated theoretical and practical advantages of SVR. The SVR results are very encouraging: they systematically show that a reliable analysis is possible even in the presence of a high degree of multicollinearity among the predictors. Such results can be understood from the specific structure of the Shapley Value inputs as averages of the incremental net effects over all possible coalitions of regressors (Lipovetsky & Conklin, 2005). While the ordinary least squares (OLS) regression coefficients and shares of predictor importance are highly prone to multicollinearity distortions, all SV characteristics, being averaged values, are very consistent and demonstrate very stable bootstrapping output.
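As a minimal sketch of the first step only, the code below uses $R^{2}$ of OLS fits on subsets of predictors as the value function and computes each predictor's SV share. It does not reproduce the subsequent estimation of SVR coefficients described in the cited papers. The helper names `r_squared` and `shapley_r2_shares` and the synthetic data are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_squared(X, y, cols):
    """R^2 of an OLS fit (with intercept) of y on the columns in cols."""
    if not cols:
        return 0.0  # the empty coalition explains nothing
    A = np.column_stack([np.ones(len(y)), X[:, sorted(cols)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def shapley_r2_shares(X, y):
    """Shapley Value share of R^2 for each column of X."""
    n = X.shape[1]
    shares = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # A coalition with k other predictors has size s = k + 1.
            gamma = factorial(k) * factorial(n - k - 1) / factorial(n)
            for rest in combinations(others, k):
                with_i = r_squared(X, y, set(rest) | {i})
                without_i = r_squared(X, y, set(rest))
                shares[i] += gamma * (with_i - without_i)
    return shares


# Synthetic data (an assumption): two nearly collinear predictors and one
# independent predictor.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
X = np.column_stack([
    z + 0.1 * rng.normal(size=500),
    z + 0.1 * rng.normal(size=500),
    rng.normal(size=500),
])
y = X[:, 0] + X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=500)

shares = shapley_r2_shares(X, y)
full_r2 = r_squared(X, y, {0, 1, 2})
print("SV shares:", shares)
print("sum of shares:", shares.sum(), "full-model R^2:", full_r2)
```

By construction the shares add up to the $R^{2}$ of the full model, and each share is an average of non-negative increments, since adding a regressor to an OLS model cannot decrease $R^{2}$.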

Besides applications to multiple linear regression, the SV technique has also been applied in discriminant and logistic regression analysis (Lipovetsky & Conklin, 2004a; Lipovetsky, 2006) and in random-coefficient models (Lipovetsky, 2007). Special clarifications that SV is useful not only for finding the predictors’ shares but also for estimating the regression parameters were given in (Lipovetsky & Conklin, 2010a, b). The use of SVR in modeling with structurally missing data is considered in (Lipovetsky & Nowakowska, 2013), its application to combined Granger-Koyck causality modeling in (Lipovetsky, 2016), and to finding key drivers for individual respondents in (Lipovetsky, 2020).

2. SVR vs OLS and other methods

Comparing SVR with ordinary least squares (OLS) and other regression modeling techniques, the following differences can be noticed.

1. OLS vs SVR.

OLS builds an aggregate of the independent variables (IVs) for the best fit of the dependent variable (DV), but it does not consider estimation of the individual contributions of the IVs to the model. To obtain meaningful individual IV parameters that are not prone to multicollinearity effects, various techniques have been developed; among the best known are ridge regression, LASSO, elastic net, and SVR. SVR has the following desirable features. As noted in (Lipovetsky & Conklin, 2014), only the predicted DV values and $R^{2}$ do not depend on the degree of ill-conditioning of the correlation matrix, while the OLS model coefficients are prone to multicollinearity effects.
2. Robustness of parameters.

SVR produces not only the contributions of the predictors to the DV variability, but also estimates of the regression coefficients that are very stable under possible changes in the data sample. Standard deviations of regression coefficients are always smaller and t-statistics bigger in SVR than in OLS, so SVR yields statistically significant parameter estimates where OLS fails (see, for example, Lipovetsky & Conklin, 2001, Table 2). Bootstrapping for estimating t-statistics has been used regularly on many dozens of datasets, and SVR always outperformed OLS.
3. Elasticity.

OLS coefficients are usually interpreted as the change in the DV due to a unit change in an IV with all other IVs held constant, so a coefficient is an analogue of elasticity in absolute units. However, the OLS coefficients can be very far from such elasticity values because of multicollinearity, which inflates the parameters of collinear IVs and changes their signs, pushing them in opposite directions. It is possible to build models by the elasticity criterion, cleaning the data of smaller changes and using a data-gradient approach. Comparisons of such models with SVR show that they are very close, so in contrast to OLS the SVR coefficients can be interpreted as elasticities (Lipovetsky, 2010, 2012).
4. Shares’ structure.

Pair correlations are separate measures of the relation of each IV with the DV, but they do not estimate the individual predictors’ contributions to their simultaneous synergic impact on the DV in the total model. Because of such synergy, the coefficient of multiple determination $R^{2}$ can be bigger than the total of all squared pair correlations of the IVs with the DV. Also, an IV can have a correlation with the DV that is close to zero, while in a multiple regression its coefficient and contribution differ from zero (see Lipovetsky & Conklin, 2004b). In contrast to OLS, the SVR coefficients have a structure of shares similar to that of the squared pair correlations of the DV with the IVs, which makes SVR a reliable tool for analyzing the predictors and interpreting their impact in the regression (Lipovetsky & Conklin, 2010b, 2014).
5. SVR vs other modern techniques.

SVR has been compared with many other regularized regression methods in multiple studies, and the results of these methods are very close to those of SVR. For example, see comparisons with several kinds of ridge-regression models in (Lipovetsky & Conklin, 2010b, Table 4), with the indices of Gibson-Johnson and Green obtained via orthonormal approximations in (Lipovetsky & Conklin, 2014), with data gradient modeling (Lipovetsky, 2010, 2012), with special parameterization techniques (Lipovetsky, 2009), and with Ehrenberg’s residual error analysis (Lipovetsky, 2013, Table 3). All these approaches yield results close to one another, except OLS, which is not safe from multicollinearity distortion of the parameters.
6. Predictions on new data sets by correlated regressors.

Although OLS gives the best precision for self-prediction on the same data that was used to build the model, SVR can be better for prediction on new data sets. Various simulations of the potential outcomes have been performed and found that SVR prevails by the root mean square error (RMSE) in many cases; these results are described in detail in (Lipovetsky & Conklin, 2010b). In OLS, if a presumably beneficial IV receives a coefficient of the opposite sign, should we use a higher or a lower value of it to improve the outcome? It is hard to believe that the model itself would somehow make a meaningful adjustment and yield an adequate prediction for new data, but with SVR we are free from such assumptions. However, for new data with correlated predictors (and they are always correlated in real life) it makes sense to choose the new values of the predictors taking into account their own correlation structure, as considered in (Lipovetsky, 2017).
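The sensitivity of OLS coefficients to multicollinearity noted in the comparisons above can be illustrated with a small bootstrap experiment. The sketch below is not taken from the cited studies. The function name `bootstrap_ols_coef_std`, the sample size, the number of bootstrap replicates, and the simulated model with both true slopes equal to 1 are all assumptions chosen for illustration, and only the OLS side of the comparison is shown.

```python
import numpy as np


def bootstrap_ols_coef_std(rho, n=200, n_boot=300, seed=0):
    """Bootstrap standard deviations of the two OLS slopes when the two
    predictors have population correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)
    y = X[:, 0] + X[:, 1] + rng.normal(size=n)  # both true slopes equal 1

    slopes = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        A = np.column_stack([np.ones(n), X[idx]])
        beta, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        slopes[b] = beta[1:]  # drop the intercept
    return slopes.std(axis=0)


low = bootstrap_ols_coef_std(rho=0.0)
high = bootstrap_ols_coef_std(rho=0.99)
print("slope std, uncorrelated predictors:", low)
print("slope std, rho = 0.99:", high)
```

With nearly collinear predictors the bootstrap spread of the OLS slopes should be several times larger than with uncorrelated ones, even though the model and noise level are the same. This is the kind of instability that the averaged SV characteristics are reported to avoid.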

3. Conclusions

In summary, SVR gives interpretable results for analyzing the importance and coefficients of individual predictors, its predictions on new data are often better than those of OLS, the standard errors of its coefficients are smaller than in OLS, so the $t$-statistics are better, and the models are robust. Results and experience accumulated over two decades of using the SV approach in many hundreds of real projects permit the conclusion that the SVR technique significantly facilitates analysis of the predictors’ influence in regression models and is very convenient for practical applications.

References

Fragnelli & Sánchez-Soriano, eds. (2020). Handbook of the Shapley Value, Chapman and Hall/CRC, Boca Raton, FL.

Lipovetsky (2006). Entropy Criterion in Logistic Regression and Shapley Value of Predictors, Journal of Modern Applied Statistical Methods, 5, 121-132.

Lipovetsky (2007). Iteratively Re-weighted Random-Coefficient Models and Shapley Regression, Model Assisted Statistics and Applications, 2, 201-212.

Lipovetsky (2009). Linear Regression with Special Coefficient Features Attained via Parameterization in Exponential, Logistic, and Multinomial-Logit Forms, Mathematical and Computer Modelling, 49, 1427-1435.

Lipovetsky (2010). Meaningful regression coefficients built by data gradients, Advances in Adaptive Data Analysis, 2, 451-462.

Lipovetsky (2012). Interpretation of Shapley Value Regression Coefficients as Approximation for Coefficients Derived by Elasticity Criterion, JSM’12, Proceedings of the Joint Statistical Meeting of the American Statistical Association, July-August, 3302-3307, San Diego, CA.

Lipovetsky (2013). How Good is Best? Multivariate Case of Ehrenberg-Weisberg Analysis of Residual Errors in Competing Regressions, Journal of Modern Applied Statistical Methods, 12(2), 242-255.

Lipovetsky (2016). Combined Granger-Koyck Causality Distributed Lag Modeling, International Journal of Operations and Quantitative Management, 22(4), 317-333.

Lipovetsky (2017). Prediction of Percent Change in Linear Regression by Correlated Variables, Journal of Modern Applied Statistical Methods, 16(2), 347-358.

Lipovetsky (2020). Personalized Key Drivers for Individual Responses in Regression Modeling, International Journal of Risk and Contingency Management, 9(3), 15-30.

Lipovetsky & Conklin (2001). Analysis of Regression in Game Theory Approach, Applied Stochastic Models in Business and Industry, 17, 319-330.

Lipovetsky & Conklin (2004a). Decision Making by Variable Contribution in Discriminant, Logit, and Regression Analyses, Information Technology and Decision Making, 3, 265-279.

Lipovetsky & Conklin (2004b). Enhance-Synergism and Suppression Effects in Multiple Regression, International Journal of Mathematical Education in Science and Technology, 35, 391-402.

Lipovetsky & Conklin (2005). Incremental Net Effects in Multiple Regression, International Journal of Mathematical Education in Science and Technology, 36, 361-373.

Lipovetsky & Conklin (2010a). Reply to the paper “Do not adjust coefficients in Shapley value regression”, Applied Stochastic Models in Business and Industry, 26, 203-204.

Lipovetsky & Conklin (2010b). Meaningful Regression Analysis in Adjusted Coefficients Shapley Value Model, Model Assisted Statistics and Applications, 5(4), 251-264.

Lipovetsky & Conklin (2014). Predictor Relative Importance and Matching Regression Parameters, Journal of Applied Statistics, 42, 1017-1031.

Lipovetsky & Nowakowska (2013). Modeling with Structurally Missing Data by OLS and Shapley Value Regressions, International Journal of Operations and Quantitative Management, 19(3), 169-178.

Roth, ed. (1988). The Shapley Value: Essays in Honor of Lloyd S. Shapley. Cambridge University Press: Cambridge.

Shapley, L.S. (1953). A value for n-person games. In: Contributions to the Theory of Games, II, Kuhn, H.W. and Tucker, A.W., eds., Princeton University Press, Princeton, NJ, 307-317.