Abstract
The work describes developments in multiple regression aimed at building models that are resistant to multicollinearity, have meaningful and robust solutions for individual parameters, are convenient to interpret, and predict well. A tool from cooperative game theory, Shapley Value analysis, has been used to estimate the regression coefficients and the relative usefulness of the predictors in a model. This approach has been checked and successfully applied in various real-life data analysis projects for commercial companies. It is useful for decision makers in economics, management, marketing research, and other practical fields.
Introduction
This brief review describes work in multiple regression analysis for cases when a researcher needs to know the comparative importance of the individual predictors in the model. Such an analysis can be difficult to perform because of multicollinearity among the regressors, which produces inflated coefficients and negative contributions to the coefficient of multiple determination from presumably useful regressors. To solve this problem, one of the most famous tools of cooperative (non-antagonistic) game theory has been applied. It was introduced by Lloyd Shapley (1953), who together with his follower Alvin Roth (1988) won the Nobel Prize in economics in 2012.
The Shapley Value (SV) uses a finite combinatorial formula to assign a unique distribution of the total surplus among all the players who produce it in their coalition. In lay terms, the SV allocates the total value of the game to each player by evaluating that player's contribution over all possible coalitions the player can join. The value for an
where the summing goes across all possible subsets of players
so the summation is weighted by a factor that reflects the number
Employing the coefficients of multiple determination
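The idea of applying the Shapley Value to coefficients of multiple determination can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the author's implementation: the "value" of a coalition of predictors is taken to be the R² of an OLS fit on that subset, each predictor's share is its marginal gain in R² averaged over all subsets of the other predictors with the standard Shapley weights, and the function names `r_squared` and `shapley_r2_shares` as well as the synthetic data are hypothetical. The sketch covers only the predictors' shares, not the estimation of SVR coefficients.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_squared(X, y, subset):
    """R^2 of an OLS fit (with intercept) of y on the columns listed in `subset`."""
    if not subset:
        return 0.0  # the empty coalition explains nothing
    A = np.column_stack([np.ones(len(y)), X[:, list(subset)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def shapley_r2_shares(X, y):
    """Shapley Value share of each predictor in the full-model R^2.

    Standard Shapley weight for a subset S of the other predictors:
    |S|! (n - |S| - 1)! / n!, where n is the number of predictors.
    """
    n = X.shape[1]
    shares = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                # marginal gain in R^2 from adding predictor j to subset S
                gain = r_squared(X, y, S + (j,)) - r_squared(X, y, S)
                shares[j] += weight * gain
    return shares


# Synthetic data (an assumption for illustration): two collinear predictors
# driven by a common factor z, plus one independent predictor.
rng = np.random.default_rng(0)
n_obs = 300
z = rng.normal(size=n_obs)
X = np.column_stack([
    z + 0.3 * rng.normal(size=n_obs),
    z + 0.3 * rng.normal(size=n_obs),
    rng.normal(size=n_obs),
])
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n_obs)

shares = shapley_r2_shares(X, y)
full_r2 = r_squared(X, y, (0, 1, 2))
print("shares:", shares)
print("sum of shares:", shares.sum(), "full-model R^2:", full_r2)
```

By construction the shares are non-negative and add up to the R² of the full model. The cost grows exponentially with the number of predictors, so a direct enumeration like this one is practical only for a modest number of regressors.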
Besides applications to multiple linear regression, the SV technique has also been applied in discriminant and logistic regression analysis (Lipovetsky & Conklin, 2004a; Lipovetsky, 2006) and in random-coefficient models (Lipovetsky, 2007). A special clarification that the SV is useful not only for finding the predictors’ shares but also for estimating the regression parameters was given in (Lipovetsky & Conklin, 2010a, b). The use of Shapley Value regression (SVR) in modeling with structurally missing data is considered in (Lipovetsky & Nowakowska, 2013), its application to combined Granger-Koyck causality modeling in (Lipovetsky, 2016), and its application to finding key drivers for individual respondents in (Lipovetsky, 2020).
SVR vs OLS and other methods
A comparison of SVR with ordinary least squares (OLS) and other regression modeling techniques reveals the following differences.
OLS vs SVR. OLS builds an aggregate of the independent variables (IVs) for the best fit of the dependent variable (DV), but it does not consider estimation of the individual contributions of the IVs to the model. To obtain meaningful parameters of individual IVs that are not prone to multicollinearity effects, various techniques have been developed; among the best known are ridge regression, LASSO, elastic nets, and SVR. SVR has the following desirable features. As was noted in (Lipovetsky & Conklin, 2014), only the predicted DV values and
Robustness of parameters. SVR produces not only the contributions of the predictors to the DV variability, but also estimates of the regression coefficients that are very stable under possible changes in the data sample. The standard deviations of the regression coefficients are always smaller, and the t-statistics larger, in SVR than in OLS, so SVR yields statistically significant parameter estimates where OLS fails (for example, Lipovetsky & Conklin, 2001, Table 2). Bootstrap estimation of t-statistics has been used regularly on many dozens of datasets, and SVR always outperformed OLS.
Elasticity. OLS coefficients are usually interpreted as the change in the DV due to a unit change in an IV with all other IVs held constant, so a coefficient is an analogue of elasticity in absolute units. However, the OLS coefficients can be very far from such elasticity values because of multicollinearity, which inflates the parameters of collinear IVs and changes their signs, pushing them in opposite directions. It is possible to build models by the elasticity criterion, cleaning the data of smaller changes and using the approach of data gradients. Comparisons of such models with SVR show that they are very close, so in contrast to OLS the SVR coefficients can be interpreted as elasticities (Lipovetsky, 2010, 2012).
Shares’ structure. Pair correlations are separate measures of the relations of the IVs with the DV, but they do not estimate the contributions of individual predictors to their simultaneous synergic impact on the DV in the total model. Because of such a synergy, the coefficient of multiple determination
SVR vs other modern techniques. SVR has been compared with many other methods of regularized regression in multiple studies, and the results of the different methods are very close to those of SVR. For example, see the comparisons with several kinds of ridge regression models in (Lipovetsky & Conklin, 2010b, Table 4), with the indices of Gibson-Johnson and Green obtained via orthonormal approximations in (Lipovetsky & Conklin, 2014), with data gradient modeling (Lipovetsky, 2010, 2012), with special parameterization techniques (Lipovetsky, 2009), and with Ehrenberg’s residual error analysis (Lipovetsky, 2013, Table 3). All these approaches yield results close to one another, except OLS, which is not safe from the multicollinearity distortion of the parameters.
Predictions on new data sets by correlated regressors. Although OLS gives the best precision for self-prediction on the same data that was used for building the model, SVR can be better for prediction on new data sets. Various simulations of the potential outcomes have been performed, and SVR was found to prevail by the root mean square error (RMSE) in many cases; these results are described in detail in (Lipovetsky & Conklin, 2010b). In OLS, if a presumably beneficial IV receives a coefficient of the opposite sign, should we use its higher or lower value to improve the outcome? It is hard to believe that the model itself would somehow make a meaningful adjustment and yield an adequate prediction for new data, but with SVR we are free from such assumptions.
However, for new data with correlated predictors (and they are always correlated in real life), it makes sense to choose the new values of the predictors taking into account their own correlation structure, as considered in (Lipovetsky, 2017).
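The multicollinearity distortion of OLS coefficients discussed in this section can be illustrated on synthetic data. The sketch below is an assumption-laden toy example, not taken from the cited studies: two nearly identical predictors both have a true coefficient of +1, and the OLS fit is repeated on bootstrap resamples. It shows only the OLS side of the comparison (unstable individual coefficients with a stable sum); it does not implement SVR.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Synthetic data (illustrative assumption): x2 is almost a copy of x1.
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
y = x1 + x2 + rng.normal(size=n)  # both true coefficients equal +1
X = np.column_stack([np.ones(n), x1, x2])


def ols(A, b):
    """Least squares coefficients of b on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef


# Refit OLS on bootstrap resamples of the rows.
n_boot = 500
boot = np.empty((n_boot, 3))
for r in range(n_boot):
    idx = rng.integers(0, n, size=n)
    boot[r] = ols(X[idx], y[idx])

sd_b1 = boot[:, 1].std()
sd_b2 = boot[:, 2].std()
sd_sum = (boot[:, 1] + boot[:, 2]).std()
mean_sum = (boot[:, 1] + boot[:, 2]).mean()
# share of resamples in which at least one coefficient has the wrong sign
neg_share = np.mean((boot[:, 1] < 0) | (boot[:, 2] < 0))

print("bootstrap sd of b1, b2:", sd_b1, sd_b2)
print("bootstrap sd of b1 + b2:", sd_sum, "mean of b1 + b2:", mean_sum)
print("share of resamples with a negative coefficient:", neg_share)
```

The individual coefficients vary widely from resample to resample and often change sign, while their sum stays close to 2. This is the kind of instability that makes individual OLS coefficients hard to interpret when predictors are collinear.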
In summary, SVR gives interpretable results for analyzing the importance and coefficients of individual predictors, its predictions on new data are often better than those of OLS, and the standard errors of its coefficients are smaller than in OLS, so
