In applications of multiple regression, one of the most common goals is to measure the relative importance of each predictor variable. If the predictors are uncorrelated, quantification of relative importance is simple and unique. However, in practice, predictor variables are typically correlated and there is no unique measure of a predictor variable’s relative importance. Using a transformation to orthogonality, new measures are constructed for evaluating the contribution of individual variables to a regression sum of squares. The transformation yields an orthogonal approximation of the columns of the predictor scores matrix. It maximizes the sum of the covariances between two sets of cross-products: those of the individual regressors with the response variable, and those of the transformed orthogonal regressors with the response variable. The new measures are compared with three previously proposed measures through examples and the properties of the measures are examined.
An important question that statistical consultants and researchers commonly face after conducting a multiple regression analysis is which variable contributes most to predicting or explaining the criterion variable. For example, a chemist may raise the question of the relative importance of temperature and concentration in determining the rate of reaction. The term importance is recognized in the literature as having various possible meanings. A predictor may be considered important if the corresponding regression parameter is statistically significant. A second definition judges a predictor as more or less important on the basis of its practical impact on the response. It has been argued that the question of relative importance is even more common than the question of statistical significance (e.g. Healy, 1990).
Numerous methods have been proposed for evaluating the relative importance of regressors. The two most obvious methods are the beta weight method, which simply looks at the beta coefficients of variables that have been standardized to have variances of one, and the zero-order correlation method, which looks at the correlation between individual variables and the response. General statistical packages automatically include these statistics in their output from a regression or correlation analysis, making them convenient to use. However, as noted by Lipovetsky and Conklin (2015), multicollinearity can make these measures practically meaningless since, for example, high collinearity can change signs and inflate the values of regression coefficients in comparison with pair correlations between the regressors and response. Other evaluation methods include product measures (Pratt, 1987), usefulness (Darlington, 1968), structure coefficients (Courville & Thompson, 2001), dominance analysis (Budescu, 1993), orthogonal counterparts (Gibson, 1962; Johnson, 1966), relative weight analysis (Johnson, 2000), Shapley value regression (Lipovetsky & Conklin, 2001) and random forests (Liakhovitski et al., 2010). When predictors are uncorrelated, these measures lead to the same result and have the desirable property that their measures of the individual contributions of the predictor variables sum to $R^2$, the proportion of the variation in the response that the regressors explain. However, the measures can give quite different results for correlated regressors.
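The contrast between uncorrelated and correlated regressors can be illustrated numerically. The following is a minimal sketch on simulated data, not taken from the paper: the helper names (`unit_length`, `beta_and_zero_order`), the sample size and the coefficient values are all assumptions chosen for illustration. It computes beta weights and zero-order correlations, and compares the sums of their squares with $R^2$.

```python
import numpy as np

def unit_length(a):
    """Centre each column and scale it to unit length."""
    a = np.asarray(a, dtype=float)
    a = a - a.mean(axis=0)
    return a / np.linalg.norm(a, axis=0)

def beta_and_zero_order(X, y):
    """Return beta weights, zero-order correlations and R^2."""
    Xs, ys = unit_length(X), unit_length(y)
    r = Xs.T @ ys                            # zero-order correlations with the response
    beta = np.linalg.solve(Xs.T @ Xs, r)     # standardized regression coefficients
    return beta, r, float(r @ beta)          # for standardized data, R^2 = r' beta

rng = np.random.default_rng(0)
n = 200

# Hypothetical uncorrelated regressors: mean-zero orthonormal columns from a QR step.
Q, _ = np.linalg.qr(unit_length(rng.normal(size=(n, 3))))
y_unc = Q @ np.array([0.6, 0.3, 0.1]) + 0.05 * rng.normal(size=n)
beta_u, r_u, r2_u = beta_and_zero_order(Q, y_unc)

# Hypothetical correlated regressors that share a common factor.
f = rng.normal(size=(n, 1))
Xc = f + 0.5 * rng.normal(size=(n, 3))
y_cor = Xc @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
beta_c, r_c, r2_c = beta_and_zero_order(Xc, y_cor)

print("uncorrelated:", np.sum(beta_u**2), np.sum(r_u**2), r2_u)
print("correlated:  ", np.sum(beta_c**2), np.sum(r_c**2), r2_c)
```

In the uncorrelated case the beta weights equal the zero-order correlations and their squares sum to $R^2$; in the correlated case the printed sums need not agree with $R^2$ or with each other.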
Reviews of work on relative importance are given in Johnson and LeBreton (2004), Nathans et al. (2012), Grömping (2007), Lipovetsky and Conklin (2010) and Stadler et al. (2017), and a good older overview is given by Darlington (1968). As noted by Johnson and LeBreton (2004), there is no unique solution to the problem of evaluating relative importance, so identifying good measures must be based on the logic behind their development, their properties and shortcomings, and the apparent sensibility of the results they yield.
In this paper we develop new measures of relative importance and compare them with well-regarded alternatives. The new measures are based on transformations that yield orthogonal variables that are closely related to the original regressors. In consequence they have much in common with the orthogonal counterparts measure proposed by Gibson (1962) and the relative weights measure of Johnson (2000). The main difference is that the new measures use the values of both the regressors and the response in determining the transformation, while the measures of Gibson (1962) and Johnson (2000) ignore the response when determining the transformation and use only the values of the regressors. Intuitively, there should be benefits in letting the response influence the transformation, as the purpose of the transformation is to help evaluate the relationship between regressors and the response.
The new measures proposed here are compared with the orthogonal counterparts measure (Gibson, 1962), the relative weights measure (Johnson, 2000) and the dominance analysis measure proposed by Budescu (1993). The relative weights measure and dominance analysis are widely recommended procedures for estimating the relative importance of predictor variables (Tonidandel & LeBreton, 2010; Nathans et al., 2012). Comparison is made through examples and by examining theoretical properties.
In Section 2 we describe the measures of Gibson (1962), Johnson (2000) and Budescu (1993) and add insights into these measures. In Section 3 we describe the new measures and in Section 4 they are compared with other measures. Some of the measures have a rotation invariance property, whereby an orthogonal rotation can be applied to some variables without affecting the relative weights assigned to un-rotated variables. The rotation invariance property is described in Appendix A. Concluding comments are given in Section 5.
Three measures of relative importance
In this section we describe the orthogonal counterparts measure, the relative weights measure and the dominance analysis measure. The orthogonal counterparts measure and the relative weights measure each form the basis of new measures that we propose in Section 3.
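We assume that the response, $Y$, and regressors $X_1, \ldots, X_p$ are related through the regression equation
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon, \qquad (1)$$
where $\varepsilon$ is random error with mean 0 and variance $\sigma^2$. We suppose there are $n$ data, so that the model can be written in matrix form as:
$$\mathbf{y} = \beta_0 \mathbf{1} + \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad (2)$$
where $\mathbf{1}$ is an $n \times 1$ vector of 1’s, $\mathbf{y}$ is an $n \times 1$ vector of responses, $\mathbf{X}$ is an $n \times p$ matrix of known values of $X_1, \ldots, X_p$, $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)^\top$ is a $p \times 1$ vector of regression coefficients (whose values are unknown) and $\boldsymbol{\varepsilon}$ is an $n \times 1$ vector of independent random errors. The coefficient $\beta_0$ is irrelevant for regressors’ relative importance, so, to simplify notation, throughout this article we assume that $\mathbf{y}$ and $\mathbf{X}$ have been centered to have sample means of 0. Then the least squares estimate of $\boldsymbol{\beta}$ is $\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$ and $\mathrm{var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}^\top\mathbf{X})^{-1}$.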
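The centred least squares fit described above can be sketched as follows. This is an illustrative example on simulated data; the variable names, the true coefficients and the unbiased variance estimator with divisor n - p - 1 are assumptions, not details taken from the paper. It also checks that centring leaves the slope estimates unchanged relative to a fit that includes an intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3

# Hypothetical raw data with a non-zero intercept.
X_raw = rng.normal(size=(n, p))
y_raw = 2.0 + X_raw @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.3, size=n)

# Centre the response and each regressor so the intercept drops out.
X = X_raw - X_raw.mean(axis=0)
y = y_raw - y_raw.mean()

# Least squares estimate and its estimated covariance matrix.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p - 1)      # one degree of freedom lost to the mean
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)

# Reference fit on the raw data with an explicit column of ones.
A = np.column_stack([np.ones(n), X_raw])
full, *_ = np.linalg.lstsq(A, y_raw, rcond=None)

print(beta_hat)
print(full[1:])   # slopes from the fit with an intercept
```

The two printed slope vectors agree, which is why the intercept can be ignored when assessing relative importance.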
Orthogonal counterparts (OC) measure
Gibson (1962) and Johnson (1966) suggested a method for obtaining a set of orthonormal predictors that are closely related on a one-to-one basis with the original set of predictors. The new predictors can be considered as ‘orthogonal counterparts’ to the original regressors. To approximate the relative importance of the original predictors, the response variable is regressed on the new orthonormal variables. The proportion of the predictable variance in the response that is accounted for by each orthogonal counterpart can be taken as the importance measure of the original regressors. Details are as follows.
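Suppose a set of orthonormal vectors $\mathbf{z}_1, \ldots, \mathbf{z}_p$ must be chosen to
$$\text{maximise } \sum_{j=1}^{p} \mathbf{x}_j^\top \mathbf{z}_j, \qquad (3)$$
where $\mathbf{x}_j$ denotes the $j$th column of the centred regressor matrix $\mathbf{X}$, and let $\mathbf{A}$ be the symmetric square-root of $\mathbf{X}^\top\mathbf{X}$. (That is, $\mathbf{A}$ is a symmetric matrix whose diagonal elements are positive and $\mathbf{A}^2 = \mathbf{X}^\top\mathbf{X}$.) Then, putting $\mathbf{Z} = (\mathbf{z}_1, \ldots, \mathbf{z}_p)$, it can be shown (see, for example, Garthwaite et al., 2012) that
$$\mathbf{Z} = \mathbf{X}\mathbf{A}^{-1}. \qquad (4)$$
Each column of $\mathbf{Z}$ has a mean of 0 since each column of $\mathbf{X}$ has a mean of 0.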
Gibson (1962) and Johnson (1966) assume that the have been standardised to each have the same sample variance, when the maximisation in Eq. (3) is equivalent to
where ‘cor’ denotes sample correlation, and so it is also equivalent to
where is the residual when is regressed on a single predictor with sample values . Based on Eq. (5), Gibson (1962) describes as “the set of orthogonal factors …having the highest degree of one-to-one correspondence with the correlated predictors.” Based on Eq. (6), are the best-fitting orthogonal representation of (Johnson, 2000) and are termed the ‘orthogonal counterparts’ of .
In the remainder of this paper we will assume that and each variable have been standardised to have unit length. That is, for . Let denote the vector of regression coefficients from regressing on , so . Then is called the beta weight of () and the squared beta weight, , is the variation in that is explained by . Hence the squared beta weights are a natural measure of the relative importance of the variables. Each variable is paired with an variable, and the Orthogonal Counterparts (OC) measure takes these squared beta weights as a measure of the importance of the variables, defining the relative importance of as . The sum of these importance weights equals the variation in that is explained by a multiple linear regression with as the independent variables (or, equivalently, with as the independent variables).
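The OC calculation can be sketched in a few lines. This is a minimal illustration on simulated data, assuming the orthogonal counterparts are obtained by post-multiplying the standardized regressor matrix by the inverse symmetric square root of its cross-product matrix; the function names and the data-generating values are hypothetical.

```python
import numpy as np

def unit_length(a):
    """Centre each column and scale it to unit length."""
    a = np.asarray(a, dtype=float)
    a = a - a.mean(axis=0)
    return a / np.linalg.norm(a, axis=0)

def orthogonal_counterparts(X):
    """Return Z = X (X'X)^{-1/2}, using the symmetric square root."""
    evals, evecs = np.linalg.eigh(X.T @ X)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return X @ inv_sqrt

def oc_importance(X, y):
    """Squared beta weights from regressing y on the orthogonal counterparts."""
    Xs, ys = unit_length(X), unit_length(y)
    Z = orthogonal_counterparts(Xs)
    return (Z.T @ ys) ** 2      # Z is orthonormal, so the coefficients are Z'y

# Hypothetical correlated regressors.
rng = np.random.default_rng(3)
n = 500
f = rng.normal(size=(n, 1))
X = 0.7 * f + rng.normal(size=(n, 4))
y = X @ np.array([1.0, 0.5, 0.0, -0.5]) + rng.normal(size=n)

weights = oc_importance(X, y)
Xs, ys = unit_length(X), unit_length(y)
Z = orthogonal_counterparts(Xs)
r2 = float(ys @ Xs @ np.linalg.solve(Xs.T @ Xs, Xs.T @ ys))

print(weights, weights.sum(), r2)
```

The weights sum to the $R^2$ of the regression on the original regressors, because the counterparts span the same space as the original columns.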
Johnson (2000) argues that the OC measure can assign relative weights that are inappropriate when the original variables are highly correlated, and gives examples where some variables are assigned weights that seem too low. However, the OC measure appears to give sensible weights to the variables when the correlations between variables are not high. Also, recent work by Garthwaite and Koch (2016) implies that the OC measure has an attractive ‘rotation invariance’ property. When some variables have strong collinearities, they can be transformed into non-collinear variables via orthogonal rotation of coordinate axes. Only axes corresponding to variables involved in the collinearities need to be rotated, and Garthwaite and Koch (2016) show that the rotation has no effect on the Z-variables that correspond to un-rotated axes. The predictable variation in is also unaffected by the rotation, so the OC measure has the property that the relative importance is unchanged for those variables associated with un-rotated axes. (Further detail is given in Appendix A.) This has the following implications for the OC measure.
Sometimes collinear variables can be transformed into meaningful variables that are not collinear through a rotation of the axes associated with them. This can lead to relative weights that are a transparently reasonable representation of the importance of the different variates. Moreover, the relative weights are unchanged for those variables that are not involved in the rotation.
Since axes could be rotated to remove collinearities without affecting the relative weights of the other variables, multicollinearities do not affect the relative weights that the OC measure gives to variables not involved in the collinearities.
Garthwaite et al. (2012) suggest a criterion for choosing that is similar, but not identical, to the criterion in Eq. (3): Choose to
under the constraint that they are orthonormal vectors and for . They refer to the transformation from to the resulting as the cos-square transformation. It has an attractive duplicate invariance property. Suppose the set of vectors is increased by adding the set of vectors where each of the vectors is identical to . With the cos-square transformation, this duplication of has no effect on the transformed values of (i.e. are unchanged).
Thus, for example, if and are measurements of, say, a patient’s blood pressure before and after a meal, then the two variables will be very highly correlated. The duplicate invariance property means that whether one or both blood pressure measurements are included in the regression model has little impact on those variables that are paired with the other variables. As the orthogonal variables that maximize Eq. (3) will generally be very similar to those that maximize Eq. (7), we might expect the OC measure to usually be fairly insensitive to variable duplication.
Relative Weights (RW) measure
The Relative Weights (RW) measure of Johnson (2000) is based on the same variables that are calculated for the OC measure. That is, are the orthonormal variables that maximize and, as they are orthogonal, the relative importance of in predicting is clearly . However, while the OC measure simply takes as the relative importance of , the measure of Johnson (2000) takes into account all the correlations between the and variables. From the criterion that determines the variables, the correlation between and should be high, but this correlation could still be well below 1 if the variables display collinearities or high intercorrelations. Also, might not be the only variable that has a marked correlation with .
Let denote the correlation between and . The transformation from to has the unexpected property that for all (see, for example, Johnson, 1966). This leads to the useful consequence that . The RW measure divides the relative importance of amongst the variables according to the square of their correlations with , so the relative importance weight that derives from is . (This indeed partitions the relative importance of , as .) The full relative importance weight of is obtained by summing the relative importance weights that it derives from all the variables. Thus, under the RW measure, the relative importance of is given by
Fig. 1. Relationships between the , and variables for three regressors when is regressed on the variables and each variable is regressed on the variables.
The in Eq. (8) may be regarded as the squares of regression coefficients rather than the squares of correlation coefficients, as
when is regressed on . When Johnson (2000) proposed the RW measure he used the regression model in Eq. (9) to motivate its construction. However, we prefer to view the as squared correlations because correlation is a symmetric relationship while the regression in Eq. (9) is a one-directional relationship and shows how the variables determine . When viewed as a regression, the relationships between the , and variables are illustrated in Fig. 1, which shows no direct link between the and variables. When the are viewed as squared correlations, the links between the and variables are two-directional associations, thus giving links from the variables to .
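The RW calculation can be sketched as follows, assuming the standard form of Johnson's (2000) measure: the squared beta weights of the orthogonal counterparts are redistributed to the original regressors using the squared correlations between each regressor and each counterpart. The function names and the simulated data are hypothetical.

```python
import numpy as np

def unit_length(a):
    """Centre each column and scale it to unit length."""
    a = np.asarray(a, dtype=float)
    a = a - a.mean(axis=0)
    return a / np.linalg.norm(a, axis=0)

def rw_importance(X, y):
    """Relative weights, plus the matrix of regressor/counterpart correlations."""
    Xs, ys = unit_length(X), unit_length(y)
    evals, evecs = np.linalg.eigh(Xs.T @ Xs)
    # Symmetric square root of X'X; entry (j, k) is the correlation
    # between regressor j and orthogonal counterpart k.
    Lam = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    Z = Xs @ np.linalg.inv(Lam)          # orthogonal counterparts
    beta_z = Z.T @ ys                    # beta weights of the counterparts
    return (Lam ** 2) @ (beta_z ** 2), Lam

# Hypothetical correlated regressors.
rng = np.random.default_rng(4)
n = 500
f = rng.normal(size=(n, 1))
X = 0.7 * f + rng.normal(size=(n, 4))
y = X @ np.array([1.0, 0.5, 0.0, -0.5]) + rng.normal(size=n)

rw, Lam = rw_importance(X, y)
Xs, ys = unit_length(X), unit_length(y)
r2 = float(ys @ Xs @ np.linalg.solve(Xs.T @ Xs, Xs.T @ ys))

print(rw, rw.sum(), r2)
```

Because the correlation matrix `Lam` is symmetric and its squared entries sum to one along each row and column, the relative weights partition $R^2$ exactly.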
Applications in which the RW measure has been used are reported in Johnson and LeBreton (2004) and Krasikova et al. (2011). Part of the attraction of the RW measure is that it typically gives similar results to the dominance analysis measure of Budescu (1993), even though Budescu’s measure and the RW measure are calculated in very different ways. As Johnson (2000, p.15) suggests, “it is encouraging that two measures that have very different definitions and calculations produce very similar solutions”, and Johnson and LeBreton (2004, p.251) argue that the closeness of results indicates that the two measures are measuring the same construct. We next describe the dominance analysis measure.
Dominance analysis (DA) measure
The Dominance Analysis (DA) measure evaluates the importance of a regressor by considering the increase in $R^2$ that results from adding that regressor to regression submodels, where $R^2$ is the proportion of the variation in the response that is explained by the regression. For correlated regressors, the increase in $R^2$ will generally depend upon which regressors are in the submodel before the regressor is added. The DA measure considers all the different submodels that could be formed from every possible subset of variables that excludes the regressor. It defines the weight (relative importance) of the regressor as the average increase in $R^2$ from adding it to each of these submodels.
The DA measure was proposed by Budescu (1993) and is sometimes referred to as the ‘general dominance measure’. It is equivalent to a measure developed by Lindeman et al. (1980). The measure is well-regarded. For example, Johnson (2000, p.4) writes that “The average increase in $R^2$ associated with the presence of a variable across all possible models is a meaningful measure that fits the definition of relative weight”. At the same time, it could be argued that the importance of a regressor in a particular regression model should not be determined by its importance in smaller submodels, but by its importance in the full model. As noted earlier, the DA measure and the RW measure are generally very similar in the weights they assign to variables, although an example in Section 4 illustrates that this is not always the case.
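A brute-force version of the DA calculation can be sketched as follows. The sketch assumes the averaging used for general dominance, in which increases in $R^2$ are averaged first within each submodel size and then across sizes; this is the same weighting as averaging over all orderings of the regressors, as in the Lindeman et al. measure. The function names and the two-regressor correlation values are hypothetical.

```python
import numpy as np
from itertools import combinations
from math import factorial

def r_squared(R_xx, r_xy, subset):
    """R^2 of the regression on the regressors in `subset`, from correlations."""
    if not subset:
        return 0.0
    idx = list(subset)
    r = r_xy[idx]
    return float(r @ np.linalg.solve(R_xx[np.ix_(idx, idx)], r))

def dominance_analysis(R_xx, r_xy):
    """General dominance weights from all submodels that exclude each regressor."""
    p = len(r_xy)
    weights = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Weight that averages within each submodel size, then across sizes.
            w = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                gain = r_squared(R_xx, r_xy, S + (j,)) - r_squared(R_xx, r_xy, S)
                weights[j] += w * gain
    return weights

# Hypothetical correlations: the second regressor adds nothing once the first is included.
R_xx = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
r_xy = np.array([0.6, 0.3])

da = dominance_analysis(R_xx, r_xy)
print(da)                                   # approximately [0.315, 0.045]
print(r_squared(R_xx, r_xy, (0, 1)))        # approximately 0.36
```

In this small example the second regressor has a zero coefficient in the full model yet still receives a positive weight, because it explains some variation when used on its own.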
The most commonly stated criticism of the DA measure is that it is computationally demanding. This is because there are regression models that should be fitted in order to evaluate the relative importance of each variable. When Lindeman et al. proposed their relative importance measure in 1980, fitting models with all combinations of variables was only practical when the number of variables was fewer than 5 or 6. Since then, advances in computer power have substantially increased that number, so currently it takes only 0.28 seconds to fit all submodels of 12 regressors using software developed by Grömping (2006). However, it was not possible to use Grömping’s software with models containing 25 regressors, and with 20 regressors the general dominance measure could not be calculated. Moreover, the number of variables that are included in regression models has increased substantially, especially with interest in ‘big data’. One possibility is to examine a sample of submodels rather than examining all possible submodels. This approach has proved effective when using Shapley value regression to measure relative importance (Conklin et al., 2004), another measure where, in principle, all possible submodels should be examined. Simulations we have conducted suggest that the DA measure can be well-approximated by examining 500 random sequences for entering variables into the regression model.
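The random-sequence approximation can be sketched as follows. This is one plausible implementation, not the authors' simulation code: each random ordering of the regressors is entered one variable at a time, the increase in $R^2$ at each step is credited to the variable just entered, and the credits are averaged over orderings. The function names, the seed and the simulated correlation matrix are assumptions.

```python
import numpy as np

def r_squared(R_xx, r_xy, subset):
    """R^2 of the regression on the regressors in `subset`, from correlations."""
    idx = list(subset)
    if len(idx) == 0:
        return 0.0
    r = r_xy[idx]
    return float(r @ np.linalg.solve(R_xx[np.ix_(idx, idx)], r))

def da_by_random_orderings(R_xx, r_xy, n_orderings=500, seed=0):
    """Approximate general dominance weights from random entry sequences."""
    rng = np.random.default_rng(seed)
    p = len(r_xy)
    total = np.zeros(p)
    for _ in range(n_orderings):
        order = rng.permutation(p)
        previous = 0.0
        for pos in range(p):
            current = r_squared(R_xx, r_xy, order[:pos + 1])
            total[order[pos]] += current - previous   # credit the newly entered variable
            previous = current
    return total / n_orderings

# Hypothetical data: a response in column 0 and six regressors.
rng = np.random.default_rng(11)
data = rng.normal(size=(400, 7))
data[:, 0] = data[:, 1:] @ np.array([0.5, 0.4, 0.3, 0.2, 0.1, 0.0]) + rng.normal(size=400)
R = np.corrcoef(data, rowvar=False)
R_xx, r_xy = R[1:, 1:], R[1:, 0]

approx = da_by_random_orderings(R_xx, r_xy, n_orderings=500)
full_r2 = r_squared(R_xx, r_xy, range(6))
print(approx, approx.sum(), full_r2)
```

Within every ordering the increments telescope to the full-model $R^2$, so the approximate weights always sum to $R^2$ regardless of how many orderings are sampled; only the split between variables is subject to Monte Carlo error.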
New measures of relative importance
Three new measures are proposed here. All are based on transformations that yield orthogonal variables – the first and third are very similar to the OC measure of Gibson (1962) and Johnson (1966); the second is very similar to the RW measure of Johnson (2000). The main difference is that the new measures use transformations that are determined by cross-products of the and variables, rather than ignoring in choosing the transformation. The third new measure uses weights to alter the balance of the different cross-products when forming orthogonal variables.
The estimated regression coefficient for regressing on is
and the regression sum of squares (RegSS) is
Let and let be an diagonal matrix with diagonal elements . The RegSS can also be rewritten as:
where is a vector of ones.
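The two expressions for the regression sum of squares can be checked numerically. The sketch below assumes that the diagonal matrix holds the centred responses on its diagonal, so that multiplying it by a vector of ones recovers the response vector; that reading, the variable names and the simulated data are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 50, 3

# Hypothetical centred data.
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
y = X @ np.array([0.8, -0.4, 0.2]) + rng.normal(scale=0.5, size=n)
y -= y.mean()

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Usual form: RegSS = y' X (X'X)^{-1} X' y.
reg_ss = float(y @ X @ beta_hat)

# Rewritten form with D = diag(y) and a vector of ones, since y = D 1.
D = np.diag(y)
ones = np.ones(n)
reg_ss_diag = float(ones @ D @ X @ XtX_inv @ X.T @ D @ ones)

# Sample-by-sample view: observation i contributes y_i times its fitted value.
per_sample = y * (X @ beta_hat)

print(reg_ss, reg_ss_diag, per_sample.sum())
```

Writing the sum of squares through the diagonal matrix makes the cross-products of the response with each regressor explicit, which is the quantity the new transformation works with.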
Both the OC and RW measures construct orthogonal vectors that correspond closely to the original predictors on a one-to-one basis. The way the are chosen ignores the values of , even though the reason for constructing is to partition the RegSS. With our new measures, a set of orthogonal vectors is chosen so that is closely related to . Suppose is regressed on and that is the th component of . Then ’s contribution to the RegSS from the th sample is . Our new measures take as a first estimate of the contribution of to the RegSS from the th sample. (The OC and RW measures equivalently take as a first estimate of ’s contribution to the RegSS from the th sample, where is the th component of .) Hence, as is the th component of , it is appropriate to focus on in the criterion for choosing . It is for this reason that we want and to be closely related.
We also want to be a linear transformation of , so that regression models with as explanatory variables and with as explanatory variables give identical predictions, residuals and regression sums of squares. Hence, analogous to Eq. (3), we choose so that is maximized subject to the constraints that and for some non-singular matrix . The following result is proved in Appendix B.
Theorem 1. Under the constraints that and , the value of that maximizes is
where
and
We should note that the variables are standardised but typically varies with . Hence the variables are given equal importance in maximising (as with the OC and RW measures) but here, in maximising , is given greater importance when is large than when it is small. This has the benefit that those variables that are most highly correlated with are given greater weight when choosing the . (We could scale the variables so that is the same for each , but that would lose this benefit.)
First new measure (NM1)
In the same way that the OC measure views as the counterpart of (), our first New Measure (NM1) views as the counterpart of (). The RegSS when is regressed on is . As are a set of orthonormal vectors, is the RegSS both when is regressed on and when is regressed on . NM1 defines the relative importance of as
Like the OC measure, NM1 has a rotation invariance property. Specifically, if an orthogonal rotation is applied to some of the variables, the relative importance of the other variables is unchanged if relative importance is measured using NM1. This result is proved in Appendix A, where further detail of rotation invariance is given. As with the OC measure, it means that collinearities do not affect the relative importances that NM1 gives to variables not involved in the collinearities.
Second new measure (NM2)
While NM1 allocates all the RegSS of to , our second new method, NM2, divides the RegSS of between the variables according to their association with . As , from Eq. (13) we have that . (This is an attractive representation of because is a set of orthonormal vectors and is an orthogonal matrix.) Thus,
As noted in Section 3, correspond closely to on a one-to-one basis, so should generally be highly correlated with . Also, as and are orthogonal for , typically will not be closely associated with for .
NM2 divides the RegSS of between to reflect the squares of the sample correlations between and (). Let denote the sample correlation between and . It is readily shown that
The proportion of ’s RegSS that NM2 attributes to is , so NM2 defines the relative importance of as:
If has low correlations with the other variables, the NM1 and NM2 measures will give similar relative importance to . However, the relative importances that they assign to can differ markedly if is highly correlated with some of the variables. This will be seen in Section 4.
Third new measure (NM3)
If an variable has a small regression coefficient in the multiple regression of on all the variables, then dropping that variable from the regression model can be attractive. With the NM1 and NM2 measures (and also the OC and RW measures), the orthogonal counterparts of all variables can change markedly if any variables are discarded, which might be undesirable in some situations. Our third new measure, NM3, takes account of the size of regression coefficients when forming orthogonal counterparts, so that the inclusion or exclusion of variables with small regression coefficients has little effect on the orthogonal counterparts of other variables.
As in Eq. (10), let denote the estimated regression coefficient for regressing on and put . While NM1 and NM2 choose to maximize , with NM3 we choose to maximize . Thus, with NM3, the importance of the correlation between and depends upon the size of .
If we let , then , and the maximization problem is analogous to the maximisation problem addressed in Theorem 1. Put . Then replacing with and with in Eqs (13)–(15) yields the value of that maximizes .
NM3 views as the counterpart of and evaluates the relative importance of as the value of when is regressed on . Thus,
The NM3 and the DA measures are the only measures we examine that explicitly use the multiple regression of on . Using this regression model seems sensible, since the purpose of the measures is to evaluate the contribution of each variable to this regression.
Examples
In this section we apply the measures of relative importance to several datasets. In Section 4.1 we examine straightforward application of the measures, using three datasets that have clear structures. In Section 4.2 we examine how relative importance changes under orthogonal rotation of some variables and under variable selection.
Fixed models
Each dataset consists of 1000 data drawn from a multivariate normal distribution with a mean vector of zeros and variance-covariance matrix , where varies with the dataset. The first component of a datum is the response, , and the other components are the explanatory variables, . We first describe each dataset by giving the sample correlation matrix , the multiple regression model that relates to the explanatory variables, the value of for that regression, and the regression coefficients for univariate regressions when is regressed separately on one -variable at a time. We also note salient features of the dataset. After this brief description of the datasets, we tabulate the relative importance that the different measures allocate to each variable. The results are then discussed.
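The data-generation and summary steps can be sketched as follows. The covariance matrix below is a hypothetical stand-in, since the matrices used for the paper's examples are not reproduced here; the seed and variable names are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical variance-covariance matrix; row/column 0 is the response.
Sigma = np.array([[1.0, 0.8, 0.3, 0.2],
                  [0.8, 1.0, 0.4, 0.3],
                  [0.3, 0.4, 1.0, 0.5],
                  [0.2, 0.3, 0.5, 1.0]])

# Draw 1000 observations from a zero-mean multivariate normal distribution.
data = rng.multivariate_normal(np.zeros(4), Sigma, size=1000)

# Sample correlation matrix, split into regressor and response parts.
R = np.corrcoef(data, rowvar=False)
R_xx, r_xy = R[1:, 1:], R[1:, 0]

# Standardized multiple regression coefficients and R^2.
beta_std = np.linalg.solve(R_xx, r_xy)
r2 = float(r_xy @ beta_std)

# With standardized variables, each univariate slope equals the correlation.
univariate_slopes = r_xy

print(np.round(R, 3))
print(np.round(beta_std, 3), round(r2, 3))
print(np.round(univariate_slopes, 3))
```

Everything the relative importance measures need can be computed from the sample correlation matrix, which is why each example is summarized in that form.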
Example 1. In this first dataset, correlates highly with and its correlations with and are much lower. Also, has much the biggest regression coefficient in a multiple regression of on , and . There is marked correlation between the variables.
The sample correlation matrix is
The fitted standardized multiple regression model is:
and the univariate regression models are
Example 2. There are just two explanatory variables in this dataset. The variable is highly correlated with but uncorrelated with . Together, and give a multiple regression equation that predicts perfectly.
The sample correlation matrix is
The fitted standardized multiple regression model is:
and the univariate regression models are
Example 3. In this example, is almost a perfect linear function of the last five variables () while is more highly correlated with than the other variables. Also, the largest correlations between the variables are the correlations involving .
The sample correlation matrix is
The fitted standardized multiple regression model is:
and the univariate regression models are
Results from the three examples
The relative importances assigned to each variable by the six measures are given for each example in Table 1. Advocates of the RW measure argue that one of its strengths is that it generally gives similar results to the DA measure. Table 1 shows that this was also the case for our examples, but the table shows that the NM2 measure also gave similar results to DA. Indeed, for Examples 1 and 3 the relative importances assigned by DA are a little closer to those of NM2 than to those of RW. The results of the other measures (OC, NM1 and NM3) are often fairly similar to each other, especially those of OC and NM3, as in Examples 1 and 2. At the same time, NM1 is notably similar to DA in Example 3, while NM3 gives radically different results to all other measures in that example.
In each of the examples, at least one variable’s contribution to predicting was small but it was correlated with variables that were better predictors. Then the variable’s relative importance was generally higher when measured by RW, DA or NM2 than when measured by OC, NM1 or NM3. This can be seen in Example 1, where and are poor predictors, and in Example 2, where is a poor predictor. The NM1 and NM3 measures, though conceptually quite similar to the OC measure, can give evaluations that are clearly more sensible than those of the OC measure. This is illustrated in Example 2, where the OC measure evaluates the relative importance of as 100% and the relative importance of as 0%. This is inappropriate, since on its own cannot explain all the variation in , while the combination of and can explain all the variation in , clearly showing that contributes usefully to the multiple regression model. The NM1 and NM3 measures evaluate the contribution of as small, but non-zero. The larger values given to by the RW, DA and NM2 measures are perhaps a better reflection of ’s contribution, since on its own explains 20.3% of the variation in .
In Example 3, it is arguable whether is useful for predicting . On the one hand, makes little contribution to the multiple regression model while, on the other hand, it is the best univariate predictor of . NM3 gives a relative importance that is close to 0, which might be considered appropriate in view of the multiple regression model. Other measures give it a much higher relative importance; indeed, DA and NM1 evaluate it as the most important predictor which, to the writers, seems inappropriate. Example 3 also shows that the RW and DA measures are not always in close agreement: while DA evaluates as the most important variable in the regression model, RW evaluates it as the least important.
Table 1. Relative importances given by the orthogonal counterparts (OC), relative weights (RW) and dominance analysis (DA) measures and by three new measures (NM1, NM2 and NM3) in Examples 1–3

              OC      RW      DA      NM1     NM2     NM3
Example 1
  Variable 1  0.856   0.642   0.665   0.764   0.669   0.864
  Variable 2  0.008   0.115   0.101   0.061   0.105   0.000
  Variable 3  0.002   0.108   0.099   0.040   0.091   0.001
Example 2
  Variable 1  1.000   0.798   0.798   0.930   0.830   0.989
  Variable 2  0.000   0.202   0.202   0.070   0.170   0.011
Example 3
  Variable 1  0.124   0.137   0.170   0.172   0.159   0.001
  Variable 2  0.163   0.160   0.152   0.143   0.147   0.184
  Variable 3  0.155   0.153   0.145   0.146   0.149   0.177
  Variable 4  0.167   0.164   0.158   0.160   0.162   0.199
  Variable 5  0.163   0.160   0.155   0.160   0.161   0.187
  Variable 6  0.165   0.162   0.156   0.155   0.157   0.188
Orthogonal rotation and variable selection
Two examples are examined in this section. In the first, two of the explanatory variables are highly correlated and we consider both the model with the original variables and the model that results from rotating the correlated variables. Measures of relative importance are applied to both models and their differences are examined. In the second example, one variable has a regression coefficient that does not differ significantly from 0 (at the 5% level of significance). We examine how dropping this variable from the model affects the relative importances of the other variables.
Example 4 (Orthogonal rotation). The Longley dataset (Longley, 1967) is well-used as an example of highly collinear regression. The dataset contains annual values of various US macroeconomic variables for the years 1947–1962. Here we use five of its variables: (number of thousands of people employed), (GNP price deflator), (GNP in millions of dollars), (number of thousands of unemployed people) and (number of people in the armed forces). We take as the response variable and initially take the other four variables as the explanatory variables.
The following is the sample correlation matrix for these variables:
The fitted standardized multiple regression model is:
for which .
The correlation matrix shows that there is a strong collinearity between two of the explanatory variables, and . Collinearity can radically affect the values of parameter estimates and will inflate their variances. Transforming variables to remove collinearity is consequently attractive and here we replace and by the variables
This is equivalent to multiplying the original variables by the orthogonal rotation matrix,
The new variables and are uncorrelated.
Regressing on the transformed set of variables gives the equation
Theory implies that the regression coefficients of the unrotated components ( and ) should be unchanged – comparison of Eqs (21) and (22) shows that this is indeed the case. Also, the value is again 0.986. However, with some measures of relative importance, the importances of and in the pre-rotation model (Eq. (21)) will differ from their importances in the post-rotation model (Eq. (22)). This can be seen in Table 2, where the relative importances given by our six measures of importance are presented.
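The invariance described above can be checked on simulated data. The sketch below is not based on the Longley data: it uses hypothetical regressors in which the first two are nearly collinear, and it assumes a 45-degree rotation of the two collinear columns (scaled sum and difference), with no re-standardization after rotation. It then compares regression coefficients and OC weights before and after the rotation.

```python
import numpy as np

def unit_length(a):
    """Centre each column and scale it to unit length."""
    a = np.asarray(a, dtype=float)
    a = a - a.mean(axis=0)
    return a / np.linalg.norm(a, axis=0)

def oc_weights(X, y):
    """Squared coefficients of y on Z = X (X'X)^{-1/2}."""
    evals, evecs = np.linalg.eigh(X.T @ X)
    Z = X @ (evecs @ np.diag(evals ** -0.5) @ evecs.T)
    return (Z.T @ y) ** 2

# Hypothetical data: columns 0 and 1 are nearly collinear.
rng = np.random.default_rng(7)
n = 300
common = rng.normal(size=n)
X = np.column_stack([
    common + 0.1 * rng.normal(size=n),
    common + 0.1 * rng.normal(size=n),
    0.3 * common + rng.normal(size=n),
    rng.normal(size=n),
])
y = X @ np.array([0.5, 0.3, 0.4, 0.2]) + rng.normal(size=n)
Xs, ys = unit_length(X), unit_length(y)

# Orthogonal rotation acting only on the first two axes (assumed 45 degrees).
c = 1.0 / np.sqrt(2.0)
R = np.eye(4)
R[:2, :2] = [[c, -c], [c, c]]
X_rot = Xs @ R        # columns 0, 1 become scaled sum and difference

beta_before = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
beta_after = np.linalg.solve(X_rot.T @ X_rot, X_rot.T @ ys)
oc_before, oc_after = oc_weights(Xs, ys), oc_weights(X_rot, ys)

print(X_rot[:, 0] @ X_rot[:, 1])      # essentially zero: rotated columns are uncorrelated
print(beta_before[2:], beta_after[2:])
print(oc_before, oc_after)
```

The coefficients and OC weights of the two un-rotated variables are the same before and after rotation, and the OC weights sum to the same $R^2$ in both cases; only the split between the two rotated variables changes.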
Table 2. Relative importances of variables before and after rotation

                      OC      RW      DA      NM1     NM2     NM3
Relative importances before rotation
  Variable 1          0.400   0.390   0.390   0.527   0.361   0.088
  Variable 2          0.526   0.417   0.411   0.371   0.393   0.888
  Variable 3          0.023   0.099   0.104   0.046   0.139   0.005
  Variable 4          0.036   0.079   0.081   0.042   0.092   0.004
Relative importances after rotation
  Rotated variable 1  0.922   0.682   0.687   0.891   0.550   0.967
  Rotated variable 2  0.004   0.014   0.015   0.007   0.183   0.004
  Variable 3          0.023   0.161   0.156   0.046   0.146   0.004
  Variable 4          0.036   0.128   0.128   0.042   0.107   0.011
In line with theory, the table shows that the relative importances given by the OC and NM1 measures to and are unchanged by the rotation of and . With the other measures, the relative importances given to and do change, though the degree of change varies with the measure. With NM3 the importance values change by a large proportion (e.g. from 0.004 to 0.011), though the changes are small in absolute terms. With the RW and DA measures the changes are quite large – noticeably larger (at least three times larger) than with the NM2 measure. Interestingly, values given by the NM2 measure are straddled by the before/after values given by the RW and DA measures, and are quite close to the averages of the before/after values given by both the RW measure and the DA measure. For example, the RW measure gives before/after values of 0.079 and 0.128 to , and their average is relatively close to the values 0.092 and 0.107 that NM2 gives to .
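The invariance of the regression fit seen in Example 4 can be checked numerically. The following is a minimal Python sketch, not the authors' code: it uses simulated data rather than the Longley data, and the names `X`, `Q`, `W`, `fit` and `standardize` are illustrative assumptions. It replaces two highly correlated standardized regressors by their scaled sum and difference and confirms that the coefficients of the unrotated regressors and the value of $R^2$ are unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: columns 0 and 1 are strongly correlated, columns 2 and 3 are not.
common = rng.normal(size=n)
X = np.column_stack([
    common + 0.2 * rng.normal(size=n),
    common + 0.2 * rng.normal(size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])
y = X @ np.array([0.5, 0.3, -0.4, 0.2]) + rng.normal(size=n)


def standardize(a):
    """Centre to mean 0 and scale to unit sample variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def fit(X, y):
    """Least-squares fit without intercept (data are centred); returns (beta, R^2)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, 1.0 - (resid @ resid) / (y @ y)


X, y = standardize(X), standardize(y)

# Block-diagonal orthogonal matrix: acts on the first two columns only.
c = 1.0 / np.sqrt(2.0)
Q = np.eye(4)
Q[:2, :2] = [[c, c], [c, -c]]

W = X @ Q.T  # W[:, 0] = c*(x1 + x2), W[:, 1] = c*(x1 - x2), other columns unchanged

beta, r2 = fit(X, y)
beta_rot, r2_rot = fit(W, y)

print("correlation before rotation:", np.corrcoef(X[:, 0], X[:, 1])[0, 1])
print("correlation after rotation: ", np.corrcoef(W[:, 0], W[:, 1])[0, 1])
print("unrotated coefficients:", beta[2:], beta_rot[2:])
print("R^2 before and after:", r2, r2_rot)
```

Because the rotation only re-expresses the same column space, the fitted values are identical in the two models; only the coefficients of the rotated pair change.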
Example 5 (Variable selection). Wood (1973) presents data from a process variable study of a petroleum refinery unit. The dependent variable is the octane value of the petroleum produced and there are four independent variables: three relate to feed composition and the fourth relates to process conditions . Eighty-two observations were taken, giving the following sample correlation matrix:
After standardizing variables, regression of on the four independent variables gave
as the regression model. There is clear evidence that , and should be included in the regression model ( for each of these three variables) but whether should be included is debatable. The null hypothesis that the regression coefficient for is zero is rejected only at significance level 0.07. Omitting from the model gives the regression equation
The top half of Table 3 displays the relative importance assigned to the different -variables by the different measures when all four -variables are included in the regression model. Surprisingly, all but one of the measures give a higher relative importance than , even though is the variable whose inclusion in the model is tenuous. The NM3 measure is the exception. It gives a relative importance of 0.000, which concords fully with the inference that can reasonably be omitted from the regression model.
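Table 3. Relative importances of variables before and after omitting

| Variable | OC | RW | DA | NM1 | NM2 | NM3 |
|---|---|---|---|---|---|---|
| Relative importances before omitting | | | | | | |
| | 0.593 | 0.515 | 0.518 | 0.488 | 0.439 | 0.685 |
| | 0.012 | 0.066 | 0.064 | 0.009 | 0.051 | 0.060 |
| | 0.121 | 0.146 | 0.150 | 0.160 | 0.178 | 0.000 |
| | 0.180 | 0.179 | 0.173 | 0.250 | 0.238 | 0.161 |
| Relative importances after omitting | | | | | | |
| | 0.640 | 0.570 | 0.578 | 0.604 | 0.542 | 0.665 |
| | 0.016 | 0.075 | 0.075 | 0.017 | 0.072 | 0.074 |
| | 0.246 | 0.257 | 0.248 | 0.281 | 0.287 | 0.162 |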
The lower half of Table 3 shows the relative importances assigned to , and after has been omitted from the model. In the whole of the table, the RW and DA measures are strikingly similar in all their evaluations. It is also the case that all measures evaluate as the most important variable and as the second most important (both before and after omitting ). In other respects though, there is limited agreement across measures. For example, NM1 and NM2 agree quite closely in their evaluations of and , but NM1 is similar to OC in its evaluation of , while NM2’s evaluations of are similar to those of RW, DA and NM3.
With most measures, the relative importance of is far greater than the difference between the $R^2$ values of the models in Eqs (23) and (24). Hence, with those measures the omission of must substantially increase the relative importance of at least one -variable. As has higher absolute correlation with than with or , it might be anticipated that omitting would increase the relative importance of more than that of or . This is indeed the case for the OC, RW and DA measures, but not for NM1, NM2 or NM3. It seems then, that the effects on relative importance of omitting a variable are somewhat unpredictable and can vary markedly with the choice of measure.
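The arithmetic behind this argument can be reproduced from the entries of Table 3. The Python sketch below is illustrative only. It assumes that the third row of the top half of the table is the omitted variable (inferred from the statement that NM3 gives that variable an importance of 0.000) and that the three rows of the lower half correspond, in order, to the remaining rows of the top half. The array names are hypothetical.

```python
import numpy as np

measures = ["OC", "RW", "DA", "NM1", "NM2", "NM3"]

# Table 3, top half: all four variables in the model (rows in table order).
before = np.array([
    [0.593, 0.515, 0.518, 0.488, 0.439, 0.685],
    [0.012, 0.066, 0.064, 0.009, 0.051, 0.060],
    [0.121, 0.146, 0.150, 0.160, 0.178, 0.000],  # assumed to be the omitted variable
    [0.180, 0.179, 0.173, 0.250, 0.238, 0.161],
])

# Table 3, lower half: the three retained variables.
after = np.array([
    [0.640, 0.570, 0.578, 0.604, 0.542, 0.665],
    [0.016, 0.075, 0.075, 0.017, 0.072, 0.074],
    [0.246, 0.257, 0.248, 0.281, 0.287, 0.162],
])

# Each column sums (up to rounding) to the R^2 of the corresponding model.
total_before = before.sum(axis=0)
total_after = after.sum(axis=0)
drop_in_total = total_before - total_after

# Change in importance of each retained variable when the third variable is omitted.
change = after - before[[0, 1, 3]]
largest_gain = change.argmax(axis=0)  # row index (0, 1 or 2) with the biggest increase

for j, name in enumerate(measures):
    print(f"{name}: importance of omitted variable = {before[2, j]:.3f}, "
          f"drop in total = {drop_in_total[j]:.3f}, "
          f"changes = {np.round(change[:, j], 3)}, "
          f"largest gain in retained row {largest_gain[j]}")
```

For every measure the column totals fall by only a few thousandths, so any importance previously credited to the omitted variable has to be redistributed among the retained variables; the `largest_gain` indices show that the recipient of the largest increase differs between measures.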
Conclusion
Six measures for evaluating the relative importance of predictor variables in a regression have been examined. From the examples presented in Section 4, it is clear that usually there is some consensus between them – variables given a high relative importance by one measure are usually given a high relative importance by other measures, and similarly for low relative importance. At the same time, in each example there were differences between the measures in their evaluations, and some differences were substantial.
The following correlation matrix combines results from Tables 1–3 to give an overview of the similarity between the different measures. It gives the correlation between each pair of measures for the contributions recorded in Table 1 and the top halves of Tables 2 and 3. (The lower halves of Tables 2 and 3 are ignored to avoid double-counting of data.)
The correlations between methods are very high in general, and the correlation between the RW and DA methods is especially high (0.999), showing the high concordance between these two methods that has been found in previous studies (Krasikova et al., 2011; Johnson, 2000). The other striking feature of the correlations is the comparatively low correlation between NM3 and each of the other methods (never exceeding 0.94), indicating that NM3 gives a distinctive perspective on the contributions of variables.
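A correlation matrix of this kind is straightforward to compute. The Python sketch below is a partial illustration under stated assumptions: it stacks only the top halves of Tables 2 and 3 (the contributions from Table 1 are not used here), so its entries will not match the values quoted above, and the array names are hypothetical.

```python
import numpy as np

measures = ["OC", "RW", "DA", "NM1", "NM2", "NM3"]

# Top half of Table 2 (before rotation); one row per variable, one column per measure.
table2_before = np.array([
    [0.400, 0.390, 0.390, 0.527, 0.361, 0.088],
    [0.526, 0.417, 0.411, 0.371, 0.393, 0.888],
    [0.023, 0.099, 0.104, 0.046, 0.139, 0.005],
    [0.036, 0.079, 0.081, 0.042, 0.092, 0.004],
])

# Top half of Table 3 (before omitting a variable).
table3_before = np.array([
    [0.593, 0.515, 0.518, 0.488, 0.439, 0.685],
    [0.012, 0.066, 0.064, 0.009, 0.051, 0.060],
    [0.121, 0.146, 0.150, 0.160, 0.178, 0.000],
    [0.180, 0.179, 0.173, 0.250, 0.238, 0.161],
])

contributions = np.vstack([table2_before, table3_before])  # 8 variables x 6 measures

# rowvar=False treats each column (measure) as a variable.
corr = np.corrcoef(contributions, rowvar=False)

print("      " + "  ".join(f"{m:>5}" for m in measures))
for name, row in zip(measures, corr):
    print(f"{name:>5} " + "  ".join(f"{v:5.3f}" for v in row))
```

Even on this subset, the RW and DA columns are almost perfectly correlated, in line with the concordance between those two measures noted in the text.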
Occasionally, common sense shows that an evaluation is unreasonable. For instance, in Example 2 the OC measure evaluated the relative importance of as 100% and that of as 0%. This is clearly inappropriate, as all the variation in could not be explained by on its own, but could be explained by the combination of and . Often though, the evaluations of the different measures all seem reasonable and how to choose between them is not clear-cut, because there are no known ‘correct’ evaluations with which to make comparison. As noted by Johnson and LeBreton (2004, p. 240), “Because there is no unique mathematical solution to the problem (of evaluating relative importances), these indices (measures) must be evaluated on the basis of the logic behind their development, the apparent sensibility of the results they provide, and whatever shortcomings can be identified.” Properties of the different measures and features of the data set should also be taken into account.
The following arguments favour different measures.
The DA and RW measures have been the most widely recommended measures in recent years, partly because they typically give similar evaluations, suggesting that there is an underlying construct that they both appraise. The examples presented here support that rationale, as they give further evidence that the two measures generally give similar results – there is only one case (variable in Example 3) where the DA and RW evaluations differ appreciably. The RW measure is simpler and easier to implement than the DA measure.
The OC and NM1 measures have the rotation invariance property so, with either measure, multicollinearities have little effect on the relative weights given to variables not involved in the collinearities.
In constructing the new measures (NM1–3), the aim was to improve upon the OC and RW measures by letting influence the transformation to orthogonality, rather than determining the transformation from just the values of the regressors. This was motivated by the observation that the transformation’s purpose is to help evaluate the relationship between and the regressors, so both should be taken into account in forming the transformation. On that basis, NM1 is to be preferred over OC, since in other respects the construction of the two measures is very similar. The same reasoning favours NM2 over RW.
In the examples, a feature of NM3 is that it gave low relative importance to variables that might reasonably be omitted from the regression, which could be considered an attractive characteristic. In Example 5, for instance, it gave a relative importance of 0.000 while other measures gave it a relative importance of 0.121 or more. Similarly, in Example 3, predictions of are not improved by including in the regression model, but only NM3 gave a low evaluation.
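To illustrate the remark above that the RW measure is simple to implement, the following Python sketch computes relative weights in the way the method is usually described (Johnson, 2000), together with the squared correlations between the response and the orthogonal counterparts of the regressors, which is how the OC measure is commonly presented (Gibson, 1962). It is a sketch under those assumptions, not the implementation used for the tables; the function name `orthogonal_importances` and the simulated data are hypothetical, and the new measures NM1–3 are not implemented.

```python
import numpy as np


def orthogonal_importances(X, y):
    """Return (oc, rw): OC-style and RW-style importances, each summing to R^2."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    n = len(ys)

    Rxx = Xs.T @ Xs / (n - 1)  # correlation matrix of the regressors
    rxy = Xs.T @ ys / (n - 1)  # correlations between regressors and response

    # Symmetric square root of Rxx. Its entries are the correlations between the
    # regressors and their orthogonal counterparts Z = Xs @ Rxx^{-1/2}.
    evals, evecs = np.linalg.eigh(Rxx)
    Lam = evecs @ np.diag(np.sqrt(evals)) @ evecs.T

    # Standardized coefficients from regressing y on the orthogonal counterparts.
    beta = np.linalg.solve(Lam, rxy)

    oc = beta ** 2           # importance of x_j taken as that of its counterpart z_j
    rw = (Lam ** 2) @ oc     # counterparts' importances redistributed via Lam^2
    return oc, rw


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))
    X[:, 1] += 0.8 * X[:, 0]  # induce correlation between two regressors
    y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=100)
    oc, rw = orthogonal_importances(X, y)
    print("OC:", np.round(oc, 3), "sum =", round(float(oc.sum()), 3))
    print("RW:", np.round(rw, 3), "sum =", round(float(rw.sum()), 3))
```

Both sets of values sum to the $R^2$ of the regression. The only difference is the final step, in which RW spreads each counterpart's contribution back over the original regressors.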
Taking account of the above points, the NM3 measure is recommended when there are some independent variables whose inclusion or exclusion from the regression model is debatable. When it is clear which independent variables should feature in the regression, but there is high multicollinearity among a subset of them, then the NM1 measure is recommended because it has the rotation invariance property, though its choice in preference to the OC measure (which also has the rotation invariance property) is close. In other circumstances, one of the RW, DA and NM2 measures should be used and we recommend the RW measure: the three measures are likely to give very similar evaluations of relative importance and the RW measure is widely recommended for its simplicity and ease of use (Tonidandel & LeBreton, 2010).
The new measures presented here and ideas behind them could be adapted to give other measures of potential value. In particular, any of the OC, RW and NM2 measures could be modified to use regression coefficients as weights when forming orthogonal counterparts, in the same way that NM3 is derived from NM1. The weighting scheme could also be generalised to use the as weights (where the are the multiple regression coefficients). Setting equal to would correspond to ‘no weighting’, and increasing would increase the importance of the weighting.
Acknowledgments
We are grateful to a referee whose constructive comments led to clear improvements in the paper.
Appendix A: Rotation invariance property
An orthogonal rotation of axes , to axes , is illustrated in Fig. 2a and b. In Fig. 2a, the positions of 10 points are plotted and new axes and are shown. The new axes are obtained by rotating the original axes and (by in this case). Figure 2b shows the same 10 points, but drawn with and as the horizontal and vertical axes. It can be seen that rotation of axes changes the correlation between variables: Figure 2a shows that the points are highly correlated when expressed in terms of and , while Fig. 2b shows that the correlation is low when the points are expressed in terms of and . Consequently, orthogonal rotation can be used to remove or reduce collinearity between variables.
Fig. 2. (a) Points before rotation with , as axes. (b) Points after rotation with , as axes.
We only need to rotate those variables that are involved in collinearities. For example, suppose there is just one collinearity and it involves only the first of the explanatory variables. Then axes are rotated using a rotation matrix, say, that has the following block-diagonal form:
where is an orthogonal matrix of order and is a order identity matrix.
Rotation produces new variables that are linear combinations of the original predictors. The rotation matrix should be chosen in such a way that the variables that are created have meaningful interpretation. For example, if only the first two predictors and are responsible for one collinearity then can be set as:
This rotation creates two meaningful variables: the first is proportional to and the second is proportional to .
In terms of the original variables, and , the ten points in Fig. 2 form the data matrix:
Pre-multiplying this data matrix by in Eq. (26) gives the points in terms of the new variables and :
The sample correlation between and is 0.951, while the sample correlation between and is 0. (The correlation between the sum and difference of two variables that have been standardised to have equal variances is always 0.)
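The parenthetical remark follows from a one-line calculation. In the derivation below, $x_1$ and $x_2$ are assumed symbols for the two standardised variables; only the equality of their variances is used.

```latex
\begin{aligned}
\operatorname{Cov}(x_1 + x_2,\; x_1 - x_2)
  &= \operatorname{Var}(x_1) - \operatorname{Cov}(x_1, x_2)
     + \operatorname{Cov}(x_2, x_1) - \operatorname{Var}(x_2) \\
  &= \operatorname{Var}(x_1) - \operatorname{Var}(x_2) \\
  &= 0 \qquad \text{when } \operatorname{Var}(x_1) = \operatorname{Var}(x_2).
\end{aligned}
```

The covariance terms cancel whatever the correlation between $x_1$ and $x_2$, so the result holds however strongly the original variables are correlated.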
With the majority of measures of importance, rotating some explanatory variables will change the relative importance of every variable. However, results in Garthwaite & Koch (2016) show that with the OC measure only the relative importances of variables involved in the rotation are changed – the relative importances are unchanged for those variables that are not involved in the rotation. Theorem 2 (below) shows that NM1 also has this rotation invariance property.
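The rotation invariance of the OC measure can be checked numerically. The Python sketch below rests on stated assumptions: it takes the orthogonal counterparts to be $Z = X(X^\top X)^{-1/2}$ and the OC importance of a variable to be the squared correlation between the response and its counterpart (the usual description of the method), it uses simulated data, and the rotated variables are not re-standardised. The function name `oc_contributions` is hypothetical, and NM1 is not implemented here.

```python
import numpy as np


def oc_contributions(X, y):
    """Squared correlations between centred y and the columns of Z = X (X'X)^{-1/2}."""
    evals, evecs = np.linalg.eigh(X.T @ X)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T  # symmetric inverse square root
    Z = X @ inv_sqrt                                     # orthonormal columns
    return (Z.T @ y) ** 2 / (y @ y)


rng = np.random.default_rng(3)
n = 150
common = rng.normal(size=n)
X = np.column_stack([
    common + 0.3 * rng.normal(size=n),  # columns 0 and 1 are collinear
    common + 0.3 * rng.normal(size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])
y = X @ np.array([0.6, 0.2, -0.5, 0.3]) + rng.normal(size=n)

# Centre and scale so that squared inner products are squared correlations.
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
y = y - y.mean()

# Block-diagonal rotation matrix: a 2 x 2 rotation for the first two variables,
# identity for the others.
theta = 0.7
Q = np.eye(4)
Q[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]

oc_before = oc_contributions(X, y)
oc_after = oc_contributions(X @ Q.T, y)

print("before rotation:", np.round(oc_before, 4))
print("after rotation: ", np.round(oc_after, 4))
```

The last two entries agree before and after rotation, while the first two change, and the totals are equal in both cases.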
Lemma 1. If is a positive-definite matrix and is an orthogonal matrix of the same dimension as , then .
Proof, so . Hence, .
Lemma 2. Suppose . Under the constraints that is a linear transformation of and that , the value of that maximises is
Hence, , where is defined in Eq. (15). Thus (from Lemma 1), so , where is defined by Eq. (15). Now the proof of Theorem 1 does not require the fact that the variables have been standardised to have unit variance. Hence the result of the theorem also applies to and . It follows that so, from Eq. (29), . As , this gives , so , where is defined by Eq. (13).
If say, just the first of explanatory variables are rotated, then the rotation matrix has the block-diagonal structure in Eq. (25). Then, from Lemma 2, . When is regressed on , the contribution of to the RegSS is the same as the RegSS from a univariate regression of on , because are an orthogonal set of vectors. Under NM1, this RegSS is taken as the relative importance of in a regression of on . Similarly, when is regressed on , NM1 evaluates the relative importance of as the RegSS from a univariate regression of on . As for , NM1 has the rotation invariance property given in the following theorem.
Theorem 2. If an orthogonal rotation is applied to some of the variables, the relative importance of the other variables is unchanged if relative importance is measured using NM1.
Appendix B: Proof of Theorem 1
Preliminary lemma.
Lemma 3. If and , then where is a orthogonal matrix. The converse also holds: if and is an orthogonal matrix, then .
Proof of Lemma 3. For the first part of the lemma, let . Then and . Also . This implies that is orthogonal, as required. The converse is immediate: if and is an orthogonal matrix, then .
Proof of Theorem 1. From Lemma 3, where is an orthogonal matrix. Put , so . Also, define for . Then . Since is an orthogonal matrix, it is immediate from Theorem 1 in Garthwaite et al. (2012) that is maximised when , where . Thus Eq. (15) defines .
References
1. Budescu, D. V. (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114, 542-551.
2. Conklin, M., Powaga, K. & Lipovetsky, S. (2004). Customer satisfaction analysis: Identification of key drivers. European Journal of Operational Research, 154, 819-827.
3. Courville, T. & Thompson, B. (2001). Use of structure coefficients in published multiple regression articles: β is not enough. Educational and Psychological Measurement, 61, 229-248.
4. Darlington, R. B. (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69(3), 161-182.
5. Garthwaite, P. H., Critchley, F., Anaya-Izquierdo, K. & Mubwandarikwa, E. (2012). Orthogonalization of vectors with minimal adjustment. Biometrika, 99(4), 787-798.
6. Garthwaite, P. H. & Koch, I. (2016). Evaluating the contributions of individual variables to a quadratic form. Australian & New Zealand Journal of Statistics, 58(1), 99-119.
7. Gibson, W. (1962). Orthogonal predictors: A possible resolution of the Hoffman-Ward controversy. Psychological Reports, 11(1), 32-34.
8. Grömping, U. (2007). Estimators of relative importance in linear regression based on variance decomposition. The American Statistician, 61, 139-147.
9. Grömping, U. (2006). Relative importance for linear regression in R: the package Relaimpo. Journal of Statistical Software, 17(1), 1-27.
10. Healy, M. (1990). Measuring importance. Statistics in Medicine, 9(6), 633-637.
11. Johnson, J. W. (2000). A heuristic method for estimating the relative weight of predictor variables in multiple regression. Multivariate Behavioral Research, 35(1), 1-19.
12. Johnson, J. W. & LeBreton, J. M. (2004). History and use of relative importance indices in organizational research. Organizational Research Methods, 7, 238-257.
13. Johnson, R. M. (1966). The minimal transformation to orthonormality. Psychometrika, 31(1), 61-66.
14. Krasikova, D., LeBreton, J. M. & Tonidandel, S. (2011). Estimating the relative importance of variables in multiple regression models. In: International Review of Industrial and Organizational Psychology. Indianapolis: Wiley.
15. Liakhovitski, D., Bryukhov, Y. & Conklin, M. (2010). Relative importance of predictors: Comparison of Random Forest with Johnson’s relative weights. Model Assisted Statistics and Applications, 5, 235-249.
16. Lindeman, R. H., Merenda, P. F. & Gold, R. Z. (1980). Introduction to Bivariate and Multivariate Analysis. Scott, Foresman.
17. Lipovetsky, S. & Conklin, M. (2001). Analysis of regression in game theory approach. Applied Stochastic Models in Business and Industry, 17, 319-330.
18. Lipovetsky, S. & Conklin, M. (2010). Meaningful regression analysis in adjusted coefficients Shapley value model. Model Assisted Statistics and Applications, 5, 251-264.
19. Lipovetsky, S. & Conklin, W. M. (2015). Predictor relative importance and matching regression parameters. Journal of Applied Statistics, 42, 1017-1031.
20. Longley, J. W. (1967). An appraisal of least squares programs for the electronic computer from the point of view of the user. Journal of the American Statistical Association, 62(319), 819-841.
21. Nathans, L. L., Oswald, F. L. & Nimon, K. (2012). Interpreting multiple linear regression: A guidebook of variable importance. Practical Assessment, Research and Evaluation, 17(9), 1-19.
22. Pratt, J. W. (1987). Dividing the indivisible: Using simple symmetry to partition variance explained. In: Proceedings of the Second International Conference in Statistics, Tampere, Finland: University of Tampere, 245-260.
23. Stadler, M., Cooper-Thomas, H. D. & Greiff, S. (2017). A primer on relative importance analysis: Illustrations of its utility for psychological research. Psychological Test and Assessment Modeling, 59, 381-403.
24. Tonidandel, S. & LeBreton, J. M. (2010). Determining the relative importance of predictors in logistic regression: An extension of relative weight analysis. Organizational Research Methods, 13, 767-781.
25. Wood, F. S. (1973). The use of individual effects and residuals in fitting equations to data. Technometrics, 15, 677-695.