Ridge Regression and multicollinearity: An in-depth review

Abstract

Multicollinearity is a phenomenon in which two or more identified predictor variables in a multiple regression model are co-dependent or highly correlated. The presence of this phenomenon can have a negative impact on the analysis as a whole and can severely limit the conclusions of the research study. This paper reviews and provides examples of the different ways in which multicollinearity can affect a research project, how to detect multicollinearity, and how one can reduce its impact through Ridge Regression.

Keywords

Ridge Regression, multicollinearity, regularization, SAS

1. Introduction

Multicollinearity is often described as the statistical phenomenon wherein there exists a perfect or exact relationship between predictor variables. From a conventional standpoint, this can occur in regression when multiple predictors are highly correlated with each other (Allison, 2012; Chatterjee et al., 2000), thus the term “multicollinearity”. However, variables do not need to be highly correlated for collinearity to exist, though this is oftentimes the case. No matter how this relationship is discovered, one of the best ways to think of collinearity is as a type of variable “co-dependence”, as this is a more accurate description of how collinear variables respond to each other.

Why is this important? When predictors are related in this way, we say that they are linearly dependent: one predictor can be written, exactly or approximately, as a linear combination of the others. In the presence of multicollinearity, it is difficult to obtain reliable estimates of the individual coefficients for the predictor variables in a model, which can lead to incorrect conclusions about the relationship between the outcome and the predictors. Therefore, when a multiple regression model is built to test the impact of a series of predictor variables on the outcome variable, it is essential to check that multicollinearity is not present (Afshartous & Preston, 2011; Draper & Smith, 2003).

1.1 A linear example

Another way to look at this issue is to consider the basic structure of the multiple linear regression model, Eq. (1) (Joshi et al., 2012):

$\displaystyle y=x\beta+\varepsilon$ (1)

In this equation, $y$ is an $n\times 1$ vector of responses, $x$ is an $n\times p$ matrix of predictor variables, $\beta$ is a $p\times 1$ vector of unknown constants, and $\varepsilon$ is an $n\times 1$ vector of random errors with $\varepsilon_{i}\sim\mathrm{NID}(0,\sigma^{2})$. In a model such as this, the presence of multicollinearity would inflate the variances of the parameter estimates, leading to a lack of statistical significance of the individual predictor variables even if the overall model itself remains significant (Afshartous & Preston, 2011; Draper & Smith, 2003; Montgomery et al., 2001; Unknown, 2018; Wicklin, 2013). Considering this, we can see how the presence of multicollinearity can end up causing serious problems when estimating and interpreting $\beta$, even in the simplest of equations.
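To make the variance inflation concrete, the following standard least squares result is stated in the notation of Eq. (1). The symbols $\hat{\beta}$ (the OLS estimate), $R_{j}^{2}$ (the $R^{2}$ obtained by regressing the $j$-th predictor on the remaining predictors), and $\bar{x}_{j}$ (the mean of the $j$-th predictor) are introduced here for illustration only, and the second expression assumes a model that includes an intercept:

$\displaystyle\operatorname{Var}(\hat{\beta})=\sigma^{2}(x^{\top}x)^{-1},\qquad\operatorname{Var}(\hat{\beta}_{j})=\frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{ij}-\bar{x}_{j})^{2}}\cdot\frac{1}{1-R_{j}^{2}}$

As $R_{j}^{2}$ approaches 1, that is, as the $j$-th predictor becomes nearly a linear combination of the others, the factor $1/(1-R_{j}^{2})$ grows without bound. This factor is the variance inflation factor discussed in the detection section.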

1.2 Different models, different circumstances

Collinearity is especially problematic when a model’s purpose is explanation rather than prediction. In the case of explanation, it is more difficult for a model containing collinear variables to achieve significance of the different parameters. In the case of prediction, if the estimates end up being statistically significant, they are still only as reliable as any other variable in the model, and if they are not significant, then the sum of the coefficients is likely to be reliable. In summary, if collinearity is found in a model built for prediction, then one need only increase the sample size of the model. However, if collinearity is found in a model seeking to explain, then more intensive measures are needed. The primary concern resulting from multicollinearity is that as the degree of collinearity increases, the regression model estimates of the coefficients become unstable and the standard errors for the coefficients become wildly inflated.

2. Detecting multicollinearity

We will begin by exploring the different diagnostic strategies for detecting multicollinearity in a dataset. While reviewing this section, the author would like you to think logically about the model being explored. Try identifying possible multicollinearity issues before reviewing the results of the diagnostic tests.

2.1 Introduction to the dataset

The dataset used for this paper is easily accessible by anyone with access to SAS®. It is a sample dataset titled “Lipid Data”. The code and analyses in this paper were run in Base SAS 9.4. This dataset is from a faux study that investigated the relationships between various health factors and heart disease (Semantic Community, 2013). In order to explore this relationship, blood lipid screenings were conducted on a group of patients. In order to test the long-term effects of health risk factors, three months after the initial screening, follow-up data was collected from a second screening that included additional information such as gender, age, weight, total cholesterol, and history of heart disease. The outcome variable of interest in this analysis is the reduction of cholesterol level in milligrams of cholesterol per deciliter (mg/dL) between the initial and 3-month lipid panel ( $\mu=$ 9.77, $\sigma=$ 27.63).
The continuous predictor variables of interest are the age of participant in years ( $\mu=$ 24.32, $\sigma=$ 3.27), height of participant in centimeters ( $\mu=$ 69.33, $\sigma=$ 4.16), weight in pounds measured at first screening ( $\mu=$ 158.65, $\sigma=$ 28.39), total cholesterol in mg/dL measured at first screening ( $\mu=$ 191.23, $\sigma=$ 35.67), triglycerides level in mg/dL measured at first screening ( $\mu=$ 97.26, $\sigma=$ 60.95), HDL level in mg/dL measured at first screening ( $\mu=$ 45.4, $\sigma=$ 10.09), LDL level in mg/dL measured at first screening ( $\mu=$ 144.28, $\sigma=$ 32.99), skinfold measurement in millimeters at first screening ( $\mu=$ 18.03, $\sigma=$ 8.18), systolic blood pressure in millimeters of mercury ( $\mu=$ 123.41, $\sigma=$ 6.78), diastolic blood pressure in millimeters of mercury ( $\mu=$ 77.78, $\sigma=$ 9.86), exercise level in average minutes of moderate exercise per day ( $\mu=$ 81.05, $\sigma=$ 78.65), and coffee consumption in cups per day ( $\mu=$ 1.29, $\sigma=$ 1.66).

2.2 Data cleaning and preparation

As always, all data should be cleansed and prepared before any rigorous data analyses can be conducted. This includes an examination of the quantity and occurrence of missing data, distributional assumptions, skewness, kurtosis, data entry errors, and potential outliers. Descriptive statistics, kernel density plots, normality tests, and Q-Q plots could aid in this step. Additionally, examining preliminary associations of each independent variable with the outcome variable using chi-square, t-tests, or simple bivariate regression models would be beneficial to any analyst’s understanding of the data. Given the introductory nature of this review, these steps will not be covered in detail; however, it is important to consider them in any formal exploration.

3. Multicollinearity investigation

Now we can begin to explore whether or not the chosen model is suffering the effects of multicollinearity. Given the analyses conducted above, there are a few potentially high correlations that demand some attention: (1) exercise and systolic/diastolic blood pressure, (2) coffee and systolic/diastolic blood pressure, and (3) overall cholesterol and HDL/LDL levels. These are just a few of the possible collinearity issues that can be explored within this dataset. For the purpose of brevity, we will concentrate on the third relationship: overall cholesterol and HDL/LDL levels.

Our first step is to explore the correlation matrix (Joshi et al., 2012). This can be done in SAS through use of the CORR procedure. The results are available in Table 1. Variables of particular interest have a high correlation, about 0.8 or higher, with another variable. As the correlation matrix in this table shows, there seem to be some particularly high correlations between a few of the variables. Some relationships of note are Cholesterol/LDL (0.96) and Weight/Height (0.70). Next we will examine multicollinearity through the Variance Inflation Factor, Tolerance, and Collinearity Diagnostics (Joshi et al., 2012). This can be done by specifying the vif, tol, and collin options, respectively, after the model statement in SAS.
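As an illustration of the correlation screen described above, here is a minimal Python sketch rather than the SAS code used for this paper. The data are synthetic stand-ins, not the Lipid Data sample, and the construction of total cholesterol as roughly LDL + HDL + triglycerides/5 is an assumption made only so that the example shows a strong Cholesterol/LDL correlation. The helper name `flag_pairs` and the sample size are hypothetical.

```python
import numpy as np

# Synthetic stand-in for the lipid variables (hypothetical values, not the SAS sample).
rng = np.random.default_rng(42)
n = 100
age = rng.normal(24, 3, n)
hdl = rng.normal(45, 10, n)
ldl = rng.normal(144, 33, n)
trig = rng.normal(97, 30, n)
# Assumption for illustration: total cholesterol is close to LDL + HDL + triglycerides / 5.
chol = ldl + hdl + trig / 5 + rng.normal(0, 0.05, n)

names = ["Age", "Cholesterol", "Triglycerides", "HDL", "LDL"]
X = np.column_stack([age, chol, trig, hdl, ldl])

# Pearson correlation matrix; columns are variables, rows are observations.
corr = np.corrcoef(X, rowvar=False)

def flag_pairs(corr, names, cutoff=0.8):
    """List variable pairs whose absolute correlation meets the cutoff."""
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= cutoff:
                flagged.append((names[i], names[j], float(corr[i, j])))
    return flagged

flagged = flag_pairs(corr, names)
for a, b, r in flagged:
    print(f"{a} / {b}: r = {r:.2f}")
```

A pairwise screen like this only catches two-variable relationships; a dependency that involves three or more predictors can go unflagged, which is why the VIF, tolerance, and eigenvalue diagnostics are examined next.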

Table 1
Pearson correlation coefficients

	Age	Weight	Cholesterol	Triglycerides	HDL	LDL	Height
Age	1.0000	0.0894	0.2628 ${}^{**}$	0.2117 ${}^{*}$	0.2031 ${}^{*}$	0.2159 ${}^{*}$	$-$ 0.0208
Weight		1.0000	$-$ 0.0219	0.1076	$-$ 0.2756 ${}^{**}$	0.0574	0.6979 ${}^{**}$
Cholesterol			1.0000	0.4008 ${}^{**}$	0.3525 ${}^{**}$	0.9617 ${}^{**}$	$-$ 0.0752
Triglycerides				1.0000	$-$ 0.2784 ${}^{**}$	0.4890 ${}^{**}$	0.0407
HDL					1.0000	0.0834	$-$ 0.2447 ${}^{*}$
LDL						1.0000	$-$ 0.0078
Height							1.0000

${}^{*}$ denotes p-values less than 5% significance; ${}^{**}$ denotes p-values less than 1% significance.

When considering variance inflation, the conventional cutoff to watch for is any value above 10 (Hair et al., 1995; Kennedy, 1992; Marquardt, 1970; Neter et al., 1989). The earlier collinearity findings are supported by the variance inflation results, where these same variables show values far larger than 10. As for tolerance, we want to make sure that no values fall below 0.1 (given that tolerance is 1/VIF). In reviewing the results in Table 2, we can see that several variables, namely cholesterol, triglycerides, HDL, and LDL, have tolerance values well below our 0.1 cutoff. Next, we will look at the collinearity diagnostics, which are based on an eigensystem analysis of the predictors.
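The VIF and tolerance cutoffs above can be sketched in Python as follows. This is an illustrative computation on synthetic data, not the SAS output reported in Table 2. It assumes the usual definition, in which the tolerance of a predictor is $1-R_{j}^{2}$ from regressing it on the remaining predictors (with an intercept) and the VIF is its reciprocal. The function name `vif_and_tolerance` and the simulated variables are hypothetical.

```python
import numpy as np

def vif_and_tolerance(X):
    """Return (vif, tolerance) for each column of X.

    Tolerance is 1 - R^2 from regressing a column on all other columns
    plus an intercept; VIF is its reciprocal.
    """
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        tol[j] = (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return 1.0 / tol, tol

# Synthetic stand-in data (hypothetical values, not the SAS sample).
rng = np.random.default_rng(0)
n = 100
age = rng.normal(24, 3, n)
hdl = rng.normal(45, 10, n)
ldl = rng.normal(144, 33, n)
trig = rng.normal(97, 30, n)
# Assumption for illustration: cholesterol is nearly a linear combination of the others.
chol = ldl + hdl + trig / 5 + rng.normal(0, 0.05, n)

names = ["Age", "Cholesterol", "Triglycerides", "HDL", "LDL"]
X = np.column_stack([age, chol, trig, hdl, ldl])
vif, tol = vif_and_tolerance(X)

for name, v, t in zip(names, vif, tol):
    flag = "*" if (v > 10 or t < 0.1) else ""
    print(f"{name:14s} tolerance = {t:.3g}  VIF = {v:.3g} {flag}")
```

In this simulated setting the four lipid variables are flagged while age is not, which mirrors the pattern in Table 2 in kind but not in magnitude.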

Table 2

Parameter estimates for initial model

Variable	DF	Parameter estimate	Standard error	t value	Pr $>$ $|t|$	Tolerance	Variance inflation
Intercept	1	18.3859	86.4528	0.21	0.8328		0
Age	1	0.6326	1.6835	0.38	0.7093	0.5143	1.9446
Weight	1	$-$ 0.2983	0.2487	$-$ 1.20	0.2385	0.3751	2.6657
Cholesterol	1	$-$ 169.2015	157.5957	$-$ 1.07	0.2903	4.66E-7 ${}^{*}$	2144274 ${}^{*}$
Triglycerides	1	2.6754	2.5163	1.06	0.2950	0.0004 ${}^{*}$	2647.5733 ${}^{*}$
HDL	1	169.1920	157.4672	1.07	0.2900	5.56E-6 ${}^{*}$	179909 ${}^{*}$
LDL	1	169.5252	157.5920	1.08	0.2894	5.51E-7 ${}^{*}$	1814534 ${}^{*}$
Height	1	$-$ 0.2643	1.4548	$-$ 0.18	0.8569	0.4911	2.0363

${}^{*}$ denotes a significant result with either Tolerance $<$ 0.1 or VIF $>$ 10.

In reviewing the collinearity diagnostics outlined in Table 3, our focus is on the relationship of the eigenvalue column to the condition index column. If one or more of the eigenvalues are small (close to zero) and the corresponding condition index is large, then we have an indication of multicollinearity. As for our results, Table 3 shows large deviations in the final three rows, with the eigenvalues landing very close to zero (0.00695, 0.00100, and 1.02E-8) and the condition indices being quite large in comparison (33.02, 86.87, and 27272).
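The eigenvalue and condition index columns can be reproduced in spirit with the short Python sketch below. It assumes a common convention for this kind of diagnostic: the design matrix, including the intercept column, is scaled so that every column has unit length, the eigenvalues are those of the scaled cross-product matrix, and each condition index is the square root of the largest eigenvalue divided by the eigenvalue in question. The eigenvalues in Table 3 sum to roughly 8, the number of columns including the intercept, which is consistent with this scaling, but the exact SAS computation is not shown in this paper, so treat the convention as an assumption. The data and the function name `collinearity_diagnostics` are hypothetical.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Eigenvalues and condition indices of the unit-length-scaled design matrix.

    An intercept column is added before scaling. Singular values are used
    instead of an eigen-decomposition of X'X for numerical stability.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    scaled = design / np.linalg.norm(design, axis=0)
    s = np.linalg.svd(scaled, compute_uv=False)  # sorted largest to smallest
    eigenvalues = s ** 2
    condition_index = s[0] / s
    return eigenvalues, condition_index

# Synthetic stand-in data (hypothetical values, not the SAS sample).
rng = np.random.default_rng(0)
n = 100
age = rng.normal(24, 3, n)
hdl = rng.normal(45, 10, n)
ldl = rng.normal(144, 33, n)
trig = rng.normal(97, 30, n)
# Assumption for illustration: cholesterol is nearly a linear combination of the others.
chol = ldl + hdl + trig / 5 + rng.normal(0, 0.05, n)
X = np.column_stack([age, chol, trig, hdl, ldl])

eigenvalues, condition_index = collinearity_diagnostics(X)
for number, (ev, ci) in enumerate(zip(eigenvalues, condition_index), start=1):
    print(f"{number}  eigenvalue = {ev:.3g}  condition index = {ci:.3g}")
```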

Table 3

Collinearity investigation results

Number	Eigenvalue	Condition index	Proportion of variation
			Intercept	Age	Weight	Cholesterol	Triglycerides	HDL	LDL	Height
1	7.5748	1.0000	3.62E-5	0.0002	0.0002	2.88E-10	1.65E-6	5.04E-9	4.86E-10	2.62E-5
2	0.3155	4.8998	0.0001	0.0002	0.0004	3.21E-11	3.35E-4	1.08E-7	2.79E-10	0.0001
3	0.0578	11.4460	0.0018	0.0018	0.0510	4.36E-8	1.14E-7	1.24E-6	6.39E-8	0.0028
4	0.0334	15.0663	0.0004	0.0123	0.0131	5.38E-8	0.0003	3.23E-6	3.19E-7	0.0002
5	0.0106	26.7943	0.0629	0.3149	0.1288	2.36E-15	1.38E-5	8.60E-8	6.73E-10	0.0261
6	0.0070	33.0168	0.0224	0.6144	0.4063	2.95E-9	0.0002	6.42E-6	2.09E-8	0.0003
7	0.0010	86.8653	0.8488	0.0243	0.2856	5.40E-9	2.31E-5	1.78E-7	2.42E-8	0.8528
8	1.01E-8	27272 ${}^{*}$	0.0636	0.0320	0.1146	1.0000	0.9991	1.0000	1.0000	0.1178

${}^{*}$ denotes a significant increase in Condition index in contrast to a significant decrease in Eigenvalue.

4. Combating multicollinearity

Now that we have uncovered the presence of multicollinearity, the next step is to combat it. The easiest way to approach this would be to drop one of the problem variables and rerun the analysis to test for further multicollinearity. If none exist, then we can continue with our exploration with this newer, trimmed model. This is a viable and preferred option for those models in which the collinear variables are not of primary interest to the analyst (e.g. covariates) and could be removed from the model without consequence to the study objective. However, we will not always be able to utilize this option (e.g. the variable is a confounder and must be retained). There are just some variables, no matter how highly correlated they are, that we need to keep in the model for the sake of being thorough. For these models, we need to explore other methods of model development.

4.1 Regularization methods

Statistical theory and machine learning have made great strides in defining regularization techniques that are designed to help generalize models with highly complex relationships (such as multicollinearity) (Ng, 2004). In its most simplistic form, regularization adds a penalty to model parameters (all except intercepts) so the model generalizes the data instead of overfitting (a side effect of multicollinearity).

There are two main types of regularization: L1 (Lasso Regression) and L2 (Ridge Regression). The key difference between these two types of regularization can be found in how they handle the penalty. Through Ridge Regression, a squared magnitude of the coefficient is added as the penalty term to the loss function. Take the following cost function Eq. (2) as an example (Rosenberg, 2015; van Wieringen, 2018):

$\displaystyle\sum\limits_{i=1}^{n}\left(Y_{i}-\sum\limits_{j=1}^{p}X_{ij}\beta_{j}\right)^{2}+\lambda\sum\limits_{j=1}^{p}\beta_{j}^{2}$ (2)

Considering the above Eq. (2), if lambda ( $\lambda$, the penalty) is zero, then the equation reduces to ordinary least squares estimation, whereas a very large lambda puts too much weight on the penalty, which leads to under-fitting. It is therefore worth reviewing exactly how lambda is chosen, as a careful choice helps avoid both over-fitting and under-fitting.
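The behavior of Eq. (2) at the two extremes of $\lambda$ can be checked numerically. The sketch below minimizes Eq. (2) exactly as written (no intercept, no standardization) using the closed-form solution $(X^{\top}X+\lambda I)^{-1}X^{\top}Y$, which follows from setting the gradient of Eq. (2) to zero. The function name `ridge_coefficients` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Minimizer of sum((y - X b)^2) + lam * sum(b^2), as in Eq. (2)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Two nearly identical predictors (synthetic, for illustration only).
rng = np.random.default_rng(1)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.standard_normal(n)

beta_ols = ridge_coefficients(X, y, 0.0)     # lambda = 0: ordinary least squares
beta_ridge = ridge_coefficients(X, y, 1.0)   # moderate penalty
beta_heavy = ridge_coefficients(X, y, 1e6)   # very large penalty: coefficients near zero

print("lambda = 0    :", beta_ols)
print("lambda = 1    :", beta_ridge)
print("lambda = 1e6  :", beta_heavy)
```

With $\lambda=0$ the result matches ordinary least squares, and as $\lambda$ grows the coefficient vector shrinks toward zero, which is the under-fitting extreme described above.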

Through Lasso Regression (Least Absolute Shrinkage and Selection Operator), the absolute value of magnitude of the coefficient is added as the penalty term to the loss function. As before, let us take the following cost function Eq. (3) into consideration (Rosenberg, 2015):

$\displaystyle\sum\limits_{i=1}^{n}\left(Y_{i}-\sum\limits_{j=1}^{p}X_{ij}\beta_{j}\right)^{2}+\lambda\sum\limits_{j=1}^{p}|\beta_{j}|$ (3)

As with Eq. (2), if lambda ( $\lambda$, the penalty) in Eq. (3) is zero, then the equation again reduces to ordinary least squares estimation, whereas a very large lambda drives the coefficients to zero, again resulting in an under-fit model.

The key difference between these two techniques is that Lasso is intended to shrink the coefficients of the less important variables to zero, thus removing some of these features altogether, which works well if feature selection is the goal of a particular model trimming technique (“Differences between L1 and L2”, 2013). However, if the correction of multicollinearity is your goal and the collinear variables must stay in the model, then Lasso (L1 regularization) is not the method of choice.
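The contrast between the two penalties is easiest to see in the simplest possible case. The sketch below assumes a single predictor scaled so that $\sum_{i}X_{i}^{2}=1$, in which case the ordinary least squares coefficient is $b=\sum_{i}X_{i}Y_{i}$ and the minimizers of Eq. (2) and Eq. (3) have closed forms: $b/(1+\lambda)$ for Ridge and a soft-thresholded value for Lasso. The function names `ridge_1d` and `lasso_1d` are hypothetical, and the one-predictor setting is a simplification for illustration, not the model fitted in this paper.

```python
import math

def ridge_1d(b, lam):
    """Ridge solution for one unit-norm predictor: proportional shrinkage."""
    return b / (1.0 + lam)

def lasso_1d(b, lam):
    """Lasso solution for one unit-norm predictor: soft-thresholding at lam / 2."""
    return math.copysign(max(abs(b) - lam / 2.0, 0.0), b)

for b in (2.0, 0.3, -2.0):
    print(f"OLS b = {b:+.2f}  ridge = {ridge_1d(b, 1.0):+.3f}  lasso = {lasso_1d(b, 1.0):+.3f}")
```

Ridge shrinks every coefficient by the same proportion and never sets one exactly to zero, whereas Lasso subtracts a fixed amount and zeroes out any coefficient smaller than the threshold.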

Therefore, L2 regularization becomes our method of choice. Ridge Regression is a relatively simple process that can be employed to help correct for multicollinearity when removing a variable is not an option and feature selection is not a concern. The trade-off of this type of adjustment is that it naturally results in biased estimates, so it is important to keep this in mind when presenting the results of its application.

4.2 Ridge Regression for linear models

Ridge Regression is a variant of least squares regression and is often used when multicollinearity is identified. Traditional ordinary least squares (OLS) regression produces unbiased estimates of the regression coefficients; however, when the explanatory variables are highly correlated, the resulting OLS parameter estimates have large variance (as discussed earlier). Therefore, it can be beneficial to use a technique such as Ridge Regression, which accepts a small bias in exchange for a smaller variance in the resulting parameter estimates (Chatterjee et al., 2000; Wicklin, 2013).

From the final Ridge Regression results (Table 4), we can derive the appropriate ridge parameter or “k” to include in the analysis. In reviewing Table 4, we want to focus on the ridge parameter column and the associated values under each variable column in order to derive the new parameter estimates. There are several schools of thought concerning how to choose the best value of “k”. The author recommends reading Dorugade and Kashid’s 2010 paper for more information on this matter. The current study will look for the least increase in root mean square error (RMSE) with an appropriate decrease in the ridge variance inflation factor for each variable. Table 4 shows a gradual adjustment over several iterations of potential “k” values between 0 (no adjustment) and 0.0014. Ultimately, it seems that the ridge parameter of 0.0012 is the best option, as there is only a slight increase in RMSE from 26.0275 to 26.4517 and a significant drop in VIF for each of the target predictor variables to below our cutoff of 10.
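The VIF half of this selection rule can be sketched in Python. The sketch assumes the commonly used definition of the ridge VIFs as the diagonal of $(R+kI)^{-1}R(R+kI)^{-1}$, where $R$ is the correlation matrix of the predictors; whether this matches the SAS computation behind Table 4 exactly is not confirmed here. The correlation matrix, the grid of “k” values, and the function names are hypothetical, and the RMSE criterion is not included.

```python
import numpy as np

def ridge_vif(R, k):
    """Ridge VIFs: diagonal of (R + kI)^-1 R (R + kI)^-1 for correlation matrix R."""
    M = np.linalg.inv(R + k * np.eye(R.shape[0]))
    return np.diag(M @ R @ M)

def first_k_below_cutoff(R, ks, cutoff=10.0):
    """Return the first k in ks for which every ridge VIF is below the cutoff, else None."""
    for k in ks:
        if np.all(ridge_vif(R, k) < cutoff):
            return float(k)
    return None

# Hypothetical correlation matrix: predictors 0 and 1 are nearly identical.
R = np.array([[1.0, 0.999, 0.3],
              [0.999, 1.0, 0.3],
              [0.3, 0.3, 1.0]])

ks = np.linspace(0.0, 0.02, 11)  # candidate ridge parameters 0, 0.002, ..., 0.02
chosen_k = first_k_below_cutoff(R, ks)

for k in ks:
    print(f"k = {k:.3f}  max ridge VIF = {ridge_vif(R, k).max():.2f}")
print("first k with all VIFs below 10:", chosen_k)
```

At k = 0 the ridge VIFs reduce to the ordinary VIFs, and they fall quickly as k increases, which is the same qualitative pattern as the VIF columns of Table 4.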

Table 4
Ridge Regression results for cholesterol loss model

Model iteration	Ridge adjustment	RMSE	Age VIF	Weight VIF	Cholesterol VIF	Triglycerides VIF	HDL VIF	LDL VIF	Height VIF
1	0	26.0275	1.94	2.67	2144274.02	2647.57	179909.00	1814533.58	2.04
2	0.0002	26.4434	1.88	2.36	305.48	2.66	27.61	258.89	1.80
3	0.0004	26.4483	1.88	2.36	77.54	2.38	8.49	66.00	1.80
4	0.0006	26.4500	1.88	2.36	34.78	2.32	4.90	29.82	1.80
5	0.0008	26.4508	1.88	2.36	19.75	2.30	3.64	17.10	1.80
6	0.0010	26.4513	1.88	2.36	12.77	2.30	3.05	11.20	1.80
7 ${}^{*}$	0.0012	26.4517	1.88	2.36	8.98	2.29	2.73	7.99	1.80
8	0.0014	26.4519	1.88	2.36	6.69	2.29	2.54	6.05	1.80

${}^{*}$ denotes the first model where all predictor variable VIFs are below 10.

Table 5

Parameter estimates for adjusted model

Variable	DF	Parameter estimate	Standard error	t value	Pr $>$ $|t|$	Variance inflation
Intercept	1	41.7689	85.0039	0.49	0.7092	0
Age	1	0.3093	1.6827	0.18	0.8843	1.8808
Weight	1	$-$ 0.2079	0.2377	$-$ 0.87	0.5425	2.3578
Cholesterol	1	$-$ 0.1920	0.3280	$-$ 0.59	0.6628	8.9800
Triglycerides	1	$-$ 0.0220	0.0752	$-$ 0.29	0.8188	2.2909
HDL	1	0.3210	0.6240	0.51	0.6975	2.7340
LDL	1	0.5200	0.3360	1.55	0.3652	7.9880
Height	1	$-$ 0.7999	1.3882	$-$ 0.58	0.6672	1.7999

The results displayed in Table 5 can then be used as the final adjusted model with the multicollinearity issue controlled. We can use these results in our interpretation of the model. For example, in our original model, were all other variables to remain the same, a one-unit increase in cholesterol would result in an estimated 169.20 unit decrease in cholesterol loss with a standard error of 157.60; however, in our adjusted model, were all other variables to remain the same, a one-unit increase in cholesterol would result in an estimated 0.19 unit decrease in cholesterol loss with a standard error of 0.33. This new estimate makes a lot more sense given the range and types of data we are working with, and would not have been possible had we not corrected for multicollinearity through Ridge Regression. Additionally, considering that total cholesterol is a part of the calculation of cholesterol loss, a higher total cholesterol at baseline might be expected to show a larger decrease in cholesterol over time given certain health interventions (e.g. decreased coffee consumption or increased exercise), which also matches our final model.

5. Conclusion

Multicollinearity, if left untouched, can have a detrimental impact on the generalizability and accuracy of a model. If multicollinearity exists, the traditional ordinary least squares coefficients are imprecisely estimated, which leads to inaccurate judgments about how each predictor variable impacts the target outcome variable. Given this, it is essential to detect and address multicollinearity before interpreting the parameter estimates of a fitted regression model.

Detecting multicollinearity is a fairly simple procedure that should be carried out before implementing any model whose predictor variables have a chance of being related to each other. Pearson correlation coefficients can be useful in multicollinearity detection; however, these alone are not enough to detect collinearity. Additional measures of collinearity detection include VIF, tolerance, and collinearity diagnostic procedures. After discovering the existence of multicollinearity, you can correct for it through a variety of regularization and variable reduction techniques. One such way to control for multicollinearity is through Ridge Regression. Through the steps outlined in this paper, one should be able not only to detect multicollinearity, but also to resolve it in only a few short steps.

5.1 Future directions

As is common with many studies, Ridge Regression cannot be regarded as a cure-all for multicollinearity issues. The trade-off of this technique is that it naturally results in biased estimates. A more thorough review of the assumptions and specifications of Ridge Regression would be appropriate if you intend to use it for explanatory purposes in highly complex models.

On the other hand, several researchers and data scientists have worked hard to explore the value of procedures like Elastic Nets and bootstrapping to help resolve the L1/L2 debate in multicollinearity correction (Cross Validated, 2015; Shtatland et al., 2004; Dixon, 1993; Efron & Tibshirani, 1993; Hall, 1992; Hjorth, 1994; Shao & Tu, 1995; Stine, 1990; Unknown, 2010). There also exists substantive research into the cause and effect of multicollinearity in studies from fields across the research spectrum. For every issue that arises, there is a plethora of procedures that could be used to help control for and correct the effects that an issue such as multicollinearity can have on the integrity of a model. Given this, the author has included several references and recommended articles for your review to help further the understanding of all statisticians and programmers as to the effects of multicollinearity on research models. SAS code and calculations used for this paper are available upon request.

References

Allison. (2012, September 10). When Can You Safely Ignore Multicollinearity? Retrieved from http://statisticalhorizons.com/multicollinearity.

Afshartous, & Preston, R.A. (2011). Key Results of Interaction Models With Centering. Journal of Statistics Education, 19(3). Retrieved from https://www.amstat.org/publications/jse/v19n3/afshartous.pdf.

Chatterjee, Hadi, A.S., & Price. (2000). Regression Analysis by Examples. (3rd ed.), New York, NY: Wiley VCH.

Cross Validated. (2015, November 28). What is elastic net regularization, and how does it solve the drawbacks of Ridge (L2) and Lasso (L1)? Retrieved from https://stats.stackexchange.com/questions/184029/what-is-elastic-net-regularization-and-how-does-it-solve-the-drawbacks-of-ridge/184031#184031.

Differences between L1 and L2 as Loss Function and Regularization. (2013, December 18). Retrieved from http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/.

Dixon, P.M. (1993). The bootstrap and the jackknife: Describing the precision of ecological indices. Design and Analysis of Ecological Experiments. 290-318, New York, NY: Chapman & Hall.

Draper, N.R., & Smith. (2003). Applied regression analysis. (3rd ed.), New York, NY: Wiley.

Dorugade, A.V., & Kashid, D.N. (2010). Alternative Method for Choosing Ridge Parameter for Regression. Applied Mathematical Sciences, 4(9): 447-456.

Efron, & Tibshirani, R.J. (1993). An Introduction to the Bootstrap. New York, NY: Chapman & Hall.

Hair, J.F. Jr., Anderson, R.E., Tatham, R.L., & Black, W.C. (1995). Multivariate Data Analysis. (3rd ed.), New York, NY: Macmillan.

Hall. (1992). The Bootstrap and Edgeworth Expansion. New York, NY: Springer-Verlag.

Hjorth, J.S.U. (1994). Computer Intensive Statistical Methods. London: Chapman & Hall.

Joshi, Kulkarni, & Deshpande. (2012). Multicollinearity Diagnostics in Statistical Modeling & Remedies to Deal With it Using SAS. Proceedings from PhUSE 2012. Retrieved from https://www.lexjansen.com/phuse/2012/sp/SP07.pdf.

Kennedy. (1992). A Guide to Econometrics. Oxford: Blackwell.

Marquardt, D.W. (1970). Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation. Technometrics, 12, 591-612.

Montgomery, D.C., Peck, E.A., & Vining, G.G. (2001). Introduction to linear regression analysis. (3rd ed.), New York, NY: Wiley.

Neter, Wasserman, & Kutner, M.H. (1989). Applied Linear Regression Models. Homewood, IL: Irwin.

Ng, A.Y. (2004). Feature selection, L1 vs L2 regularization, and rotational invariance. Proceedings of the 21st International Conference on Machine Learning. https://guides.library.uwa.edu.au/c.php?g=324904&p=2809543.

Rosenberg. (2015). L1 and L2 Regularization [Lecture notes]. Retrieved from https://davidrosenberg.github.io/ml2015/docs/2b.L1L2-regularization.pdf.

Semantic Community. (2013, December 29). SAS Public Data Sets. Retrieved from http://semanticommunity.info/Data_Science/SAS_Public_Data_Sets#Story.

Shao, & Tu. (1995). The Jackknife and Bootstrap. New York, NY: Springer-Verlag.

Shtatland, E.S., Kleinman, & Cain, E.M. (2004). A New Strategy of Model Building in PROC LOGISTIC With Automatic Variable Selection, Validation, Shrinkage and Model Averaging. Conference proceedings from SAS Users Group International Meeting 2004 (SUGI 29). Montreal, Canada. Retrieved from http://www2.sas.com/proceedings/sugi29/191-29.pdf.

Stine. (1990). An introduction to bootstrap methods: Examples and ideas. Sociological Methods & Research, 18, 243-291.

Unknown. (2010, December 3). Sample 24982: Jackknife and Bootstrap Analyses. Retrieved from http://support.sas.com/kb/24/982.html#pur.

Unknown. (2018). What is Multicollinearity? [Lecture notes]. Retrieved from https://onlinecourses.science.psu.edu/stat501/node/344.

van Wieringen, W.N. (2018). Ridge Regression [Lecture notes]. Retrieved from https://arxiv.org/pdf/1509.09169.pdf.

Wicklin. (2013, March 20). Understanding Ridge Regression in SAS. Retrieved from http://blogs.sas.com/content/iml/2013/03/20/compute-ridge-regression.html.