Abstract
This work describes a series of techniques for obtaining regression models that are resistant to multicollinearity and have other features needed for meaningful results. These models include enhanced ridge regressions with several regularization parameters, regressions by data segments and by levels of the dependent variable, latent class models, unitary response models, orthogonal and equidistant regressions, minimization in the Lp-metric, and other criteria and models. All the approaches have been implemented in practice in various projects and found useful for decision making in economics, management, marketing research, and other fields requiring data modeling and analysis.
Introduction
Besides the classical and modern methods of regression modeling described in recent literature (for example, Young, 2017; Demidenko, 2020; Irizarry, 2020), there are many techniques developed for solving various special problems. Continuing the previous review on co-operative game theory in regression modeling (Lipovetsky, 2021), the current review presents works on building models by different criteria and with different properties. It covers enhanced ridge regressions with several regularization parameters, which help to diminish the distorting impact of multicollinear regressors on their parameters in the model; regressions by data segments and by levels of the dependent variable; latent class regressions; models for a unitary constant response; orthogonal and equidistant regression models; minimization of deviations in different metrics and in the generalized power Lp-metric; and many other criteria and models. The described approaches have been checked and implemented in practice in various research projects in economics, management, and marketing research, and they can also be helpful in other fields requiring data modeling and analysis.
Ridge-kind regularizations in regression
Ridge regression was introduced by Hoerl and Kennard (1970, 2000) for building a model resistant to multicollinearity, and it has been modified in various works, among which the most popular are LASSO, elastic nets, and Shapley value regression. Developing the classical ridge regression further, the work (Lipovetsky & Conklin, 2005a) proposed regularizing the model not only by the minimum norm of its parameters but also by the deviations from orthogonality between the regressors and the residual errors, and by deviations from three other desired properties of the solution. This objective generalizes the ridge regression to a two-parameter model which is not prone to multicollinearity and always outperforms the regular one-parameter ridge in quality of fit. Further works (Lipovetsky, 2006, 2009, 2010, 2018) studied the quality characteristics of the two-parameter ridge regression and extended it to a family of several enhanced ridge models with even better characteristics of fit and other valuable statistical features. An application of the ridge regression to the problem, well known in marketing research, of survey sample balancing with maximum effective base was considered in (Lipovetsky, 2007a).
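The idea of a one-parameter ridge estimator and of a second parameter that restores quality of fit can be sketched as follows. This is a minimal illustration under stated assumptions, not the formulation of the cited works: the data are standardized so that X'X is the correlation matrix C and X'y is the vector r, the ridge solution is (C + kI)^(-1) r, and the second parameter q is assumed here to be a scalar multiplier chosen by least squares. The function names, the simulated data, and the value k = 0.1 are hypothetical.

```python
import numpy as np

def standardize(X, y):
    """Center and scale so that Xs'Xs is the correlation matrix and ys'ys = 1."""
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) * np.sqrt(n))
    ys = (y - y.mean()) / (y.std() * np.sqrt(n))
    return Xs, ys

def ridge_coefficients(C, r, k):
    """Classical one-parameter ridge solution (C + k I)^(-1) r; k = 0 gives OLS."""
    return np.linalg.solve(C + k * np.eye(C.shape[0]), r)

def rescaled_ridge(C, r, k):
    """Ridge direction multiplied by a scalar q chosen to maximize fit.

    Assumption for illustration: q minimizes the residual sum of squares
    along the ridge direction a, which gives q = a'r / a'Ca.
    """
    a = ridge_coefficients(C, r, k)
    q = (a @ r) / (a @ C @ a)
    return q * a

def r_squared(C, r, b):
    """For standardized data the residual sum of squares is 1 - 2 b'r + b'Cb."""
    return 2.0 * (b @ r) - b @ C @ b

# Simulated, strongly collinear regressors (hypothetical data).
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack([z + 0.05 * rng.normal(size=n) for _ in range(3)])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

Xs, ys = standardize(X, y)
C, r = Xs.T @ Xs, Xs.T @ ys
k = 0.1  # hypothetical ridge parameter

b_ols = ridge_coefficients(C, r, 0.0)
b_ridge = ridge_coefficients(C, r, k)
b_two = rescaled_ridge(C, r, k)

r2_ols = r_squared(C, r, b_ols)
r2_ridge = r_squared(C, r, b_ridge)
r2_two = r_squared(C, r, b_two)
```

Because q = 1 reproduces the ordinary ridge solution, the rescaled fit in this sketch can never be worse than the one-parameter ridge fit, while it cannot exceed the OLS fit.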
Some other regularization methods have been developed in (Lipovetsky & Conklin, 2001a, 2003a, b). Comparison of several regularization techniques based on the orthonormal decomposition of the data matrix was performed in (Lipovetsky & Conklin, 2014) where it was shown that these approaches are useful in practical regression modeling especially for big data.
Other criteria and models
Representation of the ordinary least squares (OLS) solution for the multiple linear regression via a weighted mean of partial slopes, regression models by data segments via discriminant analysis (DA), and latent class regressions in the iteratively reweighted least squares (IRLS) approach have been described in (Lipovetsky & Conklin, 2001b, 2005b,c). Unitary response regression models and regressions split by levels of the dependent variable have been considered in the works (Lipovetsky, 2007b, 2012).
Criteria of the shortest distance from the observations to the theoretical surface, used in cases of errors in all variables, have been studied in the works on orthogonal regression in special metrics and in implicit function forms (Lipovetsky, 1975, 1976, 1979). Other criteria, such as equidistant deviations and optimization by the generalized power Lp-metric for deviations in regression, are presented in (Lipovetsky, 2007c, d).
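Two of these criteria can be illustrated with textbook versions, which are not necessarily the formulations of the cited works. The sketch below assumes (a) orthogonal regression in the plain Euclidean metric, solved through the singular value decomposition of the centered data, and (b) minimization of the sum of absolute deviations raised to the power p, solved by iteratively reweighted least squares for 1 <= p <= 2. All function names, the small constant eps, and the toy data are hypothetical.

```python
import numpy as np

def orthogonal_regression(Z):
    """Hyperplane normal'(z - center) = 0 minimizing squared orthogonal distances.

    Z holds all variables as columns; none is singled out as dependent.
    """
    center = Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Z - center, full_matrices=False)
    normal = Vt[-1]  # direction of the smallest singular value
    return normal, center

def lp_regression(X, y, p=1.5, n_iter=100, eps=1e-8):
    """Minimize sum |y - X b|^p by IRLS (sketch intended for 1 <= p <= 2)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS solution
    for _ in range(n_iter):
        resid = y - X @ b
        # Weights |e|^(p-2); residuals are clipped at eps to avoid division by zero.
        w = np.maximum(np.abs(resid), eps) ** (p - 2)
        sw = np.sqrt(w)
        b = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return b

# Toy data: points exactly on the line y = 2x + 1.
x = np.arange(10.0)
y = 2.0 * x + 1.0

normal, center = orthogonal_regression(np.column_stack([x, y]))
orth_slope = -normal[0] / normal[1]

# The same line with one gross outlier added to the last observation.
y_out = y.copy()
y_out[-1] += 50.0
X = np.column_stack([np.ones_like(x), x])

b_l2 = lp_regression(X, y_out, p=2.0)  # coincides with OLS
b_l1 = lp_regression(X, y_out, p=1.0)  # least absolute deviations

l1_cost_of_l2 = np.sum(np.abs(y_out - X @ b_l2))
l1_cost_of_l1 = np.sum(np.abs(y_out - X @ b_l1))
```

With p = 2 the weights are all equal to one and the procedure returns the OLS solution; lowering p toward 1 reduces the influence of large deviations such as the outlier in the toy data.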
Determining the theoretical form of the relation between variables by dimensional considerations (Lipovetsky, 1987), and on that basis building the constant elasticity of substitution (CES) function mixed with the generalized Box-Cox (GBC) function, has been used in studies of globally concave, monotone, and flexible cost functions of electricity consumption (Tishler & Lipovetsky, 1997, 2000).
Questions of prediction have been considered in (Lipovetsky & Conklin, 2014), where in particular it was demonstrated why the predicted dependent variable and the coefficient of multiple determination in the OLS regression do not depend on the degree of ill-conditioning of the correlation matrix. For forecasting with a new set of predictors, which are correlated in any real data, the work (Lipovetsky, 2017) shows how to adjust the new values of the predictors by taking into account their own structure of correlations. A comparison of regression modeling and prediction by individual observations versus their frequencies has been studied in (Lipovetsky, 2019), where it is explained why a model built on a dataset can have a low quality of fit and poor predictions of individual observations, while using the frequencies of the possible combinations of the predictors and the outcome yields a model with the same parameters but with a high quality of fit and precise predictions.
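One way to see that the OLS fitted values and the coefficient of multiple determination do not depend on ill-conditioning is to note that the fitted values are the projection of the dependent variable onto the space spanned by the regressors, and a nonsingular linear mixing of the regressors leaves that space unchanged. The numerical sketch below is an illustration of this general fact and not the derivation given in the cited work; the simulated data, the mixing matrix A, and the function name are hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """OLS with an intercept: returns coefficients, fitted values, and R^2."""
    D = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(D, y, rcond=None)[0]
    y_hat = D @ b
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return b, y_hat, r2

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))  # nearly uncorrelated regressors
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Nonsingular mixing: the new columns are x1, x1 + 0.05*x2, x1 + 0.05*x3,
# so they are almost collinear but span the same space as the originals.
A = np.array([[1.0, 1.0, 1.0],
              [0.0, 0.05, 0.0],
              [0.0, 0.0, 0.05]])
X_coll = X @ A

b, y_hat, r2 = ols_fit(X, y)
b_coll, y_hat_coll, r2_coll = ols_fit(X_coll, y)

# Condition numbers of the two correlation matrices of the regressors.
cond_X = np.linalg.cond(np.corrcoef(X, rowvar=False))
cond_coll = np.linalg.cond(np.corrcoef(X_coll, rowvar=False))
```

In this sketch the coefficients of the two models differ greatly, and the second correlation matrix is far worse conditioned, yet the fitted values and the coefficient of determination coincide up to rounding error.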
Conclusions
The listed techniques of modeling are useful for solving various specific regression problems and finding meaningful and interpretable results necessary to data scientists, managers, and decision makers in actual applications of statistical models in various fields.
