Logistic and multinomial-logit models: A brief review on their modifications and extensions

Abstract

This work reviews various techniques of logistic and multinomial-logit modeling and their modifications. These methods are useful for regression modeling with a binary or categorical outcome, structuring in regression and clustering, singular value decomposition and principal component analysis with positive loadings, and numerous other applications. In particular, these models are employed in discrete choice modeling and best-worst scaling, known in applied psychology and socio-economic studies.

Keywords

Logistic and MNL regression, cluster analysis, SVD, PCA, DCM, BWS

1. Introduction

Logistic and multinomial-logit regressions are widely used modeling techniques applied across numerous fields. The two previous reviews listed works on linear multiple regression and its modifications for a continuous dependent variable (Lipovetsky, 2021a, b), while the current review describes methods developed for solving various special problems with binary and categorical dependent variables. They include techniques of logistic and multinomial-logit (MNL) modeling with their modifications, useful for regression modeling, structuring in regression and clustering, singular value decomposition (SVD) and principal component analysis (PCA) with positive loadings, and numerous other applications. In particular, these models are widely employed in item response theory (IRT), discrete choice modeling (DCM), and best-worst scaling (BWS), well known in applied psychology and socio-economic studies (Linden, 2019; Mair, 2018). All the described approaches have been developed and tried in real research projects in economics, management, and marketing research, and they can be applied in other fields as well.

2. Logistic regression models enhanced

A generalization of the regular logistic model to a larger family of flexible functions with a richer structure, produced by the Box-Cox power transformation, was suggested in (Lipovetsky & Conklin, 2000), where it was presented via a hyperbolic arcsine function with a better predictive ability. As the Box-Cox parameter approaches zero, the generalized function reduces to the regular logistic model; for other values it yields various algebraic forms. Applying the entropy criterion in logistic regression was described in (Lipovetsky, 2006a), where it was shown that this approach yields a logistic model with coefficients proportional to the coefficients of linear regression. Based on this property, the Shapley value estimation of predictors’ importance (Lipovetsky, 2021a) was employed to obtain parameters adjusted to the logistic model, with interpretable coefficients that are robust to multicollinearity.

The double logistic curve in regression modeling was presented in (Lipovetsky, 2010a), where a double sigmoid behavior was described instead of the regular sigmoid curve. It consists of a first increase to an early saturation at an intermediate level, followed by a second sigmoid with the eventual plateau of saturation. Such functions have been used, for example, in biometrics, physiology, and activation processes in tri-state neural networks. A trinomial response model given via one logit regression was considered in (Lipovetsky, 2015a), where it is shown that a response variable of three ordinal categorical levels of the negative-neutral-positive kind can be modeled in one logit regression, with the predictions for the positive category located closer to 1, those for the negative category closer to 0, and those for the neutral category in the middle of this continuous 0-1 scale. Examples of such ordered categorical output include the large, medium, or small size of soda that people buy in relation to other meals and demographics, and gold, silver, or bronze medaling in Olympic sport, with relevant predictors such as training hours, diet, age, and popularity of the sport in the athlete’s home country.

An analytical closed-form solution for binary logit regression by categorical predictors was described in (Lipovetsky, 2015b). In contrast to the common belief that the logit model has no analytical presentation, such a solution can be found in the case of categorical predictors. No special software and no iterative procedures of nonlinear estimation are needed to obtain a model with its coefficients, their standard errors and $t$-statistics, as well as the residual and null deviances. The explicit expressions in the analytical formulae can be easily used for arithmetical calculation of the logit regression parameters. The work (Lipovetsky, 2010b) considers an implicit model built via logistic regression for description of the so-called supercritical pitchfork bifurcation known in chaotic systems. A bifurcation is a sudden change in the behavior of a function due to a small variation in the parameters of a system, when the number of solutions and their structure change abruptly as a parameter passes some critical threshold. This approach can be used for various observed phenomena as a good approximation for messy data characterized by a wide range of response values at each point of the predictors’ values, for instance, in advertising research and market mix modeling.
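As a small illustration of the closed-form idea for categorical predictors, the sketch below covers only the simplest case of one categorical predictor with dummy coding. In this case the standard result is that the logit coefficients are log-odds computed directly from the contingency counts, and their standard errors follow from the reciprocal cell counts. The function name `logit_closed_form` and the counts are hypothetical, and the code is not claimed to reproduce the formulae of (Lipovetsky, 2015b).

```python
import math

def logit_closed_form(counts):
    """Binary logit with one categorical predictor, no iterations.

    counts maps each level to (n_success, n_failure); the first level
    is the reference category. All cell counts must be positive.
    Returns dictionaries of coefficients, standard errors, t-statistics.
    """
    levels = list(counts)
    s0, f0 = counts[levels[0]]
    # Intercept: log-odds of success in the reference category.
    coefs = {"intercept": math.log(s0 / f0)}
    ses = {"intercept": math.sqrt(1 / s0 + 1 / f0)}
    for lev in levels[1:]:
        s, f = counts[lev]
        # Dummy coefficient: log odds ratio against the reference level.
        coefs[lev] = math.log(s / f) - math.log(s0 / f0)
        ses[lev] = math.sqrt(1 / s + 1 / f + 1 / s0 + 1 / f0)
    tstats = {k: coefs[k] / ses[k] for k in coefs}
    return coefs, ses, tstats

# Hypothetical counts: level "a" has 20 successes and 80 failures, etc.
coefs, ses, tstats = logit_closed_form({"a": (20, 80), "b": (50, 50)})
```

The fitted probabilities of such a saturated model equal the observed proportions in each category, which is why no nonlinear estimation is needed here.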

3. MNL in relation to logistic and linear regressions with special properties

MNL, or multinomial-logit regression, is a widely used tool for problems with a categorical dependent variable, particularly for DCM, or discrete choice modeling. In the work (Lipovetsky, 2011) it was shown that, by a special rearrangement of data, different kinds of MNL, such as the conditional and multinomial logits, can be represented via binary logit regressions, which are much easier to build and to interpret. This approach was applied to finding DCM utilities and choice probabilities via empirical Bayes estimation, using an example from a real marketing research project, in the work (Lipovetsky, 2014).
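The link between MNL and binary logit can be seen in the simplest setting: with two alternatives, the MNL choice probability equals a binary logistic function of the utility difference. The sketch below shows only this elementary identity, with hypothetical function names and utility values; it does not implement the data rearrangement of (Lipovetsky, 2011).

```python
import math

def mnl_probabilities(utilities):
    """MNL choice probabilities exp(u_j) / sum_k exp(u_k)."""
    m = max(utilities)  # subtract the maximum for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def binary_logit(u_diff):
    """Binary logistic probability for a utility difference."""
    return 1.0 / (1.0 + math.exp(-u_diff))

# Hypothetical utilities of two and of three alternatives.
p_two = mnl_probabilities([1.2, 0.4])
p_three = mnl_probabilities([1.2, 0.4, -0.3])
p_binary = binary_logit(1.2 - 0.4)
```

Here `p_two[0]` coincides with `p_binary`, and each probability vector sums to one.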

MNL structuring can be applied for building multiple linear regressions with improved and special features. In the work (Lipovetsky, 2008a), to get a better fit for the values of the dependent variable, it was segmented into a few ranges and modeled as a linear aggregate of chain regressions weighted by the MNL shares. Several linear-MNL hybrid models were constructed using the maximum likelihood objectives for the multinomial output and least squares (LS) for the segmented linear aggregates. These hybrid models always outperform ordinary linear regressions, demonstrating a better quality of fit and more precise prediction results. Another work (Lipovetsky, 2009a) considers multiple linear regression generalized by coefficients varying with each observation. Such individual coefficients are defined via MNL shares of the predictors’ importance. This approach corresponds to a special MNL parameterization in generalized additive and projection pursuit modeling, and is related to random-coefficients regression. Linear regressions with special coefficients built by parameterization via exponential, logistic, and MNL functions are described in the work (Lipovetsky, 2009b). To obtain coefficients that are always positive, the exponential parameterization is applied, and to get coefficients in an assigned range, the logistic parameterization is used. The total of the coefficients obtained by the MNL parameterization equals one, so they define the shares of predictors, which is useful for interpretation of their importance. All these regression models are constructed by nonlinear optimization techniques, have stable solutions and good quality of fit, have a simple structure of linear aggregates, demonstrate high predictive ability, and suggest a convenient way to identify the main predictors.
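The three parameterizations of regression coefficients described above can be sketched as mappings from unconstrained parameters to constrained coefficients. The function names, the parameter vector `gamma`, and the range bounds below are hypothetical; the sketch shows only the mappings themselves, not the nonlinear estimation used in (Lipovetsky, 2009b).

```python
import math

def positive_coefs(gamma):
    """Exponential parameterization: every coefficient is positive."""
    return [math.exp(g) for g in gamma]

def ranged_coefs(gamma, low, high):
    """Logistic parameterization: every coefficient lies in (low, high)."""
    return [low + (high - low) / (1.0 + math.exp(-g)) for g in gamma]

def share_coefs(gamma):
    """MNL parameterization: positive coefficients that total one."""
    m = max(gamma)  # subtract the maximum for numerical stability
    expg = [math.exp(g - m) for g in gamma]
    total = sum(expg)
    return [e / total for e in expg]

# Hypothetical unconstrained parameters for three predictors.
gamma = [-2.0, 0.0, 3.0]
b_pos = positive_coefs(gamma)
b_rng = ranged_coefs(gamma, low=-1.0, high=2.0)
b_shr = share_coefs(gamma)
```

In an estimation setting, an optimizer would search over `gamma` freely, while the fitted coefficients automatically satisfy the desired constraints.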

4. Logit and MNL in SVD, PCA, and clustering methods with special features

Logit and MNL functions have proved very useful in constructing other multivariate techniques with special properties. For example, SVD, or singular value decomposition, is widely used in data processing, reduction, and visualization. However, a positive matrix approximated by the first several dual vectors of the regular SVD can yield irrelevant negative elements. In the work (Lipovetsky & Conklin, 2005) it was shown that a logistic SVD modification can be applied, producing the matrix approximation in a desired range of values at any step of approximation. In another paper (Lipovetsky, 2009c), the exponential, logistic, and MNL parameterizations were applied to the elements of the eigenvectors in SVD and in PCA, or principal component analysis, to obtain nonnegative loadings. In contrast to the regular PCA and SVD, a matrix decomposition by positive vectors shows explicitly which variables contribute to the data approximation and with which precision. The LS objective of matrix fit is reduced to the Rayleigh quotient for variational description of the eigenvalues, the eigenvectors with the nonlinear parameterization are found by the Newton-Raphson optimizing procedure, and the results are interpreted by the Perron-Frobenius theory for each subset of variables identified by sparse loading vectors.
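One simple way to keep a low-rank approximation of a matrix of proportions inside the 0-1 range is to apply the regular SVD to the logit-transformed matrix and map the truncated result back by the logistic function. The sketch below follows this assumption; the function name `logistic_svd_approx` and the data are hypothetical, and the procedure is not claimed to be identical to the one in (Lipovetsky & Conklin, 2005).

```python
import numpy as np

def logistic_svd_approx(X, rank):
    """Rank-limited approximation of a matrix with entries in (0, 1).

    The SVD is taken on the logit scale, so the back-transformed
    approximation stays strictly between 0 and 1 for any rank.
    """
    Z = np.log(X / (1.0 - X))                         # logit transform
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    Zk = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]      # truncated SVD
    return 1.0 / (1.0 + np.exp(-Zk))                  # back to (0, 1)

# Hypothetical matrix of proportions.
X = np.array([[0.90, 0.80, 0.10],
              [0.70, 0.95, 0.20],
              [0.05, 0.10, 0.85]])
X1 = logistic_svd_approx(X, rank=1)   # coarse approximation
X3 = logistic_svd_approx(X, rank=3)   # full rank recovers X
```

A regular truncated SVD of `X` itself carries no such guarantee, since its low-rank reconstruction may produce elements below 0 or above 1.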

Application of MNL structuring to clustering problems is studied in (Lipovetsky, 2012). The maximum likelihood objectives are considered for estimating the probability of each multivariate observation’s assignment either to one particular cluster or to at least one of the clusters, and the combination of both objectives yields maximization of the total odds of belonging to one cluster or another. The gradient of the total odds objective is reduced to the MNL probabilities, leading to a convenient clustering procedure presented via an iteratively re-weighted least squares (IRLS) technique. Several other objectives for clustering are also described. Another work (Lipovetsky, 2013) considers how to find the centers and sizes of clusters in nonlinear LS optimization with multinomial parameterization. The method is especially useful for large data sets as it operates on the summary statistics only. This approach also works for the problem of finding the centers and sizes of clusters from the variance-covariance matrix when the original data is not available. Estimation of the cluster centers and sizes can be followed by actual clustering, and the applications are discussed.

5. Logit and MNL applications to BWS and other choice models

The logistic and MNL techniques play an important role in special methods of choice modeling widely used in psychological and socio-economic research and applications. One of the most popular approaches to problems of choice and prioritization is Best-Worst Scaling, or BWS (Louviere et al., 2000, 2015; Marley et al., 2008, 2016), sometimes also called the Maximum Difference, or MaxDiff, method. In the works (Lipovetsky & Conklin, 2014, 2015, 2019), employing the properties of logistic and MNL regression, the BWS solutions were obtained in analytical closed form, with and without hierarchical Bayesian modeling, and with adjustment to non-available items and network effects, respectively. The works (Lipovetsky, 2018a, b, 2019) present, respectively, new approaches of quantum probability amplitude and complex utility in entangled discrete choice modeling, choice probability estimations on the aggregate and individual respondent levels, and a BWS prioritization method based on D. Kahneman’s System 1 approach to the process of making choices.

Various other approaches to choice modeling and decision making solved with logistic and MNL techniques include, for example, the van Westendorp price sensitivity meter and the Bradley-Terry choice model (Lipovetsky, 2006b, 2008b). In the large area of multiple-criteria decision making, for example, in the Analytic Hierarchy Process (AHP) originated by T. Saaty (1980, 2005), new extensions can be achieved with logit and MNL modeling as well (Lipovetsky, 2021c, d).

6. Conclusions

The listed techniques of modified and enhanced logistic and multinomial-logit modeling vary according to specific requirements, but all of them are useful for solving actual problems in different fields. The suggested approaches are convenient in application and can enrich practical statistical modeling and decision making.

References

Linden, van der, W.J., ed. (2019). Handbook of Item Response Theory, in 3 volumes. Chapman and Hall/CRC, Boca Raton, FL.

Lipovetsky (2006a). Entropy criterion in logistic regression and Shapley value of predictors. Journal of Modern Applied Statistical Methods, 5, 121-132.

Lipovetsky (2006b). Van Westendorp price sensitivity in statistical modeling. International Journal of Operations and Quantitative Management, 12, 141-156.

Lipovetsky (2008a). Multinomial structuring in linear regression. Model Assisted Statistics and Applications, 3, 241-247.

Lipovetsky (2008b). Bradley-Terry choice probability in maximum likelihood and eigenproblem solutions. International Journal of Information Technology & Decision Making, 7(3), 395-405.

Lipovetsky (2009a). Regression with individual coefficients defined via multinomial shares of predictors. International Journal of Operations and Quantitative Management, 15, 101-116.

Lipovetsky (2009b). Linear regression with special coefficient features attained via parameterization in exponential, logistic, and multinomial-logit forms. Mathematical and Computer Modelling, 49, 1427-1435.

Lipovetsky (2009c). PCA and SVD with nonnegative loadings. Pattern Recognition, 42, 68-76.

Lipovetsky (2010a). Double logistic curve in regression modeling. Journal of Applied Statistics, 37(11), 1785-1793.

Lipovetsky (2010b). Supercritical pitchfork bifurcation in implicit regression modeling. International Journal of Artificial Life Research, 1(4), 1-9.

Lipovetsky (2011). Conditional and multinomial logits as binary logit regressions. Advances in Adaptive Data Analysis, 3, 309-324.

Lipovetsky (2012). Total odds and other objectives for clustering via multinomial-logit model. Advances in Adaptive Data Analysis, 4(3), 1-12.

Lipovetsky (2013). Finding cluster centers and sizes via multinomial parameterization. Applied Mathematics and Computation, 221, 571-580.

Lipovetsky (2014). Discrete choice models for utility and probability in empirical Bayes estimation. Advances in Adaptive Data Analysis, 6(2-3), 1-14.

Lipovetsky (2015a). Trinomial response modeling in one logit regression. Annals of Data Science, 2(2), 157-163.

Lipovetsky (2015b). Analytical closed-form solution for binary logit regression by categorical predictors. Journal of Applied Statistics, 42(1), 37-49.

Lipovetsky (2018a). Quantum paradigm of probability amplitude and complex utility in entangled discrete choice modeling. Journal of Choice Modelling, 27, 62-73.

Lipovetsky (2018b). MaxDiff choice probability estimations on aggregate and individual level. International Journal of Business Analytics, 5(1), 55-69.

Lipovetsky (2019). Express analysis for prioritization: Best-worst scaling alteration to System 1. Journal of Management Analytics, 7(1), 12-27.

Lipovetsky (2021a). Game theory in regression modeling: A brief review on Shapley value regression. Model Assisted Statistics and Applications, 16(2), 165-168.

Lipovetsky (2021b). Modified ridge and other criteria: A brief review on meaningful regression models. Model Assisted Statistics and Applications, 16(3), forthcoming.

Lipovetsky (2021c). Predictor analysis in group decision making. Stats, 4, 108-121.

Lipovetsky (2021d). AHP in nonlinear scaling: From two-envelope problem to modeling by predictors. Production, 31. doi: 10.1590/0103-6513.20210007.

Lipovetsky, & Conklin (2000). Box-Cox generalization of logistic and algebraic binary response models. International Journal of Operations and Quantitative Management, 6, 276-285.

Lipovetsky, & Conklin (2005). Singular value decomposition in additive, multiplicative, and logistic forms. Pattern Recognition, 38, 1099-1110.

Lipovetsky, & Conklin (2014). Best-worst scaling in analytical closed-form solution. Journal of Choice Modelling, 10, 60-68.

Lipovetsky, & Conklin (2015). MaxDiff priority estimations with and without HB-MNL. Advances in Adaptive Data Analysis, 7(1-2), 1-10.

Lipovetsky, & Conklin (2019). Choice models adjusted to non-available items and network effects. International Journal of Business Analytics, 6(1), 1-19.

Louviere, J.J., Hensher, D.A., & Swait (2000). Stated Choice Methods: Analysis and Applications. Cambridge University Press, Cambridge.

Louviere, J.J., Flynn, T.N., & Marley, A.A.J. (2015). Best-Worst Scaling: Theory, Methods and Applications. Cambridge University Press, Cambridge, UK.

Mair (2018). Modern Psychometrics with R. Springer, Cham, Switzerland.

Marley, A.A.J., Flynn, T.N., & Louviere, J.J. (2008). Probabilistic models of set-dependent and attribute-level best-worst choice. Journal of Mathematical Psychology, 52, 281-296.

Marley, A.A.J., Islam, & Hawkins, G.E. (2016). A formal and empirical comparison of two score measures for best-worst scaling. Journal of Choice Modelling, 21, 15-24.

Saaty, T.L. (1980). The Analytic Hierarchy Process. McGraw-Hill, New York.

Saaty, T.L. (2005). Theory and Applications of the Analytic Network Process: Decision Making with Benefits, Opportunities, Costs, and Risks. RWS Publications, Pittsburgh, PA.