Abstract
The work considers various multivariate statistical techniques, their modifications, and their applications to problems in management, information systems, economics, decision making, and marketing research. The methods include eigenvectors for many-way matrices, dual partial least squares, modified factor and cluster analyses, and enhanced canonical correlation analysis. These approaches have been applied in numerous real projects and have proved useful for data analysts, managers, and decision makers in solving practical problems.
Introduction
The previous reviews (Lipovetsky, 2021a, b, c) described modifications of regression modeling, including multiple linear regression, regularized regressions, and logistic and multinomial regressions adjusted to the special requirements of the problems under consideration. The current work discusses methods of multivariate statistical modeling of a non-regression kind, including the eigenproblem analysis employed in the singular value decomposition (SVD) and principal component analysis (PCA), partial least squares (PLS) and canonical correlation analysis (CCA), factor analysis (FA) and cluster analysis (CA), multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA). These methods are used for data reduction, structuring the relations between variables, grouping of observations, and other purposes of analysis and visualization of multidimensional data in applied behavioral and socio-economic studies. Various modifications have been developed, for example, the eigenproblems for multidimensional matrices (tensors), generalized SVD and PCA for 3- or 4-dimensional (3D or 4D) matrices, robust canonical correlation analysis (RCA), orthonormal canonical correlation analysis (ORCA), CCA for three data sets and partial canonical correlation analysis (PCCA), dual partial least squares (DPLS), an analytical closed-form solution for a general factor, and mixed normal distributions for finding cluster centers. All the described approaches have been tried in real research projects in economics, management, information systems, decision making, and marketing research, and they can be applied in other fields as well.
Eigenvector analysis for multi-mode matrices
Complex data can sometimes be presented as generalized matrices with three or more ways, also called arrays, tensors, or multi-mode matrices. The data in a regular 2D-matrix can correspond to respondents (in rows) evaluating some object by attributes (in columns). If there are many such objects, their 2D-matrices can be stacked, layer by layer, into a “cube” of data, or 3D-matrix, with respondents, attributes, and objects along its three directions, respectively. If such data are gathered at several different moments of time, the panel data can be arranged in a 4D-matrix with time as the fourth dimension. The SVD for such a matrix can be reduced to a non-linear eigenproblem of transformed data, as described in (Lipovetsky, 1994). Further methods of estimating the eigenvalues and eigenvectors in each direction of multi-way matrices were developed and applied to a 3D-data problem of choice in (Lipovetsky & Tishler, 1994). Eigenvectors of 3D-matrices were also employed to evaluate the impact of information systems (IS) on business success (Ahituv et al., 1999). Eigenvectors of 4D-matrices were used to find relations between economic and IS variables of large American companies over time (Tishler et al., 1998).
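As a rough illustration of eigenvectors taken "in each direction" of a 3D-matrix, the following sketch unfolds a data cube along each mode and extracts the leading eigenvector of the corresponding Gram matrix. This is the standard unfolding approach known from the higher-order SVD, used here as a stand-in; it is not the non-linear eigenproblem of the cited works. The function names `unfold` and `mode_eigenvectors`, the cube dimensions, and the random data are all assumptions made for the example.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten a multi-way array into a 2D matrix whose rows index `mode`."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_eigenvectors(tensor):
    """Return (leading eigenvalue, leading eigenvector) for every mode."""
    results = []
    for mode in range(tensor.ndim):
        m = unfold(tensor, mode)
        gram = m @ m.T                            # symmetric, size = length of this mode
        eigvals, eigvecs = np.linalg.eigh(gram)   # eigenvalues in ascending order
        results.append((eigvals[-1], eigvecs[:, -1]))
    return results

# Hypothetical cube: 50 respondents x 6 attributes x 4 objects (random data).
rng = np.random.default_rng(0)
cube = rng.normal(size=(50, 6, 4))
leading = mode_eigenvectors(cube)
for mode, (value, vector) in enumerate(leading):
    print(mode, vector.shape, round(float(value), 2))
```

Each leading eigenvalue equals the squared largest singular value of the corresponding unfolding, so the same vectors could be obtained from an SVD of each unfolded matrix.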
Canonical correlation analysis and its extensions
Canonical correlation analysis (CCA) finds the vectors of linear combinations of the variables in two data sets that maximize the pair correlation between the two resulting aggregates. The problem with classical CCA is that it requires inverting the correlation matrices of both data sets, so multicollinearity among the variables in either set can distort the vectors used for data aggregation and make them hard to interpret. If the covariance between the two aggregated variables is maximized instead, the problem involves no matrix inversion; this is the technique of robust canonical correlation analysis (RCA), which is not prone to multicollinearity (Tishler & Lipovetsky, 2000). This approach matches the SVD of the matrix of intercorrelations between the two sets of variables; it also corresponds to one of the main PLS techniques and to inter-battery factor analysis. CCA for three data sets was considered in (Tishler & Lipovetsky, 1996), where partial canonical correlation analysis (PCCA) was introduced and employed to find relations between the firms’ managerial structure and their utility from IS while excluding the impact of the environment and technology. The eigenvector, CCA, and RCA analyses were applied to the classification and performance of large-scale projects and to the identification of critical success factors for management in (Tishler et al., 1996; Lipovetsky et al., 1997; Dvir et al., 1998, 2003; Shenhar et al., 2002).
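The contrast between the two criteria can be sketched in a few lines. In the minimal example below, `rca` takes the leading singular vectors of the cross-correlation matrix (maximum covariance with unit-norm weights, no inversion), while `cca` first whitens each set with the inverse square root of its own correlation matrix, which is where the inversion enters. The function names, the simulated data, and the choice of standardized variables are assumptions of this sketch, not details taken from the cited works.

```python
import numpy as np

def standardize(a):
    """Center each column and scale it to unit variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def inv_sqrt(m):
    """Inverse square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(w ** -0.5) @ v.T

def rca(x, y):
    """Maximum-covariance vectors: leading singular triple of R_xy."""
    xs, ys = standardize(x), standardize(y)
    r_xy = xs.T @ ys / x.shape[0]
    u, s, vt = np.linalg.svd(r_xy)
    return u[:, 0], vt[0], s[0]

def cca(x, y):
    """Classical CCA: SVD of the whitened cross-correlation matrix."""
    xs, ys = standardize(x), standardize(y)
    n = x.shape[0]
    kx = inv_sqrt(xs.T @ xs / n)   # requires R_xx to be invertible
    ky = inv_sqrt(ys.T @ ys / n)   # requires R_yy to be invertible
    u, s, vt = np.linalg.svd(kx @ (xs.T @ ys / n) @ ky)
    return kx @ u[:, 0], ky @ vt[0], s[0]

# Simulated data: two sets of variables sharing one latent component.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 1))
x = latent + rng.normal(size=(300, 3))
y = latent + rng.normal(size=(300, 2))

a_r, b_r, max_cov = rca(x, y)
a_c, b_c, rho = cca(x, y)
print("RCA covariance:", max_cov, "first canonical correlation:", rho)
```

When columns of either set are nearly collinear, the small eigenvalues inside `inv_sqrt` are raised to the power -0.5 and can dominate the CCA vectors, whereas `rca` never forms an inverse.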
Cluster, factor, and other multivariate analyses
Mixed-normal distributions and multinomial parameterization for finding cluster centers and sizes from the variance-covariance matrix, and for estimating the probability that a multivariate observation belongs to one or more clusters, were described with different applications in (Lipovetsky, 2012b, 2013a, c). A tractable measure of component overlap for data coming from a Gaussian mixture model, the detection of intrinsic structure within data (clusterability) before actual clustering, and dimensionality reduction by projecting the data onto Fisher’s linear subspace were studied by means of covariance matrix decomposition in (Nowakowska et al., 2014, 2015, 2016). Techniques for decreasing respondent heterogeneity to improve clustering quality are proposed in (Lipovetsky & Conklin, 2018).
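To make the idea of cluster centers, sizes, and membership probabilities concrete, here is a minimal sketch of a Gaussian mixture fitted by the standard EM algorithm. It is a generic textbook procedure, not the multinomial parameterization of the cited works. The names `gaussian_pdf` and `fit_mixture`, the deterministic start along the first principal axis, the small ridge added to each covariance, and the simulated two-cluster data are all assumptions of the example.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density evaluated at every row of x."""
    d = x.shape[1]
    diff = x - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return np.exp(-0.5 * maha) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def fit_mixture(x, k, n_iter=100):
    """EM for a k-component Gaussian mixture with full covariances."""
    n, d = x.shape
    centered = x - x.mean(axis=0)
    total_cov = centered.T @ centered / n
    # Deterministic start (a convenience of this sketch): initial centers are
    # observations spread evenly along the first principal axis of the data.
    axis = np.linalg.eigh(total_cov)[1][:, -1]
    order = np.argsort(centered @ axis)
    means = x[order[((np.arange(k) + 0.5) / k * n).astype(int)]]
    covs = np.array([total_cov.copy() for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: probability of each observation belonging to each cluster.
        dens = np.column_stack(
            [weights[j] * gaussian_pdf(x, means[j], covs[j]) for j in range(k)]
        )
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update cluster sizes, centers, and covariance matrices.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = resp.T @ x / nk[:, None]
        for j in range(k):
            diff = x - means[j]
            covs[j] = (resp[:, [j]] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return weights, means, covs, resp

# Simulated data: 100 points around (0, 0) and 200 points around (6, 6).
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(6.0, 1.0, size=(200, 2))])
weights, means, covs, resp = fit_mixture(x, k=2)
print("cluster shares:", weights)
print("cluster centers:", means)
```

The rows of `resp` are the membership probabilities from the last E-step, so an observation lying between clusters can belong partly to more than one of them.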
The factor analysis (FA) problem of which factors are useful for meaningful consideration is discussed in (Lipovetsky, 2017) in relation to the application of Likert scales, where it is shown that some factors, even those with a large variance, are useless for analysis. The triad relations among the pair correlations are mentioned in (Lipovetsky, 2002), and an analytical solution for a general factor is obtained in (Lipovetsky & Manewitsch, 2019). Relations between such multivariate techniques as MANOVA, LDA, FA, and clustering estimation are discussed in (Lipovetsky, 2015).
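For orientation, the classical single-factor model gives a simple instance of a triad relation: if each pair correlation is the product of two loadings, r_ij = a_i a_j, then a_i squared equals r_ij r_ik / r_jk for any two other variables j and k. The sketch below recovers the loadings of a general factor from such triads. It illustrates only this textbook identity under the assumption of positive loadings; the closed-form solution of the cited work may differ, and the function name `triad_loadings` and the example loadings are hypothetical.

```python
import numpy as np
from itertools import combinations

def triad_loadings(r):
    """Estimate single-factor loadings by averaging over all triads."""
    p = r.shape[0]
    loadings = np.empty(p)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        # Each triad (i, j, k) yields one estimate of the squared loading a_i^2.
        squared = [r[i, j] * r[i, k] / r[j, k] for j, k in combinations(others, 2)]
        loadings[i] = np.sqrt(np.mean(squared))
    return loadings

# Hypothetical loadings and the exact correlation matrix they imply.
true_loadings = np.array([0.9, 0.8, 0.7, 0.6])
r = np.outer(true_loadings, true_loadings)
np.fill_diagonal(r, 1.0)

estimated = triad_loadings(r)
print(estimated)
```

With sample correlations the triad estimates for one variable no longer coincide, and averaging them, as done here, is only one of several possible ways to combine them.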
Several modifications of the SVD and PCA have been developed, including decompositions for a positive matrix and for matrices with other special features (Lipovetsky & Conklin, 2005; Lipovetsky, 2009, 2016). The partial least squares (PLS) technique related to RCA, and the dual PLS (DPLS) extended to three or more data sets, were introduced and applied to different problems, in particular to international comparisons and data fusion (Lipovetsky, 2012a, 2013b). Multivariate least squares is compared with CCA, RCA, and PCA in (Lipovetsky et al., 2002). The orthonormal canonical correlation analysis (ORCA), which is based on the SVD, and its application to financial and marketing research problems are presented in (Lipovetsky, 2021).
Conclusion
The described modifications of multivariate methods are convenient in applications and helpful for solving various problems in the different fields where statistical modeling and decision making are required.
