Abstract
Multi-view learning uses information from multiple representations to improve classification performance. Most multi-view learning algorithms based on support vector machines seek separating hyperplanes in different feature spaces, which may be unreasonable in practical applications. Moreover, most of them are designed for balanced data and may therefore perform poorly on imbalanced data. In this work, we introduce a novel multi-view learning algorithm based on the maximum margin of twin spheres support vector machine (MvMMTSSVM). The proposed method follows both the maximum margin principle and the consensus principle. Following the maximum margin principle, it constructs two homocentric spheres for each view separately and tries to maximize the margin between them. To realize the consensus principle, consistency constraints between the two views are added to the constraint conditions. As a result, it not only handles multi-view class-imbalanced data effectively but also has high computational efficiency. To verify the validity and rationality of MvMMTSSVM, we conduct experiments on 24 binary datasets. Furthermore, we use the Friedman test to verify the effectiveness of MvMMTSSVM.
Introduction
With the advent of the big data era, data types are becoming increasingly diverse. Data describing the same object but collected through different measurements or media are called multi-view (MV) data [1]. Multiple representations, or views, can provide complementary information to learning algorithms and thus improve classification performance. Multi-view learning (MVL) is an emerging and rapidly developing direction in the machine learning community. MVL algorithms are of great interest in applications such as medical diagnosis [2], face recognition [3], human emotion recognition [4], action recognition [5] and document classification [6].
In early research, there were two ways to handle multi-view data: 1) ignoring the differences between views, the features are concatenated and a single-view learning method is used to train a model; 2) a classifier is trained for each view separately, and the results are then integrated and optimized by exploiting the relevance between views [7, 8]. Each approach has pros and cons. Later, many researchers found that MVL algorithms that use multiple views jointly achieve better results than these two approaches [9, 10].
Most MVL methods fall into three types [1]: co-training style algorithms [11–13], co-regularization style algorithms [14–17] and margin-consistency style algorithms [18, 19]. To solve different machine learning tasks, various MV algorithms have been proposed, such as MV clustering [20–22], MV transfer learning [23, 24], MV feature selection [25, 26], MV graph embedding [27] and MV ensemble learning [28, 29]. In addition, some scholars have thoroughly analyzed generalization error bounds using learning-theoretic tools such as Rademacher complexity [30] and PAC-Bayes bounds [31].
The support vector machine (SVM) [32] seeks a separating hyperplane to classify samples from two classes. Owing to its good properties, it has outperformed many other machine learning methods. To improve its computational speed or generalization performance, various SVM-based algorithms have emerged. One family is sphere-based algorithms, which seek hyperspheres rather than hyperplanes for classification. In 2005, the core vector machine (CVM) [33] was proposed based on the notion of a core set. CVM uses an approximate minimum enclosing ball algorithm, which greatly reduces computational time. Inspired by the twin SVM [34], the twin hypersphere SVM (THSVM) [35] seeks two independent hyperspheres to describe the samples. THSVM is especially suitable when the samples of the two classes follow different Gaussian distributions. Unlike THSVM, the small sphere and large margin approach (SSLM) [36] seeks two homocentric spheres and has particular advantages in class-imbalanced classification. However, SSLM must solve a large quadratic programming problem (QPP) to obtain its optimal solution, which makes it computationally expensive. To address this issue, the maximum margin of twin spheres SVM (MMTSSVM) [37] was proposed.
To date, many SVM-based MVL algorithms have been studied. Most of them are co-regularization style algorithms, such as SVM-2K [14], MV-SVM-2C [38], MV privileged SVM [30, 39], MV nonparallel SVM [7] and MV generalized SVM [40]. Nevertheless, most of these SVM-based MVL algorithms seek hyperplanes in different feature spaces, which may be unreasonable in real applications. In addition, their performance may degrade when the data are imbalanced. To overcome these defects, we propose in this paper a novel MVL algorithm based on MMTSSVM (MvMMTSSVM), whose key characteristics are as follows:
(1) MvMMTSSVM makes full use of the coherence and diversity of multiple representations by following both the maximum margin principle and the consensus principle.
(2) Following the maximum margin principle, MvMMTSSVM constructs two homocentric spheres for each view separately and tries to maximize the margin between the two spheres in each view.
(3) Absolute-value constraints coupling the two representations enforce the consensus principle.
(4) The optimal solutions of MvMMTSSVM are obtained by solving only a smaller-scale QPP and a linear programming problem (LPP).
(5) The proposed method inherits the fast training speed of MMTSSVM on imbalanced classification problems while handling the MVL problem effectively and efficiently, which makes it more suitable for practical problems.
The remainder of the paper is organized as follows. Section 2 outlines the basics of MMTSSVM. Section 3 presents the model, solution procedure and algorithm flow of the proposed MvMMTSSVM. Section 4 reports experiments evaluating the performance of MvMMTSSVM. Finally, Section 5 concludes the paper.
Related works
For the sake of simplicity, we use the following notations. Let the training set be
From the perspective of MVL, MMTSSVM can be regarded as a single-view algorithm. It has achieved good results on imbalanced classification problems. The primal formulations of MMTSSVM are as follows:
One can assign the label of new testing point
Given these advantages of MMTSSVM over other algorithms, we next describe in detail the MVL algorithm based on MMTSSVM.
Suppose

Illustration of MvMMTSSVM.
Assuming that the two views are equally important, MvMMTSSVM learns one decision function for each of the two views. Its details are explained as follows:
(1) The variables to be solved in problem (4) are
(2) In problem (5), we need to obtain the variables
(3) The first constraint of (4) (resp. (5)) forces the positive (resp. negative) classifiers of the two views to be consistent, which also embodies the distance minimization of kernel canonical correlation analysis (KCCA). The tolerance ε allows some points to violate this constraint to a limited extent. Therefore, the gap between the two views is kept within ε, and the proposed model follows the consensus principle. The variables η i , i ∈ I1 (resp. μ j , j ∈ I2) penalize the positive (resp. negative) samples that fail to meet the ε-similarity of the positive (resp. negative) classifiers.
(4) MvMMTSSVM regards the positive class as the majority class. In (4), minimizing variables
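The role of ε and of the slacks η i described in item (3) can be illustrated with a small sketch. It assumes the ε-similarity constraint has the absolute-value form |f_A(x_i) − f_B(x_i)| ≤ ε + η_i with η_i ≥ 0; the per-view scores passed in are hypothetical placeholders, not the exact classifier outputs of problems (4) and (5). Under that assumption, the smallest feasible slack for each sample is max(0, |f_A(x_i) − f_B(x_i)| − ε):

```python
def consensus_slacks(scores_a, scores_b, eps):
    """Return the smallest slacks eta_i >= 0 with |s_a[i] - s_b[i]| <= eps + eta_i.

    scores_a, scores_b: hypothetical outputs of the view-A and view-B
    classifiers on the same training samples, in the same order.
    eps: non-negative tolerance on the disagreement between the two views.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both views must score the same samples")
    if eps < 0:
        raise ValueError("eps must be non-negative")
    # Samples whose views agree to within eps need no slack; otherwise the
    # slack equals the part of the disagreement that exceeds eps.
    return [max(0.0, abs(a - b) - eps) for a, b in zip(scores_a, scores_b)]


# The two views agree on samples 0 and 2 (within eps) but disagree on sample 1.
print(consensus_slacks([0.0, 2.0, -1.0], [0.25, 0.5, -1.0], eps=0.5))
# -> [0.0, 1.0, 0.0]
```

Since these slacks are penalized in the objective, a larger ε tolerates more disagreement between the views before any penalty is incurred.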
To solve problem (4), we construct its Lagrangian function as follows:
Differentiating the Lagrangian function L1 with respect to variables
Because
Since τ i ≥ 0, i ∈ I1, from Eq.(11), we can get
From Eqs. (12) and (13), we can derive the centers of two homocentric spheres of view A and view B, respectively, where
Taking the values of
Because the squared radii of the small spheres of view A and view B are
To obtain the optimal solutions of the second optimization problem (5), we still need to derive its dual problem. By introducing the Lagrangian multipliers, the Lagrangian function of (5) can be written as:
Similarly, we can derive the dual formulation of (5) as follows:
In order to obtain
Obviously, the dual problem (19) is a QPP, and problem (28) is an LPP with respect to the variables
After acquiring the optimal solutions
Because the two representations complement each other, we can also make predictions using the two views jointly, where
In summary, the procedure of the proposed MvMMTSSVM is given in Algorithm 1.
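Eqs. (12) and (13) express the sphere centers of each view in the kernel-induced feature space. Assuming, as is usual for kernelized sphere models, that a center has the form c = Σ_i α_i φ(x_i) with real (possibly negative) coefficients α_i (a hypothetical placeholder for whatever combination of Lagrange multipliers the derivation yields), the squared distance from a test point to the center can be evaluated with kernel values only, via ‖φ(x) − c‖² = K(x, x) − 2 Σ_i α_i K(x_i, x) + Σ_i Σ_j α_i α_j K(x_i, x_j). A minimal sketch:

```python
import numpy as np


def squared_distance_to_center(X_new, X_train, alpha, kernel):
    """Squared feature-space distance ||phi(x) - sum_i alpha_i phi(x_i)||^2.

    X_new: (m, d) test points; X_train: (n, d) training points of one view;
    alpha: (n,) hypothetical center coefficients;
    kernel(A, B) must return the (len(A), len(B)) kernel matrix.
    """
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    X_train = np.atleast_2d(np.asarray(X_train, dtype=float))
    alpha = np.asarray(alpha, dtype=float)
    # K(x, x) for each test point (only the diagonal is needed).
    k_xx = np.array([kernel(x[None, :], x[None, :])[0, 0] for x in X_new])
    # Cross term: sum_i alpha_i K(x_i, x) for every test point x.
    cross = alpha @ kernel(X_train, X_new)
    # Constant term ||c||^2 = alpha^T K alpha, shared by all test points.
    center_norm = alpha @ kernel(X_train, X_train) @ alpha
    return k_xx - 2.0 * cross + center_norm


def linear_kernel(A, B):
    return A @ B.T


# With a linear kernel and alpha = [0.5, 0.5], the center is the midpoint (1, 0).
d2 = squared_distance_to_center([[1.0, 0.0], [1.0, 3.0]],
                                [[0.0, 0.0], [2.0, 0.0]],
                                [0.5, 0.5], linear_kernel)
print(d2)  # -> [0. 9.]
```

A sphere-based decision rule compares such distances with the squared radii of the spheres in each view; the exact rule MvMMTSSVM applies is its own decision function, which this sketch does not attempt to reproduce.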
In this section, we compare MvMMTSSVM with baseline algorithms on imbalanced multi-view datasets to verify its effectiveness. All experiments are implemented in Matlab R2014a on a personal computer with an Intel Core i3-4160 CPU at 3.60 GHz and 8.00 GB RAM.
Datasets

Illustration of Synthetic dataset 1.

Illustration of Synthetic dataset 2.

Illustration of Corel dataset.
The information on these datasets is summarized in Table 1. All of these multi-view datasets are imbalanced.
The information of 6 multi-view datasets
We compare the proposed MvMMTSSVM with the following benchmark methods:
SVM-2K [14]: a two-view learning algorithm that was the first to combine SVM and KCCA.
MvTSVM [16]: a multi-view learning algorithm that implements the consensus principle and the empirical risk minimization principle simultaneously. It trains a TSVM on each view and then combines them through a similarity constraint. However, it requires matrix inversion, which slows down training.
MMTSSVM-A/B: MMTSSVM [37] is a single-view learning algorithm, trained separately on view A or on view B.
joint-MMTSSVM (j-MMTSSVM): it can be regarded as a multi-view learning algorithm in the sense that it concatenates the two views into a single view before training MMTSSVM.
Experimental settings
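We use the average G-means to evaluate classification performance. G-means is defined as
G\text{-means} = \sqrt{\frac{TP}{TP + FN} \cdot \frac{TN}{TN + FP}},
where TP, FN, TN and FP denote the numbers of true positives, false negatives, true negatives and false positives, respectively.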
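G-means is the geometric mean of the true positive rate (sensitivity) and the true negative rate (specificity), so it can be computed directly from confusion counts. The sketch below assumes labels in {+1, −1} with +1 as the positive class; an averaged G-means would then be the mean of this value over whatever repeated runs or folds the protocol uses.

```python
import math


def g_means(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (TP rate) and specificity (TN rate).

    Any label different from `positive` is treated as the negative class.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


y_true = [1, 1, 1, 1, -1, -1]
y_pred = [1, 1, 1, -1, -1, 1]
print(g_means(y_true, y_pred))  # sqrt(0.75 * 0.5), about 0.612
```

Because the two per-class rates are multiplied, a classifier that assigns every sample to one class scores 0, whereas plain accuracy can still look high on imbalanced data.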
The Gaussian RBF kernel K (
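As a sketch only, the code below builds Gaussian RBF kernel matrices using the common parameterization K(x, y) = exp(−‖x − y‖² / (2σ²)); whether the experiments use this form, exp(−‖x − y‖² / σ²) or an equivalent γ-scaled form is an assumption here. In the two-view setting, one kernel matrix would be built per view, possibly with a different width for each view; the data and widths below are hypothetical.

```python
import numpy as np


def gaussian_rbf_kernel(X, Y, sigma):
    """Gaussian RBF kernel matrix, assuming K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    # Pairwise squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma**2))


# Hypothetical two-view data: the same 3 samples described by different features.
X_view_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
X_view_b = np.array([[1.0], [2.0], [4.0]])
K_a = gaussian_rbf_kernel(X_view_a, X_view_a, sigma=1.0)
K_b = gaussian_rbf_kernel(X_view_b, X_view_b, sigma=2.0)
print(K_a.shape, K_b.shape)  # -> (3, 3) (3, 3)
```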
Experimental results
Table 2 shows the average G-means and elapsed time of the six compared algorithms on 24 imbalanced datasets, with the best G-means in bold. The results indicate that the proposed MvMMTSSVM performs satisfactorily, obtaining the highest G-means on 14 of the 24 datasets.
Performance comparison of six algorithms on 24 datasets
To further analyze the experimental results, we use Win/Draw/Loss (WDL) counts. The last line of Table 2 gives the number of datasets on which MvMMTSSVM obtains higher, equal or lower G-means than each of the other five methods. MvMMTSSVM obtains higher G-means than SVM-2K on 16 datasets. Compared with MvTSVM, MvMMTSSVM handles class-imbalanced problems better, outperforming MvTSVM on 17 of the 24 datasets. MvMMTSSVM also performs better than MMTSSVM-A and MMTSSVM-B, since its G-means is higher than or equal to theirs on 21 of the 24 datasets. On Synthetic1 and Synthetic2, MMTSSVM-B performs better than MMTSSVM-A, which is also evident from Figs. 2 and 3. MvMMTSSVM obtains higher G-means than j-MMTSSVM on 19 datasets. In addition, SVM-2K and MvTSVM perform better than MMTSSVM-A, MMTSSVM-B and j-MMTSSVM.
In terms of elapsed time, the proposed method runs longer than MMTSSVM-A, MMTSSVM-B and j-MMTSSVM, because its number of Lagrange multipliers is four times that of these methods. MvMMTSSVM runs faster than MvTSVM on most datasets, because it involves no matrix inversion and its optimal solutions are obtained by solving only a QPP and an LPP, which greatly reduces the computational cost.
In addition, we conduct experiments on four datasets (Synthetic1, Synthetic2, Ionosphere and Advertisement) with K-nearest neighbor (KNN) [43] and multi-view nonparallel SVM (MvNPSVM) [7]. Because KNN is a traditional single-view learning algorithm, it cannot handle multi-view data directly, so we adopt the same strategies as for MMTSSVM. For fairness, the optimal solutions of MvNPSVM are computed with the function “quadprog” from the Matlab Optimization Toolbox. The results are shown in Table 3. Compared with KNN-A, KNN-B and KNN, MvMMTSSVM obtains higher G-means on Synthetic2, Ionosphere and Advertisement, although it takes longer than KNN. MvNPSVM performs better than MvMMTSSVM on Ionosphere, but it is computationally much less efficient; in particular, it runs out of memory on Advertisement. The main reason is that the number of Lagrange multipliers to be solved in MvNPSVM is eight times the sample size, so obtaining the optimal solution takes a long time.
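The WDL counts are a simple tally over datasets. A minimal sketch, assuming two lists of per-dataset G-means in the same dataset order; the `tol` argument for declaring a draw is a hypothetical parameter (with `tol=0.0`, only exactly equal values count as draws):

```python
def win_draw_loss(ours, other, tol=0.0):
    """Count datasets where `ours` is higher, (nearly) equal, or lower than `other`.

    ours, other: per-dataset G-means of two methods, in the same dataset order.
    tol: hypothetical tolerance below which two scores count as a draw.
    """
    if len(ours) != len(other):
        raise ValueError("score lists must cover the same datasets")
    wins = draws = losses = 0
    for a, b in zip(ours, other):
        if abs(a - b) <= tol:
            draws += 1
        elif a > b:
            wins += 1
        else:
            losses += 1
    return wins, draws, losses


print(win_draw_loss([0.90, 0.80, 0.70], [0.85, 0.80, 0.75]))  # -> (1, 1, 1)
```

Running it once per competing method yields one W/D/L triple per column, matching the layout of a WDL summary row.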
Results of KNN and MvNPSVM on four datasets
Since the average G-means of MvMMTSSVM in Table 2 is not always higher than those of the other five methods, we use the Friedman test [44] for further analysis. On each dataset, the algorithm with the highest G-means is ranked 1, the next is ranked 2, and so on. The average ranks of the six compared algorithms are shown in Table 4. Notably, the last line of Table 4 shows that the average rank of MvMMTSSVM is 2.02, the lowest among the six algorithms. This indicates that MvMMTSSVM performs best overall among the six compared algorithms.
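The ranking step can be sketched as follows. Each row holds the G-means of the compared algorithms on one dataset; rank 1 goes to the highest value, and tied values share the mean of the ranks they span. This tie rule is the usual convention for the Friedman test, but how ties were handled in Table 4 is an assumption.

```python
def average_ranks(scores):
    """Average rank of each algorithm over datasets (rank 1 = highest score).

    scores[d][j] is the score (e.g. G-means) of algorithm j on dataset d.
    Tied scores on a dataset receive the mean of the ranks they occupy.
    """
    n_datasets = len(scores)
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        # Algorithm indices sorted from best (highest score) to worst.
        order = sorted(range(k), key=lambda j: row[j], reverse=True)
        start = 0
        while start < k:
            end = start
            # Extend the block while the next algorithm ties with this one.
            while end + 1 < k and row[order[end + 1]] == row[order[start]]:
                end += 1
            shared_rank = (start + end) / 2.0 + 1.0  # mean of ranks start+1..end+1
            for pos in range(start, end + 1):
                totals[order[pos]] += shared_rank
            start = end + 1
    return [t / n_datasets for t in totals]


scores = [
    [0.90, 0.80, 0.70],  # dataset 1: no ties
    [0.60, 0.90, 0.60],  # dataset 2: algorithms 0 and 2 tie
]
print(average_ranks(scores))  # -> [1.75, 1.5, 2.75]
```

The average ranks always sum to k(k + 1)/2, which is a quick sanity check on a rank table.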
Average rank on G-means of six algorithms
The null hypothesis is that the six algorithms perform equivalently. The Friedman statistic is computed as
From Eq.(34), we can get
Since the null hypothesis is rejected, we further proceed with the Nemenyi test [45]. Fig. 5 displays the results, where the hollow circles denote the average ranks of the six algorithms and the line segments centered on each “∘” indicate the critical difference (CD), where

Friedman test.
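The test procedure can be sketched with the standard forms of these statistics for k algorithms compared on N datasets with average ranks R_j: the Friedman statistic χ²_F = 12N/(k(k+1)) [Σ_j R_j² − k(k+1)²/4], the Iman-Davenport correction F_F = (N − 1)χ²_F / (N(k − 1) − χ²_F), which follows an F distribution with k − 1 and (k − 1)(N − 1) degrees of freedom, and the Nemenyi critical difference CD = q_α √(k(k+1)/(6N)). That Eq. (34) and the CD used for Fig. 5 take exactly these forms is an assumption, and only the α = 0.05 values of q_α are included.

```python
import math

# Standard two-tailed Nemenyi critical values q_alpha at alpha = 0.05,
# indexed by the number of compared algorithms k.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def friedman_statistics(avg_ranks, n_datasets):
    """Return (chi2_F, F_F) for the given average ranks over n_datasets datasets."""
    k = len(avg_ranks)
    n = n_datasets
    chi2_f = 12.0 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0
    )
    denom = n * (k - 1) - chi2_f
    # denom reaches 0 only when every dataset ranks the algorithms identically.
    f_f = math.inf if denom <= 0 else (n - 1) * chi2_f / denom
    return chi2_f, f_f


def nemenyi_cd(k, n_datasets, q_table=Q_ALPHA_005):
    """Critical difference between average ranks for the Nemenyi test."""
    return q_table[k] * math.sqrt(k * (k + 1) / (6.0 * n_datasets))


def significantly_different(rank_a, rank_b, cd):
    """Two algorithms differ significantly if their average ranks differ by more than CD."""
    return abs(rank_a - rank_b) > cd


# Toy example (not the paper's data): k = 3 algorithms, N = 4 datasets.
chi2_f, f_f = friedman_statistics([1.25, 2.0, 2.75], n_datasets=4)
print(chi2_f, round(f_f, 4))  # -> 4.5 3.8571
# Setting of this section: k = 6 algorithms, N = 24 datasets.
print(round(nemenyi_cd(6, 24), 3))  # -> 1.539
```

Under these assumptions, with k = 6 and N = 24 the value of F_F would be compared with the critical value of F(5, 115), and any two algorithms whose average ranks differ by more than about 1.539 would be declared significantly different at α = 0.05.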
In this work, we propose a new MVL algorithm, termed MvMMTSSVM, that follows the maximum margin principle and the consensus principle simultaneously. Following the maximum margin principle, it seeks two homocentric spheres and tries to maximize the margin between them in each view. In addition, the absolute-value constraints between the two views enforce the consensus principle. The proposed method not only inherits the elegant formulation of MMTSSVM but also handles the multi-view imbalanced classification problem effectively, which makes it more suitable for practical applications. Experimental results on 24 binary datasets confirm the feasibility and effectiveness of MvMMTSSVM.
