Abstract
Radial basis function (RBF) neural networks for Multi-Instance Multi-Label (MIML) directly can exploit the connections between instances and labels so that they can preserve useful prior information, but they only adopt Gaussian radial basis function as their RBF whose parameters are difficult to determine. In this paper, parameters can be obtained by multi-objective optimization methods with multi performance measures treated as objectives, specifically, parameter estimation of different RBFs by an improved multi-objective particle swarm optimization (MOPSO) is proposed where Recall rate and Precision rate are chosen to obtain the most desirable Pareto optimal solution set. Furthermore, share-learning factor is proposed to modify the particle velocity in standard MOPSO to improve the global search ability and group cooperative ability. It is experimentally demonstrated that the proposed method can estimate the reliable parameters of different RBFs, and it is also very competitive with the state of art MIML methods.
Keywords
Introduction
Multi-Instance Multi-Label (MIML) models have achieved great development, some real-world problems such as text categorization, image classification, gene classification can be formalized under this framework. MIML is a type of machine learning model whose training set is not composed of several instances but labeled bags [1]. A bag consisting of several instances can correspond to one or more labels. If there is at least one positive example of a label in a bag, the bag has the corresponding label. MIML builds a learning model for bags that have been labeled, and then predicts the label of unmarked bags based on the model.
MIML methods can be classified into degeneration based [2, 3], generative models based [4, 5] and other strategies based [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]. Degeneration based methods utilized the Multi-Label (ML) learning or Multi-Instance (MI) learning as a bridge to degenerate MIML problems into traditional supervised learning problems, but the degeneration based algorithm might suffer from the loss of information during the reduction process. Generative models, such as Dirichlet-Bernoulli Alignment [4], hidden conditional random fields [5], could be used to directly model the MIML data and establish the conditional probability distribution between variables, but they suffered from some outliers. Other strategies based include neural network based methods [6, 7, 8, 9], time saving strategy based [10], weak label learning based [11], knowledge discovery based [12], regularization based [13], parameter estimation based [14, 15] and so on. MIMLRBF [6], a neural based method, was derived from traditional Gaussian radial basis function neural networks. It could directly exploit connections between instances and labels so that it could preserve useful prior information.
MOPSO [16] is a kind of multi-objective optimization method which has been widely used in various fileds [17, 18]. MOPSO is to embed Pareto dominance mechanism and use external archive to save non-dominated solutions. The external archive stores the non-dominated solutions generated in each iteration and deletes the old ones. But MOPSO might easily get into the local optimum. To address this problem, methods involved are to keep the diversity of the Pareto optimal solutions and ensure the ability of finding the true Pareto front. The group division strategies, especially clustering methods, were employed to improve the convergence and the diversity of particles [19, 20, 21], grid partition based algorithms [22, 23] were also proposed, combined with the dynamic distribution of non dominated solutions, these methods strengthened the exploration of adjacent regions of “low-density” regions or “non distributed” regions. Strategies of estimating important features like direction vectors or center points [24, 25] were also very common. The above methods did not consider the prior knowledge of the overall particles which might result in the lack of diversity of Pareto optimal solution sets, and affect the final optimization results.
In this paper, a neural network based MIML algorithm with an improved MOPSO is proposed. The main contributions of our paper are summarized as follows:
A MIMLRBF parameter estimation method by multi-objective optimization is proposed. During the process of MIMLRBF parameter estimation, the classification Recall rate and Precision rate are adopted as two contradictory objectives in an improved MOPSO so that parameters are optimized directly by the performance measures. This strategy tends to obtain the most desirable Pareto optimal solution set so that parameters can be estimated more accurately. To obtain the reliable Pareto optimal solution set composed with more diverse particles, share-learning factor is proposed to modify the particle velocity in standard MOPSO. It modifies the velocity updating formulas which can make the particles share information with all the other particles besides personal and global best particle to improve the particle global search ability and group cooperative ability. Different RBFs with estimated parameters are embedded into the MIML model. To explore the most suitable RBF embedded in MIML model. Different RBFs are introduced in standard MIMLRBF, and the parameters to be estimated by each RBF in the proposed MOPSO are also given. The most suitable RBF will be selected by classification performance, it will also be compared with the state of art MIML methods.
The rest of the paper is organized as follows: Section 2 introduces MIMLRBF, multi-objective optimization. Section 3 introduces the proposed method. Section 4 gives the experiment design and the results. Finally, the conclusions of this paper are given in Section 5.
In this section, MIMLRBF algorithm is reviewed in Section 2.1, multi-objective optimization problem and MOPSO are introduced in Section 2.2.
The structure of MIMLRBF.
In the traditional MIML setting [3]. Let
MIMLRBF [6] which is derived from traditional Gaussian radial basis function neural networks can directly exploit connections between instances and labels so that it can preserve useful prior information. MIMLRBF has been a hotspot for the perfect approximation performance and global optimal property. The first layer of a MIMLRBF neural network consists of medoids calculated by performing an improved K-medoids clustering on an MIML data set. In the second layer, the weights of a MIMLRBF neural network are assigned through the singular value decomposition (SVD). The main research focus of MIMLRBF was on the adaption of K-medoids clustering algorithm, the improvement of SVD and the application of different types of Radial basis functions. IMIMLRBF was proposed to address the unbalanced samples, but it always took longer than that of standard MIMLRBF. Steepest descent method (SD) was proposed in calculating the weights between the hidden and output layers of RBF network [27] in the process of SVD solution. Gradient descent optimization algorithm and momentum were also added in the process of solving the parameters of hidden layer and output layer based on the traditional MIMLRBF method, which avoided the error caused by direct use of SVD method in the process of solving parameters [28]. The Radial Basis was discussed in [29], it proved that Inverse Multi-Quadric Function was the most suitable one in MIMLRBF.
MIMLRBF [6] trains the classifier by means of Gaussian radial basis function. As is shown in Fig. 1, the network includes the input layer, the hidden layer and the output layer. The input is a bag containing
The activation function of the hidden layer in MIMLRBF algorithm is a radial basis function which is a Gaussian function, but the performance of algorithm depends on the center, the width of radial basis function and weight values. The exact number of parameter was discussed in [6] by enumeration after several experiments. Practically speaking, the process of enumeration experiment was very time-consuming and it could not always obtain the optimal parameters. Furthermore, MIMLRBF only considers the Gaussian style activation while there are also several other types of radial basis functions.
Parameter estimation of RBF neural networks was actually an optimization problem in MIMLRBF. Single-objective optimization [30] and multi-objective optimization [31] have been both successfully applied in the structure or parameter estimation of RBF neural networks. However, the focus on the classification accuracy in single-objective optimization might cause over fitting. Model accuracy and complexity were always utilized as two fitness functions in parameter estimation of RBF neural networks, but existing approaches for the MIML problem did not seek to optimize performance measures such as F-measure and average precision directly [32]. Existing MIMLRBF methods have obtained the Gaussian RBF parameters from only classification accuracy rate by enumeration experiment [29], but it was very time-consuming and might cause over fitting, so multi-objective optimization methods, especially those with multi performance measures as objectives, might be required in parameter estimation of RBF neural networks.
Multi-objective optimization problem and MOPSO
The multi-objective optimization problem can be generally described as follows:
where
it is called
In general, the sub objectives of the multi-objective optimization problem are always contradictory. The improvement of one sub objective may reduce the performance of another or several other sub objectives, that is, it is impossible to achieve the optimal value of multiple sub objectives at the same time, but only coordinate and compromise among them to achieve the optimization of each sub objective as much as possible.
The particles in the traditional MOPSO rely on the information of personal best particle Pbest and global best particle Gbest, it contains particle velocity and position updates for the population. The update formula of particle velocity and position is as follows:
where
To settle parameter estimation of radial basis function in MIMLRBF, parameter estimation of RBFs by MOPSO is proposed where Recall rate and Precision rate are chosen to obtain the most desirable Pareto optimal solution set. Furthermore, share-learning factor is proposed to modify the particle velocity in MOPSO to improve the global search ability and group cooperative ability. The details of the proposed method can be divided into the following parts.
Parameter estimation for RBFs in MIMLRBF with MOPSO
Embedding different RBFs into MIML model
As is shown in Fig. 1, assume that
where
Furthermore, the result of enumeration experiment in [6] showed that when the number of
where
where
As is displayed in Eq. (9), the activation function of the hidden layer in MIMLRBF algorithm is a radial basis function which is a Gaussian function, but the performance of algorithm depends on the center, the width of radial basis function and weight values, which are always estimated by enumeration after several experiments. Different RBFs are considered embedding into the MIML model. The mathematic Eq. (9) can be written in a more general format:
where
Different RBFs and those parameters to be estimated
How to estimate the hyper parameters given in Table 1 accurately and efficiently so that better classification performance can be achieved? In this proposed method, an improved MOPSO is utilized for estimating those parameters where Recall rate and Precision rate are chosen to obtain the most desirable Pareto optimal solution set. These two contradictory indicators play important role in evaluating the classification performance, so the two indicators are utilized as multi objectives to determine the parameters in MOPSO. After obtaining the Pareto optimal solution set, the parameters that make the F-score best in the Pareto optimal set are assigned as the candidate ones of different RBFs. In this paper, specifically, Precision rate is represented by average precision, Recall rate is by average recall and F-score is by average F1, the formulas are displayed as follows:
Average precision evaluates the average fraction of proper labels ranked above a particular label
Average recall evaluates the average fraction of proper labels that have been predicted.
Average F1 is a tradeoff between average precision and average recall. The flowchart of the proposed method is displayed below.
In the above algorithm, Firstly, the first layer of MIMLRBF neural network is formed by performing K-medoids clustering on training examples of each possible label (Step1-Step4), it segments
The movement trajectory of a particle in share-learning based MOPSO.
The particles in the traditional MOPSO rely fully on the information of personal best particle Pbest and global best particle Gbest, ignoring the other particles’ information, so it may reduce the diversity of the particle swarm. Pareto Frontier is also crucial to the performance of the proposed algorithm. To obtain the reliable Pareto Frontier composed with more diverse particles, the average best position which can be regarded as the share–learning factor is proposed to modify the velocity updating formulas which can make the particles share information with all the other particles besides personal and global best particle, in this way will the algorithm improve the global search ability and group cooperative ability.
The movement trajectory of a particle is displayed in Fig. 2, the trajectory of a particle in the standard MOPSO is [baseline=(char.base)] [shape=circle,draw,inner sep=0.2pt] (char) 1;
It should be noted that, share-learning based MOPSO model generally takes the minimization function as the objectives. Therefore, -average precision and -average recall are adopted as the two objective functions which can be calculated by different RBFs and corresponding parameters. At the same time, the search area (
In Algorithm 2, SLPSOOpeartor in Step6 contains particle velocity and position updates for the population by share-learning based MOPSO. Firstly, the update formula of particle velocity and position is as follows:
where
where
where MaxIt is the maximum iterations,
Let
When MOPSO is applied in estimating the hyper parameters, the complexity of traditional MOPSO is
Experimental results and discussion
Experimental datasets and evaluation index
The experiment of this paper will be tested on three public datasets, namely Reuters, Scene and Haloarcula marismortu [33]. The Reuters dataset with 2000 documents is derived from widely used Reuters-21578 collection. Each document contains a bag of instances, where each instance is a text segment of sliding window. Scene dataset contains 2000 natural scene images. A set of labels is manually assigned to each image like desert, mountains, sea, sunset and trees. Haloarcula marismortui is a kind of archaea genome, there are 304 proteins (examples) with a total of 234 gene ontology terms (label classes) on molecular function in the Haloarcula marismortui dataset. All the experiment are run on Matlab R2018a with Intel (R) core (TM) i5-7500 CPU@3.40 GHz , 4 GB memory, 64 bit operating system. The specific datasets information is shown in Table 2.
Characteristics of the datasets
Characteristics of the datasets
All the experiment is conducted by 10-fold cross validation. The first experiment helps to find which radial basis function is the most suitable for MIML model. The second experiment compares the method proposed in this paper with other MIML methods. The third experiment analyzes the parameters of improved MOPSO in our method. The fourth experiment compares the results of the proposed method with standard PSO and MOPSO during the process of optimization. The indicators including Hamming Loss, Ranking Loss, One error, Coverage are adopted for performance evaluation.
Hamming Loss indicator reflects the degree of misclassification of bags, i.e., a proper label is missed or a wrong label is predicted. Therefore, the smaller the index value, the better the learning model. It can be calculated as:
where
where
Coverage evaluates the average value of how far it is needed to go down the list of labels in order to traverse all the proper labels of the bag. The smaller Coverage indicates the better performance
where
To test the performance of different RBFs mentioned in Table 1, according to the different parameters to be determined mentioned above, 5 RBFs are verified in the proposed algorithm on three datasets respectively. The parameters of MOPSO algorithm are set as follows: the number of particles is 50, the capacity of External archive is 25, the individual learning factor
It can be found in Tables 3–5 that all the classification indicators of Inverse Multi-Quadric are the best on both Scene and Haloarcula marismortui. The Ranking loss, Coverage and One error indicators are the also excellent on the Reuters dataset, while the Hamming loss corresponding to Gaussian RBF is the smallest. It can be concluded that Inverse Multi-Quadric is more suitable to on the three tested MIML datasets. So Inverse Multi-Quadric is applied as RBF function in MIMLRBF in the following experiment.
Results of different RBFs tested on reuters
Results of different RBFs tested on reuters
Results of different RBFs tested on scene
Results of different RBFs on Haloarcula marismortui
This part conducts the contrast experiment between the method proposed in this paper and other MIML methods. In this paper, Inverse Multi Quadric is selected as the RBF function, and its parameters are estimated by MOPSO method. Furthermore, 9 state-of-art methods KISAR, DMIMLSVM, M3MIML, MIMLBOOST, MIMLNN, MIMLRBF, MIMLSVM, MIMLFAST, MIMLWEL and other algorithms are selected for comparative test by 10-fold cross validation. The descriptions and parameters of these algorithms are shown in Table 6, at the same time, the evaluation metrics including Hamming Loss, Ranking Loss, Coverage, One Error and Training time results tested on the three dataset are displayed in Tables 7–9 respectively.
Descriptions and parameters setting of other MIML methods
Descriptions and parameters setting of other MIML methods
It can be seen from Tables 7–9 that the classification performance of the method proposed has achieved stable and excellent performance on the three datasets. Specifically, on the Reuters dataset, our algorithm achieves the best values in Ranking Loss and Coverage, while best Hamming Loss and One error are obtained by MIMLNN and MIMLFAST respectively. As for the Scene dataset, our algorithm performs excellent in Ranking Loss and Coverage, with the optimal values of the other two indicators achieved by KISAR. Except for the One error, on the Haloarcula marismortui datasets, our algorithm is the top performer according to the rest 3 indicators. In a word, compared with other MIML algorithms, our method is superior to other methods to some extent in classification performance The MIMLFAST consumes the least training time on Reuters and Scene datasets of all the algorithms while the MIMLNN performs best on Haloarcula marismortui. Efficiency is highly improved by optimizing the approximated ranking loss with SGD based on a two level linear model, as well discovering sub-concepts for complicated labels in a shared space in MIMLFAST. MIMLBOOST and DMIMLSVM degenerate the MIML into a series of Multi-Instance or Multi-Label tasks which result in the high complexity and low efficiency with the dramatic expansion of the hypothesis space. Ours spend much less time in training than those of M3MIML, DMIMLSVM and MIMLBOOST
Results run on the reuters
Results run on the scene
Results run on the Haloarcula marismortui
In this part, the proposed method is conducted with the different number of iterations. As is introduced in the Section 4.1, the maximum number of iterations in improved MOPSO is 40. Inverse Multi Quadric basis functions which have better results are selected to test the influence of different iteration times. The algorithm is tested by the setting iterations as 10,20,30,40,50. Hypervolume (HV) [25] is an important evaluation index for the performance of the objective optimization algorithm which is only calculated by Pareto Frontier. It can be calculated as:
The trend of HV index with different generations.
The PFs with different generations on three datasets.
It can be seen that the results of HV on three datasets illustrates that the convergence and diversity of the improved MOPSO algorithm solutions have been improved with the increase of generations. In particular, when the number of generations is 40, the optimal HV values are obtained on three datasets, and with the generations increasing, HV tends to be stable.
Figure 4 shows the Pareto Frontiers (PFs) corresponding to the improved MOPSO at different generations on three datasets. PF with 40 generations is closer to the origin of the coordinate axis and searches a wider range of Pareto optimal solutions than with 10. It can be intuitively found that the trend of PF is consistent with the results in Fig. 3.
In order to prove the superiority of our method, this paper intends to compare the proposed method with standard PSO [34] and MOPSO. In standard PSO, when obtaining the corresponding parameters of Inverse Multi-Quadric function, the optimization algorithm is replaced with singleobjective optimization method aiming at maximizing F-score. The specific fitness function is set as: F-score, the number of particles is 50, the individual learning factor
The comparisons with standard PSO and MOPSO on three datasets.
It can be seen from Fig. 5 that the classification effects of the proposed method on three datasets are significantly better than those of standard PSO and MOPSO. Only the One error on Haloarcula marismortui dataset is slightly lower than that of standard MOPSO. Therefore, compared with standard PSO and MOPSO method, parameter optimization by the proposed method can obtain better classification results in MIML model.
This paper organically combines share-learning based MOPSO model with MIMLRBF, and focuses on utilizing the improved MOPSO model to estimate the parameters of different RBFs, so as to make the classification results of the algorithm more accurate. In this paper different kinds of RBFs are embedded into the MIML model, parameter estimation of RBFs by MOPSO is proposed where Recall rate and Precision rate are chosen to obtain the most desirable Pareto optimal solution set share-learning factor is proposed to modify the particle velocity in MOPSO to improve the global search ability and group cooperative ability. In the future, multi-objective methods will be inserted into MIML models to obtain more important information other than hyper parameters only. Also, how to reduce the time complexity of multi-objective methods in MIML will be studied.
Footnotes
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant nos. 61976108 and 61572241, Natural Science Research Project of Colleges and Universities in Jiangsu Province under Grant nos. 19KJB520005, Humanities and Social Sciences project of the Ministry of education of China under Grant nos. 22YJC870001.
