A multi-instance multi-label learning algorithm based on radial basis functions and multi-objective particle swarm optimization

Abstract

Radial basis function (RBF) neural networks for Multi-Instance Multi-Label (MIML) learning can directly exploit the connections between instances and labels and thereby preserve useful prior information, but they adopt only the Gaussian function as their RBF, whose parameters are difficult to determine. In this paper, the parameters are obtained by multi-objective optimization with multiple performance measures treated as objectives. Specifically, parameter estimation for different RBFs by an improved multi-objective particle swarm optimization (MOPSO) is proposed, in which Recall rate and Precision rate are chosen as objectives to obtain the most desirable Pareto optimal solution set. Furthermore, a share-learning factor is proposed to modify the particle velocity in standard MOPSO and improve the global search ability and group cooperative ability. Experiments demonstrate that the proposed method can estimate reliable parameters for different RBFs and that it is very competitive with state-of-the-art MIML methods.

Keywords

Multi-Instance Multi-Label, radial basis function, multi-objective particle swarm optimization, share-learning factor

1. Introduction

Multi-Instance Multi-Label (MIML) models have developed rapidly, and real-world problems such as text categorization, image classification and gene classification can be formalized under this framework. MIML is a type of machine learning model whose training set is composed not of individual instances but of labeled bags [1]. A bag consisting of several instances can correspond to one or more labels. If a bag contains at least one positive example of a label, the bag has that label. MIML builds a learning model from bags that have been labeled, and then predicts the labels of unlabeled bags based on the model.

MIML methods can be classified into degeneration based [2, 3], generative model based [4, 5] and other strategy based [6, 7, 8, 9, 10, 11, 12, 13, 14, 15] methods. Degeneration based methods use Multi-Label (ML) learning or Multi-Instance (MI) learning as a bridge to degenerate MIML problems into traditional supervised learning problems, but they might suffer from loss of information during the reduction process. Generative models, such as Dirichlet-Bernoulli Alignment [4] and hidden conditional random fields [5], can directly model the MIML data and establish the conditional probability distribution between variables, but they suffer from outliers. Other strategies include neural network based methods [6, 7, 8, 9], time saving strategies [10], weak label learning [11], knowledge discovery [12], regularization [13], parameter estimation [14, 15] and so on. MIMLRBF [6], a neural network based method, was derived from traditional Gaussian radial basis function neural networks. It can directly exploit connections between instances and labels and thereby preserve useful prior information.

MOPSO [16] is a multi-objective optimization method that has been widely used in various fields [17, 18]. MOPSO embeds the Pareto dominance mechanism and uses an external archive to save non-dominated solutions. The external archive stores the non-dominated solutions generated in each iteration and deletes the old ones. However, MOPSO can easily fall into a local optimum. Existing remedies aim to keep the diversity of the Pareto optimal solutions and to ensure the ability to find the true Pareto front. Group division strategies, especially clustering methods, were employed to improve the convergence and the diversity of particles [19, 20, 21]. Grid partition based algorithms [22, 23] were also proposed; combined with the dynamic distribution of non-dominated solutions, these methods strengthened the exploration of regions adjacent to “low-density” or “non-distributed” regions. Strategies that estimate important features such as direction vectors or center points [24, 25] were also very common. The above methods did not consider the prior knowledge of the overall swarm, which might result in a lack of diversity in the Pareto optimal solution sets and affect the final optimization results.

In this paper, a neural network based MIML algorithm with an improved MOPSO is proposed. The main contributions of our paper are summarized as follows:

1) A MIMLRBF parameter estimation method based on multi-objective optimization is proposed. During MIMLRBF parameter estimation, the classification Recall rate and Precision rate are adopted as two contradictory objectives in an improved MOPSO, so that the parameters are optimized directly by the performance measures. This strategy tends to obtain the most desirable Pareto optimal solution set, so that the parameters can be estimated more accurately.
2) To obtain a reliable Pareto optimal solution set composed of more diverse particles, a share-learning factor is proposed to modify the particle velocity in standard MOPSO. It modifies the velocity updating formula so that each particle shares information with all the other particles in addition to the personal and global best particles, which improves the global search ability and group cooperative ability.
3) Different RBFs with estimated parameters are embedded into the MIML model. To find the most suitable RBF for the MIML model, different RBFs are introduced into standard MIMLRBF, and the parameters to be estimated for each RBF by the proposed MOPSO are given. The most suitable RBF is selected by classification performance and is also compared with state-of-the-art MIML methods.

The rest of the paper is organized as follows. Section 2 reviews MIMLRBF and multi-objective optimization. Section 3 introduces the proposed method. Section 4 gives the experimental design and the results. Finally, the conclusions are given in Section 5.

2. Related work

In this section, the MIMLRBF algorithm is reviewed in Section 2.1, and the multi-objective optimization problem and MOPSO are introduced in Section 2.2.

Figure 1. The structure of MIMLRBF.

2.1 MIMLRBF

In the traditional MIML setting [3], let $X$ denote the bags and $Y=\left\{y_{1},y_{2},\ldots,y_{Q}\right\}$ be the set of labels. A MIML dataset that consists of $m$ bags of instances is denoted as $\left\{\left(X_{1},Y_{1}\right),\left(X_{2},Y_{2}\right),\ldots,\left(X_{m},Y_{m}\right)\right\}$, where $X_{i}$ is a bag consisting of $n_{i}$ instances $\left\{x_{i,1},x_{i,2},\ldots,x_{i,n_{i}}\right\}$. The output $Y_{i}$ represents the labels associated with $X_{i}$, which is a subset of all possible labels $y_{l}\in Y\ \left(l=1,2,\ldots,Q\right)$. Here $n_{i}$ is the number of instances describing the $i$-th bag and $Q$ is the total number of possible labels. $y_{l}=1$ if the $l$-th label is positive for $X_{i}$, and $-1$ otherwise.

MIMLRBF [6], which is derived from traditional Gaussian radial basis function neural networks, can directly exploit connections between instances and labels and thereby preserve useful prior information. MIMLRBF has attracted attention for its approximation performance and global optimality property. The first layer of a MIMLRBF neural network consists of medoids calculated by performing an improved K-medoids clustering on an MIML data set. In the second layer, the weights of the network are assigned through singular value decomposition (SVD). Research on MIMLRBF has mainly focused on the adaptation of the K-medoids clustering algorithm, the improvement of SVD and the application of different types of radial basis functions. IMIMLRBF was proposed to address unbalanced samples, but it always took longer than standard MIMLRBF. The steepest descent (SD) method was proposed for calculating the weights between the hidden and output layers of the RBF network [27] in the SVD solution process. Gradient descent optimization and momentum were also added when solving the parameters of the hidden and output layers of the traditional MIMLRBF method, which avoided the error caused by direct use of the SVD method [28]. Different radial basis functions were discussed in [29], which reported that the Inverse Multi-Quadric function was the most suitable one for MIMLRBF.

MIMLRBF [6] trains the classifier by means of a Gaussian radial basis function. As shown in Fig. 1, the network includes the input layer, the hidden layer and the output layer. The input is a bag containing $n$ instances $\left\{x_{1},x_{2},\ldots,x_{n}\right\}$, where $x_{k}$ is a $d$-dimensional feature vector $\left[x_{k,1},x_{k,2},\ldots,x_{k,d}\right]^{T}$. The hidden layer consists of $Q$ sets of bags $\bigcup_{l=1}^{Q}\left\{C_{1}^{l},C_{2}^{l},\ldots,C_{M_{l}}^{l}\right\}$, where $M_{l}$ is the number of medoid bags for label $y_{l}$, and the total number of hidden nodes is $M=\sum_{l=1}^{Q}M_{l}$. A prototype vector is fixed on each node of the hidden layer as the center of the basis function $\emptyset\left(.\right)$, and each output node corresponds to a label $y_{l}\in Y\ \left(l=1,2,\ldots,Q\right)$. Together with an additional basis function $\emptyset_{0}$ whose value is 1, the neural network is trained with the weight matrix $W=\left[w_{jl}\right]_{\left(M+1\right)\ast Q}$ between the hidden and output layers.

The activation function of the hidden layer in MIMLRBF is a Gaussian radial basis function, and the performance of the algorithm depends on the center and width of the radial basis function and on the weight values. The parameter values were determined in [6] by enumeration over several experiments. In practice, this enumeration is very time-consuming and does not always yield the optimal parameters. Furthermore, MIMLRBF only considers Gaussian activation, although several other types of radial basis functions exist.

Parameter estimation of RBF neural networks in MIMLRBF is in fact an optimization problem. Single-objective optimization [30] and multi-objective optimization [31] have both been successfully applied to the structure or parameter estimation of RBF neural networks. However, focusing only on classification accuracy in single-objective optimization might cause overfitting. Model accuracy and complexity have commonly been used as the two fitness functions in parameter estimation of RBF neural networks, but existing approaches for the MIML problem did not seek to optimize performance measures such as F-measure and average precision directly [32]. Existing MIMLRBF methods obtained the Gaussian RBF parameters from the classification accuracy rate alone by enumeration experiments [29], which is very time-consuming and might cause overfitting. Multi-objective optimization methods, especially those with multiple performance measures as objectives, are therefore desirable for parameter estimation of RBF neural networks.

2.2 Multi-objective optimization problem and MOPSO

The multi-objective optimization problem can be generally described as follows:

$\displaystyle\min F\left(x\right)=\left(f_{1}\left(x\right),f_{2}\left(x\right),\ldots,f_{m}\left(x\right)\right)^{T}\quad\text{s.t.}\ x\in\Omega$ (1)

where $x=\left(x_{1},x_{2},\ldots,x_{n}\right)^{T}$ is an $n$-dimensional decision vector, $f_{i}\left(x\right)$ is the value of the $i$-th objective function, $m$ is the number of objective functions, $\Omega$ is the decision space, and $\left\{F\left(x\right)\mid x\in\Omega\right\}$ is the target space. Pareto dominance is defined as follows: for $x,y\in\Omega$, $x$ is said to Pareto dominate $y$ if and only if

$\displaystyle\forall i\in\left\{1,2,\ldots,m\right\}:f_{i}\left(x\right)\leqslant f_{i}\left(y\right)\ \wedge\ \exists j\in\left\{1,2,\ldots,m\right\}:f_{j}\left(x\right)<f_{j}\left(y\right)$ (2)

In this case $x$ dominates $y$, denoted as $x\prec y$. The set of all Pareto optimal solutions is called the Pareto optimal set (PS), which is defined as $\textit{PS}\buildrel\Delta\over{=}\left\{x^{\ast}\mid\neg\exists y\in\Omega:y\prec x^{\ast}\right\}$. The surface formed in the target space by all Pareto optimal solutions in the PS is called the Pareto Frontier (PF), which is defined as:

$\displaystyle\textit{PF}\buildrel\Delta\over{=}\left\{F\left(x^{\ast}\right)=\left(f_{1}\left(x^{\ast}\right),f_{2}\left(x^{\ast}\right),\ldots,f_{m}\left(x^{\ast}\right)\right)^{T}\mid x^{\ast}\in\textit{PS}\right\}$ (3)

In general, the sub-objectives of a multi-objective optimization problem are contradictory. Improving one sub-objective may degrade one or several others, so it is impossible to achieve the optimal value of all sub-objectives at the same time; one can only coordinate and compromise among them to optimize each sub-objective as much as possible.
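The dominance relation of Eq. (2) and the resulting non-dominated set can be illustrated with a minimal Python sketch. The function names `dominates` and `non_dominated` and the sample objective vectors are illustrative assumptions, not part of the original method description.

```python
from typing import List, Sequence


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """Return True if objective vector fx Pareto-dominates fy (minimization, Eq. (2))."""
    no_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the objective vectors that no other vector in `points` dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical objective vectors with two objectives to minimize.
    candidates = [(-0.9, -0.5), (-0.7, -0.8), (-0.6, -0.6), (-0.9, -0.4)]
    print(non_dominated(candidates))
```

The quadratic scan is chosen for readability only; it compares every pair of candidates, which is adequate for small archives.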

The particles in traditional MOPSO rely on the information of the personal best particle Pbest and the global best particle Gbest, and each iteration updates the velocity and position of every particle in the population. The update formulas of particle velocity and position are as follows:

$\displaystyle v_{i}\left(t+1\right)=\omega\ast v_{i}\left(t\right)+r_{1}c_{1}\ast\left(\textit{pbest}_{i}\left(t\right)-x_{i}\left(t\right)\right)+r_{2}c_{2}\ast\left(\textit{gbest}_{i}\left(t\right)-x_{i}\left(t\right)\right)$ (4)

$\displaystyle x_{i}\left(t+1\right)=x_{i}\left(t\right)+v_{i}\left(t+1\right)$ (5)

where $v_{i}\left(t\right)$ and $x_{i}\left(t\right)$ are the velocity and position of particle $p$ in the $i$-th dimension at generation $t$, $\textit{pbest}_{i}\left(t\right)$ is the $i$-th dimension of the personal best position pbest of particle $p$, $\textit{gbest}_{i}\left(t\right)$ is the $i$-th dimension of the global best position gbest, $\omega$ is the inertia weight constant, $r_{1},r_{2}$ are random values in the range [0,1], and $c_{1}$ and $c_{2}$ are two constant acceleration coefficients.
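As a rough illustration of Eqs. (4) and (5), the following Python sketch performs one update of a single particle. The function name `pso_step` and the default values of `omega`, `c1` and `c2` are assumptions chosen for the example; the text above does not specify them.

```python
import random
from typing import List, Optional, Tuple


def pso_step(x: List[float], v: List[float], pbest: List[float], gbest: List[float],
             omega: float = 0.7, c1: float = 1.5, c2: float = 1.5,
             rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """One velocity and position update of a single particle (Eqs. (4) and (5)).

    omega, c1 and c2 are illustrative defaults, not values taken from the paper.
    """
    rng = rng or random.Random()
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()  # fresh random factors per dimension
        vi_next = omega * vi + r1 * c1 * (pi - xi) + r2 * c2 * (gi - xi)  # Eq. (4)
        new_v.append(vi_next)
        new_x.append(xi + vi_next)  # Eq. (5)
    return new_x, new_v


if __name__ == "__main__":
    x, v = pso_step([0.0, 0.0], [0.1, -0.1], pbest=[1.0, 1.0], gbest=[2.0, 0.5],
                    rng=random.Random(0))
    print(x, v)
```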

3. The proposed method

To address parameter estimation of the radial basis function in MIMLRBF, parameter estimation of RBFs by MOPSO is proposed, in which Recall rate and Precision rate are chosen as objectives to obtain the most desirable Pareto optimal solution set. Furthermore, a share-learning factor is proposed to modify the particle velocity in MOPSO to improve the global search ability and group cooperative ability. The details of the proposed method are given in the following parts.

3.1 Parameter estimation for RBFs in MIMLRBF with MOPSO

3.1.1 Embedding different RBFs into MIML model

As shown in Fig. 1, let $S=\left\{\left(X_{i},Y_{i}\right)\mid 1\leqslant i\leqslant N\right\}$ be the MIML training set, and let $U_{l}=\left\{X_{i}\mid\left(X_{i},Y_{i}\right)\in S,y_{l}\in Y_{i}\right\}\ \left(l=1,2,\ldots,Q\right)$ be the set of bags with the $l$-th label. Given two bags of instances $A=\left\{a_{1},a_{2},\ldots,a_{n_{a}}\right\}$ and $B=\left\{b_{1},b_{2},\ldots,b_{n_{b}}\right\}$, the average Hausdorff distance between $A$ and $B$ introduced in [6] is written as:

$\displaystyle\textit{aveH}\left(A,B\right)=\frac{\sum\nolimits_{a\in A}\min_{b\in B}\textit{dist}\left(a,b\right)+\sum\nolimits_{b\in B}\min_{a\in A}\textit{dist}\left(a,b\right)}{\left|A\right|+\left|B\right|}$ (6)

where $\textit{dist}\left(a,b\right)$ denotes the distance between instances $a$ and $b$, and $\left|.\right|$ is the cardinality of a set. For each instance, the inner minimum measures the distance to its nearest neighbor in the other bag, and $\textit{aveH}\left(.,.\right)$ averages these distances over all instances in both bags. K-medoids clustering is used to segment $U_{l}$ into $M_{l}$ disjoint groups of bags $G_{j}^{l}\ \left(1\leqslant j\leqslant M_{l}\right)$, whose medoids are calculated as:

$\displaystyle C_{j}^{l}=\arg\min\nolimits_{A\in G_{j}^{l}}\sum\nolimits_{B\in G_{j}^{l}}\textit{aveH}\left(A,B\right)$ (7)
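Eqs. (6) and (7) can be sketched in a few lines of Python. The choice of the Euclidean distance for $\textit{dist}$ and the names `ave_hausdorff` and `medoid` are assumptions made for this example; the full K-medoids iteration (assigning bags to groups) is not shown.

```python
import math
from typing import List, Sequence

Instance = Sequence[float]
Bag = List[Instance]


def dist(a: Instance, b: Instance) -> float:
    """Distance between two instances; Euclidean distance is assumed here."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def ave_hausdorff(A: Bag, B: Bag) -> float:
    """Average Hausdorff distance between two bags, Eq. (6)."""
    total = sum(min(dist(a, b) for b in B) for a in A)
    total += sum(min(dist(a, b) for a in A) for b in B)
    return total / (len(A) + len(B))


def medoid(group: List[Bag]) -> Bag:
    """Bag of the group with the smallest summed distance to all group members, Eq. (7)."""
    return min(group, key=lambda A: sum(ave_hausdorff(A, B) for B in group))


if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0)]
    B = [(0.0, 0.0)]
    print(ave_hausdorff(A, B))
    print(medoid([[(0.0,)], [(1.0,)], [(5.0,)]]))
```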

Furthermore, the enumeration experiments in [6] showed that when $M_{l}$ is set to $0.1\ast\left|U_{l}\right|$, the performance of MIMLRBF tends to level off. In order to exploit the medoids from each possible class, the weight matrix $W=\left[w_{jl}\right]_{\left(M+1\right)\ast Q}$ between the hidden layer and the output layer is obtained by minimizing the sum-of-squares error function:

$\displaystyle E=\frac{1}{2}\sum\nolimits_{i=1}^{N}\sum\nolimits_{l=1}^{Q}\left(Y_{l}\left(X_{i}\right)-t_{l}^{i}\right)^{2}$ (8)

where $t_{l}^{i}$ is the desired output of $X_{i}$ on the $l$-th label, which takes the value $+1$ if $y_{l}\in Y_{i}$ and $-1$ otherwise. $Y_{l}\left(X_{i}\right)=\sum\nolimits_{j=0}^{M}w_{jl}\ast\varphi_{j}\left(X_{i}\right)$ is the output of $X_{i}$ on the $l$-th label, where $\varphi_{j}\left(X_{i}\right)$ is the activation of the $j$-th basis function on $X_{i}$. The basis function $\varphi_{j}\left(.\right)$ takes the Gaussian form:

$\displaystyle\varphi_{j}\left({X_{i}}\right)=\textit{exp}\left({-c^{2}r^{2}}\right)$ (9)

In Eq. (9), $r^{2}$ is set to $\frac{\textit{aveH}\left(X_{i},C_{j}\right)^{2}}{2\sigma_{j}^{2}}$, where $\textit{aveH}\left(X_{i},C_{j}\right)$ is the average Hausdorff distance between $X_{i}$ and the $j$-th medoid $C_{j}$, and $\sigma$, which controls the smoothness of the basis function, is set to $\rho\ast\left(\frac{\sum\nolimits_{p=1}^{M-1}\sum\nolimits_{q=p+1}^{M}\textit{aveH}\left(C_{p},C_{q}\right)}{M\left(M-1\right)/2}\right)$, that is, $\rho$ times the mean pairwise distance between medoids. Thus $\rho$ and $c$ in Eq. (9) are the two hyperparameters. Differentiating the error function of Eq. (8) with respect to $W$ and setting the derivative to zero gives the normal equations for the least sum-of-squares problem:

$\displaystyle W=\left({\emptyset^{T}\emptyset}\right)^{-1}\emptyset^{T}T$ (10)

where $\emptyset=\left[\varphi_{ij}\right]_{N\ast\left(M+1\right)}$ with elements $\varphi_{ij}=\varphi_{j}\left(X_{i}\right)$, $W=\left[w_{jl}\right]_{\left(M+1\right)\ast Q}$ with elements $w_{jl}$, and $T=\left[t_{il}\right]_{N\ast Q}$ with elements $t_{il}=t_{l}^{i}$.
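The Gaussian activations of Eq. (9) and the weights of Eq. (10) can be sketched as follows, assuming the average Hausdorff distances have already been computed. This is only an illustration: the function names are hypothetical, and the normal equations are solved here by plain Gauss-Jordan elimination as a stand-in for the SVD-based solution mentioned in Section 2.1.

```python
import math
from typing import List

Matrix = List[List[float]]


def gaussian_design_matrix(d_bag_medoid: Matrix, d_medoid_medoid: Matrix,
                           c: float, rho: float) -> Matrix:
    """N x (M+1) matrix of Eq. (10) from precomputed aveH distances (needs M >= 2)."""
    M = len(d_medoid_medoid)
    pair_sum = sum(d_medoid_medoid[p][q] for p in range(M) for q in range(p + 1, M))
    sigma = rho * pair_sum / (M * (M - 1) / 2)  # rho times mean pairwise medoid distance
    rows = []
    for d_row in d_bag_medoid:
        # Eq. (9) with r^2 = aveH^2 / (2 sigma^2); the leading 1.0 is the bias basis.
        rows.append([1.0] + [math.exp(-(c ** 2) * d ** 2 / (2 * sigma ** 2)) for d in d_row])
    return rows


def least_squares_weights(Phi: Matrix, T: Matrix) -> Matrix:
    """Solve (Phi^T Phi) W = Phi^T T by Gauss-Jordan elimination with partial pivoting."""
    n, k, q = len(Phi), len(Phi[0]), len(T[0])
    # Augmented system [Phi^T Phi | Phi^T T].
    aug = [[sum(Phi[i][a] * Phi[i][b] for i in range(n)) for b in range(k)]
           + [sum(Phi[i][a] * T[i][l] for i in range(n)) for l in range(q)]
           for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("Phi^T Phi is singular; a pseudo-inverse would be needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [value / piv for value in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


if __name__ == "__main__":
    Phi = gaussian_design_matrix([[0.0, 2.0], [2.0, 0.0], [1.0, 1.0]],
                                 [[0.0, 2.0], [2.0, 0.0]], c=1.0, rho=1.0)
    print(least_squares_weights(Phi, [[1.0], [-1.0], [1.0]]))
```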

As displayed in Eq. (9), the activation function of the hidden layer in MIMLRBF is a Gaussian radial basis function, and the performance of the algorithm depends on the center and width of the radial basis function and on the weight values, which are usually estimated by enumeration over several experiments. Here, different RBFs are considered for embedding into the MIML model. Eq. (9) can be written in a more general form:

$\displaystyle\varphi\left(X\right)=\emptyset\left(||X||\right)$ (11)

where $\emptyset$ is the radial basis function, that is, a function whose value depends only on the distance from the base point. All functions satisfying $\emptyset\left(X\right)=\emptyset\left(||X||\right)$ are called radial basis functions. Different radial basis functions are introduced in combination with the $r$ of Eq. (9), and the hyperparameters to be estimated for each function are given in Table 1.

Table 1

Different RBFs and the hyperparameters to be estimated

Radial basis function	Mathematical expression	Hyperparameters to be estimated
Gaussian	$\emptyset\left(r\right)=e^{-c^{2}r^{2}}$	$c,\rho$
Markoff	$\emptyset\left(r\right)=e^{-c\|r\|}$	$c,\rho$
Multi-Quadric	$\emptyset\left(r\right)=\left(c^{2}+r^{2}\right)^{\beta}\ (\beta>0)$	$c,\beta,\rho$
Inverse Multi-Quadric	$\emptyset\left(r\right)=\left(c^{2}+r^{2}\right)^{-\beta}\ (\beta>0)$	$c,\beta,\rho$
Thin plate spline	$\emptyset\left(r\right)=r^{k}$	$k,\rho$
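For illustration, the expressions of Table 1 can be written as plain Python functions of the scaled radius $r$. The dictionary name `RBFS`, its keys and the helper `scaled_radius` are hypothetical names introduced for this sketch; $\rho$ enters only through $\sigma$ in the scaled radius.

```python
import math
from typing import Callable, Dict


def scaled_radius(ave_h: float, sigma: float) -> float:
    """r such that r^2 = aveH^2 / (2 sigma^2), as used in Eq. (9)."""
    return ave_h / (math.sqrt(2.0) * sigma)


# One entry per row of Table 1; extra arguments are the hyperparameters of that row.
RBFS: Dict[str, Callable[..., float]] = {
    "gaussian": lambda r, c: math.exp(-(c ** 2) * r ** 2),
    "markoff": lambda r, c: math.exp(-c * abs(r)),
    "multiquadric": lambda r, c, beta: (c ** 2 + r ** 2) ** beta,
    "inverse_multiquadric": lambda r, c, beta: (c ** 2 + r ** 2) ** (-beta),
    "thin_plate_spline": lambda r, k: r ** k,
}


if __name__ == "__main__":
    r = scaled_radius(ave_h=1.0, sigma=0.5)
    print(RBFS["gaussian"](r, 1.0))
    print(RBFS["inverse_multiquadric"](r, 1.0, 0.5))
```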

3.1.2 Parameter estimation of RBFs by MOPSO

The question is how to estimate the hyperparameters given in Table 1 accurately and efficiently so that better classification performance can be achieved. In the proposed method, an improved MOPSO is used to estimate these parameters, with Recall rate and Precision rate chosen as objectives to obtain the most desirable Pareto optimal solution set. These two contradictory indicators play an important role in evaluating classification performance, so they are used as the objectives that determine the parameters in MOPSO. After the Pareto optimal solution set is obtained, the parameters with the best F-score in the set are assigned as the candidate parameters of each RBF. In this paper, Precision rate is represented by average precision, Recall rate by average recall and F-score by average F1, defined as follows:

$\displaystyle\textit{average precision}=\frac{1}{Q}\sum\nolimits_{l=1}^{Q}\frac{A_{l}}{B_{l}}$ (12)

Average precision evaluates, averaged over the labels $y_{l}\in Y\ \left(l=1,2,\ldots,Q\right)$, the fraction of bags annotated with a label that are annotated correctly. $Q$ is the number of labels, $A_{l}$ is the number of bags correctly annotated with $y_{l}$, and $B_{l}$ is the number of bags automatically annotated with $y_{l}$. The larger the average precision, the better the performance.

$\displaystyle\textit{average recall}=\frac{1}{Q}\sum\nolimits_{l=1}^{Q}\frac{A_{l}}{C_{l}}$ (13)

Average recall evaluates the average fraction of proper labels that have been predicted. $C_{l}$ is the number of bags manually annotated with a particular label $y_{l}\in Y\ \left(l=1,2,\ldots,Q\right)$ (the ground truth). The larger the average recall, the better the performance.

$\displaystyle\textit{average F1}=\frac{2\ast\textit{average precision}\ast\textit{average recall}}{\textit{average precision}+\textit{average recall}}$ (14)

Average F1 is a tradeoff between average precision and average recall. The pseudo-code of the proposed method is given in Algorithm 1.
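The three measures of Eqs. (12) to (14) can be computed from true and predicted label sets as in the following sketch. The function name `average_metrics`, the integer label encoding, and the convention of counting a label with an empty denominator as 0 are assumptions made for this example.

```python
from typing import List, Set, Tuple


def average_metrics(true_sets: List[Set[int]], pred_sets: List[Set[int]],
                    num_labels: int) -> Tuple[float, float, float]:
    """Return (average precision, average recall, average F1) of Eqs. (12)-(14).

    Labels are encoded as integers 0..num_labels-1 (an assumption of this sketch).
    A label with B_l = 0 or C_l = 0 contributes 0 to the corresponding average.
    """
    precisions, recalls = [], []
    for l in range(num_labels):
        a = sum(1 for t, p in zip(true_sets, pred_sets) if l in t and l in p)  # A_l
        b = sum(1 for p in pred_sets if l in p)  # B_l: automatically annotated
        c = sum(1 for t in true_sets if l in t)  # C_l: manually annotated
        precisions.append(a / b if b else 0.0)
        recalls.append(a / c if c else 0.0)
    ap = sum(precisions) / num_labels
    ar = sum(recalls) / num_labels
    f1 = 2 * ap * ar / (ap + ar) if ap + ar else 0.0
    return ap, ar, f1


if __name__ == "__main__":
    truth = [{0}, {0, 1}, {1}]
    predicted = [{0}, {0}, {0, 1}]
    print(average_metrics(truth, predicted, num_labels=2))
```

When used as MOPSO objectives under a minimization convention, the first two values would be negated.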

Algorithm 1: The pseudo-code of the proposed method
Inputs: $S$: the MIML training set $\left\{\left(X_{1},Y_{1}\right),\ldots,\left(X_{N},Y_{N}\right)\right\}$, $T$: the desired outputs of the $X_{i}$, $X$: the test MIML bag, $M_{l}$: the cluster number, RBF: the radial basis function, Para: the parameters of the RBF to be determined, EP: the external archive
Outputs: $Y$: the predicted label set for $X$
Process:
1. for $y_{l}\in Y$ do
2. Set $U_{l}=\left\{X_{i}\mid\left(X_{i},Y_{i}\right)\in S,y_{l}\in Y_{i}\right\}$
3. $C_{j}^{l}\leftarrow$ K-medoids($U_{l},M_{l}$), $\left(1\leqslant j\leqslant M_{l}\right)$
4. end for
5. $\textit{EP}\leftarrow$ MOPSO($\textit{RBF, Para, HDL},\ldots$);
6. $\textit{Para}_{\textit{best}}\leftarrow\arg\max\nolimits_{\textit{Para}\in\textit{EP}}\textit{average F1}$
7. $\emptyset\leftarrow\textit{RBF}\left(C,\textit{Para}_{\textit{best}}\right)$
8. $W=\left(\emptyset^{T}\emptyset\right)^{-1}\emptyset^{T}T$
9. $Y=\left\{l\mid y_{l}\left(X\right)=W\ast\emptyset>0,\ y_{l}\in Y\right\}$

In Algorithm 1, the first layer of the MIMLRBF neural network is formed by performing K-medoids clustering on the training examples of each possible label (Steps 1-4), which segments $U_{l}$ into $M_{l}=0.1\ast\left|U_{l}\right|$ disjoint groups of bags $G_{j}^{l}$, whose medoids $C_{j}^{l}$ are calculated by Eq. (7). HDL denotes the hidden layer in Fig. 1, whose nodes consist of the medoids $C_{j}^{l}$. The parameters of the different RBFs are then estimated by MOPSO (Steps 5-6): first, the Pareto optimal set EP is obtained with average precision and average recall as the two objective functions (Step 5); second, the parameters with the best average F1 in the Pareto optimal set are assigned as the $\textit{Para}_{\textit{best}}$ of each RBF (Step 6). The weights $W=\left[w_{jl}\right]_{\left(M+1\right)\ast Q}$ between the hidden and output layers are trained by minimizing the sum-of-squares error function, as shown in Eq. (10), using each RBF with its corresponding parameters (Steps 7-8). Finally, the test MIML example is fed to the trained neural network for prediction (Step 9).
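Steps 6 and 9 of Algorithm 1 can be illustrated with a short sketch. It assumes that each archive entry stores a parameter dictionary together with its average precision and average recall; this layout and the names `select_best_params` and `predict_labels` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

ArchiveEntry = Tuple[Dict[str, float], float, float]  # (parameters, avg precision, avg recall)


def select_best_params(archive: List[ArchiveEntry]) -> Dict[str, float]:
    """Step 6: pick the archive member with the largest average F1 (Eq. (14))."""
    def f1(precision: float, recall: float) -> float:
        total = precision + recall
        return 2 * precision * recall / total if total else 0.0
    return max(archive, key=lambda entry: f1(entry[1], entry[2]))[0]


def predict_labels(outputs: List[float]) -> Set[int]:
    """Step 9: keep the labels (numbered from 1) whose network output is positive."""
    return {l for l, y in enumerate(outputs, start=1) if y > 0}


if __name__ == "__main__":
    archive = [({"c": 1.0, "rho": 0.2}, 0.9, 0.5),
               ({"c": 2.0, "rho": 0.4}, 0.7, 0.8)]
    print(select_best_params(archive))
    print(predict_labels([0.3, -0.2, 0.0, 1.1]))
```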

3.2 An improved MOPSO with share-learning factor

Figure 2. The movement trajectory of a particle in share-learning based MOPSO.

The particles in traditional MOPSO rely entirely on the information of the personal best particle Pbest and the global best particle Gbest and ignore the information of the other particles, which may reduce the diversity of the particle swarm. The Pareto Frontier is also crucial to the performance of the proposed algorithm. To obtain a reliable Pareto Frontier composed of more diverse particles, the average best position, regarded as the share-learning factor, is proposed to modify the velocity updating formula so that each particle shares information with all the other particles in addition to the personal and global best particles. In this way the algorithm improves its global search ability and group cooperative ability.

The movement trajectory of a particle is displayed in Fig. 2. The trajectory of a particle in standard MOPSO is ① $\to$ ② $\to$ ③ $\to$ ④. After the share-learning factor is introduced, the trajectory becomes ① $\to$ ② $\to$ ③ $\to$ ⑤ $\to$ ⑥. The search area determined by share-learning based MOPSO contains the area of standard MOPSO, and standard MOPSO can be considered a special case of share-learning based MOPSO. Adding the share-learning factor can therefore enhance the global search capability of the particles.

It should be noted that the share-learning based MOPSO model takes minimization functions as its objectives. Therefore, $-$average precision and $-$average recall are adopted as the two objective functions, which are calculated from the chosen RBF and its parameters. The search area ($c$: [0, 10], $\beta$: [0, 10], $k$: [0, 10], $\rho$: [0, 1]) is determined according to the parameter ranges mentioned in MIMLRBF and previous experiments. Step 5 in Algorithm 1 can thus be carried out by the share-learning based MOPSO, whose pseudo-code is given in Algorithm 2.

Algorithm 2: The pseudo-code of share-learning based MOPSO
Inputs: $P$: population, $N$: swarm size, $v$: velocity of particles, Pbest: personal best, Gbest: global best, EP: external archive
Outputs: EP
Process:
1. Initialize the random swarm $P$ of size $N$, the particle velocities $v$, and the personal bests Pbest;
2. Store all the non-dominated solutions in the external archive EP;
3. while termination criterion not fulfilled do
4. $G\leftarrow$ GridBuild(EP);
5. $\textit{Gbest}\leftarrow$ GridSelection(EP, $G$);
6. $P\leftarrow$ SLPSOOperator($P$, $v$, Pbest, Gbest);
7. $\textit{Pbest}\leftarrow$ UpdatePbest(Pbest, $P$);
8. $\textit{EP}\leftarrow$ UpdateArchive($\textit{EP}\cup P$);
9. if $\left|\textit{EP}\right|>N$ then
10. $\textit{EP}\leftarrow$ DeleteSolution(EP, $G$);
11. end if
12. end while
13. return EP

In Algorithm 2, SLPSOOperator in Step 6 performs the velocity and position updates of the population in share-learning based MOPSO. The particle velocity is updated as follows:

$\displaystyle V_{ij}\left(t+1\right)=\omega\ast V_{ij}\left(t\right)+r_{1}c_{1}\ast\left(P_{ij}\left(t\right)-X_{ij}\left(t\right)\right)+r_{2}c_{2}\ast\left(G_{j}\left(t\right)-X_{ij}\left(t\right)\right)+r_{3}c_{3}\ast\left(C_{j}\left(t\right)-X_{ij}\left(t\right)\right)$ (15)

where $i$ indexes the $i$-th particle of the swarm, $V_{ij}\left(t\right)$ and $X_{ij}\left(t\right)$ are the velocity and position of the particle in the $j$-th dimension at generation $t$, $P_{ij}\left(t\right)$ is the personal best position (Pbest) of the particle in the $j$-th dimension, $G_{j}\left(t\right)$ is the global best position (Gbest) in the $j$-th dimension, and $C_{j}\left(t\right)$ is the $j$-th dimension of the share-learning factor, defined as:

$\displaystyle C\left(t\right)=\frac{1}{M}\sum\nolimits_{i=1}^{M}P_{i}\left(t\right)$ (16)

where $M$ is the number of particles and $P_{i}\left(t\right)$ is the personal best position of the $i$-th particle. $\omega$ controls the impact of the previous velocity on the current velocity. An adaptive inertia weight is used to balance the global and local search ability of the particles, and $\omega$ is defined as follows:

$\displaystyle\omega=\omega_{\textit{max}}-\left(\omega_{\textit{max}}-\omega_{\textit{min}}\right)\ast\frac{t-1}{\textit{MaxIt}-1}$ (17)

where MaxIt is the maximum number of iterations, and $\omega_{\textit{max}}=0.9$ and $\omega_{\textit{min}}=0.4$ are the initial and final values of the inertia weight. $r_{1},r_{2},r_{3}$ are random values in the range [0,1]. The acceleration coefficients $c_{1},c_{2},c_{3}$ play an important role in improving the convergence rate. $c_{1}$ and $c_{2}$ are two constants. $c_{3}$ is the dynamically changing weight of the share-learning factor, which increases the chance of jumping out of a local optimum and is calculated as:

$\displaystyle c_{3}=1+\frac{t}{\textit{MaxIt}}$ (18)
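Eqs. (15) to (18) can be combined into one particle update, sketched below in Python. The function names, the default values of `c1` and `c2`, and the reuse of the position update of Eq. (5) are assumptions of this sketch; clipping to the search area is omitted.

```python
import random
from typing import List, Optional, Tuple


def inertia_weight(t: int, max_it: int, w_max: float = 0.9, w_min: float = 0.4) -> float:
    """Linearly decreasing inertia weight, Eq. (17); t runs from 1 to max_it."""
    return w_max - (w_max - w_min) * (t - 1) / (max_it - 1)


def share_weight(t: int, max_it: int) -> float:
    """Dynamic weight c3 of the share-learning factor, Eq. (18)."""
    return 1.0 + t / max_it


def share_learning_factor(pbests: List[List[float]]) -> List[float]:
    """Mean of all personal best positions, Eq. (16)."""
    m = len(pbests)
    return [sum(p[j] for p in pbests) / m for j in range(len(pbests[0]))]


def share_learning_step(x: List[float], v: List[float], pbest: List[float],
                        gbest: List[float], center: List[float], t: int, max_it: int,
                        c1: float = 1.5, c2: float = 1.5,
                        rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """One update of a single particle with the velocity rule of Eq. (15).

    c1 and c2 are illustrative defaults; the position update x + v follows Eq. (5),
    which is assumed to carry over unchanged.
    """
    rng = rng or random.Random()
    omega, c3 = inertia_weight(t, max_it), share_weight(t, max_it)
    new_x, new_v = [], []
    for xj, vj, pj, gj, cj in zip(x, v, pbest, gbest, center):
        r1, r2, r3 = rng.random(), rng.random(), rng.random()
        vj_next = (omega * vj + r1 * c1 * (pj - xj) + r2 * c2 * (gj - xj)
                   + r3 * c3 * (cj - xj))
        new_v.append(vj_next)
        new_x.append(xj + vj_next)
    return new_x, new_v


if __name__ == "__main__":
    pbests = [[0.0, 2.0], [2.0, 4.0]]
    center = share_learning_factor(pbests)
    print(share_learning_step([0.5, 1.0], [0.0, 0.0], pbests[0], [2.0, 4.0],
                              center, t=1, max_it=100, rng=random.Random(0)))
```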

3.3 Complexity analysis

Let $d$ be the number of features of an instance, $m$ and $n$ be the numbers of training and test bags respectively, $p$ be the mean number of instances in a bag, $s$ be the number of objective functions, and $t$ be the population size. The two time-consuming steps of the traditional MIMLRBF training process are the K-medoids clustering and the calculation of $\varphi_{ij}$. The complexity of K-medoids clustering is $O((\textit{pm})^{2})$. Calculating $\varphi_{ij}$ requires the distance between each prototype vector and each bag, and thus its complexity is $O(d\ast(\textit{pm})^{2})$. In all, the traditional MIMLRBF training process has complexity $O((d+1)\ast(\textit{pm})^{2})$.

When MOPSO is applied to estimate the hyperparameters, the complexity of traditional MOPSO is $O(\textit{st}^{2})$. Average precision and average recall are computed by running MIMLRBF on the test bags, so the complexity of parameter estimation by MOPSO is $O(d\ast(\textit{pn})^{2}\ast\textit{st}^{2})$. To sum up, the training process of the proposed method has complexity $O((d+1)\ast(\textit{pm})^{2}+d\ast(\textit{pn})^{2}\ast\textit{st}^{2})$.

4. Experimental results and discussion

4.1 Experimental datasets and evaluation index

The experiments in this paper are conducted on three public datasets, namely Reuters, Scene and Haloarcula marismortui [33]. The Reuters dataset, with 2000 documents, is derived from the widely used Reuters-21578 collection. Each document is represented as a bag of instances, where each instance is a text segment obtained with a sliding window. The Scene dataset contains 2000 natural scene images. A set of labels such as desert, mountains, sea, sunset and trees is manually assigned to each image. The Haloarcula marismortui dataset is derived from an archaeal genome and contains 304 proteins (examples) with a total of 234 gene ontology terms (label classes) on molecular function. All experiments are run in Matlab R2018a on an Intel(R) Core(TM) i5-7500 CPU @ 3.40 GHz with 4 GB memory and a 64-bit operating system. The dataset characteristics are shown in Table 2.

Table 2
Characteristics of the datasets

Dataset	Number of examples	Number of classes	Number of features	Instances per bag (Min)	Instances per bag (Max)	Instances per bag (Mean $\pm$ std)	Labels per example ($k=1$)	Labels per example ($k=2$)	Labels per example ($k\geqslant 3$)
Reuters	2000	7	243	2	26	3.56 $\pm$ 2.71	1701	290	9
Scene	2000	5	15	9	9	9.00 $\pm$ 0.00	1543	442	15
Haloarcula marismortui	304	234	216	2	7	3.13 $\pm$ 1.09	74	111	119

All experiments are conducted with 10-fold cross validation. The first experiment identifies which radial basis function is most suitable for the MIML model. The second experiment compares the method proposed in this paper with other MIML methods. The third experiment analyzes the parameters of the improved MOPSO in our method. The fourth experiment compares the results of the proposed method with those of standard PSO and MOPSO during the optimization process. Hamming Loss, Ranking Loss, One error and Coverage are adopted as performance indicators.

The Hamming Loss indicator reflects the degree of misclassification of bags, i.e., how often a proper label is missed or a wrong label is predicted. The smaller the value, the better the learning model. It is calculated as:

$\displaystyle\textit{Hamming Loss}=\frac{1}{m}\sum_{i=1}^{m}\frac{\left|{f(X_{i})\Delta Y_{i}}\right|}{M}$ (19)

where $m$ is the number of bags, $M$ is the total number of labels, $Y_{i}$ is the true label set of $X_{i}$ , $f(X_{i})$ is the predicted label set of $X_{i}$ , and $\Delta$ denotes the symmetric difference (exclusive or, XOR) of the two sets. The Ranking Loss indicator evaluates the fraction of label pairs that are ranked in the wrong order, i.e., pairs in which an irrelevant label is ranked above a relevant one. The smaller the Ranking Loss, the better the classification performance. It can be written as:

$\displaystyle\textit{Ranking Loss}=\frac{1}{m}\sum_{i=1}^{m}\frac{1}{\left|{Y_{i}}\right|\left|{\overline{Y_{i}}}\right|}\left|{\left\{\left({y_{1},y_{2}}\right)\,\middle|\,f\left({X_{i},y_{1}}\right)\leqslant f\left({X_{i},y_{2}}\right),\left({y_{1},y_{2}}\right)\in Y_{i}\times\overline{Y_{i}}\right\}}\right|$ (20)

where $\overline{Y_{i}}$ is the complement of $Y_{i}$ and $f\left({X_{i},y}\right)$ returns a real value indicating the confidence that $y$ is a proper label of $X_{i}$ . One error evaluates how many times the top-ranked label is not a correct label of the bag. The smaller the value, the better the performance.

$\displaystyle\textit{One error}=\frac{1}{m}\sum_{i=1}^{m}\left\{{\left[{\arg\max_{y\in Y}f\left({X_{i},y}\right)}\right]\notin Y_{i}}\right\}$ (21)

Coverage evaluates how far, on average, one needs to go down the ranked list of labels in order to cover all the proper labels of the bag. The smaller the Coverage, the better the performance.

$\displaystyle\textit{Coverage}=\frac{1}{m}\sum_{i=1}^{m}\max_{y\in Y_{i}}\textit{rank}_{f\left({X_{i},y}\right)}-1$ (22)

where $\textit{rank}_{f\left({X_{i},y}\right)}$ returns the rank of $y$ derived from $f\left({X_{i},y}\right)$ .
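The four indicators in Eqs. (19)–(22) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the evaluation code used in the experiments (which were run in Matlab). It assumes that `scores[i, j]` holds $f(X_{i},y_{j})$ , that `true` and `pred` are binary bag-by-label matrices, that rank 1 is the highest-scoring label, and that every bag has at least one proper label; bags whose label set or its complement is empty are simply skipped in the Ranking Loss term, which is also an assumption.

```python
import numpy as np


def hamming_loss(pred, true):
    """Eq. (19): mean size of the symmetric difference, divided by M labels."""
    pred = np.asarray(pred, dtype=bool)
    true = np.asarray(true, dtype=bool)
    return float(np.mean(pred != true))


def ranking_loss(scores, true):
    """Eq. (20): fraction of (relevant, irrelevant) pairs ranked in the wrong order."""
    scores = np.asarray(scores, dtype=float)
    true = np.asarray(true, dtype=bool)
    total = 0.0
    for f, y in zip(scores, true):
        pos, neg = f[y], f[~y]
        if pos.size == 0 or neg.size == 0:
            continue  # no pairs to compare for this bag (assumption: term is 0)
        wrong = np.sum(pos[:, None] <= neg[None, :])
        total += wrong / (pos.size * neg.size)
    return total / len(scores)


def one_error(scores, true):
    """Eq. (21): share of bags whose top-ranked label is not a proper label."""
    scores = np.asarray(scores, dtype=float)
    true = np.asarray(true, dtype=bool)
    top = np.argmax(scores, axis=1)
    return float(np.mean(~true[np.arange(len(true)), top]))


def coverage(scores, true):
    """Eq. (22): mean depth in the ranked list needed to cover all proper labels."""
    scores = np.asarray(scores, dtype=float)
    true = np.asarray(true, dtype=bool)
    m, M = scores.shape
    order = np.argsort(-scores, axis=1, kind="stable")  # best label first
    ranks = np.empty_like(order)
    ranks[np.arange(m)[:, None], order] = np.arange(1, M + 1)
    deepest = np.where(true, ranks, 0).max(axis=1)
    return float(np.mean(deepest - 1))
```

Hamming Loss needs thresholded predictions, while the other three indicators work on the real-valued scores, which is why they can disagree about which method is best.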

4.2 Radial basis function selection in MIMLRBF

To test the performance of the different RBFs listed in Table 1, the five RBFs, each with its own parameters to be determined as described above, are evaluated in the proposed algorithm on the three datasets. The parameters of the MOPSO algorithm are set as follows: the number of particles is 50, the capacity of the external archive is 25, the individual learning factor $c_{1}$ is 2, the group learning factor $c_{2}$ is 2, the variation factor is 0.1, the maximum number of iterations is 40, and the number of grids per dimension is 7. The performance of the different RBFs on the three datasets is reported in Tables 3–5.

It can be seen from Tables 3–5 that Inverse Multi-Quadric achieves the best value of every indicator on both Scene and Haloarcula marismortui. On the Reuters dataset it is also the best in Ranking Loss, Coverage and One error, while the smallest Hamming Loss is obtained by the Gaussian RBF. It can be concluded that Inverse Multi-Quadric is the most suitable RBF for the three tested MIML datasets, so it is used as the RBF in MIMLRBF in the following experiments.
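For orientation, the five RBF families compared here can be written as functions of a distance $r$ . Table 1 is not reproduced in this section, so the forms below are the common textbook definitions and are an assumption: the parameter names `sigma` and `c` are hypothetical, and the parameterization used by the proposed method may differ.

```python
import numpy as np


def gaussian(r, sigma):
    """Gaussian RBF: exp(-r^2 / (2 sigma^2))."""
    return np.exp(-np.square(r) / (2.0 * sigma ** 2))


def markoff(r, sigma):
    """Markoff (exponential) RBF: exp(-r / sigma)."""
    return np.exp(-np.asarray(r, dtype=float) / sigma)


def multiquadric(r, c):
    """Multi-Quadric RBF: sqrt(r^2 + c^2), grows with distance."""
    return np.sqrt(np.square(r) + c ** 2)


def inverse_multiquadric(r, c):
    """Inverse Multi-Quadric RBF: 1 / sqrt(r^2 + c^2), decays with distance."""
    return 1.0 / np.sqrt(np.square(r) + c ** 2)


def thin_plate_spline(r):
    """Thin plate spline RBF: r^2 * log(r), defined as 0 at r = 0."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out
```

Under these assumed forms, the Gaussian, Markoff and Inverse Multi-Quadric functions are bounded and decay with distance, whereas the Multi-Quadric and thin plate spline functions grow with distance.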

Table 3
Results of different RBFs tested on Reuters

RBF	HL	RL	Coverage	One error
Gauss	0.0332 $\pm$ 0.0008	0.0238 $\pm$ 0.0007	0.3125 $\pm$ 0.0053	0.0750 $\pm$ 0.0000
Markoff	0.1493 $\pm$ 0.0005	0.0672 $\pm$ 0.0089	0.5975 $\pm$ 0.0371	0.1475 $\pm$ 0.0159
Multi-Quadric	0.3904 $\pm$ 0.0018	0.3426 $\pm$ 0.0017	2.2050 $\pm$ 0.0212	0.5475 $\pm$ 0.0230
Inverse Multi-Quadric	0.0643 $\pm$ 0.0025	0.0191 $\pm$ 0.0005	0.2700 $\pm$ 0.0035	0.0625 $\pm$ 0.0018
Thin plate spline	0.4065 $\pm$ 0.0197	0.5931 $\pm$ 0.0818	3.9050 $\pm$ 0.5020	0.8225 $\pm$ 0.0053

Table 4

Results of different RBFs tested on Scene

RBF	HL	RL	Coverage	One error
Gauss	0.3395 $\pm$ 0.0166	0.2415 $\pm$ 0.0134	1.2225 $\pm$ 0.0619	0.4175 $\pm$ 0.0053
Markoff	0.3120 $\pm$ 0.0163	0.2281 $\pm$ 0.0178	1.1575 $\pm$ 0.0725	0.3950 $\pm$ 0.0035
Multi-Quadric	0.5005 $\pm$ 0.0018	0.4622 $\pm$ 0.0289	1.9000 $\pm$ 0.0071	0.6925 $\pm$ 0.0053
Inverse Multi-Quadric	0.2870 $\pm$ 0.0028	0.1829 $\pm$ 0.0035	0.9800 $\pm$ 0.0354	0.3475 $\pm$ 0.0018
Thin plate spline	0.2950 $\pm$ 0.0290	0.4607 $\pm$ 0.0028	2.0000 $\pm$ 0.0141	0.7225 $\pm$ 0.0088

Table 5

Results of different RBFs on Haloarcula marismortui

RBF	HL	RL	Coverage	One error
Gauss	0.1307 $\pm$ 0.0045	0.5841 $\pm$ 0.0204	187.55 $\pm$ 11.233	0.6500 $\pm$ 0.0589
Markoff	0.1134 $\pm$ 0.0035	0.5157 $\pm$ 0.0223	157.65 $\pm$ 8.8503	0.7742 $\pm$ 0.0456
Multi-Quadric	0.7713 $\pm$ 0.0029	0.5604 $\pm$ 0.0260	184.52 $\pm$ 6.0222	0.7841 $\pm$ 0.0091
Inverse Multi-Quadric	0.0092 $\pm$ 0.0005	0.1974 $\pm$ 0.0259	71.919 $\pm$ 6.3978	0.5242 $\pm$ 0.0171
Thin plate spline	0.7080 $\pm$ 0.0010	0.6354 $\pm$ 0.0219	192.44 $\pm$ 4.2826	0.7060 $\pm$ 0.0430

4.3 Contrast experiments with other MIML algorithms

This part compares the method proposed in this paper with other MIML methods. Inverse Multi-Quadric is selected as the RBF, and its parameters are estimated by the MOPSO method. Nine state-of-the-art methods, namely KISAR, DMIMLSVM, M3MIML, MIMLBOOST, MIMLNN, MIMLRBF, MIMLSVM, MIMLFAST and MIMLWEL, are selected for comparison under 10-fold cross validation. The descriptions and parameters of these algorithms are shown in Table 6, and the Hamming Loss, Ranking Loss, Coverage, One error and training time obtained on the three datasets are reported in Tables 7–9 respectively.

Table 6
Descriptions and parameters setting of other MIML methods

Algorithm	Description	Parameters setting
KISAR	‘Key Instance Sharing Among Related Labels’. It discovers what instances trigger what label.	The parameter set for liblinear: 500; the epsilon parameter for the algorithm: 1e-3; the maximum number of prototypes for k-means: 1000; maximum optimization iteration: 20.
M3MIML	‘Maximum Margin Method for MIML’. It exploits the connection between instance and labels by a maximum margin method.	svm.type: ‘Linear’; the tolerance value for lambda described: 1e-6; the tolerance value for difference between alpha: 1e-4; the cost parameter used in SVM: 1e-2, number of iterations: 50.
DMIMLSVM	‘Direct MIMLSVM’. It tackles MIML problems directly in a regularization framework.	$\mu$ is a parameter to trade off the discrepancy and commonness among the labels: 0; $\gamma$ is a regularization parameter balancing the model complexity and the empirical risk: 1e10; $\lambda$ balances empirical loss function involving two terms: 0.
MIMLBOOST	A degeneration based MIML method. It uses a multi-instance learning method (MIBOOST) as a bridge.	svm.type: ‘Linear’; the cost parameter used in SVM: 1; the number of boosting rounds: 5.
MIMLNN	‘MIML Neural Network’. It solves MIML problems by two stages of multilayer perceptron.	Ratio used to obtain number of clusters: 0.4; the regularization parameter used to compute matrix inverse: 0.5.
MIMLRBF	‘Radial basis function (RBF) neural networks for MIML’. It exploits the connections between instances and labels by Gaussian radial basis function.	Ratio used to obtain number of centroids: 0.080; the parameter used to determine the standard deviation of the Gaussian activation function: 0.6.
MIMLSVM	A degeneration based MIML method. It uses a multi-label learning method (MLSVM) as a bridge.	svm.type: ‘RBF’; the value of Gamma whose kernel is $\textit{exp}\left({-\textit{Gamma}\ast\left\|{x1-x2}\right\|^{2}}\right)$ : 1; the cost parameter used in SVM: 1.
MIMLFAST	‘Fast MIML Learning’. It provides an approximation and stochastic gradient descent to optimize original MIML.	dimension of the shared space: 100; norm of each vector: 10; number of iterations: 10; step size of SGD: 0.005; lambda: 1e-5; number of sub concepts: 5.
MIMLWEL	‘MIML Learning with weak label’. It is used for the weak label setting by assuming that highly relevant labels share some common instances.	Parameter controls the empirical loss on labeled data: 50; parameter controls the difference between learned training targets and original input training targets: 1; parameter controls the similarity between training bags and their prototypes: 2; iteration number: 20; ratio used to obtain number of centroids: 0.080; parameters used to determine the standard deviation of the Gaussian activation function: 1.

It can be seen from Tables 7–9 that the proposed method achieves stable and competitive classification performance on the three datasets. Specifically, on the Reuters dataset, our algorithm achieves the best Ranking Loss and Coverage, while the best Hamming Loss and One error are obtained by MIMLNN and MIMLFAST respectively. On the Scene dataset, our algorithm again performs best in Ranking Loss and Coverage, with the best values of the other two indicators achieved by KISAR. On the Haloarcula marismortui dataset, our algorithm is the top performer on all indicators except One error. In short, our method compares favorably with the other MIML algorithms in classification performance, particularly in Ranking Loss and Coverage.

In terms of training time, MIMLFAST is the fastest of all the algorithms on the Reuters and Scene datasets, while MIMLNN is the fastest on Haloarcula marismortui. MIMLFAST is efficient because it optimizes an approximated ranking loss with SGD on a two-level linear model and discovers sub-concepts for complicated labels in a shared space. MIMLBOOST and DMIMLSVM degenerate the MIML problem into a series of Multi-Instance or Multi-Label tasks, which results in high complexity and low efficiency because the hypothesis space expands dramatically. Our method spends much less time in training than M3MIML, DMIMLSVM and MIMLBOOST.

Table 7

Results run on Reuters

	HL	RL	Coverage	One error	Training time(s)
KISAR	0.0404 $\pm$ 0.0004	0.1854 $\pm$ 0.0079	0.8450 $\pm$ 0.0500	0.0300 $\pm$ 0.0050	96.80 $\pm$ 11.37
M3MIML	0.0379 $\pm$ 0.0050	0.0231 $\pm$ 0.0056	0.3275 $\pm$ 0.0025	0.0625 $\pm$ 0.0075	20706 $\pm$ 273.1
DMIMLSVM	0.0512 $\pm$ 0.0005	0.0457 $\pm$ 0.0040	0.4506 $\pm$ 0.0312	0.1300 $\pm$ 0.0006	16211 $\pm$ 215.0
MIMLBOOST	0.1703 $\pm$ 0.0031	0.5105 $\pm$ 0.0044	2.6220 $\pm$ 0.0648	0.3009 $\pm$ 0.0003	16536 $\pm$ 315.3
MIMLNN	0.0347 $\pm$ 0.0040	0.0273 $\pm$ 0.0035	0.3175 $\pm$ 0.0225	0.0800 $\pm$ 0.0000	22.20 $\pm$ 0.32
MIMLRBF	0.0604 $\pm$ 0.0080	0.0415 $\pm$ 0.0020	0.3950 $\pm$ 0.0050	0.1350 $\pm$ 0.0010	19.68 $\pm$ 1.82
MIMLSVM	0.1750 $\pm$ 0.0150	0.2534 $\pm$ 0.0125	1.6825 $\pm$ 0.0875	0.5225 $\pm$ 0.0375	34.89 $\pm$ 0.51
MIMLFAST	0.3825 $\pm$ 0.0146	0.1083 $\pm$ 0.0036	0.5225 $\pm$ 0.0125	0.0100 $\pm$ 0.0100	2.99 $\pm$ 0.34
MIMLWEL	0.0986 $\pm$ 0.0022	0.5396 $\pm$ 0.0113	2.4725 $\pm$ 0.0175	0.0325 $\pm$ 0.0025	101.26 $\pm$ 4.32
OURS	0.0643 $\pm$ 0.0025	0.0191 $\pm$ 0.0005	0.2700 $\pm$ 0.0035	0.0625 $\pm$ 0.0018	1187 $\pm$ 34.62

Table 8

Results run on Scene

	HL	RL	Coverage	One error	Training time(s)
KISAR	0.1650 $\pm$ 0.0100	0.4863 $\pm$ 0.0000	1.2425 $\pm$ 0.1375	0.1525 $\pm$ 0.0325	19.57 $\pm$ 1.24
M3MIML	0.5190 $\pm$ 0.0069	0.2904 $\pm$ 0.0185	1.5050 $\pm$ 0.0867	0.4500 $\pm$ 0.0078	28015 $\pm$ 369.6
DMIMLSVM	0.2271 $\pm$ 0.0073	0.2437 $\pm$ 0.0039	1.2439 $\pm$ 0.0122	0.4283 $\pm$ 0.0139	40068 $\pm$ 45.00
MIMLBOOST	0.2467 $\pm$ 0.0008	0.5859 $\pm$ 0.0143	2.1342 $\pm$ 0.0048	2.3331 $\pm$ 0.1942	40870 $\pm$ 399.5
MIMLNN	0.2240 $\pm$ 0.0100	0.2367 $\pm$ 0.0088	1.1650 $\pm$ 0.0450	0.4200 $\pm$ 0.0200	17.02 $\pm$ 0.07
MIMLRBF	0.1805 $\pm$ 0.0088	0.1917 $\pm$ 0.0044	1.0225 $\pm$ 0.0159	0.3500 $\pm$ 0.0071	20.89 $\pm$ 0.33
MIMLSVM	0.3470 $\pm$ 0.0100	0.4125 $\pm$ 0.0555	2.1825 $\pm$ 0.0275	0.7550 $\pm$ 0.0450	31.75 $\pm$ 0.28
MIMLFAST	0.7575 $\pm$ 0.0015	0.7694 $\pm$ 0.0319	2.2375 $\pm$ 0.0325	0.3700 $\pm$ 0.1150	1.75 $\pm$ 0.04
MIMLWEL	0.2200 $\pm$ 0.0020	0.6732 $\pm$ 0.0036	1.6200 $\pm$ 0.0050	0.1725 $\pm$ 0.0075	485.09 $\pm$ 2.81
OURS	0.2870 $\pm$ 0.0028	0.1829 $\pm$ 0.0035	0.9800 $\pm$ 0.0354	0.3475 $\pm$ 0.0018	1767.5 $\pm$ 1.7678

Table 9

Results run on the Haloarcula marismortui

	HL	RL	Coverage	One error	Training time(s)
KISAR	0.0151 $\pm$ 0.0006	0.8493 $\pm$ 0.0263	195.21 $\pm$ 4.646	0.2452 $\pm$ 0.0319	1.7800 $\pm$ 0.0354
M3MIML	0.0205 $\pm$ 0.0001	0.2764 $\pm$ 0.0040	110.387 $\pm$ 1.1519	0.7419 $\pm$ 0.0342	26797 $\pm$ 353.55
DMIMLSVM	0.0133 $\pm$ 0.0003	0.4820 $\pm$ 0.0107	117.59 $\pm$ 0.2465	0.5876 $\pm$ 0.0103	80750 $\pm$ 2485.6
MIMLBOOST	0.0141 $\pm$ 0.0007	0.5140 $\pm$ 0.0116	131.64 $\pm$ 1.2632	0.4526 $\pm$ 0.0336	86276 $\pm$ 2491.5
MIMLNN	0.0107 $\pm$ 0.0010	0.2809 $\pm$ 0.0043	92.050 $\pm$ 1.5203	0.5833 $\pm$ 0.0354	0.6923 $\pm$ 0.0003
MIMLRBF	0.0165 $\pm$ 0.0023	0.2142 $\pm$ 0.0141	86.621 $\pm$ 2.9140	0.6226 $\pm$ 0.0160	0.8316 $\pm$ 0.0180
MIMLSVM	0.0163 $\pm$ 0.0008	0.2370 $\pm$ 0.0170	93.152 $\pm$ 4.4183	0.6076 $\pm$ 0.0418	1.8844 $\pm$ 0.0103
MIMLFAST	0.9828 $\pm$ 0.0006	0.8042 $\pm$ 0.0235	150.87 $\pm$ 3.3234	0.6167 $\pm$ 0.0354	3.8150 $\pm$ 0.0247
MIMLWEL	0.0167 $\pm$ 0.0003	0.6148 $\pm$ 0.0232	127.88 $\pm$ 3.2409	0.2667 $\pm$ 0.0000	65.757 $\pm$ 2.3561
OURS	0.0092 $\pm$ 0.0005	0.1974 $\pm$ 0.0259	71.919 $\pm$ 6.3978	0.5242 $\pm$ 0.0171	356.81 $\pm$ 11.466

4.4 Parameter selection of the proposed method

In this part, the proposed method is run with different numbers of iterations. As introduced in Section 4.2, the maximum number of iterations in the improved MOPSO is 40. The Inverse Multi-Quadric basis function, which gave the best results, is selected to test the influence of the number of iterations, which is set to 10, 20, 30, 40 and 50. Hypervolume (HV) [25] is an important index for evaluating the performance of multi-objective optimization algorithms and is calculated from the Pareto Frontier only. It can be calculated as:

$\displaystyle\textit{HV}\left(\textit{PF}\right)=\textit{Leb}\left(\bigcup_{x\in\textit{PF}}\left[{f_{1}\left(x\right),z_{1}^{r}}\right]\times\ldots\times\left[{f_{m}\left(x\right),z_{m}^{r}}\right]\right)$ (23)

where $\textit{Leb}\left(\cdot\right)$ denotes the Lebesgue measure, $m$ here denotes the number of objectives, and $z^{r}=(z_{1}^{r},\ldots,z_{m}^{r})^{T}$ is a reference point dominated by all the solutions on the Pareto Front. HV measures the volume of the region of the objective space enclosed between the solution set PF of the multi-objective optimization algorithm being evaluated and the reference point $z^{r}$ . The larger the value of HV, the better the convergence and diversity of the algorithm. The trend of the HV index with different numbers of iterations is illustrated in Fig. 3.
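For two objectives, HV reduces to the area of a union of rectangles and can be computed with a simple sweep. The sketch below is illustrative only: the function name is hypothetical, and it assumes both objectives are minimized and that the reference point is no better than any solution in either objective, as in the definition above.

```python
def hypervolume_2d(front, ref):
    """Area dominated by a set of 2-objective points, bounded by the reference point.

    front: iterable of (f1, f2) pairs, both objectives minimized (assumption).
    ref:   (z1, z2) reference point, dominated by every point on the front.
    """
    # Keep only points inside the reference box, sorted by ascending f1.
    points = sorted((f1, f2) for f1, f2 in front if f1 <= ref[0] and f2 <= ref[1])
    area = 0.0
    best_f2 = ref[1]  # lowest f2 seen so far during the sweep
    for f1, f2 in points:
        if f2 < best_f2:
            # New strip that is not yet covered by points with smaller f1.
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area
```

Dominated points contribute nothing to the sum, so the function can be applied to an archive without filtering it first.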

Figure 3.

The trend of HV index with different generations.

Figure 4.

The PFs with different generations on three datasets.

It can be seen from the HV results on the three datasets that the convergence and diversity of the solutions of the improved MOPSO algorithm improve as the number of generations increases. In particular, the best HV values are obtained on all three datasets when the number of generations is 40, and HV tends to be stable as the number of generations increases further.

Figure 4 shows the Pareto Frontiers (PFs) obtained by the improved MOPSO at different generations on the three datasets. The PF with 40 generations is closer to the origin of the coordinate axes and covers a wider range of Pareto optimal solutions than the PF with 10 generations. The trend of the PFs is consistent with the results in Fig. 3.

4.5 Comparison with standard PSO and MOPSO

To demonstrate the advantage of our method, the proposed method is compared with standard PSO [34] and standard MOPSO. With standard PSO, the parameters of the Inverse Multi-Quadric function are obtained by single-objective optimization that maximizes the F-score. The settings are as follows: the fitness function is the F-score, the number of particles is 50, the individual learning factor $c_{1}$ is 2, the group learning factor $c_{2}$ is 2, the maximum number of iterations is 40, and the inertia weight $\omega$ is 0.5. Standard MOPSO is also tested; compared with the proposed method, the share-learning strategy is not applied in standard MOPSO. The comparison of the three optimization algorithms is shown in Fig. 5.
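The single-objective baseline can be sketched as the classic PSO update with the F-score as fitness. This is a minimal sketch under stated assumptions, not the code used in the experiments: the F-score is taken to be the harmonic mean of precision and recall (F1), the function names are hypothetical, and the share-learning factor of the proposed method is not included.

```python
import numpy as np


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1); 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def pso_step(x, v, pbest, gbest, rng, w=0.5, c1=2.0, c2=2.0):
    """One standard PSO update of positions x and velocities v.

    x, v, pbest: arrays of shape (particles, dims); gbest: shape (dims,).
    w, c1, c2 default to the settings reported for the standard PSO baseline.
    """
    r1 = rng.random(x.shape)  # random weights for the individual term
    r2 = rng.random(x.shape)  # random weights for the group term
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new
```

A full run would evaluate `f_score` for each particle's parameters after every step and update `pbest` and `gbest` accordingly; that loop depends on the MIMLRBF model and is omitted here.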

Figure 5.

The comparisons with standard PSO and MOPSO on three datasets.

It can be seen from Fig. 5 that the classification results of the proposed method on the three datasets are generally better than those of standard PSO and MOPSO. The only exception is the One error on the Haloarcula marismortui dataset, where the proposed method is slightly inferior to standard MOPSO. Therefore, compared with standard PSO and MOPSO, parameter optimization by the proposed method obtains better classification results in the MIML model.

5. Conclusions

This paper combines a share-learning based MOPSO model with MIMLRBF and uses the improved MOPSO model to estimate the parameters of different RBFs, so as to make the classification results of the algorithm more accurate. Different kinds of RBFs are embedded into the MIML model, and parameter estimation of the RBFs by MOPSO is proposed, where Recall rate and Precision rate are chosen as objectives to obtain the most desirable Pareto optimal solution set. In addition, a share-learning factor is proposed to modify the particle velocity in MOPSO to improve the global search ability and group cooperative ability. In the future, multi-objective methods will be integrated into MIML models to obtain more important information beyond hyper-parameters. How to reduce the time complexity of multi-objective methods in MIML will also be studied.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant nos. 61976108 and 61572241, the Natural Science Research Project of Colleges and Universities in Jiangsu Province under Grant no. 19KJB520005, and the Humanities and Social Sciences Project of the Ministry of Education of China under Grant no. 22YJC870001.

References

1. Nguyen and Raich, Incomplete Label Multiple Instance Multiple Label Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (2022), 1320–1337.

2. Zhou Z.-H. and Zhang M.-L., Multi-instance multi-label learning with application to scene classification, in: 20th Annual Conference on Neural Information Processing Systems, NIPS 2006, Neural information processing systems foundation, Vancouver, BC, Canada, 2007, pp. 1609–1616.

3. Zhou Z.-H., Zhang M.-L., Huang S.-J. and Li Y.-F., Multi-Instance Multi-Label Learning, Artificial Intelligence 176 (2012), 2291–2320.

4. Yang S.-H., Zha and Hu B.-G., Dirichlet-bernoulli alignment: A generative model for multi-class multi-label multi-instance corpora, in: 23rd Annual Conference on Neural Information Processing Systems, NIPS 2009, Curran Associates Inc., Vancouver, BC, Canada, 2009, pp. 2143–2150.

5. Zha Z.-J., Hua X.-S., Mei, Wang, G.-J. and Wang, Joint multi-label multi-instance learning for image classification, in: 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, USA, 2008, pp. 1–8.

6. Zhang M.-L. and Wang Z.-J., MIMLRBF: RBF neural networks for multi-instance multi-label learning, Neurocomputing 72 (2009), 3951–3956.

7. Chen, Chi and Feng, Multi-instance multi-label image classification: A neural approach, Neurocomputing 99 (2013), 298–306.

8. Feng and Zhou Z.-H., Deep MIML Network, in: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI’17), AAAI Press, San Francisco, USA, 2017, pp. 1884–1890.

9. Yang, Zhou J.T., Cai and Ong Y.S., MIML-FCN+: Multi-Instance Multi-Label Learning via Fully Convolutional Networks with Privileged Information, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017, pp. 5996–6004.

10. Huang S.-J., Gao and Zhou Z.-H., Fast Multi-Instance Multi-Label Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (2019), 2614–2627.

11. Yang S.-J., Jiang and Zhou Z.-H., Multi-instance multi-label learning with weak label, in: Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI’12), AAAI Press, Beijing, China, 2013, pp. 1862–1868.

12. Y.F., J.A., Jiang and Zhou Z.H., Towards discovering what patterns trigger what labels, in: Proceedings of the 26th AAAI Conference on Artificial Intelligence (AAAI’12), AAAI Press, Toronto, Canada, 2012, pp. 1012–1018.

13. Zhang M.-L. and Zhou Z.-H., M3MIML: A Maximum Margin Method for Multi-instance Multi-label Learning, in: 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 2008, pp. 688–697.

14. Pham A.T., Raich, Fern X.Z. and Arriaga J.P., Multi-instance multi-label learning in the presence of novel class instances, in: 32nd International Conference on Machine Learning, ICML 2015, International Machine Learning Society (IMLS), Lille, France, 2015, pp. 2417–2425.

15. Pham A.T., Raich and Fern X.Z., Dynamic Programming for Instance Annotation in Multi-Instance Multi-Label Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (2017), 2381–2394.

16. Coello C.A.C., Pulido G.T. and Lechuga M.S., Handling multiple objectives with particle swarm optimization, IEEE Transactions on Evolutionary Computation 8 (2004), 256–279.

17. Hurtado, García-Nieto, Navas-Delgado, Nebro A.J. and Aldana-Montes J.F., Reconstruction of gene regulatory networks with multi-objective particle swarm optimisers, Appl Intell 51 (2021), 1972–1991.

18. Tan, Yuan, Huang and Liu, Low-carbon joint scheduling in flexible open-shop environment with constrained automatic guided vehicle by multi-objective particle swarm optimization, Applied Soft Computing 111 (2021), 107695.

19. Wang and Xiao, Multi-objective particle swarm optimization based on cooperative hybrid strategy, Appl Intell 50 (2020), 256–269.

20. Wang, Zhang, Wang and You, Grid search based multi-population particle swarm optimization algorithm for multimodal multi-objective optimization, Swarm and Evolutionary Computation 62 (2021), 100843.

21. Xiong, Yang, Zhao and Wang, Evolutionary many-objective optimization algorithm based on angle and clustering, Appl Intell 51 (2021), 2045–2062.

22. Wang, Zhang, You and Tu, Handling multimodal multi-objective problems through self-organizing quantum-inspired particle swarm optimization, Information Sciences 577 (2021), 510–540.

23. Zou, Yang and Zheng, A two-archive algorithm with decomposition and fitness allocation for multi-modal multi-objective optimization, Information Sciences 574 (2021), 413–430.

24. Luo, Huang, Yang, Wang and Feng, A many-objective particle swarm optimizer based on indicator and direction vectors for many-objective optimization, Information Sciences 514 (2020), 166–202.

25. Zitzler and Thiele, Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach, IEEE Transactions on Evolutionary Computation 3 (1999), 257–271.

26. and Shi, Improvement of learning algorithm for the multi-instance multi-label RBF neural networks trained with imbalanced samples, Journal of Information Science and Engineering 29 (2013), 765–776.

27. and Shi, Weights optimization for multi-instance multi-label RBF neural networks using steepest descent method, Neural Comput & Applic 22 (2013), 1563–1569.

28. Bao, Liu and Wang, Text Categorization by Multi-instance Multi-label and Momentum Stochastic Gradient Descent Strategy, in: 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, ACM, Sanya China, 2020, pp. 1–4.

29. Bao, Liu, Yang and Wang, Multi-instance Multi-label Text Categorization Algorithm Based on Multi-quadric Function Radial Basis Network Model, in: 2020 3rd International Conference on Artificial Intelligence and Big Data (ICAIBD), Chengdu, China, 2020, pp. 133–136.

30. Dong and Wang, Fast Multi-Objective Antenna Optimization Based on RBF Neural Network Surrogate Model Optimized by Improved PSO Algorithm, Applied Sciences 9 (2019), 2589.

31. Qasem S.N. and Shamsuddin S.M., Radial basis function network based on time variant multi-objective particle swarm optimization for medical diseases diagnosis, Applied Soft Computing 11 (2011), 1427–1438.

32. Aggarwal, Ghoshal, Ankith M.S., Sinha, Ramakrishnan, Kar and Jain, Scalable optimization of multivariate performance measures in multi-instance multi-label learning, in: 31st AAAI Conference on Artificial Intelligence, (AAAI’17), AAAI Press, San Francisco, USA, 2017, pp. 1698–1704.

33. J.-S., Huang S.-J. and Zhou Z.-H., Genome-Wide Protein Function Prediction through Multi-Instance Multi-Label Learning, IEEE/ACM Transactions on Computational Biology and Bioinformatics 11 (2014), 891–902.

34. Kennedy and Eberhart, Particle swarm optimization, in: Proceedings of ICNN’95 – International Conference on Neural Networks, Perth, Australia, 1995, pp. 1942–1948.
