Robust fused hypergraph neural networks for multi-label classification

Abstract

Deep neural networks have been adopted in multi-label classification for their strong performance. However, existing methods fail to comprehensively utilize the high-order correlations between instances and the high-order correlations between labels, and they have difficulty dealing with label noise effectively. We propose a novel end-to-end deep framework named Robust Fused Hypergraph Neural Networks for Multi-Label Classification (RFHNN), which utilizes both kinds of high-order correlations and uses them to mitigate the impact of label noise. In RFHNN, Hypergraph Neural Networks (HNNs) are adopted to mine and utilize the high-order correlations of the instances in the feature space and the label space respectively. These high-order correlations of the instances not only improve the classification accuracy and the discriminative ability of the proposed model, but also lay the foundation for the subsequent noise correction module. Meanwhile, a hypergraph construction method based on the Apriori algorithm is proposed for the label hypergraph, which can mine robust second-order and high-order label correlations effectively. Classifiers are learned based on the correlations between the labels, which not only improves the accuracy of the model but also supports the subsequent noise correction module. In addition, we design a noise correction module in the networks. With the help of the high-order correlations among the instances and the learned classifier, the framework can correct label noise and improve the robustness of the model. Extensive experimental results on datasets demonstrate that our proposed approach outperforms state-of-the-art multi-label classification algorithms. Our method also performs well on multi-label training datasets with noise in the label space.

Keywords

Multi-label classification; Fused hypergraph neural network; High-order label correlations; Noise correction; Robust classification framework

1 Introduction

Multi-label learning is one of the active topics in the field of machine learning and pattern recognition. In the multi-label learning framework, each instance is represented by a feature vector and may be associated with multiple category labels. The goal is to induce a function that is able to assign multiple proper labels (from a given label set) to unseen instances [36, 38]. Since the concept of multi-label learning was introduced, many researchers have built on it and proposed a range of effective algorithms.

Fig. 1

Illustration of the proposed RFHNN. The labels with wavy underlines are present in the image but missing from the ground-truth annotation, and the labels with straight underlines are mis-indexed labels in the ground-truth annotation. Hypergraph construction captures the high-order correlations of instances in the feature space and the label space, and these high-order correlations are then incorporated into the model by hypergraph neural networks (refer to the module in the blue dashed box in Fig. 1). For label correlations, hypergraphs are built with the Apriori algorithm, where each node denotes a label. The labels are represented by word embeddings, and stacked HNNs are learned over the label hypergraph to map these label representations into a set of inter-dependent object classifiers, which are applied to the image representation extracted from the input image for multi-label image recognition (refer to the module in the green dashed box in Fig. 1). The noise correction module is based on the label noise transition matrix. With the help of the high-order correlations among the instances and the classifiers obtained above, it can correct the noise and improve the robustness of the model (refer to the module in the red dashed box in Fig. 1).

Motivated by recent developments in deep learning, more and more researchers have begun to apply deep learning to multi-label classification problems. Researchers have applied various deep networks, such as deep neural networks [9, 35], deep convolutional neural networks, recurrent neural networks [2, 23] and graph convolutional neural networks [3], to multi-label classification. However, a certain amount of label noise often exists in multi-label datasets, and it may degrade the performance of multi-label classification. The noise makes the labels of instances incorrect, and it can be divided into two cases: (1) several essential labels are missing, and (2) several labels are mislabeled. As shown in Fig. 1, there is no ’road’ in the picture located in row 1 column 1 and ’beach’ should exist instead, so ’road’ is a mis-indexed label while ’beach’ is a missing label. These kinds of label noise increase the training errors of the classifier and decrease the accuracy of the multi-label classification. Meanwhile, the mining of label correlations is very important for multi-label classification. Though almost all the recent works on multi-label learning have taken label correlations into consideration, most of them consider only the second-order correlations (i.e., pairwise correlations) between labels. In fact, the correlations between labels are often of higher than second order in multi-label classification problems. Paying attention only to the second-order correlations may lead to information loss and affect the accuracy of multi-label classification. How to effectively combine the high-order label correlations with the high-order instance correlations to alleviate label noise is an important issue. To deal with this issue, we design a framework named Robust Fused Hypergraph Neural Networks for Multi-Label Classification (RFHNN) based on Hypergraph Neural Networks (HNNs). RFHNN improves the accuracy of multi-label classification and takes the mining of label correlations and noise correction into consideration at the same time. In order to effectively mine the high-order label correlations and realize the anti-noise function, we need to effectively characterize the high-order correlations among labels and the high-order correlations among instances. We choose the hypergraph [18] and the hypergraph neural network [7] to characterize these high-order correlations. RFHNN can be roughly divided into three modules, which correspond to the three dashed boxes in different colours (blue, green, red) in Fig. 1.

Hypergraph Neural Networks (HNNs) (module in the blue dashed box in Fig. 1) are adopted to mine the high-order correlations of the instances in the feature space and the label space respectively. The high-order correlations of the instances in the feature space enable the proposed RFHNN to obtain higher-quality image features, while the high-order correlations of instances in the label space exploit the high-order structure to improve the discriminative ability. These two kinds of high-order correlations play an important role in the noise correction part. Before introducing the noise correction module, we first describe the label correlations mining part (module in the green dashed box in Fig. 1), as the label correlations mining provides an effective classifier, which also contributes to the noise correction module. Here we propose to utilize the association rule mining algorithm in data mining to mine the second-order and high-order label correlations. The strong association rules generated by the well-established Apriori algorithm identify the most reliable label correlations. Then we combine the label correlations with the word-to-vector representation of the label text to learn an effective classifier, which can improve the accuracy of classification while also providing assistance for the following noise correction. We design a noise correction module to correct the noise in the label space (module in the red dashed box in Fig. 1). This noise correction module is based on the high-order correlations among instances and the classifier mentioned above. It corrects the noise by learning a label noise transition matrix, which represents the probability of one label being mislabeled as another. With the help of the effective classifier and the high-order structure of instances in the feature and label spaces, we can learn an accurate label noise transition matrix. As a result, our model can effectively deal with the noise problem in the label space, which greatly improves the overall robustness of the model.

Therefore, the main contributions can be highlighted as follows:

(1) We propose a novel end-to-end deep framework for multi-label classification. This deep framework is based on Hypergraph Neural Networks (HNNs), which can effectively integrate the high-order correlations of the instances in the feature space and the label space. The high-order correlations among instances play an important role in improving the classification accuracy, discriminative ability and robustness of the model.

(2) We propose a hypergraph construction method based on Apriori for multi-label datasets, which can mine label correlations (including high-order correlations) effectively. We construct the corresponding hypergraph and then design the corresponding HNN based on it. The label correlations help us learn an effective classifier and work well in noise correction.

(3) We propose a noise correction module based on the label noise transition matrix, which can effectively reduce the influence of noise in label space. The high-order correlations among instances and the label correlations ensure the quality of the noise transition matrix. This module reduces the impact of noise and thus effectively improves the overall robustness of our model.

The rest of this paper is organized as follows. Section 2 gives a brief review of related work and background. Then we formulate the problem and present the proposed approach in Section 3. We discuss experimental results in Section 4 and conclude in Section 5.

2 Related work and background

Multi-label learning algorithms [36, 38] can be divided into two broad categories: problem transformation methods (PTMs) and algorithm adaptation methods (AAMs). Due to the establishment of large-scale manual datasets and the rapid development of deep convolutional networks, the performance of image classification has recently been significantly improved. Many efforts have been devoted to extending deep convolutional networks for multi-label image recognition. Our method is mainly based on hypergraph neural network to deal with the problem of noise and mining label correlations in multi-label classification.

In this section, we introduce the related work on multi-label classification with deep learning and multi-label classification with noise in the label space. The basic concepts of hypergraphs and hypergraph neural networks are also included here.

2.1 Multi-label classification with deep learning

Zhang and Zhou [35] proposed the BP-MLL method, which is the first multi-label classification method based on DNN. BP-MLL transforms the multi-label classification problem into several binary classification problems and utilizes the loss function to mine the label correlations. Gong et al. [9] trained deep convolutional neural networks for multi-label classification with a ranking-based strategy. The weighted approximated-ranking loss is shown to work particularly well for multi-label annotation problems. Wang et al. [22] proposed the CNN-RNN method for multi-label classification. CNN-RNN uses networks to convert both images and labels into a common latent space, while the label correlations are mined by an LSTM. Yeh et al. [31] proposed a multi-label embedding method based on deep learning and canonical correlation analysis. It handles multi-label classification problems with large-scale data well by using the deep neural network for spatial transformation. Chen et al. [2] proposed a deep architecture that consists of two parts, the spatial transformer and the LSTM sub-network. The two components perform alternately to locate the attentional regions and capture the dependencies among them. Wang et al. [24] proposed a model based on recurrent neural networks which consists of a visual attention model and a confidence-ranked LSTM. This method obtains a proper label sequence automatically and is robust in label prediction. Chen et al. [3] proposed ML-GCN based on the graph convolutional network. The method utilizes the word-embedding representations to learn a classifier, and second-order label correlations are mined through the correlation matrix, which is built in a data-driven way. Wu et al. [25] proposed a high-order semantic learning model based on adaptive hypergraph neural networks (AdaHGNN). 
The adaptive hypergraph is constructed based on the label embedding, and the hypergraph neural network can improve the ability of the multi-label classification model. Tan et al. [20] proposed a hypergraph-induced convolutional network (HI-GCN) which is inspired by the nature of hypergraph coding high-order connections. The model can effectively mine higher-order label correlations with high adaptivity and scalability. Xu et al. [29] proposed the TSGCN, which learns from both input and output spaces for multi-label image classification. TSGCN is the multi-label classification model conducted by mapping spatial object correlations and semantic label correlations.

2.2 Multi-label classification with noise in label space

The noise that exists in the label space includes false-positive noise and false-negative noise. Multi-label classification with false-positive noise alone is equivalent to partial multi-label learning, and multi-label classification with false-negative noise alone is equivalent to weak-label learning. We briefly review these two settings here.

Partial multi-label learning is extended from partial label learning [26, 34], but it is far more difficult than partial label learning, because there are multiple correct labels in the candidate label set for each instance in the partial multi-label learning framework. Mining the information of the complex label space effectively is therefore the key to improving the performance of partial multi-label learning. In partial multi-label learning, not all the labels marked as relevant in the given training label set are truly relevant. Xie and Huang [27] first introduced the concept of partial multi-label learning. They gave the new learning framework and its solutions. The method calculates the confidence for all the labels in the candidate set for each instance when performing the multi-label classification operation. Wang et al. [21] proposed the DRAMA method, which solves the problem of partial multi-label learning in two steps. First, the confidence of the labels corresponding to each instance is calculated, and then the confidence matrix and a gradient superposition algorithm are used to build and optimize the model. Fang and Zhang [6] first used the propagation of label information to estimate the confidence of the label matrix, and then used the pairwise loss between the labels to build the model. Two methods named P-VLS and P-MAP are then proposed based on different predictors.

For weak-label learning, the labels corresponding to some instances are incomplete, as not all irrelevant labels in the given training label set are the really irrelevant labels in absolute truth. Sun et al. [19] proposed a weak-label learning method named WELL which adopts the hypothesis of low-rank similarity matrix to mine the correlations among instances, and then completes the dissemination of the label information. Lin et al. [16] proposed a sparse reconstruction-based weak label learning method named LSR. This method gives an incomplete correlations matrix between instances and labels, then it uses instance correlations, label correlations, and correlations between instances and labels simultaneously with sparse constraints to complete the relation matrix between instances and labels. Xu et al. [30] proposed a weak-label learning algorithm named Maxide based on the low-rank matrix completion algorithm. This algorithm uses side information to reduce the observation information required for matrix completion, and can effectively complete missing label matrices in the state of weak-label. Dong et al. [4] proposed a weak-label learning method named SSWL based on ensemble learning, which considers feature space similarity and label space similarity to make up missing labels. This method integrates multiple different models through the collaborative regularization framework, which can improve the robustness of the model when the label information is insufficient.

Based on the above analysis, it can be found that the existing multi-label classification algorithms have paid attention to the label correlations from different aspects. However, most of the existing methods fail to effectively integrate and utilize the high-order label correlations and high-order instance correlations. Besides, how to flexibly improve the robustness of the model with the help of the above-mentioned higher-order correlations is also a key issue. Our proposed method can effectively utilize the high-order correlations of instances and high-order correlations of labels from the perspective of alleviating label noise to improve the accuracy and robustness of multi-label classification models, which is detailed in Section 3.

2.3 Hypergraph preliminaries

Let V denote a finite set of instances, and let E be a family of subsets e of V such that $\bigcup_{e \in E} e = V$. G = (V, E) is then called a hypergraph with the vertex set V and the hyperedge set E. A hyperedge that contains just two vertices is a simple graph edge. A weighted hypergraph is a hypergraph in which each hyperedge e is associated with a positive number ω (e), called the weight of hyperedge e. We denote a weighted hypergraph by G = (V, E, ω). A hyperedge e is defined to be incident with a vertex v when v ∈ e. The degree of each vertex v ∈ V is defined as: $\begin{matrix} d (v) = \sum_{\{e \in E \mid v \in e\}} ω (e) \end{matrix}$ (1)

Let |S| denote the cardinality of a given arbitrary set S. The degree of a hyperedge e ∈ E is defined as: δ (e) = |e|. A hypergraph G can be represented by a |V| × |E| matrix H with entries h (v, e) = 1 if v ∈ e and 0 otherwise, called the incidence matrix of G. Based on matrix H, the degree of each vertex and each hyperedge can be calculated as: $\begin{matrix} d (v) = \sum_{e \in E} ω (e) h (v, e), \quad δ (e) = \sum_{v \in V} h (v, e) \end{matrix}$ (2)

Let D_v and D_e denote the diagonal matrices containing the vertex and hyperedge degrees respectively, and let W denote the diagonal matrix containing the weights of hyperedges [37].
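As a small worked example of Eq. (2), the NumPy sketch below computes the vertex and hyperedge degrees of a toy hypergraph and assembles the diagonal matrices D_v, D_e and W. The incidence matrix and the weights are made-up values chosen only for illustration.

```python
import numpy as np

# Toy hypergraph with 4 vertices and 3 hyperedges (illustrative values only).
# H[v, e] = 1 when vertex v belongs to hyperedge e.
H = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
], dtype=float)
w = np.array([1.0, 0.5, 2.0])  # hyperedge weights omega(e)

d_v = H @ w                # vertex degrees: sum_e omega(e) * h(v, e)
delta_e = H.sum(axis=0)    # hyperedge degrees: sum_v h(v, e)

D_v = np.diag(d_v)         # diagonal vertex-degree matrix
D_e = np.diag(delta_e)     # diagonal hyperedge-degree matrix
W = np.diag(w)             # diagonal hyperedge-weight matrix
```

For instance, the first vertex lies in the first and third hyperedges, so its degree is 1.0 + 2.0 = 3.0, while the second hyperedge contains three vertices and therefore has degree 3.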

2.4 Hypergraph neural network

The hypergraph structure is more complicated than the ordinary graph structure, and it is more difficult to use properly. At present, most work on hypergraph structures is based on the hypergraph Laplacian matrix. The derived hypergraph Laplacian matrix is positive semi-definite, integrates the information of the hypergraph well and is widely used. Following the random walk model, Zhou et al. [37] proposed the following normalized hypergraph Laplacian L_h: $L_{h} = I - D_{v}^{- 1 / 2} H W D_{e}^{- 1} H^{⊤} D_{v}^{- 1 / 2}$ (3)
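The normalized Laplacian of Eq. (3) can be checked numerically on a small example. The sketch below is a minimal NumPy version that assumes every vertex and every hyperedge has positive degree; the function name `hypergraph_laplacian` and the toy inputs are illustrative and not part of the original method description.

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Normalized hypergraph Laplacian of Eq. (3).

    H: (|V|, |E|) incidence matrix, w: (|E|,) positive hyperedge weights.
    Assumes no vertex or hyperedge has zero degree.
    """
    d_v = H @ w                         # vertex degrees
    delta_e = H.sum(axis=0)             # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    De_inv = np.diag(1.0 / delta_e)
    theta = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
    return np.eye(H.shape[0]) - theta

H = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 1, 0]], dtype=float)
w = np.array([1.0, 0.5, 2.0])
L_h = hypergraph_laplacian(H, w)
eigvals = np.linalg.eigvalsh(L_h)       # L_h is symmetric, so eigvalsh applies
```

On this toy input the eigenvalues are non-negative up to rounding, which is consistent with the positive semi-definiteness stated above.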

A hypergraph can describe the high-order correlations among instances. Feng et al. [7] proposed a hypergraph neural network framework which uses the hypergraph to handle high-order correlations for representation learning. Specifically, given a hypergraph signal X ∈ R^n×C₁ with n nodes and C₁-dimensional features, Y ∈ R^n×C₂ can be obtained by the following hyperedge convolution: $Y = D_{v}^{- 1 / 2} H W D_{e}^{- 1} H^{⊤} D_{v}^{- 1 / 2} X Θ$ (4) where W = diag (w₁, w₂, …, w_n) and Θ ∈ R^C₁×C₂ is the parameter to be learned during the training process. The filter Θ is applied over the nodes in the hypergraph to extract features.
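The hyperedge convolution of Eq. (4) amounts to a handful of matrix products. The following is a minimal NumPy sketch rather than the implementation of [7]: the function name `hyperedge_conv`, the random inputs and the fixed seed are assumptions made for illustration, and Θ is a random matrix here instead of a trained parameter.

```python
import numpy as np

def hyperedge_conv(X, H, w, Theta):
    """Hyperedge convolution of Eq. (4), without a non-linearity.

    X: (n, C1) node features, H: (n, |E|) incidence matrix,
    w: (|E|,) hyperedge weights, Theta: (C1, C2) filter.
    Assumes every vertex and hyperedge has positive degree.
    """
    d_v = H @ w                            # vertex degrees
    delta_e = H.sum(axis=0)                # hyperedge degrees
    A = (H * (w / delta_e)) @ H.T          # H W D_e^{-1} H^T
    A = A / np.sqrt(np.outer(d_v, d_v))    # left and right D_v^{-1/2}
    return A @ X @ Theta

rng = np.random.default_rng(0)
H = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 1, 0]], dtype=float)
w = np.array([1.0, 0.5, 2.0])
X = rng.normal(size=(4, 5))        # n = 4 nodes, C1 = 5 features
Theta = rng.normal(size=(5, 2))    # C2 = 2 output features
Y = hyperedge_conv(X, H, w, Theta)
```

Because W and D_e are diagonal, their product is applied as a per-column scaling of H, which avoids building the diagonal matrices explicitly.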

3 Proposed approach

For multi-label classification, let $D = \{(x_{i}, y_{i})\}_{i = 1}^{m} = \{X, Y\}$ denote a set of d-dimensional training instances X ∈ R^m×d and the associated labels Y ∈ {0, 1}^m×n, where m and n are the number of instances and label attributes respectively. The goal of multi-label classification algorithms is to train a predictor f : X → Y from D in the training stage, so that the label $\hat{y}$ of a test instance $\hat{x}$ can be predicted accordingly.

Based on the effective Hypergraph Neural Networks (HNNs) [7, 25], we propose the Robust Fused Hypergraph Neural Networks for Multi-Label Classification (RFHNN). Our method mainly revolves around noise resistance and the mining of high-order label correlations and instance correlations. As mentioned in the Introduction, the noise in the label space reduces the accuracy of the classifier and interferes with the mining of label correlations. We develop a novel hypergraph fusion method to deal with this challenge. We construct corresponding hypergraphs in both feature space and label space and then fuse the two hypergraph structures with networks. The high-order correlations of the instances in the feature space can effectively correct the inaccurate correlations of instances in label space that are caused by noisy data (refer to the modules in the blue and red dashed boxes in Fig. 1). At the same time, we also adopt the HNNs and the Apriori algorithm to effectively mine the high-order label correlations to generate the classifier. This classifier utilizes the information of the label correlations and provides support for the anti-noise module (refer to the module in the green dashed box in Fig. 1).

We will describe the proposed model in the following three parts in detail: Fused Hypergraph Neural Networks (module in the blue dashed box), Multi-label Recognition with High-order correlations in label space (module in the green dashed box) and Hypergraph construction noise correction (module in the red dashed box). The specific practices are detailed as follows.

3.1 Fused hypergraph neural networks (Module in the blue dashed box)

Here we introduce the fused hypergraph neural networks module in our model. It is used to merge the hypergraph networks which are based on the feature space information and the label space information. Mining and using the structure information in the feature space and the label space can make the constructed model more accurate and discriminative. The construction of the fused hypergraph neural networks module can be divided into two steps as described below.

Firstly, we construct hypergraphs in both feature space and label space. In feature space, different from the simple graph where each edge represents the vertex-to-vertex correlation, the incidence matrix H^f of a hypergraph describes the vertex-to-hyperedge correlation. To achieve this, we regard each instance as one vertex and generate a hyperedge for each vertex by following the method in [37]. More specifically, we generate the hyperedge e_i by the following formulation [40]: $e_{i} = \{v_{j} \mid θ (x_{i}, x_{j}) \geq 0.1 σ_{i}\}, \quad i, j = 1, \dots, m$ (5) where θ (x_i, x_j) indicates a similarity measurement between x_i and x_j, while σ_i is the average similarity between x_i and each of the other instances. (For example, in Fig. 1, the images with black, red, yellow, and blue grids have similar features, so they share the same hyperedge.)
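The construction in Eq. (5) can be sketched as follows. The text does not fix the similarity measurement θ at this point, so the code assumes cosine similarity purely for illustration; the function name `build_feature_hypergraph`, the `scale` argument (set to the 0.1 factor of Eq. (5)) and the toy features are likewise hypothetical.

```python
import numpy as np

def build_feature_hypergraph(X, scale=0.1):
    """Incidence matrix H^f with one hyperedge per instance, following Eq. (5).

    X: (m, d) instance features with non-zero rows.
    theta is taken to be cosine similarity (an assumption).
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                                  # S[i, j] = theta(x_i, x_j)
    m = X.shape[0]
    off_diag = S - np.diag(np.diag(S))
    sigma = off_diag.sum(axis=1) / (m - 1)         # mean similarity to the other instances
    # Column i is hyperedge e_i; vertex j joins it when theta(x_i, x_j) >= scale * sigma_i.
    return (S.T >= scale * sigma[None, :]).astype(float)

X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])
H_f = build_feature_hypergraph(X)
```

In this toy example the first hyperedge contains the first, second and fourth instances but not the third, whose similarity to the first instance is zero.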

For the instance structure in the feature space, we use a hypergraph matrix H^f to represent the high-order correlations among instances in the feature space. Based on the hypergraph matrix, we propose the corresponding hypergraph neural network. The main function of this hypergraph network is to use the high-order structure of the instances to learn better image features. Here we adopt X ∈ R^m×d with m nodes and d-dimensional features to represent the initial deep feature extracted by CNN, then the hyperedge convolution can be formulated by: $\begin{matrix} X^{k + 1} = σ (D_{v}^{- 1 / 2} H^{f} W D_{e}^{- 1} (H^{f})^{⊤} D_{v}^{- 1 / 2} X^{k} Θ_{f}^{k}) \end{matrix}$ (6) where W = diag (w₁, w₂, …, w_m), D_v, D_e are the corresponding variables according to H^f. $Θ_{f}^{k} \in R^{d \times d}$ is the parameter to be learned during the training process, and σ (·) is a non-linear activation. After convolution, we can obtain the image feature X ∈ R^m×d that incorporates the high-order structural information of the feature space.

In label space, a hypergraph is built where each vertex corresponds to one training instance and the hyperedge for each label includes all the training instances that are relevant to that label. (In Fig. 1, the black and red instances have the same label ’beach’, so they share the same hyperedge ’beach’.) For the instance structure in the label space, we use a hypergraph matrix H^l = Y to represent the high-order correlations among instances in the label space. Based on the hypergraph matrix, we propose the corresponding hypergraph neural network. The main function of the hypergraph network here is to effectively use the high-order correlations of instances in the label space to improve the overall discriminative ability of the model. The hypergraph network that we propose here mainly acts on the results after classification. As a result, the high-order correlations of the instances in the label space carry out effective label information propagation. That is, the instances that are similar in the label space obtain similar label sets, which improves the prediction accuracy of our model. Here we adopt $\hat{Y} \in R^{m \times n}$ with m nodes and n-dimensional features to represent the predicted labels, then the hyperedge convolution can be formulated by: ${\hat{Y}}^{k + 1} = σ (D_{v}^{- 1 / 2} H^{l} W D_{e}^{- 1} (H^{l})^{⊤} D_{v}^{- 1 / 2} {\hat{Y}}^{k} Θ_{l}^{k})$ (7) where W = diag (w₁, w₂, …, w_n), D_v, D_e are the corresponding variables according to H^l. $Θ_{l}^{k} \in R^{n \times n}$ is the parameter to be learned during the training process, and σ (·) is a non-linear activation. After convolution, we can obtain the prediction results $\hat{Y} \in R^{m \times n}$ that incorporate the high-order structural information of the label space.

Secondly, based on these two hypergraphs, we build the corresponding hypergraph neural networks. These two networks make full use of the high-order correlations of the instances in the feature space and the label space. The two kinds of high-order correlations act on the extracted deep features and the predicted labels respectively, and lay the foundation for the following noise correction module (Section 3.3).

3.2 Multi-label recognition with high-order correlations in label space (Module in the green dashed box)

In this module, we first utilize the Apriori algorithm to mine the second-order and high-order label correlations. Then we combine the label correlations with the word-to-vector representation of the label text to learn an effective classifier [3], which can improve the accuracy of classification while also providing assistance for the following noise correction.

How to mine label correlations is very important in multi-label classification, as label correlations mined by the existing methods sometimes do not generalize well. This may decrease the robustness of multi-label classification. We therefore introduce the association rule mining algorithm from data mining to mine the second-order and high-order label correlations, which can alleviate the above problem. Association rule mining is an important branch of data mining. It generally uses ’if-then’ logic rules to describe the rules and patterns of certain attributes in items from a large amount of data.

Our innovation is to use frequent itemset mining from data mining to mine co-occurrence correlations between labels and find the subsets of labels that often appear at the same time. We believe that there are strong correlations among the labels that often appear at the same time. As a result, we create a second-order or high-order correlation for these labels to connect them. The Apriori algorithm is one of the most influential and widely used algorithms for mining frequent itemsets and association rules. This algorithm generates strong association rules from frequent itemsets. When searching for frequent itemsets, the algorithm relies on the property that all non-empty subsets of a frequent itemset must also be frequent. This step can filter out some co-occurrence noise in the multi-label datasets in advance, so it makes the second-order and high-order label correlations that we find more robust and effective.

We take the label set of each instance as a data record, so we can mine the corresponding frequent co-occurrence label subsets from these data, and then find the association rules. We set a confidence threshold and then generate strong association rules. We regard these strong association rules as second-order or higher-order correlations, and then generate corresponding hypergraphs. Specifically, for the label matrix Y, we regard the label sets of the instances y₁, y₂, ⋯ , y_m as itemsets, and mine the frequent co-occurrence label subsets that they contain. For example, if (l₂, l₃, l₅) is a frequent label subset, we can obtain a strong association rule based on it. As we consider that there is a high-order correlation among the three labels, we enclose the three labels with a hyperedge e₁: $(l_{2}, l_{3}, l_{5}) ⟶ e_{1}$ (8) We obtain a hypergraph H^c after the above process. Then we use the hypergraph neural network to incorporate such second-order and high-order correlations into our model, which can greatly improve the performance of our model. Different label correlations have different importance in the model. Here we use the confidence degree of the association rules corresponding to the higher-order correlations to reflect this importance. As a result, we need to add the confidence matrix C ∈ R^n×n to the convolution operation.
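The mining procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the thresholds and the toy label matrix are hypothetical, labels are indexed from 0, and scoring a hyperedge by the confidence of its best association rule is one possible reading of "the confidence degree of the association rules".

```python
from itertools import combinations
import numpy as np

def frequent_itemsets(records, min_support):
    """Level-wise Apriori search. Returns {frozenset(labels): support}."""
    records = [frozenset(r) for r in records]
    n = len(records)
    current = [frozenset([i]) for i in sorted({i for r in records for i in r})]
    frequent, k = {}, 1
    while current:
        level = {}
        for c in current:
            support = sum(c <= r for r in records) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step, then prune candidates that have an infrequent (k-1)-subset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def label_hypergraph(Y, min_support, min_confidence):
    """Build H^c (labels x hyperedges) and one confidence value per hyperedge."""
    records = [{int(j) for j in np.flatnonzero(row)} for row in Y]
    frequent = frequent_itemsets(records, min_support)
    edges, confidences = [], []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        # Confidence of the rule A -> itemset \ A is support(itemset) / support(A).
        # Assumption: a hyperedge is scored by its best rule.
        best = max(support / frequent[frozenset(a)]
                   for r in range(1, len(itemset))
                   for a in combinations(itemset, r))
        if best >= min_confidence:
            edges.append(itemset)
            confidences.append(best)
    H_c = np.zeros((Y.shape[1], len(edges)))
    for e, itemset in enumerate(edges):
        H_c[sorted(itemset), e] = 1.0
    return H_c, edges, np.array(confidences)

# Toy label matrix: 6 instances, 4 labels (illustrative values only).
Y = np.array([[1, 1, 1, 0],
              [1, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 1]])
H_c, edges, conf = label_hypergraph(Y, min_support=0.5, min_confidence=0.8)
```

On this toy matrix the pair of labels 0 and 2 is frequent but is dropped because its best rule only reaches a confidence of 0.75, while the triple of labels 0, 1 and 2 is kept as a high-order hyperedge. Label 3 is infrequent and ends up in no hyperedge, so a practical pipeline would need to decide how to treat such isolated labels.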

The labels are represented by Z ∈ R^n×d_wev (n is the number of label attributes and d_wev is the dimensionality of the word-embedding vector). The convolution operation can be written as follows: $Z^{k + 1} = σ (D_{v}^{- 1 / 2} H^{c} W D_{e}^{- 1} C (H^{c})^{⊤} D_{v}^{- 1 / 2} Z^{k} Θ_{c}^{k})$ (9) where C = diag (c₁, c₂, …, c_n) is the confidence matrix of association rules, which is also regarded as the weight of hyperedges. σ (·) is a non-linear activation. W = diag (w₁, w₂, …, w_n), D_v, D_e are the corresponding variables according to H^c. $Θ_{c}^{k} \in R^{d_{1} \times d_{2}}$ is the parameter to be learned during the training process. d₁ and d₂ are the dimensions of the middle layers of the networks. With the help of a multi-layer hypergraph neural network, we can get Classifier ∈ R^n×d from Z ∈ R^n×d_wev.
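To make the data flow of this module concrete, the sketch below stacks two confidence-weighted layers in the spirit of Eq. (9) and applies the resulting classifier to image features. Several choices are assumptions made for illustration only: the confidence values are stored one per hyperedge so that the matrix product is conformable, W is the identity, LeakyReLU stands in for σ, the classifier is applied as an inner product as in [3], and all arrays are random placeholders rather than trained parameters or real embeddings.

```python
import numpy as np

def label_hnn_layer(Z, H_c, conf, Theta, w=None):
    """One confidence-weighted hypergraph layer in the spirit of Eq. (9).

    Z: (n, d_in) label representations, H_c: (n, |E|) label incidence matrix,
    conf: (|E|,) one confidence per hyperedge (assumed layout),
    Theta: (d_in, d_out). Assumes every label lies in at least one hyperedge.
    """
    w = np.ones(H_c.shape[1]) if w is None else w
    d_v = H_c @ w                                   # vertex (label) degrees
    delta_e = H_c.sum(axis=0)                       # hyperedge degrees
    A = (H_c * (w * conf / delta_e)) @ H_c.T        # H W D_e^{-1} C H^T
    A = A / np.sqrt(np.outer(d_v, d_v))             # left and right D_v^{-1/2}
    out = A @ Z @ Theta
    return np.where(out > 0, out, 0.2 * out)        # LeakyReLU (assumed sigma)

rng = np.random.default_rng(0)
H_c = np.array([[1, 0, 1],
                [1, 1, 1],
                [0, 1, 1]], dtype=float)            # 3 labels, 3 hyperedges
conf = np.array([1.0, 0.8, 0.9])                    # placeholder confidences
Z = rng.normal(size=(3, 5))                         # word embeddings, d_wev = 5
Theta1 = rng.normal(size=(5, 4))
Theta2 = rng.normal(size=(4, 6))

classifier = label_hnn_layer(label_hnn_layer(Z, H_c, conf, Theta1), H_c, conf, Theta2)
X_img = rng.normal(size=(2, 6))                     # 2 images with d = 6 features
scores = X_img @ classifier.T                       # (m, n) label scores
```

Each row of `classifier` acts as the weight vector of one label, so the scores of an image are its inner products with the n label classifiers.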

3.3 Hypergraph construction noise correction (Module in the red dashed box)

In multi-label applications, there is often a certain amount of noise in the label space. It affects the distribution of the label space and thus interferes with the training of the model. Therefore, effectively identifying and correcting the noise will greatly improve the accuracy of the learned model. This section designs a noise correction module (the module in the red dashed box in Fig. 1). The specific ideas are described as follows:

This section designs the noise correction method based on the label transformation matrix T ∈ R^n×n, and combines it with the above two modules (Sections 3.1 and 3.2). It is assumed here that the noise in the label space is class-conditional multi-label noise, that is, there is a certain conversion relation between two labels (the probability of converting from y to $\tilde{y}$ can be expressed as $p (\tilde{y} \mid y)$), and this conversion relation is independent of instance features [17]. This assumption is consistent with the noise that actually exists in the label space of multi-label datasets, because most labels in multi-label training datasets are manually annotated, and similar objects (labels) are prone to be mislabeled in the manual labeling process.

Here we suppose that y^i* = 1 indicates that the instance x truly has the i-th class label, and ${\tilde{y}}^{j} = 1$ indicates that the instance x is labeled with the j-th class label under noise interference. The conversion between the two is described by the label transformation matrix T*, whose entries are $t_{ji}^{*} = p ({\tilde{y}}^{j} = 1 \mid y^{i *} = 1)$. The label transformation matrix T ∈ R^n×n is in general asymmetric. $\begin{aligned} p ({\tilde{y}}^{j} = 1 \mid x) &= \sum_{i} p ({\tilde{y}}^{j} = 1 \mid y^{i *} = 1) p (y^{i *} = 1 \mid x) \\ &= \sum_{i} t_{ji}^{*} p (y^{i *} = 1 \mid x) \end{aligned}$ (10) Suppose that the probability of the true label predicted by the trained classifier is expressed as $\hat{p} (y^{i *} = 1 \mid x, θ)$. The model is then modified with the label transformation matrix T so that its predicted distribution matches the label distribution of the noisy data. $\hat{p} ({\tilde{y}}^{j} = 1 \mid x, θ, T) = \sum_{i} t_{ji} \hat{p} (y^{i *} = 1 \mid x, θ)$ (11) The model with the added label transformation matrix consists of two parts: the basic prediction model (with parameters θ) and the noise transformation matrix (with parameters T), both of which are obtained by training the model. The transformation matrix T is introduced directly into the cross-entropy loss, and the model is trained by minimizing the cross-entropy between the noisy label $\tilde{y}$ and the corrected prediction of the model.

The noise transition matrix T is the core of the noise correction module, so it is important to learn it accurately. T can be approximated in two steps: for each label i, select the instance to which the model assigns that label with the highest probability, and then read off the predicted noisy-label probabilities at that instance. ${\bar{x}}^{i} = \underset{x \in X}{argmax} \hat{p} ({\tilde{y}}^{i} = 1 \mid x)$ (12) $T_{ji} = \hat{p} ({\tilde{y}}^{j} = 1 \mid {\bar{x}}^{i})$ (13)
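The two-step approximation in Eqs. (12) and (13), together with the correction in Eq. (11), can be sketched as follows. This is a minimal illustration, assuming the index convention of Eq. (10), in which entry (j, i) holds the probability that true label i is observed as noisy label j. The function names and the toy probabilities are hypothetical.

```python
def estimate_transition(noisy_probs):
    """Approximate T from predicted noisy-label probabilities.

    noisy_probs[s][j] is the model's estimate of p(noisy label j = 1 | x_s).
    Returns T as a list of rows, with T[j][i] as in Eq. (10).
    """
    n = len(noisy_probs[0])
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Eq. (12): the instance with the highest predicted probability for label i.
        anchor = max(noisy_probs, key=lambda row: row[i])
        # Eq. (13): its predicted probabilities fill the entries associated with label i.
        for j in range(n):
            T[j][i] = anchor[j]
    return T


def forward_correct(T, clean_probs):
    """Eq. (11): map clean-label probabilities to noisy-label probabilities."""
    n = len(T)
    return [sum(T[j][i] * clean_probs[i] for i in range(n)) for j in range(n)]


# Toy predictions for three instances and two labels (made-up numbers).
noisy_probs = [[0.9, 0.3], [0.2, 0.8], [0.5, 0.5]]
T = estimate_transition(noisy_probs)
print(T)
print(forward_correct(T, [1.0, 0.0]))
```

Because several labels can be active at once, the corrected value for a label is not guaranteed to stay below 1; an implementation may need to clip or renormalise, which this sketch leaves out.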

Two points deserve attention here. First, there is a hypergraph neural network both before and after the noise correction module in the model; both are described in detail in Section 3.1. These two hypergraph neural networks exploit the high-order correlations of instances in the feature space and the label space, and the high-order correlations of the instances in the feature space help to deal with noise in the label space. Second, the noise transition matrix is learned on top of the classifier obtained in Section 3.2, which is built with the hypergraph neural network and the Apriori algorithm. On the one hand, a good classifier yields a better noise transition matrix; on the other hand, it directly improves the classification accuracy of the proposed model.

The three parts described above fit together naturally through hypergraph neural networks, which improves the classification accuracy, discriminability, and robustness of the model. They are combined into an end-to-end model in which the first two parts (Sections 3.1 and 3.2) lay the foundation for the noise correction part (Section 3.3). The most significant difference between our model and existing models is that we make full use of high-order correlations through the hypergraph neural networks, including the high-order correlations of the instances in the feature space and the label space, and the high-order correlations between the labels. Using as much of the information in the training set as possible is what allows us to obtain a more powerful multi-label classification model.

3.4 Network training scheme

For the instance x_i, if the instance belongs to the j-th label, then y_ij = 1; otherwise y_ij = 0. The entire network is trained with the cross-entropy loss function, which is expressed as follows: $L (y_{i}, {\hat{y}}_{i}) = - \sum_{j = 1}^{n} \left( y_{ij} \log (σ ({\hat{y}}_{ij})) + (1 - y_{ij}) \log (1 - σ ({\hat{y}}_{ij})) \right)$ (14) where σ (·) is the sigmoid function.
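As a small worked example of the loss in Eq. (14), the sketch below evaluates the multi-label cross-entropy for one instance from raw scores. It is written with a leading minus sign so that smaller values mean a better fit. The clipping constant `eps` and the function names are assumptions added for numerical safety and illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multilabel_cross_entropy(y_true, scores, eps=1e-12):
    """Cross-entropy over the n labels of a single instance.

    y_true: list of 0/1 labels; scores: list of raw (pre-sigmoid) outputs.
    """
    total = 0.0
    for y, z in zip(y_true, scores):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total


print(multilabel_cross_entropy([1, 0], [0.0, 0.0]))   # uninformative scores
print(multilabel_cross_entropy([1, 0], [4.0, -4.0]))  # confident and correct
```

With uninformative scores of zero, each label contributes log 2 to the loss, so the first call gives 2 log 2; confident correct scores drive the value toward zero.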

Model training proceeds in three steps. First, the noise correction module (Section 3.3) is disabled, and only the fused hypergraph neural network module (Section 3.1) and the multi-label classification module (Section 3.2) are trained. The main reason is that the training of the noise correction module needs to be based on a relatively mature classifier. Then, the parameters of the fused hypergraph neural network module and the multi-label classification module are fixed, and the parameters of the noise correction module are learned. Finally, all modules are trained jointly and fine-tuned to obtain the best classification result.
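The three training steps can be summarised as a small schedule that records, for each stage, which modules take part in the forward pass and which have their parameters updated. This is only a sketch: the module names are hypothetical labels for Sections 3.1 to 3.3, and a real training loop would translate the `trainable` sets into optimizer parameter groups.

```python
# Hypothetical module names standing in for Sections 3.1, 3.2 and 3.3.
MODULES = ("fused_hgnn", "label_classifier", "noise_correction")


def training_schedule():
    """One entry per stage: modules used in the forward pass and modules being updated."""
    backbone = {"fused_hgnn", "label_classifier"}
    return [
        # Stage 1: noise correction disabled, backbone modules trained.
        {"stage": 1, "active": set(backbone), "trainable": set(backbone)},
        # Stage 2: backbone fixed, only the noise correction module learned.
        {"stage": 2, "active": set(MODULES), "trainable": {"noise_correction"}},
        # Stage 3: everything trained jointly for fine-tuning.
        {"stage": 3, "active": set(MODULES), "trainable": set(MODULES)},
    ]


def frozen_modules(stage_entry):
    """Modules that are used in the forward pass but keep their parameters fixed."""
    return stage_entry["active"] - stage_entry["trainable"]


for entry in training_schedule():
    print(entry["stage"], sorted(entry["trainable"]), "frozen:", sorted(frozen_modules(entry)))
```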

4 Experiments

4.1 Datasets and settings

To validate the proposed Robust Fused Hypergraph Neural Networks for Multi-Label Classification (RFHNN), we download four benchmark datasets for experiments, i.e., MS-COCO [15], VOC2007 [5], IAPRTC-12 [10] and ESPGame [11]. The statistics of the four real world datasets are summarized in Table 1.

Table 1
Datasets properties

Dataset	Domain	Instances	Labels	Cardinality
MS-COCO	image	122585	80	2.9
VOC2007	image	9963	20	2.5
IAPRTC-12	image	19627	291	5.7
ESPGame	image	20770	268	4.6

The experiments are organized in three parts. (1) The first part concerns the mining and verification of high-order label correlations, which is one of the key components of the proposed method. In that part, we verify that the Apriori algorithm can indeed mine the high-order correlations between labels. (2) The second part reports experiments on datasets without noise in the label space. The proposed RFHNN is compared with recent deep-learning-based multi-label classification algorithms. These experiments are based on two datasets, MS-COCO and VOC2007, and we do not add additional noise to the label space of these two datasets. (3) The third part reports experiments on datasets with noise in the label space. To keep the comparison fair and to show the advantages of our method in noise correction, we conduct experiments on training datasets with noise in the label space.

In our experiments, we follow ML-GCN [3] to set the experimental parameters, and the CNN module uses ResNet-101 pre-trained on ImageNet. A two-layer HGNN with output dimensions of 1024 and 2048 is adopted to generate the classifiers in our model, while the hidden layer dimension is set to 1024 in FHConv. For the word embeddings of the labels, we follow ML-GCN and choose 300-dim GloVe vectors trained on the Wikipedia dataset. We choose LeakyReLU as the nonlinear activation function. For network optimization, we adopt SGD as the optimizer with a learning rate of 0.01. The weight decay is set to 0.0001 and the momentum is 0.9. We implement the network in PyTorch, and all the experiments are performed on a 64-bit Linux workstation with an Intel E5-2650 CPU, an NVIDIA Titan X Pascal card, and 256GB of memory.

In multi-label learning, each instance may have multiple category labels at the same time, so the single-label evaluation metrics commonly used in traditional supervised learning, such as accuracy, precision, and recall, cannot be directly used to evaluate a multi-label learning system. Therefore, researchers have successively proposed a series of multi-label evaluation metrics. Here we consider four evaluation metrics, i.e., Macro-F1, Micro-F1, Average precision, and Hamming loss, which are widely used in multi-label learning to evaluate the prediction performance. Following the notation in the problem definition, we denote by Y_i the set of relevant labels of the instance x_i. To characterize the binary classification performance of the predictors on each label, four basic quantities over the test instances are usually used: TP_j (true positive), FP_j (false positive), TN_j (true negative) and FN_j (false negative). $\text{Macro-P (CP)} = \frac{1}{n} \sum_{j = 1}^{n} \frac{{TP}_{j}}{{TP}_{j} + {FP}_{j}}$ (15) $\text{Macro-R (CR)} = \frac{1}{n} \sum_{j = 1}^{n} \frac{{TP}_{j}}{{TP}_{j} + {FN}_{j}}$ (16) $\text{Macro-F1 (CF1)} = \frac{2 \times CP \times CR}{CP + CR}$ (17) $\text{Micro-P (OP)} = \frac{\sum_{j = 1}^{n} {TP}_{j}}{\sum_{j = 1}^{n} {TP}_{j} + \sum_{j = 1}^{n} {FP}_{j}}$ (18) $\text{Micro-R (OR)} = \frac{\sum_{j = 1}^{n} {TP}_{j}}{\sum_{j = 1}^{n} {TP}_{j} + \sum_{j = 1}^{n} {FN}_{j}}$ (19) $\text{Micro-F1 (OF1)} = \frac{2 \times OP \times OR}{OP + OR}$ (20) $\text{Average precision} = \frac{1}{m} \sum_{i = 1}^{m} \frac{1}{\left| Y_{i} \right|} \sum_{y \in Y_{i}} \frac{\left| \{ y^{'} \mid {rank}_{f} (x_{i}, y^{'}) \leq {rank}_{f} (x_{i}, y), y^{'} \in Y_{i} \} \right|}{{rank}_{f} (x_{i}, y)}$ (21)

Here, rank_f is the ranking function corresponding to the real-valued function f (·), and h (·) is the multi-label classifier we learned.
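The per-class and overall metrics in Eqs. (15) to (20) can be computed from binary label matrices as in the following sketch. It assumes predictions have already been thresholded to 0/1, and it treats a ratio with a zero denominator as 0, which is a convention chosen here and not one stated in the text. The function name `f1_metrics` is hypothetical.

```python
def f1_metrics(y_true, y_pred):
    """Macro and micro precision, recall and F1 for 0/1 label matrices.

    y_true, y_pred: lists of rows (one row per instance, one column per label).
    """
    n = len(y_true[0])
    tp, fp, fn = [0] * n, [0] * n, [0] * n
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1

    def ratio(a, b):
        return a / b if b else 0.0  # assumed convention for empty denominators

    cp = sum(ratio(tp[j], tp[j] + fp[j]) for j in range(n)) / n  # Eq. (15)
    cr = sum(ratio(tp[j], tp[j] + fn[j]) for j in range(n)) / n  # Eq. (16)
    op = ratio(sum(tp), sum(tp) + sum(fp))                       # Eq. (18)
    orr = ratio(sum(tp), sum(tp) + sum(fn))                      # Eq. (19)
    return {
        "CP": cp, "CR": cr, "CF1": ratio(2 * cp * cr, cp + cr),     # Eq. (17)
        "OP": op, "OR": orr, "OF1": ratio(2 * op * orr, op + orr),  # Eq. (20)
    }


y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = [[1, 0], [0, 1], [1, 1]]
print(f1_metrics(y_true, y_pred))
```

The macro variants average per-label ratios and therefore weight rare labels as heavily as frequent ones, while the micro variants pool the counts first, which is why the two families can rank methods differently.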

The mean average precision (mAP) [22] adopted in Section 4.3 is different from the Average precision adopted in Section 4.4. The mean average precision (mAP) is widely adopted in the evaluation of multi-label image classification and image retrieval, which is detailed as follows: $AP (l_{i}) = \frac{1}{m_{l_{i}}} \sum_{j = 1}^{m} P_{l_{i}} (j) \times (R_{l_{i}} (j) - R_{l_{i}} (j - 1))$ (22) $mAP = \frac{1}{n} \sum_{i = 1}^{n} AP (l_{i})$ (23) where m_{l_i} represents the number of all instances associated with the label l_i, m represents the total number of instances in the dataset, j represents the serial number of the real-valued output of all instances under the classifier sorted in descending order, and P_{l_i} (j) and R_{l_i} (j) represent the precision and recall corresponding to the first j instances.
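A minimal sketch of the per-label AP and the resulting mAP is given below. It assumes that the recall difference in Eq. (22) acts as an indicator that the j-th ranked instance is relevant, so that AP is the precision averaged over the ranks that hold a relevant instance; this is the common retrieval form and is an interpretation, not a statement of the exact evaluation code used. Labels with no relevant instance are given an AP of 0 by convention here.

```python
def average_precision(scores, relevant):
    """AP for one label: scores and 0/1 relevance flags, one entry per instance."""
    total_relevant = sum(relevant)
    if total_relevant == 0:
        return 0.0  # assumed convention for labels with no positives
    order = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)
    hits = 0
    ap = 0.0
    for rank, s in enumerate(order, start=1):
        if relevant[s]:
            hits += 1
            ap += hits / rank  # precision over the first `rank` instances
    return ap / total_relevant


def mean_average_precision(score_matrix, label_matrix):
    """mAP over labels; both matrices are lists of rows (instances x labels)."""
    n = len(score_matrix[0])
    aps = [
        average_precision([row[i] for row in score_matrix],
                          [row[i] for row in label_matrix])
        for i in range(n)
    ]
    return sum(aps) / n  # Eq. (23)


scores = [[0.9, 0.1], [0.8, 0.9], [0.7, 0.2], [0.6, 0.3]]
labels = [[1, 0], [0, 1], [1, 0], [0, 0]]
print(mean_average_precision(scores, labels))
```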

For Macro-F1, Micro-F1, Average precision and mean average precision (mAP), the larger the values, the better the performance [36].

4.2 The high-order correlations mined by Apriori

From the previous analysis, we can see that the high-order label correlations are crucial to the improvement of the accuracy of the multi-label classification algorithm. Our proposed method adopts the mature Apriori algorithm to indirectly realize the mining of label correlations by mining frequent itemsets, and then integrates the high-order label correlations into the learning process with the hypergraph neural network. In the whole process, the key point is whether we can successfully mine accurate high-order label correlations by Apriori algorithm.

To demonstrate that our method can effectively mine high-order correlations, we use the Apriori algorithm to mine the correlations among the labels in the IAPRTC-12 dataset. Table 2 shows that the high-order correlations mined differ across support levels. For example, when the support is set to 0.01, we obtain the frequent itemset {bike, cycling, helmet, jersey, short, cyclist}, whose six labels often appear together. To some extent, this shows that high-order correlations among labels are widespread. These high-order correlations contain rich information and are of great help to multi-label classification. In addition, Table 3 lists the association rules generated under different confidence levels. Our experiments show that the Apriori algorithm can indeed mine label correlations from the label space, and judging from the semantics of the labels, the mined correlations are reasonable.
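To illustrate how frequent label sets and rule confidences of this kind are obtained, here is a compact Apriori sketch in plain Python. It treats the label set of each instance as one transaction. The toy transactions are invented for illustration (they only borrow label names that appear in Table 2) and do not reproduce the IAPRTC-12 statistics; the function names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(labels): support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / total

    # Level 1: frequent single labels.
    level = {}
    for item in {i for t in transactions for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from mined supports."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


toy_labels = [
    {"bike", "cyclist", "helmet"},
    {"bike", "cyclist"},
    {"sky", "building"},
    {"bike", "helmet", "cyclist"},
]
itemsets = apriori(toy_labels, min_support=0.5)
print(sorted((sorted(s), sup) for s, sup in itemsets.items()))
print(confidence(itemsets, {"bike", "helmet"}, {"cyclist"}))
```

Each frequent itemset of size two or more is a natural candidate for a hyperedge over labels, and the rule confidence is the kind of quantity that can serve as a hyperedge weight.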

Table 2
The FreqItemsets mined by Apriori under different support

Support	Number of FreqItemsets						Select FreqItemset
	1	2	3	4	5	6
0.050	23	8	-	-	-	-	{building, sky}
0.015	95	109	17	2	-	-	{cycling, cyclist, jersey, short}
0.010	131	218	75	35	21	7	{bike, cycling, helmet, jersey, short, cyclist}
0.005	210	606	327	121	5	-	{bed, curtain, room, wall, window}
0.001	291	2631	-	-	-	-	{adult, child}

Table 3

The association rules mined by Apriori under different confidence

Confidence	Number of Association Rules					Select Rule
	2	3	4	5	6
0.90	4	66	162	214	143	{bike, cycling, helmet, jersey, short}↦{cyclist}(100%)
0.70	17	131	313	427	296	{square, street}↦{lamp}(85.990%)
0.50	64	221	409	556	399	{park}↦{tree}(62.500%)
0.30	155	294	470	615	428	{room, wall}↦{window}(34.411%)
0.10	336	385	490	630	434	{beach}↦{sea, sky}(29.552%)

4.3 Prediction performance on datasets without noise in label space

To validate the proposed Robust Fused Hypergraph Neural Networks for Multi-Label Classification (RFHNN) on datasets without noise in label space, we consider the following evaluation metrics, i.e., average per-class precision (CP), recall (CR), F1 (CF1), average overall precision (OP), recall (OR), F1 (OF1) and mean average precision (mAP), which are detailed in Section 4.1.

In this part of the experiment, we compare our proposed approach with the following state-of-the-art multi-label classification methods: A Unified Framework for Multi-Label Image Classification (CNN-RNN) [22], Order-Free RNN with Visual Attention for Multi-Label Classification (Order-Free RNN) [2], Multi-Label Image Recognition by Recurrently Discovering Attentional Regions (RNN-Attention) [23], Multi-Label Zero-Shot Learning with Structured Knowledge Graphs (ML-ZSL) [13], Learning Spatial Regularization with Image-level Supervisions for Multi-Label Image Classification (SRN) [39], Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning (Multi-Evidence) [8], Multi-Label Image Recognition with Graph Convolutional Networks (ML-GCN) [3] (both Binary and Re-weighted schemes) and Joint Input and Output Space Learning for Multi-Label Image Classification (TSGCN) (both Binary and Re-weighted schemes) [29].

Table 4 compares the performance of the above methods on MS-COCO [15]. From the experimental results, we can draw the following observations. The proposed RFHNN outperforms most of the state-of-the-art methods under the seven metrics, which is consistent with our theoretical analysis. The proposed method is clearly better than the CNN-based methods, which indicates that RFHNN can mine high-order label correlations accurately and improve classification performance. In particular, compared with ML-GCN, the hypergraph structure allows our method to extract and use more high-order information from the training dataset. This high-order information fits well with the characteristics of label correlations in multi-label learning, which helps multi-label classification.

Table 4
Quantitative results by our proposed method and compared methods on MS-COCO validation set

Methods	All							Top-3
	mAP	CP	CR	CF1	OP	OR	OF1	CP	CR	CF1	OP	OR	OF1
CNN-RNN	61.2	-	-	-	-	-	-	66.0	55.6	60.4	69.2	66.4	67.8
RNN-Attention	-	-	-	-	-	-	-	79.1	58.7	67.4	84.0	63.0	72.0
Order-Free RNN	-	-	-	-	-	-	-	71.6	54.8	62.1	74.2	62.2	67.7
ML-ZSL	-	-	-	-	-	-	-	74.1	64.5	69.0	-	-	-
SRN	77.1	81.6	65.4	71.2	82.7	69.9	75.8	85.2	58.8	67.4	87.4	62.5	72.9
ResNet-101	77.3	80.2	66.7	72.8	83.9	70.8	76.8	84.1	59.4	69.7	89.1	62.8	73.6
Multi-Evidence	-	80.4	70.2	74.9	85.2	72.5	78.4	84.5	62.2	70.6	89.1	64.3	74.7
MLGCN (Binary)	80.3	81.1	70.1	75.2	83.8	74.2	78.7	84.9	61.3	71.2	88.8	65.2	75.2
MLGCN (Reweight)	83.0	85.1	72.0	78.0	85.8	75.4	80.3	89.2	64.1	74.6	90.5	66.5	76.7
TSGCN (Binary)	83.5	81.5	72.3	76.7	84.9	75.3	79.8	84.1	67.1	74.6	89.5	69.3	78.1
TSGCN (Reweight)	83.7	83.9	73.1	78.1	85.6	75.9	80.5	85.4	66.9	75.0	89.9	68.6	77.8
RFHNN	83.4	84.8	73.5	78.7	86.3	74.9	80.2	90.2	63.7	74.7	91.3	64.2	75.4

Table 5 compares the performance of the above methods on VOC2007 [5]. Following [3, 22], we use the training set to train our model and evaluate the recognition performance on the test set. To compare with other state-of-the-art methods, we report the average precision (AP) and mean average precision (mAP). The results of many previous works on VOC2007 are based on the VGG model. For fairness, we also report the results that use VGG models as the base model. Our proposed method obtains improvements over the previous methods. Concretely, the proposed RFHNN obtains 94.6% mAP, which, according to Table 5, outperforms ML-GCN (Re-weight) by 0.6% and TSGCN (Re-weight) by 0.3%. It also achieves improvement on 9 out of the 20 labels. VOC2007 contains only 20 labels, so the high-order information among labels is relatively limited. Nevertheless, the available high-order information still benefits our results, which illustrates the importance of high-order information.

Table 5

Comparisons of AP and mAP with state-of-art methods on the VOC 2007 dataset

Methods	aero	bike	bird	boat	bottle	bus	car	cat	chair	cow	mAP
CNN-RNN	96.7	83.1	94.2	92.8	61.2	82.1	89.1	94.2	64.2	83.6	84.0
RLSD	96.4	92.7	93.8	94.1	71.2	92.5	94.2	95.7	74.3	90.0	88.5
VeryDeep	98.9	95.0	96.8	95.4	69.7	90.4	93.5	96.0	74.2	86.6	89.7
ResNet-101	99.5	97.7	97.8	96.4	65.7	91.8	96.1	97.6	74.2	80.9	89.9
FeV+LV	97.9	97.0	96.6	94.6	73.6	93.9	96.5	95.5	73.7	90.3	90.6
HCP	98.6	97.1	98.0	95.6	75.3	94.7	95.8	97.3	73.1	90.2	90.9
RNN-Attention	98.6	97.4	96.3	96.2	75.2	92.4	96.5	97.1	76.5	92.0	91.9
Atten-Reinforce	98.6	97.1	97.1	95.5	75.6	92.8	96.8	97.3	78.3	92.2	92.0
MLGCN (Binary)	99.6	98.3	97.9	97.6	78.2	92.3	97.4	97.4	79.2	94.4	93.1
MLGCN (Reweight)	99.5	98.5	98.6	98.1	80.8	94.6	97.2	98.2	82.3	95.7	94.0
TSGCN (Binary)	99.3	98.1	96.3	95.5	86.4	94.6	97.2	97.1	85.4	92.1	94.2
TSGCN (Reweight)	98.9	98.5	96.8	97.3	87.5	94.2	97.4	97.7	84.1	92.6	94.3
RFHNN	99.1	97.8	97.4	98.6	82.1	92.9	97.6	98.3	82.0	96.2	94.6
Methods	table	dog	horse	motor	person	plant	sheep	sofa	train	tv	mAP
CNN-RNN	70.0	92.4	91.7	84.2	93.7	59.8	93.2	75.3	99.7	78.6	84.0
RLSD	74.2	95.4	96.2	92.1	97.9	66.9	93.5	73.7	97.5	87.6	88.5
VeryDeep	87.8	96.0	96.3	93.1	97.2	70.0	92.1	80.3	98.1	87.0	89.7
ResNet-101	85.0	98.4	96.5	95.9	98.4	70.1	88.3	80.2	98.8	89.2	89.9
FeV+LV	82.8	95.4	97.7	95.9	98.6	77.6	88.7	78.0	98.3	89.0	90.6
HCP	80.0	97.3	96.1	94.9	96.3	78.3	94.7	76.2	97.9	91.5	90.9
RNN-Attention	98.6	97.4	96.3	96.2	75.2	92.4	96.5	97.1	76.5	92.0	91.9
Atten-Reinforce	98.6	97.1	97.1	95.5	75.6	92.8	96.8	97.3	78.3	92.2	92.0
MLGCN (Binary)	99.6	98.3	97.9	97.6	78.2	92.3	97.4	97.4	79.2	94.4	93.1
MLGCN (Reweight)	99.5	98.5	98.6	98.1	80.8	94.6	97.2	98.2	82.3	95.7	94.0
TSGCN (Binary)	89.7	98.2	96.5	96.9	98.9	85.1	96.7	87.5	98.6	93.8	94.2
TSGCN (Reweight)	89.3	98.4	98.0	96.1	98.7	84.9	96.6	87.2	98.4	93.7	94.3
RFHNN	98.3	98.8	98.4	98.6	80.1	92.9	97.6	98.3	92.5	95.8	94.6

4.4 Prediction performance on datasets with noise in label space

To validate the proposed Robust Fused Hypergraph Neural Networks for Multi-label Classification (RFHNN) on datasets with noise in label space, we consider three evaluation metrics, i.e., Macro-F1, Micro-F1 and Average precision, which are widely-used in multi-label learning to evaluate the prediction performance of all the methods. The definitions of the three metrics can be found in Section 4.1 [36].

In this part of the experiment, we compare our proposed approach with the following state-of-the-art multi-label classification methods: Multi-Label Image Recognition with Graph Convolutional Networks (ML-GCN) [3], Multi-Label Learning with Global and Local Label Correlations (GLOCAL) [41], Partial Multi-Label Learning with Noisy Label Identification (PML-NI) [28], Multi-label Manifold Learning (ML2) [12] and Learning Deep Latent Spaces for Multi-Label Classification (C2AE) [31]. We also report the results of the baseline algorithm, Binary Relevance (BR) [1]. In addition, we include an ablation experiment to verify the noise robustness of our method. The Classifier based on Hypergraph Neural Networks (CHNN) is obtained from RFHNN by omitting the Fused Hypergraph Neural Networks (module in the blue dashed box, Section 3.1) and the Hypergraph Construction Noise Correction (module in the red dashed box, Section 3.3).

For the proposed RFHNN, no ready-made datasets with label noise are currently available, so we constructed noisy datasets to verify the effectiveness of our method [14, 32]. We selected two commonly used natural datasets, i.e., IAPRTC-12 and ESPGame. These datasets have been divided into training and test sets in advance. For each dataset, to simulate false-positive and false-negative noise at the same time, we randomly add some labels to the label matrix (randomly set negative labels (0) to positive (1)) to simulate false-positive noise, and we randomly remove some labels from the label matrix (randomly set positive labels (1) to negative (0)) [32] to simulate false-negative noise. Specifically, the amount of noise is specified as a proportion of the total number of labels. In almost all commonly used multi-label datasets, the number of relevant labels for each instance is much smaller than the number of irrelevant labels [19]. In the experiments, the proportion of false-positive noise is 10% or 20%, and the proportion of false-negative noise is 10% or 30%. Pairing them gives four noise settings with total noise of 20%, 30%, 40%, and 50%. The datasets we designed are close to the actual scenario and can effectively verify the performance of our proposed method. For the image datasets, we extract 4096-dimensional deep features with ResNet-101 pre-trained on ImageNet; we did not perform fine-tuning, for fairness and computational efficiency.
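The noise construction described above can be sketched as follows. The sketch assumes that both noise proportions are measured against the number of positive entries in the original label matrix, which is one reading of "the proportion of the total number of labels"; the function name, the seed argument and the toy matrix are hypothetical.

```python
import random


def inject_label_noise(labels, fp_rate, fn_rate, seed=0):
    """Return a noisy copy of a 0/1 label matrix (list of rows).

    fp_rate: share of false positives to add, fn_rate: share of positives to remove,
    both relative to the number of original positive labels (assumption).
    """
    rng = random.Random(seed)
    noisy = [list(row) for row in labels]
    positives = [(r, c) for r, row in enumerate(labels) for c, v in enumerate(row) if v == 1]
    negatives = [(r, c) for r, row in enumerate(labels) for c, v in enumerate(row) if v == 0]
    n_fp = min(len(negatives), round(fp_rate * len(positives)))
    n_fn = min(len(positives), round(fn_rate * len(positives)))
    for r, c in rng.sample(negatives, n_fp):
        noisy[r][c] = 1  # false positive: 0 -> 1
    for r, c in rng.sample(positives, n_fn):
        noisy[r][c] = 0  # false negative: 1 -> 0
    return noisy


clean = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
]
# 20% false-positive noise paired with 30% false-negative noise (the "50%" setting).
noisy = inject_label_noise(clean, fp_rate=0.2, fn_rate=0.3, seed=42)
for row in noisy:
    print(row)
```

Sampling the two kinds of flips from disjoint sets of cells keeps them from cancelling each other, so the realised noise level matches the requested proportions.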

Table 6 gives the illustrations of multi-label classification performance under the situation of noisy labels on the two datasets. The table can be divided into three parts longitudinally, which corresponds to three different evaluation indicators, i.e., Macro-F1, Micro-F1, and Average precision. The Noise column in the table refers to the scale of the noise of training datasets. From the experimental results, we can draw the following interesting observations:

Table 6
Quantitative results on several datasets, where the best ones are in bold

Micro-F1
datasets	Noise	RFHNN	CHNN	MLGCN	Glocal	PML-NI	ML2	C2AE	BR
IAPRTC-12	20%	0.2169	0.2048	0.2044	0.1187	0.0475	0.1837	0.0728	0.1908
	30%	0.1988	0.1855	0.1701	0.0937	0.0434	0.1143	0.0592	0.2526
	40%	0.1989	0.1964	0.1871	0.0371	0.0482	0.1135	0.0772	0.0902
	50%	0.1808	0.1518	0.1520	0.0311	0.0444	0.1232	0.0607	0.1209
ESPGame	20%	0.2183	0.1408	0.1695	0.0914	0.0419	0.1268	0.0759	0.1750
	30%	0.1954	0.1694	0.1761	0.0890	0.0385	0.0952	0.0529	0.2132
	40%	0.1837	0.1590	0.1453	0.0404	0.0427	0.0715	0.0561	0.0806
	50%	0.1610	0.1164	0.1485	0.0168	0.0393	0.1054	0.0631	0.1143
Macro-F1
datasets	Noise	RFHNN	CHNN	MLGCN	Glocal	PML-NI	ML2	C2AE	BR
IAPRTC-12	20%	0.1983	0.1836	0.1708	0.0840	0.0472	0.1778	0.0467	0.1409
	30%	0.1909	0.1805	0.1564	0.0557	0.0430	0.1052	0.0351	0.1992
	40%	0.1933	0.1563	0.1628	0.0260	0.0478	0.1152	0.0445	0.0610
	50%	0.1845	0.1121	0.1175	0.0219	0.0440	0.1063	0.0306	0.0838
ESPGame	20%	0.2080	0.1951	0.1739	0.0690	0.0418	0.1249	0.0283	0.1353
	30%	0.1833	0.1526	0.1563	0.0426	0.0384	0.1015	0.0315	0.1648
	40%	0.1719	0.1478	0.1492	0.0300	0.0426	0.1098	0.0371	0.0596
	50%	0.1591	0.1507	0.1397	0.0136	0.0392	0.1009	0.0291	0.0867
Average precision
datasets	Noise	RFHNN	CHNN	MLGCN	Glocal	PML-NI	ML2	C2AE	BR
IAPRTC-12	20%	0.3377	0.2766	0.2347	0.3399	0.3343	0.2852	0.0855	0.1463
	30%	0.3073	0.2736	0.2386	0.2767	0.2358	0.2061	0.0773	0.1848
	40%	0.3015	0.2451	0.2529	0.3169	0.3046	0.2322	0.0879	0.0929
	50%	0.2918	0.2103	0.2413	0.2680	0.2156	0.1975	0.0718	0.1076
ESPGame	20%	0.3293	0.2853	0.2208	0.2987	0.3034	0.2476	0.0860	0.1472
	30%	0.2989	0.2597	0.2513	0.2427	0.2137	0.1663	0.0747	0.1671
	40%	0.2774	0.2528	0.2129	0.2581	0.2778	0.2070	0.0660	0.0950
	50%	0.2792	0.2473	0.2647	0.2362	0.1985	0.1645	0.0841	0.1136

Fig.2

Several multi-label image annotation examples on IAPRTC-12 datasets. For each image, we show the ground-truth annotations, and the labels predicted by the proposed RFHNN. The labels in black are those that match with ground-truth annotation. The blue labels denote the correctly predicted ones, while the red labels are those that are wrongly predicted. Besides, we use green labels to represent the labels that are correctly predicted but missing in the ground-truth annotations.

(1) The proposed RFHNN outperforms most of the state-of-the-art methods on these datasets. For example, with noise = 20% on ESPGame, our method improves on the best baseline results by 4.33 percentage points (Micro-F1) and 3.41 percentage points (Macro-F1), which is consistent with our theoretical analysis. Looking at Table 6 as a whole, we find that as the noise increases, the performance of the compared methods declines sharply. In contrast, our method maintains relatively acceptable performance, which reflects its accuracy and strong noise robustness.

(2) An important characteristic of multi-label learning evaluation is that the results obtained under different evaluation metrics are quite different, so here we select four representative evaluation metrics, which include instance-based metrics and label-based metrics. We can find from the experimental results that there are some differences in the results between different methods under these four indicators. For example, the method that we proposed works well on Micro-F1 and Macro-F1, and the performance of GLOCAL method on Average precision is very prominent. Our method has generally good results on these four indicators, and some of the poor results are also within the acceptable range, which shows that our method has good stability.

(3) Label correlation is a very important topic in multi-label learning. Most existing multi-label classification methods take it into consideration, but most of them consider only second-order correlations, which causes a loss of information. However, for datasets with noise in the label space, a method that is too sensitive to label correlations may perform even worse than a simple method that ignores them. This means that when we deal with noisy datasets, an effective label correlation mining method must be paired with an effective anti-noise module. Otherwise, the label correlations will play a negative role.

(4) Through the comparison of RFHNN with CHNN, we can find that the high-order correlations mining module (Section 3.1) and the anti-noise module (Section 3.3) play important roles in dealing with noisy datasets. CHNN is a degenerate version of our model, but it still has better performance than ML-GCN with the help of higher-order label correlations (Section 3.2). The performance of RFHNN on IAPRTC-12 and ESPGame is better than that on MS-COCO and VOC2007 in the comparison with ML-GCN, because our method makes better use of the information in the label space, and IAPRTC-12 and ESPGame have more kinds of labels and their label space information is more abundant.

To visually demonstrate the effectiveness of the proposed RFHNN, we present a case study in which RFHNN is applied to the IAPRTC-12 dataset. The annotation results of several instance images are shown in Fig. 2. The proposed RFHNN correctly predicts most of the labels of these images, and it can even find labels that are missing in the ground truth. For example, our method tags the image in row 1 and column 4 with the missing label 'building'. This shows that our method can work well in real-world image annotation applications.

From the above experimental results and analysis, we can find that our proposed method is superior to the state-of-the-art methods in most cases, which well demonstrates its effectiveness.

5 Conclusion

In this paper, we propose Robust Fused Hypergraph Neural Networks for Multi-Label Classification (RFHNN) to solve the problem of multi-label classification. RFHNN proposes a hypergraph construction method based on Apriori in multi-label datasets, which can mine robust high-order label correlations effectively. This method utilizes the hypergraph to exploit the high-order correlations of instances in label space and develops the Hypergraph Neural Networks (HNN) based on it. Then the high-order structure can improve the classification ability. Meanwhile, a novel hypergraph fusion method based on the complementarity between feature space and label space is proposed to design the Fused Hypergraph Neural Networks (FHNN). The high-order correlations of the instances in the feature space can effectively correct noise in the label space.

Several directions remain for future work. For example, we find that most of the mined high-order correlations are strongly directional, so we plan to use directed hypergraphs to further improve the effectiveness of our model.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant Nos. 61876087 and 62076135).

Table 1
Dataset properties

Dataset      Domain   Instances   Labels   Cardinality
MS-COCO      image    122585      80       2.9
VOC2007      image    9963        20       2.5
IAPRTC-12    image    19627       291      5.7
ESPGame      image    20770       268      4.6