Abstract
Deep neural networks have been widely adopted in multi-label classification for their excellent performance. However, existing methods fail to comprehensively utilize the high-order correlations between instances and the high-order correlations between labels, and they have difficulty dealing with label noise effectively. We propose a novel end-to-end deep framework named Robust Fused Hypergraph Neural Networks for Multi-Label Classification (RFHNN), which effectively utilizes both kinds of high-order correlations and uses them to mitigate the impact of label noise. In RFHNN, Hypergraph Neural Networks (HNNs) are adopted to mine and utilize the high-order correlations of the instances in the feature space and the label space respectively. These high-order instance correlations not only improve the classification accuracy and discriminative ability of the proposed model, but also lay the foundation for the subsequent noise correction module. Meanwhile, a hypergraph construction method based on the Apriori algorithm is proposed for the label HNNs, which can mine robust second-order and high-order label correlations effectively. Effective classifiers are learned based on the correlations between the labels, which not only improves the accuracy of the model but also strengthens the subsequent noise correction module. In addition, we design a noise correction module within the network. With the help of the high-order correlations among the instances and the effective classifier, the framework can correct label noise and improve the robustness of the model. Extensive experimental results on benchmark datasets demonstrate that our proposed approach outperforms state-of-the-art multi-label classification algorithms. It also performs well when the labels of the training data are noisy.
Keywords
Introduction
Multi-label learning is one of the active topics in the field of machine learning and pattern recognition. In the multi-label learning framework, each instance is represented by a feature vector and may be associated with multiple category labels. The goal is to induce a function that is able to assign multiple proper labels (from a given label set) to unseen instances [36, 38]. Since the concept of multi-label learning was introduced, many researchers have built on it and proposed a large number of effective algorithms.

Fig. 1. Illustration of the proposed RFHNN. The labels with wavy underlines are present in the image but missing from the ground truth, and the labels with straight underlines are mis-indexed labels in the ground-truth annotation. Hypergraph construction captures the high-order correlations of instances in the feature space and the label space, and these high-order correlations are then incorporated into the model with the help of the hypergraph neural networks (refer to the module in the blue dashed box in Fig. 1). In terms of label correlations, hypergraphs are built with the Apriori algorithm, where each node denotes a label. The labels are represented by word embeddings, and stacked HNNs are learned over the label hypergraph to map these label representations into a set of inter-dependent object classifiers, which are applied to the image representation extracted from the input image for multi-label image recognition (refer to the module in the green dashed box in Fig. 1). The noise correction module is based on the label noise transition matrix. With the help of the high-order correlations among the instances and the effective classifiers obtained above, it can correct the noise and improve the robustness of the model (refer to the module in the red dashed box in Fig. 1).
Motivated by recent developments in deep learning, more and more researchers have begun to apply deep learning to multi-label classification problems. They have applied various deep networks to multi-label classification, such as deep neural networks [9, 35], deep convolutional neural networks, recurrent neural networks [2, 23] and graph convolutional neural networks [3]. However, multi-label datasets often contain a certain amount of label noise, which may degrade the performance of multi-label classification. The noise makes the labels of instances incorrect and can be divided into two cases: (1) several essential labels are missing, and (2) several labels are mislabeled. As shown in Fig. 1, there is clearly no ’road’ in the picture located in row 1, column 1, and ’beach’ should be present instead, so ’road’ is a mis-indexed label while ’beach’ is a missing label. These kinds of label noise increase the training errors of the classifier and decrease the accuracy of multi-label classification. Meanwhile, the mining of label correlations is very important for multi-label classification. Although almost all recent works on multi-label learning take label correlations into consideration, most of them consider only the second-order correlations (i.e., pairwise correlations) between labels. In fact, the correlations between labels are often of higher than second order in multi-label classification problems. Paying attention only to the second-order correlations may lead to information loss and reduce the accuracy of multi-label classification. How to effectively combine the high-order label correlations with the high-order instance correlations to alleviate label noise is an important issue. To deal with this issue, we design a framework named Robust Fused Hypergraph Neural Networks for Multi-Label Classification (RFHNN) based on Hypergraph Neural Networks (HNNs).
RFHNN improves the accuracy of multi-label classification while taking both the mining of label correlations and noise correction into consideration. In order to effectively mine the high-order label correlations and achieve robustness to noise, we need to effectively characterize the high-order correlations among labels and the high-order correlations among instances. We choose the hypergraph [18] and the hypergraph neural network [7] to characterize these high-order correlations. RFHNN can be roughly divided into three modules, which correspond to the three dashed boxes in different colours (blue, green, red) in Fig. 1.
Hypergraph Neural Networks (HNNs) (module in the blue dashed box in Fig. 1) are adopted to mine the high-order correlations of the instances in the feature space and the label space respectively. The high-order correlations of the instances in the feature space enable the proposed RFHNN to obtain higher-quality image features, while the high-order correlations of instances in the label space exploit the high-order structure to improve the discriminative ability. These two kinds of high-order correlations play an important role in the noise correction part. Before introducing the noise correction module, we first describe the label correlation mining part (module in the green dashed box in Fig. 1), since it provides an effective classifier that also contributes to the noise correction module. Here we propose to utilize association rule mining to mine the second-order and high-order label correlations. The strong association rules generated by the well-established Apriori algorithm capture the most reliable label correlations. Then we combine the label correlations with the word-to-vector representation of the label text to learn an effective classifier, which improves the accuracy of classification while also assisting the subsequent noise correction. We design a noise correction module to correct the noise in the label space (module in the red dashed box in Fig. 1). This module is based on the high-order correlations among instances and the classifier mentioned above. It corrects the noise by learning a label noise transition matrix, which represents the probability of one label being mislabeled as another. With the help of the effective classifier and the high-order structure of instances in the feature and label spaces, we can learn an accurate label noise transition matrix.
As a result, our model can effectively deal with the noise problem in the label space, which greatly improves the overall robustness of the model.
Therefore, the main contributions can be highlighted as follows:
(1) We propose a novel end-to-end deep framework for multi-label classification. This deep framework is based on Hypergraph Neural Networks (HNNs), which can effectively integrate the high-order correlations of the instances in the feature space and the label space. The high-order correlations among instances play an important role in improving the classification accuracy, discriminative ability and robustness of the model.
(2) We propose a hypergraph construction method based on Apriori for multi-label datasets, which can mine label correlations (including high-order correlations) effectively. We construct the corresponding hypergraph and then design the corresponding HNN based on it. The label correlations help us learn an effective classifier and also benefit noise correction.
(3) We propose a noise correction module based on the label noise transition matrix, which can effectively reduce the influence of noise in the label space. The high-order correlations among instances and the label correlations ensure the quality of the noise transition matrix. By reducing the impact of noise, this module improves the overall robustness of our model.
The rest of this paper is organized as follows. Section 2 gives a brief review of related work and background. Then we formulate the problem and present the proposed approach in Section 3. We discuss experimental results in Section 4 and conclude in Section 5.
Multi-label learning algorithms [36, 38] can be divided into two broad categories: problem transformation methods (PTMs) and algorithm adaptation methods (AAMs). Owing to the establishment of large-scale manually annotated datasets and the rapid development of deep convolutional networks, the performance of image classification has recently improved significantly. Many efforts have been devoted to extending deep convolutional networks to multi-label image recognition. Our method builds mainly on hypergraph neural networks to handle label noise and to mine label correlations in multi-label classification.
In this section, we introduce the related work on multi-label classification with deep learning and on multi-label classification with noise in the label space. The basics of hypergraphs and hypergraph neural networks are also included here.
Multi-label classification with deep learning
Zhang and Zhou [35] proposed BP-MLL, the first multi-label classification method based on DNNs. BP-MLL transforms the multi-label classification problem into several binary classification problems and utilizes the loss function to mine the label correlations. Gong et al. [9] trained deep convolutional neural networks for multi-label classification with a ranking-based strategy. The weighted approximated-ranking loss is shown to work particularly well for multi-label annotation problems. Wang et al. [22] proposed the CNN-RNN method for multi-label classification. CNN-RNN uses networks to map both images and labels into a common latent space, while the label correlations are mined by an LSTM. Yeh et al. [31] proposed a multi-label embedding method based on deep learning and canonical correlation analysis. It handles multi-label classification on large-scale data well by using a deep neural network for the spatial transformation. Chen et al. [2] proposed a deep architecture that consists of two parts, the spatial transformer and the LSTM sub-network. The two components work alternately to locate the attentional regions and capture the dependencies among them. Wang et al. [24] proposed a model based on a recurrent neural network which consists of a visual attention model and a confidence-ranked LSTM. This method obtains a proper label sequence automatically and is robust in label prediction. Chen et al. [3] proposed ML-GCN, which is based on the graph convolutional network. The method utilizes word-embedding representations to learn a classifier, and second-order label correlations are mined through a correlation matrix that is built in a data-driven way. Wu et al. [25] proposed a high-order semantic learning model based on adaptive hypergraph neural networks (AdaHGNN).
The adaptive hypergraph is constructed based on the label embedding, and the hypergraph neural network improves the ability of the multi-label classification model. Tan et al. [20] proposed a hypergraph-induced convolutional network (HI-GCN), which is inspired by the ability of hypergraphs to encode high-order connections. The model can effectively mine higher-order label correlations with high adaptivity and scalability. Xu et al. [29] proposed TSGCN, which learns from both the input and output spaces for multi-label image classification. TSGCN builds the multi-label classification model by mapping spatial object correlations and semantic label correlations.
Multi-label classification with noise in label space
The noise that exists in the label space includes false-positive noise and false-negative noise. Multi-label classification with false-positive noise alone is equivalent to partial multi-label learning, and multi-label classification with false-negative noise alone is equivalent to weak-label learning. We briefly explain these two settings here.
Partial multi-label learning is extended from partial label learning [26, 34], but it is far more difficult than partial label learning, because the candidate label set of each instance contains multiple correct labels in the partial multi-label learning framework. Mining the information of the complex label space effectively is therefore the key to improving the performance of partial multi-label learning. In partial multi-label learning, not all labels in the given candidate label set of a training instance are truly relevant. Xie and Huang [27] first introduced the concept of partial multi-label learning, giving the new learning framework and its solutions. The method calculates a confidence for every label in the candidate set of each instance when performing multi-label classification. Wang et al. [21] proposed DRAMA, which solves the problem of partial multi-label learning in two steps. Firstly, the confidence of the labels corresponding to each instance is calculated, and then the confidence matrix and a gradient superposition algorithm are used to build and optimize the model. Fang and Zhang [6] first used the propagation of label information to estimate the confidence of the label matrix, and then used the pairwise loss between the labels to build the model. On this basis, two methods named P-VLS and P-MAP are proposed with different predictors.
In weak-label learning, the labels corresponding to some instances are incomplete: not all labels that are marked as irrelevant in the given training label set are truly irrelevant. Sun et al. [19] proposed a weak-label learning method named WELL, which adopts the hypothesis of a low-rank similarity matrix to mine the correlations among instances and then propagates the label information. Lin et al. [16] proposed a sparse reconstruction-based weak-label learning method named LSR. Given an incomplete correlation matrix between instances and labels, this method uses instance correlations, label correlations, and correlations between instances and labels simultaneously under sparsity constraints to complete the relation matrix between instances and labels. Xu et al. [30] proposed a weak-label learning algorithm named Maxide based on low-rank matrix completion. This algorithm uses side information to reduce the number of observations required for matrix completion, and can effectively complete missing label matrices in the weak-label setting. Dong et al. [4] proposed a weak-label learning method named SSWL based on ensemble learning, which considers feature space similarity and label space similarity to recover missing labels. This method integrates multiple different models through a collaborative regularization framework, which improves the robustness of the model when the label information is insufficient.
Based on the above analysis, it can be found that the existing multi-label classification algorithms have paid attention to the label correlations from different aspects. However, most of the existing methods fail to effectively integrate and utilize the high-order label correlations and high-order instance correlations. Besides, how to flexibly improve the robustness of the model with the help of the above-mentioned higher-order correlations is also a key issue. Our proposed method can effectively utilize the high-order correlations of instances and high-order correlations of labels from the perspective of alleviating label noise to improve the accuracy and robustness of multi-label classification models, which is detailed in Section 3.
Hypergraph preliminaries
Let V denote a finite set of instances, and let E be a family of subsets e of V such that ∪_{e∈E} e = V. G = (V, E) is then called a hypergraph with the vertex set V and the hyperedge set E. A hyperedge that contains exactly two vertices is a simple graph edge. A weighted hypergraph is a hypergraph in which each hyperedge e is associated with a positive number ω(e), called the weight of hyperedge e. We denote a weighted hypergraph by G = (V, E, ω). A hyperedge e is said to be incident with a vertex v when v ∈ e. The degree of each vertex v ∈ V is defined as: d(v) = Σ_{e∈E: v∈e} ω(e).
Let |S| denote the cardinality of a given arbitrary set S. The degree of a hyperedge e ∈ E is defined as δ(e) = |e|. A hypergraph G can be represented by a |V| × |E| matrix H with entries h(v, e) = 1 if v ∈ e and 0 otherwise, called the incidence matrix of G. Based on matrix H, the degree of each vertex and each hyperedge can be calculated as: d(v) = Σ_{e∈E} ω(e) h(v, e) and δ(e) = Σ_{v∈V} h(v, e).
Let D_v and D_e denote the diagonal matrices containing the vertex and hyperedge degrees respectively, and let W denote the diagonal matrix containing the weights of hyperedges [37].
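As a small illustration of these definitions, the sketch below computes the diagonal entries of D_v and D_e from an incidence matrix, taking the vertex degree as the weighted row sum of H and the hyperedge degree as the column sum of H. The toy hypergraph and the function name `degrees` are illustrative assumptions, not part of the original method.

```python
def degrees(H, w):
    """Return (vertex degrees, hyperedge degrees) for incidence matrix H.

    H is a list of rows, one per vertex; H[v][e] is 1 if vertex v lies in
    hyperedge e and 0 otherwise. w[e] is the positive weight of hyperedge e.
    """
    n_edges = len(w)
    # d(v): sum of the weights of the hyperedges incident with v.
    d_v = [sum(w[e] * row[e] for e in range(n_edges)) for row in H]
    # delta(e): number of vertices contained in hyperedge e.
    d_e = [sum(row[e] for row in H) for e in range(n_edges)]
    return d_v, d_e


# Toy hypergraph (hypothetical): 4 vertices, hyperedges {0, 1, 2} and {2, 3}.
H = [[1, 0],
     [1, 0],
     [1, 1],
     [0, 1]]
w = [1.0, 0.5]
d_v, d_e = degrees(H, w)
```

Vertex 2 belongs to both hyperedges, so its degree is the sum of both weights, while each hyperedge degree simply counts its member vertices.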
The hypergraph structure is more complicated than the ordinary graph structure, and it is more difficult to use properly. At present, most work with hypergraph structures is based on the hypergraph Laplacian matrix. The derived hypergraph Laplacian is a positive semi-definite matrix, which integrates the information of the hypergraph well and is widely used. Following the random walk model, Zhou et al. [37] proposed the following normalized hypergraph Laplacian L:
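In the form given by Zhou et al. [37], the normalized Laplacian is L = I − D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}. The following plain-Python sketch evaluates it entry by entry, which is possible because D_v, D_e and W are diagonal. The toy hypergraph and the function name `hypergraph_laplacian` are illustrative assumptions, and the code assumes that every vertex lies in at least one hyperedge.

```python
import math


def hypergraph_laplacian(H, w):
    """Normalized hypergraph Laplacian L = I - Theta, with
    Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}.

    Since Dv, De and W are diagonal, each entry reduces to
    Theta[i][j] = sum_e w[e] * H[i][e] * H[j][e] / delta(e) / sqrt(d(i) * d(j)).
    """
    n, m = len(H), len(w)
    d_v = [sum(w[e] * H[v][e] for e in range(m)) for v in range(n)]
    d_e = [sum(H[v][e] for v in range(n)) for e in range(m)]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            theta = sum(w[e] * H[i][e] * H[j][e] / d_e[e] for e in range(m))
            theta /= math.sqrt(d_v[i] * d_v[j])
            L[i][j] = (1.0 if i == j else 0.0) - theta
    return L


# Toy hypergraph (hypothetical): hyperedges {0, 1, 2} and {2, 3}.
H = [[1, 0], [1, 0], [1, 1], [0, 1]]
w = [1.0, 0.5]
L = hypergraph_laplacian(H, w)
```

A quick sanity check on such a matrix is that it is symmetric and that the vector of square-rooted vertex degrees is mapped to zero, which is consistent with L being positive semi-definite.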
A hypergraph can describe the high-order correlations among instances. Feng et al. [7] proposed a hypergraph neural network framework which uses the hypergraph to handle high-order correlations for representation learning. Specifically, given a hypergraph signal X ∈ R^{n×C1} with n nodes and C1-dimensional features, Y ∈ R^{n×C2} can be obtained by the following hyperedge convolution formulation:
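In Feng et al. [7] the hyperedge convolution takes the form Y = D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X Θ, where Θ ∈ R^{C1×C2} is the learnable filter. A minimal plain-Python sketch under that assumption follows. The toy inputs and the helper names `matmul` and `hyperedge_conv` are illustrative, and inside a network layer a nonlinearity would normally be applied to Y.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def hyperedge_conv(X, H, w, Theta):
    """One hyperedge convolution Y = G X Theta, where
    G = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    n, m = len(H), len(w)
    d_v = [sum(w[e] * H[v][e] for e in range(m)) for v in range(n)]
    d_e = [sum(H[v][e] for v in range(n)) for e in range(m)]
    # G[i][j] mixes the features of vertices that share a hyperedge.
    G = [[sum(w[e] * H[i][e] * H[j][e] / d_e[e] for e in range(m))
          / math.sqrt(d_v[i] * d_v[j]) for j in range(n)] for i in range(n)]
    return matmul(matmul(G, X), Theta)


H = [[1, 0], [1, 0], [1, 1], [0, 1]]                    # 4 vertices, 2 hyperedges
w = [1.0, 1.0]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]    # n = 4, C1 = 2
Theta = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]             # C1 = 2 -> C2 = 3
Y = hyperedge_conv(X, H, w, Theta)
```

Each output row aggregates the features of all vertices that share a hyperedge with the corresponding vertex, which is how the layer propagates high-order information.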
For multi-label classification, let
Based on the effective Hypergraph Neural Networks (HNNs) [7, 25], we propose the Robust Fused Hypergraph Neural Networks for Multi-Label Classification (RFHNN). Our method revolves around robustness to noise and the mining of high-order label correlations and instance correlations. As mentioned in the Introduction, the noise in the label space reduces the accuracy of the classifier and interferes with the mining of label correlations. We develop a novel hypergraph fusion method to deal with this challenge. We construct corresponding hypergraphs in both feature space and label space and then fuse the two hypergraph structures with networks. The high-order correlations of the instances in the feature space can effectively correct the inaccurate correlations of instances in the label space that are caused by noisy data (refer to the modules in the blue and red dashed boxes in Fig. 1). At the same time, we also adopt the HNNs and the Apriori algorithm to effectively mine the high-order label correlations to generate the classifier. This classifier utilizes the information of the label correlations and provides support for the anti-noise module (refer to the module in the green dashed box in Fig. 1).
We will describe the proposed model in the following three parts in detail: Fused Hypergraph Neural Networks (module in the blue dashed box), Multi-label Recognition with High-order correlations in label space (module in the green dashed box) and Hypergraph construction noise correction (module in the red dashed box). The specific practices are detailed as follows.
Fused hypergraph neural networks (Module in the blue dashed box)
Here we introduce the fused hypergraph neural networks module of our model. It is used to merge the hypergraph networks that are based on the feature space information and the label space information. Mining and using the structure information in the feature space and the label space makes the constructed model more accurate and discriminative. The construction of the fused hypergraph neural networks module can be divided into two steps as described below.
Firstly, we construct hypergraphs in both feature space and label space. In feature space, different from the simple graph where each edge represents the vertex-to-vertex correlation, the incidence matrix H_f of a hypergraph describes the vertex-to-hyperedge correlation. To achieve this, we regard each instance as one vertex and try to generate a hyperedge for each vertex by following the method in [37]. More specifically, we generate the hyperedge e_i by the following formulation [40]:
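One common way to realise such a construction, assumed here rather than taken from [40], is to let e_i contain vertex i together with its k nearest neighbours in feature space. The sketch below uses Euclidean distance and a toy one-dimensional feature; the function name `knn_incidence`, the parameter `k` and the data are illustrative assumptions.

```python
import math


def knn_incidence(X, k):
    """Incidence matrix H_f with one hyperedge per vertex.

    Hyperedge e_i contains vertex i and its k nearest neighbours under
    Euclidean distance (an assumed construction; k is a free parameter).
    """
    n = len(X)
    H = [[0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other vertices by their distance to vertex i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(X[i], X[j]))
        for v in [i] + others[:k]:
            H[v][i] = 1   # row = vertex, column = hyperedge e_i
    return H


# Toy features (hypothetical): two well-separated groups on a line.
X = [[0.0], [0.1], [0.3], [5.0], [5.2]]
H_f = knn_incidence(X, k=1)
```

With this choice every hyperedge has exactly k + 1 vertices, so instances that are close in feature space end up sharing hyperedges.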
For the instance structure in the feature space, we use a hypergraph matrix H_f to represent the high-order correlations among instances in the feature space. Based on the hypergraph matrix, we propose the corresponding hypergraph neural network. The main function of this hypergraph network is to use the high-order structure of the instances to learn better image features. Here we adopt X ∈ R^{m×d} with m nodes and d-dimensional features to represent the initial deep features extracted by the CNN; the hyperedge convolution can then be formulated by:
In label space, a hypergraph is built where each vertex corresponds to one training instance and the hyperedge for each label includes all the training instances that are relevant to that label. (In Fig. 1, the black and red instances have the same label ’beach’, so they share the same hyperedge ’beach’.) For the instance structure in the label space, we use a hypergraph matrix H_l = Y to represent the high-order correlations among instances in the label space. Based on the hypergraph matrix, we propose the corresponding hypergraph neural network. The main function of the hypergraph network here is to effectively use the high-order correlations of instances in the label space to improve the overall discrimination of the model. The hypergraph network that we propose here mainly acts on the results after classification. As a result, the high-order correlations of the instances in the label space carry out effective label information propagation. That is, the instances that are similar in the label space obtain similar label sets, which improves the prediction accuracy of our model. Here we adopt
Secondly, based on these two hypergraphs, we build the corresponding hypergraph neural networks, which make full use of the high-order correlations of the instances in the feature space and the label space. These two kinds of high-order correlations act on the extracted deep features and on the predicted labels after classification, respectively, and lay the foundation for the following noise correction module (Section 3.3).
In this module, we first utilize the Apriori algorithm to mine the second-order and high-order label correlations. Then we combine the label correlations with the word-to-vector representation of the label text to learn an effective classifier [3], which improves the accuracy of classification while also assisting the subsequent noise correction.
How to mine label correlations is very important in multi-label classification, as the label correlations mined by existing methods sometimes do not generalize well, which may decrease the robustness of multi-label classification. We therefore introduce association rule mining, an important branch of data mining, to mine the second-order and high-order label correlations and alleviate this problem. Association rule mining generally uses ’if-then’ logic rules to describe the regularities and patterns of certain attributes of items in a large amount of data.
Our innovation is to use frequent itemset mining to mine co-occurrence correlations between labels and find the subsets of labels that often appear together. We believe that there are strong correlations among labels that often appear together, so we create a second-order or high-order correlation to connect them. The Apriori algorithm is one of the most influential and widely used algorithms for mining frequent itemsets and association rules. It generates strong association rules from frequent itemsets. When searching for frequent itemsets, the algorithm requires that all non-empty subsets of a frequent itemset must also be frequent. This step filters out some co-occurrence noise in the multi-label datasets in advance, which makes the second-order and high-order label correlations that we find more robust and effective.
We take the label set of each instance as a data record, mine the frequent co-occurring label subsets from these records, and then find the association rules. We set a confidence threshold and then generate the strong association rules. We regard these strong association rules as second-order or higher-order correlations, and then generate the corresponding hypergraphs. Specifically, for the label matrix Y, we regard the label sets of the instances y_1, y_2, ⋯, y_m as itemsets, and mine the frequent co-occurring label subsets that they contain. For example, if (l2, l3, l5) is a frequent label subset, we can obtain a strong association rule based on it. Since we consider that there is a high-order correlation among the three labels, we enclose them in a hyperedge e1.
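The procedure above can be sketched with a compact, plain-Python Apriori. The sketch is an illustration under assumptions: the thresholds `min_support` and `min_confidence`, the function names, the toy label sets, and the rule that a frequent label set becomes a hyperedge when at least one of its association rules is strong are all choices made for this example rather than details given in the text.

```python
from itertools import combinations


def frequent_itemsets(records, min_support):
    """Level-wise Apriori search; returns {frozenset: support count}."""
    records = [frozenset(r) for r in records]
    items = sorted({i for r in records for i in r})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(c <= r for r in records) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))]
    return frequent


def label_hyperedges(records, min_support, min_confidence):
    """Frequent label sets (size >= 2) that yield at least one strong rule."""
    freq = frequent_itemsets(records, min_support)
    edges = []
    for itemset, support in freq.items():
        if len(itemset) < 2:
            continue
        # Rule A -> (itemset minus A) has confidence support(itemset) / support(A).
        best = max(support / freq[frozenset(a)]
                   for r in range(1, len(itemset))
                   for a in combinations(itemset, r))
        if best >= min_confidence:
            edges.append(itemset)
    return edges


# Toy label sets (hypothetical), one record per training instance.
records = [{"l2", "l3", "l5"}, {"l2", "l3", "l5"}, {"l2", "l3", "l5"},
           {"l1", "l2"}, {"l1", "l4"}, {"l3", "l5"}]
edges = label_hyperedges(records, min_support=3, min_confidence=0.8)
```

On this toy data the pair (l3, l5) and the triple (l2, l3, l5) are kept as hyperedges, while the pairs involving l2 are frequent but have no rule that reaches the confidence threshold.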
The labels are represented by Z ∈ R^{n×d_wev} (n is the number of label attributes and d_wev is the dimensionality of the word-embedding vector). The convolution operation can be written as follows.
In multi-label applications, there is often a certain amount of noise in the label space. This noise affects the distribution of the label space and thus interferes with the training of the model. Therefore, effectively identifying and correcting the noise will greatly improve the accuracy of the learned model. This section designs a noise correction module (the module in the red dashed box in Fig. 1). The specific ideas are described as follows:
This section mainly designs the noise correction method based on the label noise transition matrix T ∈ R^{n×n}, and combines it with the above two modules (Sections 3.1 and 3.2). It is assumed here that the noise in the label space is class-conditional multi-label noise, that is, there is a certain transition probability between each pair of labels (The probability of converting from y to
Here we suppose that yi* = 1 represents the instance x with the i class label, and
The noise transition matrix T is the core of the noise correction module, and it is important to learn an accurate noise transition matrix. T can be approximated by the following two steps:
Two points need attention here. First, there is a hypergraph neural network both before and after the noise correction module in the model; both have been described in detail in Section 3.1. These two hypergraph neural networks effectively use the high-order correlations of instances in the feature space and the label space, and the high-order correlations of the instances in the feature space help to deal with noise in the label space. Second, we learn the noise transition matrix on the basis of the classifier obtained in Section 3.2, which is built with the hypergraph neural network and the Apriori algorithm. On the one hand, such a good classifier helps us obtain a better noise transition matrix; on the other hand, it also directly improves the classification accuracy of our proposed model.
The three parts described above combine well through hypergraph neural networks and together improve the classification accuracy, discriminability, and robustness of the model. They form an end-to-end model, in which the first two parts (Sections 3.1 and 3.2) lay the foundation for the noise correction part (Section 3.3). The most significant difference between our model and existing models is that we make full use of the high-order correlations in the hypergraph neural networks, including the high-order correlations of the instances in the feature space and the label space, and the high-order correlations between the labels. We use the information in the training set as much as possible, which is what allows us to obtain a more powerful multi-label classification model.
For the instance x_i, if the instance belongs to the jth label, then y_ij = 1; otherwise y_ij = 0. The entire network will be trained based on the cross-entropy loss function, and the specific loss function is expressed as follows:
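A common form of this loss for multi-label networks is the binary cross-entropy applied independently to each label after a sigmoid. The sketch below assumes that form and averages over all instance-label pairs; the averaging, the use of raw pre-sigmoid scores and the function names are assumptions made for illustration.

```python
import math


def softplus(z):
    """log(1 + exp(z)), written so that large |z| does not overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def multilabel_cross_entropy(scores, targets):
    """Mean binary cross-entropy over all instance-label pairs.

    scores[i][j] is the raw (pre-sigmoid) score of label j for instance i;
    targets[i][j] is y_ij in {0, 1}.
    """
    total, count = 0.0, 0
    for s_row, y_row in zip(scores, targets):
        for s, y in zip(s_row, y_row):
            # -log(sigmoid(s)) = softplus(-s); -log(1 - sigmoid(s)) = softplus(s).
            total += y * softplus(-s) + (1 - y) * softplus(s)
            count += 1
    return total / count


targets = [[1, 0, 1], [0, 1, 0]]
uninformed = multilabel_cross_entropy([[0.0] * 3, [0.0] * 3], targets)
confident = multilabel_cross_entropy([[10.0, -10.0, 10.0],
                                      [-10.0, 10.0, -10.0]], targets)
```

Scores of zero correspond to a predicted probability of 0.5 for every label, so `uninformed` equals log 2, whereas confident and correct scores drive the loss towards zero.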
Model training is divided into three steps. First, the noise correction module (Section 3.3) is switched off, and only the fused hypergraph neural network module (Section 3.1) and the multi-label classification module (Section 3.2) are trained, because training the noise correction module requires a relatively mature classifier. Then, the parameters of the fused hypergraph neural network module and the multi-label classification module are fixed, and the parameters of the noise correction module are learned. Finally, all modules are switched on at the same time for joint training and fine-tuning to obtain the best classification result.
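The three-step schedule can be summarised as a small configuration that records, for each stage, which modules take part in the forward pass and which ones are updated. The module names below are hypothetical labels for Sections 3.1, 3.2 and 3.3. In a framework such as PyTorch, freezing would typically be done by disabling gradients for the frozen parameters (for example through `requires_grad`), which is an assumption and not a detail given in the text.

```python
# Hypothetical module names standing for Sections 3.1, 3.2 and 3.3.
ALL = {"fused_hnn", "label_classifier", "noise_correction"}

STAGES = [
    # Stage 1: noise correction switched off; train the other two modules.
    {"active": ALL - {"noise_correction"},
     "trainable": ALL - {"noise_correction"}},
    # Stage 2: freeze the trained modules; learn only the noise correction.
    {"active": ALL, "trainable": {"noise_correction"}},
    # Stage 3: everything switched on for joint fine-tuning.
    {"active": ALL, "trainable": ALL},
]


def frozen_modules(stage):
    """Modules that run in the forward pass but receive no updates."""
    return stage["active"] - stage["trainable"]


for number, stage in enumerate(STAGES, start=1):
    print(number, sorted(stage["trainable"]), sorted(frozen_modules(stage)))
```

Only the second stage has frozen modules, which reflects the requirement that the noise correction module is fitted on top of an already trained classifier.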
Datasets and settings
To validate the proposed Robust Fused Hypergraph Neural Networks for Multi-Label Classification (RFHNN), we download four benchmark datasets for experiments, i.e., MS-COCO [15], VOC2007 [5], IAPRTC-12 [10] and ESPGame [11]. The statistics of the four real world datasets are summarized in Table 1.
Datasets properties
The experiments are organized in three parts. (1) The first part concerns the mining and verification of high-order label correlations, which is one of the key components of our proposed method. In this part, we verify that the Apriori algorithm can indeed mine the high-order correlations between labels. (2) The second part covers the experiments on datasets without noise in the label space. Our proposed RFHNN is compared with the latest multi-label classification algorithms based on deep learning. The experiments in this part are based on two datasets, MS-COCO and VOC2007, and we do not add additional noise to the label space of these two datasets. (3) The third part covers the experiments on datasets with noise in the label space. For a fair comparison and to reflect the advantages of our method in noise correction, we conduct experiments on training datasets with noise in the label space.
In our experiments, we follow ML-GCN [3] to set the experimental parameters, and the CNN module uses ResNet-101 pre-trained on ImageNet. A two-layer HGNN with output dimensions of 1024 and 2048 is adopted to generate the classifiers in our model, while the hidden layer is set to 1024 in FHConv. For the word embeddings of the labels, we follow ML-GCN and choose 300-dimensional GloVe vectors trained on the Wikipedia dataset. We choose LeakyReLU as the nonlinear activation function. For network optimization, we adopt SGD as the optimizer with a learning rate of 0.01, a weight decay of 0.0001 and a momentum of 0.9. We implement the network in PyTorch, and all experiments are performed on a 64-bit Linux workstation with an Intel E5-2650 CPU, an NVIDIA Titan X Pascal card, and 256 GB of memory.
In the multi-label learning problem, since each instance may have multiple category labels at the same time, the single-label evaluation metrics that are commonly used in traditional supervised learning, such as accuracy, precision, and recall, cannot be directly used for the performance evaluation of the multi-label learning system. Therefore, researchers have successively proposed a series of multi-label evaluation metrics. Here we consider four evaluation metrics, i.e., Macro-F1, Micro-F1, Average precision, and Hamming loss, which are widely used in multi-label learning to evaluate the prediction performance. Based on the symbolic representation in the definition of the problem, we denote $Y_i$ as the set of related labels belonging to the instance $x_i$. Then, in order to characterize the binary classification performance of the predictors on each label, four basic quantities related to the test instances are usually used: $TP_j$ (true positive), $FP_j$ (false positive), $TN_j$ (true negative) and $FN_j$ (false negative).
Here, $\mathrm{rank}_f$ is the ranking function corresponding to the real-valued function $f(\cdot)$, and $h(\cdot)$ is the multi-label classifier we learned.
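As a minimal sketch of how the label-based metrics and Hamming loss can be computed from $TP_j$, $FP_j$, $TN_j$ and $FN_j$, the following code assumes the standard definitions summarized in [36] and assumes that ground truth and predictions are given as 0/1 matrices with one row per instance and one column per label. The function names are illustrative and not part of the original implementation.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows = instances, columns = labels, entries in {0, 1}


def label_counts(y_true: Matrix, y_pred: Matrix, j: int) -> Tuple[int, int, int, int]:
    """Return (TP_j, FP_j, TN_j, FN_j) for label j."""
    tp = fp = tn = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        t, p = t_row[j], p_row[j]
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Average the per-label F1 scores (every label has equal weight)."""
    q = len(y_true[0])
    total = 0.0
    for j in range(q):
        tp, fp, _, fn = label_counts(y_true, y_pred, j)
        total += f1(tp, fp, fn)
    return total / q


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Pool the counts over all labels, then compute a single F1 score."""
    q = len(y_true[0])
    tp = fp = fn = 0
    for j in range(q):
        tp_j, fp_j, _, fn_j = label_counts(y_true, y_pred, j)
        tp, fp, fn = tp + tp_j, fp + fp_j, fn + fn_j
    return f1(tp, fp, fn)


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of instance-label pairs that are misclassified."""
    p, q = len(y_true), len(y_true[0])
    wrong = sum(t != h for t_row, h_row in zip(y_true, y_pred)
                for t, h in zip(t_row, h_row))
    return wrong / (p * q)
```

Macro-F1 weights every label equally and is therefore sensitive to rare labels, whereas Micro-F1 is dominated by frequent labels, which is one reason the two metrics are usually reported together.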
The mean average precision (mAP) [22] adopted in Section 4.3 is different from the Average precision adopted in Section 4.4. The mean average precision (mAP) is widely adopted in the evaluation of multi-label image classification and image retrieval, which is detailed as follows:
For Macro-F1, Micro-F1, Average precision and mean average precision (mAP), the larger the value, the better the performance, whereas for Hamming loss, the smaller the value, the better the performance [36].
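The difference between the two ranking-based metrics can be illustrated with a short sketch. It assumes the common non-interpolated form of average precision (the mean of the precision values at the ranks of the relevant items); the instance-based Average precision ranks the labels of each instance, while mAP ranks the instances for each label. The function names and the score-matrix layout are assumptions made for illustration.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], relevant: Sequence[int]) -> float:
    """Non-interpolated AP of one ranked list (higher score = ranked earlier)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if relevant[idx]:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant item
    return sum(precisions) / len(precisions) if precisions else 0.0


def instance_average_precision(scores: List[List[float]], y_true: List[List[int]]) -> float:
    """Average precision in the multi-label learning sense: rank labels per instance."""
    values = [average_precision(s, y) for s, y in zip(scores, y_true)]
    return sum(values) / len(values)


def mean_average_precision(scores: List[List[float]], y_true: List[List[int]]) -> float:
    """mAP in the image classification sense: rank instances per label."""
    q = len(y_true[0])
    per_label = []
    for j in range(q):
        column_scores = [row[j] for row in scores]
        column_truth = [row[j] for row in y_true]
        per_label.append(average_precision(column_scores, column_truth))
    return sum(per_label) / q
```

On the same score matrix the two quantities generally differ, because one averages over rows and the other over columns.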
From the previous analysis, we can see that high-order label correlations are crucial to improving the accuracy of multi-label classification algorithms. Our proposed method adopts the mature Apriori algorithm to mine label correlations indirectly by mining frequent itemsets, and then integrates the high-order label correlations into the learning process through the hypergraph neural network. The key question in the whole process is whether the Apriori algorithm can mine accurate high-order label correlations.
To demonstrate that our method can effectively mine high-order correlations, we use the Apriori algorithm to mine the correlations among the labels in the IAPRTC-12 dataset. Table 2 shows that different support levels yield different high-order correlations. For example, when we set the support to 0.01, we obtain the frequent itemset {bike, cycling, helmet, jersey, short, cyclist}, which means that these six labels often appear together. To some extent, this shows that high-order correlations among labels are widespread. These high-order correlations contain rich information and are of great help to multi-label classification. In addition, we present the association rules generated under different confidence levels in Table 3. Our experiments show that the Apriori algorithm can indeed mine label correlations from the label space and that, judging from the semantics of the labels, the mined label correlations are reasonable.
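The mining step can be sketched in a few lines. The following is a minimal, unoptimized Apriori implementation over the label sets of the training instances, together with the confidence of an association rule. Turning every frequent itemset with at least two labels into one hyperedge is an assumption made here for illustration; the exact hypergraph construction follows the method section, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple


def apriori(label_sets: Iterable[Iterable[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every frequent itemset of labels together with its support."""
    transactions = [frozenset(t) for t in label_sets]
    n = len(transactions)
    if n == 0:
        return {}
    candidates = {frozenset([label]) for t in transactions for label in t}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while candidates:
        # Keep the candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def confidence(frequent: Dict[FrozenSet[str], float],
               antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    """Confidence of the rule antecedent -> consequent (both must be frequent)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


def label_hypergraph(frequent: Dict[FrozenSet[str], float]) -> Tuple[List[str], List[FrozenSet[str]], List[List[int]]]:
    """Assumed construction: one hyperedge per frequent itemset with >= 2 labels."""
    hyperedges = sorted((s for s in frequent if len(s) >= 2),
                        key=lambda s: (len(s), sorted(s)))
    vertices = sorted({label for e in hyperedges for label in e})
    incidence = [[1 if v in e else 0 for e in hyperedges] for v in vertices]
    return vertices, hyperedges, incidence
```

Lowering `min_support` yields more and larger itemsets, and therefore more high-order hyperedges, at the cost of admitting co-occurrences that may be accidental.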
The FreqItemsets mined by Apriori under different support
The association rules mined by Apriori under different confidence
To validate the proposed Robust Fused Hypergraph Neural Networks for Multi-Label Classification (RFHNN) on datasets without noise in label space, we consider the following evaluation metrics, i.e., average per-class precision (CP), recall (CR), F1 (CF1), average overall precision (OP), recall (OR), F1 (OF1) and mean average precision (mAP), which are detailed in Section 4.1.
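For reference, the per-class and overall metrics can be computed as in the sketch below. It assumes the definitions commonly used in the compared works (e.g., [22] and [3]): per-class precision and recall are averaged over labels before the F1 score is formed, while the overall variants pool the counts over all labels first. The function name and the 0/1 matrix layout are illustrative assumptions.

```python
from typing import Dict, List


def per_class_and_overall(y_true: List[List[int]], y_pred: List[List[int]]) -> Dict[str, float]:
    """Compute CP, CR, CF1, OP, OR and OF1 from 0/1 label matrices."""
    q = len(y_true[0])
    correct = [0] * q    # correctly predicted positives per label
    predicted = [0] * q  # predicted positives per label
    ground = [0] * q     # ground-truth positives per label
    for t_row, p_row in zip(y_true, y_pred):
        for j in range(q):
            correct[j] += 1 if (t_row[j] and p_row[j]) else 0
            predicted[j] += 1 if p_row[j] else 0
            ground[j] += 1 if t_row[j] else 0

    def ratio(a: float, b: float) -> float:
        return a / b if b else 0.0

    def harmonic(p: float, r: float) -> float:
        return ratio(2 * p * r, p + r)

    cp = sum(ratio(c, n) for c, n in zip(correct, predicted)) / q
    cr = sum(ratio(c, n) for c, n in zip(correct, ground)) / q
    op = ratio(sum(correct), sum(predicted))
    overall_recall = ratio(sum(correct), sum(ground))
    return {
        "CP": cp, "CR": cr, "CF1": harmonic(cp, cr),
        "OP": op, "OR": overall_recall, "OF1": harmonic(op, overall_recall),
    }
```

Note that CF1 computed this way is the harmonic mean of the averaged precision and recall, which is generally not equal to the Macro-F1 used for the noisy-label experiments.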
In this part of the experiment, we compare our proposed approach with the following state-of-the-art multi-label classification methods: A Unified Framework for Multi-Label Image Classification (CNN-RNN) [22], Order-Free RNN with Visual Attention for Multi-Label Classification (Order-Free RNN) [2], Multi-Label Image Recognition by Recurrently Discovering Attentional Regions (RNN-Attention) [23], Multi-Label Zero-Shot Learning with Structured Knowledge Graphs (ML-ZSL) [13], Learning Spatial Regularization with Image-level Supervisions for Multi-Label Image Classification (SRN) [39], Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning (Multi-Evidence) [8], Multi-Label Image Recognition with Graph Convolutional Networks (ML-GCN) [3] (both Binary and Re-weighted schemes) and Joint Input and Output Space Learning for Multi-Label Image Classification (TSGCN) [29] (both Binary and Re-weighted schemes).
Table 4 compares the performance of the above methods on MS-COCO [15]. From the experimental results, we can draw the following observations. The proposed RFHNN outperforms most of the state-of-the-art methods under the seven metrics, which validates our theoretical results. The proposed method is significantly better than the current CNN-based methods, which indicates that RFHNN can mine high-order label correlations accurately and improve classification performance. In particular, compared with ML-GCN, our method can extract and use more high-order information from the training dataset through the hypergraph structure. This high-order information fits well with the characteristics of label correlations in multi-label learning and therefore helps multi-label classification.
Quantitative results by our proposed method and compared methods on MS-COCO validation set
Table 5 compares the performance of the above methods on VOC2007 [5]. Following [3, 22], we use the training set to train our model and evaluate the recognition performance on the test set. To compare with other state-of-the-art methods, we report the average precision (AP) and the mean average precision (mAP). Many previous results on VOC2007 are based on the VGG model, so for fairness we also report results that use VGG as the base model. Our proposed method clearly improves over the previous methods. Concretely, the proposed RFHNN obtains 94.6% mAP, which outperforms ML-GCN (Re-weight) by 0.3% and TSGCN (Re-weight) by 0.6%. It also achieves improvements on 9 out of the 20 labels. VOC2007 contains only 20 labels, so the high-order information among labels is relatively limited. Nevertheless, the available high-order information still benefits our results, which illustrates the importance of high-order information.
Comparisons of AP and mAP with state-of-the-art methods on the VOC 2007 dataset
To validate the proposed Robust Fused Hypergraph Neural Networks for Multi-Label Classification (RFHNN) on datasets with noise in the label space, we consider three evaluation metrics, i.e., Macro-F1, Micro-F1 and Average precision, which are widely used in multi-label learning to evaluate the prediction performance of all the methods. The definitions of the three metrics can be found in Section 4.1 [36].
In this part of the experiment, we compare our proposed approach with the following state-of-the-art multi-label classification methods: Multi-Label Image Recognition with Graph Convolutional Networks (ML-GCN) [3], Multi-Label Learning with Global and Local Label Correlations (GLOCAL) [41], Partial Multi-Label Learning with Noisy Label Identification (PML-NI) [28], Multi-Label Manifold Learning (ML2) [12] and Learning Deep Latent Spaces for Multi-Label Classification (C2AE) [31]. We also report the results of the baseline algorithm, Binary Relevance (BR) [1]. In addition, we include an ablation experiment to verify the noise robustness of our method. The ablated model, Classifier based on Hypergraph Neural Networks (CHNN), is obtained from RFHNN by omitting the Fused Hypergraph Neural Networks module (the module in the blue dashed box, Section 3.1) and the Hypergraph Construction Noise Correction module (the module in the red dashed box, Section 3.3).
Since there are currently no ready-made datasets with label noise for evaluating RFHNN, we construct such datasets ourselves to verify the effectiveness of our proposed method [14, 32]. We select two commonly used natural datasets, i.e., IAPRTC-12 and ESPGame, which have been divided into training and test sets in advance. For each dataset, we simulate false-positive noise and false-negative noise at the same time: we randomly add some labels to the label matrix (randomly setting negative labels (0) to positive (1)) to simulate false-positive noise, and we randomly remove some labels from the label matrix (randomly setting positive labels (1) to negative (0)) [32] to simulate false-negative noise. Specifically, the amount of noise is set as a proportion of the total number of labels. In almost all commonly used multi-label datasets, the number of relevant labels for each instance is much smaller than the number of irrelevant labels [19]. In the experiments, the proportion of false-positive noise is 10% or 20%, and the proportion of false-negative noise is 10% or 30%. Combining them in pairs gives four noise settings, with total noise of 20%, 30%, 40% and 50%. The datasets we construct are close to actual scenarios and can effectively verify the performance of our proposed method. For the image datasets, we extract 4096-dimensional deep features with ResNet-101 pre-trained on ImageNet, and we do not fine-tune the network, for fairness and computational efficiency.
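The noise simulation can be sketched as follows. This is a minimal illustration that interprets both noise proportions relative to the number of positive entries in the clean label matrix; that interpretation, the function name and the seeding are assumptions and not a description of the exact procedure used in the experiments.

```python
import random
from typing import List, Tuple

# The four (false-positive, false-negative) settings used in the experiments.
NOISE_SETTINGS: List[Tuple[float, float]] = [
    (fp, fn) for fp in (0.1, 0.2) for fn in (0.1, 0.3)
]


def inject_label_noise(Y: List[List[int]], fp_rate: float, fn_rate: float,
                       seed: int = 0) -> List[List[int]]:
    """Return a noisy copy of the 0/1 label matrix Y.

    Assumption: both rates are taken relative to the number of positive
    entries in the clean matrix.
    """
    rng = random.Random(seed)
    positives = [(i, j) for i, row in enumerate(Y) for j, v in enumerate(row) if v == 1]
    negatives = [(i, j) for i, row in enumerate(Y) for j, v in enumerate(row) if v == 0]
    n_fp = min(round(fp_rate * len(positives)), len(negatives))
    n_fn = min(round(fn_rate * len(positives)), len(positives))

    noisy = [list(row) for row in Y]  # leave the clean matrix untouched
    for i, j in rng.sample(negatives, n_fp):
        noisy[i][j] = 1               # false-positive noise: 0 -> 1
    for i, j in rng.sample(positives, n_fn):
        noisy[i][j] = 0               # false-negative noise: 1 -> 0
    return noisy
```

Because the flipped positions are sampled from the clean positives and the clean negatives separately, an added label is never removed again, so the two noise types do not cancel each other.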
Table 6 reports the multi-label classification performance under noisy labels on the two datasets. The table is divided vertically into three parts, which correspond to the three evaluation metrics, i.e., Macro-F1, Micro-F1, and Average precision. The Noise column in the table refers to the proportion of noise in the training datasets. From the experimental results, we can draw the following observations:
Quantitative results on several datasets, where the best ones are in bold

Several multi-label image annotation examples on the IAPRTC-12 dataset. For each image, we show the ground-truth annotations and the labels predicted by the proposed RFHNN. The labels in black are those that match the ground-truth annotation. The blue labels denote the correctly predicted ones, while the red labels are those that are wrongly predicted. In addition, green labels denote labels that are correctly predicted but missing from the ground-truth annotations.
(1) The proposed RFHNN outperforms most of the state-of-the-art methods on the two datasets. For example, with noise = 20% on the ESPGame dataset, our method improves on the best baseline results by 4.33% (Micro-F1) and 3.41% (Macro-F1), which validates our theoretical results. The overall analysis of Table 6 shows that when the noise increases, the performance of the compared methods declines sharply. In contrast, our method maintains relatively acceptable performance, which reflects its high accuracy and strong noise robustness.
(2) An important characteristic of multi-label learning evaluation is that the results obtained under different evaluation metrics can differ considerably, so we select three representative evaluation metrics, which include both instance-based and label-based metrics. The experimental results show that the ranking of the methods differs across these metrics. For example, our proposed method works well on Micro-F1 and Macro-F1, while the GLOCAL method is particularly strong on Average precision. Our method achieves generally good results on all three metrics, and its weaker results remain within an acceptable range, which shows that our method is stable.
(3) Label correlations are a very important topic in multi-label learning. Most existing multi-label classification methods take them into consideration, but most of these methods only consider second-order correlations, which causes a loss of information. However, for datasets with noise in the label space, if a method is too sensitive to the label correlations, its performance may be even worse than that of a simple method that ignores label correlations. This means that when we deal with noisy datasets, an effective label correlation mining method must be combined with an effective anti-noise module. Otherwise, the label correlations will play a negative role.
(4) The comparison of RFHNN with CHNN shows that the high-order correlation mining module (Section 3.1) and the anti-noise module (Section 3.3) play important roles in dealing with noisy datasets. CHNN is a degenerate version of our model, but it still performs better than ML-GCN with the help of higher-order label correlations (Section 3.2). The advantage of RFHNN over ML-GCN is larger on IAPRTC-12 and ESPGame than on MS-COCO and VOC2007, because our method makes better use of the information in the label space, and IAPRTC-12 and ESPGame have more kinds of labels and therefore richer label space information.
To visually demonstrate the effectiveness of the proposed RFHNN, we present a case study in which RFHNN is applied to the IAPRTC-12 dataset. The annotation results of several example images are shown in Fig. 1. The proposed RFHNN correctly predicts most of the labels of these images, and it can even find labels that are missing from the ground truth. For example, our method tags the image in row 1 and column 4 with the missing label ‘building’. This result shows that our method can work well in real-world image annotation applications.
The above experimental results and analysis show that our proposed method is superior to the state-of-the-art methods in most cases, which demonstrates its effectiveness.
In this paper, we propose Robust Fused Hypergraph Neural Networks for Multi-Label Classification (RFHNN) to solve the problem of multi-label classification. RFHNN introduces a hypergraph construction method based on Apriori for multi-label datasets, which can mine robust high-order label correlations effectively. This method utilizes the hypergraph to exploit the high-order correlations of instances in the label space and develops the Hypergraph Neural Networks (HNNs) based on it. The resulting high-order structure improves the classification ability. Meanwhile, a novel hypergraph fusion method based on the complementarity between the feature space and the label space is proposed to design the Fused Hypergraph Neural Networks (FHNN). The high-order correlations of the instances in the feature space can effectively correct noise in the label space.
There are several interesting directions for future work. For example, we find that most of the existing high-order correlations are strongly directional. In future work, we will try to use directed hypergraphs to further enhance the effectiveness of our model.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant Nos. 61876087 and 62076135).
