Graph structure learning based on feature and label consistency

Abstract

Graph Neural Networks (GNNs) have achieved remarkable success in graph-related tasks by elegantly combining node features and graph topology. Most GNNs assume that networks are homophilous, which is not always true in the real world, e.g., for graphs with structure noise or disassortative graphs. Only a few works focus on generalizing GNNs to heterophilous or weakly homophilous networks, where connected nodes may have different labels. In this paper, we design a simple and effective Graph Structure Learning strategy based on Feature and Label consistency (GSLFL) that increases the homophily level of networks so that any existing GNN can be generalized to heterophilous networks. Specifically, we first introduce a method to learn the graph structure from node features and then modify the graph structure based on label consistency. Further, we combine GSLFL with three existing GNNs to learn node representations and the graph structure jointly, and we design a self-training method that iteratively trains the models and modifies the graph structure with pseudo-labels. Finally, our empirical results on 6 public networks with homophily or heterophily, and under structure attacks, show that our methods outperform state-of-the-art methods in most cases.

Keywords

Graph neural networks, graph structure learning, structure attacks, heterophily

1. Introduction

Graphs represent the relationships among objects and are ubiquitous in the real world, for example in social networks, traffic networks, and bioinformatics networks. Graph neural networks (GNNs) have shown advantages in learning node or graph representations for graph-related tasks such as node classification [1, 2, 3, 4] and graph classification [5, 6, 7]. Most GNNs integrate node features and graph structure by aggregating information from neighbors to update the node representations. One premise of this message propagation mechanism is the homophily assumption: connected nodes have the same labels or similar features. Homophily is a general principle in most real-world networks. But there are also networks with heterophily, such as disassortative graphs, in which connected nodes may have different labels and features. For example, different amino acid types often connect in protein structures, and most people prefer to communicate with people of the opposite gender in dating networks [8]. Meanwhile, networks may be noisy or incomplete due to inevitable errors during data collection.

Most GNNs cannot generalize to networks with heterophily, in which they may gather large amounts of noise from neighbor nodes. At the same time, recent studies have shown that small and barely noticeable perturbations of the graph structure can drastically reduce the performance of the strongest and most popular GNNs [9]. In Fig. 1, we show the node classification results of different models on two real-world datasets under various structure noise ratios (see Definition 1 in Section 3). The x-axis represents the structure noise ratio, and the y-axis represents the classification accuracy of the model. We change the structure noise ratio of a network by attacking the graph structure while keeping the number of nodes and edges unchanged (Algorithm 1 in Section 4.1). Higher structure noise means lower homophily. The details of the experiments can be found in the empirical evaluation section. The classification accuracy of all models decreases rapidly as the structure noise increases (i.e., as the level of homophily decreases), except for MLP, which only uses node features. The simple MLP even outperforms many popular GNNs when the structure noise is high.

Figure 1.

The node classification results of models with different structure noise ratios.

Recently, a few works have begun to generalize GNNs to networks with different levels of homophily, such as GEOM-GCN [10], H2GCN [11], FAGCN [12], and GPRGNN [13]. These GNN models are specifically designed for networks with heterophily. Ma et al. [14] argue that homophily may not be a necessity for GNNs and that GCN can achieve good performance on heterophilous networks provided that nodes with the same label share similar neighborhood patterns. But this condition does not always hold in the real world, so it remains necessary to explore how to generalize GNNs to networks with heterophily.

From Fig. 1, we also observe an interesting phenomenon: all the popular GNNs achieve nearly 100% classification accuracy when the structure noise decreases to zero, that is, when all connected nodes belong to the same class. We therefore ask whether it is possible to reduce the structure noise of a network to improve its homophily level and thus improve the performance of existing GNNs. Inspired by this, we propose a simple and efficient framework of Graph Structure Learning based on node Features and Label consistency (GSLFL), which reduces the structure noise and can be used as a plug-in with any existing GNN.

Different from these GNN models specifically designed for heterophilous networks, our method focuses on improving the robustness of existing GNN models from the perspective of graph structure learning, and it can deal with networks of different homophily levels. To the best of our knowledge, this is the first work that uses graph structure learning to address the performance degradation of most GNNs when strong homophily is not satisfied. Moreover, very few works consider label consistency in graph structure learning.

Specifically, similar to most graph structure learning methods, we first learn the graph structure based on node features or representations. The intuition is that nodes that are similar in feature space should be connected and belong to the same class, which may reduce the structure noise and improve the level of network homophily. However, node features may themselves be noisy, so the learned structure may introduce new structure noise. To reduce this noise, we design a novel strategy that modifies the learned graph structure based on label consistency, i.e., connected nodes should have the same labels. Further, we design a self-training method to iteratively train models and modify the graph structure. The proposed GSLFL is simple and effective. Extensive experimental results on standard datasets show that GSLFL combined with popular GNNs achieves state-of-the-art performance on networks with different homophily levels.

The main contributions of the paper are as follows:

We find an interesting phenomenon that any existing GNNs can achieve nearly 100% node classification accuracy on most standard datasets by modifying the graph structure based on the label consistency to improve the homophilous level of networks.

We propose a novel graph structure learning framework based on node features and label consistency, which can be plugged into any existing GNNs.

We design a self-training method to iteratively train models and modify the graph structure based on predicted pseudo-labels.

We run extensive experiments on various types of standard datasets with homophily and heterophily, and under structure attacks, which show that our proposed method, GSLFL, can improve the performance of existing GNNs and achieve state-of-the-art performance on the node classification task.

The organization of this paper is as follows. Section 2 reviews related works. Section 3 introduces the notation and basic concepts. Section 4 presents the Graph Structure Learning based on node Features and Label consistency and the self-training method. The experimental results and analysis are shown in Section 5. Finally, we present our conclusions and future work.

2. Related works

In line with the focus of our work, we briefly introduce related works on Graph Neural Networks. For more details of Graph Neural Networks, please refer to the surveys [15, 16, 17].

Spectral graph convolutions. Gori et al. [18] and Scarselli et al. [19] first proposed Graph Neural Networks to process graphs with deep-learning methods. Since then, many Graph Neural Networks have been proposed and have shown great success in graph-related tasks. They fall into two main families: spectral GNNs and spatial GNNs. Spectral GNNs learn node representations based on spectral graph theory. Bruna et al. [20] design a convolution operation based on the eigendecomposition of the graph Laplacian. ChebNet [21] approximates the convolution operation by Chebyshev polynomials to reduce the time complexity. Further, Graph Convolutional Network (GCN) [1] restricts the spectral graph convolution to directly connected nodes. After that, many variants of GCN have been proposed, such as DGCN [22], DeepGCNs [23], AM-GCN [24], and GCNII [25]. However, spectral graph convolution networks are only suitable for transductive tasks and cannot generalize to unobserved graphs.

Spatial graph convolutions. To generalize GNNs to inductive tasks, some researchers design convolutions from the perspective of the spatial domain, similar to CNNs. GraphSAGE [3] learns node representations by sampling and aggregating features from the local neighbors of the central node. Graph Attention Network (GAT) [2] introduces an attention-based method to evaluate the contributions of different neighbor nodes to the target node, and then aggregates the information from the neighbors based on the attention coefficients. DisenGCN [26] devises a neighborhood routing mechanism, which dynamically identifies the latent factors that help assign the neighbors to different channels for extracting specific features. FastGCN [27] views graph convolutions as integral transforms of embedding functions under probability measures and performs importance sampling in each layer.

GNNs for different homophily levels. All the GNNs mentioned above are based on the assumption of networks with strong homophily. However, many real-world networks, such as dating networks, do not meet this assumption. To generalize GNNs to networks with heterophily, GEOM-GCN [10] introduces a novel geometric aggregation method that includes three modules: node embedding, structural neighborhood, and bi-level aggregation. H2GCN [11] learns node representations for networks with heterophily based on three techniques: separation of the representations of the target node and its neighbors, higher-order neighbor nodes, and a combination of intermediate representations. Recently, FAGCN [12] explores the node features from the angle of frequency signals and devises a novel self-gating mechanism that adaptively aggregates low-frequency and high-frequency signals during message passing, which is suitable for networks with different homophily levels. GPRGNN [13] learns the Generalized PageRank weights adaptively so as to jointly extract node features and structural information, regardless of whether the network is homophilous or heterophilous. All of these models emphasize the preservation of the original node features and the adaptive aggregation of neighbors' information.

Graph structure learning methods. Most of the above GNNs can only be used when a graph structure is available. But existing graph structures are often noisy or incomplete, and many datasets have no graph structure at all. To this end, many works propose to learn the graph structure from data. Henaff et al. [28] propose a general scheme to construct the graph structure via a Gaussian diffusion kernel on Euclidean distance. GLCN [29] combines graph learning and graph convolution into a unified network architecture to learn the graph structure that is most suitable for GNNs on semi-supervised tasks. AGCN [30] designs a learning method to measure the similarity between nodes and combines the learned structure with the original graph Laplacian matrix. LDS [31] jointly learns the graph structure and GCN by approximately solving a bi-level program. IDGL [32] devises an end-to-end graph learning method to iteratively learn the graph structure and node representations. In addition, IDGL controls the quality of the learned graph structure via graph regularization, such as smoothness, connectivity, and sparsity. SimP-GCN [33] preserves node similarity and learns the graph structure together. Pro-GNN [34] jointly learns a graph structure and a graph neural network model to defend against adversarial attacks based on some intrinsic properties of real-world graphs, e.g., low rank, sparsity, and feature smoothness. However, all these methods only use the node features or representations for learning the graph structure. In this paper, we propose to learn the graph structure based on node features and label consistency for generalizing GNNs to networks with different homophily levels.

3. Preliminaries

In this section, we introduce some basic concepts and notations used in this paper. Let $G=(V,E)$ be an undirected and unweighted graph, where $V$ is the set of nodes and $E$ is the set of edges. $X\in R^{n*d}$ denotes the node features, where $n$ is the number of nodes and $d$ is the dimension of the node features. Let $A\in R^{n*n}$ be the adjacency matrix, where $A_{ij}=1$ if there is an edge between node $i$ and node $j$ , and $A_{ij}=0$ otherwise. The diagonal degree matrix is denoted as $D\in R^{n*n}$ , where $D_{ii}=\sum_{j}{A_{ij}}$ . $Y\in R^{n*C}$ denotes the labels of nodes, where $C$ is the number of classes. For node classification, $m<n$ nodes have labels $Y_{L}$ and the labels $Y_{U}$ of the remaining $n-m$ nodes are missing. The task of node classification is to learn a classifier $f:(A,X,Y_{L})\to Y_{U}$ to infer the labels of the unlabeled nodes.

Most Graph Neural Networks learn node representations via the following message propagation mechanism:

$\displaystyle h_{v}^{k}=f(h_{v}^{k-1},g\{h_{u}^{k-1}:u\in N(v)\});\quad h_{v}^{0}=x_{v}$ (1)

where $f$ represents the update function and $g$ is the aggregate function that gathers the information from the neighbor nodes. $N(v)$ is the set of neighbors of node $v$ , and $k$ is the layer index of the model. After learning the node representations, the predicted labels can be denoted as:

$\displaystyle y_{v}=\textit{argmax}\{\textit{softmax}(h_{v}^{K}W)\}$ (2)

where $W$ denotes the learnable parameters and $K$ is the number of GNN layers.

Following recent works [10, 11, 12], we focus on homophily in class labels. We first define the structure noise ratio, which is the complement of the edge homophily ratio defined in [11].

Definition 1. (structure noise ratio) $\textit{SNR}=\frac{|\{(u,v):y_{u}\neq y_{v}\wedge(u,v)\in E\}|}{|E|}$ is the fraction of edges in a graph $G$ whose endpoints have different labels.

A graph has strong homophily when the structure noise ratio is small ( $\textit{SNR}\rightarrow 0$ ), and a heterophilous graph has a high structure noise ratio ( $\textit{SNR}\rightarrow 1$ ).
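Definition 1 can be checked numerically with a few lines of code. The following is a minimal sketch, assuming the graph is given as a list of undirected edges (each listed once) and an integer label per node; the function name `structure_noise_ratio` and the toy graph are illustrative and not part of the original work.

```python
import numpy as np

def structure_noise_ratio(edges, labels):
    """Fraction of edges whose two endpoints carry different labels (Definition 1)."""
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    if len(edges) == 0:
        raise ValueError("the graph has no edges")
    src, dst = edges[:, 0], edges[:, 1]
    return float(np.mean(labels[src] != labels[dst]))

# Hypothetical toy graph: 4 nodes, 4 undirected edges, two classes.
toy_edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
toy_labels = [0, 0, 1, 1]
snr = structure_noise_ratio(toy_edges, toy_labels)  # edges (1,2) and (0,3) are inter-class
```

In this toy graph two of the four edges connect nodes of different classes, so the ratio is 0.5; the edge homophily ratio of the same graph is one minus that value.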

4. Methodology

In this section, we first introduce a method to change the structure noise, then discuss the phenomenon that most GNN models achieve almost 100% node classification accuracy when the structure noise ratio is zero, and give an intuitive explanation. Inspired by this, we propose a structure learning strategy to reduce the structure noise of networks and improve their homophily level. Then, we combine our method, Graph Structure Learning based on Feature and Label consistency (GSLFL), with three graph neural networks: GCN, GAT, and GPRGNN. Further, we propose a self-training method to train models and modify the graph structure. Finally, we analyze our method by comparing it with similar works.

4.1 The method of changing structure noise

In this paper, we define the structure noise ratio as the fraction of edges that connect nodes with different labels (see Definition 1 in Section 3). We introduce a scheme to change the structure noise ratio of real-world networks, shown in Algorithm 1, which is adapted from [35] with modifications. The inputs of Algorithm 1 are the adjacency matrix $A$ , the node labels $L$ , and the target structure noise ratio SNR. We first compute the original structure noise ratio (lines 1–9, called in line 11) and then the number of edges that need to be changed (line 12). If the original structure noise ratio is higher than SNR, we add intra-class edges and remove inter-class edges (lines 13–17); otherwise, we delete intra-class edges and add inter-class edges (lines 18–22). We take the first case as an example. We randomly select num edges whose endpoints have different labels as the bad edge set. For each bad edge $A_{ij}$ , we randomly select node $i$ or node $j$ . Suppose we select node $i$ ; we then randomly select another node $k$ that is not connected to node $i$ and has the same label as node $i$ , and set $A_{ij}=0$ , $A_{ji}=0$ , $A_{ik}=1$ , $A_{ki}=1$ . Finally, we return the corrected adjacency matrix $A$ . Note that Algorithm 1 modifies the graph structure while keeping the number of nodes and edges unchanged.

Algorithm 1.
Input: the adjacency matrix $A$ , node labels $L$ , structure noise ratio SNR
1: Function SNR_noise ( $A, L$ )
2:   num_noise_edges $=$ 0
3:   for connected node $i$ and node $j$ ( $A_{ij}=1$ )
4:     if $L_{i}\neq L_{j}$
5:       num_noise_edges $++$
6:     end if
7:   end for
8:   return $\frac{\textit{num\_noise\_edges}}{|E|}$
9: end function
10: Function change_SNR ( $A,L,\textit{SNR}$ )
11:   original_rate $=$ SNR_noise ( $A, L$ )
12:   num $=$ $|\textit{original\_rate}-\textit{SNR}|*|E|$
13:   if $\textit{original\_rate}>\textit{SNR}$
14:     bad_edges $=$ randomly select num edges that connect differently labeled nodes
15:     for $A_{ij}$ in bad_edges
16:       randomly select node $i$ or $j$ ; suppose the selected node is $i$ , randomly select another node $k$ that has the same label as $i$ and $A_{ik}\neq 1$ , let $A_{ik}=A_{ki}=1$ , $A_{ij}=A_{ji}=0$
17:     end for
18:   else
19:     good_edges $=$ randomly select num edges that connect nodes with the same label
20:     for $A_{ij}$ in good_edges
21:       randomly select node $i$ or $j$ ; suppose the selected node is $i$ , randomly select another node $k$ that has a different label from $i$ and $A_{ik}\neq 1$ , let $A_{ik}=A_{ki}=1$ , $A_{ij}=A_{ji}=0$
22:     end for
23:   end if
24:   return corrected adjacency matrix $A$
25: end function
26: output: $A$ $=$ change_SNR ( $A,L,\textit{SNR}$ )
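The rewiring procedure of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: it assumes a dense symmetric 0/1 NumPy adjacency matrix without self-loops, counts each undirected edge once, and silently skips an edge when the chosen endpoint has no valid new partner (a case the pseudocode does not address). The `seed` argument and the toy graph are assumptions added for reproducibility.

```python
import numpy as np

def snr_noise(A, L):
    """Structure noise ratio of a symmetric 0/1 adjacency matrix (each edge counted once)."""
    L = np.asarray(L)
    i, j = np.triu_indices_from(A, k=1)
    is_edge = A[i, j] == 1
    return float(np.mean(L[i[is_edge]] != L[j[is_edge]]))

def change_snr(A, L, snr, seed=0):
    """Rewire edges so the noise ratio moves toward `snr`; the edge count is preserved."""
    rng = np.random.default_rng(seed)
    A, L = A.copy(), np.asarray(L)
    iu, ju = np.triu_indices_from(A, k=1)
    is_edge = A[iu, ju] == 1
    iu, ju = iu[is_edge], ju[is_edge]
    noisy = L[iu] != L[ju]
    original = noisy.mean()
    num = int(round(abs(original - snr) * len(iu)))
    lower = original > snr  # True: replace inter-class edges with intra-class ones
    pool = np.flatnonzero(noisy if lower else ~noisy)
    for e in rng.permutation(pool)[:num]:
        # Randomly keep one endpoint i of the selected edge (i, j).
        i, j = (iu[e], ju[e]) if rng.random() < 0.5 else (ju[e], iu[e])
        same = L == L[i]
        cand = np.flatnonzero((same if lower else ~same) & (A[i] == 0))
        cand = cand[cand != i]
        if len(cand) == 0:
            continue  # assumption: skip when no valid partner exists
        k = rng.choice(cand)
        A[i, j] = A[j, i] = 0
        A[i, k] = A[k, i] = 1
    return A

# Hypothetical toy graph: 10 nodes in two classes, 3 inter-class edges and 1 intra-class edge.
labels = np.array([0] * 5 + [1] * 5)
A0 = np.zeros((10, 10), dtype=int)
for u, v in [(0, 5), (1, 6), (2, 7), (0, 1)]:
    A0[u, v] = A0[v, u] = 1
A_clean = change_snr(A0, labels, snr=0.0)
A_noisy = change_snr(A0, labels, snr=1.0)
```

Because every rewiring step removes one existing edge and adds one previously absent edge, the number of edges is unchanged, matching the remark in the text.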

Figure 2.

The node classification results of models on real-world networks with $\textit{SNR}=0$ .

Real-world networks have different structure noise ratios, as shown in Table 1. The structure noise ratio is low on homophilous networks and high on heterophilous networks. Using Algorithm 1, we can set different structure noise ratios for a real-world network and thereby change its homophily level. When the structure noise ratio is zero, every node is connected only to nodes with the same label, there is no structure noise, and the network is strongly homophilous. We set the structure noise ratio of the real-world networks to zero, and then run the models to classify nodes on the original graph structure and on the modified graph structure, respectively. The experimental results are shown in Fig. 2. The x-axis represents the datasets, and the y-axis represents the classification accuracy of the model. The node classification accuracy of all models on the modified graph structure is much higher than on the original graph structure for all datasets (both homophilous and heterophilous graphs). Moreover, the node classification accuracy of all models on the modified graph is almost 100% on most datasets.

We give an intuitive explanation for this phenomenon from the perspective of label propagation. Assume that there is no structure noise in the network, that is, all connected nodes have the same labels. We run label propagation on the network, in which there are no unlabeled isolated nodes and at least one node in each connected subgraph is labeled. In this case, every unlabeled node can only receive label information from nodes of its own category and never from nodes of other categories. Consequently, all nodes obtain the correct label information, and label propagation achieves 100% node classification accuracy. Dong et al. [35] recently proved that decoupled GCNs (such as APPNP) are essentially equivalent to a two-step label propagation. In the above experiments, we use many training and validation samples, so there are very few unlabeled isolated nodes and very few connected subgraphs without labeled nodes. Therefore GCN, GAT, and APPNP achieve almost 100% classification accuracy on the modified graph structure. Inspired by this, we explore learning a clean graph structure to decrease the structure noise ratio.
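The label propagation argument can be illustrated on a small example. The sketch below uses a simple clamped, row-normalized label propagation, which is one common variant and not necessarily the exact procedure analyzed in [35]; the function name, the iteration count, and the toy graph are assumptions. The graph has zero structure noise and one labeled node per connected component.

```python
import numpy as np

def label_propagation(A, labels, labeled_idx, n_classes, n_iter=50):
    """Propagate the labels of `labeled_idx` over the graph with adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    labeled_idx = np.asarray(labeled_idx)
    Y0 = np.zeros((A.shape[0], n_classes))
    Y0[labeled_idx, labels[labeled_idx]] = 1.0   # one-hot seeds; other labels are never read
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.maximum(deg, 1.0)                 # row-normalized adjacency
    Y = Y0.copy()
    for _ in range(n_iter):
        Y = P @ Y                                # average the neighbors' scores
        Y[labeled_idx] = Y0[labeled_idx]         # clamp the known labels
    return Y.argmax(axis=1)

# Hypothetical graph with SNR = 0: path 0-1-2 (class 0) and path 3-4-5 (class 1).
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (3, 4), (4, 5)]:
    A[u, v] = A[v, u] = 1
true_labels = np.array([0, 0, 0, 1, 1, 1])
pred = label_propagation(A, true_labels, labeled_idx=[0, 5], n_classes=2)
accuracy = float((pred == true_labels).mean())
```

Since no edge crosses a class boundary, the score of the wrong class stays exactly zero for every node, so each unlabeled node is assigned the label of the seed in its own component.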

4.2 Graph structure learning based on feature and label consistency

It is known that deep neural networks are vulnerable to noise and errors. Most GNN methods are highly sensitive to the quality of the graph structure and require a perfect graph structure to learn node representations. But most real-world graph structures are noisy, and the performance of GNN models degrades significantly when the graph structure is perturbed, as shown in Fig. 1. This poses a great challenge for applying GNNs to practical problems, especially in risk-critical scenarios such as medical analysis. Recently, many researchers have begun to focus on graph structure learning, aiming to jointly learn the optimal graph structure and the GNN model. Most existing graph structure learning methods cast the problem as metric learning based on the distance between nodes or the similarity of features. To the best of our knowledge, very little work considers label consistency in graph structure learning. We are the first to study graph structure learning from the perspective of improving the homophily level of networks while jointly using node features and label consistency.

Figure 3.

The framework of graph structure learning based on feature and label consistency.

Existing metric learning methods in graph structure learning usually rely on a kernel function [28], an attention mechanism [29], or cosine similarity [32]. These methods are learnable and operate on the node features or on the representations in a hidden layer. The intuition is that nodes close to each other in the feature space should be connected in the graph structure. Inspired by this, we propose a simple and effective Graph Structure Learning framework based on Feature and Label consistency (GSLFL), as shown in Fig. 3. Our method includes three modules: a structure learning module, a structure modification module, and a GNN learning module. We first learn the similarity between nodes in the feature space and build a sparse graph structure based on a threshold or KNN. The learned graph structure is then combined with the original graph structure to form a new graph structure. However, both the learned graph structure and the original graph structure may contain structure noise. Therefore, we further propose a graph structure modification scheme, which uses the labels of the nodes in the training and validation sets to reduce the structure noise ratio of the network and improve its homophily level. Finally, GSLFL can be combined with any existing graph neural network for various graph-related downstream tasks.

In the structure learning module, we can use any existing metric learning methods to learn the similarity between nodes. The similarity between node $i$ and node $j$ can be denoted as follows:

$\displaystyle S_{ij}=M(h_{i},h_{j}).$ (3)

where $h_{i}$ and $h_{j}$ are the representations of node $i$ and $j$ , respectively, initialized with the node features $x_{i}$ , $x_{j}$ . $M$ is the metric function, for example, cosine similarity:

$\displaystyle S_{ij}=\cos(h_{i},h_{j}).$ (4)

Where $W_{p}$ represents the learned parameters and $m$ is an integer. Then we normalize the similarity matrix and learn a sparse matrix based on the threshold or KNN. For example, the threshold method:

$\displaystyle S_{ij}=\begin{cases}1,&\text{if }S_{ij}>t\\ 0,&\text{otherwise}\end{cases}$ (5)

where $t$ is a threshold. The similarity matrix $S$ can either be learned dynamically before each layer of the GNN model, so that it adapts to the downstream task, or only before the first layer. To reduce the time complexity of computing the similarity between every pair of nodes, we only compute the similarity between a target node and its $k$ -order neighbors (for example, $k=3$ ). The learned graph structure and the original graph structure (if one exists) are combined to form a new graph structure:

$\displaystyle A=A+S.$ (6)
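The structure learning steps in Eqs. (4) to (6) can be sketched as follows. This is a minimal sketch under stated assumptions: it uses plain (unweighted) cosine similarity on all node pairs, skips the separate normalization step, removes self-loops from the learned graph, and clips $A+S$ back to 0/1 so that the result remains an unweighted adjacency matrix. None of these choices is confirmed by the text, and the function name `learn_structure` is hypothetical.

```python
import numpy as np

def learn_structure(H, A=None, t=0.8):
    """Threshold cosine similarities of H (Eqs. 4-5) and merge with A (Eq. 6)."""
    H = np.asarray(H, dtype=float)
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    Hn = H / np.maximum(norms, 1e-12)   # unit-length rows; guards against zero vectors
    S = Hn @ Hn.T                       # pairwise cosine similarity, Eq. (4)
    S = (S > t).astype(int)             # sparsify with threshold t, Eq. (5)
    np.fill_diagonal(S, 0)              # assumption: no self-loops in the learned graph
    if A is None:
        return S                        # no original structure available
    return np.clip(np.asarray(A) + S, 0, 1)  # Eq. (6); clipping to 0/1 is an assumption

# Hypothetical features: nodes 0 and 1 are similar, node 2 is different.
H = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
A_orig = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]])  # original edge (1, 2)
A_new = learn_structure(H, A_orig, t=0.8)
```

In this example the learned graph contributes the edge (0, 1), while the original edge (1, 2) is kept, so the combined structure contains both.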

Then the structure modification module is used to delete inter-class edges and add intra-class edges based on the label consistency of nodes. Refer to Algorithm 1 for the detailed processes. Note that we only modify the connection relationships between local subgraphs (training set and validation set) and set $\textit{SNR}=0$ .

Finally, we can run any existing GNN models on the modified graph structure to learn the representations of nodes or graphs and solve the downstream tasks. For example, we use GCN for node classification.

$\displaystyle Z=f(X,A)=\textit{softmax}(\hat{A}\,\textit{ReLU}(\hat{A}XW^{0})W^{1}),$ (7)

where $\hat{A}=D^{-\frac{1}{2}}\tilde{A}D^{-\frac{1}{2}}$ , $\tilde{A}=A+I$ , $Z\in R^{n*C}$ . In this paper, we use the cross-entropy loss with a regularization term as the training loss function:

$\displaystyle\mathcal{L}=-\sum_{l\in Y_{L}}\sum_{c=1}^{C}Y_{l}^{c}\ln Z_{l}^{c},$ (8)
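For concreteness, the forward pass of Eq. (7) and the cross-entropy term of Eq. (8) can be written in NumPy as below. This is a sketch, not the training code used in the experiments: the weights are random rather than learned, the regularization term is omitted, and the degree matrix is computed from $A+I$ as in the standard GCN renormalization, which is an assumption about the intended $D$.

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization of A + I; degrees are taken from A + I (assumption)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(x):
    x = x - x.max(axis=1, keepdims=True)   # subtract the row maximum for stability
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A, X, W0, W1):
    """Two-layer GCN of Eq. (7): softmax(A_hat ReLU(A_hat X W0) W1)."""
    A_hat = normalize_adj(A)
    hidden = np.maximum(A_hat @ X @ W0, 0.0)
    return softmax(A_hat @ hidden @ W1)

def cross_entropy(Z, Y_onehot, labeled_idx):
    """Eq. (8): cross-entropy summed over the labeled nodes only."""
    return float(-np.sum(Y_onehot[labeled_idx] * np.log(Z[labeled_idx] + 1e-12)))

# Hypothetical 3-node path graph with random features and weights.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = rng.normal(size=(3, 4))
W0, W1 = rng.normal(size=(4, 5)), rng.normal(size=(5, 2))
Y = np.eye(2)[[0, 1, 0]]                 # one-hot labels for the three nodes
Z = gcn_forward(A, X, W0, W1)
loss = cross_entropy(Z, Y, labeled_idx=[0, 1])
```

Each row of `Z` is a probability distribution over the classes, and only the rows of labeled nodes contribute to the loss.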

However, in the real world, labeled samples are rare and the training and validation sets are small, which means that only a few inter-class edges can be deleted and only a few intra-class edges added based on the labeled nodes. To solve this problem, we propose a self-training method that enlarges the training set based on the predictions of the model. Specifically, we obtain pseudo-labels for the test set based on a threshold $\lambda$ :

$\displaystyle Y_{P}=\{c|i\in V_{\textit{test}},Z_{i}^{c}>\lambda\},$ (9)

where $V_{\textit{test}}$ is test set and $Z_{i}^{c}$ is the probability that node $i$ belongs to category $c$ . We add the pseudo-labels $Y_{P}$ into training set for modifying the graph structure and training models based on the new labeled nodes:

$\displaystyle Y_{L}=Y_{L}\cup Y_{P}.$ (10)

We iteratively train the model and modify the graph structure with pseudo-labels over several stages until the performance of the model no longer improves.
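The pseudo-label selection of Eq. (9) amounts to keeping the test nodes whose highest predicted class probability exceeds $\lambda$. A minimal sketch, assuming `Z` holds the softmax outputs for all nodes; the function name and the toy probabilities are illustrative.

```python
import numpy as np

def select_pseudo_labels(Z, test_idx, lam=0.9):
    """Return the test nodes whose top class probability exceeds lam, with their classes."""
    test_idx = np.asarray(test_idx)
    confidence = Z[test_idx].max(axis=1)
    predicted = Z[test_idx].argmax(axis=1)
    keep = confidence > lam
    return test_idx[keep], predicted[keep]

# Hypothetical predictions for 4 nodes and 2 classes; nodes 1-3 form the test set.
Z = np.array([[0.95, 0.05],
              [0.60, 0.40],
              [0.10, 0.90],
              [0.50, 0.50]])
nodes, pseudo = select_pseudo_labels(Z, test_idx=[1, 2, 3], lam=0.8)
```

Only node 2 passes the threshold here and receives pseudo-label 1. In each self-training stage the selected nodes would be added to the labeled set before the graph structure is modified and the model is retrained.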

Discussions: Time complexity analysis. The time complexity of the structure learning module is $O(\textit{ave\_deg}^{k}*n*d)$ , where ave_deg is the average node degree, $n$ is the number of nodes, $d$ is the dimension of $X$ , and $k=3$ in this paper. The structure modification module requires $O(|E|)$ time, where $|E|$ is the number of edges. Therefore, the additional time complexity of our method compared with the underlying GNN model is $O(\textit{ave\_deg}^{k}*n*d+|E|)$ .
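The $\textit{ave\_deg}^{k}$ factor comes from restricting the similarity computation to nodes that are close to the target node in the graph. The sketch below enumerates these candidates with a breadth-first search, assuming that "$k$ -order neighbors" means all nodes within $k$ hops and that the graph is stored as an adjacency list; both are interpretations rather than details given in the text.

```python
from collections import deque

def k_hop_neighbors(adj_list, source, k=3):
    """Return all nodes within k hops of `source` (excluding `source`), via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue                      # do not expand beyond k hops
        for v in adj_list[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]
    return sorted(dist)

# Hypothetical path graph 0-1-2-3-4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
candidates = k_hop_neighbors(adj, source=0, k=3)
```

Each target node then needs one $d$-dimensional similarity computation per candidate, and the number of candidates grows roughly as $\textit{ave\_deg}^{k}$, which gives the stated bound.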

An intuitive explanation for the effectiveness of the structure modification module is as follows. We first denote the graph structure as:

$\displaystyle A=\left[\begin{array}{cc}A_{LL}&A_{LU}\\ A_{UL}&A_{UU}\end{array}\right],$ (11)

where $A_{LL}$ represents the subgraph structure formed by the nodes with known labels (training set and validation set), $A_{UU}$ represents the subgraph formed by the test nodes, and $A_{LU}$ and $A_{UL}$ represent the edges between labeled nodes and test nodes. We modify the structure in $A_{LL}$ by reducing its structure noise ratio to zero, which improves the homophily level of the subgraph $A_{LL}$ and of the whole network. The label information propagated within $A_{LL}$ is therefore free of noise, which may promote the propagation of this clean information through $A_{LU}$ and $A_{UL}$ and improve the performance of the models. In the experiments, we found that on some datasets the performance of the models on the training and validation sets is far better than on the test set, which means the models may be overfitting. Nevertheless, the proposed GSLFL still greatly improves the performance of existing GNN models on all datasets. How to enhance the generalization ability of the models and alleviate overfitting is left to future work.

Comparison with similar works. LC-GNN [36] designs a label-consistency graph neural network, which penalizes the inconsistency between the predicted label distribution and the real label distribution in the loss function. In contrast, GSLFL learns a graph structure that enhances the level of network homophily based on node features and label consistency. Chen et al. [37] propose two methods, MADReg and AdaEdge, to alleviate the over-smoothing issue (the performance of GNNs degrades greatly when many layers are stacked). AdaEdge optimizes the graph structure by adding intra-class edges and removing inter-class edges, which is most similar to our method. AdaEdge optimizes the topology based on the model predictions and uses two hyperparameters to control the number of added or removed edges. In contrast, we first learn the graph structure based on node features and then modify the topology based on the labels in the training and validation sets, and our scheme of adding or removing edges differs from that of AdaEdge. Moreover, our motivation is to improve the homophily level of networks so that any existing GNN can be generalized to heterophilous networks, whereas AdaEdge aims to alleviate over-smoothing. Recently, GAUG [38] proposes to use a graph auto-encoder as an edge predictor module and adds or removes edges based on the predicted edge scores. However, all of the above methods change the number of edges through a hyperparameter, whereas our method keeps the number of edges unchanged without any hyperparameter.

5. Experiments

In this section, we report node classification results on 6 standard datasets and demonstrate the effectiveness of our GSLFL method. We first introduce the datasets and experimental setup, and then integrate our method into three existing methods, GCN, GAT, and GPRGNN, to obtain the improved models GCN $+$ , GAT $+$ , and GPRGNN $+$ . We show the node classification results under two kinds of data splits and the performance of the models under structure attacks. Finally, we present the ablation study.

5.1 Datasets and experimental setup

Datasets. We use 6 public datasets, including three citation networks (Cora, Citeseer, Pubmed),¹ an actor co-occurrence network (Actor), and two Wikipedia networks (Chameleon, Squirrel). The detailed statistics of the datasets are shown in Table 1. The table shows that the structure noise ratio of the three citation datasets is low, so they are homophilous networks, while the other datasets have high structure noise ratios and are heterophilous networks. We compare the improved methods with the following baselines: MLP; Node2Vector [39]; the popular GNN models ChebNet [21], GCN [1], GAT [2], and APPNP [40]; the graph structure learning methods Pro-GNN [34], SimP-GCN [33], and GAUG-M+GCN [38]; and the GNN models for heterophilous networks Geom-GCN [10], H2GCN [11], FAGCN [12], GPRGNN [13], and MLP+GCN [14].

¹ https://github.com/kimiyoung/planetoid/tree/master/data.

Table 1

Statistics of the datasets

Datasets	Nodes	Edges	Classes	Feature	Noise ratio
Cora	2708	5429	7	1433	0.1900
Citeseer	3327	4732	6	3707	0.2638
Pubmed	19717	44338	3	500	0.1976
Chameleon	2277	36101	5	2325	0.7695
Actor	7600	33544	5	931	0.7807
Squirrel	5201	217073	5	2089	0.7776
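The noise ratio column of Table 1 can be illustrated with a short sketch. It assumes that the structure noise ratio is the fraction of edges whose endpoints have different labels (its formal definition lies outside this section); the function name and the toy graph are hypothetical.

```python
import numpy as np

def structure_noise_ratio(edges, labels):
    """Fraction of edges whose two endpoints carry different labels.

    Assumption: this is the meaning of 'noise ratio' in Table 1.
    """
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    if len(edges) == 0:
        return 0.0
    src, dst = edges[:, 0], edges[:, 1]
    return float(np.mean(labels[src] != labels[dst]))

# Toy graph (hypothetical): 4 nodes, 2 classes, 4 edges.
toy_labels = [0, 0, 1, 1]
toy_edges = [(0, 1), (2, 3), (1, 2), (0, 3)]  # two intra-class, two inter-class
toy_ratio = structure_noise_ratio(toy_edges, toy_labels)
print(toy_ratio)
```

Under this reading, a ratio near 0 corresponds to a strongly homophilous network and a ratio near 1 to a strongly heterophilous one.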

For Node2Vector, ChebNet, GCN, GAT, and APPNP, we use the implementations provided by PyTorch Geometric.²

² https://github.com/rusty1s/pytorch_geometric.

The source codes of the other methods are those released with the original papers. We run all models on the node classification task under the transductive setting. We use two random data splits to obtain the training/validation/test sets: a dense split [10, 13] (48%/32%/20%) and a sparse split (10%/10%/80%). For all models, we run each experiment with 10 random splits and different initializations on all datasets, except that the results of Geom-GCN are taken from the original paper.
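The two data splits can be sketched as follows. This is a minimal illustration that assumes a uniformly random permutation of node indices; the cited works may instead split per class, and the function name and seeds are hypothetical.

```python
import numpy as np

def random_split(num_nodes, train_frac, val_frac, seed):
    """Randomly partition node indices into train/validation/test sets.

    The test set receives all nodes left after the train and validation sets.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)
    n_train = int(round(train_frac * num_nodes))
    n_val = int(round(val_frac * num_nodes))
    train_idx = perm[:n_train]
    val_idx = perm[n_train:n_train + n_val]
    test_idx = perm[n_train + n_val:]
    return train_idx, val_idx, test_idx

# Ten random splits per setting, here for a graph with 2708 nodes (Cora).
dense_splits = [random_split(2708, 0.48, 0.32, seed) for seed in range(10)]
sparse_splits = [random_split(2708, 0.10, 0.10, seed) for seed in range(10)]
```

Each model would then be trained once per split with a fresh initialization, and the mean and standard deviation of the test accuracy reported.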

For all compared methods, we follow the original setup of the authors. For the three improved models, GCN $+$ , GAT $+$ , and GPRGNN $+$ , we use the same parameters as for GCN, GAT, and GPRGNN, respectively. Specifically, we set the maximum number of epochs to 1000, the learning rate to 0.01, the weight decay to 0.0005, the early-stopping patience to 200, the dimension of the hidden layer to 64, and the dropout rate to 0.5. GCN, GAT, GCN $+$ , and GAT $+$ all use two-layer networks, and the first layer of GAT and GAT $+$ uses 8 attention heads. For GPRGNN and GPRGNN $+$ , we set the number of iterations to 10 and $\alpha=0.1$ . In the structure learning module, we use cosine similarity to build the graph, with $t\in\{0.7,0.8,0.9\}$ in Eq. (5) and $\lambda\in\{0.8,0.9\}$ in Eq. (9).
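As an illustration of building a graph from cosine similarity, the sketch below connects two nodes when the cosine similarity of their feature vectors is at least $t$. Eq. (5) is not reproduced in this section, so treating $t$ as a hard similarity threshold is an assumption, and the function name and toy features are hypothetical.

```python
import numpy as np

def cosine_similarity_graph(features, t):
    """Return a binary adjacency matrix with an edge where cosine similarity >= t.

    Assumption: t acts as a hard threshold on pairwise cosine similarity.
    """
    x = np.asarray(features, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.clip(norms, 1e-12, None)   # guard against all-zero feature rows
    sim = x @ x.T                          # pairwise cosine similarity
    adj = (sim >= t).astype(int)
    np.fill_diagonal(adj, 0)               # no self-loops
    return adj

# Toy features (hypothetical): nodes 0 and 1 are nearly parallel, node 2 is not.
toy_features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
toy_adj = cosine_similarity_graph(toy_features, t=0.9)
print(toy_adj)
```

A larger $t$ keeps only pairs with very similar features and therefore yields a sparser learned graph.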

5.2 Results of node classification under dense and sparse splits

In this section, we report the node classification accuracy on the standard datasets that are commonly used in the community to evaluate models. The results under the dense split and the sparse split are shown in Tables 2 and 3, respectively.

Table 2
Accuracy of node classification under dense split

Methods/data	Cora	Citeseer	Pubmed	Actor	Chameleon	Squirrel
MLP	76.61 $\pm$ 1.74	72.69 $\pm$ 1.50	86.40 $\pm$ 0.51	35.17 $\pm$ 0.80	46.38 $\pm$ 2.99	31.28 $\pm$ 0.27
Node2Vector	81.54 $\pm$ 0.66	58.03 $\pm$ 0.87	79.29 $\pm$ 0.47	23.50 $\pm$ 0.60	52.14 $\pm$ 1.65	31.62 $\pm$ 6.99
ChebNet	86.96 $\pm$ 1.76	76.61 $\pm$ 1.99	89.16 $\pm$ 0.30	37.01 $\pm$ 0.74	57.00 $\pm$ 2.01	40.67 $\pm$ 0.31
APPNP	88.91 $\pm$ 1.04	76.23 $\pm$ 1.40	86.18 $\pm$ 0.46	32.57 $\pm$ 1.08	44.71 $\pm$ 3.48	44.77 $\pm$ 0.34
Geom-GCN	85.27	77.99	90.05	31.63	60.90	38.14
Pro-GNN	87.08 $\pm$ 1.57	76.58 $\pm$ 1.91	87.56 $\pm$ 0.45	35.16 $\pm$ 1.64	59.12 $\pm$ 2.34	40.86 $\pm$ 2.56
SimP-GCN	88.27 $\pm$ 1.68	78.46 $\pm$ 1.53	89.31 $\pm$ 0.56	35.85 $\pm$ 1.53	62.39 $\pm$ 1.84	47.68 $\pm$ 1.24
GAUG-M $+$ GCN	89.35 $\pm$ 0.38	78.86 $\pm$ 0.44	87.55 $\pm$ 0.07	31.01 $\pm$ 0.45	60.61 $\pm$ 0.89	46.33 $\pm$ 0.73
MLP $+$ GCN	87.01 $\pm$ 1.35	76.35 $\pm$ 1.85	89.77 $\pm$ 0.39	36.24 $\pm$ 1.09	68.04 $\pm$ 1.86	54.48 $\pm$ 1.11
H2GCN	87.81 $\pm$ 1.35	77.06 $\pm$ 1.64	89.40 $\pm$ 0.34	35.86 $\pm$ 1.03	57.10 $\pm$ 1.58	36.42 $\pm$ 1.89
FAGCN	87.56 $\pm$ 0.83	75.30 $\pm$ 0.67	88.69 $\pm$ 0.18	35.98 $\pm$ 0.53	62.02 $\pm$ 0.98	40.57 $\pm$ 1.30
GCN	87.56 $\pm$ 1.53	77.42 $\pm$ 1.28	87.02 $\pm$ 0.37	30.76 $\pm$ 0.89	56.25 $\pm$ 1.65	45.66 $\pm$ 0.39
GCN $+$	89.11 $\pm$ 0.73	79.89 $\pm$ 1.30	90.99 $\pm$ 0.24	38.53 $\pm$ 1.25	61.01 $\pm$ 2.32	50.12 $\pm$ 1.81
Improve	1.77%	3.19%	4.56%	25.26%	8.46%	9.77%
GAT	88.04 $\pm$ 1.52	76.56 $\pm$ 1.82	85.57 $\pm$ 0.38	27.39 $\pm$ 1.21	57.41 $\pm$ 1.48	42.72 $\pm$ 0.33
GAT $+$	89.80 $\pm$ 1.03	79.07 $\pm$ 1.35	89.65 $\pm$ 0.38	36.44 $\pm$ 2.18	63.62 $\pm$ 3.47	46.58 $\pm$ 4.57
Improve	2.00%	3.28%	4.77%	33.04%	10.78%	9.03%
GPRGNN	88.32 $\pm$ 1.70	77.42 $\pm$ 1.64	87.40 $\pm$ 0.33	34.11 $\pm$ 1.09	64.25 $\pm$ 1.94	49.93 $\pm$ 0.53
GPRGNN $+$	90.11 $\pm$ 0.76	78.49 $\pm$ 1.40	91.17 $\pm$ 0.40	39.31 $\pm$ 0.77	69.60 $\pm$ 1.69	56.64 $\pm$ 0.54
Improve	2.03%	1.38%	4.31%	15.24%	8.33%	13.44%

Table 3

Accuracy of node classification under sparse split

Methods/data	Cora	Citeseer	Pubmed	Actor	Chameleon	Squirrel
MLP	64.29 $\pm$ 1.07	65.12 $\pm$ 1.24	84.95 $\pm$ 0.22	34.01 $\pm$ 0.52	32.90 $\pm$ 3.77	22.11 $\pm$ 2.38
Node2Vector	76.32 $\pm$ 0.53	53.38 $\pm$ 0.88	77.80 $\pm$ 0.29	22.80 $\pm$ 0.46	44.93 $\pm$ 0.89	25.62 $\pm$ 4.16
ChebNet	80.80 $\pm$ 1.03	70.49 $\pm$ 0.31	87.21 $\pm$ 0.38	33.41 $\pm$ 0.58	44.02 $\pm$ 2.98	27.96 $\pm$ 1.52
APPNP	84.85 $\pm$ 1.20	72.43 $\pm$ 0.86	85.91 $\pm$ 0.28	32.30 $\pm$ 0.77	42.52 $\pm$ 3.90	35.85 $\pm$ 3.23
Pro-GNN	83.70 $\pm$ 1.40	72.04 $\pm$ 1.86	87.98 $\pm$ 0.62	31.47 $\pm$ 1.69	51.68 $\pm$ 3.82	28.66 $\pm$ 5.77
SimP-GCN	83.44 $\pm$ 0.51	72.85 $\pm$ 0.49	86.87 $\pm$ 0.24	33.57 $\pm$ 1.34	52.47 $\pm$ 2.81	37.11 $\pm$ 1.68
GAUG-M $+$ GCN	83.20 $\pm$ 0.21	73.17 $\pm$ 0.36	86.63 $\pm$ 0.08	29.75 $\pm$ 0.25	54.65 $\pm$ 0.34	36.88 $\pm$ 0.50
MLP $+$ GCN	83.92 $\pm$ 0.89	73.07 $\pm$ 0.35	87.49 $\pm$ 0.25	33.99 $\pm$ 0.47	55.28 $\pm$ 1.82	37.09 $\pm$ 1.51
H2GCN	83.15 $\pm$ 0.47	71.16 $\pm$ 0.26	87.37 $\pm$ 0.07	33.74 $\pm$ 0.36	44.08 $\pm$ 0.81	29.42 $\pm$ 0.45
FAGCN	83.35 $\pm$ 0.77	67.92 $\pm$ 0.51	87.39 $\pm$ 0.13	32.49 $\pm$ 0.73	49.33 $\pm$ 0.98	30.83 $\pm$ 0.62
GCN	82.84 $\pm$ 0.74	72.09 $\pm$ 0.73	86.48 $\pm$ 0.17	27.85 $\pm$ 1.01	50.10 $\pm$ 2.75	36.08 $\pm$ 1.31
GCN $+$	83.95 $\pm$ 0.22	73.04 $\pm$ 0.17	87.55 $\pm$ 0.26	34.14 $\pm$ 1.61	53.49 $\pm$ 1.93	38.55 $\pm$ 1.86
Improve	1.34%	1.32%	1.24%	22.58%	6.76%	6.84%
GAT	83.65 $\pm$ 0.87	71.87 $\pm$ 0.86	85.08 $\pm$ 0.20	27.17 $\pm$ 0.85	51.15 $\pm$ 2.60	31.89 $\pm$ 1.89
GAT $+$	84.65 $\pm$ 0.58	72.85 $\pm$ 0.67	88.11 $\pm$ 1.70	32.07 $\pm$ 1.34	54.34 $\pm$ 2.96	36.58 $\pm$ 1.57
Improve	1.19%	1.36%	3.56%	18.03%	6.24%	14.67%
GPRGNN	84.86 $\pm$ 0.70	72.79 $\pm$ 0.71	86.47 $\pm$ 0.18	31.96 $\pm$ 0.97	51.25 $\pm$ 3.11	36.13 $\pm$ 0.60
GPRGNN $+$	85.98 $\pm$ 0.54	74.11 $\pm$ 0.91	89.03 $\pm$ 1.50	33.35 $\pm$ 3.53	58.27 $\pm$ 2.38	38.11 $\pm$ 1.16
Improve	1.32%	1.81%	2.96%	4.35%	13.70%	5.48%

From Table 2, we find that our models achieve the best performance on all datasets. For example, GPRGNN $+$ achieves 6.21% and 2.30% relative improvement over the best baseline on Actor and Chameleon, respectively. Our models GCN $+$ , GAT $+$ , and GPRGNN $+$ are also better than GCN, GAT, and GPRGNN, respectively, on all datasets. Most importantly, GCN $+$ and GAT $+$ outperform the original GCN and GAT by a large margin on the three networks with a low homophily level (Actor, Chameleon, Squirrel), where the relative accuracy gain is more than 8%. In many cases, GCN $+$ and GAT $+$ also outperform the state-of-the-art models designed specifically for networks with low homophily (Geom-GCN, H2GCN, FAGCN, GPRGNN). Even for the state-of-the-art model GPRGNN, our method still improves its performance on all datasets. These results show that our method helps GNN models generalize to heterophilous networks and enhances their robustness. It should be noted that our models also outperform all compared models on the three networks with strong homophily (Cora, Citeseer, Pubmed), which further supports the effectiveness of our method.
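The "Improve" rows and the relative improvements quoted above are consistent with the usual relative-gain formula, (improved - baseline) / baseline. The sketch below assumes this formula and checks it against a few entries of Table 2; the function name is hypothetical.

```python
def relative_improvement(improved, baseline):
    """Relative accuracy gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0

# Entries taken from Table 2 (dense split).
gcn_cora = relative_improvement(89.11, 87.56)     # GCN+ vs GCN on Cora
gcn_actor = relative_improvement(38.53, 30.76)    # GCN+ vs GCN on Actor
gpr_actor = relative_improvement(39.31, 37.01)    # GPRGNN+ vs ChebNet on Actor

for name, value in [("GCN+ / Cora", gcn_cora),
                    ("GCN+ / Actor", gcn_actor),
                    ("GPRGNN+ vs best baseline / Actor", gpr_actor)]:
    print(f"{name}: {value:.2f}%")
```

Because the gain is relative, the same absolute increase in accuracy counts for more on datasets where the baseline accuracy is low, such as Actor.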

From Table 2, we also find that the simple MLP, which uses only node features and ignores the graph structure, is better than some popular GNN models on heterophilous networks. However, GNN models such as GCN and GAT can outperform MLP when run on the corrected graph (GCN $+$ , GAT $+$ ), which demonstrates the effectiveness of our method. From Table 3, we can see that our models GCN $+$ , GAT $+$ , and GPRGNN $+$ are superior to GCN, GAT, and GPRGNN on all datasets, which shows that our method also improves GNN models under the sparse split.

5.3 Results of node classification under structure attacks

In this section, we evaluate the robustness of the models under structure attacks. The details of the attack strategy are given in Algorithm 1. Note that the attack assumes that the labels of all nodes and the graph structure are known. We randomly delete intra-class edges and add inter-class edges, and then run all models on the resulting networks with different structure noise ratios. We report the results on three datasets. Figures 4 and 5 show the node classification results under the dense split and the sparse split, respectively.
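The attack described above can be sketched as follows. This is not a reproduction of Algorithm 1: it assumes that each deleted intra-class edge is replaced by one new inter-class edge (so the edge count stays fixed) and that the attack stops once a target structure noise ratio is reached. All names and the toy graph are hypothetical.

```python
import random

def attack_structure(edges, labels, target_ratio, seed=0, max_tries=1000):
    """Swap random intra-class edges for random inter-class edges.

    Stops when the fraction of inter-class edges reaches `target_ratio`,
    when no intra-class edge is left, or when no new edge can be sampled.
    """
    rng = random.Random(seed)
    edge_set = {(min(u, v), max(u, v)) for u, v in edges}
    num_nodes = len(labels)
    total = len(edge_set)
    intra = sorted(e for e in edge_set if labels[e[0]] == labels[e[1]])
    rng.shuffle(intra)
    inter_count = total - len(intra)

    while intra and inter_count / total < target_ratio:
        new_edge = None
        for _ in range(max_tries):  # bounded rejection sampling
            u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
            cand = (min(u, v), max(u, v))
            if labels[u] != labels[v] and cand not in edge_set:
                new_edge = cand
                break
        if new_edge is None:
            break
        edge_set.remove(intra.pop())   # delete one intra-class edge
        edge_set.add(new_edge)         # add one inter-class edge
        inter_count += 1
    return sorted(edge_set)

# Toy graph (hypothetical): two triangles, one per class, with no noise.
toy_labels = [0, 0, 0, 1, 1, 1]
clean_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
attacked_edges = attack_structure(clean_edges, toy_labels, target_ratio=0.5)
```

Like the attack in the text, the sketch uses the labels of all nodes, so it serves only to generate evaluation graphs and cannot be used during training.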

Figure 4.

The node classification results of models with different structure noise ratios under dense split.

Figure 5.

The node classification results of models with different structure noise ratios under sparse split.

From Fig. 4, we find that the node classification accuracy of almost all models decreases as the structure noise ratio increases (i.e., as the homophily level of the network decreases), which indicates that the structure attacks are effective. The performance of popular GNN models such as GCN, GAT, and APPNP drops significantly as the structure noise ratio increases. GPRGNN, however, is robust to structure attacks, which indicates that it is useful to generalize GNN models to networks with different homophily levels. Node2Vector achieves the worst performance under structure attacks because it depends entirely on the network structure for representation learning, whereas MLP uses only node features and is therefore stable across structure noise ratios. Most importantly, our models outperform the original models under different structure noise ratios, which demonstrates the robustness of our method against structure attacks. Finally, on Chameleon, we observe that as the noise increases, the classification performance first decreases and eventually begins to increase, showing a V-shaped pattern similar to that reported in [14]. A possible reason is that, as more noise is added, the neighborhood patterns of nodes with the same label gradually become alike.

From Fig. 5, we can draw conclusions similar to those from Fig. 4. For example, our improved models are always better than the original models. The difference is that, under the sparse split, almost all models are worse than the simple MLP when the structure noise ratio is high, which shows that existing GNN models still have considerable room to improve their robustness to structure attacks.

5.4 Ablation study

Table 4
The results of models under dense split

Models	Cora	Citeseer	Actor	Chameleon
GCN	87.56 $\pm$ 1.53	77.42 $\pm$ 1.28	30.76 $\pm$ 0.89	56.25 $\pm$ 1.65
$+$ SL	88.28 $\pm$ 1.70	77.93 $\pm$ 2.34	35.56 $\pm$ 1.06	57.27 $\pm$ 2.61
$+$ SM	88.45 $\pm$ 1.59	78.89 $\pm$ 1.04	34.80 $\pm$ 0.63	58.82 $\pm$ 2.07
$+$ Self	88.14 $\pm$ 1.36	78.42 $\pm$ 1.56	29.51 $\pm$ 1.86	57.46 $\pm$ 2.71
GCN $+$	89.11 $\pm$ 0.73	79.89 $\pm$ 1.30	38.53 $\pm$ 1.25	61.01 $\pm$ 2.32
GAT	88.04 $\pm$ 1.52	76.56 $\pm$ 1.82	27.39 $\pm$ 1.21	57.41 $\pm$ 1.48
$+$ SL	89.02 $\pm$ 1.57	78.23 $\pm$ 2.21	32.01 $\pm$ 1.16	58.49 $\pm$ 1.78
$+$ SM	88.30 $\pm$ 1.09	77.37 $\pm$ 1.18	35.67 $\pm$ 1.46	60.05 $\pm$ 2.05
$+$ Self	88.81 $\pm$ 1.67	77.23 $\pm$ 1.49	26.45 $\pm$ 1.44	59.07 $\pm$ 1.58
GAT $+$	89.80 $\pm$ 1.03	79.07 $\pm$ 1.35	36.44 $\pm$ 2.18	63.62 $\pm$ 3.47
GPR	88.32 $\pm$ 1.70	77.42 $\pm$ 1.64	34.11 $\pm$ 1.09	64.25 $\pm$ 1.94
$+$ SL	89.56 $\pm$ 1.83	77.83 $\pm$ 1.27	35.47 $\pm$ 1.46	67.51 $\pm$ 1.68
$+$ SM	88.45 $\pm$ 1.36	77.28 $\pm$ 0.92	38.06 $\pm$ 1.26	66.83 $\pm$ 2.29
$+$ Self	88.86 $\pm$ 1.02	78.05 $\pm$ 1.75	35.58 $\pm$ 1.65	66.09 $\pm$ 1.86
GPR $+$	90.11 $\pm$ 0.76	78.49 $\pm$ 1.40	39.31 $\pm$ 0.77	69.60 $\pm$ 1.69
SNR	0.1900	0.2638	0.7807	0.7776
SNR $+$	0.1624	0.1552	0.3731	0.2378

We conduct an ablation study to verify the effect of the structure learning module, the structure modification module, and the self-training method. The results for node classification under the dense split are shown in Table 4. “ $+$ SL” means that only the structure learning module is used to improve the GNN model, “ $+$ SM” that only the structure modification module is used, and “ $+$ Self” that only the self-training method is used. “SNR” is the original structure noise ratio of the network and “SNR $+$ ” is the structure noise ratio of the modified network. From the table, we find that both structure learning and structure modification improve the three GNN models in almost all cases, which demonstrates the effectiveness of the two modules. Furthermore, the self-training method improves the models on three networks (Cora, Citeseer, Chameleon), but it has a negative effect on GCN and GAT on Actor, possibly because the pseudo-labels introduce considerable noise into model training. Finally, our method reduces the structure noise ratio on all datasets. For example, the structure noise ratio of the original Actor network is 0.7807, while that of the corrected network is 0.3731, which shows that our method can significantly decrease the structure noise ratio and increase the homophily level of networks.
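To make the role of pseudo-labels concrete, the sketch below shows one common way to select them in self-training: keep only unlabeled nodes whose highest predicted class probability reaches a confidence threshold. This is a generic illustration and not the selection rule of our method; the function name, the threshold, and the toy probabilities are hypothetical.

```python
import numpy as np

def select_pseudo_labels(probs, labeled_mask, threshold):
    """Pick confident unlabeled nodes and their predicted classes.

    probs: (num_nodes, num_classes) predicted class probabilities.
    labeled_mask: True for nodes that already have a ground-truth label.
    threshold: hypothetical minimum confidence for accepting a pseudo-label.
    """
    probs = np.asarray(probs, dtype=float)
    labeled_mask = np.asarray(labeled_mask, dtype=bool)
    confidence = probs.max(axis=1)
    predicted = probs.argmax(axis=1)
    chosen = np.flatnonzero(~labeled_mask & (confidence >= threshold))
    return chosen, predicted[chosen]

# Toy predictions (hypothetical): node 0 is labeled, nodes 1 to 3 are not.
toy_probs = [[0.95, 0.05], [0.60, 0.40], [0.10, 0.90], [0.20, 0.80]]
toy_mask = [True, False, False, False]
nodes, pseudo = select_pseudo_labels(toy_probs, toy_mask, threshold=0.85)
print(nodes, pseudo)
```

When the base model is inaccurate, as on Actor, even confident predictions are often wrong, which is one way pseudo-labels can inject the noise discussed above.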

6. Conclusions

In this paper, we study how to generalize GNN models to networks with different homophily levels and how to improve their robustness. We first find that most popular GNN models can achieve nearly 100% node classification accuracy when the structure noise ratio of the network is zero, that is, when the network is strongly homophilous. We therefore propose a simple and effective Graph Structure Learning framework based on Feature and Label consistency (GSLFL) that reduces the structure noise and increases the homophily level of networks. Our method consists of three modules: a structure learning module, a structure modification module, and a GNN module. Extensive experiments on 6 standard datasets, covering both homophilous and heterophilous networks, demonstrate that our method can significantly improve the performance of existing GNN models on networks with different homophily levels. The experiments under structure attacks further show that our method enhances the robustness of GNN models. In the future, we aim to improve the generalization performance of our method and to extend it to other tasks, such as graph classification.

Acknowledgments

This work is supported by the National Key Research and Development Program of China (Grant No. 2018YFB1403400), the National Natural Science Foundation of China (Grant No. 61876080), the Key Research and Development Program of Jiangsu (Grant No. BE2019105), and the Collaborative Innovation Center of Novel Software Technology and Industrialization at Nanjing University.

References

1. Kipf T.N. and Welling, Semi-supervised classification with graph convolutional networks, in: International Conference on Learning Representations, ICLR, 2017.
2. Veličković, Cucurull, Casanova, Romero, Liò and Bengio, Graph Attention Networks, in: International Conference on Learning Representations, ICLR, 2018.
3. Hamilton, Ying and Leskovec, Inductive representation learning on large graphs, in: Advances in Neural Information Processing Systems, 2017, pp. 1024–1034.
4. Liu, Long, Zhang and Lv, TriATNE: Tripartite Adversarial Training for Network Embeddings, IEEE Transactions on Cybernetics, 2021.
5. Ying, You, Morris, Ren, Hamilton and Leskovec, Hierarchical graph representation learning with differentiable pooling, in: Advances in Neural Information Processing Systems, 2018, pp. 4800–4810.
6. Leskovec and Jegelka, How powerful are graph neural networks? in: International Conference on Learning Representations, ICLR, 2019.
7. Gao and Ji, Graph U-Nets, in: International Conference on Machine Learning, 2019, pp. 2083–2092.
8. Pandit, Chau D.H., Wang and Faloutsos, Netprobe: a fast and scalable system for fraud detection in online auction networks, in: Proceedings of the 16th International Conference on World Wide Web, 2007, pp. 201–210.
9. Zhang and Zitnik, GNNGuard: Defending graph neural networks against adversarial attacks, Advances in Neural Information Processing Systems 33 (2020).
10. Pei, Wei, Chang K.C.-C., Lei and Yang, Geom-GCN: Geometric Graph Convolutional Networks, in: International Conference on Learning Representations, ICLR, 2019.
11. Zhu, Yan, Zhao, Heimann, Akoglu and Koutra, Beyond homophily in graph neural networks: Current limitations and effective designs, Advances in Neural Information Processing Systems 33 (2020).
12. Wang, Shi and Shen, Beyond Low-frequency Information in Graph Convolutional Networks, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2021.
13. Chien, Peng and Milenkovic, Adaptive Universal Generalized PageRank Graph Neural Network, in: International Conference on Learning Representations, ICLR, 2021.
14. Liu, Shah and Tang, Is Homophily a Necessity for Graph Neural Networks? arXiv preprint arXiv:2106.06134, 2021.
15. Zhang, Cui and Zhu, Deep learning on graphs: A survey, IEEE Transactions on Knowledge and Data Engineering, 2020.
16. Zhou, Cui, Zhang, Yang, Liu, Wang and Sun, Graph neural networks: A review of methods and applications, AI Open 1 (2020), 57–81.
17. Zhu, Zhang, Liu and Wang, Deep Graph Structure Learning for Robust Representations: A Survey, arXiv preprint arXiv:2103.03036, 2021.
18. Gori, Monfardini and Scarselli, A new model for learning in graph domains, in: Proceedings IEEE International Joint Conference on Neural Networks, Vol. 2, IEEE, 2005, pp. 729–734.
19. Scarselli, Tsoi A.C., Gori and Hagenbuchner, Graphical-based learning environments for pattern recognition, in: Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), Springer, 2004, pp. 42–56.
20. Estrach J.B., Zaremba, Szlam and LeCun, Spectral networks and deep locally connected networks on graphs, in: International Conference on Learning Representations, ICLR, 2014.
21. Defferrard, Bresson and Vandergheynst, Convolutional neural networks on graphs with fast localized spectral filtering, in: Proceedings of the 30th International Conference on Neural Information Processing Systems, 2016, pp. 3844–3852.
22. Zhuang and Ma, Dual graph convolutional networks for graph-based semi-supervised classification, in: Proceedings of the 2018 World Wide Web Conference, 2018, pp. 499–508.
23. Muller, Thabet and Ghanem, Deepgcns: Can gcns go as deep as cnns? in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 9267–9276.
24. Wang, Zhu, Cui, Shi and Pei, Am-gcn: Adaptive multi-channel graph convolutional networks, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 1243–1253.
25. Chen, Wei, Huang, Ding and Li, Simple and deep graph convolutional networks, in: International Conference on Machine Learning, PMLR, 2020, pp. 1725–1735.
26. Cui, Kuang, Wang and Zhu, Disentangled graph convolutional networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 4212–4221.
27. Chen and Xiao, FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling, in: International Conference on Learning Representations, ICLR, 2018.
28. Henaff, Bruna and LeCun, Deep convolutional networks on graph-structured data, arXiv preprint arXiv:1506.05163, 2015.
29. Jiang, Zhang, Lin, Tang and Luo, Semi-supervised learning with graph learning-convolutional networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 11313–11320.
30. Wang, Zhu and Huang, Adaptive graph convolutional neural networks, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32, 2018.
31. Franceschi, Niepert, Pontil and He, Learning discrete structures for graph neural networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 1972–1982.
32. Chen and Zaki, Iterative Deep Graph Learning for Graph Neural Networks: Better and Robust Node Embeddings, Advances in Neural Information Processing Systems, 2020.
33. Jin, Derr, Wang, Liu and Tang, Node similarity preserving graph convolutional networks, in: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021, pp. 148–156.
34. Jin, Liu, Tang, Wang and Tang, Graph structure learning for robust graph neural networks, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 66–74.
35. Dong, Chen, Feng, Ding and Cui, On the Equivalence of Decoupled Graph Convolution Network and Label Propagation, arXiv preprint arXiv:2010.12408, 2020.
36. Huang, Hou, Shen, Gao and Cheng, Label-Consistency based Graph Neural Networks for Semi-supervised Node Classification, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 1897–1900.
37. Chen, Lin, Zhou and Sun, Measuring and relieving the over-smoothing problem for graph neural networks from the topological view, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 3438–3445.
38. Zhao, Liu, Neves, Woodford, Jiang and Shah, Data augmentation for graph neural networks, arXiv preprint arXiv:2006.06830, 2020.
39. Grover and Leskovec, node2vec: Scalable feature learning for networks, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 855–864.
40. Klicpera, Bojchevski and Günnemann, Predict then Propagate: Graph Neural Networks meet Personalized PageRank, in: International Conference on Learning Representations, ICLR, 2018.
