Abstract
Recently, several studies have reported that Graph Convolutional Networks (GCN) are deficient in integrating node features and topological structures in graphs. AMGCN compensates for these drawbacks to some extent. However, it still uses GCN as its core component, so it cannot fundamentally resolve GCN's insufficient fusion ability. It is therefore necessary to find a network component with stronger fusion ability to replace GCN. The Deep Adaptive Graph Neural Network (DAGNN) proposed by Liu et al. adaptively aggregates information from neighborhoods of different hops, which substantially benefits its fusion ability. To replace GCN with DAGNN in the AMGCN model and further strengthen the fusion ability of DAGNN itself, we improve the DAGNN model to obtain a DAGNN variant. Experiments verify that the fusion ability of the DAGNN variant is far stronger than that of GCN. Building on this, we propose a Deep Adaptive Multi-channel Graph Neural Network (DAMGNN). Extensive comparative experiments on multiple benchmark datasets show that DAMGNN extracts more relevant information from node features and topological structures for fusion and thereby significantly improves node classification accuracy.
Keywords
Introduction
Graph data composed of entities and relationships are ubiquitous in the real world, for example social networks, point clouds, citation networks, biological networks, knowledge graphs, and molecular structures. Recently, graph neural networks (GNNs) for graph representation learning have attracted rapidly growing attention, and many researchers aim to develop better GNN models. This has driven rapid progress in the field, and GNNs have achieved great success in many applications, such as node classification [1, 2, 3, 4, 5, 6, 7], graph classification [8, 9, 10, 11, 12, 13], and link prediction [14, 15, 16, 17, 18].
Typical Graph Convolutional Networks (GCN) [4] and their variants [3, 5, 6, 15, 19] usually adopt a neighborhood aggregation (or message-passing) scheme that considers node features and topological structures simultaneously to learn node representations. The key step is feature aggregation: in each convolutional layer, every node aggregates feature information from its topological neighbors. In this way, feature information propagates through the network topology into the node embeddings, which are then used for downstream tasks. The whole process is supervised by the labels of a subset of nodes. GCN thus provides a strategy for fusing node features and topological structures into node embeddings and supervises this fusion within an end-to-end learning framework, which partly explains its enormous success.
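For reference, the standard layer-wise propagation rule of GCN [4] makes this aggregation explicit. Each layer multiplies the node representations by the symmetrically normalized adjacency matrix with self-loops, then applies a learnable linear map and a nonlinearity. The symbols A (adjacency matrix), X (node features), W^(l) (weights of layer l), and σ (e.g. ReLU) are introduced here for illustration:

```latex
\tilde{A} = A + I_n, \qquad
\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}, \qquad
H^{(0)} = X, \qquad
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)
```

Each row of the product mixes a node's representation with those of its one-hop neighbors, so stacking L layers gives every node an L-hop receptive field.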
However, a number of recent studies have revealed drawbacks of GCN in integrating node features and topological structures. For example, Li et al. [20] point out that GCN actually performs Laplacian smoothing on node features, which makes the node embeddings across the whole network gradually converge. Once the embeddings become too similar, different nodes can no longer be distinguished, and this effect grows as more convolutional layers are added. GCN and its variants therefore generally use shallow architectures. With a shallow architecture, however, each node can only receive information from nearby neighbors rather than from a larger neighborhood, which degrades GCN's performance. Experiments conducted by Zhu et al. [21] show that GCN is seriously deficient in adaptively integrating node features and topological structures. They design a novel adaptive multi-channel framework to enhance this integration ability and build the Adaptive Multi-channel Graph Convolutional Networks (AMGCN) on it. However, AMGCN still uses the original GCN as its core component, which fundamentally limits its performance. It is therefore necessary to find a model that integrates node features and topological structures better than GCN in order to improve classification performance.
Recently, Liu et al. [22] showed that when multiple GCN layers are stacked, the performance degradation of Graph Neural Networks (GNN) is caused not by over-smoothing [7, 20, 23] but by the entanglement of representation transformation and propagation. Based on this finding, they proposed Deep Adaptive Graph Neural Networks (DAGNN), which decouple representation transformation from propagation and learn node representations from a larger receptive field without performance degradation. This raises a question: can DAGNN, which enlarges the receptive field of nodes and removes this entanglement, be used to design a model that fuses node features and topological structures better than GCN?
To answer this question, we design a DAGNN variant based on DAGNN and experimentally evaluate its ability to fuse node features and topological structures. The results show that the DAGNN variant is far superior to GCN in this respect. On this basis, we replace GCN in AMGCN with the DAGNN variant and propose Deep Adaptive Multi-channel Graph Neural Networks (DAMGNN) for semi-supervised classification. The core idea of DAMGNN is to use DAGNN variant modules to learn node embeddings from the feature space, the topological space, and their common space simultaneously. The adaptive multi-channel framework then combines the embeddings learned in these spaces into a final node embedding, which is used for downstream tasks.
In summary, the contributions of this paper are as follows:
(1) Based on the DAGNN model, we propose a DAGNN variant and show experimentally that it fuses node features and topological structures better than GCN.
(2) We replace the GCN components in AMGCN with the DAGNN variant. On this basis, we propose a new graph neural network, Deep Adaptive Multi-channel Graph Neural Networks (DAMGNN), for semi-supervised classification. DAMGNN adaptively extracts useful information from different spaces and fuses it for classification.
(3) Extensive comparative experiments on a series of benchmark datasets verify that DAMGNN extracts highly relevant information from node features and topological structures for node classification. Its classification accuracy is significantly higher than that of current mainstream models.
Graph neural network models can be divided into four main categories. Recurrent Graph Neural Networks (RecGNNs) learn node representations with recurrent neural architectures [24, 25]. Graph convolutional neural networks extend the convolution operation from grid data to graph data [26, 27, 28]. Graph Autoencoders (GAEs) learn network embeddings and graph generative distributions [29, 30]. Spatial-temporal Graph Neural Networks (STGNNs) aim to learn hidden patterns from spatial-temporal graphs [31, 32]. Because graph convolutional neural networks process graph data efficiently, they have been applied increasingly widely in recent years. Examples include Spectral CNN [25] based on the convolution theorem, MoNet [6] based on aggregation functions, R-GCNs [33] for modeling edge information, HA-GCN [10] for modeling high-order information, the deep graph convolutional network Co-Train [34], and the large-scale graph convolutional network GraphSAGE [5]. The main idea of graph convolutional neural networks is to generate a node's representation by aggregating its own features with those of its neighboring nodes. In essence, most graph convolutional network models learn node embeddings by integrating node features and topological structures. Several recent studies have explored the fusion mechanism of graph convolutional neural networks. For instance, Li et al. [20] point out that GCN actually performs Laplacian smoothing on node features, and [35, 26] prove that the graph topology acts as a low-pass filter on node features.
However, [21] shows that GCN is flawed in its ability to extract relevant information from node features and topological structures and fuse it adaptively. Although [21] develops a novel adaptive multi-channel framework to enhance this ability, it does not substantially improve the GCN structure itself, so the performance of GCN-based classification models cannot be substantially improved. Moreover, to avoid the over-smoothing caused by stacking multiple graph convolutional layers, the above works adopt shallow architectures. As a result, each node only receives information from its close neighbors during aggregation. The DAGNN proposed in [22] decouples representation transformation and propagation so that nodes can receive information from multi-hop neighbors without over-smoothing. However, DAGNN learns node embeddings only on the topological graph, so the learned embeddings contain insufficient information, and there is still room for improvement.
Fusion capability of DAGNN variant
In this section, we describe the design of the DAGNN variant and then explore its ability to adaptively fuse node features and topological structures through a series of simple yet intuitive experiments.
DAGNN variant module
The DAGNN model separates representation transformation from propagation, so a larger receptive field can be used without degrading performance. However, our detailed analysis found that DAGNN always sets the dimension of the node embeddings to the number of node categories. Allowing this dimension to be adjusted for each dataset should help improve model performance. Therefore, the DAGNN variant treats the dimension of the node embedding vectors as a hyperparameter, so the embedding size can be tuned to suit different datasets. In addition, we remove the final Softmax function of the DAGNN model so that the generated node embeddings can be fed into the subsequent adaptive multi-channel framework, which further enhances the fusion capability of the DAGNN variant. Combining these two improvements, we design the DAGNN variant based on the DAGNN model. Its mathematical definition is as follows:
Where
The framework of DAGNN variant model. This diagram illustrates how the model generates a corresponding embedding representation for a node, here the bold lowercase letters 
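As a hedged illustration of the definition above, the sketch below follows the published DAGNN formulation of Liu et al. [22] and applies the two changes just described. It propagates a transformed representation Z over K hops with the symmetrically normalized adjacency matrix, scores each hop of each node with a shared projection vector s through a sigmoid, and sums the hops weighted by these scores. Following the variant, the embedding width d is a free hyperparameter rather than the number of classes, and no final Softmax is applied. The names `dagnn_variant`, `k_hops`, and `s` are hypothetical. The transformation MLP is assumed to have already produced `z`.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense 0/1 adjacency matrix given as lists."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dagnn_variant(adj, z, s, k_hops):
    """DAGNN-style adaptive multi-hop propagation without a final Softmax.

    adj:    n x n adjacency matrix of the input graph (topology or feature graph)
    z:      n x d output of the transformation MLP; d is a free hyperparameter
    s:      length-d projection vector that scores every hop of every node
    k_hops: number of propagation steps K
    """
    a_norm = normalize_adjacency(adj)
    hops = [z]                                   # H_0 = Z
    for _ in range(k_hops):
        hops.append(matmul(a_norm, hops[-1]))    # H_l = A_norm H_{l-1}
    d = len(z[0])
    out = []
    for i in range(len(z)):
        row = [0.0] * d
        for h in hops:
            # Retention score in (0, 1): sigmoid of this hop's projection onto s.
            score = 1.0 / (1.0 + math.exp(-sum(h[i][j] * s[j] for j in range(d))))
            for j in range(d):
                row[j] += score * h[i][j]
        out.append(row)
    return out                                   # raw embeddings, no Softmax
```

For example, on a two-node graph with `z = [[1, 0], [0, 1]]`, `s = [0, 0]` (so every score is 0.5), and `k_hops=1`, node 0 receives `0.5*[1, 0] + 0.5*[0.5, 0.5] = [0.75, 0.25]`. Because the output width equals `len(z[0])`, the embedding size is controlled entirely by the MLP that produces `z`.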
Following [21], we conduct a series of comparative experiments to explore the ability of the DAGNN variant to adaptively fuse node features and topological structures. The main idea is to construct two networks. In the first, node labels are highly correlated with node features; in the second, they are highly correlated with the topology. We test the adaptive fusion ability of the DAGNN variant on each network. Finally, we test it on three real datasets to further confirm the reliability of the conclusion. If the DAGNN variant fuses node features and topological structures better than GCN, it should outperform GCN in all of these experiments. This in turn suggests that models built on the DAGNN variant should outperform models built on GCN.
Verification experiment on node feature correlation
First, we generate a random network of 900 nodes in which each pair of nodes is connected with probability 0.03, and the features of each node form a 50-dimensional vector. To generate node features, we randomly assign one of 3 labels to every node. Features of nodes with the same label are drawn from the same Gaussian distribution. The three Gaussian distributions share the same covariance matrix but have different means. In this random network, node labels are therefore highly correlated with node features but not with the topological structure.
In this experiment, we train the DAGNN variant and GCN on this network separately. As in [21], for each class we randomly select 20 nodes for training and 200 nodes for testing. We tune the parameters of each model carefully to ensure optimal performance. GCN achieves a classification accuracy of 80.5%, while the DAGNN variant achieves 98.0%. This result meets our expectation and indicates that the DAGNN variant has a stronger ability than GCN to adaptively fuse node features and topological structures.
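A minimal plain-Python sketch of this setup follows: the feature-correlated random network and the per-class train/test split. The text states only that the three Gaussians share a covariance matrix and differ in their means. The specific random means and the identity covariance below are therefore assumptions, and `feature_correlated_network` and `per_class_split` are hypothetical names.

```python
import random


def feature_correlated_network(n=900, p=0.03, dim=50, n_classes=3, seed=0):
    """Random (Erdos-Renyi) topology; node features drawn from class-dependent Gaussians."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_classes) for _ in range(n)]
    # Assumed class means; every class shares the identity covariance matrix.
    means = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_classes)]
    features = [[rng.gauss(means[y][j], 1.0) for j in range(dim)] for y in labels]
    # Each unordered pair is linked independently with probability p,
    # so the topology carries no information about the labels.
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]
    return features, labels, edges


def per_class_split(labels, n_train, n_test, seed=0):
    """Randomly pick n_train training and n_test test nodes from every class."""
    rng = random.Random(seed)
    train, test = [], []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        train += idx[:n_train]
        test += idx[n_train:n_train + n_test]
    return train, test
```

Calling `per_class_split(labels, 20, 200)` reproduces the 20/200 per-class split described above, giving 60 training and 600 test nodes in total.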
Verification experiment on topological structure correlation
Similarly, we generate another random network of 900 nodes whose 50-dimensional feature vectors are all randomly generated. To generate a specific topological structure, we use the Stochastic Block Model (SBM) to divide the nodes into three clusters (nodes 0–299, 300–599, and 600–899). Nodes in the same cluster are connected with probability 0.03, and nodes in different clusters with probability 0.0015. Each node's label is determined by its cluster, so nodes in the same cluster share a label. Node labels are therefore highly correlated with the topology.
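The SBM network above can be sketched in plain Python as follows. The feature distribution is not specified beyond "randomly generated", so uniform values on [0, 1) are an assumption, and `sbm_network` is a hypothetical name.

```python
import random


def sbm_network(sizes=(300, 300, 300), p_in=0.03, p_out=0.0015, dim=50, seed=0):
    """Stochastic Block Model topology; labels are cluster ids; features are random."""
    rng = random.Random(seed)
    # Contiguous clusters: nodes 0-299 -> 0, 300-599 -> 1, 600-899 -> 2.
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    # Features are independent of the labels (assumed uniform on [0, 1)).
    features = [[rng.random() for _ in range(dim)] for _ in range(n)]
    return features, labels, edges
```

With these probabilities, roughly 4,000 edges fall inside clusters and roughly 400 between clusters, so the label signal lives almost entirely in the topology.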
In the same way, we apply the DAGNN variant and GCN to this network, and the results again match our expectations. GCN reaches a classification accuracy of 86.5%, while the DAGNN variant reaches 95.0%. The DAGNN variant again outperforms GCN, which shows that it is superior to GCN in adaptively fusing node features and topological structures.
Verification experiment on real datasets
In this experiment, we select three real datasets: Citeseer, BlogCatalog, and Flickr. For each dataset, we use 30 nodes per class for training and 1000 nodes for testing. The results are consistent with expectations. On Citeseer, GCN reaches a classification accuracy of 72.2% versus 74.6% for the DAGNN variant. On BlogCatalog, GCN reaches 70.0% versus 91.5% for the DAGNN variant. On Flickr, GCN reaches 42.2% versus 72.7% for the DAGNN variant. The DAGNN variant thus outperforms GCN on all three real datasets, indicating that its ability to adaptively fuse node features and topological structures clearly surpasses that of GCN.
DAMGNN: Deep adaptive multi-channel graph neural network
Our work is focused on semi-supervised node classification in a graph
Building on recent research and on the AMGCN model, we propose the deep adaptive multi-channel graph neural network DAMGNN, whose structure is shown in Fig. 2.
The framework of the DAMGNN model. DAMGNN replaces GCN, the core component of AMGCN, with the DAGNN variant. DAGNN variants extract information from different spaces to generate node embeddings. The adaptive multi-channel framework combines these embeddings into a final embedding, which is used to predict node labels.
The specific design is as follows. First, we use the KNN algorithm to generate a feature-based structure graph from the original dataset. Then we use this feature graph and the original structure graph to propagate node features in the feature space and the topological space, respectively, and use the DAGNN variant module, with its stronger fusion ability, to learn the embedding representation
First of all, to learn the embedding representation of the node in the feature space, we construct a K-nearest neighbor graph
We use the calculated similarity matrix
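A hedged sketch of the feature-graph construction follows. AMGCN [21] builds its kNN graph from cosine similarity between node features, and the sketch assumes the same. `knn_graph` is a hypothetical helper that links every node to its k most similar nodes and symmetrizes the result into an undirected edge list.

```python
import math


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of a feature matrix."""
    norms = [math.sqrt(sum(v * v for v in row)) or 1.0 for row in x]  # guard zero rows
    n = len(x)
    return [[sum(a * b for a, b in zip(x[i], x[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def knn_graph(x, k):
    """Connect each node to its k most similar other nodes (undirected edge list)."""
    s = cosine_similarity_matrix(x)
    n = len(x)
    edges = set()
    for i in range(n):
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: s[i][j], reverse=True)[:k]
        for j in neighbours:
            edges.add((min(i, j), max(i, j)))  # symmetrise the directed kNN relation
    return sorted(edges)
```

For instance, with features `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]` and `k=1`, the two nodes pointing along the first axis link to each other, and so do the two along the second axis.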
Firstly, we input the feature graph (
Secondly, we input the topological graph (
Through this process, specific DAGNN variant modules extract space-specific information in different spaces and encode it into different embeddings.
In fact, the feature space and the topological space are correlated. The category of a node may be related to information in the feature space, the topological space, or both. We therefore also need to extract the information shared by the feature space and the topological space and encode it as an embedding. We use a parameter-sharing DAGNN variant module to extract embeddings from the feature space and the topological space respectively, so that the extracted embeddings are similar.
First, we utilize the DAGNN variant module to extract the specific embeddings
Then, we utilize the DAGNN variant module again to extract the specific embeddings
Here, Eqs (5) and (6) must use the same DAGNN variant module, so that the parameters are genuinely shared in the model, ensuring that the specific embedding
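Parameter sharing of this kind can be illustrated with a toy module. A single object holds one weight matrix and is called on both graphs, so the two common embeddings come from identical parameters. For brevity, `SharedModule` is a hypothetical stand-in that applies a linear map followed by one step of mean aggregation; it is not the full DAGNN variant. The graphs and weights in the usage lines are made up for illustration.

```python
class SharedModule:
    """Toy stand-in for the common DAGNN variant: one weight matrix, any input graph."""

    def __init__(self, weight):
        self.weight = weight  # d_in x d_out matrix, the module's only parameters

    def __call__(self, adj_list, x):
        # Linear transformation with the shared weights.
        h = [[sum(row[k] * self.weight[k][j] for k in range(len(self.weight)))
              for j in range(len(self.weight[0]))] for row in x]
        # One step of mean aggregation over each node and its neighbours.
        out = []
        for i in range(len(h)):
            group = [i] + adj_list[i]
            out.append([sum(h[v][j] for v in group) / len(group) for j in range(len(h[i]))])
        return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
topology_adj = [[1], [0, 2], [1]]   # path graph 0-1-2
feature_adj = [[2], [], [0]]        # hypothetical kNN feature graph: edge 0-2
common = SharedModule(weight=[[1.0, 0.5], [0.0, 1.0]])
z_ct = common(topology_adj, x)      # common embedding from the topology graph
z_cf = common(feature_adj, x)       # common embedding from the feature graph
```

Because `z_ct` and `z_cf` are produced by the same `common` object, any gradient update to its weights affects both embeddings. This is what ties the two common channels together.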
In this section, we describe in detail how the adaptive multi-channel framework of [21] fuses the embeddings extracted from different spaces by the DAGNN variants, and how the final embedding for downstream tasks is generated and optimized.
Attention mechanism
Through the specific DAGNN variant module and the common DAGNN variant module, we get the specific embeddings
Where
Where
Similarly,
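A hedged sketch of the channel attention follows, assuming the form used in AMGCN [21]. The embedding of node i in each channel is scored as ω = q^T tanh(W z_i + b), with W, b, and q shared across channels. The scores are normalized by a Softmax over the channels, and the final embedding is the attention-weighted sum. In AMGCN, the common channel is the average of the two common embeddings. All parameter names below are hypothetical.

```python
import math


def attention_fuse(channels, w, b, q):
    """Fuse per-channel node embeddings with node-wise attention weights.

    channels: list of n x d embedding matrices, e.g. [Z_T, Z_C, Z_F]
    w: h x d weight matrix, b: length-h bias, q: length-h attention vector,
    all shared across channels.
    Returns the fused n x d embedding and the n x len(channels) attention weights.
    """
    n, d = len(channels[0]), len(channels[0][0])
    fused, weights = [], []
    for i in range(n):
        scores = []
        for z in channels:
            hidden = [math.tanh(sum(w[r][k] * z[i][k] for k in range(d)) + b[r])
                      for r in range(len(w))]
            scores.append(sum(q[r] * hidden[r] for r in range(len(q))))
        top = max(scores)                        # subtract max to stabilise the Softmax
        exps = [math.exp(sc - top) for sc in scores]
        alpha = [e / sum(exps) for e in exps]
        weights.append(alpha)
        fused.append([sum(a * z[i][k] for a, z in zip(alpha, channels)) for k in range(d)])
    return fused, weights
```

With `q` set to zeros, every channel gets weight 1/3 and the fused embedding is a plain average. A learned `q` instead lets each node lean on whichever space is most informative for it.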
As to the embeddings
First, we adopt L2 normalization to normalize the embedding matrix to
Secondly, we adopt the F norm to calculate the disparity between
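A hedged sketch of these two steps follows, assuming the consistency constraint of AMGCN [21]. The two common embeddings are L2-normalized row by row and turned into node-similarity matrices. The consistency loss is the squared Frobenius norm of the difference between these matrices. Function names are hypothetical.

```python
import math


def l2_normalize_rows(z):
    """Scale every row of z to unit L2 norm (zero rows are left unchanged)."""
    out = []
    for row in z:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        out.append([v / norm for v in row])
    return out


def similarity(z):
    """Node-similarity matrix S = Z_nor Z_nor^T of the row-normalized embedding."""
    zn = l2_normalize_rows(z)
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in zn] for r1 in zn]


def consistency_loss(z_ct, z_cf):
    """Squared Frobenius norm of the difference between the two similarity matrices."""
    s_t, s_f = similarity(z_ct), similarity(z_cf)
    return sum((a - b) ** 2 for r1, r2 in zip(s_t, s_f) for a, b in zip(r1, r2))
```

For example, `consistency_loss([[1, 0], [0, 1]], [[1, 0], [1, 0]])` compares an identity similarity matrix with an all-ones one and returns 2.0. Two identical embeddings give 0.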
For embeddings
Where
Similarly, for embedding
Through the methods mentioned above to enhance the disparity between
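A hedged sketch of the disparity term follows, assuming the Hilbert-Schmidt Independence Criterion (HSIC) used by AMGCN [21] with an inner-product kernel: HSIC(Z1, Z2) = (n-1)^-2 tr(R K1 R K2), where K = Z Z^T and R = I - (1/n) 1 1^T is the centering matrix. Minimizing HSIC(Z_T, Z_CT) + HSIC(Z_F, Z_CF) pushes each specific embedding away from its common counterpart. Function names are hypothetical.

```python
def gram(z):
    """Inner-product kernel matrix K = Z Z^T."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in z] for r1 in z]


def matmul(a, b):
    """Plain-Python product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hsic(z1, z2):
    """Empirical HSIC: tr(R K1 R K2) / (n - 1)^2 with centering matrix R."""
    n = len(z1)
    r = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
    m = matmul(matmul(r, gram(z1)), matmul(r, gram(z2)))
    return sum(m[i][i] for i in range(n)) / (n - 1) ** 2


def disparity_loss(z_t, z_ct, z_f, z_cf):
    """Sum of the dependences between each specific embedding and its common one."""
    return hsic(z_t, z_ct) + hsic(z_f, z_cf)
```

An embedding whose rows are all identical carries no node-level information, so its HSIC with anything is zero. For the centered 1-D embedding `[[1], [0], [-1]]`, HSIC with itself is exactly 1.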
For semi-supervised classification, we apply a linear transformation and a Softmax function to the final embedding
Here,
Further, suppose the training set is
The final loss of the model is the sum of the losses in Eqs (13), (16), (18), and the final loss
Where
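A hedged sketch of the classification head and overall objective follows, assuming the form used in AMGCN [21]. A linear layer with Softmax is applied to the final embedding, and cross-entropy is summed over the labeled training nodes. The total loss is L = L_t + γ L_c + β L_d, where γ and β weight the consistency and disparity terms. The weights, names, and default coefficient values here are hypothetical.

```python
import math


def predict(z, w, b):
    """Row-wise Softmax(Z W + b): class probabilities for every node."""
    probs = []
    for row in z:
        logits = [sum(row[k] * w[k][c] for k in range(len(row))) + b[c]
                  for c in range(len(b))]
        top = max(logits)                       # subtract max for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def cross_entropy(probs, labels, train_idx):
    """Classification loss L_t, summed over the labeled training nodes only."""
    return -sum(math.log(probs[i][labels[i]]) for i in train_idx)


def total_loss(l_t, l_c, l_d, gamma=1.0, beta=1.0):
    """L = L_t + gamma * L_c + beta * L_d; gamma and beta are tuned hyperparameters."""
    return l_t + gamma * l_c + beta * l_d
```

With all-zero weights, every node gets uniform class probabilities, so two labeled nodes in a two-class problem contribute a loss of 2 ln 2. Unlabeled nodes still shape the embedding through the consistency and disparity terms, even though they do not appear in `train_idx`.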
To validate the effectiveness of the proposed DAMGNN, we evaluate it on node classification over five public datasets: Citeseer [10], UAI2010 [37], ACM [38], BlogCatalog [39], and Flickr [39].
Datasets and baselines
In this section, we introduce the datasets used for the node classification task and the baseline models used for comparison.
Datasets
As shown in Table 1, we select five public datasets to evaluate the DAMGNN model. We report basic statistics for each dataset and provide the download URLs of all datasets to ensure the reproducibility of the experiments.
The statistics of the datasets
All the above data sets can be downloaded from the following URLs:
We compare DAMGNN with state-of-the-art methods, including two network embedding methods and seven methods based on graph neural networks. We also provide the URLs of all baseline code repositories to ensure the reproducibility of the experiments.
Similarly, the baseline models mentioned above can be downloaded from the following URLs:
Hyperparameter settings of DAMGNN model
Parameter settings: To evaluate DAMGNN comprehensively, we use three label rates for the training set (20, 40, and 60 labeled nodes per class) and 1000 nodes as the test set. For DAMGNN, we train three DAGNN variants with different numbers of layers (
Learning of the main hyperparameters of the model
Since the node classification performance of DAMGNN is highly sensitive to changes in the hyperparameters
Analysis of parameter 
As Fig. 3 shows, as the number of propagation layers increases, the performance of DAMGNN typically first rises and then falls, and the curves for label rates 20, 40, and 60 show similar trends. The number of propagation layers at which the model peaks differs across label rates. This indicates that using an appropriate number of propagation layers in the topological space helps improve model performance. In the topological space, the larger the label rate, the fewer layers are needed for optimal performance. This suggests that as the label rate of a dataset increases, the receptive field each node needs for optimal performance becomes smaller.
Analysis of parameter 
Analysis of parameter 
The experimental results in Fig. 4 show that the effect of changes in the hyperparameter
Fig. 5 shows that the effect of changes in the hyperparameter
Results (%) of node classification tasks (bold: best result; underline: second best).
Table 3 presents the node classification results, where L/C denotes the number of labeled nodes per class in the training set. From these results, we draw the following conclusions:
(1) Compared with all baselines, DAMGNN achieves the best performance at every label rate on all five datasets. In terms of accuracy (ACC), it achieves a maximum relative improvement of 11.36% on BlogCatalog and a relative improvement of 6.69% on Flickr. These substantial gains over current mainstream methods confirm that the proposed improvements are effective. (2) On all datasets, the DAGNN variant ranks near the top among the compared models. This shows that its stronger ability to adaptively fuse node features and topology lets the model extract more useful information for node embeddings, yielding better embeddings and better downstream performance. (3) Compared with AMGCN, DAMGNN improves markedly on datasets with more informative feature graphs [21] (such as UAI2010, BlogCatalog, and Flickr). This means that DAMGNN extracts more useful information from the feature graph and uses it to generate better node embeddings.
Results of DAMGNN and its variants on the Citeseer and UAI2010 datasets.
In this section, we compare DAMGNN with its four variants on the Citeseer and UAI2010 datasets. The comparison verifies the effectiveness of the adaptive multi-channel framework and of using networks of different depths in different spaces.
T-DAGNN-V: a standalone DAGNN variant that learns node embeddings only in the topological space; its depth can be adjusted for each dataset.
F-DAGNN-V: a standalone DAGNN variant that learns node embeddings only in the feature space; its depth can also be adjusted for each dataset.
C-DAGNN-V: a standalone DAGNN variant that learns node embeddings only in the common space; its depth is likewise adjusted for each dataset.
DAMGNN-C: uses the corresponding DAGNN variants in the topological, feature, and common spaces to learn node embeddings, with all DAGNN variants sharing the same depth.
DAMGNN: uses the corresponding DAGNN variants in the topological, feature, and common spaces to learn node embeddings, with the depth of each DAGNN variant set independently.
Combining the results of Fig. 6 and Table 3, we draw the following conclusions. (1) On all datasets, DAMGNN outperforms the DAGNN variants, which demonstrates that the adaptive multi-channel framework effectively improves model performance. (2) On all datasets, DAMGNN generally outperforms DAMGNN-C. This indicates that using DAGNN variant modules with different numbers of layers in different spaces improves classification; in other words, propagating node information with an appropriate receptive field in each space is beneficial. (3) Fig. 6 and Table 3 also show that, even without the adaptive multi-channel framework, the DAGNN variant achieves the second-best results on most datasets. This indicates that appropriate receptive fields, decoupling representation transformation from propagation, and an appropriately sized node embedding space all contribute substantially to the model's classification ability.
Given GCN's insufficient ability to integrate node features and topology, we propose a DAGNN variant based on the DAGNN model and verify through a series of basic experiments that it fuses them better than GCN. On this basis, we replace GCN in the AMGCN model with the DAGNN variant to obtain the deep adaptive multi-channel graph neural network DAMGNN. DAMGNN learns better node embeddings by adjusting the local and global receptive fields of nodes, and it integrates node feature information and topological structure information with adaptively learned weights. Extensive experiments on real-world datasets show that DAMGNN outperforms current mainstream models.
