DAMGNN: Deep adaptive multi-channel graph neural networks

Abstract

Recent studies have reported that Graph Convolutional Networks (GCN) are deficient in integrating node features and topological structures in graphs. Although AMGCN compensates for these drawbacks to some extent, it cannot fundamentally resolve GCN's insufficient fusion ability. It is therefore essential to find a network component with stronger fusion ability to substitute for GCN. The Deep Adaptive Graph Neural Network (DAGNN) proposed by Liu et al. can adaptively aggregate information from different hops of neighborhoods, which remarkably benefits its fusion ability. To replace GCN with DAGNN in the AMGCN model and to further strengthen the fusion ability of DAGNN itself, we improve the DAGNN model to obtain a DAGNN variant. Experiments verify that the fusion ability of the DAGNN variant is far stronger than that of GCN. Building on this, we propose a Deep Adaptive Multi-channel Graph Neural Network (DAMGNN). Comparative experiments on multiple benchmark datasets show that the DAMGNN model can effectively extract relevant information from node features and topological structures for fusion, thus significantly improving the accuracy of node classification.

Keywords

Graph neural networks, network representation learning, deep learning, mathematics of computing, graph algorithms

1. Introduction

Graph data composed of entities and relationships are ubiquitous in the real world, for example in social networks, point clouds, citation networks, biological networks, knowledge graphs, and molecular structures. Recently, graph neural networks for graph representation learning have attracted rapidly growing interest, and many researchers aim to develop better graph neural network models. This has driven rapid development in the field, and graph neural networks have achieved great success in a host of applications, for example, node classification [1, 2, 3, 4, 5, 6, 7], graph classification [8, 9, 10, 11, 12, 13], and link prediction [14, 15, 16, 17, 18].

Typical Graph Convolutional Networks (GCN) [4] and their variants [3, 5, 6, 15, 19] usually adopt a neighborhood aggregation (or message passing) scheme that considers node features and topological structures at the same time to learn node representations. The key step is feature aggregation: in each convolutional layer, a node aggregates feature information from its topological neighbors. In this way, the feature information is propagated through the topology of the network into the node embeddings, which are then used for downstream tasks, and the whole process is supervised by the labels of a subset of the nodes. GCN thus provides a fusion strategy based on node features and topological structures and supervises the fusion through an end-to-end learning framework, which partly explains its enormous success.

However, quite a few recent studies have revealed drawbacks of GCN in integrating node features and topological structures. For example, Li et al. [20] point out that GCN actually performs Laplacian smoothing on node features, which makes the node embeddings across the whole network gradually converge. Once the node embeddings become too similar, different nodes can no longer be distinguished, and adding convolutional layers makes the embeddings increasingly similar. Therefore, GCN and its variants generally use a shallow architecture. With a shallow architecture, however, nodes can only receive information from nearby neighbors and not from a larger neighborhood, which degrades GCN performance. Experiments conducted by Zhu et al. [21] show that GCN has serious drawbacks in adaptively integrating node features and topological structures. Although they design a novel adaptive multi-channel framework to enhance the integration ability of GCN and propose the Adaptive Multi-channel Graph Convolutional Networks (AMGCN) based on it, AMGCN still uses the original GCN as its core component, so its performance cannot be substantially improved. As a result, it is necessary to find a network model that is better than GCN at integrating node features and topological structures in order to improve the performance of classification models.

Recently, Liu et al. [22] showed that when multiple GCN layers are stacked, the performance degradation of Graph Neural Networks (GNN) is caused not by over-smoothing [7, 20, 23] but by the entanglement of representation transformation and propagation. Based on this, Liu et al. proposed the Deep Adaptive Graph Neural Network (DAGNN), which decouples representation transformation from propagation and learns node representations from a larger receptive field without performance degradation. This raises a question: can the DAGNN model, with its ability to expand the receptive field of graph nodes and its decoupling mechanism, be used to design a model that fuses node features and topological structures better than GCN?

To answer this question, we design a DAGNN variant based on DAGNN and experimentally evaluate its ability to fuse node features and topological structures. The experimental results demonstrate that the DAGNN variant is far superior to GCN in this respect. On this basis, we replace GCN in AMGCN with the DAGNN variant and propose the Deep Adaptive Multi-channel Graph Neural Network (DAMGNN) for semi-supervised classification. The core idea of DAMGNN is to use the DAGNN variant module to learn node embeddings from the feature space, the topological space, and the common space at the same time, and to adopt the adaptive multi-channel framework to combine the embeddings learned in the different spaces into the final node embedding used for downstream tasks.

In summary, the contributions of this paper are as follows:

• Based on the DAGNN model, we propose a DAGNN variant model and show experimentally that the DAGNN variant is stronger than GCN at fusing node features and topological structures.
• We replace the GCN components in the AMGCN model with the DAGNN variant. Based on this, we propose a new graph neural network model, the deep adaptive multi-channel graph neural network (DAMGNN), for semi-supervised classification, which can adaptively extract useful information from different spaces and fuse it for classification tasks.
• Our extensive comparative experiments on a series of benchmark data sets verify that DAMGNN can extract highly relevant information from node features and topological structures for node classification tasks. Compared with current mainstream models, the classification accuracy of the DAMGNN model is significantly improved.

2. Related work

Graph neural network models can be divided into four main categories: Recurrent Graph Neural Networks (RecGNNs), which learn node representations with recurrent neural structures [24, 25]; graph convolutional neural networks, which extend convolution operations from grid data to graph data [26, 27, 28]; Graph Autoencoders (GAEs), which learn network embeddings and graph generative distributions [29, 30]; and Spatial-temporal Graph Neural Networks (STGNNs), which aim to learn hidden patterns from spatial-temporal graphs [31, 32]. Among them, graph convolutional neural networks process graph data efficiently and have therefore become increasingly widespread in recent years. Examples include Spectral CNN [25] based on the convolution theorem, MoNet [6] based on an aggregation function, R-GCNs [33] for modeling edge information, HA-GCN [10] for modeling high-order information, the deep graph convolutional neural network Co-Train [34], and the large-scale graph convolutional neural network GraphSAGE [5]. The main idea of a graph convolutional neural network is to generate the representation of a node by aggregating the feature information of the node itself and of its neighboring nodes. In essence, most graph convolutional neural network models learn node embeddings by integrating node features and topological structures. Recently, some studies have explored the fusion mechanism of graph convolutional neural networks. For instance, Li et al. [20] point out that GCN actually performs Laplacian smoothing on node features, and the works [35, 26] prove that the topological structure of the graph acts as a low-pass filter on the node features. However, [21] shows that the ability of GCN to extract relevant information from node features and topological structures and to fuse it adaptively is flawed. Although a novel adaptive multi-channel framework is developed in [21] to enhance this ability, it does not improve the GCN structure itself, so the performance of classification models based on GCN cannot be substantially improved. Moreover, to avoid the over-smoothing caused by stacking multiple graph convolutional layers, the above models adopt a shallow architecture, so that nodes only receive information from their close neighbors during aggregation. The DAGNN proposed in [22] decouples representation transformation from propagation so that nodes can receive information from multi-hop neighbors without over-smoothing. However, DAGNN only learns node embeddings on the topological graph, so the learned embeddings contain insufficient information. Therefore, it remains to be improved.

3. Fusion capability of DAGNN variant

In this section, we will elaborate on the design ideas of DAGNN variant and then illustrate how to explore the ability of DAGNN variant to adaptively fuse node features and topological structures through a series of simple yet intuitive experiments.

3.1 DAGNN variant module

The DAGNN model separates representation transformation from propagation, so that a larger receptive field can be applied without degrading performance. After detailed analysis, however, we found that the DAGNN model always sets the dimension of the node embedding to the number of node categories. If the dimension of the node embedding can instead be adjusted for each data set, the performance of the model can be improved. Therefore, in the DAGNN variant model, we treat the dimension of the node embedding vector as a hyperparameter, so that the embedding size can be tuned manually for different data sets. In addition, so that the generated node embeddings can be used in the subsequent adaptive multi-channel framework to further enhance the fusion capability of the DAGNN variant, we remove the final Softmax function of the DAGNN model. Combining these two improvements, we design a DAGNN variant model based on the DAGNN model, defined as follows:

$\displaystyle\begin{array}{ll}Z=\operatorname{MLP}(X)&\in\mathbb{R}^{n\times d}\\ H_{\ell}=\widehat{A}^{\ell}Z,\ \ell=1,2,\cdots,k&\in\mathbb{R}^{n\times d}\\ H=\operatorname{stack}\left(Z,H_{1},\cdots,H_{k}\right)&\in\mathbb{R}^{n\times(k+1)\times d}\\ S=\sigma(Hs)&\in\mathbb{R}^{n\times(k+1)\times 1}\\ \widetilde{S}=\operatorname{reshape}(S)&\in\mathbb{R}^{n\times 1\times(k+1)}\\ X_{\text{out}}=\operatorname{squeeze}(\widetilde{S}H)&\in\mathbb{R}^{n\times d}\end{array}$ (1)

Here $d$ is the hyperparameter giving the dimension of the node embedding vector, and $Z\in\mathbb{R}^{n\times d}$ is a new node feature matrix learned from the original node feature matrix through an MLP. Propagation uses the symmetrically normalized adjacency matrix $\widehat{A}=\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}$, where $\tilde{A}=A+I$ and $\tilde{D}$ is the degree matrix of $\tilde{A}$. $k$ is a hyperparameter representing the depth of the model. $s\in\mathbb{R}^{d\times 1}$ is a trainable projection vector, and $\sigma(\cdot)$ is an activation function, for which we use the sigmoid function. The stack, reshape, and squeeze operations rearrange the dimensions of the data so that they match during computation. The structure of the DAGNN variant model is shown in Fig. 1.
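To make the tensor shapes in Eq. (1) concrete, the forward pass can be sketched in NumPy as below. This is only an illustrative sketch: the two-layer ReLU MLP, the dense adjacency matrix, and the names `normalized_adjacency`, `dagnn_variant_forward`, `W1`, and `W2` are assumptions made for the example and are not specified in the text.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a dense adjacency matrix A."""
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)            # every degree is >= 1 due to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def dagnn_variant_forward(A, X, W1, W2, s, k):
    """Forward pass following Eq. (1).

    The MLP is assumed to be two linear layers with a ReLU in between;
    s is the trainable projection vector of shape (d, 1).
    """
    A_hat = normalized_adjacency(A)
    Z = np.maximum(X @ W1, 0.0) @ W2          # Z = MLP(X), shape (n, d)
    hops = [Z]
    for _ in range(k):                        # H_l = A_hat^l Z for l = 1..k
        hops.append(A_hat @ hops[-1])
    H = np.stack(hops, axis=1)                # shape (n, k+1, d)
    S = 1.0 / (1.0 + np.exp(-(H @ s)))        # sigmoid(H s), shape (n, k+1, 1)
    S_tilde = S.transpose(0, 2, 1)            # shape (n, 1, k+1)
    return np.squeeze(S_tilde @ H, axis=1)    # X_out, shape (n, d)
```

Because the output keeps dimension $d$ and no Softmax is applied, the result can be passed on as an embedding rather than as class probabilities, which is the property the variant relies on.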

Figure 1.

The framework of the DAGNN variant model. This diagram illustrates how the model generates an embedding representation for a node. The bold lowercase letters $x,z,h_{1},h_{2},\cdots,h_{k},x_{\textit{out}}$ denote embedding vectors, $s$ is the projection vector used to compute the retention scores for the different receptive fields, and $s_{0},s_{1},s_{2},\cdots,s_{k}$ are the retention scores of $z,h_{1},h_{2},\cdots,h_{k}$, respectively.

3.2 Exploratory experiment

Following [21], we conduct a series of comparative experiments to explore the ability of the DAGNN variant to adaptively fuse node features and topological structures. The main idea is to construct two networks: in the first, the node labels are highly correlated with the node features, while in the second, they are highly correlated with the topology. We test the adaptive fusion ability of the DAGNN variant on both networks. Finally, we repeat the test on three real data sets to further ensure the reliability of the conclusion. If the DAGNN variant has better adaptive fusion ability than GCN, it should outperform GCN in all of these experiments, which in turn suggests that a model based on the DAGNN variant should perform better than a model based on GCN.

3.2.1 Verification experiment of node features correlation

First, we generate a random network composed of 900 nodes, where the probability of establishing a connection between any two nodes is 0.03, and the feature of each node is represented by a 50-dimensional vector. To generate node features, we use 3 label categories and randomly assign one of the 3 labels to each node. For nodes with the same label, we use the same Gaussian distribution to generate node features. The Gaussian distributions of the three classes have the same covariance matrix but different means. Consequently, in this random network, node labels are highly correlated with the node features but not with the topological structure.
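The construction above can be sketched as follows. The text does not give the class means or the shared covariance, so the identity covariance and the randomly drawn means below are assumptions for illustration only, as is the function name `make_feature_correlated_network`.

```python
import numpy as np

def make_feature_correlated_network(n=900, p=0.03, dim=50, n_classes=3, seed=0):
    """Random graph whose labels depend on the features, not on the topology."""
    rng = np.random.default_rng(seed)
    # Undirected random graph: each node pair is connected with probability p.
    upper = np.triu(rng.random((n, n)) < p, k=1)
    A = (upper | upper.T).astype(float)
    # Labels are assigned independently of the edges.
    labels = rng.integers(0, n_classes, size=n)
    # Assumed: one random mean per class and a shared identity covariance.
    means = rng.normal(scale=2.0, size=(n_classes, dim))
    X = means[labels] + rng.normal(size=(n, dim))
    return A, X, labels
```

Since the edges are drawn without looking at the labels, any model that relies mainly on neighborhood averaging receives little label-relevant signal from the topology in this setting.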

In this experiment, we train the DAGNN variant and GCN on this network separately. As in [21], for each class we randomly select 20 nodes for training and 200 nodes for testing, and we tune the parameters of both models carefully. The classification accuracy of GCN is 80.5%, while that of the DAGNN variant is 98.0%. The accuracy of the DAGNN variant is thus significantly higher than that of GCN. The result meets our expectation and indicates that the DAGNN variant has a stronger ability than GCN to adaptively fuse node features and topological structures.

3.2.2 Verification experiment of topological structures correlation

Similarly, we generate another random network composed of 900 nodes. The feature of each node is represented by a 50-dimensional vector, and the feature vectors are all randomly generated. To generate a specific topological structure, we use the Stochastic Block Model (SBM) to divide the nodes into three clusters (nodes 0–299, 300–599, and 600–899), in which the probability of establishing a connection between nodes in the same cluster is 0.03, and the probability of establishing a connection between nodes in different clusters is 0.0015. In this random network, the label of a node is determined by the cluster to which it belongs. In other words, nodes in the same cluster have the same label, so the node labels are highly correlated with the topology.
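A minimal sketch of this SBM construction is given below. The distribution of the random features is not stated in the text, so standard normal features are an assumption, and `make_sbm_network` is a hypothetical helper name.

```python
import numpy as np

def make_sbm_network(sizes=(300, 300, 300), p_in=0.03, p_out=0.0015, dim=50, seed=0):
    """Stochastic block model whose labels are the cluster indices."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)   # nodes 0-299, 300-599, ...
    n = labels.size
    same_cluster = labels[:, None] == labels[None, :]
    probs = np.where(same_cluster, p_in, p_out)        # per-pair edge probability
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(float)
    # Assumed: features are pure noise, independent of the labels.
    X = rng.normal(size=(n, dim))
    return A, X, labels
```

Here the roles are reversed relative to the previous experiment: the features carry no label information, so a model must rely on the topology to classify nodes.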

In the same way, we apply the DAGNN variant and GCN to this random network, and the experimental results are in line with expectations. The classification accuracy of GCN is 86.5%, while that of the DAGNN variant is 95.0%. The DAGNN variant is again more accurate than GCN, which shows that it is superior to GCN in adaptively fusing node features and topological structures.

3.2.3 Verification experiment with real data set

In this experiment, we select three real data sets: Citeseer, BlogCatalog, and Flickr. For each data set, we use 30 nodes per class for training and 1000 nodes for testing. The experimental results are consistent with expectations. On Citeseer, the classification accuracy of GCN is 72.2% while that of the DAGNN variant is 74.6%; on BlogCatalog, the accuracy of GCN is 70.0% while that of the DAGNN variant is 91.5%; on Flickr, the accuracy of GCN is 42.2% while that of the DAGNN variant is 72.7%. The DAGNN variant is more accurate than GCN on all three real data sets, which indicates that its ability to adaptively fuse node features and topological structures noticeably surpasses that of GCN.

Summary: The above verification experiments demonstrate that the DAGNN variant is better than GCN at adaptively fusing node features and topological structures. Therefore, replacing the core GCN component of a classification model with the DAGNN variant is expected to further improve the classification performance of the model.

4. DAMGNN: Deep adaptive multi-channel graph neural network

Our work focuses on semi-supervised node classification in a graph $G=(A,X)$, where $A\in\mathbb{R}^{n\times n}$ is the symmetric adjacency matrix of $G$. $A_{ij}=1$ means that there is an edge connecting node $i$ and node $j$; otherwise $A_{ij}=0$. $X\in\mathbb{R}^{n\times d}$ is the node feature matrix of $G$, where each row vector $x_{i}$ of $X$ is the feature vector of node $i$, $n$ is the number of nodes in $G$, and $d$ is the dimension of the node features.

Combined with the latest research, we propose a deep adaptive multi-channel graph neural network model DAMGNN based on AMGCN model, and its structure is shown in Fig. 2.

Figure 2.

The framework of DAMGNN model. It can be seen that the DAMGNN model replaces the core component GCN in the AMGCN model with a DAGNN variant and extracts information from different spaces through the DAGNN variant to generate representation of node embedding, then combines them through the adaptive channel framework to become a final embedding, and uses the final embedding to predict the labels of nodes.

The design is as follows. First, we employ the KNN algorithm to generate a feature-based structure graph from the original data set. We then use this feature graph and the original structure graph to propagate the node features in the feature space and the topological space, and use the DAGNN variant module, with its stronger fusion ability, to learn the embeddings $Z_{F}$ and $Z_{T}$ of the nodes from the feature space and the topological space respectively. Since the feature space and the topological space may share common information, a DAGNN variant module with shared parameters is used to learn the embeddings $Z_{CF}$ and $Z_{CT}$ from the feature space and the topological space respectively, and $Z_{CT}$ and $Z_{CF}$ are then combined to generate the embedding $Z_{C}$ of the common space. Furthermore, we use the adaptive multi-channel framework to combine the learned embeddings $Z_{T}$, $Z_{F}$, and $Z_{C}$ to obtain the final node representation $Z$. In the adaptive multi-channel framework, the consistency loss $\mathcal{L}_{c}$ is used to improve the similarity between $Z_{CT}$ and $Z_{CF}$, and the disparity loss $\mathcal{L}_{d}$ is used to enhance the disparity between $Z_{T}$ and $Z_{CT}$ as well as between $Z_{F}$ and $Z_{CF}$. The DAMGNN model thus combines the DAGNN variant with an adaptive multi-channel mechanism, which further improves the model's overall ability to adaptively fuse node features and topological structures. In this way, the model can extract more relevant information from node features and topological structures to generate better embeddings for the final classification task.

4.1 Specific DAGNN variant modules

First, to learn the embedding of the nodes in the feature space, we construct a K-nearest neighbor graph $G_{F}=(A_{F},X)$ based on the node feature matrix $X$ obtained from the original data, where $A_{F}$ is the adjacency matrix of the K-nearest neighbor graph. To do so, we calculate the similarity matrix $S\in\mathbb{R}^{n\times n}$ of the $n$ nodes, where $S_{ij}$ is the cosine similarity between the feature vector $x_{i}$ of node $i$ and the feature vector $x_{j}$ of node $j$, as shown in Eq. (2).

$\displaystyle S_{ij}=\frac{x_{i}\cdot x_{j}}{\left|x_{i}\right|\cdot\left|x_{j}\right|}$ (2)

We use the similarity matrix $S$ to select the top $k$ most similar node pairs and connect them to obtain the adjacency matrix $A_{F}$ of the K-nearest neighbor graph.
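The construction of $A_{F}$ from Eq. (2) can be sketched as follows. This sketch assumes that the top $k$ neighbors are selected per node, as is usual for a K-nearest neighbor graph, that self-similarity is excluded, and that the result is symmetrized; these details and the name `knn_graph` are assumptions, not statements from the text.

```python
import numpy as np

def knn_graph(X, k):
    """Adjacency matrix of a cosine-similarity k-nearest-neighbor graph (k < n)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)          # guard against zero feature vectors
    S = Xn @ Xn.T                              # S_ij = cosine similarity, Eq. (2)
    np.fill_diagonal(S, -np.inf)               # a node is not its own neighbor
    # Indices of the k most similar nodes in each row (order within the k is arbitrary).
    idx = np.argpartition(-S, k - 1, axis=1)[:, :k]
    n = X.shape[0]
    A_F = np.zeros((n, n))
    A_F[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A_F, A_F.T)              # symmetrize the neighbor relation
```

For larger graphs, computing the dense $n\times n$ similarity matrix becomes the bottleneck, and a sparse or approximate nearest-neighbor search would be the natural replacement.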

We then input the feature graph ($A_{F},X$) of the feature space and use the DAGNN variant module to generate the specific embedding $Z_{F}$, which can be formulated as:

$\displaystyle Z_{F}=\text{Spec-DAGNN-Variants}(A_{F},X)$ (3)

Secondly, we input the topological graph ($A_{T},X$) of the topological space, where $A_{T}$ is the adjacency matrix of the original structure graph, and use the DAGNN variant module to generate the specific embedding $Z_{T}$, which can be formulated as:

$\displaystyle Z_{T}=\text{Spec-DAGNN-Variants}(A_{T},X)$ (4)

Through the above process, specific DAGNN variant modules can be used in different spaces to extract specific information and encode to generate different embedding representations.

4.2 Common DAGNN variant modules

In fact, the feature space and the topological space are correlated with each other. The category of a node may be related to information in the feature space, in the topological space, or in both. Therefore, we need to extract the information shared by the feature space and the topological space and encode it as a common embedding. We use a parameter-sharing DAGNN variant module to extract embeddings from the feature space and the topological space respectively, so that the extracted embeddings are similar to each other.

First, we use the DAGNN variant module to extract the embedding $Z_{CF}$ from the feature space ($A_{F},X$), which can be formulated as:

$\displaystyle Z_{CF}=\text{Com-DAGNN-Variants}(A_{F},X)$ (5)

Then, we use the same DAGNN variant module again to extract the embedding $Z_{CT}$ from the topological space ($A_{T},X$), which can be formulated as:

$\displaystyle Z_{CT}=\text{Com-DAGNN-Variants}(A_{T},X)$ (6)

The DAGNN variant modules in Eqs (5) and (6) must be the same module with the same parameters, so that parameter sharing ensures that the embeddings $Z_{CF}$ and $Z_{CT}$ extracted by the common DAGNN variant module are sufficiently similar. The final embedding $Z_{C}$ of the module can be formulated as:

$\displaystyle Z_{C}=(Z_{CF}+Z_{CT})/2$ (7)

4.3 Adaptive multi-channel framework

In this section, we describe in detail how the adaptive multi-channel framework of [21] is used to fuse the embeddings extracted from different spaces by the DAGNN variants and to generate and optimize the final embedding for downstream tasks.

4.3.1 Attention mechanism

Through the specific DAGNN variant module and the common DAGNN variant module, we get the specific embeddings $Z_{T}$ and $Z_{F}$ and the common embeddings $Z_{C}$ . Since the category of a node may be related to one or all of its embeddings, we adopt the attention mechanism $\textit{att}(Z_{T},Z_{F},Z_{C})$ to learn the importance of different embeddings, which can be expressed as:

$\displaystyle(\alpha_{t},\alpha_{f},\alpha_{c})=\textit{att}(Z_{T},Z_{F},Z_{C})$ (8)

Here $\alpha_{t}$, $\alpha_{f}$, $\alpha_{c}\in\mathbb{R}^{n\times 1}$ are the attention values of the $n$ nodes for the embeddings $Z_{T}$, $Z_{F}$, and $Z_{C}$ respectively. Let $\mathbf{z}_{T}^{i}\in\mathbb{R}^{1\times d}$ (i.e., the $i$-th row of $Z_{T}$) denote the embedding of node $i$ in $Z_{T}$. We apply a linear layer followed by a nonlinear function to convert $\mathbf{z}_{T}^{i}$ into an intermediate vector, and then multiply the intermediate vector by the shared attention vector $\mathbf{q}\in\mathbb{R}^{h\times 1}$ to obtain the attention value $\omega_{T}^{i}$:

$\displaystyle\omega_{T}^{i}=\mathbf{q}^{T}\cdot\tanh\left(\mathbf{W}\cdot\left(\mathbf{z}_{T}^{i}\right)^{T}+\mathbf{b}\right)$ (9)

Here $\mathbf{W}\in\mathbb{R}^{h\times d}$ is the weight matrix and $\mathbf{b}\in\mathbb{R}^{h\times 1}$ is the bias vector. For the embeddings $Z_{C}$ and $Z_{F}$, the attention values $\omega_{C}^{i}$ and $\omega_{F}^{i}$ are obtained in the same way, and $\omega_{T}^{i}$, $\omega_{C}^{i}$, $\omega_{F}^{i}$ are then normalized by the Softmax function to get the final attention values:

$\displaystyle\alpha_{T}^{i}=\operatorname{softmax}(\omega_{T}^{i})=\frac{\exp(\omega_{T}^{i})}{\exp(\omega_{T}^{i})+\exp(\omega_{C}^{i})+\exp(\omega_{F}^{i})}$ (10)

Similarly, $\alpha_{C}^{i}=\operatorname{softmax}(\omega_{C}^{i})$ and $\alpha_{F}^{i}=\operatorname{softmax}(\omega_{F}^{i})$. The more important an embedding is, the larger the corresponding attention value. For all $n$ nodes in the graph, we obtain the learned attention vectors $\alpha_{t}=[\alpha_{T}^{i}]$, $\alpha_{f}=[\alpha_{F}^{i}]$, $\alpha_{c}=[\alpha_{C}^{i}]\in\mathbb{R}^{n\times 1}$, and denote $\alpha_{T}=\textit{diag}(\alpha_{t})$, $\alpha_{C}=\textit{diag}(\alpha_{c})$, and $\alpha_{F}=\textit{diag}(\alpha_{f})$. Given the attention values of the different embeddings, the final embedding $\mathbf{Z}$ is calculated as follows:

$\displaystyle\mathbf{Z}=\alpha_{T}\cdot Z_{T}+\alpha_{C}\cdot Z_{C}+\alpha_{F}\cdot Z_{F}$ (11)
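Eqs (9) to (11) can be sketched together in NumPy as shown below. The function name `attention_fuse` and the choice to store $\mathbf{b}$ and $\mathbf{q}$ as one-dimensional arrays of length $h$ are assumptions made for the example; multiplying by $\textit{diag}(\alpha)$ is implemented as an equivalent row-wise scaling.

```python
import numpy as np

def attention_fuse(Z_T, Z_C, Z_F, W, b, q):
    """Fuse three (n, d) embeddings with node-wise attention.

    W has shape (h, d); b and q have shape (h,). W, b and q are shared
    across the three embeddings.
    """
    Zs = np.stack([Z_T, Z_C, Z_F], axis=0)             # (3, n, d)
    hidden = np.tanh(Zs @ W.T + b)                     # (3, n, h)
    omega = hidden @ q                                 # (3, n), Eq. (9)
    omega = omega - omega.max(axis=0, keepdims=True)   # numerical stability
    alpha = np.exp(omega) / np.exp(omega).sum(axis=0, keepdims=True)  # Eq. (10)
    Z = (alpha[:, :, None] * Zs).sum(axis=0)           # Eq. (11), shape (n, d)
    return Z, alpha
```

Each column of `alpha` holds the three weights of one node and sums to one, so every node forms its own convex combination of the topological, common, and feature embeddings.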

4.3.2 Loss of consistency

As to the embeddings $Z_{CF}$ and $Z_{CT}$ learned by the common DAGNN variant module in a common space, we utilize the consistency loss to further improve the similarity between $Z_{CF}$ and $Z_{CT}$ .

First, we apply L2 normalization to the embedding matrices $Z_{CF}$ and $Z_{CT}$ to obtain $\mathbf{Z}_{\textit{CFnor}}$ and $\mathbf{Z}_{\textit{CTnor}}$, and then use the two normalized embedding matrices to calculate the similarity matrices $S_{T}$ and $S_{F}$ of the $n$ nodes in the graph. $S_{T}$ and $S_{F}$ can be expressed as:

$\displaystyle\mathbf{S}_{T}=\mathbf{Z}_{\textit{CTnor}}\cdot\mathbf{Z}_{\textit{CTnor}}^{T},\quad\mathbf{S}_{F}=\mathbf{Z}_{\textit{CFnor}}\cdot\mathbf{Z}_{\textit{CFnor}}^{T}.$ (12)

Secondly, we use the Frobenius norm to measure the difference between $S_{T}$ and $S_{F}$, which gives the consistency loss $\mathcal{L}_{c}$ of $Z_{CF}$ and $Z_{CT}$. Minimizing $\mathcal{L}_{c}$ enhances the similarity between $Z_{CF}$ and $Z_{CT}$. The consistency loss $\mathcal{L}_{c}$ can be expressed as:

$\displaystyle\mathcal{L}_{c}=\|S_{T}-S_{F}{\|}_{F}^{2}$ (13)
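A minimal sketch of Eqs (12) and (13) follows. It assumes that the L2 normalization is applied row-wise, i.e. per node, and the small `eps` guard and the name `consistency_loss` are additions for the example.

```python
import numpy as np

def consistency_loss(Z_CT, Z_CF, eps=1e-12):
    """Squared Frobenius distance between the two node-similarity matrices."""
    def row_normalize(Z):
        # Assumed: each node embedding (row) is scaled to unit L2 norm.
        return Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), eps)

    Zt, Zf = row_normalize(Z_CT), row_normalize(Z_CF)
    S_T = Zt @ Zt.T                        # Eq. (12)
    S_F = Zf @ Zf.T
    return np.sum((S_T - S_F) ** 2)        # Eq. (13)
```

Note that the loss compares pairwise similarities rather than the embeddings themselves, so it is zero whenever the two embeddings induce the same similarity structure, for example when one is a rescaled copy of the other.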

4.3.3 Loss of disparity

For the embeddings $Z_{T}$ and $Z_{CT}$ learned by the specific DAGNN variant module and the common DAGNN variant module, we employ the Hilbert-Schmidt Independence Criterion (HSIC) [36] to enhance the disparity between $Z_{T}$ and $Z_{CT}$. HSIC is a simple but effective measure of independence that has been employed in many machine learning tasks. The HSIC between $Z_{T}$ and $Z_{CT}$ is defined as:

$\displaystyle\mathbf{HSIC}(\mathbf{Z}_{T},\mathbf{Z}_{CT})=(n-1)^{-2}\operatorname{tr}(\mathbf{R}\mathbf{K}_{T}\mathbf{R}\mathbf{K}_{CT})$ (14)

Here $\mathbf{K}_{T}$ and $\mathbf{K}_{CT}$ are the Gram matrices with entries $k_{T,ij}=k_{T}(Z_{T}^{i},Z_{T}^{j})$ and $k_{CT,ij}=k_{CT}(Z_{CT}^{i},Z_{CT}^{j})$ respectively. In addition, $\mathbf{R}=\mathbf{I}-\frac{1}{n}ee^{T}$, where $\mathbf{I}$ is the identity matrix and $e$ is an all-one column vector. In our calculation, we use the inner product kernel function for $\mathbf{K}_{T}$ and $\mathbf{K}_{CT}$.

Similarly, for embedding $Z_{F}$ and $Z_{CF}$ learned through specific DAGNN variant modules and common DAGNN variant modules, we also employ HSIC to enhance the disparity between $Z_{F}$ and $Z_{CF}$ . The process can be expressed as:

$\displaystyle\mathbf{HSIC}(\mathbf{Z}_{F},\mathbf{Z}_{CF})=(n-1)^{-2}\operatorname{tr}(\mathbf{R}\mathbf{K}_{F}\mathbf{R}\mathbf{K}_{CF})$ (15)

Combining the two terms above, which penalize the dependence between $Z_{T}$ and $Z_{CT}$ and between $Z_{F}$ and $Z_{CF}$, the disparity loss $\mathcal{L}_{d}$ can be expressed as:

$\displaystyle\mathcal{L}_{d}=\mathbf{HSIC}(Z_{T},Z_{CT})+\mathbf{HSIC}(Z_{F},Z_{CF})$ (16)
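Eqs (14) to (16) with the inner product kernel can be sketched as follows. The function names `hsic` and `disparity_loss` are chosen for the example; the dense $n\times n$ matrices are used only for clarity.

```python
import numpy as np

def hsic(Z1, Z2):
    """HSIC of two (n, d) embeddings with inner-product kernels, Eqs (14)-(15)."""
    n = Z1.shape[0]
    R = np.eye(n) - np.ones((n, n)) / n      # centering matrix I - (1/n) e e^T
    K1 = Z1 @ Z1.T                           # Gram matrix of Z1
    K2 = Z2 @ Z2.T                           # Gram matrix of Z2
    return np.trace(R @ K1 @ R @ K2) / (n - 1) ** 2

def disparity_loss(Z_T, Z_CT, Z_F, Z_CF):
    """Disparity loss of Eq. (16): the sum of the two HSIC terms."""
    return hsic(Z_T, Z_CT) + hsic(Z_F, Z_CF)
```

Minimizing this quantity pushes each specific embedding towards independence from the corresponding common embedding, which is how the two channels are encouraged to capture different information.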

4.4 Optimization goals

In semi-supervised classification tasks, we apply a linear transformation and the Softmax function to the final embedding $\mathbf{Z}$ obtained in Eq. (11) to predict the categories of all nodes in the graph. The prediction can be expressed as $\hat{\mathbf{Y}}=\left[\hat{y}_{ic}\right]\in\mathbb{R}^{n\times C}$, where $\hat{y}_{ic}$ is the probability that node $i$ belongs to category $c$ and $C$ is the number of categories. $\hat{\mathbf{Y}}$ is calculated as shown in Eq. (17):

$\displaystyle\hat{\mathbf{Y}}=\textit{softmax}(\mathbf{W}\cdot\mathbf{Z}+b)$ (17)

Here, $\operatorname{softmax}(x)=\frac{\exp(x)}{\sum_{c=1}^{C}\exp(x_{c})}$ is to normalize all categories.

Further, let $L$ denote the training set. For each $l\in L$, its true label is denoted by $Y_{l}$ and its predicted label by $\hat{Y}_{l}$. The classification loss over all nodes in the training set is the cross-entropy loss $\mathcal{L}_{t}$, which can be expressed as:

$\displaystyle\mathcal{L}_{t}=-\sum_{l\in L}\sum_{i=1}^{C}\mathbf{Y}_{li}\ln\hat{\mathbf{Y}}_{li}$ (18)

The final loss of the model is the sum of the losses in Eqs (13), (16), (18), and the final loss $\mathcal{L}$ can be expressed as:

$\displaystyle\mathcal{L}=\mathcal{L}_{t}+\gamma\mathcal{L}_{c}+\beta\mathcal{L}_{d}$ (19)

Where $\gamma$ and $\beta$ are hyperparameters of consistency loss and disparity loss. Under the supervision of the real label of the node in the data set, backpropagation can be used to optimize the model and learn better embeddings for node classification.
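The objective in Eqs (17) to (19) can be sketched as below. The sketch assumes that node embeddings are stored as rows, so the linear transformation is written as `Z @ W + b` with `W` of shape $(d, C)$; the function names and the small `eps` inside the logarithm are assumptions for the example. The values of $\mathcal{L}_{c}$ and $\mathcal{L}_{d}$ are passed in as precomputed numbers.

```python
import numpy as np

def softmax_rows(logits):
    """Row-wise softmax over the C categories."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(Y_hat, Y_onehot, train_idx, eps=1e-12):
    """Cross-entropy of Eq. (18), summed over the training nodes only."""
    return -np.sum(Y_onehot[train_idx] * np.log(Y_hat[train_idx] + eps))

def total_loss(Z, W, b, Y_onehot, train_idx, L_c, L_d, gamma, beta):
    """Final objective of Eq. (19)."""
    Y_hat = softmax_rows(Z @ W + b)                         # Eq. (17)
    L_t = cross_entropy(Y_hat, Y_onehot, train_idx)
    return L_t + gamma * L_c + beta * L_d
```

Only the labeled training nodes contribute to $\mathcal{L}_{t}$, whereas the consistency and disparity terms are computed over all nodes, which is what makes the setting semi-supervised.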

5. Experiments

To validate the effectiveness of the proposed DAMGNN, we evaluate it on node classification tasks using five public data sets: Citeseer [10], UAI2010 [37], ACM [38], BlogCatalog [39], and Flickr [39].

5.1 Datasets and baselines

In this section, we chiefly introduce the data sets used in the node classification task and the baseline models for experimental comparison.

5.1.1 Datasets

As shown in Table 1, we select a total of 5 public data sets to evaluate the DAMGNN model. We conduct preliminary analysis and statistics on each data set and provide all the data websites to ensure the reproducibility of the experiment.

Table 1
The statistics of the datasets

Dataset	Nodes	Edges	Classes	Features	Training	Test
Citeseer	3327	4732	6	3703	120/240/360	1000
UAI2010	3067	28311	19	4973	380/760/1140	1000
ACM	3025	13128	3	1870	60/120/180	1000
BlogCatalog	5196	171743	6	8189	120/240/360	1000
Flickr	7575	239738	9	12047	180/360/540	1000

• Citeseer: Citeseer is a citation network of academic papers, in which nodes represent papers and edges represent citation links. Node attributes are bag-of-words representations of the papers, and all nodes are divided into six areas.
• UAI2010: The UAI2010 data set has 3067 nodes and 28311 edges, and it has been used to test graph convolutional networks for community detection in [25].
• ACM: This network is extracted from the ACM data set. The nodes represent papers, and two nodes are connected by an edge if the corresponding papers share an author. All nodes are divided into 3 categories (database, wireless communication, data mining), and the node attributes are bag-of-words representations of the paper keywords.
• BlogCatalog: This data set is a large-scale social network built from blog sites. The nodes represent bloggers, the edges represent social relationships between them, the node attributes are composed of keywords in the blogger profiles, and the node labels are the topic categories provided by the bloggers. All nodes are divided into 6 classes.
• Flickr: Flickr is a hosting site for images and videos on which users interact through photo sharing. The data set is a social network in which nodes represent users and edges represent relationships between users. All nodes are divided into 9 categories according to the interests of the users.

All the above data sets can be downloaded from the following URLs:

• Citeseer: https://github.com/tkipf/pygcn
• UAI2010: http://linqs.umiacs.umd.edu/projects//projects/lbc/index.html
• ACM: https://github.com/Jhy1993/HAN
• BlogCatalog: https://github.com/mengzaiqiao/CAN
• Flickr: https://github.com/mengzaiqiao/CAN

5.1.2 Baselines

We compare the DAMGNN model with the most advanced methods currently available, including two network embedding methods and seven methods based on graph neural networks. In addition, we also provide the addresses of all the code websites to ensure the reproducibility of the experiment.

• DeepWalk [40] is a network embedding method that uses random walks to obtain context information and learns the embedding representation of the network through the Skip-gram algorithm.
• LINE [41] is a large-scale network embedding method that preserves the first-order and second-order proximity of the network. In this experiment, we use LINE (1st $+$ 2nd).
• Chebyshev [42] is a GCN-based method that utilizes Chebyshev filters.
• GCN [4] is a semi-supervised graph convolutional network model that learns node representations by aggregating information from neighbors.
• GAT [5] is a graph neural network model that uses the attention mechanism to aggregate node features.
• DEMO-Net [2] is a degree-specific graph neural network model for node classification.
• MixHop [1] is a GCN-based method that mixes the feature representations of higher-order neighbors in its graph convolutional layer.
• AMGCN [21] is a GCN-based method. It learns node embeddings from different spaces and adopts an adaptive multi-channel approach to combine the different embeddings of the same node for the final node classification task.
• DAGNN-variant is a graph neural network model that extracts information from neighborhoods of different orders to learn the embedding representation of each node.

Similarly, the baseline models mentioned above can be downloaded from the following URLs:

• DeepWalk, LINE: https://github.com/thunlp/OpenNE
• Chebyshev: https://github.com/tkipf/gcn
• GCN in Pytorch: https://github.com/tkipf/pygcn
• GAT in Pytorch: https://github.com/Diego999/pyGAT/
• DEMO-Net: https://github.com/jwu4sml/DEMO-Net
• MixHop: https://github.com/samihaija/mixhop
• AMGCN: https://github.com/zhumeiqiBUPT/AM-GCN

Table 2
Hyperparameter settings of DAMGNN model

Datasets	L/C	nhid1	nhid2	Dropout	$l_{1}$	$l_{2}$	$l_{3}$	lr	Weight-decay	Epoch	k	$\beta$	$\gamma$
Citeseer	20	768	256	0.85	16	16	20	0.0002	5e-5	100	7	5e-10	0.01
Citeseer	40	768	256	0.81	7	7	6	0.0002	5e-5	100	7	5e-10	0.001
Citeseer	60	768	256	0.81	7	4	5	0.0002	5e-4	100	7	5e-10	0.001
UAI2010	20	512	128	0.84	3	3	3	0.001	5e-4	200	5	1e-16	0.001
UAI2010	40	512	128	0.84	2	2	2	0.001	5e-4	100	5	1e-17	0.001
UAI2010	60	512	128	0.78	1	4	2	0.001	5e-4	100	4	1e-17	0.001
ACM	20	768	256	0.74	1	1	13	0.00025	5e-4	50	7	1e-11	0.001
ACM	40	768	256	0.74	13	14	13	0.0005	5e-4	50	5	1e-11	0.001
ACM	60	768	256	0.74	13	14	13	0.0005	5e-4	50	5	1e-11	0.001
BlogCatalog	20	512	128	0.86	1	2	4	0.0002	5e-11	100	5	5e-11	0.01
BlogCatalog	40	512	128	0.86	1	2	4	0.0002	5e-11	100	6	5e-11	0.005
BlogCatalog	60	512	128	0.84	1	2	4	0.0002	1e-5	100	5	5e-11	0.005
Flickr	20	512	128	0.83	3	17	15	0.0003	5e-4	50	9	1e-10	0.01
Flickr	40	512	128	0.7	11	4	11	0.0005	5e-4	40	9	1e-10	0.01
Flickr	60	512	128	0.7	3	4	11	0.0005	5e-4	50	9	1e-10	0.01

5.2 Experimental setup

Parameter settings: To evaluate the DAMGNN model comprehensively, we use three label rates for the training set (that is, 20, 40, and 60 labeled nodes per class) and 1000 nodes as the test set. For the DAMGNN model, we train three DAGNN variants at the same time, each with its own number of layers ( $l_{1}\in{\{1\dots 20\}}$ , $l_{2}\in{\{1\dots 20\}}$ , $l_{3}\in{\{1\dots 25\}}$ ). The three DAGNN variants have the same input dimension $(\textit{nhid}1\in\{512,768\})$ and output dimension $(\textit{nhid}2\in\{128,256\})$ . We use the Adam optimizer with a learning rate between 0.0002 and 0.001 and a dropout rate in the range $0.7\sim 0.9$ , and we select the weight decay from $\{5e-11,1e-5,5e-5,5e-4\}$ and the hyperparameter k $\in\{2\dots 10\}$ for the k-nearest neighbor graph. The coefficients of the consistency loss and the disparity loss are chosen from $\{0.01,0.001,0.005\}$ and $\{1e-17,1e-16,1e-11,5e-11,1e-10,5e-10\}$ , respectively. The results of DeepWalk, LINE, Chebyshev, GCN, GAT, DEMO-Net, MixHop, and AMGCN are taken from [21]. For DAGNN-variant and DAMGNN, we run each experiment 5 times and report the average, using accuracy (ACC) and macro F1 score (F1) to evaluate performance. All detailed parameters of the DAMGNN model are provided in Table 2 to ensure the reproducibility of the experiment.
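The evaluation protocol above (ACC and macro F1, averaged over 5 runs) can be sketched as follows. This is a minimal illustration in plain Python using the standard definitions of the two metrics; assigning an F1 of 0 to a class with no true or predicted members is an assumed convention, the helper names are hypothetical, and the labels are toy values rather than experimental data.

```python
def accuracy(y_true, y_pred):
    """Fraction of test nodes whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no support and no predictions scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


def mean_over_runs(values):
    """Average a metric over repeated runs (5 runs in the setup above)."""
    return sum(values) / len(values)


# Toy labels for a single run (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, num_classes=3)
```

Macro F1 weights every class equally regardless of its size, which is why it can differ noticeably from ACC on data sets with many classes, such as UAI2010.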

5.3 Learning of the main hyperparameters of the model

Since the performance of the DAMGNN model on the node classification task is extremely sensitive to the hyperparameters $l_{1}$ , $l_{2}$ , and $l_{3}$ , we explore how changes in these three hyperparameters affect model performance on the Citeseer training data set, in order to ensure that the performance of the DAMGNN model is stable.

The number of layers of DAGNN variant module in the topological space $l_{1}$ : we test the impact of the change in the number of propagation layers $l_{1}$ of a specific DAGNN module based on topological space in the DAMGNN model. The change of $l_{1}$ ranges from 1 to 20, and the result is shown in Fig. 3.

Figure 3.

Analysis of parameter $l_{1}$ .

As Fig. 3 illustrates, as the number of propagation layers increases, the performance of the DAMGNN model typically rises first and then drops, and the curves for the 20, 40, and 60 label rates show similar trends. The number of propagation layers at which the model reaches its best performance differs across label rates, which indicates that using an appropriate number of propagation layers in the topological space helps improve the performance of the model. In the topological space, the larger the label rate, the smaller the number of layers at which the model performs best; that is, as the label rate of the data set grows, the receptive field of each node at the optimum gradually becomes smaller.

The number of layers of DAGNN variant modules in the feature space $l_{2}$ : we test the effect of the change in the number of propagation layers $l_{2}$ of a specific DAGNN module based on feature space in the DAMGNN model. The change of $l_{2}$ ranges from 1 to 20, and the result is depicted in Fig. 4.

Figure 4.

Analysis of parameter $l_{2}$ .

Figure 5.

Analysis of parameter $l_{3}$ .

The experimental results in Fig. 4 show that the hyperparameter $l_{2}$ affects the performance of the DAMGNN model in much the same way as the hyperparameter $l_{1}$ . We can therefore draw similar conclusions: using an appropriate number of propagation layers in the feature space helps improve the performance of the model, and as the label rate of the data set increases, the receptive field of each node at the optimum gradually becomes smaller.

The number of layers of DAGNN variant module in the common space $l_{3}$ : we test the influence of the change in the number of propagation layers $l_{3}$ of a specific DAGNN module based on common space in the DAMGNN model. The change of $l_{3}$ ranges from 1 to 25, and the result is given in Fig. 5.

Fig. 5 shows that the effect of the hyperparameter $l_{3}$ on the performance of the DAMGNN model is similar to that of the hyperparameters $l_{1}$ and $l_{2}$ . In the same way, the results suggest that using an appropriate number of propagation layers in the common space helps improve the performance of the model, and that as the label rate of the data set rises, the receptive field of each node at the optimum gradually becomes smaller.

5.4 Analysis and comparison of node classification results

Table 3
Results (%) of node classification tasks. (the black font is the best result, underline is the second best)

Datasets	Metrics	L/C	DeepWalk	LINE	Chebyshev	GCN	GAT	DEMO-Net	MixHop	AMGCN	DAGNN-variant	DAMGNN
Citeseer	ACC	20	43.47	32.71	69.80	70.30	72.50	69.50	71.40	73.10	73.30	75.20
Citeseer	ACC	40	45.15	33.32	71.64	73.10	61.54	70.44	71.48	74.70	75.30	76.30
Citeseer	ACC	60	48.86	35.39	73.26	74.48	74.76	71.86	72.16	75.56	76.70	77.10
Citeseer	F1	20	38.09	31.75	65.92	67.50	68.14	67.84	66.96	68.42	69.77	69.93
Citeseer	F1	40	43.18	32.42	68.31	69.70	69.58	66.97	67.40	69.81	71.39	72.38
Citeseer	F1	60	48.01	34.37	70.31	71.24	71.60	68.22	69.31	70.92	73.32	73.92
UAI2010	ACC	20	42.02	43.47	50.02	49.88	56.92	23.45	61.56	70.10	71.50	73.60
UAI2010	ACC	40	51.26	45.37	58.18	51.80	63.74	30.29	65.05	73.14	74.60	77.70
UAI2010	ACC	60	54.37	51.05	59.82	54.40	68.44	34.11	67.66	74.40	75.60	78.30
UAI2010	F1	20	32.93	37.01	33.65	32.86	39.61	16.82	49.19	55.61	59.40	61.33
UAI2010	F1	40	46.01	39.62	38.80	33.80	45.08	26.36	53.86	64.88	60.00	66.71
UAI2010	F1	60	44.43	43.76	40.60	34.12	48.97	29.05	56.31	65.99	64.48	67.87
ACM	ACC	20	62.69	41.28	75.24	87.80	87.36	84.48	81.08	90.40	89.00	91.80
ACM	ACC	40	63.00	45.83	81.64	89.06	88.60	85.70	82.34	90.76	89.90	92.10
ACM	ACC	60	67.03	50.41	85.43	90.54	90.40	86.55	83.09	91.42	91.20	92.70
ACM	F1	20	62.11	40.12	74.86	87.82	87.44	84.16	81.40	90.43	88.98	91.79
ACM	F1	40	61.88	45.79	81.26	89.00	88.55	84.83	81.13	90.66	89.95	92.08
ACM	F1	60	66.99	49.92	85.26	90.49	90.39	84.05	82.24	91.36	91.19	92.68
BlogCatalog	ACC	20	38.67	58.75	38.08	69.84	64.08	54.19	65.46	81.98	91.00	91.30
BlogCatalog	ACC	40	50.80	61.12	56.28	71.28	67.40	63.47	71.66	84.94	92.90	93.10
BlogCatalog	ACC	60	55.02	64.53	70.06	72.66	69.95	76.81	77.44	87.30	93.70	94.10
BlogCatalog	F1	20	34.96	57.75	33.39	68.73	63.38	52.79	64.89	81.36	90.62	90.86
BlogCatalog	F1	40	48.61	60.72	53.86	70.71	66.39	63.09	70.84	84.32	92.65	92.83
BlogCatalog	F1	60	53.56	63.81	68.37	71.80	69.08	76.73	76.38	86.94	93.55	93.83
Flickr	ACC	20	24.33	33.25	23.26	41.42	38.52	34.89	39.56	75.26	67.60	80.30
Flickr	ACC	40	28.79	37.67	35.10	45.48	38.44	46.57	55.19	80.06	76.30	83.30
Flickr	ACC	60	30.10	38.54	41.70	47.96	38.96	57.30	64.96	82.10	81.60	86.60
Flickr	F1	20	21.33	31.19	21.27	39.95	37.00	33.53	40.13	74.63	66.95	80.43
Flickr	F1	40	26.90	37.12	33.53	43.27	36.94	45.23	56.25	79.36	75.72	83.04
Flickr	F1	60	27.28	37.77	40.17	46.58	37.35	56.49	65.73	81.81	81.46	86.55

The results of node classification tasks are presented in Table 3, where L/C represents the number of labeled nodes in each category of the training set. According to the experimental results, the following conclusions can be drawn:

• Compared with all baselines, the DAMGNN model achieves the best performance at every label rate on all 5 data sets. In terms of accuracy (ACC) in particular, the DAMGNN model achieves a maximum relative improvement of 11.36% on BlogCatalog and a relative improvement of 6.69% on Flickr. These results show that the proposed DAMGNN model improves substantially on the current mainstream methods, which also supports the validity of the proposed improvement.
• On all data sets, the DAGNN variant ranks among the best-performing models. This shows that its stronger ability to adaptively fuse node features and topology lets the model extract more useful information when generating node embeddings, which yields better node embeddings and improves performance on downstream tasks.
• Compared with AMGCN, DAMGNN improves markedly on data sets with better feature maps [21] (such as UAI2010, BlogCatalog, Flickr), which means that DAMGNN can extract more useful information from the feature map and use it to generate better node embeddings.
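The relative improvements quoted in the first conclusion can be checked against Table 3. The sketch below assumes that the reference baseline is AMGCN at L/C = 20, which is the pairing that reproduces the reported figures; the text does not state this pairing explicitly, and `relative_improvement` is a hypothetical helper.

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# ACC values from Table 3 at L/C = 20 (DAMGNN vs. AMGCN, assumed pairing).
blogcatalog_gain = relative_improvement(91.30, 81.98)
flickr_gain = relative_improvement(80.30, 75.26)

print(f"BlogCatalog: {blogcatalog_gain:.4f}%")
print(f"Flickr: {flickr_gain:.4f}%")
```

Under this assumption the BlogCatalog gain falls between 11.36% and 11.37% and the Flickr gain between 6.69% and 6.70%, which agrees with the quoted 11.36% and 6.69% if those figures were truncated rather than rounded.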

5.5 Ablation experiment

Figure 6.

Results of DAMGNN and its variants on the Citeseer and UAI2010 datasets.

In this section, we compare the results of DAMGNN and its four variants on the Citeseer and UAI2010 datasets to verify the effectiveness of using an adaptive multi-channel framework and of using networks of different depths in different spaces.

• T-DAGNN-V: A separate DAGNN variant model that only learns the embedding of nodes in the topological space. The model can adjust its depth to adapt to different data sets.
• F-DAGNN-V: A separate DAGNN variant model that only learns the embedding of nodes in the feature space. The model can also adjust its depth to adapt to different data sets.
• C-DAGNN-V: A separate DAGNN variant model that only learns the embedding of nodes in the common space. The model can also adjust its depth to adapt to different data sets.
• DAMGNN-C: This model uses the corresponding DAGNN variants in the topological space, feature space, and common space to learn node embeddings, and all DAGNN variants adopt the same model depth.
• DAMGNN: This model uses the corresponding DAGNN variants in the topological space, feature space, and common space to learn node embeddings, and the DAGNN variants adopt different model depths.

Combining the results of Fig. 6 and Table 3, the following conclusions can be drawn: (1) On all data sets, DAMGNN is superior to the DAGNN variants, which demonstrates that the adaptive multi-channel framework in the DAMGNN model effectively improves performance. (2) On all data sets, the results of DAMGNN are usually better than those of DAMGNN-C, indicating that using DAGNN variant modules with different numbers of layers in different spaces improves the classification ability of the model. This also means that propagating node information with an appropriate receptive field in each space benefits node classification. (3) Although the DAGNN variant does not adopt the adaptive multi-channel framework, it still achieves the second-best results on most data sets, indicating that adopting appropriate receptive fields, removing the entanglement of representation transformation and propagation, and adopting an appropriate node embedding space all help improve the classification ability of the model.

6. Conclusion

Given the insufficient ability of GCN to integrate node features and topology, we propose a new DAGNN variant model based on DAGNN model and verify that the fusion ability of DAGNN variants is stronger than GCN through a series of basic experiments. On this basis, we replace the GCN in the AMGCN model with the DAGNN variant and obtain a deep adaptive multi-channel graph neural network model DAMGNN. This model can learn better node embedding by adjusting the local and global receptive fields of nodes, and it can integrate node feature information and topological structure information to learn appropriate weights. A large number of experimental results on real-world data sets indicate that the DAMGNN model has superior performance compared with the current mainstream models.

References

1. Abu-El-Haija, Perozzi, Kapoor, Alipourfard, Lerman, Harutyunyan, Ver Steeg and Galstyan, Mixhop: Higher-order graph convolutional architectures via sparsified neighborhood mixing, in: International Conference on Machine Learning, PMLR, 2019, pp. 21–29.
2. Xu, DEMO-Net: Degree-specific graph neural networks for node and graph classification, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 406–415.
3. Hamilton W.L., Ying and Leskovec, Inductive representation learning on large graphs, arXiv preprint arXiv:1706.02216, 2017.
4. Kipf T.N. and Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907, 2016.
5. Veličković, Cucurull, Casanova, Romero, Lio and Bengio, Graph attention networks, arXiv preprint arXiv:1710.10903, 2017.
6. Souza, Zhang, Fifty and Weinberger, Simplifying graph convolutional networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 6861–6871.
7. Tian, Sonobe, Kawarabayashi K.-i. and Jegelka, Representation learning on graphs with jumping knowledge networks, in: International Conference on Machine Learning, PMLR, 2018, pp. 5453–5462.
8. Gao and Ji, Graph u-nets, in: International Conference on Machine Learning, PMLR, 2019, pp. 2083–2092.
9. Zhang, Cui, Neumann and Chen, An end-to-end deep learning architecture for graph classification, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32, 2018.
10. Lee and Kang, Self-attention graph pooling, in: International Conference on Machine Learning, PMLR, 2019, pp. 3734–3743.
11. Wang, Aggarwal C.C. and Tang, Graph convolutional networks with eigenpooling, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 723–731.
12. Leskovec and Jegelka, How powerful are graph neural networks, arXiv preprint arXiv:1810.00826, 2018.
13. Yuan and Ji, Structpool: Structured graph pooling via conditional random fields, in: Proceedings of the 8th International Conference on Learning Representations, 2020.
14. Kipf T.N. and Welling, Variational graph auto-encoders, arXiv preprint arXiv:1611.07308, 2016.
15. You, Ying and Leskovec, Position-aware graph neural networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 7134–7143.
16. Cai and Ji, A multi-scale approach for graph link prediction, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 3308–3315.
17. Zhang and Chen, Weisfeiler-lehman neural machine for link prediction, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 575–583.
18. Zhang and Chen, Link prediction based on graph neural networks, arXiv preprint arXiv:1802.09691, 2018.
19. Cui, Kuang, Wang and Zhu, Disentangled graph convolutional networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 4212–4221.
20. Han and Wu X.-M., Deeper insights into graph convolutional networks for semi-supervised learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32, 2018.
21. Wang, Zhu, Cui, Shi and Pei, Am-gcn: Adaptive multi-channel graph convolutional networks, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 1243–1253.
22. Liu, Gao and Ji, Towards deeper graph neural networks, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 338–348.
23. Chen, Lin, Zhou and Sun, Measuring and relieving the over-smoothing problem for graph neural networks from the topological view, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 3438–3445.
24. Scarselli, Gori, Tsoi A.C., Hagenbuchner and Monfardini, The graph neural network model, IEEE Transactions on Neural Networks 20(1) (2008), 61–80.
25. Dai, Kozareva, Dai, Smola and Song, Learning steady-states of iterative algorithms over graphs, in: International Conference on Machine Learning, PMLR, 2018, pp. 1106–1114.
26. Bruna, Zaremba, Szlam and LeCun, Spectral networks and locally connected networks on graphs, arXiv preprint arXiv:1312.6203, 2013.
27. Micheli, Neural network for graphs: A contextual constructive approach, IEEE Transactions on Neural Networks 20(3) (2009), 498–511.
28. Chiang W.-L., Liu, Bengio and Hsieh C.-J., Cluster-gcn: An efficient algorithm for training deep and large graph convolutional networks, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 257–266.
29. Zheng, Cheng, Aggarwal C.C., Song, Zong, Chen and Wang, Learning deep network representations with adversarially regularized autoencoders, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 2663–2671.
30. Chen and Xiao, Constrained generation of semantically valid graphs via regularizing variational autoencoders, arXiv preprint arXiv:1809.02630, 2018.
31. Seo, Defferrard, Vandergheynst and Bresson, Structured sequence modeling with graph convolutional recurrent networks, in: International Conference on Neural Information Processing, Springer, 2018, pp. 362–373.
32. Guo, Lin, Feng, Song and Wan, Attention based spatial-temporal graph convolutional networks for traffic flow forecasting, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, 2019, pp. 922–929.
33. Feng, You, Zhang and Gao, Hypergraph neural networks, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, 2019, pp. 3558–3565.
34. Gong and Cheng, Exploiting edge features for graph neural networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9211–9219.
35. Bojchevski and Günnemann, Deep gaussian embedding of graphs: Unsupervised inductive learning via ranking, arXiv preprint arXiv:1707.03815, 2017.
36. Song, Smola, Gretton, Borgwardt K.M. and Bedo, Supervised feature selection via dependence estimation, in: Proceedings of the 24th International Conference on Machine Learning, 2007, pp. 823–830.
37. Wang, Liu, Jiao, Chen and Jin, A Unified Weakly Supervised Framework for Community Detection and Semantic Matching, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2018, pp. 218–230.
38. Wang, Shi, Wang, Cui and Yu P.S., Heterogeneous graph attention network, in: The World Wide Web Conference, 2019, pp. 2022–2032.
39. Meng, Liang, Bao and Zhang, Co-embedding attributed networks, in: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 2019, pp. 393–401.
40. Perozzi, Al-Rfou and Skiena, Deepwalk: Online learning of social representations, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 701–710.
41. Tang, Wang, Zhang, Yan and Mei, Line: Large-scale information network embedding, in: Proceedings of the 24th International Conference on World Wide Web, 2015, pp. 1067–1077.
42. Defferrard, Bresson and Vandergheynst, Convolutional neural networks on graphs with fast localized spectral filtering, arXiv preprint arXiv:1606.09375, 2016.
