Abstract
Document-level relation extraction aims to uncover relations between entities by harnessing the intricate information spread throughout a document. Previous research involved constructing discrete syntactic matrices to capture syntactic relationships within documents. However, these methods are significantly influenced by dependency parsing errors, leaving much of the latent syntactic information untapped. Moreover, prior research has mainly focused on modeling two-hop reasoning between entity pairs, which has limited applicability in scenarios requiring multi-hop reasoning. To tackle these challenges, a syntax-enhanced multi-hop reasoning network (SEMHRN) is proposed. Specifically, the approach begins by using a dependency probability matrix that incorporates richer grammatical information instead of a sparse syntactic parsing matrix to build the syntactic graph. This effectively reduces syntactic parsing errors and enhances the model’s robustness. To fully leverage dependency information, dependency-type-aware attention is introduced to refine edge weights based on connecting edge types. Additionally, a part-of-speech prediction task is included to regularize word embeddings. Unrelated entity pairs can disrupt the model’s focus, reducing its efficiency. To concentrate the model’s attention on related entity pairs, these related pairs are extracted, and a multi-hop reasoning graph attention network is employed to capture the multi-hop dependencies among them. Experimental results on three public document-level relation extraction datasets validate that SEMHRN achieves a competitive F1 score compared to the current state-of-the-art methods.
Introduction
Relation extraction automatically analyzes unstructured natural language text to extract structured information about relationships between entities. Applied to large-scale text, it supports more accurate information processing and decision-making in applications such as information retrieval, intelligent question answering, and conversational agents [1, 2]. In scientific research, relation extraction is also valuable for integrating and analyzing extensive literature, which can drive scientific discovery. Progress on this task therefore advances natural language processing and lays the groundwork for more intelligent technologies that better meet human needs. Early efforts [3, 4] in relation extraction primarily focused on the sentence level. However, in practice, relations between entities are often expressed across multiple sentences. Therefore, researchers [5–7] have turned to document-level relation extraction.
Identifying relationships between entities throughout the entire document requires the model to possess strong reasoning abilities. Figure 1 illustrates an example taken from the DocRED [8] dataset, which focuses on document-level relation extraction. Determining the relationship between the entities “Washington County” and “Oregon state” is challenging. It requires multi-hop reasoning. Specifically, the relation between the entities “Washington County” and the “United States” needs to be deduced first, along with the relation between the entities “Oregon state” and the “United States”. Only then can the relation between the entities “Washington County” and “Oregon state” be inferred. In this case, the entities “Portland Golf Club” and the “United States” serve as intermediary entities, establishing a logical path connecting “Washington County” to “Oregon state”. By scrutinizing the reasoning process depicted in Fig. 1, it is evident that document-level relation extraction involves handling numerous entities, a variety of relationship types, and complex contextual information.

An example from the DocRED dataset includes 6 entities, 7 statements, and 10 relations. Different colors represent various entities. The first sentence is parsed, and the part-of-speech tagging of each word, along with the dependency type of important words, is presented. The process of reasoning about the relation between the subject “Washington County” and the object “Oregon” is visualized.
To address the aforementioned challenges, earlier models [9–11] of document-level relation extraction utilized neural architectures such as convolutional neural networks (CNN) [12] and recurrent neural networks (RNN) [13] to perceive contextual information. These approaches face limitations related to the encoder’s contextual modeling capability, making it difficult for them to effectively capture global dependencies. Recently, graph neural networks (GNN) [14, 15] have been widely applied in the field of document-level relation extraction due to their capability to capture long-distance contextual dependencies. These methods primarily utilize heuristic rules [16–18], discourse relations [19], syntactic dependency information [20], or structural attention [21] to construct a document graph. Subsequently, GNNs are employed to perform multi-hop reasoning.
Nonetheless, current GNN-based models exhibit three evident limitations: i) Although syntactic dependency information helps establish semantic connections between words within a sentence, previous methods relied heavily on the final discrete output of syntactic parsers. This reliance allowed parsing noise to accumulate in document-level relation extraction systems, undermining model performance. ii) Previous approaches have not fully utilized syntactic dependency information such as dependency edges and part-of-speech information. These additional syntactic details can also improve the identification of relationships. As depicted in Fig. 1, when judging the relation between the entities “United States” and “Portland” as “country”, one can promptly infer that the relation between the entities “United States” and “Oregon” is also “country”. This inference is based on the appositive dependency relation between the entities “Portland” and “Oregon”. iii) Previous methods did not explicitly model multi-hop reasoning between entity pairs, which makes such reasoning inefficient. In their approach, when inferring two-hop relations, each entity pair needs to consider $2(n-1)$ entity pairs that overlap with it. However, when inferring three-hop relations, each entity pair needs to consider $2^{2}(n-1)^{2}$ entity pairs. When inferring n-hop relations, each entity pair needs to identify entity pairs with dependencies from $2^{(n-1)}(n-1)^{(n-1)}$ possible entity pairs, and most of these entity pairs do not contribute to identifying the relations of the target entity pair. Therefore, their method operates effectively mainly on two-hop relations and faces challenges in modeling multi-hop relations.
In this paper, a syntax-enhanced multi-hop reasoning network (SEMHRN) is proposed to address the aforementioned limitations. To overcome the first limitation, the probability matrix of all dependency arcs from a dependency parser is employed to construct a syntax graph. This approach is motivated by the fact that the probability matrix, representing relations between words’ dependencies, has more extensive syntactic information compared to the dependency parser’s final discrete output. To address the second limitation, a novel dependency-type-aware attention mechanism is designed to learn the interaction between words of syntactic dependency edges with different labels. Additionally, a part-of-speech prediction task is introduced to regularize word representation. The third limitation is tackled through a novel two-stage design. In the first stage, only related entity pairs are extracted. In the second stage, multi-hop dependency relations between these related entity pairs are captured, thereby mitigating interference from noisy entity pairs.
In summary, the main contributions are as follows: (i) A syntax-enhanced multi-hop reasoning network (SEMHRN) is proposed. The network constructs a syntactic graph using the probability matrix of all dependency arcs, providing fine-grained modeling of information in the document and effectively alleviating interference from syntactic parsing errors. (ii) A novel dependency-type-aware attention is proposed to guide attention in learning interactive patterns of different types of dependency edges, and a part-of-speech prediction task is introduced to regularize word representations. This fully leverages syntactic information, enhancing the model’s performance. (iii) A two-stage relation modeling method is designed. By extracting related entity pairs in the first stage and modeling multi-hop interactions between these pairs in the second stage, the model’s ability to capture long-range dependencies is improved. (iv) The experimental results demonstrate that SEMHRN outperforms the compared methods by a noticeable margin. Additionally, an extensive series of experiments is carried out to analyze and discuss each critical component of SEMHRN. The experimental results show the ability of SEMHRN to capture multi-hop dependencies.
Sentence-level relation extraction
The purpose of relation extraction is to identify connections between entities within a given text. In the early stages, many efforts [3, 4] to identify entity relationships within a single sentence relied on statistical machine learning, where performance was significantly constrained by the quality of manually extracted features, inevitably leading to error propagation within the model. With the rise of neural networks, an increasing number of researchers have turned to using neural networks for automatic feature extraction. For example, Wang et al. [22] proposed a multi-level attention CNN that leverages entity-specific attention at the input layer and relationship-specific pooling attention at the pooling layer to identify patterns within heterogeneous contexts.
However, many relations between entities require multiple sentences to be fully expressed. Therefore, in recent years, there has been a growing emphasis on document-level relation extraction [5, 7].
Document-level relation extraction
Methods for document-level relation extraction mainly fall into sequence-based methods and graph-based methods. To emphasize the distinctions and connections among these methods, the key techniques employed by them are summarized, as shown in Table 1.
Key techniques of existing document-level relation extraction methods
Sequence-based methods in the realm of relation extraction often rely on neural structures or pretrained models for encoding entire documents. Wang et al. [23] divided document-level relation extraction into two distinct stages: in the first stage, they determine whether a relationship between entity pairs exists, and in the second stage, they infer specific relationships between the entity pairs. To enhance the model’s co-referential reasoning ability, Ye et al. [24] introduced mention reference prediction and masked language modeling tasks, explicitly aiming to learn co-referential information. Eberts et al. [25] presented a joint model that integrates multi-task and multi-instance learning strategies to capture essential information from comprehensive textual data. Zhou et al. [16] suggested a localized context pooling technique for addressing multi-entity issues and an adaptive thresholding technique for handling multi-label problems. The adaptive thresholding employs a specialized class, “TH”, as the adaptive threshold value for each entity pair, while the localized context pooling incorporates multi-head attention to gather critical information for each entity pair. Li et al. [27] posited that local and global reasoning patterns play different roles in relation extraction, proposing a novel mention-based reasoning module to learn local and global document features. Zhang et al. [28] likened the relation extraction task to semantic segmentation, employing a U-shaped segmentation module to capture global information between entity pairs. Tan et al. [29] leveraged axial attention to strengthen interactions between two-hop entity pairs and used knowledge distillation to glean extensive information from a distantly supervised dataset. Ma et al. [30] employed evidence information as a supervisory signal, thereby guiding the attention module to allocate high weights to the evidence. Tang et al. [31] devised a hierarchical reasoning model that integrates entity-level, sentence-level, and document-level information. Xu et al. [32] introduced a Transformer-based SSAN, incorporating unique entity structural dependencies into the self-attention mechanism and employing two transformation modules for attention flow regulation. Huang et al. [33] believed that relation extraction and evidence prediction were highly correlated, so they proposed the E2GRE model to jointly learn both tasks. Han et al. [34] emphasized the guiding role of co-occurrence correlations of relations in aiding the classifier to identify semantically similar entity pairs. Two granularities of co-occurrence prediction tasks were introduced to capture relation-related information. Du et al. [35] increased tail relation entity pairs to tackle the long-tailed distribution issue and utilized a contrastive learning method to improve the model’s performance. Yuan et al. [36] employed a prompt tuning strategy to manage long texts and adopted a semantic segmentation method to capture entity interaction.
While these sequence-based methods have achieved commendable performance, their inherent limitations become evident when it comes to capturing non-local syntactic information, which is indispensable for a more comprehensive and nuanced comprehension of the text.
Graph-based methods
Graph-based methods utilize semantic information within the document to construct document-level graphs and employ graph convolutional networks to capture non-local dependencies. Wang et al. [37] devised a global representation layer to simulate interactions among numerous hierarchical nodes and a local representation layer to aggregate multiple mentions pertaining to the same entity. Christopoulou et al. [38] introduced an edge-oriented graph neural model to capture inference information from various node and edge types. Guo [39] designed the AGGCN model, leveraging an attention-guided approach to capture valuable syntactic information while minimizing noise. Xu et al. [40] developed a reconstructor that utilizes the graph representation to reconstruct ground-truth path dependencies. Sahu et al. [20] introduced GCNN to learn both local and global dependencies, along with applying multiple-instance learning to aggregate multiple mention-level pairs. Nan et al. [21] employed structured attention to construct a document graph, which was then processed by a multi-layer graph convolutional network to capture underlying structural information. Zeng et al. [16] introduced a mention-level graph to model complex interactions among mentions and an entity-level graph to construct two-hop reasoning paths between entities. Zeng et al. [41] introduced a sentence-level encoder to collect intra-sentence dependencies and a mention-level graph to obtain inter-sentence structural information. Zeng et al. [42] designed a context-guided attention mechanism to dynamically aggregate content mentioning the same entity through weighted summation, as well as an inter-pair reasoning approach to model interactions between entity pairs. Wang et al. [19] utilized discourse information to construct a document-level graph for capturing semantic dependencies between text units. Liu et al. [43] developed a document-level graph aimed at capturing comprehensive global dependency information. Simultaneously, they established an entity-level graph to facilitate reasoning about inter-sentence entities. Wan [17] reconstructed the document by region and introduced bridge entities to construct a dependency structure, aiming to improve the efficiency of relation extraction.
While the graph-based methods mentioned above have driven the development of document-level relation extraction, most of them have overlooked the modeling of multi-hop reasoning among entity pairs within the document. To tackle these issues, a syntax-enhanced multi-hop reasoning network is designed to capture remote dependencies between entity pairs. As shown in Table 1, this model incorporates four key techniques. Firstly, a syntactic graph is constructed using the probability matrix of dependency arcs output by the dependency parser, rather than the dependency tree, as the probability matrix of dependency arcs provides richer syntactic information. Secondly, a novel dependency-type-aware attention mechanism is designed to learn the interaction strength between words for different types of dependency edges. Thirdly, a part-of-speech prediction subtask is introduced to regularize word representations, motivated by the intuition that part-of-speech information aids in predicting relations between entities. Lastly, a two-stage multi-hop reasoning framework is proposed. In the first stage, entity pairs with relations are extracted, and in the second stage, multi-hop reasoning between these related entity pairs is modeled to identify specific relation types.
Method
Task formulation
Document-level relation extraction is a multi-classification task aimed at identifying relations between entities within the scope of a document.
Model overview
In this section, the proposed model SEMHRN, which combines syntax enhancement and multi-hop reasoning to improve document-level relation extraction, is described. As illustrated in Fig. 2, SEMHRN mainly consists of five parts, namely (i) the context encoder; (ii) the syntax-enhanced graph convolutional network; (iii) the multi-hop reasoning graph convolutional network; (iv) the part-of-speech prediction; and (v) the final classifier. Of these five components, the context encoder and the final classifier largely follow the settings of Zhou et al. [16], while the remaining parts introduce new approaches.

The architecture of SEMHRN incorporates several key components to enhance document-level relation extraction.
Pre-trained language models perform well on various downstream tasks. A document D can be encoded as:
Following the approach of Zhou et al. [16], the hidden representation of the first word within a mention is employed to represent the entire mention.
Previous methods employed syntax trees to construct document dependency graphs and fed them into GCN [14] to capture syntax information through multiple iterations. Typically, syntax parsers prioritize the dependency with the highest probability, often neglecting others. However, when the probabilities of two dependencies are similar, ignoring one can result in issues. This rough estimation contributes to syntax parsing errors, which, in turn, hinder model performance.
To address these challenges, the dependency probability matrix generated by an external parser is utilized to construct the document graph. Unlike the sparse dependency matrix, the dependency probability matrix defines the probability of dependency between words within a sentence, providing richer syntactic information and mitigating parsing errors. Furthermore, entity type information proves valuable in identifying relations between entities. Six entity type nodes, namely ORG-node, LOC-node, TIME-node, PER-node, MISC-node, and NUM-node, are introduced into the graph to establish semantic relations with entities. To acquire entity type knowledge, these six entity type nodes are set to be globally shared across all sentences in the dataset. In essence, entity type nodes begin with random initialization and refine their representations through interactions with various entity words.
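The contrast between a discrete parse and the dependency probability matrix can be illustrated with a minimal Python sketch. The function names, the toy probabilities, and the pruning threshold are illustrative assumptions and not part of SEMHRN; `arc_probs[i][j]` is assumed to hold the parser's probability that word j is the head of word i.

```python
def hard_adjacency(arc_probs):
    """Discrete parse: keep only the most probable head of each word."""
    n = len(arc_probs)
    adj = [[0.0] * n for _ in range(n)]
    for dep, row in enumerate(arc_probs):
        head = max(range(n), key=lambda j: row[j])
        adj[dep][head] = 1.0
    return adj


def soft_adjacency(arc_probs, threshold=0.0):
    """Soft graph: use the arc probabilities themselves as edge weights.

    `threshold` is a hypothetical pruning knob for very unlikely arcs.
    """
    return [[p if p > threshold else 0.0 for p in row] for row in arc_probs]


# Toy 3-word sentence; word 2 has two nearly tied candidate heads.
arc_probs = [
    [0.00, 0.90, 0.10],
    [0.30, 0.00, 0.70],
    [0.48, 0.52, 0.00],
]

hard = hard_adjacency(arc_probs)
soft = soft_adjacency(arc_probs)
print(hard[2])  # only the 0.52 arc survives, as a hard edge
print(soft[2])  # both candidate heads are kept with their probabilities
```

In the hard graph, the 0.48 arc of word 2 is discarded entirely, whereas the soft graph keeps both candidate heads, which is the situation described above where two dependencies have similar probabilities.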
Six unique types of edges are defined for constructing the document-level graph A. The detailed descriptions of these six types of edges are provided below:
The document-level graph construction process is illustrated in Fig. 3.

Construction of the document-level graph. The document-level graph comprises four types of nodes, namely token nodes, mention nodes, entity type nodes, and document nodes. The document-level graph encompasses six types of edges, including Syntactic dependency edges, Mention-token edges, Document-mention edges, Mention-Mention edges, Mention-type edges, and Self-node edges (not shown in the graph).
Following that, a GCN is employed on the document-level graph to capture global structural information. For a given node i, its representation in the (l + 1)-th layer is computed based on the representations of its neighboring nodes in the l-th layer. The graph convolutional operation can be expressed as follows:
To incorporate more dependency knowledge, dependency-type-aware attention is introduced in the multi-layer GCN. Consequently, Equation (2) is rewritten as follows:
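As a rough illustration of this layer-wise update, the sketch below assumes the common form in which a node sums the weighted states of its neighbors, applies a linear map, and passes the result through ReLU. The exact normalization and parameterization used in SEMHRN may differ; all names here are hypothetical.

```python
def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def gcn_layer(adj, states, weight, bias):
    """One graph convolution step: aggregate neighbors, transform, ReLU.

    adj[i][j] is the (possibly fractional) edge weight from node j to node i.
    """
    dim = len(states[0])
    new_states = []
    for i in range(len(states)):
        agg = [0.0] * dim
        for j, a in enumerate(adj[i]):
            if a:
                agg = [s + a * h for s, h in zip(agg, states[j])]
        z = matvec(weight, agg)
        new_states.append([max(0.0, zi + bi) for zi, bi in zip(z, bias)])
    return new_states


adj = [[1, 1], [0, 1]]              # node 0 sees itself and node 1
states = [[1.0, 0.0], [0.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = gcn_layer(adj, states, identity, [0.0, 0.0])
print(out)
```

Because the edge weights enter the aggregation multiplicatively, the same function works unchanged when `adj` holds dependency probabilities rather than 0/1 entries.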
The normalized attention coefficient between node i and node j is expressed as:
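One plausible way to make attention sensitive to dependency types is sketched below. It assumes that an embedding of the edge's dependency label is added to the neighbor's state before a scaled dot-product score is computed, followed by a softmax over the neighbors. This is an illustrative assumption, not necessarily the scoring function used in SEMHRN, and the label names and vectors are made up.

```python
import math


def type_aware_attention(h_i, neighbours, type_emb):
    """Return softmax-normalized weights over the neighbours of node i.

    neighbours: list of (h_j, dep_type) pairs.
    type_emb:   dict mapping a dependency label to an embedding vector.
    """
    scale = math.sqrt(len(h_i))
    scores = []
    for h_j, dep_type in neighbours:
        key = [a + b for a, b in zip(h_j, type_emb[dep_type])]
        scores.append(sum(q * k for q, k in zip(h_i, key)) / scale)
    peak = max(scores)                       # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


type_emb = {"nsubj": [0.0, 0.0], "appos": [1.0, 0.0]}
neighbours = [([0.5, 0.5], "nsubj"), ([0.5, 0.5], "appos")]
weights = type_aware_attention([1.0, 0.0], neighbours, type_emb)
print(weights)
```

The two neighbors have identical states, so any difference in their weights comes only from the dependency label, which is the effect the attention mechanism is meant to capture.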
Each layer of the GCN represents varying degrees of semantic features. To effectively utilize features from all levels, the hidden states from each layer are concatenated. Subsequently, this concatenated representation is fed through a linear layer to produce the ultimate hidden state of node i:
It’s important to note that a certain entity may be mentioned multiple times within the document. Effective relation extraction relies heavily on the ability to aggregate pertinent information from these multiple mentions. While the maximum pooling method is a viable option for retaining the most essential mention characteristics, it poses a challenge for optimization due to its non-differentiability. To address this challenge, the logsumexp function, a common operation encountered in machine learning, particularly in the implementation and derivation of cross-entropy, is utilized. Not only does it serve as a smooth approximation of maximum pooling, but it is also differentiable. The logsumexp function, with the introduction of a temperature coefficient T close to 0, is employed to obtain the entity embedding $h_i$. For an entity $e_i$, its entity embedding can be expressed as:
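The behavior of temperature-scaled logsumexp pooling can be checked with a small sketch. It assumes the usual form T · log Σ_j exp(m_j / T) applied per dimension over the mention vectors m_j; the function name and toy vectors are hypothetical.

```python
import math


def logsumexp_pool(mentions, temperature=0.1):
    """Smooth max over mention vectors, computed per dimension."""
    pooled = []
    for d in range(len(mentions[0])):
        values = [m[d] for m in mentions]
        peak = max(values)
        # Shift by the max before exponentiating to avoid overflow.
        total = sum(math.exp((v - peak) / temperature) for v in values)
        pooled.append(peak + temperature * math.log(total))
    return pooled


mentions = [[0.2, -1.0], [0.9, 0.5], [0.4, 0.3]]
entity = logsumexp_pool(mentions, temperature=0.05)
print(entity)  # close to the per-dimension maxima 0.9 and 0.5
```

The pooled value always lies between the true maximum and the maximum plus T · log(number of mentions), so a smaller T gives a tighter approximation of max pooling while keeping the operation smooth.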
Finally, document-aware representations for tokens and entities are obtained. After GCN encoding, the node representations of tokens can learn contextual semantics and syntactic structure. The special entity type nodes will also learn general representations of entity types from the training examples in the dataset.
Previous methods often use the embeddings of the subject and object directly to predict the relation of an entity pair. This is problematic since different entity pairs should attend to different contextual information. Therefore, average pooling is introduced to convert the token-level dependencies learned by the pre-trained model into entity-level dependencies, and this transfer process is expressed as:
For an entity pair $(e_s, e_o)$, the context representation $c^{(s,o)}$ is computed based on well-learned dependencies from the pre-trained language model:
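The two steps above can be sketched as follows, in the spirit of the localized context pooling of Zhou et al. [16]. The sketch assumes that each mention comes with one attention distribution over the document tokens, that entity-level attention is the average over the entity's mentions, and that the pair-specific weights are the normalized element-wise product of the subject and object attention. Function and variable names are hypothetical.

```python
def mean_rows(rows):
    """Average several attention distributions position by position."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def context_vector(subj_mention_attn, obj_mention_attn, token_states):
    """Pool token states with weights that both entities attend to."""
    a_s = mean_rows(subj_mention_attn)     # entity-level attention, subject
    a_o = mean_rows(obj_mention_attn)      # entity-level attention, object
    joint = [x * y for x, y in zip(a_s, a_o)]
    total = sum(joint) or 1.0              # guard against an all-zero product
    weights = [w / total for w in joint]
    dim = len(token_states[0])
    return [sum(w * h[d] for w, h in zip(weights, token_states))
            for d in range(dim)]


token_states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
subj_attn = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # two subject mentions
obj_attn = [[0.0, 0.5, 0.5]]                     # one object mention
c = context_vector(subj_attn, obj_attn, token_states)
print(c)
```

Only the second token receives attention from both entities, so the context vector reduces to that token's state.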
The embeddings of entities $e_s$ and $e_o$ are combined with their contextual embeddings:
A group bilinear function is employed to reduce the number of parameters in the bilinear classifier. Specifically, $z_h$ and $z_t$ are split into k equal-sized groups:
The bilinear function is applied to obtain the representation of the entity pair $(e_h, e_t)$.
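The parameter saving of the group bilinear form can be seen in a short sketch. It assumes that each of the k groups has its own square weight block and that the group scores are summed to give one output dimension; the names are hypothetical and biases are omitted.

```python
def group_bilinear(z_h, z_t, weights):
    """Sum of per-group bilinear forms z_h^g W^g z_t^g.

    weights[g] is a (d/k) x (d/k) matrix for group g.
    """
    k = len(weights)
    size = len(z_h) // k
    score = 0.0
    for g in range(k):
        h = z_h[g * size:(g + 1) * size]
        t = z_t[g * size:(g + 1) * size]
        for a in range(size):
            for b in range(size):
                score += h[a] * weights[g][a][b] * t[b]
    return score


def bilinear_params(d, k=1):
    """Weights per output dimension: k blocks of size (d/k) x (d/k)."""
    return d * d // k


eye = [[1.0, 0.0], [0.0, 1.0]]
score = group_bilinear([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], [eye, eye])
print(score)
print(bilinear_params(768), bilinear_params(768, k=12))
```

With d = 768 and k = 12, the count per output dimension drops from d² to d²/k, at the cost of ignoring interactions between different groups.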
Then, the probability of entity pairs having a relation can be calculated, formally as follows:
The cross-entropy is employed to calculate the loss $\mathcal{L}_{EPE}$ as follows:
Connecting related entity pairs extracted in the first stage constructs an entity-level graph. Subsequently, a graph attention network (GAT) [15] is employed to learn entity interaction information.
For an entity node $e_i$, the calculation of its representation in the (l + 1)-th layer is defined as follows:
Following this, each entity pair is treated as a node, and a GAT is introduced to capture multi-hop dependencies between entity pairs. Diverging from prior methods that account for two-hop dependencies for each entity pair, dependency information is aggregated exclusively from the extracted related entity pairs. Connections are established between these related entity pairs and other entity pairs that share an entity with them. This is guided by the intuition that reasoning cues often propagate along entity chains. Based on equations (9) - (13), the entity pair representation $p_u$ for node u can be calculated. Its representation in the (l + 1)-th layer is defined as follows:
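The connection rule for the entity-pair graph can be sketched as below. The sketch assumes that every ordered entity pair is a node and that a node receives edges only from related pairs (the output of the first stage) that share at least one entity with it. The function name and the integer entity ids are hypothetical.

```python
from itertools import permutations


def build_pair_graph(num_entities, related_pairs):
    """Map each ordered entity pair to the related pairs it may attend to."""
    related = sorted(set(related_pairs))
    graph = {}
    for pair in permutations(range(num_entities), 2):
        graph[pair] = [r for r in related
                       if r != pair and set(r) & set(pair)]
    return graph


# 0 = Washington County, 1 = United States, 2 = Oregon (the Fig. 1 example).
# Suppose stage one kept (0, 1) and (2, 1) as related pairs.
graph = build_pair_graph(3, [(0, 1), (2, 1)])
print(graph[(0, 2)])
```

The target pair (0, 2) is linked to both related pairs through the shared entities, so one attention layer already exposes the intermediate relations, and stacking layers extends the reach to longer chains.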
The visualization of the multi-hop reasoning process can be seen in Fig. 4. The target entity pairs learn two-hop reasoning information from their horizontally and vertically related entity pairs. As more layers of graph networks are stacked, the target entity pairs can capture multi-hop reasoning information.

The visualization depicts the multi-hop reasoning process, showcasing that with an increase in the number of GAT layers, the SEMHRN model can effectively capture relations between multi-hop entity pairs.
In addition to syntactic dependencies, the part of speech of words in the document also contains valuable information that defines the usage and function of words. Part-of-speech tagging plays a crucial role in relation extraction for several reasons:
Disambiguation: Different usages of certain words can represent distinct meanings, and part-of-speech tagging helps disambiguate them.
Strengthening word-based features: While machine learning models can extract information from various aspects of a word, adding part-of-speech information can enhance the accuracy of these features.
Numerous studies have demonstrated that incorporating auxiliary tasks in relation extraction can enhance the performance of the primary task. To enhance the utilization of syntactic knowledge, an auxiliary task for word part-of-speech classification is introduced. Initially, a part-of-speech tag set T is created based on 36 predefined part-of-speech tags provided by the Natural Language Toolkit. Subsequently, the probability of word $x_i$ belonging to a particular part of speech is calculated by passing $h_i$ through a linear layer:
In this way, the word embeddings can be regularized by part of speech information.
Then, the probability of the entity pair (h, t) having a relation r can be calculated, formally as follows:
In a document, multiple relations can exist between a pair of entities. Previous approaches set a global threshold to determine the presence of relations, resulting in numerous decision errors due to variations in confidence levels among different entity pairs. To address this multi-label problem, Zhou et al. [16] introduced a “TH” class to establish a dynamic threshold for each entity pair. Following their approach, the loss function is split into two parts that are computed separately:
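The idea of the two-part loss can be sketched as follows, based on the adaptive-thresholding formulation of Zhou et al. [16]: the first part pushes each positive relation above the “TH” class, and the second part pushes the “TH” class above all negative relations. The sketch works on a plain list of logits for one entity pair and assumes that index 0 is the “TH” class; all names are hypothetical.

```python
import math


def log_softmax_at(logits, idxs, target):
    """Log-probability of `target` under a softmax restricted to `idxs`."""
    peak = max(logits[i] for i in idxs)
    lse = peak + math.log(sum(math.exp(logits[i] - peak) for i in idxs))
    return logits[target] - lse


def adaptive_threshold_loss(logits, positives, th=0):
    """Two-part loss: positives vs. TH, then TH vs. negatives."""
    negatives = [i for i in range(len(logits))
                 if i != th and i not in positives]
    loss_pos = -sum(log_softmax_at(logits, list(positives) + [th], r)
                    for r in positives)
    loss_neg = -log_softmax_at(logits, negatives + [th], th)
    return loss_pos + loss_neg


def predict(logits, th=0):
    """Return relations whose logit exceeds the pair-specific threshold."""
    return [i for i, v in enumerate(logits) if i != th and v > logits[th]]


good = [0.0, 5.0, -5.0]    # relation 1 above TH, relation 2 below
bad = [0.0, -5.0, 5.0]     # the opposite ordering
print(adaptive_threshold_loss(good, {1}), adaptive_threshold_loss(bad, {1}))
print(predict(good))
```

At inference time no global cut-off is needed, since each entity pair is compared against its own “TH” logit.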
Finally, the integration of the relation extraction task and part-of-speech prediction task is achieved by minimizing the multi-task learning objective:
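Assuming the three loss terms are combined as a weighted sum, with the weights $\lambda_{POS}$ and $\lambda_{EPE}$ reported in the implementation details, the objective might take the form below. The symbols $\mathcal{L}_{RE}$ and $\mathcal{L}_{POS}$ are placeholder names for the relation classification loss and the part-of-speech prediction loss and are introduced here only for illustration.

```latex
\mathcal{L} = \mathcal{L}_{RE} + \lambda_{POS}\,\mathcal{L}_{POS} + \lambda_{EPE}\,\mathcal{L}_{EPE}
```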
Dataset
The performance of the SEMHRN model is assessed using three public document-level relation extraction datasets.
Implementation details
The context encoder is implemented with BERT-base [46] or RoBERTa-large [47] on DocRED, and with SciBERT [48] on CDR and GDA. The dimension of the dependency edge embedding is set to 100. The number of GCN layers is set to 2, and the number of multi-hop reasoning GAT layers is set to 3. The network is trained for 30 epochs with early stopping, and the learning rate is set to 3e-5. For the other hyper-parameters, the batch size is set to 4, the dropout rate [49] between layers is set to 0.1, $\lambda_{POS}$ is set to 0.3, and $\lambda_{EPE}$ is set to 0.1.
Baseline methods
Twenty models are employed as benchmarks, categorized into sequence-based methods and graph-based methods. Their specific descriptions are as follows:
Results on DocRED
Following the standard setup for document-level relation extraction tasks, the performance of these methods is evaluated using the F1 and IgnF1 metrics. IgnF1 is the F1 score computed after excluding from the development and test sets the relational facts that also appear in the training set, providing a measure of the model’s ability to recognize relation instances that were not seen during training.
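The difference between the two metrics can be illustrated with a simplified sketch. It assumes that each fact is a (head, relation, tail) tuple and that facts seen in training are removed from both the predictions and the gold set before scoring; the official DocRED evaluation script handles this in more detail, so the code is only an approximation with hypothetical names and toy data.

```python
def f1_score(pred, gold):
    """Micro F1 over sets of predicted and gold facts."""
    pred, gold = set(pred), set(gold)
    correct = len(pred & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def drop_seen(facts, train_facts):
    """Remove facts that already occur in the training set."""
    return {f for f in facts if f not in train_facts}


def ign_f1_score(pred, gold, train_facts):
    """F1 after discarding facts shared with the training set."""
    return f1_score(drop_seen(pred, train_facts),
                    drop_seen(gold, train_facts))


gold = {("A", "country", "B"), ("C", "capital", "D"), ("E", "author", "F")}
pred = {("A", "country", "B"), ("C", "capital", "D"), ("G", "author", "H")}
train = {("A", "country", "B")}
print(f1_score(pred, gold), ign_f1_score(pred, gold, train))
```

In this toy case one of the two correct predictions is a fact memorizable from training, so the score drops once it is ignored.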
The experimental results on the DocRED dataset are presented in Table 2, indicating that SEMHRN achieves competitive performance compared to existing baseline methods. In particular, SEMHRN-RoBERTa achieves an IgnF1 of 63.01 and an F1 of 64.97 on the test set of DocRED, surpassing strong baseline models. For example, it outperforms the sequence-based ATLOP-RoBERTa by 1.57 F1 and 1.62 IgnF1, as well as the graph-based CGM2IR-RoBERTa by 1.08 F1 and 1.05 IgnF1. Furthermore, SEMHRN-BERT achieves an F1 of 62.51 and an IgnF1 of 60.53, outperforming the sequence-based Docunet-BERT by 0.6 F1 and 0.65 IgnF1, and the graph-based LSR-BERT by 3.46 F1 and 3.56 IgnF1. These results suggest that SEMHRN is effective in capturing syntactic dependency information and conducting efficient multi-hop reasoning.
Main results (%) on DocRED. F1 and IgnF1 are employed as evaluation metrics. IgnF1 is calculated by excluding from the development and test sets the relational facts that also appear in the training set. Bold values indicate the best performance
Table 3 presents the results on the two biomedical datasets. Following Zhou et al. [16], SciBERT is employed as the document encoder. SEMHRN-SciBERT outperforms state-of-the-art models on the CDR and GDA datasets, surpassing MRN by 7.4 F1 and 3.7 F1, and ATLOP by 3.9 F1 and 2.7 F1, respectively. The strong performance across multiple datasets demonstrates the effectiveness of SEMHRN-SciBERT in identifying relations between entities in a document.
Results (%) on the biomedical datasets CDR and GDA
Ablation study (%) of SEMHRN on DocRED
Results (%) of different strategies for utilizing syntax information on DocRED
Ablation study
An ablation study is conducted to analyze SEMHRN and demonstrate the effectiveness of the components within the framework. From the results presented in Table 4, removing the syntax-enhanced GCN decreases the F1 score by 0.99. Likewise, removing the edges in the multi-hop reasoning GAT reduces the F1 score by 0.72. Notably, when both the syntax-enhanced GCN and the multi-hop reasoning GAT are omitted, the model experiences the largest decrease in performance, with a decline of 1.63 in F1. These findings indicate that performance diminishes as components are removed, underscoring the positive contribution of each component of SEMHRN to relation extraction.
Training time and memory usage on DocRED
Three sets of experiments are conducted to examine the effectiveness of the dependency probability matrix and the dependency-type-aware attention in the syntax-enhanced graph convolutional network, as presented in Table 5. First, removing the dependency probability matrix results in a clear decline in model performance, with a decrease of 0.73 in F1 score. Substituting the dependency probability matrix with a syntactic tree leads to a reduction of 0.3 in F1 score. This indicates that syntactic trees provide some assistance in enhancing model performance, whereas the dependency probability matrix encompasses more extensive syntactic knowledge, mitigating syntactic parsing errors and enabling more efficient modeling of syntactic dependencies within documents. Furthermore, omitting the dependency-type-aware attention results in a decrease of 0.58 in F1 score, with IgnF1 showing a more pronounced drop of 0.81. The larger reduction in IgnF1 underscores the contribution of dependency edge type information to the model’s ability to recognize previously unseen instances of relations. Dependency-type-aware attention helps the model focus selectively on the more critical dependency types, thereby enhancing overall model performance.
Analysis of multi-hop reasoning graph convolutional network
The impact of the number of multi-hop reasoning graph attention layers (l = 1, 2, 3, 4) on model performance is shown in Fig. 5. Performance improves as the number of graph attention layers increases and peaks at 3 layers; adding a fourth layer leads to a decline. This suggests that additional graph attention layers allow more multi-hop dependency information between entity pairs to propagate to the current node, but that too many layers also integrate noisy information from distant nodes.
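The relation between layer depth and hop count can be made concrete with a toy graph. The sketch assumes that each layer exchanges messages only between directly connected nodes; the chain of entity-pair nodes and the function `receptive_field` are illustrative and do not come from the paper.

```python
def receptive_field(neighbors, start, num_layers):
    """Nodes whose information can reach `start` after `num_layers`
    rounds of one-hop message passing (the node itself included)."""
    reached = {start}
    for _ in range(num_layers):
        frontier = set()
        for node in reached:
            frontier.update(neighbors[node])
        reached |= frontier
    return reached

# Hypothetical chain of entity-pair nodes: p0 - p1 - p2 - p3 - p4
chain = {
    "p0": ["p1"],
    "p1": ["p0", "p2"],
    "p2": ["p1", "p3"],
    "p3": ["p2", "p4"],
    "p4": ["p3"],
}

fields = {depth: receptive_field(chain, "p0", depth) for depth in (1, 2, 3, 4)}
for depth, nodes in fields.items():
    print(depth, sorted(nodes))
```

Each extra layer extends the reach of a node by one hop, which is why deeper stacks can capture longer reasoning chains and also why they admit information from increasingly distant, possibly irrelevant, nodes.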

Results for different numbers of multi-hop reasoning graph attention layers.
Following Zeng et al. [16], Infer-F1 is used as a metric to measure the model’s reasoning ability, focusing only on relationships involved in the reasoning process. For example, in the case of multi-hop reasoning, if paths like

The Infer-F1 results for relation instances involving multi-hop reasoning on the development set of DocRED.
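A reasoning-focused F1 of this kind can be sketched as an ordinary micro F1 restricted to a subset of entity pairs. This is a minimal illustration under stated assumptions, not the official Infer-F1 scorer: the set of entity pairs that require reasoning is taken as given, both gold and predicted triples are filtered by that set, and the function name and toy triples are hypothetical.

```python
def f1_on_subset(gold, pred, subset_pairs):
    """Micro F1 over (head, tail, relation) triples whose (head, tail)
    pair belongs to `subset_pairs`."""
    gold_s = {t for t in gold if (t[0], t[1]) in subset_pairs}
    pred_s = {t for t in pred if (t[0], t[1]) in subset_pairs}
    correct = len(gold_s & pred_s)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_s)
    recall = correct / len(gold_s)
    return 2 * precision * recall / (precision + recall)

# Toy data: the relation for (A, C) must be inferred through B.
gold = {("A", "B", "r1"), ("B", "C", "r2"), ("A", "C", "r3"), ("D", "E", "r4")}
pred = {("A", "B", "r1"), ("B", "C", "r2"), ("A", "C", "r9"), ("D", "E", "r4")}

all_pairs = {(h, t) for h, t, _ in gold | pred}
reasoning_pairs = {("A", "C")}  # assumed to be supplied by the benchmark

overall_f1 = f1_on_subset(gold, pred, all_pairs)
reasoning_f1 = f1_on_subset(gold, pred, reasoning_pairs)
print(overall_f1, reasoning_f1)
```

In this toy case the overall score stays high while the reasoning-restricted score drops to zero, which is the kind of gap a reasoning-specific metric is meant to expose.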
The training time and memory consumption of SEMHRN are compared with other models under the same conditions. Table 6 shows that SEMHRN’s training occupies only 13,633 MB of memory, significantly lower than the 11,436 MB used by E2GRE. Additionally, SEMHRN’s training speed is 5.51 times faster than LSR and 8.72 times faster than E2GRE. Experimental results demonstrate that SEMHRN is a lightweight model.
Case study
A case study is conducted to further verify the effectiveness of SEMHRN and discuss the shortcomings of the model. As shown in Fig. 7, SEMHRN can identify most of the intra-sentence relations. For example, it can deduce that the relation between the subject “Square Enix” and the object “Square Enix Europe” is “subsidiary”. Additionally, it can recognize long-distance inter-sentence relations, such as the relation between the subject “Dragon Quest” and the object “Japanese”, which is “country of origin”. This demonstrates SEMHRN’s capability to capture long-distance dependencies between entity pairs.

Case study of an example from the development set of DocRED. This example shows 19 relation instances and 11 sentences. The relation instances that the SEMHRN model did not correctly identify are highlighted in red, with the ground truth relations in parentheses.
However, SEMHRN fails to identify the relation between the subject “Final Fantasy” and the object “Square Enix”. Two explanations are possible. The first is that SEMHRN has limited ability to handle multi-label entity pairs. The second is that the model struggles to identify relation categories that occur infrequently in the dataset. Additionally, SEMHRN does not correctly identify the relation between the entities “Eidos Interactive” and “Square Enix Europe”. Their relation should be “replaced by”/“replacements”, not “followed by”/“follows”, which reflects the model’s insufficient ability to distinguish relations with similar semantics. Therefore, handling multi-label entity pairs, identifying minority relation types in the dataset, and distinguishing semantically similar relation types may be areas of focus for future research.
In this paper, SEMHRN is introduced as a model for performing multi-hop reasoning in the document-level relation extraction task. Instead of using the dependency tree directly to construct the adjacency matrix, a denser dependency probability matrix is employed to capture more syntactic information and alleviate dependency parsing errors to some extent. Additionally, dependency-type-aware attention is utilized to refine the weight of all edges based on the types of edges connecting nodes. The part-of-speech prediction task is introduced to regularize word embeddings. Furthermore, to enhance the interaction between entity pairs, a multi-hop reasoning graph attention network is proposed. This network initially models the two-hop reasoning between entity pairs and then expands to multi-hop reasoning. Experimental results demonstrate the effectiveness of each proposed component in improving the model’s performance.
Acknowledgments
This work was supported by National Natural Science Foundation of China (Grant No. 62376018).
