Inter span learning for document-level relation extraction

Abstract

Entity-relationship extraction models play an important role in relation extraction. Existing models cannot effectively identify entity-relationship triples with overlapping relationships, and they also struggle with long-distance dependencies between entities. In this paper, an inter span learning model for document-level relation extraction is proposed. Firstly, the model uses the BERT pre-trained model to convert the input text into word vectors. Secondly, it divides the word vectors into span sequences using a random initial span and uses convolutional neural networks to extract entity information within the span sequences. Dividing the word vectors into span sequences can place entity pairs that may have overlapping relationships in the same span sequence, partially solving the overlapping relationship problem. Thirdly, the model uses inter span learning to obtain entity information across different span sequences; by fusing the information in different span sequences, it addresses the long-distance dependence between entities. It then fuses entity type features and uses Softmax regression to achieve entity recognition. Finally, it fuses text information and relationship type features, and uses a Linear layer to classify relationships. Experiments demonstrate that the model improves the F1-score on the DocRED dataset by 2.74 percentage points compared to the baseline model.

Keywords

Joint extraction; entity relation extraction; span; document-level extraction; neural network

1 Introduction

Entity-relationship extraction aims to extract entity relationships from unstructured text [1], converting unstructured text into structured data. It is useful for building knowledge graphs [2], question-answering systems [3] and information retrieval [4]. The entity-relationship triple [5] is one of the basic representations of an entity relationship. A triple of the form <entity, relation, entity> [6], consisting of two entities and the relation between them, represents the semantic relation between entities in the text. Existing entity-relation extraction methods can be divided into two categories: pipeline methods and joint methods. The pipeline extraction method [7] divides the relation extraction task into two independent subtasks: it first identifies the entities in the given text and then identifies the relationships between them. The joint extraction method [8] recognizes the entities and the relationships between them at the same time. In document-level entity-relationship extraction, the two entities of a pair may appear in different sentences, so the information of the whole document needs to be considered. Mentions of the same entity may also appear in different sentences of a text. An example from the DocRED dataset is shown in Fig. 1. Entities and their referents appearing in different sentences may correspond to different entity relationships. To identify entity relationships in whole texts more accurately, inter span learning needs to be performed, and the relationships between entity pairs require fusion inference. There are many entity-relationship triples in a document, and the connections between them are relatively weak [9], so logical reasoning [10] is needed to better capture the entity-relationship information in the document.

Fig. 1

Example of DocRED dataset. In this figure, those with the same color refer to the same entities. Reference can occur within the same sentence or between sentences.

Overlapping relations in the text also affect relation extraction. The relations in the text are divided into three categories: Normal, Entity Pair Overlap (EPO) and Single Entity Overlap (SEO), as shown in Table 1. Entity Pair Overlap means that there is more than one relationship between the two entities of a pair; Single Entity Overlap means that two or more entities each have a relationship with the same entity.

Table 1

The examples of normal, single entity overlap, and entity pair overlap

Type	Example sentences and entity relationships
Normal	York is located in England. {<York, country, England>}
Entity Pair Overlap (EPO)	News of the list’s existence unnerved officials in Khartoum, Sudan’s capital. {<Sudan, capital, Khartoum>, <Sudan, contains, Khartoum>}
Single Entity Overlap (SEO)	The city of Aarhus, whose mayor is Jacob Bundsgaard, is served by Aarhus Airport. {<Aarhus, leaderName, Bundsgaard>, <Aarhus Airport, cityServed, Aarhus>}
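The three categories in Table 1 can be illustrated with a small sketch. The function name `overlap_type` and the decision rule (an entity pair carrying more than one relation is EPO; otherwise an entity shared by more than one pair is SEO) are assumptions made for illustration, read off the definitions above rather than taken from the model itself.

```python
from collections import Counter

def overlap_type(triples):
    """Classify a list of (head, relation, tail) triples as Normal, EPO or SEO.

    Hypothetical helper: the rule follows the textual definitions of the
    three categories, not any code from the paper.
    """
    # Count how many distinct relations each unordered entity pair carries.
    pair_counts = Counter(frozenset((head, tail)) for head, _, tail in set(triples))
    if any(count > 1 for count in pair_counts.values()):
        return "EPO"
    # Count in how many different pairs each entity takes part.
    entity_counts = Counter(entity for pair in pair_counts for entity in pair)
    if any(count > 1 for count in entity_counts.values()):
        return "SEO"
    return "Normal"

# The three example rows of Table 1.
print(overlap_type([("York", "country", "England")]))
print(overlap_type([("Sudan", "capital", "Khartoum"),
                    ("Sudan", "contains", "Khartoum")]))
print(overlap_type([("Aarhus", "leaderName", "Bundsgaard"),
                    ("Aarhus Airport", "cityServed", "Aarhus")]))
```

Under this rule, a text that contains both kinds of overlap is reported as EPO, because that check is made first.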

Although good results have been achieved so far, current models still handle overlapping relationships across a whole text and long-distance dependence between entities poorly. This paper focuses on these two problems. We propose an inter span learning model for document-level relation extraction, ISLM, which uses spans to perform entity relation extraction on document-level text. First, the BERT pre-trained language model (BERT-BASE-CASED) is used to encode the text. The resulting word vectors are divided into span sequences, and Convolutional Neural Networks are used to extract entity information within the span sequences. Second, an inter span learning mechanism is used to obtain the representations of entities located in different span sequences. To extract entity information, coreference resolution is used to produce mention clusters. Entity recognition is performed using Softmax regression after fusing entity type features. Finally, Linear layers are used for relation classification after fusing text information and relation type information. To address the overlapping relationship problem, entity tokens that may have overlapping relationships are divided into the same span sequence, so that entities with possibly overlapping relationships can be identified during entity recognition within the span sequence. To address long-distance dependence, where entities that may be related are relatively far apart, inter span learning is used to capture the relationships between entities that are far apart in the span sequences, which alleviates the long-distance dependency problem to a certain extent.

Overall, the main contributions of this paper are as follows:

An inter span learning model for document-level relation extraction based on span and multi-sentence learning is proposed. Different from existing methods, this paper performs span division after BERT encoding to form span sequences and extracts entities within the span sequences. This alleviates the problem of overlapping relations in the text.

The method uses an inter span learning mechanism to obtain entities in different span sequences. Its advantage is that it can capture entities that are far apart but semantically related, which helps solve the problem of long-distance dependencies between entities.

Experiments on the DocRED dataset show that the proposed method achieves the best F1-score among the compared models.

2 Related works

Extracting entity relationships from unstructured text is a useful task in natural language processing, which is also a necessary step to build a knowledge graph to provide support for downstream related tasks [11]. At present, the mainstream entity-relationship extraction methods can be roughly divided into two types: pipeline entity-relationship extraction method and joint entity-relationship extraction method [12].

2.1 The pipeline extraction method

The pipeline extraction approach divides the relationship extraction task into two separate subtasks. Nayak T et al. [13] use a Bi-LSTM (Bidirectional Long Short-Term Memory Network) model with an attention mechanism to obtain long-distance dependency information, which mitigates the degradation of relation extraction caused by long sentences and distant entities. Eberts M et al. [14] randomly divide the input text into spans and use positive and negative sample information for entity relationship extraction, which addresses the problem of overlapping relationships in texts. Zeng D et al. [15] use Deep Convolutional Neural Networks to extract lexical features, which removes the need for complex pre-processing in relation classification. Guo X et al. [16] combine an RNN (Recurrent Neural Network) and a CNN (Convolutional Neural Network) with an attention mechanism for relationship classification; the network can extract higher-level text information and obtain sentence feature information. Guo Z et al. [17] add entity type and relation alias information and input it into a Graph Convolutional Neural Network to improve entity-relation extraction.

2.2 The joint extraction method

The pipeline method does not need manually constructed features [18] and has high accuracy [19], so it is widely used. However, errors in the entity recognition task may be passed on to the relationship classification task, which causes the problem of error propagation [20–22]. Therefore, in recent years, researchers have gradually turned to joint entity-relationship extraction methods. Verga P et al. [23] use multi-sentence learning to detect entity-relationship pairs in the biomedical domain, addressing the neglect of inter-sentence relationships and the redundant computations in biomedical texts. Yao Y et al. [24] introduce the document-level dataset DocRED and apply baseline models for relation extraction, which provide a benchmark for later researchers. Wang H et al. [25] use a BERT pre-trained model to encode textual information and then use two Linear layers to extract the relationship. The model achieves good results, but cannot use logical inference to judge the possible entity relationships in the DocRED dataset. Yuan C et al. [26] use the BERT pre-trained model to encode text information and a self-attention mechanism to obtain sentence feature information for entity-relationship extraction, which addresses the problem that text feature information is easily ignored. Xu B et al. [27] add a transformation module to the self-attention mechanism and apply it throughout the entire network, which can effectively carry out logical reasoning within a document. Sahu S K et al. [28] use Graph Convolutional Neural Networks to extract document-level relations, mainly exploiting inter-sentence and intra-sentence dependencies. Zaporojets K et al. [29] introduce the document-level relation extraction dataset DWIE; their work pays more attention to entity-centric annotation and applies a graph convolutional neural network to document-level entity-relation extraction. Eberts M et al. [30] introduce the first document-level entity relationship extraction at the entity level, which uses multi-sentence learning to extract entity and relationship information within a document. However, the results of entity relationship extraction may not be optimal when the text contains complicated overlapping relationships and long-distance dependencies between entities.

3 Method

This section describes the joint extraction model proposed in this paper based on span and inter span learning. The main purpose is to effectively solve the problem of overlapping relationships in the text and long-distance dependencies between entities.

In this paper, we first transform the text into word vectors using the BERT pre-trained model, and then divide the word vectors into span sequences through a random initial span. The random initial span is automatically generated by the model, but we set a maximum span value so that the length of each span sequence does not exceed it. Second, we extract the entities from the span sequences using convolutional neural networks. Splitting the word vectors into span sequences can place entity pairs that might have overlapping relationships in the same span sequence, which helps resolve the overlapping relationship issue. Thirdly, we extract the entity information from adjacent span sequences via inter span learning, which fuses the information in different span sequences and thereby addresses the long-distance dependence between entities. To extract entities, we use coreference resolution to form mention clusters. We then fuse entity type features and use Softmax regression to perform entity classification. Finally, a Linear layer is used for relation classification after fusing textual information and relation type features. The overall framework of the model is shown in Fig. 2.

Fig. 2

Model frame diagram. In this diagram, our model is divided into four modules.

3.1 Encoding layer

The main task of the encoding layer is to encode the text information into a matrix of vectors. The encoding layer uses the BERT pre-trained language model to obtain the semantic features of the sentence, which are expressed as X = [x₁, x₂, x₃, x₄, …, x_n].

First, the input sequence is represented as word vectors through the embedding layer, and the expression of the i-th token in the processed word vector is shown in formula (1):

$e_{i} = W_{token} (t_{i}) + W_{pos} (t_{i})$ (1)

Here, W_token (t_i) denotes the token embedding and W_pos (t_i) denotes the position embedding. The model in this paper uses BPE (Byte-Pair Encoding) [31], which can effectively reduce the number of words in the vocabulary. The embeddings are then input into the BERT (BERT-BASE-CASED) pre-trained model for encoding. The BERT pre-trained model contains 12 hidden layers, each with a size of 768. The encoded result is shown in formula (2):

$H_{b} = BERT (E)$ (2)

Where E represents the word vector formed by token embedding and position embedding; and H_b is the output of the hidden layers. Then, the semantic representation H_b of the sentence is obtained from the text through the BERT model, which is used in subsequent tasks.
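As a minimal sketch of formula (1), the embedding of each token can be formed by adding a token vector and a position vector. The lookup tables, the toy sizes and the function name `embed` are hypothetical, and the position embedding is assumed to be indexed by the token's position in the sequence; the BERT encoder of formula (2) is not reproduced here.

```python
import random

def embed(token_ids, token_table, pos_table):
    """Return e_i = W_token(t_i) + W_pos(i) for every position i."""
    return [
        [tok + pos for tok, pos in zip(token_table[token_id], pos_table[i])]
        for i, token_id in enumerate(token_ids)
    ]

rng = random.Random(0)
# Toy sizes for illustration only; BERT-BASE-CASED uses a hidden size of 768.
vocab_size, max_len, dim = 10, 8, 4
token_table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]
pos_table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(max_len)]

# The same token id (3) at positions 0 and 2 receives different position vectors.
E = embed([3, 1, 3], token_table, pos_table)
print(len(E), len(E[0]))
```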

3.2 Entity recognition with span sequence layer

The main task of this layer is to identify entities in the span sequences by using Convolutional Neural Networks (CNN). Compared with the method of BIO (B-begin, I-inside, O-outside) annotation [32, 33], the problem of overlapping relationship in text can be effectively solved by using initial span to divide into span sequences. The framework of the model is shown in Fig. 3.

Fig. 3

The framework of the model. In this figure, yellow denotes tokens, many of which form span sequences; green denotes the embedding matrix; orange denotes predefined entity type features; dark blue denotes predefined relation type features; light blue denotes the CLS tokens.

The semantic representation obtained in the encoding layer is divided according to a random initial span to form span sequences, and the length of each span sequence does not exceed the set maximum span value. This paper sets the maximum span sequence length to 11; Section 4.5 describes how the maximum span value is determined. The random initial span is generated by the model, and each generated span sequence is composed of many tokens. The span sequence after division is shown in formula (3):

$S = (e_{i}, e_{i + 1}, \dots, e_{i + k})$ (3)

where e_i, e_i+1, …, e_i+k represent the tokens in the word vector formed by token embedding and position embedding, and S represents a divided span sequence composed of these tokens. If the tokens in a span sequence contain no entities, the span sequence is represented by ‘NULL’ and is not used in subsequent modules.
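The division into span sequences can be sketched as follows. The exact sampling procedure is not specified in the text, so this sketch assumes contiguous, non-overlapping spans whose lengths are drawn uniformly between 1 and the maximum span value; the function name `split_into_spans` and the `seed` parameter are hypothetical.

```python
import random

def split_into_spans(tokens, max_span=11, seed=None):
    """Cut a token sequence into consecutive spans of random length <= max_span."""
    rng = random.Random(seed)
    spans, start = [], 0
    while start < len(tokens):
        # Random span length, capped by the maximum span value.
        length = rng.randint(1, max_span)
        spans.append(tokens[start:start + length])
        start += length
    return spans

tokens = [f"tok{i}" for i in range(40)]
spans = split_into_spans(tokens, max_span=11, seed=0)
print([len(span) for span in spans])
```

Every token ends up in exactly one span, so concatenating the spans recovers the original sequence.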

Entity information in the span sequences is extracted using Convolutional Neural Networks (CNN), as shown in Fig. 4. The use of CNN can effectively reduce the complexity of the model and increase its generalization and robustness.

Fig. 4

Extracting span sequences feature information. We use convolutional neural network to extract entities from span sequences.

The model extracts entity features through convolutional neural networks in span sequences, as shown in formula (4):

$H_{s} = CNN (S_{i}, S_{i + 1}, S_{i + 2}, \dots, S_{i + k})$ (4)

where S_i, S_i+1, S_i+2, …, S_i+k represent the span sequences. The entity information obtained from the span sequences is passed to the inter span learning module. Since text information contains overlapping entity relationships, dividing the word vectors into span sequences can place the entity pairs that may have overlapping relationships in the same span sequence, which helps solve the overlapping relationship problem in the text. The output of the Convolutional Neural Network is denoted by y_b = [y_b1, y_b2, …, y_bj], which represents the entities within the span sequence predicted by the model.
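To make the convolution step concrete, the sketch below applies a one-dimensional convolution with ReLU and max pooling over the token vectors of a single span. The kernel width, the pooling choice and the names `conv1d` and `span_feature` are assumptions for illustration; the text does not state the CNN configuration used in ISLM.

```python
def conv1d(span, kernel, bias=0.0):
    """Valid 1-D convolution over a span of token vectors, followed by ReLU.

    span:   list of n token vectors, each of dimension d
    kernel: list of k weight vectors, each of dimension d
    """
    width = len(kernel)
    outputs = []
    for start in range(len(span) - width + 1):
        window = span[start:start + width]
        total = bias + sum(
            w * x
            for w_vec, x_vec in zip(kernel, window)
            for w, x in zip(w_vec, x_vec)
        )
        outputs.append(max(0.0, total))  # ReLU
    return outputs

def span_feature(span, kernels):
    """Max-pool each kernel's responses into one feature per kernel."""
    # default=0.0 covers spans shorter than the kernel width.
    return [max(conv1d(span, kernel), default=0.0) for kernel in kernels]

span = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],      # hypothetical kernel of width 2
    [[-1.0, -1.0], [-1.0, -1.0]],  # hypothetical kernel of width 2
]
print(span_feature(span, kernels))  # [2.0, 0.0]
```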

3.3 Entity recognition between span sequence layer

The main task of this layer is to identify entities across span sequences. An entity may later be referred to by a different expression; for example, ‘The Beatles’ may subsequently be referred to as ‘the famous band’. Before extracting entity information between span sequences, coreference resolution is used to form mention clusters. Multi-sentence learning [34] has been used to learn feature information between sentences. The model uses inter span learning on the entity information obtained in the entity recognition with span sequence layer to obtain entity features between span sequences; combined with textual information, this helps solve long-distance dependencies between entities. We set the maximum length of the span sequence to 11. After extracting entity information from different span sequences, the entity information representation is shown in formula (5):

$H_{P} = CNN (s_{i} \oplus s_{i + 1})$ (5)

where s_i ⊕ s_i+1 represents the adjacent span sequences, which contain the entity information H_s that has already been obtained. Coreference resolution is used to form entity clusters. Before entity classification, we fuse the entity type feature information, as shown in formula (6):

$X_{e} = H_{P} \circ C_{e}$ (6)

where C_e is the predefined entity type feature in the text. The process of fusing entity type feature information is shown in Fig. 5. This paper applies a filter threshold to filter entities and their corresponding referent entities.

Fig. 5

Fusion of entity type feature information.

Softmax regression is then used for entity classification, as shown in formula (7):

$y_{e} = Softmax (W_{S} \times X_{e} + b_{i})$ (7)

where W_S represents the weight matrix and b_i represents the bias. The scores in y_e are used for entity classification.
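Formulas (6) and (7) can be traced with a small numeric sketch. It reads the fusion operator in formula (6) as an element-wise product, which is an assumption, and uses toy vectors and weights; the names `hadamard`, `softmax` and `classify_entity` are hypothetical.

```python
import math

def hadamard(a, b):
    """Element-wise product, used here as the fusion X_e = H_P o C_e."""
    return [x * y for x, y in zip(a, b)]

def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_entity(h_p, c_e, weights, bias):
    """y_e = Softmax(W_S x X_e + b), with X_e = H_P o C_e."""
    x_e = hadamard(h_p, c_e)
    logits = [sum(w * x for w, x in zip(row, x_e)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

h_p = [1.0, 2.0]                                 # toy entity representation
c_e = [0.5, 1.0]                                 # toy entity type feature
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # three hypothetical entity types
bias = [0.0, 0.0, 0.0]

y_e = classify_entity(h_p, c_e, weights, bias)
predicted_type = max(range(len(y_e)), key=y_e.__getitem__)
print(predicted_type)  # 1
```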

3.4 Relationship classification layer

The main task of the relation classification layer is to classify the relations in the span sequences. The entity recognition with span sequence layer and the entity recognition between span sequence layer obtain the entity information in the document and embed it into the text information to form a new relationship classification sequence. Our model uses the Linear layer to calculate the score of the relationship classification sequence. We set the relationship filtering threshold to 0.6 (Section 4.5 describes how this value is chosen). If the score is greater than the relationship filtering threshold, the relationship in the classification sequence is considered valid; otherwise it is considered invalid. The relationship type features are represented by C_r, and the text information and relationship type feature information are integrated into the relationship classification sequence, as shown in formula (8):

$X_{r} = H_{r} \circ C_{r}$ (8)

where H_r represents a matrix that incorporates the textual information, which contains the tokens of the text, and the entity information obtained through inter span learning. It is input to Linear layers with the ReLU activation function for classification, and the output of the Linear layer is shown in formula (9):

$y = Softmax (W_{S} \times X_{r} + b_{k})$ (9)

where W_S represents the weight matrix, X_r represents the matrix formed by the fusion of the final span sequence and text information, and b_k represents the bias. In the result y, the category with the highest score is taken as the type of the relationship, and the entity-relationship triple is finally output.
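The decision rule of this layer can be sketched as follows: for each candidate entity pair, take the highest-scoring relation type and keep it only if its score exceeds the relationship filtering threshold. The scores below are invented for illustration, and the function name `filter_relations` and the input layout are hypothetical.

```python
def filter_relations(pair_scores, threshold=0.6):
    """Keep, for each entity pair, the best relation if its score > threshold.

    pair_scores: {(head, tail): {relation_type: score}}
    returns: list of (head, relation, tail) triples
    """
    triples = []
    for (head, tail), scores in pair_scores.items():
        relation, score = max(scores.items(), key=lambda item: item[1])
        if score > threshold:
            triples.append((head, relation, tail))
    return triples

# Invented scores for two candidate entity pairs.
pair_scores = {
    ("Sudan", "Khartoum"): {"capital": 0.82, "contains": 0.11},
    ("York", "Khartoum"): {"capital": 0.35, "contains": 0.30},
}
print(filter_relations(pair_scores))  # [('Sudan', 'capital', 'Khartoum')]
```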

3.5 Training loss

Since both entity classification and relation classification are multiclassification problems, the cross-entropy loss function is used in the ISLM model. The loss function of the entity recognition with span sequence layer, the loss function of the entity recognition between the span sequence layer and the loss function of the relationship classification layer are shown in formulas (10), (11), and (12), respectively.

$l_{ws} = c_{1} \left( - \log \frac{e^{y_{b_{i}}}}{\sum_{j = 1}^{j_{1}} e^{y_{b_{j}}}} \right)$ (10)

$l_{bs} = c_{2} \left( - \log \frac{e^{y_{e_{i}}}}{\sum_{j = 1}^{j_{2}} e^{y_{e_{j}}}} \right)$ (11)

$l_{rel} = c_{3} \left( - \log \frac{e^{Y_{i}}}{\sum_{j = 1}^{j_{3}} e^{Y_{j}}} \right)$ (12)

Here, c₁ represents the weight of the entity recognition with span sequence layer, c₂ represents the weight of the entity recognition between span sequence layer, and c₃ represents the weight of the relation classification layer. The lengths of c₁, c₂ and c₃ are each equal to the length of the real relation type vector. j1, j2 and j3 represent the lengths of the predicted vectors. y_b represents the prediction vector of the entity recognition with span sequence layer, y_e represents the prediction vector of the entity recognition between span sequence layer, and Y represents the prediction vector of the relation classification layer.

The loss function of the whole model is the sum of the loss functions of the entity recognition with span sequence layer, entity recognition between span sequence layer and the relation classification layer, as shown in formula (13).

$L = l_{ws} + l_{bs} + l_{rel}$ (13)

where l_ws represents the loss of the entity recognition with span sequence layer, l_bs represents the loss of the entity recognition between span sequence layer, and l_rel represents the loss of the relation classification layer. The loss values of the entity recognition with span sequence layer, the entity recognition between span sequence layer and the relation classification layer in this model are averaged to calculate the total loss of the model.
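A minimal numeric sketch of formulas (10) to (13) is given below. As a simplification, it treats each weight c as a scalar and takes the index of the true class as input; the logits and the names `cross_entropy` and `total_loss` are hypothetical.

```python
import math

def cross_entropy(logits, target_index, weight=1.0):
    """weight * (-log softmax(logits)[target_index]), computed stably."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return weight * (log_sum_exp - logits[target_index])

def total_loss(l_ws, l_bs, l_rel):
    """Formula (13): the sum of the three layer losses."""
    return l_ws + l_bs + l_rel

# Toy logits for the three layers; the true class index is 0 in each case.
l_ws = cross_entropy([2.0, 0.5, 0.1], 0)
l_bs = cross_entropy([1.5, 1.0, 0.2], 0)
l_rel = cross_entropy([3.0, 0.3], 0)
print(total_loss(l_ws, l_bs, l_rel))
```

Each term is small when the logit of the true class dominates the others and grows as probability mass moves to the wrong classes.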

4 Experiment

This section includes six subsections, which describe the dataset used in the experiments, the experimental software and hardware environment, the experimental parameter settings and evaluation methods, the comparative experiment, the comparison of model performance under different parameters, and the ablation experiment.

4.1 Dataset

The DocRED dataset is a human-annotated relation extraction dataset at document-level from Wikipedia, including more than 5,000 Wikipedia documents. In this experiment, the training set contains 3008 documents, the evaluation set contains 300 documents and the test set contains 700 documents. The dataset division is shown in Table 2.

Table 2
Data set split

Split	Doc	Entity	Relation
Train	3008	58708	37486
Evaluation	300	5805	3678
Test	700	13594	8787
Total	4008	78107	49951

4.2 Experiment environment

The ISLM proposed in this paper is implemented with the PyTorch framework, version 1.8.0. The experimental software and hardware environment is shown in Table 3.

Table 3
Software and hardware environment

Name	Environment
System	Windows
GPU	NVIDIA GeForce RTX3090
Memory	45G
Hard disk	2T
Python version	Python3.7
PyTorch version	1.8.0

4.3 Experimental parameters and evaluation methods

The experiment in this paper uses the BERT-BASE-CASED pre-training model, the maximum length of the span sequence is set at 11, and the relation filtering threshold is set at 0.6. The main hyperparameter settings of this experiment are shown in Table 4.

Table 4
Parameter settings

Parameter	Value
BERT model	BERT-BASE-CASED
Learning rate	5e-5
Batch size	4
Epochs	30
Learning rate warmup	0.1
Relation filter threshold	0.6
Max Span	11
Embedding count	30

The evaluation metrics used in this experiment are Precision (P) [35, 36], Recall (R) [37, 38] and F1-score (F1) [39, 40], which are defined in formulas (14), (15) and (16):

$Precision = \frac{TP}{TP + FP}$ (14)

$Recall = \frac{TP}{TP + FN}$ (15)

$F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall}$ (16)
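Formulas (14) to (16) translate directly into code. The sketch below uses hypothetical function names and invented counts, and it also recomputes the F1-score from the precision and recall that Table 5 reports for ISLM as a consistency check.

```python
def precision(tp, fp):
    """Formula (14): TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Formula (15): TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p, r):
    """Formula (16): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0

# Invented counts: 3 true positives, 1 false positive, 2 false negatives.
p, r = precision(3, 1), recall(3, 2)
print(p, r, f1_score(p, r))

# Precision and recall of ISLM as reported in Table 5 (in percent).
print(round(f1_score(67.26, 59.49), 2))  # 63.14
```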

4.4 Comparative experiment

The model proposed in this paper is compared with the state-of-the-art relation extraction models in recent years. The comparison model is as follows:

CNN [24], LSTM [24], Ctx-Aware [24] and Bi-LSTM [24] are the results of training on the DocRED dataset using CNN, LSTM, Ctx-Aware and Bi-LSTM neural networks, respectively. They serve as baseline models.

Two-Step [25] introduced the use of two-step entity relationship extraction. The first step is to predict whether there is a relationship between two entities, and the second step is to predict the specific relationship between the two entities.

HIN [41] introduced a Hierarchical Inference Network (HIN) model to obtain information at the entity, sentence, and document levels.

LSR [42] introduced a method of enhancing the inference relationship between sentences by automatically analyzing latent document-level texts.

CorefRo [43] introduced a model that can effectively represent coreferential relationships, achieving good extraction performance on the DocRED dataset.

JEREX [30] introduced the first document-level entity relationship extraction at the entity level, which uses multi-sentence learning to extract entity and relationship information within a document.

DocRE-SD [44] introduced a document-level entity relationship extraction model with inference module, which is based on the multi-head self-attention mechanism.

Table 5 compares the performance of ISLM with the baseline models on the DocRED dataset. The F1-score of ISLM on the DocRED dataset is 63.14%, which is 2.74 percentage points higher than that of the JEREX baseline (60.40%). This improvement comes from the fact that the span division of the token sequence can effectively utilize the overlapping relationship information in the text and improve the information utilization rate. Using inter span learning to extract entity information in different span sequences helps solve the problem of long-distance dependencies between entities. Adding entity type features to entity recognition and relationship type features to relationship classification better captures the semantics of the whole text, which also improves the entity relation extraction performance of the model.

Table 5
Compared with the baseline model

Dataset	Method	Precision (P)	Recall (R)	F1-score (F1)
DocRED	CNN [24]	44.05	40.61	42.26
	LSTM [24]	51.64	48.59	50.07
	Ctx-Aware [24]	50.93	50.47	50.70
	Bi-LSTM [24]	52.37	49.81	51.06
	Two-Step [25]	55.25	52.65	53.92
	HIN [41]	55.98	55.23	55.60
	LSR [42]	60.27	57.88	59.05
	CorefRo [43]	61.20	59.33	60.25
	JEREX [30]	61.85	59.02	60.40
	DocRE-SD [44]	63.41	62.22	62.81
	ISLM	67.26	59.49	63.14

4.5 Span size and relationship threshold setting

When the span is too large or too small, the long-distance entity-relationship information in the overlapping relationship cannot be used well. The size of the span division will affect the entity pairs contained in the span sequence. At times, the setting of the span size will also affect the division of entities within the document.

ISLM sets a relationship filtering threshold when classifying relationships. The relationship between entities is considered valid only when the score of the relationship classification sequence is greater than this threshold. To find the most suitable maximum span value and relation filtering threshold, this part evaluates the extraction performance of the model on the DocRED dataset under different parameter values, and also compares how the training loss changes as the parameters change. When selecting the optimal maximum span value and relationship filtering threshold, only these two parameters are changed, and the other parameters of the experiment remain unchanged. The F1-scores are shown in Table 6.

Table 6
Performance of the model with different parameter values

		Maximum span value
		9	10	11	12	13
Relationship filtering threshold value	0.4	62.19	62.37	62.74	62.71	62.63
	0.5	62.85	62.95	62.87	62.68	62.68
	0.6	62.70	63.11	63.14	62.44	62.69
	0.7	62.35	62.47	61.68	62.48	62.96
	0.8	62.59	62.24	62.68	62.53	62.49

According to Table 6, when the maximum span value of the model is 11 and the relationship filtering threshold value is 0.6, the F1-score of the model achieves the maximum value of 63.14. Therefore, the maximum span value of the model in this paper is 11, and the relationship filtering threshold value is 0.6.

In Table 6, each row corresponds to a relationship filtering threshold value and each column corresponds to a maximum span value. Each cell gives the F1-score of the model for that combination of maximum span value and relationship filtering threshold value.
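The parameter selection amounts to picking the cell of Table 6 with the largest F1-score. The sketch below copies the table values into a dictionary (the variable names are hypothetical) and looks up the best combination.

```python
# F1-scores from Table 6: {relation filtering threshold: {maximum span value: F1}}.
f1_grid = {
    0.4: {9: 62.19, 10: 62.37, 11: 62.74, 12: 62.71, 13: 62.63},
    0.5: {9: 62.85, 10: 62.95, 11: 62.87, 12: 62.68, 13: 62.68},
    0.6: {9: 62.70, 10: 63.11, 11: 63.14, 12: 62.44, 13: 62.69},
    0.7: {9: 62.35, 10: 62.47, 11: 61.68, 12: 62.48, 13: 62.96},
    0.8: {9: 62.59, 10: 62.24, 11: 62.68, 12: 62.53, 13: 62.49},
}

# Tuples compare by their first element, so max() returns the best F1 cell.
best_f1, best_threshold, best_span = max(
    (f1, threshold, span)
    for threshold, row in f1_grid.items()
    for span, f1 in row.items()
)
print(best_threshold, best_span, best_f1)  # 0.6 11 63.14
```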

Different parameter values also affect the model training loss. We therefore plot the training loss for different relation filtering thresholds with the maximum span value fixed at 11, and for different maximum span values with the relation filtering threshold fixed at 0.6, as shown in Figs. 6 and 7.

Fig. 6

Training loss when the relationship filtering threshold changes.

Fig. 7

Training loss when the maximum span value changes.

Figure 6 shows the model training loss for different values of the relationship filtering threshold when the maximum span value is 11. As Fig. 6 shows, when the relationship filtering threshold is set to 0.6, both the initial value and the final value of the training loss are the smallest. Therefore, we set the relationship filtering threshold to 0.6.

Figure 7 shows the model training loss for different maximum span values when the relationship filtering threshold is 0.6. When the maximum span value is 11, the training loss is not the smallest at the beginning, but it converges the fastest and its final value is the smallest. Therefore, we set the maximum span value to 11.
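The filtering rule used in this section, where a relationship is kept only if its classification score is greater than the threshold, can be sketched as follows. This is an illustrative sketch under assumptions: the function `filter_relations`, the triple layout and the example scores are hypothetical and are not taken from the ISLM implementation.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

RELATION_THRESHOLD = 0.6  # threshold value selected in this section


def filter_relations(
    scores: Dict[Triple, float],
    threshold: float = RELATION_THRESHOLD,
) -> List[Triple]:
    """Keep candidate triples whose score is strictly greater than threshold."""
    return [triple for triple, score in scores.items() if score > threshold]


# Hypothetical scores, for illustration only.
candidate_scores = {
    ("entity_a", "relation_x", "entity_b"): 0.83,
    ("entity_a", "relation_y", "entity_c"): 0.60,
    ("entity_b", "relation_x", "entity_c"): 0.41,
}

kept = filter_relations(candidate_scores)
print(kept)
```

The comparison is strict, so a candidate scoring exactly 0.6 is discarded. A lower threshold lets more candidates through and a higher one lets fewer through, which is the trade-off explored across the thresholds in Table 6.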

4.6 Ablation experiment

This paper proposes a model that uses spans and inter span learning to extract entity relationships. To verify the effectiveness of partitioning word vectors with random spans, of inter span learning, and of fusing relationship type features, ablation experiments are designed; the results are shown in Table 7.

Table 7
Comparison of ablation experiment results

Model	P	R	F1
ISLM’	45.50	41.90	43.62
ISLM’ + S	61.97	60.04	60.99
ISLM’ + S + ISL	64.84	60.98	62.85
ISLM’ + S + ISL + F	67.26	59.49	63.14

ISLM’ denotes the proposed model with three components removed: the use of spans to divide word vectors, inter span learning, and the fusion of relationship type features.

ISLM’ + S denotes the variant that only uses spans to divide the word vectors. ISLM’ + S + ISL denotes the variant that uses spans to divide the word vectors and also uses inter span learning to obtain information across different spans.

ISLM’ + S + ISL + F denotes the variant that uses spans to divide the word vectors, uses inter span learning to obtain information across different spans, and fuses the relationship type features present in the text; this is the full model proposed in this paper.

The Precision (P), Recall (R) and F1-score (F1) of these variants are compared.

From Table 7, it can be seen that dividing the word vectors with spans, learning between spans and fusing relationship type features progressively increase the F1-score of the model from 43.62% to 63.14%, an increase of 19.52 percentage points. Compared with ISLM’ + S, adding inter span learning raises the F1-score from 60.99% to 62.85%, an increase of 1.86 percentage points. This is because inter span learning can effectively acquire information from different span sequences and improve the utilization of textual information.
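The contribution of each component can be recomputed from the F1 column of Table 7. The sketch below is only a bookkeeping aid: the names `ABLATION_F1` and `stepwise_gains` are hypothetical, and the values are transcribed from the table.

```python
# F1-scores transcribed from Table 7, in the order the components are added.
ABLATION_F1 = [
    ("ISLM'", 43.62),
    ("ISLM' + S", 60.99),
    ("ISLM' + S + ISL", 62.85),
    ("ISLM' + S + ISL + F", 63.14),
]


def stepwise_gains(rows):
    """Return (variant, F1 gain over the previous variant) for each added step."""
    gains = []
    for (_, prev_f1), (name, f1) in zip(rows, rows[1:]):
        gains.append((name, round(f1 - prev_f1, 2)))
    return gains


gains = stepwise_gains(ABLATION_F1)
total_gain = round(ABLATION_F1[-1][1] - ABLATION_F1[0][1], 2)

for name, gain in gains:
    print(f"{name}: +{gain:.2f}")
print(f"total: +{total_gain:.2f}")
```

Reading the gains side by side shows that most of the improvement comes from the span division step, with inter span learning and relationship type feature fusion adding smaller increments on top of it.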

4.7 The impact of long-distance relationships on relationship classification

In order to study the impact of relations between relatively distant entities on relation classification, a comparative experiment is carried out in this section. The results are shown in Table 8.

Table 8
The impact of long-distance relationships on relationship classification

Model	P	R	F1
Relation classification	67.26	59.49	63.14
-Long-distance relationship	62.65	60.93	61.78

Here, -Long-distance relationship denotes the result after removing the relatively long-distance relationships. From Table 8, we can see that after removing these relationships, the F1-score of relationship classification drops from 63.14% to 61.78%, a drop of 1.36 percentage points. This indicates that capturing relationships between relatively distant entities benefits relationship classification, and suggests that the problem of long-distance dependence between entities has been partially solved.
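As a consistency check on Table 8, the F1 column can be recomputed from the reported precision and recall. The sketch assumes the standard definition of F1 as the harmonic mean of precision and recall, which this section does not state explicitly; the names `f1_score` and `TABLE_8` are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs transcribed from Table 8, in percent.
TABLE_8 = {
    "Relation classification": (67.26, 59.49),
    "-Long-distance relationship": (62.65, 60.93),
}

f1_by_row = {
    name: round(f1_score(p, r), 2) for name, (p, r) in TABLE_8.items()
}
drop = round(
    f1_by_row["Relation classification"]
    - f1_by_row["-Long-distance relationship"],
    2,
)
print(f1_by_row, drop)
```

Under this assumption the recomputed values agree with the F1 column of Table 8 to two decimal places. The table also shows that removing long-distance relationships lowers precision while slightly raising recall.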

5 Conclusion

On the basis of analyzing the existing entity-relation extraction methods, this paper proposes an inter span learning model for document-level relation extraction (ISLM). First, the model converts the input into word vectors with the BERT pre-training model and divides the word vectors by span to form span sequences. Second, a convolutional neural network is used to extract the entity information in the span sequences. Third, the entity information located in different span sequences is obtained through inter span learning, the entity type features are fused, and Softmax regression is used to classify entities. Finally, the text information and relation type features are fused, and the relation is classified using a Linear Layer. The experimental results show that the model reaches an F1-score of 63.14% on the DocRED dataset with a maximum span value of 11 and a relationship filtering threshold of 0.6.

The next step is to study the span sequences formed after division and to deconstruct the entity relationships contained in them, so as to improve extraction when the text contains complex overlapping relationships and long-distance dependencies between entities.

Acknowledgments

This work was supported by National Natural Science Foundation of China: [Grant Number 62076006]; 2019 Anhui Provincial Natural Science Foundation Project: [Grant Number 1908085MF189]; University Synergy Innovation Program of Anhui Province: [Grant Number GXXT-2021-008].
