Abstract
With the rapid growth of Internet penetration, identifying emergency information from online news has become increasingly important for emergency monitoring and early warning. Although deep learning models are commonly used in Chinese Named Entity Recognition (NER), they require a large amount of well-labeled training data, which is difficult to obtain for emergencies. In this paper, we propose an NER model that combines bidirectional encoder representations from Transformers (BERT), bidirectional long short-term memory (BILSTM), and conditional random field (CRF) with adversarial training (ATBBC) to address this issue. Firstly, we constructed an emergency dataset (ED) based on the classification and coding specifications of the national emergency platform system. Secondly, we utilized the BERT pre-training model with adversarial training to extract text features. Finally, BILSTM and CRF were used to predict the probability distribution of entity labels and decode it into the corresponding entity labels. Experiments on the ED show that our model achieves an F1-score of 85.39% on the test dataset, which demonstrates the effectiveness of our model.
Introduction
In recent years, natural disasters, accidents, public health events, security incidents and other emergencies have occurred frequently, causing serious damage to national security and to the safety of people’s lives and property in all countries [1]. With the rapid growth of the Internet, emergency texts spread quickly on a variety of social network platforms and contain a lot of useful information [2]. Rapidly identifying entity information in these texts is of great significance for the early warning of emergencies. To acquire organized and structured information, named entity recognition technology is particularly crucial [3].
Chinese named entity recognition (NER) is an important branch of natural language processing (NLP) [4]. It extracts specific entities from Chinese text, such as time, location, person, and organization, which are commonly used in tasks such as text clustering. It is extensively used in vertical domains such as healthcare, finance, and law [5–7]. NER in specific domains is conducive to building large-scale knowledge bases in the era of big data, and can help governments collect and use information hidden in emergency texts. Li et al. [8] apply NER to the field of education and offer solutions for named entity recognition in educational opinion texts. Zhang et al. [9] apply NER to the biomedical domain to improve the effectiveness of biomedical named entity recognition. Although there is some research on educational NER, biomedical NER, and social network NER (such as Weibo, Twitter, and micro-blogs [10]), there is still very little research on named entity recognition for emergencies.
There are two main challenges in emergency-domain NER. Firstly, the entities of interest differ from those of the traditional NER task, which mainly identifies entities such as person, location, and organization. In the field of education, experts focus on entities such as educational institutions and departments, and person entities are teachers, students, and other educational roles. In the field of emergencies, we are more concerned with entities such as the time, place, and number of casualties, where the number of casualties includes the number of people injured and killed. Secondly, in the context of big data on the web, the lack of labeled data in the field of emergencies is a problem that must be addressed. Due to the sensitivity and specificity of emergencies, emergency texts are more difficult to acquire than texts in other fields.
There are several ways to cope with the lack of labeled datasets. The first is to reuse an existing deep learning model: because there is no dataset on which to train a new model, entities can only be recognized with an existing model, which is less effective. The second is supervised learning: collect emergency texts, clean and desensitize them, manually label the cleaned data, and use the labeled dataset to train a deep learning model. The disadvantage of this approach is that the trained model is not effective when the data volume is small, and manual labeling is time-consuming and laborious.
In this paper, we propose a novel solution for named entity recognition in emergencies, namely AT-BERT-BILSTM-CRF (ATBBC). Specifically, we employ the pre-trained model BERT [11] to embed the input text and obtain semantic word vectors. BERT captures the contextual information and semantic associations of words, resulting in richer word vector representations, which in turn improves the model’s effectiveness in recognizing entities related to emergency events. Then, the word vector sequence is modeled bidirectionally by the BILSTM [12], which captures the semantic features within the sequence and helps the model better comprehend the structure and semantics of the input text. Finally, we use the CRF model [13] to decode the sequence labels. The CRF model considers the dependency relationships between labels and infers the labeling results through global optimization, so that the model outputs a more reasonable and continuous label sequence. In addition, we apply adversarial training [14] in the embedding layer of the BERT model by adding perturbations to the input word vectors to generate adversarial samples. This approach guides the model to better learn the distribution of the data, improving the model’s robustness and generalization ability. We conducted extensive evaluations on the ED as well as publicly available datasets. The experimental results indicate that our approach outperforms existing solutions in the task of named entity recognition for emergencies.
The contributions of this work are mainly as follows:
We propose a new solution, namely ATBBC, based on deep learning to address the task of named entity recognition (NER) for emergencies. This task can aid in early warning and timely response to emergency events.
We constructed the Emergency Dataset (ED) based on the emergency classification and coding specifications of the National Emergency Platform System. The ED consists of five entity categories: time, place, denoter, casualties, and property loss. Time refers to the point in time when the event occurred; place refers to the place where the event took place and is not limited to a particular building; denoters are trigger words in the event report text that indicate the type of event, such as fire, shooting, or earthquake; casualties includes the number of people injured and killed at the time of the event, as well as the number of innocent people injured and killed as a result of the event; property loss refers to the loss of various types of property caused in the course of an event.
We demonstrated the effectiveness of ATBBC by extensively evaluating the proposed solution on the ED and various benchmark datasets. The experiments show that our proposed model is effective and significantly improves the performance of named entity recognition. It achieves a precision of 85.75%, a recall of 85.03%, and an F1 score of 85.39% on the ED. Moreover, when recognizing overall entity categories, the F1 score of ATBBC is 4.82% higher than that of BERT-BILSTM-CRF.
Related works
Early studies on named entity recognition in text mainly used statistical methods. Yao et al. [15] propose an active learning strategy based on information density (ID) integrated with CRFs for Chinese named entity recognition, which achieves an F1 score of 77.2% on the SIGHAN Bakeoff 2006 MSRA NER corpus. Sitanath Biswas et al. [16] apply a maximum entropy (MaxEnt) model with a hidden Markov model (HMM) and some linguistic rules to recognize named entities in the Oriya language, which achieves higher precision and recall. However, with the rapid development of neural network technology and the exponential growth of data, traditional machine learning methods do not perform well on large datasets. Deep learning methods then offered solutions to these problems.
NER based on deep learning methods
The rapid growth of deep learning has brought about significant progress in different areas. Yue Ma et al. [17] use deep learning methods to identify antimicrobial peptides from the human gut. Bilal Thonnam Thodi et al. [18] combine deep learning methods with kinematic wave theory to estimate high-resolution traffic speed. Qin Zhiguang et al. [19] apply deep learning to the detection and classification of fruit. In recent years, deep learning has been widely used in the field of NER. Pin Tang et al. [20] propose a deep learning-based method named BILSTM-CRF, which consists of BILSTM and CRF, to recognize named entities in Chinese judicial texts. They incorporate adaptive moment estimation into the model to meet the high accuracy requirements of named entity recognition in the field of judicial decisions. Ya Qin et al. [21] propose a network security entity recognition method based on feature templates and CNN-BILSTM-CRF. The method addresses the insufficient precision of extracting mixed Chinese and English security entities in the field of network security. With the widespread use of pre-training models in deep learning, more and more studies use BERT as their pre-training model. Qing Yang et al. [22] propose a BERT-based dual-channel named entity recognition model for solid rocket engines. The dual-channel network consists of a convolutional neural network (CNN) and BILSTM. Their experiments show that the model using BERT outperforms the other models.
For named entity recognition in the field of emergencies, we pay more attention to the time and place of the emergency, the type of event, the number of casualties, the damage to property, and so on. Introducing a BERT layer into the model yields deeper semantic information about the event text, and bidirectional Transformers can effectively capture contextual information. The BILSTM-CRF model can make use of input features before and after each position in the text. BERT-BILSTM-CRF can also capture long entity names in emergency events. Owing to the high level of sensitive information in emergency texts, news texts are sparser than in other fields. To make entity recognition more effective under these conditions, ATBBC introduces adversarial training on top of the BERT-BILSTM-CRF model.
ATBBC model
This section describes the components of the ATBBC model. ATBBC combines adversarial training with the BERT-BILSTM-CRF model to solve the problem of named entity recognition in emergencies. An overview of the ATBBC model is shown in Fig. 1.

ATBBC model.
The model includes the following three parts. The first part is the BERT layer. Traditional models such as Word2vec and GloVe can only represent words statically, whereas BERT produces context-dependent representations. The word vector produced by BERT contains the information of the word itself as well as its contextual information, and BERT places more emphasis on semantics than the word vectors produced by other models. To improve entity recognition performance on emergency texts, we added adversarial training to the BERT layer. The second part is the BILSTM layer. Because it processes the sequence in both directions, BILSTM is able to learn further contextual information and the meaning of words in context. The last part is the CRF layer, which decodes the output information. Compared with the traditional softmax method, CRF can improve the accuracy of model prediction.
As shown in Fig. 1, the input data can be expressed as $X = \{t_1, t_2, t_3, \ldots, t_j, t_{j+1}, t_{j+2}, \ldots\}$. Adding perturbations to the embedding layer and embedding $X$, we get $V = \{v_1, v_2, v_3, \ldots, v_j, v_{j+1}, v_{j+2}, \ldots\}$. With bidirectional Transformer encoding, we obtain the corresponding semantic information $S = \{s_1, s_2, s_3, \ldots, s_j, s_{j+1}, s_{j+2}, \ldots\}$. We input the output of the BERT model into the BILSTM layer. BILSTM has a strong ability to capture contextual information over long distances and can effectively handle long emergency texts. Through the propagation of the forward LSTM layer and the backward LSTM layer, the two outputs are merged, and the CRF layer decodes the merged output into the final label sequence.
In recent years, pre-trained models have gradually become popular in the study of language models in NLP. ELMO [23] and GPT [24] are both popular pre-trained models. However, early pre-trained models relied on unidirectional language models, which limited their expressive capability. BERT learns universal semantic representations through unsupervised learning from large-scale text data. In the ATBBC model, BERT is used as an encoder responsible for transforming input text sequences into high-dimensional semantic representations. By utilizing the pre-trained BERT model, ATBBC can capture rich contextual information, thereby improving the performance of named entity recognition.
BERT learns representations at two levels, words and sentences, through two pre-training tasks: the masked language model and next sentence prediction, respectively. The masked language model works like a cloze test: it randomly masks certain words in a sentence and uses the encoder to predict the masked words. Next sentence prediction randomly replaces some sentences in the text and then predicts whether the second sentence actually follows the first. The embedding of BERT consists of three parts: token embedding, segment embedding, and position embedding. BERT uses the [CLS] identifier to mark the beginning of an input sentence and the [SEP] identifier to mark the boundary between sentences and the end of a sentence.
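The input construction described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function names `build_bert_input` and `sum_embeddings`, the `max_len` parameter, and the single-segment assumption are all hypothetical. It relies only on the fact that BERT sums the token, segment, and position embeddings at each position.

```python
def build_bert_input(chars, max_len=16):
    """Wrap a character sequence with [CLS]/[SEP] and build the id lists.

    Assumption: a single-sentence input, so every token is in segment 0.
    """
    tokens = ["[CLS]"] + list(chars)[: max_len - 2] + ["[SEP]"]
    segment_ids = [0] * len(tokens)          # one segment for one sentence
    position_ids = list(range(len(tokens)))  # absolute positions 0..n-1
    return tokens, segment_ids, position_ids


def sum_embeddings(token_vecs, segment_vecs, position_vecs):
    """Element-wise sum of the three embeddings at every position."""
    return [
        [t + s + p for t, s, p in zip(tv, sv, pv)]
        for tv, sv, pv in zip(token_vecs, segment_vecs, position_vecs)
    ]


# Illustrative usage on a two-character denoter ("earthquake").
tokens, segment_ids, position_ids = build_bert_input("地震")
```

For Chinese text, BERT tokenizes at the character level, which is why the sketch splits the input string into single characters.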
In order to enhance the robustness and generalization ability of the ATBBC model, we apply adversarial training. By injecting perturbations in the embedding layer and introducing random modifications to the input text sequence, such as character replacement, deletion, or insertion, we increase the diversity of the training data. This approach helps the ATBBC model better handle noise and variations in the real world. Adversarial training for BERT is shown in Fig. 2.

BERT adversarial training.
Suppose the embedding vectors of the input text sequence $V = \{v_1, v_2, v_3, \ldots, v_j, v_{j+1}, v_{j+2}, \ldots\}$ are $x$. The perturbation of the embedding is shown as follows:
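A widely used form of embedding perturbation is the Fast Gradient Method, r_adv = ε · g / ‖g‖₂, where g is the gradient of the loss with respect to the embedding x. Whether ATBBC uses exactly this form is an assumption on our part; the sketch below illustrates it on a flat gradient vector, and the names `fgm_perturbation`, `perturb_embedding`, and `epsilon` are hypothetical.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Return r_adv = epsilon * g / ||g||_2 for a flat gradient vector g.

    Assumption: Fast Gradient Method with L2 normalisation.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # A zero gradient gives no direction, so apply no perturbation.
        return [0.0 for _ in grad]
    return [epsilon * g / norm for g in grad]


def perturb_embedding(x, grad, epsilon=1.0):
    """Add the adversarial perturbation to the embedding vector x."""
    r_adv = fgm_perturbation(grad, epsilon)
    return [xi + ri for xi, ri in zip(x, r_adv)]
```

In a training loop of this kind, the loss is typically computed once on the clean embedding and once on the perturbed embedding, and the two gradients are accumulated before the parameter update.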
Applying the adversarial perturbation to the input samples is roughly equivalent to adding a gradient penalty term to the loss; after the perturbation is applied, standard gradient descent is performed. In the ATBBC model, the gradient descent algorithm is used to update the parameters of each component in the model, so that the model can better fit the training data and minimize the loss of named entity recognition in emergencies.
BERT feeds the input embedding vectors into a bidirectional Transformer model. The Transformer unit is shown in Fig. 3.

The left side of the figure is the transformer unit, and the right side shows the calculation process of multi-head attention mechanism.
The most critical part of Transformer is the self-attention mechanism. The equation for the self-attention mechanism is as follows:
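The standard scaled dot-product form of self-attention is Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, where d_k is the dimension of the key vectors. The following plain-Python sketch illustrates that computation on small lists of vectors; the function names are illustrative and the code is not taken from the ATBBC implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return output
```

When a query is equally similar to all keys, the weights are uniform and the output is the mean of the value vectors.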
The multi-head attention mechanism helps mitigate over-fitting in the named entity recognition task and extracts the semantic information from the emergency data. Multi-head attention projects Q, K and V through h different linear transformations, which enables the model to jointly attend to information from different representation subspaces at different positions. The calculation process is shown in Fig. 3. The equation for the multi-head attention mechanism is as follows:
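In the standard Transformer formulation, which we assume applies here, multi-head attention with h heads is written as below. The projection matrices $W_i^Q$, $W_i^K$, $W_i^V$ and the output matrix $W^O$ are learned parameters, and this notation is introduced here for illustration.

```latex
\begin{aligned}
\mathrm{MultiHead}(Q, K, V) &= \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}, \\
\mathrm{head}_i &= \mathrm{Attention}\!\left(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}\right), \\
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V .
\end{aligned}
```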
Suppose the input of layer 0 is
After obtaining the vector output of the sequence of sentences in the BERT layer, we input it into the BILSTM layer for semantic encoding.
Long Short-Term Memory (LSTM) [25] is a particular kind of recurrent neural network (RNN) [26]. It has memory units that are well suited for modeling time-series data, such as textual data that requires consideration of contextual information. Each LSTM cell consists of three parts: input gate, forget gate and output gate. The structure of the LSTM cell is shown in Fig. 4.

LSTM cell structure.
The input gate controls which new information is written to the memory cell. The specific derivation of the input gate is shown in the following equation.
The forget gate controls which information in the memory cell is retained and which is discarded. The specific derivation of the forget gate is shown in the following equation.
The output gate determines the information to be output based on the current cell state. The specific derivation of the output gate is shown in the following equation.
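The three gates are usually written as follows. This is the standard LSTM formulation, which we assume matches the cell in Fig. 4; the weight matrices $W$, biases $b$, sigmoid $\sigma$, and element-wise product $\odot$ are notation introduced here, with $x_t$ the input, $h_t$ the hidden state, and $c_t$ the cell state at step $t$.

```latex
\begin{aligned}
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right), &
\tilde{c}_t &= \tanh\!\left(W_c\,[h_{t-1}, x_t] + b_c\right), \\
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right), &
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right), &
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
```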
It is essential for named entity recognition to obtain both forward and backward information. However, an LSTM encodes text only from front to back and therefore learns context in one direction only. To solve this problem, Graves et al. [27] proposed the BILSTM model, which combines a forward LSTM and a backward LSTM and merges their outputs.
BILSTM is a bidirectional long short-term memory model used for processing sequence information. In the ATBBC model, BILSTM is used as a context-aware layer to capture long-term dependencies and contextual information in text sequences. By processing text sequences in both directions, BILSTM can better understand the context of entities, thereby improving the accuracy of named entity recognition. Because text in the emergency domain consists of long sentences, the forward and backward networks in BILSTM can capture both the preceding and following information of long sentence sequences to obtain context. BILSTM is therefore well suited to named entity recognition in the emergency domain.
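The bidirectional encoding can be summarized as follows, assuming the two directions are merged by concatenation (a common choice; the merge operation is not specified in the text). Here $s_t$ is the BERT output at position $t$, and the arrow notation is introduced for illustration.

```latex
\begin{aligned}
\overrightarrow{h}_t &= \mathrm{LSTM}\!\left(s_t, \overrightarrow{h}_{t-1}\right), \\
\overleftarrow{h}_t &= \mathrm{LSTM}\!\left(s_t, \overleftarrow{h}_{t+1}\right), \\
h_t &= \left[\overrightarrow{h}_t \,;\, \overleftarrow{h}_t\right].
\end{aligned}
```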
Conditional random field (CRF) [13] is an undirected probabilistic graphical model. Given a set of input random variables, the CRF models the conditional probability distribution of the output random variables. The CRF model can consider the dependencies between labels and perform inference on the annotated results through global optimization. Building on the BILSTM model, the CRF model further optimizes the annotation results by considering the context of event entities and the constraints of label transitions. This ensures that the output label sequence from the model is more reasonable and coherent. In the ATBBC model, CRF is used as the label transition and decoding layer to annotate and decode entities in the sequence, further improving the accuracy of named entity recognition.
Suppose the input observation sequence and output label sequence of the text are $x = (x_1, x_2, x_3, \ldots, x_n)$ and $y = (y_1, y_2, y_3, \ldots, y_n)$, respectively. The score $S$ of the predicted sequence is as follows:
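A common form of this score in BILSTM-CRF models, which we assume here, sums transition scores and emission scores along the sequence. $A_{j,k}$ denotes the score of moving from label $j$ to label $k$, $P_{i,k}$ denotes the score that the BILSTM assigns to label $k$ at position $i$, and $y_0$ and $y_{n+1}$ are start and end labels; all of this notation is introduced for illustration.

```latex
S(x, y) = \sum_{i=0}^{n} A_{y_i,\, y_{i+1}} + \sum_{i=1}^{n} P_{i,\, y_i}
```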
Let $P(y|x)$ denote the conditional random field; the probability of the output sequence, obtained by normalizing over all possible sequence paths, is as follows:
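Assuming the usual softmax normalization over the set $Y_x$ of all candidate label sequences for $x$ (the symbol $Y_x$ is introduced here), the probability and the corresponding training objective can be written as:

```latex
\begin{aligned}
P(y \mid x) &= \frac{\exp\!\big(S(x, y)\big)}{\sum_{y' \in Y_x} \exp\!\big(S(x, y')\big)}, \\
\log P(y \mid x) &= S(x, y) - \log \sum_{y' \in Y_x} \exp\!\big(S(x, y')\big).
\end{aligned}
```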
Finally, we use the Viterbi algorithm to decode the output with the highest prediction score as the result. For prediction, the output sequence y* with the highest overall probability is obtained by Eq. 16.
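Viterbi decoding of this kind can be sketched as below. This is an illustrative plain-Python version that takes per-position emission scores and a label transition matrix as nested lists; the function name, the argument layout, and the omission of explicit start and end labels are simplifying assumptions rather than details of the ATBBC implementation.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label path and its score.

    emissions[i][k]: score of label k at position i.
    transitions[j][k]: score of moving from label j to label k.
    """
    n_labels = len(transitions)
    score = list(emissions[0])   # best score of any path ending in each label
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for k in range(n_labels):
            # Best previous label for arriving at label k.
            best_prev = max(
                range(n_labels), key=lambda j: score[j] + transitions[j][k]
            )
            new_score.append(score[best_prev] + transitions[best_prev][k] + emit[k])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    best_last = max(range(n_labels), key=lambda k: score[k])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]
```

The dynamic program runs in time proportional to the sequence length times the square of the number of labels, instead of enumerating every possible label sequence.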
In conclusion, the NER process of the BERT-BiLSTM-CRF model is shown in Algorithm 3.4.
In Algorithm 3.4, we first feed the input sequence into the BERT model to obtain the semantic features. Second, we run a forward and a backward LSTM on the output of BERT and merge the two outputs. Third, we calculate the score of each entity label and use gradient descent to obtain the optimal sequence of entity labels. Finally, we use the Viterbi algorithm to decode the sequence.
Experiment
Dataset of emergency
The emergency dataset (ED) comes from the news data of domestic mainstream media. After unified cleaning, classification, desensitization and labeling according to the classification and coding standard of the national emergency platform system for emergency events, high-quality news text data were selected. The ED contains 4 primary event types, 33 secondary event types, and 5 categories of attribute entities, namely time, place, denoter, casualties, and property loss. We present the descriptions and examples of the entity classes in Table 1. After obtaining the high-quality emergency text data, we annotated the ED as follows. Firstly, the entity names to be marked are determined based on the entity categories defined in Table 1.
Entity classes and the example
Then, we form an annotation group of four people to label the data according to the annotation rules. We divide the unlabeled data into four parts according to the type of primary event, one part for each group member. When labeling disputes and ambiguities arise, we hold a group meeting to determine the final label. Finally, we extracted 8531 entities, including 2245 time entities, 2163 place entities, 2245 denoter entities, 1342 casualty entities, and 536 property loss entities (as shown in Table 2).
The number of entities
In the field of NER, there are various metrics to evaluate the performance of models. The most commonly used in experiments are accuracy, precision, recall, and F1-score. Precision is the ratio of the number of correctly identified entities to the number of identified entities. Recall is the ratio of the number of correctly identified entities to the number of entities in the sample. The F1-score is the harmonic mean of precision and recall. In this paper, the extraction of event entities needs to be verified as a whole, based on the entity units of word segmentation. Character-level accuracy cannot fully reflect whether the boundaries of entities are correctly recognized. Therefore, we choose precision, recall, and F1-score as the evaluation metrics.
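Entity-level evaluation of this kind can be sketched as follows. The BIO tagging scheme and the label names (`TIME`, `PLACE`) are assumptions made for illustration, since the tagging scheme of the ED is not stated; an entity counts as correct only if its boundaries and type both match.

```python
def extract_entities(labels):
    """Collect (start, end, type) spans from a BIO label sequence.

    Assumption: labels look like "B-TIME", "I-TIME", or "O".
    """
    entities, start, etype = set(), None, None
    for i, label in enumerate(list(labels) + ["O"]):  # sentinel closes last span
        inside = label.startswith("I-") and etype == label[2:]
        if start is not None and not inside:
            entities.add((start, i - 1, etype))
            start, etype = None, None
        if label.startswith("B-"):
            start, etype = i, label[2:]
    return entities


def precision_recall_f1(gold, pred):
    """Entity-level precision, recall, and F1 over two sets of spans."""
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = extract_entities(["B-TIME", "I-TIME", "O", "B-PLACE", "I-PLACE"])
pred = extract_entities(["B-TIME", "I-TIME", "O", "B-PLACE", "O"])
scores = precision_recall_f1(gold, pred)
```

In this example the predicted place entity has the wrong right boundary, so it is counted as an error even though most of its characters are labeled correctly. As a consistency check on the harmonic mean, a precision of 85.75% and a recall of 85.03% give an F1-score of 85.39%.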
Table 3 shows the hyperparameters used in the ATBBC model. All parameters are determined based on multiple rounds of experimental tuning.
The hyperparameters and their values used in ATBBC model
ATBBC is based on the combination of adversarial training and deep learning. Therefore, we conduct ablation experiments from three aspects. Firstly, we verify the effect of adding adversarial training to the BERT pre-training model on model performance. Secondly, we compare a deep learning model with the BERT pre-training model against one without it, to show that the pre-training model is better than the traditional model. Thirdly, we compare the effect of BILSTM and LSTM on the model. Finally, we compare the performance of three existing NER models with ATBBC, demonstrating that ATBBC improves on previously proposed NER models.
Comparison of adding adversarial training
We compare the effects of the model with adversarial training and the model without adversarial training in our dataset (ED). The experiment results are shown in Table 4 and Fig. 5.
Performance with AT and without AT

Performance with AT and without AT.
We list the results of the recognition of the five categories of entities (Time, Place, Denoter, Casualty, Property loss) and the final overall recognition results.
We compared the performance of the model with and without adversarial training on five different entities and found that the model with adversarial training showed better results: the precision is 85.75%, recall is 85.03%, and F1-score is 85.39%. The model with adversarial training has 4.07%, 5.29% and 4.82% higher precision, recall and F1 scores than the model without adversarial training, respectively.
These results suggest that the perturbations constructed by adversarial training enable the model to recognize entities reliably across different adversarial samples, thus improving the robustness and generalization ability of the model.
Because pre-training models are used extensively in the NER domain, we performed experiments on our dataset with and without the BERT pre-training model. The experimental results are shown in Table 5 and Fig. 6.
Performance with BERT and without BERT

Performance with BERT and without BERT.
In this experiment, we only add or remove the BERT pre-training model; the rest of the parameters are the same.
We compare the performance of the two models on time, place, denoter, casualty, property loss and overall entities. The model with the BERT pre-training model achieves much higher precision, recall and F1 scores than the model without it: 6.88% higher precision, 5.51% higher recall and 6.20% higher F1 score.
By comparison, our model is better than the model that is not using the BERT pre-training model. The most likely reason for this result is that the BERT pre-trained model is able to capture contexts of messages and reflect the complexity of words. A comparative analysis of the experimental results shows that the BERT model can take into account the correlation between words in a sentence and that BERT uses the attention mechanism of Transformers and the masked language model for training, which enables better word embedding.
LSTM overcomes the problem of short-term memory by introducing internal gate mechanisms, which learn which data in a sequence are important and should be preserved and which should be dropped. However, LSTM can only process the sequence in one direction, whereas BILSTM processes the sequence in both directions and combines the two outputs.
Therefore, we compare the effects of the model with BILSTM and the model with LSTM on our dataset (ED). The experimental results are shown in Table 6 and Fig. 7.
Performance with BILSTM and with LSTM

Performance with BILSTM and with LSTM
In this experiment, except for the difference between the LSTM layer and BILSTM layer, the rest of the parameters are the same. The experimental results showed that in the case of 5 entities and overall entity categories, the model with BILSTM performs better. In the overall results, the model with BILSTM is 4.43%, 1.14% and 2.80% higher than the model with LSTM in terms of precision, recall and F1 values, respectively.
From the above data, it can be concluded that the effect of BILSTM is overall better than that of LSTM. BILSTM can process sequences in both directions and is able to capture contextual information better, which works better in the model than the single direction LSTM.
We compare our proposed model with Word2vec-BILSTM-CRF (WBC), Word2vec-LSTM-CRF (WLC), and BERT-LSTM-CRF (BLC) on six datasets (CMeEE, CLUENER, CCK2021, cMedQANER, the People’s Daily and ED). WBC and WLC use Word2vec to embed words and obtain semantic features. Word2vec learns word representations from the surrounding context words. The detailed size and source of the five open source datasets are shown in Table 7. The five open source datasets are described as follows:
Entity Statistics of the public Dataset
CMeEE: The dataset comes from Chinese medical NLP benchmark CBLUE. There are 9 entity types including sym, dep, dru, pro, equ, dis, etc.
CLUENER: The dataset is based on the open-source dataset THUCTC from Tsinghua University, with a selection of data for fine-grained named entity annotation. There are 10 entity types including game, organization, government, movie, name, book, etc.
CCK2021: The dataset consists of three parts: four-level administrative divisions and standard address texts, default standard address texts for address elements, and non-standard address texts. There are 19 entity types including prov, city, district, devzone, town, community, etc.
cMedQANER: The dataset consists of different biomedicine text-mining tasks with corpora. There are 11 entity types including diseases, drugs, syndromes, etc.
The People’s Daily: The dataset is generated from the People’s Daily corpus. There are 3 entity types: PER, LOC, and ORG.
Performance with 4 models on six datasets
Because the six datasets differ in format, we convert the five public datasets into the ED format. The six datasets also differ in entity types, so we compare the models on the integrated overall entity type. The results of the experiment are shown in Table 8 and Fig. 8.

Performance with 4 models on six datasets.
To ensure a consistent comparison, we set the parameters of the four models to the values shown in Table 3. The precision, recall and F1-score of the four models on the six datasets are shown in Fig. 8. Across the six datasets, our proposed model performs best overall. Comparing across datasets, all four models perform best on the People’s Daily dataset. Our analysis shows that the People’s Daily dataset contains only three types of entities, fewer than the other datasets. In addition, it has the largest amount of data among the six datasets. Large-scale, labeled and standardized training data improves overall model performance. In contrast, the four models perform relatively poorly on the cMedQANER dataset. Because the cMedQANER dataset contains a lot of non-entity noise, the models are strongly affected by noise during training. This dataset therefore provides a useful test case for improving the model’s resistance to noise. Comparing across models, we find that adding adversarial training to the pre-trained model does improve performance, and that adversarial training also improves the model’s resistance to noise. In addition, the pre-trained model handles word vectors better and produces better word embeddings. These are the reasons why our proposed model outperforms the other three models.
In this paper, we proposed a new model, ATBBC, to address the task of emergency event entity recognition. ATBBC utilizes the BERT, BILSTM, and CRF modules to achieve end-to-end entity recognition prediction for emergency events. We introduced adversarial training in the embedding layer to enhance the model’s generalization ability and robustness. Due to various challenges in collecting emergency event data, we constructed the Emergency Dataset (ED) and extensively evaluated ATBBC on both ED and various public datasets. The experimental results demonstrate that ATBBC outperforms existing solutions in terms of performance.
In future work, we plan to conduct more in-depth research in the following three areas. Firstly, because semantic information at the sentence level is limited, we can perform more semantically informative entity extraction and recognition at the chapter level. Secondly, we use a combination of adversarial training and deep learning, and our results show that it improves model performance, but this method also has some drawbacks. In future work, we will consider more advanced adversarial training algorithms to improve entity recognition in the field of emergencies. Finally, we hope to build a large-scale knowledge graph of emergencies to contribute to the early warning of emergencies.
Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We would like to thank the anonymous reviewers for their insightful comments and suggestions. This research is supported by the National Natural Science Foundation of China (grant No. U2003208), the Xinjiang Autonomous Region key research and development project (grant No. 2021B01002), and the Science and Technology Department of Xinjiang Uyghur Autonomous Region Fund Project (grant No. 2020A03004-4).
