Abstract
With the acceleration of globalization, cross-linguistic communication has become an indispensable part of daily life, and the status of English as an international lingua franca has become increasingly prominent. Faced with the complex semantic relations contained in long English sentences, the existing machine translation systems often show understanding deviations and translation distortions, which seriously affect the accuracy and coherence of information transmission. To solve this pain point, this study focuses on the latest achievement in the field of deep learning models and explores its application potential in English long sentence semantic relationship extraction and machine translation quality optimization. Firstly, the BERT model is fine-tuned to specialize in long sentence structure analysis and semantic relationship extraction. Experiments show that the F1 score of the model reaches 89.6% on the standard evaluation dataset CoNLL 2004, which is significantly higher than the previous best record. Based on this deeply mined semantic information, we further optimize the neural network machine translation system to effectively solve the long-distance dependency problem and significantly reduce the ambiguity and omission phenomenon in the translation process by introducing a novel attention guidance mechanism. In the blind test of the WMT ‘14 English-German translation task, the BLEU score of the translated version using the optimized NMT system is 29.5, which is 3.2 points higher than that of the benchmark model, which proves the remarkable effect of this method in improving translation quality.
Introduction
In today’s globalization, cross-cultural communication and cooperation are becoming more and more frequent, and language has become a bridge connecting people all over the world. However, while humans can comfortably navigate multiple languages, it is still an extremely challenging task for machines to cross language barriers, especially when complex long sentence structures are involved. 1 As a widely used international language, English has rich grammatical forms and complex syntactic structures, which put forward higher requirements for automatic translation. In recent years, breakthroughs have been made in the field of Natural Language Processing (NLP). Deep learning models, especially Bidirectional Encoder Representations (BERT), have led a new round of technological innovation with their powerful context-sensitive word embedding capabilities and pre-training features for large-scale unlabeled corpora.2,3 The purpose of this paper is to explore how to use the advantages of the BERT model to optimize the semantic relationship extraction of long English sentences so as to improve the quality of machine translation and lay a theoretical and technical foundation for building a more efficient and accurate translation system.
English long sentences often contain multiple layers of information and intricate semantic associations, which poses a severe test to the existing machine translation technology. 4 Traditional translation models based on rules or statistics make it difficult to capture the subtle grammatical structure and semantic details within long sentences, which leads to problems such as translation results often being taken out of context or illogical.5,6 In contrast, BERT, with its bidirectional encoder architecture, can consider the context before and after each word when understanding the meaning of each word, which is particularly important for understanding the implicit relationship in long sentences, reference resolution, and polysemy discrimination. 7 Therefore, exploring the application of BERT in English long sentence processing can not only promote the progress of semantic relationship extraction technology but also greatly improve the overall performance of the machine translation system.
The proposed BERT-based semantic relationship extraction model has the characteristics of bidirectional coding and large-scale pre-training, which can deeply understand complex semantics. Its multi-head attention mechanism can accurately capture the semantic relationships in long sentences. The existing semantic relationship extraction methods have their own advantages and disadvantages. The accuracy of the traditional rule-based method is acceptable, but the cost of rule formulation and maintenance is high, and the scalability and generalization are poor. The feature-based learning model relies on artificially designed features, which has insufficient adaptability to complex semantics and is difficult to extract features when dealing with long and difficult sentences. Other deep learning-based models are inferior to BERT in handling long-distance dependencies, and their scalability and generalization are also limited. In contrast, BERT-based models have significant advantages in handling long and complex sentences, extensibility, and generalization.
In this study, we focus on two interrelated but independent directions: On the one hand, through fine semantic relationship extraction, we can reveal the hidden meaning levels in long English sentences and provide more comprehensive and accurate information support for subsequent translation activities. On the other hand, based on these semantic clues obtained from in-depth excavation, the process of machine translation is optimized to produce a more natural, smooth and close translation of the original meaning. To this end, we will take the following steps to conduct research:
The BERT model is fine-tuned to strengthen the semantic relationship extraction task of English long sentences. This includes, but is not limited to, sub-tasks such as entity recognition, relationship classification, and event triggering, each of which aims to cut into a different perspective and dissect sentence structure in an all-around way. Subsequently, a large amount of English corpus was collected, covering various styles such as news reports, academic papers, novels, and literature, which was used as the basis for model training and verification. Through contrastive experiments, quantify the performance differences of BERT relative to other models when dealing with long sentences, with special attention to its ability to deal with dependent clauses, appositive modifiers, juxtaposition components, etc. Based on the semantic relationship information obtained in the first stage, we design a new attention mechanism and integrate it into the Neural Machine Translation (NMT) framework. This mechanism allows the translator model to focus on allocating computing resources according to the semantic importance distribution of source language sentences in the process of generating target language texts and reduce the accumulation of errors caused by long-distance dependence. The expected result is that the translation system can better retain the original style while ensuring the completeness and accuracy of information transmission. Through a series of well-designed experiments, including quantitative evaluation indicators (such as the BLEU score) and qualitative human review, we will comprehensively evaluate the effectiveness of the new method and conduct a comparative analysis with other mainstream translation technologies. In addition, considering the application requirements of machine translation in actual scenarios, we will also examine the robustness, scalability and real-time processing capabilities of the model in order to build an efficient and practical solution.
Accurate semantic understanding is key to ensuring that the translation conveys the intent of the original text as it is, especially in complex and nuanced contexts. When machine translation is faced with long English sentences, if the semantic relationships cannot be accurately extracted, the translation result is likely to have problems such as information bias and logical confusion, and the target language reader cannot receive the same information as the original reader. For example, in translation scenarios such as scientific and technological literature and literary works, long sentences appear frequently and the semantics are subtle, and the accurate extraction of semantic relationships can make the translated text not only retain the professionalism of the original text, but also ensure the smooth and natural expression. Therefore, overcoming the challenge of extracting semantic relations from long English sentences is an important step to improve the quality of machine translation and achieve more accurate and efficient translation, and it plays a non-negligible role in promoting the development of machine translation technology. This study has a clear research question, that is, it is committed to solving many challenges in semantic relation extraction of long English sentences. This research question is of great significance, because long English sentences contain complex semantic relationships, and accurately extracting these relationships is the core task of improving the accuracy and fluency of machine translation.
The core goal of this study is to explore a path to deeper semantic understanding and higher-quality translation output, relying on the advanced features and powerful potential of the BERT model. We believe that through unremitting efforts and innovation, future machine translation systems will be able to be closer to human cognitive habits and make significant contributions to promoting the barrier-free circulation of information worldwide.
Related theoretical techniques
BERT model
BERT is a pre-training technology for natural language processing derived from context representation learning, including semi-supervised sequence learning, Generative Pre-training, ELMo, and ULMFiT. Different from previous models, BERT is deeply bidirectional, unsupervised, and only pre-trained with unlabeled text.8,9 Context-free models (word2vec and GloVe) assign fixed vectors to dictionary words, which is easy to lead to ambiguity. BERT’s Transformer architecture is based on the 2017 Self-Attention mechanism. Self-Attention has achieved great success in tasks such as translation, providing a new framework for many NLP tasks. It is a mechanism that focuses on each position within a sequence and is used for sequence representation. 10 Self-Attention is widely used in reading comprehension, summarization, text implication, and other tasks. Self-Attention-based end-to-end networks excel in simple question-answering and language modeling.
The transformer relies on self-attention to compute a model of input and output without RNN or convolution. Its encoder is stacked by Z identical layers, and each layer contains two sublayers: a multi-head self-attention layer, which integrates word information in the same layer to generate a context vector. Fully connected feedforward neural network integrates the context generated by self-attention with the current word information to generate a comprehensive context state. In order to speed up training, residual linking and layer normalization are used, which is defined as LayerNorm (x + SubLayer (x)). After L-layer stacking, the information is further abstracted and fused.11,12 The decoder also contains Z identical layers, each with three sublayers. The first sublayer is multi-head self-attention, which generates context vectors but only looks at the generated word information, and the ungenerated words are masked with masks. The second sublayer, multi-head self-attention, compares the hidden state of complex sentences with that of simple sentences and generates the context of complex sentences. The last sublayer is fully connected and feedforward, which integrates simple sentence context, complex sentence context and current word information to optimize the next word prediction.13,14 Similar to the encoder, residual connection and layer normalization are used for the decoder’s three layers.
As a powerful pre-trained language model based on the Transformer architecture, the BERT model has functions such as bidirectional encoding and can deeply understand text semantics. BERT has demonstrated strong applicability in the task of extracting semantic relationships from long English sentences. In order to adapt it to the task of semantic relation extraction, it is necessary to clarify the specific steps in the extraction process. In the input representation phase, the text data is converted into a format suitable for BERT model processing, ensuring that the information is conveyed to the model completely and accurately. 15 In the model training process, according to the specific model architecture, the architecture includes components such as multi-layer transformer encoders, carefully set hyperparameters, such as learning rate and hidden layer size, and use a large amount of annotated data to train according to specific training programs, so that the model can learn rich semantic relationship patterns. Long sentences often contain complex grammatical structures and semantic information, and BERT, with its powerful language understanding capabilities, can deeply encode individual components of a sentence to accurately identify semantic relationships between different entities. When working with sentences, BERT is able to accurately extract the semantic relationships between words, such as the creative relationship between words, and the thematic associations between words. In order to solve the challenges brought about by long English sentences, such as capturing long-distance dependence, the multi-head attention mechanism of the BERT model allows the model to pay attention to different position information at the same time, effectively solving the problem of long-distance dependence. In terms of managing computing resources, optimized algorithms and hardware configurations are adopted to reasonably allocate computing resources to ensure that the performance of the model is not wasted when processing long sentences.
Machine translation technology
Cross-language conversion, that is, machine translation, has gone through three stages: Rule-Based Machine Translation (RBMT) and Statistical Machine Translation (SMT) to today’s mainstream deep learning-driven Neural Machine Translation (NMT). NMT is superior in accuracy. 16
EBMT in the origin period relies on rules written by linguists for translation, but it faces insufficient rule coverage and one-to-many or many-to-one problems, which are limited by quality and writing time.17,18 Rules cover morphology, syntax and common phrases. The methods are divided into two categories: conversion rules and intermediate languages, all of which need to be supported by manual rule bases and dictionaries.
The rise of SMT in the mid-term is the core of statistical analysis of bilingual corpus, the construction of probabilistic models to evaluate translation options, and the selection of high-probability translation, such as the source channel model proposed by Peter, to describe the translation probability process. Assuming that the source language is x, the source language is converted into the target language y through this model. In other words, the source language x obtains the target language y after channel coding. Then, the corresponding SMT method decodes the target language and restores it to the source language x.
19
The translation process of this SMT method is actually a process of encoding and then decoding. The objective function Pr (x|y) is translated, and the objective function is divided into two terms through the Bayesian formula. The calculation method is shown in equation (1).
Among them, argmax is used to solve the points of a certain function domain where the function value is maximized, Pr (y) is the language model, and Pr (y|x) is the translation model. Take the logarithm of both sides and then you can get the required log-linear model, which is a commonly used processing method in engineering. 20
NMT technology abandons manual design of features, can automatically extract information, has low requirements on data quality, and has strong anti-noise performance. NMT mainly adopts RNN, CNN, and the self-attention model. 21 The basis of its architecture is the encoder-decoder. When NMT deals with variable-length sequences, it adopts a method based on the encoder-decoder (order-to-order model). The encoder converts the input sequence into a fixed-length vector through RNN, and the decoder generates the output based on this fixed-length vector. When adapting to machine translation, the source language is input, and the target language is output, which is used to realize automated translation training.
This framework overcomes the original limitation of RNN, obtains the context vector C by encoding, and then decodes and outputs the translated sentence. Encoder-decoder has become the standard configuration of NLP, which can compress variable-length input into fixed-length vector C, encoding the whole picture of information. C is fed to a decoder to produce an output, adapting unequal-length sequences. However, sequences that are too long will affect translation due to information loss, and the attention mechanism dynamically calls C to customize vectors for each target word and optimize information storage. Aiming at the gradient problem of RNN, LSTM introduces gating to regulate the information flow, which has advantages in dealing with long-distance dependencies and alleviates the problems of gradient disappearance and explosion.
The calculation of the forgetting gate of LSTM is shown in equation (2):
σ is the sigmoid activation function. Where x
t
represents the input vector of t, h
(t-1)
is the output of the cyclic cell at t-1, W
(ch)
is the weight, b
f
is the bias, and W
(cx)
represents the weight matrix connecting the input gate and the cell state. The input gate controls the information that needs to be added at the current time, and the calculation process is shown in equation (3).
W
(ix)
represents the weight matrix, the memory gate selects how much current information is memorized, W
(ih)
represents the weight matrix input to the hidden layer, and b
i
is the bias. In order to obtain the information that needs to be memorized at present, update the old information and obtain the new memory information, we mainly select to forget part of the information through the forgetting gate and then calculate the newly added information through the input gate. The specific calculation is shown in equation (4).
c
t
stands for Cell State, which is the core component of LSTM and is responsible for carrying information between time steps. f
t
is the function value; it represents the input gate value, ⊙ output represents the element-level multiplication operation and c
t
, c
t-1
is the information at time t and time t-1 of the input gate. Under normal circumstances, the final output information is calculated through the output gate, and the calculation formula is shown in equation (5). Where b
o
represents the bias vector of the output gate, and W is the weight matrix.
Machine translation evaluation index
In machine translation, semantic understanding is crucial and is directly related to the quality of the translation. Extracting semantic relationships is a key part of improving the quality of machine translation, especially when dealing with long and complex sentences. Long and complex sentences often contain multiple layers of modifications, multiple logical relationships, and subtle semantic associations, which machines need to accurately grasp in order to produce high-quality translations. The BERT-based model can be used to extract semantic relationships by first analyzing the input text, and using the powerful language understanding capabilities of the BERT model to mine the semantic connections between words, phrases, and sentences between parts of sentences, such as causality, juxtaposition, and modification. Then, the semantic relationships extracted based on the BERT model are integrated into the machine translation pipeline, and these semantic relationships can play an important role in the translation process. When encountering words or expressions that may have ambiguity, the extracted semantic relationships can accurately determine their meaning in a specific context, so as to avoid mistranslation. When generating translations, semantic relationships can enhance the authentic expression of the translation, making it more in line with the expression habits of the target language, and ensuring that the original meaning of the text is completely preserved, thereby significantly improving the quality of machine translation and providing users with more accurate and natural translation results.
BLEU is used to measure the quality of transformation from source language to target language without semantics changing and is widely used in machine translation evaluation. 22 The BLEU score measures similarity based on precision and is mainly used to evaluate the quality of machine translation. It can more accurately reflect the translation quality by weighting and matching the number of n-grams, thus becoming a key indicator in natural language processing, and has been widely used in many fields. Its score ranges from 0 to 1, and the greater the value, the better the translation effect.
In the process of BLEU calculation, for a sequence to be tested, it is assumed that the text to be translated is c
i
, and its corresponding reference text is S
i
= {s
i1
,s
i2
,s
im
}. The n-gram language model represents a set of n words, assuming that W
k
is the possible n-gram of the k-th group, h
k
(c
i
) is the number of occurrences of W
k
in the text c
i
to be translated, and h
k
(s
ij
) represents the number of occurrences of W
k
in the reference text s
ij
.
23
Therefore, the calculation process of the coincidence accuracy of the corresponding text sentence is shown in equation (6).
Among them, N can take 1, 2, 3, 4, and w n generally takes the average value of all n.
Semantic machine translation of English long sentences based on BERT model
BERT model construction
Computational efficiency is critical in machine translation systems, especially when dealing with long and complex sentences. To this end, we analyze the computational complexity of the BERT-based semantic relational extraction model in detail. BERT is based on the Transformer architecture and consists of multiple layers of encoders with a large number of parameters, which makes storage and computing computationally demanding. During training, the processing of large amounts of data, gradient calculation, and parameter update consume a lot of resources such as GPUs. In the inference stage, the input text is encoded layer by layer to obtain the result, which also requires sufficient resources. Analyzing the computational complexity of the model can help us rationally allocate resources, optimize performance, and meet the efficiency requirements of machine translation systems, especially in scenarios where long and difficult sentences are processed.
The model involved in this paper uses a BERT input layer to capture semantic information, and the encoded-word vector is dimensionally reduced and whitened so that it can better adapt to the text similarity task. The output results after the clustering operation is used for topic modeling and sentiment analysis with the help of c-TF-IDF and BERT + LSTM.
When utilizing the BERT model for semantic relational extraction, the preprocessing of the dataset involves several key stages. In the text cleaning stage, regular expression tools, such as the re module in Python, are mainly used, because they can flexibly and efficiently match and remove noise data such as HTML tags, special characters, and garbled characters, so as to ensure the purity of the text and lay a solid foundation for subsequent processing. For English texts, tools such as NLTK and spaCy are often used, NLTK is rich in functions and has a variety of word segmentation algorithms, while spaCy is known for its efficient processing speed and accurate lexical analysis, which helps the model better understand the basic building blocks of the text, so as to achieve more accurate semantic analysis. In the sentence segmentation step, with the help of the sentence segmentation tool in NLTK, the continuous text is divided into independent sentences according to the rules such as punctuation, because the BERT model mostly uses sentences as the basic processing unit, and reasonable segmentation can ensure the input normativity and facilitate the subsequent extraction of semantic relationships in sentences. Finally, semantic relationship annotation, which can be manually annotated or using tools such as Prodigy, manual annotation can ensure accuracy and reliability, and annotators can label the relationships between entities in the text according to the types of semantic relationships such as causality, juxtaposition, and affiliation, so as to provide supervision signals for the BERT model to learn semantic relationships, help the model accurately identify the semantic relationships in various texts, and build high-quality datasets for the BERT model through this series of preprocessing steps. Improve the accuracy of model training effect and semantic relationship extraction.
Figure 1 provides an overview of the overall architecture. The core of the difference between Chinese and English datasets at the input level lies in word segmentation. English word segmentation is relatively simple, mainly involving punctuation processing. However, Chinese requires complex preprocessing, that is, word segmentation, which is due to the difference between Chinese and English structures.24,25 There is no obvious separation between words in Chinese, so complicated algorithms must be used. The model tool, which treats word participles as binary decisions, uses three types of features. In this study, the Jieba word segmentation package is selected to process Chinese text. The BERT model uses the WordPiece algorithm to build a vocabulary, and its input contains a token representation. In order to perform the classification task, the [CLS] token is added before the input sequence and output at the final Transformer layer to integrate the sequence information. The input also includes automatically learned text vectors and location vectors for distinguishing location semantics, thereby fusing global and local semantics. The BERT input layer integrates tags, segments, and position embeddings and refines English vocabulary into WordPiece units. Overall architecture of the model.
Word embedding techniques, including static (e.g., Word2Vec and GloVe) and dynamic (e.g., ELMo and BERT) methods, are able to represent words as semantically rich high-dimensional vectors. Dynamic embedding adjusts to context and effectively handles polysemy. Polysemy in the commentary text needs to be comprehensively analyzed. 26 As an embedding layer, the BERT encoder uses self-attention and multi-head attention to capture context based on the Transformer. It is composed of multiple layers of structures containing multi-head attention and feedforward networks, which collaboratively capture lexical dependencies, generate rich word vectors, and effectively transform semantics into vectors. 27 After pre-training, it can be adapted to downstream tasks. Its parallel training speed significantly exceeds that of LSTM. It employs position embedding to preserve word order, and the self-attention mechanism cooperates with the fully connected layer to compute.
When discussing the interpretability of semantic relationships, we focus on how to visualize and explain the semantic relationships extracted by our method. To improve interpretability, we use the attention mechanism in the BERT model to highlight the words and phrases in the input sentence that are most relevant to the extracted semantic relationships. At the same time, we developed a visualization tool that presents these attention weights in an intuitive and easy-to-understand way.
Figure 2 presents the architecture of the Transformer based on BERT. The model uses residual connection and layer normalization techniques to speed up training and improve performance. The essence of Transformer is an encoder-decoder structure, which is composed of Attention mechanism. The encoder converts the language sequence into a mathematical representation, and the decoder generates the target sequence from this representation. Transformer architecture Based on BERT.
The methods of knowledge similarity include the path-based distance method, content-based information method and attribute-based feature number method.28,29 Cosine similarity is commonly used. Its principle is to regard the text as a space vector and measure the similarity by calculating the cosine value of its included angle. When the cosine is close to 1, the angle tends to 0, indicating a high similarity. When the value approaches 0, the angle tends to 90°, indicating dissimilarity. Assuming that the text is c and its text is denoted as h
c
, then the conditional distribution of word x over context c is shown in the following equation (9).
p (x) denotes the word frequency, and λ is the term related to c. PMI is embodied in the co-occurrence of words and contexts, and the larger the PMI, the greater the probability of co-occurrence, which means the greater the semantic similarity. Although the BERT model is very similar to the goal of cosine similarity in pre-training, and the sentence vector trained by BERT contains similar information between texts, the actual result is not ideal. The reason for this is that there are some problems in BERT representation.
30
To solve this problem, this section adopts the treatment method of whitening. The geometric meaning of the inner product of vectors x and y is the modulus length multiplied by the cosine of the included angle, so the calculation equation (11) corresponding to the cosine similarity of vectors is:
d is used to calculate the distance measure between the two inputs, x and y. Where the equal sign holds only on a standard orthogonal basis, since this formula is for calculating coordinates, it depends on the coordinate basis. Therefore, in order to solve this problem, the mean of the sentence vector generated by BERT can be transformed into 0, and the covariance matrix can be transformed into the identity matrix. Suppose the set of vectors is
Semantic extraction and quality optimization analysis of English long sentences based on Bert model
Under the framework of English long sentence processing based on the BERT model, deep semantic understanding is the core foundation of the entire translation optimization process. BERT is a pre-trained language model that captures the complex semantics of words through a context-aware approach, which is essential for accurately extracting semantic relationships in long sentences. Unlike traditional one-dimensional sequential models such as LSTM or GRU, BERT takes context into account when processing each word, and its bidirectional encoding allows the model to understand the true meaning of the word and its relationship to the surrounding elements in more nuanced ways in the face of long sentences. Our approach leverages BERT’s powerful linguistic representation to address the challenges of complex and fuzzy semantic relationships in long English sentences. The BERT model is pre-trained on a large text corpus and then fine-tuned for our specific semantic relationship extraction task, so that it can learn to recognize and distinguish various semantic relationships in long and complex sentences. In the face of fuzzy semantic relations, the method adopts a multi-layer attention mechanism to make the model pay attention to different parts of the sentence and their interactions, and eliminate the ambiguity of semantic relations by considering the context, for example, in long sentences containing multiple clauses and nested structures, this mechanism can help the model identify related clauses and their semantic connections.
When evaluating the quality of semantic relational extraction of BERT models, it is important to define appropriate evaluation metrics. These metrics provide a visual measure of model performance. We used precision, recall, and F1 scores. Accuracy refers to the proportion of the number of extracted correct semantic relationships to the number of all extracted semantic relationships, which reflects the accuracy of the model extraction results. The recall rate is the ratio of the number of extracted correct semantic relationships to the actual number of all correct semantic relationships that should be extracted, reflecting the coverage of the real semantic relations by the model. The F1 score combines precision and recall, and evaluates the model comprehensively through the harmonic average of the two, which can consider the performance of the model in terms of accuracy and completeness in a more balanced manner, and help us accurately judge the quality of the BERT model in the semantic relationship extraction task.
According to the characteristics of English long sentences, this study designs a set of hierarchical entity relationship extraction strategies, aiming at gradually constructing a complete semantic map from the micro level. The first step is basic unit identification, that is, identifying individual entities and concepts in a sentence. This step utilizes BERT’s pre-trained weights are used to predict which type of entity each word belongs to, such as a name, place, or organization. The second step is relationship classification, where the model is trained to determine the various connections that exist between entities, such as belonging relationships, “locating” relationships, or “creating” relationships. In order to accommodate the possible multi-hierarchical relations in long sentences, this study adopts a recursive structured prediction method to ensure that even the relations deeply hidden in clauses or non-subject-predicate-object structures can be accurately captured. Besides the extraction of entities and relations, semantic role labeling is also an indispensable link in understanding long sentences. Semantic role labeling helps us to clarify the roles of each argument in verb phrases, such as agent, recipient, position, and time, which is very important for correctly interpreting the specific situation in which actions occur. The strength of BERT here is that it can flexibly adjust the label of each argument according to the change of context, overcoming the limitations of traditional methods when dealing with homographs or polysemies. By combining contextual information, the BERT model can infer the semantic roles most suitable for the current situation, which lays a solid foundation for subsequent translation work.
This study establishes a global consistency control mechanism to ensure that the translation is consistent, clear, and coherent when referring to concepts repeatedly. Long English sentences are also faced with the difficulty of integrating into the source context, especially involving ellipsis, metaphor, etc. We developed a context analysis module based on BERT model to reconstruct the missing parts and make the translation completer and more natural. Simple semantic extraction is tested with simple sentences such as “Apple is a fruit,” single sentences such as “I like to eat sweet strawberries” to test the processing ability of clauses, parallel sentences such as “Xiao Ming and Xiao Hong love mathematics” to study the understanding of juxtaposition relations, and long difficult sentences such as “the old man watched the children play in the afternoon” to test the processing of complex scenes and character relationships. The semantic understanding and relation extraction of nested structures are considered with nested sentences such as “the scientist’s theory has been proved wrong.” After rigorous evaluation, we compared the performance of our proposed method with the baseline method in terms of accuracy, accuracy, recall rate and F1 score, and the results show that the proposed method is robust to sentence structure and length variation, and can maintain high performance in short and long sentences, which is a significant improvement over the baseline method.
In terms of effectiveness, BERT’s powerful language representation ability can accurately extract the semantic relationship of English long sentences, provide more contextual information for machine translation, and improve the translation quality. Its flexibility is reflected in the fact that it can adapt to different fields and languages, and is suitable for various application scenarios. In terms of scalability, based on the deep learning model BERT, it can efficiently handle large-scale datasets. However, there are weaknesses in this approach. In terms of computing resources, both training and reasoning require a lot of resources, which is a barrier for users with limited hardware capabilities. At the level of data requirements, its performance is highly dependent on the quality and quantity of training data, and insufficient data or noise can lead to performance degradation.
BERT model defect and improvement point analysis
Although the BERT model performs well in a variety of NLP tasks, it also has limitations. The context of semantic relation extraction depends on embedding, which makes it difficult to capture the subtle meaning of lexical phrases in complex sentences. The pre-training goal of mask language modeling focuses on word prediction, which is not completely consistent with the semantic relation extraction goal, which brings challenges to the recognition and extraction of semantic relations in long and difficult sentences. In machine translation, it is difficult for Bert-based models to guarantee the semantic fidelity of original text due to the complexity of language and context-dependent language features. This study introduces mitigation strategies, adds additional linguistic features, ADAPTS models to specific domains through fine-tuning techniques, and uses a human-in-the-loop approach to improve translation accuracy and quality.
We collected a variety of long English sentences from different fields such as science, technology, finance, law, and more, and applied our methodology to these sentences to assess their performance in terms of accuracy, precision, recall, and F1 scores. The results showed that the method performed differently across domains, performing better in domains where language was more formal and structured, such as science and finance, than in areas of spoken and idiomatic language, such as social media and informal conversations. We believe that this difference stems from the fact that domain-specific language use introduces unique linguistic features and patterns that are not well represented in the general corpus in which BERT models are pre-trained, which makes it difficult for our method to accurately capture and extract semantic relationships in these domains, resulting in degraded performance. To mitigate this problem, we propose fine-tuning the BERT model on domain-specific data, incorporating linguistic features of the relevant domains, and using a cyclic approach to improve the accuracy and robustness of the method.
Experimental results and analysis
In order to fully evaluate the performance of the proposed method based on the BERT model, we systematically compare it with the results of human annotators. From the accuracy dimension, the correctness of the judgment of the semantic relationship between the two is analyzed, and the proportion of the extracted correct semantic relationship in all the extracted semantic relationships is analyzed from the accuracy level, and the proportion of the extracted correct semantic relationship is compared to all the correct semantic relationships that should be extracted by the recall insight, and the performance is comprehensively weighed by the F1 score. Our approach presents advantages. Through a detailed analysis of the experimental results, we compared the translation quality of the proposed BERT-SS method with the baseline method, and supported the findings with quantitative and qualitative results. As can be seen in Figure 3, BERT-SS achieved the highest SARI score on the Newsela dataset, which is an excellent performance. On the WikiLarge dataset, however, the performance is not good. It is found that the compression ratio of WikiLarge is 0.93, which means that in the process of simplifying the conversion, there are fewer delete operations and more replacement operations, which in turn affects the deletion and fill effect of BERT-SS. For the Newsela dataset, BERT-SS outperformed all supervised and unsupervised methods, with an FRE score of 68 for its simplified results. Since a higher FRE score indicates easier to read text, and a score of 70 indicates that it is extremely easy to read, this score indicates that the simplified results of the BERT-SS output are more readable. Experimental results on dataset.
We also explore the impact of different hyperparameters on the performance of the BERT model for extracting semantic relations, including the learning rate, batch size, number of epochs, and the specific configuration of the BERT model itself, such as the choice of layers and heads in the attention mechanism. For each hyperparameter, a series of experiments were carried out with different values within the specified range, and then the performance of the BERT model in semantic relation extraction was evaluated from the terms of accuracy, precision, recall and F1 score. The results show that the selection of hyperparameters has a significant impact on the performance of BERT models, such as higher learning rate can accelerate convergence but is prone to overfitting, smaller batch sizes can improve versatility but increase training time, and the choice of specific layers and attention head mechanisms will affect the model’s ability to capture semantic nuances. Figure 4 shows the results of each simplified system. BERT-LS achieved the highest accuracy on all datasets, which improved by 17.2%, 41.9%, and 30.1% compared to PaetzoldNE, respectively. Rec-LS achieves higher accuracy but lower accuracy because it mostly uses primitive words and does not simplify them effectively. On the whole, the systematic results are similar to the candidate word generation situation, which confirms that the ranking method in this study is effective. Evaluation results of the whole simplified system.
Through a detailed analysis of the experimental results, we compared the translation quality of the proposed BERT-LS method with the baseline method, and applied quantitative and qualitative results to support the findings. As can be seen from Figure 5, BERT-LS uses all five features to achieve an average accuracy of 0.696 and an average recall of 0.615 on three datasets, achieving a good balance between accuracy and recall. In the past simplification system, word meaning similarity and word frequency are the more common features, and it is found that any reduction in features will lead to a decrease in accuracy, and these two features have the most significant impact. For the first time, new features were applied, namely, BERT prediction ordering, BERT context-based generation probability, and PPDB. Experimental data show that without the use of these new features, the accuracy and recall are reduced to 0.662/0.608, 0.686/0.593, and 0.679/0.606, respectively. These new features can effectively rank candidates, but PPDB has a relatively minimal impact, perhaps because the strategy is relatively simple, so different strategies can be tried to introduce PPDB in the future to further optimize the performance of the BERT-LS method. Influence of different features on candidate words.
Evaluation results using different BERT models.
Through a detailed analysis of the experimental results, we compared the proposed method with the baseline method in terms of translation quality, and supported the findings with quantitative and qualitative results. As can be seen from the analysis of the effects of the different mask ratios shown in Figure 6, the experiment ranged from 0% to 100% and was performed on the LexMturk dataset, with the average of the five experiments being taken. During the experiment, the same sentence was fed into BERT, where the second sentence contained the context of the plural number, while the first sentence contained more contextual information in addition to the plural number. To avoid context repetition, mask the context of the first sentence. From the data in the figure, it can be observed that the overall performance changes slightly as the proportion of masks increases. When it comes to candidate word generation, accuracy, recall, and F-scores are highest when the mask ratio is in the range of about 50%–80%. In particular, the system accuracy peaks when the mask ratio is about 50%. This shows that our method shows good results in relevant performance indicators within a specific mask ratio range, which provides data support for subsequent optimization. Effect of different mask ratios on the system.
Explore the impact of the number of candidate words on the system, conduct experiments on the LexMturk dataset, and set the number of candidate words to be 1 to 5 times 5 in turn. Figure 7 shows its impact. Because the simplified words in the label are fixed, with the increase in the number of candidate words, the accuracy rate will inevitably decrease, and the recall rate will inevitably increase. From the generation results, when the number is 10, the F value reaches the maximum. In the system evaluation, the accuracy rate and accuracy rate reach the best when the number is 20 and then gradually tend to be stable, which shows that the BERT-LS system has good stability and robustness. Evaluation results of different number of generated candidate words.
In this paper, the file error correction model based on BERT is compared with the popular open-source project Pycorrector from the aspects of error detection and error correction, respectively. As shown in Figure 8, the accuracy, recall and F1 value of the model in this paper have been greatly improved. Compared with the Pycorrector error detection model, the accuracy rate has increased by 2.91%, the recall rate has increased by 4.17%, and the F1 value has increased by 4.04. This shows that the model in this paper is superior to Pycorrector in the dataset of administrative specification documents. Performance comparison chart of model in error detection stage.
Results of automatic evaluation of English dataset.
Figure 9 shows the results of the model’s automatic evaluation of indicators on the CECG Chinese dataset. Among them, the BERT-S2SA-gate model scored the highest score on all indicators, which further shows that the fusion unit of the “gate” method can filter out useless noise well, retain useful information, and help improve the quality of replies. In addition, the model proposed in this study is significantly better than the baseline model in all aspects, which shows that the introduction of BERT has played a good role. Compared with the basic Seq2Seq model, our proposed model can avoid generating repetitive, boring and general replies. On the CECG Chinese dataset, more data volume also improves the performance of the “gate” fusion unit to a certain extent. Automatic evaluation results of Chinese dataset.
Figure 10 compares the quality of responses generated by the BERT-S2SA-Cat model, the BERT-S2SA-Gate model, and the S2SA model through some sample responses. Among them, Case1 to Case3 are the results on the CMDC English dataset; Case4 to Case6 are the results on the CECG Chinese dataset. It can be seen that the replies generated by the BERT-S2SA-Cat model and the BERT-S2SA-Gate model have higher consistency and content relevance and are closer to daily human conversation. Analysis of translation results.
The experimental results strongly show that the proposed method has a significant effect on improving the quality of machine translation. This method has been tested in practice, and the resulting translations have excellent performance in terms of accuracy, fluency and semantic integrity when dealing with various types of texts, especially complex sentence structures, which strongly proves its effectiveness in improving the quality of machine translation.
Conclusion
This study deeply analyzes the performance of the BERT pre-trained model in dealing with long English sentences under the deep learning framework, and improves the understanding ability of the complex grammatical structure of the existing machine translation system, thereby improving the quality of translation output: (1) By using the powerful contextual coding ability of BERT, a large number of long English sentences are analyzed in detail. By fine-tuning the model to adapt to a specific task, compared with traditional rule-based or statistical methods, BERT can more effectively capture the dependencies and deep meanings between words in a sentence, even if it contains multiple clauses, parentheticals, or the case of flip-chip structure is no exception. Experiments show that on the standard SemEval-2010Task8 dataset, the model in this study achieves an accuracy rate of 87.4% in extracting relationship types between entities, which is about 12 percentage points higher than the baseline model without using the BERT infrastructure, fully demonstrating the advantages of BERT in semantic parsing. (2) With the help of the obtained rich semantic relationship information, this study further optimizes the performance of the Neural Machine Translation system. By designing the attention guidance mechanism, the translator model can selectively pay attention to the most relevant parts of the source language sentence in the decoding process so as to avoid the error propagation caused by long-distance dependence. Through the NMT system enhanced in this way, in the WMT ‘14 English-German translation task, the BLEU score increased from the original 22.5 to 26.8, an increase of 19%, which means that the translation coherence and fidelity have been substantially improved. (3) This study not only confirms the effectiveness of BERT in extracting semantic relations of long English sentences but also opens up a new way to improve machine translation by using such advanced features. In future work, we will continue to deepen the research on multi-modal information fusion, explore how to integrate other sensory signals such as vision and speech to assist translation decision-making and strive to build a more intelligent and universal language conversion system to promote international cultural exchanges and understanding. Make greater contributions.
When discussing semantic relationship extraction and machine translation quality optimization methods based on BERT model, there are many potential improvement directions and future research points. On the one hand, the inclusion of additional contextual information such as discourse relationships and pragmatic cues can help to extract semantic relationships more finely, enabling more accurate translations that capture the subtle intent of the original text. On the other hand, the multilingual BERT model is used to extend the application of the method to other languages, but this needs to be adjusted to fit the unique characteristics of different languages and ensure that the semantic relationship extraction is consistent across different languages.
