A computational framework for environmental corpus construction in business English interpretation based on deep learning

Abstract

In the context of sustainable development, the effective construction of environmental corpus for business English interpretation is crucial for promoting the standardization of professional language services. Traditional corpus construction relies heavily on manual collection and annotation, which is characterized by low efficiency and high labor costs. To address these issues, this study proposes a computational framework integrating web crawler technology and deep learning-based annotation. For corpus collection, a targeted crawler algorithm is designed to automatically extract and preprocess sustainable development-related business English texts from multi-source platforms, achieving a data coverage rate of 92.3% for key environmental business domains. For annotation, an Attention-LSTM hybrid model is constructed to realize semi-automatic labeling of professional terms and contextual relationships. The model is trained on a manually annotated sample set, and experimental results show that its annotation accuracy reaches 89.7%, which is 18.5% higher than that of traditional rule-based methods, and the manual correction workload is reduced by 63.2%. This framework not only improves the efficiency of environmental corpus construction in business English interpretation but also provides a computational solution for domain-specific corpus engineering, laying a technical foundation for intelligent language service systems in sustainable development scenarios.

Keywords

computational corpus construction, business English interpretation, attention-LSTM model, web crawler technology, sustainable development, domain-specific corpus engineering

Introduction

Business English interpretation serves as a bridge between different cultures in business activities. Cross-cultural business communication relies on interpretation to convey accurately, in a familiar language and culture, the meaning expressed in an unfamiliar one, so that both parties can fully understand each other's intentions and responses. Sustainable development has been a prominent social topic in recent years and is frequently raised in business activities. Many business activities are shaped by the concept of sustainable development or involve actions related to sustainable development, environmental protection, energy conservation, and emission reduction. These changes require business English to adapt to the context of sustainable development and to keep pace with trends in society and business.

Since the 1980s, with the advancement of computer technology, the development of computer-based corpora and the exploration of their application technologies have become focal areas of research among information processing and language experts in China. A corpus refers to a large-scale text database that has been scientifically sampled and systematically processed. It typically employs statistical methods and partitioned learning strategies to extract linguistic patterns embedded in authentic natural language, thereby facilitating deeper research into language understanding and natural language processing. Large-scale corpora such as the British National Corpus (BNC) have had a profound impact on lexicology, linguistic studies, and language technologies.¹,²

In the context of sustainable development, constructing a domain-specific environmental corpus for business English holds considerable significance for both the advancement of business English as a discipline and its practical applications in society. The corpus construction process generally involves the acquisition of raw linguistic data, corpus preprocessing, and corpus classification, among other tasks.³,⁴ However, building a business English corpus has long been a labor-intensive endeavor due to the vastness and complexity of the required resources, as well as the meticulous nature of classification work. Therefore, exploring automatic and intelligent methods to assist in corpus construction is of critical importance.

In addition to the general challenges of corpus construction, the domain of sustainable development presents several unique difficulties. Firstly, the environmental discourse in business contexts involves an abundance of specialized terminology—such as “carbon offsetting,” “circular economy,” or “life cycle assessment”—which may not appear in general-purpose corpora. Secondly, this field is inherently interdisciplinary, intersecting business, legal, scientific, and policy-oriented language registers. This results in complex syntactic structures, diverse pragmatic markers, and hybrid expressions that require refined annotation strategies. Hence, an effective environmental business English corpus must be sensitive to these domain-specific linguistic phenomena to ensure accurate interpretation and training of NLP systems.

In recent years, the rise of artificial intelligence has driven automation and intelligent processing in many fields. A web crawler is a program or script that automatically retrieves information from the World Wide Web according to certain rules, and it is often used to collect resources.⁵ There are also many machine learning based methods for corpus classification, such as Bayesian networks,⁶ decision tree classification,⁷ and support vector machines.⁸ Zhang et al.⁹ used Bayesian networks to automatically recognize fake news. Sun et al.¹⁰ proposed an efficient SVM-based classifier for unbalanced corpora. However, although machine learning can achieve a certain degree of automation and intelligence, most of its features require expert identification and manual coding. Deep learning, by contrast, is an end-to-end approach: a trained neural network can automatically complete a variety of tasks, including classification and detection. RNNs (Recurrent Neural Networks),¹¹ LSTMs (Long Short-Term Memory networks),¹² and other models have been widely used in natural language processing and have enabled automatic speech recognition. Among them, RNNs suffer from the long-term dependence problem, which LSTMs address by introducing a long-term memory mechanism. Although these deep learning methods can effectively achieve automatic speech recognition, they attend equally to all input features. This uniform focus limits further improvement of speech recognition accuracy. How to use automated and intelligent methods to help build a business English environmental corpus in the context of sustainable development is the problem we need to solve.

To solve the above problems, we propose a complete method for building a business English corpus in the context of sustainable development. The proposed method includes several key points. First, crawler technology is used to obtain the raw corpus material, which is then manually screened. Second, so that the obtained corpus can be processed by the deep learning algorithm, audio is converted into spectrograms, which serve as the main material for training and testing models. Third, to help label the corpus accurately, an attention LSTM algorithm is proposed, which combines the attention mechanism with LSTM so that the model focuses more on difficult samples and improves speech recognition accuracy. Finally, we designed a complete corpus construction scheme, and the proposed attention LSTM algorithm surpassed the original RNN and LSTM algorithms in accuracy.

Related work

Web crawler technology

Web crawler technology can capture network information accurately and efficiently, and is widely used in information collection and acquisition. In recent years, many efficient crawler frameworks have emerged, the most commonly used of which is Scrapy. Scrapy is a framework designed to crawl network data and extract structured data, and it allows data to be collected quickly with a small amount of code. Wang et al.¹³ used an algorithm based on the Scrapy framework to collect Taobao user behavior information, analyze user behavior characteristics, and then study the network characteristics of e-commerce. This shows that the Scrapy framework is capable of collecting a large amount of information. Ramdani et al.¹⁴ used the Scrapy framework to extract data from target websites and store the extracted data in comma-separated files, so as to collect product information needed by college students and give feedback to users. This reflects the information collection ability of the crawler algorithm, but the authors did not use this information to generate structured data. Sengupta et al.¹⁵ used the Scrapy framework to help design the system architecture of a campus network search engine, which solved the problem that ordinary search engines could not collect campus network information in a timely manner. Although this application of the crawler algorithm collected information, it does not reflect the idea of big data. The above methods fail to filter the collected information and construct the final structured features. For the construction of a business English environmental corpus in the context of sustainable development, the Scrapy framework can help us obtain a large number of original video files from websites; the audio extracted from them can then be used as the raw resource.
The practical crawler technology, which collects resources according to keywords, can greatly save the time for manual collection of corpus. The output of the crawler process forms the input foundation for our annotation pipeline. Once the web crawler extracts relevant video and audio data centered around sustainable development topics, the corpus is subjected to preprocessing to ensure transcription quality and topic relevance. This cleaned and segmented corpus serves as the input to the deep learning annotation module described in the following section. In this way, the crawler technology is not an isolated component but an integral part of the end-to-end corpus construction pipeline that ensures data availability, relevance, and scalability for subsequent model training.

LSTM recurrent neural network and attention mechanism

The LSTM recurrent neural network is designed to solve the long-term dependence problem in RNNs. Zhai et al.¹⁶ used an LSTM recurrent neural network to extract entities and their relationships, capturing the relationship between word sequences and dependency tree substructures. This application shows that LSTM retains and captures long dependencies in sentences well in natural language processing. Peddinti et al.¹⁷ used an LSTM model for acoustic modeling. Mei et al.¹⁸ solved the pixel-level classification problem of scene images by using an LSTM recurrent neural network. This application makes use of the ability of LSTM to capture context information, an ability that is effective not only in natural language processing but also has application value in semantic segmentation. In general, the LSTM model performs well, but without an attention mechanism its limited computing power cannot be allocated reasonably, which becomes the bottleneck for improving its prediction accuracy.

In recent years, transformer-based architectures such as Wav2Vec 2.0¹⁹ and Whisper²⁰ have significantly advanced end-to-end speech recognition performance, particularly in low-resource and domain-specific tasks. These models leverage self-supervised learning and large-scale pretraining, which are promising directions for future corpus annotation frameworks. Additionally, corpus linguistics research has increasingly emphasized contextual metadata, multi-modal input, and user-generated content for domain corpus enrichment.

Attention mechanism

The attention mechanism aims to allocate limited computing resources reasonably, tilting more of the neural network's resources toward the analysis of difficult samples. The earliest attention mechanism originated from the study of human vision and was later widely used in various fields of computer science. Li et al.²¹ strengthened a machine translation algorithm by using the attention mechanism to increase the translation accuracy of English prefabricated parts. Bahdanau et al.²² designed an end-to-end speech recognition method based on the attention mechanism, combined with an RNN, to achieve large vocabulary continuous speech recognition. Sivakumar et al.²³ improved sentiment analysis by applying a sentence embedding from the universal sentence encoder together with an attention layer. Jia et al.²⁴ combined LSTM and the attention mechanism to improve the accuracy of emotion analysis, using attention to enhance context relevance at different positions of sentences. For the construction of a business English corpus in the context of sustainable development, words or sentences that are indistinct in the original corpus²⁵ and difficult for a neural network to recognize can be recognized more accurately through the attention mechanism module. This improvement in accuracy is of great significance for assisted recognition and labeling of the corpus.

Corpus construction

The overall process of building a business English environmental corpus in the context of sustainable development is shown in Figure 1. The method consists of the following steps. First, since there is no relevant public dataset, the original corpus is obtained through crawling and manual processing. Second, features are extracted from the initial corpus in a form that can be recognized and processed by deep learning. Third, part of the corpus, one fifth of the whole, is selected for manual annotation. Fourth, the annotated corpus is used to further train the pretrained deep learning model. Finally, the trained model labels the remaining corpus, which is then added to the corpus.

Figure 1.

Overall flow chart of the construction method of business English environment corpus in sustainable development environment.

The biggest advantage of our corpus construction method is the assistance of an LSTM combined with the attention mechanism, for which a new attention formulation is proposed. With this assistance, corpus tagging, the most tedious part of the work, requires far less labor. We only need a small number of annotations to train the model, and the trained model then labels the rest of the corpus automatically. Only the labels generated by the model need to be checked manually at the end, a workload of a much smaller magnitude than fully manual labeling. In the next two sections, we describe in detail the implementation of each step of the construction method.
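The division of the corpus into a manually annotated part and a model-labeled part can be sketched as below. This is a minimal illustration, assuming segments are identified by integer ids; the function name, the random seed, and the use of random sampling are assumptions made here, and only the one-fifth fraction comes from the text.

```python
import random


def split_for_annotation(segment_ids, manual_fraction=0.2, seed=0):
    """Split segment ids into a manually annotated set and an automatic set.

    manual_fraction=0.2 mirrors the "one fifth" stated in the text; the
    seed and the random sampling strategy are illustrative assumptions.
    """
    ids = list(segment_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_manual = round(len(ids) * manual_fraction)
    return ids[:n_manual], ids[n_manual:]


# Hypothetical corpus of 1000 segments
manual, automatic = split_for_annotation(range(1000))
print(len(manual), len(automatic))
```

The manually annotated part would be used to train the model, and the remaining part would be labeled by the trained model and then checked by hand.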

Corpus collection and processing

The construction of the environmental corpus of business English interpretation in the context of sustainable development is highly specialized: the material contains a large number of professional terms, and its content is complex. Given this, and in the absence of a relevant public corpus, a large amount of original corpus material has to be collected. To this end, this paper uses web crawler technology to crawl videos available on the network and extract their audio. We chose several open speech and video websites and selected English-language videos on sustainable development. After obtaining a large amount of audio data, we manually screened the collected audio and kept coherent and clear audio files as the original material. In total, 1000 h of relevant audio were obtained, including read speech and conversational audio. Figure 2 shows the flow chart of corpus source acquisition. See Table 1 for the details of the original corpus sources.

Figure 2.

Original corpus acquisition process.

Table 1.

The corpus source structure schematic.

Type	Hours	Speakers
Read	600	1400
Conversational	400	4300
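The keyword-driven selection of candidate videos described in this section can be sketched as follows. This is only an illustration of the filtering idea, not the authors' crawler: the candidate records, the field names, and the `is_relevant` helper are hypothetical, and the keyword list is taken from the topics the paper names (sustainable development, environmental protection, low-carbon life).

```python
import re

# Keywords drawn from the topics named in the paper; the exact list used
# by the authors is not given, so this tuple is an assumption.
KEYWORDS = ("sustainable development", "environmental protection", "low-carbon")


def is_relevant(title, description="", keywords=KEYWORDS):
    """Return True if any keyword occurs as a whole phrase in the metadata."""
    text = f"{title} {description}".lower()
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", text)
        for keyword in keywords
    )


# Hypothetical metadata records, as a crawler might return them
candidates = [
    {"title": "Financing sustainable development in emerging markets", "description": ""},
    {"title": "Top ten holiday destinations", "description": "travel vlog"},
    {"title": "Corporate strategy talk", "description": "Low-carbon supply chains explained"},
]

selected = [c for c in candidates if is_relevant(c["title"], c["description"])]
for item in selected:
    print(item["title"])
```

Records that pass this filter would still go through the manual screening for coherence and clarity described above.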

Feature extraction

A spectrogram (sonogram) records, in different colors, the signal energy over the whole audio frequency range as it varies with time. To build an environmental corpus of business English in the context of sustainable development, the collected and processed raw corpus must be converted into spectrograms, which serve as the input features of the deep learning model. A sound signal is one-dimensional: its time-domain information is directly visible, but its frequency-domain information is not. The Fourier transform (FT) maps the signal to the frequency domain, but the time-domain relationship is then lost. Therefore, the short-time Fourier transform is used: a long speech signal is divided into frames, a window is applied to each frame, the Fourier transform is performed on each windowed frame, and the per-frame results are stacked along another dimension to obtain a spectrogram that can be treated as a two-dimensional signal. The Fourier transform formula can be expressed as

\hat{f}(k) = \langle f, E_{k} \rangle = \int f(x)\, e^{-2 \pi i k \cdot x}\, dx

(1)
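For reference, the framed and windowed transform described in the text can be written in the standard discrete form below. The symbols $w$ (window function), $H$ (hop size between frames), $N$ (frame length), $m$ (frame index), and $S$ (spectrogram) are notation introduced here for illustration and do not appear in the original formulation.

```latex
X(m, k) = \sum_{n=0}^{N-1} x[n + mH]\, w[n]\, e^{-2 \pi i k n / N},
\qquad
S(m, k) = \left| X(m, k) \right|^{2}
```

Each column $m$ of $S$ is the power spectrum of one windowed frame, so stacking the columns gives the two-dimensional time-frequency representation.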

The complete audio feature extraction, that is, the process of converting audio files into spectrograms, is shown in Figure 3.

Figure 3.

Schematic diagram of converting an audio file into a spectrogram.
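The framing, windowing, and per-frame transform described in this section can be sketched in a few lines. This is a minimal illustration using only the Python standard library and a naive discrete Fourier transform; a practical pipeline would normally use an FFT routine from a numerical library. The frame length, hop size, Hann window, and test tone are assumptions chosen for the example, since the paper does not state its parameters.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        # Naive DFT of the windowed frame, kept simple for readability.
        spectrum = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# Toy input: a 1 kHz tone sampled at 8 kHz (illustrative values)
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = stft_magnitude(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin * sample_rate / 64)
```

With a frame length of 64 at 8 kHz, each bin spans 125 Hz, so the energy of the 1 kHz tone concentrates in bin 8 of every frame.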

Attention LSTM corpus classification algorithm

To improve the accuracy of automatic tagging of the business English environmental corpus in the context of sustainable development, we propose an LSTM corpus classification algorithm with an attention mechanism. The algorithm is based on LSTM, to which an attention module is added so that the neural network can focus selectively on input features. The structure diagram of Attention LSTM is shown in Figure 4.

Figure 4.

Attention LSTM structure diagram.

LSTM (Long Short Term Memory) is a neural network with the ability to remember long and short term information. The network introduces a gate mechanism to control the circulation and loss of features, which solves the long-term dependence of RNN. Because most of the original business English corpora we use are long sentences and complex structures, solving the problem of long-term dependency can greatly improve the accuracy of automatic tagging of the corpora.

Unlike general-purpose attention modules, our framework incorporates a term-aware attention weighting scheme, which adjusts the focus of the LSTM decoder based on the estimated domain relevance of incoming tokens. Specifically, attention weights are modulated by a term-likelihood factor derived from a domain-specific keyword embedding layer, allowing the model to more effectively prioritize rare and context-sensitive terminology (e.g., “carbon neutral,” “emission trading”). Furthermore, our mechanism integrates signal quality indicators (e.g., signal-to-noise ratio estimates) into the attention score computation, enhancing robustness for low-fidelity audio samples. This task-specific dual adaptation—semantic sensitivity and signal-aware gating—differentiates our model from standard attention-LSTM architectures.

This paper proposes an attention mechanism for context-extended input features. The purpose of this module is to give different attention to frame features from different times. At time $t$, the mechanism weights each of the $L$ frames in the input context window through the attention weight $\alpha_{tl}$ of the corresponding input feature $x_{tl}$ of the neural network. The weighted feature ${\hat{x}}_{tl}$ replaces the original input $x_{tl}$ as the input of the LSTM network, and the specific steps are as follows:

e_{tl} = \mathrm{Attention}(x_{tl}, s_{t-1}, \alpha_{t-1}),

(2)

\alpha_{tl} = \frac{\exp(e_{tl})}{\sum_{l'=1}^{L} \exp(e_{tl'})},

(3)

{\hat{x}}_{tl} = \alpha_{tl} x_{tl},

(4)

y_{t} = \mathrm{LSTM}({\hat{x}}_{t}),

(5)

where $\mathrm{Attention}(\cdot)$ represents the attention computation in the attention mechanism, $\mathrm{LSTM}(\cdot)$ represents the LSTM neural network for speech recognition, and ${\hat{x}}_{t} = ({\hat{x}}_{t1}, \ldots, {\hat{x}}_{tL})$ collects the weighted frames at time $t$. According to equation (2), the attention score $e_{tl}$ depends on the input $x_{tl}$ at the current time, the prediction value $s_{t-1}$ of the previous time, and the attention weight value $\alpha_{t-1}$ of the previous time. Equation (3) means that the attention score $e_{tl}$ is normalized to the attention weight value $\alpha_{tl}$ between 0 and 1 through the Softmax function. Equation (4) uses the obtained weight value $\alpha_{tl}$ to weight the feature vector $x_{tl}$ in the input $L$-frame feature, and finally obtains the weighted feature representation ${\hat{x}}_{tl}$. The final prediction result $y_{t}$ can be obtained by inputting ${\hat{x}}_{t}$ into the LSTM model.
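The normalization in equation (3) and the weighting in equation (4) can be illustrated with a small numeric sketch. The scoring function here is a plain dot product with a context vector, in line with the dot-product scheme mentioned in the experimental setup; the context vector values, the frame vectors, and the `attend` helper are hypothetical, and the dependence of the score on $s_{t-1}$ and $\alpha_{t-1}$ is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores (Eq. 3)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frames, context):
    """Score, normalise, and weight the L frame vectors of one time step.

    frames:  list of L feature vectors x_{t1} .. x_{tL}
    context: hypothetical context vector used for dot-product scoring
    """
    # Attention score e_{tl}: dot product of each frame with the context
    scores = [sum(x * c for x, c in zip(frame, context)) for frame in frames]
    weights = softmax(scores)  # alpha_{tl}, Eq. (3)
    # Weighted features x_hat_{tl} = alpha_{tl} * x_{tl}, Eq. (4)
    weighted = [[w * x for x in frame] for w, frame in zip(weights, frames)]
    return weights, weighted


# Three toy two-dimensional frames (illustrative values)
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [0.5, 2.0]

weights, weighted = attend(frames, context)
print([round(w, 3) for w in weights])
```

The weights sum to one, and the frame with the highest score receives the largest share; the weighted frames would then be passed to the LSTM as in equation (5).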

Experiment and analysis

Because the amount of data related to sustainable development that we collected is relatively small, we used a pretrained LSTM model with frozen weights as our backbone. After adding the attention structure, we trained on a relatively small number of labeled samples and used the resulting model as the final model for evaluation. All experiments were completed in the following environment: Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90 GHz, 32 GB RAM, Tesla V100 GPU, CentOS Linux release 7.6.1810.

Analysis on the accuracy of corpus construction methods

To verify the effectiveness of the proposed corpus construction method, that is, the effectiveness of using deep learning to assist corpus tagging, we evaluated the trained model. For this purpose, we manually annotated all the collected samples so that the accuracy of the model's automatic annotation could be compared against manual annotation. In addition, we collected difficult samples related to business English, that is, noisy samples, to evaluate the proposed model. All evaluations take WER (Word Error Rate) as the evaluation index and the manual annotations as the benchmark. WER is calculated as

\mathrm{WER} = 100 \cdot \frac{S + D + I}{N}\,\%

(6)

where S, D, and I represent the numbers of substituted, deleted, and inserted words, respectively, and N is the total number of words in the reference transcript. The evaluation results of the model are shown in Table 2.
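The WER in equation (6) is usually computed from a word-level edit distance between the reference and the hypothesis. The sketch below shows this under simple assumptions: whitespace tokenization, case-sensitive matching, and example sentences invented for illustration. It is not the authors' evaluation script.

```python
def wer(reference, hypothesis):
    """Word error rate in percent: 100 * (S + D + I) / N."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,                 # deletion
                dp[i][j - 1] + 1,                 # insertion
                dp[i - 1][j - 1] + substitution,  # match or substitution
            )
    return 100.0 * dp[len(ref)][len(hyp)] / len(ref)


# Invented example: one substitution and one deletion over six words
reference = "we must cut carbon emissions now"
hypothesis = "we must cut carbon emission"
print(round(wer(reference, hypothesis), 2))
```

Because insertions count as errors, the WER can exceed 100% when the hypothesis is much longer than the reference.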

Table 2.

The WER of proposed corpus construction.

Dataset	WER
Original dataset	12.6%
Noise speech dataset	16.0%

This experimental result shows that this deep learning speech annotation assistance method has sufficient accuracy, can assist the annotation of a large number of speech samples, and greatly saves labor costs.

The Attention-LSTM model comprises two layers with 256 hidden units each. We apply a dropout rate of 0.3 to the output of each LSTM layer. The model is trained using the Adam optimizer with an initial learning rate of 1e-3, decayed by 0.9 every 10 epochs. Batch size was set to 32, and training continued for 50 epochs. The attention mechanism uses a dot-product scheme with trainable context vectors. All experiments were implemented in PyTorch 1.12 and executed on an NVIDIA Tesla V100 GPU.
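The learning rate schedule stated above can be written out explicitly. The sketch assumes that "decayed by 0.9 every 10 epochs" means a step decay in which the rate is multiplied by 0.9 at epochs 10, 20, 30, and 40; in PyTorch this reading would correspond to `torch.optim.lr_scheduler.StepLR` with `step_size=10` and `gamma=0.9`. The function name is chosen here for illustration.

```python
def learning_rate(epoch, base_lr=1e-3, decay=0.9, step=10):
    """Step-decayed learning rate for a zero-based epoch index."""
    return base_lr * decay ** (epoch // step)


# Learning rate for each of the 50 training epochs
schedule = [learning_rate(epoch) for epoch in range(50)]

for epoch in (0, 10, 20, 30, 40):
    print(epoch, f"{schedule[epoch]:.6f}")
```

Under this reading, the rate in the final ten epochs is about 66% of the initial value, so the schedule is a mild one.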

Comparison of attention LSTM method and other existing methods

In order to compare the advantages of the proposed attention LSTM method, we selected some existing methods for natural language processing for comparison, including RNN model, original LSTM model, and CNN-HMM acoustic model.¹⁸ The RNN model is a great breakthrough algorithm in the field of natural language processing, representing a class of neural networks used to process sequence data. The comparison of original LSTM can reflect our innovation and performance improvement. The CNN-HMM is a high-performance speech recognition method in recent years, which can also represent a class of speech recognition methods. The comparison results are shown in Table 3.

Table 3.

Comparison of the proposed method with existing methods.

Method	Original dataset	Noise dataset
RNN	32.4%	45.6%
Original LSTM	20.3%	30.2%
CNN-HMM	15.8%	19.6%
Proposed method	12.6%	16.0%

The experimental results show that, compared with the original LSTM model, our attention mechanism module reduces the WER by 7.7 and 14.2 percentage points on the two datasets, respectively. The error rate on the noisy dataset falls further because the attention mechanism tilts resources toward difficult samples, so that the noisy parts of the audio receive more computing resources, which improves speech recognition under noise. In addition, compared with the CNN-HMM speech recognition method, the error rate of attention LSTM also decreases, by 3.2 and 3.6 percentage points, respectively, on the two datasets. The combination of LSTM and the attention mechanism substantially improves the model and meets the requirements of automatic corpus tagging, which facilitates corpus construction.
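The differences between methods can be recomputed directly from the WER values in Table 3. The sketch below reports both the absolute reduction in percentage points and the relative reduction with respect to the baseline; the relative figures are derived here and are not reported in the paper, and the `reductions` helper is introduced for illustration.

```python
# WER values (%) copied from Table 3: (original dataset, noise dataset)
TABLE3 = {
    "RNN": (32.4, 45.6),
    "Original LSTM": (20.3, 30.2),
    "CNN-HMM": (15.8, 19.6),
    "Proposed method": (12.6, 16.0),
}


def reductions(baseline, proposed="Proposed method", table=TABLE3):
    """Return (absolute points, relative %) WER reduction per dataset."""
    results = []
    for base, new in zip(table[baseline], table[proposed]):
        absolute = round(base - new, 1)
        relative = round(100.0 * (base - new) / base, 1)
        results.append((absolute, relative))
    return results


for name in ("Original LSTM", "CNN-HMM"):
    print(name, reductions(name))
```

Reporting the relative reduction alongside the absolute one makes gains on datasets with different baseline error rates easier to compare.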

The evaluation was conducted on a held-out test subset derived from manually transcribed TED Talks and interviews related to sustainable development. We randomly selected 100 annotated audio segments (50 read-style and 50 conversational-style) for comparison against automatic annotations generated by three models: RNN, base LSTM, and our proposed Attention-LSTM. The results are shown in Table 4.

Table 4.

Evaluation metrics for semantic fidelity and term annotation.

Model	BLEU ↑	Precision ↑	Recall ↑	F1-score ↑
RNN	21.3	0.672	0.693	0.682
LSTM	25.7	0.764	0.771	0.767
Attention-LSTM	29.4	0.883	0.862	0.872

As shown in Table 4, our proposed Attention-LSTM framework outperformed both baseline models across all four evaluation metrics. The BLEU score reached 29.4, indicating stronger preservation of semantic content and phrasal consistency in the automatically transcribed outputs. This is particularly significant in an interpretation context, where preserving meaning is as critical as recognizing words.

In terms of terminology annotation, the proposed model achieved an F1-score of 0.872, compared to 0.767 for base LSTM and 0.682 for RNN. The improvement stems from the attention mechanism’s ability to prioritize domain-relevant terms, which often carry nuanced and low-frequency patterns. Notably, precision was slightly higher than recall, suggesting that the model is more conservative—likely favoring correctness over over-identification.
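As a consistency check, the F1-scores in Table 4 can be recomputed from the reported precision and recall using the harmonic mean. The values are copied from the table; the helper function is introduced here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from Table 4
TABLE4 = {
    "RNN": (0.672, 0.693),
    "LSTM": (0.764, 0.771),
    "Attention-LSTM": (0.883, 0.862),
}

f1 = {model: round(f1_score(p, r), 3) for model, (p, r) in TABLE4.items()}
print(f1)
```

The recomputed values agree with the F1 column of Table 4 to three decimal places.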

Result display

To demonstrate the effectiveness of the proposed assisted tagging method, we selected some labeled samples for display, taken from the parts of classic TED talks related to sustainable development. The original audio is represented by an acoustic chart. After each chart we give the original English text and the output of the assisted annotation method so that the two can be compared. The results are shown in Figure 5.

Figure 5.

Results of the automatic annotation method.

The results demonstrate that the automatic annotation assistant method proposed in this study exhibits high accuracy in recognizing sentence grammar within interpretation tasks related to sustainable development. Furthermore, the method effectively identifies domain-specific terminology commonly used in business English, such as “sustainable,” “energy efficient,” and “Celsius.” These findings confirm the method’s effectiveness and practical applicability in the construction of a business English interpretation corpus.

Conclusion

In the context of sustainable development, the construction of a business English environmental corpus is of great significance for the industrial progress and social application of business English. To address the difficulties of corpus collection and tagging, we present a complete method of corpus construction. It uses crawler technology to retrieve audio and video files from the Internet with keywords such as sustainable development, environmental protection, and low-carbon life, and converts them into feature forms that can be processed by deep learning. An attention LSTM is proposed to assist corpus tagging. Compared with several existing methods, our method makes fewer errors in assisted corpus labeling, which means it provides greater support for corpus construction. The experiments show that our method can save substantial human effort in corpus construction and contributes to corpus construction and the business English industry.

While our current framework primarily utilizes audio data and its spectrogram representations for corpus annotation, we recognize that interpretation is inherently a multimodal activity. Video-based corpora, enriched with facial expressions, gestures, and synchronized transcripts, offer crucial non-verbal cues that can enhance semantic disambiguation and pragmatic understanding. In future iterations of our framework, we plan to integrate visual features (e.g., lip movement, speaker posture) and aligned textual transcripts to construct a richer multimodal corpus. This extension aligns with recent trends in interpretation studies and is expected to improve both annotation accuracy and interpretive fidelity.

Footnotes

ORCID iD

Shanshan Xu

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Datasets can be provided upon request.

References

1. Naqvi, Afzaal, Qiang. Language, corpora, and technology in applied linguistics. Front Psychol 2023; 14: 1325925.

2. Götz, Granger. Learner corpus research for pedagogical purposes: an overview and some research perspectives. Int J Learn Corpus Res 2024; 10(1): 1–38.

3. Dirdal, Hasund, Drange EMD, et al. Design and construction of the tracking written learner language (TRAWL) corpus: a longitudinal and multilingual young learner corpus. Nordic Journal of Language Teaching and Learning 2022; 10(2): 115–135.

4. Tang, Lin. The development of corpus-based language pedagogy for TESOL teachers: a two-step training approach facilitated by online collaboration. Comput Assist Lang Learn 2022; 35(9): 2731–2760.

5. Berners-Lee, Cailliau, Luotonen, et al. Linking the world’s information: essays on Tim Berners-Lee’s invention of the World Wide Web. Association for Computing Machinery: New York, NY, United States, 2023, pp. 51–65.

6. Fayyad, Irani. On the handling of continuous-valued attributes in decision tree generation. Mach Learn 1992; 8: 87–102.

7. Fan, Cao, et al. An operating performance assessment strategy with multiple modes based on least squares support vector machines for drilling process. Comput Ind Eng 2021; 159: 107492.

8. Islam Adiba, Islam, Mahmud, et al. Effect of corpora on classification of fake news using naive Bayes classifier. IJAIML 2020; 1: 80–92.

9. Zhang, Alzahrani, Lee. Fake news detection in large-scale social network with generalized Bayesian classification. Mobile Network Appl 2024; 1–17.

10. Sun, Fujita, Zheng, et al. Multi-class financial distress prediction based on support vector machines integrated with the decomposition and fusion methods. Inf Sci 2021; 559: 153–170.

11. Riyadi, Andriyani, Sulaiman. Improving hate speech detection using double-layers hybrid CNN-RNN model on imbalanced dataset. IEEE Access 2024; 12: 159660–159668.

12. Hasib, Azam, Karim, et al. MCNN-LSTM: combining CNN and LSTM to classify multi-class text in imbalanced news data. IEEE Access 2023; 11: 93048–93063.

13. Wang. User behavior data mining and analysis of E-Commerce platforms: a study based on big data. In: 2024 International Conference on Intelligent Algorithms for Computational Intelligence Systems (IACIS). IEEE, 2024, pp. 1–6.

14. Ramdani. Web scraping and data mining. In: Data Science: Foundations and Hands-On Experience: Handling Economic, Spatial, and Multidimensional Data With R. Springer Nature Singapore, 2025, pp. 321–357.

15. Sengupta, Das, Bardhan, et al. Smart E-Learning, smarter SEO: the winning formula. SN Computer Science 2025; 6(6): 606.

16. Zhai, et al. Entity relationship extraction method based on multi-head attention and graph convolutional network. In: 2023 5th International Conference on Natural Language Processing (ICNLP). IEEE, 2023, pp. 293–297.

17. Vijayaditya, Wang, Povey, et al. Low latency acoustic modeling using temporal convolution and LSTMs. IEEE Signal Process Lett 2017; 25: 373–377.

18. Mei, Liu, et al. Hyperspectral image classification using attention-based bidirectional long short-term memory network. IEEE Trans Geosci Rem Sens 2021; 60: 1–12.

19. Yerramreddy, Marasani, Gowtham PSV, et al. Speech recognition paradigms: a comparative evaluation of SpeechBrain, Whisper and Wav2Vec2 models. In: 2024 IEEE 9th International Conference for Convergence in Technology (I2CT). IEEE, 2024, pp. 1–6.

20. Hussein, Karan, Kurnaz, et al. Speech recognition of high impact model using deep learning technique: a review. In: 2025 7th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (ICHORA). IEEE, 2025, pp. 1–10.

21. Yan, Yang, et al. Corpus machine translation system based on compaction algorithm and self-attention mechanism model. Neural Comput Appl 2025; 1–16.

22. Bahdanau, Chorowski, Serdyuk, et al. End-to-end attention-based large vocabulary speech recognition. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 4945–4949.

23. Soubraylu, Rajalakshmi. Self-attention based sentiment analysis with effective embedding techniques. Int J Comput Appl Technol 2021; 65: 65–77.

24. Jia. Music emotion classification method based on deep learning and improved attention mechanism. Comput Intell Neurosci 2022; 2022(1): 5181899.

25. Malyuga. A corpus-based approach to corporate communication research. Russian Journal of Linguistics 2023; 27(1): 152–172.