Construction of English corpus oral instant translation model based on internet of things and deep learning of information security

Abstract

In order to improve the security and performance of the oral English instant translation model, this paper optimizes the instant translation model through the Internet of Things (IoT) security technology and deep learning technology. In this paper, the real-time translation model based on deep learning and IoT technology is analyzed in detail to show the application of these two technologies in the real-time translation model, and the related information security issues are discussed. Meanwhile, this paper proposes a method combining deep learning network and IoT technology to further improve the security of instant translation model. The experimental results show that under the optimized model, the parameter upload time is 60 seconds, the aggregation calculation time is 6.5 seconds, and the authentication time is 7.5 seconds. Moreover, the average recognition accuracy of the optimized model reaches 93.1%, and it is superior to the traditional machine translation method in accuracy and real-time, which has wide practical value and application prospects. Therefore, the research has certain reference significance for improving the security of the English corpus oral instant translation model.

Keywords

Information security Internet of Things technology deep learning oral instant translation model English corpus

1. Introduction

1.1 Research background and motivations

In recent years, the rapid development of Internet of Things (IoT) technology and the wide application of deep learning in the field of artificial intelligence (AI) have brought great changes and convenience to people’s daily life. Meanwhile, with the trend of globalization, the communication among people has become more frequent and diversified. The development of IoT technology makes the connection and communication among devices more intelligent and automatic. This provides a new possibility for the collection and processing of real-time data [17, 18]. Deep learning, especially in the field of natural language processing, has shown its powerful ability in understanding and generating human language [19]. Combining the IoT with deep learning can realize a more intelligent and adaptive real-time translation system. With the acceleration of globalization, the demand for cross-language communication is increasing, especially for instant oral communication in multilingual environment [23, 26]. Although the existing translation technology meets these needs to some extent, there is still room for improvement in real-time, accuracy and naturalness. The research motivation of this paper is to explore how to build a more efficient and accurate real-time oral translation model by combining the IoT and deep learning technology, and improve the natural communication ability of the translation system to make it more suitable for the needs of daily life and professional fields.

In a word, it is of great significance and broad application prospect to build an instant translation model of English corpus based on the IoT and deep learning. This paper explores how to use the latest deep learning technology, large-scale English corpus and IoT technology to build an efficient, accurate and safe oral translation model, which provides convenience for people’s daily life and work. This paper first explains the research background, research motivation and research objectives, and then summarizes and analyzes the previous studies. Thirdly, it expounds the concept and application of instant translation model under deep learning and IoT information security, and optimizes the security of translation model. Finally, this paper verifies the effectiveness and feasibility of this research through experiments, summarizes and analyzes the research, and puts forward the shortcomings of this paper and the direction of future optimization.

1.2 Research objectives

This paper’s main objective is to use deep learning and IoT improvements to enhance security measures for real-time translation models for English corpora in the context of information security. This entails the implementation of spoken data encryption and decryption in real-time, with a focus on protecting user privacy and data security. In order to improve the overall intelligence of the model, this paper explores the use of deep learning technology in the field of oral translation models. It also adds new ideas and approaches that are pertinent to how voice interactions in the IoT space will develop in the future.

2. Literature review

The application of deep learning technology in information security mainly focuses on pattern recognition and anomaly detection. And the deep learning model can effectively identify malware, phishing attacks and other network threats, which is very important for protecting IoT devices and networks. Instant language translation is an important application in the field of AI. Deep learning techniques, especially recurrent neural network (RNN) and long short-term memory (LSTM) network, have been used to develop efficient oral translation models. These models can understand and translate complex language patterns and provide support for cross-language communication. The establishment of English corpus is very important for training deep learning model. By collecting a large amount of spoken English data, the model can be trained to understand natural language better and translate it accurately. These corpora usually include samples of different accents, dialects and language styles. In an examination of the field of instant oral English translation models, Zheng et al. [38] conducted an analysis of Gaussian Mixture Model (GMM) and Dynamic Neural Network (DNN) structures and training techniques, with a particular focus on acoustic modeling. Their experimental findings demonstrated the superior performance of the DNN model, resulting in a reduction of phoneme error rates and word error rates by 5.66% and 3.48%, respectively [38]. Swar and Mohsen [30], on the other hand, focused on the modeling of oral fluency, the detection and removal of fluency characteristics, and the final product of written transcriptions of spoken text. The paper highlighted distinctions among various fluency detection types and the significant impact of fluency detection tasks on machine translation. Their experimental results underscored that oral fluency detection can substantively enhance machine translation quality [30]. Poulis [26] introduced the Convolutive Neural Network $+$ Connectist Temporary Classification (CNN+CTC) algorithm, leading to the development of an end-to-end acoustic model capable of effectively identifying speech data as Chinese phonetic sequences. On the test set, this method produced word error rates and sentence error rates of 9.20% and 24.02%, respectively. Comparative analysis demonstrated a marked improvement in recognition efficacy compared to two commonly used acoustic models. Simultaneously, the paper introduced a language model algorithm grounded in Transformer encoders, implemented through TensorFlow. Using the same dataset, their language model method fared better than the two most popular language models in all domains [26]. In their study, Natarajan et al. [24] highlighted the increasing security risks that come with the IoT’s quick development. These risks include network attacks, device manipulation, and data breaches. When building instant oral English translation models, these problems raise concerns about the security and dependability of the IoT and demand that data transfer confidentiality, integrity, and overall IoT information security be carefully considered [24]. After noting the significant amount of English corpus data produced by IoT devices, Gallo-Molina et al. [12] suggested using it to train and improve spoken translation models. In order to build an effective instant oral English translation model, their research highlighted the importance of evaluating and processing English corpus data within the IoT [12]. They also extracted useful features and structures from the corpus.

All of the previous research has shown the several difficulties that come with creating spoken English translation models in real time for the IoT [13]. These challenges encompass the adaptability of traditional translation methods to the dynamically changing IoT landscape and deficiencies within conventional oral translation systems related to language expression diversity, environmental noise, and real-time constraints [3]. These limitations have constrained the applicability and advancement of oral translation systems within the IoT. Furthermore, prior studies have demonstrated a relatively limited focus on constructing oral translation models leveraging extensive English corpus data within the IoT environment. On the other hand, this paper presents a solution based on deep learning technology and highlights how crucial it is to develop instantaneous English corpus translation models within the IoT framework, with a foundational focus on information security. It examines the accompanying information security issues as it investigates real-time translation models based on deep learning and IoT technologies. Furthermore, the paper elaborates on how these two technologies – deep learning networks and IoT technology – are integrated to strengthen the security constraints of instant translation models.

3. Research model

3.1 Real-time translation model under deep learning and IoT information security

The realm of IoT information security constitutes a critical assurance for the effective deployment of IoT technology. As IoT technology advances, an ever-expanding multitude of devices becomes interconnected within an extensive network infrastructure. These devices require accurate and efficient communication while maintaining data integrity [34]. Because there are more devices in the IoT environment, inter-device communication has become a more delicate and involved process, which raises the importance of information security issues [5]. The IoT is playing an increasingly important role in improving our way of life and work, but meanwhile, information security issues are becoming increasingly prominent. These problems mainly include data leakage, equipment being hacked and malware attacks. In terms of data leakage, IoT devices often collect personal information including geographical location and credit card number. Once these data are leaked, users may be seriously affected. The general solution is to strengthen data encryption and anonymous processing to ensure the security of data during transmission and storage. Meanwhile, the principle of minimum data collection is adopted, and only the information necessary to realize the function is collected. In terms of device intrusion, intruders may control the system through IoT devices and conduct illegal operations, such as stealing data or destroying the system. Normally, as long as the end-to-end security authentication is implemented, the access control of equipment is strengthened, and the firmware and software are updated regularly, the exploitation of known vulnerabilities can be prevented. In terms of malware attacks, malware may destroy IoT devices or steal user sensitive information [37]. This requires the deployment of advanced anti-virus software and intrusion detection system to monitor and respond to potential malware threats in real time.

Among the most prominent problems related to IoT information security include malware attacks, device invasions, and data breaches. Of these, data breaches are the most common worry, mostly because IoT devices have a tendency to collect users’ personal data, such as credit card numbers and location information. If this private information is compromised, users may suffer serious consequences [2]. Device infiltration is a serious problem as well since malicious hackers might use gadgets to carry out illegal actions like data theft and system sabotage. Malware attacks also appear as a persistent problem at the same time. Malware is used by attackers to breach devices and steal confidential data from consumers In order to fortify IoT information security, a series of comprehensive measures must be implemented [37].

Furthermore, the incorporation of deep learning into the domain of speech translation represents a pivotal sphere of research. With the growing ubiquity of IoT devices, an escalating demand for voice translation capabilities has arisen [29]. An essential tool for achieving this goal is real-time speech translation, which promotes improved accuracy and efficiency in inter-device communication. Prominent for its abilities in natural language processing, deep learning presents a strong path for the creation of speech translation models [7]. But because data processing involves sensitive information, it is still crucial to protect these systems from security risks. Machine translation based on statistics and neural machine translation are the two main applications of deep learning in speech translation [21]. However, deep learning technology integration necessitates careful consideration of security issues. IoT devices often gather a lot of private information, including credit card details and user locations, which could be accidentally revealed and result in serious privacy violations as well as financial losses [35].

In conclusion, a crucial area for the advancement of IoT devices is the incorporation of deep learning into the voice translation space. However, while utilizing deep learning technology, stakeholders must carefully manage the information security of IoT devices. The optimal achievement of efficient and exact voice translation and the general proliferation of IoT equipment can only be realized with the simultaneous fortification of device security [33]. Therefore, in order to guarantee the strong information security posture of IoT devices, future research endeavors need to place an even greater emphasis on the intersection of deep learning and information security technology.

3.2 Application of IoT technology and deep learning in the instant translation model

The data expansion method, in which every original data sample converts a row vector, is included in the data deformation approach. Subsequently, a deformation matrix is formulated for the purpose of data distortion. Initially, a stochastic reversible matrix is generated, followed by its diagonal expansion in accordance with the equation illustrated in Eq. (1):

$\displaystyle M_{x,y}=M^{\prime}_{x-Nq,y-Nq}$ (1)

$x, y$ are matrix coordinates. $M$ is the matrix. $M^{\prime}$ is reversible matrix. $q$ is deformation factor. $N$ is the calculation parameter, and the diagonal expansion operation is shown in Fig. 1.

Figure 1.

Diagonal expansion operation diagram.

Lastly, the transformed data is derived through the multiplication of the data line vector by the deformation matrix. Differential privacy represents a compromise between data privacy and data accessibility, achieved by perturbing data. This approach is distinguished by its high computational efficiency and minimal impact on data availability [28].

$\displaystyle\textit{Pr}\left[M\left(x\right)\in S\right]\leqslant\textit{exp}% \left(\epsilon\right)\textit{Pr}\left[M\left(x\right)\in S\right]+\delta$ (2)

In Eq. (2), $M$ stands as a stochastic function, $\epsilon$ and $\delta$ denote the coordinates of the function, $x$ represents the input data, exp signifies the activation function, $S$ encompasses all coordinates that satisfy the function, and Pr denotes the differential operation. Within the realm of information security research, notable attention has gravitated toward the homomorphic re-encryption scheme. The Homomorphic Re-encryption Scheme (HRES) constitutes an asymmetric encryption algorithm proficient in effecting imperceptible re-encryption through a proxy while endorsing privacy-preserving data processing and access control [10]. Furthermore, the summation scheme predicated on HRES embodies homomorphic re-encryption principles characterized by additive homomorphism attributes [6]. This scheme comprises five functions, as delineated in Eqs (3)–(7):

$\displaystyle\textit{PK}=pk_{\textit{DSP}}^{sk_{\textit{ACS}}}=pk_{\textit{ACS% }}^{sk_{\textit{DSP}}}=g^{ab}\textit{modn}^{2}$ (3) $\displaystyle T_{j}=\left({1+m_{j}\cdot n}\right)\cdot\textit{PK}^{T}\textit{% rmodn}^{2}$ (4) $\displaystyle h_{1}=H\left(({pk_{i})^{sk_{\textit{DSP}}}\parallel\textit{CID}}\right)$ (5) $\displaystyle h_{2}=H\left(({pk_{i})^{sk_{\textit{ACS}}}\parallel\textit{CID}}\right)$ (6) $\displaystyle m_{j}=L\left({\bar{T}_{J}\cdot pk_{\textit{ACS}}^{h^{\prime}_{1}% }\cdot g^{h^{\prime}_{2}}/\bar{T}^{\prime}_{j}\textit{modn}^{2}}\right)$ (7)

PK is the key generation function. $sk_{\textit{ACS}}$ and $k_{\textit{DSP}}$ are key pairs. ACS and DSP are protocols. CID is the addition algorithm. $k$ is the security parameter. $g$ , $p,$ and $n$ are the parameters of the key function. $a b$ is the input data, modn is the remainder. $m_{j}$ is the encryption function. $T_{J}$ and $T^{\prime}_{J}$ are the coordinates of the encryption function. $H$ is the execution function, and $r$ is the data provision. $h_{1}$ and $h_{2}$ are ciphertext. $k_{i}$ is the parameter in the execution function. $h^{\prime}_{1}$ and $h^{\prime}_{2}$ are the decrypted information. $L$ is the calculation function, and $\bar{T}_{j}$ is the average of the coordinates of the encryption function. In the neural algorithm of deep learning, there will be a problem of gradient descent, and this small gradient may also reveal the original training data of the data provider [31].

$\displaystyle y_{\textit{predict}}=f\left({\sum_{i=1}^{n}W_{i}\cdot x_{i}+b}\right)$ (8)

$y_{\textit{predict}}$ is the prediction label, $f$ is the activation function, $b$ is the bias value, $x_{i}$ is the data item, and $W_{i}$ is the matrix. When the models are parallel, the real label will be observed and guessed by the naked eye. The parallel schematic diagram of the model is shown in Fig. 2.

Figure 2.

Model parallel schematic diagram.

During the training process of deep learning models, users are faced with the intricate task of not only supplying effective data samples for training but also taking into consideration model architecture, initial model weights, and parameter configurations [22]. This can be especially difficult for non-expert users because some people want to use deep learning models only for their predictive power, without any need for customized involvement that involves model training and prediction [1]. Unlike the complex process of building a deep learning model, deep learning model prediction only requires users to provide the data they want to predict [8].

3.3 Information security and security enhancement for translation models within deep learning

The system for data collection comprises three fundamental components: data owners, IoT devices, and data consumers [36]. The structural framework of this system is elucidated in Fig. 3.

Figure 3.

System structure of data collection scheme.

The entity referred to as the “data owner,” or alternately the “data provider,” is an individual equipped with data collection apparatus. Ownership of the collected data belongs to the data owner, who acts as the seller in the context of the data transaction [18]. Through an incentivized mechanism, the data owner can also access deep learning model prediction services by furnishing points acquired through data exchanges, thus obviating the need for the full model download and merely retrieving the corresponding prediction results [14]. IoT devices, which are also referred to as data collectors, primarily consist of IoT devices that are used for data collection by data collectors, such as wearables, sensors, and smartphones [32]. Once the data owner has given permission, the information gathered by IoT devices is sent to the cloud server of the data consumer, which is the organization that purchased the information. Moreover, data collectors are accountable for establishing and maintaining blockchain infrastructure [11]. Data consumers, who serve as purchasers in data transactions, are similarly tasked with provisioning deep learning model prediction services to data owners. Subsequent to receiving the data uploaded by IoT devices, data consumers may elect to either store the data in the cloud server for subsequent offline analysis and utilization or engage in immediate online analysis and utilization [4]. The optimized data transaction framework delineated in this paper is illustrated in Fig. 4.

Figure 4.

Blockchainbased data trading solution.

The data transaction scheme based on blockchain in this paper can meet the following requirements, as shown in Table 1.

Table 1

Requirements met by blockchainbased data trading solutions

Ask	Concrete content
Motivating effect	The incentive method can be a point-based system, which accumulates corresponding scores based on the value of data, and certain points can be exchanged for rewards such as deep learning model prediction services.
Data privacy	When privacy breaches occur, it is possible to trace the source of data consumers who have leaked their privacy, thereby playing a role in accountability and deterrence.
Verifiable data availability	This property prevents malicious data owners from providing malicious data samples On the other hand, the data owner who is cheating is unable to give the data whenever they want it to be optimized.
Fairness	Fair for both data owners and data consumers.

In the prospective context of an intelligent “battlefield” environment, the comprehension of the “battlefield” scenario hinges upon the comprehensive analysis and processing of voluminous data. Simultaneously, the actualization of combat functionalities is contingent upon the meticulous analysis, processing, and strategic utilization of data resources. The trajectory of the collaborative combat paradigm’s evolution is substantially intertwined with the proficiency exhibited in data analysis and processing capabilities [27]. Figure 5 shows the results obtained from the federated learning system model that is suggested in this paper to protect the confidentiality of the data of the participants.

Figure 5.

A federated learning system model for protecting participant data security.

The refined system model delineated in this paper underscores the dual role undertaken by learning participants within the system. Concisely, these participants function as both data providers and data requesters. Learners essentially provide the cloud server with newly calculated gradients, which makes it easier for the cloud server to update the weight vector. Meanwhile, they request the most recent weight vector from the cloud server in order to coordinate the gradient generation that follows. In consideration of the dispersed nature of learning participants in this context, the training dataset is also distributed. The operational sequence of the model is meticulously illustrated in Fig. 6.

Figure 6.

Steps for running the model.

Combined with the model of the federated learning system and the data transaction mechanism that is intended to protect the participants’ data security, it is clear that the model involves three different roles interacting dynamically. These roles include one user who provides input data and consumes services, three workers who act as the computation’s execution agents, and one server responsible for delivering the model [16]. Users confidentially share their personal input data exclusively with the trio of workers, while the server likewise shares its pre-training model parameters confidentially with the same trio of workers. Subsequently, the workers collaboratively engage in secure tripartite computations to derive the predictive outcomes of the model. Ultimately, each worker transmits the prediction results to the user, thereby reinstating the output outcome. Figure 7 illustrates this security model’s conclusion graphically.

By utilizing secure tripartite computation, a collaborative endeavor is undertaken by three workers who collectively utilize confidentially shared input data and models to conduct calculations, ultimately generating prediction outcomes from the deep learning model. Subsequently, each worker individually conveys the results they have computed to the user. Through this process, users can readily recover the corresponding model prediction outputs. Compared with the previous model, this model emphasizes information security in its design, and integrates the IoT technology compared with the existing model, especially in the process of data collection and transmission. Moreover, this model uses more advanced deep learning algorithm, which makes the model more accurate and natural in dealing with oral translation.

4. Experimental design and performance evaluation

4.1 Experimental materials

In order to substantiate the security robustness of this model, the present study employs the Technology Entertainment Design (TED) Talks dataset. This dataset encompasses an extensive collection of both voice and text data, encompassing thousands of TED speeches. These speeches span a diverse array of topics, including but not limited to science and technology, design, society, and culture. This dataset finds utility in a spectrum of tasks, such as speech recognition, speaker recognition, and emotion analysis. Access to this dataset can be procured via the TED official website (https://www.ted.com/participant/organize-a-local-tedx-event/tedx-licensing/tedx-talks-subtitles-and-transcripts), which hosts video, audio, and textual resources for each speech. A detailed breakdown of the TED Talk dataset is presented in Table 2.

Table 2
Specific content of TED Talk data

Project	Concrete content
Speaker	Ken Robinson
Speech topic	Do schools kill creativity?
Introduction to speech	Sir Ken Robinson makes an entertaining and profoundly moving case for creating an education system that nurtures (rather than undermines) creativity.
Audio link	https://download.ted.com/talks/KenRobinson_2006.mp3?apikey=XXX
Translate text link	https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity/translations?language=zh-cn

Figure 7.

Instant translation security model.

For the accuracy of the experiment, this paper also prepares the People’s Speech dataset, which contains more than 30,000 hours of diverse spoken English data, aiming at promoting the innovation of machine learning. It is the largest diversified English speech recognition corpus for academic and commercial purposes. This dataset is suitable for training a comprehensive end-to-end model and improving applications such as automatic speech recognition, transcription, sound separation and noise reduction. The dataset is constructed from the audio data licensed by the public domain in Archive.org, with the existing transcribed text. The dataset can be downloaded from the official website (https://mlcommons.org/2021/12/introducing-the-peoples-speech-dataset-30000-hours-of-diverse-speech-data-to-drive-ml-innovation/). In this paper, the data in two datasets are combined to construct a new sample set for experiment.

4.2 Experimental environment and parameters setting

The experimental environment of this experiment is shown in Table 3.

Table 3
Experimental environment

Facilities	Model
Central Processing Unit (CPU)	2.5 G
Operating system	Windows7
Web	Apache-tomecat6
Memory	12 G

Figure 8.

Model efficiency comparison results.

In order to maximize the utility of the data, standardization of the model parameters is undertaken. The model configuration entails two convolution layers, each comprising 64 filters with a filter size of 3*3. The node size is established at 128, while both the encoder and decoder are endowed with a hidden state size of 512. Within the Transformer layer, there are eight attention heads, and the data dimension is set to 1,000. A learning rate of 0.0001 is employed throughout the training phase, which spans a duration of 40 iterations. A weight decay factor of 0.001 is also incorporated into the model. The traditional model chosen in the experiment is Sequence-to-Sequence (Seq2Seq), which is a deep learning model framework and has been widely used in the field of machine translation. The core of Seq2Seq model consists of two parts: encoder and decoder. The function of the encoder is to understand and process the input source language sequence and convert it into a fixed-length context vector. This context vector contains the main information of the input sequence and is passed to the decoder. The decoder is responsible for converting this context vector into the target language sequence. In this process, the decoder generates the output of the target language step by step, one word at a time.

4.3 Performance evaluation

In the experiment, the computational efficiency of the traditional model and the optimized model in this paper is studied and compared. The time cost of three key processes in the running process of the model, parameter uploading, aggregation calculation and authentication, is analyzed. In order to accurately evaluate the performance of each stage, the experiment made detailed statistics on the results of 50 iterative trainings and calculated the average time consumption of each process. This statistical method can observe and compare the performance of each algorithm more accurately. The comparison result between the optimized model and the traditional model Seq2Seq is shown in Fig. 8.

The comparative outcomes of the experiments, as depicted in Fig. 8, demonstrate that the optimized model presented in this paper exhibits superior performance across various metrics, encompassing parameter upload time, aggregation computation time, and authentication time. Specifically, the parameter upload time for the optimized model in this paper stands at 60 seconds, while the traditional model consumes 73 seconds for the same task. In the case of aggregation computation, the optimized model achieves a time of 6.5 seconds compared to the traditional model’s 7.3 seconds. Similarly, the authentication process takes 7.5 seconds in the optimized model, in contrast to the 8.6 seconds required by the traditional model. All of these data together highlight the observable improvements that the optimization approach described in this paper achieves on all assessed aspects. In order to further validate the efficiency of the optimized model, experiments are conducted with different iteration numbers, and the experimental results are shown in Fig. 9.

Figure 9.

Comparison of model runtime under different iterations.

In Fig. 9, as the number of iterations increases, the model’s runtime gradually increases. However, it is evident that the runtime of the optimized model increases at a significantly slower pace compared to that of the traditional model. Moreover, as the number of iterations increases, the gap in runtime becomes more pronounced. In order to verify the security of the model, the experiment assumes that there are 30% malicious node attacks, and the model classifies the nodes. The classification structure of the optimized secure model compared to the traditional Seq2Seq model, is illustrated in Fig. 10.

Figure 10.

Classification accuracy comparison results.

In Fig. 10, this paper achieves an average recognition accuracy of 93.1% with the optimized model. Compared to traditional Seq2Seq models, the optimized English spoken language corpus real-time translation secure model demonstrates significantly higher recognition accuracy under various aggregation scenarios. This noteworthy improvement signifies that the optimized model can more accurately identify speech signals and convert them into corresponding textual outputs.

In complex environments with malicious nodes and different aggregation frequencies, the optimized model maintains a consistently high recognition accuracy level, whereas traditional models’ performance tends to decline. This result further validates the superiority of the optimized model in terms of recognition accuracy and security. These experimental results confirm the significant achievements of the optimized model in handling real-time translation tasks for English spoken language corpus. The attainment of high recognition accuracy implies that the model can provide users with more precise and reliable translation services. Additionally, the stability of the optimized model in the presence of malicious nodes and varying aggregation frequencies highlights its robustness and security in handling complex environments.

In summary, this paper has made significant advancements in English spoken language corpus real-time translation tasks through model optimization, particularly in terms of recognition accuracy and security. These findings hold crucial significance for enhancing spoken language translation systems’ performance and user experience, offering valuable insights for researchers and practitioners in related fields.

4.4 Discussion

An analysis of the execution time across various task stages reveals that the training component remains the most time-intensive segment during algorithm execution. Asynchronous aggregation is used to mitigate any delays associated with model training, although it is important to note that the training step usually takes place offline. Furthermore, the delay observed in the model aggregation process and authentication phase is minimal, indicating that the overall and partial processing time costs remain within reasonable bounds. Nevertheless, it is essential to acknowledge that there exists ample room for improvement, particularly when considering scenarios involving large-scale nodes. In such cases, the time incurred for parameter uploading, aggregation computation, and authentication within the model is expected to be higher, necessitating further optimization of the algorithm. It is imperative to recognize that the experimental results are contingent on specific datasets and tasks. Consequently, in practical applications, the model’s recognition accuracy may vary depending on the specific dataset and task at hand. Nonetheless, comparative analysis of the experimental outcomes affirmatively underscores the superior security and accuracy exhibited by the optimized model presented in this paper within the domain of instant translation tasks for spoken English corpora. This achievement holds profound practical significance. The advantage of this optimized model is that it improves the accuracy and real-time performance of oral translation by integrating IoT technology and deep learning. Especially in the aspect of information security, the model ensures the translation efficiency and effectively protects the security and privacy of user data through advanced data protection mechanism. The disadvantage is that although the model has improved information security and translation accuracy, it may still have limitations in dealing with complex and changeable actual environment and accent diversity. This may affect the general applicability and stability of the model in practical application. The limitation of the optimization model in this paper is that the research and experiments in this paper are based on specific datasets and task environments, so the performance and applicability of the model may change due to the differences between datasets and actual application scenarios.

Compared with previous studies and Lumei [20] work, this model integrates two key technical fields, namely the IoT and deep learning, and deeply explores the construction of oral instant translation model in the IoT environment under the framework of information security. This comprehensive research has important academic and practical significance for solving language communication barriers in the IoT [20]. In addition, compared with the research of Joshi et al. [15], the optimization model proposed in this paper can provide more accurate and natural translation results by using large-scale data sets and enhanced model learning ability [15]. This comparison not only highlights the innovation of this paper, but also shows its advanced nature in the application of language processing technology.

5. Conclusion

5.1 Research contribution

The main achievement of this paper is to build a real-time oral English translation model that integrates IoT technology and deep learning, and pays attention to information security. The model adopts advanced deep learning algorithm and IoT technology, which effectively realizes real-time safe translation of spoken English. This paper first analyzes the real-time translation model under deep learning and IoT information security, and then discusses the application of IoT technology and deep learning in translation model in detail, and improves the security of the model through these technologies. The experimental results show that the optimized model takes 60 seconds, 6.5 seconds and 7.5 seconds respectively in parameter uploading, aggregation calculation and identity verification. The average recognition accuracy reaches 93.1%, which surpasses the traditional machine translation method in accuracy and real-time. Therefore, this paper not only brings innovative ideas and methods to the field of oral English translation, but also provides new possibilities for the application of IoT and deep learning technology in other fields. These achievements are of academic significance and important value in practice, which provides reference for research and application in related fields and lay a foundation for further research and development in the future.

5.2 Future works and research limitations

Based on the results of this paper, future research can focus on further improving the accuracy of oral translation, especially by optimizing the algorithm and expanding the corpus to reduce the misidentification rate and translation error rate. Meanwhile, it can focus on context awareness and situational adaptability, and develop translation models that can be intelligently adjusted according to different contexts and cultural backgrounds. In addition, oral translation technology will be extended to more professional fields, such as medical care and law, to improve the efficiency of cross-disciplinary communication. In this process, it is very important to strengthen data security and privacy protection to ensure the safety of user information when technology is applied. Meanwhile, the ethics and responsibility of artificial intelligence should be considered to ensure the fairness, reliability and social responsibility of technology.

There are some restrictions on this paper. First, the need for a somewhat quiet location limits the range of voice input, which reduces the efficacy of oral translation when background noise is present. Second, there is a need to improve the model’s performance. Future research projects ought to consider implementing distributed computing technologies, such MapReduce or Spark to accelerate model training and maximize the effectiveness of inter-node communication. Moreover, the study can broaden its scope to investigate the application of more effective algorithms and computational frameworks, including Federated Learning and TensorFlow to enhance the algorithm’s scalability and performance.

References

Akcay

Breckon

. Towards automatic threat detection: A survey of advances of deep learning within X-ray security imaging. Pattern Recognition. 2022; 122(32): 108245.

Redwan

Kabir

Shin

. A Neural Attention-Based Encoder-Decoder Approach for English to Bangla Translation. Computer Science. 2023; 31(1): 91.

Al-Abassi

Karimipour

Dehghantanha

Parizi

. An ensemble deep learning-based cyber-attack detection in industrial control system. IEEE Access. 2020; 8(4): 83965-83973.

Alay

Al-Baity

. Deep learning approach for multimodal biometric recognition system based on fusion of iris, face, and finger vein traits. Sensors. 2020; 20(19): 5523.

Aleksandrova

Rubtsova

Timofeeva

. Hermeneutical Equivalence of Idioms in Translation. Boletin de Literatura Oral. 2023; 10(1): 887-892.

Amanullah

Habeeb

RAA

Nasaruddin

Gani

Ahmed

Nainar

ASM

\ldots

Imran

. Deep learning and big data technologies for IoT security. Computer Communications. 2020; 151(44): 495-517.

Beddiar

Jahan

Oussalah

. Data expansion using back translation and paraphrasing for hate speech detection. Online Social Networks and Media. 2021; 24(11): 100153.

Chen

Wawrzynski

. Cyber security in smart cities: a review of deep learning-based applications and case studies. Sustainable Cities and Society. 2021; 66(7): 102655.

Chen

. Design of internet of things online oral English teaching platform based on long-term and short-term memory network. International Journal of Continuing Engineering Education and Life Long Learning. 2021; 31(1): 104-118.

10.

Fan

Bhosale

Schwenk

. Beyond english-centric multilingual machine translation. Journal of Machine Learning Research. 2021; 22(107): 1-48.

11.

Ferrag

Shu

Djallel

Choo

KKR

. Deep learning-based intrusion detection for distributed denial of service attack in agriculture 40.. Electronics. 2021; 10(11): 1257.

12.

Gallo-Molina

Cogoni

Peeters

Ambati

Nopens

. A hybrid model for multipoint real time potency observation in continuous direct compression manufacturing operations. International Journal of Pharmaceutics. 2022; 613(31): 121385.

13.

Groves

Mundt

. A ghostwriter in the machine? Attitudes of academic staff towards machine translation use in internationalised Higher Education. Journal of English for Academic Purposes. 2021; 50(5): 100957.

14.

Huang

Yang

Zhong

. A method for windows malware detection based on deep learning. Journal of Signal Processing Systems. 2021; 93(36): 265-273.

15.

Joshi

Valluri

Rawat

Kriti

. Computer Vision-Oriented Gesture Recognition System for Real-Time ISL Prediction. Machine Intelligence, Big Data Analytics, and IoT in Image Processing: Practical Applications. 2023; 117(42): 305-321.

16.

Khan

Byun

Park

. IoT-blockchain enabled optimized provenance system for food industry 40. using advanced deep learning. Sensors. 2020; 20(10): 2990.

17.

. Simulation of English education translation platform based on web remote embedded platform and 5G network. Microprocessors and Microsystems. 2021; 81(12): 103775.

18.

Liang

Xie

Zhang

. A mutual security authentication method for RFID-PUF circuit based on deep learning. ACM Transactions on Internet Technology (TOIT). 2021; 22(2): 1-20.

19.

Chai

Cao

. Live speech portraits: real-time photorealistic talking-head animation. ACM Transactions on Graphics (TOG). 2021; 40(6): 1-17.

20.

Lumei

. Database processing in Chinese classics translation system based on information search and IOT speech recognition. International Journal of System Assurance Engineering and Management. 2023; 7(4): 1-13.

21.

Luo

Tang

Chai

Qin

. Natural language to visualization by neural machine translation. IEEE Transactions on Visualization and Computer Graphics. 2021; 28(1): 217-226.

22.

Qiao

Song

. Deep-learning-enabled security issues in the internet of things. IEEE Internet of Things Journal. 2020; 8(12): 9531-9538.

23.

Maruf

Saleh

Haffari

. A survey on document-level neural machine translation: Methods and evaluation. ACM Computing Surveys (CSUR). 2021; 54(2): 1-36.

24.

Natarajan

Rajalakshmi

Elakkiya

Kotecha

Abraham

Gabralla

Subramaniyaswamy

. Development of an end-to-end deep learning framework for sign language recognition, translation, and video generation. IEEE Access. 2022; 10(3): 104358-104374.

25.

Park

Lee

. Enabling real-time sign language translation on mobile platforms with on-board depth cameras. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies. 2021; 5(2): 1-30.

26.

Poulis

. Evaluating the Context of Translation Courses in the Departments of English in Kurdistan Region, Iraq. International Journal of Social Sciences & Educational Studies. 2023; 10(1): 209.

27.

Sarker

. Deep cybersecurity: a comprehensive overview from neural network and deep learning perspective. SN Computer Science. 2021; 2(3): 154.

28.

Sharma

Saxena

Singh

Sadhya

. Benchmarking deep neural network approaches for Indian Sign Language recognition. Neural Computing and Applications. 2021; 33(3): 6685-6696.

29.

Singh

. Text similarity measures in news articles by vector space model using NLP. Journal of The Institution of Engineers (India): Series B. 2021; 102(17): 329-338.

30.

Swar

Mohsen

. Students’ cognitive processes in L1 and L2 translation: Evidence from a keystroke logging program. Interactive Learning Environments. 2022; 56(41): 1-16.

31.

Thakkar

Lohiya

. A review on machine learning and deep learning perspectives of IDS for IoT: recent updates, security issues, and challenges. Archives of Computational Methods in Engineering. 2021; 28(9): 3211-3243.

32.

Ullah

Mahmoud

. Design and development of a deep learning-based model for anomaly detection in IoT networks. IEEE Access. 2021; 9(1): 103906-103926.

33.

Wijnands

Thompson

Nice

Aschwanden

Stevenson

. Real-time monitoring of driver drowsiness on mobile platforms using 3D neural networks. Neural Computing and Applications. 2020; 32(11): 9731-9743.

34.

. Multimedia interaction-based computer-aided translation technology in applied English teaching. Mobile Information Systems. 2021; 20(3): 1-10.

35.

Carpuat

. EDITOR: An edit-based transformer with repositioning for neural machine translation with soft lexical constraints. Transactions of the Association for Computational Linguistics. 2021; 9(4): 311-328.

36.

Yuan

Wang

Liu

Guo

Bao

. Byte-level malware classification based on markov images and deep learning. Computers & Security. 2020; 92(21): 101740.

37.

Zeng

. Implementation of Embedded Technology-Based English Speech Identification and Translation System. International Journal of Computer Systems Science & Engineering. 2020; 35(5): 377-383.

38.

Zheng

Song

Liu

Song

Yue

. Research on motion pattern recognition of exoskeleton robot based on multimodal machine learning model. Neural Computing and Applications. 2020; 32(9): 1869-1877.

Construction of English corpus oral instant translation model based on internet of things and deep learning of information security

Abstract

Keywords

1. Introduction

1.1 Research background and motivations

1.2 Research objectives

2. Literature review

3. Research model

3.1 Real-time translation model under deep learning and IoT information security

3.2 Application of IoT technology and deep learning in the instant translation model

4.1 Experimental materials

Table 2 Specific content of TED Talk data

Table 3 Experimental environment

5. Conclusion

5.1 Research contribution

5.2 Future works and research limitations

References

Table 2
Specific content of TED Talk data

Table 3
Experimental environment