Abstract
Linguistics teaching is constrained by many factors, which leads to poor teaching results and makes the teaching process difficult to evaluate. To improve the efficiency of linguistics teaching, this paper uses improved machine learning algorithms to construct an artificial intelligence teaching model for linguistics. The model is designed around the needs of linguistics teaching: it improves the efficiency of the teaching process, supports teaching evaluation, and optimizes the root cause analysis algorithm based on Monte Carlo tree search (MCTS). Moreover, drawing on the frequent itemset algorithm in data mining, a layered pruning strategy is proposed to further reduce the search space and improve the efficiency of the model. In addition, a comparative teaching experiment is used to study the effectiveness of the artificial intelligence model in linguistics teaching. The statistical results show that the model proposed in this paper has a certain effect.
Introduction
The emergence of artificial intelligence provides new ideas for linguistics teaching. Compared with traditional teaching, artificial intelligence teaching can not only improve teaching efficiency through intelligent means, but also evaluate the teaching process. The major difficulty of linguistics teaching is how to summarize knowledge systematically. In addition, artificial intelligence models can be used to identify students' deficiencies and help them correct errors in time [1].
The “three-generation theory” of the development of distance education is the prevailing view in academic circles. The first generation is correspondence education, dating from 1840 and characterized by postal communication and printing technology. The second generation, from the 1930s to the 1950s, is broadcast and television education: building on postal communication and printing technology, it uses radio and television (satellite and microwave), audio and video recording, telephone, telex, computers and other mass, personal and telecommunications media. The third generation refers to teaching activities carried out over network technology (computer, telecommunications, digital satellite TV) introduced in the 1970s. It is called modern distance education and is represented by virtual colleges and online learning. The “Four Generations Theory” divides the development of distance education into four stages: correspondence, multimedia, distance learning, and flexible learning. Later, intelligent and flexible learning is added. The “Five Generations Theory” divides the development of distance education into five stages: meta-media, correspondence, multiple media, electronics, and networking [2].
With the spread of computers and networks, digital technology is changing the social environment in which people live. One-time education can no longer meet people’s modern needs, and network education has become a new direction for the future development of education. Online education is now a global trend. The governments of the United States and the United Kingdom have invested heavily in the development of online education, and European and American countries have been at the forefront of the world in information facilities and informatization. However, online education in China started late and has developed slowly, and there is still a certain gap with European and American countries.
Abroad, some countries have begun to apply network evaluation to teaching and learning. At the same time, some universities have developed network teaching systems, such as WebCT, WISH, Virtual-U, Blackboard, CourseInfo and PathWare, which provide basic teaching, management and evaluation functions; the Virtual-U teaching platform is one example [3].
In the context of the information age, linguistics teaching needs to be reformed through artificial intelligence. With the support of artificial intelligence technology, several problems remain open: which technology platform to choose so that online teaching is stable and reliable and reduces the cost of education; how to plan the structure and function of the system; and how to make good use of existing online teaching platforms to improve teaching design, teaching quality and teaching evaluation. Moreover, how to use such a network platform to promote the transformation of educational ideas, advance education and teaching reform, and improve the level of education and teaching management are all issues that front-line teachers must consider in the current era of education informatization.
Related work
Intelligent teaching systems are an important field of educational technology developed over the past thirty years. They combine technologies such as artificial intelligence, educational psychology, machine learning, and multimedia to help students learn knowledge and acquire skills without the guidance of a human tutor. Research on intelligent teaching systems abroad is relatively in-depth, and related systems have been applied to the teaching of mathematics, physics, medicine and other fields. Typical intelligent teaching systems include LISP Tutor [4], Smithtown [5], SQL-Tutor [6], and AHP-Tutor [7]. Looking at current research at home and abroad, most intelligent teaching systems still belong to ITS in the traditional sense, that is, they are often stand-alone versions. These systems must be installed on the students’ computers to run, and they cannot be combined with the Internet to take advantage of network teaching. Meanwhile, owing to its low cost, high flexibility, and freedom from the constraints of time and space, e-learning has developed rapidly in recent years. Taking the United States as an example, the survey data of the CCP project in 2002 showed that since the US government launched the “Information Campus Program” in 1990, the proportion of colleges and universities that can provide students with an online course catalog has increased from 5.2% in 1998 and 77.3% in 1999 to 82.0% in 2018 [8]. It will be an inevitable trend for domestic universities and even middle schools to use the Internet for online course teaching. However, in the network teaching environment, the lack of human tutors means that students cannot get enough guidance. Moreover, students who study online often find it very difficult to locate the information they want on a huge online teaching website when they just need the answer to a question or the definition of a professional term.
In addition, due to the differences in students’ existing knowledge levels, the tutoring required for the same course will also be different. Therefore, how to provide students with personalized and adequate tutoring under the network teaching environment has become an urgent research topic [10].
At present, the dominant learning theory of ITS has changed from behaviorism to cognitivism. This period pays more attention to the study of learners and learning processes, and model tracing is its focus. Research directions during this period included case-based reasoning, a richer bug library [11], natural language processing (NLP) [12], and authoring systems [13]. Typical systems are LISP Tutor [14], Geometry Tutor [15], and PROUST [16]. Under the guidance of constructivist theory, the main research focus is learner control. Research in this period debates whether each learner should interact with the ITS independently or whether multiple learners should interact with it collaboratively, and whether the learning process complies with the information processing model. During this period, the research directions focused on individualized learning, collaborative learning, situational learning, information processing, and the introduction of virtual reality to simulate learning situations [17]. Since the beginning of the 21st century, constructivist learning theories have become increasingly popular, and traditional teaching-centered models have given way to learning-centered models. The focus of ITS research during this period is on non-well-structured problem solving, teaching agents, teaching games, and metacognitive skills. It is worth mentioning that ITS research based on natural language dialogue has gradually become a hot sub-branch of the field. In this area, the Atlas system of the University of Pittsburgh, the AutoTutor system of the University of Memphis, and the CIRCSIM-Tutor system of the Illinois Institute of Technology are well known [18]. On the other hand, network teaching has become more and more popular due to its low cost and high flexibility. Therefore, web-based ITS research is bound to become a research hotspot in this field.
The literature [19] proposed a teaching system called ZOSMAT, which can not only help students learn independently but also assist them in a real teaching environment with human tutors. Its main features are that it is student-centered, tracks the student’s learning state, gives guidance during the learning process, and supports online teaching; it is a typical web-based ITS. The literature [20] proposed a personalized online teaching system based on fuzzy feedback. According to a student’s uncertain or vague feedback, it automatically selects suitable courseware and recommends it to the student, thereby achieving personalized learning.
The Intelligent Tutoring System (ITS) is a comprehensive subject involving artificial intelligence, computer science, cognitive science, pedagogy, psychology and behavioral science. The ultimate goal of ITS research is for the computer system to bear the main responsibility of human education, that is, to give the computer system intelligence so that it can replace the human teacher to a certain extent and achieve the best teaching. The basic framework of ITS was proposed in [21], which held that an ITS must deal with three types of knowledge: domain knowledge, that is, the Expert Model, which addresses what to teach and contains the knowledge that the system tries to teach to students; learner knowledge, that is, the Student Model, which addresses whom to teach by indicating what the student does and does not know and the student’s cognitive characteristics; and knowledge of teaching strategies, that is, the Tutor Model, which addresses how to teach and provides targeted teaching strategies.
Design of anomaly detection algorithm based on VRNN
Although the variational autoencoder (VAE), a deep generative model, has achieved great success in anomaly detection, existing VAE-based anomaly detection methods do not consider the temporal correlation in time series, so their performance in time series anomaly detection is not outstanding. This article therefore mainly studies the second situation of time series anomaly detection, and specifically describes the anomaly detection of multidimensional monitoring indicators in intelligent operation and maintenance as follows: for any time $t$, given the historical observations $x_{t-T+1}, \dots, x_{t-1}, x_t$, determine whether an abnormality occurs at time $t$; an abnormality is denoted by $y_t = 1$.
By combining the VAE, a typical deep generative model, with the LSTM network, which performs well in time series processing, this section proposes a time series anomaly detection algorithm based on the variational recurrent neural network (VRNN) and trained with a semi-supervised learning method. This design enables the variational autoencoder to take into account the time-dependent structure of sequence data when generating. The overall flow of the algorithm is shown in Fig. 1.

Fig. 1. VRNN-based anomaly detection algorithm process.
VRNN combines the VAE and LSTM networks and contains a variational autoencoder module at each time point $t$. The variational autoencoder at time $t$ depends on the state variable $h_{t-1}$ of the LSTM network at the previous moment. This design allows VRNN to take into account the time-dependent structure of sequence data when generating. Figure 2 shows the network structure of VRNN.

Fig. 2. Variational recurrent neural network structure.
Since the VAE at each time point $t$ is not a time series model, its input $x$ must be a vector. However, the monitoring index studied in this paper is a scalar at each time $t$. Therefore, this article introduces a sliding window: the input $x_t$ of the VAE at time $t$ is represented by the vector $(x_{t-W+1}, \dots, x_t)$ of length $W$.
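The sliding window construction can be sketched in a few lines of Python. This is a minimal illustration only; the function name `sliding_windows` and the toy series are assumptions, not part of the original method description.

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[float], w: int) -> List[List[float]]:
    """Return one length-w window per time step t >= w - 1.

    The window for time t is [x_{t-w+1}, ..., x_t].
    """
    if w <= 0:
        raise ValueError("window length must be positive")
    return [list(series[t - w + 1 : t + 1]) for t in range(w - 1, len(series))]


# Toy scalar monitoring index with five observations and W = 3.
windows = sliding_windows([1.0, 2.0, 3.0, 4.0, 5.0], w=3)
```

The first $W - 1$ time steps have no complete window and are skipped in this sketch; padding them is an equally plausible choice that the text does not specify.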
The following is a detailed description of VRNN in different stages.
In VRNN, the prior distribution of the hidden variable $z$ is still a normal distribution, but unlike in VAE it is no longer simply the standard normal distribution. As shown in Fig. 3(a), because a VAE module is included at each time point $t$, the hidden variable $z$ becomes $z_t$, and the parameters of its prior distribution are obtained from the hidden state $h_{t-1}$ of the LSTM network through a neural network mapping, as shown in formula (1). Among them, $h_{t-1}$ represents the hidden state output by the LSTM network at the previous time step, linear represents a linear transformation function, and relu and softplus represent nonlinear activation functions. Formula (2) obtains an intermediate variable from $h_{t-1}$, from which the parameters of the prior distribution are then computed.
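A possible reading of this prior network is sketched below in plain Python. The layer layout (one relu hidden layer, a linear head for the mean, a softplus head for the standard deviation) is an assumption that is merely consistent with the functions named in the text; the function and weight names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine map W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def softplus(v: Vector) -> Vector:
    """Numerically stable log(1 + exp(a)); always positive."""
    return [max(a, 0.0) + math.log1p(math.exp(-abs(a))) for a in v]


def prior_params(h_prev: Vector, w_hid: Matrix, b_hid: Vector,
                 w_mu: Matrix, b_mu: Vector,
                 w_sigma: Matrix, b_sigma: Vector) -> Tuple[Vector, Vector]:
    """Map h_{t-1} to the mean and standard deviation of the prior over z_t."""
    hidden = relu(linear(w_hid, b_hid, h_prev))          # intermediate variable
    mu = linear(w_mu, b_mu, hidden)                      # unconstrained mean
    sigma = softplus(linear(w_sigma, b_sigma, hidden))   # positive std
    return mu, sigma


# Toy example: 2-dimensional hidden state, 1-dimensional latent variable.
mu, sigma = prior_params(
    h_prev=[1.0, -2.0],
    w_hid=[[1.0, 0.0], [0.0, 1.0]], b_hid=[0.0, 0.0],
    w_mu=[[2.0, 3.0]], b_mu=[0.5],
    w_sigma=[[0.0, 0.0]], b_sigma=[0.0],
)
```

Using softplus for the standard deviation keeps it strictly positive without clipping, which is the usual reason for pairing it with a linear mean head.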
Figure 3(b) describes the process by which VRNN reconstructs $x_t$, which is the decoding stage of the model. The reconstruction depends not only on the hidden variable $z_t$ but also on the hidden state $h_{t-1}$ at the previous moment. As with standard variational autoencoders, we assume that the reconstructed $x_t$ follows a Gaussian distribution. First, a sample is drawn from the approximate posterior distribution of the hidden variable $z_t$; the sample is then mapped by the decoding network to obtain the distribution parameters of the reconstructed $x_t$. The specific form is shown in formula (5).
Among them, $\epsilon$ is sampled from the standard normal distribution, and the approximate posterior distribution of the hidden variable $z_t$ is sampled using formula (6). So that the model can be trained by gradient descent, the reparameterization trick of VAE is used. The actual sampling is therefore equivalent to shifting and scaling $\epsilon$ by the mean and standard deviation of the approximate posterior.
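The reparameterization trick mentioned here can be illustrated as follows. This is a minimal sketch under the assumption of a diagonal Gaussian posterior; the function name `sample_latent` and its arguments are illustrative and do not come from the original text.

```python
import random
from typing import List, Optional


def sample_latent(mu: List[float], sigma: List[float],
                  rng: Optional[random.Random] = None) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    The randomness lives entirely in eps, so z is a deterministic,
    differentiable function of mu and sigma.
    """
    rng = rng or random.Random()
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


# With a fixed seed the draw is reproducible.
z = sample_latent([0.0, 1.0], [1.0, 0.5], rng=random.Random(0))
```

Because the noise is drawn independently of the parameters, gradients of a loss with respect to `mu` and `sigma` can flow through the sample, which is what makes gradient descent applicable.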
Figure 3(c) describes how the model updates the hidden state of the VRNN module. The calculation of $h_t$ is shown in formula (11):

Fig. 3. The design of the variational recurrent neural network at each stage.
Figure 3(d) describes the model’s inference of the posterior distribution of the hidden variable $z_t$, which is also the encoding stage of the model. The encoding network maps the input data $x_t$ and learns an approximate posterior distribution that approximates the true posterior distribution of the hidden variable $z_t$; the approximate posterior is assumed to be Gaussian. Similar to the design of the decoding network, formulas (13) and (14) perform two nonlinear mappings on the input data $x_t$ to obtain an intermediate variable.
After the VRNN model is established, it is trained in a semi-supervised manner on normal-class data only, that is, the pre-processed training set, to update the parameters of each part of the network. For the VAE at each moment in the variational recurrent neural network, we introduce the time dependence into its original variational lower bound to obtain the new variational lower bound shown in formula (17).
The objective function of VRNN is the sum of the variational lower bounds of VAE at all times, as shown in formula (18):
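For orientation, the variational lower bound commonly used for VRNN-type models can be written as below. This is the standard form from the VRNN literature, given in the notation of this section; it is assumed, not confirmed, to correspond to formulas (17) and (18). The symbols $\mathcal{L}_t$ and $\mathcal{L}$ are introduced here for illustration.

```latex
% Per-step lower bound: reconstruction term minus KL to the learned prior
\mathcal{L}_t =
  \mathbb{E}_{q_\phi(z_t \mid x_{\le t}, z_{<t})}
    \big[ \log p_\theta(x_t \mid z_{\le t}, x_{<t}) \big]
  - \mathrm{KL}\big( q_\phi(z_t \mid x_{\le t}, z_{<t})
      \,\|\, p_\theta(z_t \mid x_{<t}, z_{<t}) \big)

% Objective: sum of the per-step bounds over the sequence
\mathcal{L} = \sum_{t=1}^{T} \mathcal{L}_t
```

The first term rewards accurate reconstruction of $x_t$, and the second keeps the approximate posterior close to the time-dependent prior derived from $h_{t-1}$.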
After training the VRNN, this paper uses the reconstruction probability from VAE-based anomaly detection as the measure of whether the data $x$ is abnormal. The difference is that, once the time correlation of $x$ is considered, the reconstruction probability becomes $\mathbb{E}_{q_\phi(z_t \mid x_t)}[\log p_\theta(x_t \mid z_{\le t}, x_{<t})]$. It can be seen that the reconstruction probability at each moment is related to the information at historical moments.
For anomaly detection based on VAE, a kernel density estimation (KDE) interpretation has been proposed, with the reconstruction probability $\mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)]$ as the measurement index: the reconstruction probability can be regarded as a weighted kernel density estimate in which the approximate posterior distribution $q_\phi(z \mid x)$ is the weight and $\log p_\theta(x \mid z)$ is the kernel density estimate. The smaller the reconstruction probability, the more likely the data is abnormal. This paper uses this KDE view to further analyze the anomaly detection algorithm based on VRNN.
The anomaly detection algorithm based on VRNN maximizes the variational lower bound during training. For normal data $x_t$ in the sequence, the log-likelihood $\log p_\theta(x_t \mid z_{\le t}, x_{<t})$ is high, and for abnormal data it is low. That is, the log-likelihood of each sample in the latent variable space of normal data $x_t$ can be used as a density estimator to indicate the degree to which $x_t$ follows the normal pattern. Then, the approximate posterior distribution $q_\phi(z_t \mid x_{\le t}, z_{<t})$ is used as the weight, and a weighted average of $\log p_\theta(x_t \mid z_{\le t}, x_{<t})$ is taken over the $L$ samples drawn in the hidden variable space to obtain the final reconstruction probability.
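The reconstruction probability can be estimated by Monte Carlo sampling, as in the sketch below. It assumes diagonal Gaussian distributions and replaces the weighting by $q_\phi$ with a plain average over samples drawn from $q_\phi$, which is the usual Monte Carlo realization of that weighted average. The `decode` callable is a hypothetical stand-in for the decoding network; in the model described here it would also receive $h_{t-1}$.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def gaussian_log_likelihood(x: Vector, mu: Vector, sigma: Vector) -> float:
    """log N(x; mu, diag(sigma^2)), summed over dimensions."""
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * ((xi - m) / s) ** 2
        for xi, m, s in zip(x, mu, sigma)
    )


def reconstruction_probability(
    x: Vector,
    z_mu: Vector,
    z_sigma: Vector,
    decode: Callable[[Vector], Tuple[Vector, Vector]],
    n_samples: int = 10,
    rng: Optional[random.Random] = None,
) -> float:
    """Monte Carlo estimate of E_q[log p(x | z)] from n_samples draws of z."""
    rng = rng or random.Random()
    total = 0.0
    for _ in range(n_samples):
        # Sample z from the approximate posterior via reparameterization.
        z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(z_mu, z_sigma)]
        x_mu, x_sigma = decode(z)
        total += gaussian_log_likelihood(x, x_mu, x_sigma)
    return total / n_samples
```

A point is flagged as abnormal when this score falls below a chosen threshold; how that threshold is set is not specified in the text and is left open here.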
When VRNN is trained on time series data, the approximate posterior distribution $q_\phi(z_t \mid x_{\le t}, z_{<t})$ is organized into a smooth transition, so that the $z_t$ samples show a color gradient in the hidden variable space. This article refers to this as the time gradual phenomenon, as shown in Fig. 4.

Fig. 4. Three-dimensional display of the time gradual phenomenon.
The reason for the time gradual phenomenon can be derived from formula (19):
The first term in the formula requires that the $L$ samples drawn from the approximate posterior distribution $q_\phi(z_t \mid x_{\le t}, z_{<t})$ of the data $x_t$ have a high probability of reconstructing $x_t$. The larger the difference between the approximate posterior distributions of two data points, the farther they are pushed apart. Since the initial variational network is random, at the start of training the approximate posterior distributions $q_\phi(z_t \mid x_{\le t}, z_{<t})$ are mixed everywhere, as shown in the leftmost panel of Fig. 4. At this time, the approximate posterior distribution of each data point is pushed away by the approximate posterior distributions of all other data points.

Fig. 5. Monte Carlo tree search application in root cause analysis.

Fig. 6. MCTS-based root cause analysis algorithm flow.
For a given cuboid $B$, we want to obtain the subset with the largest potential score. If $E(B)$ contains $n$ elements, the number of candidate sets in cuboid $B$ is $2^n - 1$. As $n$ grows, the search space grows exponentially. To solve this search space explosion, this paper uses a heuristic search algorithm based on MCTS.
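The size of this search space is easy to check by brute-force enumeration. The sketch below is illustrative only; the function names and the element labels `e1`, `e2`, `e3` are assumptions.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def candidate_sets(elements: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every non-empty subset of the elements of one cuboid."""
    for size in range(1, len(elements) + 1):
        yield from combinations(elements, size)


def search_space_size(n: int) -> int:
    """Number of non-empty subsets of an n-element set."""
    return 2 ** n - 1


subsets = list(candidate_sets(["e1", "e2", "e3"]))
```

With only 20 elements the count already exceeds one million, which is why exhaustive enumeration is replaced by a guided search.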
In this paper, MCTS is applied to the root cause analysis of multi-dimensional monitoring indicators to cope with the huge search space. In the Monte Carlo tree, each state $s$ corresponds to the candidate root cause set $S$ currently being explored. This article sets three variables for each state $s$: 1. $N(s, a)$ is the access count, that is, the number of times edge $(s, a)$ has been visited; 2. $ps(s)$ is the potential score of set $S$; 3. Assuming that $s$ is converted to $s'$ by action $a$, the action value $Q(s, a)$ of edge $(s, a)$ is defined as the largest potential score among $s'$ and its descendants, as shown in formula (20):
$$Q(s, a) = \max_{u \in \{s'\} \cup \mathrm{descendants}(s')} ps(u) \qquad (20)$$
For each $s$, $Q(s, a)$ is initialized to $ps(s)$.
The following is a detailed description of the application of MCTS in root cause analysis, mainly divided into four steps: selection, expansion, calculation, and feedback. We assume that at the beginning of the current iteration, the state of the Monte Carlo tree is shown in Fig. 5.
1. Selection: The goal of this step is to select the node to be expanded from the current state tree. Each time this step is performed, the tree traversal starts from the root state. Suppose the traversal has reached the current state $s$. Normally the UCB algorithm would now be used to select a node. However, UCB first requires every node to be initialized once, and initializing all nodes with equal probability wastes time and space, so MCTS needs to be optimized. First, the potential score $ps(e)$ of each element $e$ in each cuboid in each layer is calculated, and all elements are sorted by $ps(e)$. The UCB algorithm then selects child nodes in this sorted order, which increases the chance that nodes with high potential scores are visited. Compared with the unsorted case, the optimized algorithm can find the node with the highest potential score within a reasonable, limited time.
In the specific node selection, if all actions in the available action set A (s) have been selected in the previous iteration, the UCB algorithm is used to select action a from the set A (s), as shown in formula (21):
The first part of the formula, $Q(s, a)$, is the value of the selected action $a$. The higher the value of $Q(s, a)$, the greater the chance of choosing action $a$ in this selection step; this is the exploitation mechanism in MCTS. The second part is the standard UCB exploration mechanism. The fewer times edge $(s, a)$ has been visited, that is, the smaller $N(s, a)$ is, the larger the value of the whole formula and the more likely the edge is to be visited. By adjusting $C$, the balance between exploitation and exploration can be changed. A common value of $C$ in the UCB literature is $\sqrt{2}$.
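The selection rule described above can be sketched with the textbook UCB form $Q(s, a) + C\sqrt{\ln N(s) / N(s, a)}$. Several points are assumptions rather than details taken from this paper: $N(s)$ is taken as the sum of $N(s, a)$ over all actions, the default $C = \sqrt{2}$ is the value commonly used in the UCB literature, and the function name `ucb_select` is hypothetical. The sketch covers only the case in which every action has already been visited.

```python
import math
from typing import Dict, Hashable


def ucb_select(q: Dict[Hashable, float], n: Dict[Hashable, int],
               c: float = math.sqrt(2.0)) -> Hashable:
    """Pick the action maximising Q(s, a) + C * sqrt(ln N(s) / N(s, a)).

    q maps each action to Q(s, a); n maps each action to N(s, a).
    Every action must have been visited at least once.
    """
    if any(count <= 0 for count in n.values()):
        raise ValueError("UCB needs N(s, a) > 0 for every action")
    total = sum(n.values())  # N(s), assumed to be the sum of edge counts
    return max(q, key=lambda a: q[a] + c * math.sqrt(math.log(total) / n[a]))


# Equal visit counts: the higher value wins (exploitation).
exploit = ucb_select({"a": 0.9, "b": 0.5}, {"a": 5, "b": 5})
# A rarely visited action gets a large bonus (exploration).
explore = ucb_select({"a": 0.6, "b": 0.5}, {"a": 100, "b": 1})
```

Raising `c` shifts the balance toward exploration, and setting it to zero reduces the rule to greedy selection on $Q$.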
If there is an action a ∈ A (s) that has not been explored at all, the action corresponds to N (s, a) = 0, and formula (21) cannot be used in this case. Therefore, for the action a that has not been visited, this article sets the probability of being visited as P, as shown in formula (22):
The selection step starts at the root node of the tree and stops when a leaf state or an unvisited action is selected. For example, as indicated by the black arrow in Fig. 5(a), the selection step stops when the leaf state $\{e_1, e_3\}$ is selected.

Fig. 7. Artificial intelligence evaluation management model.
2. Expansion: After a state $s$ has been selected in the selection step, the Monte Carlo tree is expanded by adding a new node $s'$ that satisfies $S(s') = S(s) \cup \{e^*\}$. The choice of $e^*$ is shown in formula (23): the element with the largest potential score among the remaining elements $\{e_1, e_2, \dots, e_n\} - S(s)$ is selected, rather than a random element. At the beginning of MCTS, since the root node is the empty set $\emptyset$, the element with the largest potential score is selected first to create the root’s child node.
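The expansion rule can be sketched as follows. The function name `expand` and the dictionary of per-element potential scores are illustrative assumptions; the scores themselves would come from the measurement index used in the paper.

```python
from typing import Dict, FrozenSet, Optional


def expand(state: FrozenSet[str],
           element_scores: Dict[str, float]) -> Optional[FrozenSet[str]]:
    """Return S(s') = S(s) ∪ {e*}, where e* has the largest ps(e)
    among the elements not yet in S(s); None if nothing is left."""
    remaining = [e for e in element_scores if e not in state]
    if not remaining:
        return None  # every element is already in the set
    best = max(remaining, key=lambda e: element_scores[e])
    return state | {best}


scores = {"e1": 0.7, "e2": 0.2, "e3": 0.5}
first_child = expand(frozenset(), scores)       # expansion from the empty root
second_child = expand(first_child, scores)      # next-best remaining element
```

Choosing the best remaining element instead of a random one makes the expansion deterministic and biases the tree toward promising sets early.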
4. Feedback: Along the path from $s'$ to the root, the action value $Q$ and access count $N$ of all nodes on the path are updated, as shown by the thick arrows in Fig. 5(d). By the definition of $Q$, the $Q$ of a parent node is updated only when the $Q$ of its child node is greater than it.
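The feedback step amounts to max-propagation along the path to the root. In the sketch below, $Q$ and $N$ are stored on nodes rather than on edges $(s, a)$, which is a simplification made for brevity; the `Node` class and `backpropagate` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    q: float                          # best potential score seen in this subtree
    n: int = 0                        # visit count
    parent: Optional["Node"] = None


def backpropagate(leaf: Node) -> None:
    """Walk from the new node to the root, incrementing N and raising Q
    wherever the new node's score exceeds the stored value."""
    best = leaf.q
    node: Optional[Node] = leaf
    while node is not None:
        node.n += 1
        if best > node.q:
            node.q = best
        node = node.parent


root = Node(q=0.3)
child = Node(q=0.5, parent=root)
leaf = Node(q=0.8, parent=child)
backpropagate(leaf)
```

Unlike the averaging update of classical MCTS, this update keeps the maximum, which matches the definition of $Q$ as the largest potential score among a node's descendants.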
MCTS is applied in each cuboid, and for each search space the above four steps are performed iteratively until at least one of the following three conditions occurs: (1) The potential score of a subset under a certain cuboid is found to be greater than the set threshold $PT$, that is, $ps(S) > PT$; the set is then considered to be the set of root causes of the abnormality. The higher the threshold, the more accurate the search results, but the longer the algorithm runs. (2) All available nodes have been expanded in the Monte Carlo tree, that is, all possible element sets in the cuboid have been searched. (3) The running time of MCTS exceeds the set time threshold. MCTS is a heuristic search algorithm whose running time is controllable, and it can be stopped at any point during the search.
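The three stopping conditions can be combined into a single check that the search loop evaluates after every iteration. The function name and parameter names below are illustrative assumptions.

```python
import time
from typing import Optional


def should_stop(best_score: float, score_threshold: float,
                fully_expanded: bool, started_at: float,
                time_limit_s: float, now: Optional[float] = None) -> bool:
    """True as soon as any of the three stopping conditions holds:
    score above PT, tree fully expanded, or time budget exceeded."""
    now = time.monotonic() if now is None else now
    return (
        best_score > score_threshold
        or fully_expanded
        or (now - started_at) > time_limit_s
    )
```

In a search loop, `started_at` would be recorded with `time.monotonic()` before the first iteration; the optional `now` argument exists only to make the check easy to test.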
This paper follows the MCTS idea proposed in the HotSpot algorithm, uses the new measurement index as the value function in MCTS, and optimizes the root cause analysis algorithm based on MCTS. Finally, drawing on the frequent itemset algorithm in data mining, a hierarchical pruning strategy is proposed to further reduce the search space. The overall process of the root cause analysis algorithm is shown in Fig. 6.
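The text does not spell out the hierarchical pruning rule. One way to read "based on the frequent itemset algorithm" is the Apriori-style level-wise scheme sketched below, in which a combination in layer $k$ is scored only if all of its sub-combinations in layer $k-1$ survived. This is an assumption made for illustration, not the authors' confirmed method; the function name, the `score` callable and the `min_score` threshold are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Set

Combo = FrozenSet[str]


def layered_candidates(items: Iterable[str],
                       score: Callable[[Combo], float],
                       min_score: float,
                       max_layer: int) -> List[Set[Combo]]:
    """Apriori-style layers: a size-k combination is scored only if every
    size-(k-1) sub-combination survived the previous layer."""
    survivors = {frozenset([i]) for i in items
                 if score(frozenset([i])) >= min_score}
    layers = [survivors]
    for k in range(2, max_layer + 1):
        pool = sorted({i for combo in survivors for i in combo})
        candidates = {
            frozenset(c) for c in combinations(pool, k)
            if all(frozenset(sub) in survivors for sub in combinations(c, k - 1))
        }
        survivors = {c for c in candidates if score(c) >= min_score}
        if not survivors:
            break  # nothing left to extend, so deeper layers are skipped
        layers.append(survivors)
    return layers


# Toy score: any combination containing "c" scores low and is pruned.
layers = layered_candidates(
    ["a", "b", "c"],
    score=lambda combo: 0.1 if "c" in combo else 0.9,
    min_score=0.5,
    max_layer=3,
)
```

Pruning an element in a low layer removes every higher-layer combination that contains it, so the number of sets that MCTS must consider shrinks before the search starts.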
Based on the support of the above root cause analysis algorithm, the artificial intelligence teaching model of this research is constructed. The constructed model is shown in Fig. 7.
Taking one class as an example, the platform’s background statistics show that the average number of questions asked in this class is 9, the number of assignments submitted is 12, the number of post views is 19, the number of post replies is 9, the average self-test score is 88 points, and the average time spent browsing courseware outside class is 4.7 hours per week, as shown in Table 1.
Table 1. Experimental statistical results
Based on this, it can be concluded that students interact enthusiastically, achieve good test scores, and that their interaction is effective and of high quality. Therefore, the platform provides students with good learning support services.
To test the effectiveness of the network learning system, this paper compares the results of the control class and the experimental class, as shown in Table 2 and Fig. 8. The grades of the experimental class are called test grades, and the grades of the control class are called comprehensive assessment grades; the latter are the comprehensive assessment grades of the same grade taught by correspondence plus face-to-face teaching.
Table 2. Comparison table of experimental grades

Fig. 8. Comparison diagram of experimental grades.
As can be seen from Fig. 8, there are significant differences in linguistics grades between the experimental class and the control class. To express this difference further, the difference between the grades of the two classes is calculated by subtracting the control class grade from the experimental class grade, and the results are shown in Table 3 and Fig. 9. If the difference is greater than 1, the linguistics grade of the experimental class is higher than that of the control class.
Table 3. Statistical table of the difference in linguistics grades between the experimental class and the control class

Fig. 9. Statistical diagram of the difference in linguistics grades between the experimental class and the control class.
As shown in Fig. 9, the grades of the experimental class are better than those of the control class at the corresponding rankings, and the lower the ranking, the more obvious the difference. It can be seen that the linguistics teaching model based on machine learning and artificial intelligence algorithms proposed in this paper has a certain effect.
In the context of the information age, linguistics teaching needs to be reformed through artificial intelligence. With the support of artificial intelligence technology, several problems remain open: which technology platform to choose so that online teaching is stable and reliable and reduces the cost of education; how to plan the structure and function of the system; and how to make good use of existing online teaching platforms to improve teaching design, teaching quality and teaching evaluation. Moreover, how to use such a network platform to promote the transformation of educational ideas, advance education and teaching reform, and improve the level of education and teaching management are all issues that front-line teachers must consider in the current era of education informatization. By improving the machine learning algorithm, this study optimizes the root cause analysis algorithm based on MCTS: it follows the MCTS idea proposed in the HotSpot algorithm and uses the new measurement index as the value function in MCTS. Moreover, this study proposes a hierarchical pruning strategy based on the frequent itemset algorithm in data mining to further reduce the search space. In addition, the performance of the model is analyzed through a controlled experiment. The research results show that the artificial intelligence model proposed in this paper has a significant effect in the reform of linguistics teaching.
Acknowledgments
Fund project: The National Social Science Foundation of China: A Study on the Semantic System and Semantic Sequence of Monosyllabic Polysemous Words in The Chinese Dictionary (17BYY152); The Ministry of Education Humanities and Social Sciences Research Fund Youth Project: The analysis of sememe of Chinese Monosyllabic polysemous words and the study of entering Dictionaries (15YJC740123).
