Abstract
How to apply artificial intelligence technology to support education reform is a problem that teaching researchers urgently need to solve. Using artificial intelligence technology to improve the key competences of the English subject is a new direction in current English teaching development. This research uses machine learning technology to analyze the assessment of key competences in English teaching and builds an evaluation model with a corresponding threshold. Moreover, on the basis of orderly mutual information, this study combines the maximum correlation and minimum redundancy theory in an attribute selection algorithm to optimize the key competences assessment function for the English subject. In addition, the performance of the research model is analyzed through a comparative test, and the results are examined through a comparison of actual scores and a comparison of errors. The results show that the scores produced by the research model are closer to the real scores than those of the comparison model, so the model has higher accuracy and a certain practical effect.
Introduction
In the era of artificial intelligence, one of the main purposes of learning is to cultivate key competences and capabilities, such as analytical thinking, cooperative communication, and practical capabilities that machines cannot possess, rather than simply acquiring pure memory-based knowledge. Moreover, the awareness of lifelong learning [1] needs to be established and should be implemented in the teaching practice of each student.
Teaching and research personnel engaged in the field of English must not only have good professional qualities, a solid foundation of educational theory, and profound cultural knowledge, but also have the ability to absorb scientific information and update their knowledge base. At the same time, we need to take advantage of technology and actively explore how to apply artificial intelligence technology to English teaching practice to benefit more English learners [2]. Analysis and research show that using artificial intelligence to assist language learning offers many technical advantages for the training of key competences. At the same time, the application of artificial intelligence technology in the process of educational informatization can give teaching researchers inspiration on educational issues, which has certain practical significance.
With the implementation of the strategy of strengthening the country through talents, the Ministry of Education has in recent years successively issued a number of documents requesting a comprehensive deepening of curriculum reform and the implementation of the fundamental task of enhancing morality and fostering talents, and in March 2014 it issued the “Opinions on Comprehensively Deepening Curriculum Reform and Implementing Fundamental Task of Enhancing Morality and Fostering Talents”. Moreover, the Ministry of Education has put forward the concept of “key competences” and requires “organizing and researching students of all grades to develop a key competences system to clarify the necessary qualities and key abilities that students should have to meet the needs of lifelong development and social development.” On the basis of comprehensive research on the key competences defined by various economic organizations and countries, China put forward the “Key Competences of Chinese Student Development” in 2016, which answers in depth the fundamental question of “what morality to build and what talent to foster” at the meso level. Experts in various disciplines took “Chinese Student Development Key Competences” as the top-level design and combined the characteristics of each subject with the practical needs of China’s basic education curriculum to introduce the key competences of the various disciplines, laying the foundation for the implementation of “Chinese Student Development Key Competences”. At present, the concept and connotation of the key competences of the English subject have been defined in the “General High School English Curriculum Standards (2017 Version)”, and the educational tasks undertaken by the English subject are clearly defined. Although the key competences of the English subject in compulsory education have not yet been formulated, those for junior high schools and elementary schools will be formulated accordingly.
Although “Key competences for Chinese Student Development” and the key competences of English subjects have been promulgated, research on them is still in its infancy: most of it is at the theoretical level, and relatively little is at the practical level. At the same time, the era of informatization 2.0 has brought big data, cloud computing, the Internet, and smart classrooms to teachers and students. The new technology has had a tremendous impact on the traditional classroom model and has also provided a new method and path for the cultivation of the key competences of English subjects in the new era [3].
At present, education and teaching in the various disciplines in China is the main venue for the cultivation of students’ key competences. However, the author found through the literature that the cultivation of key competences in the various disciplines has not yet been fully implemented, and traditional teaching concepts and models still dominate in teaching. With the advent of the informatization 2.0 era, big data, cloud computing, the Internet, etc. have opened new doors for educational informatization, and many experts and scholars are working on the integration of informatization and subject teaching. Although some progress has been made, most research is at the theoretical and conceptual stage. In practice, although information technology and subject teaching have been integrated to some degree, the integration is mostly shallow: informatization is mostly used to assist teachers in teaching, and it has not been able to change the classroom structure or the teaching and learning methods in depth. Moreover, there is little research on the effect of integrating informatization with English teaching on the cultivation of students’ English key competences.
Related work
With the advancement of information technology and educational concepts, the application of data mining in the field of education has begun to attract more and more researchers’ attention. Educational data mining (EDM) is an interdisciplinary subject that involves knowledge in multiple fields, such as psychology, pedagogy, statistics, and computer technology. At present, EDM has many applications, such as assisting managers in making decisions, helping students improve learning efficiency, and assisting teachers in improving teaching programs. In the traditional teaching environment, communication between teachers and students takes place mainly through face-to-face classroom teaching. In this setting, educational data is collected mainly through interviews and observation records, and the data is then analyzed and processed by machine learning or statistical methods. The literature [4] observed the performance of 22 elementary students in the classroom and used the regression tree method to study the observation data. The results of the study show that distraction by classmates accounts for 45% of cases, while personal distraction and environmental disturbances account for 18% and 16%, respectively. As can be seen from the research results, EDM technology helps us better understand the reasons for the classic classroom phenomenon of “losing concentration”. With the advancement of science and technology, a new teaching environment has gradually developed in the field of education, which is called a closed teaching environment. A closed teaching system is generally limited to students and educators; there is no interaction between users, and the system is not connected to the Internet. The literature [5] uses the decision tree algorithm to study the records and personal information left by students in the learning system and recommends the course content and learning sequence according to the characteristics of students.
The literature [6] extracted the data of 5000 8th grade students, including the students’ scholarship status and their results from the previous year, and used a variety of classification algorithms [7] to predict the results of the students’ placement exams. Among them, the prediction accuracy of the C5 decision tree is the highest, reaching 95%. This research can enable schools to cancel the placement test, which greatly saves educational resources. In the 20th century, distance education based on Web technology began to rise on a large scale. The biggest difference between distance education [8] and closed education is that it allows users to communicate and collaborate. This kind of open network education, which uses a certain degree of artificial intelligence technology, is called an open teaching environment. Typical applications in an open teaching environment are computer-supported collaborative learning (CSCL) and intelligent tutoring systems (ITS). The literature [9] used data mining methods to analyze the usage records of students in online courses, and the research results showed the relationship between grades and the number of questions. The literature [10] clustered and mined association rules on the data on the MOODLE [11] platform to realize a personalized course recommendation function. The literature [11] designed an adaptive English learning system. This system can recommend English learning materials of different levels of difficulty according to the characteristics of students. The experimental results show that the students using the learning system have better scores than the students in the control group. In the 21st century, the era of big data [12] has spawned many novel teaching environments, some of which are based on games and some of which are based on social networks. In addition, VR [14] and AR technologies have also been used in educational scenes to create a new learning experience.
The game-based teaching environment (GBLS) [13] uses game mechanisms to design the learning process and create a relaxed and happy learning atmosphere for learners, which can stimulate learning motivation and passion to a certain extent. The literature [14] used a naive Bayes classifier to model the recorded data of a puzzle game. The classification accuracy of the model on the learning style of students exceeds 85%. The literature [15] studied data from Twitter and established a prediction model using a Naïve Bayes classifier and text processing technology. The study found that engineering students often face problems such as lack of sleep, lack of social activities, high stress, and excessive learning burden. At present, research on English education in the field of educational data mining mainly focuses on language testing and some individual English skills. Wen [18] used multiple regression analysis to study the extent to which oral task performance can predict TEM-4 written test results. The literature [16] used the TEM-4 score as the standard to study whether self-assessment can reliably and effectively measure students’ language proficiency and confirmed this hypothesis. The literature [17] found that the TEM-4 score can be predicted by learning motivation and learning effort. The first disadvantage of these studies is that the research data comes from questionnaires or specific test results, and respondents sometimes fill in the questionnaire randomly because of privacy protection, self-esteem and other issues, which affects the authenticity of the data [18]. The second deficiency is that the type of research data is too simple, so the data does not capture the characteristics of students in all aspects. The third shortcoming is that the research results can only be used to analyze specific data, their versatility is weak, and no intelligent analysis teaching platform has been formed.
Network statistical characteristics
The probability that any two neighboring nodes of a node in the network are also neighbors of each other is called the clustering coefficient.
We assume that the degree of node i in the network is $k_i$, that is, it has $k_i$ neighbor nodes directly connected to it by edges. If all the $k_i$ neighbors of node i are connected to one another, then there are $k_i(k_i - 1)/2$ edges between these neighbors. In fact, however, most of the $k_i$ neighbors of node i are usually not neighbors of each other. The clustering coefficient $C_i$ of a node i with degree $k_i$ is defined as [19]:
$$C_i = \frac{E_i}{k_i(k_i - 1)/2} = \frac{2E_i}{k_i(k_i - 1)} \tag{1}$$
Among them, $E_i$ represents the number of edges that actually exist between the $k_i$ neighbors of node i, that is, the number of pairs among the $k_i$ neighbors of node i that are actually neighbors of each other. If node i has at most one neighbor ($k_i = 1$ or $k_i = 0$), then $E_i = 0$, and the numerator and denominator in formula (1) are both zero; in this case we set $C_i = 0$. Therefore, it is easy to see that $0 \le C_i \le 1$. When no two neighboring nodes of node i are connected to each other, $C_i = 0$.
Formula (2) provides a new perspective from which to describe the clustering coefficient. Here, $E_i$ can be regarded as the number of triangles that contain node i, and its maximum value is $k_i(k_i - 1)/2$. A connected triple centered on node i consists of node i and two other nodes, with at least the two edges from node i to those two nodes present (Fig. 1). Therefore, the maximum possible number of triangles containing node i is the number of connected triples centered on node i, that is, $k_i(k_i - 1)/2$ [20]. The geometric definition of the clustering coefficient of node i, equivalent to definition (1), is therefore as follows:
$$C_i = \frac{\text{number of triangles containing node } i}{\text{number of connected triples centered on node } i} \tag{2}$$

Two possible forms of connected triples centered on i.
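If the adjacency matrix of the network is represented by $A = (a_{ij})_{N \times N}$, then the number of triangles containing node i is [21]:
$$E_i = \frac{1}{2} \sum_{j,k} a_{ij} a_{jk} a_{ki}$$
If $a_{ij} a_{jk} a_{ki} = 0$, the three nodes i, j and k do not form a triangle.
Therefore, the clustering coefficient of node i can be calculated as follows:
$$C_i = \frac{2E_i}{k_i(k_i - 1)} = \frac{\sum_{j \neq i,\, k \neq j,\, k \neq i} a_{ij} a_{ik} a_{jk}}{\sum_{j \neq i,\, k \neq j,\, k \neq i} a_{ij} a_{ik}}$$
The clustering coefficient C of the network is defined as the average value of the clustering coefficients of all nodes in the network, namely
$$C = \frac{1}{N} \sum_{i=1}^{N} C_i$$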
Obviously, there is 0 ≤ C ≤ 1. If the network is globally coupled, any two nodes are neighbors, and the clustering coefficient of each node is 1, then C = 1. If the clustering coefficients of all nodes in the network are zero, then C = 0 [22, 23].
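The node-level and network-level definitions above can be illustrated with a short Python sketch. The function names and the four-node example network (a triangle 0-1-2 with one extra node 3 attached to node 0) are hypothetical and chosen only for illustration; the adjacency matrix is assumed to be symmetric with entries 0 or 1 and a zero diagonal.

```python
def local_clustering(adj, i):
    """Clustering coefficient C_i of node i, following C_i = 2 E_i / (k_i (k_i - 1))."""
    neighbors = [j for j, a in enumerate(adj[i]) if a and j != i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention from the text: C_i = 0 when k_i is 0 or 1
    # E_i: number of edges that actually exist among the neighbors of node i
    edges = sum(
        adj[u][v]
        for pos, u in enumerate(neighbors)
        for v in neighbors[pos + 1:]
    )
    return 2.0 * edges / (k * (k - 1))


def average_clustering(adj):
    """Network clustering coefficient C as the mean of C_i over all nodes."""
    n = len(adj)
    return sum(local_clustering(adj, i) for i in range(n)) / n


# Hypothetical example: triangle 0-1-2 plus node 3 attached only to node 0
adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
coefficients = [local_clustering(adj, i) for i in range(len(adj))]
network_c = average_clustering(adj)
```

In this example node 0 has three neighbors with one edge among them, nodes 1 and 2 each have two mutually connected neighbors, and node 3 has a single neighbor, so the convention $C_i = 0$ applies to it.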
For node 1, $E_1 = 3$ and $k_1 = 5$, so:
$$C_1 = \frac{2E_1}{k_1(k_1 - 1)} = \frac{2 \times 3}{5 \times 4} = \frac{3}{10}$$
Similarly, the following results can be obtained.
Therefore, the clustering coefficient of the entire network is:
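The sociological significance of the triangle is that your two friends are also friends with each other. The relative number of triangles measures the clustering characteristics of the network [24, 25]. The sociological definition of the network clustering coefficient is as follows:
$$C = \frac{3 \times \text{number of triangles in the network}}{\text{number of connected triples in the network}} \tag{10}$$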
The reason why the factor in formula (10) is set to 3 is that each triangle corresponds to 3 different connected triples, and they are centered on the three vertices of the triangle.
The two definition formulas (9) and (10) of the network clustering coefficient are not completely equivalent. The average in the definition formula (9) is to give the same weight to the clustering coefficient of each node, and the average in the definition formula (10) is to give the same weight to each triangle in the network. Moreover, nodes with large degrees have a greater probability of being included in more triangles. However, the difference between the two definitions does not affect the relative size of the clustering coefficients. We will be more concerned about whether a real network has a significantly higher clustering coefficient than a completely randomized network with the same number of nodes and edges.
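The triangle-based definition can be sketched in the same way. The function name `transitivity` and the example network are hypothetical; the adjacency matrix is again assumed to be symmetric, 0/1 valued, and to have a zero diagonal.

```python
def transitivity(adj):
    """Triangle-based clustering: 3 * (number of triangles) / (number of connected triples)."""
    n = len(adj)
    # Count each triangle once by enforcing i < j < k
    triangles = sum(
        adj[i][j] * adj[j][k] * adj[i][k]
        for i in range(n)
        for j in range(i + 1, n)
        for k in range(j + 1, n)
    )
    degrees = [sum(row) for row in adj]  # relies on the zero diagonal
    # A node of degree k is the center of k (k - 1) / 2 connected triples
    triples = sum(deg * (deg - 1) // 2 for deg in degrees)
    return 3.0 * triangles / triples if triples else 0.0


# Hypothetical example: triangle 0-1-2 plus node 3 attached only to node 0
adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
c_triangle = transitivity(adj)
```

For this small network there is one triangle and five connected triples, so the triangle-based value is 3/5, whereas averaging the node clustering coefficients 1/3, 1, 1 and 0 gives 7/12. The two definitions therefore differ even on a four-node example.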
The position of a node in the network determines its value: the closer its position is to the center, the greater its value. Research on node centrality indices is of great significance in different fields. For example, it can be used to identify the most influential people in online communities such as friend circles and Weibo, to identify high-risk groups in a disease transmission network, to find the points carrying the largest flow in communication and transportation networks, and to help search engines rank the most important results first.
Centrality is an important indicator of structural position, and it is used when evaluating a person’s social status. Degree centrality, closeness centrality and betweenness centrality are three important forms of centrality.
The evaluation criterion of degree centrality is that the greater the degree of a node, the greater its importance. If a network has N nodes, the maximum possible node degree is N - 1. After normalization, the degree centrality of a node with degree $k_i$ is:
$$DC_i = \frac{k_i}{N - 1} \tag{11}$$
Betweenness centrality reflects a person’s ability to act as an intermediary. If it is difficult for others to bypass him when communicating with one another, he occupies many critical paths. High betweenness means that a large number of people must pass through him to connect with others.
If a network has many unconnected subgraphs, there are structural holes between them. If a person connects two separate subgraphs, this person is called a bridge. People who act as bridges play an important role in the exchange of information between two separate large groups. If the betweenness centrality of a node in the network is high, the node can control the information flow of the network and control the two communities, so that it gains considerable value from its intermediary position. Therefore, betweenness centrality is a very important indicator in social networks. It can be used to measure whether a node in the social network functions as a bridge and how important it is as a bridge.
The betweenness centrality of node i is defined as:
$$BC_i = \sum_{s \neq i \neq t} \frac{n_{st}^{i}}{g_{st}} \tag{12}$$
Among them, $g_{st}$ is the number of shortest paths from node s to node t, and $n_{st}^{i}$ is the number of those shortest paths that pass through node i.
The definition in formula (12) was given by Freeman in 1977. This definition focuses on the ability of node i to control the information flow along the shortest paths in the network. If node i is not located on any shortest path between any node s and node t, that is, $n_{st}^{i} = 0$ for every such pair, then $BC_i = 0$.
If there are N nodes in a network, the degree of a node reaches its maximum value N - 1 when it is connected to all other nodes. If the network is a star network with N nodes, the shortest path between any two nodes other than the central node is unique and passes through the central node, so the number of shortest paths through the central node is:
$$\frac{(N - 1)(N - 2)}{2} \tag{13}$$
The normalized betweenness of node i is defined as:
$$BC_i' = \frac{2}{(N - 1)(N - 2)} \sum_{s \neq i \neq t} \frac{n_{st}^{i}}{g_{st}} \tag{14}$$
There are other slightly different forms of the definition of normalized betweenness. For example, Newman gives the following definition:
$$BC_i = \frac{1}{N^{2}} \sum_{s,t} \frac{n_{st}^{i}}{g_{st}} \tag{15}$$
The above formula also counts the path from each node to itself and the paths starting from or ending at node i; $N^{2}$ is the number of all possible node pairs in the network (including the pairing of a node with itself). Although the values calculated according to formula (14) and formula (15) differ, the ranking of the nodes by betweenness centrality is not affected, so the difference can be ignored.
To put it another way, if we consider the importance of betweenness centrality in controlling information dissemination, that is, the network nodes with high betweenness centrality are more important to the entire network, then excluding these key nodes may have an important impact on the dissemination of information in the network. In a real network, the information between network nodes does not necessarily strictly follow the shortest path, and the frequency of propagation is also different. Therefore, the betweenness centrality can still approximately reflect the degree of influence of network nodes on the propagation of information throughout the network.
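As a rough illustration of the betweenness definition, the sketch below counts shortest paths with a breadth-first search from every node and accumulates the fractions of paths passing through each node. This accumulation scheme is Brandes' algorithm, which is swapped in here for efficiency and is not described in the text. The function name, the adjacency-list input format and the star-network example are assumptions; the network is taken to be undirected and unweighted, and each unordered pair of end nodes is counted once.

```python
from collections import deque


def betweenness(adj_list):
    """Unnormalized betweenness of every node in an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj_list}
    for s in adj_list:
        # Breadth-first search from s: count shortest paths and record predecessors
        sigma = {v: 0 for v in adj_list}
        dist = {v: -1 for v in adj_list}
        preds = {v: [] for v in adj_list}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj_list[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path fractions, starting from the nodes farthest from s
        delta = {v: 0.0 for v in adj_list}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair {s, t} was visited twice, once from each end
    return {v: value / 2.0 for v, value in bc.items()}


# Hypothetical star network with N = 5: node 0 is the center
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
bc = betweenness(star)
n = len(star)
normalized = {v: value / ((n - 1) * (n - 2) / 2.0) for v, value in bc.items()}
```

For the star network the center lies on every shortest path between two leaves, so its unnormalized betweenness equals $(N-1)(N-2)/2 = 6$ and its normalized value is 1, while every leaf has betweenness 0.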
The average distance $d_i$ between node i and the other nodes in the network is:
Among them, $d_{ij}$ represents the distance from node i to node j. Therefore, another formula for calculating the average path length of the network is:
The average distance $d_i$ from node i to the other nodes also reflects its importance in the network to a certain extent. The closeness centrality of a node is the reciprocal of $d_i$, referred to as closeness for short and denoted by the symbol $CC_i$:
$$CC_i = \frac{1}{d_i}$$
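The quantities in this passage can be sketched in Python as follows. The normalization is an assumption: $d_i$ is taken here as the mean distance from node i to the other N - 1 nodes, and the average path length as the mean of the $d_i$; some texts divide by N instead. The function names and the three-node path example are hypothetical, and the network is assumed to be connected, undirected and unweighted.

```python
from collections import deque


def bfs_distances(adj_list, source):
    """Shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj_list[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def average_distances(adj_list):
    """d_i: mean distance from node i to the other N - 1 nodes (assumed normalization)."""
    n = len(adj_list)
    return {
        i: sum(bfs_distances(adj_list, i).values()) / (n - 1)
        for i in adj_list
    }


# Hypothetical example: path network 0 - 1 - 2
path = {0: [1], 1: [0, 2], 2: [1]}
d = average_distances(path)
closeness = {i: 1.0 / d_i for i, d_i in d.items()}  # CC_i = 1 / d_i
avg_path_length = sum(d.values()) / len(d)           # mean of d_i over all nodes
```

The middle node of the path has the smallest average distance and therefore the largest closeness, which matches the intuition that closeness rewards nodes near the center of the network.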
The number and quality of neighbor nodes both affect the importance of a node. This idea is used in the definition of eigenvector centrality. Let $x_i$ denote the importance measure of node i; then
$$x_i = c \sum_{j=1}^{N} a_{ij} x_j \tag{19}$$
where c is a proportionality constant. Let $A = (a_{ij})$ and $x = [x_1, x_2, \cdots, x_N]^{T}$. Then, formula (19) can be written in the following matrix form:
$$x = cAx$$
x is the eigenvector of the adjacency matrix A corresponding to the eigenvalue $c^{-1}$, so it is called eigenvector centrality. A basic method for calculating the vector x is as follows: after the initial value x(0) is given, the following iterative algorithm is used:
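A minimal Python sketch of such an iteration is given below. It assumes the update $x(t) = A\,x(t-1)$ followed by rescaling to unit length at every step; the rescaling, the stopping tolerance, the function name and the example network are choices made here for illustration. The example network is not bipartite, so the iteration converges.

```python
import math


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on the adjacency matrix, rescaled to unit length each step."""
    n = len(adj)
    x = [1.0] * n  # initial value x(0)
    for _ in range(iterations):
        # One step of the assumed update x(t) = A x(t - 1)
        new = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:
            return new  # network without edges: every score is zero
        new = [v / norm for v in new]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


# Hypothetical example: triangle 0-1-2 plus node 3 attached only to node 0
adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
x = eigenvector_centrality(adj)
# For a converged x, (A x)_i / x_i is the same for every node and approximates 1 / c
ax = [sum(adj[i][j] * x[j] for j in range(len(adj))) for i in range(len(adj))]
ratios = [ax[i] / x[i] for i in range(len(adj))]
```

Node 0 receives the highest score because it has the most neighbors and two of them are themselves well connected, while node 3, whose only neighbor is node 0, receives the lowest.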
Next, we analyze the convergence of the iterative algorithm. The eigenvalues of matrix A are all real numbers, denoted as $\lambda_i$ (i = 1, 2, ⋯ , N), and the corresponding eigenvectors are denoted as $V_i \in R^{N}$ (i = 1, 2, ⋯ , N). Therefore, the initial value x(0) can be expressed as a linear combination of this set of eigenvectors:
Based on formulas (20) and (21), the following results can be obtained:
If $\lambda_1$ is the largest eigenvalue of the matrix A and is a single root, then:
Therefore, the eigenvector centrality x should be the principal eigenvector, and formula (23) is:
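The convergence argument can be sketched as follows. The update rule $x(t) = A\,x(t-1)$ and the coefficient names $b_i$ are assumptions made for illustration and may differ from the original notation; the strict inequality $|\lambda_i| < \lambda_1$ for $i \ge 2$ is assumed in addition to $\lambda_1$ being a single root.

```latex
\begin{aligned}
x(0) &= \sum_{i=1}^{N} b_i V_i, \\
x(t) &= A^{t} x(0) = \sum_{i=1}^{N} b_i \lambda_i^{t} V_i
      = \lambda_1^{t} \sum_{i=1}^{N} b_i \left( \frac{\lambda_i}{\lambda_1} \right)^{t} V_i, \\
\lim_{t \to \infty} \frac{x(t)}{\lambda_1^{t}} &= b_1 V_1 .
\end{aligned}
```

Under these assumptions the iterate aligns with the principal eigenvector whenever $b_1 \neq 0$, which corresponds to $Ax = \lambda_1 x$, that is, $c = 1/\lambda_1$.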
In the study of complex systems of human behavior, quantitatively characterizing the distribution of the time intervals between actions is important, and burstiness is an important way of characterizing that distribution. Bursts have been observed in a large number of systems, from email patterns to earthquakes to gene expression. In general, short periods of enhanced activity follow long periods of inactivity.
The coefficient of variation is defined as the ratio of the standard deviation $\sigma_\tau$ to the mean $m_\tau$ of the time intervals. For a Poisson process, the time intervals follow an exponential distribution whose mean and standard deviation are equal, so the coefficient of variation is 1. For a perfectly regular signal whose interval distribution is a δ function, the coefficient of variation is 0, and for a heavy-tailed distribution with infinite variance, the coefficient of variation is infinite.
Burstiness can be defined in terms of the coefficient of variation:
$$B = \frac{\sigma_\tau / m_\tau - 1}{\sigma_\tau / m_\tau + 1} = \frac{\sigma_\tau - m_\tau}{\sigma_\tau + m_\tau}$$
The definition is meaningful only when the mean and standard deviation exist, and when B is meaningful, its value lies between -1 and 1. The value of B is related to the burstiness of the signal: B = 1 corresponds to the most bursty signal, B = 0 corresponds to a neutral signal (such as a Poisson process), and B = -1 corresponds to a completely regular (periodic) signal.
The characteristics of a signal are not captured by burstiness alone, and the distribution of the time interval sequence of events can also be characterized by memory. The joint probability distribution $P(\tau, \tau'; k)$, parameterized by the separation k, is defined as the probability density of two intervals $\tau$ and $\tau'$ separated by k events, and it contains the most information about the two-point correlation characteristics. A simple metric is the Pearson correlation coefficient of consecutive interval values $(\tau_i, \tau_{i+1})$, which is defined as the memory coefficient M:
$$M = \frac{1}{n_\tau - 1} \sum_{i=1}^{n_\tau - 1} \frac{(\tau_i - m_1)(\tau_{i+1} - m_2)}{\sigma_1 \sigma_2}$$
Among them, $n_\tau$ is the number of intervals in the signal, $m_1$ ($m_2$) and $\sigma_1$ ($\sigma_2$) are the mean and standard deviation of $\tau_i$ ($\tau_{i+1}$), and $i = 1, \cdots, n_\tau - 1$. The M value is a biased estimator of the signal autocorrelation, which is more suitable for finite real-world signals, especially when there may be long-range correlation in the system. According to the definition, the value of M is between -1 and 1. When M > 0, a long (short) interval is likely to be followed by a long (short) interval, indicating that the behavior has a memory effect. However, when M < 0, a short (long) interval is likely to appear after a long (short) interval, which indicates that the behavior has an anti-memory effect.
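Both indicators can be computed directly from a sequence of time intervals. The sketch below follows the definitions of B and M given above and uses population standard deviations; the function names and the two toy interval sequences are hypothetical and are not data from the study.

```python
from statistics import fmean, pstdev


def burstiness(intervals):
    """B = (sigma - m) / (sigma + m) for a sequence of time intervals."""
    m = fmean(intervals)
    sigma = pstdev(intervals)
    return (sigma - m) / (sigma + m)


def memory_coefficient(intervals):
    """M: Pearson correlation between consecutive intervals (tau_i, tau_{i+1})."""
    first, second = intervals[:-1], intervals[1:]
    m1, m2 = fmean(first), fmean(second)
    s1, s2 = pstdev(first), pstdev(second)
    # Undefined (division by zero) when either subsequence is constant
    cov = fmean([(a - m1) * (b - m2) for a, b in zip(first, second)])
    return cov / (s1 * s2)


# Hypothetical interval sequences
regular = [2.0, 2.0, 2.0, 2.0]                  # perfectly periodic signal
alternating = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]    # short and long intervals alternate

b_regular = burstiness(regular)
m_alternating = memory_coefficient(alternating)
```

The regular sequence has zero standard deviation and therefore B = -1, and the alternating sequence, in which every short interval is followed by a long one, gives M close to -1, the anti-memory case described above.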
This study conducted an in-depth analysis of email exchanges between individual users, calls and text messages between mobile users, call records of a bank customer service terminal, and two further sets of data from a university campus, namely the students’ book borrowing records in the library and the print and submission task records of the administration department. The study found that many human behaviors show the distinct characteristic of strong bursts and weak memory on the time axis. The pink elliptical area in Fig. 2 marks the location of the above five data sets on the map. Similar phenomena have been studied in other fields, and the time-dependent networks formed there have similar characteristics.

The position of the time series of events in several real systems on the B – M map.
For a complex system with a large B value, the standard deviation of the distribution of its time interval sequence is likely to be divergent, and in a system with severe divergence, the B value is large, but the M value is generally small. Although there is never real divergence in a finite system, the standard deviation of the time interval distribution of real human behavior is divergent. The shortcomings of this mathematical definition are worrying. Since the distribution of individual graphs in the system may be very scattered, the average B and M values of the system are not necessarily valuable.
The modules of the student English key competences evaluation system usually include user login module, information query module, professional training plan query module, student information import and export module, comprehensive quality assessment module, and system management module. The model separates the student administrators from the tedious assessment of comprehensive quality of college students, and no longer conducts inefficient manual labor, but focuses on the cultivation plan of college students’ quality, which improves the efficiency and quality of student management. At the same time, the model also greatly reduces the non-objectivity caused by human interference, and obtains objective, true and reliable evaluation results.
The system records the students’ various performance indicators, so that the teaching management departments at all levels can follow the students’ English course results in real time and supervise the students’ learning. Obviously, the level of a student’s various evaluation indicators directly affects the level of the comprehensive quality evaluation. For example, the student’s course scores are divided into failing < pass < good < excellent (which can be expressed numerically), and the comprehensive quality assessment results can also be divided into failing < pass < good < excellent (1 < 2 < 3 < 4). It can be seen that each indicator has a certain monotonic relationship with the comprehensive quality evaluation result: the higher the student’s score, the better the comprehensive evaluation result. If each index is regarded as a conditional attribute and the comprehensive quality evaluation result is regarded as a decision attribute, then the task can be transformed into an ordered classification problem. Therefore, this paper applies the ordered decision tree algorithm, which deals with ordered classification problems, to the comprehensive evaluation system of college students. Since the application of the ordered decision tree algorithm in this system suffers from problems such as repeatedly mined rules and a large amount of calculation, this article, based on the orderly mutual information, combines the maximum correlation and minimum redundancy theory in an attribute selection algorithm to optimize the key competences evaluation function for students in English. Its structure is shown in Fig. 3.

System module structure diagram.
The data set formed by the online teaching evaluation questionnaire is used as the experimental data. A total of 290 data records are used for the regression analysis and the CLOF algorithm. The experiment uses Python as the development language in the Windows 10 operating environment, and the experimental platform is Eclipse+Pydev. First, the regression analysis method is used to detect abnormal data. In the experiment, the LinearRegression estimator provided by the sklearn machine learning library is used to construct a regression prediction model. The comparison between the predicted value obtained by this model and the real evaluation value is shown in Fig. 4.

Comparison between the predicted value obtained by regression analysis and the true value.
There is a deviation between the predicted value and the true value. The deviation between the true value of the individual data records and the predicted value is very large, and these data that deviate from the overall data distribution are likely to be abnormal data. According to the overall data distribution, the threshold of the deviation range is set to 1, that is, data records with deviations greater than 1 are regarded as abnormal data points.
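The thresholding step described above can be sketched as follows. The study uses the LinearRegression estimator from sklearn; the version below is a dependency-free stand-in that fits an ordinary least squares line to a single hypothetical feature, so that it runs without any third-party library. The function names and the ten data points are invented for illustration and are not the questionnaire data; only the deviation threshold of 1 is taken from the text.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_abnormal(xs, ys, threshold=1.0):
    """Indices of records whose deviation from the fitted value exceeds the threshold."""
    slope, intercept = fit_line(xs, ys)
    return [
        i
        for i, (x, y) in enumerate(zip(xs, ys))
        if abs(y - (slope * x + intercept)) > threshold
    ]


# Invented data: ten records on a straight line, with record 5 shifted upward by 3
xs = list(range(10))
ys = [0.5 * x + 1.0 for x in xs]
ys[5] += 3.0

flagged = flag_abnormal(xs, ys)
```

Only the shifted record is flagged here, because the single outlier moves the fitted line by far less than the threshold. With several features, the same residual test could be applied to the predictions of a multivariate regression model instead.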
Based on the above analysis, the performance analysis of this research model is carried out. In this research, the performance of the algorithm is studied by comparison. This research model is compared with the traditional BP neural network scoring model. The research model is named ML model, and the results are shown in Table 1.
Comparison table of model scoring accuracy
The prediction errors between the research model, the BP neural network model and the true values are counted, and the results are shown in Table 2 and Fig. 6.
Statistical table of scoring errors

Comparison diagram of model scoring accuracy.

Statistical diagram of scoring errors.
As can be seen from the above tables and figures, the scores produced by this research model are closer to the real scores than those of the BP neural network model, so the model has higher accuracy and a certain practical effect.
This research hopes to explore the impact of informatization on the key competences of students in English subjects through empirical research, and to find a model for the cultivation of students’ English key competences under the conditions of informatization, so as to provide a reference for education departments, schools and front-line teachers in the cultivation of students’ English key competences. Teaching and research personnel engaged in the field of English must not only have good professional qualities, a solid foundation of educational theory, and profound cultural knowledge, but also have the ability to absorb scientific information and update their knowledge base. At the same time, they need to take advantage of technology and actively explore how to apply artificial intelligence technology to English teaching practice to benefit more English learners. Analysis and research show that using artificial intelligence to assist language learning offers many technical advantages for the training of key competences. At the same time, the application of artificial intelligence technology in the process of educational informatization can give teaching researchers inspiration on educational issues, which has certain practical significance. Finally, in this study, the performance of the model is analyzed through a comparative test. The results show that the scores produced by the research model are closer to the real scores than those of the comparison model, so the model has higher accuracy and a certain practical effect.
