Evaluation of English online teaching based on remote supervision algorithms and deep learning

Abstract

The automatic evaluation system for English online teaching is unstable in actual teaching evaluation. Therefore, how the automatic evaluation system can better adapt to high school teaching still needs more in-depth theoretical and practical discussion. According to the actual needs of English online teaching, this article combines remote supervision and deep learning algorithms, builds a system structure for the English online teaching evaluation process, and simulates and analyzes the application of supervision algorithms in the teaching process. Moreover, this article evaluates the actions and status of students during the learning process from the aspects of teacher evaluation and student evaluation, and also scores the teacher’s teaching process. To study the practical effect of this system in English online teaching, this paper designs experiments to evaluate the effect of the model on online English teaching. The research results show that the model constructed in this paper has good performance.

Keywords

Remote supervision, deep learning, English teaching, online teaching, evaluation

1 Introduction

In order to better promote the development of harmonious classrooms, the teaching concept under the perspective of ecology has attracted people’s attention and has been applied in teaching practice. This concept promotes the interaction between teachers and students, establishes a multi-dimensional teaching evaluation method, and emphasizes the overall development of students. Moreover, this concept mainly pursues the concept of sustainable development of natural ecology, which also provides a new way for formative evaluation of online English classrooms. The implementation of the new curriculum standard reform can better reflect the high school English ecological classroom concept. In addition, the new curriculum standard reform examines students’ comprehensive ability to use language, focuses on the cultivation of students’ active learning ability and interest in learning, and emphasizes the overall development of students. In teaching activities, teaching evaluation is a very important part. However, due to various reasons, a complete teaching evaluation mechanism has not been formed, which has affected the development of the entire education. Therefore, in the future development of education, the formation of formative evaluation is an inevitable measure, and reform of formative evaluation is also a necessary measure. It can be seen from this that the ecological classroom concept has both similarities and innovations compared with the new curriculum standard [1].

The combined study of teaching and ecology opens up a whole new perspective when evaluating English classrooms and requires multi-faceted thinking in theoretical research. The study of college English ecological classroom is a relatively new field. Under the existing educational background, it is necessary to promote the implementation of new curriculum standards and better reform of educational curriculum. College students are in a special period, so formative evaluation will have a huge impact on them. In the case of ecological theory as a guide, formative evaluation can solve a variety of situations such as high school classroom boredom and students’ aversion to learning. Moreover, classroom dynamic evaluation can better help students understand themselves, cultivate students’ interest in learning, improve their learning methods, and achieve their overall development. In addition, it can improve the English learning environment, improve the overall teaching quality, and enhance the professional quality of teachers [2].

First, this research on the classroom departs from the traditional research model: it starts from the perspective of ecology and applies a variety of evaluation methods according to the learning situation of the students. Moreover, this method can provide better teaching suggestions to teachers of English online courses, and can better help them complete their daily work. Secondly, at present, a lot of research focuses on junior high school, vocational high school, technical secondary school and university. However, there are few studies on the evaluation of college English teaching. Therefore, the key of this article is how to construct an ecological research perspective in the process of constructing the formative evaluation model of college English classrooms [3].

2 Related work

For different detection targets, scholars have proposed many models and algorithms for recognition and detection. In particular, in recent years, with the continuous development of deep learning technology, the performance of object detection has come closer to that of human visual recognition, and sometimes even exceeds it [4]. The steps of traditional object recognition and detection are extraction of candidate regions, extraction of feature expressions, and object classification/localization. First, the supervised learning method selects some possible object location areas through sliding windows or image segmentation. Then, it uses traditional algorithms or CNN methods to extract regional features to form feature vectors. Finally, it uses a classifier to classify the features of each region, and then obtains the final bounding box by means of non-maximum suppression or bounding-box regression. However, although this supervised learning method has a good classification effect, it is difficult to generalize and hard to use in actual detection scenarios. The reason is that supervised learning needs to know the type and location information of the objects in the picture at the same time, that is, instance-level bounding-box annotation is required [5]. In addition, this method currently relies on manual labeling. On the one hand, labeling consumes a lot of manpower and material resources, and on the other hand, it is easily affected by subjective judgment. Therefore, weakly supervised object recognition using only image-level annotations has gradually become a research focus [6]. Traditional weakly supervised detection and recognition is solved by multi-instance learning. In recent years, a group of scholars have found that a CNN trained for classification contains the location information of the object. However, when doing classification tasks, this location information is hidden.

The method of fully supervised learning includes three categories of algorithms. First, traditional methods are defined relative to deep learning. They appeared in large numbers before the rise of deep learning algorithms, and most of them characterize the target objects by manually designed features. Commonly used traditional features include SIFT features, HOG features (histogram of oriented gradients) and Haar features [7]. The HOG feature comes from the histogram of gradient directions of the image, which can describe the contour and shape features of the object well, and it is widely used in object detection in combination with the support vector machine (SVM), as in the algorithm proposed in [8]. Later, many scholars used HOG and SVM to solve problems related to pedestrian detection and object detection, and also achieved good results. Since then, many improvements have been made to the HOG feature. The literature [9] combined the HOG feature with the LBP feature to greatly increase the discriminative information, which not only can detect the overall contour of the pedestrian, but also has a good effect on the details. The SIFT algorithm extracts features by constructing a scale space. The direction of the SIFT feature vector is the main direction of the gradient of the neighborhood of the feature point, which makes SIFT invariant to scale and direction. The literature [10] combined SIFT features and the SVM to achieve multi-target detection. The SURF algorithm proposed in [11] overcomes the slow speed of the SIFT algorithm, and the overall performance of the algorithm is better. However, the traditional features based on manual design have the disadvantages of single feature, complex calculation and poor adaptability.

With more and more image data available, training deep learning models on big data has received increasing attention. Deep learning has made outstanding contributions in the field of image recognition and detection, in which the deep convolutional neural network is an important calculation model. The literature [12] applied the backpropagation technique to CNNs and designed the famous LeNet model for handwritten character recognition. In the ILSVRC competition, the CNN model constructed in the literature [13] has achieved good results, and the best performer in this competition is the AlexNet model. Since then, recognition models based on deep convolutional neural networks have emerged, including the ZFNet model, the VGGNet model, the GoogLeNet model [14] and the ResNet model [15].

For object detection, it is not only necessary to classify the image, but also to locate the objects in it. A common method is to select some possible target areas by sliding window and then classify them. This method obtains possible object areas exhaustively, but the entire process takes a long time and requires a huge amount of calculation, so it has gradually fallen out of use. Later, selection methods based on segmentation were developed, such as the SS algorithm [16] and the EB algorithm [17]. These methods have good segmentation results and can be used as candidate region selection methods. The first object detection algorithm of this kind, RCNN [18], combines region segmentation with deep learning technology to achieve good classification and positioning effects. The region extraction algorithm it uses is the selective search algorithm. The literature [18] designed the SPPNet spatial pyramid pooling network, which overcomes the shortcoming that RCNN cannot handle multi-scale image input. Inspired by SPPNet, the literature [19] modified RCNN to obtain Fast-RCNN. It no longer calculates the convolution for each candidate area, but calculates the convolution on the entire image, and uses ROI for feature normalization, which not only removes the need for a uniform input image size, but also greatly reduces the number of convolutions and improves the speed of training and testing. The literature [21] proposed the FasterRCNN model on this basis and integrated the region selection into the model to realize an end-to-end structure. The literature [23] discusses the construction of a directed acyclic graph for video coding algorithms for motion estimation in parallel reconfigurable computing systems. The partitioning algorithm also plays a key role in optimizing the encoding of images.

The literature [24] dealt with the use of IoT and big data analytics with the Hadoop ecosystem in real-time environments. The implementation of an IoT-based smart city is accomplished through the above-mentioned processes. The article [25] centers on IoT and its significant role in improving human practices and efforts. It also addressed the combination of diverse data from different resources connected to the internet. The article [26] discusses various issues in the vehicular communication field and proposes a cooperative centralized and distributed spectrum sensing model. With the cooperative cognitive model, interference and various hidden issues are minimized. The article [27] discusses problems such as the tremendous amount of big data, and introduces the SmartBuddy idea of a smart and intelligent world using individual activities and human resources [28, 29].

3 Graph-based multi-core learning algorithm

The multi-core (multiple kernel) learning algorithm is applied to the support vector machine (SVM), and the weight vector p = (p₁, ⋯ , p_m) of m embedded kernels under the SVM framework is introduced. Specifically, we first introduce the construction process of the graph and then the corresponding optimization method based on semi-supervised multi-core learning.

Before solving the Laplacian matrix of graphs, we first discuss some methods of constructing graphs.

We assume that g = 〈v, ɛ〉 is a graph derived on data space X. Among them [22]: $v = \{x_{i}\}_{i = 1}^{N}$ (1)

is the set of all vertices, and ɛ is the set of edges.

Generally, constructing the graph requires the following two steps: (1) the adjacency graph is constructed; (2) the weights of the edges are calculated. There are two types of methods for constructing the adjacency graph:

ɛ-NN graph

For ɛ ∈ R⁺, if d (x_i, x_j) ⩽ ɛ, nodes x_i and x_j are connected. Among them, d (· , ·) is a distance measurement function, and a commonly used example is Euclidean distance, that is: $d (x_{i}, x_{j}) = {∥ x_{i} - x_{j} ∥}_{2}$ (2)

k-NN graph

For k ∈ N⁺, if node x_i is one of the k nearest neighbors of node x_j, nodes x_i and x_j are connected. Moreover, the determination of neighbor nodes is also based on the distance metric d (· , ·).

After the graph construction is completed, the adjacency matrix W of the graph is obtained. The next step is to determine the weights of the edges in W. Generally, there are two weight selection options.

The simplest solution is to set the weight of all edges to 1, that is, if nodes x_i and x_j are connected, then: $W_{ij} = 1$ (3)

The weight is determined by the distance between the nodes, so that closer nodes receive larger weights. If nodes x_i and x_j are connected, we set: $W_{ij} = \exp \left( - \frac{d (x_{i}, x_{j})}{t} \right), t \in R^{+}$ (4)

Obviously, the distance measurement function d (· , ·) plays a vital role in the process of constructing the graph. Commonly used distance measurement functions include Euclidean distance, cosine distance, tangent distance, and density-sensitive distance.
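The two construction steps can be sketched in NumPy as follows. This is only an illustrative sketch: it assumes that the ɛ-NN graph connects nodes whose distance is at most ɛ, that the k-NN graph is symmetrized, and that the distance-based weight decays as exp(−d/t). The function names and the toy points are hypothetical and do not come from the original method description.

```python
import numpy as np

def pairwise_euclidean(X):
    """Return the N x N matrix of Euclidean distances d(x_i, x_j)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def epsilon_graph(dist, eps):
    """eps-NN graph: connect i and j when their distance is at most eps."""
    adj = dist <= eps
    np.fill_diagonal(adj, False)  # no self-loops
    return adj

def knn_graph(dist, k):
    """k-NN graph: connect i and j when x_i is among the k nearest neighbors of x_j."""
    n = dist.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    for j in range(n):
        order = np.argsort(dist[j])
        neighbors = [i for i in order if i != j][:k]
        adj[neighbors, j] = True
    return adj | adj.T  # symmetrize so that the graph is undirected

def edge_weights(adj, dist, t=None):
    """Binary weights when t is None, otherwise heat-kernel weights exp(-d / t)."""
    if t is None:
        return adj.astype(float)
    return np.where(adj, np.exp(-dist / t), 0.0)

# Toy example: three nearby points and one distant point.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
dist = pairwise_euclidean(X)
W_eps = edge_weights(epsilon_graph(dist, eps=1.5), dist)      # binary weights
W_knn = edge_weights(knn_graph(dist, k=1), dist, t=2.0)       # heat-kernel weights
```

In this toy example the distant point has no neighbor within ɛ, so it is isolated in the ɛ-NN graph, whereas the k-NN graph always connects every node to at least one neighbor.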

After obtaining W, the Laplacian matrix of the graph is calculated, that is: $L = D - W$ (5)

Among them, D is the diagonal degree matrix: $\begin{matrix} D_{ii} = \sum_{j} W_{ij} \\ D_{ij} = 0, i \neq j \end{matrix}$ (6)
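As a small check of formulas (5) and (6), the following sketch computes the unnormalized Laplacian of a three-node graph. The helper name `graph_laplacian` and the example weight matrix are illustrative assumptions.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W with D_ii = sum_j W_ij."""
    D = np.diag(W.sum(axis=1))
    return D - W

# Star graph: node 0 is connected to nodes 1 and 2.
W = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
L = graph_laplacian(W)
```

Each row of L sums to zero and L is positive semi-definite, which is what makes it usable as a smoothness regularizer over the graph.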

In this section, we introduce how to use a set of basis kernel functions $\{k_{k} (x, \cdot)\}$ and the geometric information carried by a large amount of unlabeled data to construct new kernel functions $\{{\tilde{k}}_{k} (x, \cdot)\}$ . In particular, we define the following formula: $G = \frac{ρ}{n^{2}} L$ (7)

Among them, ρ is a constant. The new kernel function is then: ${\tilde{k}}_{k} (x, z) = k_{k} (x, z) - {[k_{k}]}_{x} {(I + \frac{ρ}{n^{2}} LK)}^{- 1} \frac{ρ}{n^{2}} L {[k_{k}]}_{z}$ (8)
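To make formula (8) concrete, the sketch below evaluates the embedded kernel on the n sample points themselves, where it becomes the matrix K − K (I + (ρ/n²) L K)⁻¹ (ρ/n²) L K. It assumes the inverse form of the formula (as in formula (17)); the function name `embedded_gram`, the Gaussian toy kernel and the chain graph are illustrative assumptions.

```python
import numpy as np

def embedded_gram(K, L, rho):
    """Gram matrix of the embedded kernel of formula (8) on the n sample points."""
    n = K.shape[0]
    M = (rho / n ** 2) * L
    # Solve (I + M K) Z = M K rather than forming the inverse explicitly.
    Z = np.linalg.solve(np.eye(n) + M @ K, M @ K)
    return K - K @ Z

# Toy data: five points on a line, a Gaussian basis kernel, and a chain graph.
x = np.arange(5, dtype=float)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2.0)
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
L = np.diag(W.sum(axis=1)) - W
K_tilde = embedded_gram(K, L, rho=10.0)
```

With ρ = 0 the graph has no influence and the basis kernel is recovered unchanged; larger ρ gives the graph structure more weight.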

If it is assumed that we use SVM as the classifier, the decision function to be searched can be written as $\sum_{k = 1}^{m} f_{k} (x) + b$ . Among them, f_k belongs to a reproducing kernel Hilbert space ${\tilde{H}}_{k}$ related to the basis kernel function ${\tilde{k}}_{k} (x, \cdot)$ .

The scenario where multi-core learning is extended to semi-supervised learning can be written as follows: $\begin{matrix} \min_{\{f_{k}\}_{k = 1}^{m}, b, ξ ⩾ 0} \frac{1}{2} ∥ f ∥_{\tilde{H}}^{2} + \frac{C}{l} \sum_{j = 1}^{l} ξ_{j} \\ s . t . \; y_{j} \sum_{k = 1}^{m} f_{k} (x_{j}) + b y_{j} ⩾ 1 - ξ_{j}, \forall j \end{matrix}$ (9)

We first decompose f into a set of functions $\{f_{k}\}_{k = 1}^{m}$ with f_k in ${\tilde{H}}_{k}$ , weighted by the vector p = (p₁, ⋯ , p_m). At this time, the problem is rewritten as: $\min_{p \in P} J (p)$ (10)

Among them, $\begin{matrix} J (p) = \min_{\{f_{k}\}_{k = 1}^{m}, b, ξ ⩾ 0} \frac{1}{2} \sum_{k = 1}^{m} \frac{1}{p_{k}} ∥ f_{k} ∥_{{\tilde{H}}_{k}}^{2} + \frac{C}{l} \sum_{j = 1}^{l} ξ_{j} \\ s . t . \; y_{j} \sum_{k = 1}^{m} f_{k} (x_{j}) + b y_{j} ⩾ 1 - ξ_{j}, \forall j \end{matrix}$ (11)

Among them, P represents the domain of the kernel function weights, and the commonly used P is defined as follows: $P = \{p \in ℝ^{m} : p^{T} e = 1, 0 ⩽ p ⩽ 1\}$ (12)

The constraint on p in the above formula is commonly referred to as L₁-MKL. However, our model for solving p is not limited to the L₁-norm and can be generalized to other types of norms, such as the L₂-norm. For example, L₂-MKL limits p to a ball, and the details are as follows: $P = \{p \in ℝ^{m} : p^{T} p ⩽ 1, 0 ⩽ p ⩽ 1\}$ (13)

The following theorem gives the dual of the problem in formula (11).

Theorem 1. The dual problem of J (p) is as follows: $\begin{matrix} \max_{α \in ℝ^{l}} α^{T} e - \frac{1}{2} {(α \circ y)}^{T} (\sum_{k = 1}^{m} p_{k} {\tilde{K}}_{k}) (α \circ y) \\ s . t . \; α^{T} y = 0, \; 0 ⩽ α ⩽ C \end{matrix}$ (14)

Among them, ${[{\tilde{K}}_{k}]}_{ij} = {\tilde{k}}_{k} (x_{i}, x_{j})$ (15)

It should be noted that the calculation of ${\tilde{K}}_{k}$ here incorporates the information of labeled and unlabeled data, so it belongs to the category of semi-supervised learning. When the kernel function is fixed, the problem in the above formula is a typical L₂-SVM problem, and it is a convex problem, which can be solved as a standard multi-core learning problem. The overall optimization goal is as follows: $\begin{matrix} \min_{p \in P} \max_{α \in ℝ^{l}} α^{T} e - \frac{1}{2} {(α \circ y)}^{T} (\sum_{k = 1}^{m} p_{k} {\tilde{K}}_{k}) (α \circ y) \\ s . t . \; α^{T} y = 0, \; 0 ⩽ α ⩽ C \end{matrix}$ (16)
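The objective inside formula (16) is simple to evaluate once the embedded kernel matrices are available. The following sketch shows the evaluation for a given weight vector p and dual vector α; the function names and the two-sample toy kernels are illustrative assumptions, and the sketch does not solve the optimization itself.

```python
import numpy as np

def combined_kernel(p, kernels):
    """Weighted sum of the embedded kernel matrices, sum_k p_k * K_k."""
    return sum(p_k * K_k for p_k, K_k in zip(p, kernels))

def dual_objective(p, alpha, y, kernels):
    """alpha^T e - 1/2 (alpha o y)^T (sum_k p_k K_k) (alpha o y)."""
    ay = alpha * y  # element-wise product alpha o y
    return alpha.sum() - 0.5 * ay @ combined_kernel(p, kernels) @ ay

# Two toy kernel matrices on two samples with opposite labels.
kernels = [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])   # satisfies alpha^T y = 0
p = np.array([0.5, 0.5])       # lies on the simplex of formula (12)
value = dual_objective(p, alpha, y, kernels)
```

For fixed α the objective is linear in p, which is the property exploited when the kernel weights are updated in the outer loop.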

The structure of the graph plays a vital role in graph-based semi-supervised learning, and the Laplacian matrix of a graph depends entirely on that structure. Choosing a reasonable graph is therefore an important factor for learning performance. In this section, we introduce a method that uses the graph-related parameters introduced earlier to select multiple graphs simultaneously, and calculates the final Laplacian matrix from these graphs to describe the data more accurately.

From the embedded kernel function formula, we find that two factors affect the performance of optimization:

The choice of the embedded kernel function k (· , ·);

The structure of the graph used to calculate the Laplacian matrix L of the graph.

Both the selection of kernel function parameters and the selection of graph parameters are very difficult model selection problems. Selecting these two sets of parameters by cross-validation produces a very high computational complexity. In particular, when the number of training samples is small, the calculation complexity is higher. In order to solve the above problems, in this section, we extend the multi-core learning framework so that the width of the kernel function is determined from a set of candidate kernel parameters and, at the same time, the graph is determined from a set of candidate graph parameters.

We set a series of distance metric functions as D = (d₁, ⋯ , d_r), a series of candidate neighbor numbers as K = (k₁, ⋯ , k_s), and a series of heat kernel widths as T = (t₁, ⋯ , t_q). From these, a set of Laplacian matrices is constructed, that is, L_i = D_i - W_i, i = 1, ⋯ , u. Among them, u = r × s × q is the number of all graphs.

In this way, for the j-th graph, j = 1, ⋯ , u, the embedded kernel function based on the i-th basis kernel function can be calculated according to the following formula (i = 1, ⋯ , v, where v is the number of basis kernel functions under the supervised multi-core learning framework): ${\tilde{k}}_{ij} (x, z) = k_{i} (x, z) - k_{x} {(I + \frac{ρ}{n^{2}} L_{j} K)}^{- 1} \frac{ρ}{n^{2}} L_{j} k_{z}$ (17)

Under the semi-supervised multi-core learning framework, the number of basis kernel functions is m = u × v. Because each kernel matrix ${\tilde{K}}_{l}, l = 1, \dots, m$ incorporates both labeled and unlabeled data, it can be used in the case of semi-supervised learning. The final optimization problem can be written as: $\min_{p \in P} \max_{α \in ℝ^{l}} α^{T} e - \frac{1}{2} {(α \circ y)}^{T} (\sum_{k = 1}^{uv} p_{k} {\tilde{K}}_{k}) (α \circ y)$ (18)

Since multiple basis kernel functions in a set of different graphs are embedded, an optimal kernel matrix is obtained. Therefore, we call this problem of multi-core embedding multi-graphs semi-supervised multi-core multi-graph learning.
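The counting argument u = r × s × q and m = u × v can be illustrated by enumerating the candidate settings. In the sketch below all candidate values and kernel labels are hypothetical placeholders chosen only for illustration.

```python
from itertools import product

# Hypothetical candidate sets (not taken from the experiments).
metrics = ["euclidean", "cosine"]                     # D = (d_1, ..., d_r), r = 2
neighbor_counts = [5, 10, 15]                         # K = (k_1, ..., k_s), s = 3
heat_widths = [0.5, 1.0, 2.0]                         # T = (t_1, ..., t_q), q = 3
basis_kernels = ["gauss_0.5", "gauss_1", "poly_2"]    # v = 3 basis kernels

# Every (metric, k, t) triple defines one graph and hence one Laplacian.
graph_configs = list(product(metrics, neighbor_counts, heat_widths))
u = len(graph_configs)

# Every (basis kernel, graph) pair defines one embedded kernel.
embedded_kernels = list(product(basis_kernels, graph_configs))
m = len(embedded_kernels)
```

Because m grows multiplicatively with each candidate set, the weight vector p has to select among many embedded kernels, which is why an efficient update of p matters.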

For large-scale data sets, the inversion of the matrix $I + \frac{ρ}{n^{2}} L_{j} K$ in formula (17) is still very expensive (the time complexity of inverting an n-order square matrix is O (n³)). In order to reduce the computational complexity, we use approximate calculations instead of the full inversion. Specifically, we first sample and approximate the kernel matrix K, $K \approx K_{ab} K_{bb}^{- 1} K_{ba}$ (19)

Among them, $K_{ab} \in ℝ^{n \times k}$ consists of k columns randomly extracted from the matrix K, $K_{ba} \in ℝ^{k \times n}$ (k ≪ n) consists of the corresponding k rows of K, and $K_{bb} \in ℝ^{k \times k}$ is the square matrix composed of the common elements of K_ab and K_ba. After sampling, the kernel matrix can be written in the above decomposition form. Substituting this result into formula (17), we get the following result: $\begin{matrix} {\tilde{k}}_{ij} (x, z) = k_{i} (x, z) - k_{x} {(I + \frac{ρ}{n^{2}} L_{j} K)}^{- 1} \frac{ρ}{n^{2}} L_{j} k_{z} \\ \approx k_{i} (x, z) - k_{x} {(I + \frac{ρ}{n^{2}} L_{j} K_{ab} K_{bb}^{- 1} K_{ba})}^{- 1} \frac{ρ}{n^{2}} L_{j} k_{z} \end{matrix}$ (20)

The inverse ${(I + \frac{ρ}{n^{2}} L_{j} K_{ab} K_{bb}^{- 1} K_{ba})}^{- 1}$ in the above equation can be computed with the Woodbury matrix identity: $\begin{matrix} {(I + \frac{ρ}{n^{2}} L_{j} K_{ab} K_{bb}^{- 1} K_{ba})}^{- 1} = \\ I - \frac{ρ}{n^{2}} L_{j} K_{ab} {(K_{bb} + \frac{ρ}{n^{2}} K_{ba} L_{j} K_{ab})}^{- 1} K_{ba} \end{matrix}$ (21)

In this way, the original inversion of the n-order square matrix in the formula is transformed into the inversion of a k-order square matrix. Moreover, because k ≪ n, the operation efficiency is improved.
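The reduction from an n × n inverse to a k × k inverse can be checked numerically. The sketch below builds a Nyström approximation from k sampled columns and compares the direct inverse with the Woodbury form (I + c L K_ab K_bb⁻¹ K_ba)⁻¹ = I − c L K_ab (K_bb + c K_ba L K_ab)⁻¹ K_ba, with c = ρ/n². The random data, the ring graph and the values of n, k and ρ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, rho = 60, 8, 100.0
c = rho / n ** 2

# Gaussian kernel matrix on random points (toy data).
X = rng.normal(size=(n, 3))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq / 2.0)

# Laplacian of a ring graph, standing in for one of the graphs L_j.
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 1.0
W = np.maximum(W, W.T)
L = np.diag(W.sum(axis=1)) - W

# Nystrom sampling: k random columns, the matching rows, and their overlap.
cols = rng.choice(n, size=k, replace=False)
K_ab = K[:, cols]
K_ba = K_ab.T
K_bb = K[np.ix_(cols, cols)]

# Direct route: invert an n x n matrix.
K_nystrom = K_ab @ np.linalg.solve(K_bb, K_ba)
direct = np.linalg.inv(np.eye(n) + c * L @ K_nystrom)

# Woodbury route: only a k x k system has to be solved.
small = K_bb + c * K_ba @ L @ K_ab
woodbury = np.eye(n) - c * L @ K_ab @ np.linalg.solve(small, K_ba)
```

Both routes give the same matrix up to rounding error, but the Woodbury route only factorizes the k × k matrix `small`.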

4 Optimization process

In this section, we will introduce the optimization process for semi-supervised multi-core learning problems.

In recent years, many newly emerging optimization algorithms have been applied to large-scale multi-core learning, such as Semi-infinite Linear Programming (SILP), Subgradient Descent (SD) and the level method. The above methods all adopt the idea of alternating optimization when solving the multi-core learning optimization problem, that is, the optimization process is divided into two stages. The first stage is the inner loop, which iteratively optimizes a standard SVM problem. The second stage is the outer loop, which updates the weight vector p of the kernel functions. There are many mature ways to optimize an SVM, so this article focuses on how to solve for the optimal weight vector p.

We separately analyze and compare the three previously introduced methods for large-scale multi-core learning: semi-infinite linear programming, sub-gradient descent and the level method. Then, we choose one of them as the specific method to optimize the kernel weight vector p. Semi-infinite linear programming constructs a tangent plane model for the objective function, and updates the weight p of the kernel functions by solving a corresponding linear programming problem. It can be applied to large-scale multi-core learning scenarios, but its disadvantage is that the convergence speed is slow. Sub-gradient descent solves the training problem of multi-core learning through simple sub-gradient steps. However, because the sub-gradient descent technique is memoryless, it cannot use the gradients of previous iterations in the calculation process, although this information is very valuable for speeding up the solution.

The tangent plane model is defined as follows: $g^{i} (p) = \max_{1 ⩽ j ⩽ i} \left[ f (p^{j}, α^{j}) + {(p - p^{j})}^{T} \nabla_{p} f (p^{j}, α^{j}) \right]$ (22)

Among them, $\nabla_{p} f (p^{j}, α^{j})$ represents the subgradient of f (· , ·) at point (p^j, α^j) with respect to p. Its k-th element (1 ⩽ k ⩽ m) can be calculated according to the following formula, ${[\nabla_{p} f (p^{j}, α^{j})]}_{k} = - \frac{1}{2} {(α^{j} \circ y)}^{T} {\tilde{K}}_{k} (α^{j} \circ y)$ (23)
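Formulas (22) and (23) translate directly into code. The sketch below computes the subgradient with respect to p and evaluates the tangent plane model from a list of stored iterates; the function names, the `history` tuple layout and the toy kernels are illustrative assumptions.

```python
import numpy as np

def subgradient(alpha, y, kernels):
    """k-th entry: -1/2 (alpha o y)^T K_k (alpha o y), as in formula (23)."""
    ay = alpha * y
    return np.array([-0.5 * ay @ K_k @ ay for K_k in kernels])

def tangent_plane_model(p, history):
    """g^i(p): maximum over the stored linear cuts, as in formula (22).

    history is a list of (p_j, f_j, grad_j) tuples from earlier iterations.
    """
    return max(f_j + (p - p_j) @ grad_j for p_j, f_j, grad_j in history)

# Toy setting with two kernels on two samples.
kernels = [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
p0 = np.array([0.5, 0.5])

grad = subgradient(alpha, y, kernels)
f0 = alpha.sum() + p0 @ grad          # objective value at (p0, alpha)
history = [(p0, f0, grad)]
model_value = tangent_plane_model(np.array([1.0, 0.0]), history)
```

Unlike plain sub-gradient descent, the model keeps every earlier cut in `history`, so information from previous iterations is reused.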

Next, we minimize the tangent plane model to obtain an approximate optimal solution, $\begin{matrix} \min_{v, p \in P} v \\ s . t . \; v ⩾ g^{i} (p) \end{matrix}$ (24)

Because f (p, α) is convex in p, the optimal value v gives a lower bound on the optimal value f^*.

Unfortunately, the process of solving the problem in the above equation is unstable, and the iterates oscillate strongly. In order to overcome this shortcoming, we need to add a regularization term to the original problem, that is, project the solution onto a level set. Before adding the regularization term, we need to introduce the concept of the level set. The definition of the level set is as follows, $L^{i} = \{p : g^{i} (p) ⩽ θ {\bar{f}}^{i} + (1 - θ) {\underline{f}}^{i}\}$ (25)

Among them, θ ∈ (0, 1) is the balance parameter, and ${\bar{f}}^{i}$ and ${\underline{f}}^{i}$ are the upper and lower bounds of the optimal value f^*, respectively. They are defined as follows, ${\bar{f}}^{i} = \min_{1 ⩽ j ⩽ i} f (p^{j}, α^{j})$ (26) ${\underline{f}}^{i} = v$ (27)

Next, we project the current solution p^i onto the level set to update the kernel function coefficient p: $p^{i + 1} = \underset{p \in P}{arg \, min} \{{∥ p - p^{i} ∥}_{2}^{2} : p \in L^{i}\}$ (28)
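One iteration of formulas (24) to (28) can be illustrated for m = 2 kernels, where the simplex P is one-dimensional and a fine grid can stand in for the linear and quadratic programs. This is a sketch under that simplifying assumption only: the function name `level_step`, the grid search and the two toy cuts (taken from the convex function (s − 0.3)²) are hypothetical and are not the solver used in the paper.

```python
import numpy as np

def level_step(history, p_current, theta=0.5, grid_size=1001):
    """One level-method update on the 2-simplex, using grid search.

    history: list of (p_j, f_j, grad_j) cuts collected so far.
    Returns the next weight vector and the lower and upper bounds.
    """
    s = np.linspace(0.0, 1.0, grid_size)
    P = np.stack([s, 1.0 - s], axis=1)                 # grid over the simplex
    cuts = np.stack([f_j + (P - p_j) @ g_j for p_j, f_j, g_j in history])
    g = cuts.max(axis=0)                               # g^i(p), formula (22)
    lower = g.min()                                    # v, formulas (24), (27)
    upper = min(f_j for _, f_j, _ in history)          # formula (26)
    level = theta * upper + (1.0 - theta) * lower      # threshold in (25)
    feasible = P[g <= level + 1e-12]                   # level set L^i
    dists = ((feasible - p_current) ** 2).sum(axis=1)
    return feasible[dists.argmin()], lower, upper      # projection, (28)

# Two cuts of the toy function F(s) = (s - 0.3)^2 with p = (s, 1 - s).
history = [
    (np.array([1.0, 0.0]), 0.49, np.array([1.4, 0.0])),   # cut taken at s = 1
    (np.array([0.0, 1.0]), 0.09, np.array([-0.6, 0.0])),  # cut taken at s = 0
]
p_next, lower, upper = level_step(history, p_current=np.array([0.0, 1.0]))
```

Minimizing the model alone would jump to the kink of the two cuts at s = 0.5, whereas the projection moves only as far as the level set requires, which is what damps the oscillation.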

5 Experimental study

We use SVM, TSVM (transductive SVM) and the multi-core based support vector machine (SVM-MKL) as comparison algorithms. Among them, TSVM is implemented based on CCCP. Moreover, we tested the performance of the algorithm on four real datasets: USPS, MNIST, Breast and DNA.

In order to facilitate visualization, we perform Nyström sampling on the kernel matrix of the MNIST dataset. Figure 1 shows the experimental results. Figure 1a is the original kernel matrix, and Figure 1b is the matrix obtained by Nyström sampling.

Fig. 1

Schematic diagram of sampling (MNIST data set).

On the USPS dataset, the semi-supervised multi-core learning algorithm proposed in this paper is tested. USPS is a commonly used data set in the field of semi-supervised learning. It is a sample set of digital handwriting scanned from envelopes of the US Postal Service. The original data is stored in the form of bitmaps, containing samples of handwritten digits of different sizes and orientations. However, it is eventually normalized to a grayscale picture (16 × 16) of uniform size. The entire data set contains 7291 training samples, 2007 test samples, a total of 10 categories (numbers 0-9). The details are shown in Table 1.

Table 1

USPS data statistics

Digit	Training	Test
0	1194	359
1	1005	264
2	731	198
3	658	166
4	652	200
5	556	160
6	664	170
7	645	147
8	542	166
9	644	177
Total	7291	2007

In the specific experiment, we designed four pairs of binary classification comparison experiments: 1 vs 7, 3 vs 8, 4 vs 7 and 2 vs 3 (these pairings are chosen to increase the difficulty of learning, because in actual handwriting the digits in each of these four pairs are easily confused and difficult to distinguish) to test the performance of semi-supervised multi-core learning. For each classification pair, 700 samples are randomly selected from each class, so that a single group contains 1400 samples in total. Among them, 5% of the samples are used as labeled samples, and the remaining 95% are used as unlabeled samples.
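The sampling protocol for one digit pair can be sketched as follows. It assumes that 700 samples are drawn per class and that the 5% labeled subset is drawn uniformly from the 1400 selected samples; the function name `make_pair_split` and the synthetic label vector are hypothetical stand-ins for the real USPS labels.

```python
import numpy as np

def make_pair_split(labels, class_a, class_b, per_class=700,
                    labeled_frac=0.05, seed=0):
    """Pick per_class samples from each class, then split labeled/unlabeled."""
    rng = np.random.default_rng(seed)
    idx_a = rng.choice(np.flatnonzero(labels == class_a), size=per_class, replace=False)
    idx_b = rng.choice(np.flatnonzero(labels == class_b), size=per_class, replace=False)
    idx = np.concatenate([idx_a, idx_b])
    rng.shuffle(idx)
    n_labeled = int(round(labeled_frac * idx.size))
    return idx[:n_labeled], idx[n_labeled:]

# Synthetic labels standing in for the digit labels (800 samples per digit).
labels = np.repeat([1, 7, 3], 800)
labeled_idx, unlabeled_idx = make_pair_split(labels, class_a=1, class_b=7)
```

With these settings each pair has 70 labeled and 1330 unlabeled samples; repeating the call with different seeds corresponds to the repeated runs of the experiment.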

In order to verify the multi-graph multi-core algorithm, the experiment is divided into the following three steps:

Multiple graph Laplacians are constructed;

The basis kernel function K is calculated;

The basis kernel function K is embedded in the graph Laplacian to obtain the $\tilde{K}$ matrix.

The first step is to construct multiple graph Laplacians. We construct two kinds of graphs, ɛ-NN and k-NN (as shown in Fig. 2). Specifically, we first use k-NN to calculate the adjacency matrix of the graph. When calculating the distance between nodes, we choose Euclidean distance as the distance metric function. Through the above settings, we can get the Laplacian matrices {L_j} of a series of graphs.

Fig. 2

Two kinds of graphs obtained on the Breast dataset.

After obtaining a series of graphs, the next step is to calculate the basis kernel function K_i based on the sample data. In this paper, we choose two commonly used kernels to construct the basis kernel functions. The first one is a Gaussian kernel, for which the candidate set of the kernel width is {2⁻¹, 2⁰, 2¹, 2², 2³}. The other is a polynomial kernel, for which the candidate set of the polynomial degree is {1, 2, 3}.
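The candidate sets above give eight basis kernels in total, which can be generated as in the following sketch. The Gaussian parameterization exp(−‖x − z‖² / (2σ²)) and the polynomial form (xᵀz + 1)^d are assumptions, since the exact kernel formulas are not stated in the text, and the random data is only a placeholder.

```python
import numpy as np

def gaussian_kernel(X, width):
    """Gaussian kernel matrix, assuming the form exp(-||x - z||^2 / (2 width^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * width ** 2))

def polynomial_kernel(X, degree):
    """Polynomial kernel matrix, assuming the form (x^T z + 1)^degree."""
    return (X @ X.T + 1.0) ** degree

widths = [2.0 ** e for e in range(-1, 4)]   # {2^-1, 2^0, 2^1, 2^2, 2^3}
degrees = [1, 2, 3]

X = np.random.default_rng(0).normal(size=(6, 4))   # placeholder samples
basis_kernels = ([gaussian_kernel(X, w) for w in widths]
                 + [polynomial_kernel(X, d) for d in degrees])
```

Each of these matrices is then embedded in every graph Laplacian, which yields the full set of candidate kernels for the weight vector p.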

In the final step, after the basis kernel function is obtained, it is embedded in the series of graphs calculated in the previous step. Finally, a new kernel function $\tilde{K}$ is calculated.

Because the linear kernel has the best performance on the USPS dataset, the linear kernel is selected for both SVM and TSVM. For both comparison algorithms and our algorithm, the parameters are set to 100. Furthermore, each set of experiments was repeated 50 times. Finally, the mean and standard deviation of the label prediction accuracy are computed, and the results are shown in Fig. 3. It can be seen that in the four groups of experiments, the multi-graph multi-core algorithm proposed in this paper has higher prediction accuracy than SVM, TSVM and SVM-MKL. Among them, the difference on 2 vs 3 is the most obvious. This shows that the multi-graph multi-core learning method can not only find the best kernel function, but also use the geometric information in the data distribution to further improve the prediction performance. It should be noted that the number of labeled training samples is relatively small (only 5%), which shows that the multi-graph multi-core learning proposed in this paper can work with very limited labeled samples.

Fig. 3

Statistics of prediction accuracy on the USPS dataset.

After that, we test the multi-graph and multi-core algorithm on the MNIST dataset. MNIST is a database of digitized images of handwritten digits. Each image is 28 × 28 pixels in size (the digit area is normalized to 20 × 20), and the database contains 70,000 samples.

In the experiment, we again designed four pairs of binary classification comparison experiments on the MNIST dataset: 1 vs 7, 3 vs 8, 4 vs 7 and 2 vs 3. In each category, 5% of the samples were used as labeled samples, and the remaining 95% were used as unlabeled samples. The construction method of the kernel functions and graphs is consistent with the setting on the USPS data set. Moreover, each group of classification experiments was repeated 50 times. Finally, the mean and standard deviation of the label prediction accuracy are computed, and the results are shown in Fig. 4. We found that the semi-supervised methods (LapSVM-MKL and TSVM) achieved a significant advantage in prediction accuracy over supervised learning when the labeled sample ratio was small. Moreover, the LapSVM-MKL method based on multi-graph and multi-core achieved the highest accuracy in three of the four experiments (1 vs 7, 4 vs 7 and 2 vs 3).

Fig. 4

Statistics of prediction accuracy on MNIST handwritten data set.

We test the multi-graph multi-core algorithm on the Breast dataset. Breast contains a total of 683 samples, the sample dimension is 10, and there are two classes.

In the experiment, 100 samples were randomly selected as the training set, and the construction methods of the kernel functions and graphs were consistent with the settings on the USPS data set. In addition, each set of classification experiments was repeated 50 times, the mean and standard deviation of the label prediction accuracy were computed, and the results are shown in Fig. 5. The results show that the multi-core learning methods (LapSVM-MKL and SVM-MKL) have a significant advantage in prediction accuracy over supervised learning when the labeled sample ratio is small.

Fig. 5

Statistics of prediction accuracy on the Breast dataset.

Finally, we test the multi-graph multi-core algorithm on the DNA dataset. The DNA dataset contains a total of 2,000 samples, the sample dimension is 180, and there are three classes.

In the experiment, 1,200 samples were randomly selected as the training set, and the kernel functions and graphs were constructed in the same way as on the USPS data set. Similarly, each group of classification experiments was repeated 50 times, and the accuracy and standard deviation of label prediction were computed. The results are shown in Fig. 6. We found that LapSVM-MKL achieved a significant advantage in prediction accuracy over the other algorithms.

Fig. 6

Statistics of prediction accuracy on DNA dataset.

Having analyzed the performance of the model above, this paper applies it to English online teaching in order to identify the learning behavior of students. The English online learning status of a total of 30 students is studied, and teachers’ teaching effectiveness is evaluated. The results of student action recognition are shown in Table 2 and Fig. 7.

Table 2

Statistical table of English online teaching students’ behavior recognition results

NO.	Action recognition rate (%)	NO.	Action recognition rate (%)
1	91.07	15	92.44
2	97.92	16	94.84
3	91.63	17	91.94
4	94.73	18	94.31
5	94.75	19	91.88
6	89.85	20	93.15
7	95.30	21	89.75
8	89.17	22	95.32
9	96.91	23	97.11
10	95.14	24	92.58
11	92.08	25	90.82
12	96.45	26	92.42
13	92.20	27	92.19
14	90.80	28	89.58
		29	95.15
		30	96.13

Fig. 7

Statistical diagram of the behavior recognition results of English online teaching students.
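The recognition rates in Table 2 can be summarized with a few descriptive statistics. The short sketch below is an illustration added for convenience, not part of the original experiment. It copies the 30 per-student values from Table 2 and uses a 90% threshold that is only an assumed reference point, not a criterion stated in the paper.

```python
import statistics

# Action recognition rates (%) for students 1 to 30, copied from Table 2.
RECOGNITION_RATES = [
    91.07, 97.92, 91.63, 94.73, 94.75, 89.85, 95.30, 89.17, 96.91, 95.14,
    92.08, 96.45, 92.20, 90.80, 92.44, 94.84, 91.94, 94.31, 91.88, 93.15,
    89.75, 95.32, 97.11, 92.58, 90.82, 92.42, 92.19, 89.58, 95.15, 96.13,
]

THRESHOLD = 90.0  # assumed reference level, chosen only for illustration

rate_summary = {
    "n_students": len(RECOGNITION_RATES),
    "mean": statistics.mean(RECOGNITION_RATES),
    "std": statistics.stdev(RECOGNITION_RATES),
    "min": min(RECOGNITION_RATES),
    "max": max(RECOGNITION_RATES),
    "below_threshold": sum(r < THRESHOLD for r in RECOGNITION_RATES),
}

for key, value in rate_summary.items():
    print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")
```

All 30 rates lie between 89.17% and 97.92%, and four students fall slightly below the assumed 90% level.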

From the above analysis, it can be seen that the model constructed in this paper performs well in recognizing students’ actions and behavior. After that, this article evaluates the teaching behavior of teachers. The evaluation is carried out in two ways: manual evaluation and system evaluation. The manual evaluation uses expert evaluation methods, so it can be used as a benchmark to analyze the accuracy of the system evaluation. This paper evaluates the teaching situation of 20 teachers, and the evaluation results are shown in Table 3 and Fig. 8.

Table 3

Evaluation table of teachers’ teaching situation

1	79.6	76.1
2	71.4	68.4
3	90.6	91.9
4	67.2	70.3
5	86.7	83.3
6	70.1	69.3
7	74.4	75.2
8	88.3	85.8
9	89.4	86.8
10	83.9	85.8
11	69.4	70.5
12	89.3	85.3
13	84.2	84.8
14	85.0	84.1
15	92.2	95.2
16	72.6	71.5
17	81.2	83.9
18	73.0	71.4
19	74.5	76.4
20	89.9	94.0

Fig. 8

Evaluation diagram of teacher’s teaching situation.

As shown in Fig. 8, the scores given by the system constructed in this paper are close to the results of manual evaluation, with a difference not exceeding 5%. Since the manual scoring in this paper relies on expert evaluation and is relatively reliable, the system scoring can also be regarded as relatively reliable. Therefore, it can be considered that the system constructed in this paper performs well in the teaching evaluation test and can be applied to practical teaching.
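The agreement between the two score columns of Table 3 can be checked directly. The sketch below is a minimal illustration added for verification. The table as given here does not indicate which column holds the manual score and which the system score, so the pairs are treated symmetrically, and the relative gap is computed against the smaller of the two scores, which is the more conservative choice. The names `TABLE_3` and `agreement_summary` are hypothetical.

```python
# Score pairs for teachers 1 to 20, copied from Table 3. Which column is the
# manual score and which is the system score is not stated, so the two values
# in each pair are treated symmetrically.
TABLE_3 = [
    (79.6, 76.1), (71.4, 68.4), (90.6, 91.9), (67.2, 70.3), (86.7, 83.3),
    (70.1, 69.3), (74.4, 75.2), (88.3, 85.8), (89.4, 86.8), (83.9, 85.8),
    (69.4, 70.5), (89.3, 85.3), (84.2, 84.8), (85.0, 84.1), (92.2, 95.2),
    (72.6, 71.5), (81.2, 83.9), (73.0, 71.4), (74.5, 76.4), (89.9, 94.0),
]


def agreement_summary(pairs):
    """Return the largest and average gaps between the two scores of each pair."""
    abs_diffs = [abs(a - b) for a, b in pairs]
    # Dividing by the smaller score yields the larger (more conservative) relative gap.
    rel_diffs = [abs(a - b) / min(a, b) for a, b in pairs]
    return {
        "max_abs_diff": max(abs_diffs),
        "mean_abs_diff": sum(abs_diffs) / len(abs_diffs),
        "max_rel_diff": max(rel_diffs),
    }


summary = agreement_summary(TABLE_3)
print(f"largest gap: {summary['max_abs_diff']:.1f} points")
print(f"average gap: {summary['mean_abs_diff']:.2f} points")
print(f"largest relative gap: {100 * summary['max_rel_diff']:.2f}%")
```

On these 20 pairs the largest gap is 4.1 points (teacher 20), and the largest relative gap stays below 5%, which is consistent with the error bound stated above.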

6 Conclusion

In order to build an English online teaching evaluation model, this paper takes graph-based semi-supervised learning as the research background and focuses on solving problems such as semi-supervised multi-label learning and semi-supervised multi-class learning. These problems are very common in the field of machine learning, and there has been a lot of related work. Unlike traditional approaches to these two problems, this paper combines matrix completion and generative models to reinterpret them from a new perspective and tests the effectiveness of the approach on multiple simulated and real data sets.

The topic of this article is graph-based semi-supervised learning, where the graph is mainly used to model the manifold hypothesis of the samples. The manifold hypothesis is a central idea in traditional graph-based semi-supervised learning. It assumes that the samples lie on a manifold structure, and that two samples that are close to each other in the manifold space should have the same label. Traditional graph-based semi-supervised learning directly constrains the parameters when learning the label prediction function. In addition, this paper evaluates the effect of the model on online English teaching through designed experiments. The research results show that the model constructed in this paper has good performance.
