Abstract
At present, English teaching does not yet take full advantage of the smart classroom, and it is difficult to grasp students’ status and characteristics in real time during actual teaching. Starting from video and static images and the actual situation of English classroom teaching, this study uses a convolutional neural network and the random forest algorithm to perform human behavior recognition in static images under different image representation conditions, and studies the influence of image background information and of the spatial distribution of image features on recognition accuracy. Then, to address the similarity between different behavior classes, a static image human behavior recognition method based on an improved random forest is proposed. In addition, through theoretical research, an algorithm model that can identify the characteristics of English classrooms is constructed, and static and dynamic images of English teaching are used as examples for experimental analysis. The results show that the proposed method achieves a certain degree of effectiveness and can provide a theoretical reference for subsequent related research.
Introduction
With the rapid development of the economy and the continuous deepening of education reform, the reform of English teaching in secondary vocational schools has ushered in new opportunities and faces new challenges. The “Syllabus for English Teaching in Colleges and Universities” (Revised) states that the task of English teaching in colleges and universities is to “make students master certain basic knowledge and basic skills in English and develop their English application skills in daily life and professional situations.” The teaching goal is to “help students to further learn the basic knowledge of English, develop language skills such as listening, speaking, reading and writing, and preliminarily form the application ability of workplace English; stimulate and cultivate students’ interest in learning English, improve students’ self-confidence in learning, help students master learning strategies, develop good study habits, and improve self-learning ability”. At the same time, the development of students’ language application ability is put on the agenda. Traditional teaching-centered and knowledge-based teaching ideas and concepts cannot fully exert the initiative of students’ learning, which has become one of the factors that restrict the quality of teaching and the improvement of personnel training level [1].
With the extensive application and rapid development of information technology, mobile terminal devices such as laptops, tablets, and smart phones have entered people’s lives. Following traditional media such as radio, television, and newspapers and online new media, mobile new media such as Weibo, WeChat, Minor, Microfilm, and Micro-Magazine have appeared and have a wide audience [2]. Post-90s students are fully equipped with mobile terminals, are familiar with new media platforms such as Weibo and WeChat, and have a high level of information literacy. The English Syllabus for Secondary Vocational Schools suggests: “Schools should equip English teaching with the necessary audio-visual equipment and materials as well as computers, Internet, etc. Teachers should use and develop teaching resources, enrich teaching content, teaching methods and means, and use existing radio and television, English newspapers, libraries, audio-visual rooms or electronic reading rooms to create learning conditions for students and broaden the channels for students to learn and use English.” Therefore, English teaching modes that combine modern information technology with college English teaching have attracted attention. Information technology has built a dynamic teaching environment that combines computers and networks. It greatly expands teaching capacity, enriches teaching resources, offers diverse, interactive, and integrated information carriers, and makes teaching methods more diverse [3]. Informatized classrooms can cultivate students’ interest in learning, broaden students’ horizons, and enhance students’ awareness of lifelong learning [4]. Information technology changes both the way teachers teach and the way students learn, and it is widely used in teaching. There have been many successful cases of applying mobile media to teaching at home and abroad. 
How to use information technology and new media effectively, combine the respective advantages of online and offline teaching, expand and extend English teaching, and optimize classroom teaching has become one of the important topics of current teaching research.
Related work
Mahmood Z introduced the random forest in detail from both theoretical and experimental aspects. This is the first time that a random forest algorithm has been proposed. The algorithm combines Breiman’s Bagging algorithm with Tin Kam Ho’s random subspace method. Subsequent improvements to the random forest algorithm mostly take the form of combinations, in which other algorithms are integrated into the random forest to optimize its performance [5]. Nikitin M Y et al. [6] applied the Hough transform to the random forest voting mechanism and proposed the Hough forest algorithm, which has been applied in various fields. The Hough transform is used in the final voting process of the random forest in place of the original majority vote. Bashbaghi S et al. [7] introduced the concept of a survival tree, which builds on the construction process of random forests. The algorithm uses the Kaplan-Meier (KM) estimation method when calculating the survival function. Gu J et al. [8] also analyzed the feasibility of the algorithm theoretically and found that the method is significantly better than other survival analysis methods. Bobak A K et al. [9] used the Adaboost algorithm to improve the random forest algorithm. The advantage of Adaboost in the construction process lies mainly in adaptive sampling and adaptive weight updating. Building on these advantages, Jingyi Ma introduced Adaboost into the random forest algorithm, and analysis of the experimental results showed that the performance of the random forest was greatly improved. Later, other models were combined with this model, and the experimental results were very good.
Because the random forest algorithm performs poorly on unbalanced data and continuous variables, the academic community has proposed data cleaning. After the data are cleaned and standardized, the performance of the random forest algorithm improves. Lu J et al. [10] proposed optimizing the random forest algorithm with NCL (Neighborhood Cleaning Rule) technology: to address the weakness of random forests on unbalanced data, the data are first processed with NCL and then passed to the random forest algorithm. The results show that the classification performance is improved. Min R et al. [11] proposed applying a cost-sensitive learning algorithm to the problem of unbalanced data sets. The data set is randomly sampled with replacement to form a number of bags, a subset of attributes is then randomly selected from each bag to build a single decision tree, and finally the result is obtained with an ensemble classifier. The research results show that the classification performance is improved. Through a detailed analysis of the random forest construction process and of the relationship between this process and the correlation coefficient between single trees, Wang H [12] proposed a new random forest feature selection algorithm. The main idea of the algorithm is to reduce the generalization error of random forests by enhancing the interaction between single decision trees. Peng C et al. [13] found that, when a single decision tree is constructed, only one attribute is chosen for splitting at each node, which makes the random forest very inefficient. Therefore, they proposed a random forest algorithm that splits on multiple attributes in the decision process of the decision tree. 
They first cluster the rows of the data set and then use randomly selected attributes to generate a multivariate weighted ensemble. The experimental results show that the performance of the random forest algorithm is improved. After a detailed analysis of the above three directions of optimization, we found that some problems remain. First, the optimization of random forest algorithms focuses mainly on combining them with other strong classification algorithms. Second, the optimization is mainly directed at the preprocessing of the data set. Third, the optimization is mainly aimed at improving the random forest itself [14].
Random forest algorithm
Static image human behavior recognition is susceptible to similarities between different behavioral classes. As shown in Fig. 1, although the two images belong to different behavioral classes, the poses of the human body in the image are very similar, which leads to the possibility of dividing the two images into the same behavioral class during the recognition process. However, we can also see from Fig. 1 that the image content contained in the rectangular area of the two images is completely different. The image content contained in the rectangular area can be used as a classification basis to distinguish the two images well. The rectangular area in the image is the most distinct area between the current image and other behavioral images, which we call the “key area” [15].

Example image of “Biking” and “riding” behavior classes.
One problem is that the number of eligible local areas of the image is too large, and even a medium-sized image can produce tens of thousands or more of local areas. To solve this problem, we use random forests to search for key areas of the image and to classify them. The training of decision trees in random forests only needs to randomly consider a part of the training data, so it is suitable for the processing of big data, and compared with other classifiers, the random forest classifier has higher classification accuracy and is easier to train and test. Unlike traditional random forest classifiers, in order to reduce the generalization error of random forests, the random forest used by our algorithm uses strong classifiers in the internal nodes of the decision tree [16].
The general flow chart of the algorithm used in this study is shown in Fig. 2. The specific process is as follows. First, dense SIFT feature extraction is performed on the training image set. Then, visual words are formed by clustering and combined into a word bag model. Afterwards, each image is represented as a joint distribution vector of visual words by sparse coding and is divided into pyramid vectors using an image spatial pyramid. Finally, the random forest is generated through training and is used for the final classification [17].

Algorithm flow chart.
Pyramid model of sparse coding space
The traditional word bag model uses vector quantization coding to represent the image. The local features extracted from the image are represented as $X = [x_1, \cdots, x_M]^T \in \mathbb{R}^{M \times D}$, where $D$ is the feature dimension and $M$ is the number of features extracted. The word bag formed by clustering can be expressed as $V = [v_1, \cdots, v_K]^T \in \mathbb{R}^{K \times D}$, where $K$ is the number of visual words in the word bag. Vector quantization coding of the image representation can be understood as finding the optimal solution of the following formula [18]:
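For reference, in the notation above the vector quantization objective is commonly written as shown below. This is a sketch of the standard formulation rather than a transcription of the original equation; the code matrix $U = [u_1, \cdots, u_M]^T$, with one $K$-dimensional row vector $u_m$ per local feature, is a symbol assumed here.

```latex
\min_{U, V} \sum_{m=1}^{M} \left\| x_m - u_m V \right\|^2
\quad \text{s.t.} \quad
\mathrm{Card}(u_m) = 1, \;\; |u_m| = 1, \;\; u_m \succeq 0, \;\; \forall m
```

Under these constraints each $u_m$ has exactly one non-zero entry equal to 1, so every feature $x_m$ is assigned to a single visual word.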
The constraint $\mathrm{Card}(u_m) = 1$ in vector quantization coding means that each local feature of an image can only be represented by a single visual word. Such a restriction is too strict and can result in large information loss. To relax this constraint, sparse coding introduces a sparse regularization term into equation (3), turning vector quantization coding into a sparse coding problem:
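In the same notation, the sparse coding objective is usually written as follows. This is again a sketch of the standard form, assuming the code matrix $U = [u_1, \cdots, u_M]^T$ and a norm bound on each visual word $v_k$ that prevents trivial solutions; neither detail is confirmed by the surviving text.

```latex
\min_{U, V} \sum_{m=1}^{M} \left\| x_m - u_m V \right\|^2 + \lambda \left| u_m \right|
\quad \text{s.t.} \quad
\left\| v_k \right\| \le 1, \;\; \forall k = 1, \cdots, K
```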
In the formula, $\lambda |u_m|$ uses the $L_1$ norm to approximate the $L_0$ norm and thereby constrain the sparsity of $u_m$, and $\lambda$ controls the degree of sparsity of the code. The $L_1$ norm of the coding vector $u_m$ allows $u_m$ to contain a small number of non-zero values, so that each local feature of the image can be represented as a joint distribution vector over multiple visual words, which reduces information loss and gives stronger expressiveness.
The basic idea of the spatial pyramid model is to divide the image using grids at different levels. The higher the level, the finer the division, and each level corresponds to one layer of the pyramid. If the image is divided at levels $l = 0, 1, \cdots, L$, the number of image sub-regions at level $l$ is $2^l \times 2^l$. A distribution histogram of visual words is computed on each sub-region of the image, and the histograms of all sub-regions are weighted and concatenated to obtain the spatial pyramid representation of the image. In the traditional spatial pyramid model, the visual word distribution histogram of an image sub-region is obtained by merging the distribution vectors in the sub-region with the mean function, as shown in the following equation:
When visual words are used to represent image regions, different merging functions yield different statistics. Unlike the traditional spatial pyramid, which uses the mean function, the pyramid model of the sparse coding space uses the maximum function to summarize the distribution of visual words in a local area of the image. With the maximum function, the distribution vectors corresponding to the local region features of the image are merged into a K-dimensional vector z, whose j-th element is:
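In the usual max-pooling form, assumed here, the j-th element is $z_j = \max\{|u_{1j}|, |u_{2j}|, \cdots, |u_{Mj}|\}$, where $u_{mj}$ is the j-th coefficient of the m-th code vector in the region. The Python sketch below contrasts this with mean merging and counts the sub-regions per pyramid level. The function names `pool_codes` and `cells_per_level` are hypothetical and introduced only for illustration.

```python
def pool_codes(codes, how="max"):
    """Merge M code vectors (each of length K) into one K-dimensional vector z."""
    if not codes:
        raise ValueError("codes must contain at least one vector")
    k = len(codes[0])
    if how == "max":
        # Assumed max merging: z_j = max(|u_1j|, ..., |u_Mj|)
        return [max(abs(u[j]) for u in codes) for j in range(k)]
    if how == "mean":
        # Mean merging of the traditional pyramid: z_j = (1/M) * sum_m u_mj
        return [sum(u[j] for u in codes) / len(codes) for j in range(k)]
    raise ValueError("how must be 'max' or 'mean'")


def cells_per_level(num_levels):
    """Number of sub-regions (2^l x 2^l) at each pyramid level l = 0..L."""
    return [(2 ** l) * (2 ** l) for l in range(num_levels + 1)]


# Three code vectors over a toy codebook of K = 3 visual words.
codes = [[0.0, 0.5, -0.2], [0.3, 0.0, 0.1], [0.0, -0.9, 0.0]]
z_max = pool_codes(codes, "max")
z_mean = pool_codes(codes, "mean")
```

Max merging keeps the strongest response to each visual word in the region, whereas mean merging dilutes it by the number of features in the region.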
The pyramid model of the sparse coding space adds sparse coding and maximum statistics to the image pyramid structure. Figure 3 shows a schematic diagram of the structure of the model. The specific implementation process is as follows. First, local features are extracted from the local regions of the image to form the feature set X. Then, the feature set X is represented as distribution vectors U over the visual words using sparse coding, and the generated distribution vectors U are merged into a distribution vector z using the maximum function. After that, the maximum function is used to merge the distribution vectors at each level of the pyramid, so that the distribution vectors of one pyramid level are combined to form the distribution vectors of the next. Finally, the sparse coded representation of the image spatial pyramid is obtained by concatenating the distribution vectors of all pyramid levels.

Pyramid model of sparse coding space.
Analysis of sparse coding results shows that the resulting codes tend to be local, and that locality is often more important than sparsity. Therefore, Wang proposed locality-constrained linear coding (LLC), which is an improvement on ordinary sparse coding. By changing the sparsity constraint in Equation (3) to a locality constraint, it can be expressed as the optimal solution problem of the following formula:
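The LLC objective is commonly written in the form below, given here as a sketch of the standard formulation rather than a transcription of the original equation. The symbols are assumptions: $B = [b_1, \cdots, b_M]$ is the codebook with one visual word per column, $c_i$ is the code of feature $x_i$, $N$ is the number of features, and $\odot$ denotes element-wise multiplication.

```latex
\min_{C} \sum_{i=1}^{N} \left\| x_i - B c_i \right\|^2 + \lambda \left\| d_i \odot c_i \right\|^2
\quad \text{s.t.} \quad \mathbf{1}^{T} c_i = 1, \;\; \forall i,
\qquad
d_i = \exp\!\left( \frac{\mathrm{dist}(x_i, B)}{\sigma} \right)
```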
In the formula, $\mathrm{dist}(x_i, B) = [\mathrm{dist}(x_i, b_1), \cdots, \mathrm{dist}(x_i, b_M)]^T$. The dist function calculates the Euclidean distance between two vectors, and $\sigma$ adjusts the decay rate of the weight. It can be seen that the larger the coefficient of the target vector on a visual word that is far from the feature, the greater the penalty it receives, which ensures that the finally obtained target vector is more local.
In contrast to the traditional random forest algorithm using the overall information of the image as the basis for node splitting, in order to find the key regions of the image, the method used in this paper, in the process of generating random forests, extracts a certain local area information of the image from the training sample image in the internal node as the basis for sample splitting. Different from the traditional random forest algorithm using weak classifiers inside the nodes, the random forest used in this paper uses strong classifiers in the internal nodes of the decision tree, which reduces the generalization error of random forests and improves the classification accuracy of random forests.
Generation of random forests
The steps for generating a decision tree of the random forest are as follows:
Step 1. A set of training sample data D is randomly selected from the training sample images and placed in the root node of the decision tree.
Step 2. It is judged whether the data in the current node meet the splitting condition. If so, the procedure goes to Step 3; otherwise, it goes to Step 5.
Step 3. Local area positions of the image are randomly selected, and the best local area is chosen as the basis for division.
Step 4. The training samples in the current node are divided into two parts by binary splitting, generating the child nodes D1 and D2. Step 2 is then applied to each of the two child nodes.
Step 5. If the splitting condition is not met, the current node is a leaf node of the decision tree.
To judge whether the current node meets the splitting condition, that is, whether it is a leaf node, this article uses two conditions. The first is the depth of the decision tree. If the depth of the current node equals the maximum depth set for the decision tree, the node is regarded as a leaf node and is not split again. The second is a minimum value for the number of training examples and the number of categories in the node. If the number of training examples or the number of categories in the current node is less than the set minimum, the node is considered to be a leaf node.
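The two stopping conditions can be sketched as a single predicate. The function name `is_leaf` and its parameter names are hypothetical, and the threshold values in the usage lines are illustrative only.

```python
def is_leaf(depth, labels, max_depth, min_samples, min_classes):
    """Return True when a node must not be split further.

    depth       -- depth of the current node in the tree
    labels      -- class labels of the training samples in the node
    max_depth   -- maximum depth allowed for the decision tree
    min_samples -- minimum number of training samples needed to split
    min_classes -- minimum number of distinct classes needed to split
    """
    # Condition 1: the node has reached the maximum tree depth.
    if depth >= max_depth:
        return True
    # Condition 2: too few samples or too few classes remain in the node.
    if len(labels) < min_samples:
        return True
    if len(set(labels)) < min_classes:
        return True
    return False


# Illustrative thresholds (assumed, not taken from the paper).
deep_node = is_leaf(10, ["biking", "riding"] * 20, 10, 5, 2)
pure_node = is_leaf(3, ["biking"] * 30, 10, 5, 2)
split_node = is_leaf(3, ["biking", "riding"] * 20, 10, 5, 2)
```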
In this paper, the binary splitting of the data in internal nodes that meet the splitting condition is achieved by training an SVM classifier at the node. First, each class of training data in the node is randomly assigned a binary label (all training data of the same class receive either label 0 or label 1), so that the training data in the node are artificially divided into two categories. Subsequently, an image local region with random position, length, and width is selected, and the feature representation of each training image in the node is extracted from that local region. Next, a linear SVM classifier is trained on the feature representations of the extracted local regions and the binary labels randomly assigned to the images. Through this SVM classifier, we divide the training images in the node into two parts using the following formula:
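The sketch below illustrates the two operations just described. It assumes a common convention for the split rule: a sample goes to the left child when the linear decision value $w \cdot f + b$ is at most 0 and to the right child otherwise. The weight vector `w` and bias `b` stand in for an SVM that has already been trained; the training step itself is omitted, and all function names are hypothetical.

```python
import random


def assign_binary_labels(class_labels, rng):
    """Give every class a random 0/1 label; samples inherit their class's label."""
    mapping = {c: rng.randint(0, 1) for c in sorted(set(class_labels))}
    return [mapping[c] for c in class_labels], mapping


def split_by_svm(features, w, b):
    """Split sample indices by the sign of the linear decision value w.f + b."""
    left, right = [], []
    for i, f in enumerate(features):
        score = sum(wj * fj for wj, fj in zip(w, f)) + b
        # Assumed convention: non-positive scores go left, positive go right.
        (left if score <= 0 else right).append(i)
    return left, right


rng = random.Random(0)
classes = ["biking", "riding", "biking", "running", "riding"]
binary, mapping = assign_binary_labels(classes, rng)

# Toy 2-D region features and an already-trained (w, b), both made up here.
left, right = split_by_svm([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [1.0, -1.0], 0.0)
```

Because the 0/1 labels are assigned per class, samples of one class are never deliberately separated by the training target, although the trained classifier may still route them to different children.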
Different image local regions produce different SVM classifiers. For each generated SVM, the training samples are split and the information gain is calculated. The SVM classifier and image local region that give the maximum information gain are selected as the optimal classifier and local region of the current node and are saved in the node.
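A minimal sketch of this selection step is given below. It assumes the information gain is the usual entropy-based measure, which the text does not specify. Each candidate is a tuple of a region identifier and the index lists of the left and right children that its SVM produces; this data layout and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, left_idx, right_idx):
    """Entropy reduction achieved by splitting labels into two index sets."""
    n = len(labels)
    if n == 0:
        return 0.0
    left = [labels[i] for i in left_idx]
    right = [labels[i] for i in right_idx]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_split(labels, candidates):
    """Pick the (region, left_idx, right_idx) candidate with the highest gain."""
    return max(candidates, key=lambda c: information_gain(labels, c[1], c[2]))


labels = ["biking", "biking", "riding", "riding"]
candidates = [
    ("region_a", [0, 2], [1, 3]),  # mixes the classes: gain 0
    ("region_b", [0, 1], [2, 3]),  # separates the classes: gain 1 bit
]
chosen = best_split(labels, candidates)
```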
The above is the generation process of the decision tree. The random forest classifier is obtained by combining the generated multiple decision trees.
The generated random forest is used to classify the test images, as shown in Fig. 4. The specific process of classification is: the test image is placed in the decision tree of the random forest, and the feature representation information of the image at the optimal local area of the current node is obtained in the internal node of the decision tree into which the test image falls. The resulting feature representation information is then classified using the SVM classifier of the current node. According to the classification result, it is judged whether the test image next falls to the left child node or the right child node of the current node. Through the path selection process described above, the test image begins at the root node of the decision tree and ends when the leaf node is encountered. Stored in the leaf nodes are the distribution of different types of training samples that fall into the current leaf node during the decision tree training process. The classification result of the test image can be obtained by comprehensively analyzing the distribution of the training samples in each leaf node into which the test image falls.
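The path selection described above can be sketched as a loop over a nested-dictionary tree. The node layout (keys `region`, `w`, `b`, `left`, `right`, and `counts` for leaves), the `extract_features` callback, and the rule that non-positive SVM scores go left are all assumptions made for illustration.

```python
def route_to_leaf(node, image, extract_features):
    """Walk a test image from the root to a leaf and return the leaf's class counts."""
    while "counts" not in node:
        # Feature representation of the image at this node's best local region.
        f = extract_features(image, node["region"])
        # Linear SVM stored in the node decides the branch.
        score = sum(wj * fj for wj, fj in zip(node["w"], f)) + node["b"]
        node = node["left"] if score <= 0 else node["right"]
    return node["counts"]


# Toy one-split tree; leaves store how many training samples of each class reached them.
tree = {
    "region": "r0",
    "w": [1.0],
    "b": -0.5,
    "left": {"counts": {"riding": 5}},
    "right": {"counts": {"biking": 4}},
}

# Stand-in image: a lookup from region identifier to a precomputed feature vector.
image = {"r0": [0.9]}
leaf_counts = route_to_leaf(tree, image, lambda img, region: img[region])
```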

Schematic diagram of the classification of random forests.
The distribution of samples of different classes in a leaf node can be expressed by the posterior probability. The posterior probability of a sample class is the ratio of the number of samples of this class to the total number of samples in the current leaf node. The posterior probability of sample class c in leaf node l of decision tree t can be expressed as $p_{t,l}(c)$. Then, the class to which the test image belongs can be obtained by:
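A common decision rule, assumed here, averages the leaf posteriors over all $T$ trees and takes the most probable class, that is, $c^* = \arg\max_c \frac{1}{T} \sum_{t=1}^{T} p_{t,l_t}(c)$, where $l_t$ is the leaf reached in tree $t$. The Python sketch below follows that rule; the function names and the toy leaf counts are hypothetical.

```python
def leaf_posterior(class_counts):
    """Posterior p(c) in a leaf: samples of class c over all samples in the leaf."""
    total = sum(class_counts.values())
    return {c: n / total for c, n in class_counts.items()}


def classify(leaf_counts_per_tree):
    """Average the leaf posteriors over all trees and return (best class, scores)."""
    num_trees = len(leaf_counts_per_tree)
    scores = {}
    for counts in leaf_counts_per_tree:
        for c, p in leaf_posterior(counts).items():
            scores[c] = scores.get(c, 0.0) + p / num_trees
    return max(scores, key=scores.get), scores


# Class counts of the leaves that one test image reached in three trees (toy data).
leaves = [
    {"biking": 8, "riding": 2},
    {"biking": 3, "riding": 7},
    {"biking": 6, "riding": 4},
]
label, scores = classify(leaves)
```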
It can be seen from the generation process that each decision tree only needs a randomly selected part of the training data, which makes the random forest suitable for processing large data sets. This is also an important reason why this paper chooses the random forest as the classifier. However, the random forest used in this paper still differs from the random forest used in traditional image classification.
As shown in Fig. 6, in the training process of the decision tree, the traditional random forest extracts the information of the entire image as the basis for the binary splitting of the training data in a node. However, the random forest used in this paper only extracts the information of a local area in the image as the basis for splitting the training images that fall into the node. The length and width of this local area are arbitrary, and so is its position in the image. Moreover, it can be seen from the figure that as the depth of the decision tree increases, the local area selected in an internal node becomes closer to the key area of the image. The reason is that, as the depth of the decision tree increases during training, the training sample set is split layer by layer, and the training samples in a node are increasingly likely to belong to the same class. Therefore, the local area selected in the node is also closer to the key area of the sample class with the highest proportion of samples in the current node. When our trained decision tree is used to classify a test image, the test image passes through the tree from top to bottom. Compared with the information extracted from the entire image, the image information obtained from the best local region of the current node is more discriminative and contains less noise when it is used to select the path. This helps the correct path to be chosen and the test image to fall into the correct leaf node of the decision tree, thereby improving the final classification accuracy of the test image.

Traditional random forest decision tree.

Random forest decision tree used in this paper.
It can also be seen from Fig. 6 that, unlike the traditional random forest, which uses weak classifiers in the internal nodes of the decision tree to split the training samples, the random forest used in this paper uses the strong classifier SVM for this purpose. Compared with the traditional method of using randomly generated feature weights and weak classifiers at the nodes, the decision tree generated by this method is stronger. The generalization error of random forests is bounded above by $\rho (1 - s^2)/s^2$, where $\rho$ represents the correlation between decision trees and $s$ represents the strength of the decision trees. The decision trees used in this study are stronger, so the generalization error of the generated random forest will be smaller and the classification accuracy will be higher.
The blue solid arrows indicate the binary split of the data in a node. The area enclosed by the rectangular frame in each image is the image area used for splitting the training samples in that internal node of the decision tree.
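The effect of tree strength on this quantity can be checked numerically. The sketch below evaluates $\rho (1 - s^2)/s^2$ for two strength values at the same correlation; the function name `generalization_bound` and the numbers are illustrative assumptions, not measurements from the experiments.

```python
def generalization_bound(rho, s):
    """Evaluate rho * (1 - s^2) / s^2 for tree correlation rho and strength s."""
    if not 0 < s <= 1:
        raise ValueError("strength s must lie in (0, 1]")
    return rho * (1 - s ** 2) / s ** 2


# Same correlation between trees, weaker versus stronger individual trees.
weak_trees = generalization_bound(0.3, 0.5)
strong_trees = generalization_bound(0.3, 0.8)
```

With the correlation held fixed, raising the strength from 0.5 to 0.8 lowers the value, which is the direction of the argument made for using SVM classifiers at the nodes.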
A random forest algorithm is used to generate the classification model. A new instance is classified according to its attribute vector and the decision rules of the trees in the random forest. A binary classification problem uses the majority vote rule, that is, the category predicted by more than 50% of the decision trees is the final category of the instance. The random forest algorithm can reduce the overall classification error rate, but the error rate of the majority class is low while the error rate of the minority class is high. Obviously, this will cause many faulty modules to be classified as non-faulty modules. These fault-prone modules then continue to be tested and validated in the next stage of the software lifecycle.
In the feature vector extraction module of IRFCM, firstly, a large number of APK files are inversely processed, and then feature extraction is performed to generate feature vectors of corresponding files, and finally combined into feature vector sets. Obviously, the feature extraction processes of these APK files are independent of each other. Figure 7 shows the classroom teaching features extracted in the English class.

English classroom extraction features.
The random forest algorithm is an ensemble learning algorithm that determines the category of a test sample by combining the classification results of several individual classifiers. The construction of a single classifier takes the feature vector set produced by the feature vector extraction module as input and samples it following the random subspace idea. The sampling processes are independent of each other. The sampled data are then used for training to construct the decision tree classifiers, and these construction processes are also independent of each other. Figure 8 shows the feature extraction results obtained by independent processing of image colors.

Color independence processing results.
In this experiment, the feature vector extraction process is run in a single-machine environment, and the parallel version of the feature vector extraction module is run in a parallel environment. The feature extraction experiments are performed on the two sample sets respectively. The experimental results are shown in Fig. 9.

Feature extraction of the outline of the character’s edge.
After the feature vector extraction is completed, the feature vector set obtained from the sample set 2 is taken as an input, and the single-machine environment and the parallel environment are respectively used for classification detection. The final feature extraction results obtained on this basis are shown in Fig. 10.

Feature extraction results.
In this study, background information is considered in the image representation, and the spatial distribution of image features is added to the representation using the image spatial pyramid. Each image in the database is first resized so that the rectangle 1.5 times the size of the bounding box containing the actor measures at most 300 pixels, and the foreground and background of the image are represented separately. However, only the foreground area of the image is considered when looking for key areas of the image. For the extraction of image features, densely sampled SIFT features are extracted at five scales: 8*8, 12*12, 16*16, 24*24 and 30*30. Subsequently, the codebook is generated using K-means clustering, and the size of the codebook is set to 1024. All the training images are normalized before the training of the random forest. The local regions of the image are randomly extracted at the internal nodes of the random forest. In practice, the coordinates of the upper left and lower right corners of the local region are drawn. If the selected image local area is smaller than 0.2*0.2, the feature representation uses the word bag model. When the selected local area is larger than 0.2*0.2 and smaller than 0.8*0.8, a 2-level spatial pyramid is added to the representation of the local area, and when it is larger than 0.8*0.8, a 3-level spatial pyramid is added. The random forest used in this chapter contains 100 decision trees.
Static image human behavior recognition is more difficult than human behavior recognition in video. However, due to its wide application in image retrieval, and because its results benefit fields such as object recognition and scene boundary estimation, the field has received more and more attention. We believe that this field will develop further in the near future. In this paper, two shortcomings of current static image human behavior recognition are studied, namely the effect of image context information on recognition accuracy and the similarity between different behavior classes, and the expected results are obtained.
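The region sampling and the size-dependent choice of representation can be sketched as follows. The sketch assumes that region sizes are expressed as fractions of the normalized image, that the thresholds 0.2*0.2 and 0.8*0.8 are compared against the region area, and that a return value of 1 stands for the plain word bag model. The function names are hypothetical, and the handling of sizes exactly equal to a threshold is a choice made here.

```python
import random


def random_region(rng):
    """Draw the upper-left and lower-right corners of a region in the unit square."""
    x1, x2 = sorted(rng.random() for _ in range(2))
    y1, y2 = sorted(rng.random() for _ in range(2))
    return (x1, y1), (x2, y2)


def pyramid_levels(width, height):
    """Number of pyramid levels for a region whose sides are image fractions.

    1 means the plain word bag model (no spatial pyramid).
    """
    area = width * height
    if area < 0.2 * 0.2:
        return 1
    if area < 0.8 * 0.8:
        return 2
    return 3


rng = random.Random(42)
(x1, y1), (x2, y2) = random_region(rng)
levels = pyramid_levels(x2 - x1, y2 - y1)
```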
Conclusion
English teaching images that belong to the same category may contain different human behaviors. This study therefore examines whether the factors that affect the accuracy of image classification also contribute to human behavior recognition in static images. The commonly used classification method based on the word bag model is used to recognize human behavior in static images under different image representation conditions, and the influence of image background information and of the spatial distribution of image features on recognition accuracy is studied. Analysis of the experimental results leads to the conclusion that image background information and the spatial distribution of image features both help to improve the accuracy of human behavior recognition in static images. Because existing static image human behavior recognition methods are easily affected by the similarity between different behavior classes, a static image human behavior recognition method based on an improved random forest is proposed. The method uses the local area in which an image differs most from images of other classes, rather than the whole image, as the basis for classification. Owing to the advantages of random forests in processing big data, random forests are used both to search for the local areas of the image and for the final classification. Unlike traditional random forests, which use weak classifiers, this method uses the strong classifier SVM in the internal nodes of the random forest, which reduces the generalization error and improves the classification accuracy. Finally, experiments show that the classification accuracy of this method is better than that of common methods.
