Abstract
Cross-cultural English teaching is limited by the influence of traditional teaching models, resulting in poor teaching results. In order to improve the efficiency of cross-cultural English teaching, with the support of AI emotion recognition and neural network algorithm, this paper builds a cross-cultural O2O English teaching system with intelligent recognition and management. Moreover, this research uses background models to detect and track targets, to realize the full recognition of students’ emotions, and to facilitate teachers to effectively control online teaching. In addition, combined with online and offline teaching, this study uses neural network algorithm to stabilize the system and perform data processing, construct an overall O2O English teaching model according to actual needs, and formulate the corresponding teaching process. In order to verify the performance of the model, this study starts from two aspects: system performance test and system practice effect and uses statistical methods to collect and process data. The test results show that the model constructed in this paper has good performance and meets expectations.
Introduction
The new teaching model should be supported by modern information technology, especially network technology. By exploring teaching modes in a networked environment, teaching and training can be carried out directly on a local area network or campus network, so that the teaching and learning of English become, to a certain extent, independent of location and time and develop toward personalized and autonomous learning. In particular, the new teaching model should reflect both the leading role of teachers and the central position of students in the teaching process. While making full use of modern information technology, it is necessary to retain the strengths of the traditional teaching model and give full play to the advantages of traditional classroom teaching. The basic education in our country is still mainly based on the class teaching system. Teachers rely mainly on “materials” such as courseware, blackboards, books, and homework books, and schools are still the main force of education [1]. It should be noted that traditional teaching mainly focuses on knowledge transfer. If the teaching method is improper and teachers and students lack two-way and multi-way information exchange, students’ learning ability, learning methods, emotional development, and the formation of values can easily receive insufficient attention. The pure online teaching model, in turn, often fails to achieve the expected teaching effect and causes concern among society, parents, and schools. The O2O teaching model combines a MOOC-based teaching platform with a flipped classroom teaching model, which helps to improve learners’ initiative and enthusiasm. 
Through the use of various control measures to achieve the best state of self-learning, online self-learning and traditional classroom learning can be combined with each other to make up for each other’s weaknesses, so that learners can summarize and integrate the knowledge they have learned [2].
The advent of the “Internet+Education” era has made O2O (Online To Offline) personalized teaching methods possible. Compared with traditional learning methods, the advantage of O2O teaching lies in the combination of mobile information technology, which changes the time, space, teacher-student relationship and other factors in the learning organization form, greatly expanding the communication between teachers and students. Moreover, O2O also provides new solutions for students’ personalized learning, to a certain extent, it can achieve a form of online and offline learning, which provides support for classroom learning. That is, it is a new hybrid teaching that deeply integrates traditional face-to-face classroom teaching with information technology and effectively combines the rich resources of the Internet with the classroom [3].
Related work
The literature [4] pointed out that the integration of offline business and Internet technology will make the Internet a platform for offline transactions. For example, the traditional retail industry in developed countries such as the United States first applied O2O to e-commerce platforms, which promoted the organic integration of online and offline commodity sales models. Following this model, domestic and foreign scholars have carried out research on the application of O2O in teaching. Nearly 400 colleges and universities in the United States provide online education for students and integrate online courses into traditional courses. For example, Carnegie Mellon University combines online education with traditional classroom learning so that students can preview material on their own through online education platforms, which has greatly promoted classroom learning. The literature [5] applied big data technology to the field of education, conducted a specific inquiry into online education, and analyzed measures for applying data technology in education. The literature [6] studied the application of the O2O teaching model in mobile social networks, used the mobile network to combine learning inside and outside the classroom, and effectively addressed students’ inappropriate use of mobile phones in class, which is a practical application of the O2O teaching model. Relying on the future education platform of Lingnan Teachers College, the literature [7] applied the O2O model to the future education space station to provide a reference for similar education platforms. 
The literature [8] studied the teaching form of O2O interactive communication, which shows that this model can fully combine online and offline learning resources, help students break through the limitations of time and space, and enable students to use fragmented knowledge to achieve the purpose of autonomous knowledge learning. The literature [8] studied O2O in teacher education and teaching, and combined teacher teaching with O2O model to practice, and completed the organic integration of online to offline education.
The literature [9] proposed that the O2O live classroom has the following advantages. First, it is interactive. O2O live classrooms can simulate traditional classroom teaching to restore offline teaching scenarios and achieve two-way communication. Although students and teachers are separated by screens, teacher-student and student-student interaction can take place in a timely manner through voice, text, and similar channels, and students’ questions can get real-time feedback and answers. Second, it is fair. Through live broadcasts, high-quality teaching resources can be seen by people from all walks of life and provided to areas where education resources are relatively weak. The literature [10] took the practice of the O2O live classroom teaching mode as an example to show that O2O can break through geographical restrictions and provide viewers with rich resources and activities. The literature [11] studied more than 150 courses offered by the school for students to watch. The results show that in just half a year, the cumulative number of viewers in the live broadcast classroom exceeded 140,000, and the number of people watching the playback was nearly 70,000. The literature [12] proposed an O2O English teaching model based on MOOC to address the shortcomings of the traditional English teaching model. This model can improve students’ participation and initiative during autonomous learning and help them obtain the best learning effect. The literature [13] proposed that O2O hybrid teaching enables students to learn autonomously online and lets online interaction drive targeted offline lectures. The offline part of the course uses real-life problems and cooperative learning to organize teaching, and offline feedback and review are consolidated with mind maps to connect online and offline learning. 
Using this teaching model can improve students’ participation in learning, enable students to engage actively in independent inquiry and cooperative learning, and thereby improve their thinking ability. The literature [14] proposed that the future space station based on the O2O teaching model is mainly composed of four parts: traditional classroom, telepresence classroom, distance education, and multimedia classroom. The traditional classroom is offline: same time, same place. The telepresence classroom is the typical case of same time, different place, with interaction realized through the network. Distance education is online: different time, different place, and students can learn independently. In addition, multimedia classrooms are open to students at different times in the same place. The four parts interact through online and offline channels. There are several evaluation methods for the O2O teaching model: qualitative evaluation, quantitative evaluation, student self-evaluation, teaching assistant evaluation, and teacher comment. The biggest advantage of evaluation based on this model is that it can make up for the shortcomings of online evaluation and make the evaluation more effective. Moreover, students can deepen their understanding of knowledge throughout the evaluation process and grasp the evaluation indicators more accurately [15]. The article [25] focuses on IoT and its major role in refining human behaviours and efforts. It also deals with the collection of data from various resources connected to the Internet. The literature [26] discusses problems in vehicular communication and proposes a cooperative centralized and distributed spectrum sensing model. With the cooperative cognitive model, interference and various hidden problems are minimized. 
The literature [27] addresses problems such as the massive volume of big data and proposes the concept of SmartBuddy to create an intelligent environment using human behaviours and human dynamics. The literature [28] discusses the construction of a directed acyclic graph for video coding algorithms for motion estimation in parallel reconfigurable computing systems, in which the partitioning algorithm plays a major role in speeding up video processing. The article [29] deals with exploiting IoT and big data analytics using the Hadoop ecosystem in real-time environments. An IoT-based smart city is implemented through the above-mentioned processes [30, 31].
Research on complex background modeling technology
The teaching environment is very complicated, which makes it very difficult to detect students’ emotional intelligence directly. It is therefore necessary to model the complex background and use the background model to detect and track targets, so the modeling technology for complex backgrounds must be studied first.
Complex background modeling can be implemented in two ways: modeling based on gray or color values (the color background model), and modeling based on an approach similar to the Gaussian mixture model (the texture background model).
The color background model generally converts the entire image to grayscale, reducing its RGB color information to simplify the calculation, and then applies a threshold to give each pixel a value of 0 or 1. If the gray value of the current area is significantly different from the background, the area is judged to be foreground; otherwise it is background.
The color background model is not suitable for shadow-sensitive areas, which can easily cause false detections. For specific occasions, filter processing is required.
The Gaussian mixture model averages multiple Gaussian density functions and performs smooth filtering to approximate the density distribution function of any shape. It is widely used in data analysis and pattern recognition.
We set the density function of a single D-dimensional random variable g to satisfy
Among them,
Formula (1) linearly combines the spatial distribution of any dimension, and for each dimension, we can usually call it a cluster. The Gaussian mixture model is to simulate the distribution law of complex samples more intuitively and effectively and describe the data law of the entire model through linear combination.
The definition of GMM is shown in formula (2):
Formula (2) is used to calculate each point in the multi-dimensional space. Using the above formula to repeatedly calculate each point in the space, a local optimal solution can be finally obtained. The Gaussian mixture model has a very wide range of uses and is widely used in power transmission and transformation image acquisition [16].
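For reference, a Gaussian density and a Gaussian mixture are conventionally written as below. This is the textbook form and not necessarily the exact notation of formulas (1) and (2); the symbols K, \pi_k, \mu_k and \Sigma_k are names assumed here for the number of components, the mixing weights, the means and the covariances.

```latex
% Single D-dimensional Gaussian density (textbook form)
\[
\mathcal{N}(x \mid \mu, \Sigma)
  = \frac{1}{(2\pi)^{D/2}\,\lvert\Sigma\rvert^{1/2}}
    \exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)
\]

% Mixture of K Gaussians: a weighted linear combination of such densities
\[
p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0,\quad \sum_{k=1}^{K} \pi_k = 1
\]
```

The constraint on the weights is what keeps the linear combination a valid probability density.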
The background modeling method based on the non-parametric model saves the color value (such as the RGB value or gray value) of each pixel over a certain period of time and calculates the probability of the current sample value according to formula (3). Then, according to the probability value of the pixel, its classification (foreground pixel or background pixel) is determined. It is described mathematically as follows: we denote the color sample values of a pixel at N consecutive discrete time points in the image sequence as x_i, and x_t is the color value at the current time t. Then, the probability density function of this pixel at time t can be expressed by formula (3), where K(·) is the probability density estimation function.
Among them, K(·) can be expressed based on the Gaussian distribution function N(0, Σ), so formula (3) can be changed to formula (4):
Among them, Σ is the covariance matrix and d is the dimension of the sample sequence. When using the RGB model, the dimension d is 3. In order to simplify the matrix calculation while retaining the modeling effect, it can be assumed that the color channels of the RGB model are independent of each other, with σ_i ≠ σ_j. Then, Σ can be expressed as [17]:
After substituting the simplified Σ into formula (4), the density estimation function can be simplified as:
The probability of the color value of each pixel of the current frame is calculated according to formula (6), and the pixel is then classified as foreground or background. Specifically, p_r(x_t) is compared with the selected threshold th. If p_r(x_t) < th, the pixel is classified as a foreground point; otherwise, it is classified as a background point.
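The classification rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the simplified density is an average of per-channel Gaussian kernels centred on the stored samples (the usual consequence of a diagonal Σ); the function names, the sample values, the bandwidths and the threshold are all hypothetical.

```python
import math

def kde_probability(x_t, samples, sigmas):
    """Average of product Gaussian kernels centred on each stored sample x_i.

    Assumes independent colour channels, i.e. a diagonal covariance matrix.
    """
    total = 0.0
    for x_i in samples:
        kernel = 1.0
        for c, c_i, s in zip(x_t, x_i, sigmas):
            kernel *= math.exp(-((c - c_i) ** 2) / (2.0 * s * s)) / math.sqrt(2.0 * math.pi * s * s)
        total += kernel
    return total / len(samples)

def is_foreground(x_t, samples, sigmas, th):
    """A pixel is foreground when its estimated probability falls below th."""
    return kde_probability(x_t, samples, sigmas) < th

# Hypothetical history of one pixel (RGB) and per-channel bandwidths.
samples = [(100, 100, 100), (102, 99, 101), (98, 101, 100)]
sigmas = (5.0, 5.0, 5.0)
th = 1e-6  # illustrative threshold, not a value from the paper

near = is_foreground((101, 100, 100), samples, sigmas, th)  # close to the history
far = is_foreground((200, 30, 60), samples, sigmas, th)     # far from the history
```

A colour close to the stored history receives a high density and is kept as background, while a colour far from every stored sample falls below the threshold.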
The texture-based background model mainly includes the following steps: extract the video sequence images; calculate the integral histogram; and carry out background modeling based on an idea similar to the Gaussian mixture model. The detailed description is as follows:
For each pixel in the scene, a background model is established. This model contains K models. Each model is composed of a histogram description and a weight value. This model can be expressed as formula (7):
m_i is the histogram description of the i-th model, and w_i is the weight value of the i-th model. The histogram description is obtained by taking a square neighborhood of side length L centered on the pixel and calculating the statistical histogram within this neighborhood.
Generally, because the number of background pixels in the detected scene is high, the greater the weight of a model, the greater the probability that it represents a background. Therefore, K models can be sorted according to the weight value, and the model after sorting is:
According to the threshold, the following results are determined:
Among them, T_B is a user-adjustable threshold parameter between 0 and 1 [18].
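The sorting and thresholding step can be illustrated as follows. The sketch assumes the rule commonly used in texture-based background models, namely that the background is represented by the smallest group of top-weighted models whose cumulative weight exceeds T_B; the function name and the example weights are hypothetical.

```python
def select_background_models(models, t_b):
    """Return the top-weighted models whose cumulative weight first exceeds t_b.

    models is a list of (histogram, weight) pairs; the selection rule is an
    assumption, since only the threshold parameter is described in the text.
    """
    ranked = sorted(models, key=lambda m: m[1], reverse=True)
    selected, cumulative = [], 0.0
    for histogram, weight in ranked:
        selected.append((histogram, weight))
        cumulative += weight
        if cumulative > t_b:
            break
    return selected

# Three hypothetical models for one pixel; histograms are stand-in labels.
models = [("h_a", 0.2), ("h_b", 0.5), ("h_c", 0.3)]
background = select_background_models(models, t_b=0.7)
names = [name for name, _ in background]
```

With these weights, the two heaviest models are enough to pass the threshold, so the lightest one is treated as a non-background model.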
The recognition of students’ emotions is based on the effective monitoring of their facial expressions. At present, algorithms based on moving target detection can be roughly divided into three categories: frame difference method, background subtraction method and optical flow method.
The basic principle of the frame difference method is to subtract the gray values of adjacent frames in the image sequence. The difference in the background part is then almost zero, while the difference in regions containing moving objects is large. The calculation expression is:
Among them, I (x, y, t) represents the gray value of the pixel (x, y) at time t, and I (x, y, t - 1) represents the gray value of the pixel (x, y) at time t - 1 [19].
A threshold T is then applied to ΔI_t(x, y), as shown in formula (11) [20]:
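A minimal sketch of the frame difference method follows, assuming the difference is taken as an absolute value and that pixels whose difference exceeds T are marked 1. The function name and the tiny grayscale frames are hypothetical.

```python
def frame_difference_mask(current, previous, threshold):
    """Mark a pixel 1 when |I(x, y, t) - I(x, y, t - 1)| exceeds the threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]

# Two hypothetical 2x3 grayscale frames; only the middle column changes.
previous = [[10, 10, 10], [10, 10, 10]]
current = [[10, 90, 12], [10, 95, 10]]
mask = frame_difference_mask(current, previous, threshold=20)
```

Small fluctuations such as the change from 10 to 12 stay below the threshold and are treated as background.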
Background subtraction is often used for moving target detection. The principle is to compare the current frame with the background model. The key of this algorithm is to determine the background model.
The principle of background subtraction is based on the difference algorithm to completely extract the moving target area. To avoid human influence (camera shake), the usual practice is to use a static scene image as the background image. Considering the change of light in different time periods, the background images under different time and light conditions can be saved, and the motion area of the object can be extracted by comparing the background images of the time periods of different frames during the analysis. This method requires all background images in each season and time point to be accurately captured and saved. However, in an open environment, there are many random changes, and this simple method often does not work.
Therefore, an algorithm for automatically updating the background is proposed: If I (x, y, t) is the image at time t and B (x, y, t) is the background image at time t, then the difference image is [21]:
Threshold processing is performed on it, and the binary image M (x, y, t) of the moving target area is calculated as follows:
The formula for updating the background image is as follows:
In the formula, α is the rate of background update.
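One common way to realize such an update is a running average, sketched below. The blending rule B(t + 1) = αI(t) + (1 − α)B(t) applied to every pixel is an assumption made for illustration and may differ from the paper's update formula; the function name and all values are hypothetical.

```python
def subtract_and_update(frame, background, threshold, alpha):
    """Return the binary motion mask and the updated background image.

    The mask marks pixels whose absolute difference from the background
    exceeds the threshold; the background is blended towards the current
    frame at rate alpha (running-average assumption).
    """
    mask, new_background = [], []
    for f_row, b_row in zip(frame, background):
        mask.append([1 if abs(f - b) > threshold else 0 for f, b in zip(f_row, b_row)])
        new_background.append([alpha * f + (1.0 - alpha) * b for f, b in zip(f_row, b_row)])
    return mask, new_background

background = [[100.0, 100.0]]
frame = [[104.0, 180.0]]
mask, background = subtract_and_update(frame, background, threshold=25, alpha=0.5)
```

A larger α lets the background adapt faster to lighting changes, at the cost of absorbing slow-moving targets into the background sooner.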
Optical flow is the instantaneous velocity field of the movement of space objects on the observation surface. It carries a lot of information, can represent changes in the image at different times, and can be used to determine the motion of the target [22].
Under certain assumptions, scholars such as Horn gave basic methods and formulas for calculating the optical flow field from grayscale images.
We assume that at time t, the grayscale of pixel (x, y) in the image is f(x, y, t). After a period of time Δt, this point moves to a new position (x + Δx, y + Δy), and its grayscale is f(x + Δx, y + Δy, t + Δt). Since these two positions are actually the same point at two different times, it can be assumed that they have the same gray level [23]:
Since the change of gray scale on x, y, t is continuous and smooth, then formula (15) can be expanded with Taylor series:
After the above formula is simplified, we can get [24]:
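Written out, the derivation described above follows the standard optical flow constraint. The symbols u and v are names assumed here for the velocity components, and \varepsilon stands for the higher-order terms of the expansion.

```latex
% Brightness constancy: the same point keeps its gray level
\[
f(x + \Delta x,\; y + \Delta y,\; t + \Delta t) = f(x, y, t)
\]

% First-order Taylor expansion of the left-hand side
\[
f(x + \Delta x,\; y + \Delta y,\; t + \Delta t)
  = f(x, y, t)
  + \frac{\partial f}{\partial x}\,\Delta x
  + \frac{\partial f}{\partial y}\,\Delta y
  + \frac{\partial f}{\partial t}\,\Delta t
  + \varepsilon
\]

% Cancel f(x, y, t), divide by \Delta t and let \Delta t \to 0
\[
\frac{\partial f}{\partial x}\,u + \frac{\partial f}{\partial y}\,v + \frac{\partial f}{\partial t} = 0,
\qquad u = \frac{dx}{dt},\quad v = \frac{dy}{dt}
\]
```

This single equation has two unknowns per pixel, which is why an additional assumption such as smoothness of the flow field is needed to solve for u and v.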
There are usually two algorithms for target tracking: hypothesis-based tracking algorithm and feature-based tracking algorithm.
The first algorithm analyzes the characteristics or particularities of the tracked target and background environment, determines the hypothetical conditions, and establishes a tracking model.
The second algorithm is mainly to extract the feature information of the image. Some features are distinguishable information for the recognition target and are distinguished from other interference items. Therefore, it can be achieved by detecting the features specified by the target in consecutive frames.
Mean Shift is a gradient-based nonparametric estimation method. Traditional parametric density models are usually standard functions, most of which are single-peaked, whereas practical computer vision problems are multivariate and multimodal. Nonparametric density estimation is a good solution to this problem.
For the d-dimensional Euclidean space R^d, x is one of its points, and K_H(x) represents the kernel function of the space. The estimated probability density in R^d is:
Among them:
In formula (19), H is the d × d bandwidth matrix. When we set H = h^2 I, formula (18) can be written as:
The kernel function is:
According to formula (21), the density estimation expression is:
In order to extract the maximum point of density in the data set, when the gradient of formula (22) is calculated, the following result is obtained:
If g(x) = k′(x), the kernel function is defined by formula (20), and G(x) = c_{g,d} g(∥x∥^2), then formula (23) is written as:
Formula (24) contains two items. According to formula (22), we can know:
The second term is denoted by m_{h,G}(x):
When g (x) = 1, formula (26) can be abbreviated as:
Formula (24) changes to:
Therefore:
Given the tolerance error ɛ, the algorithm mainly consists of the following steps: (1) calculate the vector m_{h,G}(x); (2) if ∥m_{h,G}(x) − x∥ < ɛ, the loop ends, otherwise the algorithm continues; (3) set x = m_{h,G}(x) + x and return to step 1.
Finally, the center point of the kernel converges to the point where the estimated density gradient is zero. The moving step size is related to the size of the gradient and the probability density of the points, so the Mean Shift algorithm is an adaptive gradient algorithm.
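The iteration can be sketched in plain Python as follows. This is an illustration under stated assumptions: a Gaussian profile is used for the weights, and the mean shift vector is taken to be the weighted mean of the samples minus the current point, so the loop stops when the length of that vector drops below the tolerance. The function names, the bandwidth h and the sample points are hypothetical.

```python
import math

def mean_shift_vector(x, points, h):
    """Weighted mean of the samples minus x, with weights g(||(x - x_i) / h||^2).

    A Gaussian profile g(s) = exp(-s / 2) is assumed for the weights.
    """
    numerator = [0.0] * len(x)
    denominator = 0.0
    for p in points:
        s = sum((a - b) ** 2 for a, b in zip(x, p)) / (h * h)
        w = math.exp(-0.5 * s)
        denominator += w
        for j, p_j in enumerate(p):
            numerator[j] += w * p_j
    return [n / denominator - x_j for n, x_j in zip(numerator, x)]

def mean_shift(x, points, h, eps=1e-6, max_iter=500):
    """Move x along the mean shift vector until the step is shorter than eps."""
    x = list(x)
    for _ in range(max_iter):
        m = mean_shift_vector(x, points, h)
        if math.sqrt(sum(v * v for v in m)) < eps:
            break
        x = [x_j + m_j for x_j, m_j in zip(x, m)]
    return x

# Two hypothetical clusters; the start point lies near the first one.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.9)]
mode = mean_shift((1.5, 1.5), points, h=0.5)
```

Because the step is the weighted mean minus the current point, it shrinks automatically as the point approaches a density peak, which is the adaptive behaviour described above.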
Fig. 1 shows the O2O MOOC teaching model. The model combines the O2O teaching mode with MOOCs and is established on a network platform.

O2O MOOC teaching model diagram.
The O2O flip classroom teaching model is applied to teaching through a combination of online and offline. The model diagram is shown in Fig. 2.

O2O flip classroom teaching mode.
In addition, the O2O MOOC teaching mode can also be constructed to meet the personalized learning needs of students through the construction of the network platform, as shown in Fig. 3.

O2O MOOC teaching personalized model.
Among the many models, we can summarize the necessary three points of the O2O teaching model: interaction, resources, and platform. First, interaction is the biggest advantage of the O2O teaching model. It is divided into two types: online interaction and offline interaction. Online interaction can be conducted in the form of student discussions, teacher-student discussions, platform message reply, and student online answering questions. Students can learn their own knowledge immediately after learning teaching resources, or they can communicate with teachers online and provide timely feedback. There can also be multiple online communication methods. The first is social software such as QQ and WeChat. After learning, students ask questions online and the teacher answers the students. However, the disadvantage is that the teacher cannot understand the completion of the student’s study. The second is the online platform. Teachers distribute resources on the platform, and students watch and learn, answer questions or leave messages in the discussion room after learning. Moreover, after seeing the reply, the teacher can also see the progress and completion of each student’s learning on the platform management, master the dynamics of each student, and grasp the rhythm of the classroom. The third is that the teacher uploads resources through the local area network. After the students watch, the teacher uploads the test questions to the students to answer, and the teacher checks the students’ mastery through the test questions. The advantage is that the statistical effect is convenient and accurate, and the students are less disturbed by the network. The disadvantage is that the distance between teachers and students is extended, and the test form is compared to the test. These methods have their own advantages and disadvantages, but they can all reflect the online interaction of the O2O teaching model and make up for the shortcomings of micro-classes. 
Offline interaction can play a dual subjective role through design activities, problem solving, group cooperation, teacher and student discussion to solve problems together.
Second, the combination of online and offline is the mainstay of the O2O teaching model. Students’ online self-directed learning meets their individualized learning needs, which is also in line with the cognitive and psychological characteristics of high school students. Moreover, through the learning of teaching resources, students can grasp the speed of learning as a whole, and there will be no situation that students cannot understand because teachers explain too fast or too slow. The teaching resources provided by teachers are mostly in the form of videos. In traditional classrooms, some abstract concepts are not understood by students after teachers explain them, but they can often be understood through animation demonstrations or scenes restored from real phenomena in videos. After online learning, offline activities can be carried out. Offline activities can play the role of teacher’s dominance and student’s subjectivity. Teachers and students discuss and solve problems together, which can play a key role in strengthening online knowledge learning. Offline teacher-student face-to-face lectures make students more focused on the interactive process and content and can also cultivate students’ spirit of collaboration and enterprising, thereby fostering information literacy.
Third, the platform combines online and offline to build a completed O2O teaching model. Therefore, we can sum up the interaction-resource-platform relationship: The platform is the basis of teacher-student interaction; The platform provides support for learning resources; The choice of platforms is diverse; The interaction is divided into online and offline; The interaction is the top priority of the O2O teaching model.
Based on the above comprehensive analysis, this paper proposes a cross-cultural O2O English teaching system based on AI emotion recognition and neural network algorithm. The basic structure is shown in Fig. 4.

Basic structure of cross-cultural O2O English teaching system based on AI emotion recognition and neural network algorithm.
Due to the limitations of many external conditions, such as technology and network conditions, it is difficult for the online live broadcast platform to be widely deployed in practical applications. The learners present a lot of uncertainties and insufficient interpretation of the academic situation. In this process, the teachers only give demonstrations and lectures, and the students are passively accepted, making it difficult to achieve effective communication. Although the live broadcast platform can be interactively set to achieve academic assessment and interaction, it can only affect a small number of learners, which is far from the traditional classroom teaching. This paper designs an implementation path of O2O live classroom mobile teaching based on hybrid learning theory, as shown in Fig. 5.

Mobile teaching implementation path of O2O live classroom based on hybrid learning theory.
The implementation process can be divided into three stages in chronological order. The first stage is the preparation before the live broadcast; since it takes place before the broadcast, it is offline preparation. Teachers or related personnel assign tasks to students. Tasks can be communicated in various forms, such as links and QR codes, and live courses can also be compiled into a book. Moreover, students can provide timely feedback on their preview by scanning the QR code, and the teacher can gain a rough understanding of the students’ preview from the relevant data. The second stage is the live broadcast, which is carried out online. Teachers lecture and students listen online. Because the teachers already know the students’ learning situation before the live broadcast, they can communicate with the students in a targeted way. The third stage is after the live broadcast, that is, offline activities. In a traditional webcast course, the end of the live broadcast means the course is complete, but this is not the case in O2O live classroom teaching. After the live broadcast, the students receive a task again and give feedback on their learning. Feedback can be given online or offline, so that teachers fully understand the teaching effectiveness of the class and have a basis for improving the next live broadcast. The comparison of the scores before the test is shown in Fig. 6, and the comparison of the scores after the test is shown in Fig. 7.

Comparison diagram of the scores before the test.

Comparison table of the scores after the test.
After constructing the cross-cultural O2O English teaching model based on AI emotion recognition and neural network algorithm, the performance of the model is analyzed. A statistical analysis of the pre-test results of the experimental class and the control class is performed. There is no significant difference between the English pre-test scores of the two groups (P = 0.210 > 0.05). It can be seen that before the teaching experiment began, the starting points of the experimental class and the control class were equivalent, with no significant difference in English level, as shown in Table 1 and Fig. 6:
Comparison table of the scores before the test
Next, this study conducts four months of experimental English teaching and tests the teaching effect on the participating students. Moreover, this study uses an independent-sample t test to analyze the post-test results of the experimental class and the control class (see Table 2): t = 3.943, degrees of freedom df = 101, and two-sided P value (Sig.) = 0.000 < 0.005. This shows that the post-test scores of the English O2O teaching group and the traditional English 3P group differ significantly.
Comparison table of the scores after the test
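The independent-sample t test used above can be sketched as follows. This is a minimal illustration assuming the equal-variance (pooled) form of the test; the function name pooled_t and the score lists are hypothetical and are not the study's data. Under that assumption df = n_a + n_b − 2, so a reported df of 101 would correspond to 103 students across the two classes.

```python
import math
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Student's t statistic and degrees of freedom for two independent samples.

    Assumes equal population variances, so the sample variances are pooled.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return t, df

# Made-up scores for illustration only.
experimental = [78, 85, 90, 72, 88, 95]
control = [70, 75, 80, 68, 74, 79]
t_stat, df = pooled_t(experimental, control)
```

The P value is then read from the t distribution with df degrees of freedom, which a statistics package would normally report alongside t.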
Based on the above analysis, it can be known that the system constructed in this article has certain practical effects. Next, the system is tested for performance. The performance test mainly includes the distortion rate of data transmission and the accuracy rate of classroom emotion detection, which are verified by 100 sets of data, respectively. The results obtained are shown in Tables 3, 4, Figs. 8, and 9.
Statistical table of data distortion rate
Statistical table of emotion recognition accuracy rate

Statistical diagram of data distortion rate.

Statistical diagram of emotion recognition accuracy rate.
Through the above statistical analysis, we can see that the system model constructed in this paper has good performance and can be applied to actual teaching.
In order to improve the quality and effect of cross-cultural English teaching and to better design English courses, this article constructs a brand-new English O2O teaching model focusing on the construction of MOOC-based O2O teaching platform for higher vocational English to improve the traditional 3P English teaching model. The environment in which the teaching is located is very complex, and it is very difficult to directly detect the emotional intelligence of students. Therefore, it is necessary to model a complex background and use the background model to detect and track the target. This study uses AI emotion recognition and neural network algorithm to achieve real-time tracking of student status, which is convenient for teachers to adjust online teaching. Moreover, this paper designs an implementation path of O2O live classroom mobile teaching based on hybrid learning theory based on actual needs and constructs the overall system result model. In addition, in this study, the performance analysis of the model is carried out to verify from the perspective of the model’s own performance and the effect of practical teaching. It can be seen from the experimental teaching and system operation test that the system constructed in this paper has a certain effect and can be applied to practical teaching.
