Abstract
Online English teaching systems place certain requirements on intelligent scoring, and the most difficult stage of intelligent scoring in English tests is scoring the English composition with an intelligent model. To improve the intelligence of English composition scoring, this study combines machine learning algorithms with intelligent image recognition technology and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. To verify the feasibility of the proposed algorithm model, its performance is analyzed through designed experiments, with the basic conditions for composition scoring input into the model as constraints. The results show that the proposed algorithm has a certain practical effect and can be applied to English assessment systems and to the homework evaluation systems of online teaching platforms.
Introduction
With the rapid development of computer and Internet technology, people rely more and more on computers in their daily lives and work, and the influence of computers has become ever more extensive. Computers have now penetrated every link of daily teaching work, provide substantial support for teaching in China, have a profound impact on teaching methods, and are widely used in various examinations [1]. In the field of education in China, the examination is an important method for measuring both student learning and teacher teaching quality. Moreover, examinations are extremely valuable for evaluating students' mastery of classroom knowledge and for evaluating teachers' classroom teaching effects in real time [2]. Traditionally, examinations were set and marked manually by the class teachers. This approach requires the teacher to give students detailed key points before the test, which encourages last-minute cramming and does not reflect the true level of the students. At the same time, it hurts other students' enthusiasm for learning. A complete exam includes multiple steps, such as setting the paper, reviewing the paper, printing the papers, binding the answer books, organizing the examination, manual marking, and summarizing score statistics. Organizing examinations in this way wastes a great deal of human and material resources and is very inefficient, so the traditional way of conducting examinations with manually set test papers cannot meet today's needs [3]. In recent years, the Ministry of Finance and the Ministry of Education of China have continued to increase investment in educational resources and have carried out targeted reforms in the education industry. 
In particular, the separation of teaching and examination, the standardization of test setting, and the standardized management of examination rooms have become a popular trend, and schools are actively implementing these policies. Continuous domestic investment in education and the rising level of informatization in the education industry have given schools the foundation to implement information-based management [4]. Therefore, using computer information technology to address the separation of teaching and examination, the standardization of test setting, and the standardized management of examination rooms has become an effective approach. Moreover, information technology greatly reduces the daily workload of teaching staff and substantially improves their work efficiency. The quality of a test paper is determined by the scientific soundness of the test questions, the randomness and rationality of the knowledge points, the distribution of the difficulty coefficients of the test questions, the coverage of the knowledge points in the textbooks, and the ability to differentiate the ability levels of candidates. The traditional way of setting papers depends on the ideas of the teacher who sets them and is therefore highly subjective. It has many restrictions and disadvantages, and it is difficult for it to reflect the ability level of candidates objectively and effectively, which greatly reduces the scientific soundness of the exam. Therefore, to improve the efficiency of the staff who set test papers, free teachers from heavy repetitive work, and make up for the shortcomings of the traditional approach, more intelligent test paper systems have emerged.
This article uses information technology combined with computer hardware to complete the entry and storage of test questions, automatically selects test questions from the question bank through artificial intelligence technology, and automatically assembles test papers based on the requirements of the exam. This method avoids manual intervention and reflects students' real classroom learning ability and teachers' real teaching ability more efficiently, objectively, and accurately. Moreover, it has important and far-reaching significance for improving the ability of educational institutions to monitor teaching quality.
Related work
In China, online distance teaching started late but developed rapidly. Since the 1990s, many of China's national examinations have been administered by computer, such as the computer software professional and technical level examination. On this basis, related fields in China later carried out in-depth analysis and development. The remote examination system constructed in [5] was created against this background. The online examination system constructed in [6] takes users as the starting point. According to the actual needs and investment of system users, it pays attention to the consistency between current technology and actual applications, considers user interface requirements, takes future upgrade compatibility into account, and improves overall performance. The system supports various ways of assembling test papers, such as random assembly and manual assembly. Built on the .NET platform and a B/S (browser/server) architecture, the system balances efficiency and practicability. It has been regarded as the most fully featured and most cost-effective among similar products in the country [7]. The system in [8] uses an open B/S architecture with MS as the operating platform, which improves the compatibility of the system in application. In addition, the system provides a short message prompt function for mobile devices and fingerprint authentication for login, which effectively improves security. The system in [9] allows the composition rules of the test paper and the rules for drawing questions to be configured, is applicable to the different business requirements of various fields, and is widely used. The system in [10] allows the authority to use each module function to be set according to user roles. 
Abroad, there was a great deal of early research on artificial intelligence. The system in [11] used batch processing: the teacher filled in the examination subject, question category, number of questions, and so on, on a form, which was then scanned and entered into the computer. The system could assemble and print test papers automatically. The literature [12] developed a more powerful examination support system dedicated to serving teachers and students in California. It has a large and effective question bank and can automatically assemble test papers and evaluate them online. The emergence of cognitive science and the development of teaching theory have inspired and influenced researchers, and the focus of research has gradually shifted from test papers to the learning process of students. The literature [13] combined the intelligent teaching system GUIDON with the expert system MYCIN to make a major breakthrough in intelligent examination systems. The literature [14] used a production-based knowledge representation method to build student models. The teaching process is a learning process in which teachers interact with students. Because the two systems lack knowledge about interpreting the knowledge base and organizing teaching, their teaching navigation is not strategic. The literature [15] achieved good results in intelligent training by applying cognitive science and ACT to build a cognitive model. Today, the TOEFL test in the United States is a widely used online test, and candidates anywhere in the world with an Internet connection can take part in TOEFL training and exams [16]. In recent years, people have gradually adapted to the online TOEFL test and achieved good results [17]. Purwin Corporation is an exam certification company that provides exam services for many countries [18]. 
This company has become the standard for computer exams, with more than twenty languages and thousands of exam methods, including certificate exams and academic exams in various fields [19]. These examinations are mainly completed through two systems: one is the Prometric technology center system, and the other is the Prometric authorized examination system. Prometric is a global leader in computerized evaluation and certification [20], so it is often regarded as an excellent system by users.
Maximum stable extreme value region (MSER)
MSER is computed on the gray channel of the image. First, a threshold is set, and each pixel value of the grayscale image is compared with it. If the pixel value is greater than the threshold, the pixel is set to 1; otherwise, it is set to 0. This yields a series of connected regions. As the brightness threshold is continuously adjusted, the area of each region increases or decreases. When the change in area between two different thresholds does not exceed a given limit, the region is regarded as a stable extreme value region [21–23].
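The threshold sweep described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in this study: the function names, the 4-connected neighbourhood, the seed pixel, and the `max_change` limit are all assumptions introduced for the example.

```python
from collections import deque


def component_area(gray, seed, threshold):
    """Area of the 4-connected region around `seed` made of pixels > threshold."""
    height, width = len(gray), len(gray[0])
    if gray[seed[0]][seed[1]] <= threshold:
        return 0  # the seed itself is binarized to 0 at this threshold
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < height and 0 <= nc < width
            if inside and (nr, nc) not in seen and gray[nr][nc] > threshold:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


def stable_thresholds(gray, seed, thresholds, max_change=1):
    """Return the area at each threshold and the thresholds where it barely changes."""
    areas = [component_area(gray, seed, t) for t in thresholds]
    stable = [
        thresholds[i]
        for i in range(len(thresholds) - 1)
        if areas[i] > 0 and abs(areas[i + 1] - areas[i]) <= max_change
    ]
    return areas, stable


# Hypothetical 4 x 5 image: a bright blob on a dark background.
example = [
    [10, 10, 10, 10, 10],
    [10, 240, 240, 10, 10],
    [10, 240, 240, 150, 10],
    [10, 10, 10, 10, 10],
]
print(stable_thresholds(example, (1, 1), [0, 50, 100, 200, 250]))
```

In the hypothetical image, the blob keeps almost the same area over a wide range of thresholds, which is the behaviour that marks a region as stable.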
The definition of the maximum stable extreme value area is as follows:
1. The relationship between the coordinates of the digital image and the pixel is defined, and the color image I is regarded as a mapping, as shown in formula (1):
Among them,
Then, the mapping relationship of the gray channel G of the image I is shown as formula (2):
3. The pixel connectivity is defined as shown in formula (4). An area R in the image is regarded as a continuous subset of D:
It can be simply understood that there must be a series of specific adjacent pixels in the region R to form a path connecting these two pixels.
4. The boundary of the region R is defined as shown in formula (5):
The boundary is a set of pixels adjacent to the region R but not belonging to the region R, and the boundary is relative to the region.
5. The extreme value region Q is defined as shown in formula (6):
The extreme value area is a subset of connected pixels in the image, and each pixel value in the extreme value area must not exceed the corresponding boundary pixel value of the area.
6. The maximum stable extreme value region (MSER) is defined.
We assume that there is a sequence of nested extreme value regions $Q_1, \cdots, Q_{i-1}, Q_i, \cdots$ with $Q_i \subset Q_{i+1}$. When $q(i)$ takes its minimum value at $i^*$, the corresponding extreme value region $Q_{i^*}$ is considered to be the maximum stable extreme value region. The calculation of $q(i)$ is shown in formula (7):
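The formulas referenced in the definitions above are not reproduced in this text. A reconstruction that follows the commonly used MSER definitions is given here as an assumption, not as the exact notation of this paper. The adjacency relation $A$ (for example the 4-neighbourhood), the threshold $\theta$, and the step $\Delta$ are symbols introduced for this sketch.

```latex
\begin{align}
I &: D \subset \mathbb{N}^2 \to \{0, \dots, 255\}^3 \tag{1} \\
G &: D \to \{0, \dots, 255\} \tag{2} \\
\forall p, q \in R \;\; &\exists\, p, a_1, a_2, \dots, a_n, q : \;
    p\,A\,a_1,\; a_1\,A\,a_2,\; \dots,\; a_n\,A\,q \tag{4} \\
\partial R &= \{\, p \in D \setminus R : \exists\, q \in R : p\,A\,q \,\} \tag{5} \\
\forall p \in Q,\; &\forall q \in \partial Q : \; G(p) \le \theta < G(q) \tag{6} \\
q(i) &= \frac{\lvert Q_{i+\Delta} \setminus Q_{i-\Delta} \rvert}{\lvert Q_i \rvert} \tag{7}
\end{align}
```

Under this reading, $q(i)$ measures the relative growth in area when the threshold moves by $\Delta$ on either side of level $i$, so a small $q(i)$ means that the region is stable.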
It can be seen from the above steps that the definition of MSER is based on the relationship between pixels. Moreover, the maximum stable extreme value area is defined step by step from the pixel spatial position relationship and pixel value, through pixel adjacency, connectivity, area, and boundary, to the extreme value area. The flow chart of text positioning based on MSER is shown in Fig. 1.

Flow chart of text positioning based on MSER.
Text positioning in natural scenes means using computers, mobile phones, and other terminal devices to locate the text information in natural images intelligently, quickly, and accurately. The MSER region detection operator can extract the connected regions of an image stably in linear time, so it is widely used in the field of text localization. The text positioning method using MSER mainly includes three steps: character candidate region extraction, pseudo-character filtering, and merging characters into text lines. The flow chart is shown in Fig. 1.
First, an image under a natural scene is input, and then MSER is used to extract text candidate regions. MSER first converts the image into a grayscale image to obtain a large number of connected regions and uses these connected regions as character candidate regions. As shown in Fig. 2 (b), in order to better display the results, the extracted text candidate regions are marked in color on the input image. It can be seen from the experimental results that the MSER detection operator can accurately extract character candidate regions containing a large number of characters.

Experimental results of the text positioning.
However, there are many non-text elements in the scene image that are very similar to the text in shape and color. Therefore, while detecting the text information, the MSER connected area operator will also detect a wide range of pseudo-character areas, as shown in Fig. 2 (c). Then, we need to design a series of feature filters to analyze and filter the connected regions of the character candidates, so as to obtain the text region more accurately.
The natural scene text localization method proposed by Neumann filters non-character candidate regions in two stages: 1. A trained Real AdaBoost classifier is used to filter the pseudo-character regions; 2. A trained support vector machine is used to filter out the remaining pseudo-character regions.
1. Pseudo-character filtering based on Real AdaBoost classifier
AdaBoost is an adaptive iterative algorithm proposed by Schapire and Freund, and its core idea is to train repeatedly on the same training set. During training, if a sample is misclassified, AdaBoost increases the weight of that sample and reduces the weight of the correctly classified samples. In this way, the misclassified samples receive more attention in the next round of training because of their increased weight. When combining the weak classifiers, AdaBoost increases the weight of a weak classifier with a small classification error so that it plays a larger role in voting, and reduces the weight of a weak classifier with a large classification error so that it plays a smaller role in voting. Finally, it merges the weak classifiers obtained in each round into the final decision classifier.
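The reweighting scheme described above can be illustrated with a small sketch. It implements the basic discrete AdaBoost with one-dimensional threshold stumps, not the Real AdaBoost variant with decision trees used by Neumann. The function names, the stump form, and the toy data are assumptions made for the example.

```python
import math


def train_adaboost(xs, ys, rounds=3):
    """Discrete AdaBoost on 1-D inputs with threshold stumps; labels are +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n                      # start with uniform sample weights
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    ensemble = []                                # entries: (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        for thr in thresholds:
            for polarity in (1, -1):
                preds = [polarity if x > thr else -polarity for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, thr, polarity, preds)
        err, thr, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # small error -> large voting weight
        ensemble.append((alpha, thr, polarity))
        # Misclassified samples (y * p = -1) gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * (pol if x > thr else -pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = train_adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

On this toy set each individual stump misclassifies at least three samples, while the weighted vote of three stumps classifies all ten correctly.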
Before Real AdaBoost performs pseudo-character filtering, the following feature descriptors must first be computed for each extreme value candidate area:
Area a represents the total number of pixels in each MSER area;
Bounding Box represents the circumscribed rectangle corresponding to each character candidate area;
Perimeter p represents the boundary length of each MSER area. As shown in Fig. 3, the white area within the dotted line represents the MSER area corresponding to the previous threshold, and red indicates the pixels added at the current threshold. Φ (R) represents the boundary length of each area, ψ (P) represents the change in perimeter caused by a pixel P that is added when moving from the previous threshold to the current one, and the computational complexity of ψ (P) is O (1). P1 and P3 each have 2 adjacent pixels in the white area, so ψ (P1) = ψ (P3) = 4 - 2 * 2 = 0. Meanwhile, P2 has 1 adjacent pixel in the white area, so ψ (P2) = 4 - 2 * 1 = 2.
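The constant-time update ψ (P) = 4 - 2 × (number of adjacent region pixels) can be sketched as follows. The set-of-coordinates representation of a region and the function names are assumptions chosen for the illustration.

```python
def perimeter_increment(region, pixel):
    """psi(P): change in perimeter when `pixel` joins `region` (4-neighbourhood)."""
    r, c = pixel
    neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
    touching = sum(1 for n in neighbours if n in region)
    # The new pixel contributes 4 edges; each shared edge removes 2 boundary edges.
    return 4 - 2 * touching


def add_pixel(region, perimeter, pixel):
    """Add `pixel` to `region` in place and return the updated perimeter."""
    perimeter += perimeter_increment(region, pixel)
    region.add(pixel)
    return perimeter


# Build a 2 x 2 square pixel by pixel; its boundary length is 8.
region, perimeter = set(), 0
for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    perimeter = add_pixel(region, perimeter, p)
print(perimeter)
```

Only the four neighbours of the new pixel are inspected, which is why the update costs O (1) regardless of the size of the region.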

Schematic diagram of Perimeter p calculation.
The number of horizontal intersections is ci. A vector of length h (the image height) stores, for each row of pixels, the number of transitions between pixels that belong to the region and pixels that do not. As shown in Fig. 4, the white area within the dotted line represents the MSER area corresponding to the previous threshold, and red indicates the pixels added at the current threshold. For the area R1, there are no pixels in the first and fourth rows, so there are 0 transitions, and the second and third rows have transitions at the left and right boundaries, giving 2 transitions each. Therefore, the horizontal intersection vector corresponding to the R1 region is (0, 2, 2, 0), and the horizontal intersection vector corresponding to the R2 region is (2, 2, 4, 2). After the pixel P1 is added, two transitions are added in the first row and the remaining rows are unchanged, which gives (2, 2, 2, 0). The update is a simple element-wise addition. After the pixel P2 is added, the number of transitions does not change, that is, the increment is (0, 0, 0, 0). After the pixel P3 is added, the areas R1 and R2 become connected, which reduces the transitions in the third row by two, that is, the increment is (0, 0, -2, 0). Therefore, the horizontal intersection vector of area R3 is (4, 4, 4, 2).
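The row-wise transition count and the element-wise merge can be sketched as follows. The binary mask layout and the function names are assumptions for illustration, and the vectors in the usage lines are the ones quoted in the text rather than values read from Fig. 4.

```python
def horizontal_crossings(mask):
    """Count, per row, the transitions between region (1) and background (0).

    Pixels outside the image are treated as background.
    """
    counts = []
    for row in mask:
        padded = [0] + list(row) + [0]
        counts.append(sum(1 for a, b in zip(padded, padded[1:]) if a != b))
    return counts


def add_vectors(*vectors):
    """Element-wise sum, used to merge crossing vectors and per-pixel increments."""
    return [sum(values) for values in zip(*vectors)]


# A 2 x 2 block in a 4 x 4 image gives the vector quoted for R1.
r1_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(horizontal_crossings(r1_mask))

# Merge as in the text: (R1 after adding P1) + R2 + increment of P2 + increment of P3.
r3 = add_vectors([2, 2, 2, 0], [2, 2, 4, 2], [0, 0, 0, 0], [0, 0, -2, 0])
print(r3)
```

Because merging two regions only adds their vectors and applies the increment of the connecting pixel, the feature can be maintained incrementally while the threshold changes.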

Schematic diagram of the calculation of the number of horizontal crossing point ci.
All of the above feature description operators can be used as input features of the classifier. Neumann uses a Real Ada-Boost classifier with a decision tree. The classifier counts the following features corresponding to each region: tightness
2. Pseudo-character filtering based on support vector machine
Support Vector Machine (SVM) is a binary classification learning algorithm developed on the basis of statistical learning theory. Its main idea is to find the optimal classification hyperplane that can successfully separate the two types of samples and has the largest classification interval. We assume that there is now a two-dimensional plane, and there are different data on the plane, which are represented by solid circles and hollow circles, respectively. As shown in Fig. 5, if we want to divide the solid circle and the hollow circle into two types through the solid line in the middle, there will be countless lines to complete this task. In the support vector machine, we look for an optimal dividing line so that it has the largest distance to both sides. In this case, the data points with thickened edges are called support vectors, which is also the source of the name of the classification algorithm.

Schematic diagram of linear separable data set.
The learning method of support vector machine includes two models: linear support vector machine and nonlinear support vector machine.
1) Linear separable support vector machine
It assumes that a linear function can completely separate the two types of samples, as shown in Fig. 6. By maximizing the interval, we obtain the separating hyperplane w · x + b = 0 and the corresponding classification decision function f (x) = sign (w · x + b), where w is the weight coefficient vector of the features, x is the feature vector of the sample, and b is the offset.

Schematic diagram of maximum interval.
The learning strategy of the support vector machine is to obtain the unique separating hyperplane by maximizing the interval. The interval in the maximum interval refers to the geometric interval. The geometric interval of hyperplane (w, b) with respect to sample point $(x_i, y_i)$ is:
The minimum distance from all sample points to the hyperplane obtained by the support vector machine is:
At this time, the separation hyperplane problem corresponding to the maximum interval can be expressed as an optimization problem with constraints:
Since scaling the coefficients w and b by the same factor does not affect the optimization problem, the functional interval can be fixed to 1, and maximizing $1/\|w\|$ is equivalent to minimizing $\frac{1}{2}\|w\|^2$. By further simplifying formula (10), the following optimization problem of linear support vector machine learning is obtained:
Each constraint is multiplied by a Lagrange multiplier $\alpha_i \ge 0$, $i = 1, 2, \cdots, N$, and brought into formula (11). In this way, the problem is transformed into a dual problem, as shown in formula (12):
Through simplified formula (12), we can obtain:
Taking formula (13) into formula (12), we can obtain:
Then, the corresponding extreme value problem is the dual problem:
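Formulas (8) to (15) are not reproduced in this text. Assuming that they follow the standard derivation of the linearly separable support vector machine, which matches the steps described above, they can be written as follows. The symbols $\gamma_i$ and $\gamma$ for the geometric interval are introduced for this sketch.

```latex
\begin{align}
\gamma_i &= y_i \left( \frac{w}{\|w\|} \cdot x_i + \frac{b}{\|w\|} \right) \tag{8} \\
\gamma &= \min_{i = 1, \dots, N} \gamma_i \tag{9} \\
\max_{w, b} \; \gamma \quad &\text{s.t.} \quad
    y_i \left( \frac{w}{\|w\|} \cdot x_i + \frac{b}{\|w\|} \right) \ge \gamma,
    \quad i = 1, \dots, N \tag{10} \\
\min_{w, b} \; \frac{1}{2} \|w\|^2 \quad &\text{s.t.} \quad
    y_i (w \cdot x_i + b) - 1 \ge 0, \quad i = 1, \dots, N \tag{11} \\
L(w, b, \alpha) &= \frac{1}{2} \|w\|^2
    - \sum_{i=1}^{N} \alpha_i y_i (w \cdot x_i + b) + \sum_{i=1}^{N} \alpha_i \tag{12} \\
w &= \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0 \tag{13} \\
\min_{w, b} L(w, b, \alpha) &= -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^{N} \alpha_i \tag{14} \\
\max_{\alpha} \; -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) &+ \sum_{i=1}^{N} \alpha_i
    \quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \;\; \alpha_i \ge 0 \tag{15}
\end{align}
```

Formula (13) comes from setting the partial derivatives of $L$ with respect to $w$ and $b$ to zero, and substituting it back into (12) removes $w$ and $b$ and leaves an expression in $\alpha$ only.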
2) Nonlinear support vector machine
The data set used for training is generally non-linearly separable under actual conditions, as shown in Fig. 7:

Schematic diagram of nonlinear separable data set.
Nonlinear support vector machines are mainly used to solve nonlinear classification problems. They map the data from the low-dimensional space to a high-dimensional space through the inner product kernel function K (x, z) = φ (x) · φ (z), so that a linear classification can be carried out after the nonlinear transformation without increasing the computational complexity. The schematic diagram of the kernel function transformation is shown in Fig. 8.

Schematic diagram of kernel function transformation.
The kernel function can simplify the inner product operation in the mapping space and the selection of the kernel function is very important for the nonlinear support vector machine. The commonly used kernel functions are as follows:
(a) Polynomial kernel function:
(b) Radial basis kernel function:
(c) sigmoid kernel function:
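The three kernels listed above are commonly written as K(x, z) = (x · z + c)^d, K(x, z) = exp(−‖x − z‖² / (2σ²)), and K(x, z) = tanh(γ x · z + c). The sketch below uses these standard forms. The parameter names and default values are assumptions and may differ from the formulas of this paper.

```python
import math


def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """K(x, z) = (x . z + coef0) ** degree"""
    return (dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    """K(x, z) = tanh(gamma * x . z + coef0)"""
    return math.tanh(gamma * dot(x, z) + coef0)


def phi_quadratic(x):
    """Explicit feature map for the 2-D kernel (x . z) ** 2."""
    return [x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2]


x, z = [1.0, 2.0], [3.0, 4.0]
# The kernel value equals the inner product in the mapped space,
# but it is computed without ever building the mapped vectors.
print(polynomial_kernel(x, z, degree=2, coef0=0.0))
print(dot(phi_quadratic(x), phi_quadratic(z)))
```

The last two lines illustrate why the kernel keeps the computational cost low: the inner product in the three-dimensional mapped space is obtained from a single inner product in the original two-dimensional space.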
Through the kernel function, the classification problem of nonlinear support vector machine is transformed into the corresponding dual problem, as shown in formula (20):
The classification decision function is shown in formula (21):
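Formulas (20) and (21) are not reproduced in this text. Assuming the standard kernelized form of the dual problem, with the inner product $x_i \cdot x_j$ replaced by $K(x_i, x_j)$, they would read as follows. The optimal values $\alpha_i^*$ and $b^*$ are notation introduced for this sketch.

```latex
\begin{align}
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i
    - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j y_i y_j K(x_i, x_j)
    \quad &\text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \;\; \alpha_i \ge 0 \tag{20} \\
f(x) &= \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i^* y_i K(x, x_i) + b^* \right) \tag{21}
\end{align}
```

Only the samples with $\alpha_i^* > 0$, the support vectors, contribute to the sum in the decision function.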
For the character candidate regions that pass the first-stage classifier, Neumann further uses an SVM classifier with an RBF kernel to filter out the pseudo-character regions. The classifier uses all the features calculated in the first stage and adds the following features: the hole area ratio ah/a, where ah represents the number of pixels in the hole area; the convex hull ratio ac/a, where ac represents the area of the convex hull of the region; and the number of outer boundary inflexion points k, which represents the number of changes between concave and convex angles between pixels on the boundary of the region. A character usually has a limited number of inflexion points (k < 10), whereas non-character content (such as a sketch or pictogram) has many peaks and therefore more inflexion points.
The experimental results show that after the two-step classifier screening, many pseudo-character candidate regions can be eliminated, thereby accurately obtaining the text region. Finally, the text line is constructed according to the relationship between the candidate text characters.
Before using a convolutional neural network for detection and recognition, a large amount of data is required to train the neural network. Only through continuous learning and parameter adjustment can the convolutional neural network make accurate judgment and analysis of the new input. The training process of the convolutional neural network is similar to the training process of the traditional artificial neural network. They are all based on the gradient descent principle, but there are some differences in some calculation formulas. The complete training process of the convolutional neural network is as follows:
1. All filters are initialized, that is, the weights of each neuron are initialized, and parameters and weights are set using random values;
2. A sample X is taken from the training set and input into the network, and its target output vector D is given (the training set is usually an image library);
3. The output Y of the network is calculated sequentially from front to back. The calculation method for the different layers is as follows:
1) For convolutional layers, the following formula is used:
Among them,
2) For the sampling layer, the calculation formula is:
Among them, $B_l$ represents the trainable parameter of the $l$th layer.
3) For the fully connected layer, it can be directly calculated using the method of the multilayer artificial neural network. The formula is as follows:
Among them, $w_{ij}$ represents the weight of neuron i connected to neuron j.
4. The error terms of each layer are sequentially calculated in reverse;
1) Output layer error: If there are M nodes in the output layer, the error term for node k in the output layer is:
In the formula, $d_k$ is the target output of node k, and $y_k$ is the predicted output of node k;
2) The error of the middle fully connected layer: If the current layer is the $l$th layer with a total of L nodes and the $(l+1)$th layer has a total of M nodes, the error term for node j of the $l$th layer is:
In the formula, $h_j$ is the output of node j, and $W_{jk}$ is the weight from node j of layer $l$ to node k of layer $l+1$;
5. The adjustment amount of each weight is calculated in order from the back to the front. For the output layer and the intermediate layer, the calculation formula of the weight change is the same. The amount of change in the weight vector of the k-th input of node j in the nth iteration is:
Among them, N is the current number of input variables.
6. The weights are adjusted, and the updated weights are:
The updated threshold is:
Steps 2 to 6 above are repeated until the error function is less than the set threshold. The error function is:
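The six training steps can be sketched on a tiny fully connected network. This is an illustration under stated assumptions, not the convolutional network of this study: it uses sigmoid activations, a squared error E = ½ Σ (d_k − y_k)², plain gradient descent without a momentum term, and hypothetical toy data. All names and hyperparameters are made up for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, n_hidden=3, lr=0.5, max_epochs=5000, tol=0.01, seed=0):
    """Train a one-hidden-layer network; return the summed error of every epoch."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 1: initialize weights and thresholds (biases) with random values.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = rng.uniform(-0.5, 0.5)
    history = []
    for _ in range(max_epochs):
        total_error = 0.0
        for x, d in samples:  # Step 2: take a sample X and its target output D.
            # Step 3: compute the output from front to back.
            h = [sigmoid(sum(w * xi for w, xi in zip(w_h[j], x)) + b_h[j])
                 for j in range(n_hidden)]
            y = sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + b_o)
            total_error += 0.5 * (d - y) ** 2
            # Step 4: compute the error terms from back to front.
            delta_o = (d - y) * y * (1.0 - y)
            delta_h = [h[j] * (1.0 - h[j]) * delta_o * w_o[j] for j in range(n_hidden)]
            # Steps 5 and 6: compute the adjustments and update weights and thresholds.
            for j in range(n_hidden):
                w_o[j] += lr * delta_o * h[j]
                for i in range(n_in):
                    w_h[j][i] += lr * delta_h[j] * x[i]
                b_h[j] += lr * delta_h[j]
            b_o += lr * delta_o
        history.append(total_error)
        if total_error < tol:  # stop once the error falls below the set threshold
            break
    return history


# Hypothetical toy task: learn the logical OR of two inputs.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
errors = train(data)
print(len(errors), errors[0], errors[-1])
```

In a convolutional network the same loop applies. Only the forward computation of step 3 and the error propagation of step 4 change for the convolutional and sampling layers.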
To verify the feasibility of the algorithm model proposed in this paper, its performance is analyzed through designed experiments. The basic conditions for composition scoring are input into the model as constraints, and the model is then evaluated in terms of scoring accuracy and scoring speed.
First, the model was evaluated in terms of the accuracy of English composition scoring. The total score of each composition is 25 points, and there are 40 compositions in total, all of which have been manually marked by qualified teachers. The compositions were scored by the model of this study according to the requirements, and the results were compared with the manual scores. The results are shown in Table 1 and Fig. 9.
Statistics table of accuracy rate of English scoring model
Table 1 and Fig. 9 show that the model's English composition scores are close to the manual scores, with a maximum error of no more than 20%. This indicates that the model proposed in this study is reliable in the accuracy of English composition recognition and can be applied to actual English composition scoring.

Statistics diagram of accuracy rate of English scoring model.
On the basis of ensuring the accuracy rate of model scoring, the model scoring speed was evaluated, and the scoring speed of the model proposed in this study was compared with the neural network model. The results are shown in Table 2 and Fig. 10.
Statistic table of scoring speed of English scoring model
Table 2 and Fig. 10 show that the model of this study is much faster than the neural network model. The model of this study can generally score an essay within 5 s, whereas the neural network model takes longer: its fastest time is 10 s and its slowest is nearly 20 s.

Statistic diagram of scoring speed of English scoring model.
The performance analysis shows that the model proposed in this study performs well in both the accuracy and the speed of English composition scoring. Therefore, it can serve as the algorithm model for the English assessment and homework evaluation components of an online teaching system.
Natural scene text localization and recognition will remain an important research direction in computer vision and artificial intelligence, and faster and more robust text localization algorithms will continue to emerge. Moreover, the commercial value of this direction will continue to be explored to better serve people's everyday lives. To meet the needs of English composition scoring, this study proposes a character candidate region extraction algorithm based on improved MSER. The MSER region detection operator is very robust and can extract low-quality character regions. Experimental verification shows that the improved algorithm detects more text regions and improves the recall rate of text positioning. This study also proposes a pseudo-character area filtering algorithm based on convolutional neural networks. Natural scene images contain many text-like scene elements, and similar problems are encountered in English scoring, so this study takes measures to overcome this problem. To verify the feasibility of the proposed algorithm model, its performance is analyzed through designed experiments. The basic conditions of composition scoring are input into the model as constraints, and the model is then evaluated in terms of scoring accuracy and scoring speed. The performance analysis shows that the model performs well in both respects. Therefore, it can be used as an algorithm model for English assessment in online teaching systems.
Acknowledgment
This paper was supported by (1) Key Project of Humanities and Social Sciences Research in Colleges and Universities 2019 of Hebei Education Department “The Cooperative Trinity Mechanism of PCK Dynamization for English Major Students in Normal Universities” (Number: SD192023); (2) Hebei Province Quality Open Online Course 2019 – “Oral English” of Hebei Education Department.
