Abstract
English reading plays an important role in improving spoken English and overall English ability. At present, the traditional online reading mode is not very effective. To address the shortcomings of traditional education, this article builds the system on artificial intelligence algorithms combined with a speech spectrum algorithm. Moreover, based on actual needs, this article puts forward an endpoint detection and judgment criterion based on spectral entropy information, establishes a mathematical model of knowledge forgetting, and obtains an intelligent memory algorithm to guide students in personalized learning. To verify the effect of the model, this article takes the students in an experimental class and a control class as the experimental subjects and compares their spoken pronunciation and comprehensive English scores after the experiment. The results show that the artificial intelligence-based English multimodal online reading mode platform constructed in this article is effective to a certain extent and can improve students’ English scores.
Introduction
In the era of artificial intelligence, English reading needs to be combined with artificial intelligence technology to build a teaching model that effectively promotes students’ reading ability. From the perspective of reading teaching, the artificial intelligence teaching model has played an important role in English teaching. From the perspective of actual teaching effect, applying artificial intelligence to the online English reading teaching model works well. However, at present, this model lacks standardization, and its results depend to some extent on chance. Therefore, it is necessary to construct the English online reading model based on the actual situation [1].
English reading ability is an important criterion for judging students’ English language ability, and the teaching of English reading is also the focus of English teachers’ education and teaching work. However, with the continuous improvement of students’ qualities and abilities and the renewal and development of education and teaching concepts, traditional teaching methods have been impacted and the drawbacks of traditional teaching methods are increasingly exposed [2]. At the same time, based on different understandings of English reading teaching, various distinctive and unique reading theories continue to emerge, and a variety of English reading teaching methods have emerged. These theories and teaching methods have had a very important influence on English reading teaching in China, such as constructivist theory, input-output theory, schema theory and interactive teaching methods, task-based teaching methods, situational teaching methods, 3P teaching methods and PWP teaching methods, etc. These teaching methods have their own advantages and disadvantages and play an important role in teaching. For example, the 3P teaching method is widely used in teaching, but it makes the entire teaching process appear to be programmed and mechanized to a certain extent [3].
Related work
At present, AI technology is still developing rapidly in the United States, Europe and Japan [4]. IBM has already manufactured the ASCI White computer with one-thousandth of the human brain’s intellectual ability and is also developing Blue Gene with an intelligence level equivalent to that of the human brain. Microsoft President Bill Gates gave a keynote speech [5] at the AI International Conference held in Washington, USA, and disclosed that Microsoft is currently working on AI basic technology and application technology. Its research objects include self-determination, expression of knowledge and information, information retrieval, machine learning, data collection, natural language, speech or handwriting recognition, etc. The MIT AI Lab conducts a project code-named Cog, which is intended to give robots human behavior. One part of this project is to make the robot capture eye movements and facial expressions, another is to make the robot grasp things passing by it, and a third is to make the robot learn to listen to the rhythm of music and play it on a drum. Liszt’s Starlab has created an artificial cat brain with 75 million artificial nerve cells. From current research, it can be predicted that artificial intelligence may develop in the following directions: fuzzy processing, parallelization, neural networks, and machine emotions [6]. Research on artificial intelligence in China started late. At present, many universities and research institutions are carrying out research and teaching in artificial intelligence, and many research results have been achieved in expert systems, pattern recognition, Chinese understanding, theorem proving, assisted design, assisted teaching, intelligent control, intelligent management, robotics, office automation, etc.
In the study of theoretical methods, Wu’s method, generalized intelligent information system theory, information knowledge intelligent conversion theory, total information theory, pan-logic theory, etc. were proposed [7]. In terms of technology development, we have developed technologies and products with Chinese characteristics such as Chinese medicine expert system, agricultural expert system, Chinese character recognition system, Chinese-English recognition system, and Chinese-English machine translation system [8]. Some experts and scholars have formed the ability to independently study major scientific frontiers in the field of artificial intelligence and have been at the forefront of the world in some important research fields. The robot football competition system constructed in the literature [9] won the championship in large international competitions year after year. The large-scale data mining and information retrieval systems developed in the literature [10] have received extensive attention from the international academic community. The “Emotional Adaptation Model for Knowledge Expression” proposed in [11] created a new method of “information modeling”. In this method, the candidate model is provided by the computer, and emotion selection and human-computer cooperation are performed by humans, which can effectively establish a satisfactory information model through learning in complex situations. “High-dimensional geometry and neural network” in the literature [12] created a new method to describe and design artificial neural network with geometry. In terms of theoretical research on artificial intelligence, the “information-knowledge-intelligence conversion theory” proposed in the literature [13] creates an information science methodology and an information conversion mechanism that extracts knowledge from information and creates intelligence from knowledge. The literature [14] created universal logics.
The basic principle of the spectral subtraction noise reduction method
The spectral subtraction noise reduction method, referred to as spectral subtraction, subtracts the noise power spectrum from the power spectrum of the noisy sound disturbance signal, so as to obtain a purer sound disturbance spectrum. This method assumes that the sound disturbance signal is a stationary signal, and that the noise and the signal are additive and independent of each other. The noisy signal can then be expressed as [15]:
$$y(n) = s(n) + d(n) \tag{1}$$
In the formula, $s(n)$ is the pure signal and $d(n)$ is stationary additive noise. Spectral subtraction assumes that the signal is stationary, but a sound signal is only short-term stationary: the sound disturbance signal as a whole is not stationary, and only each frame obtained after framing can be regarded as stationary. Therefore, in practical applications, $y(n)$ denotes the signal of a single frame. If the Fourier coefficient of $y(n)$ is $Y_k = |Y_k| \exp(j\theta_k)$, the Fourier coefficient of $s(n)$ is $S_k = |S_k| \exp(j\alpha_k)$, and the Fourier coefficient of $d(n)$ is $N_k$, then $Y_k = S_k + N_k$. The task of spectral subtraction is to use the known noise power spectrum information to estimate $S_k$ from $Y_k$. Since the human ear is not sensitive to phase, it is enough to estimate $|S_k|$ and borrow the phase of the noisy sound signal; the inverse Fourier transform then yields the enhanced sound disturbance signal [16–18].
After the signal undergoes FFT, there is $Y_k = S_k + N_k$, from which we can get [17]:
$$|Y_k|^2 = |S_k|^2 + |N_k|^2 + S_k N_k^* + S_k^* N_k \tag{2}$$
In the formula, * represents conjugation. Because the noise is assumed to be independent of and uncorrelated with the signal, that is, $s(n)$ and $d(n)$ are independent, the statistical mean of the cross spectrum is 0. Moreover, $N_k$ follows a Gaussian distribution with zero mean, so there is
$$E\left[|Y_k|^2\right] = E\left[|S_k|^2\right] + E\left[|N_k|^2\right] \tag{3}$$
As long as $|N_k|^2$ is subtracted from $|Y_k|^2$, $|S_k|^2$ can be restored. Restoring only the magnitude is sufficient because the human ear is insensitive to the phase of sound. Because the noise is locally stationary, the noise power spectrum before the signal can be considered the same as the noise power spectrum during the signal, so the “silent frame” before the signal can be used to estimate the noise. For a short-term stationary process within a frame, there is [19]:
$$|Y_k|^2 = |S_k|^2 + \lambda(k) \tag{4}$$
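Among them, $\lambda(k)$ is the statistical average value of $|N_k|^2$ when there is no signal, that is, $\lambda(k) = E\left[|N_k|^2\right]$, from which the estimated value of the original signal is:
$$|\hat{S}_k| = \left[\,|Y_k|^2 - \lambda(k)\,\right]^{1/2} \tag{5}$$
Among them, $|\hat{S}_k|$ is the estimated magnitude of $S_k$.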
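Since $|Y_k|^2 < \lambda(k)$ may occur in the calculation, and the power spectrum cannot be negative, formula (5) is modified as [20]:
$$|\hat{S}_k| = \begin{cases} \left[\,|Y_k|^2 - \lambda(k)\,\right]^{1/2}, & |Y_k|^2 \ge \lambda(k) \\ \varepsilon, & |Y_k|^2 < \lambda(k) \end{cases} \tag{6}$$
Here $\varepsilon$ is a number greater than 0, which is determined by experiment.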
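In addition, spectral subtraction has an inherent physical meaning. With $|\hat{S}_k|$ denoting the estimated magnitude of the pure signal, we define a gain function $G_k$ through:
$$|\hat{S}_k| = G_k\,|Y_k| \tag{7}$$
An a posteriori signal-to-noise ratio is then defined:
$$\gamma_k = \frac{|Y_k|^2}{\lambda(k)} \tag{8}$$
Then formula (5) can be changed to:
$$|\hat{S}_k| = G_k\,|Y_k|, \qquad G_k = \sqrt{1 - \frac{1}{\gamma_k}} \tag{9}$$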
The physical meaning of spectral subtraction can be seen from formula (7): it is equivalent to multiplying the noisy signal by a coefficient $G_k$. It can be seen from formula (9) that when the signal-to-noise ratio is high, $G_k$ is large; that is, the frame is likely to contain a sound disturbance signal, and the attenuation is small. Conversely, when the signal-to-noise ratio is low, $G_k$ is small; the frame is considered unlikely to contain a sound disturbance signal, and the attenuation increases. In practical applications, frames with high signal power are considered more likely to contain sound disturbance signals, and frames with low signal power are considered less likely to contain them [21].
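The relationship between the gain and the signal-to-noise ratio can be checked numerically. The following is a minimal sketch, assuming the gain takes the form $G_k = \sqrt{1 - 1/\gamma_k}$ with $\gamma_k = |Y_k|^2 / \lambda(k)$, which follows from writing the basic estimate $[|Y_k|^2 - \lambda(k)]^{1/2}$ as a multiple of $|Y_k|$. The function name `spectral_gain` and the `floor` parameter are illustrative assumptions, not part of the original system.

```python
import numpy as np

def spectral_gain(noisy_power, noise_power, floor=1e-3):
    """Gain G_k = sqrt(1 - 1/gamma_k), with gamma_k = |Y_k|^2 / lambda(k).

    `floor` is a hypothetical lower bound on the gain, used where
    |Y_k|^2 < lambda(k) would make the radicand negative.
    `noise_power` is assumed to be strictly positive.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    gamma = noisy_power / noise_power            # a posteriori SNR per bin
    gain_sq = 1.0 - 1.0 / gamma                  # squared gain, may be negative
    return np.sqrt(np.maximum(gain_sq, floor ** 2))

# Three bins with the same noise power but decreasing noisy power.
noise = np.array([1.0, 1.0, 1.0])
noisy = np.array([100.0, 2.0, 0.5])
gains = spectral_gain(noisy, noise)
print(gains)
```

In this sketch the bin with the highest signal-to-noise ratio is barely attenuated, while the bin whose power is below the noise estimate is reduced to the floor value.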
The program realization flow chart of spectrum subtraction is shown in Fig. 2:

Fig. 1. Block diagram of the spectral subtraction noise reduction algorithm.

Fig. 2. Flow chart of spectrum subtraction program.
First, the sound disturbance signal is read in at a given sampling frequency, generally 10 kHz. Next, the signal is divided into frames. Generally, the length of each frame is 256 points. To prevent the truncation effect between frames from making the recovered signal discontinuous, adjacent frames must overlap, and the number of overlapping points is generally 128. Framing is often achieved by applying a sliding Hamming window. Signal processing is then performed frame by frame. First, each frame is transformed by FFT, and the amplitude and phase are separated. The power of each frame is calculated from the amplitude and processed according to the chosen spectral subtraction algorithm. Finally, the processed amplitude of each frame is combined with the separated phase and transformed by IFFT, thereby obtaining the processed output signal [22–24].
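The processing flow described above can be sketched as follows. This is a minimal illustration rather than the program used in the paper: it assumes NumPy, estimates the noise power from the first few frames (taken to be silent), uses the floor $\varepsilon$ for negative differences, and rebuilds the output by overlap-add. The function name and the parameters `noise_frames` and `eps` are assumptions made for this example.

```python
import numpy as np

def spectral_subtraction(y, noise_frames=5, frame_len=256, hop=128, eps=1e-3):
    """Basic power spectral subtraction, frame by frame."""
    y = np.asarray(y, dtype=float)
    window = np.hamming(frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    starts = np.arange(n_frames) * hop

    # Framing with 50% overlap and a Hamming window.
    frames = np.stack([y[s:s + frame_len] * window for s in starts])

    # FFT of each frame, then separate amplitude and phase.
    spectra = np.fft.rfft(frames, axis=1)
    magnitude, phase = np.abs(spectra), np.angle(spectra)
    power = magnitude ** 2

    # Assumption: the leading frames contain noise only ("silent frames").
    lam = power[:noise_frames].mean(axis=0)

    # Subtract the noise power; use eps where the result would be negative.
    diff = power - lam
    clean_mag = np.where(diff > 0, np.sqrt(np.maximum(diff, 0.0)), eps)

    # Recombine with the noisy phase and transform back.
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)

    # Overlap-add, normalised by the summed window.
    out = np.zeros(len(y))
    norm = np.zeros(len(y))
    for s, frame in zip(starts, clean):
        out[s:s + frame_len] += frame
        norm[s:s + frame_len] += window
    return out / np.maximum(norm, 1e-8)

# Synthetic test signal: 0.2 s of noise only, then a 440 Hz tone in noise.
rng = np.random.default_rng(0)
fs = 10_000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
clean[:2000] = 0.0
noisy = clean + 0.3 * rng.standard_normal(fs)
enhanced = spectral_subtraction(noisy)
```

The synthetic signal is only a stand-in for recorded speech; with real recordings the number of leading silent frames would have to be chosen from the data.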
In ordinary spectral subtraction, $\lambda(k)$ in formula (5) replaces the noise spectrum of the current analysis frame with the statistical average noise variance, which implicitly assumes that the noise power is equal on every signal frame. In fact, the frame power spectrum of noise varies randomly over a wide range, and the ratio of its maximum and minimum values in the frequency domain often reaches several orders of magnitude. Therefore, if the same noise power is subtracted from every signal frame, some frames will retain large power spectrum components. These appear as random peaks in the spectrum and are heard as residual noise. This kind of noise has a certain sense of rhythmic fluctuation, so it is called “musical noise”.
To effectively reduce musical noise, spectral subtraction can be improved. Subtracting $n \times \lambda(k)$ and dynamically adjusting n according to the characteristics of each signal frame can better highlight the power spectrum of the signal. This improvement is referred to as weighting of the subtracted term.
At the same time, the power spectrum calculations $|\cdot|^2$ and $|\cdot|^{1/2}$ in formula (5) are changed to $|\cdot|^m$ and $|\cdot|^{1/m}$ (where m is not necessarily an integer), which increases flexibility. This method is called power spectrum correction processing.
At this time, formula (5) is amended as:
The introduction of the two parameters m and n provides great flexibility for calculation. When m = 2 and n = 1, it becomes the basic spectral subtraction. Experiments show that proper adjustment of m and n can obtain a better enhancement effect than traditional spectral subtraction. Therefore, in the actual enhancement process, the improved form of spectral subtraction is used more often.
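The effect of the two parameters can be illustrated with a short sketch. The exact amended formula is not reproduced here, so the form below is an assumption: the magnitude is raised to the power m, n times a noise term is subtracted, and the result is raised to 1/m, with the floor $\varepsilon$ used when the difference is not positive. The function name and the argument `noise_term` (the noise estimate expressed in the same power domain) are hypothetical.

```python
import numpy as np

def generalized_subtraction(noisy_mag, noise_term, m=2.0, n=1.0, eps=1e-3):
    """Assumed form: |S_k| = (|Y_k|^m - n * noise_term)^(1/m), floored at eps."""
    noisy_mag = np.asarray(noisy_mag, dtype=float)
    noise_term = np.asarray(noise_term, dtype=float)
    diff = noisy_mag ** m - n * noise_term
    # np.maximum keeps the root real; np.where applies the floor afterwards.
    return np.where(diff > 0, np.maximum(diff, 0.0) ** (1.0 / m), eps)

noisy_mag = np.array([5.0, 1.0])
lam = np.array([9.0, 4.0])            # noise power per bin

basic = generalized_subtraction(noisy_mag, lam, m=2.0, n=1.0)
over_subtracted = generalized_subtraction(noisy_mag, lam, m=2.0, n=2.0)
print(basic, over_subtracted)
```

With m = 2 and n = 1 the sketch reduces to the basic spectral subtraction of formula (5); increasing n removes more of the noise estimate from the same bin.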
The noise spectrum is estimated dynamically by comparing frames. In the implementation, the comparison proceeds frame by frame: the spectrum value of each frame is compared with the current minimum spectrum value, and the smaller value at each corresponding point is kept as the new minimum spectrum value, and so on. The resulting minimum spectrum value is taken as the noise power spectrum.
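The frame-by-frame minimum tracking described above can be sketched as follows. This is an illustrative version only: it assumes the per-frame power spectra are stacked in a two-dimensional NumPy array, and it initializes the running minimum with the first frame, which is a choice made for this example.

```python
import numpy as np

def min_tracking_noise(power_frames):
    """Keep, for every frequency point, the smallest power seen in any frame."""
    power_frames = np.asarray(power_frames, dtype=float)
    noise = power_frames[0].copy()          # assumption: start from frame 0
    for frame in power_frames[1:]:
        noise = np.minimum(noise, frame)    # point-wise comparison
    return noise

# Three frames, three frequency points each.
frames = np.array([
    [4.0, 9.0, 1.0],
    [2.0, 10.0, 3.0],
    [5.0, 7.0, 2.0],
])
noise_estimate = min_tracking_noise(frames)
print(noise_estimate)  # [2. 7. 1.]
```

The explicit loop mirrors the frame-by-frame description; for stored data the same result is given by `power_frames.min(axis=0)`.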
The principle block diagram of the improved spectral subtraction algorithm proposed in the paper is as follows:
The calculation formula and steps are as follows:
1). The initial coefficient values are set, and the initial maximum and minimum power values, max and min, are set to 0.
2). The loop is started and runs from the first frame to the last frame. The accumulated power p over the points of each frame is compared with max, and the larger value is assigned to max.
3). The power of each point in each frame is compared with the corresponding point in min, and the smaller value is kept as the noise power spectrum value $\lambda_k$.
4). m and n are calculated as:
Among them, q is the correction coefficient, generally about 1; here it is taken as 1.3.
5). The estimated value of each frame signal is calculated according to formula (10).
In practice, it is very difficult to measure the power value of a frame’s noise spectrum. Even under experimental conditions, the difference between frames is very large. Therefore, the formula selects the reserved frame spectrum high power value as the base and compares it with the spectrum power of each frame. This ratio is the increase of the m0 coefficient value. When the frame spectrum power is large, the m0 value becomes large, and when the frame spectrum power is small, the m0 value becomes smaller, which meets the final signal enhancement requirements. The correction value of n0 is taken as the reciprocal of this ratio, that is, when the frame spectrum power is large, the subtracted noise spectrum power is small, and when the frame spectrum power is small, the subtracted noise spectrum power is large, which fully meets the final signal enhancement requirements. The coefficient value g is to highlight the effect of the n0 value.
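The word entropy comes from statistical thermodynamics, where it is a measure of the degree of disorder. Information theory uses it to express the average uncertainty of a source, and the change of this average uncertainty can be expressed by an entropy function. The probability space of a discrete source is set as:
$$\begin{bmatrix} X \\ P \end{bmatrix} = \begin{bmatrix} x_1 & x_2 & \cdots & x_q \\ p_1 & p_2 & \cdots & p_q \end{bmatrix}$$
Then the entropy function is:
$$H(P) = -\sum_{i=1}^{q} p_i \log p_i$$
Among them, $P = (p_1, p_2, \cdots, p_q)$ is a q-dimensional vector and satisfies
$$\sum_{i=1}^{q} p_i = 1, \qquad 0 \le p_i \le 1$$
The entropy function is bounded by
$$H(p_1, p_2, \cdots, p_q) \le H\left(\tfrac{1}{q}, \tfrac{1}{q}, \cdots, \tfrac{1}{q}\right) = \log q$$
That is, when the probability distribution is equal, the entropy reaches the maximum value, which indicates that the average uncertainty of the source is the largest when the probability distribution is equal. This conclusion is called the maximum discrete entropy theorem.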
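The maximum discrete entropy theorem can be checked with a small numerical example. The sketch below uses the natural logarithm and only the Python standard library; the two example distributions are arbitrary.

```python
import math

def entropy(p):
    """Shannon entropy -sum(p_i * log(p_i)); zero-probability terms contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]

h_uniform = entropy(uniform)   # equals log(4) for four equally likely symbols
h_skewed = entropy(skewed)     # strictly smaller than log(4)
print(h_uniform, h_skewed)
```

The equal distribution attains $\log q$ with q = 4, and any unequal distribution over the same four symbols gives a smaller value.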
Since the amplitude of the signal has a large dynamic range for the background noise, it can be considered that there are many random events of the random signal. Therefore, the average amount of information is large, that is, the entropy value is large. However, the dynamic range of the noise segment distribution is small and the distribution is relatively concentrated, so the entropy value is small. According to this principle, endpoint detection can be performed on the signal.
The noisy signal $s(n)$ is framed and windowed with 50% overlap between frames, and the FFT of each frame is computed. The energy spectrum of a frequency component $f_i$ is $Y_m(f_i)$, which is obtained from the FFT coefficient value at that point. The normalized spectral probability density function (pdf) of each frequency component is then defined as:
$$P_i = \frac{Y_m(f_i)}{\sum_{k=1}^{N} Y_m(f_k)}, \qquad i = 1, 2, \cdots, N$$
Among them, $P_i$ is the probability density corresponding to frequency component i, N is the FFT transform length, and m is the index of the analyzed frame. The short-time spectral entropy of each analysis frame is defined as:
$$H_m = -\sum_{i=1}^{N} P_i \log P_i$$
After defining the short-time spectral entropy, we give the algorithm of spectral entropy endpoint detection:
(1) According to the spectral probability density and spectral entropy definitions above, the short-time spectral entropy of each frame is obtained, that is, the spectral entropy set $\xi = (H_1, H_2, \cdots, H_M)$ is obtained. Among them, M is the total number of frames of the analysis signal.
(2) According to the spectral entropy set, the threshold T for judging the end point is obtained. The formula is:
Among them, median($\xi$) is the median of $\xi$, and mean($\xi$) is the average of $\xi$. The values of a and b are generally determined by experiments. Experiments have shown that a = 0.2 and b = 0.4 generally give good results.
(3) After T is determined, the following decision criterion is used to determine whether a frame is a signal frame:
$$\text{frame } m \text{ is } \begin{cases} \text{a signal frame}, & H_m > T \\ \text{a noise frame}, & H_m \le T \end{cases}$$
It can be seen that all frames that are larger than the threshold are determined as signal frames, but the actual situation is not so simple and absolute. Due to the existence of various interferences, the frames derived from this determination are not necessarily signals. At the same time, there is a certain noise area between the signal segments, and the end of the signal cannot be determined according to a simple criterion. Therefore, the discrimination method is improved here, and an endpoint detection criterion based on spectral entropy information is proposed. The discrimination process is similar to the double threshold endpoint detection method.
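A simple version of the threshold decision can be sketched as follows. Two parts of this sketch are assumptions and are not taken from the paper: the threshold is computed as the linear combination a · median(ξ) + b · mean(ξ), and isolated frames are rejected by a hypothetical minimum segment length `min_frames`, as a stand-in for the improved criterion whose details are not given here.

```python
import numpy as np

def entropy_threshold(entropies, a=0.2, b=0.4):
    """ASSUMED form of the threshold: a * median + b * mean."""
    xi = np.asarray(entropies, dtype=float)
    return a * np.median(xi) + b * xi.mean()

def signal_segments(entropies, threshold, min_frames=2):
    """Return (first_frame, last_frame) pairs of runs above the threshold.

    Runs shorter than `min_frames` are discarded (hypothetical rule).
    """
    flags = np.asarray(entropies, dtype=float) > threshold
    segments, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i                               # a run begins
        elif not flag and start is not None:
            if i - start >= min_frames:
                segments.append((start, i - 1))     # a long enough run ends
            start = None
    if start is not None and len(flags) - start >= min_frames:
        segments.append((start, len(flags) - 1))    # run reaches the last frame
    return segments

xi = [1.0, 1.0, 5.0, 6.0, 5.0, 1.0, 4.0, 1.0, 1.0]
T = entropy_threshold(xi)
segments = signal_segments(xi, T)
print(T, segments)
```

In this example frames 2 to 4 form one detected segment, and the single frame 6 that exceeds the threshold is rejected as too short.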
The schematic diagram of the B/S three-layer structure is shown in Fig. 4. The three-layer structure is a relatively loose concept, and there is no unified standard for its design. This article uses the widely adopted three-layer structure, namely the presentation layer (UIL), business logic layer (BLL), and data access layer (DAL). Under this structure, the user interface is implemented entirely through the WWW browser. The B/S structure uses increasingly mature and popular browser technology to realize powerful functions that previously required complex special software, which saves development costs and is a brand-new software system construction technology. This structure has become the preferred architecture for application software today.
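The separation of responsibilities between the three layers can be illustrated with a small sketch. All class and function names below are hypothetical, and an in-memory dictionary stands in for the database; the actual system is browser-based and its code is not shown in the paper.

```python
class ExerciseDAL:
    """Data access layer: the only place that touches storage."""

    def __init__(self):
        self._results = {}                      # stand-in for a database table

    def save_result(self, student, score):
        self._results.setdefault(student, []).append(score)

    def load_results(self, student):
        return list(self._results.get(student, []))


class ExerciseBLL:
    """Business logic layer: validation and calculations."""

    def __init__(self, dal):
        self._dal = dal

    def record(self, student, score):
        if not 0 <= score <= 100:
            raise ValueError("score must be between 0 and 100")
        self._dal.save_result(student, score)

    def average(self, student):
        results = self._dal.load_results(student)
        return sum(results) / len(results) if results else None


def render_summary(bll, student):
    """Presentation layer: formats what the browser would display."""
    avg = bll.average(student)
    if avg is None:
        return f"{student}: no results yet"
    return f"{student}: average {avg:.1f}"


bll = ExerciseBLL(ExerciseDAL())
bll.record("alice", 80)
bll.record("alice", 90)
summary = render_summary(bll, "alice")
print(summary)  # alice: average 85.0
```

Because the presentation function only calls the business logic layer, the storage behind `ExerciseDAL` could be replaced without changing the other two layers.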

Fig. 3. Block diagram of the improved spectral subtraction noise reduction method.

Fig. 4. Three-layer structure system.
The flow of the student autonomous learning system is shown in Fig. 5. After the students successfully log in to the system, they can choose the intent of logging into the system this time: test the knowledge mastery, autonomous practice, and historical practice. When practicing, the system will give exercises dynamically according to each student’s practice. After each student exercises, the system will record the results of this exercise for later judgment.

Fig. 5. Business flow chart of student autonomous learning system.
According to the different memory retention time, memory is divided into short-term memory and long-term memory. The input information becomes the short-term memory of the person after learning through the attention process of the person. If the information is not reviewed in a timely manner, these short-term memories will be forgotten. After the information has been reviewed in time, these short-term memories will remain in the brain for a long time and become long-term memories. This process can be intuitively and clearly understood through Fig. 6.
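The idea of reviewing before a memory fades can be illustrated with a simple forgetting model. The sketch below is an assumption made for illustration, not the model built in the paper: retention is taken to decay exponentially, $R(t) = e^{-t/S}$, where S is a hypothetical memory stability in hours, a review is scheduled when retention falls to a target level, and each review multiplies S by a hypothetical growth factor.

```python
import math

def retention(hours, stability):
    """Assumed exponential forgetting curve R(t) = exp(-t / S)."""
    return math.exp(-hours / stability)

def hours_until_review(stability, target=0.7):
    """Time at which retention has dropped to `target`."""
    return -stability * math.log(target)

def stability_after_review(stability, growth=2.0):
    """Hypothetical rule: each timely review makes the memory more stable."""
    return stability * growth

stability = 24.0                                   # hours, illustrative value
first_gap = hours_until_review(stability)
second_gap = hours_until_review(stability_after_review(stability))
print(round(first_gap, 2), round(second_gap, 2))
```

Under these assumptions the interval between reviews grows after each timely review, which matches the qualitative description of short-term memories becoming long-term memories.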

Fig. 6. Memory and forgetting.
The model performance is verified on the basis of the above analysis. This article comprehensively analyzes the current modes of English reading and uses the speech recognition system, which processes the sound wave of the students’ speech, to correct students’ reading pronunciation. Figure 7 shows the sound wave results of speech recognition.

Fig. 7. Example diagram of acoustic wave recognition results.
On the basis of the above analysis, the model verification is carried out. The online reading mode constructed in this article can be selected according to the actual needs of students, so that each student finally obtains the mode that suits them best. To verify the effect of the model, this article takes the students in the experimental class and the control class as the experimental subjects. Each class has 30 students, and the experimental period is one semester. After the experiment, the spoken pronunciation and the comprehensive English scores of the students are compared. Spoken pronunciation accuracy is improved gradually through the system’s corrections, while the comprehensive English score is improved gradually as the system helps students choose the best mode and through oral teaching. First, this article tests the students’ spoken pronunciation, and the results are shown in Table 1 and Fig. 8.
Table 1. Comparison table of spoken pronunciation test
As can be seen from Fig. 8, the spoken pronunciation scores of the experimental class, which used the system constructed in this article, far exceed those of the control class. This shows that the system platform constructed in this article benefits from its online reading mode. After that, this article evaluates the comprehensive English scores, and the results are shown in Table 2 and Fig. 9.

Fig. 8. Comparison diagram of spoken pronunciation test.
Table 2. Comparison table of English comprehensive score test

Fig. 9. Comparison diagram of English comprehensive score test.
As shown in Fig. 9 above, the artificial intelligence-based English multimodal online reading mode platform constructed in this paper has certain effects and can effectively improve students’ English scores.
This article introduces the relevant content of artificial intelligence, analyzes the implementation based on the speech spectrum system, and proposes the overall implementation scheme of the artificial intelligence-based college English multimodal online reading system. The paper proposes a dynamic method of estimating the noise spectrum: the spectrum value of each frame is compared with the existing minimum spectrum value, the smaller value at each corresponding point is kept as the minimum spectrum value, and the final minimum spectrum value is taken as the noise power spectrum. The discrimination method is also improved, and an endpoint detection criterion based on spectral entropy information is proposed, whose discrimination process is similar to the double threshold endpoint detection method. In addition, this paper uses the Ebbinghaus forgetting curve to build a mathematical model of knowledge forgetting and obtains an intelligent memory algorithm to guide students’ personalized learning, which enhances students’ independent learning ability and interest and makes the system more practical. To verify the effect of the model, this paper takes the students in the experimental class and the control class as the experimental subjects and compares their spoken pronunciation and comprehensive English scores after the experiment. The results show that the artificial intelligence-based English multimodal online reading mode platform constructed in this paper is effective to a certain extent and can improve students’ English scores.
Acknowledgments
This work was supported by the project: 2019 Undergraduate Teaching Reform Project of Higher Education in Guangxi “Research on New Teaching Mode of Extensive Reading Course Based on Intelligent Education Concept” (Project No: 2019JGB469)
