Abstract
With the continuous development and application of big data technology, its potential and value in education are gradually emerging, and high hopes are placed on it in oral English teaching in particular. However, research on how to use big data effectively to improve the efficiency of oral English teaching is still in its infancy. This study aims to fill this gap by exploring, through a literature review and empirical research, how oral English teaching strategies based on big data can improve teaching efficiency. The results show that big data can help teachers assess students’ oral ability more accurately and, by optimizing teaching strategies, significantly improve students’ oral expression and learning efficiency. However, oral English teaching strategies based on big data also have limitations that need further research and improvement. This study provides a theoretical basis and practical guidance for promoting the application of big data in oral English teaching.
Introduction
In the information age of the 21st century, big data has penetrated into all areas of life, including the field of education. Big data not only changes the way and method of education, but also provides rich resources and new perspectives for educational research. Specifically, for English education, big data provides teachers and education researchers with huge data resources, such as learner behavior data, learning effect data, etc. The effective analysis and utilization of these data can provide support for the personalization and precision of English teaching. At the same time, oral English is an important part of English learning and an important standard to measure the comprehensive language application ability of English learners. In current English teaching, the improvement of oral English teaching efficiency has always been an important issue that educators and learners pay common attention to. Traditional oral English teaching methods mainly rely on teachers’ direct teaching, and students’ learning effect is limited by many factors such as teachers’ teaching quality and teaching resources. However, this teacher-dependent teaching mode has many challenges in improving the efficiency of oral English teaching. Teachers have limited energy and time, making it difficult to meet the practice and feedback needs of all students. Therefore, exploring how to use big data to improve oral English teaching and improve teaching efficiency has become a challenge in the field of English education.
In recent years, the application of advanced technologies such as cloud computing, speech recognition technology, 5G network, AI recognition and deep learning in English teaching, especially oral English teaching, has become a hot topic of educational technology research. These techniques are of great significance to improve students’ oral English skills, teaching efficiency and teaching quality. Wei [1] studied the application of cloud computing and speech recognition technology in English teaching, and found that these technologies could provide teachers and students with more convenient and real-time learning and teaching resources, thus improving the teaching effect. Similarly, Zhang [2] also proved that the application of Android voice assistant on multimedia English teaching platform has broad prospects. 5G network and AI recognition technology provide more efficient and accurate data processing and interaction methods for oral teaching. Liu [3] proposed a NoSQL database design based on 5G network and AI recognition to support oral English teaching, which provides a flexible and efficient teaching tool for educators. Deep learning also shows its potential advantages in oral English teaching. Li and Liu [4] developed a deep learning-based oral English assisted instruction system, which can more accurately assess and guide students’ oral expression. In addition, dynamic time warping (DTW) algorithm has been widely used in oral evaluation. Fang [5], Duan and He [6] both studied the intelligent oral English assessment system based on DTW algorithm and proved its effect in automatically assessing students’ oral skills. Liao and Li [7] emphasized the importance of culturally responsive teaching in oral English classrooms, which they believe can promote EFL learners’ cross-cultural competence. Moreover, big data and deep neural network technologies have also been used in oral English teaching reform. 
As Liu [8] mentioned, these technologies have provided a strong driving force for oral English teaching in universities. In general, modern technology has brought many innovative methods and tools to oral English teaching. However, how to integrate these technologies into classroom teaching effectively, and how to ensure that they really improve students’ learning and meet educational goals are still questions that educators and researchers need to explore in depth.
Against this background, this study aims to explore strategies for improving the efficiency of oral English teaching based on big data. Its significance is reflected in the following aspects. First, theoretically, it seeks to enrich and extend research on teaching strategies based on big data, especially applied research in oral English teaching, and so to fill the research gap in this field. Second, in practice, its results can provide teachers with a theoretical basis and targeted suggestions on teaching strategies to help them teach more effectively. Third, the results can serve as a reference for education decision-makers in formulating more scientific and effective teaching policies. Finally, the results can guide learners in choosing learning strategies suited to their own situations and in improving the effect of their oral English learning. The purpose and significance of this study are therefore closely tied to improving the efficiency of oral English teaching and promoting the reform of English teaching.
The core of this study is to explore and analyze how oral English teaching strategies based on big data can improve teaching efficiency. First, through a review of the relevant literature, the study surveys the current application and effect of big data in oral English teaching, including but not limited to automatic oral ability assessment systems, virtual reality technology and personalized oral test applications. On this basis, the study uses empirical methods to collect and analyze a large amount of teaching and learning data from an actual oral English teaching environment, in order to examine the mechanisms and best practices by which big data improves the efficiency of oral English teaching. In particular, the research focuses on how big data can help teachers assess students’ speaking ability more accurately, and how it can be used to optimize teaching strategies and improve students’ speaking ability and learning efficiency. Finally, the advantages, limitations and possible improvements of oral English teaching strategies based on big data are discussed, with the aim of providing a theoretical basis and practical guidance for the improvement of oral English teaching.
Theoretical basis analysis
Discussion on concepts, characteristics and applications of big data in education
Big data, as an interdisciplinary field, covers information technology, statistics, computer science and other fields. In the field of education, big data mainly refers to the large-scale, diversified and fast-growing data generated by educational activities such as online learning, educational games, tests and evaluations, which can include learners’ behavior data, performance data, feedback data, etc. [9, 10].
The main characteristics of big data are the five V’s: Volume, Variety, Velocity, Veracity and Value [11]. In education, these characteristics take the following forms. Volume: the data cover a large number of learners and learning activities. Variety: the data types are diverse, including text, images, audio and other forms. Velocity: as educational activities and learning technology progress, the scale of educational data continues to grow rapidly. Veracity: the veracity of educational big data is low, so complex data processing and analysis are required to discover valuable information. Value: through data mining, machine learning and other technologies, these data can provide a more scientific basis for educational decision-making.
In the field of education, big data is widely used, mainly including: optimizing teaching content and methods, personalized learning, predicting learners’ learning effects, and evaluating teaching effects [12]. For example, by analyzing learners’ learning data, teachers can gain insight into learners’ learning habits and difficulties, so as to optimize teaching content and methods and further improve teaching effect. Through the analysis of personalized learning data, each learner can be provided with a customized learning path to meet their personalized learning needs. Through predictive analysis of learning data, learners who may have learning difficulties can be found early and timely help can be provided to them. At the same time, the evaluation of teaching effect can be used to adjust teaching strategies to continuously improve teaching quality.
Therefore, the application of big data provides a new tool and perspective for education reform, and also opens up new possibilities for improving the efficiency of oral English teaching.
Theoretical framework of oral English teaching
The theoretical framework of oral English teaching can be deeply discussed from two important dimensions: teaching strategy and learning process.
First of all, teaching strategy is the core element to improve the efficiency of oral English teaching. Relevant theories mainly focus on how to provide appropriate listening input, how to motivate and guide students to have meaningful communication, and how to design and implement effective teaching activities [13]. In order to achieve these goals, teachers can use a variety of teaching methods and strategies, including but not limited to situational teaching, communicative teaching and task-based teaching.
Compared to other English skills such as reading and writing, speaking is more about real-time interaction and reaction. Reading mainly focuses on text comprehension and vocabulary use; Writing emphasizes grammatical structure, logic and vocabulary richness. Speaking, on the other hand, requires processing and responding to information in a short period of time, which also means that it has a higher demand for real-time feedback and interaction.
Secondly, the learning process is also an important factor affecting the effect of oral English learning. Theories in this field mainly study individual differences of learners such as motivation, learning strategy and learning style, as well as social and cultural factors such as learning environment and learning community [14, 15]. In-depth understanding and research of these factors can help teachers design teaching activities to meet the needs of students, so as to improve the teaching effect.
The oral English teaching strategy based on big data integrates the above two theoretical dimensions into the actual teaching process. By collecting and analyzing a large amount of teaching data, teachers and researchers can more accurately grasp students’ learning needs and habits, evaluate the effect of teaching activities, and then continuously adjust and optimize teaching strategies to further improve the efficiency of oral English teaching. At the same time, big data also provides new tools and methods for the study of learning process, helping to deepen the understanding of oral English learning process and improve teaching quality [16].
Analysis of data-driven oral English teaching strategies
Data-driven oral English teaching strategy is based on large-scale learning data to guide teaching decisions, and its core concept is “data-oriented.” This strategy relies on real, objective data, rather than relying solely on experience and intuition to make instructional decisions and reforms [17, 18].
First, data-driven teaching strategies can help teachers gain insight into students’ learning behavior and progress. By collecting and analyzing students’ learning data (such as learning time, learning frequency, homework completion, test scores, etc.), teachers can accurately grasp each student’s learning status and progress rate, identify students’ difficulties and challenges in the learning process, and provide targeted teaching help.
In data-driven oral English teaching, teachers first design a series of teaching activities and embed data collection points in them, such as interactive questions on online platforms and real-time oral exercises. Teachers then monitor student responses and outcomes while the system automatically collects data on students during these activities. Next, with specific analytical tools or software, teachers can extract meaningful patterns from the data, such as which phonetic symbols or words students generally pronounce incorrectly and which topics or situations students handle more readily. This information gives teachers targeted feedback on which to base teaching adjustments.
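The pattern-extraction step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout `(student_id, target_word, pronounced_correctly)`, the sample attempts and the function name `error_rates` are all hypothetical and do not come from the platform used in the study.

```python
from collections import Counter

# Hypothetical log of oral exercises:
# (student_id, target_word, pronounced_correctly)
attempts = [
    ("s01", "thought", False),
    ("s02", "thought", False),
    ("s01", "village", True),
    ("s03", "world", False),
    ("s02", "village", True),
    ("s03", "thought", True),
]

def error_rates(records):
    """Return {word: share of attempts that were pronounced incorrectly}."""
    total = Counter(word for _, word, _ in records)
    wrong = Counter(word for _, word, ok in records if not ok)
    return {word: wrong[word] / total[word] for word in total}

rates = error_rates(attempts)
# Words ordered from most to least error-prone, to prioritise in class.
hardest_first = sorted(rates, key=rates.get, reverse=True)
```

A teacher-facing report could then list the first few entries of `hardest_first` as candidates for targeted pronunciation practice.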
Secondly, data-driven teaching strategies can enhance the evaluation and feedback mechanism of teaching effectiveness. Teachers can use data to evaluate the effectiveness of teaching activities, for example, by analyzing student learning data, teachers can identify which teaching activities are more effective and which activities need to be improved. In addition, teachers can provide students with specific and timely feedback based on data, thereby enhancing students’ learning enthusiasm and effectiveness.
Finally, data-driven teaching strategies can provide strong support for personalized teaching. With the results of big data analysis, teachers can fully understand students’ learning needs, interests and styles, and so design personalized learning paths and teaching activities suitable for all types of students [19, 20].
For example, after implementing data-driven oral English instruction, one college found that students’ response speed and accuracy in some situational simulation exercises significantly improved. Through data analysis, teachers learned that students were more active in interactive role-playing sessions, so they decided to increase the frequency and difficulty of such activities. After one semester, students’ oral test scores improved by an average of 12%, which is enough to demonstrate the effectiveness of data-driven strategies in oral English teaching.
However, while data-driven teaching strategies have significant advantages, they also face some challenges. For example, data collection and analysis require technical and resource support, and data privacy and security issues need sufficient attention [21]. Therefore, when implementing a data-driven teaching strategy, teachers and researchers need to weigh the pros and cons and consider a variety of factors to ensure the effect and quality of teaching.
Study design and data collection
Research sample and data collection methods
To ensure the representativeness of the sample and the credibility of the study, we selected non-native English-speaking students from five universities who were taking online oral English courses, covering all levels of English from beginner to advanced. Students were included on the basis of their willingness, activity and enthusiasm for learning. A total of 1,000 students were selected. They came from a variety of backgrounds, including those who already had a basic knowledge of English and those who were just beginning to learn it.
Table 1 shows some basic information of the sample.
Table 1. Sample basic information
For further research, the following detailed data collection methods were used in this study:
Basic information of students: The age, gender and learning background of students are collected through the registration information of the online learning platform and the self-filling questionnaire. This data is automatically collected when a student first logs on to the platform and is anonymized to protect the student’s privacy.
Learning behavior data: We have configured the data tracking system of the online platform to record students’ behavior data such as login, learning duration and participation activities in real time. In addition, after each teaching activity, the platform automatically sends a questionnaire to the students, asking them about their satisfaction with the activity, difficulties and suggestions.
Learning outcome data: This study relies on reports automatically generated by the platform, including students’ homework completion, online exam results, oral evaluation results, etc. In addition, in order to further understand the learning effect of students, we also conducted face-to-face oral tests for some students and evaluated them by a team of experts.
Such data collection methods not only ensure the accuracy and meticulousness of the research, but also follow the relevant ethical norms to ensure the privacy of students and data security.
Data preprocessing
Data preprocessing is a critical step that converts raw data into a format suitable for analysis. In this study, data preprocessing is mainly divided into three stages: data cleaning, data conversion and data normalization.
The first step is data cleaning. In this stage, the original data collected is screened and corrected to ensure the integrity and quality of the data. Specific steps include:
Eliminate duplicate data records to ensure the uniqueness of each record.
Identify and deal with outliers, such as excessively high or excessively low learning times, which may be due to system errors or other abnormal behavior.
Fill in missing values: data missing for technical reasons or through user error can be filled with the mean, the median or another method.
Delete data items that have zero learning time or that lack certain key information.
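The four cleaning steps above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the record fields (`id`, `hours`, `score`), the sample rows and the plausibility cap `MAX_HOURS` are assumptions, and capping is only one of several ways to deal with outliers.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": "s01", "hours": 5.0, "score": 72},
    {"id": "s01", "hours": 5.0, "score": 72},    # duplicate record
    {"id": "s02", "hours": 0.0, "score": 65},    # zero learning time
    {"id": "s03", "hours": 6.0, "score": None},  # missing score
    {"id": "s04", "hours": 40.0, "score": 80},   # implausibly high hours
    {"id": "s05", "hours": 4.0, "score": 90},
]

MAX_HOURS = 16.0  # assumed plausibility cap, not a value from the study

def clean(records):
    # 1. Eliminate exact duplicates while keeping the original order.
    seen, rows = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            rows.append(dict(r))
    # 2. Deal with outliers: cap hours above the assumed maximum.
    for r in rows:
        r["hours"] = min(r["hours"], MAX_HOURS)
    # 3. Fill missing scores with the median of the observed scores.
    fill = median(r["score"] for r in rows if r["score"] is not None)
    for r in rows:
        if r["score"] is None:
            r["score"] = fill
    # 4. Delete records with zero learning time.
    return [r for r in rows if r["hours"] > 0]

cleaned = clean(raw)
```

Whether to cap, drop or flag outliers, and whether to fill with the mean or the median, are design choices that depend on how the data are distributed.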
The next step is data conversion. At this stage, the necessary transformations are made to the original data according to the requirements of the subsequent analysis. The specific steps are as follows:
Convert categorical data into numerical codes; for example, in this study the English proficiency ratings “elementary”, “intermediate” and “advanced” were recoded as “1”, “2” and “3”.
For text data, such as feedback from students, use text mining to extract keywords or sentiment scores.
Divide time series data, such as learning frequency, into time windows or convert them into frequency data.
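Two of the conversions above, the numerical coding of proficiency levels and the windowing of time series data, can be sketched as below. The function names, the weekly window and the sample dates are assumptions chosen for illustration; the source does not say which window length was used.

```python
from collections import Counter
from datetime import date

# Coding of proficiency levels as described in the text.
LEVEL_CODES = {"elementary": 1, "intermediate": 2, "advanced": 3}

def encode_level(label):
    """Map a proficiency label to its numeric code."""
    return LEVEL_CODES[label.strip().lower()]

def weekly_frequency(session_dates):
    """Count study sessions per ISO (year, week) window."""
    return Counter(tuple(d.isocalendar())[:2] for d in session_dates)

# Hypothetical session dates for one student.
sessions = [date(2023, 3, 6), date(2023, 3, 8), date(2023, 3, 14)]
per_week = weekly_frequency(sessions)
```

Because the codes 1 to 3 are ordered, this coding treats proficiency as an ordinal scale with equal spacing between levels, which is itself a modelling assumption.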
The final step is data normalization. This step standardizes all numerical data to eliminate the effect of differing scales and units. Specific operations are as follows:
Using Min-Max normalization, all data is converted to the range [0, 1].
Or use Z-score normalization, which transforms according to the mean and standard deviation of the data so that the new data has a mean of 0 and a standard deviation of 1.
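Both normalization options can be written directly from their definitions. The sketch below is illustrative only; the function names and sample values are assumptions, and the Z-score version uses the population standard deviation so that the transformed data have a standard deviation of exactly 1.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

hours = [2.0, 4.0, 6.0, 8.0]  # hypothetical learning times
scaled = min_max(hours)
standardized = z_score(hours)
```

Min-Max scaling is sensitive to extreme values, so it is usually applied after outliers have been handled; Z-score normalization is less affected by a single extreme observation.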
The above pre-processing steps ensure the quality and consistency of the data and lay a solid foundation for subsequent analysis.
After data preprocessing, descriptive statistical analysis was performed to understand the basic situation of the data. The descriptive statistics comprise three measures: the mean, the median and the standard deviation.
The formula for calculating the average value is as follows.
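$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{1}$$
where $x_i$ is the $i$-th observation and $n$ is the number of observations.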
The standard deviation is calculated by Eq. (2).
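$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \tag{2}$$
where $\bar{x}$ is the mean of the observations, $x_i$ is the $i$-th observation and $n$ is the number of observations.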
In the study, descriptive statistical analysis was carried out mainly for five data items: “learning time,” “learning frequency,” “homework completion,” “test results” and “oral assessment results.”
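The three descriptive statistics can be computed with Python’s standard library, as sketched below. The sample values and the helper name `describe` are hypothetical, and the sketch assumes the sample standard deviation (denominator $n-1$); the population form would use `statistics.pstdev` instead.

```python
from statistics import mean, median, stdev

def describe(values):
    """Return the mean, median and sample standard deviation of a data item."""
    return {
        "mean": mean(values),
        "median": median(values),
        "std": stdev(values),  # sample form, n - 1 in the denominator
    }

# Hypothetical daily learning times in hours.
study_hours = [4.0, 5.0, 5.5, 6.0, 7.0]
summary = describe(study_hours)
```

Applying `describe` to each of the five data items in turn would give the kind of summary that the figures report.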
The descriptive statistical results of “learning time” and “learning frequency” are shown in Fig. 1.
Fig. 1. Descriptive statistical analysis.
The descriptive statistical results of “assignment completion,” “test scores” and “oral assessment results” are shown in Fig. 2.
Fig. 2. Descriptive statistical analysis results.
Such analysis shows the basic performance of the students in the sample in oral English learning and provides basic data for the subsequent correlation and regression analyses. It can also help reveal possible problems. For example, the average “learning time” is 5.5 hours, which means that students spend an average of 5.5 hours learning spoken English every day. The standard deviation of study time is 1.5 hours, so study times within one standard deviation of the mean lie between 4 and 7 hours; if study time is roughly normally distributed, about two-thirds of students fall in this range. A large standard deviation of “study time” would indicate that students’ study time is widely dispersed, which may call for adjustment.
Descriptive statistical analysis and subsequent regression analysis are complementary: descriptive statistics reveal the basic shape and characteristics of the data, while regression analysis further helps us to discover the internal relationships between variables. Only with a deep understanding of the data can we ensure the accuracy and reliability of the regression analysis.
Correlation analysis
The theory behind correlation analysis is based on the theory of linear relationships, where the relationship between two variables may be positive, negative, or no correlation. When two variables increase or decrease together, they are positively correlated; When one variable increases and the other decreases, they are negatively correlated. The Pearson correlation coefficient is a commonly used metric that quantifies the strength and direction of this linear relationship. In this study, the Pearson Correlation Coefficient was mainly used to measure the degree of linear correlation between variables. Its calculation formula is as follows Eq. (3).
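$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{3}$$
where $x_i$ and $y_i$ are the $i$-th observations of the two variables, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of observations.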
Regression analysis is a statistical method for modeling how a dependent variable relates to one or more independent variables; on its own it quantifies association rather than establishing causation. It allows us to predict the value of one variable based on the values of one or more other variables. Multiple linear regression considers the influence of multiple independent variables on a single dependent variable. Its basic form is shown in Eq. (4).
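$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon \tag{4}$$
where $Y$ is the dependent variable, $X_1, \dots, X_k$ are the independent variables, $\beta_0$ is the intercept, $\beta_1, \dots, \beta_k$ are the regression coefficients and $\varepsilon$ is the error term.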
In the strategy implementation stage, we first conducted correlation analysis on variables such as “learning time,” “learning frequency,” “homework completion,” “test scores” and “oral assessment results” to reveal the correlation between these variables. Clear correlation can provide strong guidance for subsequent regression analysis. In the analysis results, the research focuses on the absolute value of Pearson correlation coefficient; if this value is close to 1, then there is a strong correlation between the two variables.
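The correlation step can be sketched as follows: compute the Pearson coefficient between each variable and the oral assessment result, then rank the variables by absolute value. The data below are hypothetical and the function name `pearson_r` is an assumption; in practice the same figure is available from standard statistical software.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)

# Hypothetical values for five students.
oral = [60, 66, 70, 79, 85]
predictors = {
    "learning_time": [2, 3, 4, 6, 7],
    "test_score": [55, 70, 65, 85, 80],
}
# Strength of the linear relationship, ignoring its direction.
strength = {name: abs(pearson_r(vals, oral)) for name, vals in predictors.items()}
ranked = sorted(strength, key=strength.get, reverse=True)
```

Ranking by the absolute value treats a strong negative relationship as being as informative as a strong positive one, which matches the focus on the absolute value described above.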
Then we conducted a regression analysis, setting “oral assessment result” as the dependent variable and “learning time,” “learning frequency,” “homework completion” and “test score” as the independent variables, to analyze how these independent variables affect the dependent variable. In this process, we pay attention to the sign of each regression coefficient and its magnitude. A positive regression coefficient indicates that the dependent variable tends to increase with that independent variable when the other independent variables are held constant, while a negative coefficient indicates that it tends to decrease. The size of the regression coefficient reflects how much the dependent variable changes on average when the independent variable changes by one unit, with the other independent variables held constant.
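To make the regression step concrete, the sketch below fits a multiple linear regression by ordinary least squares using the normal equations. It is an illustration under stated assumptions, not the authors’ procedure: the function names, the synthetic rows and the coefficients in `true_beta` are invented for the example, and real analyses would normally rely on an established routine in SPSS, R or a Python numerical library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] minimising the sum of squared errors."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic, noise-free data with four predictors (all values hypothetical).
true_beta = [2.0, 0.5, -1.0, 0.25, 1.5]
rows = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0),
        (0, 0, 0, 1), (1, 1, 1, 1), (2, 1, 0, 3)]
y = [true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], r)) for r in rows]
beta = fit_ols(rows, y)
```

On noise-free data the fit recovers the generating coefficients up to rounding, which is a convenient check that the routine is wired correctly before it is applied to real observations.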
We use statistical analysis software such as SPSS, R, or Python to perform the above steps. Finally, we will interpret and discuss the analysis results in detail, and propose strategies on how to improve the efficiency of oral English teaching based on big data.
Data analysis and results
Implementation of correlation analysis and presentation of results
The goal of this section is to verify how well our previous theory-based expectations match the real data. Through data collection, sample data as shown in Fig. 3 is obtained.
Fig. 3. Sample data collection.
The Pearson correlation coefficient was used to measure the linear correlation between two variables. The Pearson correlation coefficient is calculated by the following Eq. (5).
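$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{5}$$
where $x_i$ and $y_i$ are the $i$-th observations of the two variables, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of observations.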
The study calculated the correlation between oral assessment results and other variables (study time, study frequency, homework completion, test scores), and obtained the results as shown in Fig. 4.
Fig. 4. Correlation analysis.
As can be seen from the above legend, the data show the correlation coefficient, and the
The results of correlation analysis show that all variables have high positive correlation with the oral evaluation results, which indicates that these variables may be important factors affecting the oral evaluation results. These results provide a theoretical basis for the subsequent regression analysis of this study.
In order to further explore the effects of learning time, learning frequency, and homework completion and test scores on oral English assessment results, multiple linear regression analysis was conducted. Based on the previous correlation findings and theoretical basis, this study uses a multiple linear regression model to further explore the factors that affect the results of oral English evaluation. The form of the model is shown in Eq. (6).
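$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon \tag{6}$$
where $Y$ is the oral assessment result, $X_1$ is learning time, $X_2$ is learning frequency, $X_3$ is homework completion, $X_4$ is test score, $\beta_0$ is the intercept, $\beta_1$ to $\beta_4$ are the regression coefficients and $\varepsilon$ is the error term.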
The results of regression analysis are shown in Fig. 5.
Fig. 5. Regression analysis.
As can be seen from the above legend, the data shows that the regression coefficient and intercept term (
The regression results showed that all the independent variables had significant positive correlation with the oral evaluation results. The most influential variable was test score, followed by assignment completion, study time, and study frequency.
These results suggest that longer study time, higher study frequency, better homework completion and higher test scores are associated with better oral English assessment results, and that raising them may therefore help improve the efficiency of oral English teaching. These results provide the basis for formulating targeted teaching strategies.
This study needs to clearly identify the effects of various independent variables on dependent variables, and regression analysis provides a framework to quantify these relationships and gain a deeper understanding of the interactions between variables. The flexibility of multiple linear regression allows us to consider the combined effect of multiple variables on a single outcome, especially when working with complex data sets. At the same time, regression analysis can not only identify relationships between variables, but under the right conditions, it can come close to explaining causation. In addition, regression models have predictive power, allowing us to predict the desired outcome under certain conditions. Finally, due to its wide application in many fields, the methods and applications of regression analysis have been widely validated. Therefore, considering the characteristics of the research objectives and data, regression analysis becomes a very powerful research tool in theory and practice.
The results of the correlation and regression analyses in this study indicate that learning time, learning frequency, homework completion and test scores all have significant effects on oral English assessment results. Among them, the test score has the strongest correlation with the oral assessment results, followed by homework completion, while the correlations of learning time and learning frequency with the oral assessment results are weaker.
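According to the fitted regression model,
$$\hat{Y} = \beta_0 + 0.20X_1 + 0.15X_2 + 0.30X_3 + 0.35X_4$$
where $\hat{Y}$ is the predicted oral assessment result, $\beta_0$ is the intercept, and $X_1$ to $X_4$ are learning time, learning frequency, homework completion and test score.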
It can be found that when other conditions remain unchanged, the oral assessment results will increase by 0.20 units for each additional unit of learning time; for each unit increase of learning frequency, the oral assessment results will increase by 0.15 units; for each unit increase in homework completion, the oral assessment results will increase by 0.30 units; for every unit increase in test score, the oral assessment result will increase by 0.35 units.
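The per-unit interpretation above can be expressed as a small calculation. The slopes are the four values quoted in the text; the dictionary keys and the function name `predicted_change` are assumptions. Because the intercept is not stated here, the sketch computes only the change in the predicted result, not the result itself.

```python
# Slopes quoted in the text; the intercept is not given there, so only
# changes in the predicted oral assessment result can be computed.
SLOPES = {
    "learning_time": 0.20,
    "learning_frequency": 0.15,
    "homework_completion": 0.30,
    "test_score": 0.35,
}

def predicted_change(deltas):
    """Change in the predicted result for given changes in the predictors."""
    unknown = set(deltas) - set(SLOPES)
    if unknown:
        raise KeyError(f"unknown predictors: {sorted(unknown)}")
    return sum(SLOPES[name] * delta for name, delta in deltas.items())

# One extra unit of learning time, all else unchanged.
one_more_unit = predicted_change({"learning_time": 1})
# Two extra units of learning time plus four extra test-score points.
combined = predicted_change({"learning_time": 2, "test_score": 4})
```

In a linear model the effects of simultaneous changes simply add up, which is why `combined` is the sum of the two individual contributions.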
By analyzing the data shown in Table 2, we can get a more intuitive understanding.
Table 2. Results analysis
Table 2 shows that when learning time increases from 10 to 20 hours, learning frequency from 5 to 7 times, homework completion from 90% to 95% and the test score from 80 to 85 points, the expected oral assessment result increases from 70 to 81.5 points, a gain of 11.5 points. As these variables continue to improve, the expected oral assessment result reaches 94.5 points.
These results are consistent with the assumptions of this study and support the use of big data to improve the efficiency of oral English teaching.
During the course of this study, we identified several issues and challenges that could affect the results of the research and their implementation.
Representativeness of the data: Although the study collected a large number of data samples, it is still doubtful whether the samples truly represent the majority of non-native English speaking college students. Bias in sample selection may result in results that are not generalizable.
Data quality and integrity: Since the collection of data relies on student self-reporting, this may lead to bias or inaccuracies in the data. For example, students may underestimate or overestimate their study time and frequency for a variety of reasons, which has an impact on the accuracy of the results.
Difficulty in establishing causation: Although the study found correlations between multiple variables and oral performance, this does not equate to causation. There may be confounding factors that affect both the independent and dependent variables.
Difficulty in applying strategies: Although the research suggests improving oral performance by increasing study time and frequency, how to effectively implement these recommendations in practical application remains a problem. Students’ motivation, educational resources, classroom environment and teachers’ teaching ability all affect the success of the strategy.
Limitations of technology and software: In data analysis, tools such as SPSS, R or Python are used in the research, but these tools themselves may have limitations, affecting the depth and breadth of analysis.
As shown in Table 3, these issues and challenges, as well as their possible impacts, require greater attention and addressing in future studies.
Table 3. Existing problems
Based on the findings of this study, the following suggestions are put forward to improve the efficiency of oral English teaching:
Increase learning time: The study found a positive correlation between learning time and oral assessment results. Therefore, it is suggested that teachers arrange more classroom time to train students’ oral English skills and encourage students to practice oral English more in their spare time. In practice, this may help students practice speaking more often and become more proficient.
Improve learning frequency: Learning frequency is also related to oral assessment results. Therefore, it is recommended to design more frequent speaking practice activities, such as weekly speaking practice or a daily English speaking day. This strategy may make students more accustomed to expressing themselves orally in English and thus perform better in actual oral exams.
Improve homework completion: The data show that homework completion also has an impact on oral assessment results. It is suggested that teachers assign more oral homework, such as recordings and video dialogues, and give feedback on it. Such practice may help students revise and improve their speaking skills.
Improve test scores: The research shows that test scores are the variable most strongly correlated with oral assessment results. Therefore, it is suggested that more oral tests be introduced in the teaching process and that their results be included in the total score, which is expected to motivate students to pay more attention to learning and preparing oral skills.
The study’s recommendations and expected effects are summarized in Table 4.
Table 4. Solution strategy
It should be noted that these proposals should be adjusted to the specific situation and continuously refined through feedback during implementation.
The strategy of improving oral English teaching efficiency based on big data is a frontier and significant research field. Through the analysis and research of a large number of data, we can have a deeper understanding of the problems and challenges of oral English teaching, as well as how to effectively improve teaching strategies and improve teaching efficiency.
Based on big data and a multiple linear regression model, this study explores the factors that affect the efficiency of oral English teaching. The results show that learning time, learning frequency, homework completion and test scores have significant effects on oral English assessment results, and thus on oral English teaching efficiency.
However, there are some limitations to this study. First, the scope and depth of data collection needs to be improved. Although studies have collected a large amount of data, these data may not fully reflect all the factors that affect the efficiency of oral English teaching. Secondly, although the multiple linear regression model used in this study can reveal the relationship between various factors and the efficiency of oral English teaching to a certain extent, it cannot fully reflect the complex interaction effects among these factors.
Therefore, future studies can further expand the scope and depth of data collection, consider more factors that may affect the efficiency of oral English teaching, and adopt more complex statistical models to reveal the interaction effects among these factors. In addition, future research can further explore how to apply the research results to the actual teaching process to improve teaching strategies and improve the efficiency of oral English teaching.
In general, this study provides a new research method and strategy based on big data for the improvement of oral English teaching efficiency, and provides an important reference and inspiration for future research.
Footnotes
Funding
This work was supported by A Study on Ideological and Political Practice of College English Curriculum based on Blended Teaching Mode (Item Number: 2022YYJG064).
