Abstract
Predicting students’ course grades is an essential element of teaching. This paper used decision tree generation rules to study the prediction of students’ grades in ideological and political courses. First, the ID3 and C4.5 algorithms were briefly introduced; then, an improved C4.5 algorithm with higher computational efficiency was proposed, in which the formula of the C4.5 algorithm was optimized using theories such as the Taylor series. Finally, experiments were performed on UCI datasets and on a dataset of students’ ideological and political course records. On the UCI datasets, the improved C4.5 algorithm achieved an average classification accuracy of 81% and an average computation time of 53.3 ms, both better than the 79.37% and 74.1 ms of the traditional C4.5 algorithm. The grade prediction experiment then showed that the average quiz grade and the number of video views had the greatest impact on final grades. The prediction accuracy of the improved C4.5 algorithm reached 93.46%, and its average computation time was 54.8 ms, 19.17% less than that of the C4.5 algorithm. These results verify the effectiveness of the generation rules of the improved C4.5 algorithm in predicting students’ ideological and political course grades, and the algorithm can be applied to actual grade prediction.
Introduction
Course learning is one of the most basic tasks for college students [1]. With the progress and popularity of the Internet, education has become increasingly informatized and digitized. In the context of the coronavirus pandemic, online lectures are no longer just a way for students to take lessons and acquire extracurricular knowledge; they have gradually become an unavoidable teaching choice in colleges. Compared with traditional teaching methods, online lectures break the restrictions of time and space, making teaching more flexible and efficient. However, online lectures make it difficult for teachers to grasp students’ learning situation promptly, and the contact between teachers and students may become less close. Teachers often learn how well a course has been mastered only from the final grades and know too little about students’ progress at each stage, which leads to a higher failure rate in online courses. The analysis of data related to students’ online learning is therefore particularly important, and intelligent methods can help make up for this deficiency by predicting and managing students’ learning status. Online lectures retain and record a large amount of data about students’ learning habits and performance [2]. Studying these data can help teachers predict students’ learning performance so that learning plans can be adjusted appropriately and students can be managed better [3]. Many methods have been applied to grade prediction [4]. Zhang [5] predicted the physical education performance of college students using neural networks combined with an artificial intelligence teaching system and found that the method performed well in grade prediction and evaluation. Chu et al. [6] studied the prediction of students’ online course grades by analyzing students’ clicking behavior and designed a clustering-based approach.
They found that the method substantially outperformed the baseline model in experiments on three real datasets. Sekeroğlu et al. [7] investigated the prediction of students’ performance during learning, applying five machine learning methods to prediction and classification tasks on two datasets; their 18 experiments showed that student performance was predictable. Kuzilek et al. [8] analyzed students’ examination behavior in order to identify students at risk of dropping out, using four machine learning methods to predict which students would pass or fail the first academic year. Their experiments showed that the method helped instructors identify at-risk students and improve those students’ chances of passing the first academic year. Sweeney et al. [9] developed a method combining a factorization machine and a random forest to predict students’ course grades for the next semester and found that it could accurately predict the grades of both new and returning students taking new and existing courses. Baruah et al. [10] proposed a method for predicting student grades in a MapReduce framework using a deep neuro-fuzzy network optimized by fractional competitive multi-verse optimization; its mean square error, root mean square error, and mean absolute error were 0.3383, 0.5817, and 0.3915, respectively. Zhao et al. [11] proposed a student performance prediction method based on a fuzzy cognitive model; experiments showed that it effectively improved the accuracy of grade prediction and reduced prediction errors, providing strong data support for teaching reform. Abdullahi et al. [12] proposed a robust cascaded bi-level feature selection technique for student performance prediction, consisting of a Relief technique in the first layer and a particle swarm optimization algorithm in the second layer.
The method achieved an accuracy of 94.94%, showing that it could predict student performance effectively. With the continuous development of online education, research on performance prediction in this field has gradually deepened, but shortcomings remain in the selection of factors that influence student performance and in prediction accuracy, and the feasibility of many algorithms in this field has not yet been verified. Therefore, this study used the decision tree method to extract rules from students’ historical behavioral data and, based on these rules, analyzed the factors affecting students’ performance in ideological and political courses. Applying these rules to unseen data yields predictions that identify, as early as possible, students who may fail the final exam so that they can be notified. This work aims to improve the teaching quality of ideological and political courses and help students complete their course learning efficiently, thereby providing teachers with a new tool for optimizing their teaching and offering theoretical support for the further application of intelligent methods in education.
Decision tree algorithm and improvement
A decision tree is a tree-structured model whose topmost node is the root node. Every internal node represents a judgment condition on an attribute, and every leaf node represents a decision result. Decision trees are widely used to solve classification problems and have performed well in fields such as the biological sciences [13] and medical health [14]. The main decision tree algorithms are as follows.
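The structure just described can be sketched as a small Python data type, as a minimal illustration: internal nodes store the attribute they test and one child per attribute value, leaves store the decision, and classification follows a single path from the root to a leaf. The class name, field names, and toy tree are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A decision tree node (hypothetical representation).

    Internal node: `attribute` names the attribute tested here and `children`
    maps each attribute value to the subtree for that branch.
    Leaf node: `label` holds the decision result and `children` is empty.
    """
    attribute: str | None = None
    children: dict[str, Node] = field(default_factory=dict)
    label: str | None = None


def predict(node: Node, sample: dict[str, str]) -> str:
    """Start at the root and follow the branch matching the sample until a leaf."""
    while node.label is None:
        node = node.children[sample[node.attribute]]
    return node.label


# Hypothetical toy tree for illustration only (not the tree in Fig. 2).
toy_tree = Node(
    attribute="average quiz grade",
    children={
        "fail": Node(label="fail"),
        "pass": Node(
            attribute="video views",
            children={"S": Node(label="fail"), "M": Node(label="pass"), "L": Node(label="pass")},
        ),
    },
)

print(predict(toy_tree, {"average quiz grade": "pass", "video views": "S"}))  # fail
```

A sample only needs values for the attributes on its own path: a student who failed the quizzes is classified at the first leaf without consulting the video-view count.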
(1) ID3 algorithm
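ID3 is a classical decision tree algorithm [15]. It is simple and easy to understand, but it performs well only on small datasets. The core of ID3 is to select the attribute with the greatest information gain as the root node, create a branch for each of its values, and repeat the procedure on each branch to build the decision tree. It is assumed that sample set $S$ contains $|S|$ samples belonging to $m$ classes $C_1, \dots, C_m$. The information entropy of $S$ is
$$Info(S) = -\sum_{i=1}^{m} p_i \log_2 p_i,$$
where $p_i = |C_i| / |S|$ is the proportion of samples in $S$ that belong to class $C_i$.
It is assumed that attribute $A$ has $v$ distinct values $a_1, \dots, a_v$, which divide $S$ into subsets $S_1, \dots, S_v$. The expected entropy of $S$ after it is partitioned by $A$ is
$$Info_A(S) = \sum_{j=1}^{v} \frac{|S_j|}{|S|} Info(S_j),$$
where $S_j$ is the set of samples in $S$ whose value of $A$ is $a_j$.
ID3 classifies samples according to information gain. Ultimately, the information gain of attribute $A$ is
$$Gain(A) = Info(S) - Info_A(S).$$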
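The entropy and information gain calculations translate directly into code. The sketch below is a minimal illustration, assuming each sample is a dictionary of discretized attribute values with its class label stored separately; the function names and the four toy students are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def entropy(labels: list[str]) -> float:
    """Info(S) = -sum_i p_i * log2(p_i), with p_i the share of class i in `labels`."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def info_gain(rows: list[dict[str, str]], labels: list[str], attribute: str) -> float:
    """Gain(A) = Info(S) - sum_j |S_j|/|S| * Info(S_j), grouping rows by their value of A."""
    subsets: dict[str, list[str]] = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    total = len(labels)
    expected = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - expected


# Hypothetical toy data: the quiz result separates the classes perfectly,
# while the video-view level carries no information about them.
rows = [
    {"quiz": "pass", "views": "L"},
    {"quiz": "pass", "views": "M"},
    {"quiz": "fail", "views": "L"},
    {"quiz": "fail", "views": "M"},
]
labels = ["pass", "pass", "fail", "fail"]

print(info_gain(rows, labels, "quiz"))   # 1.0
print(info_gain(rows, labels, "views"))  # 0.0
```

ID3 would therefore place the quiz result at the root of this toy tree.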
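(2) C4.5 algorithm
C4.5 is an improvement of ID3 [16]. It handles incomplete data well and generates rules that are easier to understand; therefore, C4.5 is chosen to predict students’ grades in ideological and political courses. In C4.5, the split information entropy of attribute $A$ is
$$SplitInfo_A(S) = -\sum_{j=1}^{v} \frac{|S_j|}{|S|} \log_2 \frac{|S_j|}{|S|},$$
where $S_1, \dots, S_v$ are the subsets into which the $v$ distinct values of $A$ divide the sample set $S$.
Ultimately, the information gain ratio of attribute $A$ is
$$GainRatio(A) = \frac{Gain(A)}{SplitInfo_A(S)},$$
where $Gain(A)$ is the information gain of $A$ as defined for ID3.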
The C4.5 algorithm calculates the information gain ratio of every attribute and takes the attribute with the largest gain ratio as the root node (and, recursively, as the splitting attribute of each subtree). However, this calculation requires many logarithm evaluations, which lowers efficiency. To address this drawback, an improved C4.5 algorithm is proposed.
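The gain-ratio selection step can be sketched as follows. This is a minimal illustration, assuming samples are stored as dictionaries of discretized attribute values. The function names and toy data are hypothetical. One entropy helper serves both the class entropy and the split information, and the attribute with the largest ratio is returned.

```python
from __future__ import annotations

import math
from collections import Counter


def entropy_from_counts(counts: list[int]) -> float:
    """-sum (c/N) log2(c/N) over positive counts; used for both Info and SplitInfo."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def gain_ratio(rows: list[dict[str, str]], labels: list[str], attribute: str) -> float:
    """Information gain of `attribute` divided by its split information."""
    subsets: dict[str, list[str]] = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    total = len(labels)
    info_s = entropy_from_counts(list(Counter(labels).values()))
    info_a = sum(len(s) / total * entropy_from_counts(list(Counter(s).values()))
                 for s in subsets.values())
    split_info = entropy_from_counts([len(s) for s in subsets.values()])
    if split_info == 0.0:  # a single-valued attribute cannot split the node
        return 0.0
    return (info_s - info_a) / split_info


def best_attribute(rows: list[dict[str, str]], labels: list[str], attributes: list[str]) -> str:
    """C4.5's choice for the current node: the attribute with the largest gain ratio."""
    return max(attributes, key=lambda a: gain_ratio(rows, labels, a))


# Hypothetical toy data with a per-student identifier.
rows = [
    {"quiz": "pass", "views": "L", "user_id": "u1"},
    {"quiz": "pass", "views": "M", "user_id": "u2"},
    {"quiz": "fail", "views": "L", "user_id": "u3"},
    {"quiz": "fail", "views": "M", "user_id": "u4"},
]
labels = ["pass", "pass", "fail", "fail"]

print(best_attribute(rows, labels, ["views", "user_id", "quiz"]))  # quiz
```

On these rows, the identifier has the same information gain as the quiz grade (1 bit) because every student lands in a subset of their own. Its split information of 2 bits halves its gain ratio to 0.5, so C4.5 still prefers the quiz grade. This is the bias toward many-valued attributes that the gain ratio is designed to counter.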
Suppose that a dataset contains
where
When
Based on this, the equation of C4.5 is improved:
The improved formula can be used to calculate the information gain ratio. The improvement of C4.5 omits the logarithmic calculation, saving computation time and thus improving computational efficiency.
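The improved formula replaces logarithms with arithmetic on sample counts. As an illustration only, the sketch below assumes one Taylor-series simplification of this kind for a node with two classes, with p samples in one class and n in the other. The first-order expansion ln(1 + x) ≈ x turns the entropy into 2pn / ((p + n)^2 ln 2). The function names are hypothetical, and the paper's own improved formula may differ in its details.

```python
import math

LN2 = math.log(2)  # a constant, evaluated once rather than per sample


def entropy_exact(p: int, n: int) -> float:
    """Two-class entropy in bits, using one logarithm per class."""
    total = p + n
    return -sum(c / total * math.log2(c / total) for c in (p, n) if c > 0)


def entropy_taylor(p: int, n: int) -> float:
    """Log-free approximation of the same quantity.

    ln(p/(p+n)) = ln(1 - n/(p+n)) is replaced by its first-order Taylor term
    -n/(p+n), and ln(n/(p+n)) likewise by -p/(p+n). Substituting both into
    -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n)) gives 2pn / ((p+n)^2 ln 2).
    """
    total = p + n
    return 2 * p * n / (total * total * LN2)


for p, n in [(9, 1), (7, 3), (5, 5)]:
    print(p, n, round(entropy_exact(p, n), 4), round(entropy_taylor(p, n), 4))
```

The approximation underestimates entropy (about 0.72 instead of 1.0 bit for a 5/5 split). Like the exact value, however, it is largest for a balanced node and falls to 0 for a pure one, so it orders individual nodes by purity in the same way.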
Predictive analysis of students’ performance in ideological and political courses
Performance analysis of the improved C4.5 algorithm
First, the performance of the C4.5 and improved C4.5 algorithms was compared. The experiments were performed on UCI datasets [17], as shown in Table 1.
UCI dataset
Then, experiments were conducted on different datasets, and the final results were averaged. The performance comparison results of different algorithms are shown in Fig. 1.
Performance comparison of different algorithms.
As Fig. 1 shows, the classification accuracy of the improved C4.5 algorithm was higher than that of the C4.5 algorithm. Accuracy was lowest on the Ionosphere dataset, where the improved C4.5 algorithm was 2.06 percentage points higher than the C4.5 algorithm (76.68% vs. 74.62%), and highest on the CMC dataset, where it was 3.59 percentage points higher (92.36% vs. 88.77%). The computation time of the improved C4.5 algorithm was significantly shorter than that of the C4.5 algorithm; on the CMC dataset, for example, it was only 103 ms, 51.42% less than that of the C4.5 algorithm. Finally, the average accuracy and computation time of the two algorithms over all datasets were calculated. The average accuracy of the improved C4.5 algorithm was 81%, 1.63 percentage points higher than that of the C4.5 algorithm, and its average computation time was 53.3 ms, 28.07% less than that of the C4.5 algorithm. These results indicate the advantage of the improved C4.5 algorithm over the traditional C4.5 algorithm.
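The comparison above mixes two kinds of percentage: accuracy gaps are absolute (percentage points), whereas time savings are relative to the C4.5 baseline. The worked equations below back out the baseline values that the reported percentages imply. They are derived here from the reported figures rather than stated in the text.

```latex
\begin{aligned}
r &= \frac{t_{\text{C4.5}} - t_{\text{imp}}}{t_{\text{C4.5}}}
\;\Longrightarrow\;
t_{\text{C4.5}} = \frac{t_{\text{imp}}}{1 - r} \\
\text{CMC:}\quad t_{\text{C4.5}} &= \frac{103}{1 - 0.5142} \approx 212\ \text{ms} \\
\text{Average:}\quad t_{\text{C4.5}} &= \frac{53.3}{1 - 0.2807} \approx 74.1\ \text{ms} \\
\text{Average accuracy of C4.5:}\quad 81\% - 1.63\ \text{pp} &= 79.37\%
\end{aligned}
```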
Data on 172 students in four classes taking the course Ideology and Morality and Rule of Law were obtained from the Massive Open Online Courses (MOOC) platform, as shown in Table 2.
Student course learning data
The obtained data were then cleaned: after records with null, abnormal, or duplicate values were removed, 160 records remained for the experiments. Next, the data were discretized as follows.
The number of video views was between 40 and 100, so 40–59 was denoted by S, 60–79 by M, and 80–100 by L.
The number of assignments completed was between 0 and 6; therefore, 0–2 was denoted by S, 3–4 by M, and 5–6 by L.
The number of discussions was between 0 and 30; therefore, 0–10 was denoted by S, 11–20 by M, and 21–30 by L.
The number of absences was between 0 and 12; thus, 0–4 was denoted by S, 5–8 by M, and 9–12 by L.
The average quiz grade was divided into failing (below 60 points) and passing (above 60 points).
The final grade was likewise divided into fail (below 60 points) and pass (above 60 points).
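The discretization rules above amount to a small lookup. The sketch below is a minimal illustration with hypothetical column names. It assumes a score of exactly 60 counts as a pass, because the text only says "below 60" and "above 60". It does not validate values outside the stated ranges.

```python
# Upper bounds of the S and M levels for each count attribute (ranges listed above).
# The dictionary keys are hypothetical column names.
LEVEL_BOUNDS = {
    "video_views": (59, 79),  # 40-59 S, 60-79 M, 80-100 L
    "assignments": (2, 4),    # 0-2 S, 3-4 M, 5-6 L
    "discussions": (10, 20),  # 0-10 S, 11-20 M, 21-30 L
    "absences": (4, 8),       # 0-4 S, 5-8 M, 9-12 L
}


def to_level(attribute: str, value: int) -> str:
    """Map a count to S, M, or L using the bounds for `attribute`."""
    s_max, m_max = LEVEL_BOUNDS[attribute]
    if value <= s_max:
        return "S"
    if value <= m_max:
        return "M"
    return "L"


def to_pass_fail(score: float) -> str:
    # Assumption: a score of exactly 60 counts as a pass.
    return "pass" if score >= 60 else "fail"


def discretize(record: dict) -> dict:
    """Return the S/M/L and pass/fail encoding of one cleaned student record."""
    coded = {name: to_level(name, record[name]) for name in LEVEL_BOUNDS}
    coded["average_quiz"] = to_pass_fail(record["average_quiz"])
    coded["final_grade"] = to_pass_fail(record["final_grade"])
    return coded


print(discretize({"video_views": 85, "assignments": 3, "discussions": 10,
                  "absences": 9, "average_quiz": 72.5, "final_grade": 58}))
```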
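Then, to identify which of these attributes are related to the final grade, the Pearson correlation coefficient [18] and the information gain ratio of each attribute were calculated. The Pearson coefficient is calculated as
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}},$$
where $x_i$ and $y_i$ are the values of an attribute and of the final grade for the $i$-th of $n$ students, and $\bar{x}$ and $\bar{y}$ are their means.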
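A direct translation of the Pearson coefficient into Python is sketched below. The function name is hypothetical. The sketch assumes the attribute values are numeric, for example raw counts before discretization. On Python 3.10 and later, the standard library's `statistics.correlation` computes the same coefficient.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the square roots of the summed squared deviations.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
```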
The results are shown in Table 3.
Attribute analysis
As Table 3 shows, attributes such as gender, age, user ID, and class had low Pearson coefficients and information gain ratios, so they were eliminated. The five top-ranked attributes were then used to build the decision tree, which is shown in Fig. 2.
The decision tree for predicting students’ performance in ideological and political courses.
The decision tree generation rules obtained from Fig. 2 are as follows.
IF the average quiz grade IF the average quiz grade IF the average quiz grade IF the average quiz grade IF the average quiz grade IF the average quiz grade IF the average quiz grade IF the average quiz grade IF the average quiz grade IF the average quiz grade IF the average quiz grade IF the average quiz grade IF the average quiz grade IF the average quiz grade
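Each root-to-leaf path of a decision tree yields one IF-THEN rule: the attribute tests along the path are joined by AND, and the leaf label is the conclusion. The sketch below enumerates these paths. It is a minimal illustration assuming a nested-tuple tree representation, and the toy tree is hypothetical rather than the tree of Fig. 2.

```python
from __future__ import annotations

from typing import Iterator

# A tree is either a leaf label (str) or a pair (attribute, {value: subtree}).


def extract_rules(tree, conditions: tuple = ()) -> Iterator[str]:
    """Yield one IF-THEN rule per root-to-leaf path, in branch order."""
    if isinstance(tree, str):
        premise = " AND ".join(f"{attr} = {value}" for attr, value in conditions)
        yield f"IF {premise} THEN final grade = {tree}" if premise else f"final grade = {tree}"
        return
    attribute, branches = tree
    for value, subtree in branches.items():
        yield from extract_rules(subtree, conditions + ((attribute, value),))


# Hypothetical toy tree for illustration only.
toy_tree = ("average quiz grade", {
    "fail": "fail",
    "pass": ("video views", {"S": "fail", "M": "pass", "L": "pass"}),
})

for rule in extract_rules(toy_tree):
    print(rule)
```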
The comparison of performance in predicting the grade of ideological and political courses.
Comparison of the computation time.
The classification rules show that the average quiz score, the number of video views, and the number of assignments completed had the greatest influence on students’ final grades in ideological and political courses. Therefore, during online learning, teachers should urge students to watch the videos and complete assignments actively, and students should become more self-motivated learners so that they perform well on unit quizzes and pass the final exam.
The resulting decision tree was used to predict students’ grades in the course, and its performance was compared with that of the traditional C4.5 algorithm. The results are presented in Fig. 3.
Fig. 3 shows that the decision tree obtained by the improved C4.5 algorithm predicted better than that obtained by the traditional C4.5 algorithm. The prediction accuracy of the improved C4.5 algorithm reached 93.46%, 3.34% higher than that of the C4.5 algorithm; its precision was 92.33%, its recall 90.12%, and its F1 value 91.21%, each 5.59% higher than that of the C4.5 algorithm. These results demonstrate the effectiveness of the improved C4.5 algorithm in predicting students’ grades for this course.
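The four metrics in Fig. 3 follow from a binary confusion matrix, as sketched below. The function names are hypothetical. The text does not state which class (pass or fail) was treated as positive, so the caller chooses. As a consistency check, the reported precision (92.33%) and recall (90.12%) give an F1 value of 91.21%, matching the reported figure.

```python
from __future__ import annotations


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 from a binary confusion matrix.

    Which class counts as "positive" (e.g. fail or pass) is the caller's choice.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check on the figures reported for the improved C4.5 algorithm.
print(round(f1_from(0.9233, 0.9012), 4))  # 0.9121
```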
Next, the computation times of the two methods were compared over five repeated experiments. The computation time of each run is presented in Fig. 4.
Fig. 4 shows that the computation time of the improved C4.5 algorithm was consistently lower than that of the C4.5 algorithm: in all five runs it stayed below 60 ms, whereas that of the C4.5 algorithm exceeded 65 ms. Over the five runs, the average computation time was 67.8 ms for the C4.5 algorithm and 54.8 ms for the improved C4.5 algorithm, a reduction of 19.17%. These results demonstrate the efficiency of the improved C4.5 algorithm in predicting students’ grades in ideological and political courses.
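The 19.17% figure is a reduction relative to the C4.5 average and follows directly from the two reported averages:

```latex
\frac{67.8\ \text{ms} - 54.8\ \text{ms}}{67.8\ \text{ms}} = \frac{13.0}{67.8} \approx 0.1917 = 19.17\%
```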
This paper studied the prediction of students’ grades in ideological and political courses using the decision tree method and designed an improved C4.5 algorithm based on the Taylor series. Experiments showed that, compared with the traditional C4.5 algorithm, the improved C4.5 algorithm had advantages in both classification accuracy and computation time. A decision tree was then built for the course Ideology and Morality and Rule of Law, which showed that the average quiz grade, the number of video views, and the number of assignments completed strongly influenced students’ final grades. The improved C4.5 algorithm achieved a prediction accuracy of 93.46% and an average computation time of 54.8 ms, 19.17% lower than that of the traditional algorithm, indicating its reliability. The improved C4.5 algorithm can therefore be further applied to actual grade prediction. However, this study focused on only one course on the MOOC platform, so it has limitations. Whether the selected influencing factors apply to predicting grades in other courses, the lack of validation on large datasets, and whether the prediction performance of the improved C4.5 algorithm can be further optimized are issues to be addressed in future work.
