Abstract
This paper studies a visual analysis method for educational data statistics based on big data mining, with the aim of improving students’ academic performance. The Fuzzy C-Means (FCM) clustering algorithm is improved by introducing the Mahalanobis distance and the covariance matrix. The improved FCM clustering algorithm is used to mine valuable educational data from massive raw educational data. The mining results are analyzed statistically, and a statistical analysis chart of the educational data is drawn. With an improved force-guided layout algorithm, the mined educational data points are written into an elastic graph layout to realize the visual layout. The ECharts data visualization analysis component presents the visual layout results of the educational data points and the statistical analysis charts. Experiments show that the method can effectively mine educational data and draw statistical analysis charts of it. Learning analysis data occupy the highest proportion (15%), and privacy protection data occupy the lowest (only 1%). The method lays out the educational data points effectively and achieves a better visual effect. It presents the results of statistical analysis of educational data in visual form, in which learning analysis data are the most important.
Introduction
Under the influence of new technologies, traditional education methods have changed dramatically. Statistical analysis of massive educational data can reveal students’ learning patterns and learning styles, helping teachers adjust teaching plans promptly and improve the quality of education, so that teaching management becomes more scientific, intelligent and precise. Presenting the statistical analysis results of educational data visually makes them more intuitive for teachers, improves the efficiency of teaching management, and helps improve students’ academic performance. For example, Zhu et al. [1] first carried out statistical analysis on the educational data of a university; then, based on time series data, used the visibility graph method to construct the corresponding network; and finally analyzed the generated network topology visually from different angles. The results show that people’s attention to educational data obeys a power-law distribution, and that the topological characteristics of the network can be used to identify significant nodes and some abnormal data in the time series. This method provides a novel means of representing and describing educational data through complex network analysis. Momentum [2] conducted a statistical visualization analysis of educational data by designing a joint model in which the longitudinal sub-model is a linear mixed-effects model and the survival sub-model is a cause-specific risk model with random effects of space and spatial risk. The random effects define an underlying structure linking event times and longitudinal processes. Univariate intrinsic conditional autoregressive (ICAR) and multivariate ICAR distributions are used to model the random effects of regional space and spatial risk in the education data, respectively. The statistical visualization analysis of educational data verifies the feasibility of this method.
Dong et al. [3] developed SBGNview, a comprehensive R package for statistical visualization and analysis of educational data. By adopting the standard SBGN format, SBGNview extends the coverage of statistical analysis and data visualization of pathway-based educational data to all major pathway databases beyond KEGG. In addition, SBGNview extends or surpasses existing tools (especially Pathview) in design and functionality, including a standard input format (SBGN), high-quality output graphics (SVG format) for easy interpretation and further updating, and a flexible, open workflow for iterative editing and interactive visualization (Highlighter module). Experiments show that this method can effectively visualize and analyze educational data. However, none of the above methods lays out the data points well, which degrades the visualization effect. Big data mining refers to the process of mining valuable information from massive data, and it yields better statistical analysis results. To this end, an educational data statistical visualization method based on big data mining is studied to improve the accuracy of educational data statistical visualization analysis.
Lafuente et al. [4] first computed the probability distribution histogram of educational data in each original dimension, then segmented the histogram with the mean shift algorithm, and finally applied Radviz technology in combination with the segmentation results. Runfola et al. [5] used deep learning algorithms to extract features from educational data and visualization software to display and statistically analyze those features interactively. Mai et al. [6] proposed a new approach to visually analyze educational data to understand students’ learning behaviors and the relationship between these behaviors and learning assessment results, using a custom-made combination of random matrix theory, community detection algorithms, and statistical hypothesis testing. Gu [7] proposed a statistical visualization analysis method for educational data based on data mining. For the input and output parameters of the model, data mining is used to mine the validity index data, and the resulting educational data statistical analysis model is presented in visual form. Experimental results show that the accuracy of statistical analysis of educational data can reach 97.2%. However, the computational complexity of this method is high, which reduces the efficiency of statistical visualization analysis of educational data. Sabor et al. [8] chose the DBSCAN algorithm (Density-Based Spatial Clustering of Applications with Noise) to perform cluster statistical analysis on educational data. DBSCAN detects clusters in educational data without knowing the number of clusters in advance, and visualization software presents the clustering statistical analysis results. The method has proven effective and has considerable potential. However, it cannot eliminate the correlation between educational data, which reduces the accuracy of the clustering statistical analysis.
To address the deficiencies of the above methods, an educational data statistical visualization analysis method based on big data mining is studied. Within big data mining, the improved FCM clustering algorithm can quickly mine valuable educational data from massive educational data; it eliminates the correlation between the various educational data types and improves the accuracy of educational data mining. The improved FR force-guided layout algorithm then writes the mined educational data points into the elastic graph layout, improving the visual effect.
Methods
Educational data mining statistics
The Mahalanobis distance and the covariance matrix are introduced into the FCM clustering algorithm to eliminate the correlation between the various educational data types, optimize the objective function threshold, and improve the accuracy of educational data mining. With the improved FCM clustering algorithm, the objective function for mining valuable educational data from massive educational data is:
Among them, the membership matrix is
The calculation formula of
Among them, the expected value is
The formula for calculating the Mahalanobis distance
Using the improved FCM clustering algorithm, the specific steps for mining valuable educational data in massive educational data are as follows:
Step 1: Set the distance
Step 2: Determine
Among them, the average distance between the education data sample
Step 3: Calculate the cluster center
Step 4: Continue to iterate and calculate the new
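The clustering steps above can be illustrated with a minimal sketch. Because the paper's own formulas are not reproduced here, the sketch assumes the textbook FCM objective (sum of membership-weighted squared distances with fuzzifier m) and replaces the Euclidean distance with a Mahalanobis distance built from a single covariance matrix of the whole data set. That choice, the restriction to two-dimensional points, and all function and parameter names (`fcm_mahalanobis`, `eps`, `max_iter`) are assumptions for illustration, not the authors' implementation.

```python
import random

def covariance_2d(points):
    """Sample covariance matrix [[sxx, sxy], [sxy, syy]] of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def invert_2x2(m):
    """Inverse of a 2x2 matrix; raises if it is (numerically) singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-12:
        raise ValueError("covariance matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def mahalanobis_sq(x, v, inv_cov):
    """Squared Mahalanobis distance (x - v)^T inv_cov (x - v)."""
    dx, dy = x[0] - v[0], x[1] - v[1]
    return (dx * (inv_cov[0][0] * dx + inv_cov[0][1] * dy)
            + dy * (inv_cov[1][0] * dx + inv_cov[1][1] * dy))

def fcm_mahalanobis(points, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy C-Means with a global-covariance Mahalanobis distance (assumed variant)."""
    rng = random.Random(seed)
    inv_cov = invert_2x2(covariance_2d(points))
    centers = rng.sample(points, c)          # initial centres drawn from the data
    prev_obj = float("inf")
    u = []
    for _ in range(max_iter):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        u = []
        for x in points:
            d2 = [max(mahalanobis_sq(x, v, inv_cov), 1e-12) for v in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            u.append([w / total for w in inv])
        # Centre update: membership-weighted mean of the samples.
        centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            s = sum(w)
            centers.append((sum(wj * p[0] for wj, p in zip(w, points)) / s,
                            sum(wj * p[1] for wj, p in zip(w, points)) / s))
        # Stop when the objective changes by less than the threshold eps.
        obj = sum(u[j][i] ** m * mahalanobis_sq(points[j], centers[i], inv_cov)
                  for j in range(len(points)) for i in range(c))
        if abs(prev_obj - obj) < eps:
            break
        prev_obj = obj
    return centers, u
```

Using one global covariance matrix is equivalent to running ordinary FCM on decorrelated (whitened) features; a per-cluster covariance matrix would be an alternative design that this sketch does not attempt.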
Statistical analysis is carried out on the results of educational data mining, and a statistical analysis chart of educational data is drawn.
The FR algorithm calculates the energy between educational data points by applying attractive forces to them and completes the visual layout of the educational data points by minimizing this energy. The spring model of the FR algorithm is calculated as follows:
Among them, the natural length between education data points
The energy calculation formula between education data points is as follows:
Among them, the energy function is
By minimizing
To improve the visual layout produced by the FR algorithm, its initial layout is optimized. During statistical analysis of the educational data, related data are joined with connection lines, and the educational data are grouped according to the connection results. The group with the most educational data points is placed at the center of the two-dimensional image and allocated the largest area; the remaining groups, which have fewer points, are allocated smaller areas and placed randomly near the largest group. This generates a clustered educational data relationship diagram, which improves the visualization effect.
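The grouped initial layout and the energy minimization can be sketched as follows. This is an illustrative reading of the description, not the paper's formulas: it assumes groups are the connected components of the connection lines, a Hooke-style spring energy of 0.5 * k * (d - natural_length)^2 per connected pair, and a simple gradient descent that only accepts steps that lower the energy. The radii (0.25, 0.4, 0.08 of the image size) and all names are hypothetical.

```python
import math
import random

def connected_groups(n, edges):
    """Split nodes 0..n-1 into connected groups, largest group first."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nb in adj[node] - seen:
                seen.add(nb)
                stack.append(nb)
        groups.append(group)
    return sorted(groups, key=len, reverse=True)

def grouped_initial_layout(n, edges, size=1.0, seed=0):
    """Largest group in a big disc at the centre, other groups in small discs nearby."""
    rng = random.Random(seed)
    center = (size / 2, size / 2)
    pos = [None] * n
    for rank, group in enumerate(connected_groups(n, edges)):
        if rank == 0:
            anchor, radius = center, 0.25 * size
        else:
            angle = rng.uniform(0, 2 * math.pi)
            anchor = (center[0] + 0.4 * size * math.cos(angle),
                      center[1] + 0.4 * size * math.sin(angle))
            radius = 0.08 * size
        for node in group:
            r = radius * math.sqrt(rng.random())   # uniform inside the disc
            t = rng.uniform(0, 2 * math.pi)
            pos[node] = (anchor[0] + r * math.cos(t), anchor[1] + r * math.sin(t))
    return pos

def spring_energy(pos, edges, natural_length, k=1.0):
    """Total spring energy over connected pairs (assumed Hooke-style model)."""
    return sum(0.5 * k * (math.hypot(pos[a][0] - pos[b][0],
                                     pos[a][1] - pos[b][1]) - natural_length) ** 2
               for a, b in edges)

def relax(pos, edges, natural_length, k=1.0, step=0.05, iters=200):
    """Gradient descent on the spring energy; a step is kept only if it lowers the energy."""
    pos = [tuple(p) for p in pos]
    energy = spring_energy(pos, edges, natural_length, k)
    for _ in range(iters):
        grad = [[0.0, 0.0] for _ in pos]
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            g = k * (d - natural_length) / d
            grad[a][0] += g * dx
            grad[a][1] += g * dy
            grad[b][0] -= g * dx
            grad[b][1] -= g * dy
        trial = [(p[0] - step * g[0], p[1] - step * g[1]) for p, g in zip(pos, grad)]
        trial_energy = spring_energy(trial, edges, natural_length, k)
        if trial_energy < energy:
            pos, energy = trial, trial_energy
        else:
            step *= 0.5                             # overshoot: shrink the step
    return pos
```

A typical call would be `relax(grouped_initial_layout(n, edges), edges, natural_length=0.2)`. The sketch has no repulsion between unconnected points, so it shows only the energy-minimization idea and not a complete force-directed layout.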
The ECharts data visualization analysis component is used to present the statistical analysis charts of educational data and the visual layout results of educational data points. The functional structure of this component’s visual analysis of educational data statistics is shown in Fig. 1.
Fig. 1. Flow chart of function structure for visual analysis of educational data statistics.
The specific steps of the function structure of educational data statistical visualization analysis are as follows:
Step 1: Determine the interaction icon between the user and the educational data statistical analysis chart and the visual layout results by responding to the mouse event.
Step 2: When a mouse click event occurs, an educational data conversion event will be triggered to convert the educational data statistical analysis graph and visual layout results into the nested data format required by user interaction. When a double-click mouse event occurs, a rollback event will be triggered to restore the educational data statistical analysis graph and visual layout results to their original state.
Step 3: Check whether the educational data statistical analysis chart and the visual layout results have corresponding lower-level data. If they do, call the drawing function to redraw the educational data statistical analysis chart and the visual layout results. If they do not, a prompt pops up to remind the operator that the interaction event cannot be completed and that the interaction icon needs to be re-determined.
Step 4: In a visual form, present the educational data statistical analysis diagram and the visual layout results of the educational data points.
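The interaction flow in the steps above can be sketched as a small state holder. This is not ECharts code: the class `DrillDownView`, its methods, and the nested `name`/`value`/`children` dictionary shape are assumptions chosen to mirror the description (click drills down when lower-level data exist, otherwise a prompt is returned; double-click rolls back to the original state). The lower-level entries in `sample` are made-up placeholders.

```python
class DrillDownView:
    """Tracks which level of a nested data set is currently drawn."""

    def __init__(self, root):
        self.root = root          # original (top-level) state
        self.current = root       # level currently displayed
        self.redraws = 0          # how many times the chart was redrawn

    def _redraw(self):
        # Placeholder for the real drawing call of the charting component.
        self.redraws += 1

    def click(self, name):
        """Drill into child `name`; return a prompt string if nothing lies below it."""
        for child in self.current.get("children", []):
            if child["name"] == name and child.get("children"):
                self.current = child
                self._redraw()
                return None
        return "No lower-level data: please choose another item."

    def double_click(self):
        """Roll back to the original state and redraw."""
        self.current = self.root
        self._redraw()


# Hypothetical nested data; only the two category names and shares come from the text.
sample = {
    "name": "educational data",
    "children": [
        {"name": "learning analysis data", "value": 15,
         "children": [{"name": "placeholder sub-item", "value": 1}]},
        {"name": "privacy protection data", "value": 1},
    ],
}
```

Returning the prompt as a string, instead of raising an error, keeps the sketch close to Step 3, where the operator is simply asked to pick another interaction target.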
A university is taken as the experimental object. It covers an area of 5,888 acres, with a total construction area of 895,400 square meters, and has 34 colleges (departments), 87 undergraduate majors, and 3,300 full-time teachers. There are 1,525 doctoral supervisors and 2,166 postgraduate supervisors, as well as 16 academicians of the two academies.
Fig. 2. Visualization results of education data mining.
Using the method in this paper, valuable educational data are mined from the university’s massive educational data. Taking learning analysis data, smart educational data and open educational data as examples, the visualization results of educational data mining are shown in Fig. 2. Fig. 2 shows that the method effectively mines these three types of educational data from the massive educational data without confusing the types, which demonstrates the effectiveness of the proposed method for educational data mining.
Based on the results of educational data mining, the method in this paper is used to draw statistical analysis charts of educational data to show the proportion of each type of educational data in the whole valuable educational data set. The statistical analysis chart is shown in Fig. 3. According to Fig. 3, learning analysis data have the highest proportion in the valuable educational data set (15%), followed by online training data (10%); privacy protection data have the lowest proportion (only 1%). The experiment shows that the method in this paper can effectively analyze the educational data statistically and draw the corresponding statistical analysis chart.
Fig. 3. Statistical analysis of educational data.
The method in this paper is used to lay out educational data points visually. From the mined educational data, 50 educational data points are selected at random and laid out, and the visual layout is compared before and after the improvement of the layout method. The fewer edge crossings in the layout result, the better the layout. The visual layout results of educational data points are shown in Fig. 4. In Fig. 4a, before the improvement, the associated lines between educational data points intersect many times, indicating that the layout is uneven. In Fig. 4b, after the improvement, the associated lines do not intersect, indicating that the layout is relatively uniform and therefore better. The experiment shows that the improvement significantly improves the layout, and hence the visualization effect, of educational data points.
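The edge-crossing criterion used in this comparison can be computed directly from a layout. The sketch below counts proper crossings between pairs of edges with the standard orientation test; the function names and the input format (a mapping from point index to coordinates plus a list of index pairs) are assumptions for illustration and are not taken from the paper.

```python
from itertools import combinations

def _orient(p, q, r):
    """Signed area of triangle pqr: >0 counter-clockwise, <0 clockwise, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single interior point."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)

def count_edge_crossings(pos, edges):
    """Number of edge pairs that cross; edges sharing an endpoint are skipped."""
    crossings = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:
            continue
        if segments_cross(pos[a], pos[b], pos[c], pos[d]):
            crossings += 1
    return crossings
```

Applied to the layouts before and after the improvement, a lower count corresponds to the better layout in the sense used above. Collinear overlaps and mere endpoint touches are deliberately not counted as crossings in this sketch.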
Fig. 4. Visual layout results of educational data points before and after improvement.
Using the method in this paper, the university’s educational data are analyzed visually and statistically; the results are shown in Fig. 5. Fig. 5 shows that learning analysis data are correlated to some degree with the other 17 types of educational data and have the most educational data points, so they occupy the central position of the optimized layout. This highlights the importance of learning analysis data in formulating teaching plans and improving students’ academic performance.
Fig. 5. Results of visual analysis of educational data statistics.
Visual analysis of educational data is of great significance for improving teaching quality and students’ academic performance. To this end, this paper studies statistical visualization analysis of educational data based on big data mining. Big data mining methods can mine valuable educational data from massive educational data, speed up statistical visualization analysis of educational data, and support the application of educational data statistical visualization analysis methods [9, 10, 11]. The method can provide a reference for schools in formulating teaching plans.
However, the educational data used by the method in this paper are not fine-grained enough to serve teaching more accurately [12, 13]. In the future, an educational data collection method should be designed to collect more detailed educational data and improve the accuracy of educational data statistical visualization analysis.
Acknowledgments
This study was supported by the Backbone Teacher Training Program of Zhengzhou Shuqing Medical College (2022zygg04).
Declarations of interest
None.
