Abstract
In the post-pandemic era, online learning has attracted increasing attention thanks to advances in information and big data technology, producing large-scale online course data that record varied student behaviors. Mining these data has become an important way of extracting valuable insights. However, previous online course analysis methods often focused on individual aspects of the data and neglected the correlations within large-scale learning behavior data, which can lead to an incomplete understanding of the overall learning behavior and patterns within an online course. To address these problems, this paper proposes an online course evaluation model based on a graph auto-encoder. In our method, the features of the collected online course data are used to construct K-Nearest Neighbor (KNN) graphs that represent the associations among courses. A variational graph auto-encoder (VGAE) is then introduced to learn useful implicit features. Finally, we feed the learned implicit features into unsupervised and semi-supervised downstream tasks for online course evaluation. We conduct experiments on two datasets. In the clustering task, our method showed a more than tenfold increase in the Calinski-Harabasz index compared to unoptimized features, demonstrating significant structural distinction and group coherence. In the classification task, our model exhibited an overall performance improvement of about 10% over traditional methods, indicating its effectiveness in handling complex network data.
Introduction
The outbreak of the COVID-19 pandemic has led to a rapid development in online education, largely fueled by advancements in big data and mobile internet technologies. Online education has not only gained global popularity but has also been established as a mainstream educational approach. Research documented in the literature [1, 2, 3] addresses the integration of big data with online education during the pandemic. This research analyzes its impact on educational practices and underscores the technological advancements, challenges, and future prospects in the field of online education. Of particular note is the development of online course evaluation models. These models facilitate a deeper understanding of student learning behaviors and needs, thereby improving the effectiveness of course designs and teaching methods. They also offer the potential to predict student learning outcomes and provide effective feedback to both educators and students. Online course evaluations based on learning behavior data [4] are instrumental in gaining insights into student learning behaviors and needs, which can inform enhancements in course design and teaching methods [5]. Thus, constructing an effective online course evaluation model is crucial for enhancing teaching quality and boosting student learning efficiency.
Analyzing and evaluating course resources by modeling online learning behaviors is an active research topic. In previous studies, some scholars collected behavioral data of groups in a specific context using questionnaires or experimental tracking and analyzed specific behavioral characteristics inductively using classical statistics or traditional machine learning methods [6, 7]. [8] discusses and evaluates approaches to modeling student online learning and proposes a Bayesian Markov chain-based approach to clustering temporal data. [9, 10, 11, 12, 13] use various clustering methods to group students in online courses according to their learning behaviors. [14] explores a new method for classifying online educational resources, which employs the Support Vector Machine algorithm to enhance classification accuracy and help learners and educators access relevant online resources more effectively. Recent research [15] has introduced a data privacy protection algorithm named STHE (Statistical Transformation with Homomorphic Encryption), which combines homomorphic encryption with a privacy-chain based data perturbation technique. The above papers mainly focus on clustering student learning behaviors within a single online course; they do not take into account the variability and diversity among courses. They also ignore the implicit correlations between data, which may lead to inaccurate results.
With the development of deep learning, neural networks have made breakthrough progress in computer vision [16, 17], natural language processing [18], and other fields, owing to their powerful adaptive learning ability [19]. In recent years, graph neural network (GNN) research [20, 21] has continued to make breakthroughs [22] and has made good progress in areas such as node clustering [23] and link prediction [24]. Unlike traditional convolutional neural networks, graph convolution [25] can encode the graph structure of different input data using a neural network model, capturing the topological information of the graph while learning the node features. In contrast to ordinary convolutional neural networks that can only update weights, graph neural network learning includes updates to nodes, edges, and global information. In online course data, students’ learning behaviors such as course click rate [26], comment count [27], and assignment completion rate [28] reflect correlations among learning behaviors. GNNs are therefore well suited to clustering student performance, since they can effectively capture the relationships between student data. By analyzing these patterns, GNN models can adapt to diverse learning behaviors and provide accurate insights into students’ academic progress. In summary, there is great potential to use graph neural networks to analyze and evaluate online course data.
Based on the preceding analysis, we propose an online course evaluation model based on a graph auto-encoder. This approach introduces a graph structure among courses to more accurately predict students’ utilization of course resources. At the same time, we generate appropriate embedding vectors for data samples, reducing noise and redundancy within the online course data. Specifically, the graph auto-encoder learns embedding vectors for the online course data samples and uses them to reconstruct the graph. The method further uses neural networks to learn the posterior distribution of the encoding process and optimizes the parameters by minimizing the reconstruction error and the KL divergence. Finally, the learned embedding vectors are clustered using traditional machine learning clustering methods. Our contributions include:
(1) We propose a graph neural network-based mining and analysis method for online learning data, which models the associations among online learning data and extracts informative embedding representations to complete online course evaluations;
(2) Graph structures are used to represent the correlations between courses, and GNNs are employed to capture associations in online course learning behavior, enabling a more comprehensive evaluation and analysis of online courses;
(3) A graph auto-encoder (GAE) is introduced to learn implicit embedding features, addressing the redundancy and noise present in the original dataset;
(4) We collected online course behavior data and performed unsupervised clustering and semi-supervised classification on both real and public datasets to complete the online course evaluation. The results show that our method outperforms the baseline methods.
The paper is structured as follows: Section 2 presents the related work, Section 3 introduces the methodology, Sections 4 and 5 describe the experiments and case analysis, and Section 6 concludes the paper and outlines future work.
Related work
Online learning behavior analysis
Online education, as an extension of traditional distance learning, has emerged as a prevalent method of instruction. It requires novel presentation and interaction techniques [29], heralding new paradigms in online education. However, the impact of student interactions in these settings has not been studied in depth. With the growth of educational data mining [30], the analysis of learning behavior data has become a pivotal area of inquiry within data mining. A critical aspect of handling online educational data is transforming sparse and irregular datasets into structured and actionable formats, and machine learning is used extensively for this purpose because of its ability to manage and interpret complex datasets. As highlighted in [31], a significant benefit of machine learning lies in its capacity to predict student performance by analyzing individual behavioral patterns, thereby identifying and addressing the limitations of various educational techniques. [32] demonstrates this by employing Recursive Clustering to categorize students in programming courses based on their distinct performance metrics, facilitating tailored educational strategies to enhance learning efficacy. [33] introduces an approach for analyzing student behaviors in virtual learning environments. This method defines a "student community" and identifies it using the K-means algorithm and normalized compressed distance. Such techniques enable the extraction and analysis of student behavioral features, which are crucial for understanding and predicting academic performance. However, these methodologies often overlook the interconnectedness of behaviors across different courses and their aggregate effect on overall student learning outcomes.
Deep learning methods can also be employed to analyze student behavior. In [34], the authors applied a bidirectional LSTM to student exercise data to track student behavioral characteristics and offer tailored recommendations for subsequent educational development. In [35], an end-to-end student performance prediction model (Tri-branch CNN) was used to capture student behavioral traits and forecast student actions. An attention mechanism and a cost-sensitive learning strategy [36] were introduced to further enhance prediction accuracy, enabling personalized education and academic alerts for students. In [37], the authors introduced a neural network algorithm to extract knowledge patterns from student datasets and constructed a model for predicting students’ academic performance. These studies showcase the application of deep learning in analyzing student behavior. However, existing research often falls short of turning these analytical findings into concrete recommendations for educational interventions.
In existing research, while there has been in-depth discussion on modeling behaviors in online courses, most efforts focus primarily on surface-level behavioral feature analysis, neglecting the impact of redundancy and noise in the raw data on prediction accuracy. Moreover, current methods often fail to adequately consider the interrelations of behaviors across different courses, leading to a less comprehensive understanding of student learning patterns. These limitations reveal shortcomings in current approaches to handling complex online learning data and in uncovering deeper behavioral connections, indicating a need for further enhancement.
Graph clustering
Clustering algorithms play an important role in the data processing of online student performance prediction. [38] proposed the global K-means algorithm, an incremental clustering algorithm that adds cluster centers one at a time through a global search, making the clustering results more accurate than those of the traditional K-means algorithm. [39] introduced the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, a pioneering density-based algorithm that identifies outliers and noise. [40] describes in detail the analysis and algorithm of Spectral Clustering and tests the feasibility of this clustering method experimentally. In the field of data privacy protection, a study using an algorithm called ST3DSA [41] (Statistical Transformation with Three Dimensional Shearing) enhances the privacy of learning data through data perturbation techniques, offering new perspectives for analyzing student behaviors in online educational environments.
However, these methods often overlook the topological structure of data and fail to capture complex patterns of student behavior. Graph auto-encoders offer a new perspective for handling such structured data, improving the quality of embeddings within the model and enhancing predictive accuracy by reducing data redundancy and noise. [42] proposed an auto-encoder (AE) approach to data dimensionality reduction, making data analysis easier. Subsequently, [43] proposed the variational auto-encoder (VAE) together with a new estimator of the variational lower bound for approximate inference over latent variables. In these methods, feature extraction focuses on each sample’s own information and tends to ignore the relationships between samples, i.e., topological information. Subsequent researchers have therefore proposed using graph structures to account for topological information and integrate the features of the data. [44] introduced graph auto-encoders (GAE) and variational graph auto-encoders (VGAE), an unsupervised framework for graph-structured data based on the VAE that uses latent variables to learn latent representations of undirected graphs. In the application research of graph neural networks, the STIF [45] (Statistical Transformation with Intuitionistic Fuzzy) algorithm demonstrates its ability to protect privacy in complex datasets, providing valuable insights for understanding and analyzing the data processed by graph neural networks. [46] proposed an adversarial regularization framework for graph embedding with two variants, Adversarially Regularized Graph Auto-Encoders (ARGA) and Adversarially Regularized Variational Graph Auto-Encoders (ARVGA), to learn graph embeddings efficiently.
The graph auto-encoder not only enhances the integrity of graph structure embeddings within the model but also reduces data redundancy and noise to some extent. Inspired by this technology, we have utilized graph auto-encoders to construct an online course evaluation model aimed at generating reliable embedded feature vectors, thereby significantly improving the accuracy and performance of course evaluations. This approach allows us to analyze online course data more deeply, providing more precise evaluation results to support the making of educational decisions.
Methodology
To better accomplish the task of online course evaluation, we propose a graph auto-encoder-based online course evaluation model, which helps administrators better understand the teaching and learning status of teachers and students within a course. Figure 1 illustrates the framework of the proposed model, which consists of four parts: online course data pre-processing, graph construction, graph auto-encoder model optimization, and downstream tasks. In the pre-processing phase, we normalize the collected online course data, analyze feature correlations, and extract features.
The framework of the online course evaluation model based on graph auto-encoder.
Our online course dataset consists of 10,000 online course records provided by the National Open University of China. We retain the features that are useful for evaluating online courses and remove those with excessive abnormal or missing data. In total, 13 features were selected to capture student engagement and course dynamics:
(1) CourseBehaviorDays represents the number of days it takes for a student to complete the entire course. This is a critical metric for evaluating a student’s time management skills and the duration of their engagement with the course. A shorter CourseBehaviorDays may indicate efficient course completion, while a longer duration may suggest that students require more time to grasp and absorb the course content;
(2) BehaviorNum measures the quantity of various interactions and actions performed by students while participating in an online course. These interactions and actions include clicks, views, submissions, answering questions, and participation in group discussions. BehaviorNum can be used to analyze student engagement, activity levels, and the appeal of the course;
(3) ViewNum signifies the frequency and number of times students access the course. This reflects the level of student interest and attention towards the course. A high ViewNum may indicate that the course content is engaging, while a low number may require further analysis to understand why students are not accessing the course frequently;
(4) ActivitiesViewedNum indicates the number of times students view specific course activities or content items. This includes viewing course modules, materials, video lectures, and more. By tracking ActivitiesViewedNum, one can gain insights into which parts of the course content are most important or appealing to students;
(5) ResourcesViewedNum reflects the frequency with which students view specific resources, such as course materials, reference documents, external links, and more. A high ResourcesViewedNum may suggest that students actively explore relevant materials, while a low number may prompt the improvement of resource accessibility or appeal;
(6) FormalExamsNum refers to the number of formal exams or quizzes taken by students. This metric can be used to assess students’ evaluation and testing performance in the course;
(7) TasksReviewedNum represents the number of times students view and review course tasks. This reflects students’ attention to assignments and tasks and their active participation in course assignments;
(8) ExamsReviewedNum indicates the frequency with which students review course quizzes. This can be used to evaluate the importance students place on quizzes and self-assessment and their efforts in exam preparation;
(9) ForumReviewedNum represents the number of times students view the course forum. This can be used to understand whether students actively participate in discussions, interact with peers, and the level of their engagement with the course community;
(10) PostNum signifies the number of posts created by students on the course forum. This reflects students’ active participation in course discussions and their willingness to share opinions, questions, and experiences;
(11) ReplyNum indicates the number of times students reply to other posts on the course forum. This can be used to assess students’ collaboration and interaction skills and their responses to other students’ viewpoints;
(12) StudentsRepliedNum represents the count of students who have replied to posts created by other students. This can be used to measure peer interactions and the extent of sharing among students, as well as the activity level of the course community;
(13) ResourcesUploadedNum denotes the number of resources uploaded by students to the course content. This includes notes, materials, projects, and more shared by students. A high ResourcesUploadedNum may indicate active student contributions to the course, enriching the course content.
This comprehensive suite of features enables a nuanced analysis of online learning behaviors. In order to visually observe the characteristics of these data, we statistically analyzed the data features, which are described in detail in the following Table 1. The statistical analysis of this dataset reveals the complexity of student behavior patterns and course interactions within a virtual learning environment. By meticulously quantifying user behavior, these statistics offer insights into educational dynamics on various levels. The observed mean and standard deviation indicate a common disparity in learning behaviors. For instance, the extremely high standard deviation in the number of behaviors suggests a significant variation in student engagement levels within online courses. This might signify a substantial difference in the appeal of course content and design among different student groups, or it may reflect the diversity in students’ learning needs and background knowledge. Percentiles further depict the tail behavior of the data distribution, particularly in the upper percentile ranges. For most features, the percentiles above 90% approach the maximum values, which may indicate the presence of a long-tail effect. Such an effect is common in educational data, typically characterized by the majority of students participating only to a limited extent, while a minority are highly active. The minimum and maximum values provide clues about potential outliers or exceptional behaviors. For most features, the minimum value is zero, indicating that some students have no recorded activity in certain areas. Meanwhile, the significant variability in the maximum values points to extreme cases of students whose behavior greatly exceeds the average level. By synthesizing these statistical metrics, we can gain a comprehensive understanding of the different learning patterns and behavioral trends present in the online learning environment.
Description of online course behavior data characteristics
Feature correlation analysis of online course behavioral data.
In Fig. 2, we present the results of a feature correlation analysis performed on the behavioral data from the online courses. The heat map visualizes the strength of the linear relationship between pairs of features, with the color intensity indicating the degree of correlation, ranging from 0 (no correlation) to 1 (perfect correlation). The correlation matrix reveals several noteworthy relationships. The number of behaviors (BehaviorNum) and the number of views (ViewNum) exhibit a strong correlation, suggesting that as students engage more frequently with the course material, the number of times they view content also increases. This relationship underscores a key aspect of student engagement, where active participation is closely tied to content consumption. Similarly, the correlation between review quizzes (ExamsReviewedNum) and review assignments (TasksReviewedNum) is pronounced, hinting at a pattern where students who are diligent in reviewing quizzes are also likely to be consistent in reviewing assignments. This correlation could reflect a systematic approach to learning, where students who invest time in reviewing are doing so across multiple types of assessments. Other interesting correlations can be inferred from the heat map. For instance, the number of formal exams taken (FormalExamsNum) and the number of tasks reviewed (TasksReviewedNum) also appear to be moderately correlated. This might suggest that courses with more frequent formal assessments tend to have more tasks that require review, or vice versa. The lower correlation coefficients between forum activities (such as PostNum and ReplyNum) and other features might imply that these forms of engagement are less tied to the structured learning activities like viewing course content or completing assessments. This could be an area for further investigation to understand how forum activities contribute to overall student engagement and learning outcomes. 
Additionally, the peer interaction metric (StudentsRepliedNum) shows varying degrees of correlation with other features, which could be indicative of the role that peer feedback and interaction play in the learning process. Notably, some features exhibit little to no correlation with others, which may indicate that they represent distinct aspects of student behavior that do not necessarily coincide with other measured activities. Overall, the heat map in Fig. 2 provides a comprehensive view of how different features of student behavior interrelate. Understanding these correlations is crucial for subsequent data analysis, as it allows for the identification of key features that might influence student success and could be considered in predictive modeling or targeted interventions to enhance student engagement and learning outcomes.
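The pairwise linear correlations behind such a heat map can be sketched in a few lines. The snippet below is only an illustration: it assumes the Pearson coefficient (the text speaks of linear relationships but does not name the coefficient), and the three columns are hypothetical toy values, not data from the study.

```python
import math
from typing import List

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)

def correlation_matrix(columns: List[List[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise Pearson correlations between feature columns."""
    return [[pearson(a, b) for b in columns] for a in columns]

# Hypothetical toy columns standing in for BehaviorNum, ViewNum and PostNum.
behavior_num = [10.0, 20.0, 30.0, 40.0]
view_num = [5.0, 9.0, 16.0, 20.0]
post_num = [3.0, 0.0, 4.0, 0.0]

corr = correlation_matrix([behavior_num, view_num, post_num])
```

A heat map restricted to the range 0 to 1, as described above, would presumably plot the absolute value of each entry; that step is left out here because the text does not state it.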
After a thorough analysis of the dataset, we proceeded to preprocess the data for our subsequent analytical procedures. One critical step in this preprocessing was the normalization of data, where we scaled all numerical features to a uniform range of 0 to 1. This transformation is crucial for comparative analysis and to prepare the data for algorithms that are sensitive to the scale of input variables. By mapping each feature to a 0-1 scale, we ensured that no single feature would dominate the results due to its original scale, allowing for a more balanced and nuanced interpretation of the data.
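As a minimal sketch of this 0-1 scaling, the function below applies min-max normalization column by column. Min-max scaling is an assumption consistent with the description; the helper name, the handling of constant columns, and the toy values are illustrative choices rather than details taken from the study.

```python
from typing import List

def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every column of `rows` to [0, 1]; constant columns map to 0.0."""
    n_cols = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_cols)]
    maxs = [max(r[j] for r in rows) for j in range(n_cols)]
    scaled = []
    for r in rows:
        scaled.append([
            (r[j] - mins[j]) / (maxs[j] - mins[j]) if maxs[j] > mins[j] else 0.0
            for j in range(n_cols)
        ])
    return scaled

# Hypothetical toy data: 3 courses x 2 features (e.g. BehaviorNum, PostNum).
courses = [[100.0, 2.0], [300.0, 0.0], [200.0, 4.0]]
normalized = min_max_normalize(courses)
```

After scaling, a feature with counts in the hundreds and one with single-digit counts contribute on the same footing to any distance or similarity computed later.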
Building graph structures allows us to connect the different online course data together and understand the complex relationships between them. This can help us identify patterns and connections that may not be immediately obvious, and perform advanced analysis to better understand the overall learning behavior and patterns within the online course.
Graph structures make the relationships among data explicit, and the information they encode supports more effective analysis. For example, to study the correlations between different courses, we can construct a graph based on students’ behaviors in the courses, where the nodes represent courses and the edges represent correlations between them. The graph structure then shows which courses are strongly correlated, and the sources of these correlations can be investigated further.
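Each vertex $v_i \in V$ of the graph $G = (V, E)$ represents one online course and carries a feature vector
$$\mathbf{x}_i \in \mathbb{R}^{d},$$
where $d$ is the number of behavioral features retained for each course.
Edges between vertices are defined based on a similarity metric, which is used to establish significant relationships between courses. The adjacency matrix $A$ is given by
$$A_{ij} = \begin{cases} 1, & \mathrm{sim}(\mathbf{x}_i, \mathbf{x}_j) > \tau, \\ 0, & \text{otherwise,} \end{cases}$$
where the threshold $\tau$ determines how similar two courses must be in order to be connected.
The similarity between feature vectors is computed using the cosine similarity:
$$\mathrm{sim}(\mathbf{x}_i, \mathbf{x}_j) = \frac{\mathbf{x}_i \cdot \mathbf{x}_j}{\lVert \mathbf{x}_i \rVert \, \lVert \mathbf{x}_j \rVert},$$
where the dot product $\mathbf{x}_i \cdot \mathbf{x}_j$ is divided by the product of the Euclidean norms of the two vectors.
To gain a deeper understanding of the structure and connectivity of the graph, we introduce the normalized Laplacian matrix $L$:
$$L = I - D^{-1/2} A D^{-1/2},$$
where $I$ is the identity matrix
and $D$ is the diagonal degree matrix with $D_{ii} = \sum_{j} A_{ij}$.
The eigenvalues and eigenvectors of $L$ characterize the connectivity of the graph.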
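The construction described in this passage (cosine similarity, a threshold on that similarity, and the normalized Laplacian) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: self-loops are excluded, isolated vertices receive a zero row in the Laplacian, and the threshold value 0.8 and the feature vectors are hypothetical.

```python
import math
from typing import List

def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def build_adjacency(features: List[List[float]], threshold: float) -> List[List[int]]:
    """Connect two courses when their cosine similarity exceeds the threshold.
    Assumption: no self-loops are added."""
    n = len(features)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(features[i], features[j]) > threshold:
                adj[i][j] = adj[j][i] = 1
    return adj

def normalized_laplacian(adj: List[List[int]]) -> List[List[float]]:
    """L = I - D^{-1/2} A D^{-1/2}; rows of isolated vertices are left at zero."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                lap[i][j] = 1.0 if deg[i] > 0 else 0.0
            elif adj[i][j] and deg[i] > 0 and deg[j] > 0:
                lap[i][j] = -adj[i][j] / math.sqrt(deg[i] * deg[j])
    return lap

# Hypothetical normalized feature vectors for three courses.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adj = build_adjacency(features, threshold=0.8)
lap = normalized_laplacian(adj)
```

In this toy graph only the first two courses are similar enough to be linked, so the third course stays isolated. A K-nearest-neighbor rule, as mentioned in the abstract, would instead keep the k most similar courses for every node regardless of a fixed threshold.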
This graph structure forms the backbone of our subsequent analysis using the graph auto-encoder model, bridging theoretical constructs with practical applications and paving the way for enhanced understanding and prediction of learning outcomes.
A graph auto-encoder can effectively capture the complex relationships and patterns within the data. It also incorporates both structural and attribute information, which provides a more comprehensive understanding of online learning behavior and improves the performance of the analysis.
In our model, we use a graph auto-encoder [44] to learn the embedded features, aiming to find suitable embedding vectors for the nodes in the graph and to reconstruct the graph from these vectors. Graph auto-encoders come in two types: GAE and VGAE. In the variational case, the encoder fits the mean and variance of the latent embedding, and the decoder reconstructs the graph from samples of that embedding.
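Let the features of the nodes be a matrix $X \in \mathbb{R}^{N \times d}$ and let $A$ be the adjacency matrix of the graph. Following [44], the encoder is a two-layer graph convolutional network
$$Z = \mathrm{GCN}(X, A) = \tilde{A}\, \mathrm{ReLU}(\tilde{A} X W_0)\, W_1,$$
in which $\tilde{A} = D^{-1/2} A D^{-1/2}$ is the symmetrically normalized adjacency matrix and $W_0$, $W_1$ are learnable weight matrices. The decoder reconstructs the graph from the inner product of the embeddings, $\hat{A} = \sigma(Z Z^{\top})$, where $\sigma$ is the logistic sigmoid.
At this point, the GAE loss function is defined as follows:
$$\mathcal{L}_{\mathrm{GAE}} = -\sum_{i,j} \left[ A_{ij} \log \hat{A}_{ij} + (1 - A_{ij}) \log (1 - \hat{A}_{ij}) \right].$$
To make the model robust to noise, we add a regularization term to the auto-encoder. Therefore, in the loss function section, we optimize the variational lower bound to obtain the loss function of VGAE:
$$\mathcal{L}_{\mathrm{VGAE}} = \mathbb{E}_{q(Z \mid X, A)}\left[ \log p(A \mid Z) \right] - \mathrm{KL}\left[ q(Z \mid X, A) \,\Vert\, p(Z) \right],$$
where $q(Z \mid X, A) = \prod_{i} \mathcal{N}(\mathbf{z}_i \mid \boldsymbol{\mu}_i, \mathrm{diag}(\boldsymbol{\sigma}_i^2))$ is the approximate posterior produced by the encoder, $p(Z) = \prod_{i} \mathcal{N}(\mathbf{z}_i \mid \mathbf{0}, I)$ is the Gaussian prior, and $\mathrm{KL}[\cdot \,\Vert\, \cdot]$ is the Kullback-Leibler divergence. Training maximizes this bound, which is equivalent to minimizing the reconstruction error plus the KL term.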
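To make the two loss terms concrete, the sketch below evaluates a VGAE-style objective in its minimization form (reconstruction cross-entropy plus KL divergence) for given encoder outputs. It follows the general formulation of [44] but is not the authors' code: the encoder is omitted, the means and log-variances are hypothetical toy values, and the equal weighting of the two terms and the averaging over node pairs are assumptions.

```python
import math
import random
from typing import List

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sample_embeddings(mu, log_var, rng: random.Random) -> List[List[float]]:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [[m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
             for m, lv in zip(m_row, lv_row)]
            for m_row, lv_row in zip(mu, log_var)]

def reconstruction_loss(z, adj, eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between adj and sigmoid(z z^T)."""
    n = len(z)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = sigmoid(sum(a * b for a, b in zip(z[i], z[j])))
            total -= adj[i][j] * math.log(p + eps) + (1 - adj[i][j]) * math.log(1 - p + eps)
    return total / (n * n)

def kl_divergence(mu, log_var) -> float:
    """Mean over nodes of KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    total = 0.0
    for m_row, lv_row in zip(mu, log_var):
        total += -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(m_row, lv_row))
    return total / len(mu)

def vgae_loss(mu, log_var, adj, rng: random.Random) -> float:
    """Negative variational lower bound (assumed unweighted sum of both terms)."""
    z = sample_embeddings(mu, log_var, rng)
    return reconstruction_loss(z, adj) + kl_divergence(mu, log_var)

# Hypothetical encoder outputs for a two-node graph with a single edge.
adj = [[0, 1], [1, 0]]
mu = [[0.5, -0.2], [0.4, -0.1]]
log_var = [[-1.0, -1.0], [-1.0, -1.0]]
loss = vgae_loss(mu, log_var, adj, random.Random(0))
```

Dropping the KL term and using the means directly as embeddings gives the plain GAE objective; the KL term is what pulls the embeddings toward the standard normal prior.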
This study aims to conduct an in-depth analysis and evaluation of online courses through Graph Auto-Encoders. The experimental process is divided into two core parts: clustering analysis and classification analysis, each employing a distinct algorithm tailored to specific tasks.
The clustering analysis (see Algorithm 3.4) primarily aims to explore and identify latent patterns of student behavior in online courses. By clustering our own course data, we can group courses based on the similarity of student interaction and participation, thereby aiding educators and course designers in understanding the learning dynamics of different groups. This process involves initializing model configurations, loading feature data, and preprocessing it into tensors. Subsequently, we generate edge indices and labels, determine the number of input and output channels, and initialize the appropriate model based on command line arguments. After a series of training and testing processes, the model performs clustering on the embeddings and computes clustering quality metrics, such as the Silhouette Coefficient and Calinski-Harabasz Index, to evaluate the effectiveness of the clustering.
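As an illustration of the clustering quality metrics mentioned here, the sketch below computes the Calinski-Harabasz index from its standard definition: between-cluster dispersion divided by within-cluster dispersion, each scaled by its degrees of freedom. The embeddings and labels are hypothetical toy values; in practice a library routine would normally be used.

```python
from typing import Dict, List

def calinski_harabasz(points: List[List[float]], labels: List[int]) -> float:
    """Calinski-Harabasz index: [B / (k - 1)] / [W / (n - k)].
    Assumes at least two clusters and non-zero within-cluster dispersion."""
    n, dim = len(points), len(points[0])
    clusters: Dict[int, List[List[float]]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = 0.0
    within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical 2-D embeddings of four courses forming two well-separated groups.
embeddings = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good_score = calinski_harabasz(embeddings, [0, 0, 1, 1])  # groups match the geometry
bad_score = calinski_harabasz(embeddings, [0, 1, 0, 1])   # groups cut across it
```

A higher index means tighter, better separated clusters, which is why the index can be used to compare embeddings against raw features.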
The classification analysis (see Algorithm 3.4) focuses on predicting the level of student engagement with courses, utilizing public MOOC datasets. This process includes using k-fold cross-validation to enhance the accuracy and generalizability of model performance evaluation. In each fold, data is split into training and test sets, and the model learns to differentiate levels of course engagement through the training set. During the testing process, the model’s performance is assessed using metrics such as accuracy, precision, recall, and F1 score. After each training round, the algorithm outputs performance metrics for that period, as well as comprehensive performance metrics for each fold.
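The evaluation loop described above can be sketched with two small helpers: one that produces k-fold train/test index splits and one that computes accuracy, precision, recall, and F1 for binary labels. Both are illustrative assumptions (contiguous folds without shuffling, binary engagement labels); the model training step itself is omitted.

```python
from typing import Dict, Iterator, List, Tuple

def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k contiguous folds without shuffling."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels and predictions for six courses.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
folds = list(k_fold_indices(n_samples=6, k=3))
```

In a full run, the model would be retrained on each `train` index list and scored on the matching `test` list, and the per-fold metrics would then be averaged.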
By combining these two methods of analysis, we can not only identify prevalent behavior patterns within student groups but also predict future trends and changes in courses, providing guidance for personalized educational pathways, optimizing course design, and ultimately enhancing the quality of the entire online education ecosystem.
Online course analysis and evaluation
In the realm of education, the analysis and evaluation of online courses hold paramount importance. Research endeavors in this field provide invaluable insights into the learning behaviors and patterns exhibited by students. This, in turn, facilitates the identification of avenues for enhancing course design and delivery, thereby bolstering the efficacy of online education. Given the surging popularity of online learning platforms, a profound comprehension of student learning dynamics and their interaction with online course materials becomes imperative. Such understanding paves the way for the holistic advancement of online education quality.
Online course clustering analysis and evaluation
Online course clustering analysis and evaluation is a key approach for grouping similar learning behaviors within online courses. It gives educators and administrators a deeper understanding of student learning dynamics and enables them to offer tailored support to individual students. The objective is to partition the large pool of course data into distinct clusters, that is, to divide the dataset into subsets according to a predefined distance measure. Given the voluminous and intricate nature of online course behavior data, a statistical evaluation of these datasets is essential. This evaluation is achieved with clustering techniques, including density-based ones, which in turn facilitate subsequent data analysis and the forecasting of course trends. To this end, the latent features
The ramifications of the obtained clusters are manifold. Clustering unveils prevalent behavioral patterns among students, encompassing phenomena like frequent course withdrawals and diminished engagement. The discerned clusters can be leveraged to recommend personalized course trajectories for students, predicated on their distinctive behavior patterns. Scrutinizing the behavioral tendencies of students within each cluster empowers course designers to pinpoint areas warranting refinement, thereby amplifying the student experience. Furthermore, these clusters can be harnessed as input features for predictive models, facilitating the anticipation of forthcoming course trends. The outcomes of the clustering analysis serve as a yardstick for evaluating the efficacy of educational interventions and offer a data-driven compass for informed decision-making.
In conclusion, the realm of online course analysis and evaluation stands as a linchpin for the evolution of online education. The amalgamation of advanced clustering techniques with traditional methodologies not only unravels intricate behavioral patterns but also forges a pathway towards personalized and optimized educational experiences. As online education continues its upward trajectory, the insights gleaned from these analyses are poised to shape and enrich the landscape of modern education.
Online course classification analysis and evaluation
Online course classification plays a pivotal role in enhancing the accessibility and effectiveness of digital learning platforms. By systematically categorizing online courses into distinct segments, both learners and administrators can derive substantial benefits. In this work, classification is framed as a semi-supervised learning task that predicts course popularity. The online course dataset is partitioned into discrete categories based on popularity labels and then divided into training and test sets. The model is trained by minimizing a cross-entropy loss on the training set, and the trained model is then used to classify the courses in the test set.
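The training objective described above can be illustrated with a minimal sketch. It assumes that the model outputs one score per popularity class for every course and that a boolean mask marks which courses belong to the training set; the function names, the mask, and the example scores are hypothetical and are not taken from the paper's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def masked_cross_entropy(logits, labels, train_mask):
    """Mean cross-entropy computed over the labeled (training) courses only."""
    losses = []
    for row, label, is_train in zip(logits, labels, train_mask):
        if is_train:
            losses.append(-math.log(softmax(row)[label]))
    return sum(losses) / len(losses)

# Hypothetical scores for 3 courses and 4 popularity classes.
logits = [[2.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 3.0, 0.0]]
labels = [0, 1, 2]
train_mask = [True, True, False]  # the third course is held out for testing
loss = masked_cross_entropy(logits, labels, train_mask)
```

Because the mask excludes test courses from the loss, their labels never influence training, while their features can still take part in the graph.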
The applications of online course classification extend far beyond mere categorization, encompassing an array of practical implications. Through the stratification of courses into varying proficiency levels (such as beginner, intermediate, and advanced), learners are empowered to make well-informed decisions tailored to their individual skill sets. This systematic arrangement not only aids learners in course selection but also empowers administrators to effectively curate and manage the course repository. Furthermore, the classification model can be harnessed to deliver personalized course recommendations, leveraging learners’ historical course engagement and preferences. This personalized approach not only elevates student engagement but also serves as a catalyst for motivation.
A pivotal facet of this classification paradigm lies in its capacity to illuminate the popularity dynamics of diverse courses. This insight equips administrators with a data-driven vantage point, enabling informed decisions regarding course offering, marketing strategies, and potential enhancements. The classifier’s adeptness in suggesting courses aligned with learners’ proficiency and preferences further amplifies its potential to enrich the overall learning experience. Moreover, the classification framework offers a window into the nuanced dimensions of course popularity, affording course designers the opportunity to holistically tailor their design and structure to cater to learners’ evolving needs.
In summary, this section has examined online course classification under a semi-supervised learning framework. Its implications range from facilitating learner-centric course selection and personalized recommendations to giving administrators insights for strategic decision-making and enabling course designers to craft content that resonates with the target audience. Together, these uses contribute to an enriched online learning ecosystem supported by course classification methods.
Experiment and analysis of course clustering
This section performs course clustering and analysis on a real online education dataset. The results verify that the proposed method outperforms several other clustering methods on the chosen performance metrics. The training procedure is implemented with PyTorch and a GPU (GeForce RTX 3090).
Comparison method
In this study, we employ six fundamental clustering techniques as the baselines for our experiments: K-means, Spectral Clustering, OPTICS, DBSCAN, Mean Shift, and BIRCH. K-means and Spectral Clustering are chosen as the focus of this paper to assess clustering performance before and after feature optimization. The core principles and characteristics of each clustering method are described below:
K-means: K-means is a centroid-based clustering method. Its central concept involves assigning sample points to the class whose centroid is closest. This algorithm is straightforward to implement and suitable for handling discrete and large-scale data. It particularly shines when dealing with numerical data, where distances between samples are computed to determine centroids and class assignments. K-means is a classical approach in the clustering domain.
Spectral Clustering: Spectral Clustering operates on graph theory principles. It treats sample points as the nodes of a graph and clusters them using the eigenvectors associated with the k smallest eigenvalues of the graph Laplacian. Spectral Clustering is adept at capturing non-linear relationships in data, making it well-suited for uncovering underlying structures in complex datasets. This method is particularly valuable when dealing with datasets where the relationships are not easily discernible through linear methods.
OPTICS: OPTICS is a density-based technique that clusters data points according to their density. In contrast to DBSCAN, OPTICS can identify clusters of varying densities, making it more adaptable to datasets with significant density variations. By leveraging density information, OPTICS provides insights into the varying structures within the data.
DBSCAN: DBSCAN is another density-based approach. It classifies data points into different clusters by setting a threshold value. DBSCAN is effective at identifying arbitrarily shaped clusters and is capable of automatically filtering out noisy points. It performs well on datasets with irregular data distributions and is robust to noise.
Mean Shift: Mean Shift is a density-based technique that clusters sample points by iteratively updating the mean value of points. It can adaptively find high-density regions in the data distribution, making it suitable for clusters of various shapes. Mean Shift is particularly effective in identifying modes within the data distribution.
BIRCH: BIRCH is a hierarchy-based clustering method that classifies data points into clusters by constructing CF trees. BIRCH is suitable for large-scale datasets and efficiently performs clustering tasks. It demonstrates strong performance, especially in scenarios with limited memory resources.
For our method, four models are configured according to whether the encoder is variational and how many graph convolution layers it uses: GAE (1 layer GCN), GAE (2 layers GCN), VGAE (1 layer GCN), and VGAE (2 layers GCN).
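To make the encoder variants more concrete, the following sketch builds a KNN graph from course features and applies one graph convolution. It assumes the widely used symmetric normalization D^{-1/2}(A + I)D^{-1/2} for the GCN layer; the section does not state which normalization the authors used, and all function names, the toy features, and the weight matrix are hypothetical.

```python
import math

def knn_adjacency(features, k):
    """Symmetric 0/1 adjacency linking each course to its k nearest neighbours."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i][j] = adj[j][i] = 1.0  # symmetrise: keep the edge in both directions
    return adj

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} (assumed GCN propagation matrix)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(a_norm, x, weight, relu=True):
    """One graph convolution: activation(A_norm @ X @ W)."""
    n, d_in, d_out = len(x), len(weight), len(weight[0])
    xw = [[sum(x[i][p] * weight[p][q] for p in range(d_in)) for q in range(d_out)]
          for i in range(n)]
    out = [[sum(a_norm[i][j] * xw[j][q] for j in range(n)) for q in range(d_out)]
           for i in range(n)]
    if relu:
        out = [[max(0.0, v) for v in row] for row in out]
    return out

# Three hypothetical courses described by a single feature each.
features = [[0.0], [1.0], [10.0]]
adj = knn_adjacency(features, k=1)
a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, [[1.0]])  # identity-like weight for illustration
```

Under these assumptions, a "2 layers GCN" encoder would apply `gcn_layer` twice with separate weight matrices, and a variational encoder would use two such output heads to produce a mean and a log-variance for each course.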
Experimental settings
The main training parameters were set as follows: epoch
The K-means algorithm was executed 10 times with different centroid seeds to reduce the impact of initialization randomness on the final clustering outcomes, and the maximum number of iterations was set to 300 to ensure that the algorithm had sufficient iterations to converge to a stable solution. The number of neighbors used to construct the affinity matrix in spectral clustering was chosen to be 10, which is sufficient to capture the local features of the data structure while avoiding overfitting. OPTICS selected the Minkowski metric with DBSCAN selected Mean Shift was set with a maximum number of runs of 300, The number of neighbors
In all cases, these hyperparameter choices were based on a series of experiments in which we evaluated the performance of the algorithms and maximized accuracy while ensuring the stability of the results. Moreover, the selection of these parameters also took into account the limitations of computational resources, to ensure the optimal outcomes under our experimental conditions.
The clustering evaluation metrics used in this experiment are:
Silhouette Coefficient [47]: it combines both cohesiveness and separation and can be used to evaluate the effect of different algorithms, or different ways of running the algorithm, on the clustering results based on the same original data. The contour coefficients of the clustering results take values up to [
Calinski_Harabasz [48]: the score is defined as the ratio of inter-cluster dispersion to intra-cluster dispersion, computed from the inter-class and intra-class variances. The larger the score, the better the clustering.
Davies_Bouldin [49]: for each pair of clusters, this metric divides the sum of their average intra-class distances by the distance between the two cluster centers. For each cluster, the maximum of this ratio over all other clusters is taken, and these maxima are averaged. A smaller DB value indicates that the clustering results are tight within the same cluster and far apart between different clusters.
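The Calinski_Harabasz and Davies_Bouldin definitions above can be checked on a toy example. The sketch below follows the standard textbook formulas with Euclidean distance; the helper names and the four sample points are hypothetical, and the experiments in this paper may rely on a library implementation instead.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def calinski_harabasz(clusters):
    """Between-cluster over within-cluster dispersion; higher is better."""
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    overall = centroid(all_points)
    between = sum(len(c) * math.dist(centroid(c), overall) ** 2 for c in clusters)
    within = sum(math.dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centre distance; lower is better."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

# Two tight, well-separated toy clusters.
clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
ch = calinski_harabasz(clusters)  # 50.0 for this toy data
db = davies_bouldin(clusters)     # 0.2 for this toy data
```

Moving the two clusters further apart raises the Calinski_Harabasz score and lowers the Davies_Bouldin score, which matches the interpretation given for both metrics.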
Results and analysis
The traditional clustering methods fall into two types: those for which the number of clusters can be specified (K-means, spectral clustering) and those for which it cannot (OPTICS, DBSCAN, Mean Shift, and BIRCH). For completeness and rigor, we run the two groups of methods separately.
The result of the clustering method which cannot specify the number of clusters
We input the original data into the clustering methods for which the number of clusters cannot be specified, and the evaluation of the clustering results is shown in Table 2. From the table, we can observe that the numbers of clusters obtained by OPTICS, DBSCAN, Mean Shift, and BIRCH are 2, 2, 4, and 3, respectively. The OPTICS algorithm identifies a significantly higher number of noise points, totaling 8,713, which indicates a considerable presence of extreme values or outliers within the dataset. This characteristic reveals the presence of atypical behaviors such as extreme evaluations or rare user feedback patterns, providing a basis for identifying and addressing such data. The DBSCAN and Mean Shift algorithms show excellent performance in terms of silhouette coefficients, scoring 0.80 and 0.98, respectively. The silhouette coefficient measures how tightly data points are grouped within their own cluster relative to other clusters. The high silhouette coefficients for DBSCAN and Mean Shift indicate that these methods are particularly effective in ensuring internal consistency and clear separation between clusters, which helps differentiate groups of students with similar evaluation behaviors. In terms of the Calinski_Harabasz index, which measures the ratio of between-cluster dispersion to within-cluster dispersion, the BIRCH algorithm leads with a score of 1835.18. The high score of the BIRCH algorithm on this index signifies that its clusters are highly distinct in space, indicating that the method can clearly differentiate students’ feedback on course content and teaching methods. With the lowest Davies_Bouldin index score of 0.01, the Mean Shift algorithm demonstrates that its clusters are not only highly cohesive internally but also markedly separate from other clusters. This characteristic is particularly suited to identifying groups of students with distinct opinions on teaching content.
The qualitative outcomes obtained through these clustering methods provide a clear understanding of the data structure within online course evaluations, offering valuable insights for subsequent improvements in course content and teaching method optimization.
The result of the clustering method which can specify the number of clusters
In the subsequent analysis, the method proposed in this paper is applied to the clustering methods where the number of clusters can be specified, and the results are shown in Table 3. Intentionally setting different numbers of clusters (2, 4, 6, and 8), we examine the impact of the number of clusters on the clustering outcomes. A cross-sectional comparison reveals a key finding: regardless of the number of clusters, the clustering results of the features optimized by the auto-encoder are always significantly better than those without feature optimization. Specifically, for the Calinski_Harabasz metric, the best results of the two clustering methods, K-means and spectral clustering, are more than ten times better than those without feature optimization.
Moreover, in the longitudinal comparison, the best Silhouette Coefficient is 0.99 when the number of clusters is set to 2, indicating extremely high internal consistency and clear separation between clusters. The best Calinski_Harabasz score is 37268.41 when the number of clusters is 8, further proving the effectiveness of optimized features in spatial separation. And the best Davies_Bouldin score is 0.16 when the number of clusters is 2, confirming the tight aggregation within clusters and effective separation between them.
Combining these experimental results, we are able to more accurately segment students into different groups based on their learning characteristics, providing a basis for more personalized and effective teaching. This also allows the strengths and weaknesses of course content and teaching methods to be identified clearly and improved accordingly. By conducting an in-depth analysis of the preferences and needs of different student groups, we can refine teaching strategies, enhance the attractiveness of course content, and undertake more effective marketing and sales efforts.
In future research, these findings can guide educators and course designers on how to adjust teaching methods based on student feedback, how to provide customized instruction for different learning styles and needs, and how to optimize online courses to maximize learning outcomes and student satisfaction.
Fig 3. Visualization of K-means clustering.
Fig 4. Visualization of spectral clustering.
By applying the LDA method for dimensionality reduction to the four models proposed in this paper, we can visualize the learned features more intuitively, as shown in Figs 3 and 4. After LDA reduction, different models display their own clustering characteristics on the visualization charts, and each model exhibits some outliers. This supports the reliability of our clustering results and the effectiveness of the clustering approach proposed in this paper. Additionally, these results indicate that the models presented in this paper are capable of identifying anomalous data, which provides a valuable reference for subsequent data quality management, data mining, and data analysis. By detecting abnormal data through clustering, we can gain a deeper understanding of students’ learning situations and of the effectiveness and quality of teaching, thereby enhancing the overall quality of education.
The visualization results in Figs 3 and 4 reinforce this point, illustrating how the LDA method helps us identify the main trends and patterns in the data. These trends and patterns are crucial for revealing the learning characteristics and feedback of different student groups. Identifying these patterns enables educators to adjust teaching strategies according to the specific needs of student groups, optimize course design, and achieve more personalized and effective teaching. Moreover, these findings provide a basis for predicting student behavior, identifying potential learning obstacles in advance, and formulating targeted intervention measures. In summary, LDA dimensionality reduction and visualization offer a powerful tool to represent complex datasets in a visual form, providing empirical evidence for teaching and course improvement.
To further verify the effectiveness of the method in this paper, this section performs online course classification on the publicly available MOOC online course dataset, which contains features such as course difficulty, lecture length, course units, etc. In this paper, we input these features into the model to classify the courses and predict the course popularity. We sort the course popularity from high to low and divide it into four classes equally as labels for semi-supervised classification. The data set is divided into training and test sets in a ratio of 8:2. To ensure a comprehensive evaluation of our model, we incorporated K-fold cross-validation in our experimental design. We chose a 5-fold cross-validation approach, dividing the training dataset into five equal parts. In each fold, four parts were used for training and the remaining part was used for testing. This process was repeated five times, each time with a different part of the data serving as the test set. The experiments validate the superior performance of this method compared to the comparison methods.
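The labeling and splitting procedure described above can be sketched as follows. The sketch assumes that class 0 denotes the most popular quarter of courses (the paper does not state the direction of the numbering) and treats the held-out part of each fold as a validation set; the function names, random seed, and popularity scores are hypothetical.

```python
import random

def popularity_to_classes(popularity, n_classes=4):
    """Rank courses from most to least popular and cut the ranking into equal-sized classes."""
    order = sorted(range(len(popularity)), key=lambda i: popularity[i], reverse=True)
    labels = [0] * len(popularity)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_classes // len(popularity)  # class 0 = most popular (assumption)
    return labels

def train_test_split_indices(n, test_ratio=0.2, seed=0):
    """Shuffle indices and split them 8:2 into training and test indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_ratio)
    return idx[n_test:], idx[:n_test]

def k_fold(indices, k=5):
    """Yield (train, held_out) index lists; each fold is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        rest = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield rest, held_out

popularity = [5, 80, 42, 17, 99, 63, 8, 30]  # hypothetical popularity scores
labels = popularity_to_classes(popularity)
train_idx, test_idx = train_test_split_indices(len(popularity))
splits = list(k_fold(train_idx, k=5))
```

Cutting by rank rather than by fixed popularity thresholds keeps the four classes balanced, which is what dividing the sorted list "equally" implies.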
Dataset
In the data processing phase of this study, the original MOOC dataset underwent specific treatments to meet our analysis objectives. The dataset contains 1,609 records of MOOC courses, encompassing various details such as course title, link, offering institution, difficulty level, learning hours, popularity, teaching language, start date, subtitle, subject area, and the platform on which the course is offered. Of particular interest was the “popularity” column, which was treated as the label, while the other columns in the dataset were used as features for analysis.
Originally, popularity was a qualitative measure indicating the degree of interest or attention a course received. To effectively utilize this information in the model, we transformed popularity into a numerical label. Specifically, courses were categorized into four classes based on their popularity level: 0, 1, 2, and 3. This classification aimed to segment courses according to their popularity, thereby laying the foundation for subsequent pattern recognition and predictive analysis.
Meanwhile, other columns such as course title, offering institution, and difficulty level, underwent appropriate preprocessing to serve as input features. For textual data, techniques for text encoding and feature extraction were employed; for categorical data, such as difficulty levels and subject areas, methods like one-hot encoding were utilized. These processed features were then used to construct models to predict the course popularity category.
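As an illustration of the categorical preprocessing step, the sketch below one-hot encodes a difficulty column. The category values and the function name are hypothetical; the exact encoding pipeline used for the MOOC dataset is not specified in this section.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with one position per distinct category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = [[1 if index[v] == i else 0 for i in range(len(categories))] for v in values]
    return categories, vectors

difficulty = ["Beginner", "Advanced", "Intermediate", "Beginner"]  # hypothetical values
categories, vectors = one_hot_encode(difficulty)
# categories -> ['Advanced', 'Beginner', 'Intermediate']
# vectors[0] -> [0, 1, 0]
```

One-hot vectors avoid imposing an artificial numeric order on unordered categories such as subject areas, at the cost of one feature dimension per category.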
Through this approach, we were able to effectively leverage the multifaceted information in the MOOC dataset. This not only facilitated an exploration of the relationship between course characteristics and their popularity but also laid the groundwork for further data mining and pattern recognition efforts.
Comparison method
We have selected four classical machine learning classification methods as comparison methods, which are described as follows:
The principle of K-Nearest Neighbors (KNN) is to find a predefined number of training samples closest to a new point and to predict the label from them. The number of samples can be a user-defined constant, or vary according to the local density of points.
The decision tree algorithm uses a tree-like structure and uses layers of inference to classify the data. Each internal node of the decision tree represents a test on an attribute, each branch represents a test output, and each leaf node represents a category.
Random forest is an ensemble learning algorithm. A number of different decision trees are constructed with randomization, and the final prediction is obtained by aggregating their outputs, with the most probable class being selected.
MLP is a supervised learning algorithm that learns a nonlinear function approximator by training on a dataset. There can be one or more nonlinear layers between the input and output layers.
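Of the comparison methods above, KNN is the simplest to write out in full. The following is a minimal sketch of majority-vote KNN classification with Euclidean distance, using the same default of 5 neighbors as the experimental settings; the function name and the toy data are hypothetical and do not reproduce the library implementation used in the experiments.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest training samples."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D course features with popularity classes 0 and 3.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 3, 3, 3]
low = knn_predict(train_x, train_y, [0.2, 0.2])   # closest to the class-0 group
high = knn_predict(train_x, train_y, [5.5, 5.5])  # closest to the class-3 group
```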
Experimental settings
The hyperparameters of the comparison method involved in this paper are set as follows:
KNN: the number of neighbors is 5. DT: the function used to measure the quality of a split is “Gini”. RF: the number of trees in the forest is 10, and the function used to measure the quality of a split is “Gini”. MLP: the solver for weight optimization is an optimizer in the family of quasi-Newton methods, the strength of the L2 regularization term is 1e-5, and the number of neurons is 10.
In this experiment, Accuracy, Precision and Recall are chosen as the evaluation metrics for semi-supervised classification, which are briefly described as follows:
Accuracy: in multi-class classification, the proportion of samples whose predicted label matches the true label.
Precision: for a given class, the proportion of samples predicted as that class that truly belong to it, i.e. the ability of the classifier not to label negative samples as positive.
Recall: for a given class, the proportion of samples truly belonging to that class that the classifier identifies, i.e. its ability to find all positive samples.
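The three metrics can be computed for a four-class problem as sketched below. The paper does not state how per-class precision and recall are averaged; this sketch assumes support-weighted averaging, under which recall always equals accuracy, an equality that would be consistent with the identical accuracy and recall values reported for the best model. The function names and the example labels are hypothetical.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_precision_recall(y_true, y_pred):
    """Per-class precision and recall, averaged with weights equal to class support (assumption)."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = 0.0
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        cls_precision = tp / predicted if predicted else 0.0
        cls_recall = tp / count
        precision += cls_precision * count / n
        recall += cls_recall * count / n
    return precision, recall

y_true = [0, 0, 1, 1, 2, 2, 3, 3]  # hypothetical popularity classes
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
acc = accuracy(y_true, y_pred)
prec, rec = weighted_precision_recall(y_true, y_pred)
```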
Experiment and analysis
Course evaluation results on the MOOC dataset
Table 4 compares the proposed method with several machine learning classification methods on the MOOC dataset. The proposed method performs best among all the methods compared, and GAE performs better than VGAE. The GAE (1 or 2 layers GCN) model stands out with an accuracy of 0.5062, a precision of 0.5089, and a recall of 0.5062. This highlights the potential of graph-based auto-encoder models in learning and classifying complex network data. In contrast, traditional classification algorithms, such as KNN, Decision Tree (DT), Random Forest (RF), and Multilayer Perceptron (MLP), show relatively weaker performance on these metrics, suggesting that they may not capture the deep features and structural relationships within the data effectively. The advantage of the GAE (1 or 2 layers GCN) model on these performance metrics stems from its ability to effectively integrate and utilize the structural characteristics of MOOC data, such as the interactions among students and the correlations between courses. The model is capable of mining the latent patterns and trends in the data, thus making more precise judgments in classification tasks. The improvement in precision and recall further indicates that the GAE (1 or 2 layers GCN) model is better at identifying the positive classes (i.e., the target categories of interest), which is crucial for building efficient recommendation systems because it bears on the relevance of the system’s recommendations and on user satisfaction.
In summary, the GAE (1 or 2 layers GCN) model not only exhibits excellent performance in handling MOOC-type datasets but also shows significant advantages in understanding and representing complex data structures. This provides important references for developing high-performance course recommendation systems and lays the foundation for data analysis in future applications such as optimizing teaching content and evaluating teacher performance.
In addition to the comparison, the learned feature embeddings were further reduced using Linear Discriminant Analysis (LDA) and visualized. The visualization in Fig. 5 shows a clear distinction between different categories, indicating that the proposed method is able to learn the inherent structure of the data and represent it in a more meaningful way. The results for class 0 and class 3 are better than those for class 1 and class 2. This is because class 0 and class 3 represent the two extreme categories, i.e. the most popular and least popular categories, while class 1 and class 2 are intermediate categories and thus more difficult to distinguish. These observations are consistent with the better classification performance of the proposed method compared to the other methods.
The insights gained from the classification results are essential for customizing course content to accommodate different learning styles. By analyzing the commonalities among students who favor certain courses, educators can identify which specific content elements or teaching techniques are more effective, thereby enhancing the learning experience. Such customization of content not only meets diverse learning preferences but also ensures that teaching strategies are adaptable and responsive to changes in the educational environment. Furthermore, evaluating teachers’ performance through these classification insights enables the identification of areas for improvement and excellence. Correlating classification outcomes with student feedback and course results can lead to a more nuanced understanding of teaching effectiveness. This information can inform professional development programs, assisting teachers in refining their skills and methodologies to best suit their student demographics.
Fig 5. Classification visualization.
This paper proposes an online course evaluation model based on a graph auto-encoder for online course behavior data. By establishing a graph structure that encapsulates the correlations among student behaviors, and utilizing a graph convolutional network to encode and extract sophisticated higher-order feature representations, our model innovatively generates hidden features from a Gaussian distribution. We then ensure fidelity in the reconstructed data by minimizing the divergence between the decoded feature distribution and the original data features. The efficacy of our approach is substantiated through rigorous clustering and semi-supervised classification experiments conducted on real and publicly available datasets. The results distinctly demonstrate our method’s superior performance over benchmark techniques. Moreover, the study delves into the clustering outlier anomalies and the underlying factors influencing classification outcomes with insightful visualizations, providing an in-depth understanding of the data’s intrinsic structure.
There are several research directions worth exploring in the future: 1) Integration of Course Attributes: Consider aspects like course type and duration to enrich the understanding of courses. 2) Incorporation of Learner Backgrounds: Include learners’ educational backgrounds and learning styles for more personalized course evaluation. 3) Expansion of Data Sources: Utilize a more diverse range of data, such as forum discussions and feedback, to enhance the comprehensiveness of evaluations.
