Abstract
Massive open online courses (MOOCs) show great potential to transform traditional education through the Internet. However, the high attrition rates in MOOCs have often been cited as a scale-efficacy tradeoff. Traditional educational approaches are usually unable to identify such a large number of students at risk of dropping out in time to support effective intervention design. While building dropout prediction models using learning analytics is promising for informing intervention design for these at-risk students, the results of current prediction model construction methods do not enable personalized intervention for these students. In this study, we take an initial step to optimize dropout prediction model performance toward intervention personalization for at-risk students in MOOCs. Specifically, based on a temporal prediction mechanism, this study proposes to use a deep learning algorithm to construct the dropout prediction model and to produce the predicted dropout probability of each individual student. By taking advantage of the power of deep learning, this approach not only constructs more accurate dropout prediction models than the baseline algorithms but also offers a way to personalize and prioritize intervention for at-risk students in MOOCs using individual dropout probabilities. The findings from this study and their implications are then discussed.
Introduction
As an extension of online learning technologies, the rapid development of MOOCs has opened a new era of education by extending the boundaries of education to previously noncollege bound students through the Internet (Fei & Yeung, 2015; Xing, Chen, Stein, & Marcinkowski, 2016). Since the term MOOC, coined by Downes and Siemens, first appeared in 2008, the number of MOOCs has grown enormously around the world. The total number of MOOCs reached 4,550 in 2016 (Bouzayane & Saad, 2017a). Due to its online and open nature, a MOOC course is usually massive, with theoretically no limit to enrollment, and allows anyone to participate or drop out at no penalty (Educause, 2013, p. 1). As a result, students enrolled in MOOCs, unlike campus-confined students, are much more likely to drop out of the course, and attrition rates can easily reach 90% (Li et al., 2016; Taylor, Veeramachaneni, & O'Reilly, 2014). While the high incompletion rate is often cited as a scale-efficacy tradeoff (Onah, Sinclair, & Boyatt, 2014), it does pose a major obstacle to the transformative potential of MOOCs.
The substantial number of students enrolling in and dropping out of MOOCs raises methodological difficulties for instructors as they work to identify academically at-risk students and provide timely intervention. It is difficult, if not impossible, to monitor more than 10,000 students at the same time and then readily offer support to about 9,000 students who are about to drop out. The situation becomes worse considering that traditional educational researchers and practitioners have been employing methods such as surveys, interviews, and observations for data collection. Such methods are not only time consuming but also limited in the scale at which they can identify potentially at-risk students in MOOCs (Xing, Guo, Petakovic, & Goggins, 2015). Nor can these traditional methods support instructors in providing timely interventions for at-risk students (Rienties, Cross, & Zdrahal, 2017; Siemens & Long, 2011), since a delay always occurs between data collection, analysis, and the final identification of students who are about to drop out of the course.
The excessive attrition rate in MOOCs has encouraged researchers to think of using learning analytics methods (Siemens, 2013) for the early prediction of learners who are at risk of dropping out. Then, appropriate intervention can be delivered before the student drops out. Specifically, learning analytics enables analysis of low-level trace or log data automatically collected while students are interacting with the MOOC course. Then, through this structured low-level trace data, prediction models can be constructed using supervised machine learning algorithms (Kennedy, Coffrin, De Barba, & Corrin, 2015), which leads to training a mathematical model to automatically make an early distinction between dropout and retention students based on their prior interaction behavior during the course. After making the distinction, in-time intervention can be designed and delivered to these at-risk students. The automatic nature of a prediction model using learning analytics can address the challenge of monitoring and identifying the large-scale at-risk students of potentially dropping out, while also satisfying the requirement for being able to support early intervention design (Halawa, Greene, & Mitchell, 2014).
Current research and practices in building prediction models in MOOCs follow three interrelated directions: fixed-term dropout prediction models, temporal dropout prediction models, and dropout prediction performance optimization. These methods have serious limitations for deployment in MOOCs overall. A fixed-term dropout prediction model can identify all the at-risk students at once. However, given the large number of dropout students in MOOCs, instructors cannot provide effective feedback to that many at-risk students at once. The temporal dropout prediction model predicts which students are at risk of dropping out in the next week. While it tremendously reduces the number of at-risk students the instructor needs to deal with each time, it does not provide any clue for the instructor to offer personalized intervention. The result is that all the students receive the same intervention without any personalization. Much effort has also been made to optimize prediction performance. However, no studies so far have taken advantage of deep learning, which has shown great success in many disciplines (Gulshan et al., 2016). In the next three sections, empirical studies on prediction model building are reviewed in more detail along these three directions.
Fixed-Term Dropout Prediction Model
Many early efforts in building prediction models employed the fixed-term dropout prediction approach. That is, they used the data available in a fixed period of time for prediction model construction. Some studies used all the data in a course to build the prediction model, which could not satisfy the early intervention design and implementation requirement. For example, Al-Shabandar, Hussain, Laws, Keight, and Lunn (2017) used all-time trace data from a course to build the dropout prediction model. However, such a model is unable to identify the at-risk students early enough to support intervention design and delivery before the student drops out.
Other studies used the trace data only from the first week or from certain points in time to build the prediction models (e.g., Jiang, Williams, Schenke, Warschauer, & O'dowd, 2014). This method can successfully detect whether students are at risk of dropping out at a very early stage (as early as the end of the first week) to support early intervention design and implementation. However, a model built this way is unable to accommodate the gradual dropout pattern in MOOCs. In other words, this fixed-term model cannot distinguish students who are at risk of dropping out right after the first week from those who will remain active for 2, 3, or 4 weeks before dropping out. Consequently, thousands of students may be flagged as being in danger of dropping out after the first week. While the model is effective at predicting how many students will eventually drop out of a course, it is impractical for instructors to provide quality help and support to thousands of at-risk students at the same time. In other words, this model is unable to identify those students in need of immediate intervention. Neither can it support personalized intervention for these at-risk students.
Temporal Dropout Prediction Model
Because of the gradual nature of attrition in MOOCs (Yang, Sinha, Adamson, & Rose, 2013), recent studies began to explore building temporal dropout prediction models to accommodate better intervention design in MOOCs (Balakrishnan & Coetzee, 2013; Bouzayane & Saad, 2017a, 2017b; Boyer & Veeramachaneni, 2015; Chaplot, Rhim, & Kim, 2015; Crossley, Paquette, Dascalu, McNamara, & Baker, 2016; Fei & Yeung, 2015; He, Bailey, Rubinstein, & Zhang, 2015; Kloft, Stiehler, Zheng, & Pinkwart, 2014; Li et al., 2016). To illustrate, these studies design a temporal modeling mechanism that uses the trace data up to the current week to predict who is going to drop out next week. More specifically, instead of using fixed-term data to identify all the students at risk at once, these temporal dropout prediction models specifically detect the students at risk in the following week using data collected from previous weeks. By calling attention only to the students at risk in the coming week, this temporal prediction mechanism allows instructors to focus on a much smaller group of at-risk students in immediate danger instead of facing an overwhelming number of students who may drop out at some other point in the course.
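The temporal mechanism described above can be illustrated with a minimal sketch. The activity counts, the student identifiers, and the working definition of dropout (no activity in any later week) are assumptions made for illustration; they are not the feature set or the labeling rule of the studies cited.

```python
from typing import Dict, List, Tuple

# Hypothetical weekly click counts per student (weeks 1..4).
activity: Dict[str, List[int]] = {
    "s1": [12, 9, 0, 0],
    "s2": [5, 7, 6, 3],
    "s3": [8, 0, 0, 0],
}


def last_active_week(clicks: List[int]) -> int:
    """Return the 1-based index of the last week with any activity (0 if none)."""
    last = 0
    for week, count in enumerate(clicks, start=1):
        if count > 0:
            last = week
    return last


def build_weekly_examples(
    activity: Dict[str, List[int]], current_week: int
) -> List[Tuple[str, List[int], int]]:
    """Build one training example per student who is still active.

    Features use only the data up to and including current_week.
    The label is 1 if the student shows no activity after current_week
    (assumed definition of dropping out next week), otherwise 0.
    """
    examples = []
    for student, clicks in activity.items():
        last = last_active_week(clicks)
        if last < current_week:
            continue  # already gone before this week, so not a candidate
        features = clicks[:current_week]
        label = 1 if last == current_week else 0
        examples.append((student, features, label))
    return examples


week_1 = build_weekly_examples(activity, current_week=1)
week_2 = build_weekly_examples(activity, current_week=2)
```

Under these assumptions, one model per week would be trained on examples such as `week_1` and `week_2`, so that each week's predictions cover only the students in immediate danger.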
However, these temporal models were not able to personalize student intervention. Personalization is one of the fundamental functions in learning analytics research and applications (Papamitsiou & Economides, 2014; Siemens & Baker, 2012). It also responds to the general call from the U.S. Department of Education (2012) for personalizing students' learning experience to improve their achievement through learning analytics. Currently, most dropout prediction models can only identify a large group of students at risk of dropping out but are unable to provide concrete advice on how to personalize the intervention. In addition, these temporal dropout prediction models can only zoom into a group of at-risk students one week at a time. Even for a group of students at risk of dropping out in a single week, the number of students identified can still be large in the MOOC context. The MOOC system then becomes overwhelmed again by the unfulfilled need for instructors to provide intervention.
To support personalized intervention design and delivery, one way is to compute the dropout probability of individual students each week. Instructors can then scale the intervention to each student's dropout probability, providing more intensive intervention for higher risk students and lighter intervention for lower risk students. In addition, students can be ranked by dropout probability. Instructors can then provide immediate intervention to the students with the highest probability of attrition first and gradually move to the lower risk students. This ranking can help teachers direct personalized intervention to the students who most need help and immediate assistance.
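A minimal sketch of the ranking idea follows. The probabilities 0.939 and 0.511 are the sample values the paper reports for Students 6 and 1; the other students, their values, and the batch size are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def daily_batches(dropout_prob: Dict[str, float], per_day: int) -> List[List[str]]:
    """Rank students from highest to lowest dropout probability and split
    the ranking into consecutive batches, one batch per day."""
    if per_day < 1:
        raise ValueError("per_day must be at least 1")
    ranked = sorted(dropout_prob, key=lambda s: dropout_prob[s], reverse=True)
    return [ranked[i:i + per_day] for i in range(0, len(ranked), per_day)]


# Students a, b, and c are hypothetical.
probs = {
    "student_1": 0.511,
    "student_6": 0.939,
    "student_a": 0.72,
    "student_b": 0.64,
    "student_c": 0.58,
}
schedule = daily_batches(probs, per_day=2)
```

With this input the highest-risk students land in the first batch, which matches the intent of intervening with them first and moving to lower-risk students later.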
Dropout Prediction Model Performance Optimization
Quite a few studies have examined different features and algorithms in order to improve prediction model performance in MOOCs (Crossley et al., 2016; Liang, Li, & Zhang, 2016; Qiu et al., 2016; Robinson, Yeomans, Reich, Hulleman, & Gehlbach, 2016). For example, Dmoshinskaia (2016) incorporated sentiment features from user comments into the prediction model, thereby improving the prediction accuracy. Ye and Biswas (2014) showed that finer-grained temporal information can also improve dropout prediction performance. From an algorithmic perspective, Bouzayane and Saad (2017a) proposed a dominance-based rough set approach to improve dropout prediction performance in MOOCs. Cobos, Wilde, and Zaluska (2017) compared several algorithms, including generalized boosted regression models, weighted k-nearest neighbors, boosted logistic regression, and gradient boosting, for dropout prediction model building and found that the generalized boosted regression model had the best prediction performance.
With the great advancement in computing power, the deep learning algorithm has shown great success in various fields (Deng & Yu, 2014; LeCun, Bengio, & Hinton, 2015). However, no studies have been found to explore the potential of deep learning algorithms in the education area, particularly the MOOCs context for dropout prediction. In fact, in light of the data variability and massiveness in MOOCs, as well as the highly imbalanced nature of dropping out over retention (Fei & Yeung, 2015), the deep learning algorithm has the potential to achieve better prediction performance compared with traditional algorithms. Such better performance can be reflected in both the prediction accuracy as well as the performance in producing individual dropout probabilities for intervention personalization.
Summarization and Research Goals
Building dropout prediction models using learning analytics has shown tremendous potential in informing intervention design to reduce the attrition rates in MOOCs. However, current dropout prediction models are limited either by their weak support for early intervention design (fixed-term dropout prediction model) or by their inability to prioritize students by individual dropout probability for personalized intervention (temporal dropout prediction model). In addition, none of these previous studies has taken advantage of the power of deep learning algorithms in building dropout prediction models for model performance optimization.
Accordingly, two research goals guide this research: (a) to explore the power of the deep learning technique in building weekly temporal dropout prediction models and determine whether it can outperform traditionally used machine learning approaches; and (b) to investigate personalized intervention using individual dropout probabilities and examine whether the deep learning algorithm can personalize and prioritize intervention for students at risk of dropping out of MOOCs better than other algorithms.
Methodology
Research Context and Data
The context for this research is a project management MOOC course hosted by Canvas. This course started in August 2014 and lasted roughly 8 weeks. In total, there were 11 modules and 3,617 students registered in this course. Also, 14 discussion forums and 12 multiple choice quizzes were structured in this MOOC course. The data set used in this study came from two sources: (a) all the click-stream or trace data provided directly by Canvas containing information on pages visited including when and how many students clicked on certain sources (e.g., syllabus, announcements, quizzes, assignments, submissions, etc.) and (b) discussion forum data and quiz scores retrieved in JSON format using the Canvas API. This data set includes quiz scores for every student and all the discussion content as well. Figure 1(a) depicts an overview of the course and Figure 1(b) shows how many students were active each week.
Course characteristics: (a) Overview and (b) active students.
Features
Features and Description.
Algorithms and Evaluation
In this study, three popularly used algorithms were implemented as baseline algorithms: K-nearest neighbors (KNN), support vector machines (SVM), and decision tree. Then, the deep learning algorithm was developed to compare the prediction performance with the baseline algorithms.
The KNN algorithm is one of the most common classification methods. It has little or no prior requirement about the distribution of the data. KNN was originally derived from the necessity to conduct discriminant analysis when reliable parametric estimation of probability densities is difficult to determine (Denoeux, 1995). KNN has a very simple working mechanism in which each input pattern to be classified is compared with a series of stored patterns, in our case, dropout or not. The classification result is the class with the most representatives in the k retrieved patterns. The stored pattern in this study is the training data set, and the k is estimated using the hyperparameter optimization algorithm (Thornton, Hutter, Hoos, & Leyton-Brown, 2013). In Figure 2, we only use two features or dimensions to illustrate the algorithmic concept, which is easy to explain. KNN can handle as many dimensions as needed.
KNN with two-dimensional feature space and two classes: When K is set to three, then the hexagon is classified into the active group. If K is set to eight, then the hexagon is classified into the dropout group.
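The voting behavior described in the caption can be reproduced with a minimal sketch. The training points and the query below are hypothetical and were chosen only so that K = 3 and K = 8 give different answers; the study itself tunes k by hyperparameter optimization.

```python
import math
from collections import Counter

# Hypothetical training points: (feature_1, feature_2, label).
training = [
    (0.5, 0.0, "active"), (0.0, 0.6, "active"), (0.7, 0.0, "dropout"),
    (1.0, 0.5, "dropout"), (-1.0, 0.6, "dropout"), (0.2, -1.3, "dropout"),
    (-1.2, -0.8, "dropout"), (1.5, 0.3, "active"),
    (3.0, 3.0, "active"), (-3.0, 2.5, "active"),
]


def knn_classify(query, training, k):
    """Return the majority label among the k training points closest to query."""
    by_distance = sorted(training, key=lambda row: math.dist(query, row[:2]))
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


query = (0.0, 0.0)  # plays the role of the hexagon in the figure
label_k3 = knn_classify(query, training, k=3)
label_k8 = knn_classify(query, training, k=8)
```

Here the three nearest neighbors contain two active students, while the eight nearest contain five dropouts, so the predicted class flips as K grows.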
The decision tree is a typical top-down tree growth algorithm (Quinlan, 2014). This algorithm divides a complex classification task into a set of simple classification tasks. A decision tree is usually made of a root node, a group of internal nodes, and leaf nodes, as shown in Figure 3. The root and internal nodes correspond to the features used to divide the samples, and each of them produces branches. Each path from the root node to a leaf node can be viewed as a classification rule. The decision tree algorithm must select the feature and split value at each node. In this study, the selection was mainly based on the information gain ratio of the split. All students are then classified as potentially at risk or active using the rules in the tree structure, as shown in Figure 3.
Decision tree with six-dimensional feature space and two classes: The X_i are the features; a, b, c, d, and e are the thresholds; and A and B are the class labels (active and dropout).
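The information gain ratio used to select splits can be sketched as follows for a single numeric feature and a binary threshold. The feature values, labels, and thresholds are hypothetical; the formula is the standard gain ratio (information gain divided by split information), which is assumed here to correspond to the criterion named in the text.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting the samples into value <= threshold and value > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    info_gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return info_gain / split_info


# Hypothetical weekly click counts and outcomes.
clicks = [0, 1, 2, 8, 9, 10]
outcome = ["dropout", "dropout", "dropout", "active", "active", "active"]
good_split = gain_ratio(clicks, outcome, threshold=5)
poor_split = gain_ratio(clicks, outcome, threshold=0.5)
```

A tree builder would evaluate candidate thresholds like these for every feature and keep the split with the highest ratio; in this toy data the threshold of 5 separates the classes perfectly and therefore scores higher.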
Based on sound statistical learning theory, the performance of SVM is robust to data noise (Widodo & Yang, 2007). SVM was originally designed for binary classification by constructing an optimal separating hyperplane that has high generalization ability.
SVM schematic diagram.
In feature space, the optimal hyperplane can be expressed as the linear combination of training samples. These samples, as support vectors, construct the decision functions using the kernel function. Any function that meets Mercer's theorem can be used as a kernel function, which must be continuous and positive definite (Cristianini & Shawe-Taylor, 2000). Such kernel function could be linear, quadratic, polynomial, and Gaussian. Gaussian Radial Basis function has been demonstrated as one of the most efficient kernel functions (Scholkopf et al., 1997). Therefore, we chose to use the Gaussian Radial Basis function to define the feature space.
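As a minimal sketch of the kernel-based decision function described above, the following computes the Gaussian radial basis function and a weighted sum over support vectors. The support vectors, coefficients, bias, and the width parameter `gamma` are hypothetical; in practice they come from training, which is not shown.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, coefficients, bias, gamma=0.5):
    """Weighted sum of kernel evaluations against the support vectors.

    Each coefficient stands for the product of a learned multiplier and the
    class label (+1 or -1) of the corresponding support vector.
    """
    return sum(
        c * rbf_kernel(sv, x, gamma)
        for sv, c in zip(support_vectors, coefficients)
    ) + bias


# Hypothetical model with one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
coefficients = [1.0, -1.0]
near_first = decision_value((0.1, 0.1), support_vectors, coefficients, bias=0.0)
near_second = decision_value((1.9, 1.9), support_vectors, coefficients, bias=0.0)
```

The sign of the decision value gives the predicted class: points close to the first support vector get a positive value and points close to the second get a negative one.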
Deep learning is a form of neural network that takes data as input and processes it through a number of layers to compute the output (LeCun et al., 2015). While a traditional neural network typically has only a single hidden layer (Figure 5, left), deep learning processes the input data through a large number of hidden layers (Figure 5, right). Each layer is made of nodes, which are where computation takes place. A node combines input from the data with a set of weights that either amplify or dampen that input, thereby assigning significance to the inputs. These input-weight products are then summed and evaluated to decide to what extent the information propagates through the network and finally influences the classification. In a more holistic view, each hidden layer learns its own set of features from the output of the previous layer. This process is known as nonlinear transformation. The more hidden layers the network has, the more complex and abstract the learned features become.
Deep learning diagram.
In addition, the traditional neural network also requires more information about features for conducting the feature selection and feature engineering. By contrast, the deep learning neural network has no requirement for any information about features (Schmidhuber, 2015). It performs automatic feature extraction without any human intervention, grasping the relevant features necessary to solve the problem. In other words, deep learning performs optimum model tuning and selection on its own, which saves a lot of human effort and time. In this study, a customized neural network was designed for each individual week; in addition, forward and backward propagation algorithms were used to train the weights, and batch gradient descent was adopted to minimize the cost function.
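The training procedure named above (forward propagation, backward propagation, and batch gradient descent) can be sketched in plain Python. The layer sizes, learning rate, number of epochs, sigmoid activations, cross-entropy cost, and toy data are all assumptions for illustration; the paper does not specify the architecture used for each week.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(sizes, seed=0):
    """Create one (weights, biases) pair per layer with small random weights."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(network, x):
    """Forward propagation: return the activations of every layer, input first."""
    activations = [list(x)]
    for weights, biases in network:
        prev = activations[-1]
        activations.append([
            sigmoid(sum(w * a for w, a in zip(row, prev)) + b)
            for row, b in zip(weights, biases)
        ])
    return activations


def mean_cross_entropy(network, X, y, eps=1e-12):
    total = 0.0
    for x, target in zip(X, y):
        p = min(max(forward(network, x)[-1][0], eps), 1.0 - eps)
        total -= target * math.log(p) + (1 - target) * math.log(1.0 - p)
    return total / len(X)


def train(network, X, y, epochs=2000, lr=0.5):
    """Batch gradient descent: sum gradients over all samples, then update once."""
    n = len(X)
    for _ in range(epochs):
        grads = [([[0.0] * len(row) for row in W], [0.0] * len(b)) for W, b in network]
        for x, target in zip(X, y):
            acts = forward(network, x)
            # Output error for a sigmoid unit with cross-entropy cost.
            delta = [acts[-1][0] - target]
            for layer in range(len(network) - 1, -1, -1):
                W, _ = network[layer]
                grad_W, grad_b = grads[layer]
                for j, d in enumerate(delta):
                    grad_b[j] += d
                    for i, a in enumerate(acts[layer]):
                        grad_W[j][i] += d * a
                if layer > 0:
                    # Backward propagation of the error to the previous layer.
                    delta = [
                        sum(W[j][i] * delta[j] for j in range(len(delta)))
                        * acts[layer][i] * (1.0 - acts[layer][i])
                        for i in range(len(acts[layer]))
                    ]
        for (W, b), (grad_W, grad_b) in zip(network, grads):
            for j in range(len(W)):
                b[j] -= lr * grad_b[j] / n
                for i in range(len(W[j])):
                    W[j][i] -= lr * grad_W[j][i] / n
    return network


# Hypothetical normalized features; label 1 means dropout.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
y = [0, 0, 0, 1, 1, 1]
network = init_network([2, 3, 3, 1])  # two hidden layers of three nodes each
loss_before = mean_cross_entropy(network, X, y)
train(network, X, y)
loss_after = mean_cross_entropy(network, X, y)
dropout_probability = forward(network, [0.1, 0.1])[-1][0]
```

The output of the final sigmoid node lies between 0 and 1 and can be read as an individual dropout probability, which is the quantity the personalization proposal relies on.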
To better evaluate the prediction performance of these algorithms, the data were divided into 70% for training and 30% for testing. To avoid overfitting, 10-fold cross-validation was performed, where performance was measured over multiple rounds of cross-validation and averaged over the rounds. The area under the receiver operating characteristic curve (AUC) was used to measure the prediction performance. Traditional precision, recall, and F-measure are usually valid only for one specific operating point, which is selected to minimize the probability of error (Bradley, 1997). However, selecting only one point can generate ambiguous results when comparing systems (Hand, 2009). By contrast, AUC is invariant to the selected decision criterion. Empirical results show that AUC can decrease the standard error compared with traditional metrics (Bradley, 1997). In addition to AUC, accuracy was also calculated to show the robustness of the proposed deep learning algorithm.
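The contrast between a threshold-dependent metric and AUC can be made concrete with a short sketch. AUC is computed here through its pairwise interpretation (the fraction of dropout and non-dropout pairs in which the dropout student receives the higher score, with ties counted as one half); the labels and scores are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold):
    """Accuracy when scores at or above the threshold are predicted as dropout."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == label for p, label in zip(predictions, labels)) / len(labels)


# Hypothetical labels (1 = dropout) and predicted dropout scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
auc_value = auc(labels, scores)
accuracy_at_050 = accuracy(labels, scores, threshold=0.5)
accuracy_at_035 = accuracy(labels, scores, threshold=0.35)
```

Accuracy changes with the chosen threshold (4 of 6 correct at 0.5, 5 of 6 at 0.35), whereas the AUC of 8/9 depends only on how the scores rank the students.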
Intervention Personalization
After using dropout prediction models to identify the MOOC students most likely to drop out each week, further analysis can be conducted to find ways to personalize the intervention. This study proposes to produce the dropout probability of each student in each week. Instructors and intelligent agents can then use the probability of dropping out to personalize and prioritize intervention for at-risk students. However, not all algorithms are able to produce a dropout probability for each student. In our study, SVM, decision tree, and deep learning can generate the probability of dropping out for each individual student, whereas KNN is not able to produce such a probability.
To validate our intervention personalization proposal, regression analysis was used to examine whether the individual students' dropout probabilities calculated by SVM, decision tree, and the deep learning algorithm were correlated with the students' actual dropout dates. In these tests, dropout probability served as the independent variable X, and the specific dropout date of each student within a week was the dependent variable Y. If the coefficient of the independent variable X is significant, we can reasonably assume that the classification model provided a good dropout prediction, which can then be used for personalized intervention. The rationale is that a higher dropout probability indicates that the student is more likely to drop out early in the week. It then makes sense to provide stronger and prioritized intervention to these students as a way of personalization. Comparing the regression tests across algorithms also shows which algorithm generates probabilities that are more in line with individual students' attrition dates, which, in turn, demonstrates the advantage and validity of personalizing and prioritizing individual student intervention using the deep learning algorithm.
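A minimal sketch of this regression check is given below, using ordinary least squares with one predictor. The probabilities and dropout days are hypothetical. The p value of the slope is not computed because it requires a t distribution that the Python standard library does not provide; a statistics package would be needed for that step.

```python
def simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns the slope, intercept, R squared, and adjusted R squared
    (adjusted for one predictor).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equally long sequences with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / syy
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return {"slope": slope, "intercept": intercept, "r2": r2, "adj_r2": adj_r2}


# Hypothetical data: predicted dropout probability (X) and the day of the
# week on which the student actually dropped out (Y).
probability = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55]
dropout_day = [1, 2, 2, 4, 5, 6]
fit = simple_regression(probability, dropout_day)
```

A negative slope in such a fit means that higher predicted probabilities go with earlier dropout days, which is the pattern the rationale above expects. Adjusted R squared penalizes the fit for the number of predictors and can fall slightly below zero when the predictor explains almost nothing.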
Results
Prediction Performance
AUC Results.
Note. AUC = area under the receiver operating characteristic curve; KNN = K-nearest neighbors; SVM = support vector machines; DT = decision tree; DL = deep learning.

Prediction performance for training data: (a) AUC and (b) accuracy.
Prediction Accuracy Over Testing.
Note. KNN = K-nearest neighbors; SVM = support vector machines; DT = decision tree.

Testing prediction performance.
Specifically, for the prediction performance in the testing data set, the accuracy for KNN ranged from 0.865 to 0.947 with an average of 0.904. The accuracy for the decision tree ranged from 0.880 to 0.95 with a mean value of 0.915. By comparison, deep learning had a much more stable and higher prediction accuracy, ranging from 0.905 to 0.961 with an average of 0.930. SVM was relatively stable from training to testing, with a range from 0.869 to 0.953. However, its predictive performance was not as good as that of the deep learning algorithm. It is generally agreed in the research community that prediction performance in the testing data set is usually closer to the actual performance in a real context (Lu, Kolarik, & Lu, 2011). After all, the testing data were unseen by the machine learning algorithms during training. Overall, the deep learning algorithm had a much better and more stable performance in terms of accurately identifying students at risk of dropping out in MOOCs.
Dropout Probability for Personalization
Dropout Probability for Sample Students.
To be more specific about the personalized intervention mechanism, we take the SVM predicted results shown in Table 4 as an example. Using the log data up to the current week, SVM can identify all the students at risk of dropping out in the next week along with their dropout probabilities. For Students 1 and 6, the MOOC instructor can pay attention to Student 6 first, since the dropout probability for this student is higher than that for Student 1 (0.939 > 0.511). Given the higher dropout probability for Student 6, the instructor can also provide heavier intervention to this student. While sending only one e-mail to Student 1, the instructor can send several e-mails with different wording to Student 6. In practice, the instructor can also place the at-risk students into several groups based on the rank of the dropout probability. Then, a specifically composed e-mail with a different emphasis can be sent to each group, or each group can be handled with a different intervention strategy for personalized help.
To test the validity of our personalized intervention proposal, linear regression tests were conducted for each algorithm between the individual predicted probability and the actual dropout date, as shown in Figure 8. The results show that deep learning has the best trend alignment between the dropout probability and the actual dropout date (p < .001, adjusted R² = 0.0687). The decision tree had the least alignment between these two variables (p = .320, adjusted R² = −4.09e−05), and SVM fell in between deep learning and decision tree (p = .0341, adjusted R² = 0.0162). The higher adjusted R² value and the smaller p value of the deep learning model indicate a stronger correlation between the probabilistic dropout predictions and dropout time. The significant results for deep learning and SVM show that it makes sense, at least from a statistical perspective, to prioritize students with higher predicted dropout probability and to give them heavier, personalized intervention, since they are more likely to drop out early. In addition, the more aligned and significant trend in the deep learning regression test indicates that deep learning also has an advantage over the other algorithms in personalizing and prioritizing interventions for at-risk students.
Regression tests for personalization.
Discussion
MOOCs have become more and more popular because they show great potential to transform the traditional education system (Bouzayane & Saad, 2017a; Kloft et al., 2014). However, because they are fully open and online, students drop out at literally every point of the course (Yang et al., 2013). As thousands of students keep dropping out throughout the course, it becomes methodologically difficult for researchers and instructors to employ traditional educational approaches (e.g., questionnaires, interviews, and observations) to inform intervention design and delivery for students at risk of dropping out. Traditional methods are very time consuming and unable to identify such a large number of at-risk students in time.
Researchers are beginning to explore the use of learning analytics to build dropout prediction models for early identification of at-risk students and then deliver interventions before the student actually drops out. However, current dropout prediction model building has significant constraints in dealing with the massive nature of MOOCs while at the same time personalizing intervention design. For example, many studies used a fixed-term dropout prediction model, that is, a dropout prediction model built using data from a certain time period (Al-Shabandar et al., 2017; Jiang et al., 2014). This prediction model identifies all the at-risk students in a MOOC course at once, which makes it impossible for instructors to design interventions for so many at-risk students at the same time. While the new trend of using weekly temporal dropout prediction models is becoming more dominant in the research community (Balakrishnan & Coetzee, 2013; Bouzayane & Saad, 2017b; Crossley et al., 2016), simply identifying a group of students who are in danger of dropping out each week still has significant limitations in terms of personalizing intervention (Author 1). That is, the number of students identified each week can still be large, and these prediction models give instructors no information with which to personalize their intervention design and delivery.
Based on the limitations in current dropout prediction model construction in MOOCs, this study proposed an approach that, while still relying on weekly temporal dropout prediction models, further produces the predicted dropout probability of every student as a way to personalize intervention design and delivery. Given the generated student dropout probabilities, instructors of MOOCs can easily decide to provide more intervention and support for students who have higher probabilities of dropping out. Meanwhile, given the ranking of the probabilities of dropping out, instructors can also prioritize their intervention design so that they have an even smaller number of at-risk students to deal with each day. Otherwise, instructors would have to deal with all the at-risk students of a week at once, which would be challenging given the possibly high number of at-risk students identified (Kloft et al., 2014). This personalization proposal was validated using regression techniques to correlate the student dropout probability with the actual dropout day in a particular week. The significant findings show that using dropout probabilities to personalize and prioritize intervention is supported at least statistically.
In practice, the deep learning algorithm can be developed into a plugin for the MOOC platform. This plugin reads live log data from the database that collects all the students' digital traces while they are interacting with the MOOC courseware. Then, at the end of every week, the plugin can run once (run dynamically depending on the needs) and identify the at-risk students for the next week and generate their dropout probabilities at the same time. These at-risk students along with the dropout probabilities can be visualized by the plugin and presented to the MOOC course instructor. Based on the visualization, the instructor can decide to provide intervention to certain students first and some students later in that week. Also, based on the dropout probabilities values, the instructor can provide heavier intervention for higher at-risk students and lighter for the lower risk ones.
In addition, the personalization and prioritization mechanism for intervention can also easily be implemented in an automated agent, as in intelligent systems (Wenger, 2014). After all, instructors providing personalized help are still limited by the time and effort required (Rienties et al., 2017). An intelligent agent can work directly with the dropout probabilities by considering their values and ranks. By comparing each student's dropout probability with the thresholds set for different interventions, a predefined intervention strategy can be delivered to that at-risk student automatically. Such predefined intervention strategies can be learned from expert teachers or from historical data. An intervention can be an automated e-mail, a reduced assignment, or even a video message delivered to the student. This study is just the first step toward diverse, automated, personalized intervention and scaffolding for students, which in turn aims to improve their engagement in MOOCs and their final success.
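The threshold comparison described above can be sketched as a small rule table. The three interventions are the examples named in the text; the threshold values and the ordering of interventions from lighter to heavier are hypothetical and would in practice be learned from expert teachers or historical data.

```python
# Hypothetical rules, ordered from the highest threshold to the lowest.
INTERVENTION_RULES = [
    (0.9, "video message"),
    (0.7, "reduced assignment"),
    (0.5, "automated e-mail"),
]


def choose_intervention(dropout_probability, rules=INTERVENTION_RULES):
    """Return the intervention for the highest threshold the probability reaches,
    or None when the probability is below every threshold."""
    if not 0.0 <= dropout_probability <= 1.0:
        raise ValueError("dropout_probability must lie between 0 and 1")
    for threshold, action in rules:
        if dropout_probability >= threshold:
            return action
    return None


# Sample probabilities reported in the paper for Students 6 and 1.
action_student_6 = choose_intervention(0.939)
action_student_1 = choose_intervention(0.511)
```

Under these assumed thresholds, Student 6 would receive the heaviest intervention and Student 1 the lightest, while students below 0.5 would receive none.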
From an algorithmic perspective, this study explored the power of deep learning in an educational context. Although deep learning has gained popularity and shown great effectiveness in various disciplines (LeCun et al., 2015), we found no studies examining how deep learning can work in an educational context, particularly in a MOOC context. The amount of data generated by students in MOOCs is enormous and highly variable (Fei & Yeung, 2015). The experiment in this study showed that deep learning performs consistently better than the baseline algorithms, including KNN, SVM, and decision tree. This superior performance covered not only the prediction accuracy in identifying at-risk students but also the capability to generate more accurate individual student dropout probabilities for intervention personalization and prioritization. In addition, comparing the prediction performance of the different algorithms on training and test data showed that deep learning is much more stable than the other algorithms. The same comparison showed that KNN and decision tree overfit more easily than SVM and deep learning. This finding has implications for current research and practice in building prediction models in MOOCs, because dropout prediction performance reported only on training data may not reflect the actual predictive power of a model.
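The training-versus-test comparison mentioned above amounts to measuring the gap between the two scores. The sketch below uses plain accuracy as a stand-in metric, which is an assumption made for illustration; the labels are toy values, not results from this study, and the function names are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def generalization_gap(train_true: Sequence[int], train_pred: Sequence[int],
                       test_true: Sequence[int], test_pred: Sequence[int]) -> float:
    """Training accuracy minus test accuracy.

    A large positive gap suggests the model fits the training data much
    better than unseen data, which is the usual symptom of overfitting.
    """
    return accuracy(train_true, train_pred) - accuracy(test_true, test_pred)


if __name__ == "__main__":
    # Toy labels (1 = dropout, 0 = retained), for illustration only.
    gap = generalization_gap(
        train_true=[1, 0, 1, 0], train_pred=[1, 0, 1, 0],
        test_true=[1, 0, 1, 0], test_pred=[1, 1, 0, 0],
    )
    print(f"train/test accuracy gap: {gap:.2f}")
```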
An important limitation of this study is that our findings are based on data from a single MOOC course. Although, as a case study, the results supported the advantage of deep learning for prediction and personalized intervention design, the findings should be generalized with caution. Another limitation is that the proposed prediction and personalized intervention mechanism, although supported by the literature, is still hypothetical: it has not been shown to be effective in a live context. Both limitations point to sound directions for future research.
Conclusion
This study optimized dropout prediction models to focus on personalizing and prioritizing intervention for academically at-risk students in MOOCs. Relying on a weekly temporal prediction mechanism, this study proposed using a deep learning algorithm to build dropout models and to produce individual student dropout probabilities for intervention personalization. By utilizing deep learning, this approach not only built more accurate dropout prediction models than the baseline models but also introduced a valid approach to inform intervention design, thereby personalizing and prioritizing support for at-risk students using MOOC dropout probabilities.
Future research can go in three directions: (a) future studies can explore methods to further improve deep learning prediction performance in MOOCs, such as increasing the number of hidden layers in the network; (b) because the personalization and prioritization currently show only statistical validity, researchers can further examine their validity by implementing the proposed prediction model and intervention personalization method in actual MOOC courses to assess whether this approach can reduce student dropout; and (c) future research can conduct similar experiments on more general online courses to examine whether deep learning can be useful in other educational contexts.
Footnotes
Acknowledgments
We thank Instructure Canvas, and particularly Jared Stein, for providing the data and support for this study.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
