Abstract
The objective of this paper is the design of a predictive model of students’ desertion in Educational Institutions based on the Analytic Hierarchy Process (AHP). The proposed model is a weighted sum of individual probabilities of desertion associated with various factors (explanatory variables). Experts evaluate the explanatory variables of the model through the combined use of AHP and the Ratings technique. This proposal was applied in an Institution of Higher Education in Chile. To evaluate the predictive performance of the method, the results were compared with those obtained using Logistic Regression (LR) and with the actual retention of the students in one year. It was found that the proposed method had a 64.6% level of predictability, whereas the model with logistic regression had a 69.9%. It is concluded that it is possible to predict student desertion with a simple model based on the Analytic Hierarchy Process.
Phenomenon of Student Desertion
From the institutional point of view, all students who leave higher education without obtaining a degree can be classified as deserters. Accordingly, several authors associate desertion with the phenomena of academic ‘mortality’ and forced retirement (Díaz, 2008). In the experience of the authors of this work, and in agreement with Tinto (1989), desertion in higher education institutions creates financial problems for the institution, because it receives less in tuition fees and its source of income becomes unstable. It also creates financial problems for the individual, because of the money and time spent on their studies, and it may affect his/her future earnings.
Tinto (1989) states “that the study of desertion in higher education is extremely complex, since it involves not only a variety of perspectives, but also a range of different types of abandonment.” Additionally, he affirms that no definition can capture in its entirety the complexity of this phenomenon. Levy (2007) studied this phenomenon in the case of e-learning courses and stressed the difficulty of producing a definition of dropout in this context. Bowles and Brindle (2017) also emphasize that factors affecting desertion are complex.
As stated by Tinto (1989), there are many definitions of the concept of desertion and many ways of operationalizing them. In this paper, a deserter in a Higher Education Institution (HEI) is defined as a student who has not graduated from a program and who has not registered a current enrolment for at least one academic period. Desertion is a key issue in Higher Education because it affects not only the personal life of students but also society as a whole.
In particular, the desertion level is one of the factors taken into account to measure the quality of education in an institution (Rodríguez-Gómez et al., 2012). According to a document published by the UNESCO (2009), the “survival rate by grade (SR)” is defined as one of the multiple indicators related to the education quality. It is defined as the percentage of a cohort of pupils (or students) enrolled in the first grade of a given level or cycle of education in a given school year who are expected to reach successive grades. It also specifies that the purpose of the SR is to measure the retention capacity and internal efficiency of an education system.
It is very important to detect, in advance, a student who may not continue his/her studies, in order to take measures to avoid his/her desertion. Noble et al. (2007), Copeland and Levesque-Bristol (2011) and Morrow and Ackermann (2012), among others, emphasize the importance of detecting it in the first year of study. When a student does not finish his/her studies, many problems follow. The student feels that he/she has lost part of their life, with the associated costs, and there is also a cost for his/her family. In addition, the student regards not having completed their studies as a dramatic disappointment that can have a negative emotional effect. In fact, it represents a failure for the overall educational system. According to González (2005), desertion has social consequences, in terms of the expectations of the students and their families; emotional consequences, because of the dissonance between the aspirations of young people and their achievements; and important economic consequences for both people and the system as a whole. Additionally, those who do not finish their studies are in an unfavorable employment situation with respect to those who finish. Britton et al. (2015) confirmed the major labor market advantage for English graduates over non-graduates.
There are a number of papers in the existing literature that attempt to discover the factors affecting desertion. Choi and Park (2018) study the factors affecting dropout of adults in online programs, Gregori et al. (2018) investigate the impact of the learner support strategy on course completion in different types of MOOCs, Graffigna et al. (2014) study the impact of tutorial practices in the early stage of students in a university, Roberts (2018) analyses the contribution of the professional staff to the retention of students and Gitto et al. (2016) investigate the influence of supply-side factors (such as number of lecturers per student and geographical conditions).
The objective of this paper is the design and evaluation of a predictive model of students’ desertion in Educational Institutions based on the Analytic Hierarchy Process (AHP). In this case, the characteristics of the students are considered the relevant factors. The reason for using AHP is that it is easy to apply and does not require a large amount of data, as AHP is based only upon the judgment of experts in the field.
Predictive Modelling Related to Students’ Desertion
There are a number of statistical models that can be used to predict students’ desertion. For instance, Smith and Naylor (2001), Baars et al. (2017), Bossema et al. (2017) and Da Costa et al. (2018) use a statistical model to estimate dropout of students in universities. One of the models that has been used is the logistic regression model. Logistic regression allows categorically and continuously scaled variables to predict any categorically scaled criterion. Applications include predicting or explaining pass/fail in education, survival/non-survival in medicine, or presence/absence of clinical disorder in psychology (Osborne, 2016) and predicting dropout of students based on the students’ grades for different activities (Burgos et al., 2018). According to Kleinbaum and Klein (2010), the fact that the logistic regression function ranges between 0 and 1 is the primary reason the logistic model is so popular. Another reason derives from the S shape of its graph, which shows how a combination of factors modifies the value of the probability.
Though logistic regression was slow to catch on initially, the past two decades have seen tremendous growth in its use within the social sciences. Nevertheless, many social scientists remain unfamiliar with its workings. One reason is the complexity of the procedure. Textbooks such as those by Hosmer et al. (2013) and Kleinbaum and Klein (2010) are valuable resources, but are written at an intermediate level of difficulty. There is also a general lack of agreement on terminology (Osborne, 2016). On the other hand, in Osorio et al. (2012), the estimation of survival models in discrete time led to the conclusion that desertion is more influenced by academic-type variables, whereas both personal and academic features have an influence on graduation.
Regarding another field, Yu et al. (2010) brought in a new perspective by exploring the issue with the use of three data mining techniques, namely, classification trees, multivariate adaptive regression splines (MARS), and neural networks. These data mining procedures identified transferred hours, residency, and ethnicity as crucial factors to retention. According to Baepler and Murdoch (2010), the emerging fields of academic analytics and educational data mining are rapidly producing new possibilities for gathering, analyzing, and presenting student data. Faculty might soon be able to use these new data sources as guides for course redesign and as evidence for implementing new assessments and lines of communication between instructors and students. It was shown how the concepts of academic analytics, data mining in higher education, and course management system audits can be linked to suggest how these techniques and the data they produce might be useful to those who practice the scholarship of teaching and learning.
In addition, Zhang et al. (2010) discussed how to use data mining to improve student retention. For the MCMS (Mining Course Management Systems) project, the information incorporated into the data warehouse is the historical data of previous students and the features associated with the current and future potential students. They used this information to build a model of the students who had the potential to drop out. These students could then be divided into different groups according to their risk value. Herzog (2006) examined the prediction accuracy (on student retention) of several data-mining methods, all relatively new to institutional research, and compared it with that of the well-established logistic regression approach. The study found that the level of complexity of the data used and the outcome predicted may largely guide the selection of a particular analytical tool.
In conclusion, students’ desertion has been estimated using predictive models based on traditional statistical methods such as logistic regression, decision trees and data mining. Even though some of the models have led to good results, their main disadvantage is that they require a large amount of data. In addition, the construction of the models and the interpretation of the results require an expertise that is not always available. Tools such as AHP address this situation.
AHP in Predictive Modelling
AHP is a multi-criteria decision method that models a decision problem as a hierarchy. Normally, one level corresponds to decision criteria and the lower level corresponds to decision alternatives, although the model has to be adapted to the characteristics of the problem to be faced. The objective of AHP is to estimate the priority of every element within each level. In this way, the priority of every decision alternative is estimated.
AHP has been used in many areas. For example, Oddershede et al. (2014) use AHP for assessing the quality of service in Health Systems. Quezada and López-Ospina (2014) use AHP to identify the causal relationships within a strategy map. Tamura et al. (2000) describe a modelling with AHP on the legitimacy of the investment ranges; and Xin (2012) describes an investment risk assessment, also using AHP.
The main objective of AHP is to support a decision-making process. However, it has also been used in forecasting and prediction. As examples, Yüksel (2007) uses AHP to forecast the demand of a hotel; Shih et al. (2012) use AHP to forecast the demand for printers; Ognjanovic et al. (2016) use AHP to predict the selection of courses to be taken by a student and Samuel et al. (2017) use AHP for predicting heart failure risk. Notwithstanding the number of applications of AHP in many fields, the review by Anis and Islam (2015) identifies more than 30 diverse areas in which AHP has been used in Higher Education, but it does not mention any work in which AHP was used to study the students’ desertion phenomenon.
The objective of this work is to design and validate a method for predicting students’ dropout using the Analytic Hierarchy Process. This work explores the alternative of implementing a method that integrates the opinions of experts or relevant actors in the creation of a predictive model. To do this, the Analytic Hierarchy Process (AHP), developed by L. Saaty (1990), has been selected.
Research Methodology
Overview
The proposed method utilizes the Analytic Hierarchy Process. The associated model was created based on the literature about the factors affecting students’ desertion. In order to validate the method, it was applied to a real case of an HEI in Chile. The method was applied to those students who entered the Institution in the first semester of 2012 in order to estimate who would abandon his/her studies. The results were compared with what happened in reality in the following semesters (until the second semester of 2015). The students’ desertion was also estimated using a Logistic Regression model in order to compare the results with those obtained using AHP.
The following indicators were calculated in order to measure the performance of the methods: Accuracy: the percentage of total matches between predicted and observed deserters. Sensitivity: indicates the percentage of deserters correctly predicted. Specificity: indicates the percentage of non-deserters correctly predicted.
The Analytic Hierarchy Process
The Analytic Hierarchy Process (AHP) is a multi-criteria decision making approach in which the decision problem is modelled as a hierarchy (L. Saaty, 1990). The levels of the hierarchy include decision criteria and alternatives. AHP assigns priorities to each factor of the hierarchical levels. Particularly, priorities are assigned to alternatives, which is the final purpose of the approach.
AHP assigns priorities to each factor of a hierarchical level by performing pairwise comparisons of the elements in relation to each factor (in turn) of the preceding level. If X and Y are two factors of one level and Z is a factor of the preceding level, then the basic question is: How much more important is factor X than factor Y in relation to factor Z? AHP uses Saaty’s fundamental scale, which ranges from 1 (equally important) to 9 (extremely more important) (L. Saaty, 1990). The resulting values are entered into what is called the “comparison matrix.”
Let
The priorities of the factors are estimated from the equation:
It should be pointed out that the entries of the comparison matrix are estimations of the actual values of the importance of one factor over the others. This means that there may be some inconsistency in the responses. AHP provides a mechanism to estimate the inconsistencies. Equation (2) shows how the consistency is calculated, where ICR is a random consistency ratio, which depends on the size of the matrix A. A table with the values of ICR can be found in T. Saaty and Vargas (2000).
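For reference, the standard AHP formulation that this description follows is usually written as shown below. The symbols $w$ (priority vector), $\lambda_{\max}$ (largest eigenvalue), $n$ (matrix size) and $CI$ (consistency index) are the conventional ones and are assumptions here, as is the assignment of the numbers (1) and (2); $A$, $ICR$ and $RC$ are used as in the text.

```latex
A = (a_{ij})_{n \times n}, \qquad a_{ij} > 0, \qquad a_{ji} = \frac{1}{a_{ij}}, \qquad a_{ii} = 1

A\,w = \lambda_{\max}\, w, \qquad \sum_{i=1}^{n} w_i = 1 \tag{1}

CI = \frac{\lambda_{\max} - n}{n - 1}, \qquad RC = \frac{CI}{ICR} \tag{2}
```

When the judgments are perfectly consistent, $\lambda_{\max} = n$ and therefore $RC = 0$. A ratio of 10% or less is conventionally regarded as acceptable.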
AHP uses two types of measurements: relative and absolute (L. Saaty, 1990). In the case of relative measurement, the factors in each level are pairwise compared with respect to the factors of the above level. However, in absolute measurement, a different calculation is performed for the case of the alternatives. For each factor in the level above the alternatives, intensities or grades are defined. They are pairwise compared in order to obtain the priority of those grades or intensities.
In this work, absolute measurement is used. In this case, categories are assigned to each factor that may explain desertion. An illustrative example is given. Let’s assume that one factor is the average mark obtained by a student in secondary school and that we define the categories as the quintiles of the marks obtained. Then, the comparison is performed by asking the question: How much more probable is it that a student in quintile X leaves his/her studies than a student in quintile Y? In this way, probabilities of desertion are assigned to quintiles.
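The way category priorities can be derived from such pairwise judgments may be sketched as follows. This is a minimal illustration, not the authors' implementation: the judgments are hypothetical, three categories are used instead of five quintiles for brevity, and the priority vector is approximated by power iteration. The random index values are the ones commonly tabulated for Saaty's method.

```python
# Sketch: priorities and consistency ratio from a pairwise comparison matrix.
# The judgments below are hypothetical, not the experts' values from the study.

# Commonly tabulated random consistency index by matrix size (n = 1..9).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_priorities(matrix, iterations=100):
    """Return (priorities, lambda_max), approximated by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        w = [value / total for value in product]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    product = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_ratio(lambda_max, n):
    """Consistency ratio: ((lambda_max - n) / (n - 1)) / random index."""
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    return ((lambda_max - n) / (n - 1)) / RANDOM_INDEX[n]


# Hypothetical judgments for three mark categories (low, middle, high):
# entry [i][j] answers "how much more probable is desertion in i than in j?"
comparisons = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]
priorities, lambda_max = ahp_priorities(comparisons)
ratio = consistency_ratio(lambda_max, len(comparisons))
```

Because these hypothetical judgments are perfectly consistent, the priorities come out proportional to 4 : 2 : 1 and the consistency ratio is zero. Real expert judgments would normally produce a small positive ratio.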
The Logistic Regression
Logistic regression is a statistical method whose origin dates back to the sixties (Cornfield et al., 1961). It is used for analyzing a dataset in which there are one or more independent variables that determine an outcome. The outcome is measured with a dichotomous variable (in which there are only two possible outcomes). Logistic regression generates the coefficients (and their standard errors and significance levels) of a formula to predict a logit transformation of the probability of presence of the characteristic of interest.
Let:
Let p be defined as
Where:
Logistic regression is used to obtain the odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together (Sperandei, 2014).
The goal of logistic regression is to find the best fitting model to describe the relationship between the dichotomous characteristic of interest (dependent variable = response or outcome variable) and a set of independent (predictor or explanatory) variables. Rather than choosing parameters that minimize the sum of squared errors (like in ordinary regression), estimation in logistic regression chooses parameters that maximize the likelihood of observing the sample values.
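The idea of choosing parameters by likelihood rather than by squared error can be illustrated with a short sketch. It assumes the standard logistic form p = 1 / (1 + exp(-z)) with z a linear combination of the predictors. The data, the coefficient values and the function names are hypothetical and serve only to show that a better-fitting parameter set yields a higher log-likelihood.

```python
import math


def logistic(z):
    """Standard logistic function, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(intercept, coefficients, x):
    """Probability of the event for one observation x (list of predictors)."""
    z = intercept + sum(b * v for b, v in zip(coefficients, x))
    return logistic(z)


def log_likelihood(intercept, coefficients, rows, outcomes):
    """Sum of log-probabilities of the observed 0/1 outcomes."""
    total = 0.0
    for x, y in zip(rows, outcomes):
        p = predicted_probability(intercept, coefficients, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical data: one predictor, outcome 1 = event observed.
rows = [[0.0], [1.0], [2.0], [3.0]]
outcomes = [0, 0, 1, 1]

# A model that ignores the predictor versus one that uses it.
flat_fit = log_likelihood(0.0, [0.0], rows, outcomes)
better_fit = log_likelihood(-3.0, [2.0], rows, outcomes)
```

An estimation routine searches for the intercept and coefficients that make this log-likelihood as large as possible. Here the second parameter set is simply one hand-picked candidate that fits better than the flat model, not the maximum.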
The Proposed Method
The AHP Model
The AHP model considers 3 levels. The first level is the decision to be made, the second level includes the factors determining desertion and the third level includes sub-factors. Figure 1 depicts the structure of the model. There are n factors and each factor has a number of sub-factors affecting dropout.

Figure 1. Structure of the AHP Model.
With this model, weights (priorities) are assigned to every factor and sub-factor. In order to estimate the probability of dropout of an individual, the absolute measurement is applied, which allows the ranking of alternatives in terms of ratings, intensities or grades of the criteria (L. Saaty, 1990). To do this, categories are defined for each sub-factor and priorities are assigned to them within the corresponding factor, using pairwise comparisons. As explained above, priorities are assigned to each category of a sub-factor.
Let:
Let’s consider a student who belongs to the category k within the sub-factor i in factor j.
Then a score associated to the probability of dropout of the student is calculated as:
The number of sub-factors in a factor is taken into account so that the priority of a factor is not affected by that number. As illustrated in the study case, the priorities of both the categories and the sub-factors are normalized. The score P is calculated for all the students. A student is considered a deserter if his/her score is higher than or equal to the median of the scores of all the students.
The Steps of the Method
The method includes the following steps:
Step 1: Identify desertion factors and sub-factors.
Step 2: Build the AHP model.
Step 3: Estimate the priorities (weights) of the factors and sub-factors, using pairwise comparisons.
Step 4: Define categories for each sub-factor.
Step 5: Estimate the priorities of the categories of each sub-factor. They correspond to the priorities of dropout.
Step 6: For every student calculate the score of desertion, using formula (5).
Step 7: Estimate if every student will be a deserter or not. A student is predicted as a deserter if his/her score of desertion is higher than the median of the study population.
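Steps 6 and 7 can be sketched as follows. Since the score formula itself is not reproduced here, the sketch assumes that the score is the sum, over sub-factors, of the normalized sub-factor weight multiplied by the priority of the category the student belongs to. All weights, category priorities, category names and student records are hypothetical.

```python
from statistics import median

# Hypothetical normalized sub-factor weights (they sum to 1).
SUBFACTOR_WEIGHTS = {"secondary_mark": 0.5, "gender": 0.2, "scheme": 0.3}

# Hypothetical category priorities within each sub-factor.
CATEGORY_PRIORITIES = {
    "secondary_mark": {"low": 0.6, "middle": 0.3, "high": 0.1},
    "gender": {"female": 0.4, "male": 0.6},
    "scheme": {"day": 0.3, "evening": 0.7},
}


def desertion_score(student):
    """Assumed score: weighted sum of the student's category priorities."""
    return sum(weight * CATEGORY_PRIORITIES[name][student[name]]
               for name, weight in SUBFACTOR_WEIGHTS.items())


def predict_deserters(students):
    """Step 7: flag students whose score is higher than the median score."""
    scores = [desertion_score(s) for s in students]
    threshold = median(scores)
    return [score > threshold for score in scores]


students = [
    {"secondary_mark": "low", "gender": "male", "scheme": "evening"},
    {"secondary_mark": "middle", "gender": "female", "scheme": "day"},
    {"secondary_mark": "high", "gender": "female", "scheme": "day"},
    {"secondary_mark": "low", "gender": "female", "scheme": "evening"},
]
predictions = predict_deserters(students)
```

Using the median as the threshold means that roughly half of the population is flagged, whatever the absolute scale of the scores. No historical outcome data are needed to set the cut-off.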
A Study Case
The Institution
The proposed method was applied in a Higher Education Institution in Chile, which has more than 90,000 students distributed across various campuses around the country. The method was applied in the main campus of the Institution. A key issue for the correct application of the method was the support of the main administration.
Selection of Desertion Factors
The selection of factors was carried out in a workshop with the participation of a number of people of one of the campuses: the general manager, the vice-director of the campus, the directors of the programs and a couple of teachers. The procedure includes the following steps:
Academic factors: Average mark in secondary school; Average mark in Institution
Personal factors: Gender; Age
Environmental factors: Subject; Type of school; Scheme
The experts identified the factors and sub-factors shown in Table 1. It also shows the categories that are considered in each sub-factor. Each student should be characterized by the category he/she belongs to.
Table 1. Factors and Sub-Factors Affecting Retention.
The AHP Model
The AHP model includes three levels, the decision to be made, factors and sub-factors, as depicted in Figure 2.

Figure 2. The AHP Model.
As an illustration, the comparison matrix for the sub-factors associated with the environmental factor is shown in Table 2. This table states that Subject is 1/5 as important as School Type, which means that School Type is 5 times more important than Subject in predicting desertion.
Table 2. An Example of a Comparison Matrix.
RC = 0%.
Global Priority of Sub-Factors
The overall results of global priorities of sub-factors are shown in Table 3. The “local priority” is estimated from the respective comparison matrix, whereas the “global priority” of a sub-factor results from the multiplication of the priority of the factor by its local priority.
Table 3. Priorities of Factors and Sub-Factors.
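The relationship between local and global priorities described above can be sketched in a few lines. The factor and sub-factor names follow the study case, but every numeric value below is hypothetical and does not come from the experts' judgments.

```python
# Hypothetical factor priorities (they sum to 1).
FACTOR_PRIORITIES = {"academic": 0.5, "personal": 0.2, "environmental": 0.3}

# Hypothetical local priorities; within each factor they sum to 1.
LOCAL_PRIORITIES = {
    "academic": {"secondary_mark": 0.4, "institution_mark": 0.6},
    "personal": {"gender": 0.5, "age": 0.5},
    "environmental": {"subject": 0.2, "school_type": 0.5, "scheme": 0.3},
}


def global_priorities(factor_priorities, local_priorities):
    """Global priority = priority of the factor x local priority."""
    result = {}
    for factor, subfactors in local_priorities.items():
        for name, local in subfactors.items():
            result[name] = factor_priorities[factor] * local
    return result


global_p = global_priorities(FACTOR_PRIORITIES, LOCAL_PRIORITIES)
```

Because the factor priorities sum to one and the local priorities sum to one within each factor, the resulting global priorities also sum to one.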
Adjustment of Global Priorities
However, these values of global priorities must be adjusted because they are distorted by the different number of sub-factors contained in each factor. The adjustment consists of multiplying each global priority by the number of sub-factors in its factor, obtaining new representative values, named Adjusted Global Priority. These values are then normalized, as shown in the last column of Table 4, which presents the overall results for the global priorities of sub-factors.
Table 4. Priorities of Factors and Sub-Factors.
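The adjustment and normalization of global priorities can be sketched as follows. The global priorities used as input are hypothetical, and the function name is an illustrative choice, not taken from the study.

```python
def adjusted_global_priorities(global_by_factor):
    """Multiply each global priority by the number of sub-factors in its
    factor, then normalize the adjusted values so that they sum to 1.

    global_by_factor: {factor: {sub_factor: global priority}}
    """
    adjusted = {}
    for subfactors in global_by_factor.values():
        count = len(subfactors)
        for name, value in subfactors.items():
            adjusted[name] = value * count
    total = sum(adjusted.values())
    return {name: value / total for name, value in adjusted.items()}


# Hypothetical global priorities grouped by factor (not the study's values).
GLOBAL = {
    "academic": {"secondary_mark": 0.20, "institution_mark": 0.30},
    "personal": {"gender": 0.10, "age": 0.10},
    "environmental": {"subject": 0.06, "school_type": 0.15, "scheme": 0.09},
}
normalized = adjusted_global_priorities(GLOBAL)
```

With these inputs, the sub-factors of the three-member environmental factor gain weight relative to those of the two-member factors. This offsets the dilution caused by splitting one factor's priority among more sub-factors.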
Global Priority of Categories
Following a procedure similar to the one used for the global priorities of the sub-factors, the next step is the estimation of the priorities of the categories in each sub-factor. They are also obtained by using pairwise comparisons. The overall results are shown in Table 5.
Table 5. Priorities of Sub-Factors and Categories.
Score for a Student
Let
The score S for a given student is defined as follows:
To illustrate the calculation of the score for a given student, let’s consider a student with the following characteristics:
Average mark in secondary school: Q1
Average mark in Institution: Q3
Gender: Male
Age: Q4
Subject: Automation
Type of School: Private school
Scheme: Day
The score for this student is calculated in Table 6.
Table 6. Score for a Given Student.
Score =
According to the definition given in Step 7 of the method, a student is predicted as a deserter if his/her score of desertion is higher than the median of the study population. As the median in this case is 0.02470, this illustrative student would be predicted as a deserter by the AHP model.
The Logistic Regression Model
In order to validate the proposed AHP model, it is compared to the performance of a model built with logistic regression. The model uses the same independent variables as the hierarchical model to avoid results that are affected by the selection of variables. The model was run using RapidMiner Studio™ (8.1.001). The coefficients of the logistic regression model are shown in Table 7.
Table 7. Coefficients of Logistic Regression Model.
In this case,
The explanatory variables are: x3: Gender; x4: Age; x5: Subject; x6: Type of school; x7: Scheme.
The calculation for logistic regression is similar to the one made with AHP. The priorities are replaced by the coefficients of Table 7. In this model, a student is predicted as a deserter if his/her score of desertion p is higher than 0.5. In the same way, the model was applied to every individual student to estimate whether he/she would desert from the Institution.
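The 0.5 decision rule can be sketched as follows. The coefficients, the intercept and the 0/1 coding of the categorical variables are hypothetical; they illustrate the calculation and are not the values in Table 7 or the encoding used by the software.

```python
import math


def desertion_probability(intercept, coefficients, features):
    """Logistic probability p for one student.

    coefficients and features are dicts keyed by variable name.
    """
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_predicted_deserter(probability, threshold=0.5):
    """A student is predicted as a deserter if p is higher than 0.5."""
    return probability > threshold


# Hypothetical coefficients for 0/1 indicator variables.
INTERCEPT = -1.0
COEFFICIENTS = {"gender_male": 0.4, "scheme_day": 0.9, "school_private": 0.5}

student = {"gender_male": 1, "scheme_day": 1, "school_private": 1}
p = desertion_probability(INTERCEPT, COEFFICIENTS, student)
label = is_predicted_deserter(p)
```

Unlike the median rule of the AHP model, this threshold is fixed on the probability scale. The share of students flagged therefore depends on the fitted coefficients.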
Performance of the Models
Both the AHP and logistic models were applied to the data of 2422 students who had entered the Institution in the first semester of 2012. Those students were followed during the following semesters (until the second semester of 2015) in order to identify which of them deserted from the Institution. A student was considered to have deserted when he/she left their studies before completing all the courses of the program. Cases in which a student suspended his/her studies also had to be considered.
Definitions and Considerations
Observed deserter: a student (not yet graduated from a program) is defined as a deserter if he/she has not registered a valid enrolment in at least one academic period.
Predicted deserter: a student is considered as a predicted deserter if, according to a mathematical model definition he/she is labelled in that condition.
In order to measure the performance of the models, the following indicators were calculated: Accuracy: is the percentage of total matches between predicted and observed deserters. Sensitivity: indicates the percentage of deserters correctly predicted. Specificity: indicates the percentage of non-deserters correctly predicted.
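The three indicators can be computed from paired observed and predicted labels, as in the sketch below. The label lists are hypothetical, and the function name is an illustrative choice.

```python
def performance(observed, predicted):
    """Accuracy, sensitivity and specificity for boolean deserter labels.

    True means "deserter". Assumes at least one observed deserter and
    one observed non-deserter, so that no denominator is zero.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # deserters found
    tn = sum(1 for o, p in pairs if not o and not p)  # non-deserters found
    fp = sum(1 for o, p in pairs if not o and p)      # false alarms
    fn = sum(1 for o, p in pairs if o and not p)      # missed deserters
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels for eight students.
observed = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, False, True, False, False, True]
result = performance(observed, predicted)
```

The comparison is made student by student, so a model that predicts the right number of deserters but flags the wrong individuals still scores poorly on all three indicators.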
In both models, “observed” and “predicted” labels of desertion are compared individually for each student. In this way, a correct match between them means that a particular student has been correctly identified in that condition. In other words, it is not just a prediction of how many people could be a deserter, but a one by one interpretation. According to Figure 3, it can be said that the student in the fourth row is being predicted as having deserted but it is not what was observed. In another case, the student in the first row is correctly predicted as a deserter.

Figure 3. Output of Logistic Regression Model.
Performance of AHP Model
The performance of the AHP model is summarized in Table 8. It shows the actual number of deserters and non-deserters and the prediction produced by the model.
Table 8. Performance of AHP Model.
Accuracy = 64.57% Type I error: 39.66%.
* Sensitivity.
** Specificity.
The AHP model correctly identifies 66.24% of the observed deserters. It also correctly identifies 62.55% of the observed non-deserters. Finally, the AHP model has an accuracy level of 64.57%, which means that it correctly predicts 64.57% of total cases. It is important to highlight that these calculations were obtained by looking at the behavior of each individual student.
Performance of Logistic Regression Model
The logistic regression model was configured in RapidMiner Studio™ (8.1.001) and its performance was measured by split validation, partitioning the original database of 2422 students into two samples. The first one was used to train the model, and the other one was used to test it. In the following tables, the percentage of training split validation refers to the percentage of students used to train the model. The performance of the logistic regression model is summarized in Table 9.
Table 9. Performance of Logistic Regression Model.
* Sensitivity: It indicates the percentage of deserters correctly predicted.
**Specificity: It indicates the percentage of non-deserters correctly predicted.
***Type I error: It indicates the percentage of students predicted as non-deserters who actually deserted.
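The split validation described above can be sketched as a random partition of the student records. The 70% training fraction, the shuffling step and the seed are assumptions made for illustration; they are not settings reported for the software used in the study.

```python
import random


def split_validation(records, training_fraction, seed=0):
    """Randomly partition records into a training and a test sample."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(len(shuffled) * training_fraction))
    return shuffled[:cut], shuffled[cut:]


# 2422 placeholder student identifiers, matching the size of the cohort.
student_ids = list(range(2422))
train, test = split_validation(student_ids, 0.7)
```

The model is fitted on the training sample only, and the indicators are then computed on the test sample, so that performance is measured on students the model has not seen.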
General Performance Results
The results of the performance of both models are summarized in Table 10.
Table 10. Comparison of Performance of the Models.
Figure 4 shows the behavior of all the models considered in terms of the performance measures of accuracy, sensitivity and specificity.

Figure 4. Performance of the Models.
Discussion
The AHP model has an accuracy level of 64.57%, which means that it correctly predicts 64.57% of total cases, whereas the logistic regression models obtained accuracies between 69.45% and 76.03%. The AHP model has a significantly lower sensitivity than the logistic models, with at least 9 percentage points of difference. However, there are no such differences in specificity, with all values lying between 61% and 70%.
It is important to establish that the creation and validation of the logistic regression model required a robust data set of 2422 observations belonging to the cohort of students of 2012, while in the case of the AHP model these data can be omitted, because the parameters are obtained just by quantifying the perceptions of the experts about factors and sub-factors. This is a fundamental difference between the models, because AHP also allows a predictive model to be created in situations where there are no historical data.
In general terms, the behavior of all models, AHP and logistic regression, was similar. In Table 10, it can be seen that all models have their best performance in sensitivity and their worst in specificity. This can also be verified in the shape of the lines in Figure 4.
Conclusions
This paper proposed the use of the Analytic Hierarchy Process (AHP) to predict students’ desertion in a Higher Education Institution. The model is used to estimate priorities for those factors and sub-factors affecting desertion. Using the rating technique, it is possible to estimate whether a student will desert from the program or not.
The method was applied in a Chilean Institution to a sample of students who entered the institution in the first semester of 2012. A number of experts were consulted in order to identify the variables affecting desertion. The students were followed until the second semester of 2015 in order to identify those who actually deserted. A comparison was made between what AHP predicted for each individual and what actually happened to them.
The performance of the proposed method was also evaluated by comparing these results with those obtained using a logistic regression (LR) model. LR is a statistical technique that requires a large quantity of data, and it is complex for users to apply and understand. In order to compare the level of prediction of both methods, three indicators were used: (a) accuracy, which is the percentage of total matches between predicted and observed deserters, (b) sensitivity, which indicates the percentage of deserters correctly predicted and (c) specificity, which indicates the percentage of non-deserters correctly predicted. As a result, it was possible to observe that both models had in common that the sensitivity was greater than the accuracy, and that this was greater than the specificity, suggesting similarities in the underlying logic.
The participants declared that the most interesting part of the whole process of applying AHP was the task of estimating the priorities (weights) of the factors and sub-factors using pairwise comparisons. They also said that this method allowed them to apply and improve their perceptions about how some factors may affect the behavior of a particular variable of interest. In addition, they pointed out that this method could be applied in contexts where there are no consolidated or reliable data.
Finally, it can be concluded that it is possible to predict student desertion with a hierarchical mathematical model based on the Analytic Hierarchy Process. There is enough evidence to continue with this investigation, in order to improve the method and to validate it as a real alternative to traditional models.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Acknowledgments
The authors would like to thank the Professional Institute DUOC UC for allowing the application of the proposed method in the Institution, and in particular Mr. Cesar Mendoza for his valuable contribution to the development of this work.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Department of Industrial Engineering of the University of Santiago of Chile.
