Abstract
Allocating university resources, especially defining the number of necessary student groups and laboratory classes, is a hard task without knowing the exact number of students who will enroll in the given courses. This number usually depends on the exam results of the prerequisite courses. However, the planning of the next term has to be done some months before the end of the current term. This paper presents the creation of a fuzzy model that predicts the student results in the Visual Programming course with an acceptable accuracy based on nine input factors describing the relevant history of the student. The model has a low complexity rule base containing only 28 rules and predicts the exam result using fuzzy rule interpolation based inference. The positions of the rule consequent sets as well as the rule weights were tuned by particle swarm optimization. The root mean squared error expressed in percentage of the output range was less than 13% for the training, validation, and test data sets alike. This gives a satisfactory level of information for planning the number of student groups and laboratory classes in the next term for the course that follows the examined Visual Programming course.
Introduction
Creating university timetables as well as managing human and departmental resources often present serious challenges. These tasks have to be done some months before having exact information about the achievements of the students at the end of the current semester as well as without knowing anything about their enrollment intentions for the next semester.
Computational intelligence methods such as fuzzy logic, neural networks, and nature-inspired optimization algorithms can be powerful tools for building models that support carrying out the above-mentioned tasks.
In this paper, we report the creation of a fuzzy model that can predict the exam results of the students with an acceptable accuracy in the case of a certain course called Visual Programming. Because it is a prerequisite for some other courses, its success rate greatly determines the enrollment numbers in those courses in the next semester, which in turn determine the number of necessary student groups and laboratory classes.
The system estimates the exam result of each student individually based on nine factors related to the student’s history, like how many times she/he has tried the course before and what her/his exam results were in the case of some prerequisite courses. The resulting model estimated the students’ results with a root mean squared error below 13% of the output range. Owing to the sparse character of the available information a fuzzy rule interpolation based inference technique was adapted.
The main contribution of this paper is threefold: (1) it extends the fuzzy rule interpolation method LESFRI [8] by the possibility of rule weighting, (2) it extends the $2^n+2$ rules (rule base generation) method [6] to the case of sample data that do not cover all the corner points of the antecedent space, and (3) it demonstrates that the presented fuzzy modeling based approach is suitable for higher educational application for the prediction of the exam results of the students in their third or higher years of studies with an acceptable error.
The rest of this paper is organized as follows. Section 2 reviews some key ideas related to fuzzy inference, sparse rule bases, and the fuzzy rule interpolation-based inference method applied in the current research. Section 3 presents the methods used for the generation of the fuzzy model including the initial rule base generation and the parameter optimization. The results of the fuzzy model generation aiming at the prediction of student results are described in Section 4, and the conclusions are drawn in Section 5.
Fuzzy rule interpolation-based inference
Sparse rule bases and applicable inference techniques
Although the classical fuzzy inference techniques like Zadeh’s [22], Mamdani’s [12], or Takagi-Sugeno’s [18] are widely applied in practice (e.g. [3, 19]), they cannot properly handle cases when the available rule base does not ensure a full coverage of the input space, i.e. there is at least one point of the input space which is not overlapped (covered) by any rule antecedents [20]. Rule bases having this characteristic are called sparse ones. Figure 1 illustrates a sparse rule base in the case of a two-dimensional input space where all the membership functions are trapezoidal ones and thus the rule antecedents can be represented as truncated pyramids.
Sparse rule bases can arise either owing to the lack of information or due to an intentional approach. The first situation can occur both in the case of human created and in the case of automatically generated rule bases. The second situation could appear when the full coverage of the input space could be reached only with a huge number of rules. This would slow down the system considerably, and therefore, the creator of the rule base could consider a sparse solution as well.
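The notion of sparseness can be made concrete with a small check. The sketch below is only an illustration: it assumes that each rule antecedent is represented by the open support interval of its fuzzy set in every input dimension, and the name is_covered is hypothetical. A rule base is sparse if at least one point of the input space is not covered.

```python
def is_covered(point, rules):
    """Return True if at least one rule antecedent overlaps the point.

    rules: list of rules, each given as a list of (low, high) support
    intervals, one per input dimension (assumed representation).
    """
    return any(
        all(low < x < high for x, (low, high) in zip(point, rule))
        for rule in rules
    )


# Two rules in a two-dimensional input space with a gap between them.
rules = [
    [(0.0, 2.0), (0.0, 2.0)],
    [(3.0, 5.0), (3.0, 5.0)],
]
print(is_covered((1.0, 1.0), rules))   # inside the first antecedent
print(is_covered((2.5, 2.5), rules))   # in the gap: no rule fires
```

For the uncovered point a classical compositional inference method produces no conclusion, which is the situation the interpolation based methods address.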
The discovery of the existence and sometimes necessity of the sparse cases led to the development of interpolation based fuzzy inference methods. The key idea here is that the output of the fuzzy inference (conclusion) is determined as a result of interpolative calculations that take into consideration the similarity between the current input (observation) and the antecedent parts of the known rules. Although the related research began in the early 1990s, fuzzy rule interpolation (FRI) is still an intensively investigated field.

Figure 1. Antecedent space of a sparse rule base of a fuzzy system with two input dimensions.
FRI methods form two main groups, i.e. the one-step and the two-step methods. The one-step methods determine the conclusion directly from the observation, and thus the creation of an auxiliary rule is not necessary. Typical members of this group are the linear interpolation [10], the vague environment based reasoning FIVE [11], the IMUL method [21], and FRISUVW [7].
In contrast, two-step FRI methods apply the concept of Generalized Methodology of the fuzzy rule interpolation (GM) [1]. In the first step they interpolate a new rule in the position of the observation and next, they calculate the conclusion using a special single rule reasoning technique. Typical members of this group are the technique family suggested in [1], LESFRI [8], IGRV [4] as well as the transformation based technique [2].
The FRI method chosen for the current research is an extended version of the Least Squares based Fuzzy Rule Interpolation (LESFRI) [8] and it is called Least Squares based Fuzzy Rule Interpolation with Rule Weights (LESFRIW).
LESFRI uses the classical Mamdani type rule format
$$\text{IF } x_1 \text{ is } A_1 \text{ AND } \dots \text{ AND } x_{N_I} \text{ is } A_{N_I} \text{ THEN } y \text{ is } B, \qquad (1)$$
where $x_i$, $i \in [1, N_I]$, is the $i$th antecedent linguistic variable, $N_I$ is the number of input dimensions, $A_i$, $i \in [1, N_I]$, is the antecedent fuzzy set of the rule in the $i$th antecedent dimension, $y$ is the consequent linguistic variable, and $B$ is the consequent fuzzy set of the rule.
LESFRI was originally developed as a fast technique with low computational complexity. The proposed extension, which appears later in (2), introduces rule weights that provide an additional tuning possibility for the fuzzy model. In the following, the method is described for the case of multiple input single output (MISO) systems.
The key idea of the method is that first a new rule is interpolated in the position of the current observation based on the Euclidean distance between the multi-dimensional input (observation) point and the reference points of the antecedent sets of the known rules. The reference point of the new rule is defined as a weighted average of the reference points of the existent rules using an extended version of the Shepard interpolation [17] and ensuring the fulfilment of the requirement that if the current input is identical in each antecedent dimension with the reference points of the antecedent sets of one of the existent rules, the interpolated rule has to be identical with that specific rule. Thus
where $\rho(B^i)$ is the reference point of the consequent fuzzy set of the interpolated rule, $N_R$ is the number of the rules,
The weights associated with the rules ($w_R$) are a new feature added to the original LESFRI method. They increase the tuneability of the fuzzy model by providing an additional tool for the designer to express the difference between the desired effects of the individual rules if necessary. The effect of an individual rule is also determined by its distance from the observation.
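The interplay of rule weights and distances can be illustrated with a short sketch of a weighted Shepard-type interpolation. It is not the exact LESFRIW formula: the inverse distance form, the exponent parameter power, and the function name interpolate_consequent_ref are assumptions made for illustration. The sketch does respect the requirement stated above that an observation coinciding with the antecedent reference points of a rule reproduces the consequent of that rule.

```python
import math


def interpolate_consequent_ref(observation, antecedent_refs, consequent_refs,
                               rule_weights, power=2.0):
    """Weighted Shepard-type estimate of the consequent reference point.

    observation      -- crisp input point (one value per input dimension)
    antecedent_refs  -- reference points of the rule antecedents
    consequent_refs  -- reference points of the rule consequent sets
    rule_weights     -- one positive weight per rule
    power            -- distance exponent (assumed, not taken from the paper)
    """
    numerator = 0.0
    denominator = 0.0
    for a_ref, b_ref, weight in zip(antecedent_refs, consequent_refs,
                                    rule_weights):
        distance = math.dist(observation, a_ref)  # Euclidean distance
        if distance == 0.0:
            # The observation hits a known rule exactly: reproduce it.
            return b_ref
        influence = weight / distance ** power
        numerator += influence * b_ref
        denominator += influence
    return numerator / denominator


antecedents = [(1.0, 1.0), (5.0, 5.0)]
consequents = [2.0, 4.0]
# Equal weights: the midpoint observation yields the plain average.
print(interpolate_consequent_ref((3.0, 3.0), antecedents, consequents, [1.0, 1.0]))
# A larger weight on the first rule pulls the result towards its consequent.
print(interpolate_consequent_ref((3.0, 3.0), antecedents, consequents, [3.0, 1.0]))
```

With equal distances the result is the weight-proportional average of the consequent reference points, which shows how a weight can strengthen or weaken a rule without moving its fuzzy sets.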
The antecedent and consequent sets of the new rule are calculated using the least squares based fuzzy set interpolation technique (FEAT-LS) [8]. Here the task is to determine a new fuzzy set in each antecedent and consequent dimension so that the reference points of the new sets are identical with the observation values (in the case of the input dimensions) and with $\rho(B^i)$ (in the case of the output dimension). The key idea is that the shape of the new set should be similar to the shapes of the existent sets. All sets of the partition are taken into consideration using different weights depending on their distance from the interpolation point. If the interpolation point is identical with the reference point of an existent set, the interpolated set will also be identical with that set. The calculations are done α-cut wise separately for the left and right flanks of the new set. Thus, the interpolated set $X^i$ is defined as
where $\alpha_j$, $j \in [1, N_L]$, is the $j$th α-level and $N_L$ is the number of α-levels. Furthermore,
where $d(X^i, X_i)$ denotes the Euclidean distance between the interpolated set and the $i$th fuzzy set of the current partition, and $N_x$ is the number of fuzzy sets in the current partition. To avoid the formation of an abnormal set shape the condition
has to be enforced as well. The points belonging to the right flank are calculated in an identical way.
Having the antecedent and consequent sets of the new rule the conclusion (the resulting set of the fuzzy inference) is determined by the SURE-LS single rule reasoning method [8].
In the simplest case, when the fuzzy observation (or the fuzzified value of the crisp input) is identical with the antecedent part of the interpolated rule in each dimension, the conclusion will also be identical with the consequent set of the rule.
Otherwise, the dissimilarity between the observation and rule antecedent sets is measured in each antecedent dimension and the consequent set is modified to reflect the average dissimilarity encountered on the antecedent side. Thus, considering the conclusion set B as
where
where $j$ is the number of the current α-cut, $N_D$ is the number of antecedent dimensions
has to be enforced as well. The points belonging to the right flank are calculated in an identical way. The output of the system is determined by an arbitrary defuzzification method.
Fuzzy model generation
The fuzzy model generation methodology applied in the course of this research consists of two steps. The first step identifies an initial rule base that describes some characteristic points of the hypersurface representing the relation between the input and output data.
Next, an optimization is executed in which some parameters of the model are adjusted to bring the actual system output closer to the expected output described by the sample data.
Creation of the raw fuzzy model
Sparse rule bases containing some relevant rules combined with rule interpolation based fuzzy inference can provide a low complexity fuzzy model and a reduced time demand of the fuzzy model generation, especially in multi-dimensional cases. Following this idea, in the course of this research we adopted a modified version of the $2^n+2$ rules method [6] for the generation of the initial rule base.
The key idea of the $2^n+2$ rules method is that, in order to facilitate the rule interpolation, one creates a rule for each of the $2^n$ corner points of the $n$-dimensional antecedent space, and two extra rules are also defined that describe the minimum and maximum outputs, respectively. In some cases, the minimum and/or the maximum output might occur for input values corresponding to one of the corner points, and therefore the final number of rules is less than $2^n+2$.
For illustration purposes we consider a simple case of a MISO model with two inputs (x1 and x2) and one output (y), where the functional relation between the input and output data is described by the surface in Fig. 2. Considering the $2^n+2$ rules method with triangle shaped membership functions, the rule base will consist of 6 rules whose antecedent parts are shown in Fig. 3. Four pyramids symbolize the antecedent parts of the rules describing the input-output relationship at the corner points of the antecedent space ((0.0,0.0), (0.0,0.5), (0.5,0.0), (0.5,0.5)), and the two remaining pyramids are centered around the input values ((0.3,0.0), (0.3,0.5)) corresponding to the minimum and maximum output, respectively.

Figure 2. Sample surface.

Figure 3. Rule antecedents.
Although it represents a simple and straightforward approach, the applicability of the $2^n+2$ rules method is limited to the cases when all the necessary data samples are available, which cannot always be ensured. Therefore, in this paper we propose a modification of the original method in which a missing corner point sample is substituted by the data sample that is closest to that corner point. The methodology of the creation of the fuzzy model is described by the following algorithm.
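The corner point substitution can be sketched in a few lines. The listing is an illustrative sketch rather than the author's algorithm: the data layout (pairs of an input tuple and an output value), the Euclidean nearest-sample search, the removal of duplicate selections, and the function names are all assumptions. It also makes the size of the problem visible: nine inputs give $2^9 = 512$ corner points, far more than the number of available samples.

```python
import itertools
import math


def corner_points(lower, upper):
    """All 2^n corner points of the hyper-rectangle [lower, upper]."""
    return list(itertools.product(*zip(lower, upper)))


def select_rule_samples(samples, lower, upper):
    """Pick the samples from which the initial rules are built.

    samples: list of (inputs, output) pairs, inputs being a tuple.
    Each corner is represented by its nearest sample (assumed metric:
    Euclidean); samples selected more than once are kept only once.
    """
    chosen = []
    for corner in corner_points(lower, upper):
        nearest = min(samples, key=lambda s: math.dist(corner, s[0]))
        if nearest not in chosen:
            chosen.append(nearest)
    # Two extra rules describe the minimum and the maximum output.
    for extreme in (min(samples, key=lambda s: s[1]),
                    max(samples, key=lambda s: s[1])):
        if extreme not in chosen:
            chosen.append(extreme)
    return chosen


# Two-input example: the corner (0.5, 0.5) has no sample of its own.
samples = [
    ((0.0, 0.0), 1.0), ((0.0, 0.5), 2.0), ((0.5, 0.0), 3.0),
    ((0.4, 0.4), 2.5), ((0.3, 0.0), 0.5), ((0.3, 0.5), 4.0),
]
for inputs, output in select_rule_samples(samples, (0.0, 0.0), (0.5, 0.5)):
    print(inputs, output)
```

In the example the sample at (0.4, 0.4) stands in for the missing corner (0.5, 0.5). When several corners share the same nearest sample, the number of selected samples drops below $2^n+2$.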
In the following, the creation of the new rule is described for the case of trapezoidal membership functions. The main advantage of this shape type is that it can easily be used for the triangle and singleton set shapes as well. All sets of a partition are created with an identical set shape whose core and support widths are defined as a function of the range of the allowed values in the actual dimension
where $c_c$ and $c_s$ are constants applied in all dimensions. The new rule is created according to the following algorithm.
The quality of the fuzzy model is characterized by a so-called fitness function, which measures the dissimilarity between the system output and the sample output for the same input data. In the course of this research we considered the root mean squared error expressed in percentage of the output range (RMSEP), which is defined as
where $N$ is the number of data samples, $y_i$ is the output of the $i$th data point,
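The fitness measure can be sketched directly from its verbal definition: the root mean squared error divided by the width of the output range and multiplied by 100. The function and argument names below are illustrative assumptions. On the 1..5 mark scale used later in the paper the range is 4, so an RMSEP of 13% corresponds to a root mean squared error of about 0.52 marks.

```python
import math


def rmsep(expected, predicted, out_min, out_max):
    """Root mean squared error in percentage of the output range."""
    n = len(expected)
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(expected, predicted)) / n
    return 100.0 * math.sqrt(mse) / (out_max - out_min)


# Two samples, each predicted with an error of one mark on the 1..5 scale.
print(rmsep([2.0, 4.0], [3.0, 3.0], 1.0, 5.0))
```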
The goal of fuzzy model optimization is to find the best possible values for some selected model parameters to achieve the best fitness value
where $P$ is the parameter vector, $N_p$ is the number of parameters,
Nature inspired optimization algorithms have proved to be useful tools for fuzzy model tuning (e.g. [5, 15]). Particle Swarm Optimization (PSO) is a heuristic technique originally described by Kennedy and Eberhart [9]. It belongs to the family of population based iterative methods where the members of the population are called particles.
In the course of the optimization the particles move one step in the search space in each iteration cycle, looking for the optimal position. The key idea of the method is that the step depends on three factors: (1) its value in the previous iteration cycle, (2) the best position encountered by the particle so far, and (3) the overall best position encountered so far, taking into consideration the history of all particles. Thus, in each cycle a particle is characterized by two vectors: the position vector, which describes its current position in the search space, and the velocity vector, which stores the step it takes. The general description of the algorithm is presented below.
Velocity vectors have the same dimensionality as the position vectors. Their initial instances are filled with random numbers from the [–0.5, 0.5] interval. Starting from the second iteration cycle they are calculated by
where i is the iteration cycle number, j is the particle number, c1, c2 and c3 are constant parameters of the PSO method,
The initial position vectors are filled with random initial positions from the search space. However, a guided initialization using a-priori knowledge is also acceptable. For example, one particle is initialized with a given parameter vector and the rest of them are initialized randomly. Starting from the second iteration cycle they are calculated by
Taking into consideration that the parameter values usually should be confined to the search space the position vector is corrected by
where Pmin and Pmax are vectors containing the lower and upper bounds of the parameter values.
PSO can be used with several stopping criteria. For example, the algorithm can stop if the number of iteration cycles reaches a limit, or the number of consecutive iteration cycles without a significant (e.g. 5%) improvement of the fitness function reaches a limit, or the fitness of the overall best particle becomes better than a target value.
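The elements described above (random initial velocities from [–0.5, 0.5], a step built from three factors, confinement to the search space, and a stall-based stopping criterion) can be put together in a compact sketch. The velocity update used here, with c1 scaling the previous velocity and c2 and c3 scaling randomly damped pulls towards the personal and overall best positions, is an assumed textbook form and may differ from the formula applied in the paper. The function name and the default values of max_iter and seed are likewise illustrative.

```python
import random


def pso_minimize(fitness, p_min, p_max, n_particles=100,
                 c1=0.2, c2=0.2, c3=0.6, max_iter=200,
                 stall_limit=10, min_improvement=0.05, seed=0):
    """Minimize fitness over the box [p_min, p_max] (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(p_min)
    # Random initial positions inside the search space.
    pos = [[rng.uniform(lo, hi) for lo, hi in zip(p_min, p_max)]
           for _ in range(n_particles)]
    # Initial velocities drawn from [-0.5, 0.5].
    vel = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
           for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda j: best_fit[j])
    g_pos, g_fit = best_pos[g][:], best_fit[g]

    stall = 0
    for _ in range(max_iter):
        previous = g_fit
        for j in range(n_particles):
            for k in range(dim):
                # Assumed update: previous step, personal best, overall best.
                vel[j][k] = (c1 * vel[j][k]
                             + c2 * rng.random() * (best_pos[j][k] - pos[j][k])
                             + c3 * rng.random() * (g_pos[k] - pos[j][k]))
                # Move, then confine the parameter to its allowed interval.
                pos[j][k] = min(max(pos[j][k] + vel[j][k], p_min[k]), p_max[k])
            f = fitness(pos[j])
            if f < best_fit[j]:
                best_fit[j], best_pos[j] = f, pos[j][:]
            if f < g_fit:
                g_fit, g_pos = f, pos[j][:]
        # Stop after stall_limit cycles without a relative improvement
        # of at least min_improvement (e.g. 5%).
        improved = previous > 0 and (previous - g_fit) / previous >= min_improvement
        stall = 0 if improved else stall + 1
        if stall >= stall_limit:
            break
    return g_pos, g_fit


def sphere(p):
    """Toy fitness with its minimum at (2, 3)."""
    return (p[0] - 2.0) ** 2 + (p[1] - 3.0) ** 2


best, fit = pso_minimize(sphere, [1.0, 1.0], [5.0, 5.0])
print(best, fit)
```

A guided initialization, as mentioned above, would only require overwriting one entry of pos with a known parameter vector before the first fitness evaluation.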
The goal of this research was to show the practical applicability of a fuzzy model in supporting the university timetable creation process. The purpose of the fuzzy model is to give an acceptable prediction of a student’s exam result in a specific course called Visual Programming. The prediction is based on the previous achievements of the student, using the following nine pieces of information as input data.
1. Number of enrollments of the student in the Programming 1 course
2. Last Programming 1 exam result
3. Number of enrollments of the student in the Algorithms and Data Structures course
4. Last Algorithms and Data Structures exam result
5. Number of enrollments of the student in the Programming 2 course
6. Last Programming 2 exam result
7. Number of enrollments of the student in the Programming Paradigms and Techniques course
8. Last Programming Paradigms and Techniques exam result
9. Number of enrollments of the student in the Visual Programming course
All the presented methods (LESFRIW, raw fuzzy model creation, PSO, tuning) were implemented in Matlab by the author.
In accordance with the Hungarian higher educational system and the current practice, all the input variables presented above can take values between 1 and 5. In the case of the exam results 5 represents the best possible value while 1 corresponds to the worst possible one. Mark 1 means that the student failed the exam. Since all inputs share the same value interval, all the input fuzzy partitions were chosen to have the same form. Figure 4 shows the first antecedent fuzzy partition. All membership functions are triangle shaped and five fuzzy sets are used in each partition.
The output of the fuzzy model is the predicted exam result. Here the range of the values is also 1..5 and similarly to the antecedent case the consequent partition also contains five fuzzy sets with triangle shaped membership functions.

Figure 4. First antecedent partition.
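A partition of this kind can be sketched as follows. The sketch assumes evenly spaced triangles whose peaks sit at the marks 1 to 5 and whose supports reach the neighbouring peaks. The actual set positions and widths in the paper may differ, and the function names are illustrative.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangle shaped fuzzy set."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def uniform_partition(low, high, n_sets):
    """Evenly spaced triangles given as (left, peak, right) triples."""
    step = (high - low) / (n_sets - 1)
    return [(low + (k - 1) * step, low + k * step, low + (k + 1) * step)
            for k in range(n_sets)]


def fuzzify(x, partition):
    """Membership degrees of x in every set of the partition."""
    return [triangular(x, *params) for params in partition]


partition = uniform_partition(1.0, 5.0, 5)
print(fuzzify(2.5, partition))  # partly in the second and third sets
print(fuzzify(5.0, partition))  # fully in the last set
```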
The available data set contained 78 samples, and it was divided randomly into three subsets. The biggest one with 54 samples (about 69%) was used for training purposes, the second one with 16 samples (about 21%) was created for validation purposes, while the last one with 8 samples (about 10%) was reserved for testing the final model.
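A random division with these subset sizes can be sketched as follows. The shuffling procedure, the seed, and the function name are assumptions, since the paper does not describe how the random split was drawn.

```python
import random


def split_dataset(samples, n_train, n_val, seed=0):
    """Shuffle a copy of the samples and cut it into three subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


# 78 placeholder sample indices split into 54 / 16 / 8.
train, validation, test = split_dataset(range(78), 54, 16)
print(len(train), len(validation), len(test))
```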
The fuzzy model was generated using the methodology presented in Section 3. First, an initial model containing only 28 rules was created. The rule count is this low because the sample data only partially covered the hypercube defined by the theoretically allowed input value ranges.
The parameter optimization was done by PSO focusing on two parameter types, i.e. the position of the consequent sets and the weights of the rules. The two parameter types were tuned alternately using the following schema.
PSO was used with the following parameters: number of particles N = 100, c1 = 0.2, c2 = 0.2, and c3 = 0.6. The second stopping criterion type was applied with n = 10 allowed consecutive iteration cycles without at least 5% improvement of the fitness function. Figure 5 shows the variation of RMSEP in the course of the training process for the training and validation data sets.
The optimization was done based on the fitness measured on the training data set. The curve describing the variation of RMSEP in the case of the validation data set was used only to decide which model-parameter set should be kept as the final one. Figure 6 shows the final consequent partition obtained at the end of the tuning process.

Figure 5. RMSEP variation in the course of the tuning for the training and validation data sets.

Figure 6. Tuned consequent partition.
The fitness values of the initial and final fuzzy models for the different data sets are presented in Table 1. Figures 7–9 show how well the calculated output points cover the sample output points at the end of the tuning in the case of the training, validation, and test data sets, respectively.

Table 1. RMSEP values

Figure 7. Sample and calculated output values in the case of the training data set.

Figure 8. Sample and calculated output values in the case of the validation data set.

Figure 9. Sample and calculated output values in the case of the test data set.
The results show that the developed fuzzy model can predict the exam marks of the students in the Visual Programming course with an RMSEP of less than 13%.
Conclusions
This paper reports the creation of a fuzzy model which is able to predict the exam results of students based on their previous university achievements. This type of prediction can never tell the exam results exactly because the previous academic life of the students does not fully determine them. However, a good enough estimation can provide great help for the university timetable and resource allocation planning.
In this project the root mean squared error expressed in percentage of the output range was less than 13% at the end of the tuning process for all data sets. This gives a satisfactory level of information for planning the number of student groups and laboratory classes in the next semester for the ASP.Net Programming course that follows the examined Visual Programming course. The developed fuzzy model contains only 28 rules, mainly because not all the value combinations of the 9 input variables occurred in practice. Therefore, a fuzzy rule interpolation-based inference technique had to be adapted.
The results of this project showed that the presented approach is suitable for higher educational application for the prediction of exam results of students in their third or higher years of studies.
Acknowledgments
This research was supported by EFOP-3.6.1-16-2016-00006 “The development and enhancement of the research potential at John von Neumann University” project. The Project was supported by the Hungarian Government and co-financed by the European Social Fund. The research was also supported by ShiwaForce Ltd., and the Foundation for the Development of Automation in Machinery Industry.
