Abstract
Knowledge defines the action of an entity in a particular work setting. Developing a model for knowledge creation helps an organization gain a competitive advantage in its work scenario. Non-profit sectors such as education also need to optimize their functioning by deducing knowledge from the massive amount of data they accumulate. This paper illustrates the implementation of a Genetic Algorithm based knowledge creation technique that can be applied in a higher education scenario, quantifying knowledge on various parameters and eventually helping to make guided decisions in the environment.
Introduction
Comparing different solutions iteratively leads to optimization with respect to certain output criteria. The comparison continues until a favorable solution is reached. Hence, to acquire a favorable solution to many real-world problems, a certain output criterion must be maximized or minimized [1]. Optimization in the functioning of the non-profit education sector could be achieved by adopting measures that lead towards knowledge creation, a subject that researchers have taken up [2].
Knowledge can be viewed as problem definitions, validated models, concepts, arguments, beliefs, descriptions, problem statements, etc. [3]. Knowledge, which is a direct precursor of our decisions, eventually helps in making guided decisions in the environment [4]. The subject of knowledge discovery in databases (KDD) has been taken up by various researchers in sectors such as medicine, education, and insurance [5, 6].
In the popular book “The Knowledge-Creating Company”, the authors put forth the view that the success of Japanese companies relies on their skill set and expertise at the knowledge creation level [7]. In this light, knowledge progression in an organization can be depicted as in Fig. 1.
Fig. 1. Knowledge progression.
One of the major achievements of the field of Artificial Intelligence (AI) is to provide a range of ways to represent knowledge [8]. Knowledge in a system can be represented by IF-THEN rules illustrating causal relationships, a frame-based system, a semantic network, or through an artificial neural network depicting connection weights.
Knowledge represented in the form of IF-THEN rules brings out the relationships between the attributes under study and adopts a segmental approach, with each rule describing a largely independent and relatively minor piece of knowledge [9]. This study creates a knowledge set by generating classification rules. A Genetic Algorithm (GA) is used to develop an efficient classifier and hence obtain a knowledge set of rules that can be further used to exploit relationships among the independent attributes.
GA is a robust and suitable technique for optimization. It carries out a directional search effectively. Moreover, through the procedures of selection, crossover, and mutation, GA converges over successive generations to a global (or near-global) optimum. The technique follows the scheme below until the best solutions are acquired.
Random initialization of population (p)
Determining fitness of population (p)
repeat
    Selection of parents from population (p)
    Performing crossover on parents, leading to creation of population (p+1)
    Performing mutation of population (p+1)
    Determining fitness of population (p+1)
until best solution is reached
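The scheme above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the study (which was carried out in MATLAB). The function name run_ga, binary tournament selection, one-point crossover, bit-flip mutation, the fixed generation budget used as the stopping rule, and all default parameter values are assumptions made for the example.

```python
import random


def run_ga(fitness, n_bits, pop_size=20, generations=25,
           crossover_rate=0.8, mutation_rate=0.01, seed=0):
    """Minimal binary GA; operators and defaults are illustrative assumptions."""
    rng = random.Random(seed)
    # Random initialization of population (p)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Determining fitness of the current population
        scores = [fitness(c) for c in pop]
        nxt = []
        while len(nxt) < pop_size:
            # Selection of parents: binary tournament (assumed operator)
            a = max(rng.sample(range(pop_size), 2), key=lambda i: scores[i])
            b = max(rng.sample(range(pop_size), 2), key=lambda i: scores[i])
            p1, p2 = pop[a], pop[b]
            # One-point crossover (assumed operator)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation (assumed operator)
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            nxt.append(child)
        pop = nxt
        # Remember the best chromosome seen so far
        best = max(pop + [best], key=fitness)
    return best
```

For example, run_ga(sum, n_bits=15) searches for a 15-bit string with as many ones as possible; any function that maps a list of bits to a number can be passed as fitness.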
The paper is organized as follows. Section 2 discusses the related work. Section 3 elaborates the methodology adopted in the study explaining the application of GA procedures to the considered problem. The results acquired on implementation of the experiment in MATLAB R2015a are explained in Section 4. The interpretation and analysis of the results are also stated in the same section. Finally, the conclusion is drawn along with the statement of future directions in Section 5.
As is known, knowledge building can be achieved by extracting information from the huge amount of data accumulated by organizations. Knowledge-centric activities can be associated with a number of important processes in an educational organization, such as strategy planning, admission, curriculum development, teaching-learning, examination and evaluation, research initiatives, and bonding with alumni [10].
Data mining (DM) is the process of discovering relationships in huge volumes of data. Classification rule mining, a DM technique, can be considered a means of establishing relationships between entities that are not readily apparent. Experimental results show that classification rule mining supplemented by a genetic algorithm discovers rules with higher classification performance on unseen data [11].
Researchers have proposed several GA designs for discovering classification rules. The authors of [12] proposed IF-THEN rules that make use of the generalized uniform population method; a uniform operator inspired by this method is used to obtain high-quality chromosomes. In [13], GA is used to discover IF-THEN rules in which each chromosome corresponds to a classification rule; the genotype (number of genes) in the chromosome is kept fixed while the phenotype (number of rule conditions) varies. The authors of [14] constructed IF-THEN rules using a GA approach in which the fitness function considers four factors: error rate, entropy, rule consistency, and hole ratio. A GA hybridized with Tabu search has also been proposed for classification rule mining [15]. The authors of [16] proposed a Hybrid Elite Genetic Algorithm and Tabu Search method to optimize the premise and consequent parameters of fuzzy rules for the design of a zero-order Takagi-Sugeno fuzzy system.
Table 1. Attributes of the study
Fig. 2. Entropy of the independent attributes.
The GA approach was subsequently extended by various researchers, who used the multi-objective genetic algorithm (MOGA) approach to mine classification rules from large databases. In [17], the researcher used a Lexicographic Pareto Based Multi-Objective Genetic Algorithm to extract useful rules from the Iris data set. The authors of [18] proposed a multi-objective genetic algorithm with a hybrid crossover operator for simultaneously optimizing objectives such as predictive accuracy, comprehensibility, and interestingness of the rules. They later compared this rule discovery procedure with a simple genetic algorithm that uses a weighted sum of the same objectives.
The procedure starts by initializing a population of random members. Students' characteristics related to their performance in the final term examination are codified as bit sequences. Two individuals are randomly selected from this initial population to create offspring with high potential. The newly generated offspring are expected to score highly because they inherit characteristics from their high-scoring parents.
Low-scoring offspring can be replaced by high-scoring individuals generated in the process, keeping the population size intact. Repeating this process leads to the production of high-scoring offspring.
Attributes under study
The study was carried out using the following attributes shown in Table 1.
The entropy of each attribute, a measure of the uncertainty associated with a random variable, is then calculated. For samples distributed over $m$ classes,
$$H = -\sum_{i=1}^{m} p_i \log_2 p_i$$
where $p_i$ is the probability that an arbitrary sample belongs to class $C_i$.
Higher entropy signifies higher uncertainty, and lower entropy represents lower uncertainty, which is desirable [19]. The independent attributes were assessed on the basis of their entropy values. The results are shown in Fig. 2.
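The entropy calculation can be illustrated with a short Python sketch. It assumes the measure is Shannon entropy (in bits) over the empirical distribution of an attribute's values; the function name entropy and the sample values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    # Sum -p * log2(p) over the observed categories
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Hypothetical attribute column with two equally likely categories
print(entropy(["low", "low", "high", "high"]))  # 1.0
```

An attribute whose values are all identical has entropy 0, while an attribute spread evenly over k categories has entropy log2(k), the maximum for k categories.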
Attributes can be encoded using binary values. With this procedure, continuous attribute values can be transformed into binary values, and vice versa. Binary digits are used because the study works with a binary-encoded GA.
Table 2. Encoding of attributes
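The chromosome in this work is made up of 5 attributes, each encoded as a fixed-length bit string; the 15-bit pattern reported in the results is consistent with 3 bits per attribute.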
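A minimal Python sketch of such an encoding is given below. It assumes 3 bits per attribute (inferred from the 15-bit pattern over 5 attributes reported in the results) and treats each 3-bit group as an integer level from 0 to 7 that stands for one interval of the attribute. The names encode, decode, BITS_PER_ATTRIBUTE, and N_ATTRIBUTES are hypothetical.

```python
BITS_PER_ATTRIBUTE = 3  # assumption: 15-bit chromosome over 5 attributes
N_ATTRIBUTES = 5


def encode(levels):
    """Pack one integer level (0..7) per attribute into a bit string."""
    if len(levels) != N_ATTRIBUTES:
        raise ValueError("expected one level per attribute")
    for level in levels:
        if not 0 <= level < 2 ** BITS_PER_ATTRIBUTE:
            raise ValueError("level out of range")
    return "".join(format(level, f"0{BITS_PER_ATTRIBUTE}b") for level in levels)


def decode(chromosome):
    """Split a bit string into one integer level per attribute."""
    step = BITS_PER_ATTRIBUTE
    return [int(chromosome[i:i + step], 2)
            for i in range(0, len(chromosome), step)]


print(decode("100111010111111"))  # [4, 7, 2, 7, 7]
```

Mapping each level back to a range of marks requires the interval boundaries of the corresponding attribute, which this sketch does not assume.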
GA optimization parameters used for the study are presented in Table 3.
Table 3. Optimization parameters
Table 4. Result after 25 generations run
The population comprises a group of chromosomes. The initial population is randomly generated and evaluated with a fitness function. The fitness function quantifies the optimality of a chromosome, which allows a particular solution to be ranked against all the other solutions.
A five-attribute weighted fitness function, given in Eq. (4.1), is defined to optimize the end-term performance of students in the course Mathematics II.
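To make the idea of a weighted fitness over five attributes concrete, the sketch below scores a chromosome as a weighted sum of its decoded attribute levels. The weighted-sum form, the 3-bit grouping, the function name weighted_fitness, and the weights in the example are all assumptions for illustration; they are not the study's Eq. (4.1).

```python
def weighted_fitness(chromosome, weights, bits_per_attribute=3):
    """Weighted sum of decoded attribute levels (assumed functional form)."""
    step = bits_per_attribute
    levels = [int(chromosome[i:i + step], 2)
              for i in range(0, len(chromosome), step)]
    if len(levels) != len(weights):
        raise ValueError("expected one weight per attribute")
    return sum(w * x for w, x in zip(weights, levels))


# Hypothetical equal weights over the five attributes
weights = [0.2, 0.2, 0.2, 0.2, 0.2]
population = ["100111010111111", "000000000000000", "111111111111111"]
# Rank chromosomes from fittest to least fit
ranked = sorted(population, key=lambda c: weighted_fitness(c, weights),
                reverse=True)
```

Ranking by the returned value is what allows one solution to be compared against all the others in the population.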
Table 5. Antecedent part of the rule
The experiment was performed using MATLAB R2015a on a system with an Intel(R) Core(TM) i3-4005U CPU @ 1.70 GHz, 1700 MHz, 2 Core(s), 4 Logical Processor(s), and 4.00 GB of installed physical memory (RAM).
Table 6. Class assignment
Table 7. Confusion matrix
Table 8. Performance metrics for evaluating the rule
The experiment was carried out on a dataset of 5000 records with five attributes, all of them categorical.
Running the algorithm for 25 generations gives the result set shown in Table 4. Across 10 iterations of 25 generations, the maximum number of occurrences of a bit sequence in the dataset was 4.
Result interpretation
The bit pattern 100111010111111 obtained can now be converted into the antecedent part of the rule as shown in Table 5.
To obtain the consequent part of the rule, the class label is assigned using the computation in Eq. (4).
The class label assigned to the chromosome is the one for which the value of Eq. (4) is highest. Applying this criterion, the candidate class labels were compared by their counts in the database; the counts obtained for the respective class labels are presented in Table 6.
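The count-based comparison of class labels can be sketched as follows. The sketch assumes each record is a pair of bit strings (attribute bits, class bits) and picks the class that occurs most often among the records matching the antecedent; it stands in for Eq. (4), whose exact form is not assumed here. The function name assign_class and the sample records are hypothetical.

```python
from collections import Counter


def assign_class(antecedent, records):
    """Return the most frequent class among records matching the antecedent.

    `records` is an iterable of (attribute_bits, class_bits) pairs
    (an assumed data layout).
    """
    counts = Counter(label for bits, label in records if bits == antecedent)
    if not counts:
        return None, counts  # no record matches the antecedent
    label, _ = counts.most_common(1)[0]
    return label, counts


# Hypothetical records, not taken from the study's dataset
records = [
    ("100111010111111", "111"),
    ("100111010111111", "111"),
    ("100111010111111", "110"),
    ("000000000000000", "001"),
]
label, counts = assign_class("100111010111111", records)
```

Here label is "111" because two of the three matching records carry that class.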
Hence the selected class label is 111, which translates to end term marks lying between 91 and 100. The resulting rule is given below.
IF
Continuous Evaluation Marks (CEM) are between 16–19 (Max marks 30)
THEN
The End Term Marks (ETM) are between 91-100 (Max Marks 100)
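Consider that the generated rule is of the form IF A THEN C, where A is the antecedent (the conditions on the attributes) and C is the consequent (the predicted class).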
Confusion Matrix can be used to summarize the performance of a rule, in terms of True Positive (tp), True Negative (tn), False Positive (fp) and False Negative (fn) values.
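where
tp = number of records that satisfy A and belong to class C
fp = number of records that satisfy A but do not belong to class C
fn = number of records that do not satisfy A but belong to class C
tn = number of records that neither satisfy A nor belong to class C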
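As an illustration, the four counts for a rule IF A THEN C can be tallied as below. The sketch uses the usual convention (tp: A and C both hold; fp: A holds but not C; fn: C holds but not A; tn: neither holds) and assumes records are (attribute_bits, class_bits) pairs; the function name confusion_counts and the sample data are hypothetical.

```python
def confusion_counts(records, antecedent, consequent):
    """Tally (tp, fp, fn, tn) for the rule IF antecedent THEN consequent."""
    tp = fp = fn = tn = 0
    for bits, label in records:
        covers = bits == antecedent       # record satisfies A
        in_class = label == consequent    # record belongs to class C
        if covers and in_class:
            tp += 1
        elif covers:
            fp += 1
        elif in_class:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical records: "A"/"B" stand for attribute patterns, "1"/"0" for classes
sample = [("A", "1"), ("A", "0"), ("B", "1"), ("B", "0"), ("A", "1")]
print(confusion_counts(sample, "A", "1"))  # (2, 1, 1, 1)
```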
The confusion matrix is presented in Table 7.
Various metrics that can be derived from the confusion matrix are listed in Table 8.
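Commonly used metrics of this kind can be computed from the four counts as in the sketch below. The choice of accuracy, precision, recall, and specificity is an assumption and may differ from the metrics listed in Table 8; the function name rule_metrics is hypothetical.

```python
def rule_metrics(tp, fp, fn, tn):
    """Standard confusion-matrix metrics; zero denominators yield 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    total = tp + fp + fn + tn
    return {
        "accuracy": ratio(tp + tn, total),   # share of records handled correctly
        "precision": ratio(tp, tp + fp),     # share of covered records in class C
        "recall": ratio(tp, tp + fn),        # share of class C records covered
        "specificity": ratio(tn, tn + fp),   # share of non-C records not covered
    }
```

For instance, rule_metrics(4, 1, 1, 4) gives 0.8 for all four metrics, since each ratio reduces to 8/10 or 4/5.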
Conclusion and future scope
The study uses GA to establish relationships between attributes that are not readily apparent. The single-objective implementation of GA, which identifies students' characteristics associated with maximizing final performance in a higher education course, proves to be a useful deduction. Such an inference is important for curriculum planning, a key aspect of any academic institution. Work of this kind can contribute to the professional growth and advancement of students.
The study can be further extended to implement rule generation through a multi-objective genetic algorithm (MOGA).
Acknowledgments
We thank the management of Corporate Groups of Institutes, Bhopal (India) for the support needed to carry out the research presented in this paper.
