Nature inspired clustering and indexing of learning objects based on learners cognitive skills

Abstract

In the present scenario of educational technology, inter-networking has provided a platform to access and share learning materials spread across various educational institutions. E-learning platforms provide an interface for accessing and sharing heterogeneous educational resources (learning materials) among various types of learners and content providers. For better usability, these materials are created as small digital units called learning objects. However, because learners vary in competence, providing the right content to each learner has become a cumbersome task. Consequently, personalization in the creation and storage of learning objects has become obligatory. Strategies involving learning style detection have offered one solution for personalization. Yet, these methods work with a limited number of learning styles and learning objects, and most of them fail to raise the competency level of the learners because they pay little attention to the learner’s capability. To personalize the system with respect to the learner’s capability, the individual learner’s strengths and weaknesses in the learning process must be understood. One of the features that determine the learner’s capability is the learner’s cognitive skill. It is desirable to maintain e-learning materials with respect to the cognitive skill of the learners so that the learning process becomes enjoyable. This paper focuses on grouping e-materials with respect to the dominant cognitive content of their learning objects. An organized storage mechanism is envisioned to aid faster search and retrieval of learning objects depending on the individual’s capabilities. The process aims to classify the materials so that they can be stored in a dedicated repository maintained in a distributed environment for good usability.

Keywords

Learning objects, clustering, indexing, learning object storage, cognitive skill, nature inspired

1. Introduction

In the present scenario of e-learning technology, a number of Learning Objects (LO) are created to help different types of learners effectively. Different ways of organizing learning objects are being widely adopted so that learners can access, share and use these resources easily and effectively. Maintaining these learning objects involves storing, indexing, sequencing, and retrieving them. Organization methodologies provide different means of grouping learning objects to help learners in a personalized way. These organizations attempt to deliver learning content to learners in their preferred mode while assuring convenience in learning. Usually, the contents are available in the form of learning objects and are stored in a centralized repository. This leads to storage and access issues, thereby affecting the reliability and maintainability of the system. The challenge therefore lies in storing and maintaining the assimilated learning objects in a standardized form that aids easy access and retrieval. To overcome these issues, an appropriate storage mechanism is required. In view of this requirement, a dynamic, reliable and maintainable storage mechanism is developed. The idea emphasizes distributed repositories that are strategized through a convenient arbitration mechanism, which improves the organization of learning objects. Additionally, the organization is expected to help individuals reach the required competency level. Hence, a Cognitive Skill (CS) based personalization is envisioned. Subsequently, a cognitive skill-based classification strategy for the storage of learning objects is aimed to be deployed over the grid structure for better personalization. This paper proposes a novel standardizing mechanism to segregate learning objects according to the cognitive skill of an individual, thereby improving LO usability. Furthermore, the approach introduces a method to guide a distributed storage mechanism that aids easy and faster access to learning objects.

2. Related work

Learning object storage has been a highly puzzling process, as it instigates the learning as a meaningful and fruitful process for improving any one’s competency level. With the advent of technological facilities, E-learning has become prevalent across various domains of the education system. Irrespective of the type and level of education, Information and Communication Technology (ICT) based e-learning system has become essential to support learners for the personalization of the learning process. Mostly, this personalization is carried out to cater to the need for learner’s learning style. Hence, to suit the learning style of the learners, learning objects are needed to be deployed. As the number of learning objects proliferates, it becomes highly indispensable to increase the memory requirements of the system accordingly. Consequently, an appropriate storage and cataloging mechanism need to be adapted to assist faster, relevant retrieval of learning objects. One of the strategies adopted to resolve these issues is the distribution of course contents across variety of platforms. Several attempts have been carried out in the last decade for the effective storage of learning objects thereby to develop a prudent system for content management. Considerable research work has been carried out by various authors focusing the above aspect with different approaches. The contents are organized as a set of learning objects that span from text to multimedia contents, which included tutorials, scenarios, simulations, lesson modules, case studies, and assessments [30]. These learning material composed of different objects are usually stored in a central repository [32]. However, these repositories are to be properly indexed and maintained using an appropriate strategy. A Knowledge-tree model maintained as a component had reusable learning objects and services for learners as well as tutors. 
The model operated in a distributed environment for better storage and maintainability. Yet, the model lacked an efficient protocol for communication and retrieval [5].

Figure 1.

Proposed architecture.

Similarly, [36] tried geographically spread e-learning services organized by subject as a storage strategy. The strategy was worked out to reduce complexity and time. Meanwhile, [23] adopted a storage strategy based on the profile and experience of the learners. However, the method had limited inferring mechanisms. It is also noted that these storage repositories contain contents from different sources and need standardization for better maintainability. An open source e-learning taxonomy was attempted to resolve the issue of standardization [10], but the model restricted the number of categories of Learning Objects. [28] arrived at metrics to understand the size and growth rate of learning objects in order to decide on the storage requirements for an e-learning system. Meanwhile, [31] proposed an efficient system using P2P architecture and a Semantic Overlay Network. This method had difficulty clustering similar LO’s from the perspective of different degrees of aggregation. In the meantime, [17] adopted a learning grid to identify appropriate LOs based on the learner’s context and preferences. This has led to trials on content creation and management [31, 29, 15]. Ultimately, e-learning systems discovered the use of social networks as a platform for sharing a knowledge base of databases with different scheme designs [3, 11, 2], which proposed to use a learning object repository stored with a classification of Learning Objects based on the Felder-Silverman Learning Style Model (FSLSM). Additionally, [26, 27, 20] have expressed the significance of distributed databases, query optimization and data replication in e-learning technology. The idea of a distributed database explored the possibility of information access and rapid data collection for high-performance data processing in e-learning content management. Subsequently, a micro context based positioning process was evolved to support the distribution of LO [24]. On the other hand, data replication emerged as an effective technique for distributed storage of e-learning content [35, 25]. In a similar way, [34] developed sharable web fragments to distribute course contents across different e-learning systems, and a federated, unified and interoperable scheme was attempted by [22, 6], with a flexible and distributed framework proposed that included tasks to capture, process and store data. An Apache Hadoop framework has been used to collect live data, organize data traffic, stream data, and store data. Hence, the scope of distributed storage has expanded to fulfil the requirement of faster access.

From the study, it is inferred that the storage of e-learning materials is as important as content authoring for better usage of LO’s. The approaches followed in the literature mostly rely on the FSLSM and subject-based models, where a centralized repository was emphasized. The repository is observed to be deployed over a distributed environment for effective maintenance of LO’s. Yet, the arbitration used to distribute LO’s was concerned only with storage, and the learner’s learning style and capability were either not addressed or handled only minimally. Hence, a system is needed that eases access to LO’s with respect to the learner’s learning capability. To address this issue, a distributed model policy that includes the cognitive capability of the learners is proposed. The model aims to distribute the heterogeneous LO’s into appropriate clusters so that learners can easily access the LO’s of their choice that improve their competency level. The following section describes the distributed model proposed for the issue.

3. Design of nature inspired distributed model for LO storage

Considering the gap identified, a distributed model for LO storage is envisioned and designed. The design model for the LO storage system is provided in Fig. 1. The model comprises three major processes, namely CS-LO mapping, LO-Automation and pre-processing, and LO clustering for storage, as shown in Fig. 2.

Figure 2.

Pre-processing.

3.1 CS-LO mapping

Due to the advent of ICT in learning, a lot of research has begun to bloom towards building a learning network for global knowledge sharing. To provide a good and flexible learning environment, systems encompassing different LO’s are suggested. According to [21], each of these LO’s is a knowledge object enclosing a set of predefined self-describable elements such as text, video, audio, graphics or any other illustrations conveying content. However, for effective strategizing, these objects need to be properly designated and indexed so that storage of and access to these objects become easy.

To commence with the process, initially, an appropriate organization of LO’s has to be carried out. As the focus of the work is storage strategizing, an additional feature for LO’s segregation is to be defined. The method attempts to favor the learning style of the learner’s, which is ultimately defined by the cognitive skill of the individuals. Hence, a demand for subscribing LO’s with respect to cognitive skill is foreseen. In support of this spirit, [4] proposed a model that aids in adding learning objects that are in accordance with the cognitive skill of a learner. The cognitive skill is broadly classified as memory, concentration, perception and logical thinking. According to [4], a mapping between the CS and LO is evolved. The mapping projected a combination description of LO with CS. Using this idea, an opinion survey is conducted among the educationalists. The survey outcome is synthesized and verified with the outcome obtained [4]. It was observed, that the mapping was the same. Hence a confident move towards LO-CS mapping is performed. The individual mapping thus obtained as,

$\displaystyle M=\{\textit{Text, Audio, Video}\}$

$\displaystyle C=\{\textit{Chart, Diagram, CaseStudy}\}$

$\displaystyle P=\{\textit{Video, Diagram, Simulation}\}$

$\displaystyle L=\{\textit{Chart, CaseStudy, Simulation}\}$

Having obtained the mapping, the selected documents are subjected to annotation. Presently with the consensus of the experts, a manual annotation process was suggested.

3.2 Annotation and pre-processing of learning materials

The most time-consuming process in the whole model is preparing the learning objects for clustering. Initially, the raw course material obtained from different authors is annotated manually with the help of experts. In its present form, the system obtains the input from the experts and performs the annotation using the mapping evolved. Therefore, each Learning Material (LM) is assumed to be a set of LO’s (indeed, it represents only the LO’s available in the material). However, the annotated material cannot be taken forward as such, because the different types of objects have varied influence on the cognitive skill. Hence, an appropriate pre-processing scheme is designed. According to the scheme, the weights are assigned randomly based on the importance as well as the influence of the LO towards the CS. To keep the computation simple, the weights are distributed among the LO’s such that they sum up to 1.

3.2.1 Preprocessing

Using the weight decided, the procedure as mentioned in the Fig. 2 is carried out. Finally, each LM is designated as LO vector. By finding the weight of each LO-CS pair the dominant CS of the LM is obtained. Hence, the LM-LO vector with designated CS is stored in the knowledge store for further processing. As each of the LO has different weights, a normalization process is applied on the feature vector to convert the feature values between 0 and 1 to reduce the effect of biases. This normalization is carried out through Min-Max Normalization using the expression,

$\displaystyle V=\frac{A-\min(A)}{\max(A)-\min(A)}.$
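The normalization expression above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `min_max_normalize` is hypothetical, and returning zeros for a constant vector (where the denominator would be zero) is an assumption the paper does not discuss.

```python
def min_max_normalize(values):
    """Scale a feature vector into [0, 1] using V = (A - min(A)) / (max(A) - min(A))."""
    low, high = min(values), max(values)
    if high == low:
        # Assumption: a constant vector carries no contrast, so map it to zeros
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


print(min_max_normalize([2, 4, 6]))
```

The smallest feature value maps to 0 and the largest to 1, so LO's with large raw weights cannot dominate the distance computations used later in clustering.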

Hence, the feature vector value is normalized between 0 and 1 and is made suitable for uniform processing. Thus, the knowledge store contains the annotated LM as well as the LM’s significance towards the cognitive skill. Finally, an appropriate clustering is employed to categorize the LM’s based on the CS, and subsequently a schema for distributed storage and cataloging may be arrived at. To determine the dominance, the appropriate values of LO’s are aggregated for each CS respectively. The CS that culminates in the maximum value is considered to be the dominant CS for the document. Finally, the document is associated with the dominant cognitive skill. Once the annotation is completed, the documents are subjected to the clustering process.
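The dominance rule described above can be sketched as below. The CS-LO mapping and the weights are the ones the paper reports (Table 1 and Table 2); everything else is an assumption made for illustration: the name `dominant_skill`, the representation of a learning material as a dictionary of LO counts, and the choice of count times weight as the aggregated value.

```python
# CS-LO mapping and LO weights as reported in the paper (Tables 1 and 2).
INFLUENTIAL = {
    "Memory": ("Text", "Audio", "Video"),
    "Concentration": ("Diagram", "Chart", "Case study"),
    "Perception": ("Video", "Diagram", "Simulation"),
    "Logical thinking": ("Simulation", "Case study", "Chart"),
}
WEIGHTS = {"Text": 0.08, "Audio": 0.10, "Video": 0.12, "Diagram": 0.15,
           "Chart": 0.16, "Case study": 0.18, "Simulation": 0.22}


def dominant_skill(lo_counts):
    """Aggregate weighted LO values per cognitive skill and return the largest.

    lo_counts is a hypothetical annotation result: LO type -> number of
    occurrences in one learning material. Ties resolve to the first skill
    in INFLUENTIAL order, which is an arbitrary choice.
    """
    scores = {
        skill: sum(WEIGHTS[lo] * lo_counts.get(lo, 0) for lo in los)
        for skill, los in INFLUENTIAL.items()
    }
    return max(scores, key=scores.get), scores


skill, scores = dominant_skill({"Text": 5, "Diagram": 1})
print(skill, scores)
```

For a slide deck annotated with five text objects and one diagram, the memory score (5 × 0.08) exceeds the scores of the other skills, so the material would be tagged with memory as its dominant CS under these assumptions.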

3.3 Clustering for distributed storage

Clustering is a process of collecting similar objects. The process is usually performed with respect to certain characteristics of the object that make the object unique. Traditionally, the clustering is carried out using either supervised mode or in an unsupervised mode. The supervised clustering is performed with respect to known classes. Hence it may provide an illusion of classification, with high probability density to a single class. An unsupervised clustering is a process that finds groups that are inherent to data with specific object functions and keep the group tight. Semi-supervised clustering, in turn, enhances the process with additional input along with objects characteristics so that the clustering becomes tighter with minimum distance.

To carry out the process of LM clustering, an appropriate clustering mechanism has to be decided. As the domain has different dimensions of perspective for clustering, such as CS, LO and learning style, the unsupervised mode is ruled out. A supervised model may be suitable, but obtaining the additional information for the clustering may be challenging. Therefore, a semi-supervised mode of clustering is suggested.

4. Nature inspired clustering of learning objects

As stated in the previous section, a clustering process is designed to segregate the learning objects, which may be subsequently dispersed in to a distributed environment for a better storage organization. According to the literature, it is found that some of the clustering algorithms are traditionally adopted for its popularity, flexibility, applicability and handling high dimensionality [1]. K-Means algorithm is one of the most important and widely used algorithms in clustering issue and is known due to the simplicity and rapid implementation for large test data set [16]. Though it is simple, in these kinds of clustering it is not possible to have full search, as challenges remain with the definition of clustering in a large search space. Hence, to have good quality clusters having optimal or near-optimal solutions in a domain of learning object discovery with respect to the learning style of individuals, nature-inspired clustering is preferred. According to [12, 14], out of several nature-inspired clustering, Particle Swarm Optimization (PSO) based clustering is found to be more appropriate for recommendation systems of larger search space. Indeed, the learning materials will have larger search space as the number of learning objects increases. Therefore, the clustering of learning objects is attempted by a Hybrid PSO technique which would be performing the clustering process based on the characteristics of cognitive skill.

4.1 Particle swarm optimization for LO clustering

Generally, PSO is a meta-heuristic algorithm which relies on a population, referred to as a swarm of particles [13], each representing a solution within the problem space. Each particle owns some information like its location, its velocity, the memory of its own best position and the best position found by the whole population up to the current search round. Fitness function calculates the fitness value for each particle, which is used to adjust the particle for a better location. Each particle has a position and velocity, denoted by the following vectors:

$\displaystyle X_{i}(t)=(x_{i1}(t),x_{i2}(t),\dots,x_{in}(t))$ (1)

$\displaystyle V_{i}(t)=(v_{i1}(t),v_{i2}(t),\dots,v_{in}(t))$ (2)

The solution is evaluated for each particle iteratively. The best position found so far by the $i^{\rm th}$ particle at the $t^{\rm th}$ iteration is denoted by the vector $P_{ibest}(t)=\{p_{ik}(t)\};k=1\ldots n$ . The position giving the best fitness value among all the particles at the $t^{\rm th}$ iteration is referred to as the global best position, denoted by $G_{best}(t)=\{g_{k}(t)\};k=1\ldots n$ . In each iteration, the velocity and the position of each particle are updated:

$\displaystyle X_{i}(t+1)=X_{i}(t)+V_{i}(t+1)$ (3)

$\displaystyle V_{i}(t+1)=\omega\times V_{i}(t)+c_{1}r_{1}(P_{ibest}(t)-X_{i}(t))+c_{2}r_{2}(G_{best}(t)-X_{i}(t))$ (4)

where $\omega$ is an inertia weight that denotes a preference factor for the particle to continue moving in the same direction. The concept of inertia weight was introduced by [33]. Several strategies for selecting the inertia weight $\omega$ have been described in [8, 13, 18, 19]. Also, $t$ is the iteration counter, which is bounded by a predefined maximum value $T_{Max}$ , often used as one of the termination criteria for the PSO algorithm. $c_{1}$ and $c_{2}$ are positive constants. Finally, $r_{1}$ and $r_{2}$ are random numbers in the interval $[0,1]$ . To prevent the particle from moving far outside the search space, another predefined constant $V_{Max}$ is used to bound the particle’s velocity. Each element of the velocity vector should be within the range $[-V_{Max},V_{Max}]$ . Furthermore, ways of confirming the convergence of PSO have been detailed in [9, 7].
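A single particle update following Eqs (3) and (4), including the velocity bound, might look like the sketch below. The function name `pso_step` and the default values of `omega`, `c1`, `c2` and `v_max` are illustrative assumptions; the paper does not report the parameter values it used.

```python
import random


def pso_step(x, v, p_best, g_best, omega=0.7, c1=1.5, c2=1.5, v_max=1.0, rng=random):
    """Apply Eq. (4) then Eq. (3) to one particle and return (new_x, new_v)."""
    # One r1 and one r2 per particle, as the scalar form of Eq. (4) suggests.
    r1, r2 = rng.random(), rng.random()
    new_x, new_v = [], []
    for xk, vk, pk, gk in zip(x, v, p_best, g_best):
        vel = omega * vk + c1 * r1 * (pk - xk) + c2 * r2 * (gk - xk)  # Eq. (4)
        vel = max(-v_max, min(v_max, vel))  # keep each element in [-V_Max, V_Max]
        new_v.append(vel)
        new_x.append(xk + vel)  # Eq. (3)
    return new_x, new_v


# A fast particle already sitting on both best positions: only inertia acts,
# and the velocity bound caps the move at v_max.
print(pso_step([0.0], [10.0], [0.0], [0.0], omega=1.0, v_max=1.0))
```

The velocity is computed first because Eq. (3) uses $V_{i}(t+1)$, not $V_{i}(t)$.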

According to the attempted scenario, each vector represents the consolidation of learning objects available in each of the learning materials. Ideally, seven learning objects are considered as mentioned in section. Though the vector contains seven parameters, for any learning material only a few influencing parameters contribute to the dominance of a cognitive skill in the material. Hence, uniformly three parameters are identified for each cognitive skill. As per the investigations carried out by [4], it is inferred that each cognitive skill is significantly influenced by the LOs shown in Table 1.

Table 1

Influential learning objects

Cognitive skills	Learning objects
Memory	Text	Audio	Video
Concentration	Diagram	Chart	Case studies
Perception	Video	Diagram	Simulation
Logical thinking	Simulation	Case studies	Chart

Considering these significant influencing parameters, the objective function is formulated as,

$\displaystyle Max(f(\textit{CS}_{i}))=\textit{ILO}_{1}+\textit{ILO}_{2}+\textit{ILO}_{3}$

where $\textit{ILO}_{1}$ , $\textit{ILO}_{2}$ and $\textit{ILO}_{3}$ are the three Influencing Learning Objects of the cognitive skill. Obviously, this leads to another function that minimizes the impact of the rest of the parameters. Hence, the function is denoted as

$\displaystyle Min(f(\textit{CS}_{i}))=L_{\textit{ILO}_{1}}+L_{\textit{ILO}_{2}}+L_{\textit{ILO}_{3}}+L_{\textit{ILO}_{4}}$

where $L_{\textit{ILO}}$ indicates the Less Influencing Objects, with subscripts ranging between 1 and 4, and $\textit{CS}_{i}$ represents the cognitive skill of type $i$ . With these objective functions, the process commences to update the swarm with the solutions, and in each iteration the best solution is updated as gbest. The process is repeated until the global optimum solution is reached. The adopted steps are provided as follows:
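The two objective terms can be evaluated for one feature vector as sketched here. The mapping is the one in Table 1; the function name `objective_terms` and the dictionary representation of a feature vector are assumptions, and the paper does not state how the two terms are combined into a single fitness value.

```python
# Influential LOs per cognitive skill, as listed in Table 1.
INFLUENTIAL = {
    "Memory": ("Text", "Audio", "Video"),
    "Concentration": ("Diagram", "Chart", "Case study"),
    "Perception": ("Video", "Diagram", "Simulation"),
    "Logical thinking": ("Simulation", "Case study", "Chart"),
}
ALL_LOS = ("Text", "Audio", "Video", "Diagram", "Chart", "Case study", "Simulation")


def objective_terms(vector, skill):
    """Return (sum over the 3 influential LOs, sum over the 4 less influential LOs).

    vector is a hypothetical feature vector: LO type -> normalized weighted value.
    The first term is to be maximized, the second minimized.
    """
    influential = INFLUENTIAL[skill]
    gain = sum(vector.get(lo, 0.0) for lo in influential)
    loss = sum(vector.get(lo, 0.0) for lo in ALL_LOS if lo not in influential)
    return gain, loss


print(objective_terms({"Text": 0.5, "Audio": 0.25, "Video": 0.25, "Chart": 0.125}, "Memory"))
```

One possible way to feed both terms to a single-objective optimizer is to minimize `loss - gain`, but that scalarization is only a suggestion and is not taken from the paper.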

Algorithm: Particle swarm optimization

Initialize the particles and their velocities
while the stopping criterion is not reached do
    for $i\leftarrow 1$ to $S$ do
        Update the personal best position: if ( $f(x_{i})<f(p_{i})$ ) then $p_{i}=x_{i}$ ;
    end for
    Update the global best position: $G=p_{1}$
    for $i\leftarrow 2$ to $S$ do
        if ( $f(p_{i})<f(G)$ ) then $G=p_{i}$ ;
    end for
    for $i\leftarrow 1$ to $S$ do
        Update the particle velocity $v_{i}$ using Eq. (4)
        Update the particle position $x_{i}$ using Eq. (3)
    end for
end while

Here $S$ denotes the number of particles in the swarm.

The optimum value obtained for each cognitive skill is henceforth considered as the centroid for that cognitive skill, around which the clustering needs to be performed. Having found the optimum centroid for each cognitive skill, the traditional clustering process is carried out through the K-Means (KM) and Fuzzy C-Means (FCM) clustering techniques.
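A compact PSO loop of the kind described above could be written as follows. It is a sketch under stated assumptions: the name `pso_minimize`, the parameter defaults, the search bounds, and the quadratic stand-in objective are all hypothetical, and the loop moves the particles before refreshing the personal and global bests, which is the same update cycle started at a different point.

```python
import random


def pso_minimize(f, dim, swarm_size=20, t_max=100, omega=0.7, c1=1.5, c2=1.5,
                 lower=0.0, upper=1.0, v_max=0.2, seed=0):
    """Minimize f over [lower, upper]^dim; return (best position, best value)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(swarm_size)]
    vs = [[0.0] * dim for _ in range(swarm_size)]
    p_best = [x[:] for x in xs]
    p_val = [f(x) for x in xs]
    g = min(range(swarm_size), key=lambda i: p_val[i])
    g_best, g_val = p_best[g][:], p_val[g]

    for _ in range(t_max):  # T_Max is the only stopping criterion in this sketch
        for i in range(swarm_size):
            r1, r2 = rng.random(), rng.random()
            for k in range(dim):
                vel = (omega * vs[i][k]
                       + c1 * r1 * (p_best[i][k] - xs[i][k])
                       + c2 * r2 * (g_best[k] - xs[i][k]))       # Eq. (4)
                vs[i][k] = max(-v_max, min(v_max, vel))           # velocity bound
                # Eq. (3), with positions kept inside the search box (an assumption).
                xs[i][k] = min(upper, max(lower, xs[i][k] + vs[i][k]))
        for i in range(swarm_size):  # update the personal best positions
            val = f(xs[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = xs[i][:], val
        g = min(range(swarm_size), key=lambda i: p_val[i])  # update the global best
        g_best, g_val = p_best[g][:], p_val[g]
    return g_best, g_val


# Stand-in objective with a known minimum at (0.3, 0.3); not the paper's fitness.
best, value = pso_minimize(lambda x: sum((xi - 0.3) ** 2 for xi in x), dim=2)
print(best, value)
```

In the paper's setting the returned position would serve as the initial centroid for one cognitive skill, with the objective replaced by the CS-specific function.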

4.2 K-means clustering and fuzzy C-means clustering

Consider a given data set $X=\{x_{1},x_{2},\dots,x_{n}\}$ , and $V=\{v_{1},v_{2},\dots,v_{c}\}$ be the set of centers of clusters in $X$ dataset in $p$ dimensional space $\mathbb{R}^{p}$ . Where $n$ represents number of objects, $p$ represents number of features, and $c$ represents the number of partitions or clusters. Usually, the clusters are considered by their member objects and its centers. Typically centroids are used as the centers of clusters. The centroid of each cluster is the point to which the sum of distances from all objects in that cluster is minimized. By using a partitioning clustering procedure, $X$ is partitioned into c clusters with a motive of obtaining low distance within-cluster and high distance between-cluster heterogeneity. Depending on research domains, dataset $X$ is formed with data points that are the representations of objects which can be individuals, observations, cases, requirements, pixels etc. In this case, it is the weighted, normalized learning objects that may be clustered to evolve the similar ones. According to K-means procedure, the squared error function,

$\displaystyle J_{\textit{KM}}(X;V)=\sum_{i=1}^{c}\sum_{j=1}^{n_{i}}D_{ij}^{2}$ (5)

is minimized, where $D_{ij}^{2}$ is the distance measure obtained from the Euclidean norm $\left\|x_{ij}-v_{i}\right\|^{2}$ , $1\leqslant i\leqslant c,1\leqslant j\leqslant n_{i}$ , and $n_{i}$ represents the number of data points in the $i^{\rm th}$ cluster. Considering these facts, the K-means process is listed below.

1. Centroids of $c$ clusters are chosen from $X$ randomly.

2. Distances between data points and cluster centroids are calculated.

3. Each data point is assigned to the cluster whose centroid is closest to it.

4. Cluster centroids are updated by using the expression: $v_{i}=\sum_{j=1}^{n_{i}}x_{ij}/n_{i};1\leqslant i\leqslant c$ .

5. Distances from the updated cluster centroids are recalculated.

6. If no data point is assigned to a new cluster, the run of the algorithm is stopped; otherwise, steps 3 to 5 are repeated for probable movements of data points between the clusters.
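The K-means steps above can be sketched in plain Python as follows. The function names are hypothetical, and the initial centroids are passed in as an argument rather than drawn at random, so that either randomly chosen points or externally computed centroids (such as those obtained from PSO) can be supplied; leaving an empty cluster's centroid unchanged is an assumption.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, centroids, max_iter=100):
    """Hard-assign points to the nearest centroid until assignments stop changing."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: compute distances and assign each point to the closest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            for p in points
        ]
        # Step 6: stop when no point moved to a different cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: each centroid becomes the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans(points, centroids=[(0, 0), (10, 10)]))
```

Because each point receives exactly one label, this corresponds to the exclusive (hard) clustering discussed in the text.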

Similarly, Fuzzy C-Means focuses on the minimization of the objective function,

$\displaystyle J_{\textit{FCM}}(X;U,V)=\sum_{i=1}^{c}\sum_{j=1}^{n}u_{ij}^{m}D_{ijA}^{2}$ (6)

This function differs from classical K-Means in its use of weighted squared errors instead of plain squared errors. ‘ $U$ ’ is a fuzzy partition matrix that is computed from dataset $X$ : $U=[u_{ij}]\in M_{\textit{FCM}}$ . The fuzzy clustering of $X$ is denoted by the membership matrix $U$ of size $c\times n$ . The element $u_{ij}$ is the membership value of the $j^{\rm th}$ object in the $i^{\rm th}$ cluster, and the exponent $m>1$ controls the degree of fuzziness. ‘ $V$ ’ is a prototype vector of centroids, $V=\{v_{1},v_{2},\ldots,v_{c}\},v_{i}\in\mathbb{R}^{p}$ . $D_{ijA}^{2}$ is the distance between the $j^{\rm th}$ feature vector and the centroid of the $i^{\rm th}$ cluster. It is computed as a squared inner-product distance norm as shown

$\displaystyle D_{ijA}^{2}=\left\|x_{j}-v_{i}\right\|_{A}^{2}=(x_{j}-v_{i})^{T}A(x_{j}-v_{i})$

FCM is an iterative process and stops when the number of iterations is reached to maximum, or when the difference between two consecutive values of objective function is less than a predefined convergence value ( $\epsilon$ ). The steps involved in FCM are:

1. Initialize the membership matrix $U^{(0)}$ randomly.

2. Calculate the prototype vectors:

$\displaystyle v_{i}=\frac{\sum_{j=1}^{n}u_{ij}^{m}x_{j}}{\sum_{j=1}^{n}u_{ij}^{m}};1\leqslant i\leqslant c$

3. Calculate the membership values with:

$\displaystyle u_{ij}=\frac{1}{\sum_{k=1}^{c}(D_{ijA}/D_{kjA})^{2/(m-1)}};1\leqslant i\leqslant c,1\leqslant j\leqslant n$

4. Compare $U^{(t+1)}$ with $U^{(t)}$ , where $t$ is the iteration number.

5. If $\left\|U^{(t+1)}-U^{(t)}\right\|<\epsilon$ then stop, else return to step 2.
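The FCM iteration above can be sketched as below. Several choices are assumptions made for illustration and are not stated in the paper: the function name `fcm`, the identity matrix for $A$ (so that $D_{ijA}$ is the Euclidean distance), the fuzzifier `m=2.0`, the maximum absolute difference as the matrix norm in the stopping test, and the handling of a point that coincides with a centroid.

```python
import random


def fcm(points, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy C-Means; returns (U, V) with U of size c x n and V the c centroids."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: random memberships, normalized so each object's memberships sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        total = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= total
    centers = []
    for _ in range(max_iter):
        # Step 2: prototype vectors as membership-weighted means.
        centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            centers.append([sum(wj * p[k] for wj, p in zip(w, points)) / sum(w)
                            for k in range(dim)])
        # Step 3: membership values from relative distances (A = identity).
        new_u = [[0.0] * n for _ in range(c)]
        for j, p in enumerate(points):
            d = [sum((pk - vk) ** 2 for pk, vk in zip(p, v)) ** 0.5 for v in centers]
            zeros = [i for i in range(c) if d[i] == 0.0]
            if zeros:  # the point sits exactly on a centroid: full membership there
                for i in zeros:
                    new_u[i][j] = 1.0 / len(zeros)
            else:
                for i in range(c):
                    new_u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0))
                                            for k in range(c))
        # Steps 4-5: stop once the membership matrix changes by less than eps.
        change = max(abs(new_u[i][j] - u[i][j]) for i in range(c) for j in range(n))
        u = new_u
        if change < eps:
            break
    return u, centers


u, centers = fcm([(0, 0), (0, 1), (10, 10), (10, 11)], c=2)
print([[round(x, 3) for x in row] for row in u])
```

Unlike the hard assignment of K-means, each object keeps a membership value in every cluster, which is what allows overlapping clusters.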

According to the process carried out, the feature vectors are subjected to clustering by applying these standard clustering techniques. The optimum value obtained from PSO for each cognitive skill is provided as the centroid for further processing. The hard clustering process through K-means assigns each CS-designated LM to exactly one cluster, while FCM accommodates an LM in several clusters with varying degrees of membership. Hence, when the dataset contains noise or outliers, the soft clustering of FCM performs differently, as it provides overlapping clusters, whereas K-means appears to be good for exclusive clustering.

With the identified credentials of hard and soft clustering, the process is carried out to cluster the LMs in accordance with the dominant cognitive skill exhibited through the LOs embedded in the LM. The evaluation of the clustering process performed is done through precision and recall measures.
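The evaluation measures can be computed per cognitive skill as sketched below. Treating each skill as a one-vs-rest problem is an assumption (the paper does not spell out how per-skill accuracy was computed), and the function name and the toy labels are hypothetical.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, accuracy) for one label, one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy


# Toy example: expert-assigned dominant CS versus cluster assignment.
truth = ["Memory", "Memory", "Concentration", "Perception"]
predicted = ["Memory", "Concentration", "Concentration", "Perception"]
print(per_class_metrics(truth, predicted, "Memory"))
```

In the toy example, one of the two memory documents is recovered and no other document is wrongly placed in the memory cluster, so precision is 1.0, recall is 0.5 and accuracy is 0.75.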

4.3 Evaluation of the suggested clustering techniques

In order to evaluate the effectiveness of the clustering methods in categorizing the Learning Objects based on cognitive criteria, experiments were carried out. The experiments were restricted to Learning Objects of one specific course, namely operating systems. Two topics, process management and memory management, were initially considered. Around 238 materials for process management and 156 materials for memory management were collected for experimentation. Presently, the learning materials are limited to presentation slides due to their adequate availability. Each of the documents is subjected to an expert-guided, manual annotation process. Subsequently, the annotated learning objects are assigned the prescribed weightages. Earlier, the influence of these LOs on the cognitive skill was established through the ANOVA process. Accordingly, the learning objects influencing the cognitive skill memory are assigned lesser weightages, while the LOs associated with logical thinking are given greater weightages, as shown in Table 2. Once the annotation process is completed for a document, the aggregated values for each of the LOs are analyzed and the dominant CS is determined accordingly.

Table 2
Weights for influencing LO’s

Influencing LO’s	Weights
Text	0.08
Audio	0.10
Video	0.12
Diagram	0.15
Chart	0.16
Case study	0.18
Simulation	0.22

In addition to the dominant cognitive skill of each document, a feature vector is also constructed to reflect the impact of each of these learning objects in a document. Thus, ‘ $N$ ’ documents would result in ‘ $N$ ’ feature vectors, which may be subjected to the traditional clustering process for grouping. To commence the process of grouping, two traditional methodologies, K-means and Fuzzy C-means, have been attempted. As these methods have demonstrated significant accuracy in different domains, they are considered first for the grouping. However, it is observed that the prediction of the ‘ $k$ ’ value in K-means is not standardized. Additionally, the process strives to optimize a global goal but attains only a local optimum.

In the case of fuzzy C-means, it relies more on the wide clusters than the narrow clusters and hence, the real grouping may not be captured properly. Thus, to ensure that the grouping is obtained with respect to the global goal, it is decided to integrate these methods with nature inspired techniques, as these methods search for optimal or near-optimal solutions. To demonstrate the same, particle swarm optimization is performed to find the optimum centroid around which the clusters could be formed. Having applied K-means and Fuzzy C-means with and without PSO, it is noted that the accuracy rate is significantly improved in the methods that used PSO for centroid determination. As the task objective is to group the documents based on the dominant cognitive skill exposed through the document, measures such as precision, recall, and accuracy are considered for evaluation. The performance is evaluated on the process management course contents of operating systems. The performance of K-means clustering with and without PSO is shown in Table 3.

Table 3

Process management – K-means clustering

	Precision		Recall		Accuracy
	KM	KM-PSO	KM	KM-PSO	KM	KM-PSO
Memory	0.73	0.90	0.44	0.50	0.83	0.87
Concentration	0.67	0.84	0.57	0.76	0.81	0.90
Perception	0.56	0.71	0.65	0.87	0.74	0.86
Logical thinking	0.32	0.57	0.44	0.75	0.75	0.83
Average performance	0.57	0.76	0.53	0.72	0.77	0.87

From the outcome obtained, it is noted that the K-Means method has an average precision of 0.57 and recall of 0.53, with an accuracy of 0.77, while K-means integrated with PSO has a precision of 0.76 and recall of 0.72, with an improved accuracy of 0.87. From this result it is evident that K-Means with PSO performs better. Similarly, FCM clustering is also attempted.

Table 4

Process management – fuzzy C-means clustering

	Precision		Recall		Accuracy
	FCM	FCM-PSO	FCM	FCM-PSO	FCM	FCM-PSO
Memory	0.47	0.67	0.50	0.66	0.76	0.83
Concentration	0.43	0.76	0.29	0.66	0.71	0.85
Perception	0.32	0.64	0.30	0.69	0.60	0.79
Logical thinking	0.17	0.48	0.25	0.77	0.60	0.78
Average performance	0.35	0.64	0.34	0.69	0.67	0.81

It is found from the Table 4, the average performance of FCM is found to be 0.35, 0.34 and 0.67 in terms of precision, recall and accuracy. On the other hand FCM with PSO showed some improvements in the measures with values of 0.64, 0.69 and 0.81 respectively.

The average performance of FCM with PSO is thus higher than that of FCM alone. However, the outcome is not fully convincing, because FCM calculates the likelihood of a document belonging to each class. As a result, the clustering is oriented very much towards the cognitive skill concentration in terms of precision and accuracy, and the clustering with respect to the other cognitive skills (CS) is not uniform. Yet, the performance of FCM with PSO shows significant improvement. The performance is depicted through the graphs shown in Table 5.
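The way FCM spreads a document over several classes can be seen from the standard membership formula, u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)), where d_ij is the distance from document i to centroid j. The following sketch evaluates that formula for given centroids; the fuzzifier m = 2, the function name, and the two-dimensional toy points are assumptions made for illustration only.

```python
import numpy as np


def fcm_memberships(X, centroids, m=2.0, eps=1e-12):
    """Fuzzy C-means membership degrees of each row of X for each centroid."""
    # dist[i, j] = Euclidean distance from point i to centroid j
    # (eps avoids division by zero when a point coincides with a centroid).
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + eps
    # ratio[i, j, k] = (d_ij / d_ik) ** (2 / (m - 1))
    ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)


# Hypothetical example: two centroids and two documents in a 2-D feature space.
centroids = np.array([[0.0, 0.0], [4.0, 0.0]])
docs = np.array([[1.0, 0.0],   # clearly closer to the first centroid
                 [2.0, 0.0]])  # exactly halfway between the two
U = fcm_memberships(docs, centroids)
```

A document lying between two centroids receives nearly equal membership in both, so assigning it to the class with the largest membership becomes sensitive to small changes; this is one way to read the non-uniform FCM results.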

Table 5

Process management precision values

Figure 3.

Comparison of recall values.

Table 6

Memory management – performance measures

	Precision		Recall		Accuracy
	KM	KM-PSO	KM	KM-PSO	KM	KM-PSO
Memory	0.55	0.67	0.85	0.92	0.84	0.90
Concentration	0.60	0.64	0.38	0.44	0.79	0.81
Perception	0.81	0.82	0.59	0.82	0.82	0.88
Logical thinking	0.33	0.69	0.44	0.69	0.66	0.85
Average performance	0.57	0.70	0.56	0.72	0.78	0.86
	FCM	FCM-PSO	FCM	FCM-PSO	FCM	FCM-PSO
Memory	0.37	0.60	0.54	0.92	0.73	0.87
Concentration	0.29	0.59	0.38	0.63	0.63	0.81
Perception	0.27	0.71	0.18	0.45	0.57	0.76
Logical thinking	0.42	0.50	0.31	0.50	0.73	0.76
Average performance	0.33	0.60	0.35	0.63	0.66	0.80

Figure 4.

Comparison of accuracy values.

Figure 5.

Comparison of average accuracy values.

Table 7

Memory management performance values

Figure 6.

Comparison of F1-score.

Figure 7.

Comparison of F1-score.

Considering the recall rates depicted in Fig. 3, the methodology performs better with PSO (an average of 0.72 for K-means and 0.695 for FCM) than without PSO (an average of 0.525 for K-means and 0.335 for FCM). Hence, it is concluded that including PSO for centroid identification significantly improves the performance. To substantiate the claim, the accuracy of all four methods is shown in Fig. 4. Comparing the average performance shows that the methods integrated with PSO perform better, as shown in Fig. 5.

Similarly, the process is repeated for the memory management documents of the operating systems course. The outcome is shown in Table 6, and the performance graphs are shown in Table 7. The K-means approach has a precision of 0.57 and a recall of 0.56 with an accuracy of 0.78, while K-means integrated with PSO achieves a precision of 0.70, a recall of 0.72, and an accuracy of 0.86. For FCM, the precision, recall, and accuracy values are 0.33, 0.35, and 0.66, respectively, while FCM with PSO achieves 0.60, 0.63, and 0.80.

According to the outcome, the average performance for the learning material on memory management is again better for the methods integrated with PSO, as shown in the graph. The results make clear that the method with PSO performs better for both KM and FCM. Of the two, KM with PSO shows the better performance for the considered dataset. The reason is that each subset shares a significant proportion of common traits, which favors the clustering of LOs towards the dominant CS. FCM, on the other hand, has the advantage of fast convergence, yet it expects a low membership degree for outliers. In this case, as most of the LOs contribute significantly, at different levels, towards several CS, it is likely that an LO contributes to more than one CS. Hence, the chance of misclassification increases, which affects the accuracy of the clustering. Additionally, according to the F1 scores of these methods, K-means with PSO performs substantially better.

From the graph, it is observed that the F1 score, which balances precision and recall, is greater for KM-PSO than for the other options. Only in the case of concentration in the memory management CM is there a minor deviation, where FCM-PSO shows better performance. The deviation could be due to the lack of learning objects in the sample course materials. However, as the difference between KM-PSO and FCM-PSO is trivial, the deviation can be ignored. Therefore, the experimentation leads to the conclusion that K-means with PSO is preferable for clustering learning objects that have a considerable amount of heterogeneity.
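The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a rough illustration, the sketch below applies this formula to the average precision and recall values reported in Tables 3, 4, and 6 for the two PSO variants. Note the assumption: an F1 computed from averaged precision and recall is not the same as the average of per-skill F1 scores, so these numbers need not match the plotted values exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average (precision, recall) pairs as reported in Tables 3, 4, and 6.
averages = {
    ("process management", "KM-PSO"): (0.76, 0.72),
    ("process management", "FCM-PSO"): (0.64, 0.69),
    ("memory management", "KM-PSO"): (0.70, 0.72),
    ("memory management", "FCM-PSO"): (0.60, 0.63),
}

f1 = {key: f1_score(p, r) for key, (p, r) in averages.items()}

for (course, method), value in f1.items():
    print(f"{course:20s} {method:8s} F1 = {value:.3f}")
```

On these averaged values, KM-PSO comes out ahead of FCM-PSO for both course topics, which is consistent with the preference stated above.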

5. Conclusion

The study aimed to design a methodology that helps segregate bulk course materials according to cognitive skills. The process addresses two aspects: effective storage and learner-centric course materials. Towards this objective, experiments were carried out on around 400 documents containing LOs for process management and memory management of an operating systems course. The popular methods KM and FCM were attempted first. As their performance was not convincing, a hybrid methodology that uses PSO to find the centroids for clustering was attempted. This approach demonstrated improved performance when compared to the methods without PSO. It is also noted that KM clustering was more effective than FCM. However, the approach may deviate when not all learning objects are present in the learning material, or when the documents available for some types of learning objects are insufficient for adequate training. With these findings, a storage strategy for distributed e-learning is framed. Clustering the course materials with respect to the dominant cognitive skill is intended to streamline the distributed storage strategy, which is expected to yield improved throughput. Additionally, the approach ensures the availability of course materials in accordance with the learning capability of the individuals. On the whole, the suggested approach demonstrates a decent performance towards a distributed e-learning system.
