Abstract
As the number of online education users grows, recommending learning resources accurately enough to meet each user’s individual needs has become a key issue. This paper proposes a personalized recommendation system based on the analysis of user preference behavior data and uses it to study an online education recommendation model. The criteria set of the recommendation system is determined with a product attribute mining method, and a personalized recommendation algorithm then models user preferences to estimate each user’s preference for each criterion, thereby producing more accurate recommendations. Simulation results show that, among the algorithms compared, the multi-criteria recommendation algorithm using user distance similarity performs best. A personalized recommendation algorithm based on user preference can therefore effectively improve recommendation quality.
Introduction
With the rapid development of Internet technology, the widespread adoption of smart terminal devices such as smartphones and tablet PCs, and the ready availability of mobile network resources such as 4G, digital and mobile online learning has become a new way for people to receive education. Online education services, represented by online learning platforms, are reshaping traditional learning habits and enabling fragmented, networked learning anytime and anywhere. Compared with the traditional education mode, online education offers flexible learning time, unrestricted learning locations, well-targeted content, efficient online interaction, and repeatable learning. It has played an important role in shifting learning from the traditional “passive” classroom teaching mode to an “interactive” online mode (Ren X.Y. et al. 2017) [1]. Through an online education platform, users can break through the time and space restrictions of traditional teaching and select content of interest at any time and place for targeted, fragmented learning, which makes learning more flexible and controllable (Tang F. et al. 2016) [2]. At the same time, the convenience of network transmission and the strongly interactive nature of Web 2.0 allow online education to provide communication and learning services that support multi-dimensional interaction between learners and teachers and among learners, which helps learners better understand course content and solve problems encountered during learning. More and more users are beginning to learn through online education platforms (Lei M. et al. 2016) [3]. For an online education platform, accurately recommending products that meet users’ needs is very important because it directly affects user satisfaction. Data analysis of user behavior is therefore necessary.
At present, there is little research on personalized recommendation for online education, although more and more online education platforms apply recommendation techniques to discover user characteristics and purchase patterns and to recommend courses in a targeted manner. Few examples of personalized recommendation in online education exist at home or abroad, no suitable data set is available for experiments, and many research difficulties remain. Therefore, this paper uses a product attribute mining method to determine the criteria set of the recommendation system and then uses a personalized recommendation algorithm to build a recommendation model based on the behavior preferences of online education users, helping users filter information and make decisions. The system identifies the most relevant items from a large number of resources and explores the user’s preference for each criterion, resulting in more accurate recommendations.
The research in this paper is divided into three parts. The first part gives an overview of research on user behavior on Internet online education platforms and proposes a recommendation model and algorithm based on user behavior data analysis for realizing the functions of an online education recommendation system. The second part builds a recommendation model based on online education user behavior preference and designs the online education user recommendation system. The third part carries out algorithm simulation and evaluation. The results verify that the personalized recommendation algorithm based on user preference can effectively improve recommendation quality.
State of the art
Internet online education platforms have been increasingly favored by users in recent years, and many scholars have studied them. Adeniyi D. A. took the TAM model as a model of the factors influencing college students’ use of online learning platforms and found that it can effectively explain students’ willingness to use such platforms (Adeniyi D. A. et al. 2016) [4]. Ding Z. studied the factors influencing college students’ online course learning behavior and found through empirical analysis that, in the theoretical model of adoption and utilization, performance expectancy, social influence, effort expectancy, and facilitating conditions all have a significant positive influence on students’ intention to learn through online courses (Ding Z. et al. 2017) [5]. As research on online education has deepened, more and more studies have begun to analyze user behavior and recommendation models. Xin M. used data mining tools to mine user behavior; the results were validated in practice, laying a good foundation for building user data analysis systems with more data sources and larger data volumes (Xin M. et al. 2017) [6]. Kim E. proposed a recommendation algorithm based on user behavior and implemented the classical KNN collaborative filtering model and a latent factor model trained by stochastic gradient descent (Kim E. et al. 2016) [7]. Lei X. proposed a mining method based on user behavior data that combines the advantages of matrix decomposition and multi-task learning and is suitable for processing massive user behavior data (Lei X. et al. 2018) [8]. Liu H. analyzed the log data of users browsing web pages and showed through comparative simulation experiments and related tests that the method is feasible (Liu H. 2017) [9]. Choudhary D. took a commercial online education platform as the research object, constructed a model of the factors influencing users’ willingness to pay for courses, and verified the reliability of the model through experiments (Choudhary D. et al. 2016) [10]. In summary, recommendation models and algorithms based on user behavior data analysis are an effective means of dealing with information overload in the era of big data: they help users find resources that match their interests in massive data, which is conducive to the development of online education.
Methodology
Construction of recommendation model based on online education user behavior preference
The item criteria set is mined with the association rule method, which is commonly used in product attribute mining. Association rules use confidence and support to describe the relationship between elements. In product attribute mining, the review sentences about an item are first segmented, and the nouns and noun phrases are extracted. The support and confidence of these nouns and noun phrases are calculated, a threshold is defined, and the words exceeding the threshold are added to the candidate frequent attribute set. Proximity rule pruning, independent support pruning, and regular word filtering pruning are then used to remove meaningless and redundant words, which finally yields the criteria set. The specific process of the algorithm is as follows. Constructing the association rule transaction file. Because the items in the transaction file are mostly nouns and noun phrases, the comment data must be segmented and tagged and the non-nouns removed. Chinese word segmentation and part-of-speech tagging are both performed with the ICTCLAS tool of the Institute of Computing Technology, Chinese Academy of Sciences, and the non-noun parts are then deleted. The text file produced by these three steps is stored in the association rule transaction database and becomes the association rule transaction file. Each sentence in the file is one row, called a transaction, and each noun is an item. Extracting frequent item sets. The minimum support and the maximum dimension of the frequent item sets are defined (here, the minimum support is 1% and the maximum dimension is 2), and the frequent item set extraction step of the Apriori algorithm is applied. The algorithm flow chart is shown in Fig. 1.
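The frequent item set extraction described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the paper: the function name `frequent_itemsets`, its parameters, and the representation of each transaction as a set of nouns are assumptions made for the example, and the minimum support is converted to a sentence count by rounding up.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def frequent_itemsets(transactions, min_support=0.01, max_size=2):
    """Return {itemset (frozenset): support count} for itemsets up to max_size.

    transactions: list of sets of nouns, one set per review sentence
    (an assumed representation of the association rule transaction file).
    """
    n = len(transactions)
    min_count = ceil(n * min_support)  # support threshold as a sentence count
    frequent = {}

    # Level 1: count single nouns.
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent.update(current)

    size = 2
    while current and size <= max_size:
        # Apriori property: a candidate is kept only if all of its
        # (size - 1)-subsets are already frequent.
        items = sorted({i for s in current for i in s})
        candidates = [
            frozenset(c) for c in combinations(items, size)
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        ]
        counts = Counter()
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        size += 1
    return frequent
```

Calling `frequent_itemsets(transactions)` with the defaults mirrors the settings stated above (minimum support of 1%, item sets of at most two nouns); the returned counts can then be passed to the pruning steps.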

Fig. 1. Flow chart of algorithm for extracting frequent item sets.

Fig. 2. MAE values of the four algorithms.

Fig. 3. RMSE values of the four algorithms.

Fig. 4. Accuracy comparison of four algorithms.
Pruning treatment. The criterion set A obtained from the above two steps needs further processing to remove redundant vocabulary. Three methods are used: proximity rule pruning, independent support pruning, and regular word filtering pruning. (1) Proximity rule pruning. The criteria set may contain multi-dimensional item sets such as “product, mobile phone”. Such word combinations are not attributes of the product itself; they are included in the criteria set only because their subsets are frequent items or frequent item sets. Words that are far apart in a sentence are usually discussing different topics. For example, “product” and “mobile phone” co-occur in many sentences, so their support and confidence are very high, but “product, mobile phone” is obviously not a criterion that can be evaluated. Proximity rule pruning is therefore used to remove such non-adjacent frequent item sets that have no semantic relation. The steps are as follows: (a) for a frequent item set m containing n nouns (n ≥ 2), scan the pre-processed sentences; (b) if the words of m appear in sentence a, record their positions $w_1, w_2, \ldots, w_n$ in sentence a and perform step (c), otherwise return to step (a); (c) if the distance between $w_i$ and $w_{i+1}$ is at most two words, m is considered adjacent in sentence a, and step (d) is performed, otherwise return to step (a); (d) increase the proximity support of m by one; (e) if m is adjacent in at least t sentences, add m to criterion set C1, where t is (number of sentences × minimum support) rounded up. (2) Independent support pruning. In the processed set C1, the non-adjacent frequent item sets without semantic relations have been deleted, but some one-dimensional item sets remain problematic. For example, the “effect” of a movie does not describe an attribute of the item accurately, because it may refer to the “animation effect” or to the “martial arts effect”. Unless “effect” is combined with other words, it cannot express an explicit attribute; the resulting ambiguity makes user evaluations hard to interpret and leads to inaccurate recommendations. Independent support pruning is therefore used to remove words that usually occur together with other words. The steps are as follows: for each one-dimensional frequent item set, count its number of occurrences a; count the number of occurrences b of the two-dimensional frequent item sets that contain it; the difference (a - b) is the independent support of the one-dimensional frequent item set. If the independent support is not less than the minimum support, the item set is added to the criterion set A. (3) Regular word filtering pruning.
Because user reviews are colloquial and casual, they contain many product pronouns, model words, and personal nouns related to users. Regular word filtering pruning is therefore needed to remove such frequently occurring regular words. Regular words that are not item attributes mainly include: words indicating the model or brand of a product, such as “Apple”; common colloquial nouns, such as “brands”; and common nouns referring to people, such as “classmates” and “mothers”.
The recommendation system must also consider how much weight each user places on each criterion, that is, the user’s preference for each criterion, since different users tend to pay attention to different criteria. For example, on a movie website some users prefer movies with a good storyline, while others prefer movies with good visual effects and care less about the storyline. In this paper, the feature space of the user model is the criteria set of the item, and the keywords are the individual criteria; the feature space is represented as $C = \{C_1, C_2, \ldots, C_k\}$. The preference model of user $u$ is built from the total utility $u(I)$ of item $I$ and the marginal utility function $u_i(r_i)$ of each criterion $C_i$. For the score $r_i$ of criterion $C_i$, with $j = 1, 2, \ldots, a_i$ and $i = 1, 2, \ldots, k$, $r_i$ belongs to a certain interval. Through user preference modeling, each user can be represented as a weight vector A.
User preference matrix
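The proximity rule pruning and independent support pruning steps described earlier in this section can be sketched in Python as follows. This is a hedged illustration, not the paper’s code: the function names, the token-list representation of sentences, the use of the first occurrence of each word as its position, the reading of “at most two words” as a position difference of at most 2, and the conversion of the minimum support into a sentence count are all assumptions.

```python
from math import ceil


def proximity_prune(itemsets, sentences, min_support=0.01, max_gap=2):
    """Keep multi-word item sets whose words appear close together often enough.

    itemsets:  iterable of frozensets containing two or more words.
    sentences: list of token lists (pre-processed review sentences).
    """
    t = ceil(len(sentences) * min_support)  # threshold t, rounded up
    kept = []
    for m in itemsets:
        proximity_support = 0
        for tokens in sentences:
            if not all(w in tokens for w in m):
                continue
            # Assumption: the first occurrence of each word gives its position.
            pos = sorted(tokens.index(w) for w in m)
            if all(b - a <= max_gap for a, b in zip(pos, pos[1:])):
                proximity_support += 1
        if proximity_support >= t:
            kept.append(m)
    return kept


def independent_support_prune(single_counts, pair_counts, n_sentences,
                              min_support=0.01):
    """Keep single words that occur often enough outside frequent pairs.

    single_counts: {word: number of occurrences a}
    pair_counts:   {frozenset({w1, w2}): number of occurrences}
    """
    min_count = ceil(n_sentences * min_support)
    kept = []
    for word, a in single_counts.items():
        b = sum(c for pair, c in pair_counts.items() if word in pair)
        if a - b >= min_count:  # independent support = a - b
            kept.append(word)
    return kept
```

With these helpers, a word such as “effect” that almost always occurs inside a pair like “animation, effect” receives an independent support near zero and is dropped, while the pair itself survives proximity pruning if its words are usually adjacent.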
Although collaborative filtering has been applied successfully in many systems, it still has a serious scalability problem: as the system develops, the amount of data grows rapidly, and the computation needed to identify neighbor users grows sharply with it. One way to improve scalability is to group users with similar preferences and then identify neighbor users within a group instead of searching the entire user population. This approach is called cluster-based collaborative filtering, and scholars have proposed a variety of such algorithms. In this paper, the K-means clustering algorithm is used to cluster users. The user preference clustering algorithm proceeds as follows: (1) Input the user preference matrix and randomly select n users as the initial user cluster centers $c_j = (c_{j1}, c_{j2}, \ldots, c_{jn})$,
This formula guarantees that the similarity approaches 0 as the distance between users increases. When two users give the same scores to all common items (i.e., the distance is 0), the similarity is 1.
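The user preference clustering step described above can be illustrated with a small, standard K-means sketch in pure Python. Only the random initialization is stated in the text; the remaining steps follow the textbook algorithm and are an assumption here, as are the function name `kmeans_users`, the dictionary input format, and the use of squared Euclidean distance.

```python
import random


def kmeans_users(pref, k, iters=100, seed=0):
    """Cluster users by their preference weight vectors.

    pref: {user_id: [w_1, ..., w_m]}, one weight per criterion (assumed format).
    Returns {user_id: cluster index}.
    """
    rng = random.Random(seed)
    users = list(pref)
    # Randomly chosen users serve as the initial cluster centres.
    centers = [list(pref[u]) for u in rng.sample(users, k)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    assign = {}
    for _ in range(iters):
        # Assignment step: each user joins the nearest centre.
        new_assign = {
            u: min(range(k), key=lambda j: sqdist(pref[u], centers[j]))
            for u in users
        }
        if new_assign == assign:
            break  # assignments stable, so the algorithm has converged
        assign = new_assign
        # Update step: each centre becomes the mean of its members.
        for j in range(k):
            members = [pref[u] for u in users if assign[u] == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign
```

Neighbor search can then be restricted to users that share a cluster index, which is what reduces the number of pairwise similarity computations.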
Algorithm simulation and evaluation
A user preference model is established by identifying user preferences from the users’ multi-criteria scores. Each user is represented by a preference weight vector.
Partial data of user preference matrix
Partial data of user preference matrix
The user model established in the user preference modeling step is used to cluster users with similar preferences. Experiments were run with k = 0 (no clustering), k = 10, k = 20, k = 30, k = 40, and k = 50. The experimental results are shown below. The user similarity algorithm of the traditional user-based collaborative filtering algorithm and three improved algorithms for multi-criteria scoring were tested: the average similarity algorithm, the minimum similarity algorithm, and the user distance similarity algorithm. Average similarity method (sim-avg): the user similarity matrix calculated by the average similarity algorithm is shown in Table 3 (k = 0).
Table 3. User similarity matrix (sim-avg)
Minimum similarity method (sim-min): the user similarity matrix calculated by the minimum similarity algorithm is shown in Table 4 (k = 0).
Table 4. User similarity matrix (sim-min)
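The average and minimum similarity methods can be sketched as two ways of aggregating per-criterion similarities. The text does not state which per-criterion measure is used, so cosine similarity is assumed here purely for illustration; the function names and the rating format (overall rating first, then the k criteria scores) are likewise hypothetical.

```python
from math import sqrt


def cosine(x, y):
    """Cosine similarity of two equal-length rating lists (0.0 if undefined)."""
    num = sum(a * b for a, b in zip(x, y))
    den = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return num / den if den else 0.0


def criterion_similarities(ru, rv):
    """Per-criterion similarities of two users over their co-rated items.

    ru, rv: {item: [r_0, r_1, ..., r_k]} with the overall rating first.
    """
    common = sorted(set(ru) & set(rv))
    if not common:
        return []
    n_crit = len(ru[common[0]])
    return [
        cosine([ru[i][c] for i in common], [rv[i][c] for i in common])
        for c in range(n_crit)
    ]


def sim_avg(ru, rv):
    """Average of the per-criterion similarities."""
    sims = criterion_similarities(ru, rv)
    return sum(sims) / len(sims) if sims else 0.0


def sim_min(ru, rv):
    """Smallest of the per-criterion similarities."""
    sims = criterion_similarities(ru, rv)
    return min(sims) if sims else 0.0
```

By construction `sim_min` never exceeds `sim_avg`, so the minimum variant is the more conservative of the two: a pair of users counts as similar only if they agree on every criterion.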
The user distance similarity method is based on the user multi-dimensional distance calculation formula. First, the distance between two users on the same item n is calculated. Here $r_i$ is the rating of user $i$ for an item, comprising an overall rating and $k$ multi-criteria scores.
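A distance-based similarity of this kind can be sketched as follows. The concrete formulas are assumptions chosen for illustration: Euclidean distance over the k + 1 ratings of an item, the mean of these distances over co-rated items, and the conversion 1 / (1 + d), which equals 1 at distance 0 and approaches 0 as the distance grows. The function names are hypothetical.

```python
from math import sqrt


def item_distance(r_u, r_v):
    """Euclidean distance between two rating vectors for the same item."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(r_u, r_v)))


def user_distance(ru, rv):
    """Mean item distance over co-rated items, or None if there are none.

    ru, rv: {item: [r_0, r_1, ..., r_k]} with the overall rating first.
    """
    common = set(ru) & set(rv)
    if not common:
        return None
    return sum(item_distance(ru[i], rv[i]) for i in common) / len(common)


def sim_dis(ru, rv):
    """Similarity derived from distance (assumed form: 1 / (1 + d))."""
    d = user_distance(ru, rv)
    return 0.0 if d is None else 1.0 / (1.0 + d)
```

Other distance metrics, such as Manhattan or Chebyshev distance, could be substituted in `item_distance` without changing the rest of the sketch.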
Four algorithms were tested: the traditional user-based collaborative filtering algorithm (user-CF) and three improved collaborative filtering algorithms, namely the multi-criteria recommendation algorithm using minimum similarity (MC-CF-min), the multi-criteria recommendation algorithm using average similarity (MC-CF-avg), and the multi-criteria recommendation algorithm using user distance similarity (MC-CF-dis). Their performance was evaluated with the mean absolute error (MAE), root mean square error (RMSE), accuracy, recall, and F-value.
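The evaluation indicators can be computed as in the following sketch, which uses the standard definitions. Treating the “accuracy” indicator as precision over the recommended list, and the F-value as the harmonic mean of precision and recall, is an assumption; the function names are illustrative.

```python
from math import sqrt


def mae(pred, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def rmse(pred, actual):
    """Root mean square error between predicted and actual ratings."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def precision_recall_f(recommended, relevant):
    """Precision, recall, and F-value of a recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f
```

Lower MAE and RMSE indicate more accurate rating predictions, while higher precision, recall, and F-value indicate better recommendation lists.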
During the experiments, user preference clustering effectively reduced the running time of the algorithm, indicating that it can alleviate the scalability problem of the recommendation algorithm. The experimental results show that, as the number of clusters increases within a certain range, the MAE and RMSE values of the four algorithms gradually decrease. Compared with the traditional user-based collaborative filtering algorithm (user-CF), the MAE and RMSE values of the minimum similarity multi-criteria recommendation algorithm (MC-CF-min) and the average similarity multi-criteria recommendation algorithm (MC-CF-avg) decrease slightly, while those of the multi-criteria recommendation algorithm using user distance similarity (MC-CF-dis) decrease substantially, indicating that the user distance similarity algorithm has the smallest error among the similarity algorithms. Using such a personalized recommendation algorithm based on user preferences in the recommendation system can therefore effectively improve recommendation quality.

Fig. 5. Recall rate comparison of four algorithms.

Fig. 6. F-value comparison of four algorithms.
The experimental results show that, as the number of clusters increases within a certain range, the accuracy of the four algorithms gradually increases. Compared with the traditional user-based collaborative filtering algorithm (user-CF), the accuracy of the minimum similarity multi-criteria recommendation algorithm (MC-CF-min) and the average similarity multi-criteria recommendation algorithm (MC-CF-avg) is essentially the same or slightly higher, while the accuracy of the multi-criteria recommendation algorithm using user distance similarity (MC-CF-dis) is greatly improved. Among the similarity algorithms, the user distance similarity algorithm therefore has the highest accuracy. The recall rates of the four algorithms were also calculated, and the results are shown in Fig. 5. As the number of clusters increases within a certain range, the recall rate of the four algorithms gradually increases. Compared with user-CF, the recall rate of MC-CF-min and MC-CF-avg is essentially the same or slightly higher, while the recall rate of MC-CF-dis is greatly improved. Among the similarity algorithms, the user distance similarity algorithm therefore has the highest recall rate.
The F-values follow the same pattern. As the number of clusters increases within a certain range, the F-values of the four algorithms gradually increase. Compared with user-CF, the F-value of MC-CF-min and MC-CF-avg is essentially the same or slightly higher, while the F-value of MC-CF-dis increases substantially. Among the similarity algorithms, the user distance similarity algorithm has the highest F-value, so using the personalized recommendation algorithm based on user preference in the recommendation system can effectively improve recommendation quality.
Conclusion
To address the problem of information overload, scholars began to study techniques that present different information to different users according to their preferences. The traditional collaborative filtering recommendation algorithm considers only a single overall “user-item” score, but scholars have found that users often consider multiple aspects of an item when choosing. Considering only the overall score cannot accurately identify a user’s preferences, so many scholars have tried to incorporate these multiple aspects into the recommendation algorithm, and multi-criteria recommendation methods based on user preference have emerged. To identify user preferences more accurately and improve recommendation quality, this paper studies a personalized recommendation system based on user preferences. The main contents of the research are as follows. The k-means algorithm is used to cluster users by their criteria preferences, so that users with similar preferences are grouped together, which alleviates the scalability problem of the recommendation algorithm. The user’s preferences are identified, the similarity algorithm of the traditional collaborative filtering recommendation algorithm is optimized, user similarity is calculated, and recommendations are generated. Finally, a simulation experiment is carried out on the proposed personalized recommendation algorithm based on user preference. The experimental results show that the algorithm can effectively improve recommendation quality.
Acknowledgments
This work was supported by Chongqing Big Data Engineering Laboratory for Children, Chongqing Electronics Engineering Technology Research Center for Interactive Learning, Project of Science and Technology Research Program of Chongqing Education Commission of China (No. KJZDK201801601).
