Abstract
Social tagging has revolutionized the social and personal experience of users across numerous web platforms by enabling them to organize, manage, share and search web data. The extensive amount of information generated by tagging systems can be used for recommendation purposes. However, because users create social tags in an unregulated manner, the tags can be noisy and unreliable, so exploiting them for recommendation is a nontrivial task. In this study, a new recommender system is proposed based on the similarities between user and item profiles. User and item profiles are generated by discovering tag patterns that users apply frequently. These tag patterns are categorized into relevant and irrelevant patterns, which represent diverse user preferences in terms of likes and dislikes. Furthermore, a method is presented for translating these tag-based profiles into semantic profiles by determining the underlying meaning(s) of the tags and mapping them to semantic entities belonging to external knowledge bases. To alleviate the cold start and overspecialization problems, semantic profiles are enriched in two phases: (a) using a semantic spreading mechanism and then (b) inheriting the preferences of similar users. Experiments indicate that this approach not only provides a better representation of user interests but also achieves better recommendation results than existing methods. The performance of the proposed method is also investigated under the cold start problem; the results confirm that it can remedy the problem for early adopters, hence improving overall recommendation quality.
1. Introduction
Web 2.0 technologies have dramatically transformed the way people collaborate and share information. Owing to a surge in user participation, fuelled by fierce competition among social networking services (folksonomies, blogs, wikis), it has become increasingly difficult for users to find the most attractive content amid information overload [1]. To overcome this problem, recommender systems have emerged as a necessary service for recommending to users the items they are most likely to like [2]. These systems rely on data about user interests, preferences and needs to provide proper recommendations. This information, referred to as a user model or user profile, is crucial to the success of recommender systems. In a typical recommender system, user profiles are built from the ratings users give to a set of items; the rating information maps user–item pairs onto a set of numerical values.
Furthermore, the huge amount of information collected has become a rich resource for investigating, understanding and exploiting knowledge about user characteristics, preferences and needs in order to build accurate user profiles. Collaborative tagging systems are an example of software applications that take advantage of this kind of data. In these systems, users create or upload content (items), annotate it with freely chosen identifiers (tags) and share it with other users. The whole set of tags constitutes an unstructured knowledge classification scheme commonly known as a folksonomy [3, 4]. Personal tags can be used to express user characteristics, such as favourite book subjects on LibraryThing, movie preferences on IMDb or music tastes on Last.fm. Therefore, tagging information can be used to create reasonably accurate user profiles that complement the ratings users give to items. In the last few years, the exploitation of user tagging information has received much attention in research communities, with the intention of complementing existing recommendation systems [5–7].
The focus of this study is on a new method for generating user and item profiles based on both tagging and rating information. User and item profiles are constructed so that a user profile represents a user’s preferences and an item profile specifies an item’s characteristics. These profiles are initially built and extended based on frequently used tag patterns discovered through pattern mining techniques. In order to represent the users’ diverse preferences in terms of both likes and dislikes, patterns in user profiles are classified under two separate sets: a set of relevant patterns and a set of irrelevant patterns.
Nevertheless, since social tags are generated by users in an uncontrolled manner, they can be noisy and unreliable; thus, utilizing them in the construction of user profiles is a nontrivial task [8, 9]. In response to this challenge, an approach is presented here for finding the underlying meanings of the tags and mapping them to semantic entities belonging to external knowledge bases, namely WordNet and Wikipedia. Additionally, this mapping process exploits ontologies created within the W3C Linking Open Data initiative. In this manner, the tag-based profiles are upgraded to semantic profiles by replacing tags with their corresponding ontological concepts.
The cold start problem and the overspecialization issue are mitigated by enriching semantic profiles through a semantic spreading mechanism and preference sharing between similar users. The cold start problem occurs when a system lacks the information about new users that it needs to make high-quality recommendations [10]. In this case, users receive low-quality recommendations that may lead them to undervalue the system and eventually abandon it. This necessitates enriching user profiles so that high-quality recommendations can be made for new users.
The overspecialization issue refers to a situation where the user profile is limited to the available information about the user, which makes it difficult to recommend fresh items other than those the user has already experienced [2]. Extending the user profile through its relationships with other, similar profiles helps to recommend novel items.
The key contributions of this study can be summarized as follows:
A new approach is described for building user and item semantic profiles based on frequently occurring semantic tag patterns (referred to as topics of interest), where user topics are classified under relevant or irrelevant topics to determine user likes and dislikes.
A social tag processing architecture is presented for filtering user-generated tags and mapping them onto ontology concepts, which helps deal with the noisy and inconsistent nature of social tags. In addition, a new disambiguation method is proposed to address tags that have more than one matching concept.
A new similarity metric, referred to as the topic-based similarity metric, for measuring the similarity between two sets of topics, is discussed.
A method for a two-phase enrichment process is detailed, in order to overcome sparsity in profiles, alleviating both the cold start and overspecialization issues. The first phase involves enriching user and item profiles by a preference-spreading mechanism, while the second phase exploits the topic-based similarity metric to further enrich user profiles through their relations to similar neighbours.
A recommendation system is presented that recommends a ranked list of relevant items according to the topic-based similarity between the target user profile and item profiles. We demonstrate the effectiveness of our approach in terms of improving recommendation quality and dealing with cold start users.
The rest of this paper is organized as follows: in Section 2, recent studies applying social tagging and ontological knowledge-bases to recommender systems are reviewed. In Section 3, a detailed description of the proposed approach is provided in seven subsections. In Section 4, we present the performance of our approach through experimental testing. Finally, conclusions and future work are discussed in Section 5.
2. Related works
A thorough survey of the types of recommender systems is presented in Bobadilla et al. [11]. The two main types of recommender systems are content-based and collaborative filtering. Content-based recommender systems base their recommendations on the types of items the user preferred in the past. The quality of content-based recommendations depends directly on the quality of the content descriptions of the items. Decision trees, neural nets and vector-based representations have been applied to represent the user profile [12]. The major drawback of content-based systems is their inability to offer out-of-the-box recommendations, because they recommend only items similar to those the user has preferred in the past.
Collaborative filtering techniques simulate the word-of-mouth phenomenon, whereby users are interested in the items preferred by similar users. The quality and density of rating information directly determine the success of these systems. Unlike content-based systems, collaborative filtering systems do not require complex item content descriptions; they are the most popular recommender systems and are employed by well-known websites such as Amazon and MovieLens. However, the major drawback of collaborative filtering systems is data sparsity. Since the number of items available on e-commerce sites is high, a typical user can experience and interact with only a small fraction of them. Such sparse user profiles can hinder the identification of similar users and lead to lower recommendation quality.
Most successful recommender systems adopt hybrids of various methodologies in order to combine their best features. Burke details various hybridization methods that have been adopted or could potentially be applied [13]. Moreover, additional data sources, wherever available, may be used to enhance the quality of recommendations. Social tagging is an important component of many services under the Web 2.0 label [14]. Zhang et al. [15] provide an overview of social tagging systems that can be used to extend the capabilities of recommender systems. In the following, a number of recent studies that take user tagging activity into account within a recommender system are briefly presented.
Gedikli and Jannach investigate how user-provided tagging data can be used to improve the quality of recommender systems [16]. The authors introduce the concept of user- and item-specific tag preferences, where users attach sentiments to tags as a powerful means of expressing in detail which features of an item they particularly like or dislike. They present a new recommendation scheme that exploits tag preference data to improve recommendation accuracy.
Jin and Chen introduce a new trust-based recommender system for social tagging networks and design a method for constructing a user–item rating matrix from information browsed or searched by users [17]. The authors use tags to compute the similarity between users or items, and they propose a method for constructing a Top-K recommender system in the network, with trust values computed from users’ interest similarity.
Durao and Dolog present a tag-based recommender system that suggests similar Web pages based on the similarity of their tags from a Web 2.0 tagging application [18]. The authors extend the basic similarity calculus with external factors such as tag popularity, tag representativeness and the affinity between user and tag.
Anand and Mampilli propose an approach to infer the degree of genre presence in a movie by examining the various tags conferred on it by various users [19]. The authors exploit tags to guide the genre degree determination. Moreover, they utilize fuzzy logic techniques to derive gradual representation as well as construct user profiles. The fuzzy user and object representations proposed in their study can be utilized in the design of content-based and collaborative recommender systems. Experimental evaluations establish the effectiveness of their proposed approach as compared with other baselines.
Linked Open Data has been applied as a means of enriching content information and of incorporating domain knowledge, and it has been shown to improve the effectiveness of recommendations [20]. Mirizzi et al. propose a Facebook application that semantically recommends movies to users by leveraging the knowledge within Linked Data and the information elicited from their profiles [20]. The authors exploit social knowledge bases (e.g. DBpedia) to detect semantic similarities among movies. These similarities are computed by a semantic version of the classical Vector Space Model applied to semantic datasets.
Similar to our approach, Cantador et al. [21] propose a methodology for filtering tags based on their meaning; in fact, the filtering method proposed in our study is motivated by their methodology, in which possible misspellings and compound nouns are discovered through the Google ‘did you mean’ mechanism. The authors link meaningful tags to the corresponding Wikipedia entries by exploiting WordNet, Wikipedia and Google.
Cantador et al. present a mechanism to automatically filter and classify raw tags in a set of purpose-oriented categories [22]. The authors identify the underlying meanings (concepts) of the tags through the exploitation of ontologies and assign these concepts to content- and context-based categories. Moreover, they determine subjective and organizational tags based on natural language processing heuristics.
Davoodi and Fatemi use Open Directory Project (ODP) data as external knowledge about web pages, in addition to the tagging activities of users on a social bookmarking site [23]. The authors have designed a content-based recommender system that recommends the most relevant web pages to each user based on the user’s profile and on information about web pages gathered from ODP as implicit data.
3. The proposed approach
A visual representation of the overall process of the recommendation approach in this study is presented in Figure 1. Each one of the numbers in the diagram corresponds to one of the seven steps outlined below.

Figure 1. The overall process.
In step 1, owing to the noisy and inconsistent nature of social tags, they are passed through a filtering process. In the second step, the filtered tags from the previous step are mapped onto ontology concepts. The output of this step is stored in a repository, referred to as the semantic tag repository, which contains filtered tags along with the corresponding mapped concepts.
In step 3, tag-based profiles are built based on the correlation between users, items and tags as well as the rating of users on items. Tag-based user profiles represent user preferences and tag-based item profiles represent item characteristics. In step 4, the tag-based profiles are upgraded to semantic profiles by exploiting the information in the semantic tag repository produced in step 2.
In the semantic enrichment process (step 5), the semantic profiles from the previous step are enriched through a preference spreading mechanism in order to overcome sparsity in user and item profiles. In step 6, semantic user profiles are further enriched through their neighbours. Collaborative enrichment rests on the assumption that a user is likely to prefer topics similar to those discovered for his neighbours. This enrichment process is particularly effective for users who do not have sufficient topics in their profiles.
Finally, in step 7 items with the most similar profiles to the target user profile are recommended to the target user. In the following subsections, each step will be explained in detail.
3.1. Filtering
Since social tags are generated by users in an uncontrolled manner, they can be noisy and unreliable. For instance, users apply singular and plural forms interchangeably (blog, blogs, blogging), misspell tags, add prepositions, pronouns, adjectives or adverbs to the main tag (to read, magnificent view), use morphological variants, synonyms and acronyms, and write compound nouns with different separators (e.g. New York, New_York, New-York and NewYork).
Hence, comparing raw social tags leads to mismatches and, eventually, to the loss of correlation information. For example, if a number of relevant pages are annotated with the tags San Francisco, San_Francisco and SanFrancisco, the correlation between these tags, and consequently between the related pages, will be lost. For this reason, tags have to be filtered and transformed into a set of equivalent terms.
Here, we present a social tag processing and filtering architecture inspired by previous works [24, 25]. This filtering architecture uses the Google ‘did you mean?’ service and an external knowledge base. In this study, YAGO2 [26] is used as the external knowledge base. YAGO2 is a huge semantic knowledge base, supported by the W3C Linking Open Data (LOD) initiative and derived from Wikipedia, WordNet and GeoNames. As of October 2012, YAGO contains over 10 million entities (such as persons, organizations and cities) and more than 120 million facts about these entities.
The whole filtering process, in which a set of raw tags is transformed into a set of filtered tags and a set of rejected tags, is illustrated in Figure 2. In the Lexical Filter (step 1), a number of filtering operations are applied. First, tags that are too short or too long are removed. Next, special characters (such as accents, diaeresis and the caret symbol) are converted to a base form. Then, tags containing numbers whose global tag frequency is below a certain threshold are rejected. Finally, common stop-words (such as pronouns, articles, prepositions and conjunctions) are removed. After lexical filtering, tags are passed on to step 2, the spellchecker and compound noun filter. In step 2, if an incoming tag has an exact match in YAGO2, it is passed directly to the next step to avoid unnecessary processing. Otherwise, the tag is considered a possible misspelling or compound noun. To resolve these cases, an efficient algorithm is applied that uses the Google ‘did you mean?’ functionality to split compound nouns written without spaces and to correct misspellings. For further details regarding this algorithm, refer to Szomszor et al. [24].

Figure 2. The filtering process.
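The lexical filter (step 1) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the length bounds, the numeric-tag frequency threshold, the lower-casing and the tiny stop-word list are hypothetical choices rather than values used in this study, and the YAGO2 lookup and the Google ‘did you mean?’ step are not shown. Separators such as `_` and `-` are normalized to spaces so that New_York and New-York collapse to the same form; NewYork still needs the compound-noun step.

```python
import re
import unicodedata

# Small, illustrative stop-word list (hypothetical; a real filter would use a fuller list).
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "or", "in", "on", "for", "my", "it"}


def strip_accents(tag):
    """Decompose accented characters and drop combining marks (e.g. 'café' -> 'cafe')."""
    decomposed = unicodedata.normalize("NFKD", tag)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lexical_filter(tags, tag_frequency, min_len=2, max_len=25, min_numeric_freq=5):
    """Return (filtered, rejected). All thresholds are hypothetical defaults."""
    filtered, rejected = [], []
    for raw in tags:
        tag = raw.strip().lower()
        # 1. drop tags that are too short or too long
        if not min_len <= len(tag) <= max_len:
            rejected.append(raw)
            continue
        # 2. convert special characters (accents, diaeresis, caret) to a base form
        tag = strip_accents(tag).replace("^", "")
        # 3. reject tags containing digits whose global frequency is too low
        if any(ch.isdigit() for ch in tag) and tag_frequency.get(raw, 0) < min_numeric_freq:
            rejected.append(raw)
            continue
        # 4. normalize separators and drop stop-words ('to read' -> 'read')
        words = [w for w in re.split(r"[\s_\-]+", tag) if w and w not in STOP_WORDS]
        if not words:
            rejected.append(raw)
            continue
        filtered.append(" ".join(words))
    return filtered, rejected
```

For example, `lexical_filter(["to read", "San_Francisco", "café", "2007", "x", "Web2.0"], {"2007": 2, "Web2.0": 50})` keeps `read`, `san francisco`, `cafe` and `web2.0`, and rejects `2007` and `x`.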
The final step is the unification process (step 3), in which synonyms and morphologically similar tags are reduced to a single representative tag. Here, the ‘hasPreferredMeaning’ fact provided by YAGO2 is used to identify pairs of synonyms. In addition, a custom singularization algorithm and the stemming functions provided by the Snowball library are exploited to merge morphologically similar tags into a single tag.
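The unification step can be illustrated with a hypothetical synonym table standing in for YAGO2’s ‘hasPreferredMeaning’ facts and a deliberately naive singularizer standing in for the custom singularization algorithm and Snowball stemming, neither of which is reproduced here. A form such as blogging would require a stemmer and is not merged by this sketch.

```python
# HYPOTHETICAL synonym table standing in for YAGO2 'hasPreferredMeaning' facts.
PREFERRED = {"film": "movie", "flick": "movie"}


def singularize(word):
    """Deliberately naive English singularization (illustrative only)."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"          # stories -> story
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]                # blogs -> blog, films -> film
    return word                         # class, movie unchanged


def unify(tags):
    """Map every filtered tag to a single representative tag."""
    mapping = {}
    for tag in tags:
        base = " ".join(singularize(w) for w in tag.split())
        mapping[tag] = PREFERRED.get(base, base)
    return mapping
```

With these rules, `unify(["blogs", "blog", "films", "movie", "stories", "class"])` reduces the six tags to the four representatives blog, movie, story and class.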
3.2. Mapping social tags onto ontology concepts
After the filtering process, the filtered tags are guaranteed to have at least one exactly matching concept in YAGO2; therefore, mapping them to ontology concepts is straightforward. However, ambiguity arises when a tag has more than one matching concept: a term may have several meanings, and only one should be chosen for the corresponding social tag. For example, the following concepts and YAGO2 classes are associated with the social tag java:
Java_(town)
Java_Virginia
Java_South_Dakota
Java_New_York
Java_(cigarette)
Java_(chicken)
Java_(band)
Java_(composition)
Java_(programming language)
Java_(software platform)
Java_(board game)
Java_(dance)
…
Here, each tag, together with all the possible concepts mapped to it, is simply stored in a table, referred to as the semantic tag repository, for future use. In Section 3.4, we present a disambiguation technique in which the most appropriate concept for each tag is selected as the mapping concept.
3.3. Tag-based profiles
In this section, we describe our approach to building tag-based user and item profiles that is derived from a set of users, tags, items and their transactions. Before further details are discussed, notations and primary definitions are introduced.
Let
Let
3.3.1. User tag-based profile
While the incentives for using tags in user modelling are diverse, tags are often used to describe the content of an item at a higher level of abstraction. Consequently, in this study user profiles are built from user-generated tags.
When a set of relevant items
Owing to the variety of items chosen by users, a distinct dataset is selected for every one of the users during the mining process; therefore, the frequent tag patterns for each user are extracted from the set of items that is annotated by him. The research literature in data mining defines frequent patterns as patterns that occur at least as frequently as prescribed by a predetermined threshold, commonly referred to as the minimum support [27]. In this study, a frequent tag pattern for each user is defined as a set of tags that the user applies frequently, beyond a certain threshold to annotate either a set of his relevant items
The user tag-based profile, denoted as
The relevant patterns
where
In the same manner, irrelevant patterns
where
In order to determine the level of user interest or lack of it in a relevant or irrelevant pattern, each pattern should be weighted in its own set. In this regard, the weight of each pattern
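The per-user mining step can be sketched with a level-wise, Apriori-style search. In this minimal sketch each transaction is the tag set a user assigned to one item, `min_support` defaults to the 0.2 used in Section 4, and `max_size` and `like_threshold` (the rating cut-off separating relevant from irrelevant items) are hypothetical parameters; the functions return raw supports rather than the pattern weights defined above. The same mining function could be applied to the tag sets that different users assigned to one item to obtain an item tag-based profile.

```python
from itertools import combinations


def frequent_tag_patterns(transactions, min_support=0.2, max_size=3):
    """Apriori-style mining: return {pattern: support} for every tag set
    that appears in at least `min_support` of the transactions."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    if n == 0:
        return {}

    def support(pattern):
        # fraction of transactions (items) whose tag set contains the pattern
        return sum(1 for s in sets if pattern <= s) / n

    frequent = {}
    candidates = {frozenset([tag]) for s in sets for tag in s}
    size = 1
    while candidates and size <= max_size:
        level = {}
        for p in candidates:
            s = support(p)
            if s >= min_support:
                level[p] = s
        frequent.update(level)
        size += 1
        # next candidates: tag sets whose every (size - 1)-subset is frequent
        tags = sorted({t for p in level for t in p})
        candidates = {
            frozenset(c)
            for c in combinations(tags, size)
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        }
    return frequent


def user_tag_profile(annotations, min_support=0.2, like_threshold=4.0):
    """annotations: (rating, tags) pairs for the items one user has tagged.
    like_threshold is a HYPOTHETICAL cut-off separating relevant from
    irrelevant items; the paper's own criterion is not reproduced here."""
    relevant = [tags for rating, tags in annotations if rating >= like_threshold]
    irrelevant = [tags for rating, tags in annotations if rating < like_threshold]
    return (frequent_tag_patterns(relevant, min_support),
            frequent_tag_patterns(irrelevant, min_support))
```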
3.3.2. Item tag-based profile
Similar to the user tag-based profiles, the item tag-based profiles are built from the tags users generate to annotate items. As mentioned before, if several users frequently annotate an item with the same group of tags, these tags form a particular association that represents the content of the item at a higher level of abstraction. For example, if a web page is frequently annotated with the tags ‘WPF’, ‘.net’, ‘programming’ and ‘guide’, it can be concluded that the page is most likely a guide to WPF programming in the .NET framework.
Analogous to the user profile, the item profile is based on frequent tag patterns, where frequent-pattern mining techniques are employed to extract tag patterns that occur in high frequency. Here, in order to generate an item tag-based profile (
where
In order to determine the importance of each pattern in the item profile, the patterns in the item profile are weighted in a similar manner to that of the user profile. The weight of each pattern
3.4. Semantic profiles
In this step, the tag-based profiles from the previous step are upgraded to semantic profiles by exploiting the information stored in the semantic tag repository produced in Section 3.2. To upgrade a tag-based profile to a semantic profile, all tags in each tag pattern are replaced with their corresponding ontology concepts, according to the information in the semantic tag repository. The set of concepts corresponding to the tags of a certain tag pattern is referred to as a topic of interest, or simply a topic (Figure 3).

Figure 3. Tag-based and semantic profiles.
As shown in Figure 3, the semantic user profile for a certain user u, referred to as
A set of relevant topics for a certain user
where
The amount of user interest or lack of interest in a relevant or irrelevant topic
With regards to the item semantic profile, there is a one-to-one correspondence between the topics in an item
The item semantic profile for an item
The weight of a certain topic
As mentioned before, each tag in a tag pattern is mapped to a concept in the ontology, according to the information available in the semantic tag repository. In this process, if a tag is mapped to more than one concept, we select the concept that is most similar to the concepts mapped to the other tags in the same tag pattern. In other words, for each tag pattern, we select the concepts for mapping such that the sum of similarities among the selected concepts is maximized. For this purpose, measuring the similarity between concepts in the ontology is essential. A number of ontology-based similarity metrics have been developed for this purpose [28–33]. In this article, we use the similarity metric presented in Sánchez et al. [34], which relies on the taxonomical features available in our knowledge base and has a reasonably low computational cost; according to its authors, it achieves high accuracy while avoiding several limitations observed in other metrics. According to this metric, the similarity between concepts
where
where
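The disambiguation rule, choosing one concept per tag so that the sum of pairwise similarities within a tag pattern is maximal, can be sketched as an exhaustive search. The taxonomy, the repository contents and the similarity function below are hypothetical; in particular, `taxonomic_similarity` is a simple Jaccard overlap of ancestor sets used as a stand-in, not the Sánchez et al. [34] metric. Exhaustive search is exponential in the pattern length, which is tolerable here only because tag patterns are short; for a single-tag pattern the first candidate is returned.

```python
from itertools import combinations, product

# HYPOTHETICAL toy taxonomy: concept -> set of its ancestors (including itself).
ANCESTORS = {
    "Java_(programming_language)": {"Java_(programming_language)", "programming_language", "computer_science", "entity"},
    "Java_(town)": {"Java_(town)", "town", "location", "entity"},
    "Python_(programming_language)": {"Python_(programming_language)", "programming_language", "computer_science", "entity"},
    "Python_(snake)": {"Python_(snake)", "snake", "animal", "entity"},
}

# HYPOTHETICAL semantic tag repository: tag -> candidate concepts.
REPOSITORY = {
    "java": ["Java_(town)", "Java_(programming_language)"],
    "python": ["Python_(snake)", "Python_(programming_language)"],
}


def taxonomic_similarity(c1, c2):
    """Stand-in metric: Jaccard overlap of ancestor sets (NOT the Sánchez et al. formula)."""
    a1, a2 = ANCESTORS[c1], ANCESTORS[c2]
    return len(a1 & a2) / len(a1 | a2)


def disambiguate(pattern, repository, sim=taxonomic_similarity):
    """Pick one concept per tag so that the sum of pairwise similarities is maximal."""
    tags = sorted(pattern)
    best, best_score = None, float("-inf")
    # exhaustive search over every combination of candidate concepts
    for choice in product(*(repository[t] for t in tags)):
        score = sum(sim(a, b) for a, b in combinations(choice, 2))
        if score > best_score:
            best, best_score = choice, score
    return dict(zip(tags, best))
```

For the pattern {java, python}, the two programming-language concepts share the most ancestors, so `disambiguate({"java", "python"}, REPOSITORY)` selects them over the town and the snake.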
3.5. Semantic enrichment
As mentioned at the beginning of Section 3, we recommend items to the target user based on the similarity between his semantic profile and item semantic profiles; therefore, measuring the similarity between two semantic profiles is especially important. In this study, the similarity between two semantic profiles is based on the similarity between their topics, and the similarity between topics is in turn calculated from their common concepts. However, because tag patterns contain few tags, topics generally comprise a small number of concepts; consequently, there are numerous pairs of topics with no common concept, whose similarity therefore cannot be meaningfully estimated.
To overcome this problem, we propose a semantic spreading mechanism, which expands the initial set of concepts stored in the topics of semantic profiles through explicit semantic relations with other concepts in the ontology (Figure 4). In order to apply this mechanism, the intensity of the concepts in a topic is defined by their weights in that topic;

Figure 4. Semantic enrichment.
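A generic spreading-activation pass, in the spirit of the semantic spreading mechanism described above, might look as follows. The relation graph, the decay factor, the activation threshold and the hop limit are all hypothetical, and a concept reached along several paths keeps its highest activation; the exact propagation formula used in this study is not reproduced here.

```python
def spread(topic, relations, decay=0.5, threshold=0.1, max_hops=2):
    """Expand a weighted topic {concept: weight} along ontology relations.

    Activation is multiplied by `decay` at each hop; values below `threshold`
    are not propagated, and a concept keeps the highest activation it receives.
    """
    expanded = dict(topic)
    frontier = dict(topic)
    for _ in range(max_hops):
        next_frontier = {}
        for concept, weight in frontier.items():
            for neighbour in relations.get(concept, ()):
                activation = weight * decay
                if activation < threshold:
                    continue
                if activation > expanded.get(neighbour, 0.0):
                    expanded[neighbour] = activation
                    next_frontier[neighbour] = activation
        frontier = next_frontier
    return expanded


# HYPOTHETICAL explicit relations between ontology concepts.
RELATIONS = {
    "web2.0": ["social_tagging"],
    "semanticweb": ["ontology", "rdf"],
    "ontology": ["owl"],
}
enriched = spread({"web2.0": 1.0, "semanticweb": 0.8}, RELATIONS)
```

Here `enriched` gains social_tagging (0.5), ontology and rdf (0.4) and, at the second hop, owl (0.2), so two topics that previously shared no concept may now overlap.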
3.6. Collaborative enrichment
A user is likely to prefer topics similar to those discovered for a set of users similar to him, commonly referred to as neighbours [37]. For example, if the set of relevant topics of a user contains a topic including the concepts ‘web2.0’ and ‘semanticweb’, he is likely to be interested in topics including ‘web2.0’, ‘semanticweb’ and ‘ontology’ that appear in the sets of relevant topics of similar users. This enrichment process is particularly effective for users who do not have sufficient topics in their profiles.
To enrich a user profile collaboratively, the k users most similar to the target user are first selected as nearest neighbours, and the target user profile is then enriched by analysing the neighbour profiles. In this study, since semantic user profiles consist of two sets, relevant and irrelevant topics, two separate sets of neighbours, denoted as relevant neighbours and irrelevant neighbours, are identified. The user’s relevant and irrelevant topics are then enriched by analysing the relevant and irrelevant neighbour profiles, respectively.
3.6.1. The topic-based similarity metric
In order to identify the relevant/irrelevant neighbours, the similarity between two weighted sets of topics must be measured. To this end, a new metric for measuring the similarity between two sets of topics, referred to as the topic-based similarity metric, is presented in this study. The metric is based on the amount of overlap between topics and extends the Jaccard similarity metric, a well-known practical metric for measuring the similarity between two sets [38]. The proposed topic-based similarity metric measures the similarity between two sets in which each element is itself a set. The initial equation for measuring the similarity between two semantic profiles is as follows:
where
As mentioned before, the weight of a topic determines the level of user interest, or lack thereof, in that topic; therefore, the weights of the topics in each set should be included in the similarity calculation. For this purpose, the initial similarity equation is extended to take topic weights into account as follows:
The Jaccard similarity metric is a metric for measuring the similarity between two sets, and is equal to the cardinality of the intersection of the sets divided by the cardinality of the union of the sets. According to Sections 3.4 and 3.5, a topic is a weighted set of ontology concepts; therefore, we extend the Jaccard similarity to take into account the weights of the concepts in each topic. The similarity between two topics is measured by the following equation:
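One plausible way to realize a weighted, Jaccard-style similarity between topics and between sets of topics is sketched below; it is an illustrative sketch, not the equations of this section. Topic similarity uses the common weighted Jaccard form (sum of minimum weights over sum of maximum weights), and the profile-level aggregation, a symmetric, topic-weighted best-match average, is a hypothetical choice. A profile is represented as a list of `(topic, weight)` pairs, with each topic a `{concept: weight}` dictionary.

```python
def weighted_jaccard(t1, t2):
    """Weighted Jaccard between two topics given as {concept: weight} dicts."""
    concepts = t1.keys() | t2.keys()
    num = sum(min(t1.get(c, 0.0), t2.get(c, 0.0)) for c in concepts)
    den = sum(max(t1.get(c, 0.0), t2.get(c, 0.0)) for c in concepts)
    return num / den if den else 0.0


def _directed_similarity(p1, p2):
    """Weight-averaged best match of each topic in p1 against the topics in p2."""
    total = sum(w for _, w in p1)
    if total == 0 or not p2:
        return 0.0
    best = (max(weighted_jaccard(t, u) for u, _ in p2) for t, _ in p1)
    return sum(w * b for (_, w), b in zip(p1, best)) / total


def profile_similarity(p1, p2):
    """Symmetric similarity between two weighted sets of topics (hypothetical aggregation)."""
    return 0.5 * (_directed_similarity(p1, p2) + _directed_similarity(p2, p1))
```

For instance, the topics `{"web2.0": 1.0, "semanticweb": 0.5}` and `{"web2.0": 0.5, "semanticweb": 0.5, "ontology": 0.5}` share a minimum-weight mass of 1.0 out of a maximum-weight mass of 2.0, giving a topic similarity of 0.5.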
3.6.2. Enriching user profiles collaboratively
Before a target user profile is enriched, the neighbours of the target user must be determined. To find the relevant/irrelevant neighbours of the target user u, the similarity between the target user’s relevant/irrelevant topics and other users’ relevant/irrelevant topics is calculated using the topic-based similarity metric introduced in this study. Here, the k users most similar to the target user are selected as the relevant/irrelevant neighbours, referred to as
After the set of the nearest neighbours for a certain user u is identified, his semantic profile
To enrich the relevant topics of a certain user u, first the most similar neighbour user
For an effective enrichment, only the specific topics that have a higher weight than that of a general topic are of concern. For example, if the weights for
To put it more precisely, a topic
there is a topic
the weight of
The weight of new topic
where
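Neighbour selection and a simplified enrichment rule can be sketched as follows. This sketch assumes profiles stored as `{frozenset_of_concepts: weight}` dictionaries, uses a plain Jaccard over topic sets as a stand-in for the topic-based similarity metric, ignores zero-similarity users, and imports each missing neighbour topic with its weight scaled by the neighbour’s similarity, which is a hypothetical rule; the additional conditions on specific versus general topics described above are not reproduced. In the proposed approach it would be run separately on the relevant and the irrelevant topic sets.

```python
import heapq


def jaccard_over_topics(p1, p2):
    """Stand-in similarity: Jaccard over the sets of topics (weights ignored)."""
    a, b = set(p1), set(p2)
    return len(a & b) / len(a | b) if a | b else 0.0


def top_k_neighbours(target, profiles, k, sim=jaccard_over_topics):
    """Return [(user, similarity)] for the k most similar users with similarity > 0."""
    scored = [(sim(profiles[target], p), u) for u, p in profiles.items() if u != target]
    return [(u, s) for s, u in heapq.nlargest(k, scored) if s > 0]


def enrich(target, profiles, k, sim=jaccard_over_topics):
    """Import topics the target lacks from its neighbours (hypothetical weighting)."""
    enriched = dict(profiles[target])
    for user, s in top_k_neighbours(target, profiles, k, sim):
        for topic, weight in profiles[user].items():
            if topic not in profiles[target]:
                enriched[topic] = max(enriched.get(topic, 0.0), s * weight)
    return enriched
```

As a usage example, a user whose only relevant topic is {web2.0, semanticweb} and whose nearest neighbour (similarity 0.5) also holds {web2.0, semanticweb, ontology} with weight 0.6 would gain that topic with weight 0.3.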
3.7. Recommendation
In the final step, the top-N items that would interest the user the most are recommended to him. For this purpose, the level of user interest in all items is calculated and the top-N items with the highest user interest are recommended. In this study, the user interest in an item is measured based on the following two factors:
the similarity between his relevant topics and the topics in the item profile;
the dissimilarity between his irrelevant topics and the topics in the item profile.
The similarity between user relevant/irrelevant topics and item topics is measured through the topic-based similarity metric developed in this study. The level of interest of a certain user
where
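The ranking step can be sketched as a linear combination of the two factors listed above. The combination `alpha * sim_relevant + (1 - alpha) * (1 - sim_irrelevant)` and the value of `alpha` are hypothetical stand-ins for the interest equation of this section, and the similarities are assumed to be precomputed with the topic-based similarity metric.

```python
def interest(sim_relevant, sim_irrelevant, alpha=0.5):
    """Hypothetical interest score: reward similarity to liked topics and
    dissimilarity (1 - similarity) to disliked topics."""
    return alpha * sim_relevant + (1 - alpha) * (1 - sim_irrelevant)


def recommend(sim_rel, sim_irr, n, alpha=0.5):
    """Top-n items for one user; sim_rel / sim_irr map item -> precomputed similarity."""
    def score(item):
        return interest(sim_rel[item], sim_irr.get(item, 0.0), alpha)
    return sorted(sim_rel, key=score, reverse=True)[:n]
```

An item that matches the user’s relevant topics well but also matches his irrelevant topics (for example 0.8 and 0.9) is ranked below an item with a moderate relevant match and no irrelevant match (0.6 and 0.0).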
4. Experimental evaluation
4.1. The setup
The proposed recommendation method was empirically evaluated on real data, and its performance was compared with that of benchmark methods. The system prototype was implemented using .Net framework 4 and Microsoft SQL server 8.0. The experiments were performed on a PC with an Intel Core i7 1.73 GHz processor and 6 GB RAM running Windows 7.
To demonstrate the effectiveness of the proposed approach, the MovieLens dataset available at the GroupLens website (http://www.movielens.org/) was used as the experimental data. The dataset consists of 10,000,054 ratings and 95,580 tags applied to 10,681 movies by 71,567 users. The ratings are in the range 1–5.
The experiments were designed to answer the following two key questions:
Can the recommendation quality of the proposed method compete with benchmark methods?
How would the proposed method deal with the cold start problem?
To evaluate the recommendation quality, we divided the experimental data into two separate sets: a training set containing 80% of the data and a test set containing the remaining 20%. To ensure that the results were not sensitive to the particular training or test set, the experiments were repeated 10 times with different random splits, and the average over the 10 runs is reported in the subsequent sections. In this experiment, the minimum pattern support in the mining process was set to 0.2 (20%).
We use two metrics to evaluate the recommendation quality, namely precision and ranking accuracy [39, 40]. Precision estimates the proportion of useful recommendations among all items recommended to the user and ranking accuracy is an estimate of the correctness of the ranking of the recommended items.
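The random 80/20 split and precision at N can be sketched as below. This is a minimal sketch: the seed handling and the convention of dividing by N even when fewer items are recommended are assumptions, and ranking accuracy is omitted because its exact definition follows [39, 40] and is not reproduced here. Repeating the split with 10 different seeds and averaging the metrics mirrors the protocol described above.

```python
import random


def split_ratings(ratings, test_fraction=0.2, seed=0):
    """Randomly split rating records into (train, test), e.g. 80% / 20%."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def precision_at_n(recommended, relevant, n):
    """Fraction of the top-n recommended items that are relevant (denominator n)."""
    top = recommended[:n]
    return sum(1 for item in top if item in relevant) / n
```

For example, if the ranked list is a, b, c, d and the relevant test items are a, c and x, precision is 0.5 both at N = 2 and at N = 4.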
4.2. Comparisons with other methods
The performance of our semantic tag-based approach, referred to as STB, was evaluated in comparison with the following benchmark methods: (a) a collaborative filtering method based on user-to-user similarity, UserCF [41]; (b) a collaborative filtering method based on an item-to-item similarity, ItemCF [39]; and (c) a collaborative filtering method based on collaborative tagging, TagCF [37].
Based on a parameter-tuning experiment, the neighbourhood size was set to 50 for UserCF and 70 for ItemCF. TagCF improves recommendation quality by exploiting user-generated tags: it determines the similarities between users from the tags they generate, and it identifies latent tags for each user, collectively called the Candidate Tag Set, which are then applied in the item recommendation process. In this experiment, the neighbourhood size and model size (the size of the candidate tag set) of TagCF were set to 50 and 60, respectively.
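To make the role of the neighbourhood size concrete, a generic user-based collaborative filtering recommender in the spirit of UserCF might look like the Python sketch below. It assumes cosine similarity over rating vectors and scores each unseen item by the similarity-weighted sum of the neighbours' ratings; the exact formulation in [41] may differ, and the function names are hypothetical. `k` is the neighbourhood size (50 for UserCF in this experiment).

```python
import math
from collections import defaultdict
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def user_cf_top_n(ratings: Ratings, target: str, k: int = 50, n: int = 10) -> List[str]:
    """Recommend up to n items the target has not rated, using its k most similar users."""
    sims = [(cosine(ratings[target], r), other) for other, r in ratings.items() if other != target]
    neighbours = sorted(sims, reverse=True)[:k]  # neighbourhood of size k
    seen = ratings[target]
    scores: Dict[str, float] = defaultdict(float)
    for sim, other in neighbours:
        if sim <= 0:
            continue  # ignore neighbours with no positive similarity
        for item, rating in ratings[other].items():
            if item not in seen:
                scores[item] += sim * rating
    # Highest score first; ties broken by item id for a deterministic order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]
```

Passing tag-count vectors instead of rating vectors to `cosine` would give a tag-based user similarity, which is loosely the idea behind TagCF's user similarity; TagCF's Candidate Tag Set is not modelled here.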
Figure 5 shows precision as the number of recommended items N increases from 1 to 10. For all methods, precision tends to decrease as N increases. STB achieves higher precision than the benchmark methods for every value of N. TagCF ranks second and outperforms UserCF and ItemCF, because TagCF exploits user-generated tags whereas UserCF and ItemCF do not use them. On average, STB outperforms TagCF, UserCF and ItemCF by 9, 12 and 18%, respectively.

Figure 5. Precision as the number of recommended items N increases.
Figure 6 shows the ranking accuracy obtained by STB and the benchmark methods; STB outperforms them for every value of N. These results indicate that STB places more appropriate items at higher ranks in the top-N list and can therefore make more convincing recommendations. As with precision, TagCF ranks second, again because it exploits user-generated tags while UserCF and ItemCF do not. On average, STB outperforms TagCF, UserCF and ItemCF in ranking accuracy by 7, 13 and 15%, respectively.

Figure 6. Ranking accuracy as the number of recommended items N increases.
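The average improvements quoted for precision and ranking accuracy can be computed from per-N results along the following lines. The text does not state whether they are relative improvements or differences in percentage points, nor exactly how they are averaged, so this Python sketch assumes relative improvement averaged over the values of N; the helper name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def mean_relative_improvement(ours: Sequence[float], baseline: Sequence[float]) -> float:
    """Average over N of (ours - baseline) / baseline, as a percentage (assumed definition)."""
    if len(ours) != len(baseline) or not ours:
        raise ValueError("need two equal-length, non-empty result curves")
    return 100 * mean((a - b) / b for a, b in zip(ours, baseline))
```

If the reported figures are instead differences in percentage points, the ratio would be replaced by `100 * (a - b)`; the two readings can give noticeably different numbers when baseline values are small.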
4.3. The cold start problem
We further investigated the recommendation performance of the proposed method in the face of the cold start problem. For this purpose, we analysed the recommendation performance for cold start users, that is, users who assign fewer than 20 tags, and measured the precision and ranking accuracy obtained by the proposed and benchmark methods for these users.
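Selecting the cold start users can be sketched in Python as below. The sketch assumes the `UserID::MovieID::Tag::Timestamp` line layout of `tags.dat` in the MovieLens 10M release, and it counts tag assignments (a user who applies the same tag to two movies is counted twice), which is one reading of "assign fewer than 20 tags". Users with no tags at all do not appear in the tag file and are therefore not returned. Both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Assignment = Tuple[str, str, str]  # (user id, movie id, tag)


def parse_tag_line(line: str) -> Assignment:
    """Parse one 'UserID::MovieID::Tag::Timestamp' line; the timestamp is dropped."""
    user, movie, rest = line.rstrip("\n").split("::", 2)
    tag, _timestamp = rest.rsplit("::", 1)  # split from the right in case the tag contains '::'
    return user, movie, tag


def cold_start_users(assignments: Iterable[Assignment], threshold: int = 20) -> Set[str]:
    """Return users with fewer than `threshold` tag assignments."""
    counts = Counter(user for user, _movie, _tag in assignments)
    return {user for user, n in counts.items() if n < threshold}
```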
Figures 7 and 8 show the results for cold start users with regard to precision and ranking accuracy, respectively. For the benchmark methods, both metrics are considerably lower for cold start users than for all users, whereas the proposed method yields almost the same results for cold start users as for all users. The benchmark methods can scarcely identify the preferences of cold start users because little information about them is available. In contrast, the proposed method can identify cold start user preferences by enriching the initial user profile.

Figure 7. Precision of the top-10 recommendations for cold start users and all users.

Figure 8. Ranking accuracy of the top-10 recommendations for cold start users and all users.
For cold start users, both the precision and the ranking accuracy of the proposed method are higher than those of the benchmark methods. For instance, STB improves precision for cold start users by 45, 41 and 52% compared with TagCF, UserCF and ItemCF, respectively, and it also surpasses the benchmark methods in ranking accuracy. These results indicate that the method can indeed remedy the cold start problem and hence improve the quality of recommendations.
5. Conclusions and extensions
Owing to the rapid, symbiotic growth of social tagging information and semantic web knowledge repositories, integrating semantic knowledge into tagging information has become a flexible and powerful way to generate useful semantic information about user profiles and item descriptions. This information can be defined and organized according to ontologies that specify the relationships between different concepts. As a result, user profiles and item descriptions can be related within a common semantic concept space.
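Relating user profiles and item descriptions in a shared concept space can be illustrated with the following Python sketch. It assumes that a semantic profile is a sparse map from concept identifiers to weights and ranks items by cosine similarity to the user profile; the concept labels in any usage are hypothetical, and the similarity measure used by STB may differ.

```python
import math
from typing import AbstractSet, Dict, List

Profile = Dict[str, float]  # concept identifier -> weight (hypothetical representation)


def profile_similarity(user: Profile, item: Profile) -> float:
    """Cosine similarity between two concept-weight profiles; 0.0 if either is empty."""
    dot = sum(w * item.get(concept, 0.0) for concept, w in user.items())
    norm_user = math.sqrt(sum(w * w for w in user.values()))
    norm_item = math.sqrt(sum(w * w for w in item.values()))
    return dot / (norm_user * norm_item) if norm_user and norm_item else 0.0


def top_n_items(
    user: Profile,
    items: Dict[str, Profile],
    n: int = 10,
    exclude: AbstractSet[str] = frozenset(),  # e.g. items the user has already tagged
) -> List[str]:
    """Return the n items whose profiles are most similar to the user profile."""
    scored = [
        (profile_similarity(user, profile), item_id)
        for item_id, profile in items.items()
        if item_id not in exclude
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by item id
    return [item_id for _, item_id in scored[:n]]
```

Because users and items share one vocabulary of concepts, the same function scores any user against any item without a separate mapping step.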
In this article, we presented a method that filters social tags and exploits them to construct semantic profiles for both users and items based on frequently occurring tag patterns, using YAGO2 as the external knowledge base. The semantic profiles are further enriched through a constrained spreading activation strategy and by inheriting the preferences of similar users. Finally, the top-N items most likely to interest the user are recommended based on the similarity between the semantic profiles of the user and the items.
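Constrained spreading activation, used to enrich the semantic profiles, can be sketched generically in Python. This is the textbook form of the technique with assumed constraints (a firing threshold, a maximum number of hops and a fan-out limit) and an assumed multiplicative decay; the parameter values, the concept graph and the function name are hypothetical, and the paper's exact constraints may differ.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # concept -> [(neighbour concept, edge weight)]


def constrained_spreading_activation(
    graph: Graph,
    seeds: Dict[str, float],  # initial profile: concept -> activation
    decay: float = 0.5,       # assumed attenuation per hop
    threshold: float = 0.1,   # firing constraint: only sufficiently active concepts spread
    max_hops: int = 2,        # distance constraint
    max_fan_out: int = 50,    # fan-out constraint: very general concepts do not spread
) -> Dict[str, float]:
    """Spread activation from seed concepts; each concept fires at most once."""
    activation = dict(seeds)
    frontier = [c for c, a in seeds.items() if a >= threshold]
    fired = set()
    for _hop in range(max_hops):
        next_frontier = []
        for concept in frontier:
            if concept in fired:
                continue
            fired.add(concept)
            neighbours = graph.get(concept, [])
            if len(neighbours) > max_fan_out:
                continue
            for neighbour, weight in neighbours:
                activation[neighbour] = activation.get(neighbour, 0.0) + activation[concept] * weight * decay
                if activation[neighbour] >= threshold and neighbour not in fired:
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return activation
```

Concepts that receive activation but fall below the threshold are kept in the enriched profile with their small weight; they simply do not propagate further. The fan-out limit stops broad classes with very many members from flooding the profile.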
Experimental results indicate that the proposed method is more accurate than the benchmark methods in terms of both precision and ranking accuracy. Its performance for cold start users further indicates that it can alleviate the cold start problem.
Although the approach presented in this study has shown promising results, it is limited by its reliance on data collected from only one source: the GroupLens website. Several issues that arise when processing tag data across heterogeneous tagging platforms are not addressed in this study [42]. The method should also be evaluated in a multidomain scenario using other datasets such as CiteULike, Flickr and Del.icio.us. Moreover, the authors intend to extend this work by leveraging knowledge of the social influences that exist between users to further improve recommendation quality.
