Abstract
The folksonomies resulting from user-generated tag systems feature rapid adaptability, and reflect the information needs of their supporting user communities. However, they suffer from well-known problems, such as polysemy, heteronymy and lack of recall, which have been addressed in controlled vocabularies and ontologies, which in turn follow slower but more controlled evolution processes. These differences have led to the bridging approach described in this paper, which is based on mapping tags to ontology elements. Mappings can be automatically generated or explicitly provided by user-created assertions between tags and ontology elements. The main objective is to combine existing tag navigation, such as that featured in Delicious, with related tag recommendations obtained from ontology relations, in order to provide a hybrid navigation context that benefits folksonomy browsing. The implementation of such integration, combining Delicious and the OpenCyc knowledge base, is described, along with an evaluation of its potential in improving navigation through the user-generated tag system. The results reveal that the new semantic shortcuts and the decrease in dead ends can substantially influence the collaborative bookmarking experience.
1. Introduction
A popular application of Web 2.0 is collaborative or social tagging, which is characterized by allowing any user to freely attach identifying labels or tags to content in document repositories or digital libraries. It is useful when there is no traditional indexing authority, or whenever there is simply too much content for that control to be effective. Social tagging results in a structure of tags that is often called folksonomy. The term was coined by Thomas Vander Wal [1], as ‘the result of personal free tagging of information and objects (anything with a URL) for one’s own retrieval’. Folksonomy is created from the act of tagging by the person consuming the information. The systems that are usually mentioned as examples for that approach include Delicious 1 and Flickr, 2 but many other Web 2.0 applications also provide non-centrally controlled tag systems.
There are other approaches for the categorization of web content based on controlled vocabularies such as directories, thesauri and taxonomies. 3 In general, a controlled vocabulary is a list of specified terms or codes that reduces ambiguity, misspellings and other undesirable properties in the process of indexing [2]. Comparisons to folksonomies have started to appear, e.g. as in [3, 4]. The essential difference between controlled vocabularies and folksonomies is that the latter are created without any kind of central or organized control. For example, efforts in developing thesauri are actually the result of the organized collaborative effort of many people, but folksonomies, by definition, lack any kind of planning or organization, which are in place for controlled systems in general [5]. Therefore it seems clear that the main advantage of folksonomies is that they can adapt more flexibly and quickly than controlled systems: for example, new tags can be added immediately by any of their users at any time, whereas controlled systems discriminate between homonyms, control the use of synonyms and lexical anomalies, and so on.
Ontologies constitute another kind of artefact also related to controlled vocabularies, and are an essential element of semantic web approaches [6]. The difference between a folksonomy approach and an ontological one to describing web contents lies not only in the process, but also in the language used for the description. Folksonomies are unstructured lists of open-ended tags, whereas an ontology’s language is based on some kind of formal logic [7] (in some cases combined with rules or other formalisms) that follows rigorous methodological criteria [8], and provides room for representing axioms, constraints and descriptive knowledge, far beyond the simple flat registering of tags and tagging events that is the basis of folksonomy. In consequence, the two systems represent artefacts that have very different inherent values and forms, and their eventual combination should still maintain a clear separation of the two systems.
Combining methods of traditional knowledge representation with folksonomies, in order to enrich folksonomies with semantic content, is known as tag gardening [9]. The research described in this paper uses ontologies to recommend additional tags for navigation. In a related direction, the incorporation of ontologies is also effective in information retrieval, where they are used for query expansion. Vechtomova and Wang [10] define query expansion as the process of reformulating a seed query to improve retrieval performance in information retrieval operations. For example, Segura et al. [11] report on a method for the evaluation of ontology-based query expansion in learning object repositories, reporting tests using the Gene 4 Ontology and the MERLOT 5 repository. It is remarkable that the number of mature domain ontologies is still very limited for many areas of interest, arguably with the exception of the biomedical domain [12]. In contrast, folksonomies are widely in use, and have grown to cover large areas of many domains [13]. The limitations of folksonomies as classification systems have been reported in [14] and [15], even though they are not far from web directories [16].
This paper addresses the bridging of folksonomies and ontologies to provide a hybrid recommendation mechanism that improves tag navigation and browsing. The related tags recommendations currently offered by folksonomy-based services depend on tag co-occurrence and popularity. According to Peters [9], there are two general sorts of relations: paradigmatic relations, which include taxonomic, equivalence and association relations, hardwired into knowledge organization systems such as ontologies; and syntagmatic relations, where the terms’ connections are brought about by their co-occurrence in a resource.
Recommendations based on syntagmatic relations have shown great effectiveness in Delicious, where related tags must share some associated URL links to be considered as related. However, they cause tags with a small number of links to remain isolated or unrelated, and they are affected by synonymy (i.e. multiple tags for the same concept), heteronymy (i.e. the same tag with multiple related meanings) and lack of recall, which lower the efficiency of content indexing and searching [9, 14, 17, 18]. Other reasons for meta-noise 6 are the lack of stemming 7 and the heterogeneity of users and contexts. In addition, Sinclair and Cardew-Hall [19] have evaluated the usefulness of popularity recommendations in the form of tag clouds, and reached the conclusion that many tagged resources are inaccessible from tag clouds, and they are unsuitable for specific searches.
We propose a hybrid navigation mechanism that adds ontology-based tag recommendations to folksonomy-based systems in an attempt to solve or reduce the side effects of the problems mentioned above. The approach followed is that of mapping tags in the folksonomy to formal terms in the ontology, and then traversing ontology relations to retrieve and recommend related tags. Thus two important research issues are raised. First, since tags are not provided with a formal definition, mappings with ontology terms need to be given an epistemologically adequate interpretation, and the algorithms for traversing ontologies and recommending related tags should account for the different possibilities in linking. 8 Second, the acquisition of the mappings needs to be done in a way that retains the core features of the mental model of tag systems users, that is, respecting the ease of use and open approach of Web 2.0 sites. The two issues are here approached through a generic framework and a concrete implementation of a navigation interface for the Delicious site (called TagExplorer) using the large OpenCyc 9 commonsense ontology. The TagExplorer integration attempts to reduce the impact of the Delicious problems introduced above by providing semantic navigation features retrieved from OpenCyc.
The rest of this paper is structured as follows. Section 2 briefly surveys related work in which formal ontologies and folksonomies are linked or related in some way. Section 3 describes a general framework for linking tag systems with formal ontologies. In Section 4, interface design is discussed, and the concrete mashup prototype application (TagExplorer) is described. Section 5 reports on the evaluation of the potential usefulness of the described approach, by measuring the similarity and differences between the related tags offered by the Delicious folksonomy and those based on the OpenCyc ontology; also, some of the TagExplorer advantages are pointed out. Finally, conclusions and outlook are provided in Section 6.
2. Related work
In contrast with extreme positions considering that folksonomies allow the abandonment of formal ontology as the ground of the next generation of Web applications, or positions that dismiss folksonomies as a serious alternative to fulfil the Semantic Web objectives, we here start from the intermediate position of exploring the potential for bridging the two kinds of system and exploiting the unique features of each of them. The space of possible solutions for such a connection is still far from exhausted.
Previous approaches to relate folksonomies and ontologies include the concept of folksology [20], or the ontology of folksonomy. The TagOntology ontology proposed by Gruber [21] includes concepts such as
Laniado et al. [23, 24] use the WordNet lexical database [25] to build a tree of tags in a folksonomy. This approach finds tags related to a given tag t by collecting the latest resources that have been annotated with t and building a tree that includes their paths up to the unique root of the hypernym noun hierarchy of WordNet. As in our approach, Laniado et al. apply their method to the Delicious website in order to improve the tag-browsing experience. Their approach relies on relatedness derived from tag co-occurrences, making it possible to process tags that are not included in the lexicon. However, the quality depends on the amount and coherence of the resource-tagging behaviour of the users. In contrast, the method explained in this paper depends on the quality of the mappings to OpenCyc (which, in some cases, can be generated automatically from the ontology, as will be described later), and it is able to reuse the effort spent in the formal process of ontology engineering for the benefit of navigating in tag systems. For example, the TagExplorer application can show
Specia and Motta [26] describe an approach by which meaningful groups of tags corresponding to concepts in ontologies could be derived by means of co-occurrence analysis and clustering techniques, while relationships within tags in each cluster could be discovered by querying ontologies. This way of extracting the semantics of tag systems represents a promising direction. Here we focus on reusing the same informal mechanisms of user tagging, so that such an approach can be complemented by or contrasted with the one presented in this paper. In a similar direction, but focusing on different techniques and ontology elements, Schmitz [27] presented a model for inducing ontology from the Flickr tag vocabulary, using a probabilistic, subsumption-based model.
A different approach would be that of turning folksonomies into ontologies, as proposed by van Damme et al. [13]. Their approach entails combining: (1) the statistical analysis of folksonomies, associated usage data, and their implicit social networks; (2) online lexical resources; (3) ontologies and Semantic Web resources; (4) ontology-mapping and -matching approaches; and (5) functionality that helps human actors in achieving and maintaining consensus over ontology element suggestions resulting from the preceding steps. However, such an approach is close to an evidence-based, comprehensive process of ontology engineering that exploits folksonomies. In contrast, our starting assumption here is to retain the value of folksonomies and ontologies as separate systems, and combine ontologies with folksonomies in an agile way.
De Meo et al. have pointed out [28] some drawbacks of approaches based on the exploitation of external data sources to support users in their browsing activities through folksonomies. Specifically, they observed that ontology-based approaches generally depend on the suitability of the exploited ontologies to the context on which they are operating, thus benefiting or affecting these systems’ accuracy. Further, they argue that attempts to use Semantic Web search engines to find proper ontologies (e.g. [29]) do not offer sufficient query facilities, and in many cases a human user is required to manually download, parse and modify the selected ontology. The application described in this paper addresses these issues by enabling users to feed OpenCyc with new tags and multiple levels of similarity relations that bring the ontology into the proximity of the domain of interest. This process can be carried out by general users, and allows the ontology to reach the desired level of granularity. Thus we do not consider the ontology as a static resource in the folksonomy–ontology bridge, but as a dynamic resource that receives users’ feedback during Delicious browsing.
With regard to the interoperability of tagging systems, Kim et al. [18] described the use of a semantic framework called SCOT to express tagging activities in a machine-understandable and reusable way. In combination with Newman’s ontology [30], the model stores metadata for each tag, including information elements such as spelling variants, synonyms, the user that made the annotation, usage popularity, etc. Kim et al. also introduced a decentralized tag-sharing approach, allowing users to reuse shared tag resources.
In summary, the formal representation of tag systems has been addressed in several existing works, and there are approaches that attempt to bridge them with ontologies, making use of co-occurrence or lexical analysis. However, there is still a need to investigate how folksonomies can be combined with ontologies through direct mapping, and without losing the key characteristics that make them valuable separately, which is the starting point of the research presented in this paper.
3. Bridging folksonomy and ontology
In what follows, we briefly describe the approach for the folksonomy–ontology mapping, and the generic algorithm for generating recommended tags by traversing the structure of ontologies.
3.1. Formal model
In this section we consider a tagging system that basically stores a number of instances of concept Tag, which is broadly defined as ‘A natural-language concept that is used to annotate another resource’. Resources are digital elements identified by URIs. We also know who has annotated the resources, and when, that is, there is a function
so that
In our model, such a minimalist schema is combined with an
The idea of the mapping is to let users explicitly state mappings between elements in
3.2. Ontology-mediated tag recommendations
In our approach, tags are mapped to ontology elements by users, and it is expected that such mappings will act as a method of informal semantic interpretation. The

A generic algorithm to derive relatedness between folksonomy tags, based on ontology traversal.
First, the concepts directly related to this tag are retrieved from the ontology. Such a task is accomplished by automatic mappings, including for example the complete term for a given abbreviation or acronym, and also by using the mappings asserted by users, if available. Since tags do not hold a formal definition, mappings with ontology terms are important to set the bridge between the folksonomy and the ontology. The following slang terms can be taken as examples:
It should be noted that the accuracy of user mappings is variable, depending on the ontology scope within a domain, among other factors. This has led us to provide the concrete implementation in Section 4 with a feature that allows users to assert different levels of strength on their mappings. Gathering a set S of ontology terms from where to start the traversal process is the objective of this stage. Synonyms and equivalent terms are also added to set S (Figure 1, line 3).
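The first stage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `seed_terms`, the dictionary-based mapping stores and the example term names are all assumptions made for illustration. The sketch only shows how automatic mappings, user-asserted mappings and synonyms could be merged into the starting set S.

```python
from typing import Dict, Iterable, Set


def seed_terms(
    tag: str,
    auto_mappings: Dict[str, Iterable[str]],
    user_mappings: Dict[str, Iterable[str]],
    synonyms: Dict[str, Iterable[str]],
) -> Set[str]:
    """Gather the set S of ontology terms from which traversal starts.

    All three dictionaries are hypothetical stand-ins for the mapping
    sources described in the text: automatic mappings, user-asserted
    mappings, and synonym/equivalence links between ontology terms.
    """
    # Terms reached through automatic mappings (e.g. expanded acronyms).
    seeds = set(auto_mappings.get(tag, ()))
    # Terms reached through mappings asserted by users, if available.
    seeds |= set(user_mappings.get(tag, ()))
    # Synonyms and equivalent terms of every term found so far.
    for term in list(seeds):
        seeds |= set(synonyms.get(term, ()))
    return seeds


# Illustrative data only; the term names are not taken from OpenCyc.
auto = {"js": ["JavaScript"]}
user = {"js": ["ScriptingLanguage"]}
syn = {"JavaScript": ["ECMAScript"]}
S = seed_terms("js", auto, user, syn)
```

A tag with no mapping of either kind yields an empty set, which corresponds to a tag for which no ontology-based recommendation can be produced.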
The second stage then uses
On the other hand, we must be aware of how far we can move in the ontology hierarchy before the retrieved results start to get non-meaningful for the user need (level parameter or

Screenshot of the TagExplorer interface when browsing the Javascript tag.
4. TagExplorer application
TagExplorer is a semantic interface based on the OpenCyc ontology, and devised to improve the user experience when navigating across the pages of the delicious.com website. The key objectives of the application are twofold. On the one hand, it attempts to capture the contextual meaning of tags without dependence on their co-occurrence or popularity. On the other hand, it aims at offering a set of related tags based on that meaning. OpenCyc and Delicious have been selected to test the introduced bridging approach for the reasons given in Section 4.1. Sections 4.2 and 4.3 describe the TagExplorer client and server sides respectively.
4.1. Rationale for selecting Delicious and OpenCyc
Delicious has been selected to apply and test the bridging approach described in Section 3, for several reasons. First, the benefits of the Delicious folksonomy for this approach are contrasted with the folksonomies behind other major examples of social tagging: Flickr as a photo annotation website, and YouTube 10 as a video-hosting website. We arrived at most of these conclusions by analysing the social network profiling research described in [31–33]. Second, some statistics are provided to justify the selection of Delicious in terms of popularity.
The tagging activity in Delicious is highly collaborative and dynamic. It is different from Flickr, where a resource (photograph) is generally tagged only by the individual who uploads it. The major activity of other members of the Flickr community is to ‘comment on’ or ‘vote for’ resources. YouTube operates in a similar manner when tagging videos. Therefore the Flickr and YouTube folksonomies are not as rich as the Delicious one with regard to tag co-occurrences. Having several tags assigned to each resource, and the collaborative reuse of tags, is what gives the Delicious folksonomy a semantic potential that deserves to be compared with ontology capabilities.
Another important difference between the three social tagging websites is based on tag orientation. By analysing the top 20 tags in each of them, it is apparent that tags in Delicious are more content oriented, in that they are generally related to the topics and the intellectual content of the resources bookmarked. The tags used in Flickr are more annotation oriented: that is, they are in many cases related to some physical features of the photographs themselves, such as colours, lighting and location. Tags in YouTube tend to focus on the medium or genre of resources (e.g.
Therefore the semantic relations obtained from tag co-occurrence in Flickr and YouTube are not as significant as those obtained in Delicious, where tags are mostly based on the content. Take, for example, a photo of a random person named John when visiting Milan. Typically this photo will be annotated in a website such as Flickr with the tags
Content-oriented tags can also be found in other social tagging websites, including CiteULike 11 and Connotea, 12 where the tagged resources are bibliographical records, and LibraryThing, where the tagged resources are books. However, Delicious was chosen among them for its high levels of popularity (5,300,000 users by November 2008, 13 and currently including more than 180 million tagged URLs). In addition, several works cited throughout this paper include Delicious studies that allow us to evaluate and compare the folksonomy–ontology bridging approach.
With regard to the ontology selection, three common-sense ontologies have been considered: the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE); 14 the Suggested Upper Merged Ontology (SUMO); 15 and the OpenCyc knowledge base, which is the open-source version of the Cyc technology. All of them are comprehensive ontologies, but two main factors make OpenCyc the most appropriate for this research. First, it is one of the largest and most complete general knowledge bases currently available. The latest release of OpenCyc includes 47,000 concepts and 306,000 facts, whereas SUMO includes about 20,000 terms and 70,000 axioms. The more terms or concepts there are in the ontology, the more effective the automatic mapping to the folksonomy will be, and the greater the flexibility that will be offered to the user-based mappings (see Section 4.3.2).
Second, in contrast with the other two ontologies, OpenCyc provides a Java-based API 16 that allows TagExplorer to connect to the OpenCyc server to query concepts and relations. The TagExplorer architecture is illustrated in Figure 3, showing the OpenCyc server as part of the server side of the application. OpenCyc is the open-source version of the Cyc knowledge base [34] (KB from here on). It stores knowledge in a machine-readable form. It contains a set of data and rules formulated in the ontology language CycL, 17 which has a syntax similar to that of Lisp. An internal ontology defines the structure of the stored data, including types of terms and their relationships. The CycL elements used in this paper are introduced below:
Term: anything that can be an argument to a predicate or function. Variables, constants, non-atomic terms (NATs), numbers and strings are terms.
Constant: terms introduced into CycL by explicit creation. Each constant stands for something that can be understood as a concept in the world. They are referred to with the prefix
Variable: terms that appear in CycL rules to stand for not-known-in-advance constants that satisfy the formula of the rule.
Non-atomic term (NAT): specification of a term as a function of some other term(s). It is neither a variable nor a constant.
Formula: combination of terms into meaningful expressions. Every formula has the structure of a Lisp list.
Rule: any CycL formula that begins with #
Microtheory: represents a context in Cyc (e.g. the

The TagExplorer architecture.
4.2. The TagExplorer interface
Figure 2 shows the main TagExplorer interface (the upper frame). Its contents depend on the tag that is being visited in Delicious. The left box in the frame allows the user to identify him or herself, for the purpose of tracing his or her mappings, as explained in Subsection 4.3.3. The second box includes the list of semantically related tags that were found using the ontology-traversing technique discussed in Subsection 4.3.4. Hence it provides browsing aids that help the Delicious user to find related tags, even if they do not share any bookmarked resource inside Delicious. In the Figure 2 example, TagExplorer recommends the
4.3. The TagExplorer Server
The following subsections explain the stages and processes in which the Figure 3 components participate. They can be summarized as follows:
Enriching the KB with Delicious tag constants. This is carried out either automatically by the TagsCrawler, or manually by Delicious users.
Asserting similarities between Delicious tags and the rest of the constants in the KB. This is supported by the Mappings Manager.
Retrieving Delicious tags related to the one that is being visited. This is carried out by the Semantic Relations Retriever.
4.3.1. Enriching the KB with Delicious tag constants
Delicious tags are represented as constants. At any given moment, the KB is aware only of the existence of a subset of all Delicious tags. The more the users visit Delicious while using the TagExplorer, the larger the subset will become. To avoid starting this process from scratch, the most popular Delicious tags are initially loaded in the KB by the TagsCrawler. Each time a Delicious tag is visited, the application determines whether there is already a defined constant with the same name. If there is not, then a new constant is created as an individual of the
4.3.2. Linking Delicious tag constants to the rest of the KB
Delicious tags such as
The
Queries of the OpenCyc’s
Depending on the desired strength and intensity of the relation, the user is able to select one of the following predicates to link Delicious tags to constants in the cloud:
By means of the OpenCyc
Note that we could use functions instead of predicates to represent these mappings, but it would inhibit the advantages of the logical properties explained above. 19
4.3.3. Tracing user activity
The user activity log makes it possible to trace responsibility for asserting each Delicious–OpenCyc mapping. This is useful in detecting domains with the highest levels of interest, and in performing several kinds of empirical analysis. It can be the basis for future development of a collaborative filtering service [36] to recommend tags.
Three ternary predicates were created to store log registries, bearing correspondence to the three levels of similarity described in the previous subsection: #
4.3.4. Retrieving semantically related Delicious tags
TagExplorer is aimed at minimizing the Delicious obstacles mentioned in Section 1 by providing the user with a related set of tags based on the OpenCyc KB. The present section describes the traversal patterns applied to obtain those near tags from the KB taxonomic hierarchy, which is defined by means of the
The traversing algorithm shown in Figure 4 takes a given tag as its starting point. Normally it will be the currently visited tag. The algorithm then finds the constants that best represent that tag in the KB (the set ∑) and visits the nearest nodes to that constant or those constants. If $ is a

A concise version of the traversing algorithm.
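As a rough illustration of the traversal idea described in this subsection, the following Python sketch performs a level-bounded breadth-first walk over a small in-memory graph and then maps the visited constants back to tags. It is a simplified stand-in, not the algorithm of Figure 4: the names `related_tags`, `tag_to_terms`, `neighbours` and `max_level`, as well as the example constants, are assumptions, and the real application queries the OpenCyc KB rather than a dictionary.

```python
from collections import deque
from typing import Dict, Iterable, Set


def related_tags(
    tag: str,
    tag_to_terms: Dict[str, Iterable[str]],
    neighbours: Dict[str, Iterable[str]],
    max_level: int = 2,
) -> Set[str]:
    """Return tags mapped to constants within max_level steps of the tag."""
    # Constants that represent the starting tag (hypothetical mapping table).
    seeds = set(tag_to_terms.get(tag, ()))
    visited = set(seeds)
    queue = deque((term, 0) for term in seeds)
    while queue:
        term, level = queue.popleft()
        if level == max_level:
            continue  # do not move further away from the starting constants
        for nxt in neighbours.get(term, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, level + 1))
    # Map the visited constants back to the tags that point at them.
    return {
        other
        for other, terms in tag_to_terms.items()
        if other != tag and visited & set(terms)
    }


# Illustrative data only; constant names are not taken from OpenCyc.
mappings = {
    "javascript": ["JavaScript"],
    "programming": ["ProgrammingLanguage"],
    "software": ["ComputerSoftware"],
    "art": ["Art"],
}
graph = {
    "JavaScript": ["ProgrammingLanguage"],
    "ProgrammingLanguage": ["JavaScript", "ComputerSoftware"],
    "ComputerSoftware": ["ProgrammingLanguage"],
}
near = related_tags("javascript", mappings, graph, max_level=1)
wider = related_tags("javascript", mappings, graph, max_level=2)
```

Raising `max_level` widens the set of recommendations, which mirrors the trade-off discussed in Section 3.2: the further the traversal moves from the starting constants, the less meaningful the retrieved tags are likely to be.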
Then the Delicious tags having a map to some constant in the results
That being said, there are trivial OpenCyc relations that can blur significant results. For example,
5. Evaluation
The approach followed for the evaluation of TagExplorer was that of assessing its potential to derive useful browsing suggestions. An experimental evaluation with Delicious users would require the gathering of a representative sample, which is particularly difficult as Delicious provides no direct means to identify the users behind the nicknames. Also, such an evaluation would require sustained use by the sample of users over a relatively prolonged period of time. The representativeness of the sample would remain an issue if the users were connected by some interest, but not having such a connection might result in a sparse collection of mappings that did not provide enough material to evaluate the potential of the application. Because of these methodological difficulties, the evaluation approach adopted here is based on automatic mappings that, as described above, are carried out by matching tags and ontology terms. Although the results do not evaluate actual use, they are sufficient to assess the potential of TagExplorer for attaining its objectives.
Three case studies were analysed and compared. Each one was based on a dataset of Delicious tags and the relations between them provided by the Delicious co-occurrence-based recommendations. The greater the number of bookmarks shared between two tags, the stronger their relation in Delicious. Several tests were designed to find similarities and differences between the tag graph defined by Delicious recommendations and that created with the semantic relations retrieved from the OpenCyc KB and provided by the TagExplorer interface. The hypothesis to be tested was that, even though there are many coincidences and structural similarities between both recommendation systems, there are also important differences that make them complement one another. The three datasets had the following sizes and configurations:
Dataset 1 included 1000 Delicious tags covering a narrow domain (the art tag was taken as the starting point, i.e. the graph root).
Dataset 2 included 12,500 Delicious tags covering the most popular domains (the 30 most popular tags shown in the Delicious home page were selected as roots).
Dataset 3 included 20,000 Delicious tags covering Dataset 2, and going beyond to unpopular domains.
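The co-occurrence-based relation strength mentioned above (the number of bookmarks shared between two tags) can be sketched in Python as follows. This is an illustrative reconstruction under assumptions: the function name `cooccurrence_strength` and the `bookmarks` structure (URL to list of tags) are hypothetical, and Delicious's actual recommendation logic is not documented in this paper.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple


def cooccurrence_strength(
    bookmarks: Dict[str, Iterable[str]],
) -> Counter:
    """Count, for every unordered pair of tags, the bookmarks they share."""
    counts: Counter = Counter()
    for tags in bookmarks.values():
        # Sort and deduplicate so each pair is counted once per bookmark
        # and always appears in the same (alphabetical) order.
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


# Illustrative data only.
bookmarks = {
    "http://example.org/1": ["art", "design"],
    "http://example.org/2": ["design", "art", "photo"],
    "http://example.org/3": ["photo"],
}
strength = cooccurrence_strength(bookmarks)
```

Under this measure a tag attached to very few bookmarks shares little or nothing with other tags, which is why sparsely used tags tend to remain isolated in a purely syntagmatic graph.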
5.1. Results from Delicious–OpenCyc mapping
The method for mapping Delicious tags to OpenCyc constants was explained in Sections 4.3.1 and 4.3.2. The analysis described below is based exclusively on automatic mappings, and thus starts from a clean copy of the OpenCyc KB containing only the knowledge provided by Cycorp. It can be considered as an attempt to estimate a lower bound for the TagExplorer performance by studying the worst-case scenario, which can then be improved during daily use with the collaborative participation of users.
The percentage of folksonomy tags successfully mapped to semantic resources has been measured by different authors as follows.
Studying 480,000 Delicious tags, it was found that about 8% are contained in the WordNet lexicon [23].
WordNet coverage of Delicious tags, according to [37], is:
No. of top-frequency tags | 100 | 500 | 1000 | 5000 | 10,000
Fraction in WordNet | 82% | 80% | 79% | 69% | 61%
In the Angeletou et al. experiment [38], 23% of the investigated Flickr tags were discovered in ontologies retrieved from Swoogle.
Results from the present research are illustrated in Figure 5. The negative slope in all datasets shows that popular tags are more likely to be mapped. It should be also noted that, when tags are sorted by Delicious popularity, larger datasets show higher percentages of mapped tags for each value in the horizontal axis. Such a result is understandable, given the fact that Dataset 1 is a subset of Dataset 2, which in turn is a subset of Dataset 3: thus a quite uncommon (and therefore unmappable) tag can fall into the 500 most popular tags of Dataset 1 while being ranked at a much lower position in Dataset 3. For example, the

Percentage of Delicious tags automatically mapped to OpenCyc by TagExplorer. The larger the dataset the greater the slope, and therefore the better the application performance.
With regard to relations between tags, Laniado et al. [23] report that 68.1% of the studied dataset (including repeated relations) involves terms contained in WordNet. In our case, 42% of 72,000 different Delicious relations involve at least one tag that is contained in OpenCyc, and 32% have both related tags contained in the ontology. Although the obtained mappings are quantitatively similar to the approaches based on WordNet, the OpenCyc ontology provides a semantically richer environment, which allows TagExplorer to offer the browsing advantages introduced in the following subsection. In addition, the application is designed to foster user-asserted mappings that help OpenCyc in dealing with novel terminology, folksonomy jargon and multilingual tags.
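The two kinds of coverage figure reported in this subsection (the share of mapped tags among the most popular ones, and the share of relations involving mapped tags) can be computed along the following lines. This Python sketch is an assumption about how such statistics might be derived, not the authors' evaluation code; the function names and the toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def mapped_fraction_by_rank(
    ranked_tags: Sequence[str], mapped: Set[str], cutoffs: Iterable[int]
) -> Dict[int, float]:
    """Fraction of mapped tags among the k most popular tags, for each k."""
    result = {}
    for k in cutoffs:
        top = ranked_tags[:k]
        result[k] = sum(t in mapped for t in top) / len(top) if top else 0.0
    return result


def relation_coverage(
    relations: List[Tuple[str, str]], mapped: Set[str]
) -> Tuple[float, float]:
    """Fractions of relations with at least one, and with both, tags mapped."""
    total = len(relations)
    if total == 0:
        return 0.0, 0.0
    at_least_one = sum(a in mapped or b in mapped for a, b in relations)
    both = sum(a in mapped and b in mapped for a, b in relations)
    return at_least_one / total, both / total


# Illustrative data only: tags sorted by decreasing popularity.
ranked = ["design", "art", "webdev", "xyz123"]
mapped = {"design", "art"}
by_rank = mapped_fraction_by_rank(ranked, mapped, [2, 4])
coverage = relation_coverage(
    [("design", "art"), ("art", "xyz123"),
     ("webdev", "xyz123"), ("design", "webdev")],
    mapped,
)
```

In the toy data the mapped fraction drops as the cutoff grows, which is the same qualitative pattern as the negative slope described for Figure 5.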
5.2. Contrasting Delicious folksonomy graph with OpenCyc-derived ones
The following figures show the layout of the smallest case study. Delicious datasets have been represented as in Figure 6 in order to be compared with the TagExplorer results obtained from the

Starting from the

OpenCyc relations found between Delicious tags. Grey dots represent tags and black edges represent the
The table in Figure 8 compares the recommendations provided by Delicious and TagExplorer when browsing tags with different characteristics. The three columns at the left show popular tags (according to Delicious users), while the ones at the right are very unpopular. There are general meaning tags such as
Decreasing dead ends: the Delicious folksonomy provides no related tags or outgoing links for 59.6% of the tags in our largest dataset: that is, 11,937 dead-end tags. This is the case for
Cluster co-occurrence: the OpenCyc
Semantic shortcuts: of the 4 × 10⁸ combinations of two tags in the Delicious graph (Dataset 3), no more than 40% are related through a finite path (i.e. the other 60% of tag pairs are neither directly nor indirectly related). However, if we consider only the pairs that are recognized by TagExplorer as connected (i.e. the Figure 7 edges), then this percentage rises to 94%, which means that most of the connections recommended by TagExplorer already exist in the Delicious folksonomy, albeit as long paths. Figure 9 shows how path lengths behave in the Delicious graph for the three case studies. It allows comparison of the distances between all possible pairs of Delicious tags (lower lines) with the distances between the Delicious pairs that were found to be semantically connected through the OpenCyc ontology (upper lines). Thus, in a graph with a density as low as that of Delicious, the TagExplorer recommendations are able to directly link tags that are far apart in the Delicious folksonomy but closely related semantically. The statistics therefore show that TagExplorer edges (Figure 7) can act as semantic shortcuts that ease folksonomy browsing. According to Figure 9, such semantic shortcuts mostly substitute for Delicious paths of three to eight edges. Figure 10 depicts some examples of these. Note that Delicious paths can be very convoluted, so that closely related tags such as

Examples of related tags provided by Delicious and TagExplorer. The analysed tags are in black headings, and white and grey columns correspond to Delicious and TagExplorer results respectively.

Distance between tags in Delicious folksonomy.

Short arrows represent Delicious relations, and large dotted arrows represent semantic shortcuts provided by TagExplorer.
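The dead-end and semantic-shortcut measurements of this subsection can be approximated as follows. This is a minimal sketch in Python under stated assumptions, not the procedure actually used in the study: the Delicious related-tag graph is assumed to be a directed adjacency mapping from each tag to the tags it links to, a dead end is a tag without outgoing links, and the length of the Delicious path replaced by a shortcut is taken as the shortest directed path found by breadth-first search. All names and the toy graph are hypothetical.

```python
from collections import deque


def dead_end_share(tags, related):
    """Percentage of tags that have no outgoing related-tag link."""
    tags = list(tags)
    if not tags:
        return 0.0
    dead = sum(1 for tag in tags if not related.get(tag))
    return 100.0 * dead / len(tags)


def path_length(related, source, target):
    """Number of edges on the shortest directed path from source to target,
    or None if target cannot be reached (breadth-first search)."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in related.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def shortcut_lengths(related, shortcuts):
    """For each ontology-derived pair, the length of the folksonomy path
    that the shortcut would replace (None if no such path exists)."""
    return {pair: path_length(related, *pair) for pair in shortcuts}


# Toy folksonomy graph (hypothetical tags and links).
related = {
    "ajax": ["javascript"],
    "javascript": ["web"],
    "web": ["html"],
    "html": [],
    "xyzzy": [],
}
shortcuts = [("ajax", "html"), ("ajax", "xyzzy")]

share = dead_end_share(related.keys(), related)
lengths = shortcut_lengths(related, shortcuts)
```

In the toy graph, the pair ("ajax", "html") is joined by a three-edge path that a single ontology-derived edge would replace, whereas ("ajax", "xyzzy") has no path at all, so a recommendation there would remove a dead end rather than shorten an existing route.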
6. Conclusions and future work
As has been shown throughout this paper, folksonomies and ontologies represent two divergent approaches to knowledge organization. Ontologies are formal, logic-based representations, whereas folksonomies include loosely defined tags that are much more prone to ambiguity. In order to get the benefits from both knowledge artefacts, this paper has described a concrete approach in which folksonomies are linked to ontology terms.
The straightforward cases of the linking process are automatically accomplished, while the rest is done by users in the uncontrolled style of Web 2.0 applications. The links are stored inside the ontology (and in the formal language of the ontology) as in existing approaches to ‘ontologies of folksonomy’ [21], and they are kept separate from the domain ontologies. Then a family of potential algorithms and mapping criteria is defined, to allow the formal ontology to be used as an aid for navigation in the tag system. General benefits include linking synonyms, revealing homonyms and heteronyms, and stimulating serendipity. Given the characteristics of the folksonomies generated inside social bookmarking websites such as Delicious, these benefits provide a means of arriving at more elaborate browsing advantages, such as those described in Section 5 and mentioned below. Such approaches can be subject to empirical inquiry, and they represent an extension to ontology–navigation interfaces [39] that exploit the value in community-generated organization systems.
The folksonomy–ontology bridging approach explained in Section 3 involves no specific conditions that bind it to certain folksonomy–ontology pairs; so the traversing algorithm described there can be used to semantically connect the tags from any folksonomy. However, the usefulness of the bridge will depend strongly on the significance of the ontology relations for the folksonomy’s purpose. For example, an ontology containing a taxonomy that classifies tags according to their Greek, Celtic, Slavic or Baltic roots, among others, will not be very useful for users looking for tags in a given domain in a content-oriented folksonomy, nor to users looking for similar videos in an annotation-oriented folksonomy.
TagExplorer is an implementation of such an approach, combining the large OpenCyc ontology and the Delicious shared bookmarking site. It has been tested with three datasets, the largest of which includes 20,000 Delicious tags and 72,000 similarity relations. The mapping tests revealed that the probability of a given tag being automatically mapped to an equivalent concept in OpenCyc is very high when it is a frequently used tag, but falls to 0.27 when dealing with any kind of tag (further statistics are described in Section 5.1). This outcome is in line with other studies that aimed at mapping ontologies to the Delicious folksonomy, and is due to the uncontrolled nature of folksonomies’ vocabularies, which contain many slang words, non-English terms, abbreviations, and a large number of words that simply cannot be covered by the domain-oriented nature of most ontologies. For this reason, TagExplorer provides a collaborative method in which users can assert new mappings themselves, and hence contribute to improving the application’s performance.
In order to evaluate how these mappings can benefit the user experience when browsing the Delicious website, several comparisons were carried out, as described in Section 5.2. In general, the recommendation graph that can be built by means of the
In order to improve the effectiveness of TagExplorer, future work should deal with the preparation and cleaning of tags before processing. Since users can choose any word for tagging their resources, the tag sets contain elements that represent not shared categories but other kinds of classification criteria, as described in Section 4.1. In addition, future work will gather a representative sample of Delicious users’ interactions with TagExplorer in order to carry out an experimental evaluation of the folksonomy–ontology hybrid navigation and the Delicious–OpenCyc implementation.
Future work will also deal with evaluation of the effectiveness of different tag recommendation algorithms, based on the links between tags and ontology elements, and how they compare with existing recommendation approaches based on co-occurrence of tags. As suggested in [13], the social interaction manifested in folksonomies, and in their use, should be exploited for building and maintaining ontologies. In addition, future work will analyse and test the feasibility of enriching folksonomies with several ontologies from various sources instead of one. This will allow the use of a series of specific domain ontologies that can reach specific folksonomy areas that are out of the reach of common-sense ontologies such as OpenCyc. Asserting logical equivalences between ontologies’ terms (i.e. ontology mapping) and predicting the similarities between them (i.e. ontology matching) are techniques in constant development [40] that will play a significant role in reusing, expanding and combining existing ontologies.
