Abstract
Planning an efficient itinerary remains laborious and complicated: the bulk of online information makes it difficult to select places and decide their visiting order. A trip planning system therefore generally aims to propose popular points of interest (POIs) and routes in a region, structured as a POI graph. The proposed framework utilizes travel blogs to accumulate this information. Existing approaches have employed frequent pattern mining to construct a POI graph, which results in frequency-weighted POI and route recommendations. In contrast to this conventional POI graph, the suggested model incorporates a multi-criteria weighting scheme. To facilitate travel decision-making, the proposed framework treats the frequency measure as an initial weight and further processes blog entries to extract opinions related to POIs and spatial information between POIs, which weight the nodes and edges, respectively. A final consolidated weight for each component is computed using defined functions. The contribution is significant for ordinary travelers, who can plan their itineraries efficiently, as well as for destination management organizations, which can identify travelling trends and design tourism products and strategies accordingly.
Introduction
The task of planning an efficient itinerary is laborious and requires one to collect information from several online sources, for example by reading descriptions and reviews on official websites and social media, searching images, and exploring Google Maps. The process is exhausting and complicated: the bulk of online information makes it difficult to select appropriate attractions and popular travel patterns under temporal and budget constraints. From a research perspective, this problem can be segmented into two components: (1) extraction of travel information and (2) planning trip paths. Thus, a trip planning system generally aims to propose popular points of interest (POIs) and routes in a region, and structures them in the form of a POI graph. To date, geo-tagged Flickr images [1, 2, 3], Global Positioning System (GPS) patterns [4, 5, 6], and social media check-ins [7, 8, 9] have been effectively used as travel information sources because their geospatial metadata reveal the travel patterns of millions of travelers. However, this information is difficult to decipher from unstructured text descriptions, such as travel blogs and guides, which are devoid of any geo-temporal tagging and structural regularity. This study utilizes travel blogs to accumulate travel-related information and construct a POI graph.
Theoretical foundations of travel blog mining
The research on travel blog text is an interdisciplinary domain and can be broadly categorized into two mainstreams, namely, content analysis and narrative analysis [10]. The primary concern of content analysis is to determine the activities, services, and features associated with a place, as well as to develop destination image and perception by understanding bloggers’ constructed identities of the visited places. In narrative analysis, these identities are interpreted in a scene-recall manner to determine the meaning of bloggers’ travel experience in the time and space dimensions.
Existing frameworks have summarized travel blog content to extract semantic information, which includes finding place-related local features, activities, and things to do, understanding destination image, and disambiguating place names, among others. However, only a few approaches have actually analyzed these writings to determine the travel patterns of bloggers using frequent sequential pattern mining (FSPM). FSPM is the standard approach to mine travel patterns from blog data and construct a POI graph whose frequency-weighted nodes and edges represent popular POIs and routes, respectively. The underlying assumption that motivates the extraction of patterns in this method is that travel blogs are regarded as personal online diaries [10, 11, 12], in which facts and events tend to be mentioned in sequential order. Hence, the appearance of place names in travel blogs can be interpreted as the bloggers’ actual travel pattern [12]. The frequency of these patterns is considered a measure of their popularity. Similarly, another assumption states that frequent simultaneous mentions of two POIs are an indication of their geographic proximity [13].
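The frequency-weighted POI graph described above can be sketched in a few lines. The following is a minimal illustration, not the FSPM procedure used in the cited studies: it assumes each blog entry has already been reduced to an ordered list of POI mentions, and it only counts single POIs and consecutive POI pairs. The function name `build_frequency_graph`, the `min_support` threshold, and the sample sequences are hypothetical.

```python
from collections import Counter


def build_frequency_graph(blog_sequences, min_support=2):
    """Return frequency-weighted nodes and edges from ordered POI mentions.

    Simplification (assumption): only consecutive POI pairs are counted,
    and each POI or pair is counted at most once per blog entry.
    """
    node_freq = Counter()
    edge_freq = Counter()
    for seq in blog_sequences:
        node_freq.update(set(seq))                # POI support
        edge_freq.update(set(zip(seq, seq[1:])))  # ordered-pair support
    nodes = {poi: n for poi, n in node_freq.items() if n >= min_support}
    edges = {pair: n for pair, n in edge_freq.items() if n >= min_support}
    return nodes, edges


# Made-up POI sequences, one list per blog entry (illustrative only).
blogs = [
    ["POI A", "POI B", "POI C"],
    ["POI B", "POI C"],
    ["POI A", "POI B"],
]
nodes, edges = build_frequency_graph(blogs)
```

Here the node weights play the role of POI popularity and the edge weights the role of route popularity; a full sequential pattern miner would additionally consider longer and non-contiguous patterns.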
Problem statement
Although these assumptions are perceived to be realistic, they still require empirical support using language engineering techniques to construct knowledgeable POI graphs that can facilitate travel decision-making. Accordingly, considering frequent mentions as a measure of POI popularity and mining bloggers’ opinions and sentiments toward this POI would strengthen the argument. Similarly, taking simultaneous mentions as an indication of geographic proximity and extracting spatial indicators among POIs would be beneficial because bloggers generally provide proximity information regarding POIs in their writing. Consider the following snippet from a travelogue [14], with highlighted
To incorporate these extensions, the solution proposed in this research is a multi-criteria weighted POI graph. Accordingly, the FSPM-based frequency-weighted POI graph will be subjected to another level of weighting. The subsequent phase deals with the extraction of spatial relations among POIs from the blog data set. The extracted relation triples will be matched with the POI graph connections to assign each relation to the correct pair of POIs. This process results in the second level of weight for the edges of the POI graph. For the nodes, weight computation will be performed by calculating an individual POI’s opinion polarity from the travel blog data set combined with the rating obtained from credible travel websites. The final stage accumulates the first and second levels of weights for the nodes and edges of the POI graph using the defined functions, and generates a single consolidated numeric weight for each node and edge of the POI graph. The higher the weight of a component, the more preferred a choice it should be, either as a POI or as a route.
The current study attempts to describe the foundational underpinnings and framework details of the suggested multi-criteria weighted graph model. The rest of this paper is organized as follows. Section 2 describes the technicalities of the related literature. Section 3 presents the problem definition and framework details. Section 4 presents an example to illustrate the working mechanism. Lastly, Section 5 concludes this study.
Related work
Mining popular POIs and routes as a POI graph
The research relevant to mining the POI graph from travel blogs comprises a few notable studies. Among them, [12] contributed a web-based multimedia tour guide that uses sequential pattern mining to determine popular tourist routes based on a POI selection, searches for relevant blog entries, and extracts multimedia content along with the route context. Similarly, [15] used FSPM and compact pattern mining to explore popular spots and routes from structured tourism blogs. The framework first finds frequent departure cities, popular POIs, and their correlations for a given destination, and then applies compact pattern mining to determine the services associated with a destination. The system proposed in [16, 17] extracts popular POIs and their correlations using FSPM and location correlation analysis. A Max-confidence metric based on frequent item-set mining is introduced to extract popular things of interest (ToIs) for each POI. Recently, [16] implemented a notable enhancement in [13]. The framework devises a frequent item-set mining-based word network, which contains all the popular POIs, as well as their correlations with other POIs and connections to common local features. This extensive word network is segmented into tourism areas based on the assumption that frequent simultaneous mentions of two POIs in blog entries are an indication of their geographic proximity.
Mining spatial information
This section reviews the approaches that have been used to extract spatial indicators from travel blog data. However, none of these systems was developed primarily to support trip planning, which is the objective of the studies described in the previous section; consequently, existing POI graphs lack any spatial information. In [18], a framework is proposed to extract geospatial content from travel guides written by expert travelers. The system performs a form of map reconstruction by first extracting spatial information using a rule-based information extraction module and a location ontology, then geocoding the identified place names, and finally connecting them based on the extracted route information. In [19], a methodology to construct place graphs is proposed. The process harvests web sources to extract place descriptions, which are then parsed to extract spatial relations between two locations. The resultant place graphs are aggregated into a composite graph by employing typographic, linguistic, and spatial similarities for node merging. Lastly, [20] proposed the extraction of distance and topological relations from crowdsourced geospatial data (i.e., travel blogs) using custom-defined string patterns and syntactic rules, thereby resulting in a relationship graph with nodes representing POIs and edges representing at least one spatial relation between two POIs.
Mining semantic information
The type of information that has been extensively sourced from travel blogs is the semantic information. Although spatial description is one of the most basic forms of semantics, the semantic models of tourism information mining to be described below have focused on local features, things to do, and activities to a large extent. To begin with, [21] developed a methodology to visualize tourists’ experiences, which is achieved by linking their activities and evaluations related to a particular location at a particular time using association rule mining. Moreover, [22] utilized a location topic model [23] for destination recommendation based on similarity and relevance to a user’s query. This system also performs destination summarization by generating location representative tags and the related text snippets, as well as travelogue enrichment that identifies the informative parts of a travelogue and associates them with their images [24]. By contrast, [25] maximized place semantics in the context of revealing bloggers’ mood and emotion toward places. The travel blog corpus is analyzed for sentiment identification at the paragraph level. Thus, accumulating this spatially distributed sentiment information results in a geospatial opinion map, where one can visualize the opinion-oriented view of a generally large region of interest. The framework of [26, 27] proposed the extraction and disambiguation of place names from travel blogs using a dictionary-based method and dependency structure analysis of text containing place names by focusing on case-marking particles that represent a place where an action has occurred. Similarly, [28] performed structural analysis that resolved blog content into small components, each of which contained real location entities. The entities are disambiguated to form a partonomy structure of the destination. The second stage generates a concept network equipped with descriptive and semantic expressions related to the entity. 
The framework of [29] described a place as a cognitive region based on a range of activities, and proposed an abstract semantic model. This model is a semantic graph in which a place node may branch into further place nodes or geo-features. Moreover, dependency parsing is performed on the extracted descriptions that contain the geo-feature information, thereby resulting in the generation of activity sub-trees related to a geo-feature. Finally, all the sub-trees are aggregated to provide a complete semantic representation of a place.
Architecture of the proposed framework.
Problem definition
Given a data set
Given a spatial relation annotated corpus
Given a data set
For the multi-criteria weighted POI graph
Categories of qualitative spatial relations
Mining frequency-weighted POI graph
Figure 1 illustrates the overall framework of the multi-criteria weighted POI graph. The first task is to construct a POI graph, which requires the accumulation of popular POIs from travel websites. This process results in the curation of a POI gazetteer that is used during the pre-processing of blog pages to eliminate non-geographic terms while retaining the required mentions of POIs. A data set
Mining spatial information for edge weighting
After constructing a POI graph, the next phase deals with populating the graph components with knowledge mined from travel blogs. The edge component is considered first, and the objective is to extract the spatial indicators that belong to the POI correlations to enrich routes with navigation information. Spatial relations are categorized into three types, namely, topological, directional, and distance relations [30]. Table 1 provides a few examples. In general, spatial relations in text are described using a variety of linguistic patterns, with direction relations being the most commonly expressed form of spatial information. Hence, a set of spatial lexical patterns is defined and transformed into a rule base using POI gazetteer-based named entity recognition and a corpus annotated with spatial relations. These relations are extracted in the form of triplets defined as (trajector, spatial relation, landmark), as described in the Problem Definition section. For example, the sentence “If you have time for only one religious shrine then make the trek to the Batu Caves about 13 kilometers north of KL” contains a type of a “
“
The trajector–landmark pairs in the extracted triplets are now matched with the sequential POI pairs to associate them with the corresponding spatial relations. The order of target and reference objects in spatial relation may conflict with the sequential travel order of POIs. Therefore, while matching, this order should not be distinguished in case of a few topological or distance relations. For example, a triplet extracted, such as “
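The extraction and matching steps of this subsection can be illustrated with a small sketch. It is a simplified stand-in for the rule base, under stated assumptions: a single hand-written lexical pattern covers phrases of the form "X about N kilometers north of Y", the gazetteer holds only the two place names from the example sentence, and matching ignores the order of the POI pair for every relation type (the text restricts this to a few topological or distance relations). The names `extract_triplets` and `match_to_edges` are hypothetical.

```python
import re


def extract_triplets(sentence, gazetteer):
    """Extract (trajector, spatial relation, landmark) triplets.

    Assumption: one illustrative distance-plus-direction pattern only.
    """
    # Longest names first so a longer POI name wins over a shorter one.
    pois = "|".join(re.escape(p) for p in sorted(gazetteer, key=len, reverse=True))
    pattern = re.compile(
        rf"(?P<trajector>{pois})\s+(?:about\s+)?"
        rf"(?P<relation>\d+(?:\.\d+)?\s+kilometers?\s+(?:north|south|east|west))"
        rf"\s+of\s+(?P<landmark>{pois})"
    )
    return [(m["trajector"], m["relation"], m["landmark"])
            for m in pattern.finditer(sentence)]


def match_to_edges(triplets, edges):
    """Attach each triplet to the sequential POI pair joining the same two POIs.

    The full triplet is stored so that a directional relation stays
    interpretable even when the travel order is the reverse of the
    trajector-landmark order.
    """
    matched = {}
    for triplet in triplets:
        trajector, _, landmark = triplet
        for edge in edges:
            if set(edge) == {trajector, landmark}:
                matched.setdefault(edge, []).append(triplet)
    return matched


sentence = ("If you have time for only one religious shrine then make the trek "
            "to the Batu Caves about 13 kilometers north of KL")
triplets = extract_triplets(sentence, {"Batu Caves", "KL"})
matched = match_to_edges(triplets, [("KL", "Batu Caves")])
```

In this sketch the travel sequence runs from KL to the Batu Caves while the triplet names the Batu Caves as trajector, which is exactly the order conflict the matching step has to tolerate.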
Mining opinion information for node weighting
The second type of knowledge to be mined from travel blogs concerns the objects, that is, the nodes of the POI graph. The idea is to perform subjectivity analysis of the travel blog content to determine opinions regarding POIs. This process requires the completion of three tasks. First, POIs should be identified as named entities. Second, certain types of entity modifier dependencies are determined, thereby revealing opinionated information related to a POI. Third, the orientation of the opinion should be determined and assigned to one of a few weighting categories, such as positive, neutral, and negative. In this study, POIs are recognized as entities using the constructed gazetteer. Subsequently, dependency parsing will be performed to extract the adjective, adverb, and nominal subject modifiers that correspond to POIs. These modifier dependencies are selected because they can potentially provide a type of opinion related to the modified object. The polarity of the opinion exhibited by the extracted modifiers will be computed using a subjectivity lexicon and assigned to one of the three classes in
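The third task, assigning an opinion orientation to a POI, can be sketched as follows. This is only an illustration under assumptions: the modifiers are taken as already extracted (the dependency parsing step is not shown), the tiny `LEXICON` is a made-up stand-in for a real subjectivity lexicon, and the rule of summing word polarities and thresholding at zero is an assumed aggregation, not one stated in the text.

```python
# Hypothetical subjectivity lexicon: +1 for positive, -1 for negative words.
LEXICON = {"stunning": 1, "beautiful": 1, "crowded": -1, "dirty": -1}


def classify_poi_opinion(modifiers, lexicon=LEXICON):
    """Map the modifiers attached to one POI to positive, neutral, or negative.

    Assumption: polarities are summed; words missing from the lexicon count as 0.
    """
    score = sum(lexicon.get(word.lower(), 0) for word in modifiers)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up modifier lists per POI, as a dependency parser might return them.
poi_modifiers = {
    "POI A": ["stunning", "crowded", "beautiful"],
    "POI B": ["busy"],
    "POI C": ["dirty"],
}
labels = {poi: classify_poi_opinion(mods) for poi, mods in poi_modifiers.items()}
```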
A view of the TripAdvisor interface showing bubble ratings and review counts for KL attractions.
The final step is to define the node and edge weighting functions for the multi-criteria weighted POI graph. These functions merge the first level of weights obtained through FSPM with the second level of weights obtained through spatial relation extraction for the edges and opinion polarity computation for the nodes. The edge-weighting function accumulates the correlation score with the numerically coded spatial information to provide a final weight for each route. The node-weighting function aggregates the POI popularity and collective subjectivity score retrieved from the blog data set, along with the number of reviews and bubble ratings retrieved from credible tourism websites. The website measures are included to mitigate any bias that may result from bloggers’ personal preferences or a few unpleasant incidents along the journey. The resulting weighting scheme is therefore a function of both qualitative and quantitative parameters, obtained by merging multiple attributes. Given the results of the individual stages, we defined Algorithm 1 to elaborate the step-wise construction of the multi-criteria weighted POI graph.
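A minimal sketch of the two weighting functions is given below. The exact formulas are not reproduced here, so several choices are assumptions: both functions are written as plain sums, all inputs are taken to be pre-scaled to the range 0 to 1, and the spatial information is coded as 1.0 when a relation was extracted for the route and 0.0 otherwise. The function names and the sample values are hypothetical.

```python
def node_weight(popularity, subjectivity, review_count, bubble_rating):
    """Consolidated POI weight from four parameters.

    Assumption: inputs are already normalized to [0, 1] and simply summed.
    """
    return popularity + subjectivity + review_count + bubble_rating


def edge_weight(correlation, has_spatial_relation):
    """Consolidated route weight from two parameters.

    Assumption: spatial information is coded as 1.0 if a relation exists, else 0.0.
    """
    spatial_code = 1.0 if has_spatial_relation else 0.0
    return correlation + spatial_code


# Made-up, pre-normalized values for two illustrative POIs and one route.
node_weights = {
    "POI A": node_weight(0.5, 0.25, 0.5, 0.75),
    "POI B": node_weight(0.25, 0.0, 0.25, 0.5),
}
route_weight = edge_weight(0.5, True)

# Higher weight means a more preferred choice.
ranked = sorted(node_weights, key=node_weights.get, reverse=True)
```

Summation keeps the contribution of each criterion easy to inspect; a weighted or multiplicative combination would be an equally plausible design under the same inputs.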
Pre-processed blog entries
Popular travel sequences
Equation (3.2.4) shows the node-weighting formula, which is defined as a summation over four parameters, where
Equation (3.2.4) shows the edge-weighting formula, which is defined as a function over two parameters, where
A simple working example in this section is appropriate to explain the working mechanism and expected outcome of the multi-criteria weighted POI graph framework. Accordingly, we use the notations provided in the problem definition phase to consider the contents of gazetteer
Popular routes with spatial information
Multi-criteria weight for routes (edges)
Frequency-weighted POI graph.
Multi-criteria weight for POIs (nodes)
For the example blog entries, the next task is to extract spatial relations between POIs (see Table 4). Given that the correlation weights and spatial information are inputs for Eq. (3.2.4), we obtain the final consolidated weight for each edge of the POI graph (see Table 5). For the multi-criteria weighting of nodes, a set of parameters should be calculated based on Eq. (3.2.4). A final consolidated weight is computed for the example blog entries
The preceding illustration leads to the visualization of a simple multi-criteria weighted POI graph, as shown in Fig. 4.
Multi-criteria weighted POI graph.
We realize that a subjective user study is a crucial requirement to compare the knowledge expressivity and quality of existing POI graphs with the proposed multi-criteria weighted POI graph. Such a study would establish the feasibility of our approach for travel decision-making in the real world. However, the effectiveness of the resultant graph in Fig. 4 can be perceived by comparing it with the findings of [12, 15, 16], where the outcome is a conventional FSPM-based POI graph (see Fig. 3 for the similarity). We have illustrated the example of tourist points located in a single tourism area; hence, our approach is able to populate the precise spatial linkages between them, in contrast to the word network proposed in [13], which contains only the detected tourism areas. Lastly, our approach enables the determination and visualization of bloggers’ opinions aggregated with credible measurements; this result is a significant enhancement over previous POI graphs, in which local features are the only type of semantic information mined and visualized, either for routes [12] or for tourist points [13, 15, 16].
This study proposed the framework of a multi-criteria weighted POI graph mined from travel blogs. This model features a blend of content and narrative analysis techniques of travel blog mining by incorporating sequential, spatial, and opinion information related to POIs and routes.
Multi-criteria weighting is necessary for two reasons. First, it will result in a substantially knowledge-equipped graph. Second, it will experimentally support the underlying assumptions related to travel blog text. Recall that, in an FSPM network, frequent mentions of a POI represent its popularity or preference. The actual computation of the POI opinion polarities and popularity scores strengthens this assumption. By determining spatial indicators among POIs, the framework empirically supports the second assumption, that POIs mentioned together in a text may be geographically near each other. Accordingly, the proposed model provides new insight into the existing practices of FSPM for travel paths by enriching routes with spatial guidance and nodes with sentiment information. These contributions are significant for nonprofessionals and industries. A POI graph equipped with popularity, location, and opinion knowledge regarding POIs and routes will illustrate the tourism profile of a region and help travel decision-makers easily observe the overall travelling trend of numerous bloggers through the routes they follow. This observation will enable them to plan their personalized and experience-based itineraries. Apart from travelers, the proposed framework holds potential opportunities for destination marketing organizations (DMOs) and travel service providers. The emergence of user-generated content (UGC), particularly in the form of weblogs and reviews, has an inevitable impact on people’s travel decision-making [31, 32], which can cause certain businesses to suffer compared with others. Therefore, research applications that target travel blog analysis will enable industries to gain insights into and analyze travelling behavior, realize the impact of electronic word of mouth (eWOM) on destination image and identity creation, and propose experience-based travel packages.
The proposed model involves the implementation of different techniques, namely, relation extraction and opinion mining of travel blog data. The closely related baseline system [20] has extracted short route spatial indicators, whereas the proposed framework targets all categories of spatial relations. In the case of opinion extraction from UGC, there is a strong tendency to mine other forms of crowdsourced unstructured textual data, such as tweets and reviews [33, 34], whereas travel blogs have been neglected. Furthermore, out of all the tourism entities, including attractions or POIs, accommodation, and transport, most research clearly concentrates on the analysis of accommodation and dining-related services and facilities, such as hotels and restaurants [35, 36, 37]. Our framework computes opinions specific to a POI using travel blogs, which is beneficial for deciding the next destination to visit along the way. To strike a balance between qualitative and quantitative parameters, we used a fairly straightforward weighting mechanism at this stage. Advanced research techniques, such as multiple attribute decision-making [38, 39] and multiple criteria decision analysis [40], can be practically and effectively applied to evaluate the suggested parameters or other trip planning parameters, such as cost and time. Therefore, both motivation and room for further research exist from a methodological perspective.
The proposed framework serves as a foundation model of the trip planning graph. At the domain level, the framework holds the possibility of integration and extension to existing travel technologies, such as decision support systems for routing and trip planning, recommender systems, mobile tour guides, personalized trip schedulers, optimal itinerary planners, and multimedia systems for tour scheduling.
Acknowledgments
This paper is an extended version of the paper presented at the First EAI International Conference on Computer Science and Engineering, held on November 11–12, 2016, in Penang, Malaysia. The authors would also like to acknowledge Tourism Malaysia for giving permission to scrape part of the content from its official website. This research was supported by USM Research University Grant (1001/PKOMP/811335: Mining Unstructured Web Data for Tour Itineraries Construction), Universiti Sains Malaysia.
