Framework of blog data based multi-criteria weighted points of interest graph for trip planning

Abstract

Planning an efficient itinerary remains laborious and complicated, because the bulk of online information makes it difficult to select places and decide their visiting order. A trip planning system therefore generally aims to propose popular points of interest (POIs) and routes in a region, structured as a POI graph. The proposed framework uses travel blogs to accumulate this information. Existing approaches have employed frequent pattern mining to construct a POI graph that yields frequency-weighted POIs and routes for recommendation. In contrast to this conventional POI graph, the proposed model incorporates a multi-criteria weighting scheme. To facilitate travel decision-making, the framework treats the frequency measure as an initial weight and further processes blog entries to extract opinions about POIs and spatial information between POIs, which weight the nodes and edges, respectively. A final consolidated weight for each component is computed using defined functions. The contribution is significant for ordinary travelers who want to plan their itineraries efficiently, as well as for destination management organizations seeking to understand travel trends and to design tourism products and strategies accordingly.

Keywords

Travel blogs; frequent pattern mining; spatial relations; opinion mining

1. Introduction

The task of planning an efficient itinerary is laborious and requires collecting information from several online sources, for example, reading descriptions and reviews on official websites and social media, searching images, and exploring Google Maps. The process is exhausting and complicated, because the bulk of online information makes it difficult to select appropriate attractions and popular travel patterns under temporal and budget constraints. From a research perspective, this problem can be divided into two components: (1) extraction of travel information and (2) planning of trip paths. A trip planning system thus generally aims to propose popular points of interest (POIs) and routes in a region, and structures them in the form of a POI graph. To date, geo-tagged Flickr images [1, 2, 3], Global Positioning System (GPS) trajectories [4, 5, 6], and social media check-ins [7, 8, 9] have been used effectively as travel information sources, because their geospatial metadata reveal the travel patterns of millions of travelers. However, such patterns are difficult to decipher from unstructured text descriptions, such as travel blogs and guides, which lack geo-temporal tags and structural regularity. This study uses travel blogs to accumulate travel-related information and construct a POI graph.

1.1 Theoretical foundations of travel blog mining

The research on travel blog text is an interdisciplinary domain and can be broadly categorized into two mainstreams, namely, content analysis and narrative analysis [10]. The primary concern of content analysis is to determine the activities, services, and features associated with a place, as well as to develop destination image and perception by understanding bloggers’ constructed identities of the visited places. In narrative analysis, these identities are interpreted in a scene-recall manner to determine the meaning of bloggers’ travel experience in the time and space dimensions.

Existing frameworks have summarized travel blog content to extract semantic information, such as place-related local features, activities, and things to do, and have also addressed destination image and place name disambiguation, among others. However, only a few approaches have analyzed these writings to determine bloggers' travel patterns using frequent sequential pattern mining (FSPM). FSPM is the standard approach to mining travel patterns from blog data and constructing a POI graph whose frequency-weighted nodes and edges represent popular POIs and routes, respectively. The underlying assumption that motivates this method is that travel blogs are personal online diaries [10, 11, 12], in which facts and events tend to be mentioned in the order in which they occurred. Hence, the order in which place names appear in a travel blog can be interpreted as the blogger's actual travel pattern [12], and the frequency of such patterns is considered a measure of their popularity. Similarly, another assumption states that frequent simultaneous mentions of two POIs indicate their geographic proximity [13].

1.2 Problem statement

Although these assumptions are perceived to be realistic, they still require empirical support through language engineering techniques in order to construct knowledgeable POI graphs that facilitate travel decision-making. Accordingly, treating frequent mentions as a measure of POI popularity while also mining bloggers' opinions and sentiments toward each POI would strengthen the argument. Similarly, treating simultaneous mentions as an indication of geographic proximity while also extracting spatial indicators among POIs would be beneficial, because bloggers generally provide proximity information regarding POIs in their writing. Consider the following snippet from a travelogue [14]: besides the travel pattern and spatial information, it also reveals the blogger's opinions regarding POIs:

KLCC is the shiny new centre of Kuala Lumpur best known for its iconic Petronas Towers … The nearby Syakirin Mosque is worth a visit, the shopping and food courts of Suria KLCC mall, the surrounding KLCC Park …
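The travel pattern in such a snippet can be approximated by scanning the text with a POI gazetteer and keeping the matches in their order of appearance. The Python sketch below illustrates this order-of-mention assumption; the five-entry gazetteer and the function name `poi_sequence` are hypothetical, and the sketch does not handle spelling variants, abbreviations, or ambiguous names.

```python
import re

# Hypothetical POI gazetteer for the snippet above. "Kuala Lumpur" is left out
# on purpose: it is the city, not a point of interest, so it is filtered away.
GAZETTEER = ["KLCC", "Petronas Towers", "Syakirin Mosque", "Suria KLCC", "KLCC Park"]


def poi_sequence(text, gazetteer=GAZETTEER):
    """Return the gazetteer POIs mentioned in `text`, in order of appearance."""
    # Longer names are tried first so that "KLCC Park" is not cut down to "KLCC".
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    return [match.group(0) for match in pattern.finditer(text)]


snippet = (
    "KLCC is the shiny new centre of Kuala Lumpur best known for its iconic "
    "Petronas Towers ... The nearby Syakirin Mosque is worth a visit, the shopping "
    "and food courts of Suria KLCC mall, the surrounding KLCC Park ..."
)
print(poi_sequence(snippet))
```

For this snippet the sequence runs from KLCC through Petronas Towers, Syakirin Mosque, and Suria KLCC to KLCC Park, which the order-of-mention assumption reads as the blogger's travel pattern.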

To incorporate these extensions, this research proposes a multi-criteria weighted POI graph, in which the FSPM-based frequency-weighted POI graph is subjected to a second level of weighting. The subsequent phase extracts spatial relations among POIs from the blog data set. The extracted relation triples are matched with the connections of the POI graph to assign each relation to the correct pair of POIs, which yields the second level of weight for the edges. For the nodes, the weight is computed from each POI's opinion polarity in the travel blog data set combined with the rating obtained from credible travel websites. The final stage combines the first and second levels of weights for the nodes and edges using the defined functions and generates a single consolidated numeric weight for each node and edge of the POI graph. The higher the weight of a component, the more preferable it is, either as a POI or as a route.

This study describes the foundational underpinnings and framework details of the proposed multi-criteria weighted graph model. The rest of this paper is organized as follows. Section 2 reviews the related literature. Section 3 presents the problem definition and framework details. Section 4 presents an example that illustrates the working mechanism. Lastly, Section 5 concludes the study.

2. Related work

2.1 Mining popular POIs and routes as a POI graph

Research on mining POI graphs from travel blogs includes a few notable studies. [12] contributed a web-based multimedia tour guide that uses sequential pattern mining to determine popular tourist routes based on a POI selection, searches for relevant blog entries, and extracts multimedia content along with the route context. Similarly, [15] used FSPM and compact pattern mining to explore popular spots and routes from structured tourism blogs; the framework first finds frequent departure cities, popular POIs, and their correlations for a given destination, and then applies compact pattern mining to determine the services associated with the destination. The system proposed in [16, 17] extracts popular POIs and their correlations using FSPM and location correlation analysis, and introduces a Max-confidence metric based on frequent item-set mining to extract popular things of interest (ToIs) for each POI. Recently, [13] implemented a notable enhancement of [16]. This framework devises a word network based on frequent item-set mining, which contains all popular POIs, their correlations with other POIs, and their connections to common local features. The extensive word network is segmented into tourism areas based on the assumption that frequent simultaneous mentions of two POIs in blog entries indicate their geographic proximity.

2.2 Mining spatial information

This section reviews approaches that have been used to extract spatial indicators from travel blog data. None of these systems was developed primarily to support trip planning, whereas the trip-planning studies described in Section 2.1 include no spatial information in their POI graphs. In [18], a framework is proposed to extract geospatial content from travel guides written by expert travelers. The system performs a kind of map reconstruction by first extracting spatial information using a rule-based information extraction module and a location ontology, then geocoding the identified place names, and finally connecting them based on the extracted route information. In [19], a methodology to construct place graphs is proposed: web sources are harvested to collect descriptions, which are parsed to extract spatial relations between pairs of locations. The resulting place graphs are aggregated into a composite graph by using typographic, linguistic, and spatial similarities for node merging. Lastly, [20] proposed extracting distance and topological relations from crowdsourced geospatial data (i.e., travel blogs) using custom-defined string patterns and syntactic rules, resulting in a relationship graph whose nodes represent POIs and whose edges represent at least one spatial relation between two POIs.

2.3 Mining semantic information

Semantic information is the type of information most extensively sourced from travel blogs. Although spatial description is one of the most basic forms of semantics, the semantic models of tourism information mining described below have largely focused on local features, things to do, and activities. To begin with, [21] developed a methodology to visualize tourists' experiences by linking their activities and evaluations to a particular location at a particular time using association rule mining. Moreover, [22] used a location topic model [23] for destination recommendation based on similarity and relevance to a user's query. This system also performs destination summarization, generating representative tags and related text snippets for each location, as well as travelogue enrichment, which identifies the informative parts of a travelogue and associates them with images [24]. By contrast, [25] exploited place semantics to reveal bloggers' moods and emotions toward places. The travel blog corpus is analyzed for sentiment at the paragraph level, and accumulating this spatially distributed sentiment information yields a geospatial opinion map that presents an opinion-oriented view of a generally large region of interest. The framework of [26, 27] extracted and disambiguated place names from travel blogs using a dictionary-based method and dependency structure analysis of text containing place names, focusing on case-marking particles that indicate where an action occurred. Similarly, [28] performed a structural analysis that resolved blog content into small components, each containing real location entities. The entities are disambiguated to form a partonomy of the destination, and a second stage generates a concept network equipped with descriptive and semantic expressions related to each entity.

The framework of [29] described a place as a cognitive region based on a range of activities and proposed an abstract semantic model. This model is a semantic graph in which a place node may branch into further place nodes or geo-features. Dependency parsing of the extracted descriptions that contain geo-feature information yields activity sub-trees related to each geo-feature, and all sub-trees are aggregated to provide a complete semantic representation of the place.

Figure 1.

Architecture of the proposed framework.

3. Multi-criteria weighted POI graph

3.1 Problem definition

Given a data set $B$ of $n$ blog entries, transformed through pre-processing and gazetteer matching into a set of vectors $X' = \{x'_1, x'_2, \ldots, x'_n\}$, where each vector $x'_i = \{x_{i1}, x_{i2}, \ldots\}$ holds the POIs mentioned in entry $i$ in their order of mention, and a minimum support threshold $s\_min$, find all frequent ordered sequences of $n$ items $\{x_{ij}, x_{i(j+1)}, \ldots, x_{i(j+n-1)}\}$ whose relative support is no less than $s\_min$, in order to construct a POI graph $G = (V, E)$, where each node in $V$ represents a $\text{POI}_i$ that is connected to a $\text{POI}_j$ through a directed edge $e_{ij} \in E$. The initial sets $V$ and $E$ have the following characteristics:

1) $\forall \text{POI}_i \in V$, $w(\text{POI}_i)$ represents the popularity of $\text{POI}_i$, measured as its frequency of occurrence in the transaction database for $n = 1$, where $w(\text{POI}_i) \geq s\_min$.
2) $\forall (e_{ij} = \text{POI}_i \to \text{POI}_j) \in E$, $w(e_{ij})$ represents the popularity of the correlation between $\text{POI}_i$ and $\text{POI}_j$, i.e., a frequent sequential connection, where $w(e_{ij}) \geq s\_min$.
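For reference, the relative support used in this definition can be written as follows, assuming the standard definition from sequential pattern mining, where $s \sqsubseteq x'_i$ means that the POIs of $s$ occur in $x'_i$ in the same order. The worked value uses the five entries listed in Table 2, in which National Mosque precedes Islamic Arts Museum in $x'_1$, $x'_2$, and $x'_3$ only.

$$\operatorname{supp}(s) = \frac{\left|\{\, x'_i \in X' : s \sqsubseteq x'_i \,\}\right|}{|X'|}, \qquad s \text{ is frequent} \iff \operatorname{supp}(s) \geq s\_min$$

$$\operatorname{supp}(\{\text{National Mosque}, \text{Islamic Arts Museum}\}) = \frac{3}{5} = 60\%$$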

Given a spatial-relation-annotated corpus $C$ obtained by pre-processing data set $B$, and a set of spatial lexical patterns transformed into a rule set $R$, extract the set of spatial triplet instances $S = \{S_1, S_2, \ldots, S_n\}$ with $S_T = (\text{POI}_u, \text{SR}_k, \text{POI}_v)$, where $\text{POI}_u$ is the trajector, i.e., the POI whose location is described with reference to $\text{POI}_v$, the landmark or relatum, and $\text{SR}_k$ denotes the spatial relation between $\text{POI}_u$ and $\text{POI}_v$. Then, $\forall S_T \in S$, match the vertex pairs $(e_{ij} = \text{POI}_i \to \text{POI}_j)$ with the trajector–landmark pairs $(\text{SR}_k = \text{POI}_u \to \text{POI}_v)$ to assign spatial information to $G$. For the next level, define a numeric weighting function that indicates the presence or absence of spatial information for an edge $e_{ij}$ as $F: w_{uv} \to \{0, 1, 2\}$, where 0 represents the absence of any spatial information, and 1 and 2 represent the presence of one type and of more than one type of spatial relation for $e_{ij}$, respectively.

Given a data set $B$ of $n$ blog entries, identify POIs as entities using the POI gazetteer $GL$ and, $\forall \text{POI}_x$, extract the modifier dependency triplets $c\langle a, b\rangle$, where $a$ and $b$ represent the modifier and the object, respectively, and $c$ is the type of dependency. Given a subjectivity lexicon $L$, $\forall \text{POI}_x$, compute the aggregated polarity score $w_x$ and assign it a class using the function $F': w_x \to \{-1, 0, 1\}$, where 0 corresponds to a neutral opinion, and $-1$ and $+1$ indicate negative and positive polarity, respectively.

For the multi-criteria weighted POI graph $G' = (V', E')$, define the weighting functions $W_{node}$ and $W_{edge}$ as follows:

1) $\forall \text{POI}'_i \in V'$, $W_{node} = \sum\{w(\text{POI}_i), w_x, R\_count, B\_rate\}$, where $R\_count$ is the number of reviews and $B\_rate$ is the bubble rating retrieved from travel websites.
2) $\forall (e'_{ij} = \text{POI}'_i \to \text{POI}'_j) \in E'$, $W_{edge} = \sum\{w(e_{ij}), w_{uv}\}$.

Table 1
Categories of qualitative spatial relations

Type of relation	Example spatial indicators
Topological	Surround, inside, outside, across
Directional	Left, right, north, south east, centre
Distance	Near, far, 2 kilometers, middle

3.2 Framework description

3.2.1 Mining frequency-weighted POI graph

Figure 1 illustrates the overall framework of the multi-criteria weighted POI graph. The first task is to construct a POI graph, which requires collecting popular POIs from travel websites. This results in a curated POI gazetteer that is used during the pre-processing of blog pages to eliminate non-geographic terms while retaining the mentions of POIs. A data set $B$ of $n$ blog entries is crawled and passed through a natural language pre-processing stage, yielding a set of processed vectors $X = \{x_1, x_2, \ldots, x_n\}$. The pre-processing further applies gazetteer-based filtering of non-geographic terms to $X$, transforming it into $X' = \{x'_1, x'_2, \ldots, x'_n\}$. Following the problem definition, the frequent 1-item set $F^1(X')$, which represents node (POI) popularity, is then mined from $X'$ by retaining the POIs whose support is at least $s\_min$. $F^1(X')$ is used to generate candidate frequent $n$-item sets. In this study, $n = 2$; hence, the retrieved correlations are frequent sequential pairs of POIs [15], and a spatial relation extracted between two POIs in the next phase can be associated with the corresponding pair of the sequential pattern obtained in this phase. Through FSPM, the frequently visited POIs and their correlations are determined, where the correlation weight indicates route popularity, i.e., the frequency of its occurrence.
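The mining step can be sketched in Python as follows, using the five entries of Table 2 as $X'$. This is a minimal illustration rather than a full FSPM algorithm such as PrefixSpan: because $n = 2$, it only counts ordered POI pairs, and it assumes that an entry supports the pair (a, b) when b is mentioned immediately after a, with each entry counted at most once per pair. The function names are hypothetical.

```python
from collections import Counter

# Pre-processed blog entries X' from Table 2 (POIs in order of mention).
X_PRIME = [
    ["National Mosque", "Islamic Arts Museum", "Lake Gardens", "KL Bird Park", "National Planetarium"],
    ["National Mosque", "Islamic Arts Museum", "KL Bird Park", "Lake Gardens", "National Planetarium"],
    ["Lake Gardens", "National Mosque", "Islamic Arts Museum", "KL Bird Park"],
    ["Islamic Arts Museum", "Lake Gardens", "National Mosque", "National Planetarium"],
    ["National Planetarium", "Lake Gardens", "Islamic Arts Museum", "KL Bird Park", "National Mosque"],
]


def frequent_items(entries, s_min):
    """F^1(X'): POIs whose relative support (share of entries mentioning them) >= s_min."""
    counts = Counter(poi for entry in entries for poi in set(entry))
    return {poi: c / len(entries) for poi, c in counts.items() if c / len(entries) >= s_min}


def frequent_pairs(entries, items, s_min):
    """Frequent ordered POI pairs (n = 2), built only from frequent items.

    Assumption: an entry supports (a, b) when b directly follows a after
    non-frequent POIs have been dropped; each entry is counted once per pair.
    """
    counts = Counter()
    for entry in entries:
        kept = [poi for poi in entry if poi in items]
        counts.update({(a, b) for a, b in zip(kept, kept[1:]) if a != b})
    return {pair: c / len(entries) for pair, c in counts.items() if c / len(entries) >= s_min}


F1 = frequent_items(X_PRIME, s_min=0.6)      # node (POI) popularity
F2 = frequent_pairs(X_PRIME, F1, s_min=0.4)  # edge (route) popularity
for (a, b), support in sorted(F2.items()):
    print(f"{a} -> {b}: {support:.0%}")
```

With the thresholds used in Section 4, all five POIs are frequent and the sketch returns the same four pairs as Table 3. Under this consecutive-mention rule, however, {Islamic Arts Museum, KL Bird Park} is supported by $x'_2$, $x'_3$, and $x'_5$ (60% rather than the 40% listed in Table 3), so the counting rule behind Table 3 may differ from the one assumed here.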

3.2.2 Mining spatial information for edge weighting

After the POI graph is constructed, the next phase populates its components with knowledge mined from travel blogs. The edge component is considered first; the objective is to extract the spatial indicators that belong to POI correlations in order to enrich routes with navigation information. Spatial relations are categorized into three types, namely, topological, directional, and distance relations [30]; Table 1 provides a few examples. In general, spatial relations are described in text using a variety of linguistic patterns, with directional relations being the most commonly expressed form of spatial information. Hence, a set of spatial lexical patterns is defined and transformed into a rule base using POI-gazetteer-based named entity recognition and a corpus annotated with spatial relations. These relations are extracted as triplets of the form (trajector, spatial relation, landmark), as described in the problem definition. For example, the sentence “If you have time for only one religious shrine then make the trek to the Batu Caves about 13 kilometers north of KL” contains a “direction + distance” spatial relation that can be extracted with the following pattern:

“Verb + $\text{POI}_i$ + preposition + quantifier + unit + direction word(s) + $\text{POI}_j$”
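A rule of this kind can be approximated with a regular expression. The Python sketch below is a simplification under stated assumptions: it replaces POS tags with small hypothetical word lists, omits the leading verb slot, accepts the “of” that links the direction word to the landmark, and emits one triplet per relation type in the (trajector, spatial relation, landmark) form of the problem definition. `SpatialTriplet` and `extract_direction_distance` are illustrative names.

```python
import re
from typing import NamedTuple


class SpatialTriplet(NamedTuple):
    trajector: str      # POI whose location is described
    relation_type: str  # "distance", "directional" or "topological"
    indicator: str      # spatial indicator as written in the text
    landmark: str       # reference POI (relatum)


# Hypothetical gazetteer and word lists standing in for POS-based slots.
GAZETTEER = ["Batu Caves", "Kuala Lumpur", "KL"]
POI = "|".join(map(re.escape, sorted(GAZETTEER, key=len, reverse=True)))

DIRECTION_DISTANCE_RULE = re.compile(
    rf"\b(?P<trajector>{POI})\s+"
    r"(?:about|around|approximately|some)\s+"          # preposition
    r"(?P<quantity>\d+(?:\.\d+)?)\s*"                  # quantifier
    r"(?P<unit>kilometers?|kilometres?|km|miles?)\s+"  # unit
    r"(?P<direction>(?:north|south)(?:-?(?:east|west))?|east|west)\s+of\s+"  # direction word(s)
    rf"(?P<landmark>{POI})\b",
    re.IGNORECASE,
)


def extract_direction_distance(sentence):
    """Return a distance triplet and a directional triplet for each rule match."""
    triplets = []
    for m in DIRECTION_DISTANCE_RULE.finditer(sentence):
        trajector, landmark = m.group("trajector"), m.group("landmark")
        distance = f"{m.group('quantity')} {m.group('unit')}"
        triplets.append(SpatialTriplet(trajector, "distance", distance, landmark))
        triplets.append(SpatialTriplet(trajector, "directional", m.group("direction"), landmark))
    return triplets


sentence = ("If you have time for only one religious shrine then make the trek "
            "to the Batu Caves about 13 kilometers north of KL")
for triplet in extract_direction_distance(sentence):
    print(triplet)
```

Splitting one match into a distance triplet and a directional triplet keeps each relation type separate, which is what the later labelling step counts.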

The trajector–landmark pairs in the extracted triplets are then matched with the sequential POI pairs to associate them with the corresponding spatial relations. The order of the target and reference objects in a spatial relation may conflict with the sequential travel order of the POIs. Therefore, this order is not distinguished during matching for symmetric relations, such as distance and some topological relations. For example, an extracted triplet such as “Batu Caves, 13 Kilometers, KL” should be matched with the sequential pattern “KL, Batu Caves”. A triplet such as “Batu Caves, North, KL”, however, expresses a directional relation: it is matched with the same sequential pattern, but the order of its target and reference objects must be retained, because reversing them would invert the direction. The example also shows that more than one type of spatial relation may belong to a single sequential connection. To aggregate this information for the next level, a numeric labelling function is defined to represent the presence or absence of spatial information for a particular edge. Because three distinct classes of spatial relations exist, a route may carry more than one of them; the function F, as defined in the problem definition, therefore returns 0 when there is no spatial information between two POIs, 1 when one type of spatial relation is present, and 2 when more than one type is present.
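The matching and labelling steps can be sketched as follows. Matching compares the two POIs as an unordered pair, so the travel order of the sequential pattern does not matter, while each matched triplet is kept unchanged, so a directional relation retains its own trajector–landmark order. The labelling function caps the number of distinct relation types at 2, following the definition of F. The triplets reuse the Batu Caves example; the function names are illustrative.

```python
from typing import NamedTuple


class SpatialTriplet(NamedTuple):
    trajector: str
    relation_type: str  # "distance", "directional" or "topological"
    indicator: str
    landmark: str


def match_edge(edge, triplets):
    """Triplets whose trajector and landmark are the two POIs of edge (POI_i, POI_j).

    The pair is compared without regard to order, since the travel order may
    differ from the trajector-landmark order; matched triplets are returned
    unchanged, so a directional relation keeps its original orientation.
    """
    return [t for t in triplets if {t.trajector, t.landmark} == set(edge)]


def spatial_information(matched):
    """Labelling function F: 0 = none, 1 = one relation type, 2 = more than one."""
    return min(len({t.relation_type for t in matched}), 2)


triplets = [
    SpatialTriplet("Batu Caves", "distance", "13 kilometers", "KL"),
    SpatialTriplet("Batu Caves", "directional", "north", "KL"),
]
matched = match_edge(("KL", "Batu Caves"), triplets)  # sequential (travel) order
for t in matched:
    print(f"{t.trajector} [{t.relation_type}: {t.indicator}] {t.landmark}")
print("Spatial_Information =", spatial_information(matched))
```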

3.2.3 Mining opinion information for node weighting

The second type of knowledge to be mined from travel blogs concerns the objects, that is, the nodes of the POI graph. The idea is to perform subjectivity analysis of travel blog content to determine opinions regarding POIs. This requires three tasks. First, POIs are identified as named entities. Second, certain types of entity modifier dependencies that reveal opinionated information about a POI are determined. Third, the orientation of each opinion is determined and assigned to one of a few weighting categories, such as positive, neutral, and negative. In this study, POIs are recognized as entities using the constructed gazetteer. Dependency parsing is then performed to extract the adjectival, adverbial, and nominal subject modifiers that correspond to POIs; these dependencies are selected because they can potentially convey an opinion about the modified object. The polarity of the opinion expressed by the extracted modifiers is computed using a subjectivity lexicon and assigned to one of the three classes of $F'$. For each POI, a collective subjectivity score is computed over all blog entries.
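The scoring step can be sketched as below, assuming the dependency triples $c\langle a, b\rangle$ have already been produced by a parser and are given as (c, a, b) tuples. The lexicon values, the example triples, and the width of the neutral band are hypothetical, and taking the mean score per POI is one simple aggregation choice among several; dependency labels follow Universal Dependencies naming.

```python
from collections import defaultdict
from statistics import mean

# Opinion-bearing dependency types named in the text (adjectival and adverbial
# modifiers, nominal subjects), written with Universal Dependencies labels.
OPINION_DEPS = {"amod", "advmod", "nsubj"}

# Hypothetical toy subjectivity lexicon with prior scores in [-1, 1].
LEXICON = {"iconic": 0.8, "stunning": 0.9, "worth": 0.6, "crowded": -0.4, "boring": -0.7}


def poi_polarity(dependencies, lexicon=LEXICON):
    """Mean lexicon score per POI from (c, a, b) triples: c = dependency type,
    a = modifier, b = the POI it relates to."""
    scores = defaultdict(list)
    for dep_type, modifier, poi in dependencies:
        word = modifier.lower()
        if dep_type in OPINION_DEPS and word in lexicon:
            scores[poi].append(lexicon[word])
    return {poi: mean(values) for poi, values in scores.items()}


def polarity_class(score, neutral_band=0.1):
    """Class function F': +1 positive, -1 negative, 0 neutral (band width assumed)."""
    if score > neutral_band:
        return 1
    if score < -neutral_band:
        return -1
    return 0


# Hypothetical parser output for sentences about the KLCC area.
deps = [
    ("amod", "iconic", "Petronas Towers"),
    ("amod", "crowded", "Petronas Towers"),
    ("nsubj", "worth", "Syakirin Mosque"),
    ("det", "the", "KLCC Park"),  # not opinion-bearing, ignored
]
for poi, score in poi_polarity(deps).items():
    print(poi, round(score, 2), polarity_class(score))
```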

Figure 2.

A view of the TripAdvisor interface showing bubble ratings and review counts for KL attractions.

3.2.4 Multi-criteria weighting

The final step defines the node- and edge-weighting functions of the multi-criteria weighted POI graph. These functions merge the first level of weights, obtained through FSPM, with the second level of weights, obtained through spatial relation extraction for the edges and opinion polarity computation for the nodes. The edge-weighting function combines the correlation score with the numerically coded spatial information to give a final weight to each route. The node-weighting function aggregates the POI popularity and the collective subjectivity score retrieved from the blog data set with the number of reviews and the bubble rating retrieved from credible tourism websites. The website measures are included to mitigate bias that may arise from bloggers' personal preferences or from isolated unpleasant incidents during a journey. The resulting weighting scheme is thus a function of qualitative and quantitative parameters obtained by merging multiple attributes. Based on the results of the individual stages, Algorithm 1 describes the step-wise construction of the multi-criteria weighted POI graph.

Algorithm 1: Construction of a multi-criteria weighted POI graph $G'$
1:	Input: POI graph $G = (V, E)$, spatial triplet set $S$, Polarity, Rating, No. of Reviews, Max. Reviews
2:	Output: multi-criteria weighted POI graph $G' = (V', E')$
3:	begin
4:	    foreach $\text{POI}_i \in V$ do
5:	        assign the corresponding Polarity to $\text{POI}_i$
6:	        retrieve Rating, No. of Reviews, and Max. Reviews for $\text{POI}_i$
7:	        extract $\text{Count}_{\text{FSPM}}$ and the degree of $\text{POI}_i$ from $G$
8:	        compute $\text{W}(\text{POI}'_i)$    // Eq. (1)
9:	        add $\text{POI}'_i$ to $V'$
10:	    end
11:	    foreach $e_{ij} \in E$ do
12:	        if the trajector and landmark of any triplet $S_T \in S$ match the frequent sequential pair $e_{ij} = (\text{POI}_i \to \text{POI}_j)$ then
13:	            assign the spatial indicator(s) of the matching triplet(s) to $e_{ij}$
14:	        end
15:	        extract the correlation value of $e_{ij}$ from $G$
16:	        compute $\text{W}(e'_{ij})$    // Eq. (2)
17:	        add $e'_{ij}$ to $E'$
18:	    end
19:	end

Table 2
Pre-processed blog entries

Vector	POI sequence
$x'_1$	{National Mosque, Islamic Arts Museum, Lake Gardens, KL Bird Park, National Planetarium}
$x'_2$	{National Mosque, Islamic Arts Museum, KL Bird Park, Lake Gardens, National Planetarium}
$x'_3$	{Lake Gardens, National Mosque, Islamic Arts Museum, KL Bird Park}
$x'_4$	{Islamic Arts Museum, Lake Gardens, National Mosque, National Planetarium}
$x'_5$	{National Planetarium, Lake Gardens, Islamic Arts Museum, KL Bird Park, National Mosque}

Table 3

Popular travel sequences

POI sequence ($n = 2$)	Correlation weight
{Lake Gardens, National Mosque}	40%
{National Mosque, Islamic Arts Museum}	60%
{Islamic Arts Museum, KL Bird Park}	40%
{Islamic Arts Museum, Lake Gardens}	40%

Eq. (1) gives the node-weighting formula, defined as a summation over four terms. Degree is the number of routes connected to $\text{POI}_i$, counting both incoming and outgoing edges, and serves as a measure of its popularity; $\text{Count}_{\text{FSPM}}$ is the total number of unique POIs in the FSPM transaction database. Polarity is the opinion score of $\text{POI}_i$ obtained through subjectivity analysis. Rating, No. of Reviews, and Max. Reviews represent popularity information obtained from credible travel websites, where Max. Reviews is the highest number of reviews received by any POI that has the same rating value as $\text{POI}_i$. This parameter can be understood through an example from TripAdvisor (see Fig. 2), where the Islamic Arts Museum and KLCC have the same rating on a 5-point scale but differ in the number of reviews posted by travelers. The function value ranges from 0 to 100: the first two terms contribute up to 50 points and the last two another 50 points, with each pair's 50 points distributed through factor-specific weights. Hence, $\text{w}_{\text{Degree}}$ and $\text{w}_{\text{Polarity}}$ are assigned values of 20 and 30, respectively, whereas $\text{w}_{\text{Rating}}$ and $\text{w}_{\text{Review}}$ are each assigned a value of 10; because $(\text{Rating} - 1)$ ranges from 0 to 4 on a 5-point scale, the rating term contributes up to 40 points and the review term up to 10.

$\displaystyle\text{W}(\text{POI}_{\text{i}}^{\prime})=\left(\frac{\text{Degree}}{\text{Count}_{\text{FSPM}}-1}\cdot\text{w}_{\text{Degree}}\right)+\left(\text{Polarity}\cdot\text{w}_{\text{Polarity}}\right)+\left((\text{Rating}-1)\cdot\text{w}_{\text{Rating}}\right)+\left(\frac{\text{No. of Reviews}}{\text{Max. Reviews}}\cdot\text{w}_{\text{Review}}\right)$ (1)
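The following Python sketch evaluates Eq. (1) with the weights stated above on the inputs listed in Table 6. The function name and the tuple layout are illustrative; Degree/(Count_FSPM − 1) and No. of Reviews/Max. Reviews are passed in as precomputed ratios, as they appear in Table 6.

```python
import math

# Factor weights as stated in the text (together they span a 0-100 scale).
W_DEGREE, W_POLARITY, W_RATING, W_REVIEW = 20, 30, 10, 10


def node_weight(degree_ratio, polarity, rating, review_ratio):
    """Eq. (1): degree_ratio = Degree / (Count_FSPM - 1),
    review_ratio = No. of Reviews / Max. Reviews, rating on a 1-5 bubble scale."""
    return (degree_ratio * W_DEGREE
            + polarity * W_POLARITY
            + (rating - 1) * W_RATING
            + review_ratio * W_REVIEW)


# Inputs from Table 6: (degree ratio, polarity, rating, review ratio).
table6 = {
    "National Mosque": (0.5, 0.6, 4, 0.27),
    "Islamic Arts Museum": (0.75, 0.8, 5, 1),
    "Lake Gardens": (0.5, 0.65, 4, 0.174),
    "KL Bird Park": (0.25, 0.6, 4, 1),
}
for poi, params in table6.items():
    w = node_weight(*params)
    print(f"{poi}: {w:.2f} (rounded up: {math.ceil(w)})")
```

The unrounded values are 60.7, 89, 61.24, and 63; rounding them up gives the integers listed in Table 6 (61, 89, 62, 63).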

Eq. (2) gives the edge-weighting formula, a function of two parameters: Correlation is the popularity of route $e_{ij}$, and Spatial_Information is the label assigned by F to the spatial indicators of $e_{ij}$ (0, 1, or 2). This function also ranges from 0 to 100, with $\text{w}_{\text{Correlation}}$ and $\text{w}_{\text{SI}}$ assigned values of 50 and 25, respectively, so that each of the two terms contributes up to 50 points.

$\displaystyle\text{W}(\text{e}_{\text{ij}}^{\prime})=\left(\text{Correlation}\cdot\text{w}_{\text{Correlation}}\right)+\left(\text{Spatial\_Information}\cdot\text{w}_{\text{SI}}\right)$ (2)
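A corresponding sketch of Eq. (2) is given below; the function name is illustrative, and the range check enforces the labels of F. The inputs pair each route's correlation weight from Table 3 with its Spatial_Information value from Table 4.

```python
W_CORRELATION, W_SI = 50, 25  # weights stated in the text


def edge_weight(correlation, spatial_information):
    """Eq. (2): correlation in [0, 1]; spatial_information is F's label in {0, 1, 2}."""
    if spatial_information not in (0, 1, 2):
        raise ValueError("Spatial_Information must be 0, 1 or 2")
    return correlation * W_CORRELATION + spatial_information * W_SI


# (Correlation from Table 3, Spatial_Information from Table 4) per route.
routes = {
    ("Lake Gardens", "National Mosque"): (0.4, 2),
    ("National Mosque", "Islamic Arts Museum"): (0.6, 1),
    ("Islamic Arts Museum", "KL Bird Park"): (0.4, 2),
    ("Islamic Arts Museum", "Lake Gardens"): (0.4, 1),
}
for (a, b), (corr, si) in routes.items():
    print(f"{a} -> {b}: {edge_weight(corr, si):.0f}")
```

This reproduces the weights of 70 for {Lake Gardens, National Mosque} and 45 for {Islamic Arts Museum, Lake Gardens} in Table 5. For {National Mosque, Islamic Arts Museum} and {Islamic Arts Museum, KL Bird Park}, it yields 55 and 70; the values of 80 and 45 in Table 5 correspond to Spatial_Information values of 2 and 1, whereas Table 4 lists 1 and 2 for these routes.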

4. Illustrative scenario

A simple working example illustrates the working mechanism and expected outcome of the multi-criteria weighted POI graph framework. Using the notation of the problem definition, let the gazetteer be GL = {National Mosque, Islamic Arts Museum, Lake Gardens, KL Bird Park, National Planetarium}. Table 2 lists a set of five pre-processed blog entries $X'$ related to Kuala Lumpur. With $s\_min = 60\%$, the frequent 1-item set is $F^1(X')$ = {{National Mosque}, {Islamic Arts Museum}, {Lake Gardens}, {KL Bird Park}, {National Planetarium}}. Setting $n = 2$ and $s\_min = 40\%$ yields the frequently occurring POI correlations listed in Table 3. A frequency-weighted POI graph can then be constructed from these sequential patterns, as depicted in Fig. 3, where POIs are represented by nodes and correlations by edges.
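The Degree ratio column of Table 6 can be derived from this graph. The Python sketch below takes the four edges of Table 3, counts both the incoming and the outgoing routes of each POI, and divides by $\text{Count}_{\text{FSPM}} - 1$ with $\text{Count}_{\text{FSPM}} = 5$ (the POIs in GL). Counting both directions is the reading that reproduces the ratios in Table 6; counting outgoing routes only would give 0.25, 0.5, 0.25, and 0 for National Mosque, Islamic Arts Museum, Lake Gardens, and KL Bird Park. The function name is illustrative.

```python
from collections import Counter

# Frequent sequential pairs from Table 3 with their correlation weights (Fig. 3).
EDGES = {
    ("Lake Gardens", "National Mosque"): 0.4,
    ("National Mosque", "Islamic Arts Museum"): 0.6,
    ("Islamic Arts Museum", "KL Bird Park"): 0.4,
    ("Islamic Arts Museum", "Lake Gardens"): 0.4,
}
COUNT_FSPM = 5  # unique POIs in the transaction database (the gazetteer GL)


def degree_ratios(edges, count_fspm):
    """Degree / (Count_FSPM - 1), where Degree counts incoming and outgoing routes."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1  # outgoing route of source
        degree[target] += 1  # incoming route of target
    return {poi: d / (count_fspm - 1) for poi, d in degree.items()}


print(degree_ratios(EDGES, COUNT_FSPM))
```

National Planetarium has no frequent route and therefore receives no degree ratio, which is consistent with its absence from Table 6.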

Table 4
Popular routes with spatial information

Route	Spatial indicator(s)	Spatial_Information
{Islamic Arts Museum, Lake Gardens}	Close by	1
{National Mosque, Islamic Arts Museum}	Within walking distance	1
{Islamic Arts Museum, KL Bird Park}	Near, east	2
{National Mosque, Lake Gardens}	1 Kilometer, around	2

Table 5

Multi-criteria weight for routes (edges)

Route	Multi-criteria weight
{Lake Gardens, National Mosque}	70
{National Mosque, Islamic Arts Museum}	80
{Islamic Arts Museum, KL Bird Park}	45
{Islamic Arts Museum, Lake Gardens}	45

Figure 3.

Frequency-weighted POI graph.

Table 6

Multi-criteria weight for POIs (nodes)

Popular POI	Degree ratio	Polarity	Rating	Review ratio	Multi-criteria weight
National Mosque	0.5	0.6	4	0.27	61
Islamic Arts Museum	0.75	0.8	5	1	89
Lake Gardens	0.5	0.65	4	0.174	62
KL Bird Park	0.25	0.6	4	1	63

For the example blog entries, the next task is to extract the spatial relations between POIs (see Table 4). With the correlation weights and spatial information as inputs to Eq. (2), the final consolidated weight of each edge of the POI graph is obtained (see Table 5). For the multi-criteria weighting of nodes, the parameters of Eq. (1) are calculated, and a final consolidated weight is computed from the example blog entries $X'$ and the parameter values retrieved from TripAdvisor (see Table 6). The intermediate stage of node weighting can be depicted graphically using the polarity class notation explained in the problem definition; similarly, the extracted spatial relations for routes can be depicted as an intermediate-stage graph of weighted edges.

The preceding illustration yields the simple multi-criteria weighted POI graph shown in Fig. 4.

Figure 4.

Multi-criteria weighted POI graph.

We recognize that a user study is required to compare the knowledge expressivity and quality of existing POI graphs with those of the proposed multi-criteria weighted POI graph; such a study would establish the feasibility of our approach for real-world travel decision-making. Nevertheless, the effectiveness of the resulting graph in Fig. 4 can be appreciated by comparing it with the findings of [12, 15, 16], whose outcome is a conventional FSPM-based POI graph similar to Fig. 3. Because our example covers tourist points located in a single tourism area, our approach can populate the precise spatial linkages between them, in contrast to the word network proposed in [13], which contains only the detected tourism areas. Lastly, our approach enables the determination and visualization of bloggers' opinions aggregated with credible measurements, which is a significant enhancement over previous POI graphs, in which local features are the only type of semantic information mined and visualized, either for routes [12] or for tourist points [13, 15, 16].

5. Conclusion and future prospects

This study proposed a framework for a multi-criteria weighted POI graph mined from travel blogs. The model blends content and narrative analysis techniques for travel blog mining, incorporating sequential, spatial, and opinion information related to POIs and routes.

Multi-criteria weighting is necessary for two reasons. First, it yields a graph that carries substantially more knowledge. Second, it provides experimental support for two underlying assumptions about travel blog text. Recall that the first assumption is that frequent mentions of a POI, as captured by an FSPM network, reflect its popularity or preference; computing the actual opinion polarities and popularity scores of POIs strengthens the proposed model in this respect. By determining spatial indicators among POIs, the framework also empirically supports the second assumption, namely that POIs mentioned together in a text may be geographically close to each other. Accordingly, the proposed model provides new insight into existing FSPM practices for travel paths by enriching routes with spatial guidance and nodes with sentiment information. These contributions are significant for both nonprofessional travelers and industry. A POI graph equipped with popularity, location, and opinion knowledge about POIs and routes can illustrate the tourism profile of a region and help travel decision-makers observe the overall travelling trend of numerous bloggers through the routes they follow, enabling them to plan personalized, experience-based itineraries. Beyond travelers, the proposed framework offers opportunities for destination marketing organizations (DMOs) and travel service providers. The emergence of user-generated content (UGC), particularly weblogs and reviews, has an unavoidable impact on people’s travel decision-making [31, 32], which can cause some businesses to suffer relative to others. Therefore, research applications targeting travel blog analysis will enable industries to gain insight into travelling behavior, understand the impact of electronic word of mouth (eWOM) on destination image and identity creation, and propose experience-based travel packages.
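
The first assumption above (frequent mentions indicate popularity) is what a frequency-weighted POI graph encodes. As a minimal sketch, assuming each pre-processed blog entry has already been reduced to the ordered list of POIs it mentions, length-2 sequential patterns can be counted as below. This is a simplified stand-in restricted to POI pairs, not the FSPM algorithm used in the framework; `frequent_pairs` and `min_support` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(entries: list[list[str]], min_support: int):
    """Mine length-2 sequential patterns (a before b) from POI sequences.

    Support counts the entries that contain the ordered pair at least once,
    so repeated mentions inside one entry do not inflate it.
    """
    poi_support: Counter = Counter()
    pair_support: Counter = Counter()
    for seq in entries:
        poi_support.update(set(seq))  # each POI counted once per entry
        pairs = set()
        # combinations over indices preserves order: i < j means seq[i] precedes seq[j]
        for i, j in combinations(range(len(seq)), 2):
            if seq[i] != seq[j]:
                pairs.add((seq[i], seq[j]))
        pair_support.update(pairs)
    nodes = {poi: s for poi, s in poi_support.items() if s >= min_support}
    edges = {pair: s for pair, s in pair_support.items() if s >= min_support}
    return nodes, edges
```

Frequent POIs become nodes weighted by their support, and frequent ordered pairs become directed edges. The pairs are only co-mentions, so whether the linked POIs are actually close is exactly what the second assumption leaves open and what the spatial relation extraction then addresses.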

The proposed model combines several techniques, namely relation extraction and opinion mining, applied to travel blog data. The closely related baseline system [20] extracts short-route spatial indicators, whereas the proposed framework targets all categories of spatial relations. For opinion extraction from UGC, research has strongly favored other forms of crowdsourced unstructured text, such as tweets and reviews [33, 34], while travel blogs remain largely neglected. Furthermore, among tourism entities, including attractions (POIs), accommodation, and transport, most research clearly concentrates on accommodation and dining services and facilities such as hotels and restaurants [35, 36, 37]. Our framework instead computes POI-specific opinions from travel blogs, which helps travelers decide which destination to visit next along the way. To balance qualitative and quantitative parameters, we used a fairly straightforward weighting mechanism at this stage. More advanced techniques, such as multiple attribute decision-making [38, 39] and multiple criteria decision analysis [40], could be applied to evaluate the suggested parameters or other trip planning parameters, such as cost and time. This leaves both motivation and room for further research from a methodological perspective.
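
To illustrate how a multiple attribute decision-making step could extend the current weighting, the sketch below applies simple additive weighting (SAW), a classic MADM method, to candidate routes described by a consolidated graph weight plus cost and time. It is not the dynamic-weight schemes of [38, 39] or the interval-scaled technique of [40]; the function name `simple_additive_weighting`, the criteria, and the weights are hypothetical.

```python
def simple_additive_weighting(
    alternatives: dict[str, list[float]],
    weights: list[float],
    benefit: list[bool],
) -> list[tuple[str, float]]:
    """Rank alternatives with simple additive weighting (SAW).

    Each criterion is linearly normalized: x / max for benefit criteria
    (higher is better) and min / x for cost criteria (lower is better).
    All raw values are assumed to be positive.
    """
    columns = list(zip(*alternatives.values()))  # one tuple per criterion
    scores = {}
    for name, values in alternatives.items():
        total = 0.0
        for k, value in enumerate(values):
            col = columns[k]
            norm = value / max(col) if benefit[k] else min(col) / value
            total += weights[k] * norm
        scores[name] = total
    # Highest aggregate score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With weights 0.5, 0.25 and 0.25 on graph weight (benefit), cost and time (both cost criteria), a route with values 0.8, 20 and 60 scores 0.75, while a route with 0.6, 10 and 30 scores 0.875 and ranks first: a cheaper, shorter route can outrank a more popular one once cost and time enter the decision.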

The proposed framework serves as a foundation model for the trip planning graph. At the domain level, the framework could be integrated into, and extended to, existing travel technologies, such as decision support systems for routing and trip planning, recommender systems, mobile tour guides, personalized trip schedulers, optimal itinerary planners, and multimedia systems for tour scheduling.


Acknowledgments

This paper is an extended version of a paper presented at the First EAI International Conference on Computer Science and Engineering, held on November 11–12, 2016, in Penang, Malaysia. The authors also thank Tourism Malaysia for permission to scrape part of the content from its official website. This research was supported by USM Research University Grant (1001/PKOMP/811335: Mining Unstructured Web Data for Tour Itineraries Construction), Universiti Sains Malaysia.

References

1. Clements, Serdyukov, Vries, Reinders MJT. Using Flickr geotags to predict user travel behaviour. Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, New York: ACM, 2010; 851-852.

2. Choudhury, Feldman, Yahiya, Golbandi, Lempel. Automatic construction of travel itineraries using social breadcrumbs. Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, New York: ACM, 2010; 35-44.

3. Kurashima, Iwata, Irie, Fujimura. Travel route recommendation using geotagged photos. Knowledge and Information Systems 2013; 37(1): 37-60.

4. Chen, Zhang, Guo, Pan. TripPlanner: Personalized trip planning leveraging heterogeneous crowdsourced digital footprints. IEEE Transactions on Intelligent Transportation Systems 2014; 16(3).

5. Fujisaka, Lee, Sumiya. Discovery of user behavior patterns from geo-tagged micro-blogs. Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication, New York: ACM, 2010.

6. Yoon, Zheng, Xie, Woo. Smart itinerary recommendation based on user-generated GPS trajectories. Proceedings of the 7th International Conference on Ubiquitous Intelligence and Computing, Berlin Heidelberg: Springer, 2010; 19-34.

7. Hsieh, Lin. Exploiting large-scale check-in data to recommend time-sensitive routes. Proceedings of the ACM SIGKDD International Workshop on Urban Computing, New York: ACM, 2012; 55-62.

8. Liu, Sui, Kang, Gao. Uncovering patterns of inter-urban trip and spatial interaction from social media check-in data. PLoS ONE 2014; 9(1): e86026.

9. EHC, Chen, Tseng. Personalized trip recommendation with multiple constraints by mining user check-in behaviors. Proceedings of the 20th International Conference on Advances in Geographic Information Systems, New York: ACM, 2012; 209-218.

10. Banyai, Glover. Evaluating research methods on travel blogs. Journal of Travel Research 2012; 51(3): 267-277.

11. Nanba, Taguma, Ozaki, Kobayashi, Ishino, Takezawa. Automatic compilation of travel information from automatically identified travel blogs. Proceedings of the ACL-IJCNLP Conference Short Papers, Stroudsburg: ACL, 2009; 205-208.

12. Kori, Hattori, Tezuka, Tanaka. Automatic generation of multimedia tour guide from local blogs. Proceedings of the 13th International Multimedia Modeling Conference, Berlin Heidelberg: Springer, 2007; 690-699.

13. Yuan, Qian. Make your travel smarter: Summarizing urban tourism information from massive blog data. International Journal of Information Management 2016; 36(6): 1306-1319.

14. Wilson. A simple Kuala Lumpur city guide [Web log entry]. 13 October 2014. Retrieved from http://live-less-ordinary.com/quick-kuala-lumpur-city-guide-kl-travel/

15. Guo, Sun. Understanding travel destinations from structured tourism blogs. Proceedings of the 14th Wuhan International Conference on e-Business, 2015.

16. Yuan, Guo, Xiang. Frequent patterns based word network: What can we obtain from the tourism blogs? Proceedings of the 6th International Conference on Knowledge Science, Engineering and Management, Berlin Heidelberg: Springer, 2013; 15-26.

17. Yuan, Xu, Ma, Qian. Where to go and what to play: Towards summarizing popular information from massive tourism blogs. Journal of Information Science 2015; 41(6): 830-854.

18. Drymonas, Pfoser. Geospatial route extraction from texts. Proceedings of the 1st ACM SIGSPATIAL International Workshop on Data Mining for Geoinformatics, New York: ACM, 2010; 209-37.

19. Kim, Vasardani, Winter. Harvesting large corpora for generating place graphs. Proceedings of the Cognitive Engineering for Spatial Information Processes Workshop at the 12th International Conference on Spatial Information Theory, 2015.

20. Skoumas, Pfoser, Kyrillidis, Sellis. Location estimation using crowdsourced spatial relations. ACM Transactions on Spatial Algorithms and Systems 2016; 2(2).

21. Kurashima, Tezuka, Tanaka. Blog map of experiences: Extracting and geographically mapping visitor experiences from urban blogs. Proceedings of the 13th International Conference on Web Information Systems Engineering, Berlin Heidelberg: Springer, 2005; 496-503.

22. Hao, Cai, Wang, Xiao, Yang, Pang, et al. Equip tourists with knowledge mined from travelogues. Proceedings of the 19th International World Wide Web Conference, New York: ACM, 2010; 40-410.

23. Wang, Xie. Mining geographic knowledge using location aware topic model. Proceedings of the 4th ACM Workshop on Geographic Information Retrieval, New York: ACM, 2007; 65-70.

24. Pang, Hao, Yuan, Cai, Zhang. Summarizing tourist destinations by mining user-generated travelogues and photos. Computer Vision and Image Understanding 2011; 115(3): 352-363.

25. Drymonas, Efentakis, Pfoser. Opinion mapping travelblogs. Proceedings of the Terra Cognita Workshop (in conjunction with the 10th International Semantic Web Conference), 2011; 23-36.

26. Nakatoh, Yin, Hirokawa. Extraction and disambiguation of name of place from tourism blogs. Proceedings of the First ACIS International Symposium on Software and Network Engineering, Washington DC: IEEE, 2011; 73-78.

27. Nakatoh, Yin, Hirokawa. Analysis and visualization of tourism blog. Proceedings of the IIAI International Symposium on Applied Information, 2012; 26-27.

28. Zhu, Shou, Chen. Get into the spirit of a location by mining user-generated travelogues. Neurocomputing 2016; 204: 61-69.

29. Hobel, Fogliaroni. Extracting semantics of places from user generated content. Proceedings of the 19th AGILE International Conference on Geographic Information Science, 2016.

30. Skoumas, Schmid, Josse, Schubert, Nascimento, Zufle, et al. Knowledge-enriched route computation. Proceedings of the 14th International Symposium on Advances in Spatial and Temporal Databases, Springer International Publishing, 2015; 157-176.

31. Xiang, Wang, O’Leary, Fesenmaier. Adapting to the Internet: Trends in travelers’ use of the Web for trip planning. Journal of Information Science 2014; 54(4): 511-527.

32. Cox, Burgess, Sellitto, Buultjens. Consumer-generated web-based tourism marketing. Research Report, Sustainable Tourism Cooperative Research Centre (STCRC), Australia, 2008.

33. Afzaal, Usman. A novel framework for aspect-based opinion classification for tourist places. Proceedings of the Tenth International Conference on Digital Information Management, IEEE, 2015.

34. Peregrino, Tomas, Clough, Llopis. Mapping routes of sentiments. Proceedings of the Spanish Conference on Information Retrieval, 2012.

35. Kasper, Vela. Sentiment analysis for hotel reviews. Proceedings of the Computational Linguistics-Applications Conference, 2011; 45-52.

36. Marrese-Taylor, Velásquez, Bravo-Marquez, Matsuo. Identifying customer preferences about tourism products using an aspect-based opinion mining approach. Proceedings of the 17th International Conference in Knowledge Based and Intelligent Information and Engineering Systems, 2013; 182-191.

37. Garcia-Pablos, Cuadros, Linaza. Automatic analysis of textual hotel reviews. Information Technology & Tourism 2016; 16(1): 45-69.

38. Jiao, Wang. Multi-attribute decision making with dynamic weight allocation. Intelligent Decision Technologies 2014; 8(3): 225-230.

39. Tian, Jiao, Wang, Chen. Flexible dynamic weight decision scheme. Intelligent Decision Technologies 2015; 9(2): 167-179.

40. Mikeli, Apostolou, Despotis. A new recommendation technique for interval scaled multi-criteria rating systems incorporating intensity of preferences. Intelligent Decision Technologies 2015; 9(3): 283-294.


Table 2. Pre-processed blog entries

Table 4. Popular routes with spatial information