Abstract
The increasing power of technology puts new, advanced statistical tools at the disposal of researchers. This is one of the first research articles to use a data mining tool—namely, decision trees—to analyze the behavior of inbound tourists for the purpose of effective future destination marketing in Japan. The research results of approximately 4,000 observations show that the main motivation for visitors’ future return is not driven by experiences had during their most current visit but rather by experiences anticipated in the future, such as visiting hot springs or immersing themselves in beautiful natural settings. The data mining method largely excludes the possibility of the intrusion of researcher subjectivity and is conducive to useful discoveries of certain visitor patterns in large data sets, providing governments and destination marketing organizations with additional tools to better formulate effective destination marketing strategies.
Growing international tourism and travel has increased competition among tourism destinations throughout the world. Destinations are a combination of tourist products and services that create an integral experience for tourists and are consumed under the brand name of the destination (Leiper 1995; Buhalis 2000). Historically, destinations have been considered to be specific geographical locations, such as cities or countries (Hall and Hall 2000). On the other hand, there is a new trend in defining a destination as a concept that can be subjectively interpreted by tourists based on their purpose, culture, past experiences, etc. (Buhalis 2000). It is increasingly recognized that the core of any destination’s successful performance is determined by satisfied tourists who intend to return in the future and who will recommend the destination to their friends and families (Chi and Qu 2008; Assaker, Vinzi, and O’Connor 2011; Valle et al. 2006). Ryan (1991) argued that if the tourism industry is to continue satisfying tourists, it has to adopt societal marketing strategies, carefully monitor tourist satisfaction, and use the information collected to create success. Thus, understanding which attributes of a given destination create tourist satisfaction as well as the types of tourists who are willing to come back in the future enables destination marketing organizations (DMOs) to plan future destination marketing strategies that could be essential for an emerging tourist destination such as Japan.
For the past several decades, Japan has established its image as an industrial country; however, with increasing competition from other countries using its manufacturing-led growth model, it has also displayed an increasing interest in inbound tourism and has pursued several marketing campaigns to increase global awareness of Japan as a tourism destination. According to the Japan National Tourism Organization (JNTO), Japan witnessed a historical record of 19,737,409 inbound visitors in 2015, and in 2016, the Japanese government announced an ambitious plan to increase the number of annual inbound visitors to Japan to 40 million by 2020 and even to 60 million by 2030 (JNTO 2016).
As tourism becomes increasingly important to Japan as a destination, academic research is gradually starting to address questions of motivation, intention to return, positive word of mouth, and satisfaction related to Japan as a tourism destination. However, only a limited amount of research has been carried out regarding international tourism in Japan, and it is mainly qualitative in nature and includes very little empirical research. By contrast, extensive contributions that employ various qualitative and quantitative methodologies applied to research on customer satisfaction and behavioral intentions for specific destinations have enriched the current body of literature (Buhalis 1999; Pearce 2014; Som and Badarneh 2011; Okamura and Fukushige 2010; Uzama 2009; Park and Gretzel 2007; Liu, Siguaw, and Enz 2008; Pizam, Neumann, and Reichel 1978). The existing literature is nevertheless not comprehensive, and technological advances have led to methodological and statistical advancements that provide opportunities to take a broader view on tourism behavior for destination marketing by using data mining techniques (Goh, Law, and Mok 2008).
Data mining techniques are extremely underutilized in tourism research, and empirical research on Japan as a tourism destination using data mining techniques is highly limited. Hence, the purpose of this research is twofold: first, to provide an overview of data mining and its potential in tourism research using decision trees, and second, to fill the empirical research gap in the tourism literature related to Japan as a tourist destination.
This article, which presents findings related to tourist satisfaction and intention to return to Japan drawn from research using advanced statistical methods of data mining, has three specific objectives: (1) to identify the most important experiences of inbound tourists to Japan as a destination; (2) to identify the preferences, likes, and dislikes of the tourists; and (3) to identify how those experiences and preferences affect satisfaction and future intention to return to Japan as a destination choice.
Theoretical Background
Data Mining
Advances in technology and computer power have made it possible to collect immense amounts of data across many different fields. There is an increasing need for tools that will assist in extracting useful information from the growing amounts of data and turn it into knowledge (Fayyad, Piatetsky-Shapiro, and Smyth 1996). Many industries are enhancing their competitiveness by adopting data mining technology for various purposes, such as gauging customer preferences in e-commerce and retail, ascertaining medical history in health care, assessing risk factors in insurance, and gathering financial data in banking, to name just a few. The nature of the tourism and hospitality sector has made it one of the largest users of information technology (Sheldon 1994; Buhalis 1998). Information about tourists is being accumulated at an increasing pace, and it is becoming progressively more difficult for destinations to stay competitive and to increase their market share. Destination management organizations will find a growing need to use data mining if they wish to stay competitive (Pyo, Uysal, and Chang 2002).
With an acceptably accurate learning model, one can not only understand but also predict expected values in the tourism industry. For example, a tourism agency may choose to use its visitor database to predict future arrivals and patterns of consumption. Given an updated visitor profile, agencies will be more prepared, based on actual visitor behavior, to 1) meet the needs of visitors with better marketing material, 2) establish necessary collaborations with agencies such as transportation and lodging, and 3) share information and work with other destinations within a country to improve the country’s desirability for future touristic visits (Apté and Weiss 1997). While some academic articles (Buhalis 1997; Pyo, Uysal, and Chang 2002; Buhalis and Law 2008) have emphasized the need for data mining in tourism rather than empirical research with classical statistics, empirical research undertaken with data mining tools remains highly limited (Kim, Timothy, and Hwang 2011). The question could be raised, What is the advantage of data mining–based research compared to already established techniques using classical statistics? While both techniques have their advantages and disadvantages, the authors thought it would be prudent to give a brief comparison of the two techniques and provide a rationale for why data mining would be advantageous in this case.
Data mining versus classical statistics
Managing very large data sets requires skills different from those used in classical statistical analysis. Data mining addresses such problems by efficiently summarizing large amounts of data, identifying patterns and relationships in past data, and constructing predictors for the future. Classical statisticians have well-established tools for such tasks. Many statistical models have been utilized for explaining relationships and patterns within given data, and it is therefore tempting to think of data mining as an extended branch of statistics. However, data mining has its own merits, being capable of working with larger-scale data sets than those used in classical statistics. There are also differences in the approaches to modeling: data mining pays less attention to the large-sample asymptotic properties of its inferences and more to the “learning,” including the complexity of the modeling and computation required by large data sets. Of course, classical statistics and data mining are similar in that they draw inferences from data. However, unlike classical statistics, data mining is more tolerant toward discrete-valued variables and seeks to minimize a loss function expressed in terms of prediction error, where minimization is achieved by cross-validation (Hosking, Pednault, and Sudan 1997).
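To make the idea of estimating prediction error by cross-validation concrete, the following is a minimal sketch and not the procedure used in this study. The labels, the number of folds, and the deliberately trivial majority-class "model" are all assumptions chosen for illustration.

```python
from collections import Counter

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def majority_class(labels):
    """A deliberately trivial 'model': always predict the most common label."""
    return Counter(labels).most_common(1)[0][0]

def cross_validated_error(labels, k=5):
    """Average misclassification rate over k held-out folds."""
    errors = []
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        # Fit on everything except the held-out fold.
        train = [y for i, y in enumerate(labels) if i not in held]
        prediction = majority_class(train)
        # Score only on the fold the model has not seen.
        wrong = sum(1 for i in held_out if labels[i] != prediction)
        errors.append(wrong / len(held_out))
    return sum(errors) / len(errors)

# Hypothetical labels: 1 = intends to return, 0 = does not.
labels = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
cv_error = cross_validated_error(labels, k=5)
```

In practice the majority-class predictor would be replaced by the competing models under consideration, and the model with the lowest held-out error would be preferred.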
One of the oldest definitions of data mining is “the non-trivial extraction of implicit, previously unknown, and potentially useful information from data” (Frawley, Piatetsky-Shapiro, and Matheus 1992, p. 58). Data mining uses machine learning algorithms to find patterns of relationships between data elements in large, noisy, and messy data sets, thereby facilitating actions that enhance in some form (diagnosis, profit, detection, etc.) knowledge discovery in that data (Nisbet, Elder, and Miner 2009, p. 17). As data mining evolved, a new definition was proposed, as follows: “Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” (Fayyad et al. 1996, p. 30).
One big difference between classical statistics and data mining is that classical statistics has large subjective components, known as predictive models, the main goal of which is to estimate parameters and/or confirm or reject hypotheses. On the other hand, from a data mining perspective, the correct model is unknown. In fact, the goal of the analysis is to discover the correct model. In classical statistics, models must be specified, whereas in data mining, a series of competing models will be specified and selected based on data examination. This preferential ordering addresses the issue of overfitting. There are many other points of difference between classical statistics and data mining techniques; however, cataloging them is beyond the scope of this study. In summary, one can say that statistical learning (data mining) is much more manageable when there are no restrictions placed on the model for a given data set; that is, where analyses are data driven and the complexities of the given machine learning algorithms are dependent on the underlying distribution we desire to learn (Hosking, Pednault, and Sudan 1997).
An extensive number of data mining techniques have evolved over the years, including but not limited to decision trees, neural networks, regression analysis, text mining, association rules, and clustering.
Data preparation and reduction are essential steps in data mining. Unlike the data sets used in classical statistics, data mining data sets, in which variables may number in the hundreds and observations in the millions, cannot be “eyeballed.” Yet, just as in classical statistics, the quality of the prediction and the accuracy of a model depend on the quality of the data. Variables should therefore be reduced and transformed into analytical data sets. Finally, once a data set is cleaned and finalized, an appropriate analytical tool, such as neural networks, time series, or decision trees, is chosen for data analysis, as in classical statistics.
Destination Marketing
Tourism is one of the world’s major industries that contributes significantly to the global economy and has become one of the major sources of wealth for a number of developing and developed countries. Tourism takes place at destinations; consequently, a destination is taken as the fundamental unit of analysis (WTO 2002). Destinations are also a focal point of destination marketing, an essential tool of tourism destinations in an increasingly globalized and international tourism market (UNWTO 2011). Destinations are a conglomeration of tourist services and experiences (Buhalis 2000). Understanding tourists’ perceptions is essential to a successful tourism destination, because they influence a tourist’s choice of destination (Ahmed 1991), their satisfaction, and their decision whether to return (Weiermair 2000). The increasing competition among tourist destinations over the last several decades has prompted concern among destination marketing managers and industry practitioners about the perceptions of a destination by tourists (Wang and Pizam 2011). The marketability of destinations as well as the offered services, entertainment, lodging, transportation, and shopping leave an impression on visitors in terms of their sense of satisfaction and their decision whether to come back in the future. Thus, the following questions are raised: How can a DMO best communicate with stakeholders and the market? How can a DMO engage with visitors to stimulate repeat visits? Finally, how can a DMO filter the vast amount of information to obtain a set of manageable rules to predict visitor behavior and ensure visitor satisfaction and loyalty? (Pike 2012).
Destination marketers conduct extensive research to identify prospective visitors who have not yet visited (suppressed demand) and potential tourists (active demand) (Athiyaman 1997). DMOs need to know how their destination is perceived by potential visitors to better target their market, develop more appropriate tourism products, and increase destination attractiveness (Phillips and Back 2011). For example, cultural differences, the extent of planning time before a vacation, and the number of people in the group influence tourist expenditure (Laesser and Dolnicar 2012). A review of past literature shows an increasing number of articles that deal with aspects of destination marketing, customer satisfaction, and behavioral intentions in tourism overall and for a specific destination. For instance, Kozak and Rimmington (2000) looked at tourist satisfaction in Mallorca, Spain. Baloglu and McCleary (1999) looked at U.S. international pleasure travelers to four Mediterranean destinations. Yoon and Uysal (2005) studied the motivation and satisfaction of tourists in Northern Cyprus. Campelo, Aitken, and Gnoth (2011) looked at visual rhetoric in the destination marketing of New Zealand. Finally, Dwyer et al. (2014) studied destination marketing and return on investment in Australia.
Inquiring into tourist perception of a destination is generally aimed at looking into customer satisfaction and intention to return. The literature related to measuring destination marketing can be successfully arranged into two groups (Hallowell 1996). The service management literature postulates that customer satisfaction leads to customer loyalty and subsequently to profitability (Hallowell 1996; Reinartz and Kumar 2002). The marketing literature claims that if customers are happy with a product, they will purchase it again and tell their friends and relatives about it (Maxham 2001; Ranaweera and Prabhu 2003; Brown et al. 2005). Similarly, this concept could be applied to the body of tourism literature, which finds a significant correlation between satisfaction and future intention to return (Gallarza and Saura 2006; Hernández Lobato, Solis-Radilla, and Moliner-Tena 2006). A number of articles have examined differences between first-time and repeat visitors (Woodside and Lysonski 1987; Lupton 1997; Okamura and Fukushige 2010; Fuchs and Reichel 2011) and have established that repeat visitors are more likely to choose the same destination. First-timers will reduce their stereotypes and obtain a better and deeper understanding of a destination (Pool 1965). Repeaters will move beyond simple stereotyping and build a more subtle and complex understanding (Fakeye and Crompton 1991; Mishler 1965). This of course happens when sufficient time has been spent at a destination, and the tourist has had sufficient saturation through establishing different contacts and relationships (Mishler 1965).
It is generally accepted that tourist satisfaction is essential for destinations to have repeat visitors and that the intention to return to a destination depends on the level of satisfaction visitors had with its products and services.
Since this study is looking into destination experiences and attributes, we will use the definition of tourist satisfaction proposed by Pizam, Neumann, and Reichel (1978): tourist satisfaction is the result of interaction between a tourist’s experience at the destination and the expectations he or she had about that destination (p. 315).
While it is true that satisfaction and intention to return are highly important for tourism, destinations are an amalgam of tourism products and services that create experiences for the consumer. There is a plethora of important dimensions that could potentially contribute to consumer satisfaction and subsequent return. There is also an increasing trend in the recent stream of research that reveals the different dimensions that influence tourists’ destination perceptions, satisfaction, and loyalty (Table 1).
Table 1. Analysis of Relevant Past Research.
For example, Yoon and Uysal (2005) examined “push and pull” motivation factors for satisfaction and destination loyalty. “Push” motivation factors include relaxation, family togetherness, safety, and fun. “Pull” motivators include weather, shopping, cleanliness, night life, and local cuisine. “Push” motivators have a significant impact on destination loyalty, and satisfaction with a destination leads to destination loyalty. Lee, Kyle, and Scott (2012) approached destination loyalty from an events perspective. They found that satisfaction with a special event (e.g., a festival) led to destination preference and place attachment (place identity and dependence). Lee, Lee, and Lee (2014) studied change in tourist perceptions of destination image before and after a trip, and the destination image was found to have been significantly impacted by satisfaction. Destination image dimensions such as amenities and hygiene, attractions, and accessibility were viewed differently by tourists after visiting a destination. Bajs (2015) evaluated the effects of quality of touristic services, destination appearance, emotional experience, and monetary and nonmonetary costs on perceived value and subsequently on satisfaction and behavioral intentions. Joppe, Martin, and Wallen (2001) studied Toronto visitors’ perceptions of product and service attributes, such as hospitality, accommodations safety, cuisine, and family orientation, using an importance satisfaction model. They found that food, accommodations, and sightseeing ranked as very important and excellent on their importance–satisfaction grid, while family orientation was ranked as unimportant and unsatisfactory. Alegre and Garau (2011) found that cuisine, budget, cleanliness, climate, scenery, and access were of explicit importance to tourist satisfaction and destination competitiveness. 
Chi and Qu (2008) applied an empirical integrative approach to understanding destination loyalty and used destination image, overall satisfaction, and tourist attributes. Ramseook-Munhurrun, Seebaluck, and Naidoo (2014) used tourist perception of destination image, perceived value, tourist satisfaction, and loyalty for destination marketing for the small island destination Mauritius. In their study, the authors looked at dimensions such as travel environment, attractions, events, infrastructure, sports activities, and perceived value as antecedents of satisfaction and loyalty. While only satisfaction had an impact on loyalty, perceived value and destination image had an impact on tourist satisfaction. Finally, Özdemir and Şimşek (2015) analyzed perceptions of quality, price, and value on satisfaction and destination image. They found that perceived price and quality have a significant impact on destination image.
Another recent trend has been the increase in destination marketing research that takes into consideration specific destination attributes and their effects on tourist satisfaction and behavioral intentions. Research using advanced statistical tools, however, appears to be limited. The authors of the present study took an extensive look at the underlying dimensions of “destination image” and “attribute satisfaction.” Destination image includes dimensions such as travel environment, natural attractions, entertainment and events, infrastructure, relaxation, outdoor activities, price, and value, while attribute satisfaction includes shopping, lodging accessibility, attraction, dining, and environment. Destination image and attribute satisfaction had a significant impact on overall satisfaction and destination loyalty.
Advances in technology have increased researchers’ ability to collect, store, and run calculations on very large data sets (Pyo, Uysal, and Chang 2002). Big data analysis is slowly making progress as a valid research tool in broader social science fields. For example, in the medical field, Qu et al. (2002) used decision trees in their evaluation of a proteomic approach to the simultaneous detection and analysis of multiple proteins for the differentiation of prostate cancer patients from noncancer patients. De Reyck, Degraeve, and Vandenborre (2008) used decision trees for evaluation as an alternative approach to valuing real options based on a certainty-equivalent version of the net present value formula. Goh, Law, and Mok (2008) incorporated rough sets theory into tourism demand analysis and created a tourism climatic index using data mining techniques. They found that climate and leisure time have a stronger impact on tourist arrivals than economic factors. Wicker and Breuer (2013) used decision trees to evaluate organizational problems for the recruitment/retention of members at non-profit sports clubs. Duncan (1980) used decision trees in his evaluation of organizational structure and design. Min, Min, and Emam (2002) used a data mining approach in developing hotel customers’ profiles. Chang et al. (2016) applied data mining techniques (decision trees) for tourist loyalty intentions in the hotel sector. Specifically, the authors were looking at hotels/physical environment and social interaction to gauge customer loyalty. Kim, Timothy, and Hwang (2011) used decision trees analysis to evaluate Japanese tourists’ shopping preferences and intention to revisit Korea.
Finally, this article makes one of the first attempts to use decision trees (DTs) to analyze tourist satisfaction and intention to return by applying them to a large data set on inbound visitors that covers destination-specific attributes of Japan.
Japan as a Destination
Japan is still largely undiscovered by mass tourism. Mainly known for its industrial power, Japan as a tourism destination is still overshadowed by its industrial and business image. Even though a limited amount of academic research on Japan as a destination does indeed point to its tremendous potential as a tourism destination, this potential remains generally untapped.
Nevertheless, research on inbound tourism to Japan remains highly limited. According to Uzama (2009), the Japanese marketing campaign “Yokoso! Japan” was mainly unsuccessful in advertising Japan as a desirable tourism destination, and in spite of government interest in promoting Japan, it did not go beyond simple promotion. Okamura and Fukushige’s (2010) research on international tourists to Japan looked into the differences between first-time visitors and repeat visitors to the Kansai area of Japan and found that first-time visitors were interested in sightseeing, while repeat tourists were more involved and interested in participating in events. Such relatively limited existing research emphasizes the need for additional empirical research about Japan as a destination for purposes of its destination marketing.
Pyo, Uysal, and Chang (2002) emphasized a need for data mining analysis and its application to the distribution of knowledge about tourists and destinations as well as market information. The authors stressed that promotional activities could be more effective after the characteristics of the destination have been understood and defined. Destinations count their visitors in the thousands and millions, and DMOs and other government institutions have an extensive amount of data that reflects actual tourists’ behavior, but data mining is generally limited to private organizations and consulting firms in the hospitality and tourism industry. Buhalis (2000) emphasized that tourism research is extensively dynamic and that continuous research is necessary to follow developments. However, despite the possible benefits that data mining research can provide to destination marketing, empirical research using data mining techniques in the tourism industry has been sorely lacking.
Methodology
Data Mining—Decision Trees
Decision trees (DTs) are a form of multiple variable analysis. A decision tree “is a structure that can be used to divide up a large collection of records into successively smaller sets of records by applying a sequence of simple decision rules” (Berry and Linoff 2000, p. 6). Another definition of a DT provided by Nisbet, Elder, and Miner (2009) states that a “DT is a hierarchical group of relationships organized into tree-like structures, starting with one variable (like a trunk of an oak tree) called a root node” (p. 241). The root node is split into multiple branches using a split criterion. Each split is defined in terms of an impurity measure reflecting how uniform resulting cases are. Each split node is referred to as a parent node, and the following splits are called child nodes. Splits continue until the final or terminal node with the minimum number of cases is reached. For example, Figure 1 is a small illustration of decision trees used to indicate patterns of travel behavior based on age, gender, and marital status.

Figure 1. Decision tree sample.
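The root, parent, child, and terminal nodes described above can be sketched as a small nested structure. This is an illustration only: the split variables follow the description of the sample tree (age, gender, marital status), while the thresholds, field names, and leaf labels are invented for the example.

```python
# Hypothetical tree: thresholds and leaf labels are invented for illustration.
tree = {
    "question": lambda r: r["age"] < 35,          # root node
    "yes": {
        "question": lambda r: r["married"],       # child of the root
        "yes": {"leaf": "family trip"},           # terminal node
        "no": {"leaf": "solo trip"},
    },
    "no": {
        "question": lambda r: r["gender"] == "F",
        "yes": {"leaf": "group tour"},
        "no": {"leaf": "business trip"},
    },
}

def classify(node, record):
    """Walk from the root to a terminal node, answering one question per level."""
    while "leaf" not in node:
        node = node["yes"] if node["question"](record) else node["no"]
    return node["leaf"]

visitor = {"age": 29, "gender": "M", "married": False}
result = classify(tree, visitor)  # root -> "yes" branch -> "no" branch
```

Each record reaches exactly one terminal node, which is what allows a tree to partition a large collection of records into successively smaller sets.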
DTs are a very appealing method of analysis for the present study because of their relative power, ease of use, robustness, ability to handle ordinal data (Likert scale), and ease of interpretability. A DT is a collection of one-cause, one-effect relationships presented in the form of a tree. DTs try to find strong relationships between input and target variables; when a set of values is identified that has a strong relationship to a target, all those values are grouped into the bin that forms the branches of a DT.
Impurity-based criterion
In many cases, a DT split is done according to the value of a single variable. The most common criterion for a split would be an impurity-based split, as used in this study. The impurity-based criterion is briefly represented as follows.
Given a random variable $x$ with $k$ discrete values, distributed according to $P = (p_1, p_2, \ldots, p_k)$, an impurity measure is a function $\phi: [0,1]^k \to \mathbb{R}$ that satisfies the following conditions:
$\phi(P) \geq 0$.
$\phi(P)$ is minimum if $\exists i$ such that component $p_i = 1$.
$\phi(P)$ is maximum if $\forall i$, $1 \leq i \leq k$, $p_i = 1/k$.
$\phi(P)$ is symmetric with respect to the components of $P$.
$\phi(P)$ is smooth (differentiable everywhere) in its range.
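Given the training set $S$, the probability vector of the target attribute $y$ is defined as
$$P_y(S) = \left( \frac{|\sigma_{y=c_1} S|}{|S|}, \ldots, \frac{|\sigma_{y=c_{|dom(y)|}} S|}{|S|} \right)$$
The goodness of split due to the attribute $a_i$ is defined as the reduction in the impurity of the target attribute after partitioning $S$ according to the values $v_{i,j} \in dom(a_i)$, as follows:
$$\Delta\Phi(a_i, S) = \phi(P_y(S)) - \sum_{j=1}^{|dom(a_i)|} \frac{|\sigma_{a_i=v_{i,j}} S|}{|S|} \cdot \phi\left(P_y(\sigma_{a_i=v_{i,j}} S)\right)$$
where $\sigma_{a=v} S$ denotes the subset of records in $S$ for which attribute $a$ takes the value $v$, and $c_1, \ldots, c_{|dom(y)|}$ are the values of the target attribute (Maimon and Rokach 2010, p. 153).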
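The goodness-of-split definition can be illustrated with a short sketch. The Gini index is used here as one impurity measure that satisfies the conditions listed above; it is not the measure selected for this study, and the records and attribute names are hypothetical.

```python
from collections import Counter

def class_probabilities(labels):
    """Empirical probability vector of the target over a set of records."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(probabilities):
    """Gini impurity: 0 for a pure node, largest for a uniform distribution."""
    return 1.0 - sum(p * p for p in probabilities)

def goodness_of_split(records, attribute, target, impurity=gini):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    parent = impurity(class_probabilities([r[target] for r in records]))
    # Partition the target values by the value of the splitting attribute.
    partitions = {}
    for r in records:
        partitions.setdefault(r[attribute], []).append(r[target])
    weighted = sum(
        len(part) / len(records) * impurity(class_probabilities(part))
        for part in partitions.values()
    )
    return parent - weighted

# Hypothetical toy records; attribute and target names are illustrative only.
records = [
    {"visited_onsen": 1, "returns": 1},
    {"visited_onsen": 1, "returns": 1},
    {"visited_onsen": 0, "returns": 0},
    {"visited_onsen": 0, "returns": 1},
]
reduction = goodness_of_split(records, "visited_onsen", "returns")
```

A tree-growing algorithm would evaluate this reduction for every candidate attribute and split on the one with the largest value.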
Information gain
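Out of three tests (Gini, chi-square, and entropy), the entropy information gain criterion was chosen for the purposes of this study. Information gain is an impurity-based criterion that uses the entropy measure (originating from information theory) as the impurity measure (Quinlan 1987). Entropy information gain is represented as
$$InformationGain(a_i, S) = Entropy(y, S) - \sum_{v_{i,j} \in dom(a_i)} \frac{|\sigma_{a_i=v_{i,j}} S|}{|S|} \cdot Entropy(y, \sigma_{a_i=v_{i,j}} S)$$
where, for the subset $\sigma_{y=c_j} S$ of records in $S$ with target value $c_j$,
$$Entropy(y, S) = \sum_{c_j \in dom(y)} -\frac{|\sigma_{y=c_j} S|}{|S|} \cdot \log_2 \frac{|\sigma_{y=c_j} S|}{|S|}$$
(Maimon and Rokach 2010, p. 153).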
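As a hedged illustration of the entropy information gain criterion, the sketch below computes the gain for two hypothetical binary attributes, one that separates the target perfectly and one that carries no information. The data and attribute names are invented and do not come from the survey.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(records, attribute, target):
    """Entropy of the target minus its size-weighted entropy within each partition."""
    parent = entropy([r[target] for r in records])
    partitions = {}
    for r in records:
        partitions.setdefault(r[attribute], []).append(r[target])
    children = sum(
        len(part) / len(records) * entropy(part) for part in partitions.values()
    )
    return parent - children

# Hypothetical records: "a" separates the target perfectly, "b" not at all.
records = [
    {"a": 1, "b": 1, "returns": 1},
    {"a": 1, "b": 0, "returns": 1},
    {"a": 0, "b": 1, "returns": 0},
    {"a": 0, "b": 0, "returns": 0},
]
gain_a = information_gain(records, "a", "returns")
gain_b = information_gain(records, "b", "returns")
```

Under this criterion the attribute with the higher gain is chosen for the split, so "a" would be preferred over "b" in this toy example.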
Sample and data collection
Data were acquired by the Japan Travel Bureau (JTB) Foundation on behalf of the Japan Tourism Agency in 2010. The JTB is the largest travel agency in Japan and one of the largest travel agencies in the world that specializes in tourism. The JTB Foundation is a nonprofit research organization affiliated with the JTB. (The JTB was established in 1912 and became a for-profit company in 1963.) Data collection was conducted at international airports and seaports in Japan as a part of a tourist expenditure survey series undertaken for the Japan Tourism Agency. Inbound tourists to Japan were approached at random by representatives of the JTB Foundation carrying iPads. Participation in the survey was voluntary, and no monetary incentives were provided for participation. The interviewer read the questions aloud to the interviewees and recorded the answers on the spot, after which the iPad sent the data immediately to the database.
While data mining could offer substantial benefits to research and development, utilizing a large data set raises legal questions and potential liabilities. In 2006, AOL (650,000 users) and Netflix (100 million ratings) released “anonymized” user data. The anonymization failed for both organizations, however, creating legal concerns (Walton 2014). The authors emphasize that AOL released information on about 30% of its users (the total number of users is approximately 2.1 million [Pagliery 2015]), and Netflix released an extremely large number of ratings. The ratio of sample to user population for those organizations was considerably high, thus creating privacy concerns. The data collected by the JTB Foundation were anonymous, and the sample represented less than 1% of total foreign inbound tourists to Japan (8.65 million total in 2010 [JTB Tourism Research and Consulting 2016]).
Data were collected using a Likert and binary scale, and out of a total sample size of 6,000, roughly 4,000 usable observations were obtained. Because of the large sample size, the use of classical statistical tools was not appropriate; therefore, the decision tree data mining technique was used for data analysis. Specifically, because of the binary and ordinal scales used in the survey, decision trees with two-step modeling (with two dependent variables) were used to summarize and interpret the behavioral and purchasing patterns of inbound tourists in Japan.
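The sampling ratios behind the privacy argument can be checked with simple arithmetic. The sketch below uses only the figures quoted in the text (a total sample of 6,000 against 8.65 million inbound visitors, and 650,000 AOL users against approximately 2.1 million); the function name is hypothetical.

```python
def sampling_ratio(sample_size, population_size):
    """Share of the population that appears in the released or collected sample."""
    return sample_size / population_size

# Survey sample versus total foreign inbound visitors to Japan in 2010.
jtb_ratio = sampling_ratio(6_000, 8_650_000)

# AOL release versus its approximate total user base.
aol_ratio = sampling_ratio(650_000, 2_100_000)

print(f"JTB sample: {jtb_ratio:.4%} of inbound visitors")
print(f"AOL release: {aol_ratio:.1%} of users")
```

The survey sample is well under the 1% bound stated in the text, while the AOL release is roughly 30% of its user base.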
In this article, we use data mining as an exploratory tool and extract hidden knowledge through a set of rules that connects a collection of inputs. In a sense, DTs represent a series of questions, where an answer to a question determines the follow-up question, thereby creating a pattern. The decision tree is probably one of the most popular and powerful techniques used in data mining (Berry and Linoff 2000). DTs do not have strict assumptions concerning the functional form of the model, but they do have computational efficiency, are robust against outliers, are resistant to the curse of dimensionality, and require less data preparation than other data mining tools.
Measurements
This study employed a causal research design. The survey questionnaire consisted of the following major sections: tourist attributes of satisfaction, overall satisfaction, intention to return, and demographic questions for tourists requesting information such as country of residence, party size, gender, age, and number of children.
Attributes of satisfaction
Destination response encompassed information about the current trip to Japan, the purpose of the visit, expenditures, transportation, accommodation arrangements, shopping, sources of information, activities at destination, satisfaction with Japan as a destination, and intention to return to Japan in the future. The survey consisted of more than 150 questions measured on a five-point Likert-type scale and 0/1 binary responses.
Overall satisfaction
A single overall measure of satisfaction was used in this study for its ease of use and empirical support. Satisfaction was measured on a seven-point Likert scale, with 1 being highly dissatisfied and 7 highly satisfied.
Behavioral intentions
A single measure for intention to return was used in this study for its ease of use and empirical support. Intention to return was measured on a seven-point Likert scale, where 1 indicated definitely not returning and 7 definitely returning.
Results
The most important variables are listed in Table 2 and Table 3. Out of 150 variables, 18 are represented in the tables, which list the top 15 variables for satisfaction and for intention to return. Variable importance was computed by the decision trees and is measured on a continuous scale from 0 to 1, with values closer to 1 indicating greater importance. Variables are listed in order of importance, with satisfaction and future intention to return to Japan as the dependent variables:
Variables in Order of Importance for Satisfaction.
Variables in Order of Importance for Intention to Return.
Decision Tree Rules
Important variables provide a snapshot of what is important to tourists when they travel to Japan. Decision tree rules go further, providing a deeper understanding by grouping tourist preferences into patterns associated with higher levels of satisfaction and intention to return. Because of the large number of independent variables, it was not possible to insert a complete decision tree into this article. However, excerpts from a decision tree are shown as examples here (Figures 2 and 3).

Excerpt of decision tree for Satisfaction.

Excerpt of decision tree for Intention to return.
Demographics of Data
The majority of tourists came from Asian countries (62%), such as Korea (19.51%), Taiwan (18.10%), and mainland China (14.16%). The second largest group of visitors was from the United States (10.65%). From mainland China, the two largest groups were from Beijing and Shanghai. Gender was rather evenly distributed between men (56%) and women (43%). The average age was 23 years, with a standard deviation of 13 years. Most tourists arrived at Narita-Tokyo (53.88%), Kansai-Osaka (17.63%), or New Chitose-Sapporo in Hokkaido (6.212%). The data revealed that 42% of respondents were visiting Japan for the first time, 15% were visiting for the second time, and 10% for the third time. The most common travel arrangements were traveling alone (17%), with family (21%), with one or more work colleagues (19%), and with one or more friends (19%). Additionally, 57.9% of the respondents traveled for tourism and leisure, while 25% traveled to participate in business training, conferences, or trade fairs.
Decision Trees
Odds ratios
Odds ratios are used to compare the relative odds of the occurrence of the outcome of interest (e.g., a disease or a disorder) given exposure to the variable of interest (e.g., a health characteristic, an aspect of medical history). An odds ratio is represented by the formula
$$\mathrm{OR} = \frac{a/c}{b/d} = \frac{ad}{bc}$$
where a is the number of exposed cases, b the number of exposed non-cases, c the number of unexposed cases, and d the number of unexposed non-cases, and where
OR = 1: Exposure does not affect odds of outcome
OR > 1: Exposure associated with higher odds of outcome
OR < 1: Exposure associated with lower odds of outcome (Bland and Altman 2000).
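As a worked illustration of the interpretation above, the following sketch computes an odds ratio from a two-by-two table of counts. The function name and the counts are hypothetical and are not taken from the survey; here "exposure" stands for belonging to a given visitor group and "outcome" for, say, being satisfied.

```python
def odds_ratio(exposed_yes, exposed_no, unexposed_yes, unexposed_no):
    """Odds of the outcome in the exposed group divided by the odds of the
    outcome in the unexposed group (standard two-by-two table form)."""
    if 0 in (exposed_no, unexposed_yes, unexposed_no):
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    odds_exposed = exposed_yes / exposed_no
    odds_unexposed = unexposed_yes / unexposed_no
    return odds_exposed / odds_unexposed


# Hypothetical counts: 70 of 100 visitors in the group are satisfied,
# compared with 50 of 100 visitors outside the group.
ratio = odds_ratio(exposed_yes=70, exposed_no=30, unexposed_yes=50, unexposed_no=50)
print(round(ratio, 2))
```

With these illustrative counts the ratio is greater than 1, so group membership is associated with higher odds of the outcome.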
Satisfaction
For the purposes of better classification with decision trees, the variable satisfaction was recoded into a binary variable, where 1 includes highly satisfied and satisfied and 0 includes everything else. This produced binary values that were rather equally distributed between 1 (50.1%) and 0 (48.9%). For the purposes of this study, the top four most important decision tree combinations (rules) were selected. The overall model’s misclassification rate is 0.14. The misclassification rate is the proportion of observations allocated to the incorrect group. It is calculated as follows: number of incorrect classifications / total number of classifications. This indicates an accuracy for the model of 86%.
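The recoding and the misclassification rate described above can be sketched as follows. The cutoff of 6 on the seven-point scale is an assumption about how "highly satisfied and satisfied" map onto scores, and the scores and predictions are invented for illustration.

```python
def recode_top_two(score, cutoff=6):
    """Recode a 1-7 Likert score to 1 (score >= cutoff) or 0 (everything else).
    The cutoff of 6 is an assumed mapping, not stated in the article."""
    return 1 if score >= cutoff else 0


def misclassification_rate(actual, predicted):
    """Number of incorrect classifications / total number of classifications."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical Likert responses and hypothetical model predictions.
likert_scores = [7, 6, 5, 3, 6, 2, 7, 4]
actual = [recode_top_two(s) for s in likert_scores]
predicted = [1, 1, 0, 1, 1, 0, 1, 0]

rate = misclassification_rate(actual, predicted)
accuracy = 1 - rate
print(actual, rate, accuracy)
```

Accuracy is simply one minus the misclassification rate, which is how a rate of 0.14 corresponds to an accuracy of 86%.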
Results—satisfaction
The odds of tourists being satisfied are higher by a factor of
2.32 if the tourists are mainly from non-Asian countries, experienced Japanese food, paid no more than $1,500 for airfare, purchased Japanese fruits, and shopped at a supermarket.
2.21 if the tourists are mainly from non-Asian countries, paid no more than $1,500 for airfare, experienced Japanese food, stayed less than eight days, and stayed at a Western-style hotel.
1.64 if the tourists are from a neighboring Asian country (Korea, China, Taiwan, Hong Kong, or Thailand); stayed at a Japanese-style inn; experienced Japanese food; came for tourism/leisure, incentive travel, study, or an international conference; and came through one of the two main airports (Narita/Haneda).
1.51 if the tourists are from a neighboring Asian country (Korea, China, Taiwan, Hong Kong, or Thailand), experienced Japanese food, came for tourism or an exhibition/conference/company meeting, and had visited Japan more than once before.
Intention to return
For the purposes of better classification with decision trees, the variable intention to return was recoded into a binary variable, where 1 includes highly likely and likely to return and 0 includes everything else. This produced binary values that were rather equally distributed between 1 (49.1%) and 0 (50.9%). The overall model’s misclassification rate is 0.13, indicating an accuracy for the model of 87%.
Results—intention to return
The odds of tourists having an intention to return are higher by a factor of
3.9 if the tourists experienced Japanese food, want to experience Japanese sightseeing (e.g., nature, scenery) in the future, paid no more than $1,670 for airfare, visited Japan for the first time, and came through airports such as Narita, New Chitose (Sapporo), or Fukuoka.
3.9 if the tourists experienced a festival/event, sightseeing (nature/scenery), and Japanese food; paid no more than $1,670 for airfare; and had visited Japan several times.
1.94 if the tourists experienced Japanese food; want to experience sightseeing and/or Japanese hot springs; and came with family, spouse, or friends.
1.49 if the tourists want to experience sightseeing in the future, experienced Japanese food, and paid no more than $1,670 for airfare.
Discussion
Various studies (Pizam, Neumann, and Reichel 1978; Buhalis 2000; Weiermair 2000; Kozak and Rimmington 2000; Yoon and Uysal 2005; Chen and Tsai 2007; Liu, Siguaw, and Enz 2008; Lee, Kyle, and Scott 2012; Özdemir and Şimşek 2015) have acknowledged the significance of destination image attributes and their impact on tourist satisfaction and behavioral intentions. Even studies using large data sets (Yoon and Uysal 2005; Chen and Tsai 2007; Chi and Qu 2008; Alegre and Garau 2011; Lee, Kyle, and Scott 2012; Lee, Lee, and Lee 2014; Ramseook-Munhurrun, Seebaluck, and Naidoo 2015; Özdemir and Şimşek 2015; Bajs 2015) have confirmed and emphasized the importance of Japanese food, shopping, and transportation or information about transportation to tourists’ satisfaction and intention to return to Japan as a destination. Contrary to the popular belief that the Internet is the main source of information (Buhalis and Law 2008; Litvin, Goldsmith, and Pan 2008), Lonely Planet travel books have been a major source of information prior to visits to Japan. However, online information accessed while in Japan is very important for tourists’ future intention to return. Place of arrival (airports), prior visits to Japan, country of residency, and flight cost were among the top variables selected by the DT, perhaps because of convenience and the centralization of attractions and/or businesses around airport areas. For example, Tokyo’s main international airport is Narita (Tokyo is the capital and a major business and attraction center), Osaka’s airport is Kansai (Osaka is a major trade center), and New Chitose is Sapporo’s airport (Sapporo is the northern island capital). Another interesting point is that the ability to pay by credit card rather than cash was found to be important to tourists. It was also notable that the satisfaction model reflects similar variables, with several important differences.
Preference for a type of hotel, quality of accommodations, and the two main destinations visited in Japan became important in the model (Joppe, Martin, and Wallen 2001).
Thus, the variables show an interesting variance between satisfaction and intention to return. Most of the important variables in both models (i.e., satisfaction and intention to return) are related to convenience and food, such as Japanese food, shopping, transportation, and flight cost. However, the models differ in two ways: for intention to return, the source of information and the desire to experience new things are important, whereas for satisfaction, accommodations and destinations within Japan are important.
The results of the decision trees indicate that there are two distinct groups—namely, Asian and non-Asian tourists—who have different preferences related to a high level of satisfaction. The main theme for non-Asian tourists is experiencing Japanese food, shopping at supermarkets, staying at Western-style hotels, staying less than eight days, and reasonable airfare costs. These findings support the results of previous research (Joppe, Martin, and Wallen 2001; Alegre and Garau 2011; Bajs 2015). For Asian tourists, higher satisfaction can be achieved by those who experience Japanese food; stay at a Japanese-style inn; come mainly for an event such as a conference, for incentive travel, or to study; and by those who have visited Japan more than once.
On the other hand, nationality plays no role in future intention to return to Japan; rather, intention is affected by whether the visitors are family-oriented/non-business travelers and whether they are first-time visitors. The main motivation for a visitor’s future return is not driven by experiences they had during their visit but rather by experiences they want to have when they return, such as Japanese hot springs or immersing themselves in the beauty of nature. Furthermore, experiencing Japanese food appears to remain a main attraction across all segments, a common denominator that attracts all of the different groupings.
The decision tree analyses revealed the existence of intriguing segments irrespective of nationality, gender, total expenditure during the current visit, or even the purpose of the visits, such as a core repeater grouping of those who “experienced Japanese food, want to experience Japanese nature/scenery sightseeing next time, paid no higher than $1,670 for airfare, visited Japan for the first time, and came through airports such as Narita, New Chitose (Sapporo), or Fukuoka”—a grouping whose likelihood of returning to Japan is almost four times (3.9) higher than that of average inbound visitors.
Nationality plays an important role in the satisfaction of tourists and again separates them into two distinct groups of Asian and non-Asian. An interesting point about satisfaction is that tourists from non-Asian countries have higher odds of being satisfied than tourists from Asian countries. This could potentially be explained by the desire of non-Asians to visit a destination with a culture very different from that of their homeland. On the other hand, Tran and Ralston (2006) proposed a model pertaining to tourists’ unconscious needs and their preferences in tourism. For example, they found a connection between achievement motivation and preference for adventure in American tourists. Considering that the largest non-Asian group of tourists in this study is from the United States, the authors speculate that Japan falls into both categories as a very different culture and an adventurous destination (a unique country with a very different culture, different food, and a different language that is a long distance away from home). Also, for tourists from Asian countries, airfare is no longer an important variable, which could probably be explained by the closer proximity. To illustrate the difference between the two groups, the odds of satisfaction are a little more than double (2.32) for “tourists from non-Asian countries who had experience with Japanese food, paid no higher than $1,500 for airfare, purchased Japanese fruits, and shopped at a supermarket.” By comparison, the odds increase by a little more than half (1.64) “if they are from a neighboring Asian country (Korea, China, Taiwan, Hong Kong, or Thailand); stayed at a Japanese-style inn; experienced Japanese food; came for tourism/leisure, incentive travel, study, or an international conference; and came through one of the two main airports (Narita/Haneda).”
Managerial Implications
Data mining is a data-driven technique that can analyze data without introducing any major subjectivity of data analysts who may have preferred approaches or agendas associated with their past research streams. Therefore, by its structure, it does not lead to the verification of a specific existing theory unless the researchers are dealing with multiple data sets and see commonalities across them.
Data mining, however, presents unique managerial implications in that the resulting analysis can more effectively identify certain combinations of profiles and characteristics of visitors without relying on the study design or the subjective judgment of the researchers. In other words, data mining results can present more compelling responses to questions about the return on investment of marketing expenditures by governments and DMOs by identifying the specific groupings of potential visitors who are more likely to come back as repeaters than any other combinations of groupings, based purely on the analysis of the objective big data in question. Odds ratios are rather easy for non-academic practitioners to interpret, and, with an understanding of groupings with higher odds ratios, DMOs may put a strategy in place to aim at groups whose odds ratios are higher than certain discretionary benchmarks. For example, a DMO’s strategy to market to groups with an odds ratio higher than 1.5 means the DMO is aiming at groups who have at least 50% higher odds (of being satisfied with the destination or of coming back to the destination), with an expectation that the same marketing expenditures would work more efficiently than targeting the average visitor at large. Data mining results might not corroborate prior knowledge, prior beliefs, or myths, as the method does not pay attention to those but rather purely seeks winning combinations of groupings by mathematical computations in an objective manner. As governments and DMOs can access big data on tourists, data mining techniques open a new chapter for their quantitative data analysis to aid managerial decisions for better allocation of their limited resources into segments more likely to be enticed back to their destination than the market average.
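The benchmark strategy described above can be sketched with the eight odds ratios reported in the Results section. The short segment labels are abbreviations introduced here for readability, and the function name and the tuple layout are assumptions for illustration only.

```python
# (model, abbreviated segment label, reported odds ratio)
# Odds ratios come from the Results section; the labels are shorthand.
reported_rules = [
    ("satisfaction", "non-Asian, supermarket shoppers", 2.32),
    ("satisfaction", "non-Asian, short stay, Western-style hotel", 2.21),
    ("satisfaction", "neighboring Asian, Japanese-style inn", 1.64),
    ("satisfaction", "neighboring Asian, repeat visitors", 1.51),
    ("return", "first-time visitors wanting future sightseeing", 3.9),
    ("return", "repeat visitors who experienced a festival/event", 3.9),
    ("return", "family/friends wanting hot springs or sightseeing", 1.94),
    ("return", "future sightseeing, airfare under $1,670", 1.49),
]


def target_segments(rules, benchmark=1.5):
    """Keep segments whose odds ratio exceeds the discretionary benchmark,
    ordered from the highest odds ratio to the lowest."""
    selected = [rule for rule in rules if rule[2] > benchmark]
    return sorted(selected, key=lambda rule: rule[2], reverse=True)


targets = target_segments(reported_rules)
for model, label, ratio in targets:
    print(f"{ratio:>4} {model:<12} {label}")
```

With a benchmark of 1.5, every reported segment except the one with an odds ratio of 1.49 is retained; a stricter benchmark would narrow the list further.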
Study Limitations and Future Research
This study has several structural limitations, and we believe that acknowledging these limitations may lead to more viable future research in the field of quantitative destination marketing. First, our research is based on data from visitors who came to Japan, who represent a small fraction of all travelers; we cannot extrapolate our findings to other travelers, including those who decided not to visit Japan. Second, our research data were collected at one period in the study year of 2010, after which Japan saw a huge drop in visitors because of the Great East Japan Earthquake and simultaneous radiation leaks from nuclear power plants in Fukushima in 2011. The total number of inbound visitors to Japan reached the national goal of 10 million in 2013, overcoming the negative impacts on inbound visitors in the two years prior. We do not have any evidence as to whether our findings based on 2010 data remain stable over time. Research on updated data may therefore shed light on the temporal stability of the behavior of inbound visitors.
Finally, we had no involvement in the data collection process or the design of the survey; thus, we have dealt with secondary data collected by professionals. The lack of direct data collection experience with the data set may prevent us from having certain insights that may be useful in evaluating the data set, or it may have just saved us from any sampling errors associated with data input. Future research may be performed with a survey exclusively on satisfaction and repeat intentions, without piggybacking on the visitors’ expenditure survey, the length of which may be a partial reason for the relatively high number of incomplete surveys.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
