How to Avoid Random Market Segmentation Solutions

Abstract

Tourism researchers and the tourism industry rely heavily on data-driven market segmentation analysis for both knowledge development and market insight. Most algorithms used in data-driven market segmentation are exploratory; they do not generate one single stable result. Only when data are well-structured (when very clear, distinct market segments exist in the data) are repeated calculations likely to generate the same segmentation solution. When data lack structure, which is frequently the case in empirical consumer data sets, repeated calculations lead to different solutions. Running a market segmentation analysis once only can therefore lead to an entirely random solution that does not represent a strong foundation for developing a long-term market segmentation strategy. The present study (1) explains the problem, (2) assesses how high the risk is of random solutions occurring in tourism market segmentation studies, and (3) recommends an approach that can be used to avoid random solutions.

Keywords

data-driven market segmentation, stability, reliability, cluster analysis, bootstrap, reproducibility

Introduction

Data-driven market segmentation (Dolnicar 2004) uses empirical data, frequently survey data, to extract groups of similar consumers. Market segmentation was introduced to marketing in 1956 by Wendell Smith and has since been widely adopted. The aim of market segmentation as understood by marketers is to identify segments whose members are very similar to one another and as different as possible from members of other segments. This allows the development of a marketing mix that is particularly attractive to selected segments, which leads to more efficient marketing spending in the short term and competitive advantage in those segments in the long term (Myers and Tauber 1977; Wedel and Kamakura 1998; Lilien and Rangaswamy 2004; Dolnicar 2008).

Data-driven market segmentation is heavily used by tourism researchers to develop knowledge (Zins 2008; Mazanec et al. 2010) and by the tourism industry to gain market insight on which marketing planning is based. Ensuring the validity of market segmentation study results is therefore of critical importance.

A range of methodological decisions in segmentation studies can undermine the validity of results. For example, the underlying data can be of low quality (Dolnicar 2002), the sample can be too small (Dolnicar 2002; Dolnicar et al. 2014; Dolnicar, Grün, and Leisch 2016), and the number of variables can be too high (Dolnicar 2002; Dolnicar et al. 2014; Dolnicar, Grün, and Leisch 2016). Arguably the most fundamental of all possible problems, however, is to end up with an entirely random grouping of consumers.

Ending up with a random solution is not something one would naturally expect to happen, but random solutions often occur because data-driven segmentation analyses do not guarantee stable results. If the analysis lacks test–retest reliability, and if the result cannot be replicated by running the exact same calculation on the exact same data again, the results cannot be trusted. Unstable solutions are caused by the exploratory nature of the calculations used in data-driven market segmentation analysis. For example, in k-means clustering (one of the most popular algorithms in tourism; Dolnicar 2002), the first step is to randomly select a number of data points, which serve as starting points for the calculation. Different starting points lead to different segmentation solutions. Therefore, if data are not well structured, each repeated calculation with different starting points will generate a different solution (algorithm randomness). To complicate matters, the sample of respondents in the data set may not be a perfect representation of the population, so slight variations in the sample are a second source of randomness (sample randomness).

It is for this reason that Dolnicar and Leisch (2010) use stability to try to establish whether any given data set enables the extraction of naturally existing market segments or whether, instead, the data analyst is forced to create a number of alternative segmentation solutions and let the user decide the most managerially useful one.
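The algorithm randomness described above can be illustrated with a minimal sketch. It is written in plain Python rather than the software used in the studies discussed here, and the function names (`kmeans`, `within_ss`, `best_of`) as well as the simulated, unstructured data are assumptions made purely for illustration. The sketch runs k-means from different random starting points and compares the within-cluster sum of squares of the resulting solutions.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed, iters=100):
    """Plain k-means; the seed determines the random starting points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k randomly selected data points
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda h: sq_dist(p, centroids[h]))
                      for p in points]
        if new_labels == labels:  # assignments no longer change
            break
        labels = new_labels
        # Move every centroid to the mean of its current members.
        for h in range(k):
            members = [p for p, lab in zip(points, labels) if lab == h]
            if members:  # an empty cluster keeps its old centroid
                centroids[h] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

def within_ss(points, labels, centroids):
    """Within-cluster sum of squares, the criterion k-means minimizes."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

def best_of(points, k, seeds):
    """Run k-means once per seed and keep the solution with the lowest criterion."""
    runs = [kmeans(points, k, seed) for seed in seeds]
    return min(runs, key=lambda run: within_ss(points, *run))

# Hypothetical unstructured data: 200 points spread uniformly over a square.
data_rng = random.Random(42)
data = [(data_rng.random(), data_rng.random()) for _ in range(200)]

for seed in (1, 2, 3):
    labels, centroids = kmeans(data, 3, seed)
    print("seed", seed, "criterion", round(within_ss(data, labels, centroids), 4))

best_labels, best_centroids = best_of(data, 3, range(20))
print("best of 20 starts", round(within_ss(data, best_labels, best_centroids), 4))
```

On data without structure, the criterion values (and the groupings behind them) can differ from one starting configuration to the next. Keeping the best of several starts reduces algorithm randomness, but it does not address sample randomness.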

The key to conducting high quality data-driven market segmentation studies, therefore, is to assess in advance the risk of a random solution and, if possible, avoid such random solutions. The present article (1) explains the problem of sample and algorithm randomness in detail, (2) offers assessments of the risk of random solutions occurring based on past published segmentation studies as well as unpublished tourism survey data sets, and (3) recommends an approach that can be used to avoid random solutions.

This article also serves to raise awareness of the issues pointed out in Dolnicar and Leisch (2010), which was well received in the marketing literature but has so far gone unnoticed by tourism researchers. This article demonstrates that sample and algorithm randomness have major implications for segmentation solutions based on tourism data sets, but are currently being ignored.

The article is structured as follows. First, an interdisciplinary literature review discusses developments in clustering and market segmentation methodology. Methodological aspects of clustering and market segmentation are researched primarily in the fields of statistics and marketing, which is why the literature review relies heavily on work from those fields. Next, the methodology is explained in detail. Answering the second research question, that of assessing the risk of random solutions occurring specifically when empirical tourism data sets are segmented, requires an approach very different from that of typical empirical tourism studies. The approach takes two perspectives: a retrospective perspective, in which past applied segmentation studies in tourism are analyzed to see whether algorithm and sample randomness are accounted for, and a prospective perspective, which asks how high the risk of arriving at a random segmentation solution is if a data analyst were to take any empirical tourism data set, given the typical nature of such data sets. These two perspectives require different research designs. The retrospective question is answered by analyzing prior empirical tourism segmentation studies. This is not another literature review; rather, each empirical study is one data point in the analysis. For the prospective analysis, the structure of a large number of empirical tourism data sets is explored. Results are presented in two separate sections. Finally, conclusions are drawn and practical recommendations are offered on how tourism researchers can minimize the risk of arriving at random solutions.

Advances in Market Segmentation Research

Despite the long history of market segmentation, academic research on data-driven market segmentation has focused primarily on fine-tuning algorithms, resulting in a wide range of highly sophisticated methods, such as latent class analysis (Gibson 1959; Lazarsfeld, Henry, and Anderson 1968), finite mixture modeling (Banfield and Raftery 1993), and methods that simultaneously group and select the most influential variables (Dolnicar et al. 2012) or correct for response styles while grouping individuals (Grün and Dolnicar 2015).

Review articles on market segmentation typically discuss the wide range of variables that can be used as the basis for segmentation (Beane and Ennis 1987; Dolnicar 2002, 2003; Tuma, Decker, and Scholz 2011) and the advantages and disadvantages of available algorithms (Beane and Ennis 1987; Tuma, Decker, and Scholz 2011). Some also cover issues of reliability or stability (Dolnicar 2002, 2003; Tuma, Decker, and Scholz 2011), which stand at the center of the present study, but do not provide any practical recommendations on how stability can be systematically assessed. The reason is that methods for the systematic assessment of the stability of clustering solutions were developed later (Tibshirani and Walther 2005; Hennig 2007; Dolnicar and Leisch 2010).

Interestingly, books on market segmentation as well as review articles typically do not discuss some of the fundamental conceptual questions relating to data-driven market segmentation, such as whether market segments of consumers actually exist naturally or whether they are created (as a notable exception, see Putler and Krider 2012). This discussion was ignited by the article by Dolnicar and Leisch (2010) in which the authors postulate that empirical data can permit one of three cases of market segmentation: natural, reproducible, or constructive market segmentation.

In the case of natural market segmentation, true market segments exist. Natural market segmentation is in line with the traditional view of market segmentation that the “initial premise in segmenting a market is that segments actually do exist” (Beane and Ennis 1987, 20). If natural market segments exist, segment members are very similar to one another and very different from members of other segments. As a consequence, natural segments are easy to find. If a data-driven market segmentation analysis is repeated multiple times—using the same or even a different grouping algorithm—the same segments emerge; stability across repetitions is high.

The case of constructive market segmentation represents the exact opposite: no natural market segments exist. Market segmentation is still valuable because sections of consumers will differ from other sections of consumers. But the borders between these sections are not clear; they could be drawn anywhere. As a consequence, repeated calculations of data-driven market segmentation solutions will lead to different results. Stability across replications will be extremely low.

The third possibility is that of reproducible market segmentation. This concept is characterized by a lack of true market segments in the data, as well as by the existence of some other structure in the data that allows the repeated identification of some segmentation solutions (Dolnicar and Leisch 2010).

In their article, Dolnicar and Leisch (2010) also propose a method that allows data analysts to investigate which of the three concepts their data fall under. The idea is to repeat—for each number of clusters—the segmentation analysis many times, each time using a different bootstrap sample of the data. Then, the stability across repetitions is calculated. Stable solutions are good; they give the user of the segmentation solution confidence that their marketing mix will not be based on a random solution (Dolnicar and Leisch 2010).

Unstable solutions, on the other hand, indicate a lack of naturally occurring market segments in the data. Here, each resulting segmentation solution—though not incorrect—represents one of many equally valid alternatives. In such circumstances, market segments are artificially constructed; the data analyst and the user need to select one of many alternative solutions using other statistical criteria (Dolnicar and Leisch 2010).

The detailed algorithm works as follows:

First, $B$ pairs of bootstrap samples are drawn from the original data set. Bootstrap samples are drawn randomly, with replacement, from the data set. Using a number of bootstrap samples accounts for the fact that the sample may not be a perfect representation of the population.

Next, the clustering algorithm is run on each bootstrap sample, resulting in $2 B$ clustering solutions. For each pair of solutions, the segment memberships are predicted on the original data set and the agreement between the clusters derived from one pair of bootstrap samples is computed using the Adjusted Rand Index (Hubert and Arabie 1985). Repeating that for each pair yields $B$ Rand indices, which are used as indicators of stability.

Then, boxplots and density plots can be utilized to visualize stability. When stability (the Rand index value) is close to 1 the solution is stable, pointing to the existence of natural market segments in the data. Stability values close to 0 indicate unstable solutions and point to market segments having to be artificially constructed (Dolnicar and Leisch 2010).
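The steps above can be sketched in a few lines of code. The following is a simplified illustration in plain Python, not the flexclust implementation used for the analyses reported here. All function names are hypothetical, the data are simulated, and, as a simplifying assumption, each k-means run uses a single random start rather than the best of several starts.

```python
import random
from collections import Counter
from math import comb
from statistics import median

def predict(points, centroids):
    """Assign each point to the nearest centroid (squared Euclidean distance)."""
    return [min(range(len(centroids)),
                key=lambda h: sum((a - b) ** 2 for a, b in zip(p, centroids[h])))
            for p in points]

def kmeans_centroids(points, k, rng, iters=50):
    """Single-start k-means returning only the centroids (simplifying assumption)."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = predict(points, centroids)
        updated = []
        for h in range(k):
            members = [p for p, lab in zip(points, labels) if lab == h]
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[h])
        if updated == centroids:
            break
        centroids = updated
    return centroids

def adjusted_rand_index(a, b):
    """Adjusted Rand Index of two partitions (Hubert and Arabie 1985)."""
    pairs_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(len(a), 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # both partitions are trivial
        return 1.0
    return (pairs_both - expected) / (max_index - expected)

def bootstrap_stability(points, k, n_pairs=100, seed=0):
    """One stability value per pair of bootstrap samples."""
    rng = random.Random(seed)
    n = len(points)
    scores = []
    for _pair in range(n_pairs):
        memberships = []
        for _half in range(2):
            # Bootstrap sample: n draws with replacement from the original data.
            sample = [points[rng.randrange(n)] for _draw in range(n)]
            centroids = kmeans_centroids(sample, k, rng)
            # Predict segment membership for the ORIGINAL data set.
            memberships.append(predict(points, centroids))
        scores.append(adjusted_rand_index(memberships[0], memberships[1]))
    return scores

# Hypothetical data: three well-separated groups versus uniform noise.
gen = random.Random(7)
structured = [(gen.gauss(cx, 0.5), gen.gauss(cy, 0.5))
              for cx, cy in [(0, 0), (6, 0), (3, 6)] for _ in range(50)]
unstructured = [(gen.random(), gen.random()) for _ in range(150)]

for name, data in [("structured", structured), ("unstructured", unstructured)]:
    scores = bootstrap_stability(data, k=3, n_pairs=20)
    print(name, "median stability:", round(median(scores), 3))
```

In a full analysis, the procedure would be repeated for each candidate number of segments, and the resulting stability values would be plotted as boxplots, one per number of segments.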

These new concepts and methods make it possible to avoid negative effects of randomness in both the sample and the algorithm when market segmentation analyses are conducted. The question investigated next is whether tourism researchers display awareness of the problem of randomness and whether they make use of tools available to manage randomness.

Methodology

The risk of random segmentation solutions resulting from data-driven market segmentation studies in tourism is assessed separately for (1) data-driven market segmentation studies published in tourism in the past six years and (2) a number of unpublished tourism survey data sets.

Analysis of Published Data-Driven Market Segmentation Studies

A Scopus search reveals that segmentation studies in tourism appear most frequently in the following three outlets: the Journal of Travel Research, Journal of Travel & Tourism Marketing, and Tourism Management. As a consequence, articles on data-driven segmentation published in these three journals were included. In addition, the International Journal of Hospitality Management and the International Journal of Contemporary Hospitality Management were included on request of the reviewers.

Articles were identified using a key word search on the journal websites. The key words used were “segmentation” and “market segmentation.” Between 2010 and May 2016, a total of 78 segmentation studies appeared in those journals. Of those, 53 (68%) were data-driven segmentation studies. These were used.

The time frame of 2010–2016 was chosen because methods to assess the stability of clustering solutions were developed in the statistical literature between 2005 and 2010 (Tibshirani and Walther 2005; Hennig 2007; Dolnicar and Leisch 2010). Typically, it takes a few years for new methodological advances to be adopted and for software to become available that facilitates the uptake of new methods. The implementation of the bootstrap procedure proposed by Dolnicar and Leisch (2010), for example, is now also available in a graphical user interface (Putler 2014), making it more accessible. Therefore, 2010 was chosen as the starting point for the search of articles using data-driven market segmentation.

Note that the aim is not to review studies in terms of their knowledge contribution. Rather, the aim is to identify key methodological features that allow an estimation of the error potential as a result of running a single calculation of a segmentation algorithm.

Analysis of Unpublished Tourism Survey Data

Typically, when market segmentation analysis is conducted, an empirical data set is used as the basis, usually a survey data set. It is not known in advance if a data set contains market segments or not. The fact that market segmentation analysis leads to a grouping of respondents does not prove that these groupings are real. Every market segmentation algorithm groups respondents, either into their natural segments or, if they do not exist, into artificial groupings. If distinct, real, natural market segments exist in the data and one single market segmentation analysis is calculated, it is very likely that the algorithm will identify the correct market segments. If, however, the data are unstructured and there are no real market segments in the data, the result of one single calculation will necessarily be random. It may still be managerially useful, but it is random.

To be able to derive an estimate of the risk of random solutions in future segmentation studies, it is necessary to understand how structured typical tourism survey data sets are. This is the approach used here.

Specifically, the procedure proposed by Dolnicar and Leisch (2010) was applied to 32 tourism survey data sets. These data sets were not collected for this study; rather, they represent data sets accessible to the authors. Critically, they are very different in nature. High variability minimizes the bias of the estimate derived about the typical extent of data structure in empirical tourism data.

The data sets vary in the following aspects: they contain between 4 and 45 survey questions and between 1,000 and 4,800 respondents. They vary in scale type, including binary (11 cases), metric (3 cases), and ordinal data (18 cases). The content of questions covers destination image, motivations, vacation activities, Internet use, and information sources used during travel planning.

The k-means clustering algorithm (Hartigan 1975) was applied because (1) it is widely used in tourism research according to Dolnicar (2002) and the literature analysis undertaken for the present study (Table 1) and (2) the bootstrap procedure (Dolnicar and Leisch 2010) for k-means is readily available and easy to use given its graphical user interface (Leisch 2006; Putler 2014). One hundred pairs of bootstrap samples were used. Starting points were drawn randomly 20 times and the best starting points were used. All computations were done using the statistical computing software R (R Core Team 2015) with the package flexclust (Leisch 2006). Two criteria were used to classify each one of the 32 empirical data sets into natural clustering, reproducible clustering, or constructive clustering:

Stability boxplots. An example of a boxplot is shown in the left panel of Figure 1. It plots the number of segments on the x axis and stability on the y axis. Each column contains 100 stability values. Half of them are visualized by the box itself, the other half by the so-called whiskers and the individual circles representing extreme values. Stability values spread across a large portion of the column, as is the case for five clusters in Figure 1, point to a lack of stability and thus the need to construct clusters. High stability values located close together, as is the case for four or six clusters in Figure 1, point to natural clustering. In such a case, random solutions are unlikely.

Separation rootograms. An example of a rootogram is provided in the right panel of Figure 1. It visualizes how far apart market segments are located. Segments are far apart (well separated) if most values in the rootogram are 0 or 1. If there are many values in between, separation is not good.

In technical terms, the rootogram is based on the Euclidean distances between each data point and all segment centroids, normalized across segments. Let $d_{ih}$ be the Euclidean distance between observation $i$ and the centroid of segment $h$. With the number of clusters $k$,

$$s_{ih} = \frac{\exp(-d_{ih}^{2})}{\sum_{l=1}^{k} \exp(-d_{il}^{2})}$$

transforms the distances into similarity measures (instead of dissimilarities) between observations and centroids. This transformation was chosen akin to finite mixture models where posterior class probabilities constitute the similarity measures (Fraley and Raftery 1998; Leisch 2004). In fact, this similarity measure is equal to the posterior probabilities of the corresponding finite mixture model (a mixture of independent normal distributions with standard variance). The square roots of these similarities are computed and drawn as histograms for each segment, resulting in the rootogram.
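As a hedged illustration of this transformation, the sketch below computes the similarity values for each observation and bins their square roots per segment, which yields the counts a rootogram would display. The function names, the number of bins, and the toy data are assumptions chosen for illustration; the sketch does not reproduce the plotting routine used in the study.

```python
from math import exp, sqrt

def similarities(point, centroids):
    """Similarity of one observation to every segment centroid."""
    d2 = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    # Subtracting the smallest squared distance leaves the ratio unchanged
    # but prevents all exponentials from underflowing to zero.
    shift = min(d2)
    weights = [exp(-(v - shift)) for v in d2]
    total = sum(weights)
    return [w / total for w in weights]

def rootogram_counts(points, centroids, bins=10):
    """Per segment, histogram counts of the square roots of the similarities."""
    counts = [[0] * bins for _ in centroids]
    for p in points:
        for h, s in enumerate(similarities(p, centroids)):
            index = min(int(sqrt(s) * bins), bins - 1)  # a value of 1.0 goes in the last bin
            counts[h][index] += 1
    return counts

# Hypothetical toy data: two centroids and three observations.
centroids = [(0.0, 0.0), (4.0, 0.0)]
observations = [(0.1, 0.0), (3.9, 0.2), (2.0, 0.0)]

for obs in observations:
    print(obs, [round(s, 3) for s in similarities(obs, centroids)])

for h, row in enumerate(rootogram_counts(observations, centroids)):
    print("segment", h, row)
```

Observations close to one centroid and far from all others receive similarities near 1 for that segment and near 0 for the rest. An observation halfway between two centroids receives intermediate values, which is what weak separation looks like in a rootogram.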

Table 1.

Published Data-Driven Segmentation Studies in Tourism Analyzed in This Study.

Author(s)	Journal	Algorithm	Multiple Starting Values	Deterministic	Starting Points from Hierarchical Clustering	Bootstrap
Agapito, Valle, and Mendes (2014)	TM	Ward’s, k-means	No	No	Yes	No
Alexander, Kim, and Kim (2015)	TM	PCA, k-means	No	No	No	No
Chen, Lin, and Kuo (2013)	IJHM	Factor analysis, hierarchical, k-means	No	No	Yes	No
Chen, Liu, and Chang (2013)	IJHM	Hierarchical, k-means	No	No	Yes	No
Choe, Lee, and Kim (2014)	JTTM	Hierarchical, k-means	No	No	Yes	No
Choi (2011)	TM	Hierarchical, nonhierarchical	No	No	Yes	No
De Cantis et al. (2016)	TM	MONA	n/a	Yes	n/a	No
Denizci Guillet, Guo, and Law (2015)	JTTM	Ward’s	n/a	Yes	n/a	No
Denizci Guillet and Kucukusta (2016)	IJCHM	Ward’s	n/a	Yes	n/a	No
Dey and Sarma (2010)	TM	PCA, k-means	No	No	No	No
Díaz and Koutra (2013)	IJHM	Ward’s, LCA	No	No	Yes	No
Dolnicar et al. (2012)	JTR	Biclustering, k-means	n/a	Yes	n/a	Yes
Hadjikakou et al. (2014)	JTR	TwoStep	n/a	Yes	n/a	No
Iversen, Hem, and Mehmetoglu (2016)	JTTM	Ward’s	n/a	Yes	n/a	No
Khoo-Lattimore and Prayag (2015)	IJHM	Ward’s	n/a	Yes	n/a	No
Kim and Weiler (2013)	TM	k-means	No	No	No	No
Koh, Yoo, and Boger (2010)	IJCHM	Factor analysis, hierarchical clustering, k-means	No	No	Yes	No
Konu, Laukkanen, and Komppula (2011)	TM	PCA, k-means	No	No	No	No
Kruger, Myburgh, and Saayman (2016)	JTTM	Ward’s	n/a	Yes	n/a	No
Kruger, Viljoen, and Saayman (2016)	JTTM	Ward’s	n/a	Yes	n/a	No
Landauer, Haider, and Pröbstl-Haider (2014)	JTR	PCA, k-means	No	No	No	No
Le and Pearce (2011)	JTTM	Factor analysis, hierarchical clustering, k-means	No	Yes	No	No
Lee and Kyle (2014)	JTR	Ward’s, k-means	No	Yes	No	No
Legohérel, Hsu, and Daucé (2015)	TM	CHAID	n/a	Yes	n/a	No
Lima, Eusébio, and Kastenholz (2012)	JTTM	Ward’s	n/a	Yes	n/a	No
Li et al. (2011)	JTTM	Factor analysis, hierarchical clustering, k-means	No	No	Yes	No
Lo, Law, and Cheung (2011)	JTTM	Hierarchical clustering, k-means	No	No	Yes	No
Lyu and Lee (2013)	JTTM	Ward’s, k-means	No	No	Yes	No
Maggioni, Marcoz, and Mauri (2014)	JTTM	PCA, k-means	No	No	No	No
Masiero and Nicolau (2012)	JTR	Ward’s	n/a	Yes	n/a	No
Nicolau (2012)	JTR	Ward’s	n/a	Yes	n/a	No
Oh and Schuett (2010)	JTTM	k-means	No	No	No	No
Pesonen (2014)	JTTM	k-means	No	No	No	No
Pesonen (2015)	JTTM	Ward’s	n/a	Yes	n/a	No
Polo-Peña, Frías-Jamilena, and Rodríguez-Molina (2012)	IJHM	CHAID	n/a	Yes	n/a	No
Prayag et al. (2015)	JTR	Bagged clustering	Yes	No	No	Yes
Prayag and Hosany (2014)	TM	Ward’s, k-means	No	No	Yes	No
Priporas et al. (2015)	JTR	PCA, hierarchical clustering, k-means	No	No	Yes	No
Rasmi et al. (2014)	TM	TwoStep	n/a	Yes	n/a	No
Rid, Ezeuduji, and Pröbstl-Haider (2014)	TM	PCA, hierarchical clustering, k-means	No	No	Yes	No
Ring, Tkaczynski, and Dolnicar (2016)	JTR	Neural gas	Yes	No	No	Yes
Ritchie, Chien, and Sharifpour (2016)	JTTM	TwoStep	n/a	Yes	n/a	No
Ritchie, Tkaczynski, and Faulks (2010)	JTTM	Hierarchical clustering, k-means	No	No	Yes	No
Seabra et al. (2013)	TM	Neural gas	Yes	No	No	Yes
Schofield and Fallon (2012)	TM	Factor analysis, k-means	No	No	No	No
Thapa et al. (2013)	TM	Ward’s, k-means	No	No	Yes	No
Tiago et al. (2016)	TM	TwoStep	n/a	Yes	n/a	No
Tkaczynski and Rundle-Thiele (2013)	JTTM	TwoStep	n/a	Yes	n/a	No
Tkaczynski, Rundle-Thiele, and Beaumont (2010)	JTR	TwoStep	n/a	Yes	n/a	No
Tkaczynski, Rundle-Thiele, and Prebensen (2015)	JTR	TwoStep	n/a	Yes	n/a	No
do Valle et al. (2012)	TM	CHAID	n/a	Yes	n/a	Yes
Weaver (2012)	TM	Ward’s	n/a	Yes	n/a	No
Xia et al. (2010)	TM	Finite mixture	Yes	No	n/a	No

Note: JTR = Journal of Travel Research; JTTM = Journal of Travel & Tourism Marketing; PCA = principal components analysis; IJHM = International Journal of Hospitality Management; LCA = latent class analysis; MONA = Multilevel ONtology Analysis; TM = Tourism Management; CHAID = chi-square automatic interaction detector; n/a = not applicable.

Figure 1.

Prototypical boxplot and rootogram (natural segmentation).

The three data-driven market segmentation concepts were operationalized as follows:

Natural segmentation

For natural segmentation, very high stability up to the number of naturally existing segments $k$ is expected. After that, stability should drop sharply because at least one of the naturally occurring segments needs to be split up. This splitting of a natural segment reduces stability because the two new, artificially created segments show little cluster structure. Figure 1 (left) shows a sample boxplot from one of the empirical data sets that appears to contain natural segments. As can be seen, stability is very high and has little variance as indicated by the interquartile range.

Naturally occurring segments are also well separated, which is evident for this data set from the rootogram in the right panel of Figure 1. It plots each of the four segments contained in the four-segment solution. As can be seen, most data points have values of 0 or 1; only few fall in between. This means that most respondents are clearly assigned (or not assigned) to each segment and there is little ambiguity; segments are well separated.

Figure 1 is based on 4,794 tourists’ typical vacation activities. For each of 30 activities, respondents indicated whether they engaged in them “never,” “sometimes,” or “a lot.” The box plots and rootograms (Figure 1) point to the existence of six naturally occurring market segments. Two of them (containing 14% and 9% of respondents, respectively) may represent response styles. The remaining four segments are suitable for interpretation. Segment 2 (24%) engages in many swimming-related activities, such as going to swimming pools, saunas, and swimming in general. Members of segment 3 are sight-seers (22%). Segment 4 (13%) is the sports segment favoring activities like tennis, cycling, sailing, and winter sports, but also attending bars and discotheques and posting electronic evidence of their activities on social media. Segment 5 (18%) contains cultural tourists attending festivals, events, theatres, and the opera and also going out for dinner. They also like taking walks, going swimming, and going on boat trips.

We operationalize natural segmentation as follows: (1) high stability and low variance in the boxplot for at least a limited range of cluster numbers, (2) a steep drop from the high stability range to solutions outside of this range, and (3) good segment separation in rootograms.

Reproducible segmentation

This segmentation concept is characterized by segments being located close to each other, but still containing enough structure to derive stable results. As the number of clusters increases stability declines because it becomes more difficult to find exactly the same partitions with more clusters (if there is a small “classification” error for each cluster, these errors add up, resulting in low stability for large numbers of clusters). Figure 2 shows a prototypical boxplot and rootogram for such an empirical data set.

Figure 2.

Prototypical boxplot and rootogram (reproducible segmentation).

These data contain 4,794 tourists’ stated willingness to take 12 different risks. Examples of risks are acting in their boss’s job in order to demonstrate their competence despite the risk of making mistakes, setting only small easily achievable goals, and having sympathy for adventurous decisions. Based on the boxplot shown in Figure 2, the four-segment solution emerges as the best option displaying high stability and low variance. The first two segments comprising 22% and 25% of respondents, respectively, are cautious segments. They tend to disagree with statements such as “I would like to act in my boss’s job some time so as to demonstrate my competence, despite the risk of making mistakes” and “Success makes me take higher risks.” They differ, however, in the item “Nothing ventured, nothing gained” and in taking risks when they find their chances of success are limited. Respondents in the third segment (21%) show a higher affinity toward risky behaviors. They like “putting something at stake, [rather] than be on the safe side.” Further, they are not cautious when making and acting on plans, do not tend to imagine unfavorable outcomes of actions, and take higher risks when they have been successful. The last segment (33%) appears to be a response style because of high agreement across all items, although about half of the items are reverse coded.

We operationalize reproducible segmentation as follows: (1) gradual decline in stability in the boxplot as the number of clusters $k$ increases and (2) weak separation of segments in the rootograms.

Constructive clustering

If data contain no grouping structure, such a structure can be imposed. Consequently, one would expect the stability to be low. In empirical data sets, this can manifest as near-constant stability independent of the number of clusters. Figure 3 shows prototypical plots. These are based on an empirical data set about information sources used by 1,231 respondents while planning their last vacation, such as “tour operator or travel agents,” “social media,” “guidebooks,” “national tourism offices,” or “online travel community companies (e.g., TripAdvisor).”

Figure 3.

Prototypical boxplot and rootogram (constructive segmentation).

The five-segment solution contains a segment that mostly obtains information from “friends and relatives” (22% of respondents) and a segment (30%) that does not. Two segments (19% and 17%, respectively) show average usage of information sources with one exception each. One made little use of “online travel community companies” while the other made little use of “official local, regional or national tourism offices.” The fifth segment (22%) did not use any information sources, which can be, at least in part, explained by answer patterns.

To illustrate how starting values can affect segmentation solutions, the same algorithm was run twice on this data set using different starting points. This resulted in a new and different grouping of individuals. Two of five segments can still be found in this solution: the segment that claims to use no information sources (12%) and the segment that reports not using online travel company websites (17%). The remaining segments do not emerge from this solution. They are replaced by one segment that does not use local tourism offices (19%) and does not get information from friends and relatives (30%) and another segment that relies heavily on information from friends and relatives (22%).

While there might be managerial insights in these segmentation solutions, there is no clear statistical evidence that any one particular segmentation solution should be chosen over an alternative solution. In such cases, selecting the optimal segmentation solution requires heavy involvement of the user. Also, it is important to note that in case of constructive segmentation, the segmentation analysis cannot recommend the best solution.

We operationalize constructive segmentation as follows: (1) near-constant stability for all numbers of clusters in the boxplot and (2) weak segment separation in the rootogram.

Results

How High Is the Risk of Random Solutions in Data-Driven Segmentation Studies Conducted in the Past?

The 53 data-driven market segmentation studies identified in the primary publication outlets of academic segmentation research between 2010 and May 2016 were analyzed in view of two criteria: (1) methods and algorithms used and (2) measures taken to control for algorithm and sample randomness (Dolnicar and Leisch 2010). Ways to control for algorithm randomness are using multiple (random) starting points, initializing the algorithm with the results of hierarchical clustering, or side-stepping this problem entirely by only using deterministic algorithms. Sample randomness can be controlled using bootstrap stability analysis and, to some degree, using cross-validation or bagged clustering.

Results are provided in Table 1. Note that only data-driven segmentation studies were included. This excludes commonsense segmentation studies (such as segments created by splitting up tourists by one single variable like gender), literature reviews and simulation studies.

Table 1 lists, for each data-driven segmentation study, which journal it appeared in and which algorithm was used. When “Ward’s” occurs together with “k-means,” the authors first ran Ward’s clustering analysis and initialized k-means with its result. For “hierarchical clustering” and “k-means,” the authors did the same, but with an unspecified hierarchical clustering algorithm. When the algorithm contains “factor analysis” or “PCA” (principal components analysis), the authors first reduced the dimensionality of the data sets using either factor analysis or PCA and used the result as the basis for the actual segmentation. The remaining columns list whether the authors report using multiple different starting values, whether the algorithm is deterministic (usually for hierarchical clustering and TwoStep), whether the results from hierarchical clustering were used as starting values (where necessary), and whether any kind of bootstrap was performed (including bagged clustering and cross-validation). The columns pertaining to starting values may be “n/a” (“not applicable”) in the case of deterministic algorithms.

Table 2 summarizes the insights from this analysis by showing how often algorithm and sample randomness were controlled for. The left column shows how often algorithm randomness is accounted for. In 8 cases (15% of studies), algorithm randomness is not controlled for; 23 articles use entirely deterministic algorithms; 18 control algorithm randomness by initializing the algorithm with the result of a hierarchical algorithm; and 4 articles use multiple random starting values. The right column shows how often sample randomness is controlled for. Only 4 studies take this into account by using the bootstrap, bagged clustering, or cross-validation; 49 studies (92%) do not control for sample randomness.

Table 2. Number of Reviewed Articles That Control for Either Source of Randomness.

Algorithmic Randomness	Articles (%)	Sample Randomness	Articles (%)
Multiple starting points	4 (8%)	Bootstrap	2 (4%)
Starting points from hierarchical clustering	18 (34%)	Cross-validation	1 (2%)
Deterministic	23 (43%)	Bagged clustering	1 (2%)
Uncontrolled	8 (15%)	Uncontrolled	49 (92%)
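
The counts in Table 2 can be cross-checked with a few lines of arithmetic. The following Python sketch is purely illustrative (the variable and function names are our own): it confirms that each column sums to the 53 reviewed studies and recomputes the rounded percentages.

```python
# Counts as reported in Table 2; names below are illustrative only.
TOTAL_STUDIES = 53

algorithm_randomness = {
    "Multiple starting points": 4,
    "Starting points from hierarchical clustering": 18,
    "Deterministic": 23,
    "Uncontrolled": 8,
}
sample_randomness = {
    "Bootstrap": 2,
    "Cross-validation": 1,
    "Bagged clustering": 1,
    "Uncontrolled": 49,
}


def shares(counts, total):
    """Return each count as a percentage of total, rounded to whole percent."""
    return {label: round(100 * n / total) for label, n in counts.items()}


algorithm_shares = shares(algorithm_randomness, TOTAL_STUDIES)
sample_shares = shares(sample_randomness, TOTAL_STUDIES)

for label, pct in {**algorithm_shares, **sample_shares}.items():
    print(f"{label}: {pct}%")
```

Because every study falls into exactly one category per column, both columns must add up to 53.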

More than half (30) of the 53 studies use clustering algorithms whose results depend on starting points (i.e., k-means, neural-gas, bagged clustering). These algorithms are often used in combination with other statistical methods (such as hierarchical clustering or factor analysis).

Of those 30 segmentation studies, only three specify the number of starting points used in the calculations; they implement 5, 10, and 500 random restarts, respectively. For two other studies, the number of repeated calculations can be inferred from the software used: one study uses bagged clustering (as implemented in the e1071 R package), where the default number of starting values is 10; the other uses the neural-gas algorithm (from the flexclust R package), where the default number of repeated calculations is 3. In 18 studies, solutions from hierarchical algorithms are used to initialize the cluster algorithms. Most articles used SPSS, which uses only a single starting value for a given number of clusters $k$. It has to be assumed, therefore, that only one starting point was used in those studies.

The remaining 23 articles use deterministic algorithms (e.g., Ward’s or SPSS TwoStep clustering) where multiple initializations are not necessary. These studies would have benefited from bootstrapping, however.

Some studies also suffer from an unfavorable ratio of sample size to number of variables (e.g., 181 observations and 40 variables; 211 observations and 28 variables; see Dolnicar et al. (2014) and Dolnicar, Grün, and Leisch (2016) for an illustration of the problem of low sample sizes as well as a recommendation for suitable ratios). These studies also use only a single starting value with k-means. As a consequence, they are particularly prone to generating unstable results, which makes it even more critical to investigate the stability of their solutions.

Among the 53 segmentation studies, 2 use the bootstrap procedure proposed by Dolnicar and Leisch (2010). One article uses the chi-square automatic interaction detector (CHAID) algorithm with 10-fold cross-validation, which is similar in spirit to the bootstrap. Another article uses bagged clustering, which implicitly employs a bootstrap procedure as well in order to increase the stability of the result.

It can be concluded from this analysis of published segmentation studies that the risk of these studies presenting a random solution is not negligible. A common strategy to combat algorithm randomness in tourism segmentation studies is to use the result of Ward’s clustering as the starting point for k-means analysis (Milligan and Sokol 1980). Steinley and Brusco (2007) show that this procedure works well when the true number of clusters is known. Yet, they argue, results can be improved further by using a large number of random starting points. No single initialization strategy for k-means therefore works best in every real-world use case. It is advisable to compare different starting points for k-means segmentation or to follow the advice of Steinley and Brusco (2007) and use multiple random initializations.
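
To make the idea of multiple random initializations concrete, the following Python sketch runs a plain k-means (Lloyd's algorithm) from several random starting points and keeps the run with the lowest within-cluster sum of squares. It is a minimal illustration only, not the procedure used in any of the reviewed studies; the function names, the simulated data, and the choice of 20 restarts are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run (Lloyd's algorithm) from a single random start.

    Returns (centroids, labels, sse), where sse is the within-cluster
    sum of squared distances.
    """
    centroids = rng.sample(sorted(set(points)), k)  # k distinct starting points
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


def kmeans_restarts(points, k, n_starts, seed=0):
    """Run k-means n_starts times and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_starts)]
    best = min(runs, key=lambda run: run[2])
    return best, [run[2] for run in runs]


# Hypothetical unstructured data: 200 respondents, 2 variables, no true segments.
data_rng = random.Random(42)
points = [(data_rng.random(), data_rng.random()) for _ in range(200)]

best, all_sse = kmeans_restarts(points, k=4, n_starts=20)
print("SSE of the first (single) run:", round(all_sse[0], 3))
print("SSE of the best of 20 runs:   ", round(best[2], 3))
print("Distinct SSE values observed: ", len({round(s, 6) for s in all_sse}))
```

By construction, the best of several runs can never have a higher sum of squares than the first run alone. If many distinct values appear across restarts, a single run may well have returned a different solution.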

The fact that some algorithms depend on starting points should not be taken as evidence that deterministic hierarchical clustering methods on their own should be preferred in general; these methods are typically computationally very intensive for larger sample sizes. They also tend to find suboptimal solutions with respect to the within-cluster variance because they cannot correct observations that were misclassified early in the estimation process. Hence, it is quite popular to refine the hierarchical clustering result by running k-means afterwards (Milligan and Sokol 1980; Steinley and Brusco 2007).

The other source of error is sample randomness. Suppose the same study were conducted on two different sets of individuals from the same population. The segmentation results might be very similar in both cases (pointing to a stable solution) or they might be very different (pointing to an unstable solution). Most segmentation algorithms, including all of the algorithms in the reviewed studies, do not account for this variation. Bootstrap or cross-validation procedures are required to obtain estimates of the degree to which segmentation solutions would vary across repeated studies. As Table 2 shows, only 4 of the 53 reviewed studies accounted for sample randomness. The majority of studies (49) did not control for this source of error at all.
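
The following Python sketch illustrates one way such a bootstrap check can be set up, in the spirit of the bootstrap approach of Dolnicar and Leisch (2010) but not a reproduction of it. Pairs of bootstrap samples are clustered separately, both solutions are used to assign the original observations to segments, and the two resulting partitions are compared with the adjusted Rand index (Hubert and Arabie 1985). All function names, the simulated data sets, and the parameter values are assumptions made for this example.

```python
import random
from collections import Counter
from math import comb


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans_centroids(points, k, rng, max_iter=100):
    """Plain k-means from one random start; returns the final centroids."""
    centroids = rng.sample(sorted(set(points)), k)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label vectors of equal length."""
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def bootstrap_stability(points, k, n_pairs=20, seed=0):
    """ARI between partitions of the original data induced by pairs of
    k-means solutions, each fitted to its own bootstrap sample."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_pairs):
        partitions = []
        for _ in range(2):
            sample = [rng.choice(points) for _ in points]  # with replacement
            centroids = kmeans_centroids(sample, k, rng)
            partitions.append([nearest(p, centroids) for p in points])
        scores.append(adjusted_rand_index(*partitions))
    return scores


# Two hypothetical data sets: three tight groups versus no structure at all.
rng = random.Random(1)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
structured = [
    (cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
    for cx, cy in centers
    for _ in range(50)
]
unstructured = [(rng.uniform(0, 5), rng.uniform(0, 5)) for _ in range(150)]

results = {
    name: bootstrap_stability(data, k=3, n_pairs=10)
    for name, data in [("structured", structured), ("unstructured", unstructured)]
}
for name, scores in results.items():
    print(name, "mean ARI:", round(sum(scores) / len(scores), 2))
```

Values close to 1 indicate that repeated samples lead to nearly the same segmentation; values that are low or vary widely indicate that the solution depends strongly on the particular sample drawn.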

Overall, it can be concluded from the analysis of published studies that approximately 92% are at risk of having presented a random solution. These findings – while studying sample and algorithm randomness separately – reflect the findings from prior review studies which conclude that general stability of segmentation solutions has not been investigated at all by 67% (Dolnicar 2003) and 85% (Tuma, Decker, and Scholz 2011) of studies, respectively.

How High Is the Risk of Random Solutions in Future Data-Driven Segmentation Studies?

Figure 4 shows how many of the 32 inspected data sets allow natural segmentation, reproducible segmentation, and how many require constructive segmentation. As can be seen, the vast majority of data sets (72%) fall into the category of reproducible segmentation; 22% can be classified as constructive segmentation, and only two data sets (6%) contain true, naturally occurring market segments.

Figure 4. Distribution of empirical tourism data sets across segmentation concepts.

One issue that affects these results is sample size. Dolnicar et al. (2014) recommend that the sample size should exceed 70 times the number of variables. This requirement is met by 22 of the 32 data sets (69%). To test whether meeting the sample size requirement is associated with the type of clustering (natural, reproducible, or constructive), compliance was coded as a binary variable for each data set. No such association was found (Fisher’s exact test, p = 0.3318).
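
As a worked illustration of this rule of thumb, the short Python sketch below applies it to sample sizes and variable counts quoted elsewhere in this article. The function name and the strict "exceeds" comparison are our own reading of the recommendation as stated above.

```python
def meets_sample_size_rule(n_observations, n_variables, factor=70):
    """True if the sample size exceeds factor times the number of variables."""
    return n_observations > factor * n_variables


# (observations, variables) pairs quoted in this article.
examples = [(181, 40), (211, 28), (4794, 10)]

checks = {}
for n_obs, n_vars in examples:
    required = 70 * n_vars
    checks[(n_obs, n_vars)] = meets_sample_size_rule(n_obs, n_vars)
    print(f"{n_obs} observations, {n_vars} variables: "
          f"more than {required} needed, rule met = {checks[(n_obs, n_vars)]}")
```

The two published studies with 181 and 211 observations fall far short of the 2,800 and 1,960 respondents the rule would call for, whereas the 10-item data set with 4,794 participants comfortably meets it.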

Further, the scale type might have an influence on the stability as it is possible that the stability decreases with the increasing number of answer categories. In this instance, the scale type does not affect the type of clustering (Fisher’s exact test p = 0.5927). With only 32 data sets and a large range of scale types, a more thorough investigation of this issue is better left to a different study that contains a larger number of data sets and a more even distribution of scale types. Additionally, response styles that seem to occur in larger variety with increasing number of answer categories would need to be controlled for, by, for example, using a model that simultaneously groups individuals and corrects response styles (Grün and Dolnicar 2015). As a consequence, all data sets are retained.

One of the two data sets with naturally occurring segments is that on tourists’ vacation activities (discussed in detail above). Distinct market segments were those interested in sight-seeing, a sports and activities segment, cultural visitors, as well as a group visiting beaches and swimming pool/saunas.

The second data set with naturally occurring segments was based on a study of 4,794 participants who responded to the question “To me vacations away from home are…” by choosing a position along a bipolar 7-point ordinal scale. The 10 items were polar opposites, including “Important / Unimportant,” “Boring / Interesting,” “Exciting / Unexciting,” “Fascinating / Mundane,” and “Not needed / Needed.” For this data set, the five-segment solution emerges as the optimal choice. The profiles of those five segments reveal strong response styles across all segments, with respondents ticking answer boxes in a straight vertical line: one segment uses only negative categories, another uses only positive categories, and another uses only the middle options. The remaining segments pick only categories in between, leaning toward either the positive or the negative side of the scale. As there were no well-profiled segments, the managerial insight from this data set seemed rather limited.

The concerning aspect of this finding is that the proportion of empirical data sets that contain natural clusters is very small. This has major implications for market segmentation analysis in tourism. It means that the assumption underlying most of the market segmentation studies published in tourism in the past, namely the existence of homogeneous market segments in the marketplace, does not hold in most instances. As a consequence, the common approach to segmentation analysis, in which only a single calculation is run for each number of clusters, is very risky because it generates only one of many possible segmentation solutions, and not necessarily the best one.

The encouraging finding is that the proportion of empirical data sets falling into the constructive segmentation category is relatively low: only one in five data sets falls in this category. It is the constructive segmentation case that does not allow the data analyst to make any recommendations about which market segmentation solution is the best, thus placing much of the responsibility on the user who ultimately needs to decide which of the many possible solutions is the most promising in terms of the development of marketing activities.

The very high proportion of empirical data sets that display characteristics of the reproducible segmentation concept also has immediate implications for data analysts and managers using market segmentation analyses: to obtain the best possible results under the reproducible market segmentation concept, data structure analysis in the lead-up to the actual final grouping of cases is essential. Such data structure analysis provides critical input into decisions that need to be made, most critically the selection of the number of clusters. Cluster numbers that generate stable segmentation solutions should be selected.

Conclusions

Data-driven market segmentation analysis is critical to knowledge development in academia and market insight in the tourism industry. Data-driven market segmentation is by its very nature exploratory. It is critical, therefore, that data analysts are aware of the nature of the data they are analyzing. This enables them to determine whether the segmentation solution they arrive at reveals natural segments, creates entirely artificial (random) segments, or whether the data contain some structure that allows one or more segmentation solutions to be identified repeatedly.

The present study set out to assess the risk of data-driven market segmentation studies in tourism leading to random market segmentation solutions and to recommend systematic stability analysis as a way to protect oneself from selecting such random solutions.

Based on an analysis of data-driven market segmentation studies published recently in the academic tourism literature, it has to be concluded that algorithm randomness is frequently controlled for, but sample randomness is not. As a consequence, the risk of presenting a random market segmentation solution is high: 92% of the reviewed studies do not control for sample randomness.

The analysis of the data structure of unpublished tourism survey data sets suggests that real market segments are rarely present in empirical data (6% of data sets). Only when real segments exist in the data can sample and algorithm randomness be ignored. The low proportion of such data sets means that systematic stability analysis is essential in the lead-up to data-driven market segmentation studies. Results from the stability analysis inform the data analyst and the user about whether natural segments exist, whether market segments can at least be repeatedly identified, or whether each repeated calculation leads to entirely different results.

The fact that naturally occurring market segments rarely exist is bad news because it puts most data-driven market segmentation studies at risk of generating random solutions. The good news which emerges from the analysis of unpublished tourism survey data sets, however, is that only about one-fifth of data sets lack any structure at all. This means that about three-quarters of data sets contain at least some structure—even if it is not cluster structure—which can be harvested to select segmentation solutions that are stable.

These findings lead to the following practical implications:

The nature of empirical data resulting from survey studies in tourism implies that the occurrence of natural market segments is unlikely. Tourists come in all shapes and forms, rather than in neat and tidy groups. As a consequence, it cannot be assumed that calculating one run of one segmentation algorithm will identify naturally occurring segments.

Running one single run of one segmentation algorithm is likely to produce one of many possible groupings—a “random” segmentation solution. Such random solutions are not a strong basis upon which to base marketing planning or knowledge development.

Data analysts can protect themselves by conducting systematic stability analysis to determine which segmentation concept their data fall into: natural, reproducible, or constructive. Data structure can be analyzed using the approach and the visualizations illustrated in this article, relying on bootstrapping (to counteract sample randomness), repeated calculations with different starting points (to avoid algorithm randomness), and repeated calculations with different numbers of clusters.

Segmentation results based on different segmentation concepts have to be interpreted differently. Natural segments can be interpreted directly; for reproducible segments it is important to inform the user that these are not natural segments but that they emerge repeatedly, thus offering some confidence. In the case of constructive clustering, the data analyst must disclose openly that there are many alternative solutions, all of which are essentially artificially created and that it is entirely up to the user which of those solutions offers the most strategic benefit to them.

Maybe most importantly: one single calculation is never enough in data-driven market segmentation.

The above recommendations have the potential of improving the quality of 1/10th of articles published in tourism, that is, those that use cluster analysis to derive insights (Mazanec et al. 2010).

The present study has two main limitations. (1) Although every attempt was made to quantify the criteria as much as possible, the classification into each of the three segmentation concepts is ultimately still made by a human. It is possible that other human judges would classify one or the other data set differently. While this is a limitation, the key message of this study would not change. Even if 30% of the data sets pointed to natural clustering, 30% to reproducible clustering, and 30% to constructive clustering, a data analyst would need to run multiple calculations to assess which concept is most suitable for their data. It is clearly not the case that the majority of data sets contain natural clusters. In future, it would be interesting to attempt to automate this process. Automation, however, would require a very large number of data sets that would allow validation of the automated rules. (2) The data sets used to answer research question 2 were all collected by the second author. As such, they are inherently biased by the second author’s questionnaire design principles. Future research could extend the investigation of research question 2 to other data sets free of this systematic bias.

Acknowledgements

We thank the Australian Research Council for contributing to the funding of this study through grant DP110101347. The computational results presented have been achieved using the Vienna Scientific Cluster. We thank Friedrich Leisch and Homa Hajibaba for feedback on previous versions of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was partly funded by the Australian Research Council (ARC) through grant DP110101347.

Author Biographies

Dominik Ernst is a doctoral candidate from the Institute of Applied Statistics and Computing at the University of Natural Resources and Life Sciences, Vienna, Austria. His research interests include statistical computing, cluster analysis and its applications in marketing.

Sara Dolnicar is a professor of Tourism at The University of Queensland. Her research interests are the improvement of market segmentation methodology and measures used in social science research. She applies her work primarily to tourism, but also social marketing challenges, such as environmental volunteering, foster care and public acceptance of recycled water.

References

Agapito Dora, Valle Patrícia, Mendes Júlio. 2014. “The Sensory Dimension of Tourist Experiences: Capturing Meaningful Sensory-Informed Themes in Southwest Portugal.” Tourism Management 42:224–37.

Alexander Amanda, Kim Sung-Bum, Kim Dae-Young. 2015. “Segmenting Volunteers by Motivation in the 2012 London Olympic Games.” Tourism Management 47:1–10.

Banfield Jeffrey D., Raftery Adrian E. 1993. “Model-Based Gaussian and Non-Gaussian Clustering.” Biometrics 49 (3): 803–21.

Beane T. P., Ennis D. M. 1987. “Market Segmentation: A Review.” European Journal of Marketing 21 (5): 20–42.

Chen Kaung-Hwa, Liu Hsiou-Hsiang, Chang Feng-Hsiang. 2013. “Essential Customer Service Factors and the Segmentation of Older Visitors within Wellness Tourism Based on Hot Springs Hotels.” International Journal of Hospitality Management 35:122–32.

Chen Li-Chan, Lin Shang-Ping, Kuo Chun-Min. 2013. “Rural Tourism: Marketing Strategies for the Bed and Breakfast Industry in Taiwan.” International Journal of Hospitality Management 32:278–86.

Choe Yeongbae, Lee Sang-Min, Kim Dae-Kwan. 2014. “Understanding the Exhibition Attendees’ Evaluation of Their Experiences: A Comparison between High versus Low Mindful Visitors.” Journal of Travel & Tourism Marketing 31 (7): 899–914.

Choi Andy S. 2011. “Implicit Prices for Longer Temporary Exhibitions in a Heritage Site and a Test of Preference Heterogeneity: A Segmentation-Based Approach.” Tourism Management 32 (3): 511–19.

De Cantis Stefano, Ferrante Mauro, Kahani Alon, Shoval Noam. 2016. “Cruise Passengers’ Behavior at the Destination: Investigation Using GPS Technology.” Tourism Management 52:133–50.

Denizci Guillet Basak, Guo Yuanyuan, Law Rob. 2015. “Segmenting Hotel Customers Based on Rate Fences through Conjoint and Cluster Analysis.” Journal of Travel & Tourism Marketing 32 (7): 835–51.

Denizci Guillet Basak, Kucukusta Deniz. 2016. “Spa Market Segmentation According to Customer Preference.” International Journal of Contemporary Hospitality Management 28 (2): 418–34.

Dey Banasree, Sarma Mrinmoy K. 2010. “Information Source Usage among Motive-Based Segments of Travelers to Newly Emerging Tourist Destinations.” Tourism Management 31 (3): 341–44.

Díaz Estrella, Koutra Christina. 2013. “Evaluation of the Persuasive Features of Hotel Chains Websites: A Latent Class Segmentation Analysis.” International Journal of Hospitality Management 34:338–47.

Dolnicar Sara. 2002. “A Review of Data-Driven Market Segmentation in Tourism.” Journal of Travel & Tourism Marketing 12 (1): 1–22.

Dolnicar Sara. 2003. “Using Cluster Analysis for Market Segmentation: Typical Misconceptions, Established Methodological Weaknesses and Some Recommendations for Improvement.” Australasian Journal of Market Research 11 (2): 5–12.

Dolnicar Sara. 2004. “Beyond ‘Commonsense Segmentation’: A Systematics of Segmentation Approaches in Tourism.” Journal of Travel Research 42 (3): 244–50.

Dolnicar Sara. 2008. “Market Segmentation in Tourism.” In Tourism Management: Analysis, Behaviour and Strategy, edited by Woodside, Martin, 129–50. Cambridge, UK: CABI.

Dolnicar Sara, Grün Bettina, Leisch Friedrich. 2016. “Increasing Sample Size Compensates for Data Problems in Segmentation Studies.” Journal of Business Research 69 (2): 992–99.

Dolnicar Sara, Grün Bettina, Leisch Friedrich, Schmidt Kathrin. 2014. “Required Sample Sizes for Data-Driven Market Segmentation Analyses in Tourism.” Journal of Travel Research 53 (3): 296–306.

Dolnicar Sara, Kaiser Sebastian, Lazarevski Katie, Leisch Friedrich. 2012. “Biclustering Overcoming Data Dimensionality Problems in Market Segmentation.” Journal of Travel Research 51 (1): 41–49.

Dolnicar Sara, Leisch Friedrich. 2010. “Evaluation of Structure and Reproducibility of Cluster Solutions Using the Bootstrap.” Marketing Letters 21 (1): 83–101.

Fraley Chris, Raftery Adrian E. 1998. “How Many Clusters? Which Clustering Method? Answers via Model-Based Cluster Analysis.” Computer Journal 41 (8): 578–88.

Gibson Wilfred A. 1959. “Three Multivariate Models: Factor Analysis, Latent Structure Analysis, and Latent Profile Analysis.” Psychometrika 24 (3): 229–52.

Grün Bettina, Dolnicar Sara. 2015. “Response-Style Corrected Market Segmentation for Ordinal Data.” Marketing Letters. Published online June 18. doi:10.1007/s11002-015-9375-9.

Hadjikakou Michalis, Chenoweth Jonathan, Miller Graham, Druckman Angela, Gang. 2014. “Rethinking the Economic Contribution of Tourism: Case Study from a Mediterranean Island.” Journal of Travel Research 53 (5): 610–24.

Hartigan John A. 1975. Clustering Algorithms. New York: Wiley.

Hennig Christian. 2007. “Cluster-wise Assessment of Cluster Stability.” Computational Statistics & Data Analysis 52 (1): 258–71.

Hubert Lawrence, Arabie Phipps. 1985. “Comparing Partitions.” Journal of Classification 2 (1): 193–218.

Iversen Nina M., Hem Leif E., Mehmetoglu Mehmet. 2016. “Lifestyle Segmentation of Tourists Seeking Nature-Based Experiences: The Role of Cultural Values and Travel Motives.” Journal of Travel & Tourism Marketing 33 (suppl. 1): 38–66.

Khoo-Lattimore Catheryn, Prayag Girish. 2015. “The Girlfriend Getaway Market: Segmenting Accommodation and Service Preferences.” International Journal of Hospitality Management 45:99–108.

Kim Aise KyoungJin, Weiler Betty. 2013. “Visitors’ Attitudes towards Responsible Fossil Collecting Behaviour: An Environmental Attitude-Based Segmentation Approach.” Tourism Management 36:602–12.

Koh Suna, Yoo Joanne Jung-Eun, Boger Carl A. Jr. 2010. “Importance-Performance Analysis with Benefit Segmentation of Spa Goers.” International Journal of Contemporary Hospitality Management 22 (5): 718–35.

Konu Henna, Laukkanen Tommi, Komppula Raija. 2011. “Using Ski Destination Choice Criteria to Segment Finnish Ski Resort Customers.” Tourism Management 32 (5): 1096–1105.

Kruger Martinette, Myburgh Esmarie, Saayman Melville. 2016. “A Motivation-Based Typology of Road Cyclists in the Cape Town Cycle Tour, South Africa.” Journal of Travel & Tourism Marketing 33 (3): 380–403.

Kruger Martinette, Viljoen Armand, Saayman Melville. 2016. “Who Visits the Kruger National Park, and Why? Identifying Target Markets.” Journal of Travel & Tourism Marketing 1–29.

Landauer Mia, Haider Wolfgang, Pröbstl-Haider Ulrike. 2014. “The Influence of Culture on Climate Change Adaptation Strategies: Preferences of Cross-Country Skiers in Austria and Finland.” Journal of Travel Research 53 (1): 96–110.

Lazarsfeld Paul Felix, Henry Neil W., Anderson Theodore Wilbur. 1968. Latent Structure Analysis. Boston: Houghton Mifflin.

Diem-Trinh Thi, Pearce Douglas G. 2011. “Segmenting Visitors to Battlefield Sites: International Visitors to the Former Demilitarized Zone in Vietnam.” Journal of Travel & Tourism Marketing 28 (4): 451–63.

Lee Jenny Jiyeon, Kyle Gerard T. 2014. “Segmenting Festival Visitors Using Psychological Commitment.” Journal of Travel Research 53 (5): 656–69.

Legohérel Patrick, Hsu Cathy H.C., Daucé Bruno. 2015. “Variety-Seeking: Using the CHAID Segmentation Approach in Analyzing the International Traveler Market.” Tourism Management 46:359–66.

Leisch Friedrich. 2004. “Flexmix: A General Framework for Finite Mixture Models and Latent Class Regression in R.” Journal of Statistical Software 8:1–18.

Leisch Friedrich. 2006. “A Toolbox for K-Centroids Cluster Analysis.” Computational Statistics and Data Analysis 51 (2): 526–44.

Mimi, Zhang Hanqin, Mao Iris, Deng Claire. 2011. “Segmenting Chinese Outbound Tourists by Perceived Constraints.” Journal of Travel & Tourism Marketing 28 (6): 629–43.

Lilien G. L., Rangaswamy. 2004. Marketing Engineering: Computer-Assisted Marketing Analysis and Planning. 2nd ed. Upper Saddle River, NJ: Prentice Hall, DecisionPro.

Lima Joana, Eusébio Celeste, Kastenholz Elisabeth. 2012. “Expenditure-Based Segmentation of a Mountain Destination Tourist Market.” Journal of Travel & Tourism Marketing 29 (7): 695–713.

Ada S., Law Rob, Cheung Catherine. 2011. “Segmenting Leisure Travelers by Risk Reduction Strategies.” Journal of Travel & Tourism Marketing 28 (8): 828–39.

Lyu Seong Ok, Lee Hoon. 2013. “Market Segmentation of Golf Event Spectators Using Leisure Benefits.” Journal of Travel & Tourism Marketing 30 (3): 186–200.

Maggioni Isabella, Marcoz Elena Maria, Mauri Chiara. 2014. “Segmenting Networking Orientation in the Hospitality Industry: An Empirical Research on Service Bundling.” International Journal of Hospitality Management 42:192–201.

Masiero Lorenzo, Nicolau Juan L. 2012. “Tourism Market Segmentation Based on Price Sensitivity Finding Similar Price Preferences on Tourism Activities.” Journal of Travel Research 51 (4): 426–35.

Mazanec Josef A., Ring Amata, Stangl Brigitte, Teichmann Karin. 2010. “Usage Patterns of Advanced Analytical Methods in Tourism Research 1988–2008: A Six Journal Survey.” Information Technology & Tourism 12 (1): 17–46.

Milligan Glenn W., Sokol Lisa M. 1980. “A Two-Stage Clustering Algorithm with Robust Recovery Characteristics.” Educational and Psychological Measurement 40 (3): 755–59.

Myers J. H., Tauber. 1977. Market Structure Analysis. Chicago: American Marketing Association.

Nicolau Juan L. 2012. “Asymmetric Tourist Response to Price: Loss Aversion Segmentation.” Journal of Travel Research 51 (5): 568–676.

Joanne Y. J., Schuett Michael A. 2010. “Exploring Expenditure-Based Segmentation for Rural Tourism: Overnight Stay Visitors versus Excursionists to Fee-Fishing Sites.” Journal of Travel & Tourism Marketing 27 (1): 31–50.

Polo-Peña Ana Isabel, Frías-Jamilena Dolores María, Rodríguez-Molina Miguel Ángel. 2012. “Validation of a Market Orientation Adoption Scale in Rural Tourism Enterprises. Relationship between the Characteristics of the Enterprise and Extent of Market Orientation Adoption.” International Journal of Hospitality Management 31 (1): 139–51.

Pesonen Juho Antti. 2014. “Testing Segment Stability: Insights from a Rural Tourism Study.” Journal of Travel & Tourism Marketing 31 (6): 697–711.

Pesonen Juho Antti. 2015. “Targeting Rural Tourists in the Internet: Comparing Travel Motivation and Activity-Based Segments.” Journal of Travel & Tourism Marketing 32 (3): 211–26.

Prayag Girish, Disegna Marta, Cohen Scott Allen, Yan Hongliang Gordon. 2015. “Segmenting Markets by Bagged Clustering Young Chinese Travelers to Western Europe.” Journal of Travel Research 54 (2): 234–50.

Prayag Girish, Hosany Sameer. 2014. “When Middle East Meets West: Understanding the Motives and Perceptions of Young Tourists from United Arab Emirates.” Tourism Management 40:35–45.

Priporas Constantinos-Vasilios, Vassiliadis Chris A., Bellou Victoria, Andronikidis Andreas. 2015. “Exploring the Constraint Profile of Winter Sports Resort Tourist Segments.” Journal of Travel Research 54 (5): 659–71.

Putler Dan. 2014. RcmdrPlugin.BCA: Rcmdr Plug-In for Business and Customer Analytics. R package version 0.9-8. https://CRAN.R-project.org/package=RcmdrPlugin.BCA.

Putler Daniel S., Krider Robert E. 2012. Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R. Boca Raton, FL: CRC Press.

R Core Team. 2015. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Rasmi Sarah, SiewImm, Lee Julie A., Soutar Geoff N. 2014. “Tourists’ Strategies: An Acculturation Approach.” Tourism Management 40:311–20.

Rid Wolfgang, Ezeuduji Ikechukwu O., Pröbstl-Haider Ulrike. 2014. “Segmentation by Motivation for Rural Tourism Activities in The Gambia.” Tourism Management 40:102–16.

Ring Amata, Tkaczynski Aaron, Dolnicar Sara. 2016. “Word-of-Mouth Segments Online, Offline, Visual or Verbal?” Journal of Travel Research 55 (4): 481–92.

Ritchie Brent W., Chien P. Monica, Sharifpour Mona. 2016. “Segmentation by Travel Related Risks: An Integrated Approach.” Journal of Travel & Tourism Marketing. Published online April 14.

Ritchie Brent W., Tkaczynski Aaron, Faulks Pam. 2010. “Understanding the Motivation and Travel Behavior of Cycle Tourists Using Involvement Profiles.” Journal of Travel & Tourism Marketing 27 (4): 409–25.

Schofield Peter, Fallon Paul. 2012. “Assessing the Viability of University Alumni as a Repeat Visitor Market.” Tourism Management 33 (6): 1373–84.

Seabra Cláudia, Dolnicar Sara, Abrantes José Luís, Kastenholz Elisabeth. 2013. “Heterogeneity in Risk and Safety Perceptions of International Tourists.” Tourism Management 36:502–10.

Smith Wendell R. 1956. “Product Differentiation and Market Segmentation as Alternative Marketing Strategies.” Journal of Marketing 21 (1): 3–8.

Steinley Douglas, Brusco Michael J. 2007. “Initializing k-Means Batch Clustering: A Critical Evaluation of Several Techniques.” Journal of Classification 24 (1): 99–121.

Thapa Brijesh, Cahyanto Ignatius, Holland Stephen M., Absher James D. 2013. “Wildfires and Tourist Behaviors in Florida.” Tourism Management 36:284–92.

Tiago Maria Teresa Pinheiro Melo Borges, Couto João Pedro de Almeida, Tiago Flávio Gomes Borges, Faria Sandra Micaela Costa Dias. 2016. “Baby Boomers Turning Grey: European Profiles.” Tourism Management 54:13–22.

Tibshirani Robert, Walther Guenther. 2005. “Cluster Validation by Prediction Strength.” Journal of Computational and Graphical Statistics 14 (3): 511–28.

Tkaczynski Aaron, Rundle-Thiele Sharyn. 2013. “Understanding What Really Motivates Attendance: A Music Festival Segmentation Study.” Journal of Travel & Tourism Marketing 30 (6): 610–23.

Tkaczynski Aaron, Rundle-Thiele Sharyn, Beaumont Narelle. 2010. “Destination Segmentation: A Recommended Two-Step Approach.” Journal of Travel Research 49 (2): 139–52.

Tkaczynski Aaron, Rundle-Thiele Sharyn R., Prebensen Nina K. 2015. “Segmenting Potential Nature-Based Tourists Based on Temporal Factors: The Case of Norway.” Journal of Travel Research 54 (2): 251–65.

Tuma M. N., Decker, Scholz. 2011. “A Survey of the Challenges and Pitfalls of Cluster Analysis Application in Market Segmentation.” International Journal of Market Research 53 (3): 391–414.

do Valle Patrícia Oom, Pintassilgo Pedro, Matias António, André Filipe. 2012. “Tourist Attitudes towards an Accommodation Tax Earmarked for Environmental Protection: A Survey in the Algarve.” Tourism Management 33 (6): 1408–16.

Weaver David B. 2012. “Psychographic Insights from a South Carolina Protected Area.” Tourism Management 33 (2): 371–79.

Wedel Michel, Kamakura Wagner. 1998. Market Segmentation: Conceptual and Methodological Foundations. Boston: Kluwer Academic.

Xia Jianhong Cecilia, Evans Fiona H., Spilsbury Katrina, Ciesielski Vic, Arrowsmith Colin, Wright Graeme. 2010. “Market Segments Based on the Dominant Movement Patterns of Tourists.” Tourism Management 31 (4): 464–69.

Zins Andreas H. 2008. “Marketing Segmentation in Tourism: A Critical Review of 20 Years’ Research Efforts.” In Change Management in Tourism: From ‘Old’ to ‘New’ Tourism, edited by Kronenberg Christopher, Müller Sabine, Peters Mike, Pikkemaat Birgit, Weiermair Klaus, 289–301. Berlin: Erich Schmidt.