Abstract
Destination image is among the most studied constructs in tourism research. Many researchers are still convinced that the rating scale method is the most accurate for assessing destination image. This study presents alternative methods of data collection, namely, free-sorting and reduced paired comparisons, and investigates their applicability in a Web-based environment. The study then subjects these data collection methods to empirical analysis and compares the judgment task’s effects on perceived difficulty, fatigue, and boredom, on data quality, and on perceptual maps derived with MDS. The findings demonstrate that these methods are more accurate whenever a large number of objects have to be judged, which is particularly the case for positioning and competitiveness studies.
Introduction
In research, city tourism has gained importance since the 1980s. Previously, cities were primarily seen as tourists’ places of origin, although Christaller’s (1966) central place theory already pointed out that cities have great potential as tourist destinations. Meanwhile, experts agree that city tourism has become an important economic factor for cities. Law (1993) even holds that large cities are among the most important destinations in the world.
Today, city tourism is booming in Europe. The ITB World Travel Trends Report 2014/2015 (IPK International 2014) declared city trips to be the boom market segment. Worldwide, city trips soared by 58% over five years, to reach a 20% market share, while sun and beach holidays grew by 18% and touring holidays by 32% in the same period. According to the World Travel Monitor®, “this dynamic growth has been supported by the increase in low-cost flights and expansion of budget accommodation” (IPK International 2014, 8).
However, with the increase in urban tourism, competition among city destinations has become fiercer, creating major challenges for destination marketing organizations. One of the most significant marketing challenges is the need for an effective destination positioning strategy in order to make a place attractive in the minds of the target audience (Echtner and Ritchie 1991).
There is a general consensus that image forms an essential part of this positioning process, since it is a key criterion by which destinations can be differentiated from their competitors (Hunt 1975; Calantone et al. 1989; Echtner and Ritchie 1991; Baloglu and Brinberg 1997). This particularly applies when consumers have limited personal experience of the destinations they are considering (Crompton 1979).
Thus, an accurate assessment of destination image is critical if city marketers are to understand their competitive positioning and to use this knowledge to attract visitors. However, owing to its complexity and multidimensionality, a reliable and valid operationalization of this construct is challenging. Dobni and Zinkhan (1990) note that because different researchers use different definitions, the image construct is operationalized in multiple ways.
Most researchers consider destination image to be a multidimensional construct consisting of at least two distinctive components: the cognitive and the affective (e.g., Crompton 1979; Walmsley and Young 1998; Baloglu and Brinberg 1997; MacKay and Fesenmaier 1997). The cognitive dimension is concerned with beliefs and knowledge about a destination, while the affective dimension is related to an individual’s feelings or emotions about it (Baloglu and McCleary 1999). The summation of these two components produces an overall or compound image, which leads to a positive or a negative evaluation of a destination (Beerli and Martín 2004).
Some researchers (Gartner 1994; Dann 1996; Pike and Ryan 2004) even refer to a third—the conative or behavioral—component, which can be defined as the likelihood of visiting a destination within a certain period (Pike and Ryan 2004). The conative component therefore depends on “the images developed during the cognitive stage and evaluated during the affective stage” (Gartner 1994, 196).
While several authors have called for the measuring of both the cognitive and the affective components in assessing destination image, few studies have done so (e.g., Baloglu and McCleary 1999; MacKay and Fesenmaier 2000; Hosany, Ekinci, and Uysal 2007; Kim and Perdue 2011). Pike’s (2002) review of 142 destination image papers published between 1973 and 2000 reveals that only six show an explicit interest in affective images, while the vast majority focuses on the cognitive dimension, measuring destination image on the basis of physical attributes (mostly by means of Likert-type scales or the semantic differential) (Echtner and Ritchie 1991; Gallarza, Gil Saura, and Calderón García 2002; Pike 2002).
In addition, most of these studies are case studies that compare destination images of only a few cities. In his meta-analysis, Pike (2002, 542) finds that “over half of the papers (75) measured the perceptions of only one destination, without a frame of reference to any competing destinations.” Few studies deal with a larger number (>20) of cities or destinations (e.g., Gearing, Swart, and Var [1974] explored 65 destinations, Grabler [1997] 77 cities, and Oppermann [1996] 30 cities). To uncover competitive relationships, it is crucial to obtain data on many objects of comparison. There are few systematic studies in this field.
Literature Review
Destination Image Measurement
In their meta-analysis, Stepchenkova and Mills (2010, 591) state, “Previous research recognized multidimensionality and complexity of image but applied different methods for measuring different components of DI, which was not conducive to assessing the holistic nature of the construct.” Pan and Li (2011) also criticize the disagreement of the scientific community regarding the dimensionality of the image construct. They argue that different concepts of destination image lead to different measurement techniques. As a consequence, cross-validation and comparison of results is virtually impossible.
Given the multidimensional structure of image as a construct consisting of at least two distinctive components (i.e., cognitive and affective) (e.g., Crompton 1979; Dobni and Zinkhan 1990; Walmsley and Young 1998; Baloglu and Brinberg 1997; MacKay and Fesenmaier 1997), the method of choice for capturing a destination image should also be multidimensional.
Multidimensional methods can be categorized into decompositional and compositional approaches. Both approaches seek to uncover the basic dimensions by which consumers perceive and evaluate products.
In destination image measurement, there is a strong preference for structured research designs or compositional approaches. Owing to the use of standardized scales, they are simple to administer, and findings can be analyzed by applying an array of statistical procedures. Hauser and Koppelman (1979) conclude that compositional methods have been found to be superior to decomposition methods regarding the underlying theory, interpretability, ease of use, and predictive validity. In his meta-analysis, Pike (2002) finds that 114 out of 142 papers use structured techniques to operationalize the destination image construct. Dolnicar and Grün (2013) review 86 image papers published between 2002 and 2012 in top tourism journals and find that 75% took a structured approach. However, since structured methods are based on the so-called piecemeal-based or attribute-based approach to assessing destination image, which forces respondents to think about destination image in terms of predefined characteristics, they seem inappropriate for adequately capturing holistic impressions and unique features of a destination (Echtner and Ritchie 1991). Another weakness is the standardization of attributes, which implies that the same attribute sets suit all consumers, and thus does not take into account that respondents vary in their knowledge of destinations and use different words to describe them (Steenkamp, Van Trijp, and Ten Berge 1994). Furthermore, the a priori specified attributes might be totally unimportant to a respondent, or important attributes may be missing (Jenkins 1999). The omission of a relevant attribute is most likely when little previous research is available (Steenkamp, Van Trijp, and Ten Berge 1994), which is the case in city tourism.
In decompositional approaches, judgments are made on the basis of global evaluations such as similarities, distances, or preferences, which are then broken down into dimensions. Assessments refer to all objects and their interrelationships. Decompositional methods are particularly advantageous when respondents are unable to determine which attributes serve as a basis for a decision, or when the rules by which the individual features are combined cannot be identified (Kuehn 1976).
Tasci, Gartner, and Cavusgil (2007) find an average of around 22 attributes (items) used in quantitative destination image studies. In the case of assessing many objects or destinations, this leads to an enormous number of judgments. In addition, few image studies deal with city destinations, which complicates the selection of adequate city attributes. These arguments, along with calls from within the scientific community to capture a destination image’s holistic nature (Echtner and Ritchie 1991; Jenkins 1999; Nieschlag, Dichtl, and Hörschgen 1988), suggest using a decompositional image measurement method. A well-established decompositional method in marketing is multidimensional scaling (MDS).
Multidimensional Scaling
MDS is one of the most important tools in marketing research for product positioning (Cooper 1983). MDS allows not only the detection of image dimensions but also the representation of the relationships between objects in a perceptual space. Positioning thus identifies the destination attributes or image dimensions that can lead to a competitive advantage and that should therefore be emphasized in the communication strategy (Calantone and Mazanec 1991).
MDS requires proximities calculated on the basis of similarities or dissimilarities. Thus, individual objects are not described but, instead, pairs of objects are placed into relationships concerning their similarity. Cooper (1983, 444) therefore states that “the advantage of MDS in this context is that it allows the researcher to discover the relevant attributes. The similarities question is probably the most neutral question in the social science. It allows the respondent to bring a personal frame of reference to the judgment task, rather than having one imposed by a prescribed list of attributes on which the product alternatives are rated.”
Decompositional methods avoid an a priori determination of destination attributes by the researcher. This is a huge benefit of these methods. However, only 15 destination image studies using MDS as a primary methodology have been identified to date (e.g., Gartner 1989; Goodrich 1978; Phelps 1986). Data in these studies are mostly gained from derived similarity data (rating scales, semantic differential) (Pike 2002; Gallarza, Gil Saura, and Calderón García 2002). This means that researchers have not used MDS to its full capacity.
A second drawback can be identified: the question of choosing a suitable distance measure. To date, very few studies have used direct similarities as input data for MDS analysis (e.g., MacKay and Fesenmaier 2000; Kemper, Roberts, and Goodwin 1983; Zins 1994).
Green and Carmone (1970) make a detailed distinction between direct and derived methods of gathering dissimilarities. In derived methods, relationships are not determined directly; instead, distance measures (e.g., distances, correlations, or measures of fit) are calculated from so-called profile data (Green and Rao 1972). These measures can, for instance, be derived from the results of a semantic differential or other rating scales. Derived dissimilarities should be used with caution, since the criteria for the similarity judgment, as well as their weightings, are predetermined by the researcher. Thus, a primary advantage of MDS, its exploratory nature, is limited in this case. However, the evaluation of predetermined criteria is useful as a supplement and can greatly simplify the interpretation of dimensions.
Direct dissimilarities are directly obtained from respondents’ similarity judgments. Direct methods of capturing global similarities are sorting, ranking, and rating. In the ranking process, the order of similar objects can be determined by paired comparisons, the rotating anchor point method, or triadic combinations. A respondent is simply asked to arrange all possible pairs of objects in their order of similarity. In the rating method, the subject is asked to rate object pairs on a scale. This task can be facilitated by using binary scales rather than multilevel scales.
One reason why direct similarities are seldom used is the large number of judgments needed. In the case of pairwise comparisons, n objects result in n(n − 1)/2 judgments.
Several authors have investigated different approaches to gathering direct similarities (Rao and Katz 1971; Neidell 1972; Henry and Stumpf 1975; Whipple 1976; McIntyre and Ryans 1977; Bijmolt and Wedel 1995). In their work, Bijmolt and Wedel (1995) find that if the stimulus set is relatively large, the methods of sorting and paired comparisons are well suited for collecting similarity data. None of the other above-mentioned authors include both methods (sorting and paired comparisons) in their investigations. Rao and Katz (1971), for instance, compare the methods of sorting, conditional ranking, and ranking of pairs, while Neidell (1972) examines the differences between conditional ranking and triadic combinations. They conclude that paired comparisons outperform conditional rankings, triadic combinations, and ranking of pairs in terms of completion time and boredom. However, findings concerning data quality and the MDS solution’s fit are incomplete. Moreover, all these studies were published before the Internet was in its ascendancy. In their meta-analysis, Stepchenkova and Mills (2010) identify eight image studies using the Internet for data collection and register an extension of destination image studies into the Web environment. These new interactive Web-based survey formats provide opportunities to implement methods to assess direct similarities in cost-efficient and time-efficient ways.
This article seeks to contribute to the above-mentioned aspects by investigating the applicability of free-sorting and reduced paired comparisons for obtaining similarity data online. The following underlying hypotheses, based on the work by Bijmolt and Wedel (1995), are tested:
There are differences between the data collection methods concerning
completion time,
perceived task difficulty, and
variety.
There are differences between the data collection methods concerning
the recovered stimulus coordinates,
the optimal dimensionality, and
the goodness of fit.
Methodology
A total of 44 European cities were used in this study. The cities were selected according to their touristic impact, defined as more than one million bed nights in 2006. Additionally, Bucharest, Luxembourg, Reykjavik, and Sofia were selected because they are capital cities. The empirical study was conducted using an experimental setting. Two questionnaires were developed: a card-sorting application using WebSort with a subsequent online survey and an interactive questionnaire for the reduced paired comparison task using CIW (Sawtooth Software). Besides the dependent variables (similarities, task difficulty, and completion time), various control variables (experience, preferences, previous visits, and demographics) were selected.
The control variables were chosen based on general destination image formation models, according to which the following factors have substantial impacts on destination image: information sources (Beerli and Martín 2004), personal factors such as motivation and travel experience (Pearce and Caltabiano 1983; Baloglu 2001; Kim and Perdue 2011; Kim, Hallab, and Kim 2012), and sociodemographics (Beerli and Martín 2004; Baloglu and McCleary 1999), as well as preferences (Hunt 1975; Goodrich 1978; Goodall 1988). Both survey instruments were subjected to a pretest. Finally, 465 questionnaires were completed in total—184 for the card-sorting task and 281 for the reduced paired comparison task.
Card Sorting
For the free-sorting (form K groups with variable k) or the individual sorting task, respondents were presented with all stimuli (44 cities) simultaneously and then asked to sort the cities into piles according to their global similarity and, finally, to label the piles. Each subject was free to choose the number of piles, with each pile containing as many objects as the subject found appropriate. The cities within a pile should be similar, while cities in different piles should be less similar (Coxon 1999). Previous empirical results suggest that, on the one hand, this data collection type is perceived as quick and simple in terms of respondent fatigue and boredom but, on the other hand, it yields little information for individual subjects (Bijmolt and Wedel 1995).
Reduced Paired Comparison
The paired comparisons method was first developed within psychology (Thurstone 1927; Bradley and Terry 1952). It received much attention owing to its simplicity. It is easy for respondents to handle, since they need to focus on only two items at a time. But, in the case of a larger number of items, the number of comparisons can rapidly increase and can thus overburden respondents. Various incomplete designs have therefore been suggested in the literature.
A complete paired comparison task for 44 cities would result in 946 similarity judgments (44 × 43/2). Even if two-thirds of the comparisons were eliminated using certain designs, 315 comparisons would still be needed. Bisset and Schneider (1991) achieved satisfactory results with less than 10% of all possible pairwise comparisons, which would amount to about 95 judgments in the present case. The reduction of the stimuli, that is, the creation of different subsets, can either be random (Spence and Domoney 1974) or can proceed via certain designs, such as the cyclic elimination of pairs (Clatworthy 1973). In most cases, authors use incomplete block designs (Bradley and Terry 1952) or incomplete cyclic designs (David 1982). The random or cyclic elimination of comparisons is deterministic and based on technical solutions instead of selection on the characteristics of the brand or the respondent. MacKay and Zinnes (1981) as well as Zinnes and MacKay (1983) propose a probabilistic approach by forming two groups: known and unknown brands. Thus, they could evaluate the similarity of the pairs containing at least one well-known brand.
This article follows these authors’ rationale. Therefore, the reduction of the comparisons was achieved by restricting the dissimilarity judgments to the cities included in the consideration set of individual subjects. Thus, in the first part of the questionnaire, respondents were asked about their consideration set based on the studies by Brown and Wildt (1992) as well as Woodside and Lysonski (1989). Each subject had to state at least one favorite city. Three subsequent indications were optional.
The cities specified by a respondent plus four cities (London, Paris, Rome, and Vienna) were adopted in a script for the following pairwise comparisons and for the assessment of cities based on predefined attributes. These four cities were chosen based on their popularity as tourism destinations. Thus, if a respondent specified four preferences, he or she had to carry out the maximum number of pairwise comparisons (8 × 7/2 = 28). However, if only one preference was selected, the number of pairwise comparisons was significantly smaller (5 × 4/2 = 10). The assessment of the similarities was carried out using a five-point Likert-type scale. This approach avoids destination unfamiliarity and reduces the number of judgments in a reasonable way.
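The judgment counts quoted above follow directly from the pair-count formula n(n − 1)/2. The short Python sketch below reproduces them; it assumes that the cities a respondent names do not overlap with the four fixed cities, and the function and variable names are illustrative only.

```python
def n_pairs(n_objects: int) -> int:
    """Number of unordered pairs among n_objects, i.e. n(n - 1)/2."""
    return n_objects * (n_objects - 1) // 2


# Four fixed cities added for every respondent (London, Paris, Rome, Vienna).
FIXED_CITIES = 4

# Complete design over all 44 cities.
complete_design = n_pairs(44)

# Reduced design: 1 to 4 self-selected cities plus the four fixed cities
# (assumes the self-selected cities differ from the fixed ones).
reduced_design = {k: n_pairs(k + FIXED_CITIES) for k in range(1, 5)}

print(complete_design)  # 946
print(reduced_design)   # {1: 10, 2: 15, 3: 21, 4: 28}
```

Under this assumption, each respondent makes between 10 and 28 judgments, compared with 946 in the complete design.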
After respondents in both samples completed the judgment task, they were asked to rate the task on a five-point scale in terms of its difficulty, the perceived task length, and the extent to which they had found the task boring/interesting and diversified.
Data Analysis
The first step in preparing sorting data for further analysis is to create a co-occurrence matrix (also called Miller’s co-occurrence or Burton’s F). It is the aggregate of all respondents’ frequencies of sorting two objects in a group. Usually, the main analysis is preceded by a data transformation step in which the direct similarities are transformed into distances by subtracting cell data from the total number of subjects, as suggested by Coxon (1999). Since the PROXSCAL algorithm uses similarities as well as dissimilarities as input data, this step was omitted. In the case of the pairwise comparison data, the mean values from a five-point Likert scale (1 = very similar and 5 = not similar at all) form a city × city matrix.
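As a rough illustration of the co-occurrence step, the following Python sketch counts how often two cities are placed in the same pile and then applies the subtraction described by Coxon (1999). The three respondents and their piles are invented toy data, and the function names are hypothetical. The transformation to distances is shown only for completeness, since it was omitted in the present study.

```python
from itertools import combinations


def cooccurrence(sortings, items):
    """Count, per pair of items, how many respondents put both in one pile."""
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    counts = [[0] * n for _ in range(n)]
    for piles in sortings:              # one respondent's complete sorting
        for pile in piles:              # one pile of that respondent
            for a, b in combinations(pile, 2):
                i, j = index[a], index[b]
                counts[i][j] += 1
                counts[j][i] += 1
    return counts


def to_dissimilarity(counts, n_subjects):
    """Distance = number of subjects minus co-occurrence count (diagonal 0)."""
    n = len(counts)
    return [
        [0 if i == j else n_subjects - counts[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): three respondents sorting four cities.
items = ["Paris", "Rome", "Oslo", "Helsinki"]
sortings = [
    [["Paris", "Rome"], ["Oslo", "Helsinki"]],
    [["Paris", "Rome", "Oslo"], ["Helsinki"]],
    [["Paris"], ["Rome"], ["Oslo", "Helsinki"]],
]

similarity = cooccurrence(sortings, items)
dissimilarity = to_dissimilarity(similarity, len(sortings))
```

In this toy example, Paris and Rome share a pile for two of the three respondents, so their similarity is 2 and their derived distance is 1.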
The main analysis was undertaken using nonmetric MDS with the PROXSCAL algorithm. In the literature, most authors who deal with sorting data use nonmetric MDS (Faye et al. 2004; Van der Kloot and Van Herk 1991; Lawless, Sheng, and Knoops 1995).
The use of aggregate data should generally be avoided, since important information can be lost. In this case, a disaggregate analysis can only be performed for the sorting data owing to the large number of missing values in the individual datasets for the pairwise comparison data.
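The reason aggregation is needed for the pairwise comparison data can be sketched as follows: each respondent rates only the pairs within his or her own city subset, so individual matrices are mostly empty, whereas averaging across respondents fills many more cells. The Python sketch below uses invented ratings and hypothetical names; pairs that nobody judged remain `None`.

```python
def mean_dissimilarity(ratings, items):
    """Average the available ratings per city pair; unjudged pairs stay None."""
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    totals = [[0.0] * n for _ in range(n)]
    counts = [[0] * n for _ in range(n)]
    for respondent in ratings:                  # dict {(city_a, city_b): rating}
        for (a, b), rating in respondent.items():
            i, j = index[a], index[b]
            for r, c in ((i, j), (j, i)):       # keep the matrix symmetric
                totals[r][c] += rating
                counts[r][c] += 1
    matrix = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.0
            elif counts[i][j] > 0:
                matrix[i][j] = totals[i][j] / counts[i][j]
    return matrix


# Toy data (hypothetical): 1 = very similar, 5 = not similar at all.
items = ["Paris", "Rome", "Oslo", "Vienna"]
ratings = [
    {("Paris", "Rome"): 2, ("Paris", "Oslo"): 5, ("Rome", "Oslo"): 4},
    {("Paris", "Rome"): 1, ("Paris", "Vienna"): 3, ("Rome", "Vienna"): 2},
]

city_by_city = mean_dissimilarity(ratings, items)
```

Here the Oslo–Vienna cell stays empty because no toy respondent judged that pair, which mirrors the missing-value problem at the individual level.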
Congruence Testing
There is an array of methods to assess the overall extent of association between two matrices, generally referred to as congruence testing. Dillon, Frederick, and Tangpanichdee (1982) provide an overview of different methods commonly used to check the conformity of two or more perceptual spaces. The matching techniques include the calculation of similarity measures such as correlation coefficients (Spearman, Pearson), the RV coefficient (Robert and Escoufier 1976), or, for the comparison of the total matrices, canonical correlation (in the case of two subgroups).
If one uses the correlation coefficients of distances between two perceptual spaces, the correlations of the distances within the perceptual space are not considered (Schneider and Borlund 2007). Therefore, Borg and Groenen (2005, 440) discourage interpreting the correlation of distances: “correlating distances does not properly assess the similarity of geometric figures (configurations). The problem is easily resolved, however, if we do not extract the mean from the distances and compute a correlation about the origin, not the centroid.”
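The distinction drawn in the quotation can be written out as follows. This is a sketch in generic notation, not taken from the source: d_ij(X) and d_ij(Y) denote the interpoint distances in the two configurations, and the bars denote their means. The Pearson correlation subtracts the means, whereas the congruence coefficient c is computed about the origin.

```latex
r = \frac{\sum_{i<j}\bigl(d_{ij}(X)-\bar d_X\bigr)\bigl(d_{ij}(Y)-\bar d_Y\bigr)}
         {\sqrt{\sum_{i<j}\bigl(d_{ij}(X)-\bar d_X\bigr)^{2}}\;
          \sqrt{\sum_{i<j}\bigl(d_{ij}(Y)-\bar d_Y\bigr)^{2}}},
\qquad
c = \frac{\sum_{i<j} d_{ij}(X)\, d_{ij}(Y)}
         {\sqrt{\sum_{i<j} d_{ij}(X)^{2}}\;\sqrt{\sum_{i<j} d_{ij}(Y)^{2}}}
```

Because distances are nonnegative, c lies between 0 and 1 and equals 1 only when the two sets of distances are proportional.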
Commonly found methods of congruence testing include the Mantel test (Mantel 1967) and the Procrustean superimposition approach (Gower 1971). Both techniques, although also based on congruence coefficients calculated from distances (Mantel test) or points (Procrustes), have the advantage that corresponding permutation tests are available to test the significance of differences.
Procrustes analysis was originally developed for comparing ordination results of factor analysis but became an important tool for multidimensional scaling. Comparisons of ordination results are rare in social sciences, but are prevalent in natural sciences. The technique can be traced back at least to Mosier (1939), and the name Procrustes was invented by Hurley and Cattell (1962). Procrustes is based on the simple idea of minimizing the sum of squared differences between two or more configurations or data matrices in a multivariate Euclidean space. Procrustes has a wide range of applications and can be used to compare any kind of ordination results.
By applying Procrustes transformation (Gower 1971), two or more configurations are adjusted to each other. One of the two MDS solutions serves as the starting configuration, which can then be rotated, mirrored, and stretched for an optimal fit with the target configuration (Borg and Groenen 2005). In doing so, the sum of squared distances between corresponding points of the two configurations is minimized. The sum of squared residuals of the two configurations serves as a descriptive measure of dissimilarity, called m² (Gower 1971). The smaller m² is, the more similar the two perceptual maps are. Gower (1975), Schneider and Borlund (2007) as well as Borg and Groenen (2005) provide mathematical details.
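The fitting step can be sketched in a few lines of Python with NumPy. This is a minimal illustration of a symmetric Procrustes fit (both configurations centred and scaled to unit sum of squares, rotation obtained from a singular value decomposition). It is an assumption that this matches the standardization used in the study; the function name and the toy data are hypothetical.

```python
import numpy as np


def procrustes_m2(target, start):
    """Fit `start` to `target` by translation, rotation/reflection and scaling.

    Returns the sum of squared residuals (m2), the per-point residuals,
    and the fitted starting configuration.
    """
    X = np.asarray(target, dtype=float)
    Y = np.asarray(start, dtype=float)
    # Centre both configurations and scale them to unit sum of squares.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    # Optimal rotation/reflection from the SVD of Y'X.
    U, singular_values, Vt = np.linalg.svd(Y.T @ X)
    rotation = U @ Vt
    scale = singular_values.sum()           # optimal stretching factor
    Y_fit = scale * (Y @ rotation)
    residuals = np.linalg.norm(X - Y_fit, axis=1)
    m2 = float(np.sum(residuals ** 2))
    return m2, residuals, Y_fit


# Toy check (hypothetical data): a rotated, stretched and shifted copy
# of a configuration should fit almost perfectly.
rng = np.random.default_rng(0)
target = rng.normal(size=(6, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
start = 2.0 * (target @ Q) + 5.0
m2, residuals, _ = procrustes_m2(target, start)
```

With this standardization, m² lies between 0 and 1, and the per-point residuals are the quantities that a superimposition plot displays as line lengths.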
Basically, either the distance matrices can be compared directly, or—in a first step—a dimensional reduction (MDS or factor analysis) is carried out. Then the distance matrices or the coordinates of the solutions of the previous MDS or factor analysis are superimposed. The matrix of the residuals of the two solutions can then be analyzed further (Procrustean superimposition plot, partial Protest, cluster analysis). For a detailed description, see Peres-Neto and Jackson (2001).
An outstanding advantage of Procrustean superimposition over other congruence testing approaches is that one can visually verify the resemblance for each dimension separately.
Results
The descriptive analysis shows that the noncompletion rate for the paired comparison sample is significantly higher (24%) than for the card-sorting sample (10%). The homogeneity of the sample was checked in terms of demographic characteristics, travel experience (number of trips within a year), and preferences (consideration sets). Table 1 shows a list of sociodemographic and travel-related data of the two samples (frequencies or means and standard deviations) and the results of the χ² tests (gender) and the t-tests (age, travel experience) respectively. Results show that the two samples are homogeneous regarding gender, age, and travel experience. According to the Mann–Whitney U test of the preference rankings, significant differences (p < 0.05) are observed only for the cities of Hamburg, Madrid, Naples, and Venice. Thus, the preference differences must be taken into account when interpreting the perceptual spaces.
Table 1. Sociodemographics and Travel-Related Characteristics.
Note: χ² test; t test (T); Mann–Whitney U test (U).
The time records of WebSort and SSI provided a basis for calculating the response times. Results indicate that the reduced paired comparisons task took less time to complete than the card-sorting task. The mean value for the sorting task (16.7 minutes, SD 10.79) is about twice as high as for the paired comparison task (8.7 minutes, SD 6.83). The t-test is highly significant and shows that there is a difference between the two samples regarding interview time.
The assessment criteria of Krapp and Sattler (2001) provided a basis for determining task difficulty. The individual items were measured on a scale from 1 (completely true) to 5 (not true at all). The individual items’ ranking order is consistent in both samples, but with slightly better values for the paired comparison task concerning task simplicity and interview length. As expected, the sorting task was judged better regarding the variety, the fun factor, and the extent of interestingness. Basically, the assignment was considered to be easy (means 1.9/1.6) and as not too long (means 1.5/1.4), even though the survey took approximately 17 (sorting) and 9 (pairwise comparison) minutes, respectively, on average. Altogether, the variety of the interviews was evaluated worst, with mean values of 2.7 and 2.9. Concerning the different task difficulty levels, the sorting task was considered much more varied and interesting, despite a considerably longer processing time. The Mann–Whitney U test results show that there are significant differences for all five difficulty items (p < 0.05).
Congruence Testing
As described above, perceptual maps were generated by using nonmetric MDS (PROXSCAL algorithm). To make a decision concerning the optimal number of dimensions, two-dimensional to five-dimensional solutions per output matrix were calculated. Table 2 shows the respective stress values and the explained variance, R², for dimensions 2 to 5. The value of R² (squared correlation between disparities and distances) also serves as a criterion for configuration quality; the closer this is to 1, the better the solution. Considering the goodness of fit of the configurations, three-dimensional solutions seem to be appropriate.
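For orientation, the two fit criteria can be written in generic notation, with d_ij the distances in the MDS configuration and d̂_ij the disparities. The stress expression below is Kruskal's Stress-1; PROXSCAL reports several stress variants, and the source does not state which one is tabulated, so this choice is an assumption.

```latex
\text{Stress-1} = \sqrt{\frac{\sum_{i<j}\bigl(d_{ij}-\hat d_{ij}\bigr)^{2}}{\sum_{i<j} d_{ij}^{2}}},
\qquad
R^{2} = \bigl[\operatorname{corr}\bigl(\hat d_{ij},\, d_{ij}\bigr)\bigr]^{2}
```

Lower stress and higher R² both indicate that the configuration reproduces the (transformed) proximities more closely.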
Table 2. Goodness of Fit of the MDS Solutions.
Kruskal and Carmone (1973) present a reference for interpreting stress values. Some authors (e.g., Spence and Ogilvie 1973) provide tables with stress values of random numbers for nonparametric two-way MDS models. Even more helpful than these calculated values is Spence’s (1979) formula, because it also considers the number of objects. According to Spence’s formula, the threshold for three dimensions and 44 objects is 0.272. The stress values of both solutions lie below this threshold (see Table 3).
Table 3. Goodness-of-Fit Thresholds According to Spence (1979).
Concerning the optimal number of dimensions, the two MDS solutions hardly differ. Three-dimensional or four-dimensional solutions can be justified owing to the calculated stress and the improvements in the stress levels as dimensions increase. Since it is recommended to achieve a dimensional representation as low as possible to facilitate interpretation, a three-dimensional space was used for further analysis and interpretation.
Since the data samples were independent, there were two independent MDS representations. Thus, in the next step, transformations of the MDS solutions were found to make them comparable; this is known as the Procrustes problem (Borg and Groenen 2005; Gower 1971). The transformation’s goodness of fit (sums of squares) was assessed by using permutation tests (Jackson 1995; Schneider and Borlund 2007). The analysis was done using the R environment for statistical computing (R Development Core Team 2005), in particular, the VEGAN package (Oksanen et al. 2009), which among others provides an implementation of Procrustes transformations, the Protest, and the Mantel test.
For congruence testing, the MDS solutions (ordination coordinates) for the paired comparisons and the sorting task were used as input data, and a superimposition plot was created. The MDS solution for the paired comparisons was specified as the starting configuration and fitted to the target configuration, the sorting task. Unless stretching with a fixed parameter is carried out, it does not matter which solution serves as the starting configuration: the goodness-of-fit statistic is the same if the two roles are reversed. However, if stretching is performed, the variance of both configurations should be standardized to 1 (Schneider and Borlund 2007). By standardizing the variances to 1, all variables and dimensions have the same weight in the adjustment process (Peres-Neto and Jackson 2001, 171). In the present case, this transformation was carried out (Oksanen et al. 2009). In addition, the scaling of the initial configuration’s axes was allowed.
The permutation procedure Protest implemented by Jackson (1995) was then used to assess the statistical significance of the Procrustean fit. Based on random permutations of the original data, this procedure allows one to determine whether the observed m² value is smaller than expected due to chance (Jackson 1995).
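The logic of such a permutation test can be sketched in Python with NumPy. This is an illustrative re-implementation under stated assumptions, not the code used in the study: the rows (objects) of one configuration are shuffled, the symmetric Procrustes statistic is recomputed each time, and the p value is the share of permutations with an m² at least as small as the observed one. The function names, the default of 999 permutations, the (hits + 1)/(permutations + 1) convention, and the toy data are all assumptions.

```python
import numpy as np


def m2_statistic(X, Y):
    """Symmetric Procrustes sum of squared residuals between two configurations."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    singular_values = np.linalg.svd(Y.T @ X, compute_uv=False)
    return 1.0 - singular_values.sum() ** 2


def protest(target, start, n_perm=999, seed=0):
    """Permutation test: is the observed m2 smaller than expected by chance?"""
    rng = np.random.default_rng(seed)
    X = np.asarray(target, dtype=float)
    Y = np.asarray(start, dtype=float)
    observed = m2_statistic(X, Y)
    hits = 0
    for _ in range(n_perm):
        shuffled = Y[rng.permutation(len(Y))]   # break the object matching
        if m2_statistic(X, shuffled) <= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Toy data (hypothetical): a rotated copy of a configuration plus small noise.
rng = np.random.default_rng(1)
target = rng.normal(size=(20, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
start = target @ Q + 0.05 * rng.normal(size=(20, 3))
observed, p_value = protest(target, start)
```

With 999 permutations, the smallest attainable p value under this convention is 0.001.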
The Procrustes transformations produced results indicating significant agreement between the two perceptual spaces, with a correlation of 0.63. A few cities—such as Athens, Belgrade, Salzburg, and Zurich—contributed to the existing variance in above-average ways.
Protest shows that there is a significant similarity between the two three-dimensional configurations: the sum of squared residuals (m²) is 0.598 (which can be described as average), with p ≤ 0.001 based on 1,000 permutations.
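The reported correlation and the sum of squared residuals are linked. Assuming that the correlation of 0.63 is the symmetric Procrustes correlation as defined in the vegan package, the two values are consistent with each other:

```latex
r = \sqrt{1 - m^{2}} = \sqrt{1 - 0.598} = \sqrt{0.402} \approx 0.634
```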
Figure 1 shows the residuals for all dimensions and each city with the length of the lines describing the sizes of the residuals. The mean of the residuals is 0.119. It is evident that the positions of Athens (0.183), Barcelona (0.135), Belgrade (0.231), Brussels (0.150), Krakow (0.133), Dublin (0.157), Lisbon (0.155), Madrid (0.145), Munich (0.158), Oslo (0.130), Salzburg (0.195), Warsaw (0.155), Zurich (0.214), and Vienna (0.124) differ substantially between the two configurations. By contrast, there are hardly any deviations for the cities of Berlin, Dresden, Florence, Hamburg, Helsinki, Naples, Prague, Stockholm, Venice, and Paris.

Figure 1. Residuals for each city.
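The ranking implied by these residuals can be reproduced from the values quoted in the text. The sketch below uses only the 14 residuals reported above and treats "above the reported mean of 0.119" as the flagging rule, which is an assumption about how a substantial difference was determined. The function name is illustrative.

```python
# Residuals as reported in the text (only cities whose values are given).
reported = {
    "Athens": 0.183, "Barcelona": 0.135, "Belgrade": 0.231,
    "Brussels": 0.150, "Krakow": 0.133, "Dublin": 0.157,
    "Lisbon": 0.155, "Madrid": 0.145, "Munich": 0.158,
    "Oslo": 0.130, "Salzburg": 0.195, "Warsaw": 0.155,
    "Zurich": 0.214, "Vienna": 0.124,
}
MEAN_RESIDUAL = 0.119  # mean over all cities, as reported in the text

def above_mean(residuals, mean):
    """Return (city, residual) pairs above the mean, largest first."""
    flagged = [(city, r) for city, r in residuals.items() if r > mean]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

ranked = above_mean(reported, MEAN_RESIDUAL)
for city, residual in ranked:
    print(f"{city:10s} {residual:.3f}")
```

Sorted this way, Belgrade, Zurich, Salzburg, and Athens head the list, which matches the cities singled out earlier as contributing most to the discrepancy.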
In Figures 2 to 4, the residuals are displayed as vectors for each dimension with the arrows pointing to the target configuration (sorting task) and the points marking the initial configuration (pairwise comparisons). The longer the vectors are, the greater the difference between the two MDS solutions. Cities such as Paris and Prague have small residuals, indicating a close match between the two data sets, in contrast to cities having larger residuals such as Athens and Warsaw.

Figure 2. Superimposition plot: dimensions 1 and 2.

Figure 3. Superimposition plot: dimensions 1 and 3.

Figure 4. Superimposition plot: dimensions 2 and 3.
One can also detect the direction of the differences along the respective dimensions. Schneider and Borlund (2007) point out that the interpretation of such vectors should be carried out with care, because the MDS plots’ axes are generally hard to compare. The interpretation of the dimensions can be facilitated by using additional information such as group labels gained from the sorting task. This approach is applied in the application section of this article.
In Figures 2 to 4, each city is represented by a vector, the length of which is proportional to the size of the residual, that is, the distance between the city’s positions in the two ordination results. In the presentation of the first and second dimensions (Figure 2), the differences for Athens, Belgrade, Dublin, Brussels, Lisbon, and Krakow are notably large compared to the other cities. In contrast, there is hardly any difference in the positions of Oslo and Milan.
When inspecting the graphical representations of dimensions 1 and 3, Salzburg, Belgrade, Warsaw, Munich, Dublin, and Lisbon show large discrepancies, whereas the positions of Nice, Amsterdam, and Athens are very similar in both ordination results.
Figure 4 shows the differences in the ordination results of dimensions 2 and 3. Here the cities of Salzburg, Athens, Belgrade, and Munich exhibit major differences. There is great correspondence in the position of Reykjavik, Paris, Prague, Bratislava, and Rome.
As experience in terms of already visited cities is a crucial factor in image formation, it must be considered when interpreting the superimposition plots. Remarkable differences between the two samples occurred for the cities of Vienna, Stockholm, Helsinki, and Dublin. Concerning preferences, significantly different ranks were observed for the cities of Hamburg, Madrid, Naples, and Venice (Table 1). Dublin, Madrid, and Vienna are among the cities with less congruence between the two ordination results, while the positions of Helsinki and Venice hardly differ between the two configurations. Consequently, traveler experience and differing preferences are unlikely to be responsible, or at least not solely responsible, for the different positions of these cities.
Application
Figure 5 exemplifies the image positioning of cities on the basis of the aforementioned sorting data. To facilitate the interpretation of the dimensions, additional qualitative analysis was carried out.

Figure 5. City positioning based on the sorting task.
Since respondents chose their own criteria for their sorts (a total of 959 groups were sorted and almost every group was labeled differently), in a first step, content analysis was performed in order to identify and harmonize common group labels. Four researchers from the field of tourism independently classified the sorts according to the group labels. A similar approach was proposed by McCauley et al. (2005). Finally, the 959 group labels were reduced to 38 common attributes (e.g., boring, business, charming, cultural, exciting, food, historical, popular, relaxing, romantic, trendy, ugly).
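Once the raw labels have been coded, tallying how often each harmonized attribute was applied to each city is a mechanical step. The sketch below illustrates that tally only; the coding itself was done by human judges. The codebook entries, raw labels, and sorts are hypothetical, and only the attribute names (exciting, historical, romantic) are taken from the list above.

```python
from collections import Counter, defaultdict

# Hypothetical codebook: raw group label -> harmonized attribute.
CODEBOOK = {
    "party towns": "exciting",
    "nightlife": "exciting",
    "old towns": "historical",
    "lots of history": "historical",
    "for lovers": "romantic",
}

# Hypothetical sorts: one list of (raw label, cities) groups per respondent.
sorts = [
    [("party towns", ["Berlin", "Amsterdam"]), ("old towns", ["Prague", "Florence"])],
    [("Nightlife", ["Berlin", "Barcelona"]), ("for lovers", ["Venice", "Paris"])],
    [("lots of history", ["Rome", "Prague"]), ("no idea", ["Oslo"])],
]

def attribute_frequencies(sorts, codebook):
    """Count, per attribute, how often each city was placed under it."""
    freq = defaultdict(Counter)
    for groups in sorts:
        for label, cities in groups:
            attribute = codebook.get(label.strip().lower())
            if attribute is None:
                continue  # label could not be assigned to a common attribute
            for city in cities:
                freq[attribute][city] += 1
    return freq

freq = attribute_frequencies(sorts, CODEBOOK)
print(dict(freq["exciting"]))
```

The resulting per-city frequencies are the kind of input that can then be related to the MDS dimensions.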
In a second step, each attribute’s frequency was correlated with each dimension. Since the MDS axes are, in principle, uncorrelated, bivariate coefficients may be calculated (Borg and Groenen 2005). A similar procedure was proposed by Faye et al. (2004). Figure 5 shows that the southern European cities are grouped together. The eastern European cities are also situated close together and can be clearly distinguished from all other cities, whereas the northern and western European cities are very close to each other. The following pairs of cities are perceived as very similar by respondents: Gothenburg and Dresden, Budapest and Tallinn, Venice and Florence, Rome and Barcelona, Helsinki and Oslo, and Amsterdam and Berlin.
With regard to the labeling of the dimensions determined through the above-mentioned correlations, dimension 1 can be described in terms of touristic importance or popular cities. Popular touristic cities are grouped close together (Barcelona, Rome, Madrid, Paris, London, Amsterdam, and Berlin). In contrast, cities such as Belgrade, Bucharest, Warsaw, Sofia, Bratislava, Budapest, and Tallinn are perceived as less popular. This interpretation is consistent with the results of the qualitative analysis of the group labels. The term favorites correlates highly with dimension 1 (Pearson’s r = 0.599), as does the term popular (0.733). The term not interesting shows a negative correlation (–0.838) with dimension 1. Accordingly, the cities of Paris, London, Amsterdam, Rome, and Berlin are among the most popular city break destinations.
The interpretation of dimension 2 seems to be slightly more complicated. The detailed inspection of the group labels used by respondents leads to the conclusion that the cities situated in the upper region of the perceptual space (Dubrovnik, Bologna, Naples, Venice, Nice, Florence, and Valencia) can be described as calm and relaxing. According to the respondents, these cities can be characterized as classic summer holiday destinations, whereas the cities situated at the bottom of this dimension (Hamburg, Berlin, Amsterdam, Brussels, Zurich, Munich, and London) are often sorted into groups that can be generally designated as business destinations. The group names business (–0.537) and trendy (–0.545) are highly correlated with this dimension.
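The correlations used to label the dimensions are ordinary Pearson coefficients between a city’s coordinate on a dimension and the frequency with which an attribute was assigned to it. The sketch below shows that calculation with made-up coordinates and counts for six unnamed cities. None of the numbers come from the study, and the variable names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: dimension 1 coordinates and attribute counts per city.
dim1 = [0.9, 0.7, 0.4, -0.2, -0.6, -0.8]
popular = [30, 28, 15, 9, 4, 2]
not_interesting = [1, 2, 5, 12, 20, 22]

r_popular = pearson_r(dim1, popular)
r_not_interesting = pearson_r(dim1, not_interesting)
print(round(r_popular, 3), round(r_not_interesting, 3))
```

An attribute with a strongly positive coefficient describes the cities at the positive end of the axis, and one with a strongly negative coefficient describes the opposite end, which is how the labels above were attached to dimensions 1 and 2.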
Discussion and Limitations
This study investigates whether the two methods of collecting direct similarities, sorting and paired comparisons, produce similar perceptual maps and how the two data collection modes are perceived by respondents. Based on the literature review, this article first demonstrated that the choice of the data collection method is critical for image-positioning studies, especially in the case of large stimulus sets.
Then, the two data collection procedures were subjected to the empirical investigation. As a result, the proposed research questions can be answered as follows:
The two modes of data collection differ in terms of completion time, perceived task difficulty, and variety. Concerning the ordination results, a difference in goodness of fit was discovered. The two ordination results did not differ in terms of their optimal dimensionality and recovered stimulus coordinates.
The study’s key findings are summarized in Table 4, which compares the data collection procedures using formal evaluation criteria proposed by Bijmolt and Wedel (1995). Overall, results indicate that the sorting task is superior to the reduced paired comparisons.
Table 4. Summary of Results.
Note: ++ = excellent, + = good, +/– = medium, – = poor.
As far as the subject-related variables are concerned, the reduced paired comparisons performed slightly better than the sorting task. Regarding task difficulty, the paired comparison task is perceived as easier to complete than the sorting task. In contrast, the sorting task is considered much more varied and interesting, despite a considerably longer processing time (a mean of 16.7 minutes, compared to 8.7 minutes for the paired comparisons). The two samples’ completion rates also differed notably: 184 of 205 questionnaires (about 90%) were completed in the sorting sample, whereas only 281 of 372 (about 76%) were completed in the paired comparison sample.
Similarity judgments were evaluated based on completion time, missing values, and data quality. The comparison of the stress values as a quality criterion indicates that the MDS solution of the paired comparison data is worse for all dimensions. The number of missing values was much higher for the paired comparison data.
Concerning recovered stimulus coordinates, there was much consensus between the two perceptual spaces. The results of the Procrustean transformations showed a significant match (a correlation of 0.63) between the two perceptual spaces. The optimal number of dimensions according to stress levels did not differ between the two MDS solutions either. Concerning goodness of fit, the sorting task revealed slightly better results than the paired comparison task.
Consequently, the choice between sorting and reduced paired comparisons depends on the specific application. In the course of this study, the paired comparisons were reduced by restricting the dissimilarity judgments to those cities included in the consideration set of individual subjects. Hence, the paired comparison task was swifter and was perceived as easy to complete. On the other hand, the data quality was worse owing to the large amount of missing data. In the case of a full paired comparison task, sorting is the quicker method and causes less respondent fatigue and boredom.
Furthermore, an important consideration is how far one should reduce paired comparisons. In this study, the reduction, from 946 to at least 10 (5 cities) and a maximum of 45 (10 cities) judgments, was substantial. However, the results indicate that this reduction does not affect the data quality, since there was congruence between the two different perceptual maps.
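The size of this reduction follows directly from the number of unordered pairs among n objects, n(n − 1)/2. The short sketch below reproduces the three figures quoted above. The value of 44 cities is inferred here from 946 = 44 · 43 / 2 and is not stated in this section, and the function name is illustrative.

```python
from math import comb

def n_pairs(n_objects):
    """Number of unordered pairs among n_objects, i.e. n(n - 1) / 2."""
    return comb(n_objects, 2)

FULL_SET = 44  # inferred from the 946 judgments of the full design

full = n_pairs(FULL_SET)   # complete paired comparison task
smallest = n_pairs(5)      # respondent with a five-city consideration set
largest = n_pairs(10)      # respondent with a ten-city consideration set

print(full, smallest, largest)
print(f"share of the full design: {smallest / full:.1%} to {largest / full:.1%}")
```

Even the largest consideration set therefore covers only a small fraction of the full design, which is why the number of unrated pairs becomes the central concern for this method.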
Nevertheless, difficulties can arise with this approach. For instance, the number of missing values can be relatively large, since the cities that are not known to the respondents are not rated. In the literature, this data type is known as censored data (Malhotra 1986). MDS programs for so-called pick-any data were not suitable in this case, since the assumption that unselected alternatives are considered as rejected must be satisfied. Thus, if perceptual spaces are constructed, respondents are situated close to the chosen objects. But there are no hints as to where all other (unselected) objects are located. In this case, the close location is necessary but not sufficient to infer preferences (Holbrook, Moore, and Winer 1982).
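How such gaps arise when individual judgments are pooled can be illustrated with a toy example. The sketch below averages dissimilarity ratings per city pair across respondents and leaves a pair empty when nobody rated it. The respondents, ratings, and function names are hypothetical, and the averaging rule is an assumption rather than the aggregation used in this study.

```python
from itertools import combinations

def aggregate(judgments, objects):
    """Mean dissimilarity per pair; None where no respondent rated the pair."""
    sums, counts = {}, {}
    for respondent in judgments:
        for (a, b), value in respondent.items():
            key = tuple(sorted((a, b)))  # treat (a, b) and (b, a) alike
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    matrix = {}
    for pair in combinations(sorted(objects), 2):
        matrix[pair] = sums[pair] / counts[pair] if pair in counts else None
    return matrix

def missing_share(matrix):
    """Fraction of pairs that received no judgment at all."""
    return sum(value is None for value in matrix.values()) / len(matrix)

# Hypothetical toy data: each respondent rates only pairs of known cities.
cities = ["London", "Paris", "Rome", "Vienna", "Tallinn"]
judgments = [
    {("London", "Paris"): 2, ("London", "Rome"): 4, ("Paris", "Rome"): 3},
    {("Paris", "London"): 4, ("London", "Vienna"): 5, ("Paris", "Vienna"): 3},
]

matrix = aggregate(judgments, cities)
print(matrix[("London", "Paris")], missing_share(matrix))
```

In this toy case every pair involving the unknown city remains empty, as does any pair whose members never appeared together in a single respondent’s set.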
Another important aspect in the context of reduced paired comparisons is the extent to which the different compositions of stimuli (sets of cities) influence assessments. For positioning, the objects of comparison are important. If there are different sets of cities available for comparison, the results may change. Green and Carmone (1970) describe several studies that explore these effects, and conclude that this difference in composition leads to configuration changes. This problem can be addressed by examining the results for the four leading cities (London, Paris, Rome, and Vienna), which were rated by all respondents.
Yet another important point is the retrieved information. Sorting data can also be analyzed at a disaggregated level, which permits considerations for individual subjects and segments. Because of the large amount of missing data, this analysis is not possible for reduced paired comparisons. Moreover, additional information (e.g., group size, sorting criteria in terms of group labels) is available for the sorting data, which facilitates the interpretation of the perceptual maps, as was demonstrated in the application section. For a detailed discussion, see Coxon (1999).
Based on the results of this study and the aspects discussed above, reduced paired comparisons are recommended provided that missing data is not an issue. If, on the other hand, researchers plan further analyses with statistical methods that are sensitive to incomplete data, sorting should be the method of choice. From the respondent’s point of view, both methods are practicable and easy to perform.
Regarding limitations, it must be recognized that the empirical setting of this research was an experiment. As with all empirical work, the results must be considered in light of possible interference that could not be entirely ruled out by the experimental design. There were two independent test samples, and the allocation to the two groups was randomized, but there were no before and after measurements. Furthermore, since the questionnaire was Web-based, representativeness cannot be assumed because of possible coverage error and sampling error. However, the study does not claim to be representative, since it focuses on the methodological aspects of the two data collection procedures.
Conclusion
Many researchers are still convinced that the rating scale method is most accurate for assessing destination image. This article aimed to illustrate the disadvantages of this method and provided insights into alternative data collection procedures. The results presented and their implications demonstrated that sorting as well as reduced paired comparisons are adequate methods of data collection in the case of large stimulus sets. The study further presented each method’s benefits and drawbacks. These results are consistent with those of Bijmolt and Wedel (1995), Rao and Katz (1971), as well as Henry and Stumpf (1975), and therefore support the need for alternative data collection methods in the case of large stimulus sets.
Furthermore, this article introduced Procrustes analysis, which is well established in the natural sciences but is rarely used in the social sciences. This method proved to be a useful tool for assessing the congruence between multivariate data sets in this study. There is an array of possible applications of Procrustes analysis and this study is a first step in demonstrating the statistical performance of Procrustes in the context of destination positioning.
The results of this research are applicable whenever a large number of objects have to be judged and compared. This is particularly the case for positioning and competitiveness studies. In these cases, the length of questionnaires as well as task difficulty and the fun factor have serious consequences. I believe that the empirical investigation of data collection methods constitutes an important contribution to the application of MDS in tourism science and practice. MDS is an important tool for destination positioning but, as the literature review reveals, is rarely used in the course of destination image studies.
Therefore, I suggest that future research should focus on gaining deeper insights into suitable designs for the reduction of the number of judgments included in the paired comparison task. New interactive online survey tools have the potential to offer innovative ways to create different stimuli reduction designs. Furthermore, to test whether the approach presented here applies to image studies in general, replication studies should be carried out.
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
