Abstract
Online consumer behavior has become a valuable and viable source of consumer insights. Consumer comments in online forums and discussion groups have proven useful as a source of brand similarity data. Apart from their cost and speed advantages, such data can be captured easily over different time periods. Both online consumer-generated data (CGD) and surveys have their pros and cons. To date, little is known about how these two data sources compare in terms of brand insights. In this study, we discuss the results of analyzing survey and consumer-generated online data pertaining to the U.S. skincare market. Our study included 57 brands, and we used multidimensional scaling (MDS), t-distributed stochastic neighbor embedding (t-SNE; an alternative to MDS), hierarchical clustering, and additive similarity trees (an extension of hierarchical clustering) to analyze the data. We show that the outcomes vary between CGD and surveys. As an additional insight, we show that additive trees fit brand similarity data much better than the spatial scaling methods when the number of brands is large.
Introduction
Practical market structure analyses and perceptual brand maps are typically derived from survey data (Cooper, 1983; Green, 1975). Respondents are presented with pairs of brands and are asked to rate them on perceived similarity. Such data are then analyzed using a spatial mapping approach such as multidimensional scaling (MDS; e.g., Kruskal, 1964a) or a tree-type approach such as hierarchical clustering (e.g., Johnson, 1967; Ward, 1963). Spatial maps have been said to be somewhat more useful than hierarchical trees (Johnson & Hudson, 1998). In practice, both are often combined. For example, an MDS analysis may be performed to produce a spatial representation of the brands, followed by a tree analysis to help demarcate the domains in the spatial map.
Brand similarities can now be derived from online consumer-generated data (CGD), such as web search queries (e.g., Won, Oh, & Choeh, 2018), discussions on product review websites (e.g., Lee & Bradlow, 2011), clickstream data (e.g., Ringel & Skiera, 2016), and online forums and blogs (Netzer, Feldman, Goldenberg, & Fresko, 2012; Vriens, Vidden, Chen, & Kaulartz, 2017), to study market structures and perceptual brand maps. However, apart from the Netzer et al. (2012) result, little is known, to the authors’ knowledge, about how perceptual brand maps differ between online CGD and survey data in terms of both fit and interpretation.
This article sheds light on this issue by comparing the results of perceptual brand analyses of similarities between 57 skincare brands using both online CGD and survey data collected in the same time period. The dominant analysis approaches for similarity data include MDS approaches and tree-based approaches such as hierarchical clustering. There are many MDS varieties, but they all assume that consumers’ perceptions of similarity arise from the positions of the stimuli on a set of underlying continuous dimensions. Tree-based approaches do not make such an assumption; instead, they assume that two stimuli can be deemed similar because they have an unknown discrete feature in common. Here too, many varieties exist. We apply two spatial scaling methods and two hierarchical tree methods. For both the spatial approaches and the tree-based approaches, we have a standard variety and an alternative variety that we expect might be better able to deal with noisier data and with larger brand sets. We compare the results on (a) fit (Kruskal’s Stress-1, an accepted standard for judging whether a solution is acceptable, reasonable, or good), (b) the number of outlier brands, and (c) similarity of the solutions in terms of interpretation.
Online unstructured CGD versus survey data
Conceptually, there are several differences between the online and survey-based approaches that could affect the quality and usefulness of the results.
Pros and cons of unstructured CGD
The verbal comments and discussions that consumers have on brand review websites, forums, and blogs contain errors and typos. Such unstructured data need to be cleaned and text-mined before we can subject them to further analysis. This may result in noisy data and may negatively affect the performance of MDS solutions (Bijmolt & Wedel, 1999). We also cannot control the “sample” of consumers who are generating the comments, which raises questions of representativeness and prevents us from focusing on sub-groups (e.g., brand users vs. brand non-users). However, CGD offer several benefits that make them an attractive alternative to surveys:
The data can be scraped from online sources (forums, blogs, discussion websites, etc.) at any time.
CGD are time-stamped, which makes it possible to evaluate the impact of campaigns or to assess the impact of brand crises. For example, Vriens et al. (2017) used data from the Internet from both before and after the VW Diesel emission crisis to assess the damage to the VW brand.
It is easy to zoom in and out (i.e., if certain brands are missed, we can easily capture additional data over the same time period on these overlooked brands).
Finally, we can study larger brand sets. The number of brands that can be studied with surveys is limited, as the pairwise comparison task becomes too tedious. Most MDS applications in marketing have used fewer than 20 brands. For example, in the studies of Bijmolt, Wedel, Pieters, and DeSarbo (1998); Bijmolt and Wedel (1995); and Bijmolt and Wedel (1999), the highest number of brands used was 18. In the three studies of Hodgkinson, Padmore, and Tomes (1991), 16, 15, and 20 brands were used, while Vriens et al. (2017) used 15 car brands and Won et al. (2018) used 20 car brands. Using CGD, Netzer et al. (2012) studied 30 parent car brands, although about 150 specific car models were included.
Pros and cons of surveys
Survey-based direct similarity data are frequently collected using some sort of similarity exercise (e.g., pairwise comparisons). Survey-based similarity data have several disadvantages:
Fatigue and data quality. The larger the set of brands in the study, the longer and more tedious the task. Consumers can get fatigued as they work through a tedious and seemingly repetitive pairwise similarity judgment task. Bijmolt et al. (1998) found the effect of serial position (a proxy for fatigue) to be small. In their studies, respondents were asked to go through a task of 66 (Study 1) and 20 (Study 2) pairwise similarity judgments. Johnson, Lehmann, and Horne (1990) showed a larger impact of fatigue. Fit from similarities later in the study was higher, suggesting less complexity in respondents’ judgments. It seemed they used a simpler similarity assessment strategy. With larger brand sets, fatigue is more likely.
Simplification strategies. The larger the set of brand pairs, the more likely it is that respondents may engage in task simplification strategies to manage a tedious task.
Familiarity. Consumers differ in familiarity with certain brands. It seems logical that the larger the set of brands in a study, the greater the number of consumers who will be unfamiliar with one or more brands. When such brands are presented, these consumers may simply not know how to respond and may give uninformed judgments on those pairs. Familiarity was found to have a significant negative impact on similarity judgments in the Bijmolt et al. (1998) study. We would expect that this is much less of an issue with CGD, as consumers would not comment on unknown brands. Larger brand sets are also more difficult to map because the dimensionality is higher. In other words, it is more difficult to map 50 brands in a two-dimensional space than it is to map 15 brands. This would be true for both survey data and CGD.
Methodology and data
We executed two studies: one using an online survey-based approach and another using unstructured CGD extracted from the Internet. Both online and survey data were collected by Ipsos, on the product category of female skincare in the United States. We studied 57 skincare brands (see Appendix 1 for the full list of brands) within the same time period of October 2016.
Online CGD
To scrape the CGD, we used a methodology that mirrors the user’s experience when using search engines. This allowed us to capture all websites with relevant publicly available data on the product category. As a result, the unstructured CGD were captured from a large variety of websites. These data were cleaned and text-mined so that adequate brand co-mentions could be derived (for details, see Netzer et al., 2012; Vriens et al., 2017). In total, 139k comments were extracted. In previous research (e.g., Netzer et al., 2012), the degree to which brands were co-mentioned was used as a measure of similarity between brands, and we applied the metric used by Netzer et al. (2012). Co-mentions indicate the number of times each pair of brands is mentioned together across comments. The raw number of co-mentions is then adjusted for the total number of mentions each brand has, because popular brands have a high number of mentions and, as a result, a higher number of co-mentions. For example, in our study, the brand Algenist appeared in 8,931 comments, whereas Aveeno (a mainstream brand in the United States) was mentioned in 87,337 comments. Consider Table 1 as a simple example with only four brands:
Table 1. Co-mentions example.
This example shows that simply using the raw co-mention count to measure similarity is insufficient. That is, while Volkswagen and Toyota are co-mentioned more often than Audi and BMW, consumers are far more likely to discuss Volkswagen and Toyota (3,000 total mentions) than Audi and BMW (500 total mentions). To give a fair comparison of co-mentions, we normalize via the lift metric. Lift is the ratio of actual co-mentions to expected co-mentions, as given by the following formula:
$$\mathrm{lift}(A, B) = \frac{P(A, B)}{P(A)\,P(B)}$$
where P(A) is the probability that Brand A is mentioned and P(A, B) is the probability that Brands A and B are mentioned together.
After these adjustments, the matrix of adjusted co-mention measures can be analyzed with standard MDS or other analysis methods. Other metrics can also be used, for example, the Jaccard coefficient or term frequency–inverse document frequency (tf-idf). Netzer et al. (2012) found these to be highly correlated with the lift measure.
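To make the lift normalization concrete, the following is a minimal sketch, assuming each cleaned comment has already been reduced to the set of brands it mentions. The function name `lift_matrix` and the toy comments are hypothetical illustrations and are not taken from the study's pipeline or data.

```python
from collections import Counter
from itertools import combinations


def lift_matrix(comments):
    """Return {(brand_a, brand_b): lift} for every pair of mentioned brands.

    `comments` is a list of sets; each set holds the brands named in one comment.
    """
    n_comments = len(comments)
    mentions = Counter()      # number of comments mentioning each brand
    co_mentions = Counter()   # number of comments mentioning both brands of a pair
    for brands in comments:
        mentions.update(brands)
        co_mentions.update(combinations(sorted(brands), 2))

    lift = {}
    for a, b in combinations(sorted(mentions), 2):
        p_a = mentions[a] / n_comments
        p_b = mentions[b] / n_comments
        p_ab = co_mentions[(a, b)] / n_comments
        lift[(a, b)] = p_ab / (p_a * p_b)  # actual / expected co-mention rate
    return lift


# Hypothetical toy data (not from the study): A and B are popular brands,
# C and D are niche brands.
toy_comments = [
    {"A", "B"}, {"A", "B"}, {"A"}, {"A"},
    {"B"}, {"B"}, {"C", "D"}, {"A", "C"},
]
toy_lift = lift_matrix(toy_comments)
```

In this toy example, A and B are co-mentioned twice and C and D only once, yet the lift of (C, D) is higher because both brands are rarely mentioned overall. A lift matrix of this kind would still need to be converted to dissimilarities before it is passed to an MDS or tree routine; that conversion is not shown here.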
Survey data were collected via online interviews among a sample of 502 U.S. respondents. To qualify, respondents had to be female, between 18 and 49 years of age, and to have used a skincare product within the past 3 months. The average age was 35 years, with a standard deviation of 8.1 years. Survey respondents then completed a similarity exercise. The collected data were averaged to obtain an aggregate-level similarity matrix.
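The averaging step can be sketched as follows. This is an illustration only: it assumes each respondent rated a subset of brand pairs, that unrated pairs are simply absent, and that a pair's aggregate similarity is the mean over the respondents who rated it. The article does not describe the survey design at this level of detail, and the function name and ratings below are hypothetical.

```python
def aggregate_similarity(respondent_ratings):
    """Average pairwise similarity ratings over respondents.

    `respondent_ratings` is a list of dicts, one per respondent, mapping a
    (brand_i, brand_j) tuple to that respondent's similarity rating.
    """
    totals, counts = {}, {}
    for ratings in respondent_ratings:
        for (a, b), score in ratings.items():
            pair = tuple(sorted((a, b)))  # treat (i, j) and (j, i) as one pair
            totals[pair] = totals.get(pair, 0.0) + score
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Hypothetical ratings from two respondents (not study data).
toy_ratings = [
    {("Brand X", "Brand Y"): 6, ("Brand X", "Brand Z"): 2},
    {("Brand Y", "Brand X"): 4},
]
toy_aggregate = aggregate_similarity(toy_ratings)
```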
Data analysis
To create a two-dimensional perceptual map, we use the SMACOF model: a non-metric MDS approach that uses majorization (de Leeuw & Mair, 2009) and is the standard MDS model to analyze similarities. In this model, Kruskal’s Stress-1 is minimized. Stimuli (in this case, brands) are mapped into a lower-dimensional space (in our case, a two-dimensional space) in such a way that the inter-brand distances are, as much as possible, rank-wise in line with the brand similarities.
In our study, we have 57 brands: a much larger brand set than we have encountered in the marketing literature or in the commercial studies we have been involved in. The larger the number of competing alternatives (i.e., brands), the more difficult it is to map these brands into a two-dimensional space. This means that (a) the standard fit metric (Kruskal’s Stress-1) may become unacceptably high (i.e., higher than, say, 0.10), (b) more brands are likely to be positioned in a counter-intuitive area of the map or far from the brands to which they should be close, and (c) in the worst case, we may get a degenerate solution (Buja et al., 2008). We have a large brand set, and we have CGD: both factors can be expected to result in noisier data. Hence, we also included in our analysis a spatial approach that may be better equipped to deal with noisy data and large stimulus sets. We chose t-distributed stochastic neighbor embedding (t-SNE), a method developed by Van der Maaten and Hinton (2008) and Van der Maaten (2014). The technical differences and similarities between the two methods are outside the scope of our article. Conceptually, MDS, in trying to minimize Kruskal’s Stress-1, will seek to keep dissimilar brands far apart (as not doing so would result in steep increases in the Stress-1 value), whereas t-SNE is designed to keep very similar brands close together. We could refer to this difference as global (MDS) versus local (t-SNE) optimization.
Also, it is plausible that with larger brand sets, hierarchical clustering and other tree-based approaches may better reflect the data-generating mechanism. Hodgkinson et al. (1991) analyzed similarity data from three studies: newspapers (16 brands), shops (20 brands), and cereals (15 brands). In all three datasets, hierarchical clustering resulted in a lower Kruskal’s Stress-1 than MDS. They also tested additive trees. Hierarchical clustering (Johnson, 1967; Ward, 1963) relies on the ultrametric inequality: in practical terms, this restricts the branches within a cluster to be of equal length (i.e., same distance to each other). The larger the brand set, the more difficult it is to keep intra-cluster distances the same. The additive tree approach (Sattath & Tversky, 1977) allows intra-cluster distances to differ and is, therefore, more flexible. This feature may result in better results (e.g., a lower Stress-1). Hence, we also analyzed the data using the additive trees approach.
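The difference between the two tree models can be illustrated with a small check on a distance matrix. The sketch below is not a tree-fitting routine; it only tests whether a given matrix satisfies the ultrametric inequality (for every triple, the two largest distances are equal) or the weaker four-point condition that characterizes additive trees (for every quadruple, the two largest of the three pairwise sums are equal). The function names and the two 4 × 4 matrices are hypothetical.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """True if, for every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        sides = sorted((d[i][j], d[i][k], d[j][k]))
        if sides[2] - sides[1] > tol:
            return False
    return True


def is_additive(d, tol=1e-9):
    """True if every distinct quadruple satisfies the four-point condition."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]))
        if sums[2] - sums[1] > tol:
            return False
    return True


# Hypothetical tree: leaves 0 and 1 share one internal node, leaves 2 and 3
# share another; leaf branch lengths 1, 2, 1, 3 and an internal branch of 1.
additive_only = [
    [0, 3, 3, 5],
    [3, 0, 4, 6],
    [3, 4, 0, 4],
    [5, 6, 4, 0],
]
# Hypothetical dendrogram: {0, 1} merge at height 2, {2, 3} at 4, all at 6.
ultrametric = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [6, 6, 4, 0],
]
```

The first matrix passes the four-point check but fails the ultrametric check because its leaf branches have unequal lengths, which is exactly the flexibility that additive trees allow. The second matrix passes both checks.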
Evaluation metrics
We evaluate the various solutions on:
A standard metric to evaluate MDS solutions is Kruskal’s Stress-1 (Kruskal, 1964a, 1964b; see also Groenen & Borg, 2015):
$$\text{Stress-1} = \sqrt{\frac{\sum_{i<j} \left(d_{ij} - \delta_{ij}\right)^2}{\sum_{i<j} \delta_{ij}^2}}$$
where $d_{ij}$ is the dissimilarity between stimuli i and j (in our case, brands i and j; in non-metric MDS, its monotone transformation), and where $\delta_{ij}$ is the Euclidean distance between brands i and j in the map. Kruskal suggested that, as a rule of thumb, Kruskal’s Stress-1 should be <0.10 to be deemed fair or good (Kruskal, 1964a, p. 3; see also Doyle, 1973).
The number of outlier (misplaced) brands. This metric is calculated as follows: we have 57 brands, which means we have 57 × 56 / 2 = 1,596 pairs of brands for which we have a similarity number. Brands with high similarity should lie close together on the map, while brands that are not very similar should be further apart. We identified the outliers as follows: (a) we compute a Shepard diagram, and (b) we compute how far each pair of brands deviates from the monotonic regression line (deviation = [real y value – predicted y value] / real y value). Any pair that has a deviation larger than 2 is considered an outlier.
Visual inspection. We visually interpret both the CGD and survey-based solutions, then assess to what degree these are similar and whether they offer different insights. We limit ourselves to comparing the SMACOF and additive trees methods, as our goal is not to compare analytical methods but to better understand the similarities that arise from different data sources. The results of CGD and survey data are then compared by method (e.g., we compare the CGD SMACOF solution with the survey-based SMACOF solution).
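The first two evaluation metrics can be sketched together. The code below is an illustration under stated assumptions, not the authors' implementation: it takes the y axis of the Shepard diagram to be the map distance, fits the monotonic regression with a plain pool-adjacent-violators routine, uses the absolute value of the relative deviation, and breaks ties in the dissimilarities by input order. The function names and the toy values are hypothetical.

```python
import math


def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def stress_and_outliers(dissim, dist, threshold=2.0):
    """Return (Stress-1, outlier pairs) for one map.

    `dissim` and `dist` map each brand pair to its observed dissimilarity
    and to its distance in the fitted map, respectively.
    """
    pairs = sorted(dissim, key=lambda pair: dissim[pair])
    y = [dist[pair] for pair in pairs]   # "real" y values (map distances)
    y_hat = pava(y)                      # "predicted" y values
    stress1 = math.sqrt(
        sum((a - b) ** 2 for a, b in zip(y, y_hat)) / sum(a * a for a in y)
    )
    outliers = [
        pair for pair, a, b in zip(pairs, y, y_hat)
        if a > 0 and abs(a - b) / a > threshold
    ]
    return stress1, outliers


n_pairs = 57 * 56 // 2  # 1,596 brand pairs, as stated in the text

# Hypothetical subset of pairs: A and D are the most dissimilar pair but
# sit very close together in the map.
toy_dissim = {("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 3, ("A", "D"): 4}
toy_dist = {("A", "B"): 10.0, ("A", "C"): 10.0, ("B", "C"): 10.0, ("A", "D"): 1.0}
toy_stress, toy_outliers = stress_and_outliers(toy_dissim, toy_dist)
```

In the toy data, the pair (A, D) is flagged because its map distance is far smaller than the monotone fit implies, which is the kind of misplacement the outlier count is meant to capture.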
Results
Table 2 shows how well the similarity data can be fitted by the various analysis approaches.
Table 2. The fit of the two-dimensional solutions.
CGD, consumer-generated data; MDS, multidimensional scaling.
First, the spatial methods do not perform well, as evaluated by the general rule of thumb that Kruskal’s Stress-1 should ideally be 0.10 or less (this threshold was recommended by Kruskal, 1964a). The values are not even close to this threshold. Although not shown here, three-dimensional and even four-dimensional solutions still do not give us a good fit. Furthermore, this finding holds true both for solutions based on survey data and for those based on CGD. The tree approaches perform better. The additive trees approach is the only method to perform adequately for the survey data, although for the CGD, it still does not manage to get down to 0.10.
Second, we look at the number of misplaced brands. We can again see that the numbers are, in general, consistent with the Kruskal’s Stress-1 results. Note that under additive trees analysis using survey data, there are zero misplaced brands. The percentage of misplaced brands may not look so bad if we are only interested in an overall perspective on the structure of the market. However, if your client’s brand is among those misplaced, it will be difficult for them to accept the results (e.g., see Vriens, 2012).
Third, we visually inspect the results. We only show SMACOF and additive tree results. SMACOF conclusions were similar to those from t-SNE, and the hierarchical clustering solution was similar to the additive trees solution. Figure 1(a) and (b) show the SMACOF map, as it was the best spatial method, and the additive tree map, as it was the best tree-based solution.

Figure 1. (a) Two-dimensional (non-metric) SMACOF solutions and (b) additive tree solutions.
Comparing the two SMACOF maps, we can see that the two solutions are quite different. For example, Exuviance and Mario Badescu are positioned closely in the CGD map but are quite far apart in the survey-based map. This is an example of a misplaced brand, as the number of co-mentions is very low and hence these brands should not be positioned close together. Also, in the CGD map, Filorga, Biafine, and Lierac are quite separated from the other brands, whereas in the survey map, they are positioned more in the middle. The two additive tree maps look quite different as well. Taking in each solution holistically, it is immediately apparent how dissimilar the two look. Even at the specific brand level, we can spot several differences. For example, Perricone is in a different cluster under CGD than under the survey-based solution. Clean & Clear and Proactiv belong to the fourth CGD cluster, whereas in the survey-based solution, they fall in different clusters.
Conclusion
To our knowledge, this is the first study that compares survey and CGD perceptual brand maps for data collected in the same time period on the same set of brands. We also compared spatial brand maps and tree-based approaches. We collected data on a complex product category, female skincare: a market characterized by many brands, in which both brands and specific products can differ on many marketable attributes. This makes it a challenging market for market research.
We used MDS, t-SNE, hierarchical clustering, and additive trees to analyze the data. First, we found that MDS (SMACOF), t-SNE, and hierarchical clustering were unable to get Kruskal’s Stress-1 down to the level generally used in academia and industry, which is less than 0.10. Although the Kruskal’s Stress-1 values for the spatial solutions were much higher than generally deemed acceptable (e.g., Doyle, 1973), recent studies have pointed out that as the number of stimuli (brands) increases, the usual rule of thumb of 0.10 may not be adequate anymore (e.g., Mair, Borg, & Rusch, 2016). These authors propose methods to derive acceptable thresholds unique to the data at hand. However, the number of outliers in most solutions in our study remains an issue.
Comparing the survey and CGD maps derived from MDS (SMACOF) showed substantial differences. Under the additive trees solutions, we believe the differences between survey and CGD look even starker. This, combined with the fact that the survey tree-based solution resulted in a Stress-1 lower than 0.10, is of interest. Although a Stress-1 level of 0.09 seems to be a good sign, a closer inspection of the results indicates that the tree solution looks too clean. One hypothesis is that respondents engaged in a task simplification strategy, which resulted in predictable similarities (e.g., Bijmolt et al., 1998). Hence, when studying larger brand sets, this is something to be aware of in the survey design.
Implications
We cannot generalize our results yet. Netzer et al. (2012) found a high correlation between their CGD MDS map and an MDS map derived from actual trade-in data, but the correlation between their CGD map and a survey-based map was much lower. In our study, the correlation between CGD and the survey results was very low. CGD can be used as an alternative to surveys. However, we cannot simply assume that the different types of data will give similar insight, or that CGD will always provide a more useful insight. Brand maps from CGD may simply represent one view of the market. Even if the results cannot be considered “representative,” it is still important to know what the brand’s public position is. This is important for two reasons: (a) If this “public” position deviates from the firm’s intended position, then that may be an indication that the firm’s messaging is not working, or that consumers do not care about whatever the firm thinks is a good selling point. For example, Dr. Dennis Gross highlights that their products are cruelty-free (i.e., no animal testing), yet this attribute did not show up significantly in the consumer discussions. (b) Marketers who know what prospective consumers read about the brand, as well as its discussed alternatives, can better inform and adjust their advertising efforts.
We believe CGD may be a better method than the survey-based approach. Netzer et al. (2012) derived a market structure map for the U.S. car market using CGD. This map correlated highly (r = .87) with a market structure map that was based on actual brand switching (trade-in) data but correlated only moderately with a map based on survey data (r = .43). This seems to suggest that CGD-based market structure and brand maps may be more accurate than the usual survey-based brand maps using stated direct similarity data. In situations where we have a large brand set, surveys may induce task simplification. Also, with so many brands, it is inevitable that we are asking respondents to evaluate brands they know little or nothing about, which will further negatively affect the solution. When analyzing large brand sets, one will inevitably find misplaced brands. An added benefit of CGD is that we also capture attributes (associations) from the comments, allowing us to calculate the number of positive, negative, and neutral comments. For example, if we compare Dr. Brandt and Dr. Dennis Gross, we find several interesting results. At first glance, these two brands seem quite similar, both having originated from an academic or medical doctor (physician). However, if we compare both brands in terms of the percentage of positive and negative comments, we find that Dr. Dennis Gross has 30% positive comments and 30% negative comments, whereas Dr. Brandt has 48% positive comments and only 12% negative comments.
Related to the issue of misplaced brands, we recommend the use of additive trees as this method seemed to result in a much lower percentage of misplaced brands. The superior performance of additive trees was evident and consistent with Hodgkinson et al. (1991).
Appendix
The 57 brands used in the study.
| Clean & Clear | ZO Skinhealth | Lierac | Cetaphil | SkinCeuticals | Murad |
| EltaMD | Uriage | Carita | MD Complete | No 7 | GlamGlow |
| Obagi | Biafine | Exuviance | Garnier | Aveeno | Dr. Dennis Gross |
| NeoStrata | Guinot | Clinique | Biore | Juice Beauty | L’Oreal |
| Dermalogica | Vichy | CeraVe | First Aid Beauty | Ponds | Dior |
| SkinMedica | Nuxe Paris | Shiseido | Neutrogena | Mario Badescu | Kate Somerville |
| La Roche-Posay | Filorga | Roc | Proactiv | La Mer | Algenist |
| Avene | Decleor | Fresh | Olay | Peter Thomas Roth | Estee Lauder |
| Ren | StriVectin | Clarins | Lancome | Bliss | Caudalie |
| Dr. Brandt | Perricone MD | Philosophy | Origins |
Acknowledgements
We would like to thank two anonymous reviewers for their productive feedback, and we would like to thank Paige Forde, Kassidy Steyer, and Stephen Brokaw for reviewing an earlier draft and providing valuable feedback. The data were collected with Ipsos. We thank Douwe Rademaker, Sandro Kaulartz, and Craig Rome for their help with the survey and consumer-generated data collection.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
