Abstract
User-generated content on the web serves as a valuable source of information for both companies and consumers. Scholars have analyzed the emotional polarity of reviews to study customer satisfaction, but the dominant factors are not explained accurately by numerical ratings alone or by simplistic categories of emotional polarity. This paper investigates the service attributes and detailed emotions affecting consumer satisfaction using deep learning, to explore how consumption satisfaction is influenced by emotions and which factors arouse particular emotions. First, more than 120,000 online hotel reviews were retrieved. Second, a novel, dataset-based seven-dimensional evaluation system built with the BERT model was proposed. This addresses the problem of polysemous words and can more accurately reflect the service attributes consumers really care about. In particular, the analysis reveals that overall consumer satisfaction is affected by key service attributes including service, cleanliness, equipment, price, location, internet and catering, among which the cleanliness attribute has the greatest impact. Lastly, the latest Kismet emotion recognition method was adopted to identify the emotional polarity and 11 detailed emotions. The regression relationship between emotion and overall satisfaction was also verified, which enabled a more accurate analysis of consumption emotions and satisfaction.
Introduction
The rapid growth of Web 2.0 applications, which empower Internet users and allow two-way communication in tourism, has generated an enormous amount of online user-generated content (UGC) about hotels, travel destinations, and travel services [1]. Many studies have shown that these online reviews constitute an important source of information for consumers and companies [2, 3]. This type of information not only helps consumers make purchasing decisions, but also assists corporate managers in improving the quality of their products and services [4]. Online reviews have numerical attributes as well as text attributes [5], and the text contains much more information than simple overall ratings. By analyzing commentary text, one can understand the product and service attributes that consumers really care about [6].
Traditionally, most studies have used the semantic analysis of text or reviewer-related features to explain consumer satisfaction [7, 8]. Only a handful of studies address sentiment analysis of online reviews, and among those studies the focus is the polarity of emotions (namely positive, negative, or neutral) [9, 10]. However, many studies have shown that a variety of emotional responses occur during consumption, such as joy, excitement, pride, anger, sadness, and guilt [11, 12]. Consumers’ decisions are influenced by the emotions that service or product characteristics produce [13]. Unpleasant affective responses are related to lower satisfaction appraisals, while consumer satisfaction is high when people are happy and cheerful in the process of purchasing or using products [14]. Scholars have noticed that this emotional information is essential to better understand consumers’ satisfaction judgments on particular aspects of a product or service [15]. Online commentary, which is full of people’s emotional expression, has become an important source for research on consumer satisfaction [16, 17]. Companies and potential consumers can read these subjective comments to learn what the public thinks about products and services. Therefore, emotional analysis is an important and innovative method for bridging the relationship between service attributes and consumer satisfaction in order to propose suggestions for improving the product or service experience.
Accordingly, this research explores the service attributes that consumers really care about and their impact on consumer satisfaction from the perspective of emotional analysis of commentary text. This paper makes two contributions to the literature:
Firstly, through data mining and analysis, seven hotel service attributes that consumers really care about are clustered, and a new hotel service evaluation system is constructed from these attributes using the BERT model. Compared with methods that use the word2vec model, the BERT model can handle polysemous words in Chinese during clustering, which makes the text analysis and emotion recognition more accurate. Experiments show that the seven-dimensional evaluation system based on the BERT model more accurately reflects the real feelings of consumers.
Secondly, a novel model that combines long short-term memory (LSTM) with an autoencoder (Kismet) is adopted to identify the overall emotional polarity and 11 detailed emotions in the Chinese language. Using sentiment analysis in this way is an innovative method for bridging the relationship between service attributes and consumer satisfaction. Most known linguistic emotion recognition technologies are limited to one-dimensional emotional judgment and can only identify positive, neutral, or negative emotions, while Kismet can identify and analyze consumers’ multidimensional and detailed emotions, such as happiness, excitement, anger, disgust, anxiety, shyness and so on. Furthermore, the analysis reveals which detailed emotions are aroused by particular service attributes. Based on Kismet and a regression model, the relationship between emotions and consumer satisfaction was also verified.
The paper is organized as follows: Section 2 critically reviews the relevant social sciences and marketing literature. Section 3 illustrates the data and research methodology. Section 4 presents analysis and findings. Section 5 summarizes the conclusions, and discusses the limitations of the study.
Literature review
The basic theory of customer satisfaction and consumption emotion
Customer satisfaction has been discussed in depth and has a variety of definitions [18]. There is broad consensus that customer satisfaction reflects the gap between customer expectations and actual perception. Giese and Cote define satisfaction as “response (emotional or affective) pertaining to a particular focus (product, consumption experience and so on) determined at a particular time (immediately upon purchase, after consumption, based on accumulated experience)” [19]. Zeithaml defines customers’ satisfaction as “the important service attributes and measure customer’s perception of those attributes and overall customer satisfaction” [20]. PZB held that the most significant factor affecting customer satisfaction is not the gap but consumption emotion [21]. Satisfaction was found to be people’s subjective emotional feeling, so it is difficult to define explicitly. As early as 1969, there was a consensus that customer satisfaction depends on the perceived value of the product [22]. Kotler and Levy [22] examined customer perceived value in customer satisfaction research and held that customer satisfaction depends on the perception of product value and the evaluation of the product. Customer perceived value is the overall evaluation of the utility of the product after accounting for the customer’s acquisition cost. Fan [23] held that customer perceived value provides customers with a subjective cognition of the value of an enterprise’s products, and that customer perceived value has five significant characteristics: it is subjective, multidimensional, hierarchical, comparative, and weight-changing.
Menon and Dube pointed out that consumption emotion is the consumer’s emotional response to product attributes and the value of the consumption [24]. Consumption emotion is generated by the consumption of goods and services, in response to the attributes of products and services as well as the customer’s perceived ultimate value [25]. Unlike the related affective phenomenon of mood, it is described by distinctive categories of emotional experience and articulated through expressions of joy, anger, fear, and so on [11]. It is an important factor affecting consumer satisfaction [26]. Consumer behavior researchers have examined the relationship between consumption emotion and satisfaction and recognized the need to go beyond the cognitive component in order to provide empirical evidence for the role of emotions in the situational formation of satisfaction judgments [11]. Burns and Neisner [27] examined the role of emotions in developing customer satisfaction in a retail setting.
It is generally accepted that consumption emotions are divided into positive and negative emotions. Westbrook pointed out that positive and negative emotions are independent, which means that a customer may experience excitement, pleasure and other positive emotions as well as worry, anger and other negative emotions during the same service process [28]. This indicates that human emotion is a complicated psychological process and that different emotions have different impacts on customer satisfaction and on the assessment of service quality.
We therefore hypothesize H1.
H1: Most of the overall satisfaction with a hotel can be explained by the overall consumption emotion with the service.
The impact of hotel features on customer satisfaction
There is no doubt that consumer satisfaction remains at the heart of managers’ and researchers’ concerns. Despite this interest, assessing the exact contribution of hotel attributes to consumer satisfaction remains a challenge [29]. Managers’ and academics’ interest in consumer satisfaction is explained by the numerous positive consequences that satisfied consumers have for a firm or business, such as fewer complaints, positive word-of-mouth, repeated purchase, commitment, behavioral intentions and loyalty [30]. Service attributes have been found to have an impact on consumer satisfaction. For example, Ekinci et al. found that both physical quality and staff behavior had a significant influence on consumer satisfaction, but that staff behavior had the bigger influence [31]. In another study, Taylan Dortyol et al. found that tangibles (for example hotel lawn and green space, resort buildings), food quality, and reliability were the main service quality attributes influencing consumer satisfaction, with tangibles being the most important factor [32]. Brady and Cronin [33] proposed that the factors influencing consumer satisfaction with budget hotels mainly include attitude, behavior, occupation, environment, design, social factors, waiting, and tangible products. The service attitude and behavior of the staff inevitably affect guests’ feelings about staying in the hotel. Customers in different occupations and of different social statuses apply different measures of hotel satisfaction: white-collar customers demand higher-quality service and comfort, while blue-collar workers prefer comfortable facilities. Whether the hotel provides free or self-pay tangible products, such as hair dryers, disposable toiletries, water and other items, directly affects the guest experience. Waiting times and booking efficiency also affect customer satisfaction.
Analyzing the content of 343 TripAdvisor sites, Lu and Stepcheneva found that key hotel characteristics, including environment, eco-friendliness, room, bathroom decor, customer service, and food quality, all affect customer satisfaction. They listed the factors that customers were not satisfied with, including environment, noise, bathroom facilities, room amenities, insect problems, booking process, management policy, natural landscape and service attitude [34]. More specifically, there were eight customer-rated attributes on the site’s evaluation platform: cleanliness, service and staff, room comfort, hotel facilities, convenience of location, neighborhood, value and room quality. The first four are key attributes of a hotel. Consumers’ satisfaction with their experience of these attributes can be determined by emotional factors such as satisfaction or dissatisfaction [35]. Therefore, we assume that the key attributes that have the greatest impact on customer satisfaction are cleanliness, service and staff, room comfort, hotel conditions, and so on.
We therefore hypothesize H2.
H2: Most of the overall satisfaction with a hotel can be explained by the consumption emotions associated with the critical service attributes of the hotel.
Sentiment analysis of online comments
As one of the most active research topics in natural language processing, emotion analysis refers to the analysis of online comment texts to determine whether the emotional polarity of the text is positive, negative or neutral, or to identify whether the user’s opinion is in favor or against. The technology is widely used to predict product sales, political votes, box office receipts and so on. The essence of emotion analysis is to infer whether words are positive or negative based on known words and emotional symbols [36]. A good command of emotion analysis can greatly improve how efficiently people understand a subject, and its conclusions can be applied to other tasks. For example, many fund companies use people’s opinions and attitudes towards a company, an industry, or an event to predict the future rise and fall of stocks. Tumasjan [37] used Twitter sentiment analysis to predict election results, and Asur [38] applied sentiment analysis to Twitter data, movie reviews and blog text to predict movie box office receipts.
Current research on consumption emotion mostly divides emotions into positive and negative polarity, but human emotions are very complex, and this simple binary classification may not be sufficient for a thorough study of consumer emotions [39]. In addition, many users’ comments contain mixed opinions: they affirm one point and criticize other aspects at the same time, which makes it difficult to judge the final emotion. Coarse-grained analysis, which only tells whether the overall emotional tendency of the comment is positive or negative, cannot accurately judge the real emotional attitude of users [40].
As an emerging research area in natural language processing, sentiment analysis commonly uses online commentary texts to determine the emotional polarity of the text [41]. Sentiment analysis mainly includes three research levels: subjective and objective classification, emotional orientation classification (polarity classification) and emotional intensity classification [42]. The study methods are mainly divided into two categories [43]: (1) research based on emotional word annotation (the semantic method), which mainly relies on an emotional dictionary and part-of-speech templates for part-of-speech tagging and judges the category of a document by analyzing features of emotionally colored words; and (2) research based on machine learning or deep learning, which mainly treats sentiment analysis as a supervised classification problem of emotional polarity (or degree), constructing classifiers by selecting appropriate emotional features in order to realize emotional classification [44].
With the continuous improvement of artificial intelligence, deep learning has gradually become the main technology used for sentiment classification research. Huang [45] used a weakly supervised machine learning method based on bootstrapping to describe the characteristics and specifications of user reviews through automated text pattern extraction and sentiment analysis; this can be regarded as a semi-automatic method. Song [46] proposed an unsupervised method for automatically identifying evaluation objects that does not depend on external resources and that extracts the product name and product attributes from the evaluation objects. Na Rishang [47] built fuzzy models of product review evaluations and emotions, established a fuzzy corpus of consumer evaluations and emotions, and proposed a new method for comprehensive product evaluation and emotion calculation. Kim [48] proposed a new sentence model using convolutional neural networks for sentiment analysis tasks and achieved good results on multiple data sets. Socher [49] proposed several recursive neural network models, such as RNN, MRNN, and RNTN, in which the syntactic structure of the text is taken into account; the results obtained at the sentence and phrase levels were much higher than those of the reference system. Considering the sequential information between the words in the text, Tai [50] used the more complex long short-term memory (LSTM) network for sentiment analysis. Liang Jun [51] proposed a content extension method for Weibo that combines a post and its comments into a microblog conversation for sentiment analysis. Wang et al. [52] used a GRU-based deep learning algorithm for e-commerce text sentiment analysis to analyze user sentiment on Taobao.
The above review shows that sentiment analysis methods have gradually moved from LSTM to GRU. Building on the methods adopted by these predecessors, this paper calculates the emotional index using the BI-LSTM+CRF model to obtain the emotional score corresponding to each feature, and then analyzes the results further. Consequently, we hypothesize H3:
H3: Most of the overall satisfaction with a hotel can be explained by the detailed consumption emotion with the service.
Research methodology
The research steps are divided into four parts: data acquisition and preprocessing, text clustering, emotional processing of text, and data analysis. The research framework of this paper is shown in Fig. 1.

A flow chart of research methodology.
Because the original evaluation indexes of four service attributes cannot reflect the service content that consumers really care about, we cluster seven new evaluation indexes of service attributes through text mining and analysis, and generate a new evaluation system for hotel consumer satisfaction. In clustering, we use the BERT model to generate word vectors, which greatly improves the accuracy of text recognition and effectively extracts the high-frequency words that represent service attributes. In addition, in order to determine more complex and detailed consumption emotions in Chinese text, this study uses the Kismet emotion analysis technology (an emotion recognition model based on long short-term memory and an autoencoder) to calculate the probability distribution of 11 emotions for each key service attribute dimension mentioned in each online comment. We try to find the emotional performance of consumers on each service attribute, and compare the impact of these emotions on consumer satisfaction.
Modified “jieba” Chinese word segmentation
Currently, software that provides Chinese word segmentation includes Boson NLP’s free and open Chinese semantic platform, the Chinese lexical analysis system developed by the Chinese Academy of Sciences (ICTCLAS, or NLPIR), and jieba, which is supported in Python and R. Jieba, a Python component for Chinese word segmentation, comes with a dictionary called dict.txt, which contains more than 20,000 words together with their occurrence counts (which the author trained on sources such as the People’s Daily corpus) and parts of speech. We compare a small number of manually compiled words with the jieba dictionary and add the unregistered words to the jieba segmenter. The modified jieba segmenter is then used to tag the part of speech of each word in every sentence of every comment. Each comment is then stored in a structure such as [("I", "n"), ("love", "v"), ("China", "n")], in which each word is paired with its part of speech.
Statistical sorting
In order to identify high-frequency words, we ranked the words of the comment text after word segmentation, calculating the priority of each word according to how frequently it occurs across sentences. Through this text analysis, high-frequency words including nouns, verbs, adjectives and other parts of speech can be identified for subsequent analysis.
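The ranking step can be sketched in a few lines of Python. This is a minimal illustration, assuming that priority is simply the raw occurrence count; the sample comments, the English tokens, and the helper name `rank_words` are hypothetical and are not taken from the study's data or code.

```python
from collections import Counter

# Hypothetical segmented comments in the (word, part_of_speech) structure
# described above; tokens and tags are illustrative, not real jieba output.
tagged_comments = [
    [("location", "n"), ("good", "a"), ("service", "n"), ("good", "a")],
    [("breakfast", "n"), ("ordinary", "a"), ("service", "n"), ("warm", "a")],
    [("location", "n"), ("convenient", "a"), ("service", "n")],
]


def rank_words(comments, pos=None):
    """Return (word, count) pairs sorted by descending frequency.

    If pos is given, only words carrying that part-of-speech tag are counted.
    Assumption: priority equals the raw occurrence count.
    """
    counts = Counter(
        word
        for comment in comments
        for word, tag in comment
        if pos is None or tag == pos
    )
    return counts.most_common()


# Rank only the nouns, since service attributes are later drawn from nouns.
top_nouns = rank_words(tagged_comments, pos="n")
print(top_nouns)
```

Filtering by tag at counting time keeps a single pass over the corpus and makes it easy to produce separate rankings for nouns, verbs, or adjectives.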
Generating word vectors with BERT
Generally, scholars use word2vec to generate word vectors. However, word2vec assigns each word a single vector regardless of context, so it cannot resolve polysemous words; important information is easily lost and errors are introduced into the results. BERT addresses this problem. BERT is a bidirectional language model published by Google in 2018. It considers the context on both sides of a word and can therefore distinguish between the various meanings of that word. The corpus used for BERT pre-training is very large, which greatly improves the accuracy of the text analysis.
Clustering
The word vectors are clustered using the K-means clustering algorithm, a typical clustering algorithm. A loss, also known as the sum of squared errors (SSE), can be calculated at each step of the K-means algorithm. The loss is calculated by summing the squared distances between each point and the centroid of its cluster.
Specify a maximum for the number of possible clusters. Then increase the number of clusters K from 1 to this maximum and calculate the SSE for each K. By drawing the K-SSE curve and finding the inflection point on the way down, the value of K can be determined. For example, if K is 7, there are 7 clusters. Note that manual judgment is still required when classifying and naming the high-frequency nouns.
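The K-SSE procedure can be illustrated with a small, dependency-free sketch. It is not the study's implementation: the 2-D toy points stand in for the 128-dimensional word vectors, the deterministic farthest-point initialisation is an assumption made to keep the run reproducible, and choosing the elbow by the largest second difference of the SSE curve is one possible heuristic (the paper reads the inflection point from the plot).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Farthest-point initialisation (deterministic; an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans_sse(points, k, iterations=50):
    """Run Lloyd's K-means and return the final sum of squared errors (SSE)."""
    centroids = init_centroids(points, k)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Toy data: three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]

max_k = 6
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, max_k + 1)}

# Elbow heuristic: the K at which the drop in SSE slows down the most.
ks = sorted(sse_by_k)
elbow = max(
    ks[1:-1],
    key=lambda k: (sse_by_k[k - 1] - sse_by_k[k]) - (sse_by_k[k] - sse_by_k[k + 1]),
)
for k in ks:
    print(k, round(sse_by_k[k], 2))
print("elbow at K =", elbow)
```

On real word vectors the curve is rarely this clean, so the numeric heuristic is best treated as a starting point that is confirmed by inspecting the plot.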
Visualization
Since each ranked word is represented by a 128-dimensional vector, the dimensionality must be reduced in order to visualize the data. t-SNE is a machine learning algorithm for nonlinear dimensionality reduction that is well suited to reducing high-dimensional data to two or three dimensions for visualization.
Emotional analysis based on the Kismet emotion recognition system
Currently, most known technologies for linguistic emotion recognition are limited to one-dimensional emotion judgment, namely positive, neutral, and negative emotions, and cannot effectively judge multidimensional human emotions such as pleasure, nervousness and shyness. The Kismet emotion recognition system, which combines long short-term memory (LSTM) with an autoencoder to identify emotions, can fill this gap. It recognizes the multidimensional emotions in language and thereby overcomes this deficiency of existing artificial intelligence technology.
In this study, we adopted the Kismet emotion recognition method (Patent NO.CN 106598948 A, Fig. 2). It can recognize various complex human emotions, such as pleasure, shame and anger, through text analysis. Instead of the traditional supervised training of a single deep neural network, the method introduces additional neural network layers that are trained step by step and then combined. In this way it fully exploits the grammatical relationships implicitly captured from the original data, and can effectively identify the many kinds of complicated emotions carried in Chinese text.

The emotion recognition flow chart.
The emotion recognition method, based on long short-term memory (LSTM) neural networks and an autoencoder, includes the following steps:
Step A: Collect a large amount of text with positive, negative, and neutral emotion labels.
Step B: Feed the data into a two-layer neural network to build word embeddings, with an embedding dimension in the range 150–200.
Step C: Feed 10%–20% of the word-embedded data from Step B into a two-layer LSTM neural network for training, to obtain positive, negative, and neutral emotion labels for the first time.
Step D: Apply the model trained in Step C to the remaining 80%–90% of the word-embedded data to predict positive, negative, and neutral emotions, obtaining the corresponding emotion labels and the probabilities P+, P- and P*, where P+ is the positive emotion probability, P- is the negative emotion probability, P* is the neutral emotion probability, and P+ + P- + P* = 1.
Step E: Feed 10%–15% of the emotion probabilities P+, P- and P* obtained in Step D, together with 10%–15% of the untrained 80%–90% of the word-embedded data, into a five-layer autoencoder neural network for unsupervised training. The five-layer autoencoder is defined in Equation 1 and trained by stochastic gradient descent (SGD).
Step F: Feed the remaining 85%–90% of the emotion probability data obtained in Step D into the five-layer autoencoder trained in Step E to obtain recombined features. The values of all hidden neurons in the middle layer of the autoencoder are taken as the input data for the next step. Because the number of hidden neurons in the middle layer is far smaller than the word embedding dimension, the model performs dimensionality reduction.
Step G: Divide the remaining 85%–90% of the emotion probability data obtained in Step D, the text data collected in Step A, and the middle-layer hidden neuron values from Step F into three groups according to positive, negative, and neutral emotion. Each group corresponds to its own two-layer LSTM neural network and is trained separately in that network to produce the detailed emotion recognition result. For example, if the emotion tag of a data item is happy, the item is assigned to the two-layer LSTM network corresponding to positive emotion, and the emotion probability is obtained after training.
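The grouping in Step G can be pictured with a short sketch. It only illustrates the routing idea and is not the patented implementation: the function name `route_by_polarity` is hypothetical, and routing each sample to the group with the highest of the three Step D probabilities is an assumption, since the text states only that the data are divided by polarity.

```python
def route_by_polarity(p_pos, p_neg, p_neu, tol=1e-6):
    """Choose which of the three polarity-specific networks receives a sample.

    The three probabilities correspond to P+, P- and P* from Step D and must
    sum to 1. Assumption: the sample goes to the group with the highest
    probability.
    """
    if abs(p_pos + p_neg + p_neu - 1.0) > tol:
        raise ValueError("P+, P- and P* must sum to 1")
    probabilities = {"positive": p_pos, "negative": p_neg, "neutral": p_neu}
    return max(probabilities, key=probabilities.get)


# Hypothetical Step D outputs for three comment units.
samples = [(0.7, 0.2, 0.1), (0.1, 0.6, 0.3), (0.2, 0.2, 0.6)]
groups = [route_by_polarity(*s) for s in samples]
print(groups)
```

Each group would then be passed to its own two-layer LSTM network, which is outside the scope of this sketch.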
We used linear regression models to test H1, H2, and H3 (see Table 1). A linear regression model can reveal the relationship between the dependent variable and the independent variables. In every model, the dependent variable was overall consumption satisfaction. H1 tests the association between overall consumption satisfaction and overall emotion polarity. H2-1 to H2-7 test the associations between overall consumption satisfaction and the sentiment scores of the hotel service attribute dimensions generated by BERT and K-means. H3-1 and H3-2 test the associations between overall consumption satisfaction and the 11 detailed emotions.
The description of regression models
Data collection and preprocessing
On April 28, 2019, we collected data covering January 2016 to April 2019 from Ctrip (http://www.ctrip.com) using Locoy Spider (http://www.locoy.com). Ctrip is the largest comprehensive online travel service community and the most influential online travel review website in China. Consumers are only allowed to post reviews after actual consumption.
The data we collected are online reviews of four major economy hotel chains in China (Rujia Hotel, Hanting Hotel, Jinjiang star Hotel and the SVN Hotel) located in four different cities (Beijing, Hangzhou, Xiamen and Guilin). The sampling criterion was that a hotel lies within 3 km of a scenic spot in the city.
We needed to collect three aspects of the data: (1) the numerical aspect, including the overall rating of the hotel and the original ratings on the four dimensions (location, facilities, service and cleanliness) that Ctrip set beforehand; (2) the review text of each online comment; and (3) the ID of the reviewer and other additional information, such as details of the room type the reviewer booked.
As raw data, we obtained 120,452 online reviews related to 56 hotels. The raw data contain all kinds of noise and junk information, such as colloquial expressions, Internet slang, popular phrases, date labels and emoticons. Text preprocessing is therefore necessary to sort out the customer comment text and remove the junk information that does not help text analysis and classification.
For the Chinese text corpus of this study, the cleaning process is as follows. We define validation rules, such as the time format (2020Y-11M-11D), non-empty evaluation items, and the absence of garbled characters. We use Excel 2010 for semi-manual screening and standardization, such as sorting the evaluation content to find abnormal data and empty values. For each store, we use standardized coding. We find that the evaluation time is not always in a standard time format; some entries contain extra characters, which are trimmed using Excel’s substring functions. Some records have evaluation content but an empty score, and these need to be handled separately.
After the cleaning process, we preprocess the data: we convert traditional Chinese characters to simplified characters, remove uninformative tokens (including punctuation, numbers, conjunctions, some nonsense words and emoticons), filter out the meaningless date tags, and correct misspelled words.
After data cleaning and sorting, the number of comments without missing values was 116,891.
Descriptive statistics of key hotel service attributes and 12 emotions
Descriptive statistics of the clustered hotel service attributes
The one thousand words with the highest priority are selected for further analysis. The pre-trained BERT model is used to generate a 128-dimensional word vector for each of these 1,000 ranked words. Because this study needs the high-frequency nouns related to the service attributes that concern consumers, 228 nouns were manually selected, and those with repeated meanings were deleted, leaving 168 nouns. The top 60 most frequently used nouns in hotel reviews are then selected. The word cloud (see Fig. 3) shows that consumers are concerned about location, service, sanitation, traffic, front desk, reception, price, breakfast and so on.

Word cloud of the top 60 high-frequency nouns.
As shown in Fig. 4, we consider values of K in the range 4–10, and the clustering effect is best when K = 7 (the inflection point of the elbow diagram is at K = 7). With K = 7, the clustering algorithm produces seven large categories. The algorithm groups together the words that it considers to belong to the same category, but it does not tell us the meaning of each category. We therefore analyze the words within each of the seven categories, and then summarize and name the categories. In other words, following the basic theory discussed in Section 2, we cluster the 168 words into 7 categories, which can be named service, catering, equipment, cleanliness, location, price and internet (Table 2).

K – SSE graph, the inflection point occurs when K is equal to 7.
Description of 7 categories of service attributes
We then visualize the result using t-SNE, reducing the 168 words from 128 dimensions to 2 dimensions (Fig. 5). Fig. 5 shows that the 168 words are grouped into 7 different colors, forming 7 categories.

Visualization of clustering.
Lastly, the opinions of each comment will be extracted according to the 7 categories of keywords existing after clustering.
After semantic analysis of the comment text, 174,058 effective comment units were obtained, and we used Kismet to identify the emotion of each opinion in each comment. In the sentiment polarity analysis, the distribution of emotional polarity was calculated (Table 3): positive emotions are defined as scores above 55 points, negative emotions as scores below 45 points, and neutral emotions as scores between 45 and 55 points (both inclusive).
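The polarity cut-offs stated above translate directly into a small labelling function. The following is a minimal sketch; the function name `polarity_label` and the sample scores are hypothetical, and only the thresholds (above 55, below 45, 45 to 55 inclusive) come from the text.

```python
from collections import Counter


def polarity_label(score):
    """Map a 0-100 sentiment polarity score to a polarity label."""
    if score > 55:
        return "positive"
    if score < 45:
        return "negative"
    return "neutral"  # 45 <= score <= 55, both bounds inclusive


# Hypothetical polarity scores for a handful of comment units.
scores = [92.0, 55.0, 45.0, 44.9, 55.1, 10.0, 50.0]
distribution = Counter(polarity_label(s) for s in scores)
print(distribution)
```

Counting the labels over all comment units gives the kind of polarity distribution summarized in Table 3.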
Polarity distribution of emotion
In addition, the probabilities of 11 detailed emotions are recognized: “happy”, “shy”, “exciting”, “love”, “optimism”, “disgust”, “sad”, “fear”, “angry”, “surprise” and “anxiety”.
We finally obtain 12 emotional scores (one overall polarity and 11 detailed emotions) for each comment on each of the seven dimensions: service, equipment, catering, location, cleanliness, price and internet (Fig. 6).

Sample screenshot showing sentiment analysis.
In every model, the dependent variable was the overall rating score, which ranged from 1 to 5 and represented overall consumption satisfaction. Ordinary least squares (OLS) was suitable because the dependent variable was treated as continuous.
Regression analysis of sentiment polarity of consumer emotion(x) and consumer satisfaction(y)
After further screening, 109,736 comment units with corresponding emotional scores were obtained to test H1. The total rating score represents consumer satisfaction and is treated as a continuous variable ranging from one to five. Consumer emotion is expressed by the sentiment polarity value calculated in Section V-A, a continuous variable ranging from 0 to 100. Both the t statistic and the F statistic are significant at the 1% level, so H1 is supported. This indicates that the sentiment polarity value of consumer emotion is positively correlated with consumer satisfaction. That is, when consumer comments show positive emotional polarity on a certain dimension, overall satisfaction is relatively high; when they show negative emotional polarity on a certain dimension, overall satisfaction is low (see Table 4).
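For readers who want to see the mechanics of this test, the following sketch fits a simple OLS regression of rating on polarity and computes the t and F statistics. It is an illustration on made-up toy numbers, not the paper's data, and the function name `simple_ols` is hypothetical. Converting the statistics into p-values requires the t and F distributions, which is left to a statistics package.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares and return
    the coefficients, R^2, the slope's t statistic and the model F statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    mse = sse / (n - 2)  # residual variance with n - 2 degrees of freedom
    if mse == 0:  # perfect fit: both statistics are unbounded
        t_stat = f_stat = math.inf
    else:
        t_stat = slope / math.sqrt(mse / sxx)
        f_stat = (sst - sse) / mse
    return {"slope": slope, "intercept": intercept, "r2": 1 - sse / sst,
            "t": t_stat, "f": f_stat}


# Hypothetical toy data: polarity scores (0-100) and overall ratings (1-5).
polarity = [20, 35, 48, 52, 60, 75, 88, 95]
rating = [1, 2, 3, 3, 4, 4, 5, 5]
fit = simple_ols(polarity, rating)
print({name: round(value, 4) for name, value in fit.items()})
```

With a single regressor, the F statistic equals the square of the slope's t statistic, so the two tests agree.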
Regression analysis
As mentioned above, the hotel service attributes on which consumers really focus are service, catering, equipment, cleanliness, location, price and internet. The correlations among these newly generated dimensions were investigated (Table 5). The correlation coefficients are all less than 0.1, and most of them are positive. By contrast, for the four original dimensions provided by the Ctrip website (cleanliness, equipment, location and service; Table 6), the largest correlation coefficient is 0.82 and the smallest is 0.71. The low correlations among the new dimensions suggest that they reflect customers’ actual attitudes towards distinct aspects of the hotel more accurately, while the high correlations among the original dimensions indicate that the original evaluation model does not divide its dimensions reasonably.
Correlation coefficient of the new evaluation system
Correlation coefficient of the original evaluation system
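The comparison above rests on pairwise Pearson correlation coefficients between dimension scores. A minimal sketch of that computation follows; the three columns of scores are invented for illustration, and `pearson` and `correlation_matrix` are hypothetical helper names.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Return {(name_i, name_j): r} for every pair of named columns."""
    names = list(columns)
    return {(p, q): pearson(columns[p], columns[q]) for p in names for q in names}


# Hypothetical polarity scores of five comments on three dimensions.
scores = {
    "cleanliness": [80, 60, 40, 70, 55],
    "service": [75, 65, 50, 60, 58],
    "price": [50, 52, 49, 51, 50],
}
matrix = correlation_matrix(scores)
for (p, q), r in matrix.items():
    if p < q:  # print each unordered pair once
        print(f"{p} vs {q}: r = {r:.2f}")
```

Coefficients near zero indicate that two dimensions capture largely separate information, whereas coefficients near 1 indicate that they overlap.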
We further explored the relationship between emotion and consumer satisfaction in each dimension. Testing H2-1 to H2-7 showed that the regressions for all seven dimensions are significant at the 1% level. Ranked by their degree of influence on consumers’ overall satisfaction, the seven dimensions are cleanliness, service, price, internet, equipment, catering and location (see Table 7).
The degree of influence of the seven dimensions
As shown in Table 7, if consumers’ emotional polarity for cleanliness increases by 1%, their overall satisfaction increases by 1.6%. If their emotional polarity for service increases by 1%, their overall satisfaction increases by 1.4%. If their emotional polarity for price increases by 1%, their overall satisfaction increases by 1.1%. If their emotional polarity for the internet increases by 1%, their overall satisfaction increases by 0.96%. If their emotional polarity for equipment increases by 1%, their overall satisfaction increases by 0.91%. If their emotional polarity for catering increases by 1%, their overall satisfaction increases by 0.78%. On a 5-point scale, these seemingly small increases can lead to significant changes in performance. For example, if the overall evaluation score is 4.5 and the emotional polarity for cleanliness increases by 1%, overall satisfaction increases to 4.572.
It can be seen that users of budget hotels pay the most attention to cleanliness, since their travel purposes are mainly mass tourism and business trips. Service is the next most critical source of the consumption experience, and price is the third key point to which consumers pay attention.
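The worked example can be reproduced with a few lines of arithmetic. The sketch below takes the percentage effects quoted in the text at face value and assumes that they scale linearly with the size of the polarity gain; the function name `projected_rating` is hypothetical, and location is omitted because its effect is not quoted in the text.

```python
# Percentage change in overall satisfaction per 1% gain in emotional polarity,
# as quoted in the text for six of the seven dimensions.
EFFECT_PCT = {
    "cleanliness": 1.6,
    "service": 1.4,
    "price": 1.1,
    "internet": 0.96,
    "equipment": 0.91,
    "catering": 0.78,
}


def projected_rating(current_rating, dimension, polarity_gain_pct=1.0):
    """Rating after a polarity gain on one dimension, assuming linear scaling."""
    relative_change = EFFECT_PCT[dimension] * polarity_gain_pct / 100
    return current_rating * (1 + relative_change)


for dimension in EFFECT_PCT:
    print(f"{dimension}: 4.5 -> {projected_rating(4.5, dimension):.3f}")
```

For cleanliness this gives 4.5 × 1.016 = 4.572, matching the figure in the text.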
Eleven detailed emotions (“happy”, “shy”, “exciting”, “love”, “optimism”, “disgust”, “sadness”, “fear”, “angry”, “surprise” and “anxiety”) were incorporated into the regression model as independent variables. After records with missing data were handled, 109,308 valid observations remained. The correlations among the eleven variables, emotional polarity X and consumer satisfaction Y were then computed (Fig. 7).

Statistical analysis of the correlation among 11 detailed emotions, emotional polarity X and consumer satisfaction.
As can be seen from Fig. 7, anger, anxiety and surprise are highly negatively correlated with emotional polarity; happiness, love and optimism are highly positively correlated with emotional polarity; and the correlation coefficients among the independent variables are very large, which indicates multicollinearity. To address this problem, we used stepwise regression.
By comparing goodness of fit, we found that the model in which disgust, sad, shy, anger, surprise, happy, exciting and love all have significant effects on consumer satisfaction had the better goodness of fit. In addition, anxiety and optimism exhibit multicollinearity, and the influence of fear on consumer satisfaction is not significant and does not increase the goodness of fit. Using these eight variables as the basic explanatory variables, the regression analysis produced the results shown in Table 8.
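The text does not specify which stepwise variant was used. As one common possibility, the sketch below performs forward selection on adjusted R²: at each step it adds the variable that raises adjusted R² the most and stops when the gain falls below a threshold. The data are synthetic, with a made-up "optimism" column that nearly duplicates "happy" and a "fear" column that is pure noise; all names, coefficients and the `min_gain` threshold are assumptions for illustration.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def adjusted_r2(columns, y):
    """Adjusted R^2 of an OLS fit (with intercept) of y on the given columns."""
    n, p = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    sse = sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - (sse / (n - p - 1)) / (sst / (n - 1))


def forward_stepwise(features, y, min_gain=1e-3):
    """Greedy forward selection: keep adding the best variable while the
    adjusted R^2 improves by at least min_gain."""
    selected, best = [], 0.0  # the intercept-only model has adjusted R^2 = 0
    remaining = list(features)
    while remaining:
        scores = {name: adjusted_r2([features[s] for s in selected + [name]], y)
                  for name in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] - best < min_gain:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected, best


# Synthetic data: the rating depends on happy and disgust only.
rng = random.Random(0)
n_obs = 200
happy = [rng.random() for _ in range(n_obs)]
disgust = [rng.random() for _ in range(n_obs)]
fear = [rng.random() for _ in range(n_obs)]
optimism = [h + rng.gauss(0, 0.1) for h in happy]  # nearly collinear with happy
rating = [3 + 1.5 * h - 2.0 * d + rng.gauss(0, 0.05) for h, d in zip(happy, disgust)]

features = {"happy": happy, "disgust": disgust, "fear": fear, "optimism": optimism}
selected, best_adj_r2 = forward_stepwise(features, rating)
print(selected, round(best_adj_r2, 4))
```

In this toy setting the collinear and irrelevant columns add almost nothing once the informative variables are in the model, so the procedure leaves them out, which is the behaviour stepwise regression is relied on for here.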
Basic multivariate regression analysis
As shown in Table 8, the goodness of fit improved and the signs of the regression coefficients are consistent with the signs of the correlation coefficients, indicating that the multicollinearity problem was resolved. Therefore, eight emotions are included in the regression equation. Among them, the coefficients of disgust, sad, shy, anger and surprise on consumer satisfaction are –2.29, –1.63, –0.78, –0.69 and –0.51, respectively. The coefficients of happy, exciting and love on consumer satisfaction are 1.28, 0.58 and 0.24, respectively. The regression coefficients in all equations were significant, and the hypothesis was verified.
Through text mining and emotion recognition technology, as well as the regression analysis method, this study created a new hotel service evaluation dimension system, and specifically analyzed how the emotions generated by seven hotel service characteristics affect consumer satisfaction.
Firstly, since the simple average total score does not reflect which dimension has the greatest impact on consumer satisfaction, we improved the original consumer satisfaction evaluation index designed by the OTA (Online Travel Agency) website. Through analysis of the review text, we found that the existing four evaluation indices cannot thoroughly describe the service attributes about which consumers actually care. Therefore, we used the latest word-vector model, BERT, and a deep-learning-based clustering method to determine the seven service attributes about which consumers really care. Whereas the original evaluation system included “location, room, price and cleanliness”, our analysis of the UGC revealed seven hotel service attributes with which consumers are concerned: cleanliness, service, price, internet, equipment, catering and location. Investigation of the correlations among the seven evaluation dimensions obtained from clustering showed that the correlation coefficients of the new dimensions are more reasonable than those of the old dimensions, which indicates that the seven-dimension evaluation system more accurately reflects the real feelings of consumers.
Secondly, we found a positive correlation between the polarity values of consumer sentiment and consumer satisfaction. We calculated the degree to which the overall emotional polarity generated by each service attribute affects overall consumption satisfaction. If consumers’ emotional polarity for cleanliness increased by 1%, their overall satisfaction increased by 1.6%; if their emotional polarity for service increased by 1%, their overall satisfaction increased by 1.4%; if their emotional polarity for price increased by 1%, their overall satisfaction increased by 1.1%. The dimension with the greatest impact on overall satisfaction with budget hotel service was cleanliness, for example whether the cleaning was done well, whether the rooms were smelly, whether the quilts and beds were clean, and whether there were mosquitoes or cockroaches. The primary function of budget hotels is to enable consumers to sleep and rest well, rather than to provide enjoyment and leisure [49]. Therefore, the cleaner the hotel, the more satisfied the customers, and cleanliness became the most important factor for consumer satisfaction. In addition, service and price have a large impact on consumer satisfaction. Budget hotel customers are more sensitive to price, so budget hotels should pay more attention to price. There is a gap between what consumers actually pay and what they are willing to pay, which in economic theory is called the consumer surplus [53]. High price elasticity means that even a small price concession greatly increases the customer’s utility, positively affecting overall satisfaction. These three factors had the largest impacts on consumer sentiment. Budget hotels should focus on cleanliness and service, strive to keep rooms clean, strengthen hygiene management and improve service attitude.
Thirdly, we identified the relationship between detailed emotions and customer satisfaction, as well as the causes of particular emotions. Eight emotions are significant in the regression equation. Among them, the regression coefficients of disgust, sad, shy, anger and surprise on consumer satisfaction are –2.29, –1.63, –0.78, –0.69 and –0.51, respectively. The regression coefficients of happy, exciting and love on consumer satisfaction are 1.28, 0.58 and 0.24, respectively. The results indicate that hotels should pay more attention to the factors leading to the eight detailed emotions, especially happy and disgust. If consumers’ happy emotion increased by 1%, their overall satisfaction increased by 1.28%; if their disgust emotion increased by 1%, their overall satisfaction decreased by 2.29%. Furthermore, our research revealed the main causes of the different kinds of emotion. For instance, the top five sources of “disgust” are broken TVs, inferior slippers, bad air conditioning, smelly toilets and dirty floors, while the top five sources of “happy” are comfortable sheets, satisfactory quilts, a pleasant service attitude, a complimentary breakfast buffet and free parking. Economy hotels should therefore focus on cleanliness, service and the quality of the facilities to create positive emotions and improve the consumption experience.
The main limitation of this paper is that the sample selection was not comprehensive enough. When acquiring the hotel review data, we restricted the sample to four major economy hotels located within 3 km of major tourist attractions in four different cities. Because location was thus effectively held fixed, its influence in the analysis became less important. Studies have shown that customers focus on different aspects of a hotel depending on the purpose of their search. In future research, we will expand the scope of data acquisition and strive for a comprehensive comparative analysis.
