Abstract
Traditional customer satisfaction research considers satisfaction judgments invariant to temporal distance. We conduct two experiments and a field study to show that the amount of time elapsed between a service consumption experience and its evaluation influences satisfaction judgments. We show that consumers rely on concrete attributes to represent near-past (NP) experiences and on abstract attributes to represent distant-past (DP) experiences (i.e., different construal levels). The findings indicate that construal mechanisms generate intertemporal shifts in the importance of the attributes driving satisfaction over time (Study 1), in the weights assigned to abstract and concrete attributes of a past service experience (Study 2), and in overall satisfaction judgments when abstract and concrete attributes perform differently (Study 3). Overall, the results provide support for the idea that satisfaction judgments shift over time as a result of the different psychological mechanisms that are activated as a function of the time elapsing between the service experience and its evaluation. Managers are advised to adopt longitudinal approaches to customer satisfaction measurement: An immediate assessment to capture customers’ evaluations of the performance of the concrete details of the experience and a delayed assessment to measure customer satisfaction with more abstract and goal-related features of the experience.
Measuring customer satisfaction has become common practice for service industries, and there are many different ways to collect satisfaction ratings. Mobile telecommunication service providers survey customers about their satisfaction with a call center immediately after they interact with a customer support team. Health care organizations often administer a satisfaction survey to patients at the end of their hospitalization. Other service providers, such as banks, adopt a slightly different approach, contacting their customers at discrete time intervals (typically months or even years apart) to measure their satisfaction with past service experiences. By measuring customer satisfaction immediately after service use, service providers are likely to maximize the response rate and accuracy of the recalled details of the service experience. Conversely, by measuring satisfaction after a delay, service providers are more likely to allow evaluations to solidify and to depend less on context. Although the managerial practice of measuring customer satisfaction at different points in time is common, the effect of the time elapsed since the service experience on evaluations is often underestimated or overlooked. This issue has received little attention from marketing scholars, and it is unclear whether and how an experience evaluation might change depending on whether the experience occurred in the near (e.g., 2 days earlier) or distant (e.g., 2 months earlier) past.
When choice precedes consumption, valuation of delayed consumption may either increase or decrease as time passes between the two (Chan and Mukhopadhyay 2010). Some research on discounted utility asserts that people exhibit positive discount rates, preferring things sooner rather than later (Ainslie 1975; Samuelson 1937), whereas other research suggests that people sometimes voluntarily impose a waiting period on themselves and draw positive utility from “savoring” the temporal separation between choice and consumption (Loewenstein 1987).
Scholars and practitioners have traditionally examined satisfaction cross-sectionally, paying more attention to interpersonal variations than to the longitudinal shift of the individual satisfaction judgment. The effect of time on the evaluation of a service experience has frequently been ascribed to respondent memory bias, an issue thought to be operationally solvable by administering the satisfaction instrument shortly after the service experience.
Only a few psychological studies have examined the temporal pattern of satisfaction, focusing on shifts occurring either in overall satisfaction (OS) judgments (Soman 2003; Wirtz et al. 2003) or in attribute-importance weights (Mittal, Kumar, and Tsiros 1999; Oliva, Oliver, and Bearden 1995), without offering a univocal theoretical explanation. As a consequence, service marketing scholars have relied on context-specific or firm-related explanations for the observed temporal shift in satisfaction judgments (Mittal, Katrichis, and Kumar 2001).
The goal of this study is to show that the amount of time elapsed since a service experience affects the weights assigned to the determinants of the evaluation of that experience. We rely on construal-level theory (CLT) to develop assumptions about how evaluations of past service experiences evolve over time and then empirically test our hypotheses. According to CLT, individuals tend to construe distant-future events on the basis of more abstract information (high-level construals) and near-future events on the basis of more concrete information (low-level construals; Liberman and Trope 1998; Trope and Liberman 2000, 2003). For example, subscribing to a theater for the upcoming season might be construed in abstract terms, such as an enriching cultural experience and a social activity. By contrast, the day before the first show, the same event might be construed in concrete terms, such as choosing the appropriate dress and reserving a taxi. Therefore, representations of distant-future events tend to omit secondary features, enhancing the weight of those features related to their “desirability” (e.g., I want to cultivate my passion for the Opera), whereas representations of near-future events are rich in details that strengthen their “feasibility” (e.g., How do I get to the theater?). We argue that a similar process can be applied to the evaluation of past service experiences and that the weights assigned to the determinants of satisfaction judgments might change over time by virtue of the different mental representations of the service afforded by temporal perspective.
This research holds both conceptual and managerial value. We contribute to the satisfaction literature by offering a theoretical explanation for the intertemporal evolution in the evaluation of past consumptions. We offer managers new insights into customer satisfaction measurements. We design and conduct two experiments and a field study to show that (1) the importance of the attributes driving satisfaction shifts over time, (2) the weights assigned to high- and low-level attributes of a past consumption experience shift over time consistently with the temporal perspective adopted at the time of evaluation, and (3) OS judgments shift over time as a function of the different performances of high- and low-level attributes.
Theoretical Background
Customer Satisfaction and Time
The customer satisfaction literature defines satisfaction as a global evaluative judgment about product usage/consumption (Westbrook 1987) driven by the performance of a set of attributes and the overall evaluation of the service itself (Oliver 1997). Research in this field traditionally measures satisfaction using multiple attribute models (Oliver 1980; Woodruff, Cadotte, and Jenkins 1983), in which attributes (i.e., satisfaction drivers) are generated through qualitative research and subsequently used in structured questionnaires.
Traditionally, this stream of research assumes that the attributes customers use to evaluate the service are not affected by the length of time between consumption and evaluation. In other words, when evaluating a past experience (e.g., a long-distance flight), customers purportedly weigh the same attribute (e.g., food quantity) identically whether they evaluate the service 2 days or 2 months after consuming it. Recent research has questioned this underlying assumption. First, some studies have documented that customers’ overall judgments change over time (Wirtz et al. 2003). This might happen because preconsumption standards are replaced by new situational cues that shape OS judgments, so satisfaction unfolds over time as an interactive process of reelaboration of the available information (Fournier and Mick 1999) or because customers learn from past experiences when repeatedly exposed to the same consumption situation (McQuitty, Finn, and Wiley 2000). Alternatively, a shift in overall evaluation has been ascribed to the “rosy view” phenomenon such that the same event is anticipated and recalled more positively from a distal perspective than during the experience itself (Mitchell et al. 1997).
Second, other studies have acknowledged variations in the weight of the attributes that drive OS evaluations. Some researchers hypothesize an effect of temporal distance on the weight of these attributes (Jiang and Rosenbloom 2005), and others actually manipulate the temporal distance between the experience and the time of evaluation and report that different attributes influence OS in opposite directions. For instance, Mittal, Kumar, and Tsiros (1999) observe that the salience of certain attributes varies temporally as customers use the product (cars), so that their weights on OS judgments shift over time. Some attributes are extremely relevant to the purchase decision (e.g., a car’s color and styling) and heavily contribute to immediate postpurchase satisfaction judgments, but their relevance declines over time. Conversely, the relevance of other attributes is marginal at the time of choice but increases over time, either because they are more closely related to different consumption goals (whose salience is increased by consumption) or because their performance can be evaluated only after usage.
Although this body of work provides evidence of a temporal shift in the OS judgment and suggests that the weighting of attributes that drive OS might change, several issues remain unresolved. First, previous studies do not offer a univocal theoretical explanation for the shift of satisfaction over time. The phenomenon has been ascribed to a change in attributes’ relevance due to a possible misalignment between the time consumers experience a product and the introduction of the attribute in the market (Anderson and Mittal 2000), to attributes’ resolvability (Slotegraaf and Inman 2004), to customers’ characteristics and satisfaction standards (Mittal and Kamakura 2001), or to attribute nature (Mittal, Kumar, and Tsiros 1999). Second, previous studies tie temporal distance to memory decay and do not explore other possible reasons for changes in customer evaluation over time. Third, most of these studies address the evaluation of product ownership, whereas, to the best of our knowledge, no studies have focused on the weighting of the attributes of a service experience. The difference between product- and service-attribute weighting is not trivial: Products move through time together with the consumer, whereas most service experiences are enclosed within a well-defined point in time and thus are subject to a process of reconstruction of the available information at the moment of evaluation.
Customer Satisfaction and Construal Levels
CLT posits that individuals generate different mental representations of events depending on whether those events are placed in the near or the distant future. More precisely, CLT states that individuals use abstract, high-level construals to represent distant events and concrete, low-level construals to represent temporally near events (Trope and Liberman 2000, 2003). Representations of near-future events are rich in details that strengthen their “how” aspects and “feasibility,” whereas representations of distant-future events achieve abstraction by omitting secondary and incidental features, enhancing the weight of features related to their “why” aspects and “desirability” (Liberman, Sagristano, and Trope 2002; Liberman and Trope 1998). For example, organizing a party for the next month is construed at a high level of abstraction, in terms of “having fun” and “getting together.” A few days before the party, however, the same event is construed at a low level of abstraction, such as “buying food and drinks,” and “selecting the music.”
Construal levels affect consumers’ decisions, such as whether to consume now or later (Malkoc, Zauberman, and Ulu 2005); the alternatives placed into a consideration set (Lynch and Zauberman 2007); the attributes stressed in advertising communication (Fujita et al. 2008); the weight assigned to the attributes that guide the adoption of a new product (Grant and Tybout 2008); and the perceived fit of brand extensions (Kim and John 2008). CLT research has focused on the representation of future events, the only notable exception being Kyung, Menon, and Trope’s (2010) study, which explores the effects of temporal distance on the construal levels of past events. Their research demonstrates that people perceive an event described more concretely as being subjectively more recent than when described more abstractly. However, their study focused on the representation of past events rather than their evaluation. More important, the authors examined how a low- versus a high-level description of a past event affects the perceived temporal distance from the event, rather than how an NP versus a DP temporal distance affects the representation of the event.
Drawing on CLT, we show that time affects the evaluation of a service experience by modifying the importance of the attributes that drive the evaluation and overall judgment. In particular, we contend that individuals adopt a high-level perspective when evaluating a service that occurred in the DP but adopt a low-level perspective when evaluating the same service occurring in the NP. As a consequence, concrete attributes weigh more in the evaluation of an NP than a DP service experience. Conversely, abstract attributes weigh more in the evaluation of a DP than an NP service experience.
In the following paragraphs, we present and discuss three empirical studies. Study 1 uses a lab experiment to show that individuals rank concrete attributes higher in NP and abstract attributes higher in DP evaluations. Study 2 extends the findings of Study 1 in an uncontrolled field setting and tests whether the weights associated with low- and high-level features of a service change over time. Finally, Study 3 also employs a lab experiment and assesses whether the OS judgment is subject to intertemporal variation.
Study 1: An Experiment on the Effect of Temporal Distance on the Drivers of Customer Satisfaction
In Study 1, we explore the temporal dynamics in the importance that individuals assign to low- versus high-level features of a past service experience. Building on CLT, we posit that construal mechanisms operate with respect not only to the representation of future events but also to the representation and overall evaluation of past experiences. This distinction is not trivial. Future events can be envisioned free of the concrete contingencies that, by contrast, are an intrinsic quality of past events. In the former case, the initial representation of a future event begins from a distal perspective that carries over to the subsequent representations (Kim, Park, and Wyer 2009). Conversely, when referring to the representation of a past event, individuals begin from a concrete representation of the event that might carry over to the subsequent evaluations made from a more distal perspective. Metaphorically speaking, people approach the representation of the future as a progressive zooming in on the details of the target event. By contrast, the past is subject to a zooming-out process due to the progressive distancing from the focal past event. These asymmetries appear strong enough to warrant an in-depth investigation of whether CLT tenets legitimately extend to the domain of past events.
We contend that, over time, individuals tend to evaluate the same past event by assigning greater weight to the more abstract representations of the attributes that guide satisfaction, to the detriment of the more concrete representations. For example, on leaving an academic conference, a participant might evaluate the conference on the basis of a low-level attribute, such as the venue, whereas a couple of months later, the same participant might evaluate the conference on the basis of more abstract attributes, such as the stimulation of new ideas. Therefore, we hypothesize the following:
Hypothesis 1a: When evaluating satisfaction with a DP event, individuals assign greater weight to more abstract than to more concrete attributes.
Hypothesis 1b: When evaluating satisfaction with an NP event, individuals assign greater weight to more concrete than to more abstract attributes.
Method
Sixty undergraduate students from a large European university were recruited to participate in this study. Students were informed of the opportunity to attend a breakfast seminar on how to write a final dissertation. From a pretest conducted on a sample of 44 graduate students who participated in a similar seminar, we identified 10 attributes that students considered relevant in determining their satisfaction with the seminar. Participants in the pretest were told about the meaning of low- and high-level attributes following Sagristano, Trope, and Liberman (2002, p. 365) and then asked to provide up to five low-level attributes and five high-level attributes regarding a breakfast seminar. This process generated nine low-level attributes and six high-level attributes, from which we selected the five most cited low-level (“seminar hours and lecture room,” “content of the slides used for the presentation,” “content of the seminar,” “breakfast offered by the organizing committee,” and “questions asked by the other students attending the seminar”) and the five most cited high-level (“competence of the lecturer,” “clarity,” “informality of the atmosphere,” “ability of the lecturer to answer questions,” and “sense of involvement”) attributes to be included in the main study.
The breakfast seminar was taught by one of the experimenters who took particular care in the delivery of the seminar to ensure a positive evaluation from participants of each low- and high-level attribute. For example, the seminar was held during convenient hours, and the lecture room was conveniently located and spacious. We did this to rule out the possibility that shifts in attribute importance could be ascribed to the variability of attribute evaluation and not to the effect of temporal distance. For example, if participants vary excessively in their evaluation of the attribute “lecture room,” the shift in this attribute importance could refer not only to temporal distance but also to the variability of the attribute evaluations. At the end of the seminar, participants were asked to provide their e-mail addresses to receive a survey about the seminar.
We implemented a 2 (NP vs. DP) × 2 (high- vs. low-level construal) mixed experimental design, manipulating temporal distance between subjects and attribute construal level within subjects. Half the participants received the questionnaire 1 day after the seminar, and the other half received the same questionnaire 1 month after the seminar. Participants were randomly assigned to either the NP (n = 30) or the DP (n = 30) condition, and the response rate was 93% in the NP condition and 83% in the DP condition, leaving 53 complete responses.
Participants were first asked to rate their OS with the seminar. Next, they were asked to rank the 10 attributes in decreasing order of importance, so that their ranking represented the relevance of each attribute in determining their OS. We used attribute ranking rather than attribute rating because it reveals differences that would not have emerged through regression coefficients as a result of the narrow range of variability of the dependent measure (i.e., OS), whose level we intentionally kept constantly high under experimental control. As a manipulation check, we asked participants to provide their evaluation of each of the 10 attributes on a 5-point Likert-type scale. Finally, participants were asked to provide demographic information, thanked, and debriefed.
Results and Discussion
The results from the manipulation check revealed no significant difference in OS between the NP (MNP = 4.21) and the DP conditions, MDP = 4.40, F(1, 51) = 1.58, p = .22, η2 = .03. In addition, we found a small dispersion of attribute evaluations at the individual level: The evaluations of only 7 of the 53 participants had a standard deviation (SD) exceeding 1 (which would imply a change in the discrete values of the evaluation score). To control for the role of attribute evaluations, we created a dummy variable coded as 1 when attribute ratings had an SD above .99 (a cutoff value corresponding to a change in attribute evaluations) and 0 otherwise. The results show that attribute rankings vary as a function of temporal distance only (Wilks’s λ = .306, F = 9.092, df = 10; 40, p < .001), and not of the variability of the 10 attribute evaluations (Wilks’s λ = .912, F = 0.388, df = 10; 40, p = .945) or of the interaction term (Wilks’s λ = .861, F = .643, df = 10; 40, p = .768). Thus, differences in attribute rankings can be ascribed to temporal distance and not to individual attribute evaluations.
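As a reading aid, the F values reported alongside Wilks’s λ can be approximately recovered from λ and the degrees of freedom. The sketch below assumes that each effect has a single hypothesis degree of freedom (two-level factors), in which case the conversion from λ to F is exact; the function name is ours and is not part of the original analysis.

```python
def wilks_to_f(wilks_lambda, df_num, df_den):
    """Convert Wilks's lambda to F, assuming one hypothesis degree of freedom."""
    return (1.0 - wilks_lambda) / wilks_lambda * (df_den / df_num)

# (lambda, reported F) pairs taken from the text; df = 10; 40 in every case
reported = {
    "temporal distance": (0.306, 9.092),
    "evaluation variability": (0.912, 0.388),
    "interaction": (0.861, 0.643),
}

for effect, (lam, f_reported) in reported.items():
    f_recomputed = wilks_to_f(lam, 10, 40)
    print(f"{effect}: recomputed F = {f_recomputed:.3f}, reported F = {f_reported}")
```

Small discrepancies between recomputed and reported values are expected because λ is reported to three decimals only.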
As a first step, we assigned numbers to participants’ rankings in decreasing order, so that high numbers (6–10) indicate a low ranking and low numbers (1–5) a high ranking. Then, to test our hypothesis, we investigated whether low- and high-level attributes follow two idiosyncratic patterns (i.e., span an underlying dimension). One possible solution might be splitting the 10 attributes into the two separate dimensions that we determined a priori from the pretest. However, stronger evidence would be provided if we could reproduce the same variable groupings through a post hoc clustering procedure, such that each cluster of variables can be interpreted as essentially one dimensional.
An appropriate methodology to detect underlying sets of ranking variables is to have them located in the n-dimensional space defined by the participants (in this case, the sample size is 53, and thus, our space consists of 53 dimensions). In this way, we can compute a distance matrix between the 10 variables defined on the 53-dimensional space. We conducted a hierarchical cluster analysis, which provided a clear partitioning of the set of 10 variables into two subsets exhibiting low intracluster variance and high intercluster variance. We employed both Ward’s and average linkage methods and, in both cases, obtained the same partitioning of the 10 variables. This post hoc partitioning fully overlaps with the a priori distinction between concrete and abstract features obtained in the pretest and further corroborates the results of our experimental manipulation.
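The clustering logic can be illustrated with a minimal sketch. It treats each attribute as a point whose coordinates are the ranks assigned by the participants and merges attributes by average linkage until two clusters remain. The rankings are hypothetical toy data (five participants, six attributes), not the study data, and only average linkage is shown; Ward’s method is omitted for brevity.

```python
from itertools import combinations
from math import dist

def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average Euclidean distance over all cross-cluster pairs
            total = sum(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical rankings: rows = participants, columns = attributes.
# Columns 0-2 stand for concrete attributes, columns 3-5 for abstract ones.
rankings = [
    [1, 2, 3, 4, 5, 6],  # near-past respondent
    [2, 1, 3, 5, 4, 6],  # near-past respondent
    [4, 5, 6, 1, 2, 3],  # distant-past respondent
    [5, 4, 6, 2, 1, 3],  # distant-past respondent
    [6, 5, 4, 3, 1, 2],  # distant-past respondent
]

# Transpose so that each attribute becomes a point in participant space
attribute_points = [list(column) for column in zip(*rankings)]
two_clusters = average_linkage(attribute_points, 2)
print(two_clusters)
```

With these toy rankings, the two recovered clusters coincide with the a priori split between the concrete and the abstract columns, which mirrors the overlap described above.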
Then, from the findings of the cluster analysis, we computed two average ranking scores for the concrete and abstract dimensions, respectively. A two-way analysis of variance (ANOVA) with average ranking as the dependent variable and temporal distance and attribute construal level as the independent variables yields a significant crossover interaction effect between the independent variables, F(1, 530) = 163.64, p < .001, η2 = .24. The concrete dimension ranks higher in the NP (MNP = 4.28) than in the DP condition, MDP = 7.04; F(1, 51) = 125.81, p < .001, η2 = .71, whereas the abstract dimension ranks higher in the DP (MDP = 3.96) than in the NP condition, MNP = 6.80; F(1, 51) = 153.37, p < .001, η2 = .75.
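For readers who want to check the effect sizes, the following sketch recomputes η2 from the reported F statistics and degrees of freedom. It assumes that the reported η2 corresponds to F·df_effect / (F·df_effect + df_error), which is our reading rather than something the text states explicitly.

```python
def eta_squared(f_value, df_effect, df_error):
    """Effect size implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Simple effects of temporal distance reported above
concrete = eta_squared(125.81, 1, 51)
abstract = eta_squared(153.37, 1, 51)
print(round(concrete, 2), round(abstract, 2))
```

Under this assumption, both simple effects reproduce the reported values of .71 and .75.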
To analyze the observed effect at each attribute level, we performed 10 one-way ANOVAs with ranking score as the dependent variable and temporal distance as the independent variable. We found a significant difference in attribute rankings between participants evaluating satisfaction with the seminar in the DP versus the NP condition, as shown in Figure 1. Participants in the DP condition tended to assign greater importance to high-level features of the experience when evaluating their satisfaction with the seminar, whereas those in the NP condition tended to consider low-level attributes first in their judgment. Participants ranked low-level attributes (listed as 1–5 in Figure 1) significantly higher in determining OS in the NP than in the DP condition: specifically, “hours and lecture room,” F(1, 51) = 34.22, p < .001, η2 = .40, “slides content,” F(1, 51) = 14.91, p < .001, η2 = .23, “breakfast,” F(1, 51) = 31.34, p < .001, η2 = .38, “questions’ content,” F(1, 51) = 15.32, p < .001, η2 = .23, with only “content of the seminar” being marginally significant, F(1, 51) = 3.59, p = .06, η2 = .06. Conversely, they ranked high-level attributes (listed as 6–10 in Figure 1) significantly higher in the DP than in the NP condition: specifically, “competence,” F(1, 51) = 31.86, p < .001, η2 = .38, “clarity,” F(1, 51) = 11.91, p = .001, η2 = .19, “informality,” F(1, 51) = 5.56, p = .022, η2 = .10, “ability to answer questions,” F(1, 51) = 76.35, p < .001, η2 = .60, and “involvement,” F(1, 51) = 10.44, p = .002, η2 = .17.

Figure 1. Study 1: Attribute importance rankings in near (1 day) and distant (1 month) past evaluations.
In summary, the results provide support for Hypothesis 1a: When evaluating satisfaction with a DP event, individuals assign greater weight to more abstract than to more concrete attributes. The results also lend support to Hypothesis 1b: When evaluating satisfaction with an NP event, individuals assign greater weight to more concrete than to more abstract attributes. Overall, Study 1 demonstrates that individuals explicitly assign different importance to low- and high-level attributes of a past service experience according to the temporal perspective they adopt at the time of evaluation: Low-level, concrete attributes are considered more important when evaluating an NP service experience, and high-level, abstract attributes are considered more important when evaluating the same service experience in the DP.
Study 2: A Field Experiment on the Effect of Temporal Distance on the Drivers of Customer Satisfaction
The purpose of Study 2 is to provide additional empirical support for the intertemporal variations in satisfaction judgments in a real setting. In this field experiment, neither attribute performance nor OS is under experimental control. As a consequence, we expect more variance in both the dependent and independent variables than in Study 1. This allows us to estimate how attribute weights, rather than rankings, vary over time. In other words, Study 2 aims to establish the effect of temporal distance on derived (i.e., attribute weight) rather than stated (i.e., attribute rankings as in Study 1) attribute importance measures.
We collected data on a service experience that is relevant and important enough not to be confused over time with other, similar experiences. We measured customer satisfaction with the blood donation service operated by the main nationwide organization. This research setting offers two main advantages. First, donating blood is an experience that is unlikely to be repeated frequently (e.g., in the country where we collected the data, men are prevented by law from donating blood sooner than 3 months and women sooner than 6 months after their last donation). Therefore, we can rule out the possibility that some participants have trouble evaluating a DP event because they experience it repeatedly. Second, Choi, Park, and Oh (2012) show that the experience of blood donation can be psychologically construed at different levels, so that individual intentions to donate blood in the future can be influenced by the social desirability (i.e., high-level construal) of the act of donating blood or by the various low-level contingencies (e.g., physical disease and time inconvenience) that can impede the behavior, depending on whether donating blood is envisioned in the distant or the near future.
Method
Two hundred donors were randomly picked from a list of eligible donors made available by the Blood Donation Association and were included in the sample. We considered eligible to participate in the survey only donors who, at the time of data collection, had donated blood no sooner than 2 (men) or 5 (women) months before. This procedure ensured that no participant would have donated blood twice in the time span investigated. Participants were randomly assigned to either the NP or the DP condition, so that the experimental procedure yielded a 2 (NP vs. DP) × 2 (low- vs. high-level attributes) mixed experimental design, in which we manipulated temporal distance and attribute construal levels between and within subjects, respectively.
Participants assigned to the NP condition were scheduled an appointment for their periodic donation and were contacted immediately on their exit from the doctor’s office. Those who agreed to take part in the study (66 of the 100 individuals assigned to the NP condition) were surveyed with a self-administered electronic questionnaire uploaded on a laptop in the Blood Donation Association building lounge. Participants assigned to the DP condition received a copy of the same questionnaire pertaining to their last donation by e-mail 2 months later. The response rates in the NP and DP conditions were 66% and 77%, respectively, yielding 143 completed questionnaires.
Participants were first asked to rate their overall satisfaction with the last service experience provided by the Blood Donation Association on a 7-point bipolar scale ranging from 1 (extremely dissatisfied) to 7 (extremely satisfied). Next, they were asked to evaluate 10 attributes of the service experience. We manipulated attribute construal level within subjects by asking participants to evaluate low- and high-level attributes on a 5-point Likert-type scale ranging from 1 (completely disagree) to 5 (completely agree). For the low-level attributes, participants expressed their level of agreement with the following statements: “The appointment hour was at a convenient time,” “The time needed for the whole process of donation was reasonable,” “I perceived some pain during the donation,” “I had some discomfort immediately after the donation,” and “The structure of the Blood Giving Association is clean and hygienic.” For the high-level attributes, participants expressed their level of agreement with the following statements: “I monitored my health conditions,” “I helped other people,” “The personnel were professional,” “I felt safe during the whole donation process,” and “The Blood Giving Association protected my privacy.” We identified and scored attributes as either high- or low-level construals, according to the results of a focus group conducted with a separate sample of 10 donors who were not included in the final sample. The presentation order of the 10 attributes was randomized across participants to avoid sequence effects.
We administered a memory task to participants in the DP condition to assess their retention of the concrete details of the last service experience. Namely, they were asked to recall the exact time of their appointment, the name of the café where they had breakfast with the complimentary coupon offered by the Blood Donation Association (donors can choose from a list of 10 cafés in the surrounding area), what they had for breakfast, whether the doctor was a man or a woman, and the duration of the whole donation process.
Last, participants were asked to provide their demographic information and number of years as a donor and were thanked, tested for suspicion, and debriefed. No participant correctly guessed the purpose of the study; therefore, we included all in the subsequent analyses.
Results and Discussion
To test whether participants attached different importance to the features of the experience between their NP and DP evaluations, we focused on the weights assigned to the features of the donation service in NP versus DP evaluations. First, because attributes were correlated, we performed a factor analysis to rule out potentially misleading results due to multicollinearity (Oliver 1997). Exploratory factor analysis with principal axis factoring and oblimin rotation yielded two separate dimensions (eigenvalues > 1; cumulative variance explained = 55%), in which low- and high-level attributes have high loadings (>.6) on two distinct factors. Coefficient αs ranged from .64 to .79 across the two dimensions (Churchill 1979), and the analysis suggested that elimination of 1 item (I felt some discomfort immediately after the donation) would improve the α values for the first factor (Cronbach’s α = .81). The attributes that load on Factors 1 and 2 coincide with the set of abstract (“privacy protection,” “safety,” “personnel professionality,” “help to others,” and “health monitoring”) and concrete (“appointment hour,” “time to donate,” “perceived pain,” and “cleanliness of the structure”) attributes identified in the pretest, respectively.
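To make the reliability figures concrete, the sketch below computes Cronbach’s α from item-level scores using the standard variance-based formula. The item scores are hypothetical toy data, not the donors’ responses, and the function name is ours.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item,
    with respondents in the same order in every list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sum score
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical 5-point ratings from five respondents on three items
toy_items = [
    [4, 5, 3, 5, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 3],
]
alpha = cronbach_alpha(toy_items)
print(f"alpha = {alpha:.3f}")
```

Dropping an item that correlates weakly with the others lowers the summed item variance relative to the variance of the total score, which is why eliminating a single item can raise α, as reported above.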
Next, we ran two separate multiple regressions for NP and DP evaluations, with OS as the dependent variable and the two factor scores as the independent variables. The results of the two regression models illustrated in Table 1 show that only concrete (low level) features significantly weigh on OS when the experience is set in the NP. Similarly, only abstract (high level) features significantly weigh on OS when the experience is set in the DP.
Study 2—The Importance of Abstract and Concrete Features in Near and Distant Past Judgments—Regression Results.
*p < .1. **p < .05.
To test whether the attribute weights estimated in the previous step differ significantly between the two temporal conditions, we ran a multiple linear regression with OS as the dependent variable and the two factor scores, temporal distance, and two interaction terms between each of the two factor scores and temporal distance as the independent variables. As Table 2 shows, the results of the multiple regression model reveal significant interactions between temporal distance and the two factor scores. Specifically, the weight assigned to the interaction term of Temporal Distance × Concrete Features is negative (β = −.831) and significant, whereas the weight assigned to the interaction term of Temporal Distance × Abstract Features is positive (β = .688) and marginally significant. This finding suggests that the time elapsed between a past service experience and its evaluation affects the derived importance of the features that individuals consider in their OS judgments: The importance of concrete features decreases and the importance of abstract features increases over time, consistent with Hypotheses 1a and 1b.
Study 2—The Effect of Temporal Distance on the Weights of Abstract and Concrete Features—Regression Results.
*p < .1. **p < .05.
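The structure of the moderated regression reported in Table 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis: temporal distance is assumed to be dummy coded (NP = 0, DP = 1), the coefficients in `true_b` are hypothetical values chosen for the demonstration, the data are simulated without noise, and the small least-squares solver stands in for whatever statistical package was actually used.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(concrete, abstract, dp):
    """Intercept, two factor scores, distance dummy, and two interaction terms."""
    return [1.0, concrete, abstract, dp, concrete * dp, abstract * dp]


rng = random.Random(0)
# Hypothetical coefficients: intercept, concrete, abstract, DP,
# DP x concrete, DP x abstract.
true_b = [5.0, 0.8, 0.1, -0.5, -0.7, 0.6]

rows, y = [], []
for i in range(60):
    row = design_row(rng.gauss(0, 1), rng.gauss(0, 1), float(i % 2))
    rows.append(row)
    y.append(sum(b * x for b, x in zip(true_b, row)))  # noise-free outcome

est = ols(rows, y)

# Implied attribute weights in each temporal condition.
concrete_np, concrete_dp = est[1], est[1] + est[4]
abstract_np, abstract_dp = est[2], est[2] + est[5]
print(f"concrete weight: NP {concrete_np:.2f}, DP {concrete_dp:.2f}")
print(f"abstract weight: NP {abstract_np:.2f}, DP {abstract_dp:.2f}")
```

Under this coding, a negative interaction coefficient for concrete features means their weight is lower in DP than in NP evaluations, and a positive interaction coefficient for abstract features means the opposite.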
In addition, we tested whether participants displayed different levels of OS when they reported it immediately versus a couple of months after the service experience. A one-way ANOVA with OS as the dependent variable and temporal distance as the independent variable shows that participants were more satisfied with the service provided by the Blood Donation Association when they evaluated it in the NP (M = 6.25, SD = .87) than in the DP, M = 5.39, SD = 1.65; F(1, 141) = 14.439, p < .001, η2 = .11. Furthermore, we found a similar pattern from the comparison of individual ratings of low- and high-level features between the NP and DP conditions. Participants provided higher ratings for low-level features when they evaluated them in the NP (M = 3.75, SD = .31) than in the DP, M = 3.54, SD = .50; F(1, 151) = 9.09, p = .003, η2 = .06; similarly, they evaluated high-level features higher in the NP (M = 4.86, SD = .21) than in the DP condition, M = 4.48, SD = .49; F(1, 151) = 33.14, p < .001, η2 = .18. These findings are in line with previous empirical research showing a typical decay in OS judgments over time (Bendall-Lyon and Powers 2002; Peterson and Wilson 1992). A possible explanation for this result is that the differences in DP judgments are due to different retention in memory of low- and high-level features, such that low-level details fade from memory more rapidly than high-level features do.
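For a one-way design, η2 is the between-groups sum of squares divided by the total sum of squares, which can be recovered from a reported F statistic and its degrees of freedom as η2 = F × df1 / (F × df1 + df2). The short sketch below applies this identity to the two feature-rating comparisons reported above; the function name is hypothetical, and the assumption that the reported effect sizes follow this definition is ours.

```python
def eta_squared_from_f(f_value, df_between, df_within):
    """Eta squared for a one-way ANOVA, recovered from F and its df."""
    scaled = f_value * df_between
    return scaled / (scaled + df_within)


# Low- and high-level feature ratings, NP versus DP, as reported in the text.
low_level = eta_squared_from_f(9.09, 1, 151)
high_level = eta_squared_from_f(33.14, 1, 151)
print(round(low_level, 2), round(high_level, 2))  # 0.06 0.18
```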
To rule out the possibility that individuals rely exclusively on high-level features only because concrete details fade from memory, we tested for the interaction between attribute weights and memory retention as measured by the number of details recalled in DP evaluations. We computed a dummy variable, indicating recalled details below (0) or above (1) the median, in DP evaluations. We ran a multiple linear regression with OS as the dependent variable and the two factor scores, memory retention, and two interaction terms between each of the two factor scores and memory retention as the independent variables. The results show that the interaction terms between memory retention and the low-level, β = −.238, t(78) = −.519, p = .605, and high-level, β = .228, t(78) = .629, p = .531, dimension are not significant, suggesting that the weights assigned to low- and high-level features do not vary between those who recalled concrete details below or above the median.
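The median split and the interaction terms used in this check can be sketched as follows. The recall counts and factor scores are hypothetical, and coding scores equal to the median as 0 is an assumption, because the text does not say how ties were handled.

```python
from statistics import median


def median_split(scores):
    """Return 1 for scores above the median, 0 otherwise (ties coded 0)."""
    cut = median(scores)
    return [1 if s > cut else 0 for s in scores]


# Hypothetical number of concrete details recalled by seven DP participants.
recalled = [0, 1, 1, 2, 3, 4, 5]
memory_dummy = median_split(recalled)

# Hypothetical low-level factor scores for the same participants.
low_level_scores = [0.4, -1.2, 0.3, 0.9, -0.5, 1.1, -0.2]

# Interaction term entered in the regression: factor score x memory dummy.
interaction = [f * d for f, d in zip(low_level_scores, memory_dummy)]
print(memory_dummy)
print(interaction)
```

A nonsignificant coefficient on such an interaction term indicates that the weight of the factor does not differ between participants who recalled few and many details.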
Finally, we controlled for the potential effect of familiarity with the service. Results show that familiarity with the service (i.e., having donated blood for more than the median value of 6 years) does not influence OS judgments, F(1, 141) = .644, p = .424, η2 = .005, nor does it interact with temporal distance, F(1, 141) = .934, p = .335, η2 = .007, in affecting OS judgments. Moreover, different levels of familiarity with the service do not influence the way individuals weigh low- and high-level features in the assessment of their OS with the service experience. The interaction between the low-level dimension and level of familiarity with the service is significant neither for NP, β = .456, t(58) = 1.131, p = .263, nor for DP evaluations, β = −.116, t(78) = −.271, p = .787. Analogously, the interaction between the high-level dimension and level of familiarity with the service is significant neither for NP, β = .361, t(58) = .722, p = .473, nor for DP evaluations, β = .080, t(78) = .264, p = .792.
In summary, Study 2 extends the findings of Study 1 in two ways. First, it demonstrates how temporal distance affects the weights associated with the drivers of customer satisfaction. Second, it shows that the OS judgment is also subject to intertemporal variation when the performance of high- and low-level attributes is not kept constant under experimental control.
Study 3: The Effect of Temporal Distance on OS
In Study 3, we aim to understand how the positive or negative performance of high- and low-level attributes interacts with temporal distance in shaping OS. Thus, in this study, we manipulate the performance (positive vs. negative) and the level of abstraction (high vs. low) of the attributes and observe OS ratings. Consider, for example, a customer evaluating two attributes of a restaurant: The number of organic ingredients in the menu (low-level attribute) and the competence of the personnel (high-level attribute). Both attributes can be either positive or negative. The combinations of these attributes (high vs. low level) and their valence (positive vs. negative) generate four possible outcomes: (1) a restaurant with many organic ingredients in the menu but incompetent personnel, (2) a restaurant with few organic ingredients in the menu but competent personnel, (3) a restaurant with many organic ingredients in the menu and competent personnel, and (4) a restaurant with few organic ingredients in the menu and incompetent personnel.
This study explores whether OS judgments change over time depending on the performance of high- and low-level attributes. Building on the findings of Study 2, we expect that, over time, customers assign less weight to the low-level attributes and more weight to the high-level attributes of a service experience. As a consequence, the positive (negative) performance of low-level attributes should be more likely to positively (negatively) affect NP than DP OS evaluations. Similarly, the positive (negative) performance of high-level attributes should be more likely to positively (negatively) affect DP rather than NP OS evaluations. Accordingly, we expect to observe a shift in OS judgments when the performances of high- and low-level features are discordant as a function of the different weights attached to high- and low-level attributes under a proximal versus distal temporal perspective. More formally, we hypothesize the following:
Method
Three hundred graduate students (53.4% female, age ranging from 22 years to 45 years, mean age = 23.6 years) from a large European university were recruited to participate in this study. Students were told they would be contacted by e-mail to evaluate a new service. Participants were randomly assigned to the conditions of a 2 (temporal distance of service evaluation: NP vs. DP) × 2 (low-level attribute performance: positive vs. negative) × 2 (high-level attribute performance: positive vs. negative) mixed factorial design. Temporal distance was manipulated within subjects, as participants were asked to rate their OS twice: After a few minutes and again after 2 months. To rule out potential lock-in effects due to subsequent evaluations, half the participants were randomly assigned to a “DP-only” condition and were surveyed only once in the second phase of the experiment.
Participants were asked to evaluate a beta release of a fictitious online movie review service, which they were told had been developed by other students enrolled in the School of Information Technology at the same university. They were asked to browse the movie reviews from a database of 40 movies equally divided into four different genres: comedy, thriller, fantasy, and action. We kept the genres and the movies in the database constant across all experimental conditions. Participants first read a list of the four genres, and then, by clicking on a genre, they could view the reviews of the 10 movies (they could return to the genre list using a “back” button). Participants were free to browse through the movie review service as long as they wanted; the number of genres they viewed was recorded.
Previous research has shown that evaluations are more negative when participants expect to evaluate a product or service (Ofir and Simonson 2001), because they overweigh negative aspects (Skowronski and Carlston 1989). To avoid this effect, we did not inform participants in either the NP or the DP condition that they had to evaluate the service.
We manipulated the service experience by presenting movie reviews that differed in the performance of their low- and high-level features. We operationalized these features using the results of a pretest administered to 30 graduate students who listed low- and high-level attributes of a movie review service. The most cited features were “level of detail of the plot” as a low-level feature and “the presence of other customers’ ratings” as a high-level feature. We manipulated the valence (positive vs. negative) of these two attributes by presenting a set of reviews with detailed (vs. hasty) plot descriptions and with other customers’ ratings available (vs. unavailable).
Participants assigned to the high-level negative, low-level positive condition evaluated a set of movie reviews in which each review had a detailed plot description, but ratings from other customers were not available. Those assigned to the high-level positive, low-level negative condition evaluated hasty plot descriptions, but with the ratings (1 to 5 stars) assigned by other customers. Participants assigned to either the high-level positive, low-level positive or the high-level negative, low-level negative conditions evaluated movie reviews that were positive or negative along both the plot description and the availability of other customers’ ratings, respectively. To increase the realism of the experimental stimuli, all the reviews included a picture of the movie poster and basic information about the movie cast and director. We kept these details constant across all eight experimental conditions.
When participants thought they had read enough reviews, they could exit the application by clicking on a button that redirected them to an OS questionnaire. After 2 months, the same participants were contacted by e-mail and asked to complete the same satisfaction questionnaire again.
We measured OS with the online movie review service using a 2-item, 5-point bipolar scale (Cronbach’s α = .88; “How satisfied are you with the ‘Movie Reviews Service’?” “In comparison with your expectations, how do you evaluate the ‘Movie Review Service’?”). Next, as a manipulation check, we asked participants to rate the accuracy of the plot descriptions and the helpfulness of other customers’ reviews. Then, we asked them to rate their level of expertise with movies and to complete a memory test consisting of four open-ended questions to assess their ability to recall the concrete details of the experience even from a distal perspective. Finally, participants were redirected to an end-of-survey webpage, where they were thanked and debriefed.
Results and Discussion
Two hundred thirty-six participants completed the first data collection (response rate: 79%). Of these, 76% completed the second data collection (180 respondents). Participants assigned to the NP condition who did not answer the second data collection were not included in the final sample.
The results of the manipulation check confirmed that participants correctly perceived the low-level positive scenario as more feasible (M = 3.42, SD = .89) than the low-level negative scenario, M = 2.98, SD = 1.15; F(1, 145) = 6.605, p = .011, η2 = .04, and correctly perceived the high-level positive scenario as more desirable (M = 3.15, SD = .88) than the high-level negative scenario, M = 2.76, SD = 1.06; F(1, 145) = 6.151, p = .01, η2 = .04. OS did not vary as a function of expertise: Participants who rated themselves as more expert with movies did not display significant differences from less-skilled participants in their OS with the movie review service, from either a proximal, M skilled = 2.81, SD = .98; M less skilled = 2.97, SD = .99; F(1, 92) = .356, p = .552, η2 = .004, or a distal, M skilled = 2.87, SD = 1.14; M less skilled = 2.87, SD = 1.06; F(1, 179) = .001, p = .980, η2 < .001, temporal perspective. Similarly, the results indicate that in the NP and DP conditions, the number of reviews viewed and its interaction with attribute performance has no significant effect on satisfaction judgments.
In line with our expectations, a repeated measures ANOVA with a Greenhouse–Geisser correction yielded a significant difference over time in the levels of OS as a function of the performance of low-level attributes, F(1, 89) = 7.008, p = .010, η2 = .07, and high-level attributes, F(1, 89) = 6.199, p = .001, η2 = .11. As Table 3 shows, follow-up tests indicate that participants were globally more satisfied with the high-level negative, low-level positive movie review service when they evaluated it in the NP (M = 3.04, SD = 1.02) than in the DP, M = 2.34, SD = .85; t(22) = 2.34, p = .03, η2 = .20 (paired sample); this evidence provides support for Hypothesis 2a. Conversely, participants assigned to the high-level positive, low-level negative condition displayed higher levels of OS when evaluating it under a distal (M = 3.19, SD = .75) than under a proximal temporal perspective, M = 2.58, SD = .78; t(23) = –2.732, p = .01, η2 = .25 (paired sample). Therefore, our data also lend support to Hypothesis 2b. No significant differences emerge when low- and high-level attribute performances have the same valence, that is, both positive, t(22) = –.030, p = .98, η2 < .001 (paired sample), or both negative, t(22) = .805, p = .43, η2 = .029 (paired sample). Overall, these results show that individuals adopt different construal levels in evaluating past experiences: Low-level features are relevant in determining immediate satisfaction with an experience, but their relevance decreases over time.
Study 3: Satisfaction Evaluations of the Movie Review Service—Paired-Sample t-Tests.
Note. OS = overall satisfaction.
*p < .001. †p < .001.
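The paired-sample comparisons summarized in Table 3 follow the usual logic of a t-test on within-participant difference scores. The sketch below uses hypothetical ratings and hypothetical function names; it also assumes that the reported η2 values correspond to t² / (t² + df), an identity that reproduces the figures in the text to the reported precision but is not stated there.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired-sample t statistic and degrees of freedom for two ratings lists."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def eta_squared_from_t(t, df):
    """Effect size recovered from a t statistic and its degrees of freedom."""
    return t * t / (t * t + df)


# Hypothetical OS ratings from five participants, rated immediately (NP)
# and again after a delay (DP).
np_scores = [3, 4, 2, 5, 3]
dp_scores = [2, 3, 2, 3, 2]

t_value, df = paired_t(np_scores, dp_scores)
print(f"t({df}) = {t_value:.3f}, eta squared = {eta_squared_from_t(t_value, df):.3f}")
```

Applied to the statistics reported above, `eta_squared_from_t(2.34, 22)` and `eta_squared_from_t(-2.732, 23)` round to .20 and .25, in line with the values given in the text.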
To rule out the possibility that our results are due to a “lock-in” effect resulting from subsequent evaluations (Kim, Park, and Wyer 2009; Pocheptsova and Novemsky 2010), we compared these findings with the judgments obtained from the sample that evaluated the experience only after 2 months. We found no significant differences, M twice = 2.89, SD = 1.07; M once = 2.84, SD = 1.11; F(1, 179) = .09, p = .76, η2 = .001, across all four experimental conditions, F(3, 179) = 1.35, p = .26, η2 = .02. Therefore, we can rule out a lock-in effect between subsequent evaluations.
As in Study 2, we tested whether individuals’ ability to retain concrete details in memory affects DP evaluations. We computed a dummy variable indicating the number of recalled details below or above the median memory score (M = 1.00). The results show no significant effect of memory score on OS judgments, F(1, 92) = .78, p = .38, η2 = .009, nor significant interactions between the memory score and the experimental conditions, F(3, 92) = .97, p = .41, η2 = .033. In contrast with Kim, Park, and Wyer (2009), who find that when individuals reevaluate a service/product for future use, they recall their previous construals rather than forming a new representation of the same object, we suggest that this is not the case when referring to past events.
In summary, Study 3 shows that OS shifts over time as a function of attribute performances: Low-level attribute performances influence NP satisfaction judgments, and high-level attribute performances influence DP satisfaction evaluations. Furthermore, the results from Study 3 rule out the possibility that the observed shifts in the OS judgments are due to lock-in effects between subsequent evaluations or memory decay effects.
A Posttest of the Attributes Used in Studies 1–3
Given the centrality of the concrete versus abstract distinction in our theorizing, we also ran a posttest to empirically ascertain the different levels of concreteness and abstraction associated with the attributes under investigation in the previous studies. Fifty undergraduate students not included in the sample of the main studies were asked to rate the concreteness and abstractness of the attributes characterizing an orientation seminar, a blood donation, and an online movie review service on 7-point bipolar scales. Dhar and Wertenbroch (2000) suggest that concreteness and abstraction are not the extremes of a continuum but rather two separate dimensions. Participants were presented a random subset (50%) of attributes for each of the three services used in the three studies, for a total of 12 attributes evaluated by each participant. Table 4 presents the results from the posttest and shows that, overall, participants rated the low-level attributes considered in our studies higher on concreteness than on abstraction and rated high-level attributes higher on abstraction than on concreteness.
Posttest: Concreteness and Abstractness Ratings of the Attributes Used in Studies 1–3.
Note. Low-level attributes are in italics. Standard deviations are in parentheses.
General Discussion
Our research advances a new analytical perspective to shed light on how satisfaction evaluations change over time by virtue of the different representations that individuals activate in the time elapsed between a past experience and its evaluation. We relied on CLT to organize our theoretical framework and extended its domain to past events.
Taken together, our results support the view that satisfaction judgments shift over time as a result of the different psychological mechanisms that are activated during the time between service completion and evaluation. We designed three studies to test our predictions. Study 1 showed that individuals evaluating an NP event tend to rank the concrete details of the experience higher than its abstract details and that the opposite is true for individuals evaluating a DP event. Study 2 replicated the findings of Study 1 in a field experiment, showing that attribute weights significantly differ between the two temporal conditions. Individuals assign high importance to concrete (low level) features in NP evaluations and to abstract (high level) features in DP evaluations. Study 3 demonstrated that intertemporal shifts occur at the level of OS judgments as a function of the different performances of high- and low-level attributes. Individuals are more satisfied with a high-level negative, low-level positive experience when they evaluate it in the NP than in the DP, and their OS with the same experience decreases over time. Conversely, they evaluate a high-level positive, low-level negative experience more positively in the DP than in the NP, and the OS judgment increases over time.
Our findings have several important theoretical implications for customer satisfaction research. Well-established paradigms in this domain have assumed that postconsumption evaluations are invariant to temporal distance. Here, we contend that evaluation is subject to intertemporal changes. An initially low-level representation of a service experience is replaced by a high-level representation so that, over time, individuals tend to base their satisfaction judgments on the high-level rather than the low-level features of that experience. Acknowledgment of the underlying processes that guide the dynamics of customer satisfaction is crucial for marketing research, which has tied customer satisfaction to many other relevant constructs, such as repurchase intention and positive word-of-mouth referral (Jiang and Rosenbloom 2005; Szymanski and Henard 2001). If temporal distance affects the relative importance of the determinants of customer satisfaction, and thereby OS, repurchase intention and word-of-mouth referral might also change over time accordingly.
The customer satisfaction literature has extensively argued that when service failures occur, an appropriate recovery response has a positive impact on customer evaluations (Smith, Bolton, and Wagner 1999). In line with our findings, we maintain that the attributes characterizing a service failure might evolve over time as well. Low-level failure attributes might evolve over time into high-level failure attributes that address the whole service company. Consequently, providing customers with a prompt recovery for low-level failure attributes might prevent these attributes from influencing long-term OS. Presumably, a concrete compensation may be more effective in restoring OS in the NP, but the general quality of the complaint-handling procedures may drive OS in DP evaluations.
Our study also reinforces the use of CLT in customer satisfaction research. We show that construal mechanisms apply to the reconstruction of service experiences, not just to consumption situations imagined for the future and not yet experienced. Moreover, we show that CLT mechanisms guide the evaluation and not only the representation of experiences that are set at different points in time.
Finally, we show that construals of past events are different from memory-retention effects. Even individuals who can still recall concrete details in their DP judgments construe such events at higher levels. Therefore, individuals actively build their representations on retained information rather than passively adopting the construal level of their memories. In addition, our results show that individuals do not recall from memory their initial judgments and do not use them for later evaluations.
Our findings can be useful for designing satisfaction surveys more effectively. It may be beneficial to measure satisfaction repeatedly, to obtain the whole spectrum of evaluations. Indeed, our results show that focusing on online evaluations of a service experience (i.e., evaluation collected immediately after the service experience is over) may be misleading: Online satisfaction might overemphasize (underemphasize) the impact of low-level (high-level) features on the OS judgment (Study 3). In other words, managers might react to short-term, time-related artifacts and not to the long-term attribute relevance. Accordingly, satisfaction should be measured within a few days after the service experience if the purpose is to capture customers’ evaluations on the most concrete details. Conversely, a survey should be conducted between 2 weeks and 2 months after the service experience if the purpose is to measure customers’ satisfaction with more abstract features.
In addition to timing, if the content of the questionnaire and the construal level of the past experience are not correctly paired, it may be difficult to find an exhaustive explanation for the determinants of overall customer satisfaction/dissatisfaction.
Limitations and Future Research Directions
Although we have discussed the potential implications of this research, several issues may limit its application and require further investigation. First, in Study 1, we kept low- and high-level attribute performances constant (and positive) between subjects. We did so to prevent possible confounding effects due to variance in attribute performance. However, this prevents us from drawing inferences about the different importance that individuals assign to low- and high-level attributes when their performance is constant but negative. It may be that satisfied and dissatisfied customers differ in their construals of past experiences. Moreover, participation in the seminar that served as the experimental setting was optional and involved no additional cost other than the opportunity cost of the time involved. This might pose a potential threat to the external validity of the experiment, as spending money might lead individuals to judge the experience more critically: In accordance with CLT, costs are considered low-level features and pertain to the feasibility dimension. Further research is necessary to explore these issues. Second, in Study 2, individuals reported higher levels of OS in the NP than in the DP condition, contrary to Wirtz et al. (2003) who find higher levels of OS in DP than in NP evaluations. This apparent contradiction might be ascribed to the different performances on low- and high-level attributes between that study and ours.
Study 2 raises another challenging research question: Would participants evaluate the service differently if they experienced the service with relevant others or alone? For example, donors can go through the donation process with a group of friends or by themselves. Previous research has shown that the relevance of the social desirability of the act of donating blood decreases with temporal distance from the donation, regardless of the initial reasons that drove individuals to give blood (Choi, Park, and Oh 2012). The issue of whether experiencing a service with others or alone shifts evaluations seems promising and would include an additional measure of psychological distance, such as social distance (see Trope and Liberman 2003).
Finally, in our studies, we opted for a dichotomization of the time variable by manipulating it into an immediate (NP) assessment and a 2-month (DP) time span. This manipulation is well rooted in extant CLT literature, but we are aware that the border between near and distant temporal perspectives is fuzzy at best. Accordingly, we cannot give any suggestion on the exact time point, or time interval, at which the shift occurs. Further research is required to empirically identify the temporal watershed between the distant and near past and, possibly, which temporal perspective is best for predicting customers’ future behaviors.
This research highlights several promising future research areas that could be investigated. One potential direction is the moderating role of action experience in the relationship between temporal distance and construal. It is possible that very skilled and experienced individuals tend to represent information about the product or the service more abstractly than those having limited knowledge about it (Hong and Sternthal 2010). Previous studies indicate that increasing experience with an action affects one’s familiarity with the action’s low-level components, which become “chunked into larger units” (Vallacher and Wegner 1987, p. 7). Thus, further research is required to understand whether experienced and inexperienced customers differ in their construal levels of a past consumption experience and, ultimately, in the relative importance they assign to low- and high-level features when evaluating their OS.
In addition, research has shown that people judge experiences with a service not just from a cognitive perspective but also from an affective perspective (e.g., Smith and Bolton 2002). In this research, we do not deal with emotions but limit our attention to cognitive evaluations. However, it is plausible that emotions moderate the relationship between construal and individual evaluations (see, e.g., Hong and Lee 2010). Further research might shed more light on the potential moderating role of emotions on the construal–satisfaction relationship.
Finally, a promising avenue for research might be to analyze the content of individual construals of a past experience. A deeper understanding of the content of individual construals would help identify the features that customers spontaneously consider when evaluating a past experience at different points in time without being “guided” by a questionnaire. Prior research has shown that when individuals recall an experience, they do not remember every single moment; rather, they recall a few significant moments vividly and gloss over the others, taking away an overall assessment of the experience based on three factors: (1) the trend in the sequence, (2) the high or low points, and (3) the ending (Kahneman et al. 1993). Further research might explore how this fits in with the idea that customers use abstract or concrete dimensions depending on the temporal perspective.
Acknowledgments
The authors are indebted to Janet Wagner for her helpful suggestions and comments on earlier drafts of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
References
