Abstract
Judgmental adjustments of algorithmic predictions with the aim of improving demand forecast accuracy are a common revenue management practice. While empirical evidence on the impact of these user overrides is growing, little research attention has been given to the time horizon and the frequency with which these adjustments take place. Using a multilevel regression model for repeated measures, we analyzed 20,081,973 forecasts covering seven different time horizons. Data were collected from 1752 hotels of different hotel types belonging to 232 hotel chains in seven geographical regions. We find that the accuracy of algorithmic computer forecasts improves as time nears the date of stay and that the number of user overrides impacts this accuracy. The effect of the override frequency depends on the type of the forecasted demand and on the presence of special events. A higher number of user overrides is beneficial for the group segment but damaging for the transient segment. During special event periods, override frequency enhances accuracy.
Keywords
Introduction
Occupancy forecasts are an essential element of hotel revenue management (RM) practice (Koupriouchina et al., 2014). The effectiveness of daily decisions, such as dynamically setting room rates, allocating room inventory to price levels and distribution channels, and determining overbooking levels and cancellation fees, relies to a great extent on accurate forecasts (Talluri and Van Ryzin, 2004). A common procedure designed to improve forecast accuracy is for hotel revenue managers to override the computer-generated forecasts. That is, managers evaluate the algorithmic occupancy predictions and then adjust the numbers to reflect their knowledge and beliefs. These overrides are the most common way in which managers incorporate expert knowledge (i.e., information not observable to the algorithm), and they are therefore a valuable user option in computerized RM systems (Alvarado-Valencia et al., 2017).
In this context, the time ahead of a date of stay, that is, the forecasting horizon, is an important element of hotel RM optimization because both hotels and their guests make time-dependent decisions (Chen and Schwartz, 2008; Arenoe and Van der Rest, 2020). Optimal hotel RM requires a thorough understanding of how demand is distributed over time, so that occupancy can be forecast accurately along the forecasting horizon, because daily occupancy reforecasts are used to update room allocation policies and price optimization decisions (Schwartz and Hiemstra, 1997). And yet, few studies have explored the role of time in subjective user overrides of these algorithmic predictions.
In this paper, we examine the effect of the time horizon and the number of judgmental adjustments on forecast accuracy. We first investigate whether algorithmic predictions improve over the time horizon at all, that is, whether for each future date of stay the accuracy of machine-generated reforecasts (forecast updates at narrowing forecast horizons) improves as the date of stay nears. We then examine the effect of the number of user overrides on forecasting accuracy for each date of stay, given the different forecasting horizons.
Literature review and hypotheses
Time horizon
Hotel occupancy forecasting is a complex and dynamic process in which reforecasting is continuous: the occupancy forecast for each day of the year is repeatedly updated and regenerated as the date of stay approaches. The general forecasting literature shows that the shorter the forecasting horizon, the more accurate the forecast. In his seminal paper summarizing the findings of decades of research on forecasting, Makridakis (1986: 34) provides an intuitive explanation: “Inertia in business and economic trends makes short-term forecasting easier…As the time horizon of forecasting increases so do the chances of systematic changes that can affect the future.”
We contend that in the context of hotel forecasting there is another, “technical” explanation beyond Makridakis’ “systematic changes” argument, namely the common use of booking curves. A booking curve for a given date of stay “starts” when the first reservation is recorded and then accumulates reservations as they arrive. At a very long forecasting horizon (that is, many days ahead of the date of stay), the booking curve may be practically empty because no reservations have been recorded that early. It follows that the longer the horizon, the smaller the contribution of this important data source, and consequently the less accurate the forecast.
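A booking curve as described here can be sketched as a cumulative count of reservations by days to arrival (DTA). The minimal Python illustration below uses hypothetical lead times for a single date of stay and ignores cancellations; the function name is ours and is not an RMS API.

```python
def booking_curve(lead_times, horizons):
    """Rooms on the books at each days-to-arrival (DTA) value for one stay date.

    lead_times: days before the date of stay on which each reservation was made
                (one entry per room; cancellations are ignored in this sketch).
    horizons:   DTA values at which to read the curve.
    """
    # A reservation made L days ahead is already on the books at DTA d when L >= d.
    return {dta: sum(1 for lead in lead_times if lead >= dta) for dta in horizons}


# Hypothetical lead times (in days) of ten room reservations for one date of stay
leads = [60, 30, 21, 10, 7, 7, 3, 1, 1, 0]
print(booking_curve(leads, [252, 84, 21, 7, 3, 1, 0]))
# {252: 0, 84: 0, 21: 3, 7: 6, 3: 7, 1: 9, 0: 10}
```

At 252 and 84 days this hypothetical curve is still empty, so a reservation-based forecast has no on-the-books information to work with at those horizons.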
Unlike in the general forecasting literature, empirical evidence on the relationship between hotel demand forecasting horizon and accuracy is limited and inconclusive. Kimes (1999) investigated group forecasting accuracy at various days prior to arrival and found that the forecast error (MAPE and MAD) declined as the day of arrival approached. “In general, the MAPE averaged 40% at two months before arrival, dropped to about 30% at one month before arrival, and decreased to 10-15% on the day of arrival” (Kimes, 1999: 1106). While these findings support the notion that the longer the forecasting horizon, the less accurate the forecasts, their generalizability is subject to certain limitations. The study examined the group segment and excluded the transient segment, used a single hotel chain’s data over a period of four months, and was carried out 20 years ago, prior to significant technological advancements in RM systems. Most importantly, accuracy was calculated on aggregated monthly figures rather than for individual arrival dates, whereas in reality RM optimization decisions are made at the more granular arrival-date level.
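Why aggregated figures can understate date-level error can be shown with made-up numbers: over- and under-forecasts on individual arrival dates offset each other in the period total. The sketch below compares the mean of date-level APEs with the APE of the period totals; the helper name and all data are hypothetical.

```python
def ape(actual, forecast):
    """Absolute percentage error in percent."""
    return 100 * abs(actual - forecast) / actual


# Hypothetical rooms sold and forecasts for four arrival dates in one period
actuals = [100, 100, 80, 120]
forecasts = [120, 80, 96, 104]

daily_errors = [ape(a, f) for a, f in zip(actuals, forecasts)]
mean_daily_ape = sum(daily_errors) / len(daily_errors)  # date-level MAPE
aggregate_ape = ape(sum(actuals), sum(forecasts))       # APE of the period totals

print(round(mean_daily_ape, 2), aggregate_ape)  # 18.33 0.0
```

Every arrival date here is missed by 13% to 20%, yet the period totals match exactly, so an accuracy measure computed on the totals reports no error at all.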
Schwartz (1998) applied the Curve Similarity model to forecast the occupancies for one hotel for 45 sample days for each of the following forecasting horizons: 1, 2, 3, 7, 14, 21, 30, 45, 60, and 99 days in advance. He found that accuracy declined with the forecasting horizon in a non-linear manner. Again, caution needs to be exercised when generalizing this study’s findings, as it was performed using a single hotel for a period of 45 days, more than 20 years ago.
Rajopadhye et al. (2001) evaluated the Holt-Winters procedure. They report that the MAD and MAPE values fluctuate as the arrival day nears, and that at 2–4 days before arrival the forecast accuracy is lower than at 14 days. Weatherford and Kimes (2003) tested seven forecasting methods on data from Choice Hotels and Marriott Hotels. Their findings indicate that the impact of the forecasting horizon on accuracy varies across the tested forecasting methods. For example, pickup and regression forecasting methods had fairly similar forecast errors (MAE) for all days to arrival, while “the performance of the multiplicative booking method deteriorated when used more than 7 days before arrival” (Weatherford and Kimes, 2003: 409). Most recently, Pereira (2016) compared the performance of a new time-series method (TBAT) with five more traditional time-series methods when forecasting daily hotel room demand, using six forecast accuracy measures (RMSE, MAPE, MdAPE, sMAPE, RelMAE, and RelRMSE) for multi-step forecasts from 1 to 56 days ahead. The findings suggest complex patterns, where a shorter horizon was not always associated with a lower forecast error.
To summarize, the hospitality literature provides limited support for the notions that hotel demand forecasts made at longer horizons are less accurate and that the relationship is non-linear. It is unsurprising that, to date, this support has been only partial. Very few published hotel-specific studies are directly relevant, and most were not designed specifically to answer this question. With the exception of Kimes (1999), it appears that all of these studies are based on predictions generated by the researchers as part of tests of newly proposed forecasting models, rather than on “real” predictions generated by hotels as part of their ongoing operations (e.g., Chen and Kachani, 2007; El Gayar et al., 2011; Haensel and Koole, 2011; Law, 2004).
This limitation is significant in the context of the horizon/accuracy issue, as most of the reported results used only one of the two main sources of information: historical stays or reservations on hand. It has been established that the forecast combinations often used by RM systems are superior to single-source forecasts (e.g., Andrawis et al., 2011; Claeskens et al., 2016; Clemen, 1989; Hsiao and Wan, 2014; Jaganathan and Prakash, 2020; Schwartz et al., 2021) and that this is, at least partly, because the effectiveness, or accuracy, of the sources differs across the time horizon. Each source can potentially balance out the relative weakness (inaccuracy) of the other to varying degrees as the date of stay nears. Specifically, reservation-based forecasting models, from simple pickup methods to more advanced K-nearest neighbors (KNN) and artificial neural network (ANN) models, are theoretically and practically bound to generate more accurate forecasts the closer it is to the date of stay. The intuition is straightforward: the closer the date of stay, the closer the number of rooms on the books is to the “actual,” and the more informative the shape of the booking curve, because it contains more data points. Perhaps the best illustration of this difference is predicting occupancy one day ahead of the date of stay. At that shortest forecasting horizon, the number of reserved rooms is usually very close to the actual, since very few rooms are typically added on the date of stay.
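The simple pickup methods mentioned above can be illustrated with a generic additive pickup forecast: rooms on the books today plus the average number of rooms that comparable past dates picked up between the same DTA and the date of stay. This is a textbook sketch under stated assumptions (hypothetical data, cancellations netted into the curves, hypothetical function name), not the RMS provider's algorithm. Comparing the spread of historical pickups at 7 days and at 1 day shows why reservation-based forecasts tighten as the date of stay nears.

```python
from statistics import mean, pstdev


def additive_pickup(otb_now, dta, history):
    """Additive pickup forecast of final rooms sold for one date of stay.

    otb_now: rooms on the books today for the target date (today = dta days out).
    history: booking curves of comparable past dates, each a dict
             {days_to_arrival: rooms_on_books} that contains keys 0 and dta.
    Returns (forecast, population std. dev. of the historical pickups).
    """
    # Pickup = rooms added between `dta` days out and the date of stay.
    pickups = [curve[0] - curve[dta] for curve in history]
    return otb_now + mean(pickups), pstdev(pickups)


# Hypothetical booking curves of three comparable past dates
history = [
    {7: 60, 1: 85, 0: 90},
    {7: 50, 1: 80, 0: 86},
    {7: 70, 1: 92, 0: 95},
]
f7, spread7 = additive_pickup(55, 7, history)  # forecast made a week out
f1, spread1 = additive_pickup(88, 1, history)  # forecast made a day out
print(round(f7, 2), round(spread7, 2), round(f1, 2), round(spread1, 2))
```

A day out, most of the final number is already on the books and the remaining pickup varies little across past dates; a week out, a larger and more variable pickup has to be estimated.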
As such, previous work that tested the time/accuracy question using a single source, instead of the best-practice approach of combining multiple sources, and that most often applied time-series models to actual stays, may have produced biased or inconclusive results. Therefore, we formulated the following hypothesis:
H1: The longer the forecasting horizon, the less accurate the computer-generated forecast.
User overrides
Integrating subjective human judgment in the RM system is common practice in the hotel industry (e.g., Koupriouchina et al., 2014; Schwartz, 1998; Schwartz and Cohen, 2004), though the literature on the effect of these subjective forecast adjustments is scarce. Borrowed from airline forecasting, best-practice hotel occupancy forecasting combines three sources of information: a forecast based on data from booking curves (i.e., Reservation on Hand), time-series predictions based on historical stays, and exogenous information (e.g., Belobaba, 1987; Haensel and Koole, 2011; Schwartz and Hiemstra, 1997). User overrides of algorithmic predictions, that is, humans subjectively adjusting computer-generated predictions, are the third prong of this forecast combination approach to hotel daily occupancy forecasting. In this way, exogenous and relevant information, that is, knowledge beyond “what the system knows,” can be added. Schwartz et al. (2016: 270) explain that the trigger to override often includes the presence of additional knowledge, exogenous to the system, such as: “a large new convention in the destination, a large marketing campaign, the opening of a new tourism attraction, a reduction in the attractiveness of competing tourism attractions due to new circumstances, changes in the carrying capacity of airlines flying to the destination, ongoing or anticipated health/political/climate crisis, … an abrupt change in hotel reputation or hotel capacity.”
One approach to incorporating this additional information is to “inform the computer system” by adding (flagging) a special event in the system and indicating its anticipated impact (e.g., an increase of 3%). Another common, though not necessarily optimal, practice is to judgmentally adjust the system forecast. Several studies have found support for the practice of subjective “intervention,” noting that statistical forecasts can be made more accurate if experts judgmentally adjust them (Donihue, 1993; Fildes et al., 2009; McNees, 1990; Turner, 1990). The cognitive mechanisms underlying judgmental adjustments are, however, complex, and it has been argued that research so far has only “touched the tip of the iceberg” (Gönül, 2013). Moreover, a number of studies have demonstrated that forecasters often make unnecessary judgmental adjustments (Lawrence et al., 2006). Interestingly, training in forecasting methods does not seem to improve forecasters’ ability to extrapolate, and forecasters continued to make adjustments even when informed that they were reducing accuracy (Goodwin and Wright, 1994; Lim and O’Connor, 1995). Research on “big losses” (i.e., judgmental adjustments that significantly reduced forecasting accuracy) also identified an interesting phenomenon: after a first big loss, forecasters were still more likely to make another large adjustment, even though it increased the probability of a second big loss (Petropoulos et al., 2016).
Many of these findings come from the context of Stock-Keeping Unit (SKU) forecast adjustments, so their applicability to hotel occupancy forecasting should be assessed with caution; hotel RM forecasting differs substantially from SKU-based forecasting. For example, users of a forecast sometimes adjust it in order to “control” the forecast (Gönül, 2013). In SKU forecasting, the forecast creator and the forecast user are different, whereas in hotel RM, managers are usually co-creators of the forecast because they are involved in setting up forecast parameters by mapping special events in the system and indicating their impact on the forecast. Should substantial and consistent discrepancies be found in forecasting, revenue managers are encouraged to contact the system provider, “open a case,” and work directly with the system provider on investigating inconsistencies and re-calibrating the model (IDeaS, personal communication, July 2016). Thus, in contrast to managers in an SKU forecasting context, hotel revenue managers may be less inclined to adjust forecasts solely to gain a sense of control or ownership.
Several publications have explored user interventions in airline forecasting, an operational context with many similarities to hotel forecasting. An experiment at Northwest Airlines, in which a subset of markets was put on autopilot (no human interventions to the forecasts were allowed), found that this led to a decrease in revenue. Bach (1999) reported that analysts increased revenue by 7.9%. At America West Airlines, a study showed that flights with strong negative bias could be identified and corrected (Loew, 2000). At US Airways, Zeni (2003) estimated that revenue analysts’ interventions added up to 3% incremental revenue. A study by Mukhopadhyay et al. (2007) demonstrated that user interventions were valuable, especially when competitors added flights to certain routes. Finally, based on a comprehensive review of RM forecasting models, Weatherford (2016) concluded that human judgment in airline RM forecasting is indispensable.
Multiple tourism studies have explored judgmental forecasting in general and, more specifically, subjective adjustments of algorithmic forecasts. Recently, Lin (2019) demonstrated that expert adjustments of forecasts of tourist arrivals to Hong Kong were more beneficial when the algorithmic predictions had large variability. Similarly, earlier studies by Song et al. (2013), Lin (2013), and Lin et al. (2014) support the notion that accuracy increases when time-series algorithmic predictions of tourist arrivals are adjusted by expert judgment through a Delphi survey.
Despite its importance, little attention has been paid to judgmental adjustments in the context of hotel RM forecasting. Kimes (1999) investigated the effect of forecast updates, using four months of group forecast data of a single hotel chain in North America. The study found that the “frequency of updates was associated with increased forecast accuracy from 0-11 days before arrival” (Kimes, 1999: 1109). Other studies (Koupriouchina et al., 2014; Schwartz and Cohen, 2004) demonstrated that managerial overrides are beneficial though prone to biases, akin to claims in the generic forecasting literature (e.g., Bearden et al., 2008; Fildes et al., 2009). Most recently, a study by Schwartz et al. (2021) indicates that as learning occurs—as the computer learns and improves its forecasts—the effectiveness of overrides declines. In other words, the ability of subjective adjustments to improve the machine’s predictions diminishes as the accuracy of the machine predictions improves over time.
In this context, it is important to consider the typical characteristics of effective hotel RM practice where the demand for the same date of stay is re-forecasted as time passes. These repeated forecasts are necessary because they support decisions on how to optimally adjust RM controls in dynamic markets. It follows that hotel revenue managers repeatedly adjust, or override, these repeated forecasts as they deem necessary. This leads to the question, is repetition—the frequency of overrides—associated with accuracy? In other words, is an override likely to be more (or less) accurate because it was performed multiple times in the past for the same date of stay?
There are two opposing arguments in this regard. On the one hand, if the forecast for the same date of stay has been overridden multiple times, this might indicate higher perceived volatility, higher uncertainty, or an inability of the algorithm to adjust to new information and dynamic market conditions, hence the need for repeated overrides. The more difficult and challenging the situation, the less likely it is that an override will fully correct the algorithmic prediction and generate an accurate forecast. As such, it is reasonable to expect that a high frequency of overrides might be associated with a less accurate computer forecast. On the other hand, a higher frequency of revenue manager interventions might also indicate that the revenue manager is more competent, continuously monitoring and assessing the computer output and on “standby,” ready to adjust things as necessary. This might be a signal of “high quality” and therefore indicate that a higher number of overrides (superior RM practices) is associated with more accurate forecasts. In the absence of prior research or empirical evidence on the topic, it is difficult to assess which argument prevails, and consequently our second hypothesis is non-directional:
H2: The frequency of judgmental forecast adjustments is associated with different forecast accuracy (higher or lower).
Method
Data
A major Revenue Management System (RMS) provider, serving over 1.6 million hotel rooms in 124 countries, provided the anonymized data. A protocol, logical checks, and detailed code books were established to ensure correct interpretation and understanding of the extracted data and their underlying forecasting processes. Of the 196 variables available, 15 were found to be applicable for testing the hypotheses. Of the 25,003,446 cases obtained, 1,718,856 (6.9%) were incomplete. A further 3,202,617 (12.8%) cases were removed during data checks, bringing the final sample to 20,081,973 (80.3%) cases. Observations were thus excluded for two reasons: missing data and time-horizon incompatibility. The IT department of the RMS provider could not clarify the reasons for the missing data; possible reasons include corrupt files, software glitches, system upgrades, and user errors. Time-horizon incompatibility arose because not all subjective predictions’ horizons (that is, how far in advance they were made) matched the algorithmic predictions’ horizons. However, measures of the hypothesized accuracy improvement due to user overrides must be horizon specific: both the algorithmic prediction and the user override must be made on the same day and for the same future date of stay. Accordingly, observations with no such “counterpart” had to be removed.
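The horizon-matching rule can be sketched as an inner join on (hotel, date of stay, DTA). The sketch below assumes, hypothetically, that each source is available as records keyed by those three fields, with one user (final) forecast record per captured snapshot; keys present in only one source are dropped, mirroring the removal of observations with no counterpart. Identifiers, dates, and values are invented for illustration.

```python
def pair_forecasts(system, final):
    """Pair algorithmic and user forecasts made at the same horizon.

    system, final: dicts keyed by (hotel_id, stay_date, dta) holding the
    algorithmic forecast and the user's final forecast, respectively.
    Returns the paired records and the number of unmatched keys dropped.
    """
    common = system.keys() & final.keys()  # keys present in both sources
    paired = {key: (system[key], final[key]) for key in common}
    dropped = len(system.keys() | final.keys()) - len(common)
    return paired, dropped


# Hypothetical records: (hotel, date of stay, DTA) -> forecast rooms
system = {("H1", "2019-06-01", 7): 80, ("H1", "2019-06-01", 1): 88,
          ("H2", "2019-06-01", 3): 40}
final = {("H1", "2019-06-01", 7): 85, ("H2", "2019-06-01", 2): 42}

paired, dropped = pair_forecasts(system, final)
print(paired, dropped)  # {('H1', '2019-06-01', 7): (80, 85)} 3
```

Two keys exist only in the algorithmic data and one only in the user data, so three are dropped and a single horizon-matched pair remains.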
Number of hotels by region and type.
Hotel size by hotel type.
Dependent variables
Error measures are known to yield contradictory results (Davydenko and Fildes, 2013; Koupriouchina et al., 2014; Makridakis, 1993; Pereira, 2016; Trapero et al., 2011), so we followed the recommendations of Koupriouchina et al. (2014) in selecting three scale-independent forecast accuracy measures as dependent variables. First, given the widespread use of percentage-based accuracy measures, the non-aggregated APE (Absolute Percentage Error) was selected to allow comparison with existing work. Second, as APE is not symmetric and rewards under-forecasts (e.g., Armstrong and Collopy, 1992; Foss et al., 2003; Gneiting, 2011; Makridakis, 1993), its symmetric counterpart sAPE (symmetric Absolute Percentage Error) was chosen as a second measure, even though Goodwin and Lawton (1999) and Koehler (2001) showed that it is not invariably symmetric. Third, the relative measure LnQ (the log of the forecast-to-actual ratio) was selected because it is dimensionless, produces less skewed results than sAPE, and is thus considered superior to APE (Tofallis, 2015).
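The three measures are not defined in the text, so the sketch below assumes their common textbook forms, with A the actual and F the forecast: APE = 100·|A − F|/A, sAPE = 100·|A − F|/((A + F)/2) (one of several sAPE variants in the literature), and LnQ = ln(F/A). The examples reproduce the asymmetries described above: for the same forecast-to-actual factor, APE scores the under-forecast better; sAPE treats swapped A and F alike but not equal absolute misses around a fixed actual; LnQ gives equal magnitudes for the same factor.

```python
import math


def ape(actual, forecast):
    """Absolute percentage error: 100 * |A - F| / A."""
    return 100 * abs(actual - forecast) / actual


def sape(actual, forecast):
    """Symmetric APE (one common variant): 100 * |A - F| / ((A + F) / 2)."""
    return 100 * abs(actual - forecast) / ((actual + forecast) / 2)


def lnq(actual, forecast):
    """Log accuracy ratio ln(F / A); zero for a perfect forecast."""
    return math.log(forecast / actual)


# Over-forecast vs under-forecast by the same factor (1.25)
print(ape(100, 125), ape(125, 100))    # 25.0 20.0
print(sape(100, 125), sape(125, 100))  # identical values
print(lnq(100, 125), lnq(125, 100))    # about +0.223 and -0.223
# Same absolute miss of 20 rooms around a fixed actual of 100
print(sape(100, 80), sape(100, 120))   # not equal
```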
Independent variables
Two independent variables were selected. First, days to arrival (DTA) was measured at seven forecast horizons (0, 1, 3, 7, 21, 84, and 252 days prior to arrival). Second, the frequency of judgmental adjustments made for each specific day of arrival was selected. This count variable, which ranged from 0 to 28 and was treated as continuous, was labeled the number of overrides (numOVER).
In addition, five control variables were selected, two at the within-hotel level and three at the hotel level. The within-hotel variables were business type and special events, coded as BusinessType (0 = transient, 1 = group) and SpecialEvent (0 = no special event, 1 = special event). Business type was included because transient and group demand have different characteristics. Special events were included because they affect the reliability of forecasts based on historical data and, thus, accuracy. The hotel-level variables were hotel size, type, and region. HotelSize is a hotel’s capacity in rooms, ranging from 6 to 4028; as it does not have a meaningful zero, it was centered on its grand mean. HotelType was a categorical variable with four aggregated hotel type categories: airport hotels, casino hotels, city hotels, and resort hotels. It was dummy coded, with city hotels as the reference category. Both hotel size and hotel type were added as hotel-level controls because previous studies have shown these to be determinants of hotel performance (Claver-Cortés et al., 2007; Pine and Phillips, 2005). HotelRegion had the following categories: Asia, Australasia, Central America, Europe, Middle East and Africa, North America, and South America. It was also dummy coded, with Europe as the reference category. This variable was included because the development of revenue management may differ across geographical regions, as explained by Kimes (2008). She found that respondents from Asia were more likely than North American respondents to believe that the various technical aspects of RM need improvement, and suggested that this difference could be explained by the slower development of revenue management in Asia compared with North America.
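The two coding steps for the hotel-level controls, grand-mean centering of HotelSize and dummy coding of a categorical variable against a reference category, can be sketched in plain Python. The function names and the three example hotels are hypothetical, and we assume the grand mean is taken across hotels.

```python
def center(values):
    """Grand-mean centering: subtract the overall mean from every value."""
    grand_mean = sum(values) / len(values)
    return [v - grand_mean for v in values]


def dummy_code(categories, levels, reference):
    """One 0/1 indicator per non-reference level; the reference gets all zeros."""
    return [{lvl: int(cat == lvl) for lvl in levels if lvl != reference}
            for cat in categories]


sizes = [120, 300, 60]                  # hypothetical hotel sizes (rooms)
types = ["city", "resort", "casino"]    # hypothetical hotel types
print(center(sizes))                    # [-40.0, 140.0, -100.0]
print(dummy_code(types, ["airport", "casino", "city", "resort"], "city"))
```

After centering, a HotelSize value of 0 denotes a hotel of average size, and a city hotel is represented by all-zero HotelType indicators; HotelRegion would be coded the same way with Europe as the reference.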
Analytical approach
A variation of multilevel analysis commonly described as repeated measures design or longitudinal data multilevel analysis was applied (Heck et al., 2014; Hox et al., 2018). Forecasting accuracy at each DTA was treated as level 1, nested within hotels (level 2). Two subscripts were used: i for hotels and t for occasions of measurement. The observed status $Y_{ti}$ at time t for hotel i was assumed to be a function of a systematic change trajectory plus random error. We followed the five-step multilevel modeling approach for repeated measures data described by Hox et al. (2018).
First, an intercept-only (null) model was developed to test whether the grand-mean intercept for forecast error (APE, sAPE, LnQ) varied across hotels. The repeated measures (level 1) variance was estimated as 36,399.14, and the hotel-level (level 2) variance as 361.82 (Wald Z = 29.448, p < .001). The intra-class correlation (ICC) at the hotel level was estimated for APE as

$$\rho = \frac{\sigma^{2}_{u0}}{\sigma^{2}_{u0} + \sigma^{2}_{e}} = \frac{361.82}{361.82 + 36{,}399.14} \approx 0.0098,$$

that is, roughly 1% of the variance in APE lies between hotels.
Second, we added five level-1 (within-hotel) variables to the model: three explanatory variables (DTA, DTA quadratic, numOVER) and two control variables (BusinessType, SpecialEvent). Starting with DTA, the predictors were added one at a time. Although the quadratic DTA term was significant, its effect was very small, so it was excluded for reasons of parsimony. Models were also formulated to explore interactions between the level-1 variables. For example, because demand patterns can differ during special events, we tested whether the relationship between the number of overrides and forecasting accuracy depended on the presence or absence of a special event.
Third, three level-2 (hotel-level) control variables were added to the level-1 model of step 2. Except for a random intercept and the level-1 residual, all variables were still treated as fixed (i.e., coefficients were not allowed to differ across hotels).
Fourth, we assessed whether the intercept and linear component (DTA) varied randomly across hotels (i.e., forecasting accuracy and the coefficient of DTA varied across hotels). To assess whether forecast errors evolved differently over time for the different hotels, seventeen fixed effects, one random intercept (
Finally, a cross-level interaction between HotelSize (level-2 control variable) and DTA (level-1 explanatory variable) was added. A scaled identity covariance structure was specified at level 1 and an unstructured covariance structure at level 2.
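As a reading aid, the final (step 5) model can be sketched in the two-level notation used for repeated measures by Hox et al. (2018). This is our reconstruction from the description of steps 2 to 5, not a transcription of the estimated model, and the coefficient symbols are our own labels. HotelType enters as three dummies (airport, casino, resort; city hotels as reference) and HotelRegion as six (Europe as reference).

$$Y_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{DTA}_{ti} + \pi_{2}\,\mathrm{numOVER}_{ti} + \pi_{3}\,\mathrm{BusinessType}_{ti} + \pi_{4}\,\mathrm{SpecialEvent}_{ti} + \pi_{5}\,(\mathrm{numOVER}\times\mathrm{BusinessType})_{ti} + \pi_{6}\,(\mathrm{numOVER}\times\mathrm{SpecialEvent})_{ti} + e_{ti}$$

$$\pi_{0i} = \beta_{00} + \beta_{01}\,\mathrm{HotelSize}_{i} + \sum_{k=1}^{3}\gamma_{k}\,\mathrm{HotelType}_{ki} + \sum_{m=1}^{6}\delta_{m}\,\mathrm{HotelRegion}_{mi} + u_{0i}$$

$$\pi_{1i} = \beta_{10} + \beta_{11}\,\mathrm{HotelSize}_{i} + u_{1i}$$

$$e_{ti} \sim N(0,\sigma^{2}_{e}), \qquad \begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix} \sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^{2}_{u0} & \sigma_{u01} \\ \sigma_{u01} & \sigma^{2}_{u1} \end{pmatrix}\right)$$

Without the cross-level term $\beta_{11}$, the fixed part counts 17 coefficients ($\beta_{00}$, $\beta_{10}$, $\pi_{2}$ to $\pi_{6}$, $\beta_{01}$, $\gamma_{1}$ to $\gamma_{3}$, $\delta_{1}$ to $\delta_{6}$), consistent with the seventeen fixed effects mentioned for step 4. The single residual variance $\sigma^{2}_{e}$ corresponds to the scaled identity structure at level 1, and the freely estimated 2×2 matrix to the unstructured structure at level 2.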
To evaluate all models from steps 1 to 5, improvements in model fit were assessed with the deviance statistic (−2 log-likelihood), whose difference between nested models is approximately distributed as χ² with degrees of freedom equal to the difference in the number of estimated parameters, and by examining the Akaike and Bayesian Information Criteria (AIC and BIC).
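The arithmetic behind these model comparisons can be sketched with the standard library alone. In this minimal sketch the function names are hypothetical; the chi-square tail probability uses the textbook series and continued-fraction evaluation of the regularized incomplete gamma function; and we assume deviances from full maximum likelihood (REML deviances are not comparable across models with different fixed parts) and the number of observations as the BIC sample size, which is itself a modeling choice in multilevel settings.

```python
import math


def chi2_sf(stat, df, eps=1e-14, max_iter=10_000):
    """Upper-tail probability P(X >= stat) for X ~ chi-square(df).

    Computes the regularized upper incomplete gamma Q(df/2, stat/2): a series
    for the lower function when x < a + 1, a continued fraction otherwise.
    """
    a, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    log_prefix = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(max_iter):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefix)
    # Continued fraction for Q(a, x), modified Lentz method.
    tiny = 1e-300
    b = x + 1 - a
    c = 1 / tiny
    d = 1 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = d if abs(d) >= tiny else tiny
        c = b + an / c
        c = c if abs(c) >= tiny else tiny
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < eps:
            break
    return math.exp(log_prefix) * h


def compare_nested(dev_simple, k_simple, dev_complex, k_complex, n_obs):
    """Deviance (likelihood-ratio) test plus AIC/BIC for two nested models.

    dev_*: deviance (-2 log-likelihood); k_*: number of estimated parameters;
    n_obs: sample size used in the BIC penalty.
    """
    chi2 = dev_simple - dev_complex
    df = k_complex - k_simple
    return {
        "chi2": chi2,
        "df": df,
        "p": chi2_sf(chi2, df),
        "AIC": (dev_simple + 2 * k_simple, dev_complex + 2 * k_complex),
        "BIC": (dev_simple + k_simple * math.log(n_obs),
                dev_complex + k_complex * math.log(n_obs)),
    }


# Hypothetical deviances for a model and its extension by one parameter
print(compare_nested(1000.0, 3, 990.0, 4, 500))
```

A more complex model is preferred on each criterion when its deviance drop is significant and its AIC and BIC are lower than those of the simpler model.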
Results
Descriptive statistics
Figure 1 illustrates the relationship between the mean forecast error and forecasting horizon for all forecasts. On average, the forecasting error decreases over time in a non-linear fashion.
Figure 1. Relationship between mean forecast accuracy and days to arrival.
Means and standard deviations of absolute percentage error, symmetric absolute percentage error and LnQ by days to arrival.
Note. Standard deviations are in parentheses.
Forecast errors by hotel type.
Multilevel analysis
Results of the multilevel regression models for APE.
Note. Standard errors are in parentheses.
DTA: days to arrival; numOVER: number of overrides. BusinessType (0 = transient, 1 = group), SPECIAL (0 = no special event, 1 = special event)
*p < .05, **p < .01, ***p < .001
Results of the multilevel regression models for sAPE.
Note. Standard errors are in parentheses.
DTA: days to arrival; numOVER: number of overrides. BusinessType (0 = transient, 1 = group), SPECIAL (0 = no special event, 1 = special event)
*p < .05, **p < .01, ***p < .001
Results of the multilevel regression models for LnQ.
Note. Standard errors are in parentheses.
DTA: days to arrival; numOVER: number of overrides. BusinessType (0 = transient, 1 = group), SPECIAL (0 = no special event, 1 = special event)
*p < .05, **p < .01, ***p < .001
In model 9 of Table 6 (sAPE), the forecast error at DTA = 0 is 5.47 (with all other predictors at zero), and it increases by 0.14 with each additional day to arrival. In other words, forecast error increases as DTA increases and decreases as the date of arrival nears. This supports Hypothesis 1 (forecast accuracy improves as the date of arrival nears). Furthermore, to explore whether this effect of DTA depends on hotel size, a cross-level interaction between hotel size and DTA was estimated. The effect was small and non-significant, suggesting that the impact of DTA on forecast error is independent of hotel size.
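To make the linear DTA term concrete, the sketch below evaluates only the intercept (5.47) and DTA slope (0.14) quoted for model 9 of Table 6 at the seven study horizons. It holds every other predictor at zero (transient business, no special event, no overrides, city hotel in Europe, mean hotel size), ignores the random effects, and illustrates the fitted line rather than reporting a result; the names are ours.

```python
# Coefficients quoted in the text for model 9, Table 6 (sAPE)
INTERCEPT = 5.47  # sAPE at DTA = 0 with all other predictors at zero
SLOPE_DTA = 0.14  # change in sAPE per additional day to arrival

HORIZONS = (0, 1, 3, 7, 21, 84, 252)  # study horizons in days


def predicted_sape(dta):
    """Fixed-part prediction using only the intercept and the DTA slope."""
    return INTERCEPT + SLOPE_DTA * dta


for dta in HORIZONS:
    print(f"DTA {dta:>3}: {predicted_sape(dta):5.2f}")
```

At 252 days the fitted line gives about 40.75, compared with 5.47 on the date of stay.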
Focusing on the effect of the number of user overrides on forecasting accuracy, forecast error increases by 0.13 for each additional override. However, when the within-hotel control variable business type is considered, a more nuanced picture emerges: the effect of the number of overrides on accuracy depends on (i.e., is moderated by) business type. As the positive BusinessType coefficient shows, the forecast error (sAPE) for group business is, on average, 16.53 higher than that for transient business. However, the negative coefficient of the interaction term (numOVER × BusinessType) indicates that, for group business, each additional user override reduces forecast error by 1.31 relative to its effect for transient business. To a much lesser degree this also applies to special events, in that in the presence of a special event each user override reduces forecast error by a further 0.09. This provides partial support for Hypothesis 2 (the frequency of judgmental forecast adjustments is associated with forecast accuracy): a higher number of user overrides is beneficial for group business but damaging for the transient segment.
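Because the model includes numOVER × BusinessType and numOVER × SpecialEvent terms, the per-override effect is a sum of coefficients that differs by segment. The following sketch, with hypothetical function and variable names, combines them for each BusinessType and SpecialEvent combination. Only the two interaction values (−1.31 and −0.09) come from the text; the main effect `b_over` is a placeholder input that should be read from the same fitted model.

```python
def override_slope(b_over, b_over_group, b_over_event, group, event):
    """Per-override change in forecast error for given 0/1 indicators.

    b_over:       numOVER main effect (slope for transient, non-event dates)
    b_over_group: numOVER x BusinessType interaction coefficient
    b_over_event: numOVER x SpecialEvent interaction coefficient
    group, event: BusinessType and SpecialEvent indicators (0 or 1)
    """
    return b_over + b_over_group * group + b_over_event * event


B_OVER_GROUP = -1.31  # interaction coefficient quoted in the text
B_OVER_EVENT = -0.09  # interaction coefficient quoted in the text
b_over = 0.0          # placeholder: take the main effect from the fitted model

for group in (0, 1):
    for event in (0, 1):
        slope = override_slope(b_over, B_OVER_GROUP, B_OVER_EVENT, group, event)
        print(f"group={group} event={event}: {slope:+.2f} per override")
```

Whatever the main effect is, the group slope differs from the transient slope by exactly the interaction coefficient, which is why the interaction term, not the main effect alone, carries the segment difference.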
Analyzing the hotel-level control variables, hotel size is found to positively impact forecast accuracy. Moreover, resort hotels have higher and casino hotels have lower forecast accuracy than city hotels. Airport hotels and city hotels do not differ in forecast accuracy. Hotels in North America and Australasia have a higher forecast accuracy than hotels in Europe, which in turn have a higher accuracy than hotels in Asia, Middle East and Africa, and South America. There is no difference in the forecast accuracy of hotels in Europe and in Central America.
The findings are robust for all three accuracy measures (APE, sAPE, LnQ), as shown in Tables 5–7. Comparing the successive five modeling steps, each subsequent and more complex model (1–9) is preferred based on the χ² difference test and the AIC and BIC statistics.
Discussion
The findings of this large-scale investigation suggest that the accuracy of hotel occupancy forecasts improves as time nears the date of stay, and that the effect of user override frequency on forecasting accuracy depends on the business segment. Frequent overrides to the group segment improve forecasting accuracy across forecasting time horizons, whereas the opposite holds for the transient segment. The study thereby contributes to the growing body of research on the impact of judgmental adjustments in forecasting, in particular across time horizons.
Similar to Franses and Legerstee (2011), Van den Broeke et al. (2019: 41), who also examined SKU forecasting, found that “the closest time horizon does not necessarily produce the most accurate adjusted forecast,” in particular when the statistical forecast is updated over the time horizon. The researchers argued that forecasters may put too much weight on their own judgment relative to the statistical model. They also found that “adjustments far away from the sales point contribute little to, or damage, forecast accuracy” (Van den Broeke et al., 2019: 39). Our findings, from a hotel revenue management operating environment, provide new insights. We found a significant relationship between the frequency of user overrides and forecasting accuracy at seven different forecasting time horizons (0, 1, 3, 7, 21, 84, 252 DTA). The impact of overrides, however, depends on the business type. A higher frequency of judgmental adjustments improves the forecast accuracy of group business but damages transient forecast accuracy. In other words, RMS users appear to struggle to improve computer-generated predictions of transient business, but add value when they override the RMS group business forecast. This is an important insight, as group business plays a key role in occupancy and revenue forecasting. According to Anderson and Xie (2010: 57), “many city hotels have upwards of 50 percent of their reservations blocked far in advance by large groups at discounted rates for conferences or special events.” As such, group business decisions have a direct impact on room availability for the transient segment: once rooms are blocked for a group, they become unavailable to other segments. Decisions about group acceptance thus directly affect the revenue management optimization cycle for the entire hotel. For full-service hotels, the group segment may represent more than half of total revenue (Cross et al., 2009; Hormby et al., 2010).
There are several reasons why modeling group demand is difficult in hotel RM. The first is linked to the nature of group business: “groups demand blocks of rooms, introducing a combinatorial aspect to the optimization problem” (Hormby et al., 2010: 49). A related reason is that group cancellations are more variable and statistically uncertain (Sierag et al., 2017). A third reason is that the information needed to calculate unconstrained demand is not always available or reliable. For transient demand, statistical methods can be used to infer unconstrained demand. For group demand, however, lost and turndown data need to be recorded, and in practice the sales team does not always log or complete them (Hormby et al., 2010). This may explain why judgmental adjustments add value to group forecasting.
These modeling challenges have significant consequences. El Gayar et al. (2011) point out that most existing forecasting models in hotel revenue management ignore group reservations and assume a predefined probability distribution to represent group arrivals. They argue that ignoring groups is a critical limitation, as groups can sometimes constitute the main guest segment. Moreover, assuming a probability distribution for group demand can lead to inaccurate revenue management decisions, because hotel arrival characteristics tend to change and evolve dynamically. For example, in a case study of a single small hotel, Sierag et al. (2017) found evidence of an inhomogeneous Poisson form for the probability distribution function. There are also indications that, in the absence of user overrides, these modeling challenges result in higher forecasting errors for group forecasts. In a study of one hotel chain, Kimes (1999) reports substantial group forecasting errors that vary with the time horizon, with a MAPE of up to 40% two months before arrival. Although no direct comparison was made with the transient segment, the errors Kimes (1999) found for group business were much higher than the forecast errors reported by other studies at the total hotel level (Schwartz et al., 2016; Yuksel, 2007) or by room type (Pereira, 2016).
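MAPE, the error measure in which Kimes (1999) reports group forecasting errors, averages the absolute forecast error expressed as a percentage of the actual value. The minimal Python sketch below illustrates the conventional definition; the function name and the room counts in the usage line are hypothetical illustrations, not data from this study or from Kimes (1999). Note that MAPE is undefined when an actual value is zero, which can occur on dates with no group business.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned in percent.

    Hypothetical helper for illustration; assumes equal-length,
    non-empty sequences and no zero actual values.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Absolute error of each forecast relative to the realized value
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical group rooms actually picked up vs. forecast for three dates
    print(mape([50, 40, 80], [70, 40, 60]))
```

Here the three relative errors are 40%, 0%, and 25%, so the MAPE is their mean, roughly 21.7%.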
Revenue managers are known to spend substantial time monitoring and adjusting group forecasts. Noone and Hultberg (2011) found evidence that hotel revenue managers use multiple sources of information for group forecasting, including information about specific group characteristics and external information. They also revealed that revenue managers explicitly refer to communication with the sales team as part of their group forecasting activities. Insights obtained from the sales team, who are in direct contact with the group, contain important, up-to-date information about the final group size and any changes affecting final group demand. Because group demand modeling has important limitations, revenue managers adjust group forecasts using multiple sources of information exogenous to the RMS. This third prong of the forecast combination approach to hotel occupancy forecasting, acting on exogenous and relevant information (i.e., knowledge beyond “what the system knows”), is where judgmental adjustments seem to add value to the RMS (Schwartz et al., 2016).
Limitations and future research
Although subject to limitations, our study points to multiple promising avenues for further research. The first relates to a key limitation of this study: given the enormous size of the data set and constrained computational capacity, the study analyzed only seven forecasting horizons. Future research with access to more powerful computing resources and greater data storage capacity could extract more data and analyze a wider range of forecasting horizons. This could improve the model estimates, in particular their non-linear form, as our findings appear to agree with previous research on the relationship between time and forecast accuracy for both linear and non-linear components (e.g., Schwartz, 1998; Pereira, 2016; Rajopadhye et al., 2001).
A further area for improvement concerns the estimation of the polynomial functions. Because the components of a polynomial function are often correlated, Heck et al. (2014) recommend using orthogonal polynomials when a trajectory is estimated. We did not adopt this approach because it requires the observations to be equally spaced, which our seven horizons are not. Future research might follow Hox's (2010) suggestion to use orthogonal polynomials even when repeated measurements are not equally spaced, and contrast the results with the non-orthogonal findings.
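One common way to obtain orthogonal polynomial codes for unequally spaced time points is to centre the time variable, build the raw basis (1, t, t², ...), and orthonormalize its columns with a QR decomposition, in the spirit of R's poly(). The Python/NumPy sketch below is an illustration under that assumption, not the procedure used in this study; the function name orthogonal_poly is hypothetical. It builds linear and quadratic codes for the seven horizons analyzed here and contrasts their correlation with that of the raw terms.

```python
import numpy as np


def orthogonal_poly(x, degree):
    """Orthogonal polynomial codes for arbitrary (possibly unequally spaced) x.

    Hypothetical helper: centres x, builds the raw basis [1, x, x^2, ...],
    and orthonormalizes it via QR. The constant column is dropped, so each
    returned column has mean zero and unit length, and all columns are
    mutually orthogonal.
    """
    x = np.asarray(x, dtype=float)
    if degree >= len(np.unique(x)):
        raise ValueError("degree must be less than the number of distinct x values")
    xc = x - x.mean()
    raw = np.vander(xc, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    q, _ = np.linalg.qr(raw)  # reduced QR: q has orthonormal columns
    return q[:, 1:]  # drop the column proportional to the intercept


if __name__ == "__main__":
    dta = np.array([0, 1, 3, 7, 21, 84, 252], dtype=float)  # horizons in DTA
    print("raw t vs t^2 correlation:", np.corrcoef(dta, dta**2)[0, 1])
    codes = orthogonal_poly(dta, degree=2)
    print("orthogonal codes correlation:", np.corrcoef(codes[:, 0], codes[:, 1])[0, 1])
```

The raw linear and quadratic terms are strongly correlated across these horizons, whereas the orthogonal codes are uncorrelated by construction; in a multilevel model the codes would replace the raw time terms for each repeated measure. The signs of the QR columns are arbitrary, which flips the sign of the corresponding coefficients but not the fit.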
Another limitation concerns the aggregation level. Our understanding of override performance and its implications for revenue management effectiveness could greatly benefit from a study that explores overrides at a more granular level. For example, such a study could extend the scope beyond the group/transient dichotomy addressed here and contrast performance across the various commonly used sub-segments and distribution channels. A related future improvement could be to examine the role of additional operational factors in shaping the effect of time on the accuracy of judgmental adjustments. These include the size of overrides, their direction (positive or negative adjustment), and the experience of the expert overriding the forecast.
In extreme crises such as the recent COVID-19 pandemic, the historical demand patterns on which algorithmic predictions rely are effectively useless, as was demonstrated during 2020 and 2021 in all but a few markets globally. This may also hold for the period following a crisis, until stable demand patterns are reestablished that allow the desired level of algorithmic forecast accuracy. In these periods, human assessment proves invaluable. Future research could define and quantify the level of crisis-induced pattern disruption, and then assess how that level affects the relative contribution of user overrides to forecast accuracy. In addition, when historical patterns prove useless, advance booking data become considerably more important as a source of information. However, the related issue of shrinking booking curves, with bookings made closer to the date of stay, may limit this benefit. What works remains to be explored in future research. Relatedly, this study did not explore the impact of the characteristics of the user overriding the computer forecast. Relevant experience, skills, and even personality traits such as self-confidence could affect the accuracy of the subjective adjustment, which is an interesting topic for future research.
Finally, perhaps the most promising research direction in this domain of subjective overrides of machine predictions is to assess the overall effectiveness of overrides. In other words, it remains to be established to what degree (if at all) overrides improve the accuracy of algorithmic predictions, and under what circumstances subjective overrides are most likely to improve forecast accuracy in hotel revenue management.
Acknowledgments
The authors gratefully acknowledge the insights and continuous support of Dr Helen Pluut and Dr Peter van der Zwan (Leiden University) who greatly assisted this work.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Netherlands Organisation for Scientific Research (NWO) (023.002.090).
