Abstract
Revenue management (RM) systems forecast demand and optimize prices to maximize a hotel’s revenue. The RM function relies on coordination between a system and an analyst: systems provide recommendations, while analysts review the forecasts and prices and either approve them or make subjective adjustments. In many cases, the recommendations are a “black box” that offers little insight into how they are derived. This article proposes the k-Nearest Neighbor (k-NN) algorithm as a forecasting approach that can turn the “black box” into a “glass box.” The benefits of k-NN are discussed in detail and compared with those of neural networks. The analysis is conducted on 35 hotels in partnership with a leading RM service provider. The results indicate similar performance for both techniques, prompting an important discussion of model evaluation beyond accuracy. In particular, the article discusses some of the unique advantages k-NN provides for the RM discipline.
Introduction
Revenue Management (RM) research continues to advance in the areas of forecasting (Fiori and Foroni, 2020, Webb et al., 2020) and optimization (Brunato and Battiti, 2020, Petricek et al., 2021) with the goal of improving pricing decisions. While these efforts are warranted, the fundamental components of the system remain masked, and as such, less useful to users. These algorithms run behind the scenes, producing recommendations with little explanation as to how they are derived. While RM systems have long been considered a “black box” (Skugge, 2004, Varini, 2012), the recent emergence of data science has further concealed the driving elements of system-based recommendations. For instance, neural networks have become one of the most popular tools for improving forecasts, but the increase in accuracy comes at the cost of transparency and diminished managerial insights.
For many firms, RM systems operate in conjunction with analysts, whose job is to oversee system guidance and implement or override recommendations (Egan and Haynes, 2019, Oancea, 2018, Rennie et al., 2021, Weatherford, 2009, Westermann, 2015). This structure is preferable because analysts may have market knowledge the system is unaware of (Egan and Haynes, 2019). When this occurs, analysts make subjective adjustments based on their knowledge and experience. Often, however, analysts do not know what the system has already accounted for and may lack the time and tools to make calculated adjustments (Oancea, 2018). Industry has recognized this challenge: discussing RM systems, Richard Ratliff, the Executive Scientist for Sabre Labs, recently stated that “it is incumbent on researchers to make sure that the results are a glass box instead of a black box” (Walson, 2022).
This paper strives to answer this call by reconsidering how RM forecasts are evaluated. Rather than focusing on accuracy alone, which may encourage “black-box” approaches, Voulgaris (2019) suggests considering a broad range of factors, including methodology, accuracy, and usefulness. This framework captures the big picture of an RM forecast, encompassing developers, optimization algorithms, and end-users, with the overall goal of improving the RM function.
Formally, this paper explores the application of the k-Nearest Neighbor (k-NN) algorithm under the evaluation criteria suggested by Voulgaris (2019). The k-NN algorithm uses pattern matching to identify a subset of similar historical observations from which it generates a prediction. The primary advantage of k-NN is that the subset of observations used to generate the forecast is known. From an RM perspective, managers could use these similar dates to enhance their decision-making and, in turn, their trust in the system. Furthermore, k-NN identifies neighbors through a similarity score that is easy to calculate and comprehend, which mitigates the methodological obstacles associated with more complex models. Ultimately, the interpretability of the technique and its output allows for a transparent forecasting approach, unveiling one portion of the RM function, namely, forecasting.
The investigation, conducted in partnership with a third-party hotel RM service provider, compares the performance of k-NN to a multi-layer perceptron neural network (ANN). While prior research has suggested models similar in structure to k-NN, these approaches were explored only in a rudimentary way (Schwartz and Hiemstra, 1997, Webb et al., 2020). This study constitutes the first large-scale investigation of k-NN for RM forecasting. Specifically, forecasts were developed for 35 properties across 30 horizons, comparing five different reservation features and evaluating 17 different values of k. In total, 1050 models (35 properties × 30 horizons) are identified for each technique (k-NN, ANN) and are statistically compared using both parametric (matched-pairs t-test) and non-parametric (Wilcoxon signed-rank test) approaches. The results show that when both models are provided with the same information, they perform similarly in most instances, highlighting the importance of evaluating methods across a range of criteria. The paper discusses the practical value of k-NN for industry practitioners who may consider implementing the model to augment RM decisions. Furthermore, the performance of k-NN helps to address a key theoretical question regarding the data required to produce accurate RM forecasts (Weatherford and Kimes, 2003).
The article is organized as follows. The literature review provides an overview of RM systems and their forecasting applications, highlighting human interaction in this process and the prevalence of overrides. This leads to a discussion of the forecasting evaluation framework outlined by Voulgaris (2019), considering methodology, accuracy, and usefulness in the context of the k-NN and ANN approaches tested in this study. The methodology section details the data provided by our RM service provider, as well as our model development and evaluation. The results section compares the performance of the models, followed by a discussion of the findings. Finally, the article provides concluding remarks on practical and theoretical implications, as well as limitations and directions for future research.
Literature review
Revenue management systems
Firms employing RM systems may build company-specific algorithms or employ vendor-based solutions. In practice, the algorithms routinely receive input data (e.g., reservations and historical sales) to forecast demand, which is then used to allocate inventory at optimal prices. From the user perspective, the system provides updated forecasts and prices for each day; however, the underlying factors (algorithms, variables) driving these forecasts and prices are typically complex and unknown (Oancea, 2018).
Firms employing RM systems have teams overseeing the system recommendations (Egan and Haynes, 2019, Oancea, 2018, Rennie et al., 2021, Skugge, 2004, Westermann, 2015, Weatherford, 2009). This human-supervised structure is necessary because the algorithms optimize solely on the data supplied to the model. However, external forces such as events, economic conditions, travel restrictions, and weather often affect demand in ways that systems are neither designed to capture nor able to recognize (Egan and Haynes, 2019, Rennie et al., 2021, Skugge, 2004). In the simplest setting, a local event may increase demand for a property unbeknownst to the system, which then miscalculates its forecast and subsequent pricing recommendations (Rennie et al., 2021). In these instances, revenue managers can override the system by manually adjusting the forecasts and rates to improve performance.
In the context of human oversight and intervention, one of the analyst’s primary challenges is to determine when to trust the system and when to question its recommendations (Egan and Haynes, 2019, Rennie et al., 2021, Skugge, 2004). Revenue managers face decisions about prices for different room types as much as a year in advance, sometimes across many properties. Under these circumstances, managers must make a large number of decisions every day. While research continues to advocate automating RM decision-making through big data and analytics (Wang et al., 2015, Kimes, 2017), Egan and Haynes (2019) reveal that, in practice, decisions are not fully automated. In an industry survey, they find that the majority of hotel RM decisions are made jointly by the system and the analyst. Their findings indicate that revenue managers use systems as a guide while relying on local market knowledge to override the system for more tactical pricing strategies. Furthermore, research suggests that overrides are necessary for optimization when expected demand deviates from system-based predictions (Oancea, 2018, Rennie et al., 2021).
Recent research suggests that overrides are a common practice in RM applications (Egan and Haynes, 2019, Rennie et al., 2021). In a survey of airline RM systems, less than 20% of flights were found to be managed in a “hands-off” manner (Weatherford, 2009). In the hotel segment, Schwartz et al. (2021) observed that 39% of 20 million forecasts were overridden, while Koupriouchina et al. (2022) reported up to 28 overrides for an individual day. In other words, the decision to override algorithmic predictions is frequent and has significant implications for hotel performance. At the same time, the override decision is subjective and based on intuition: analysts must combine system recommendations with their own knowledge and experience (Egan and Haynes, 2019, Rennie et al., 2021, Schwartz et al., 2021). Interestingly, Koupriouchina et al. (2022) find that overrides for transient demand tend to be less accurate than system-generated forecasts, while group overrides were beneficial. This highlights the importance of the feedback loop and the availability of information, since the analyst may have information about a group that the system is unaware of. The transient segment, however, is less well known, and the sheer number of individual decisions (price, segment, distribution channel) does not allow analysts to review every rate in detail.
Adding to this complexity, the information RM systems provide to analysts is limited. Systems generally indicate a forecast and price, while more advanced systems may consider individual segments and competitor rates (Hertzfeld, 2022). While these components are the fundamental elements of the RM system, how the forecasts and rates are determined is largely unknown to the analyst. For this reason, the industry has labeled these systems a “black box” (Pereira, 2016, Varini, 2012, Westermann, 2015). Acknowledging the analyst’s role, overrides should be based on the analyst’s ability to recognize what the algorithms can solve and what the system lacks (Westermann, 2015, Oancea, 2018). In many cases, decisions are based on market perceptions and year-over-year tracking due to a lack of information and algorithmic insight (Egan and Haynes, 2019). Therefore, analysts may misjudge what the system has accounted for and make incorrect adjustments that hurt performance. In the worst-case scenario, they may begin to mistrust the system and stop using it altogether (Skugge, 2004).
It is therefore imperative that RM begin to consider approaches that unveil the fundamental components of the system to assist revenue managers in their decision-making. Improvements in this domain may lead to meaningful contributions that augment analysts’ decisions and improve the RM function. Accordingly, this paper considers k-NN, a fully transparent forecasting technique whose predictions expose their underlying structure and logic, and which can therefore provide useful information, such as relevant reference dates, for revenue managers to consider in their decision-making.
Revenue management forecasting
RM performance is directly linked to the accuracy of the forecast, as these predictions feed the optimization component of the system (Talluri and Van Ryzin, 2004: 407). Research in RM forecasting has consistently evaluated the applicability of new models to the discipline (Fiori and Foroni, 2020, Pereira and Cerqueira, 2021, Lee, 2018, Ampountolas and Legg, 2021). As methods continue to evolve, much of the improvement has come from neural networks, one of the foundational model families in machine learning. Over the last decade, advances in computing power have allowed neural networks to be widely applied in hotel RM (Huang and Zheng, 2021, Lee et al., 2020, Wang and Duggasani, 2020, Webb et al., 2020). In general, recent studies suggest that neural networks are more accurate than many traditional approaches.
Advances in computing power have also paved the way for other data mining algorithms. k-Nearest Neighbor (k-NN) is a filtering algorithm that identifies the k most similar historical records and uses those observations to make a prediction. The technique’s underlying premise is that similar records should have similar outcomes. k-NN is not new, but unlike ANN it has received little attention in the RM discipline and in hospitality in general. So far, only simple explorations in hotel forecasting exist (Schwartz and Hiemstra, 1997, Webb et al., 2020). This lack of interest is likely due to computational requirements: k-NN compares the observation in question against a historical database of observations, which is computationally intensive. As such, it requires processing capabilities that were not commonly affordable or accessible in the hospitality industry. Recent advances in computing make it more feasible to update k-NN forecasts in real time across many dates and horizons.
Forecasting evaluation
Forecasting research evaluates model performance by identifying the models that produce the lowest error. This is because accurate models allow systems to make informed decisions, and in the case of RM, it allows further optimization of hotel performance. While accuracy is the primary indicator for model selection, the choice between models is not always clear. For instance, a model may work well under certain market conditions but not in others. Alternatively, the performance may be similar across models with only minimal differences in error. Furthermore, the results may differ based on the error metric selected (Koupriouchina et al., 2014). Therefore, it is important for forecasters to consider a range of criteria.
The importance of multiple criteria is discussed by Makridakis et al. (2008: 534), who state that “selecting any method should not, under any circumstances, be solely based on the method’s accuracy or its statistical sophistication or complexity”. Rather, several considerations, such as the availability of data and its characteristics, the frequency of the forecast, and a general understanding of the model’s assumptions, limitations, and results, are critical to successful model development (Makridakis et al., 2008: 534). A better understanding of the model allows managers to assess when to trust the prediction and when to apply their own judgment. More recently, Voulgaris (2019) investigated the elements of a good forecast, synthesizing considerations across scholars and disciplines. The synthesis provides a framework of three elements to consider when evaluating forecasting techniques, namely, (1) methodology, (2) accuracy, and (3) usefulness. Methodology describes the process of constructing the forecast, such as the input data and the training and development of algorithms. Accuracy is concerned with methods that produce the lowest error, as well as the reliability of the estimates. Finally, usefulness describes the degree to which the forecast influences users to make better decisions, representing the applicability of the approach. The concept and measurement of accuracy are well understood and have been the focus of forecasting research for years. One of this study’s contributions is to expand the evaluation discussion in the domain of RM by contrasting two models, k-NN and ANN, in the context of all three criteria.
Forecasting evaluation—criteria 1: Methodology
According to Voulgaris (2019), the methodology criterion includes the key components of designing a forecasting model. First, the forecaster must consider what data is relevant and readily accessible for estimating the forecast. Often, raw data is aggregated, transformed, or standardized to meet model requirements and assumptions. Subsequently, the forecaster must identify the methods best suited to the data structure and take appropriate steps for model estimation. In many cases, the development time, data required, and statistical knowledge may force management to adopt less sophisticated techniques (Makridakis and Wheelwright, 1980: 29). In the models presented here, we consider data that is commonly held by all hotels: reservations on the books (OTBs). OTB reservations have been widely used in RM and are a valuable input for forecasting models, as they generally reflect the macro and micro factors of a market (Tse and Poon, 2015). While both models can accommodate these data points, their development processes are quite different.
ANNs provide the ability to capture complex relationships with a highly flexible structure and design. However, ANN model estimation requires several decisions during the development stage. First, the user must specify the model architecture, which comprises the input variables, the number of hidden nodes and layers, the activation functions, and the output layer. To date, there is no established procedure for identifying the optimal architecture, and it is common to use a trial-and-error approach (Shmueli et al., 2017: 285). Researchers and data scientists have the flexibility to lengthen training time or add more hidden nodes and layers (deep learning) to increase the accuracy of the prediction. While the error can be reduced through added dimensions and iterations, doing so also introduces the risk of overfitting. This occurs when the weights are updated to the point that they mimic the patterns in the training set. The result is very low estimation errors but poor performance on new data (the holdout sample), which means the model lacks generalizability.
Ultimately, designing a neural network is not trivial and requires significant testing. It is critical to monitor training and validation performance, to limit the number of iterations, and to estimate with a large amount of data so that only general patterns are captured (Shmueli et al., 2017: 283). To further minimize the risk of overfitting, forecasters may also limit the complexity of the network (Shmueli et al., 2017: 285). Overall, many hotel employees have no formal training in this development process, which makes the practical understanding and interpretation of results challenging when companies strive to employ advanced machine learning models such as neural networks.
For k-NN, the developer must select the variables, a similarity score, and k when estimating the model. Similarity is determined by a distance function (e.g., Euclidean, Manhattan), which is used to compare variables across observations. Equation (1) shows a similarity score using Euclidean distance between observation x (the observation the model is attempting to predict) and the known historical record u. The records with the smallest distances are the most similar, and the k closest historical records are used for prediction. From an estimation standpoint, k-NN is straightforward and understandable. The model compares values at common points, in this case along the booking curve, to identify similar historical booking curves. The final numbers of rooms sold for these historical curves are averaged to generate a forecast. In the final step, errors are compared across a range of k to identify the optimal number of neighbors. Unlike ANN, k-NN does not require advanced statistical training to choose the number of nodes, layers, and activation functions.
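For reference, one conventional way to write a Euclidean similarity score of this kind, together with the averaging step described above, is shown below. The notation is introduced here for illustration and may not match the paper's Equation (1) exactly: x_i and u_i denote the i-th of p compared OTB values (p = 14 in the final models), N_k(x) is the set of k historical records with the smallest distance to x, and y_u is the final number of rooms sold for historical date u.

```latex
d(x, u) = \sqrt{\sum_{i=1}^{p} \left( x_i - u_i \right)^2},
\qquad
\hat{y}_x = \frac{1}{k} \sum_{u \in N_k(x)} y_u
```

A Manhattan variant would replace the square root of summed squares with \sum_{i=1}^{p} |x_i - u_i|.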
A second notable advantage of k-NN concerns the performance of the model over time. If the underlying data structure changes, forecasting performance may diminish (Webb et al., 2022), and techniques such as the multi-layer perceptron may require retraining to account for the new relationships. This is because their weights were estimated from previously observed data points. When new observations enter the model with values outside the expected range, the model may produce inaccurate forecasts and lead to poor decisions. Conversely, k-NN can simply add these new records to its historical database. Future observations that follow the new patterns will identify the added records among the k most similar instances and automatically use them in the prediction. Therefore, k-NN may be more adaptable to the dynamic nature of the travel industry.
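The estimation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the service provider's implementation: the class name `KNNBookingForecaster`, its `add` and `forecast` methods, and the toy booking curves are all hypothetical. It averages final rooms sold across the k nearest curves by Euclidean distance, returns the neighbor dates alongside the forecast, and shows that absorbing new behavior amounts to appending a record, with no re-estimation step.

```python
import math
from dataclasses import dataclass, field


@dataclass
class KNNBookingForecaster:
    """Hypothetical k-NN forecaster over booking curves (illustrative sketch)."""

    k: int
    # Each record: (stay-date label, OTB feature vector, final rooms sold).
    history: list = field(default_factory=list)

    def add(self, stay_date, features, final_rooms):
        # New behavior is absorbed by appending a record; nothing is re-estimated.
        self.history.append((stay_date, list(features), final_rooms))

    def forecast(self, features):
        # Euclidean distance from the target curve to every stored curve.
        scored = sorted(
            ((math.dist(features, feats), date, final)
             for date, feats, final in self.history),
            key=lambda rec: rec[0],
        )
        neighbors = scored[: self.k]
        # Forecast = mean final rooms sold across the k nearest curves.
        prediction = sum(final for _, _, final in neighbors) / len(neighbors)
        # The neighbor dates are returned so an analyst can inspect them.
        return prediction, [date for _, date, _ in neighbors]


model = KNNBookingForecaster(k=2)
model.add("2019-03-01", [10, 20, 30], 80)
model.add("2019-03-08", [12, 22, 31], 84)
model.add("2019-03-15", [40, 50, 60], 120)
print(model.forecast([11, 21, 30]))  # (82.0, ['2019-03-01', '2019-03-08'])

# A newly observed curve joins the database and is used immediately.
model.add("2019-06-07", [11, 21, 30], 90)
print(model.forecast([11, 21, 30]))  # (85.0, ['2019-06-07', '2019-03-01'])
```

The returned dates are the "glass box" part: an analyst sees not only the forecast but which historical stay dates produced it.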
k-NN also has some disadvantages worth noting. The first is that the selection of k is somewhat arbitrary: other techniques solve for parameter estimates, whereas k-NN must test a range of k to identify how many neighbors to use in the prediction. Another disadvantage is the large number of comparisons required between the forecasted observation and the historical database. This presents efficiency challenges because, rather than simply multiplying estimated parameters by variables, the similarity must be computed against every historical record each time a forecast is produced.
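Because k has no closed-form estimate, it must be chosen by comparing errors across candidate values. The sketch below uses leave-one-out mean absolute error on a toy database; leave-one-out is a swapped-in validation scheme chosen for brevity (the study itself compared errors on a holdout period), and the function names and data are hypothetical. Every candidate k triggers a full pass of distance computations, which is the efficiency cost described above.

```python
import math


def loo_mae(records, k):
    """Leave-one-out mean absolute error of a k-NN mean forecast.

    records: list of (feature_vector, final_rooms_sold) pairs.
    """
    errors = []
    for i, (target_feats, target_final) in enumerate(records):
        # Every other record plays the role of the historical database.
        others = [r for j, r in enumerate(records) if j != i]
        others.sort(key=lambda r: math.dist(target_feats, r[0]))
        neighbors = others[:k]
        prediction = sum(final for _, final in neighbors) / len(neighbors)
        errors.append(abs(prediction - target_final))
    return sum(errors) / len(errors)


def select_k(records, candidate_ks):
    # Each candidate k requires a full pass of distance computations.
    scores = {k: loo_mae(records, k) for k in candidate_ks if k < len(records)}
    return min(scores, key=scores.get), scores


toy = [([1], 10), ([2], 12), ([3], 14), ([10], 40), ([11], 42)]
best_k, scores = select_k(toy, [1, 2, 3])
print(best_k, scores[1], scores[2])  # 1 2.0 6.6
```

On real booking curves, with noisier outcomes, a larger k usually smooths the forecast; the toy data simply makes the mechanics easy to trace.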
Forecasting evaluation—criteria 2: Accuracy
Accuracy is the most critical component when deciding which method to select because only accurate forecasts can help improve decisions. Prior RM forecasting research has focused solely on accuracy for model evaluation, selecting the model with the lowest error. From an RM perspective, this focus aligns with the goal of the system because accurate predictions improve pricing recommendations. For example, greater accuracy may lead to price increases earlier in the booking window, providing incremental revenue on each sale. From an algorithmic perspective, ANNs are known for high predictive accuracy (Shmueli et al., 2017: 271).
Research indicates that models should be evaluated across a range of accuracy metrics (Koupriouchina et al., 2014). Koupriouchina et al. (2014) compare RM forecasts across 17 different error measures, demonstrating that the best-performing model varied with the evaluation metric. This finding presents a real challenge for forecasters, as the lowest error typically dictates model selection. In the best-case scenario, the most accurate model is the same across all tested metrics. However, this is not always the case, and research should therefore consider several error measures in its evaluation.
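A toy illustration of how the ranking can flip with the metric: with the hypothetical numbers below, mean absolute error (MAE) favors model A, which is exact on most days but misses badly once, while root mean squared error (RMSE), which penalizes large misses more heavily, favors model B. Only two metrics are shown here, not the 17 compared by Koupriouchina et al. (2014).

```python
import math


def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def best_model(metric, actual, candidates):
    # The "winner" is whichever candidate minimizes the chosen metric.
    return min(candidates, key=lambda name: metric(actual, candidates[name]))


actual = [100, 100, 100, 100]          # hypothetical rooms sold
candidates = {
    "A": [100, 100, 100, 90],          # exact on three days, one large miss
    "B": [97, 103, 97, 103],           # small miss every day
}
print(best_model(mae, actual, candidates))   # A  (MAE: A = 2.5, B = 3.0)
print(best_model(rmse, actual, candidates))  # B  (RMSE: A = 5.0, B = 3.0)
```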
Finally, effect size is as important as statistical significance when evaluating accuracy across models. When one model is more accurate than another, the question is whether the marginal improvement in accuracy matters. That is, will this marginal improvement affect decisions and outcomes, or is the difference negligible and of little managerial importance?
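The sketch below computes, for two models' errors on the same cases, the mean paired difference (effect size in rooms), the matched-pairs t statistic, a standardized effect size (Cohen's d_z, the mean difference divided by the standard deviation of the differences), and the Wilcoxon signed-rank statistic, using only the Python standard library. The error values are invented for illustration, and the code stops at test statistics; p-values would come from the t distribution with n - 1 degrees of freedom or from a library such as SciPy (`scipy.stats.ttest_rel`, `scipy.stats.wilcoxon`).

```python
import math
import statistics


def wilcoxon_t(diffs):
    """Wilcoxon signed-rank statistic T = min(W+, W-); zero differences dropped."""
    nonzero = [d for d in diffs if d != 0]
    ordered = sorted(nonzero, key=abs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        rank_of[abs(ordered[i])] = (i + j) / 2 + 1  # tied values share the average rank
        i = j + 1
    w_plus = sum(rank_of[abs(d)] for d in nonzero if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in nonzero if d < 0)
    return min(w_plus, w_minus)


def paired_summary(errors_a, errors_b):
    """Matched-pairs comparison of two models' errors on the same cases."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)                     # sample standard deviation
    return {
        "mean_diff": mean_d,                           # effect size in original units
        "t": mean_d / (sd_d / math.sqrt(len(diffs))),  # matched-pairs t, df = n - 1
        "d_z": mean_d / sd_d,                          # standardized effect size
        "wilcoxon_T": wilcoxon_t(diffs),
    }


knn_abs_err = [5, 6, 4, 7, 5, 8, 6, 5]   # hypothetical absolute errors, rooms
ann_abs_err = [6, 5, 6, 7, 7, 9, 6, 4]
print(paired_summary(knn_abs_err, ann_abs_err))
```

With these numbers the first model's errors are half a room lower on average; whether that difference is statistically significant is a separate question from whether it is large enough to change a pricing decision.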
Forecasting evaluation—criteria 3: Usefulness
The final category of evaluation is the usefulness of the forecast. Murphy (1993) defines usefulness as the degree to which the forecast influences users to make better decisions. In most cases, a forecast leads to a single decision, and it is not always possible to evaluate a competing choice. For example, would more rooms have been sold if the discount had been left open? In this instance, only one price outcome can be observed.
At the same time, usefulness can also relate to the user’s ability to gain insights from a prediction. There is value in understanding the assumptions, advantages, and limitations of the method, as well as in the ability to interpret its results (Makridakis and Wheelwright, 1980: 29, Voulgaris, 2019). With greater understanding, managers can build confidence in the predictions, leading to enhanced trust in decision-making, including when to apply subjective assessment. Oancea (2018) highlights that it is critical for revenue managers to understand the underlying algorithm and its subsequent updates so that they can make more effective decisions.
Unfortunately, models have become increasingly data intensive and complex, and consequently the process by which inputs are converted to predictions has become even more opaque. As model complexity increases, the number of managers who can readily understand and effectively use the models decreases (Armstrong, 2001: 451). In particular, techniques such as neural networks have earned the label of “black box” algorithms, since they mask the relationship between inputs and outputs (Shmueli et al., 2017: 286, Voulgaris, 2019). With these algorithms, the model structure and data-driven estimation process limit managers’ ability to identify which variables, relations, and patterns are important for prediction, as well as how they influence the prediction in terms of size and direction. This is because each input is weighted, scaled, and adjusted in coordination with the other variables through a sequence of functions and layers. In other words, there is no explicit mechanism for variable selection, and no opportunity to determine which variables are important at any stage of the model. Ultimately, the user receives a prediction with no information about how it was derived that might assist or augment the decision-making process.
Conversely, k-NN is a technique that is simple to understand (Shmueli et al., 2017: 182), as the model uses a single calculation to identify similar historical observations. In essence, the model identifies the “nearest neighbors,” that is, the booking curves most similar in shape and volume. As such, it clarifies why the model predicts a specific outcome. Furthermore, the algorithm filters the database to the most similar days. These “filtered” dates can be provided to revenue managers as an output of the system, thus removing the black box. Not only does this approach explain the prediction, it also provides greater context for the future date by automatically identifying dates with similar patterns of behavior. In comparison, revenue managers may otherwise rely on year-over-year benchmarks as their primary reference point. Under the new structure, the RM decisions associated with those “filtered” days could provide intuitive understanding beyond the mere predictive outcome of the model. This information has the potential to augment the decision-making process and mitigate uncertainty. Interestingly, it can help with both tactical and strategic RM decisions. Tactical improvements concern better RM decisions on a day-by-day basis, for example, better pricing and room inventory allocation decisions for the specific day at hand. The contribution to RM strategy stems from a deeper understanding gained by synthesizing the information over time. The long-term accumulation of information generates general, encompassing insights that could support the formation of a better, more profitable revenue management strategy.
For example, a hotel might realize that certain types of days (i.e., dates of stay with certain types of booking behavior) respond better to certain “interventions.” If similar dates on which prices were increased tended to generate greater total revenue than dates on which prices were held or decreased, that insight could be used to improve performance.
For greater context, consider a scenario in which a sales manager has just accepted a group block for a local event. The number of rooms sold (blocked) from the inventory adjusts overnight, and the RM system then prices the remaining rooms based on the reduced inventory and forecasted demand. With the k-NN application, the algorithm would simultaneously identify similar dates that booked groups of similar size. These similar dates may have notes in the system about the type of event, which the revenue manager could interpret directly. In addition, these dates may provide insights into how many rooms are typically picked up for such a group and when actual bookings can be expected, allowing the revenue manager to better anticipate pricing for the remaining inventory. Without k-NN, the revenue manager has to identify these similar dates manually before any subsequent analysis. In this simple example, the advantages of k-NN are clear.
Summary
The RM system has largely been regarded as a “black box.” Although generally deemed beneficial, these systems deliver their output to analysts on a take-it-or-leave-it basis. Prior research tends to focus on testing new models and variables to improve accuracy, but the resulting applications neither unveil the inner workings of the RM system nor generate useful managerial insights. Accurate forecasts are required, yet a range of criteria should also be considered when evaluating these algorithms. In particular, it is important to consider the tradeoff between accuracy and the usefulness and methodology criteria. If forecasts perform similarly, the model that provides the most value for decision-making should be selected.
This study argues that the k-NN algorithm is a promising candidate model for the reasons outlined in Figure 1. From a methodological standpoint, it is simple to implement: unlike contemporary machine learning models, it can be incorporated into the RM process with just a few lines of code. In addition, k-NN can readily account for changes in consumer behavior, since new observations exhibiting the new behavior are simply added to the reference database, requiring no additional effort or human intervention. Finally, the technique can filter the historical database to the most relevant dates so that a revenue manager can review this information and make more informed decisions. Given the potential of k-NN to provide valuable advantages to RM practitioners, the following section conducts a large-scale investigation of its performance in comparison to an MLP neural network, one of the most popular machine learning techniques.
Figure 1. Advantages and Disadvantages of k-NN and ANN for Forecasting.
Methodology
Data
Data was obtained from a third-party RM system that provides services to hundreds of hotels across the globe. In total, 35 anonymous properties agreed to participate in the study, with data spanning the start of 2016 to the end of 2020. Due to the disruptions of the pandemic, the 2020 data was removed from the analysis. Therefore, the estimation sample comprised data from January 1st, 2016 to June 30th, 2019, leaving the remainder of 2019 as the holdout sample. Each hotel provided the daily number of reservations and cancellations leading up to each date of stay. For each hotel, the booking curve (total number of reservations on the books) was calculated for every date, beginning 90 days in advance. The booking curves through June 30th, 2019, were used as the historical database for k-NN, as well as the estimation sample for the neural networks. The booking curves beginning on July 1st, 2019 were used to test the models: each was compared with the historical database of booking curves to assess the models’ ability to forecast the final number of reservations.
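A minimal sketch of the data preparation described above, under stated assumptions: each hotel's raw data is taken to be a list of daily net bookings (reservations minus cancellations) for one stay date, ordered from 90 days out to arrival, and the estimation/holdout split is assumed to be by stay date. The function and constant names are hypothetical.

```python
from datetime import date
from itertools import accumulate

DATA_START = date(2016, 1, 1)
ESTIMATION_END = date(2019, 6, 30)   # last stay date in the k-NN database / ANN training set
HOLDOUT_END = date(2019, 12, 31)     # 2020 is excluded because of the pandemic


def booking_curve(daily_net_bookings):
    """Rooms on the books for one stay date, from 90 days out to arrival.

    daily_net_bookings[i] = reservations minus cancellations recorded on
    day i of the booking window (index 0 = 90 days before the stay date).
    """
    return list(accumulate(daily_net_bookings))


def sample_for(stay_date):
    """Assign a stay date to the estimation or holdout sample."""
    if stay_date < DATA_START or stay_date > HOLDOUT_END:
        return "excluded"
    return "estimation" if stay_date <= ESTIMATION_END else "holdout"


print(booking_curve([2, 0, 3, -1, 4]))   # [2, 2, 5, 4, 8]
print(sample_for(date(2019, 7, 1)))      # holdout
```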
Model estimation
The focus of the analysis is to determine whether k-NN can produce forecasts on par with the competing neural network. Several tests were conducted to identify the best combination of k and features for prediction. Specifically, k values of 1 to 15, 20, and 30 were tested with comparison features consisting of the OTB reservations for the 3, 7, 14, 21, and 28 days prior to the forecast horizon. The models were estimated for each property at forecasting horizons of 1 to 30 days in advance. In total, errors were assessed for 35 properties across 30 horizons, while varying 17 k values and five reservation-input features, yielding 89,250 separate forecasting errors. The results were analyzed at an aggregate level to determine which k and features generally performed best across all models. The analysis indicated that the error was minimized when 14 days of history were used along with k = 10. That is, error across properties was lowest when the previous 14 days of reservations were compared to identify the 10 most similar historical dates for prediction.
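The size of the search grid follows directly from the settings listed above, as the sketch below confirms. The `best_setting` helper illustrates the aggregate selection of the (window, k) pair with the lowest mean error, assuming a hypothetical dictionary of per-model errors; it is not the authors' code.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

K_VALUES = list(range(1, 16)) + [20, 30]   # 17 candidate values of k
OTB_WINDOWS = [3, 7, 14, 21, 28]            # days of OTB history compared
HORIZONS = list(range(1, 31))               # 1 to 30 days ahead
N_PROPERTIES = 35

grid = list(product(range(N_PROPERTIES), HORIZONS, OTB_WINDOWS, K_VALUES))
print(len(grid))                            # 89250 forecasting errors
print(N_PROPERTIES * len(HORIZONS))         # 1050 property-horizon models


def best_setting(errors):
    """Pick the (window, k) pair with the lowest mean error across all
    properties and horizons. `errors` is a hypothetical dict mapping
    (property, horizon, window, k) -> forecast error."""
    by_setting = defaultdict(list)
    for (_, _, window, k), err in errors.items():
        by_setting[(window, k)].append(err)
    return min(by_setting, key=lambda s: mean(by_setting[s]))
```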
To demonstrate the algorithm's comprehensible approach and its usefulness, consider the example in Figure 2. The figure displays the results of the model with k = 3 and a 21-day comparison window, for a booking curve with a horizon of 7 days. The panels show the k most similar historical instances for the forecasted booking curve. In the upper left panel, the first historical booking curve had a pickup of 40 rooms in the last 7 days. The other two similar historical curves had pickups of 28 and 18 rooms. The average pickup across the k = 3 similar historical curves was therefore about 28.7 ((40 + 28 + 18)/3). Adding this to the current OTB reservations of 69 produces a forecast of approximately 98 rooms. In the bottom right panel, the dotted red line shows the actual reservations for the forecasted booking curve, which finished at 96 rooms sold.
k-NN Algorithm.
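The arithmetic in the Figure 2 example follows an additive pickup rule: the forecast is the current OTB plus the average pickup of the k neighbors over the remaining horizon. A minimal sketch of that step, with a hypothetical function name:

```python
from statistics import mean


def pickup_forecast(current_otb, neighbor_pickups):
    """Additive pickup forecast: current OTB plus the mean pickup of the k neighbors."""
    return current_otb + mean(neighbor_pickups)


# Figure 2 example: k = 3 neighbors with pickups of 40, 28 and 18 rooms, 69 rooms OTB.
forecast = pickup_forecast(69, [40, 28, 18])
print(round(forecast, 2))  # 97.67, i.e. roughly 98 rooms (actual: 96)
```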
The k-NN estimation indicated that an OTB history of 14 days produced the best performance across hotels. To compare model performance, the neural networks were estimated with exactly the same inputs, ensuring both models were provided the same information. As stated earlier, the neural network architecture is specified by the user. While not the primary focus of the study, the neural networks were tested with one and two hidden layers containing 5, 10, or 14 neurons (14 matching the number of inputs). The model employed the hyperbolic tangent activation function, which has been found to be preferable for MLP applications (Karlik and Olgac, 2011). Because a single run can stop at a local minimum, each model was estimated 100 times from randomized starting points, with a maximum of 1000 iterations per run, to improve the chance of reaching a near-global solution. Within each run, iterations continued until mean squared error no longer improved.
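The training protocol above (one tanh hidden layer, random restarts, an iteration cap, and stopping once MSE stops improving) can be illustrated with a dependency-free sketch. The article does not name the software or optimizer used, so this sketch assumes plain full-batch gradient descent with a linear output neuron; the learning rate, initialization range, and seed are hypothetical choices, and inputs are assumed to be scaled beforehand.

```python
import math
import random


def predict(params, x):
    """Forward pass: tanh hidden layer followed by a linear output neuron."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sum(w * hj for w, hj in zip(W2, h)) + b2


def train_mlp(X, y, hidden=5, restarts=100, max_iter=1000, lr=0.01, seed=0):
    """Fit a one-hidden-layer tanh MLP by full-batch gradient descent.

    Each of `restarts` runs starts from random weights and stops after
    `max_iter` iterations or as soon as the training MSE stops improving.
    Returns (best_mse, params) over all runs.
    """
    rng = random.Random(seed)
    n, n_in = len(X), len(X[0])
    best_mse, best_params = float("inf"), None
    for _ in range(restarts):
        W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
        b1 = [0.0] * hidden
        W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        b2 = 0.0
        run_mse, run_params = float("inf"), None
        for _ in range(max_iter):
            gW1 = [[0.0] * n_in for _ in range(hidden)]
            gb1, gW2, gb2, sse = [0.0] * hidden, [0.0] * hidden, 0.0, 0.0
            for x, t in zip(X, y):
                h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
                     for j in range(hidden)]
                err = sum(w * hj for w, hj in zip(W2, h)) + b2 - t
                sse += err * err
                gb2 += err
                for j in range(hidden):
                    gW2[j] += err * h[j]
                    dh = err * W2[j] * (1.0 - h[j] ** 2)   # back-propagate through tanh
                    gb1[j] += dh
                    for i in range(n_in):
                        gW1[j][i] += dh * x[i]
            mse = sse / n
            if mse >= run_mse:          # no further improvement: stop this run
                break
            run_mse = mse
            run_params = ([row[:] for row in W1], b1[:], W2[:], b2)   # snapshot
            # Gradient step (the factor 2 in d(MSE)/d(theta) is folded into lr).
            for j in range(hidden):
                W2[j] -= lr * gW2[j] / n
                b1[j] -= lr * gb1[j] / n
                for i in range(n_in):
                    W1[j][i] -= lr * gW1[j][i] / n
            b2 -= lr * gb2 / n
        if run_mse < best_mse:          # keep the best of all random restarts
            best_mse, best_params = run_mse, run_params
    return best_mse, best_params
```

For the selected architecture the call would be `train_mlp(X, y, hidden=5)`, with each row of `X` holding the 14 scaled OTB inputs; a library implementation would be far faster than this pure-Python loop.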
The neural network performance was similar across the six competing architectures. Because increasing the size of the network also increases the potential for overfitting, and the larger architectures provided no measurable gain in performance, the simplest model was chosen. That is, a network with one hidden layer of five neurons was compared with the k-NN model.
Error measures
As stated earlier, it is important to consider several error metrics when determining performance (Koupriouchina et al., 2014). The analysis considered five error metrics, but because of the large number of hotel properties of varying size, only one scale-dependent metric, Absolute Error (AE), was used. The remaining four metrics were scale-independent: Absolute Percentage Error (APE), Symmetric Absolute Percentage Error (sAPE), Absolute Scaled Error (ASE), and LnQ, proposed by Tofallis (2015). The calculation of each error is provided in equations [2-6]. In each equation,
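Equations [2-6] are not reproduced here, so the paper's exact definitions cannot be shown. As a hedged sketch, the common textbook forms of the five metrics for a single actual value A and forecast F are given below; the sAPE denominator, the ASE scale (assumed here to be the in-sample mean absolute error of a naive benchmark), and the use of |ln(F/A)| for LnQ are assumptions that may differ from the paper.

```python
import math


def ae(actual, forecast):
    """Absolute Error (scale-dependent, in rooms)."""
    return abs(actual - forecast)


def ape(actual, forecast):
    """Absolute Percentage Error, in percent."""
    return 100 * abs(actual - forecast) / abs(actual)


def sape(actual, forecast):
    """Symmetric APE with the mean of |A| and |F| as denominator (one common variant)."""
    return 100 * abs(actual - forecast) / ((abs(actual) + abs(forecast)) / 2)


def ase(actual, forecast, scale):
    """Absolute Scaled Error; `scale` is an assumed in-sample naive-forecast MAE."""
    return abs(actual - forecast) / scale


def lnq(actual, forecast):
    """Absolute log accuracy ratio |ln(F / A)| (requires positive A and F)."""
    return abs(math.log(forecast / actual))
```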
Model comparison
The errors were calculated for both models over the holdout period. The results were statistically tested with the Wilcoxon Signed-Rank Test and the Matched Pairs t test. The Wilcoxon Signed-Rank Test is non-parametric and ranks the positive and negative differences in error values between the two predictions; its null hypothesis is that the median difference between the two models' errors is zero. The Matched Pairs t test is a parametric comparison based on the difference between the two models' errors for each forecasted day; its null hypothesis is that the mean difference is zero. In both tests, a non-significant result indicates no detectable difference between the two models, while a significant result indicates better performance for one of them. The statistical tests were run for every horizon and property, giving 1050 comparisons (35 properties × 30 horizons) for each error metric and test.
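Both tests can be sketched with Python's standard library. This is a minimal sketch, assuming two equal-length lists of per-day errors for the same property and horizon; p-values use the large-sample normal approximation (no tie correction for Wilcoxon, and a normal rather than t reference distribution for the matched pairs test), which is an assumption suited to a roughly six-month holdout and not necessarily what the authors' software did.

```python
import math
from statistics import NormalDist, mean, stdev


def wilcoxon_signed_rank(errors_a, errors_b):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped and tied absolute differences receive
    their average rank; assumes at least one non-zero difference.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1            # ranks are 1-based
        for m in range(i, j + 1):
            ranks[order[m]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    return w_plus, 2 * (1 - NormalDist().cdf(abs(z)))


def paired_t(errors_a, errors_b):
    """Matched pairs t statistic with a normal-approximation two-sided p-value."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, 2 * (1 - NormalDist().cdf(abs(t)))
```

A p-value below the chosen significance level (for example 0.05) would count as one significant comparison; repeating this for 35 properties and 30 horizons gives the 1050 comparisons per metric.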
Results
Model performance
Average error across properties and horizons.
The results of the Wilcoxon Signed-Rank Test and Matched Pairs t test are shown in Figures 3 and 4. The figures compare the number of significant and non-significant results when comparing errors between the models for every property and horizon. Each error metric involved 1050 comparisons, and no significant difference was found in approximately 70% of them (≈735 tests). Overall, the majority of comparisons revealed no difference in accuracy between the two approaches.
Model performance: Wilcoxon Signed-Rank Test.
Model performance: Matched Pairs t test.

Significant differences and booking window
The results were also reviewed by booking window to determine whether one model performed better at longer horizons. Figures 5 and 6 show the number of significant and non-significant tests across all error metrics for every horizon. The charts show a relatively stable number of significant tests across all horizons, suggesting that neither model has an advantage further from the date of stay.
Model performance by booking window: Wilcoxon Signed-Rank Test.
Model performance by booking window: Matched Pairs t test.

Significant differences
The percentage of occurrences when one model was statistically superior to the other.
Size of significant differences measured by MAPE.
Discussion
The findings suggest that the k-NN and ANN models perform similarly when using the same information for prediction. Across a wide variety of error measures, significant differences were found less than 30% of the time, and these differences did not depend on the horizon but occurred fairly uniformly over the booking window. When differences between the models were significant, the neural network performed better on absolute error (AE), absolute percentage error (APE), symmetric absolute percentage error (sAPE), and absolute scaled error (ASE). Only under the LnQ metric did the two models achieve a roughly equal number of significant results. Although the neural network was statistically superior more often, the practical gain from choosing it was negligible. Measured by MAPE, the two models showed no significant difference more than 70% of the time, and the majority of significant differences amounted to a marginal gain of less than 2%. In other words, the models perform similarly on accuracy, leaving methodology and usefulness as the main considerations for model selection.
Managerial implications
From the methodological perspective, k-NN requires less sophistication with regard to training the models. Neural networks require advanced knowledge of statistical methods that the industry generally lacks. For hotels that operate without a standard RM system, the skills and technology required to implement neural networks are often missing. The k-NN algorithm, by contrast, can be added to a database or built in commonly used software such as Microsoft Excel. The user can create a formula for the Euclidean distance between the booking curve being forecast and each curve in a historical set. The distances are then sorted, and the k closest observations are selected for prediction. Additionally, the k-NN algorithm requires less maintenance. Neural networks may require retraining as consumer behavior changes so that the model can adapt to new patterns, whereas the k-NN algorithm remains the same and new patterns are incorporated simply by adding the new observations to the historical database.
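The spreadsheet procedure described above (distance, sort, take the k closest) and the maintenance argument can be sketched together: new information enters only through `add_curve`, with no retraining step, and each forecast returns the reference dates an analyst could inspect. This is a minimal sketch, assuming unweighted Euclidean distance over the most recent `window` OTB values observed at the horizon and the additive pickup rule from the Figure 2 example; the class and method names are hypothetical.

```python
import math
from datetime import date


class BookingCurveKNN:
    """k-NN pickup forecaster over a growing database of historical booking curves.

    A curve is a list where curve[d] is the OTB count d days before arrival.
    """

    def __init__(self, k: int = 10, window: int = 14):
        self.k = k
        self.window = window
        self.history = []   # (stay_date, curve) pairs

    def add_curve(self, stay_date: date, curve: list) -> None:
        # Adapting to new booking behavior means appending the completed curve;
        # no model parameters are re-estimated.
        self.history.append((stay_date, curve))

    def _features(self, curve: list, horizon: int) -> list:
        # The `window` most recent OTB values observed at the forecast horizon.
        return curve[horizon:horizon + self.window]

    def forecast(self, partial_curve: list, horizon: int):
        """Return (forecast of final OTB, reference dates of the k neighbors).

        partial_curve only needs values for d >= horizon; nearer days are unknown.
        """
        target = self._features(partial_curve, horizon)
        scored = sorted(
            ((math.dist(target, self._features(curve, horizon)), stay_date, curve)
             for stay_date, curve in self.history),
            key=lambda item: item[0],
        )
        neighbors = scored[: self.k]
        # Pickup of each neighbor from the horizon to arrival (day 0).
        pickups = [curve[0] - curve[horizon] for _, _, curve in neighbors]
        forecast = partial_curve[horizon] + sum(pickups) / len(pickups)
        return forecast, [stay_date for _, stay_date, _ in neighbors]
```

The returned reference dates are what a revenue manager would review; in a live system the database should contain only curves whose stay dates have already passed.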
Neural networks do hold a few methodological advantages worth noting. Additional variables and parameters can be added to the model with relative ease. Best practices suggest scaling the inputs for both methods before running the algorithm (Shmueli et al., 2017: 174, 277); however, extending k-NN to more features, and weighting those features, may become tedious, because each variable must be scaled and the weighting structure across the features must be tested. Neural networks, on the other hand, can readily incorporate new variables, since training can assign near-zero weights to unimportant additions and effectively remove them. Future research could explore adding other important variables to the k-NN model. Additionally, neural networks may be faster at prediction time: once a network is estimated, it only needs to multiply its parameters by the inputs, whereas k-NN must compare the current curve with the historical curves to identify the nearest neighbors. The methodological trade-offs must be considered on a case-by-case basis.
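Scaling and weighting for a multi-feature k-NN can be sketched as follows. This assumes z-score standardization, with the mean and standard deviation estimated on the historical database and reused for the query, and a weighted Euclidean distance; the weights are hypothetical values that would have to be tuned, which is the testing burden the paragraph describes. The second feature (an average rate) is only an illustration.

```python
import math
from statistics import mean, pstdev


def fit_scaler(rows):
    """Per-feature (mean, population sd) estimated on the historical database."""
    return [(mean(col), pstdev(col)) for col in zip(*rows)]


def apply_scaler(row, stats):
    """Z-score one feature vector; constant features are mapped to 0."""
    return [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two scaled feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


# Hypothetical history: [OTB 7 days out, average rate] for two past stay dates.
history = [[10, 100], [20, 300]]
stats = fit_scaler(history)
scaled = [apply_scaler(row, stats) for row in history]
```

Without scaling, a $200 rate difference would dominate a 10-room OTB difference in the distance; after standardization both features contribute on comparable terms, and the weights then decide their relative importance.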
From a usefulness perspective, the advantage of k-NN is clear. Neural networks remain a “black box” solution with little explanation of how the variables influence the prediction. By comparison, k-NN can provide the most similar historical dates that generate its prediction. Revenue managers often override the system, for better or worse (Koupriouchina, 2019, Schwartz et al., 2021). These overrides may stem from knowledge of market conditions that the system is unaware of, or from a lack of trust in the system. In these scenarios, k-NN can help by filtering the historical days down to the most relevant information, saving the revenue manager time and allowing them to dig deeper into what occurred on those dates. For instance, a revenue manager can investigate questions such as: Were similar events occurring in the past? What rate decisions were made leading up to the date of stay, and what happened to revenue as a result? If a revenue manager reviewed the five most similar days and found that revenue was lower on dates when rates were increased than on dates when rates were held, the revenue manager may choose to hold rates. Ideally, this would agree with the system's recommendations, enhancing trust and comprehension.
Theoretical implications
From a theoretical perspective, it has long been suggested that revenue management systems and research must consider what data is important for forecasting (Weatherford and Kimes, 2003). Although forecasting studies have tested a variety of variables, this study sheds light on the predictive power of the booking curve. While neural networks use all available data to train the network and optimize the prediction, k-NN achieves comparable accuracy with just a handful of reference dates. The results lend credence to the argument of Tse and Poon (2015) that the booking curve reflects all the macro and micro factors affecting demand in the market. The most similar booking curves appear to provide signals about the current booking environment and future reservation patterns. Therefore, the push to acquire and incorporate more detailed data into RM forecasting models may not be a worthwhile endeavor. RM researchers should continue to explore the incremental value of additional data sources. Furthermore, future research should continue to explore forecasting applications using on-the-books data that may improve accuracy by incorporating reservation trends.
Conclusion
The study aimed to reduce the opaqueness of current revenue management algorithmic methods by exploring the potential of k-NN to replace, or be used alongside, the popular ANN approach to forecasting. While neural networks have been found to be highly predictive (Webb et al., 2020, Lee et al., 2020), the results of this study show that k-NN can provide similar accuracy with a “glass box” rather than a “black box.” When neither approach is clearly more accurate, the forecaster must weigh the other advantages of each model. Although every hotel faces its own market, demand, and competition, which may dictate the best forecasting technique for each location, the stated advantages of k-NN make it worth exploring and comparing during model development. The model offers numerous advantages and may provide a first step toward opening the “black box” and further optimizing a hotel’s performance.
Most importantly, k-NN can explain its prediction by identifying its reference dates. RM systems could therefore present these dates to analysts to build confidence in the system and assist with decision-making. The k-NN approach is also more adaptable as consumer behavior changes, because new curves that deviate from prior booking curves are simply added to the database. The primary downside of this approach is that adding data points from outside the booking curve may require significant testing to identify each variable’s relationship and weighting. Ultimately, the results show that k-NN can provide results similar to neural networks when receiving the same information, while being simpler to implement and understand. This is particularly valuable to smaller hotels and companies that do not have advanced RM training and systems.
Limitations and future research
The study is not without limitations. The results depend on the 35 hotels the RM firm provided and on the error metrics used in the study. While the results were similar across the error metrics used, additional metrics may indicate different results. Firms wishing to implement these algorithms should conduct extensive testing to ensure that the models perform accurately and that predictive accuracy is stable for their property. Future research should continue to explore the value of the k-NN algorithm for RM research. Several important questions remain, such as identifying the optimal k value and the diminishing returns of increasing k. Furthermore, research may incorporate additional features into the k-NN algorithm, such as room rates or competitor prices, to see whether accuracy can be improved.
Supplemental Material
Supplemental Material for Beyond accuracy: The advantages of the k-nearest neighbor algorithm for hotel revenue management forecasting by Timothy Webb, Misuk Lee, Zvi Schwartz, and Ira Vouk in Tourism Economics
Footnotes
Acknowledgments
The authors would like to thank the third-party RM firm that found this project interesting and was graciously willing to allow us to investigate these models with their hotel companies.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
Author biographies
References
