Abstract
Online travel agencies (OTAs) aim to use advanced digital media advertising as efficiently as possible to expand their share of the industry. Metasearch engine platforms are among the digital media environments most frequently used by OTAs. Most OTAs place daily bids on metasearch engine platforms, paying per click for each hotel in order to obtain reservations. Managing bidding strategies is therefore critical to reducing costs and increasing revenue for online travel agencies. In this study, we predict the number of impressions and the daily Click-Through-Rate (CTR) of hotel advertisements for each hotel, as well as the daily sales amount. A significant contribution of our research is an extended dataset built by integrating the most informative features used in related studies, together with rolling averages over different numbers of days and shifted values, for use in the proposed CTR, impression and sales prediction stages. The data used in this study were provided by one of Turkey’s largest OTAs, and we present a real-world application for OTAs. The results at each prediction stage show that enriching the training data with the most informative OTA-specific features and with sliding window techniques improves the generalization capability of the prediction models, and that tree-based boosting algorithms achieve the best results on this problem. Clustering the dataset according to its characteristics further improves the predictions.
Keywords
Introduction
The economic benefit of Web ads depends on whether users click on them. Clicks let Internet companies identify the most appropriate ads for each user and improve the customer experience. More precisely, one of the most important metrics used to determine the commercial value of an ad is the click-through rate (CTR), defined as the total number of clicks divided by the total number of impressions [1]. In search advertising, CTR is used to rank advertisements and to estimate clicks [18]. An impression occurs each time a visitor sees the ad. A higher CTR affects pay-per-click (PPC) performance because it directly influences how much advertisers pay for each click [15]. Pay-per-click advertising is an auction-based system in which the highest bidder usually obtains the most visible ad position.
Once the advertisement is clicked, the advertising platform charges the advertiser the bid amount for that click. In addition, predicting each hotel’s daily sales has proven difficult because of the dynamics and unpredictability of the booking process. Bookings are influenced by numerous factors, for example seasonality, group bookings, holidays, room types, and events at the hotel, as well as bidding success in a competitive environment. Incorporating these variables is important for prediction accuracy.
In this study, we aim to predict the number of impressions and the click-through rate of hotel ads, as well as the booking revenue of an online travel agency (OTA), i.e., the total amount of its hotel reservations. To obtain reservations through metasearch engines, OTAs place digital advertisements on metasearch advertising channels under cost-per-click and cost-per-acquisition models. In our model, we focus on the cost-per-click model. Accurately predicting the number of clicks for each hotel is therefore critically important for online travel agencies to adjust their daily, weekly and monthly advertising budgets and to build their revenue models.
In this paper, we also present a comparative overview of the performance of state-of-the-art prediction algorithms available in open-source libraries, such as Random Forest, Gradient Boosting, eXtreme Gradient Boosting, and Support Vector Regression. In addition, the daily performance reports produced by metasearch advertising platforms are enriched with public data, such as currency exchange rates, weather forecasts, and public holiday data for each country, which may be relevant to the advertising performance of the hotels.
We also combine clustering-based and regression-based approaches to improve CTR and impression prediction. In this method, the K-Means and Fuzzy C-Means clustering algorithms are applied to the data, XGBoost, Gradient Boosting, and Random Forest are then trained on each cluster, and the overall regression results are evaluated. The results show that clustering the hotels and training a separate model for each cluster improves the prediction results.
The modelling section describes some of the popular regression algorithms used in CTR, impression and sales prediction. It also describes the K-Means and Fuzzy C-Means clustering algorithms, which are used to group hotels according to their behaviour.
The methodology section explains the data preprocessing and feature engineering steps applied before the machine learning algorithms, and briefly describes the features used in modelling and the performance of the regression algorithms. To separate hotels with different behaviour, clustering algorithms are applied first and regression models are then trained on the resulting clusters; the results of this combined process are presented in the clustering-based and regression approach section.
Modelling
Regression models are popular in machine learning and are used to predict numerical target variables. Many studies in the literature aim to predict digital advertisement impression counts, click counts and CTR levels [14,15]. In this research, we used support vector regression (SVR), random forest, extreme gradient boosting (XGBoost), AdaBoost, gradient boosting, and deep neural networks, algorithms that have been applied successfully to many regression tasks, to predict impression counts, CTR levels, and sales volumes. In this section, these algorithms are briefly explained.
SVR is the regression variant of the support vector machine and is available in open-source machine learning libraries. SVR has been shown to be very successful at modelling non-linear regression problems [4]. However, anomalies or noise inevitably occur in a dataset for different reasons, for example numerical issues, changes in system behaviour, incorrect measurements, and deliberate distortion. To reduce their negative impact, some anomalies are removed directly, and others are identified by outlier detection methods. If these anomalies or noise are not adequately detected during learning, they may reduce SVR’s robustness and performance. Therefore, reducing the impact of anomalies or noise on the prediction stage is one of the primary objectives of SVR [29].
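A minimal sketch of RBF-kernel SVR, assuming scikit-learn's `SVR` class and a synthetic noisy sine curve rather than the OTA data. The `epsilon` parameter sets the width of the tube within which training errors are not penalized, and `C` limits how strongly individual points (including outliers) can pull the fit; the values here are illustrative, not tuned.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic non-linear data with additive noise (illustrative, not OTA data).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# The RBF kernel captures the non-linearity; epsilon defines the loss-free tube.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

r2_train = model.score(X, y)  # coefficient of determination on the training data
print(f"training R^2: {r2_train:.3f}")
```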
AdaBoost is also available in open-source libraries and is regarded as the first successful boosting algorithm. Although originally proposed as an ensemble learning method for classification tasks, it was subsequently adapted to regression tasks [16]. AdaBoost focuses on the observations mispredicted by the previous learner by increasing their weights, thereby changing the data distribution adaptively. As a very powerful ensemble learning technique, AdaBoost can achieve very high classification accuracy and is in many cases less susceptible to over-fitting. However, AdaBoost involves generating many base learners, which places demands on storage resources [30].
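As a hedged illustration, scikit-learn's `AdaBoostRegressor` implements the AdaBoost.R2 regression variant: in each round it fits a learner on a weighted sample in which observations with larger errors from earlier learners are more likely to appear. The synthetic data and parameter values below are assumptions for illustration only. The fitted `estimators_` list also shows why storage grows with the number of base learners.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

# Synthetic data (illustrative only): a quadratic and a periodic effect plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 2))
y = X[:, 0] ** 2 + np.sin(2 * X[:, 1]) + rng.normal(scale=0.1, size=300)

# The default base learner is a depth-3 regression tree. Each boosting round
# re-weights the training samples so that poorly predicted observations
# receive more attention from the next learner.
model = AdaBoostRegressor(n_estimators=100, learning_rate=0.5, random_state=0)
model.fit(X, y)

n_learners = len(model.estimators_)  # every fitted base learner is kept in memory
r2_train = model.score(X, y)
print(f"{n_learners} base learners stored, training R^2 = {r2_train:.3f}")
```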
Random forest is another machine learning algorithm, based on combining the predictions of many decision trees. The idea behind such ensemble approaches is to build a strong single model from multiple weak models. Random forests have been applied effectively to many types of machine learning problems [8,17]. The random forest regression model is an improved version of the CART methodology and may offer better prediction results. The training phase of RF consists of building many different decision trees. Each tree in RF is grown with a randomized subset of predictors, hence the name ‘random’ forest. RF is an ensemble method that combines all of the generated decision trees using a technique called bagging, or bootstrap aggregation. Bagging was proposed by Breiman [31] and can be used with numerous regression methods to reduce the variance of the predictions, thus enhancing prediction performance. In RF, each decision tree is built from randomly selected samples and randomly selected features. Drawing random samples of the observations with replacement is called bootstrapping [32].
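The bagging idea described above can be sketched by hand: each tree is trained on a bootstrap sample and considers a random subset of features at each split, and the ensemble averages the tree outputs. This is a minimal illustration built on scikit-learn's `DecisionTreeRegressor`, not the `RandomForestRegressor` configuration used in the study; `fit_bagged_trees` and `predict_bagged` are hypothetical helper names, and the data are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagged_trees(X, y, n_trees=50, max_features=0.5, seed=0):
    """Random-forest-style bagging: bootstrap rows plus a random feature subset at each split."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap sample: n rows drawn with replacement
        tree = DecisionTreeRegressor(
            max_features=max_features,    # fraction of features considered at each split
            random_state=int(rng.integers(1_000_000)),
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_bagged(trees, X):
    """Bootstrap aggregation: average the individual tree predictions."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)

# Usage on synthetic data: two informative features and two pure-noise features.
rng = np.random.default_rng(7)
X = rng.uniform(-2, 2, size=(600, 4))
y = np.sin(X[:, 0]) + X[:, 1] + rng.normal(scale=0.3, size=600)
X_tr, X_te, y_tr, y_te = X[:400], X[400:], y[:400], y[400:]

single = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
mse_single = np.mean((single.predict(X_te) - y_te) ** 2)
mse_bagged = np.mean((predict_bagged(fit_bagged_trees(X_tr, y_tr), X_te) - y_te) ** 2)
print(f"single tree MSE {mse_single:.3f} vs bagged MSE {mse_bagged:.3f}")
```

Averaging many decorrelated, fully grown trees mainly reduces variance, which is why the bagged ensemble is expected to beat a single unpruned tree on held-out data.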
XGBoost is a recently proposed, scalable machine learning approach based on boosting. It is becoming increasingly popular because it outperforms many other machine learning algorithms in several machine learning competitions [2]. For example, in [12], XGBoost is more successful than state-of-the-art methods at predicting the hourly demand of a bicycle station. The most important factor behind the popularity and success of XGBoost is its scalability in all scenarios. The system runs much faster than existing popular regression algorithms on a single server and scales to distributed or memory-limited settings [7]. XGBoost’s scalability is attributed to several important systems and algorithmic optimizations [9,3,12]. XGBoost is an improved GBDT (Gradient Boosting Decision Trees) algorithm that combines numerous decision trees and is widely used for classification and regression. However, XGBoost differs from GBDT in some respects. First, the GBDT algorithm uses only the first-order Taylor expansion, whereas XGBoost uses a second-order Taylor expansion of the loss function. Second, regularization is added to the objective function to prevent over-fitting and reduce model complexity [33].
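The second-order expansion and the regularization term mentioned above can be written out as in the standard XGBoost derivation. The symbols below are ours, not the paper's: $f_t$ is the tree added at round $t$, $g_i$ and $h_i$ are the first and second derivatives of the loss $\ell$ with respect to the previous prediction $\hat{y}_i^{(t-1)}$, $T$ is the number of leaves, $w_j$ the weight of leaf $j$, $I_j$ the set of samples falling in leaf $j$, and $\gamma$, $\lambda$ the regularization weights. Terms constant at round $t$ are dropped.

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}\Big[g_i\, f_t(\mathbf{x}_i) + \tfrac{1}{2}\, h_i\, f_t^{2}(\mathbf{x}_i)\Big]
  + \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2},
\qquad
g_i = \frac{\partial\, \ell\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^{2}\, \ell\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \big(\hat{y}_i^{(t-1)}\big)^{2}}

w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}
```

For squared loss $\ell = \tfrac{1}{2}(y - \hat{y})^2$, $g_i = \hat{y}_i^{(t-1)} - y_i$ and $h_i = 1$, so $w_j^{*} = \sum_{i \in I_j}(y_i - \hat{y}_i^{(t-1)}) / (|I_j| + \lambda)$: a mean residual shrunk by $\lambda$. A first-order GBDT uses only $g_i$.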
To tune the hyperparameters of all machine learning algorithms used in this research, we used grid search. Sixty per cent of the samples are used for training, twenty per cent for validation and the remaining twenty per cent for testing. The data were shuffled during splitting, using the data-splitting module of the scikit-learn library.
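A minimal sketch of the 60/20/20 split and a validation-based grid search, assuming scikit-learn's `train_test_split` and `ParameterGrid`. The grid values, the synthetic data and the choice of `RandomForestRegressor` are assumptions; the paper does not report the grids it searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import ParameterGrid, train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.2, size=500)

# 60/20/20: hold out 40% first, then split the held-out part half-and-half.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, shuffle=True, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, shuffle=True, random_state=0)

# Hypothetical grid for illustration only.
grid = ParameterGrid({"n_estimators": [50, 100], "max_depth": [4, 8, None]})

best_params, best_score = None, -np.inf
for params in grid:
    model = RandomForestRegressor(random_state=0, **params).fit(X_train, y_train)
    score = r2_score(y_val, model.predict(X_val))  # model selection uses the validation split only
    if score > best_score:
        best_params, best_score = params, score

# Retrain the chosen configuration and evaluate it once on the untouched test split.
final = RandomForestRegressor(random_state=0, **best_params).fit(X_train, y_train)
test_r2 = r2_score(y_test, final.predict(X_test))
print(best_params, round(best_score, 3), round(test_r2, 3))
```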
We also used non-parametric machine learning algorithms so that the enriched data could be exploited in predicting hotel sales. One such family is the tree-based ensemble algorithms, which combine numerous weak learners into a single generalizable model. Extreme gradient boosting (XGBoost) [6] is a machine learning algorithm that has become popular among data scientists because of its success in many machine learning competitions [11,13,19]. XGBoost [6] includes additional regularization parameters that control the size and shape of the trees, which makes its predictions more robust. Ultimately, the tree-based random forest, gradient boosting, and extreme gradient boosting (XGBoost) algorithms were selected for our research, as they have been reported to achieve high accuracy on various regression problems [5,9,10].
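To make "combining weak learners" concrete, here is a from-scratch sketch of gradient boosting with squared loss: each shallow tree is fitted to the current residuals and added with a shrinkage factor. It illustrates the generic technique only; it is not XGBoost (there is no second-order term or tree regularization), and `fit_gradient_boosting` / `predict_gradient_boosting` are hypothetical names.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
    """Gradient boosting with squared loss: each shallow tree is fitted to the current residuals."""
    init = float(np.mean(y))                     # F_0: the constant that minimizes squared loss
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred                     # negative gradient of 0.5 * (y - F)^2 w.r.t. F
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residuals)
        pred += learning_rate * tree.predict(X)  # shrunken step taken by one weak learner
        trees.append(tree)
    return init, learning_rate, trees

def predict_gradient_boosting(model, X):
    init, learning_rate, trees = model
    return init + learning_rate * np.sum([tree.predict(X) for tree in trees], axis=0)

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)
model = fit_gradient_boosting(X, y)
mse_boosted = np.mean((predict_gradient_boosting(model, X) - y) ** 2)
mse_constant = np.mean((y - y.mean()) ** 2)
print(f"training MSE: constant {mse_constant:.3f}, boosted {mse_boosted:.3f}")
```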
The hotels in the dataset behave differently in terms of some performance metrics. According to the performance report data, some hotels have impression, click and sales information, while others do not even receive impressions. Because of this heterogeneity, hotels are clustered according to their features in the dataset. For the clustering process, the K-Means and Fuzzy C-Means algorithms were selected.
K-Means is a powerful, traditional clustering method. Its purpose is to minimize the sum of squared errors in a given d-dimensional space, starting from k selected initial seeds. Because of its simple working principle and ease of interpretation, K-Means is one of the most widely used clustering algorithms [20,21,22,23].
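A minimal NumPy sketch of Lloyd's algorithm, the usual way the K-Means objective is minimized: points are assigned to their nearest centroid and each centroid is then moved to the mean of its points. The random-seed initialization and the synthetic blobs are assumptions for illustration; library implementations such as scikit-learn's `KMeans` add k-means++ seeding and multiple restarts.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # k initial seeds taken from the data

    def assign(c):
        # Squared Euclidean distance from every point to every centroid, shape (n, k).
        d = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Move each centroid to the mean of its points; keep it in place if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    labels = assign(centroids)
    sse = float(((X - centroids[labels]) ** 2).sum())  # the sum of squared errors K-Means minimizes
    return labels, centroids, sse

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(50, 2)) for loc in ([0, 0], [6, 0], [0, 6])])
labels, centroids, sse3 = kmeans(X, k=3)
_, _, sse1 = kmeans(X, k=1)
print(f"SSE with k=1: {sse1:.1f}, with k=3: {sse3:.1f}")
```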
Fuzzy C-Means is another clustering algorithm; it defines a membership function for each data point and computes the degree of membership of each data point in each cluster. This method therefore allows each data point to belong to multiple clusters [24]. The Fuzzy C-Means algorithm was introduced by Dunn and later developed by Bezdek [24,25]. It has been widely used in many studies across a wide variety of application areas.
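A minimal NumPy sketch of Fuzzy C-Means with the common fuzzifier m = 2, assuming the standard Bezdek update rules (membership-weighted centers, then inverse-distance-ratio memberships). The fuzzy partition coefficient used later as a validity index is computed from the same membership matrix. Function names, the random initialization and the synthetic blobs are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Minimal Fuzzy C-Means: returns cluster centers and an (n, c) membership matrix."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)                     # each row of memberships sums to 1
    for _ in range(n_iter):
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]    # membership-weighted cluster centers
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                       # avoid division by zero at a center
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        new_u = 1.0 / ratio.sum(axis=2)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u

def fuzzy_partition_coefficient(u):
    """FPC = mean over points of sum_j u_ij^2; ranges from 1/c (fuzziest) to 1 (crisp)."""
    return float((u ** 2).sum(axis=1).mean())

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(50, 2)) for loc in ([0, 0], [8, 0], [0, 8])])
centers, u = fuzzy_c_means(X, c=3)
fpc = fuzzy_partition_coefficient(u)
print(f"FPC = {fpc:.3f}")
```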
To evaluate our CTR, impression and sales predictions, we used performance metrics that are widely used for regression in the literature: Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and R-Squared (R2). These metrics are explained below.
MAE: The MAE is defined by Equation 1. MAE averages the absolute differences between predicted and actual values; a smaller MAE means a more accurate prediction.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \qquad (1)$$

In this equation, $n$ denotes the number of samples, $y_i$ the actual value and $\hat{y}_i$ the predicted value of the $i$-th sample; MAE is the sum of the absolute differences between actual and predicted values divided by $n$.
RMSE: This metric is similar to MAE and also measures the difference between predicted and actual values. Because RMSE squares the errors before averaging, it penalizes large differences between predicted and actual values more heavily than MAE does. RMSE is defined by Equation 2.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (2)$$
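R2: This metric is another measure of how closely predicted and actual values match. R2 typically lies between 0 and 1 (it becomes negative when a model performs worse than always predicting the mean of the actual values), and the closer it is to 1, the better the prediction accuracy. It is calculated by subtracting from 1 the ratio of the unexplained (residual) variation to the total variation. R2 is defined by Equation 3, where $\bar{y}$ is the mean of the actual values.

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \qquad (3)$$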
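The three metrics can be computed directly from their definitions; this NumPy sketch mirrors Equations 1 to 3 and is meant only as a reference implementation (scikit-learn's `mean_absolute_error` and `r2_score` give the same values). The sample arrays are made up for illustration.

```python
import numpy as np

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))           # Equation 1

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))   # Equation 2

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)                  # unexplained (residual) variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)             # total variation around the mean
    return float(1.0 - ss_res / ss_tot)                # Equation 3

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```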
Regression-Based Approach
In our click prediction stage, hotel impressions (hereafter simply impressions) are the number of impressions a hotel receives: an impression is counted when the hotel is listed on a metasearch engine’s search result page and seen by a user. This number is a major indicator of a hotel’s popularity and reputation, can be used to measure the hotel’s traffic potential, and reflects its marketing potential. The definitions of the main metrics used in this paper are given below:
Click: the number of clicks recorded by the metasearch advertising platform.
Click-Through-Rate (CTR): the total number of clicks divided by the total number of impressions.
CPC: the amount of money the OTA pays the metasearch engine for each click on the respective date.
Cost: the total number of clicks received by the hotel on the date multiplied by the corresponding cost-per-click (CPC).
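A small sketch of the CTR and cost definitions above. Setting CTR to 0 for a hotel with no impressions is our assumption (the ratio is undefined there); `daily_cost` is a hypothetical helper name and the numbers are made up.

```python
import numpy as np

def ctr(clicks, impressions):
    """Click-through rate = clicks / impressions; set to 0 where there were no impressions (assumption)."""
    clicks = np.asarray(clicks, dtype=float)
    impressions = np.asarray(impressions, dtype=float)
    out = np.zeros_like(clicks)
    np.divide(clicks, impressions, out=out, where=impressions > 0)
    return out

def daily_cost(clicks, cpc):
    """Cost = clicks received on the date multiplied by that date's cost-per-click."""
    return np.asarray(clicks, dtype=float) * np.asarray(cpc, dtype=float)

clicks = np.array([5, 0, 3])
impressions = np.array([100, 0, 60])
cpc = np.array([0.40, 0.50, 0.30])
print(ctr(clicks, impressions), daily_cost(clicks, cpc))
overall_ctr = clicks.sum() / impressions.sum()  # aggregate CTR: total clicks / total impressions
```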
First, data cleaning techniques were applied. Features that cannot be used by machine learning algorithms, such as the hotel URL and hotel name, were removed from the data. Duplicate rows, such as more than one row for the same hotel on the same day, were also removed. Then, data enrichment steps were applied: shifted values and rolling averages of certain features were added to the dataset. In addition, hotels were categorized as city or summer hotels according to their location.
We created a new feature named “hotel_type” to represent this category (resort or city hotel). Given the importance of upcoming public holidays in predicting the potential number of clicks and reservations, the holiday period and the number of days until the start of the holiday were added as new features. The hotel’s price on the corresponding date and its position information in the metasearch bidding engine (the position number of the OTA’s advertisement) were also added as new features for each hotel, together with the prices and positions of the OTA’s closest competitors, which are vital to the OTA’s competitiveness.
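A sketch of the two holiday features described above, assuming a table of holiday periods with hypothetical `start`/`end` columns and a `date` column on the daily rows. The dates below are made up for illustration and are not an official calendar; a missing next holiday is left as NaN, which is our choice.

```python
import numpy as np
import pandas as pd

def add_holiday_features(df, holidays):
    """Add `in_holiday` and `days_to_holiday` (days until the next holiday starts; NaN if none)."""
    out = df.copy()
    dates = out["date"].to_numpy(dtype="datetime64[ns]")
    starts = np.sort(holidays["start"].to_numpy(dtype="datetime64[ns]"))
    pos = np.searchsorted(starts, dates, side="left")  # index of the first start on or after each date
    has_next = pos < len(starts)
    days = np.full(len(dates), np.nan)
    days[has_next] = (starts[pos[has_next]] - dates[has_next]) / np.timedelta64(1, "D")
    out["days_to_holiday"] = days
    in_holiday = np.zeros(len(dates), dtype=bool)
    for start, end in zip(holidays["start"], holidays["end"]):
        in_holiday |= ((out["date"] >= start) & (out["date"] <= end)).to_numpy()
    out["in_holiday"] = in_holiday
    return out

# Hypothetical holiday periods; a real pipeline would load an official calendar per country.
holidays = pd.DataFrame({
    "start": pd.to_datetime(["2019-06-04", "2019-08-10"]),
    "end": pd.to_datetime(["2019-06-07", "2019-08-15"]),
})
daily_rows = pd.DataFrame({"date": pd.to_datetime(["2019-06-01", "2019-06-05", "2019-08-20"])})
result = add_holiday_features(daily_rows, holidays)
print(result)
```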
Missing values in the dataset were filled in after the enrichment process. The OTA performance report contained a few missing values, which can be filled in using statistical methods suggested in the literature. For example, if the value of the “click” variable is 0 and the cost information for this hotel is missing, the cost is filled with 0, because a hotel that received no clicks on a date incurs no marketing cost for that date. Missing values in hotel-related characteristics, such as the hotel’s star rating, are filled with the column average. Categorical values that describe a hotel property (e.g., the city of the hotel) are filled with the most frequent value of that property. Ordinal categorical variables stored as strings, such as the booking value index, are converted to numeric values from 1 to 5.
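The imputation rules above can be sketched with pandas. All column names (`clicks`, `cost`, `stars`, `rating`, `city`, `booking_value_index`) and the ordinal labels in `ORDINAL_MAP` are hypothetical stand-ins for the OTA report fields; "most frequent value" is interpreted here as the column mode.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping for an ordinal string column such as a booking value index.
ORDINAL_MAP = {"very_low": 1, "low": 2, "medium": 3, "high": 4, "very_high": 5}

def impute(df):
    out = df.copy()
    # Rule 1: zero clicks on a date means zero marketing cost on that date.
    out.loc[out["cost"].isna() & (out["clicks"] == 0), "cost"] = 0.0
    # Rule 2: numeric hotel attributes are filled with the column mean.
    for col in ["stars", "rating"]:
        out[col] = out[col].fillna(out[col].mean())
    # Rule 3: categorical hotel attributes are filled with the most frequent value.
    out["city"] = out["city"].fillna(out["city"].mode().iloc[0])
    # Rule 4: ordinal strings are converted to integers 1..5.
    out["booking_value_index"] = out["booking_value_index"].map(ORDINAL_MAP)
    return out

report = pd.DataFrame({
    "clicks": [0, 4, 2],
    "cost": [np.nan, 1.2, np.nan],     # the last row keeps NaN: it had clicks, so cost is unknown
    "stars": [4, np.nan, 5],
    "rating": [8.0, 9.0, np.nan],
    "city": ["Antalya", None, "Antalya"],
    "booking_value_index": ["low", "high", "medium"],
})
cleaned = impute(report)
print(cleaned)
```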
In our problem, the metasearch engine performance reports, which contain the most important features for our model, become available to the OTA only on the next calendar day. To address this delay and to use the relevant sequential features in prediction, shifted values and 3-, 7- and 30-day rolling averages of some features were added to the training set. The day of the week was also added, as it tends to be a significant indicator of click volume for certain hotels. Furthermore, the bid, click and profit values of each hotel were added, both from the previous day and from the same weekday one week earlier. The hotel’s prices over the last ten days were also added as separate columns to capture changing pricing patterns. The dataset used for prediction, consisting of nearly 220 features and more than 800,000 samples, was obtained as a result of the preprocessing steps described above. The modelling features are given in Table 1.
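A pandas sketch of the per-hotel shifted and rolling features. It assumes hypothetical `hotel_id`, `date` and metric columns and exactly one row per hotel per day, so that `shift(7)` lands on the same weekday one week earlier. Rolling means are computed on the one-day-lagged series so that each row only uses information that would already be available.

```python
import numpy as np
import pandas as pd

def add_lag_features(df, cols, windows=(3, 7, 30)):
    """Per-hotel lagged values and rolling means built only from days before each row's date."""
    out = df.sort_values(["hotel_id", "date"]).copy()
    grouped = out.groupby("hotel_id")
    for col in cols:
        prev = grouped[col].shift(1)                  # previous day: the report arrives one day late
        out[f"{col}_lag1"] = prev
        out[f"{col}_lag7"] = grouped[col].shift(7)    # same weekday one week earlier
        for w in windows:
            out[f"{col}_roll{w}"] = prev.groupby(out["hotel_id"]).transform(
                lambda s, w=w: s.rolling(w, min_periods=1).mean()
            )
    out["day_of_week"] = out["date"].dt.dayofweek     # Monday = 0
    return out

dates = pd.date_range("2019-07-01", periods=10, freq="D")
report = pd.DataFrame({
    "hotel_id": ["A"] * 10 + ["B"] * 10,
    "date": list(dates) * 2,
    "clicks": list(range(1, 11)) + list(range(101, 111)),
})
features = add_lag_features(report, ["clicks"])
print(features.head())
```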
Base Features used in CTR and Impression Prediction
Tables 2, 3 and 4 show the prediction results obtained by feeding all the features as input to the machine learning algorithms. For both CTR and impression prediction, XGBoost generally outperforms all the other algorithms. The highest R-Squared values obtained when predicting CTR and impressions for individual hotels are 0.61 and 0.84, respectively, both achieved by XGBoost. XGBoost is followed by the other two tree-based algorithms, Random Forest and Gradient Boosting. SVR and AdaBoost do not produce generalizable models for this task. In the click prediction stage, the highest R-Squared value of 0.81 is also obtained with XGBoost. Overall, impression prediction is more successful than CTR prediction.
Algorithms’ Results for CTR Prediction
ᵃ Root Mean Square Error. ᵇ Mean Absolute Error.
Algorithms’ Results for Impression Prediction
ᵃ Root Mean Square Error. ᵇ Mean Absolute Error.
Algorithms’ Results for Click Prediction (Predicted Impressions Multiplied by Predicted CTR)
ᵃ Root Mean Square Error. ᵇ Mean Absolute Error.
The sum-success ratio, defined as the sum of actual clicks divided by the sum of predicted clicks, was also added as a performance metric for each algorithm. This ratio is important because it approximately reflects the total marketing cost for the next day. The tree-based ensemble methods give comparable results on this metric, all exceeding ninety-five per cent.
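A short sketch of how the click prediction and the sum-success ratio fit together: per-hotel predicted clicks are the product of predicted impressions and predicted CTR, and the ratio compares the totals. The helper names and numbers are illustrative only.

```python
import numpy as np

def predicted_clicks(pred_impressions, pred_ctr):
    """Next-day click prediction per hotel: predicted impressions x predicted CTR."""
    return np.asarray(pred_impressions, dtype=float) * np.asarray(pred_ctr, dtype=float)

def sum_success_ratio(actual_clicks, pred_clicks):
    """Total actual clicks divided by total predicted clicks (1.0 means the totals match)."""
    return float(np.sum(actual_clicks) / np.sum(pred_clicks))

pred = predicted_clicks([1000, 200, 0], [0.02, 0.05, 0.03])
ratio = sum_success_ratio([19, 9, 1], pred)
print(pred, round(ratio, 4))
```

Because errors on individual hotels can cancel out in the totals, this ratio complements per-hotel metrics such as RMSE rather than replacing them.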
Forecasting revenue together with marketing cost is crucial for company top management, so sales prediction is also important for OTAs. In our sales prediction, the click amounts predicted in the previous stage are used as input, because the sales amount is positively correlated with marketing cost. Accordingly, using a sliding window method, we tried to predict the next day’s sales from the predicted click values. For this purpose, additional features were generated from the OTA performance report as moving averages and shifted values of specific features. These include moving averages and standard deviations of the original features over 3, 7, 15, 30 and 45 days. The daily sliding window approach was applied only to features correlated with the target variable. The most correlated features were the lagged values of the target variable, i.e., its sliding-window values from previous days. Correlation analysis also showed that the total numbers of nights and rooms were directly related to sales, so sliding-window features of these values from past days were added to the dataset.
Likewise, ten-day daily moving averages of additional features were added to the dataset. These features include the opportunity cost per click, clicks, average booking value, bid, cost, gross revenue, and the number of guests who stayed at the hotel. In addition, the moving sums of these values over the previous 3, 7, 15, 30 and 45 days are included in the dataset. From the booking data, we also obtained the number of bookings broken down into completed, cancelled and pre-bookings. These numbers were added for previous periods, together with their moving averages and standard deviations. However, because the reservation data depend on the target variable, their actual values were not added to the dataset directly; only the moving averages and standard deviations of previous days were added.
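A sketch of leakage-safe window statistics for booking-derived columns: the series is shifted by one day per hotel before the moving mean, standard deviation and sum are taken, so the current day's value never enters its own features. The `hotel_id`, `date` and `cancelled` column names are hypothetical.

```python
import numpy as np
import pandas as pd

def add_window_stats(df, col, windows=(3, 7, 15, 30, 45), key="hotel_id"):
    """Moving mean, std and sum of `col` over previous days only (shift(1) excludes the current day)."""
    out = df.sort_values([key, "date"]).copy()
    prev = out.groupby(key)[col].shift(1)
    by_hotel = prev.groupby(out[key])
    for w in windows:
        out[f"{col}_mean{w}"] = by_hotel.transform(lambda s, w=w: s.rolling(w, min_periods=1).mean())
        out[f"{col}_std{w}"] = by_hotel.transform(lambda s, w=w: s.rolling(w, min_periods=2).std())
        out[f"{col}_sum{w}"] = by_hotel.transform(lambda s, w=w: s.rolling(w, min_periods=1).sum())
    return out

bookings = pd.DataFrame({
    "hotel_id": ["A"] * 5,
    "date": pd.date_range("2019-07-01", periods=5, freq="D"),
    "cancelled": [1, 2, 3, 4, 5],
})
stats = add_window_stats(bookings, "cancelled", windows=(3,))
print(stats)
```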
After all of these steps, one-hot encoding was applied to the categorical variables. Then, features that contain information about the next day’s target variable were dropped from the dataset to avoid target leakage.
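One-hot encoding can be done with pandas' `get_dummies`; the `hotel_type` column and values below are illustrative. In a production pipeline, the category set learned on the training data should be fixed so that new days produce the same columns.

```python
import pandas as pd

df = pd.DataFrame({"hotel_type": ["city", "resort", "city"], "clicks": [12, 30, 7]})
# Replace the categorical column with one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=["hotel_type"], prefix="hotel_type", dtype=int)
print(encoded.columns.tolist())
```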
The dataset obtained after all of these preprocessing and feature enrichment steps covers the same dates as the CTR and impression prediction dataset and contains nearly 375,000 rows and 315 features. The additional features used in sales prediction but not in CTR and click prediction are given in Table 5.
The additional features which are used in the sales prediction
In the final step, eXtreme Gradient Boosting, Gradient Boosting Machines, Random Forest, Generalized Linear Models and a Deep Neural Network were applied to the dataset described above. The sum-success criterion of these algorithms is given in Table 6.
Algorithms’ Results for Sales Prediction
The hotels in the dataset behave differently from each other, and some of them are affected by seasonality.
Therefore, instead of keeping all hotels in a single dataset, we clustered them with the aim of improving the performance metrics. For this purpose, the K-Means and Fuzzy C-Means clustering algorithms were applied to the dataset. The dataset consists of data from nearly ten thousand unique hotels, drawn from the CTR, impression and sales prediction datasets. To choose the number of clusters k, Silhouette scores were evaluated for k = 3, 5, 7 and 9. In both methods, k was set to five, which gave a higher Silhouette score than the other values, and clustering validity index values were analyzed to confirm that the clustering was successful. The cluster sizes are given in Table 7.
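A sketch of choosing k from the candidate set {3, 5, 7, 9} by Silhouette score, assuming scikit-learn's `KMeans` and `silhouette_score`. The five synthetic blobs stand in for the hotel feature matrix, which is not available here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def choose_k(X, candidates=(3, 5, 7, 9), seed=0):
    """Return the candidate number of clusters with the highest silhouette score, and all scores."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores

centers = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 20)]
X, _ = make_blobs(n_samples=500, centers=centers, cluster_std=0.5, random_state=0)
best_k, scores = choose_k(X)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```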
Number of Sets for Clusters
For K-Means clustering, the Silhouette coefficient is a widely used validation criterion. The silhouette width measures how tightly each point is grouped within its own cluster relative to its separation from the other clusters [26,27]. For Fuzzy C-Means clustering, the Fuzzy Partition Coefficient (FPC) is a popular index for measuring clustering quality [28]. The FPC lies between 0 and 1 (more precisely, between 1/c and 1 for c clusters), and higher values indicate better clustering quality.
The Silhouette score of K-Means clustering was nearly 0.9, indicating that the hotels were clustered successfully. Likewise, the FPC score of Fuzzy C-Means was 0.96, indicating that this clustering was also successful. Because Fuzzy C-Means assigns each hotel a degree of membership in every cluster, a threshold of 0.25 was applied to these membership degrees, so a hotel can belong to more than one cluster depending on its membership levels.
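A sketch of turning a fuzzy membership matrix into (possibly overlapping) cluster assignments with the 0.25 threshold. Treating the threshold as inclusive, and falling back to the strongest cluster when no membership reaches it (possible with five clusters), are our assumptions; the paper does not specify either detail.

```python
import numpy as np

def memberships_to_clusters(u, threshold=0.25):
    """Map an (n_hotels, c) membership matrix to the list of clusters each hotel belongs to."""
    clusters = []
    for row in np.asarray(u):
        members = np.flatnonzero(row >= threshold).tolist()
        if not members:                  # assumption: fall back to the strongest cluster
            members = [int(np.argmax(row))]
        clusters.append(members)
    return clusters

u = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],      # clearly one cluster
    [0.40, 0.30, 0.10, 0.10, 0.10],      # belongs to two clusters
    [0.22, 0.20, 0.20, 0.19, 0.19],      # no membership reaches 0.25
])
assignments = memberships_to_clusters(u)
print(assignments)
```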
In the second stage of this approach, XGBoost, Random Forest and Gradient Boosting, the three most successful algorithms in impression and CTR prediction, were trained and their hyper-parameters tuned separately for each cluster produced by both clustering algorithms. The impression and CTR prediction results for each cluster and the overall performance on the full dataset are given in Tables 8, 9 and 10.
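A sketch of the cluster-then-regress idea with hard cluster labels: one model is trained per cluster and each test row is routed to its cluster's model. It uses scikit-learn's `GradientBoostingRegressor` (an `xgboost.XGBRegressor` could be passed through `make_model` instead) on synthetic data where two groups respond to the input with opposite signs. With Fuzzy C-Means, a hotel assigned to several clusters would simply appear in several clusters' training sets; that variant is not shown.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

def fit_per_cluster(X, y, labels, make_model):
    """Train one regressor per cluster label; `make_model` returns a fresh, unfitted estimator."""
    return {c: make_model().fit(X[labels == c], y[labels == c]) for c in np.unique(labels)}

def predict_per_cluster(models, X, labels):
    """Route each row to its cluster's model and reassemble predictions in the original row order."""
    pred = np.full(len(X), np.nan)  # rows from clusters without a model stay NaN
    for c, model in models.items():
        mask = labels == c
        if mask.any():
            pred[mask] = model.predict(X[mask])
    return pred

def make_model():
    return GradientBoostingRegressor(random_state=0)

# Synthetic example: two groups of "hotels" whose response to x has opposite signs.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 1))
labels = rng.integers(0, 2, size=400)
y = np.where(labels == 0, 2 * X[:, 0], -2 * X[:, 0]) + rng.normal(scale=0.05, size=400)
tr, te = slice(0, 300), slice(300, 400)

global_model = make_model().fit(X[tr], y[tr])
r2_global = r2_score(y[te], global_model.predict(X[te]))
cluster_models = fit_per_cluster(X[tr], y[tr], labels[tr], make_model)
r2_cluster = r2_score(y[te], predict_per_cluster(cluster_models, X[te], labels[te]))
print(f"single model R^2 = {r2_global:.3f}, per-cluster R^2 = {r2_cluster:.3f}")
```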
Comparison of Clustering Algorithms for CTR Prediction
Clustering Algorithms and Regression Algorithms’ Results for Impression Prediction
Clustering Algorithms and Regression Algorithms’ Results for Click Prediction (Predicted Impressions Multiplied by Predicted CTR)
The results show that the clustering-based approach is computationally more expensive but improves the prediction performance metrics. In addition, the Fuzzy C-Means clustering algorithm performs slightly better than K-Means. For this reason, we decided to use Fuzzy C-Means clustering together with the XGBoost algorithm in the production stage.
In this study, we aimed to predict, from historical data, the number of clicks each hotel will receive on the metasearch advertising platform on the following day. To this end, we first applied numerous data preprocessing techniques and prepared a dataset containing moving averages and standard deviations of the original features, then applied feature selection methods proposed in the literature to reduce the feature dimension, and finally fed the selected dataset into a set of machine learning algorithms. The primary contribution of this paper is obtaining the predicted clicks from the predictions of CTR and hotel impressions: the predicted click amount of each hotel for the next day was obtained by multiplying its predicted CTR value by its predicted impression value.
The results show that the highest R-Squared value obtained by multiplying the predicted CTR and impressions was 0.81, achieved by eXtreme Gradient Boosting. The other success criterion, which can be regarded as a measure of total success, is the ratio of the actual number of clicks to the predicted number of clicks. This value is an indicator of the total marketing cost for the day concerned. We achieved a sum-success criterion of 95 per cent, showing the effectiveness of the features extracted from the initial dataset.
To improve these results, hotels were clustered using the K-Means and Fuzzy C-Means algorithms and the results were compared. Fuzzy C-Means achieved an FPC score of 0.96, indicating that the hotels were clustered successfully. It also improved the R2 value of the click prediction (predicted CTR multiplied by predicted impressions) from 0.81 to 0.86 and the sum-success criterion from 0.95 to 0.98.
The fuzzy approach prevents the number of hotels in each cluster from shrinking, so more training data are available for each cluster than with K-Means. This suggests that when data become limited after clustering, Fuzzy C-Means can be used because of its multiple-membership property.
For sales prediction, we also generated a dataset containing performance metrics from the metasearch engine performance report, together with moving averages and standard deviations of some of these metrics. The dataset also includes the predicted click values obtained from the CTR and impression predictions of the previous stage. After the data preprocessing and feature enrichment stage, different predictive algorithms, including gradient boosting, XGBoost, random forest, a generalized linear model, and a deep neural network, were applied to predict the next-day sales of each hotel. The results showed that the predicted number of clicks was crucial for predicting sales. By the nature of this problem, the number of clicks is strongly correlated with marketing cost, and marketing cost is strongly correlated with sales. The results also showed that adding target-related features to the dataset as moving averages and standard deviations improved model performance. Overall, the tree-based algorithms performed better than the other algorithms: XGBoost was slightly better than Gradient Boosting and ahead of the generalized linear model and the deep neural network.
