Abstract
The pickup and delivery problem with time windows is an NP-hard discrete optimization problem with two objectives—to minimize the fleet serving transportation requests, and to minimize the distance traveled during this service. Although there exist exact algorithms for tackling this problem, they are still difficult to apply in massively large practical scheduling scenarios due to their time complexities. Hence, the approximate methods became the main stream of research in this field. In this paper, we propose an adaptive guided ejection search algorithm for solving the pickup and delivery with time windows. The pivotal part of this technique is the pre-processing step, in which the instance characteristics concerning its underlying structure are extracted in the clustering and histogram-based analyses. Then, the k-nearest neighbor algorithm is applied to classify the instance to an appropriate class. Finally, the most suitable variant of our enhanced guided ejection search algorithm is adaptively chosen for solving this instance based on the classification outcome. An extensive experimental study performed on the full Li and Lim’s benchmark (encompassing 354 problem instances belonging to 6 classes) revealed that our pre-processing allows for achieving very high classification accuracy, thus for selecting the best variant of the enhanced guided ejection search.
Introduction
Route scheduling is an important problem due to its wide practical applicability. The feasibility of routing schedules is affected by numerous factors, including the time slots in which customers should be served, the maximum truck capacities, the precedence constraints imposed on the order of visiting travel points, potential dynamic changes of the route network, and many others [1]. There exist a plethora of different variants of routing problems which reflect these scheduling circumstances. In the multiple traveling salesman problem (mTSP), multiple salesmen can serve customers. In the capacitated vehicle routing problem (CVRP), the customers define their non-negative demands which need to be satisfied using vehicles of a given maximum capacity. The vehicle routing problem with time windows (VRPTW) defines the time slots, within which the service of each customer should be started (however, it may be finished after closing this time window). Other practical VRP variants encompass heterogeneous fleet VRP, the VRP with stochastic demands, the periodic VRP, and more [2].
Solving the pickup and delivery problem with time windows (PDPTW) consists in finding a feasible schedule for serving a set of transportation requests (combinations of the pickup and delivery requests) using homogeneous trucks [3]. In each request (which must be served within exactly one route), goods are picked up from one location, and delivered to another one. These travel points are scattered around the map and must be visited within the corresponding time windows (the service may finish after closing this time slot). The pickup location must be visited before the delivery one by the serving truck, and the capacity of all vehicles must not be exceeded. All vehicles start and finish their routes in a single depot which also specifies its own time window.
The PDPTW is a hierarchical-objective optimization problem [4]: the main objective is to minimize the fleet size, whereas the second one is to optimize the travel distance. Due to numerous applications of the VRPs (e.g., rail distribution, food delivery or industrial waste collection), they attracted research attention [5]. Since the problem is NP-hard, exact approaches are applicable for relatively small-scale scenarios because of their unacceptable execution times [3].
Approximate algorithms, which do not ensure retrieving exact solutions but are capable of obtaining nearly-optimal schedules in short time, are intensively being developed to tackle the PDPTW. They include genetic algorithms, various neighborhood searches, simulated annealing, and many others [6]. Guided ejection searches (GESes) were proven very efficient (and outperforming the state of the art) in this task [7]. In our very recent work [8], we proposed several improvements and showed that they significantly affect the final routing schedules retrieved using the GES, as well as its behavior and capabilities. Although this enhanced GES was able to elaborate very high-quality solutions, it was unclear how to select an appropriate algorithm variant which suits the instance to be solved as best as possible. This issue is tackled in this work.
The paper is organized as follows. Section 1.1 gives the contribution of this paper. The PDPTW is formulated in Section 2. Section 3 reviews the current state of the art in solving the PDPTW. In Section 4, we present our adaptive guided ejection search algorithm for tackling the PDPTW, with an emphasis put on the pre-processing procedure which is the main contribution of this work. Section 5 discusses the experimental results. The paper is concluded in Section 6, which also gives an outlook to the future work.
Contribution
As already mentioned, selecting an appropriate GES variant for solving a given PDPTW instance is a challenging and demanding task. In [8], we showed that applying certain algorithm components for some classes of problems (e.g., encompassing clustered customers) is very beneficial and helps boost the convergence capabilities of the algorithm (the differences were shown to be statistically important). However, the characteristics of a test case are not known in advance and should be carefully extracted before the execution.
This paper extends our previous work presented at the Asian Conference on Intelligent Information and Database Systems (ACIIDS 2016) [8], held in Da Nang, Vietnam (on March 14-16, 2016). Here, we propose the pre-processing procedure aimed at extracting discriminative features of test instances, and classifying them into appropriate classes. In general, six classes of PDPTW tests are distinguished. They include problems with:
The pre-processing encompasses the analysis of the customer locations, and the expected minimum numbers of trucks serving the requests. First, we calculate 27 various clustering validity indices (using four different aggregation methods), and average their responses for determining the number of clusters which is the most appropriate for the underlying instance structure. This procedure is complemented with the affinity propagation clustering [9]—it is used to determine the number of clusters independently. To fully exploit the information about the geometrical structure of the instance, we perform the histogram-based analysis (at various scales), and quantify the two-dimensional histograms of customer locations. Finally, we investigate the theoretically minimum numbers of trucks which have to be utilized in a solution to serve all requests, without exceeding the truck capacities.
All of the aforementioned extracted features are combined into the feature vectors, which are classified using the k-nearest neighbor algorithm. An extensive experimental study, performed using the full Li and Lim’s benchmark set containing 354 test instances, revealed that our pre-processing step allows for achieving very high classification accuracy. Therefore, it allows for retrieving the best-fitted guided ejection search variant before its execution—the algorithm is adaptively tailored for the problem at hand.
Problem formulation
The PDPTW is a problem of serving N requests by K vehicles. More formally, the PDPTW is defined on a directed graph G = (V, E), with a set V of C + 1 vertices. The vertices v i , i ∈ {1, …, C}, represent the customers, whereas v0 denotes a single depot. The edges E = {(v i , vi+1) |v i , vi+1 ∈ V, v i ≠ vi+1} represent the travel connections. The travel costs ci,j, i, j ∈ {0, 1, …, C}, i ≠ j, are equal to the distances (in the Euclidean metric) between the travel points.
Each request h i , i ∈ {0, 1, …, N}, where N = C/2, is a pair of the pickup (origin, P) and delivery (destination, D) operations (also referred to as the jobs [3]), where P∩ D = ∅, and P ∪ D = V \ {v0}. The pickup occurs before the corresponding delivery, and both must be served in the same route. For each request h i , the amounts of delivered (q d (h i )) and picked up (q p (h i )) goods are defined, where q d (h i ) = - q p (h i ). A customer v i defines its own demand (either delivery or pickup), its service time s i (where s0 = 0), and time window [e i , l i ]. The service of each customer should be started within its time window. The fleet is homogenous—the capacity of each truck is Q.
The PDPTW solution σ is a set of routes. Each route r is given as , and it starts and finishes in the depot (thus v0 = vn+1).
Objectives
The PDPTW is a two-objective discrete optimization problem. The primary objective is to minimize the fleet size K, K ≥ Kmin (, and is the total delivery demands).
The second objective is to minimize the total distance (T) traveled by all the vehicles:
Let σ A and σ B be two feasible PDPTW solutions. Then, σ A is of a higher quality than σ B , if (K (σ A ) < K (σ B )) or (K (σ A ) = K (σ B ) and T (σ A ) < T (σ B )).
The PDPTW constraints may be formally expressed using the following equations:
State-of-the-art techniques for tackling the PDPTW (and numerous of its variants addressing a plethora of real-life scheduling cases [10]) are divided into exact and approximate approaches. The former algorithms deliver the exact solutions, whereas the latter are aimed at obtaining very high-quality (quite often nearly-optimal) solutions in acceptable time. The exact algorithms were devised for solving relatively small problem instances (up to 30 requests [11]) due to their computation time. Albeit they are still being actively developed, this execution time becomes their bottleneck and it is still difficult to apply them for tackling massively large real-life scheduling scenarios. The exact techniques encompass dynamic programming and column generation methods, branch-and-cut, branch-and-price solvers, and more [12, 13]. The set-partitioning-like integer formulation of the problem was presented by Baldacci et al. [14]—two dual ascent heuristics were combined with the cut-and-column generation to find a dual solution of the linear programming relaxation of this formulation. The exact algorithms were summarized in several surveys [15, 16].
Approximate algorithms for minimizing the number of trucks in the PDPTW include construction and improvement heuristics. The construction (insertion-based) heuristics create solutions from scratch by inserting consecutive requests iteratively into the partial solution (in which a subset of requests is already handled) according to certain criteria, e.g., the maximum cost savings, the minimum additional travel distance, or the cost of reducing the time window slacks [17, 18]. These insertions should not violate the solution feasibility, therefore all of the constraints must be satisfied after inserting each request into the partial solution.
Improvement heuristics modify a low-quality solution by executing refinement procedures in search for better neighboring solutions. The metaheuristics usually embed construction and improvement heuristics [19]. Temporarily infeasible solutions, along with the quality deterioration during the optimization are very often allowed in metaheuristic algorithms [6].
Due to the hierarchical objective of the PDPTW, most of the state-of-the-art heuristic algorithms incorporate a two-stage approach—the number of routes is minimized in the first stage, whereas the travel distance is optimized afterwards [7]. Therefore, efficient techniques for both stages may be designed independently, perhaps without affecting each other. Heuristic algorithms comprise tabu [11] and neighborhood searches [20], agent-based approaches [4], guided ejection searches (GESes), enhanced in our very recent work [8], evolutionary algorithms [21, 22], and many more [7]. Parallel algorithms were intensively explored for solving rich routing problems [23], including the PDPTW [24, 25]. A thorough survey on approximate approaches for the PDPTW was presented by Parragh et al. [6, 26].
Selecting appropriate parameter values of a given algorithm became a challenging task, especially in the case of approaches in which this selection can easily jeopardize the optimization process [5, 28]. The algorithm parameters can be either retrieved before the execution (in a time-consuming tuning process), or adaptively updated on the fly (using adaptation schemes). However, if certain algorithm components should be applied or not for a given problem instance, then the tuning process appears very difficult, since these components are most likely not independent from each other. Extracting the characteristics of a problem which is to be solved is therefore extremely helpful, if the algorithm behavior is known for certain classes of instances. In this paper, we address this issue and propose a pre-processing procedure which exploits the characteristics of PDPTW instances (importantly, this pre-processing does not have any parameters itself). This allows for an adaptive selection of the best-fitted variant of our enhanced GES [8] for solving the problem at hand as efficiently as possible.
Adaptive guided ejection search
Pre-processing
The pre-processing step aims at determining the appropriate PDPTW problem instance class. In this procedure, we build the feature vectors at first, and then apply the k-nearest neighbor (k-NN) algorithm [29] for classifying the corresponding instance to one class, out of six classes—they are summarized in Table 1 (the corresponding characteristics have been averaged across all problem sizes). The truck capacities are much larger, and the time windows are notably wider for the second-type tests (c2, r2, and rc2) compared with the first-type ones (c1, r1, rc1). Intuitively, a smaller numbers of trucks should be necessary to feasibly serve jobs in the second-type problems (note that the service times are the same for both types of instances of the same structure—see c1 compared with c2, r1 with r2, and rc1 with rc2). Hence, different real-life scenarios are reflected in this dataset. The following subsections discuss in detail the process of extracting features, and classifying PDPTW instances.
Extraction of features
Extracting an appropriate (i.e., discriminative) set of features is a crucial task in all classification problems. Since this step affects the classification performance, it should be executed with care.
In order to exploit the underlying information concerning the structure of the PDPTW instance (being a set of travel points scattered in the two-dimensional space), we incorporate various techniques to determine the number of clusters which fits best to the characteristics of this instance (i.e., the most “appropriate” number of clusters for this dataset). In all of these methods, we utilize the Euclidean metric to find the distance between two travel points (see Section 2). The following aggregation (linkage) techniques are exploited:
It is worth mentioning that we utilize both hierarchical and non-hierarchical (k-means) clustering.
To find the number of clusters “hidden” in the PDPTW instance, we exploit 27 clustering validity indices (Table 2). The indices combine the information about the inter-cluster isolation along with the information on the intra-cluster compactness in search for the best number of clusters fitted to the underlying structure of the dataset. All of these indices are calculated for all of the aforementioned aggregation techniques. For each aggregation, we calculate the average value of the clusters () retrieved using these indices:
The analysis of the validity indices is coupled with the clustering by passing messages between data points [9], which was shown to be very efficient (and extremely fast) in clustering of various types of data, especially in computer vision and computational biology. In this technique, all data points are considered as potential cluster centers (they are called exemplars). The messages are then transferred between the data points—the magnitude of each message reflects the current affinity that one data point has for selecting another point as its exemplar (this method is often referred to as the affinity propagation). The messages (being either the responsibility messages r(i,j), sent from the i-th to the j-th point, and showing how suitable is the j-th point to serve as the exemplar for the i-th point, or the availability messages a(i,j), sent from the j-th to the i-th point, reflecting how suitable is the j-th point to serve as the exemplar for the i-th point, considering other points for which the j-th point should serve as the exemplar) are exchanged between the exemplars until a high-quality set of clusters is found. The availabilities and responsibilities are finally combined to identify the most appropriate exemplars (clusters). The
The clustering-based analysis of the instance is complemented with the histogram-based investigation of the customer locations. We create two-dimensional histograms which show the number of travel locations in various areas of the map (it is divided into n × n cells, and the number of points in each cell is counted). These histograms are extracted at various scales—the number of bins varies, and , where C is the number of travel points. Hence—theoretically—each travel point could fall into a separate bin if they were scattered evenly. Finally, we create the histogram counting the bins of each size in these two-dimensional histograms (for each scale). This histogram is quantified using several measures—skewness (γ1), kurtosis (γ2), standard deviation (SD), and entropy (h):
Skewness is a measure of asymmetry of a histogram—the negative skewness means that its left tail is longer than the right one (it is opposite if skewness is positive). Kurtosis reflects the “peakedness” of a histogram—the positive kurtosis shows that the histogram is more peaked than the Gaussian distribution (it is flatter otherwise). Standard deviation quantifies the variation from the mean value (the higher values, the larger variations). Finally, entropy may be considered as a measure of “uncertainty” in the data distribution [53]. Also, the minimum (bmin), mean (), and maximum (bmax) values in the histogram are utilized. All of
An exemplary process of extracting these measures (for the instance visualized in Fig. 1) is rendered in Table 3. This analysis is performed for six scales (n = {5, 10, 15, 20, 25, 30}). The two-dimensional histograms are given in Table 3(a), whereas Table 3(b) shows the histograms quantifying the number of bins encompassing a given number of travel points in the corresponding two-dimensional histograms. The extracted measures are presented in Table 4. The averaged values would be appended to the feature vector representing this PDPTW instance. It is worth mentioning that this process may be considered as the analysis of texture of the two-dimensional histograms [53].
As mentioned in Section 2, the transportation requests are served using homogeneous vehicles in the PDPTW. Although the capacity Q of these vehicles is known in advance, it is unclear if this capacity can be considered “large” or “small”. To verify if the minimum number of trucks which can feasibly serve the requests correspond to the number of clusters found in this instance, we calculate the following ratio:
Therefore, each feature vector contains 13 entries: 4 average numbers of clusters (the analysis of the clustering validity indices), 1 number of clusters extracted using the affinity propagation technique, 7 characteristics averaged for various scales in the two-dimensional histogram based analysis, 1 ratio quantifying how “large” is the maximum truck capacity.
In this work, we utilize the k-NN algorithm for classifying the incoming instance to be solved to one (out of six) class (see Table 1). The nearest-neighbor decision rule assigns the appropriate class to the new (unseen) feature vector based on the classification outcome of k already-classified vectors nearest (in the Euclidean distance) to the incoming one. If k is odd, then the classification is decided by the majority vote. On the other hand, if k is even and the voting process leads to the tie, then the class may be either assigned randomly (for the tied classes), or selected based on quantifying the “importance” of separate votes.
Although the k-NN algorithm is a relatively basic classification engine, it was shown to be effective in many pattern recognition tasks [53]. Also, it is easily applicable to multi-class classification problems.
1: σ
B
← initial solution; finished ←
2:
3: Initialize EP with requests from random r;
4: p[i] ←1; (i = 1, 2, …, N); σ I ← σ;
5:
6: Select and remove request h in from EP;
7:
8: σ ← ;
9:
10:
11:
12: p [h in ] ← p [h in ] +1;
13:
14: Generate ;
15:
16: σ← ;
17: Add {} to EP;
18:
19:
20:
21:
22: Perturb σ; c ← c + 1;
23: Verify termination conditions;
24: Update finished and stuck;
25:
26:
27:
28:
29:
30:
Enhanced guided ejection search
Our enhanced guided ejection search is an improvement heuristics which boosts the quality of the initial (feasible—all constraints discussed in Section 2.2 are satisfied) PDPTW schedule, in which every request is served in a separate route (Algorithm 1). At each step, a random route r is drawn, its requests are inserted into the ejection pool (EP) (line 3), and the attempts to re-insert them back into the partial solution are undertaken to minimize K (see Section 2.1). The penalty counters p indicate the re-insertion difficulty of the corresponding request (the higher p value, the more difficult is to include the request back in the solution). A request h in is popped from the EP (line 6). If there are several feasible positions to insert h in into σ, then a random one is chosen (line 8)— is the set of feasible insertions of h in . This request is otherwise inserted so as it violates the constraints (see Section 2.2), and the solution is squeezed to restore its feasibility (line 9)—this is the local-search based procedure [8]. If it fails, then its p is increased (line 12), and at most k m requests (with minimal sum of their p’s) are ejected to insert h in feasibly (lines 13–20). Finally, σ is perturbed with local moves (out-relocate and out-exchange) to diversify the search (line 22). This process continues until the maximum time has elapsed, or the minimum theoretical number of routes (Kmin) has been retrieved. For more details on our GES,see [8].
In our recent paper [8], we proposed three significant improvements to the GES which clearly affected the quality of the retrieved PDPTW routing schedules: Squeezing of infeasible solutions (S), Randomizing the EP (R), Imposing the maximum iteration limit on the analysis of “difficult” requests (M).
All of these enhancements (highlighted in Algorithm 1) notably affected the GES capabilities. We showed that they should be applied for different classes of instances (the guidelines are presented in Table 5—the letters in the acronym correspond to the enhancements, which are switched on or off). The selection of an appropriate GES variant (adapting its important components) before the execution was unclear—this issue is tackled by the adaptation procedure.
Adaptation
In the adaptation process, which is performed before the GES run, we select the appropriate GES variant based on the PDPTW instance class. First, we classify the incoming instance into one (out of six) classes (see Section 4.1). Then, Table 5 is used to retrieve the best GES variant to tackle the problem at hand. This variant does not change during the execution.
Experimental validation
The pre-processing procedure was implemented in
To verify the classification performance, we conducted 10-fold cross-validation on the full Li and Lim’s benchmark set of 354 problem instances belonging to 6 classes (see Table 1). In Fig. 2, we visualize some exemplary PDPTW tests from this benchmark—it is worth noting that assigning a proper class is certainly not obvious (see e.g., the clustered test for 400 customers). In each class, there are problems with 100, 200, 400, 600, 800, and 1000 customers (the numbers of different-size problems are roughly the same) 1 .
Extraction of histogram-based features is extremely fast, and took less than 1 second for all PDPTW instances (of all sizes). Similarly, the affinity propagation time is negligible compared with the time of the clustering validity indices analysis. The latter feature extraction approach was the most time-consuming, and required less than 3 seconds for 100-customer tests, up to approximately 300 seconds for 1000-customer instances containing clustered travel points. However, the pre-processing is executed once for each PDPTW instance (before the GES execution), hence this initial analysis time is acceptable even for large instances.
Table 6 reports the classification accuracy (in %) along with the standard error of the k-NN algorithm obtained for various k’s. To verify how the extracted features affect the scores, we investigated four types of feature vectors, encompassing: (a) 5 clustering-based features (the Clustering variant), (b) 7 histogram-based features (Histogram), (c) both clustering and histogram-based ones (Clustering+histogram), and (d) all of 13 features discussed in Section 4.1 (All). The results revealed that the selection of k has a significant impact on the classification scores for all variants of the feature vectors (the best performance was retrieved using k ≤ 5). Similarly, the standard errors are notably smaller for lower k’s in most cases (e.g., this error is more than 8.3× larger for k = 15 compared with k = 3 for All). This indicates that the classifiers with smaller k’s generalize well to unseen data.
Utilizing the feature vectors containing all extracted features allows for obtaining the best classification performance. Although both Clustering and Histogram variants gave very high classification scores (88.98% and 92.4% for k = 2, respectively), combining these features resulted in the classification accuracy close to 99% (see Clustering+histogram). Finally, the perfect classification (the accuracy equal to 100%) was elaborated using All, in which all features are exploited (see All for k = 2). Importantly, there were no classification errors in this case—the set of the extracted features is quite discriminative and can be effectively used in this classification task (we deal with the six-class classification problem, therefore it is not trivial).
In order to investigate the statistical significance of the classification results, we conducted the two-tailed Wilcoxon tests. We verify the null hypothesis saying that “using different feature vectors (e.g., including certain subsets of all features) leads to obtaining the classification of the same quality”. The levels of the statistical significance are gathered in Table 7. The results confirm that the differences are important for all of the investigated variants (the p-values are smaller than 0.0001, therefore the null hypothesis can be safely rejected). Hence, all of the extracted features appear important and should be utilized to classify the incoming PDPTW instances, since exploiting all features resulted in the highest classification quality (see Table 6).
Conclusions and future work
In this paper, we proposed an adaptive guided ejection search for the PDPTW. The pre-processing step, which is the pivotal part of our algorithm, classifies the PDPTW instance to be solved to one (out of six) class, reflecting its most important characteristics. We extract 13 discriminative features (including clustering-based features, and the histogram-based ones—extracted at various scales). In our previous work [8], we showed that selecting an appropriate variant of the GES is crucial and affects its convergence capabilities. It is addressed in the adaptation procedure, which—based on the classification outcome—retrieves the best GES variant before its execution. The extensive experimental study revealed that the classification performance is very high, and applying our pre-processing procedure allows for retrieving the most appropriate GES variant efficiently. The two-tailed Wilcoxon tests proved that the differences in the classification accuracy obtained using various feature vectors (with and without certain features appended) are statistically important, thus utilizing all of the extracted features is beneficial.
Our ongoing research is focused on coupling our GES with the evolutionary algorithm to minimize the travel distance. This will result in creating a full optimization framework for the PDPTW (and potentially other challenging VRPs [54])—we plan to compare our adaptive GES with other state-of-the-art algorithms. We will investigate the performance of our pre-processing on massive problems, and those elaborated using our benchmark generator [55]. Finally, we work on the adaptive algorithm which will handle dynamic scenarios (e.g., changes in the transportation network due to the traffic congestion) on the fly.
