Abstract
The paper presents a study on the efficiency of Ant Colony Communities (ACC) used to solve the Travelling Salesman Problem. The ACC is an approach to parallelize the Ant Colony Optimization algorithm (ACO). An ACC is made up of a Community Server that coordinates the work of a set ant colony clients. Each client implements a classical ACO algorithm. The individual colonies process cargos of data obtained from the server and send them back the as partial results. The paper presents a general description of the ACC concept and describes in details two ways of implementing it. The first one uses an inhomogeneous environment of traditional computers working in an asynchronous mode. The second one uses the homogenous Hadoop environment and the processing is done in a synchronized mode. The performance of the Communities is estimated by low level measures: their power and scalability. The high level measure deals with the length of obtained routes. The paper presents also the taxonomy of parallel implementations of the Ant Colony Optimization.
Keywords
Introduction
The Travelling Salesman Problem (TSP) is one of classical problems of Artificial Intelligence. To solve it we use an extended version of the Ant Colony Optimization metaheuristic. The ACO could produce good quality results but at the cost of long processing time. Introducing parallelization may help mitigate this disadvantage.
The main aim of the paper is to present and evaluate the Ant Colony Community (ACC) concept. It is a parallel version of the basic Ant Colony Optimization (ACO) algorithm. The ACC consist of a Community Server that coordinates the operation of a number of Ant Colonies. Each one is an implementation of Ant Colony Optimization (ACO) metaheuristic algorithm. The optimization process works in loops. In each one the Server sends cargos of data to its clients. The clients process the data and send back obtained results to the server. The server is in turn responsible for aggregating the results and sending selected cargos anew to the clients. The paper describes in detail the ACCs’ schema, its basic operation principles as well as message passing rules. We have implemented the ACC in two ways. The first one uses an inhomogeneous environment of traditional computers working in an asynchronous mode. The second implementation exploits the homogenous Hadoop environment. The performance of the Communities is measured by low level metrics: their power and scalability. The high level measure is the length of obtained routes. At both levels the base line is the performance of standard ACO algorithm.
The initial version of the ACC was presented earlier in [15] and since then it was considerably extended. The Server can now fully control the behavior of the individual colonies which enable it to restructure the community at run time. The paper introduces two Community performance measures: its power and scalability. They are defined by the task complexity and the time used to complete it. They are more useful for the comparison of different parallel implementations than the traditionally used measures of speed up and efficiency. The traditional measures are confined to one specific implementation. Finally a new Hadoop based implementation of the ACC was developed and evaluated.
The paper is organized as follows. Section 2 introduces the ACO version for the Travelling Salesman Problem. In particular it discusses the computational complexity of optimization tasks. Section 3 presents related work on parallel implementations of the ACO’s concept. The main contribution of the paper – the Ant Colony Community (ACC) is introduced in Section 4. The iACC and hACC - two ways of implementing it are presented and compared. The next section is devoted to the evaluation of the performance of such communities. On the lower level we measure their power and scalability and on the higher one we deal with the quality of achieved results. The code for clients’ implementations is in both approaches the same so the differences in the quality of results are not statistically significant. The section ends with the comparison of the performance of the two versions. The paper concludes with a brief discussion of the results achieved so far and indicates some future investigation areas.
Any Colony Optimization
In this paper the ACO is used to solve the Travelling Salesman Problem. The basic ACO was originally proposed by Doringo in [7] who until now remains one of the key researches in the field. The ACO concept has many application areas [8].
TSP
The TSP is a remarkably old (originating at the beginning of the 19th century) and simple problem: given a list of cities and the distances separating them what is the shortest possible route that visits each city exactly once? The number of all possible different routes for a graph with n nodes is equal to (n-1)!/2. The number is estimated by and even for relatively small values of n like 50 the value is just staggering. It is equal to approximately 3.04141E+64, so it exceeds the mass of an observable steady-state universe (1.45E+53 kg). This clearly calls for a heuristic solution. The complete search is by no means feasible.
The TSP is now one of the established, classical problems of Artificial Intelligence and serves as a touchstone for many general heuristics devised for combinatorial optimization. It is a NP-hard problem. A detailed study of its properties can be found in [1]. Many researches are active on that area. A recent comparison of metaheuristics used for TSP could be found in [8]. The solutions presented in this paper include, among others, genetic algorithms, simulated annealing, tabu search, quantum annealing, particle swarm optimization, harmony search, a greedy 2-opt interchange algorithm and last but not least the Ant Colony Optimization.
Basics of ACS
The graph describing the locations of the cities is represented by a floating point matrix containing the distances between them. The matrix is symmetric (the length of road from A to B is the same as from B to A) and its main diagonal entries are equal to 0. Another matrix of the same size is used to store the pheromone levels. The pheromone levels harvest collective knowledge gained by all ants. At the start of optimization process the ants are randomly scattered over the nodes. An ant colony finds solutions in an iterative manner. Ants travel from one node to another depositing a pheromone on their way. An iteration ends when the ants have visited all nodes. At that very moment the pheromone levels of the whole matrix are updated. The colony remembers the Best-So-Far (BSF) route and its length. There are many parameters for controlling the operation of an Ant Colony and it is easy to optimize them [17]. The current study deals only with two parameters: the number of ants and the number of iterations.
ACO operation details
An ant can work in one two possible modes: exploitation or exploration. The mode of operation is selected by a stochastic function. In the exploitation mode of work an ant uses quality function qf(r, t) defined by the Formula 1. The selected node t must maximize the value quality function qf(r, t).
Where:
r: the current node
t: any possible next (not yet visited) node
τ (r, t): the level of pheromone that is deposited on the route from r to t
η (r, t): the distance that separates r and t
β: moderating factor, usually having the value of 2.0
In the exploration mode an ant selects the node with a probability that is proportional to the value of the pr(r,t) function:
There is no guarantee that the BSF solutions found in consecutive iterations will improve. Figure 1 illustrates the behavior of a typical optimization run counting 1000 iterations. The upper jig-saw line represents the current (iterations’ best) solution and the light colored spikes are proportional to the shortening of the BSF path. As you can see with increasing number of iterations the spikes are less and less frequent and they also decrease in height. There is no improvement after the iteration number 560. All the computations necessary for the subsequent iterations are wasted, they do not contribute to the final solution.
One of the key features of the ACO is its indeterminism. It is due to the fact that the random number generator is used in many functions and the implementations rely heavily on threads which are nondeterministic by nature. As a result the BSF route could occur at any iteration, but the computational cost of route shortening increases with the number of iterations. It would be preferable to have a way of increasing the possibility that the most of the improvements occur at the early stages of optimization, so that we can stop the processing after relatively low number of iterations. The experiments reported, e.g. in [14], show that it could be obtained by the ant overpopulation. This is done by using much higher than usual number of ants in one colony, increasing the number of colonies working on the same task or by a combination of both. In any case it means an increase of computational time which could be mitigated by parallelization.
Let us suppose that we have an algorithm that runs in parallel k instances of the ACO. After the completion of it selects the best (shortest) route. Let Min(k,n) denote the best of k BSF values using n iterations. Table 1 shows the average values of Min(k,n) for several values k. To obtain statistically significant results the number optimization runs was equal to 100. With k=1 the algorithm works exactly like the standard ACO. The number of iterations in the second row decreases in consecutive columns to keep the computational complexity at a constant level.
As expected the values of Min(k,n) for k > 1 are much lower than Min(1,n) especially for the higher values of k. The improvement comes at the cost of increased computational complexity of the optimization process. More interesting is the second row. The values from the second row indicate that decreasing the number of iterations does not harm the solution quality. For a group of 4 colonies we can decrease the number of iterations from 1000 to just 250 and obtain the same length of BSF. Ant overpopulating help us to limit the diversity of obtained results, because good quality results are obtained for relatively small number of iterations.
Estimating task computational complexity
A good measure of the complexity of the optimization task is the number of calls of the qf function. This is because it requires the calculation of the floating point power function. The function requires much more time to execute than any other instruction. A floating point multiplication takes just a few nanoseconds whereas the power function needs almost 1 microsecond to compute.
Let nNodes and qfN(nNodes) denote the number of nodes in a graph and the number of calls to the qf function made by a single ant in one iteration, respectively. During each iteration an ant has to makenNodes-2 selections of a node. The first node is given to an ant at the start of an iteration. After making nNodes-2 selections the next node is obvious – it is the only one left in the set A(r). The value of the qfN(nNodes) is calculated by the Formula (3).
For one iteration and the number of nodes equal to 50 a single ant needs 1224 calls of the qffunction to complete its work. The function has to be calculated anew each time it is called because the pheromone levels constantly change as they are modified by all the ants. To find a good quality solution we need to apply many ants and hundreds of iterations and therefore many millions qf function calls are necessary to complete an optimization task. Handling such large numbers is not convenient and therefore we introduce the SC constant. It is equal to 306*104. It represents the number of calls to qf that are necessary to complete the so called Standard Cargo task. The Standard Cargo optimization task consists of finding a solution for a matrix with 50 nodes using 50 ants and 50 iterations. The numbers were selected in such a manner as to make easier the comprehension of complexity of different optimization tasks. The Standard Cargo processing by the most computers used in the experiments requires from just over a second to almost 15 seconds. Introducing the constant also facilitates the evaluation of results. In what follows the complexity of a task t is estimated by Sc(tk) which is the number of calls to the qf function that are necessity to complete the optimization task tk expressed in the SC units.
The complexity of different computational tasks is presented in Table 2. Bearing in mind, that Standard Cargo needs more than 1 second to complete we can see that solving a typical task (50 nodes, 50 Ants, 1000 iterations) requires around half a minute to complete on a standard hardware. This makes it evident how much a parallel implementation of the ACO is needed.
Solving NP-hard optimization problems such as the TSP is compute-intensive. Therefore we have to resolve to parallel implementations. The parallel versions of the ACO were presented just a few years after their introduction [2]. The main characteristics of any implementation are: the number of colonies, pheromone matrices, communication frequency and organization of the population.
There we many approaches to categorize parallel ACO implementations. Authors of [13] distinguish operation models based on master-slave paradigm, independent executions and synchronous cooperation. In [9] algorithms are divided to: standard and custom parallelization of ACO, with the last one trying to gain efficiency from the specifically designed algorithm behavior.
A new comprehensive study on the subject is in [12]. The new taxonomy reuses and expands the concepts from those previous two works. The resulting taxonomy includes four main model categories and a fifth: ‘hybrid models’for proposals that would fit to more than one main category. The four main categories are: ‘master-slave model’, where multiple clients are controlled in a centralized way by a server; ‘cellular model’, in which small, overlapping neighborhoods in a single colony communicate directly with each other, each one solving its part of the problem; ‘parallel independent runs model’, where ACOsare executed concurrently, without any communication among them; and ‘multicolony model’, for multiple ACOs interacting directly with each other by periodically exchanging best solutions without the presence of a server.
The basic features of the taxonomy are summarized in Table 3. The D/P abbreviation stands for depending on proposal.
The most popular model of ACO parallelization is master-slave in its coarse grain version. An example of that approach is presented in [3], where authors - using an 8 CPU cluster - demonstrate how big speedup can be achieved by increasing number of ants at each processor.
The multicolony approach can be found for instance in [10]. The ACOs here are run on a homogeneous cluster of 4 computational nodes running Linux and LAM/MPI as communication libraries. The inter-ACOs cooperation was tested with a spectrum of 5 configurations with less and less communication starting from ‘fully-connected’ to ‘parallel independent runs’.
The approach proposed in this paper is an extension of [15] and can be thought of as a multicolony model with a server. Its task is to control the work of client ACOs by sending them cargos that is packets of data to process. The cargos contain everything that is needed to provide a partial solution of the whole problem. So according to the presented taxonomy this paper work should fall into the hybrid models category.
According to [12] the parallelization of ACO may be measured using 2 metrics: speedup, which is the time saved when using a parallel algorithm version instead of a sequential one; and efficiency, which is a normalized speedup. Authors have left out the multiobjective and dynamic versions of the problem.
Comparison of the efficiency of different approaches with the proposed solution is in the Section 5.2. The parallelization is most effective when the individual processes run on different processors of the same machine. The speed up factors for two fine grain master slave implementations using message passing and shared memory are given in Table 11.
As can be expected the shared memory is considerably faster than sharing memory and increasing the size of a problem improves efficiency as the communication frequency is lessened.
Ant Colony Community
Many approaches to parallel implementation of the ACO require specialized hardware [5, 6]. All ants of a colony share the same pheromone array and thus communication frequency has to be extremely high. No standard computer equipment could support parallel work with that mode of operation. On the other hand the ACC passes large chunks of data relatively rarely and thus the communication overhead does not pose a problem. In this section we present two approaches to the implementation of the ACC concept. In the first the server and clients are hosted by an inhomogeneous environment of a computer network and the second one uses the Hadoop environment. Most of their components are identical, as a matter of fact, they are implemented by the same Java objects. What makes them different are the tools for achieving parallel execution of colonies and the way cargos are exchanged.
iACC – inhomogeneous implementation of the ACC
In this version the server and clients are hosted on a single computer, many computers in of local network or on Internet servers. The cargos of data exchanged between the server and clients are sent using a very efficient socket mechanism. Each cargo contains data to process (distance and pheromone arrays) and parameters or results of processing. A Colony Client is executed as a separate process and handles all communication with the server. It is also responsible for creating an object that implements the basic version of an Ant Colony Optimization algorithm, passing the input data and operational parameters to it and finally receiving results and sending them back to the Server. The implementation of the basic version of Ant Colony Optimization was described in [4]. The Community Server has a separate thread for each colony client. It is responsible for allocating tasks to colonies and for integrating the results sent by Colony Clients. An exemplary structure with 4 computers and 5 Ant Colonies is presented on Fig. 2.
The interaction between a Server and a Cargo has to be initialized by a Client since it knows the IP address of the server and not vice versa. The server creates a thread dedicated to the calling-in client and registers it. After the initialization phase the Server and a client exchange data items which are serialized objects of the AntCommunityCargo type. The Server starts the work of the Community only after a predefined number of clients have registered. It fetches a cargo object from a cargo pool and sends it to a client. After receiving an AntCommunityCargo object the Colony Client creates a local ant colony, feeds it with the data extracted from the just received cargo object and finally starts its operation. An Ant Colony produces solution in a usual manner and passes the obtained results to its Client which packs them into a cargo object that is immediately transmitted to the Community server. The Community server puts the cargo in the Cargo pool.
The details of Community operation are presented in Table 4. The rows written in bold are repeated in a loop. The number of loop executions is equal to the number of processed AntCommunityCargoobjects.
Community server description
The structure of the Community server is depicted on Fig. 3. It consist of: Graph store which delivers a distance array. The graph store works as a separate thread and is capable of handling distances that change in time. The current study report only work with static arrays. At the start of optimization it generates a new distance array or uses an existing array. The second solution makes it easier to compare results achieved by different Communities. The server core is responsible for creating threads for client handling and assembling the cargo for clients. Command generator specifies the next command sent to a client. Using the commands the server can send cargo for processing, deactivate client, test connection transfer or stop a client altogether. Cargo pool has a fixed size. It keeps a cargo received from a Client if there is free space for it or if it can find a cargo with route longer than the incoming one. The Cargo Pool operates in one of the two following modes: Monitor mode in which the pheromones and partial solutions are not propagated and as a result the clients work independently. The Server limits its activity to monitor the obtained results and to select the best one. Co-operative mode in which it keeps the partial results in a pool.
The structure of a Client is pretty straightforward and does not need further explanation.
hACC – Hadoop implementation of ACC
Apache Hadoop is a software framework for distributed storage and distributed processing of - so called - ‘big data’ on computer clusters. Recently it became very popular because it offers an easy and cost effective parallelization of computing. Big data and Hadoop phenomenon is discussed e.g. in [16]. The advantages come however at a cost. Many tasks have to be customized for that environment and for some tasks it is not possible at all. The ACO parallelization falls to the former group.
The data to process is split into large chunks, all al them are stored in the Hadoop Distributed File System (HDFS). Hadoop takes advantage of ‘data locality’, which means that each node in a cluster processes the data only with local access. The data locality approach constrains tasks from sharing the data with one another.
The TSP cannot be solved in a fine grain manner since the whole distance matrix and the whole pheromone matrix are always needed. This is why Hadoop framework is less suited for fine grain ACO parallelization. The grains here would be as ‘coarse’ as the chunks of data for each node. The chunks of data are serialized cargo objects. Therefore the Hadoop approach of this work is more of a master-slave model than a multicolony model.
The other part of Hadoop framework, beside HDFS, is the processing part called MapReduce. Implementing processing of the data in the framework means implementing 2 functions: map () and reduce (). Map dispatches the job data to task slots and reduce aggregates the task results.
While implementing the hACC an attempt was made to reuse as much code from the iACC as possible. So the server code marked at Fig. 3 as ‘core’ stays the same. Similarly the client code is run by Hadoop map function in task slots.
A different implementation of ACO parallelization with Hadoop is presented in [11]. The main difference between the hACC and this approach is that the latter system aggregates results in the reduce function, whereas the hACC uses for that purpose the Cargo Pool from the system core. It’s hard to make an in-depth comparison of the two approaches since the authors of [11] give too little operational details.
The process of running parallel ACOs for TSP is depicted in Fig. 4 and it works as follows. Each cargo taken from the pool is serialized to a JSON object and placed as a single line of a plain text file. The number of cargo objects and thus the size of the file is dependent on the number of clients configured at the server. The serialized cargos are dumped to a file which is stored in the HDFS system. From there it can be used by MapReduce. Serialization and deserialization of cargo objects is done with Gson 1 library.
The Hadoop part of the ACC implements ACO clients. It starts its operation with the deserialization of a line which produces a cargo object. The cargo is then processed by a separate ACO process. After processing the cargo with updated pheromone matrix and BSF is serialized to JSON and passed to reducer. The reducer writes the serialized cargo back to the HDFS as a line of plain text output file thus finishing a single Hadoop job. Then the output file is taken from HDFS and after deserialization the processed cargos return to the pool closing the process cycle.
Running sequential Hadoop jobs is essential since this is the only way for exchanging information among ACOs. Because of the MapReduce design principles each ACO process within a single Hadoop job is isolated. Only having the cargos back from the processing job can they be fed back into the community as partial solutions.
ACC performance measures
Low level performance measures
The traditional metrics for efficiency of parallel implementations are speed up and efficiency see Section 3. They provide values valid for one hardware configuration. Therefore we have decided to introduce new measures based on the actual complexity of tasks being executed. This enables us to compare different hardware and software configurations. The new measures of community performance are power and scalability. The high level measures of Community performance are introduced in Section 5.2.
In order to measure the low level performance of an Ant Community we first introduce the Cp (Computational Power) measure. For the Community Com and the task T it is defined by the following ratio:
Where:
Com - the tested community
Sc (T) - the complexity of optimization task T measured in SC units, see Section 2.5
Time (T) - the time span necessary to complete the task T measured in seconds.
In other words the Computational Power tells us the number of Standard Cargo tasks that could be calculated within a minute. The measure could be used to compare the performance of differentcommunities.
Time (T) depends not only on the properties of the hardware but also on the location of the server and the complexity of the task. The conducted tests show however that the Computational power of a community remains relatively stable for a wide variety of tasks. The decline of power was observed only for tasks with a low number of iterations and server on the WAN network.
Let MaxPower(Cl) denote the power of a community with the following properties: There is only one Client Colony on the computer Cl. The server is located on another computer in the same LAN and it does not handle any other clients.
The MaxPower(Cl) estimates the maximal value by which a Client could increase the power of its community. Having defined the MaxPower we can measure the SF(Com) scalability factor for the community Com. It is the ratio of the actual observed power and the sum of MaxPower of all its clients.
The scalability factor is an extended version of the speed up metric mentioned in Section 3. If all clients have the same properties both of them have the same value.
It is possible to run several instances of a client colony on a single computer. The computers under the test were not identical. They were regular desktops, laptops and a university workstation, see Table 5. The performance index gives the number of milliseconds that are necessary to calculate 1000000 calls of power functions. It turned out, that such a simplistic approach to efficiency does not correspond well with the actual performance of a computer. In order to be able to configure reasonably the structure of a Community we have to know the power of colonies of varying sizes and their scalability factor. The relevant values are shown in Tables 6 and 7. The data refers to a community in which clients share the same computer and the server is accessed via LAN. The optimization task used while measuring the Computational Power experiment was: 50 Ants, 50 iterations, 50 nodes, 100 cargos.
As you can see all of the computers were equipped with multicore processors and several GB of memory but their power differs considerably in extreme case by the factor of 3.6. Moreover the difference grows with the increasing number of hosted colonies. In all cases running more than 4 clients hardly increases the community computational power. Large number of clients leads to decrease in the community power.
The biggest surprise is the poor performance of a well-equipped E laptop. It is the only one computer that runs under Windows 8 operating system.
The scalability of the colonies drops pretty fast. This is probably due to the fact that each client works as a separate process with a separate JVM environment. The scalability is the highest for the DT (desktop) computers. The workstation scales pretty well.
The scalability of more diverse Communities is presented in Table 8. There are 3 types of clients used in experiments: n: neighborhood clients located on the same computer as the community server, l: local Clients located in the same LAN network but not the same computer as the server, w: WAN Clients accessing the server via Internet.
A colony description consists of a sequence of elements each of them tells: the hosting computer name, number of its clients and their type e.g. {Ll3, Bn2} specifies a colony with the server and 2 clients that run on the computer B and additionally 3 clients located on computer L.
With the number of clients relatively low (less than 3) the scalability exceeds 0.85. Locating the server on LAN lowers the scalability factor, but the impact of the localization decreases as the number of clients per computer increases. The best way to achieve high scalability with a large number of clients is to distribute them over many computers.
Low level performance of hACC
There are two major differences between the iACC and hACC. The structure of the iACC community has to be configured manually. Overloading a node computer could adversely affect the power of a community. In the case of the hACC the available hardware resources that are allocated to the clients are optimized by the Hadoop environment. The second difference is the mode of operation. The iACC works in an asynchronous mode whereas the hACC works in a synchronous mode - in one map-reduce cycle each client processes one cargo.
We have tested two Hadoop versions of the ACC. They we located on computers with parameters shown in Table 9. First one is a virtual machine available on the university cluster WCSS with performance comparable to a well-equipped desktop computer. The other is a physical desktop machine. In both cases Hadoop environment was run from the same image of a docker container.
The power and scalability of hACC colonies are presented in Table 10.
The power of a single colony in both cases is similar to that of iACC. For medium size colonies with several clients the iACC outperforms hACC. However the hACC is far more easily scalable then the other version. Assigning more computers (nodes) to Hadoop gives us the possibility of having truly massive parallelization.
High level performance
In [6] authors compare the efficiency of two approaches to ACO parallelization: message-passing and shared-memory. Although not identical the efficiency and scalability measures are comparable. Table 11 contains the comparison of the two methods described in [6] with the iACC and the hACC.
As the Table indicates the performance of iACC and the hACC is similar to the shared-memory method, but its main advantage is that it does not require specialized hardware.
High level performance
At the high level we are interested in the quality of the found solution that is the length of the BSF. The processing time is not an issue here. The ACC is relatively easily scalable so we can shorten the processing time by increasing the number of Clients. In this section we are not interested in the structure of the community but the granularity of the cargos and the mode in which the server operates. As described in Section 4.2 there are two modes of operation: monitor and cooperative. Figure 5 shows the difference in their operation:
Monitor mode
In this mode the individual colonies do not influence each other. The server sends the same initial cargo to the clients and keeps track of the BSF solution received from any client. An operation of a server with 3 clients and 9 cargos is depicted in on Fig. 5. This mode of operation is used as a reference for the co-operative mode.
The BSF results are presented in Table 12. It shows the obtained BSF path for different number of ants and iterations. The number of cargos was selected in such a manner as to preserve the complexity of the task. It is equal to 200. The BSF values depend on the properties of the cargo.
The advantage of parallel processing is apparent when we compare the results with the BFS lengths calculated for the same distance array with the basic non-parallel version of the ACO. The mean value for the non-parallel version is 2.33. With the confidence level set to 0.95 the results should be within the range [2.29, 2.37]. As you can see in almost all of the cases the values achieved by the colony lay below the indicated rage.
The Co-operative mode
The cooperative mode of the Server differs in 3 important ways from the monitoring mode. The Server propagates the pheromone array, keeps a limited number of previously found results in a cargo pool, keeps track of the overall best BSF route.
All this means that a client colony resumes work on a cargo and does not start processing anew like in the monitoring mode, see Fig. 5.
While testing the co-operative mode we have to take into account the size of the store. During the experiment we have used 3 sizes of the pool: 1, 5 and 10. The number of client colonies was equal to 10 and the obtained results are presented in the Table 13. As in the previous case the total complexity of the task is the same in all cases.
There are a few conclusions that could be drown from the data. The co-operative mode significantly lowers the value of the BSF. The efficiency of the process clearly increases with the number of cargos. The number of cargos in the pool equal to 5 (the half of the number of clients) seems to strike a good balance between the intensity of client co-operation and the diversity necessary to avoid local minimum trap. The best results are obtained for the number of iterations as low as 100 and the number of ants = 50. This is a very promising result. The optimizations process is split in large number of relatively small tasks. It means that it scale easily and probably could be adopted for the Dynamic Travelling Salesman Problem in which the distances change overtime.
Conclusions and future work
The paper presents studies on the performance of implementations of the Ant Colony Communities. Both of them are easily scalable. The iACC version needs manual tuning to optimize its performance whereas the hACC version relies on the optimization mechanisms of the Hadoop environment. The hardware allocated to Hadoop was rather limited and therefore the iACC version outperformed the hACC.
Although not the same, the speedup factor and scalability measure are closely related. The performance of the ACC compares favorably with other parallelization attempts reported e.g. in [6]. The communication level is much lower in the case of the ACC, but it has not a significant impact on the high level performance. What is even more important, it is not confined to the number of processors of a single computer. The ACC could harness the free computational power available in any LAN network which makes it more useful in practice.
In future we plan to further optimize the work of iACC by modifying the operation of a Client Colony so it could handle many ACO implementation each one working as a thread. The current version uses separate processes for clients and this drains up the computer resources pretty fast.
The reported here experiments show that parallel work of the ACC, not only reduces processing time, but also improves the quality of achieved solutions. Future study will include the optimization of the cargo size. The Java source files for Community Server, Community Client, test data and parameters are available on GitHub at https://github.com/marekopel/hACO.
The flexible structure of the ACC makes it possible to adopt it for the dynamic version of the TSP in which the distances change all the time. Work on that area is currently underway.
