Abstract
The Maximum Independent Set Problem (MISP) consists of finding the largest subset of vertices of a graph such that none of the vertices are adjacent. It is well known that the MISP is an NP-Complete problem and usually cannot be solved exactly within a reasonable amount of time. In recent years, various heuristic search techniques, such as tabu search, genetic algorithm, simulated annealing, ant colony optimization and neural network algorithms have been used to solve this problem. In this paper, we propose a new algorithm for the MISP using Multiagent Reinforcement Learning (MARL) approach. In the proposed approach, each agent tries to solve MISP autonomously and improves its local behavior using reinforcement learning (RL) and directly communicates with other agents by sharing successful results. Also, the Exchange 2-1 local search is used to further improvement of the solutions at each step. The experiments were carried out on DIMACS clique instances and p-random graphs. Our experimental results show that the proposed approach is able to compete or even outperform some state-of-the-art algorithms.
Keywords
Introduction
Given a undirected graph
The MISP has a close relation to the MCP as it is complementary to the MCP and a maximum independent set of a graph G is the maximum clique of the complement graph of G [1]. The MISP and the MCP have been proved to be NP-Complete problem [2]. These types of problems usually cannot be solved exactly within a reasonable amount of time. Consequently, heuristic approaches must be used to solve them. In recent years, some heuristics, such as tabu search, genetic algorithm, simulated annealing, ant colony optimization and neural network algorithms have been used to solve these NP-hard problems [3, 4, 5].
Applications of the MISP appear in widely divergent fields including information retrieval, coding theory, signal transmission analysis, geometry, classification theory, economics, transport management, scheduling and computer vision [6, 7, 8, 9, 10].
The MAS consists of a group of autonomous intelligent agents which communicate with each other to reach a specific goal. The characteristics of an agent technology application are: being modular, decentralized, robustness, changeable, flexible and complex [11]. The MISP fits these properties well, hence agent technology is a practicable approach and is particularly useful since we can define several agents with independent local memory and different values of learning parameters which improve their local behavior using reinforcement learning, then share successful results between each other to improve the quality of the solution.
In this paper, we propose a new approach to solve MISP using Multiagent Reinforcement Learning. Where in, several independent agents placed at the graph of MISP and each agent tries to solve MISP and then its solution is improved by Exchange 2-1 local search by repeatedly changing the given solution with a better neighborhood solution. In the proposed approach, agents communicate directly with each other and results of successful agent are shared to other agents. Our experimental results indicate that proposed approach has a good performance and enable to compete with some state-of-the-art methods, in terms of solution size and computing times.
The rest of this paper is structured as follows. We mainly review some related work in Section 2. A formulation of the MISP and details of the proposed approach are explained in Section 3 and in Section 4, we evaluate the performance of our approach. Finally, Section 5 summarizes and concludes the paper.
Literature review
Andrade et al. proposed a successful iterated local search method to solve the MISP [12]. Their method, often finds optimal solutions in milliseconds for small scale benchmarks that require hours to solve with accurate algorithms. But, their method is outperformed by other algorithms for large scale benchmarks such as social networks [13].
Xu et al. developed an improved simulated annealing (ISA) algorithm for the MCP using a new acceptance function, can find an optimal solution to a given graph with higher probability [14]. They claimed, the ISA has higher convergence rate, than the original simulated annealing and could outperform Jagota’s [4] and Funabiki’s [15] methods.
Thomas et al. gave an evolutionary heuristic for this problem [16]. Aggarawal et al. proposed a genetic algorithm and compared with some state-of-the-art algorithms for the MISP [17]. Mehrabi et al. proposed a hybrid evolutionary algorithm for MISP by creating a new crossover which produces high quality solutions [18].
Barba used randomized parallel genetic for the MISP [19]. Busygin et al. proposed a heuristic approach based on optimization of a quadratic over a sphere [20]. Takefuji et al. developed a Hopfield-type neural network for solving the MISP [21]. Friden et al. developed a tabu search approach and tested to random graphs [22]. Leguizamón et al. proposed an ant colony system [23].
Bansal proposed an approximation for the MISP [24]. Ghaffari developed an improved distributed algorithm [25] and Heydari et al. proposed a method for distributed maximal independent set on scale-free networks [26]. Censor-Hillel and Rabie proposed a distributed reconfiguration of MISP [27]. Gainanov et al. developed a heuristic algorithm for solving the MISP with absolute estimate of the accuracy [28].
With regards to the MCP, Wu and Hao investigated several heuristic methods based on local search, Tabu search and evolutionary techniques [29]. Specifically, they compared the performance of 21 state-of-the-art methods for the MCP in terms of solution quality and runtime, which Tabu search algorithms outperform other methods in terms of solution quality. Grosso et al. proposed a highly successful method for the MCP which, uses different restart rules and diversification operations in plateau search [30].
Belachew and Gillis introduced a new continuous formulation of the MCP using symmetric rank-one non-negative matrix approximation [31]. Also, they proposed a new clique finding method, using a standard projected gradient method. They experimented proposed algorithm on 36 instances from the DIMACS data sets and compared with two other MCP methods. According to the experimental results, their algorithm could outperform the two algorithms.
Maslov et al. presented two branch and bound algorithms for the maximum MCP (ILS&MAXSA) applying a powerful heuristic for obtaining an initial solution of high quality, then this solution is used to prune branches in the main branch and bound algorithm [32]. The ILS&MAXSAT demonstrates the best performance on DIMACS instances among the MCS algorithm by Tomita et al. [33] and MAXSAT algorithm by Li and Quan [34].
Wu and Hao proposed Adaptive Multistart Tabu Search (AMTS) algorithm represents a new approach for approximating the MCP which combines a Tabu Search with a guided restart strategy [35]. As they reported on the experimental results, AMTS shows an excellent performance on the complete set of 80 standard DIMACS benchmark instances. Its competitiveness is confirmed when AMTS were compared with five state-of-the-art MCP methods.
Tomita et al. proposed a depth-first method using the engineering techniques for finding all maximal cliques, which generates a tree-like output [36]. Then, they designed a branch-and-bound technique based on a new approximate coloring method to speed up the algorithm [33]. Also, they proposed several improvements for the branch-and-bound algorithm in [37]. Smith and Perkins presented a hybrid algorithm for the MCP, where a heuristic method is used to construct the initial cliques, then tabu search and several optimizations techniques are invoked to improve the initial cliques [38].
There are also others approaches such as, learning heuristics [39] and MAS can be used for solving NP-Complete problems [40, 41, 42, 43]. In MAS approach, the problem search space is represented as an environment in which agents interact and work together to achieve a predefined objective and their activities are coordinated by heuristic methods inspired by collective intelligence [44].
Multiagent reinforcement learning algorithm for the MISP
The formulation of the MISP for combinatorial optimization problems is as follows [45].
Let
where
A
Let
A graph and its Maximum Independent Set is illustrated in blue [19].
The proposed MARL method for the MISP (MARL-MISP), starts by creating a graph of MISP instance and placing m agents at that graph. In other word, m agents are placed randomly at m nodes of the MISP graph. The proposed model consists of three parts:
The learning environment: includes m independent agents placed at MISP graph. These agents differ in learning parameters, such as learning rate, degree ratio and so on (see experimental results section); The learning algorithm: comprised policy and Q-Learning; A reward function: to assessment the effectiveness of the agents’ learning.
The policy is implemented using a lookup table such that, maintains an updated evaluation of all the possible actions for each node. Since MISP graph has
Now we can precisely describe the details of each round for each agent. Initially, an agent randomly placed at one of the graph’s nodes as a first member of the mis_set. Then, the agent chooses one of its
In
where
In Softmax method, a weight is assigned to each actions of the valid actions list based on
Then agent placed at the selected state and if the selected node (action) is not adjacent to none of the nodes in mis_set, then inserted to it. The process of selecting an action from the valid actions list and placing the agent at that state is repeated until a round terminated, this means, all possible nodes of the graph are selected and a mis_set is created. The task of an agent is a sequence of actions (selecting nodes of the MISP graph) that creates a mis_set. A round is finished, once all of the m agents have created their own mis_sets. Then the environment uses the cardinality of each agent’s mis_set to produce its response. If cardinality of the produced mis_set by each agent is greater than the cardinality of its best mis_set (the mis_set that has been created so far by that agent, called local best mis_set), then all actions of that agent along current mis_set is rewarded based on the Q-learning algorithm as defined in Eq. (3), called local
where
Also, if in any round of the algorithm, one of the agents creates a mis_set which its cardinality is greater than the best mis_set created so far by all of the agents (called global best mis_set), then corresponding actions on all agents is rewarded based on Eq. (3), called global
This process is repeated for a predetermined number of rounds (Maximum number of learning rounds, MNLR times) and after that, the best solution (mis_set) from all of the rounds is presented as output of the proposed approach (MARL-MISP). Algorithm 1 implements the MARL-MISP.
The proposed Exchange 2-1 Local Search (Exhg-LS) is a faster implementation of 2-improvements local search developed by Feo et al. [46]. The Exhg-LS (Algorithm 2) repeatedly changes the given solution with a better neighborhood solution and stops when given solution cannot be further improved. Whereas the produced solutions by MARL-MISP are not guaranteed to be globally optimal, so this local search is useful. This local search exchanges vertex
The working diagram of the MARL-MISP for any agent
Working diagram of the MARL-MISP.
Experimental results of the MARL-MISP are presented in this section and are compared with those of other well-known methods. We used Microsoft Visual studio.net 2017 under 64-bit Windows 8.1 operating system, running on a laptop with Intel Core-i5-4200U, 1.6 GHz CPU and 4 GB of RAM to perform all tests. In order to assess the effectiveness of the proposed algorithm, extensive simulations were carried out on two type graphs: some benchmark graphs and
Let
Parameter settings for the MARL-MISP
When developing a MARL, its performance crucially depends on the selection of appropriate learning parameters. Generally speaking, it is important that all the learning parameters are carefully tuned to get good performance in reinforcement learning [48]. Figure 3 illustrates the results of running MARL-MISP on instance p_hat1500_1 with different values of learning parameters.
Figure 3a shows the results obtained by MARL-MISP with different values of learning rates (
Results of running MARL-MISP on dataset p_hat1500_1 with different values of learning rate (
Let us see Fig. 4, which contains the convergence graph of the MARL-MISP on the instance p_hat1500_1 with different values of the MNLR. In this graph, average running times in seconds are shown above the data label. As we can see, after 5000-th round of the MARL-MISP, there are not notable improvements in the results proportional to the relatively high increase in the running time of the MARL-MISP. For this reason, we experimented in MNLR 5000 in the proposed approach.
The performance of the MARL-MISP on the DIMACS benchmark is illustrated in Table 2. Columns ‘instance’, ‘n’, ‘ed.’, ‘b.k.s.’, ‘best’, ‘avg.’, ‘std.’ and ‘time(s)’ respectively represent the name of the graph, the number of nodes, the number of edges, best known solution, size of the best solution found by MARL-MISP, the average solution size, the standard deviation and the average time in seconds. Recall that our algorithm solve MISP, so it works with the complement graphs of the DIMACS benchmark and the p-random graphs.
The performance of the MARL-MISP on the DIMACS benchmark
Convergence of the MARL-MISP on dataset p_hat1500_1 with different values the MNLR.
Table 2 shows that MARL-MISP can find maximal independent set of the best known solution for 24 out of the 30 instances of the DIMACS. The instances for which MARL-MISP fails to find the best known solution are brock400_1, MANN_a27, MANN_a45, p_hat300_3, p_hat1500_1 and san200_0.9_3. For these instance, its convergence rate is near 100%. In 24 instances for which MARL-MISP achieves the best known solution, in 10 instances it can find such a result with a success rate of 100%. As a result, one single run may suffice for MARL-MISP to find a maximal independent set of the best known solution and in 16 instances, each run of the MARL-MISP can reach either a best known solution or solutions of sizes very close to the best known solution. If we check the running times in Table 2, we observe that for 12 out of the 30 DIMACS instances (40% cases), the average running time for reaching the best known solution is less than 1 second and running times of less than one minute are needed for all of the 18 remaining instances. In Table 3, we used 12 instances from the p-random graphs, for
The performance of the MARL-MISP on the p-random graphs
From Table 3, we can say that, in all 12 different instances from p-graphs, the proposed algorithm can reach the solutions that their size is very close to their expected size in reasonable computing times. In summary, considering Tables 2 and 3, the MARL-MISP can achieve the acceptable results with high convergence rate in examined instances of small to large scales with different densities.
Now, we compare the MARL-MISP with several state-of-the-art approaches, some of these methods have been proposed for the MCP. As mentioned earlier, it is possible to get instances of MISP for which the size of the maximum independent set is the same as the maximum clique of the complement graph. Consequently, we can compare the results of the MISP approaches with the results of the MCP methods. The comparison is based on the size of the solutions found in terms of the largest maximal independent set or maximum clique size. According to the differences among the computers, programming languages and so on, so we have illustrated the computing times only for indicative purposes. The comparison of the MARL-MISP with SRON of Belachew and Gillis [31], ILS&MAXSAT of Maslov et al. [32], KLS5 of Tomita et al. [37] and HTS of Smith and Perkins [38] on 26 DIMACS benchmark instances is illustrated in Table 4. SRON used a laptop Intel(R) Core(TM) i7-6500U CPU 2.50 GHz and 8 GB RAM, ILS&MAXSAT used Intel Core i7 CPU 2.3 GHz machine with 8 Gb of memory, KLS5 used a PC with Intel Core i7-4790 CPU 3.6 GHz and 8 GB RAM and HTS used an Intel Pentium 4 3.40 GHz or Celeron (R) 2.79 GHz CPU machine with 4 GB of RAM.
The comparison of the MARL-MISP with 4 methods on 26 DIMACS benchmark instances
Table 4 shows that MARL-MISP finds larger solutions than SRON for 8 instances and for 2 instances both methods can find the best known solution. While SRON solves most of the instances faster than MARL-MISP. Only one instances is solved by MARL-MISP faster. However, the quality of the SRON’s solutions are poor. The next compared method is ILS&MAXSAT, which is an exact method, so it find best known solution solutions of all 26 instances. But its running times are very expensive and in average the MARL-MISP running times are almost 88 times faster than the ILS&MAXSAT. Comparing with KLS5, the MARL-MISP obtains better result for six instances, whereas the KLS5 obtains better results for 2 instances and other results are the same for two these methods, but the KLS5 needs less computing time. Also, the HTS obtains better results for 3 instances, whereas the MARL-MISP obtains better result for four instances and other results are the same for these two methods. But in average running times, the MARL-MISP is almost 8 times faster than the HTS.
In Table 5 the MARL-MISP is compare with ISA of Xu et al. [14], KLS5 of Tomita et al. [37] and HTS of Smith and Perkins [38] on 12 instances from the p-random graphs. None of these methods reported any information about their used computer.
The comparison of the MARL-MISP with 3 methods on 12 instances from the p-random graphs
Table 5 shows that MARL-MISP obtains better solutions than ISA for 8 instances and for other instances, it finds solutions as good as ISA. Comparing with KLS5, the MARL-MISP outperforms this method on 3 instances and for other 8 instances it finds solutions as good as KLS5, whereas the KLS5 obtains better results for 1 instance. However, average running time of the KLS5 is more than 1.66 times faster than the MARL-MISP. Comparing with HTS, the MARL-MISP outperforms this method on 2 instances and for other 7 instances it finds solutions as good as HTS, whereas the HTS obtains better results for 3 instances. However, average running time of the MARL-MISP is more than 4 times faster than the HTS.
In summary, considering the results presented in Tables 2 to 5, on a wide range of instances from small to large scales with varying densities showed that despite its simplicity, MARL-MISP is capable of competing with, and even outperforming state-of-the-art methods for the MISP and MCP. Specifically, MARL-MISP is able to obtain the best current solution for all instances of DIMACS except for 6 instances, and can reach the high quality solutions for 12 instances from the p-graphs in reasonable computing times. Only rare methods can achieve such performance on both DIMACS and p-graphs without any primary adaptation.
In this paper we presented a new multiagent reinforcement learning algorithm to solve the maximal independent set problem, a well-known NP-Complete problem. In the proposed multiagent system, each agent is an autonomous entity with different learning parameters and directly communicates with other agents by sharing successful results. Experimental evaluations on two benchmark sets showed that despite of its simplicity, MARL-MISP finds the current best known solutions for 24 out of 30 instances of the DIMACS and can reach the solutions of the p-random graphs that their size is very close to their expected size in a very reasonable time. The competitiveness of the MARL-MISP is confirmed when it was compared with several state-of-the-art maximal independent set and maximum clique methods and it outperform most of these methods in terms of the solutions’ quality and convergence rate.
Footnotes
Authors’ Bios
