Abstract
Network on Chip (NoC) has been suggested as an appropriate and scalable solution for System on Chip (SoC) architectures having high communication demands. In this study, we propose heuristic fuzzy based mapping approaches to decrease the power consumption and improve the performance in the NoCs. The proposed method has two steps: core to task mapping and router reduction. In the mapping stage, two algorithms are proposed; first, proposed mapping algorithm maps the tasks to cores heuristically by means of Genetic and Simulated annealing algorithms, then tries to define a cost for each mapping and choose the lowest cost in order to diminish the power dissipation in the NoCs. In the second mapping algorithm, fuzzy rules are applied to evaluate and select the best topology such that the power consumption is minimized. Fuzzy logic is used to make a better decision in terms of distance and bandwidth for tasks to cores mapping. In the second phase, since the optimum number of router resources has colossal effect on power dissipation in the NoCs, fuzzy approach is utilized to reduce the number of routers in the NoC architectures without any significant impact on the performance. To evaluate the proposed methods, we use five multimedia benchmarks. The experimental results show that heuristic and fuzzy logic methods improve the power consumption over the non-optimized NoC by up to 66% and 73%, respectively. Also, the proposed fuzzy mapping algorithm along with the router reduction method compared to the presented fuzzy without router reduction approach gives on an average, 73% energy reduction.
Introduction
The communication infrastructure has a great impact on the overall performance and energy dissipation in on-chip architectures. On the other hand, advances in technology pose some challenges in SoCs such as unpredictable latency and high energy consumption [1]. NoCs which have been proposed as an appropriate solution for on-chip interconnection of SoC infrastructures are more scalable and reusable compared with the traditional buses [1, 2]. These advantages in NoC architecture came at the cost of complexity that has a colossal effect on power consumption and performance.
The power consumption and overall performance of NoC depend on the communication infrastructure and researchers show that the significant portion of on-chip power dissipation is consumed in the interconnections [3]. NoC architecture includes routers which are connected by links, therefore, power consumption is composed of router’s power and the power dissipation of the physical link. The link power dissipation is dominant over all other fractions in NoCs [4].
Due to the limitation of resources in NoCs, tasks to cores mapping has a colossal effect on the performance, latency and physical link’s power dissipation of NoCs. Improvement in link’s power consumption and performance can be achieved by assigning the tasks to cores based on the optimum distance and bandwidth [5, 6]. As mentioned, considerable fraction of total power dissipation consumed in the physical links has a direct relation with length of the link and bandwidth of the transmitted data [5]. In order to map tasks to cores, a Quadratic Assignment Problem (QAP) can be defined. The problem assigns the tasks to the cores in a way that the connections with highest bandwidth are assigned to the shortest distances of the cores. The problem naturally is categorized as NP-hard problem. So, it may be solved by many exact and non-exact (meta-heuristic) methods. Exact and meta-heuristic methods such as branch and bound, simulated annealing, genetic algorithm, variable neighborhood search and tabu search were widely applied in the literature of QAP [7].
Decision making affected by uncertainty and fuzzy controller is the best choice for dealing with uncertainty. Fuzzy controller has an advantage of avoidance in rigid boundaries with providing a level of confidence to the data which is really important to remove ambiguities and solve the problem that are difficult from mathematical perspective. Since Quadratic Assignment Problem (QAP) in optimization algorithms is complicated (QAP is combinatorial optimization problem and is known as a NP-hard problem), the utilization of fuzzy concepts can be helpful to improve the efficiency of our evaluation.
In this paper, we present a hybrid meta-heuristic method which assigns the tasks with higher bandwidth to the lower distance and vice versa to minimize the objective function of the QAP. This objective function coverts the problem to minimize the power consumption without degrading the performance.
After the algorithm is applied to certain data of NoCs, to make a better decision based on the objective function, fuzzy logic rules are proposed. According to these rules we would be able to find the optimum solution for the tasks to cores mapping. The fuzzy system is employed to make an optimum decision in terms of bandwidth and distance between the cores as inputs and output is the cost which is the total weighted distance of the tasks. The use of proposed fuzzy algorithm leads to improvement in power consumption and the performance of system.
Besides links, a significant portion of power of NoCs is consumed in routers. Another contribution of this paper is to find the optimum number of routers using fuzzy logic algorithm. This approach along with optimal layout gained in previous step can reduce the power of NoCs. Conclusively, proposed fuzzy solution is used to minimize the power dissipation by tasks to cores assigning and finding the optimum number of routers in NoC. Thus, the physical links power consumption, router’s power dissipation and consequently, the total power of NoC are reduced.
Literature review
Mapping problem is classified as NP-hard problem [8]. Thus, due to the running time of mapping algorithm, the key factors such as objective function, constraints and evaluation should be considered more precisely. There are many simple and advanced meta-heuristic algorithms which have been applied to solve NP-hard problems in different research topics. Efficiency of meta-heuristics were studied by researchers in the fields of QAP, layout problems and scheduling problems [9–13].
In the NoC realm, there are some significant studies on mapping based analytical and meta-heuristic algorithms. Hu et al. [14] present the first mapping problem through branch and bound algorithm which maps cores onto a regular NoC with respect to power consumption minimization and the performance of the system is guaranteed through bandwidth reservation. The authors in [15] present a mapping algorithm with communication energy minimization objective such that the performance is guaranteed via bandwidth reservation. They construct a deterministic routing and branch and bound algorithm is used. Murali et al. [16] introduce a heuristic approach for mapping the cores onto a mesh-based NoC with bandwidth constraint such that the average communication delay is minimized. The researchers in [17] investigate a multi objective mapping method for mesh-based NoC architecture. They use a genetic algorithm to find Pareto mapping which is able to optimize power dissipation and performance in NoC. The authors in [18] propose a genetic algorithm based method that is used for application specific NoC. The objective function of this approach is power and router resources reduction. In [19], an Integer Linear Programming is presented that is followed by a heuristic algorithm according to the Particle Swarm Optimization to find the best position for the router with communication cost minimization objective. They focus on the application specific NoC to obtain an appropriate position for the routers from the list of available router’s position. In [20], a mapping algorithm is introduced with multi objective genetic algorithm to find Pareto front solutions for different network topology with respect to power dissipation and latency.
These days many researchers are using fuzzy logic systems in different fields [21–27]. Fuzzy systems have already been utilized in NoC [28–30]. In [28], the authors present an adaptive routing algorithm where fuzzy logic system is employed to avoid congestion in NoC. They use fuzzy system to generate uniform traffic over the routers which have extra capacity. In the proposed method path decision making is based on the fuzzy system to make links of NoC less congested through distributing the traffic uniformly. Investigators in [29], propose an adaptive routing method to select the router’s output port based on the fuzzy system in NoC. The goal of this research is delay and power reduction. Fuzzy logic is used for flow regulation in NoC according to the chip multiprocessors to improve the performance by using the network resources better [30]. In this method, decision making is done completely dynamically based on the traffic as well as the condition of interconnection network.
To the best of our knowledge the fuzzy logic system has not been used in heuristic mapping problem in NoC. The advantages of fuzzy rules in decision making motivated us to propose a heuristic mapping algorithm based on fuzzy logic rules. The proposed approach takes advantage of fuzzy system to obtain the optimum mapping of the tasks onto the cores with optimum number of routers in NoC architectures. The proposed method takes advantage of the fuzzy concept to have more efficient evaluation in NP-hard problems.
Quadratic assignment problem
Quadratic Assignment Problem (QAP) can be applied to model the mapping problem of NoCs. QAP is a very popular optimization problem and tackled by many researchers. Its NP-hardness [31] causes non-polynomial run time in the case of known methods which motivates researchers to introduce different meta-heuristic solution methods [32–35]. In this paper QAP is applied to map a set of tasks to a set of cores on a NoC by the following model,
The model is described by the following points: n is the number of tasks and cores of the NoC. i (j) and k (l) are indexes used for the tasks and cores respectively, x
ik
is a binary variable. The value of 1 denotes that task i is mapped (assigned) to core k, ω
ij
demonstrates the bandwidth of tasks i and j, d
kl
denotes the distance of cores k and l, The objective function (1) minimizes the sum of distances of all pairs of tasks that is weighted by their bandwidth, The constraints (2) map only 1 task to each core, The constraints (3) consider only 1 core to each task, The constraints (4) force the x variables to take value of 0 or 1.
In this study the concepts of QAP are used in three meta-heuristic algorithms for task to core mapping in NoCs.
In this section, three meta-heuristic algorithms are presented to design the layout of NoCs. The first algorithm uses a genetic algorithm (GA) which is hybridized by a simulated annealing algorithm (SA) while the objective function is calculated using crisp values of bandwidth and distance. This algorithm is called hybrid GA (HGA). The second algorithm applies GA and SA such as the first algorithm but its objective function is calculated by fuzzy logic and fuzzy sets for bandwidth and distance, so, it is called fuzzy hybrid GA (FHGA). The third algorithm also uses GA and SA same as the previous algorithms with a bi-criteria objective function. This function targets to minimize the sum of weighted distances of tasks and the number of routers using fuzzy logic and fuzzy sets for bandwidth, distance and number of routers. This method is called bi-objective fuzzy hybrid GA (BFHGA).
Hybrid GA (HGA)
GA is one of the population based evolutionary algorithm that is widely used to solve combinatorial optimization problems [9, 37]. GA starts with a population of solutions (chromosomes). First p% (randomly selected) of the solutions (worst solutions of the population) are randomly reproduced and evaluated. Then, one/some of the remaining (1 - p)% of the solutions are selected as parents to generate a new solution (named child or offspring) using crossover, mutation and other operators. The new solution is evaluated and the worst solution among the population is replaced by the new solution if the new one has better objective function than the worst solution. Selection of parents, generation of the child, evaluation of the child and its replacement with the worst solution of the population (if happens) are together considered as one iteration of the GA. Then, the GA is terminated after a limited number of iterations and the best solution among the final population is introduced as the solution of the algorithm.
SA algorithm is a single solution based meta-heuristic method which is also widely used to solve combinatorial optimization problems [11, 38]. The SA algorithm starts with a single initial solution (which is also considered as best solution) and an initial temperature (a positive real number, e.g. 100). Then, for a limited number of iterations a neighbor of the initial solution is generated. In the SA algorithm, swap operator is used to generate the neighbor solution. In each iteration the initial solution and the best one are replaced by the generated neighbor solution if the neighbor solution has better objective value than the best one. If not, the neighbor may be accepted by a random generated probability as just the initial solution. Otherwise, the initial and the best solutions are unchanged. Whether it is changed or not, the initial solution is the input of the next iteration. After all iterations of the initial temperature are done, the procedure continues by cooling down the initial temperature until a predefined final temperature is met. In each subsequent temperature, the same limited number of iterations as the number of iterations of the initial temperature, are performed to improve the initial and best solutions. At the end, the output of the SA algorithm is the last saved best solution.
Now, the GA is hybridized by the SA algorithm and named HGA. In each iteration of the GA a child is generated by a crossover operator. The generated child is considered as the initial solution of the SA algorithm to be improved if possible. Then the improved solution is sent back to the population of the GA to be replaced with the worst solution among the population. The HGA continues until all iterations of the GA are done. Note that in each iteration of the GA, the SA algorithm is performed. Figure 1 illustrates the proposed HGA algorithm.
Solution representation
Each solution in the proposed HGA is represented as an array including positive integer numbers as shown by Fig. 2. The number of tasks and assigned cores are represented by the value and index of each entry of array, respectively. For example in Fig. 2 task 1 is assigned to core 10, etc. where there are 12 tasks that are assigned to 12 cores.
GA and SA operators
A Partial Mapped Crossover (PMX) [7, 39] is used as the operator of the GA part of the HGA. In the PMX, two solutions from the population are selected by the selection mechanism and considered as parents. Then as shown by Fig. 3, a single solution (child) is generated and considered as the initial solution of the SA algorithm of the HGA.
On the other hand a double swap operator is used to generate a neighbor solution in the SA algorithm of the HGA as depicted by Fig. 4.
Selection mechanism of the GA
To perform the PMX operator of the HGA, two solutions from the population are selected (noted by x1 and x2 in the flowchart of HGA) by applying the following steps,
This method is called Rolette-Wheel [9, 37] selection mechanism which gives more chance to the better solutions of the population to be selected for child generation.
The HGA introduced in Section 4.1 is modified by a fuzzy-based objective function and called fuzzy-based HGA (FHGA). All steps of the FHGA are the same as the HGA except for the evaluation of solutions. In the FHGA, the solutions are evaluated by fuzzy sets and logics. This method of evaluation considers fuzzy sets for bandwidth and distance and assigns a membership function to each of them for any generated solution of the FHGA. Then fuzzy rules are defined to determine fuzzy cost of mapping based on the fuzzified bandwidth and distance values of each pair of cores. Then the obtained fuzzy mapping cost is defuzzificated. The defuzzification is done for all pairs of cores of the NoC. This method of evaluation is detailed in the following sub-sections.
Basic definitions of fuzzy sets and numbers
Fuzzy theory [40] helps many real-world problems to be modeled mathematically. Fuzzy theory later was developed by introducing more concepts and definitions by [40, 41]. Here, some initial definitions of fuzzy numbers and sets are presented.
Fuzzy set is also called a normal fuzzy set if and only if,
Fuzzy numbers can be used to model linguistic variables which are used by decision maker. In this paper, triangular fuzzy numbers are used to model bandwidth and distance of cores in NoCs as fuzzy linguistic variables. Seven and six linguistic variables are considered to depict the range of bandwidth and distance, respectively, as follows,
Fuzzy linguistic variables for bandwidth:
Zero (Z), Very Small (VS), Small (S), Medium (M), Large (L), Very Large (VL), High (H)
Fuzzy linguistic variables for distance:
Very Small (VS), Small (S), Medium (M), Large (L), Very Large (VL), High (H)
As NoCs differ in topology, number of cores and the bandwidth values, the topology used for each NoC has different distance matrix of cores. Therefore, different MF, lower bound, upper bound and most likely values are considered for bandwidth and distance of cores in NoCs. In this subsection, the linguistic variables proposed for VOPD benchmark are presented by Figs. 6 and 7 (This benchmark later will be used in computational experiments section of the paper). MF, lower bound, upper bound and most likely values of each linguistic variable are proposed based on the bandwidth and distance matrixes of the benchmark. Therefore, for other benchmarks, because of their topology and bandwidth matrix, MF, lower bound, upper bound and most likely values of each linguistic variable will not be the same as VOPD benchmark.
Fuzzy linguistic IF-THEN rules
The following relations are considered for a pair of cores in a NoC [5], Relation 1: Considering a fixed value of bandwidth between a pair of cores, if the distance of cores is increased, then, power consumption and latency of the NoC are increased. Relation 2: Considering a fixed distance between a pair of cores, if the bandwidth of cores is increased, then, power consumption of the NoC is increased. Relation 3: Considering a fixed distance between a pair of cores, if the bandwidth of cores is increased, then, latency of the NoC is decreased.
Based on the above relations, a new criterion named “Cost of Mapping” (CoM) is defined for each pair of cores in NoC which is depended on distance and bandwidth of the cores. Conceptually, CoM reflects power consumption and latency together. From the above-mentioned relations it is concluded that if a pair of tasks with high bandwidth are mapped to the cores having less distances, the tasks will have less CoM. Then, the following fuzzy linguistic variables for CoM is defined based on Fig. 8. MF, lower bound, upper bound and most likely value of each linguistic variable are proposed in Fig. 8 are the same for all benchmarks of NoC.
Fuzzy linguistic variables for CoM:
Zero (Z), Very Small (VS), Small (S), Medium (M), Large (L), Very Large (VL), High (H)
Then, the following fuzzy linguistic IF-THEN rules are extracted from the above relations to obtain CoM of each pair of cores on NoCs.
Rule 1: IF bandwidth is Zero AND distance is Very Small THEN CoM is High.
Rule 2: IF bandwidth is Zero AND distance is Small THEN CoM is High.
Rule 42: IF bandwidth is High AND distance is High THEN CoM is High.
Table 1 shows CoM for all possible rules which have been extracted from intersecting a row and a column. It is worth mentioning that these rules are applicable to all NoC benchmarks.
Defuzzification of CoM
Each generated solution of the FHGA is evaluated by the fuzzy rules mentioned in Table 1. Objective function value in each generated solution of FHGA will be the sum of defuzzificated values of CoM obtained for all pairs of cores of a NoC by the following formula,
Then the fitness value of each generated solution is calculated by Equation (5).
Defuzzification procedure converts the fuzzy linguistic rules to quantifiable values (crisp values). The fuzzy output of the linguistic rules can be defuzzified by one of the following methods which are more common than the other defuzzification methods in the literature of fuzzy theory. Center-of-Gravity method (CoG): This method calculates the geometrical center as defuzzified value by applying the rule with the output related to the greatest area. Mean-of-Maxima method (MoM): This method considers the value having maximum MF as defuzzified value.
Although MoM is easier to apply, CoG has more efficiency to be applied for defuzzification [23, 28]. In this paper CoG method is applied to defuzzify fuzzy linguistic CoM values to obtain objective function of each generated solution of the FHGA. As the following example shows, each pair of cores (of a generated solution in the FHGA) on a NoC has fuzzy linguistic values of bandwidth and distance. Each crisp value of bandwidth and distance are associated to two fuzzy sets, therefore, CoM of a pair of cores is related to four of the rules shown by Table 1. Considering MF of the crisp values in each associated fuzzy set, the formula for calculating defuzzified value of CoM for a pair of cores is as follows,
Where MLV i is the most likely value of the fuzzy set of CoM in associated rule i.
MF (M, 313) = 0.244 and MF (L, 313) = 0.756 MF (VL, 3.16) = 0.862 and MF (H, 3.16) = 0.138
Based on the two fuzzy sets associated to bandwidth of 313 (M and L) and distance of 3.16(VL and H), the following four rules from Table 1 are extracted, Rule 1: IF bandwidth is M, AND distance is VL, THEN CoM is VL Rule 2: IF bandwidth is M, AND distance is H, THEN CoM is H Rule 3: IF bandwidth is L, AND distance is VL, THEN CoM is H Rule 4: IF bandwidth is L, AND distance is H, THEN CoM is H
Then, MF values of the rules are obtained as, Rule 1: MF (VL, CoM) = min { MF (M, 313) , MF (VL, 3.16) } = min { 0.244, 0.862 } = 0.244 Rule 2: MF (H, CoM) = min { MF (M, 313) , MF (H, 3.16) } = min { 0.244, 0.138 } = 0.138 Rule 3: MF (H, CoM) = min { MF (L, 313) , MF (VL, 3.16) } = min { 0.756, 0.862 } = 0.756 Rule 4: MF (H, CoM) = min { MF (L, 313) , MF (H, 3.16) } = min { 0.756, 0.138 } = 0.138
Considering the most likely values (MLV) of 50, 60, 60 and 60 for rules 1 to 4 respectively, the deffuzufied value of CoM for the pair of cores of the example is calculated as,
The HGA and FHGA are modified in this section by a fuzzy-based bi-criteria objective function named bi-objective fuzzy-based hybrid GA (BFHGA). All steps of the BFHGA are the same as the HGA except for evaluation of solution. Each solution generated by the BFHGA, is evaluated by a fuzzy bi-criteria objective function which consists of two criterion, CoM and cost of router (CoR). These cost functions are calculated by fuzzy linguistic IF-THEN rules and then the weighted sum of CoM and CoR is considered as the bi-criteria objective function of the solution.
CoM objective function
CoM of each generated solution of the BFHGA is exactly the same as the CoM used for the FHGA method with the same input data (bandwidth and distance), fuzzification method of input data, fuzzy linguistic IF-THEN rules and deffuzification method. Therefore, a common generated solution by the BFHGA and the FHGA will have the same CoM value.
CoR objective function
As mentioned in the introduction and literature review sections, number of router resources has great impact on power dissipation. Thus, proposed approach reduces the power consumption due to number of router reduction in NoCs. For each generated solution of the BFHGA, CoR is calculated from fuzzification and defuzzification of the minimum required number of routers for that solution. The following steps are applied to calculate the crisp value of the CoR for each generated solution of the BFHGA.
Like the HGA and the FHGA, layout of NoC of a generated solution in the BFHGA, contains a set of tasks which are assigned to a set of predetermined cores. So, the following initializations are considered when mapping is done. At the beginning, a router is assigned to each core on the NoC. Then, each core and its router are labeled by a number. Therefore, the following sets are defined,
As each positive element of the bandwidth matrix is a connection between a pair of tasks, therefore, a connection of a pair of tasks represents the connection of their associated cores. So, all connections of cores on the NoC are labeled by numbers and are collected by the following set,
The set of connections that serve each core is collected by,
The set of connections that can be served by each router is defined by,
The aim of this step is to minimize the number of routers in order to serve all connections on the NoC. Therefore, just the minimum necessary number of routers which are able to handle all connections, remains and the other unused routers will be eliminated. To approach this goal, the following set covering problem is applied.
where, Variable y
j
for router j can take values of 0 or 1 meaning to remove or to keep the router on the NoC. Set CNR for each core is defined as,
Constraints (22) in existence of the minimization objective function (21) guarantee that each core which is a part of at least one connection, is connected to at the maximum one router. Considering constraints (22), the objective function (21) minimizes the number of routers.
The minimum number of routers obtained by Step 2 is fuzzified in this step. The fuzzification chooses the best option among last steps solution. As mentioned in the literature of NoC architecture [6]. Although, an NoC can have even one router, this topology is not appropriate in terms of performance and energy. Therefore, by optimizing the number of router we can achieve better performance and energy as well. To approach this goal, the number of routers and CoR of each NoC is fuzzified by the following linguistic variables,
Fuzzy linguistic variables for number of routers:
Zero (Z), Very Small (VS), Small (S), Medium (M), Large (L), Very Large (VL), High (H)
Fuzzy linguistic variables for CoR:
Zero (Z), Very Small (VS), Small (S), Medium (M), Large (L), Very Large (VL), High (H)
The linguistic variables proposed for benchmark VOPD is presented by Figs. 9 and 10. MF, lower bound, upper bound and most likely value of each linguistic variable are proposed based on the number of tasks of the benchmark.
For each NoC, the highest upper bound among the CoR linguistic fuzzy sets can be calculated based on Equation (25). In this equation, the bandwidth of each connection and its distance can be approximated by the most likely value of its medium (M) linguistic fuzzy set of bandwidths and distance, respectively.
For the VOPD benchmark as it has 13 connections,
Then, the following fuzzy linguistic IF-THEN rules are used to obtain the CoR for the optimum number of routers of the NoC.
Rule 1: IF no. of routers is Zero THEN CoR is High.
Rule 2: IF no. of routers is Very Small THEN CoR is Very Large.
Rule 7: IF no. of routers is High THEN CoR is High.
All possible rules which are applied to VOPD benchmark are proposed in Table 2. For any other benchmark, fuzzy linguistic sets and rules are obtained by the same method.
Then, the linguistic value obtained for CoR is defuzzified by COG method (which used to defuzzify CoM) to calculate the crisp value of CoR.
The bi-criteria objective function for each generated solution of BFHGA is the weighted sum of defuzzificated values of CoM and CoR shown by the following formula which is used to obtain Pareto optimal solution,
As mentioned above, the aim of the HGA and FHGA is to reduce CoM to improve the power consumption in NoCs. In other words, these two methods are trying to find the task to core mapping with minimum CoM. On the other hand, the proposed BFHGA considers two objectives named reducing CoM and CoR simultaneously. Due to the fact that the BFHGA compromises the reduction of CoM to decrease CoR, its improvement in CoM might not be as good as reduction of CoM in two other methods, the HGA and FHGA which focus just on reduction of CoM. This issue arises because the CoR of each solution is not depended to its CoM but it is highly depended to the topology.
Computational experiments
The HGA, FHGA and BFHGA algorithms were coded in Matlab to run on a machine with an Intel Core 2 Duo, processor with 2.53 GHz and 4 GB RAM. Five well-known benchmarks from NoC literature were selected to be solved (designed) by the algorithms. Taguchi experimental design method is applied to select the best level of the parameters for the algorithms. Final runs of the algorithms are done using the best obtained levels of the parameters. The aim is to obtain the best solution of each benchmark by each algorithm. Eventually, the best solutions obtained by proposed algorithm for each benchmark are implemented to evaluate the efficiency of this algorithm for the specific benchmark. The implementation method is detailed inSection 5.4.
Benchmarks
Five multimedia benchmarks are selected to perform the computational experiments on the introduced algorithms. Each benchmark contains some tasks and a matrix representing the bandwidth between each pair of the tasks. In Table 3, the characteristics of the multimedia benchmarks such as H.263 video encoder, H.263 video decoder, MP3 audio encoder [14], Video object plane decoder (VOPD) and Multi-window display (MWD) [42] are depicted. The number of different tasks varies from 8 nodes to 15 which are connected by edges ranging from 11 to 19.
Parameter tuning
Parameters of any meta-heuristic algorithm can take different levels. In the case of several parameters in meta-heuristic algorithm, determining the most efficient level of each parameter plays a significant role in achieving to the best solution [7].
Taguchi experimental design [43] is a robust method to set the level of parameters by using orthogonal arrays to design the experiments in such a way that the number of experiments decreases. Finally, the output of each experiment is transformed to a signal-to-noise ratio (S/Nratio) which calculates the amount of variation of the response parameter. The objective of Taguchi method is to minimize the amount of |S/N|. As mentioned in [9] and [44], the S/Nratio of minimization type problems is obtained by:
In this study the HGA is selected to perform Taguchi method by considering four potential levels for each of its parameters. To find an appropriate orthogonal array for the HGA, the total degree of freedom of its parameters is needed. Since five parameters exist for four levels, three degrees of freedom are considered for each level. Considering one degree of freedom for the total mean, there will be totally sixteen degrees of freedom (5 × 3 +1 = 16). Therefore the suitable orthogonal array for the HGA has at least 16 experiments. The orthogonal array considered for experiments of the HGA is L16 (45) where there are five parameters with four levels. The orthogonal arrays of Taguchi design for the HGA is shown in Table 4. In each experiment, the HGA runs with a combination of the parameters mentioned as follows. This table also reflects the levels considered for the parameters.
VOPD benchmark is selected to perform the experiments designed by Taguchi method. To obtain more reliable results, each experiment runs three times and the average is used in the calculations. The best level of parameters introduced by Taguchi method is shown in Table 5 and the first best levels of the parameters are applied for the HGA, FHGA and BFHGA in all benchmarks.
The proposed algorithms are set by the levels of their parameters obtained by the Taguchi method of the previous section for each benchmark, proposed algorithm runs for 10 times. For each benchmark the same fuzzy sets and linguistic rules are used in the FHGA and BFHGA. Also for the BFHGA, the α is set to be 0.5 for all benchmarks. The results of the experiments are illustrated in Table 6.
For each benchmark, the solution with minimum objective function value is considered to be implemented for further analysis in the next sections.
Experimental results of implementation
The proposed methods have been assessed by various multimedia benchmarks as mentioned in Section 5.1 which is implemented in 65 nm technology. Benchmarks are evaluated in terms of power and performance. The power consumption of the NoC consists of power consumed by links and routers [5]. Self and coupling capacitances on the wire and between adjacent wires are considered as link power dissipation. The power of the links is determined by , where ∝is the switching activity on the link, C is the link and coupling capacitance, f is the clock frequency and V dd is the power supply of the system. Frequency is 500 MHz according to the critical path of the system and based on the International Technology Roadmap for Semiconductors [45], V dd is considered to 1 Volt. The length of the metal wires is 2 mm while the self and coupling capacitance of the wire links are 0.2 pF/mm and 0.6 pF/mm, respectively. The NoC infrastructure is implemented in VHDL and Modelsim is utilized to assess the transitions of wires. Power compiler from Synopsys 1 [1]Synopsys and Modelsim are registered trademarks. is utilized to evaluate the power of the routers. Power compiler considers the static and dynamic power consumptions. Also, uniform distribution is used to send packets between the routers.
Power consumption evaluation
To evaluate the impact of the proposed methods on power reduction, a regular mesh-based NoC is considered as a baseline. Figures 11–15 present the communication trace graph, non-optimized and optimized layouts for VOPD benchmark. Node description of this benchmark is depicted in Table 7. In Fig. 11, the edges of the graph are denoted with bandwidth demand in Mb per second. Figure 12 shows a non-optimized layout for VOPD, while optimized layouts obtained by the HGA, FHGA and BFHGA are presented by Figs. 13–15, respectively. In Figs. 12–15 the white rectangular represents core and darkened squares indicate the routers in NoC.
As mentioned, the number of router resources is one of the most important factors in power dissipation in NoC architectures. According to this fact, we proposed a fuzzy approach to find the optimum number of routers in NoCs along with the optimum mapping mentioned in previous section. Figure 15 shows the custom topology with optimum number of routers for
We compare the power dissipation of non-optimized layout of the NoCs with the optimized layout by applying the HGA and FHGA. In these methods the number of routers is same and router reduction phase is not yet applied. In the other word, each core has one router. As shown in Fig. 16, the HGA and FHGA, on an average, improve the power consumption 49% and 60%, respectively. Figure 16 is normalized to the highest power consumption of each benchmark. The BFHGA is also proposed to obtain the best mapping and find the optimum number of router in NoC architectures. As mentioned, router recourses have a colossal effect on power consumption in NoCs. Hence, it is evident that by decreasing the number of routers the power dissipation decreases as well. The improvement on power consumption using BFHGA is depicted by Fig. 17. BFHGA improves the power consumption, on an average, 73% compared to the FHGA where router reduction phase is not yet applied.
Performance evaluation
In this section, we evaluate the effect of proposed meta-heuristic algorithms on performance of NoCs. Figure 18 shows the variation in normalized latency for different benchmarks. Comparing to non-optimized layout, experimental results show that on average 34.6%, 39.7% and 38.8% improvement in latency of the HGA, FHGA and BFHGA are achieved, respectively. However, there are some exceptions in the BFHGA that is due to effect of router reduction phase. Since the goal of BFHGA is to reduce the number of routers and task to core mapping as well, BFHGA in some cases shows higher latency compared to HGA and FHGA which are targeted just to map the tasks to the cores. That is, BFHGA is trying to decrease the number of router to improve the power consumption and this router reduction in some cases has a trivial side effect on latency; however, BFHGA’s power saving can compensate the latency increment of this method in such a way that it can end up with improvement in energy consumption of NoCs. Table 8 depicts the area and the number of routers for different benchmarks before and after applying BFHGA. In this table, the second column denotes the area used by non-optimized layout and the proposed methods excluding BFHGA. In the third column area of NoC for BFHGA is presented. As it can be observed, after decreasing the number of routers, on average, 75% improvement in area is achieved. Fifth column gives the number of routers in baseline, HGA and FHGA while sixth column presents the number of routers after applying BFHGA.
The area consumption for non-optimized layout, HGA and FHGA are the same due to equal number of routers used in their topologies. In the other word, in these methods the number of routers is not optimized and consequently the network’s area remains unchanged.
Energy dissipation evaluation
The presented results in Fig. 18 demonstrate that in some cases the latency in the BFHGA insignificantly increased compared to the FHGA due to router reduction phase. The results show that this increase in latency can be compensated by improvement in power consumption and in turn, the total energy of NoCs can decrease.
As shown in Table 9, compared to FHGA where the number of routers is not optimized, the BFHGA on average consumes over 73% lower energy. The reason for this improvement is routers reduction which plays a key role in energy dissipation in NoCs.
Concluding remarks
In this paper a novel fuzzy meta-heuristic technique is presented to reduce the power dissipation in NoC architectures. Our contribution consists of three phases. In the first phase, a meta-heuristic mapping approach is proposed to map the tasks to the cores based on the bandwidth and distance among the cores. In this algorithm, cores with higher bandwidth are assigned to tasks with lower distance and vice versa. Not only does this algorithm reduces the power consumption, but also improves the performance of NoCs. Although the proposed method decreases the power consumption and improves the performance, there is still enough room for improvement by making a better decision in the evaluation phase. To address this issue, in the second phase, the fuzzy logic system is proposed to make a better decision in the meta-heuristic approach. After employing the fuzzy logic rules, the results show that the power dissipation decreases and performance improves compared to the initial meta-heuristic algorithm. The last phase is router reduction along with optimum mapping using a fuzzy logic mechanism. The results reveal that there is a significant improvement in power and energy consumption in the proposed method over the non-optimized layout, meta-heuristic mapping algorithm and fuzzy-based meta-heuristic mapping approach.
Footnotes
1
Synopsys and Modelsim are registered trademarks.
Acknowledgments
We would like to express our sincere thanks to Dr. Dara Rahmati because of letting us use Persian tool as a NoC infrastructure as well as editors and anonymous referees for their helpful comments and suggestions that helped to improve the quality of this paper.
