Abstract
This study presents a high-level simulator for Network-on-Chip (NoC), designed for many-core architectures, and integrated with the PlatEMO platform. The motivation for developing this tool arose from the need to evaluate the behavior of application mapping algorithms and the routing, both aspects are essential in the implementation and design of NoC architectures. This analysis underscored the importance of having effective NoC simulators as tools that allow for studying and comparing various network technologies while ensuring a controlled simulation environment. During this investigation and evaluation, some simulators, such as Noxim, NoCTweak, and NoCmap, among others, offered configurable parameters for application traffic, options to synthetically define topology and packet traffic patterns. Additionally, they include mapping options that optimize latency and energy consumption, routing algorithms, technological settings such as the CMOS process, and measurement options for evaluating performance metrics such as throughput and power usage. However, while these simulators meet detailed technical demands, they are mostly restricted to analyzing the low-level elements of the architecture, thus hindering quick and easy under- standing for non-specialists. This insight underscored the challenge in developing a tool that balances detailed analysis with a comprehensive learning perspective, considering the specific restrictions of each simulator analyzed. Experiments demonstrated the proposed simulator efficacy in handling algorithms like GA, PSO, and SA variant, and, surprisingly, revealed limitations of the XY algorithm in Mesh topologies, indicating the need for further investigation to confirm these findings. Future work will expand the simulator functionalities, incorporating a broader range of algorithms and performance metrics, to establish it as an indispensable tool for research and development in NoCs.
Introduction
The increasing demand for processing power in fields such as Deep Learning [1] and Big Data has driven chip manufacturers to develop specialized architectures and graphics cards, including GPUs, superscalar, multi-core, and many-core processors [2]. These advancements aim to enhance the integration of processing cores for parallel and high-performance com-puting [3]. Many-core processors [4] offer greater scalability than multi-core systems, since they are not constrained by bus architecture limitations in the core communication [5].
One of the fundamental communication technologies for multi-core systems, which meets their design requirements in terms of performance and efficiency, is the Network-on-Chip (NoC) [6, 7]. This architecture provides a scalable and efficient solution for exchanging data between components integrated into a System-on-Chip (SoC). It consists of three main blocks: links, routers, and network interfaces (NIs). Links are sets of wires that physically connect nodes, enabling communication. They can include logical or physical channels and implement synchronization protocols between source and destination nodes [8].
Routers have multiple input and output ports, a switching matrix to connect these ports, and a logic block that implements flow control, routing, arbitration, and buffering policies. These elements ensure the efficient movement of data packets across the network. NIs establish the logical connection between the IP cores and the network, allowing the separation between computation and communication. They adapt the data generated by Processing Elements (PEs) to the appropriate format for transmission in the NoC [9].
In this context, topologies can be direct, such as mesh (grid) and torus, where each router is directly associated with a processor, or indirect, where some routers are used only to propagate messages. The choice of topology influences the efficiency and complexity of communication. The design flow follows a top-down approach, starting with the functional and performance specification, proceeding through hardware resource allocation, definition of arbitration schemes and quality of service (QoS) control, and culminating in the simulation and optimization of the final architecture [10].
These approaches are essential for managing the increasing complexity of modern chips, ensuring efficient communication, low latency, and high bandwidth between the components integrated in an SoC [11]. As illustrated in Fig. 1, the diagram provides a visual representation of a generic NoC architecture, highlighting the PE, NI, R, and the Links that connect them.
Diagram illustrating a generic NoC (Network-on-Chip) architecture. The figure shows Processing Elements (PE), Network Interfaces (NI), Routers (R), and the Links connecting them.
Examples of metrics, in this scenario, include latency, energy, bandwidth, and fault tolerance. Research interest is also growing, as observed in recent literature, in the development of metaheuristics [12] or other algorithms, such as the dynamic neural model [13] and the harmony search algorithm [14], which demonstrate how these approaches can optimize essential metrics like latency and energy consumption, crucial for the efficient performance of NoCs [15, 16].
Additionally, the water drop algorithm [17], the spiral dynamics algorithm [18], and simulated annealing and its variants contribute to balancing multiple operational parameters, including bandwidth and reliability [19]. Approaches such as the intelligent bacteria foraging algorithm [20] and the monkey spider optimization [21], along with the use of particle swarm optimization for scheduling problems [22], significantly advance research in task mapping optimization [23] within NoCs, directly addressing the complex challenges discussed earlier [24].
Also, another concern, is the importance a NoC simulator: NOC simulator, so, a high-level simulation environment is necessary, containing an abstract and flexible description of the NoC, allowing even novice users to develop such algorithms quickly and practically to validate their ideas. However, a literature review revealed that many existing simulators do not fully meet the requirements for developing application mapping techniques for the NoC without concerning low-level architectural details [25]. Thus, the main simulators found deal more with low-level aspects and offer few options for application mapping implementations [26]. Although they allow the creation of customized algorithms, they have not proven to be practical and simple, requiring comprehensive knowledge of programming languages and operating systems [27].
Among the most popular projects cited in scientific publications are: NoCTweak [28], Noxim [29], Nirgam [30], Nostrum [31], BookSim [32], NoCmap [33, 34], and Orion3.0 [35]. Their implementations address metrics such as energy consumption, data throughput, latency, reliability, communication delay, router energy consumption, link energy consumption, and router area.
Therefore, it is interesting to use these simulators to create intelligent mapping algorithms without needing to worry about hardware details. The algorithms that can be implemented include evolutionary algorithms with single-objective optimization for task mapping represented by graphs. Thus, in this current paper, the PlatEMO tool is basis for this kind of development, as it has such algorithms (GAs, SAs, etc.) and allows users to customize them. The reliability of implementations in PlatEMO can be trusted, as there are benchmarks tested by various researchers who have used PlatEMO and attest to its efficiency. In this article, only an abstraction of an architecture is executed sequentially; thus, the idea can be explored, after obtaining mapping results, in an FPGA-implemented architecture. This was not done in the present article and should be tested in future work.
Finally, this paper is is organized as follows: related studies present in the literature are described in the Related Works section. The methodology used, including the simulator structure and problem presentation, is introduced in the Methodology section. Experimental results and statistical analysis are discussed in the Experimental Results section. Finally, the conclusion and ideas for possible future work are discussed in the Conclusion section.
This Section reviews some basic concepts and a series of studies, in the literature, related to the simulators discussed previously. It underscores the significance of these tools and highlights the continuous need for methodological adaptations to meet evolving research demands, emphasizing a dynamic field where innovation and diversification are paramount.
Many-core architectures
Many-core architectures offer significant opportunities in computer vision and machine learning using deep neural networks (Deep Learning). They enable the simultaneous processing of multiple tasks, enhancing performance and energy efficiency in data-intensive applications [36, 37].
Thus, one example of these type of architecture is the Larrabe that illustrates the potential of many-core architectures in visual computing. With multiple x86 cores, vector processing units, and fixed-function logic blocks, it provides high performance in graphics tasks and parallel processing, supporting a wide range of applications and workloads [38]. In network processing applications, the Raw architecture demonstrates significant improvements by dynamically clustering cores, optimizing workload distribution.
This approach highlights the adaptability of many-core architectures to meet specific efficiency and performance demands in network communications [39]. Furthermore, parallel sorting algorithms developed for fine-grained many-core processor arrays demonstrate how computationally intensive tasks can be optimized to achieve superior performance with lower energy consumption, underscoring the critical role of these architectures in technological advancements across various domains [40].
Network-on-chip
Task mapping in NoC systems plays an important role in optimizing communication costs, energy consumption, and latency by assigning tasks to specific cores in a way that maximizes performance and minimizes power consumption and communication delays.
Regarding this theme, Darbandi et al. propose a task mapping method for NoCs using a multi-objective particle swarm optimization algorithm based on crowding distance (MOPSO-CD). This approach reduces communication latency by 1.1% compared to genetic algorithms and 0.5% compared to PSO, while improving communication cost by 2.7% and 0.16%, respectively [41].
Another research paper, in this scenario introduces an improved ant colony algorithm (I-ACO) for task mapping in NoC systems, which adapts to different task scales to reduce communication power and network latency. Experimental results, in this paper show that the I-ACO reduces communication power by up to 10.8% and network latency by up to 13.4% compared to traditional algorithms [42].
Finally, paper, on the same subject presents WAANSO, a framework for efficient task mapping in many-core systems (MCS) using Wavelet Clustering and Ant Swarm Optimization (ASO). WAANSO significantly improves energy efficiency by 19% and performance by 65.86% compared to baseline methods, including DPSO, ACO, and DBSCAN, based on experiments conducted on a 64-core system [43].
State of the art in simulator usage
ThIn the following, some important simulators found in the related literature are described in a short way. Table 1, adapted from [26], presents key characteristics that compare the discussed simulators, aiding in understanding their relative advantages and limitations.
NoC simulator features
NoC simulator features
NoCTweak Simulator: Demonstrating remarkable versatility, the NoCTweak simulator has been adapted for various applications in NoC studies. For instance, in [44] adapted NoCTweak to support clustering and incorporated multiple routing algorithms and configurable parameters. Another study [45] used it as a base to implement the Andean Condor Algorithm for application task mapping. Further enhancing its utility, [46] integrated NoCTweak into a meta-heuristic mapping approach using the Hybrid Gravitational Search Algorithm (HGSA) to optimize the allocation of applications.
NoCMap Simulator: Employed in [47] to propose a low-energy consumption mapping method using a Modified Genetic Algorithm (MGA), NoCMap has also been pivotal in evaluating the BEMAP algorithm, which aims at optimizing application mapping in chip networks to reduce energy consumption and improve latency [48].
Noxim Simulator: The Noxim simulator has facilitated the validation of various algorithms such as FTMAP, FTTG, and K-FTTG across different NoC topologies [49]. Additionally, innovative algorithms like TGDFA and CMDFA were implemented by Parvathi [50] to generate irregular topologies and optimize core mapping in multimedia applications. A revolutionary optical router for 3D chip networks was also evaluated using Noxim, which demonstrated significant reductions in insertion loss [51].
Orion Simulator: As a comprehensive simulation tool, Orion estimates energy and performance in NoCs across multi-core and many-core architectures. From its inception [52] throughout subsequent versions [53, 54], Orion has been extensively used to provide foundational calculations for many NoC studies.
Nirgam Simulator: The Nirgam simulator has played a essential role in various NoC studies, analyzing performance parameters with different traffic patterns [55]. The Encircle Routing (ER) algorithm, tested with NIRGAM, effectively balanced network traffic, reducing latency and congestion [56]. A hybrid selection strategy based on traffic analysis has also been evaluated using Nirgam, showing marked improvements in packet delay, throughput, and energy consumption [57].
Nostrum Simulator: In the realm of multimedia applications, Nostrum has been utilized to optimize and reconfigure NoC platforms, providing essential metrics for dynamic networks [58]. However, further investigation is required to clarify its explicit use in certain studies, as highlighted by [59],who discuss the Nostrum communication protocol without clear references to experimental applications.
Tool Structure Overview. (1) Parameter Configuration Module: Sets simulation parameters. (2) Database Conversion Module: Transforms input data into a compatible format. (3) Packet Generation Module: Creates abstract packets for simulation based on task graphs. (4) Router Structure Module: Defines the router logic and data flow. (5) Metrics Configuration Module: Establishes the metrics for performance evaluation. (6) Results Generation Module: Compiles and presents the outcomes of the simulation.
BookSim Simulator: A machine learning-based framework for predicting NoC performance using advanced techniques like support vector regression and artificial neural networks, enhancing analysis speed and accuracy over traditional methods, including the BookSim simulator [60]. The LPNet model further advances these capabilities using deep neural networks (DNN) for latency prediction in NoCs, integrated with data from analytical models and BookSim, achieving lower prediction errors and substantially faster analysis [61].
This review delineates the current capabilities of these simulators and sets the stage for the innovative methodologies proposed in this study, aimed at advancing the state of the art in NoC research.
This section outlines the architecture of the High-Level Simulator for Network-on-Chips (NoCs), referred to as the NoC Simulator. Figure 2 shows the structure of this simulator, which includes two primary components: the NoC architecture representation and the integration with the PlatEMO platform. Additionally, the simulator is designed with several complementary modules to enhance its functionality and usability:
Environment Configuration Module: This module allows users to set various parameters for the many-core architecture, such as the number of rows, columns, grid dimensions, topology type, and the number of solutions, evolutions, and executions. Conversion Module: It transforms input data sets into the matrix format utilized by the simulator, facilitating data handling and processing. Abstract Packet Generation Module: Following the data structure created by the Conversion Module, this module generates abstract packets that mimic Directed Acyclic Graph (DAG) patterns of arbitrary complexity, suitable for diverse simulation scenarios. Router Structure Module: Includes all necessary tables and fields for NoC abstraction, enabling the simulation of various routing behaviors and strategies. Metrics Configuration Module: Focuses on generating and configuring key performance metrics, in this instance, a simplified total energy cost metric is used. Results Consolidation Module: Integrates and consolidates the outcomes of the simulations based on the configured metrics, providing a comprehensive overview of performance and efficiency.
Each module is specifically designed to contribute to the holistic simulation environment, offering flexibility and depth in configuring and executing NoC simulations. This modular architecture not only facilitates use and adaptability but also ensures that the simulator can be tailored to meet a wide range of research and development needs in the field of NoCs.
Furthermore, Figs 3 and 4 show an example of an application running on this architecture. They display the modules responsible for generating the architecture, which operate on the PlatEMO platform. This approach includes evolutionary algorithms, both single and not used in this current research, employing genetic input parameters to analyze the task mapping problem for a specific application. Central to this process is the chromosome structure, which is the key to mapping tasks onto processing elements.
Within the simulator, each position in the chromosome, denoted as PEi,j, corresponds to a specific PE; “i” indicates its location within the architecture, and “j” may represent a specific characteristic or dimension of this PE. Tasks, labeled as Ti,j, are mapped to these PEs, with “i” identifying the task’s unique index and “j” specifying the particular PE to which it is assigned. This vectorized representation facilitates a more streamlined mapping process and is crucial for the evaluation of the many-core architecture’s performance.
As an output, the simulator displays a map showing the optimized distribution of application tasks alongside the performance metrics of the studied many-core architecture. Moreover, the tool also supports importing task maps from other simulators, such as NoCTweak and NoCmap, thereby enhancing its inter-operability and flexibility in integrating various simulation environments and methodologies. This capability allows users to compare different mapping strategies and optimize system configurations based on comprehensive performance data, versatility, and applicability in various simulation scenarios.
Simulator Architecture Abstraction. Inputs: External Maps for application mapping importation and Chromosome Structure representing task allocations. Problem Evaluation: Parameters set within the PlatEMO platform, such as application graph, grid dimensions, population size, and algorithm settings. Outputs: Mapping Model displaying task placements on the grid and Metrics evaluating performance in terms of latency, bandwidth, energy, and fault tolerance.
Example of an application running in a simulation environment: A – Set of configuration and definitions; A1 – Table router structure’s fields; A2 – Packet format; A3 – Application graph; A4 – Matrix graph; A5 – Chromosome; A6 – Initial Population; A7 – Abstraction of the packet set; B – Iteration 0; C – Iteration 1; D – Iteration 3.
Following this, Fig. 4 illustrates an example of mapping an application graph onto the proposed NoC abstraction aimed at cost optimization. All configurations and definitions are pre-established, as depicted this figure, and the simulation is initiated to calculate costs. In this example, the set of configurations and definitions presented to the simulator (Module A) includes: the fields of the router table structure (submodule A1), the format of a given packet of said abstraction (submodule A2), an application graph containing four tasks (t1, t2, t3, t4) with specified bandwidths for each arc (15, 20, 10) (submodule A3), the matrix with data collected from the application graph (submodule A4), a chromosome detailing the mapping of each of the four tasks (submodule A5), the structure of the initial population (submodule A6), and the packet abstraction set (submodule A7), which includes the coordinates of the source and destination routers on the grid, the communication cost, and the packet generation instant defined as a timestamp.
Regarding the router structure, the following components are involved: a structured table containing the input buffers for each link in the directions (North, South, East, and West) of the grid. These buffers receive the abstraction of the amount of overhead data (submodule A7). This structure also records the communication cost transmitted on the referred link, along with a flag that accounts for whether the packet is delivered; if not, the value remains at zero. For the packet format (submodule A2), the subsequent definitions are set: source coordinates of the packet on the grid (Src(x,y)), destination coordinates (Dst(x,y)), bandwidth (Bw), representing the volume of data to be transmitted on the referred link, and the identification tag (Tag) of the packet generation instant.
Still concerning the router structure as illustrated in Fig. 4 (submodule A1), it includes the following elements: a cell (Buffer input ports) to manage the entry of packets and links (North, South, East, and West), a vector (Cost) to store the cost of each packet from the source router to the destination, and a vector (Dlv) to record the count of delivered packets. Given the simplicity of the task graph, the simulation and the calculation of the total energy cost are conducted over three iterations: Iteration 0 (Fig. 4B), Iteration 1 (Fig. 4C), and Iteration 2 (Fig. 4D), detailing in each the process of abstraction and sending/receiving packets on the grid and in the high-level structured table of the simulator.
Iteration 0: The grid and router structure (Algorithm 4 – A1) are abstracted; then, the packets (pkt1, pkt2, pkt3), which had already been generated by the tasks (t1, t2, t3), are placed in their respective destination router input buffers, as illustrated in Fig. 4B.
Iteration 1: In this step, the packets mentioned in iteration zero are effectively routed to their respective links, as per Figure 4C, while concurrently, the communication costs (15, 10, 20) are inserted into the vector (Cost).
Iteration 2: Iteration 2: During this final iteration, every packet mentioned previously is routed successfully to its designated destination router. The communication costs and the number of delivered packets are computed and aggregated to form the total energy cost and the total number of delivered packets. The structured table from Fig. 4D shows the results of this operation.
For a clearer understanding, Algorithm 4 presents a high-level pseudocode that demonstrates the logical sequence of the simulator, specifically for issues related to application mapping. It is structured in three parts: definition of global parameters, configuration of structures and components, and setting up of the simulation environment.
Global Parameters: At the beginning of the simulation, global parameters are set, which include the appGraph (Fig. 4, submodule A3), created by the software Task Graphs For Free (TGFF) [62]. The gridSize is determined by multiplying the number of rows and columns; the mutRate is adaptable, depending on the evolutionary algorithm used; and the routAlg, which initially includes five routing algorithms: West-First, XY, Negative-First, Odd-Even[63], and XYX[64]. The algSM encompasses single or multiple objective algorithms, available in the PlatEMO library or from custom implementations. Additional parameters such as the number of executions (nRuns), the size of the initial population (nInd), the count of evaluations for each individual (nEval), and the metrics (metrics) are also specified.
Structures and Components: The essential function matrixGraph converts the appGraph data into a matrix representing the application graph, as illustrated in Fig. 4 submodule A4. The routerStruct function uses matrixGraph and gridSize to generate the framework, abstracting the routers, depicted in Fig. 4 submodule A1. The chromStruct function generates a random chromosome based on matrixGraph and gridSize, depicted in Fig. 4 submodule A5). The popInit function produces a random initial population of individuals from the parameters chrom and nInd, depicted in Fig. 4 submodule A6. Lastly, the packetGen function creates a set of packets based on the routerStruct and chromosome structures and the defined format, depicted in Fig. 4 submodules A7 and A2, respectively.
The pseudocode detailed in Algorithm 3 outlines a class named mapProblem, serving as an expanded version of the PROBLEM within the PlatEMO simulation framework. The PROBLEM is the abstract base for all optimization problems, defining fundamental attributes such as population size (obj.N), number of objectives (obj.M), decision variables (obj.D), as well as the limits of these variables (obj.lower and obj.upper) and the encoding scheme (obj.encod) that orchestrates genetic processes such as mutation and crossover. The Setting method configures these properties to customize the problem to the specific requirements of the NoC simulation.
The second method, Initialization, constructs the initial population (Population) of solutions from an initial decision matrix (PopDec). This matrix is generated based on the specific number of individuals (nInd) determined earlier and is derived from the application graph (appGraph). Each individual in the population is represented by an instance of the SOLUTION class. In this setup, each SOLUTION instance stores relevant data including decision variables (obj.dec), objective values (obj.obj), and any constraint violations (obj.con).
The performance of each solution is evaluated with the CalObj method of the mapProblem class. This method updates the objective matrix (PopObj) with metrics calculated for every individual solution. These metrics are crucial for assessing the effectiveness and efficiency of the solutions in terms of the predefined objectives and constraints of the NoC simulation.
This section presents the validation and results of two distinct experiments designed to test the structure and functionalities of the proposed simulator. The objective was to explore the simplified and intuitive methodology of the project and its effectiveness in implementing essential techniques such as mapping and routing algorithms, which are common themes in NoC literature. The primary aim was to validate the high-level infrastructure and assess its performance and effectiveness in visualizing the incorporated concepts.
First Experiment: This experiment focused on mapping applications using single-objective optimization algorithms, where the simplest energy cost approximation metric was evaluated, measured in watts. The experiment was designed to test the simulator’s efficiency in optimizing energy consumption during task mapping.
Second Experiment: Utilizing the GA algorithm [65] as a foundational reference, this experiment compared different routing methods integrated into the simulator structure, including XY, Negative-First, West-First, Odd-Even, and XYX, using the same cost metric as in the first experiment.
Methodology: Both experiments involved 50 test runs, each comprising 100 individuals, totaling 50.000 evaluations. The mutation rate was set at 0.1. Eight application graphs, named
The deterministic NMAP algorithm generated 100 application mappings for direct comparison. These parameters were finalized after a series of initial calibration tests to determine the fundamental parameters of the process and to minimize the complexity in the experimental section of the article.
Experimental Setup: The experiments were conducted on two different computer setups. The first setup included an AMD Ryzen 7 5700G processor with integrated Radeon Graphics at 3.80 GHz and 16 gigabytes of RAM, running the Windows 10 Pro operating system, version 21H2. The second setup featured an Intel Core i7 processor at 3.60GHz, 32 gigabytes of RAM, and an NVIDIA GeForce RTX 3080 graphics card, operating under the Linux Ubuntu 18.04 system. For data analysis, MATLAB in its basic version and Microsoft Excel were utilized, along with SystemC and C++ for the specific simulator.
Validation of results
Validation of results: Energy and power
Validation of results: Energy and power
Validation of Results: Energy and Power Comparison by Algorithm and Graph Size. The plots illustrate the comparative energy (best) and power (mW) values associated with different algorithms, with a breakdown by the size of the graph. The vertical correlation shows that as the graph size increases, both energy and power values tend to increase across different simulators. This provides a visual representation of energy efficiency and power consumption, allowing for a quick comparison across varying conditions.
The presented graphs show the comparison between the energy (best) and power (mW) values obtained from the simulator proposed in the paper and the NoCTweak simulator, considered more realistic. In general, it is observed that the energy lines generated by the paper’s simulator closely follow those of the NoCTweak simulator, suggesting that the proposed simulator is effective in terms of energy optimization.
For Graph9, the energy lines from the paper’s simulator are closely aligned with those of NoCTweak, indicating a close match between the results of the two simulators. The power also shows a similar trend, reinforcing the reliability of the proposed simulator.
In the case of Graph16, there is a slight variation in the energy values, but the overall trends are still similar. The lines follow a similar pattern, suggesting that the paper’s simulator is capable of replicating the energy behaviors observed in NoCTweak.
For Graph25, the energy lines continue to show a good match, although there are small discrepancies in some points. However, the general trend is maintained, with the paper’s simulator demonstrating consistent performance relative to NoCTweak.
Graph36 presents a very close correspondence between the two simulators, with the energy lines almost overlapping. This indicates that the paper’s simulator is well-calibrated for this graph size, producing results that faithfully reflect the energy values of NoCTweak.
For Graph49, the energy lines again show a similar trend between the two simulators, with small variations that do not compromise the consistency of the results. The power also follows a similar pattern, corroborating the effectiveness of the proposed simulator.
In Graph64, the correspondence between the simulators is a bit more varied, but still, the overall trends are maintained. The energy lines follow a pattern that is recognizable and close to that of NoCTweak.
Graph81 continues to show a good match, with the energy lines maintaining considerable proximity. This suggests that the paper’s simulator is robust for a variety of graph sizes.
Before conducting the experiments, this validation was performed comparing the results simulated in the proposed work (energy metrics) and in the NoCTweak simulator (power). The objective was to analyze whether the behavior of the graph lines would follow the same pattern. This comparison aimed to ensure the reliability and accuracy of the proposed simulator in optimizing energy consumption during task mapping.
Performance comparison of mapping algorithms
Performance comparison of mapping algorithms
After this evaluation process, the idea was to use the simulator to study techniques for generating application mapping for the manycore. So, in the first experiment, a comparative analysis of application mapping results was conducted, as shown in Table 3. Various heuristic algorithms from the PlatEMO platform library were utilized, including the Genetic Algorithm (GA), Ant Colony Optimization (ACO) [65], and Particle Swarm Optimization (PSO) [65]. Additionally, the RANDOM algorithm [28] and the Nearly-Optimal Mapping via Nearest Neighbor Mapping (NMAP) [28] were employed to generate mappings in the NoCTweak simulator. This was complemented by Simulated Annealing (SA) [33] and Branch and Bound (BB) [34] for mappings in the NoCmap simulation environment.
Further mappings were produced using the Tabu Search (TS) algorithm [66] and a methodology described in [67], which introduced three new perturbation methods to the traditional SA algorithm, aiming to enhance the performance within the NoC. Two variants mentioned by the authors, “Proposed method-10K” and “Proposed method-1K”, were compared with other algorithms such as GA, CastNet, ILP, LMAP, NMAP, ONMAP, XY_ADB, and Map_graph in the Noxim simulator structure. Throughput was the main metric, but the potential of the approach for more comprehensive evaluations is evident, demonstrating superior efficacy than traditional methods like the GA and NMAP.
Best Performance Values (Cost) by Algorithm and Graph Size. The bar chart illustrates the comparative costs associated with different algorithms, with a breakdown by the size of the graph. It provides a visual representation of cost efficiency, allowing for a quick comparison across varying conditions.
This study aims to understand the behavior of various algorithms adapted for application mapping in NoCs as the number of tasks in the application graphs increases, consequently affecting the number of packets that will travel on the network. Table 3 shows details of this analysis. Statistical evaluations were performed, describing a simpler approximation of the energy cost and power in configurations for application graphs ranging from 9 to 100 tasks. The mean cost (Mean), the consistency of the results (Standard Deviation, Std Dev), and the efficiency of the optimization (Best Cost, Best) were analyzed. The methodology for calculating the total energy cost involves summing the individual energy costs of each packet.
Energy Consumption by Algorithm and Graph Size. This bar chart illustrates energy consumption metrics for various algorithms, segmented by the size of the processed graph. The idea was to use the proposed simulator for a test case comparing application mapping algorithms, quantitatively reflecting the energy demands of each algorithm and enabling a direct comparison across different computational scenarios.
This cost is calculated based on the count of links each packet traverses, multiplied by the initial energy cost assigned to each link, as mathematically outlined in Eq. (1).
where:
Here,
In the analysis carried out, detailed in Table 3 and illustrated in Fig. 6, note that the decision to use standard initial parameters aims to maintain a level of equality for all algorithms. Their performance seems to depend on the ability of each one to adapt to and explore the characteristics of the given solution space. In terms of exploration and intensification of the search, the GA achieved satisfactory performance in most scenarios, where its ability to generate diversity among solutions seems to be a differential. The SA[62] variant, with its double perturbation strategy, stands out particularly in larger spaces, where its search approach is less prone to being restricted to local optima. Random Search, with its non-directional and stochastic nature, serves as a baseline methodology for comparative evaluation of algorithms, offering insights into the structure of the search space and the variability of the solutions found.
Another variable to be analyzed to understand the context is the representation of the chromosome, which is essential in GAs to encode and efficiently explore the solution space. Its absence in algorithms like PSO and ACO forces them to rely on other solution representation strategies. The BB, with its systematic and careful approach in eliminating suboptimal solutions, proves to be robust, especially in contexts of increasingly complex problems, as observed in larger graphs. Population diversity is an inherent advantage of evolutionary algorithms, but other algorithms, like Tabu Search, also maintain diversity by mechanisms such as the tabu list, which avoids the repetition of previous solutions. The NMAP, being applied only to the best performance, does not reveal much about its diversity but the result suggests a specialization that can be effective within a specific context.
Sensitivity to parameters is an important consideration in choosing the algorithm. While GA and PSO may require fine-tuning, they seem to be less sensitive to parameter variations than the SA[62] variant, whose performance can be significantly influenced by the configuration of the temperature and cooling rate. The NMAP, on the other hand, shows efficiency in a case of best performance, indicating that it may outperform other methodologies in an ideal set of parameters. This suggests that each algorithm might perform optimally under specific settings, highlighting the importance of understanding and adjusting these parameters to fit the particular needs of each NoC mapping task or scenario.
Comparison routing algorithm – Cost – GA
parameters for specific NoC mapping scenarios.
Comparison Routing Algorithms – GA. The bar chart illustrates the best performance values obtained by different routing algorithms across a range of graph sizes. The vertical axis measures the performance in terms of the best value, while the horizontal axis shows the increasing complexity of the graphs from “Graph9” to “Graph100”. Each set of bars represents a unique algorithm, providing a visual comparison of their efficiency in routing within the context of genetic algorithm optimization.
Therefore, the nature of the problem itself is a determining factor in the effectiveness of the algorithms. While GA and PSO are generally presented as suitable for more problem variations, the SA[62] variant, ACO, BB, and even RANDOM stand out in situations where complexity and the size of the solution space increase. The performance of NMAP in a case of best performance suggests a specialization that can be highly efficient for problems in this context, emphasizing the notion that often the particular nature of the problem is what determines which algorithm will yield the best performance.
Continuing with the analysis of Table 3, which details power consumption (Power (mW)) measured in milliwatts, the data were obtained from the most efficient mappings, chosen based on the total cost metric. These mappings were evaluated in the NoCTweak simulation environment under standard parameters. The use of these initial parameters aims to ensure an equal starting level for all algorithms, although it may be beneficial in real scenarios, especially for non-adaptive algorithms. The analysis of the power consumption graph in Fig. 7 reveals important trends in the energy efficiency of different optimization strategies.
In graphs with a smaller number of tasks, GA and Tabu Search demonstrate similar energy consumption, indicating efficiency in less complex contexts. As the size of the graphs increases, the energy consumption grows for all algorithms due to the complexity and the greater number of operations required. In this regard, ACO and PSO show a significant rise in energy consumption, potentially attributable to the more operationally demanding characteristics of such algorithms.
In contrast, the SA[62] variant maintains more stable energy consumption, suggesting more energy-efficient intensive exploration, likely due to the use of probabilities to accept or reject solutions. The BB algorithm, despite the increase in consumption with the size of the graph, maintains a certain energy efficiency, indicating an efficient balance in exploring the solution space. On the other hand, the RANDOM algorithm stands out for its high consumption in all scenarios, illustrating the inefficiency of random search strategies in terms of energy.
The results of the comparative study of routing algorithms, shown in Table 4 and in Fig. 8, using a Mesh topology with GA as a base, reveal that energy efficiency varies according to the size of the graph. For smaller graphs (Graph9 and Graph16), the Negative-First, Odd-Even, and West-First algorithms show similar efficiencies, suggesting that in less complex networks, the routing strategy has less impact on energy consumption.
As the network expands (Graph25 onwards), Odd-Even stands out as better, suggesting that alternating between odd and even rows and columns is effective in traffic distribution and preventing congestion, typical of larger networks.
In medium to large graphs (Graph49, Graph64), strategies that prioritize movements in specific directions, like Negative-First and West-First, perform better, implying that managing the direction of traffic can reduce energy consumption by avoiding congested routers. For the largest graph evaluated (Graph100), the XYX algorithm was the most effective, suggesting that the ability to alternate between axes offers a significant advantage in optimizing routing under these conditions.
It is possible that the implementation of GA may have favored certain routing algorithms, influencing the routing optimization based on energy cost. Through its ability to explore extensive solutions and adapt strategies over generations, GA might have enhanced the performance of adaptable algorithms like Odd-Even, and aided in identifying efficient paths that conventional algorithms might not achieve. Notably, the XY algorithm, which is simple and widely cited in the literature as efficient for Mesh topology, showed limitations in this testing methodology, thus challenging the common perception of its superiority and ease of implementation. However, the influence of GA depends on its specific configuration and how it interacts with the logic of each routing algorithm.
Conclusion
This study introduced a high-level simulator for Network-on-Chip, integrated with the PlatEMO platform, focusing on many-core architectures. The shift from the traditional approach aims to evaluate mapping and routing algorithms, as well as other architectural elements, in a more straightforward and intuitive manner. The results verify the simulator’s proficiency in handling a range of algorithms for mapping and routing. The algorithms, in this context, are chosen depending on the complexity of the problem and the solution space. Algorithms such as Tabu Search and PSO show efficiency in more confined areas, whereas the SA[62] variation is notably effective in broader scenarios, and GA exhibits strong performance across a wide range of sizes.
A limitation of the XY routing algorithm was identified, which puts in question its recognized superiority and simplicity. This suggests the need for more analyses in different contexts for validation. The study also revealed variations in energy efficiency among the algorithms, with Odd-Even and XYX standing out in larger networks for efficient traffic management and congestion prevention. In summary, the study contributes to the field of NoCs, challenging pre-established ideas and deepening understanding of the algorithms, their efficiencies, and limitations. Future work intends to expand the functionalities of the approach, including more algorithms, metrics, and models of energy and performance evaluations. Finally, in the future, the goal and research tool for NoCs, emphasizing its practical and streamlined usability.
Footnotes
Acknowledgments
The authors are grateful to the grant 2023/00212-0, São Paulo Research Foundation (FAPESP) and Coordination for the Improvement of Higher Education Personnel (CAPES) – Financing Code 001.
