Multi-objective hyperparameter optimization approach with genetic algorithms towards efficient and environmentally friendly machine learning

Abstract

This paper presents a multi-objective optimization approach for developing efficient and environmentally friendly Machine Learning models. The proposed approach uses Genetic Algorithms to simultaneously optimize accuracy, time-to-solution, and energy consumption. The solution is proposed as part of an Automated Machine Learning pipeline and focuses on architecture and hyperparameter search. A customized Genetic Algorithm scheme and operators were developed, and their feasibility was evaluated using the XGBoost ML algorithm for classification and regression tasks. The results demonstrate the effectiveness of the Genetic Algorithm for multi-objective optimization, indicating that it is possible to reduce energy consumption while minimizing predictive performance losses.

Keywords

Genetic Algorithm; Machine Learning; Green AI; Multi-objective optimization

1. Introduction

Artificial intelligence (AI) has become an integral part of our lives, with its subfield machine learning (ML) finding applications in various domains. Although the benefits of AI are widely recognized, concerns have been raised about its negative ethical and environmental impact. This is mainly due to data-driven approaches, such as ML, which are trained on increasingly large datasets with significant computation times, leading to high energy consumption and carbon emissions [6,7,39,45].

Concern about how modern AI directly creates ${CO}_{2}$ emissions through the data-intensive training of ML algorithms grew after the study of Strubell et al. [43]. It showed that a common Natural Language Processing (NLP) model training and tuning pipeline using Deep Learning (DL) produced the same amount of carbon dioxide equivalent (${CO}_{2}$e) as five cars during their lifespan. This study was one of the first to quantify the environmental impact of ML models, drawing attention to the need to address their carbon emissions. Subsequently, Schwartz et al. [40] demonstrated through case studies that ML systems can significantly contribute to ${CO}_{2}$ emissions and introduced the concept of “Green AI”, an area of research focused on creating environmentally friendly and inclusive AI, on which our work is centered. Bender et al. [1] argued that the current trajectory of NLP with Large Language Models (LLMs), trained on increasingly large datasets, is unsustainable and harmful in several ways, including its massive carbon footprint. Patterson et al. [35] estimated that the training of GPT-3, on which ChatGPT is partly based, consumed 1,287 MWh and led to emissions of more than 550 tons of ${CO}_{2}$e. This number is equivalent to the emissions produced by a person taking 550 round-trips between New York and San Francisco. Another study by the Massachusetts Institute of Technology [44] warned that the massive global adoption of autonomous vehicles could significantly increase carbon emissions. These high emissions result from the significant computing workload placed on each self-driving vehicle, which runs algorithms similar to those used today: a multitask DL model that processes inputs from several high-frame-rate onboard cameras so that the car can drive on its own. The researchers suggest that, to avoid a massive increase in carbon emissions, the efficiency of computer processors needs to be improved so that they consume less energy for the same tasks, and that the algorithms themselves could be made more efficient so that they use less computing power. These studies highlight the urgent need to develop more efficient and environmentally friendly ML solutions, which is the focus of this research.

In this context, Automated Machine Learning (AutoML) has emerged as a promising approach for reducing the costs associated with ML. AutoML [17] seeks to automate the development of parts or the entire pipeline of an ML model; several frameworks and tools are available for this purpose. However, most existing approaches focus only on maximizing predictive performance without considering energy efficiency.

Based on this gap in efficient and environmentally friendly ML solutions and on the potential of AutoML, the main research question of this study is how to develop more efficient and environmentally friendly ML models by optimizing their accuracy, time to reach a solution, and energy consumption, thereby solving a multi-objective optimization problem. Our proposed approach uses Genetic Algorithms (GA) to generate the ML model, focusing on architecture and hyperparameter search as part of an AutoML solution. We implemented a customized GA scheme and operators to work on two ML models: XGBoost (XGB) for classification and regression, and Convolutional Neural Networks (CNN) for image classification. In this work, we present the GA scheme for XGB and evaluate its feasibility.

GA is the optimization technique in the formulation of our AutoML approach, as it has shown good results in optimizing different ML models [14,16,22,23,28,46,50]. Furthermore, GA is considered one of the most efficient techniques for multi-objective optimization [25] and has the potential to create stable pipelines [37]. XGB is one of the algorithms most widely used by data scientists and has been widely recognized in several ML challenges, such as the Kaggle competitions [3]. In addition, XGB can deal with missing examples and outliers and can handle large numbers of features and examples. It is also memory efficient, allowing more experiments to be performed in less time. These are desirable properties for an AutoML system, and they are very useful for the early development and proof-of-concept of the proposed GA.

The main contribution of this work is the development and evaluation of this specialized GA scheme and operators. In addition, our scheme enables the objectives of optimization to be defined according to the user’s requirements: it is possible to set priorities for time, energy, and predictive performance. Our results demonstrate the effectiveness of the GA for multi-objective optimization, indicating that it is possible to reduce energy consumption while minimizing predictive performance losses. According to our review of the literature (detailed in Section 2), this is the first multi-objective optimization approach to address the efficiency of ML models by simultaneously considering accuracy, time to solution, and energy consumption, while still allowing the priority of each objective to be chosen. Moreover, this approach represents an important and inspiring step towards the development of efficient and environmentally friendly ML. We believe that our work not only presents a novel approach, but also stimulates a crucial conversation about the importance of developing greener and more inclusive artificial intelligence.

This paper builds on our previous work [47] by incorporating new related work, improving the introduction and the background on evolutionary algorithms, and providing a detailed description of our GA scheme and operators. Furthermore, we conducted additional experiments and present new results and discussions.

The remainder of this paper is organized as follows. Section 2 provides background information on Green AI and AutoML concepts and related work. In Section 3, our proposal for multi-objective optimization is presented along with the details of the GA implementation. Section 4 presents the experimental methodology, and Section 5 discusses the results. Finally, in Section 6, we discuss the contributions and limitations of this work, as well as future research directions.

2. Background and related works

This section briefly introduces the domains of Automated Machine Learning and Green AI, with some background information and the main work related to our proposal. We conducted a literature search using the Rapid Literature Review (RLR) methodology [42]. To find publications, searches were performed on the CAPES Portal (https://www-periodicos-capes-gov-br.ez24.periodicos.capes.gov.br/), which includes several databases, in a time window from 2011 to 2023. The following keywords were searched for in titles: “multi-objective optimizations AND energy saving AND Machine Learning” OR “multi-objective optimizations AND energy-efficient AND Automatic Machine Learning” OR “Genetic Algorithms AND energy saving AND Automatic Machine Learning”. The first string returned the most results (56). After reading the titles and abstracts, we found five works of interest, but none included a multi-objective optimization approach for ML models that simultaneously considers accuracy, time to solution, and energy consumption. The grey literature was identified using Google Search, and all the selected results are presented in this section.

Schwartz et al. [40] introduced the concept of “Green AI”: AI research that is more environmentally friendly and inclusive. In contrast, Red AI refers to AI research that seeks to improve accuracy through massive computational power, while being both environmentally unfriendly and prohibitively expensive, raising barriers to participation. The costs of current state-of-the-art AI research limit the ability of many researchers to study it and of practitioners to adopt it. The use of massive data and large amounts of computation in tuning hyperparameters of computationally intensive models creates barriers for many researchers to reproduce the results of these models and train their models on the same setup [29]. Research on Red AI has yielded valuable scientific contributions to the field; however, it is time to increase the prevalence of Green AI. Creating efficiency in AI research will decrease its carbon footprint and increase its inclusivity, as AI studies should not require large budgets or computational power. Green AI refers to research that yields novel results while considering computational cost [12]. The aforementioned references [1,35,40,43,44] highlight the need to develop more efficient and environmentally friendly ML.

AutoML is an approach that has shown great promise and has been listed among the AI trends for the coming years (Gartner: https://www.gartner.com/smarterwithgartner/top-trends-on-the-gartner-hype-cycle-for-artificial-intelligence-2019). An ML model trained on one particular dataset would not work well on another. Therefore, a separate ML model is needed for each dataset, each requiring data preparation, feature engineering, algorithm selection, and hyperparameter tuning. Most of these steps require trial and error, making the process inefficient and costly (in economic, time, and environmental terms). This has motivated the development of AutoML [17], which aims to automate parts or even the entire pipeline for building an ML model. The pipeline can involve several processes, including data preparation, attribute engineering, model construction, and evaluation. The model-building process consists of the selection of algorithms and optimization methods. The optimization methods are further divided into hyperparameter optimization (HPO) and architecture optimization (AO), or Neural Architecture Search (NAS) when it involves only neural network models [17]. The former (HPO) concerns the parameters related to training (e.g., learning rate and batch size), and the latter (AO) concerns the parameters related to the model (e.g., the number of layers for neural network architectures or the number of trees in XGB). A review of the literature on AutoML can be found in [10,11,17,32].

The works found in the literature are distinguished mainly by the optimization technique used in their formulation, the most common being Bayesian [13], Reinforcement Learning [18], Grid and Random Search [27] and Evolutionary Algorithms (EA) [37], such as GA and swarm intelligence [49]; by the type of input data (tabular, text, images, or time series); and by the pipeline processes that are involved [11]. However, most of them share a single-objective formulation of the optimization process, mainly maximizing the predictive performance (a Red AI approach) without considering energy consumption, unlike our proposal, which is Green AI centric.

Among the optimization techniques used in the formulation of AutoML approaches, EA-based methods, more specifically Genetic Algorithms [19], have shown good results in optimizing different ML models [14,16,22,23,28,46,50]. GA is considered one of the most efficient techniques for multi-objective optimization [25] and, according to [37], it is one of the potential choices for AutoML because it can create stable pipelines and is easier to implement; it has been adopted by FEDOT [33]. GAs, especially with co-evolutionary approaches [30] (in which more than one evolutionary process occurs simultaneously), are also considered the best option by Amazon because, according to [9], they yield more general-purpose approaches.

Several works with AutoML are for DL models and use GA as an optimizer [8,16,28,38,41,46,48]. This is justified by the fact that, as previously mentioned, these algorithms have achieved surprising results for unstructured data. Furthermore, these models require adjustment of a large number of hyperparameters. Moreover, eXtreme Gradient Boosting (XGB [4]) is very effective for structured data. It is among the most used ML algorithms for all types of data science problems and is responsible for solving and winning most of Kaggle’s challenges [24]. The work [11] presents a benchmark of supervised AutoML, performing a comparison study with hundreds of computational experiments based on three scenarios: General Machine Learning (GML), DL, and XGB.

There is a growing number of tools for AutoML: Auto-Weka, Auto-sklearn [13], TPOT [34], Autokeras, Auto PyTorch, H2O [27], rminer (https://cran.r-project.org/web/packages/rminer/rminer.pdf) [11], and others [33].

Multi-objective optimization is implemented in some existing AutoML solutions, and various criteria can be involved in its design. AutoxgboostMC [36] proposed a Bayesian approach to optimize the predictive performance, fairness, and interpretability of the HPO process for XGB. The authors of [37] propose an EA for model design, to be implemented as part of the FEDOT AutoML framework. Two optimization objectives were used: solution quality and chain complexity. Hong et al. [20] proposed the use of a multi-objective evolutionary algorithm for pruning neural networks that considers the accuracy of the pruned network and the available computational resources of the environment where the network will run. The proposed hardware-aware multi-objective evolutionary network pruning method evaluates the accuracy, latency, and amount of memory used, but not energy consumption. Karl et al. [26] presented a survey of multi-objective hyperparameter optimization. Despite the number of studies discussed in that survey, when energy consumption is considered as an optimization objective, the work is mostly on hardware NAS for efficient neural accelerator design. According to our review of the literature, [21] is the only work that considers energy in NAS for software. It proposes a framework for Multi-Objective Neural Architectural Search (MONAS) that employs reward functions considering the predictive accuracy and energy consumption when searching for neural network architectures in a CNN for image classification. However, MONAS adopts a two-stage framework: in the generation stage, a Recurrent Neural Network (RNN) trained with reinforcement learning generates a hyperparameter sequence for the CNN. In the evaluation stage, an existing CNN model is trained as a target network with the hyperparameters output by the RNN. The accuracy and energy consumption of the target network are the rewards.

Comparing our work with the literature review, we see that most of the works are single-objective and existing multi-objective optimization approaches, with the exception of [21], do not consider the energy consumption of HPO and NAS. In contrast, our multi-objective proposal using GA considers predictive performance, time to solution, and energy consumption, and allows the priorities to be chosen for each of them. In addition, in our design, the strategy is a joint hyperparameter optimization and architecture optimization. Most NAS methods fix the same setting of training-related hyperparameters throughout the search stage. After the search, the hyperparameters of the best-performing architecture are further optimized. However, this paradigm may result in suboptimal results because different architectures tend to fit different hyperparameters [17].

3. Multi-objective optimization with GA

Among the optimization techniques used in AutoML formulation, including hyperparameter and architecture optimization, Evolutionary Algorithms [37], such as GA [15] and swarm intelligence [49], have shown good results in optimizing different ML models. There is no consensus in the evolutionary computation community on a definition that differentiates Genetic Algorithms from other evolutionary computation methods. In this work, we follow the elements common to the definitions given in [5,19,31]. Under these definitions, we developed our proposed multi-objective GA, as detailed in Section 3.2.

3.1. Basic elements of genetic algorithms

Genetic Algorithms are a meta-heuristic widely used in search and optimization problems, inspired by the process of evolution of a population of individuals through reproduction and natural selection [19]. Mathematically, GAs mimic the mechanisms of natural evolution of species, comprising the processes of genetic evolution of populations, survival, and adaptation of individuals [5]. The methods called GA have the following elements in common: populations of chromosomes, selection according to fitness, crossover to produce new offspring, and random mutation of new offspring [31].

In GA, the reproductive process involves the recombination of genetic material (chromosomes) from two “parent individuals”, known as crossover, as well as genetic mutations that may occur during the formation of offspring chromosomes. Natural selection determines the individuals who will reproduce and be part of a new population. Chromosomes in a GA population are typically represented as bit strings, with each locus having two possible alleles: 0 or 1. Each chromosome is considered a point in the search space of candidate solutions, and the GA processes the populations of chromosomes, replacing one population with another. A fitness function assigns a score to each chromosome based on its ability to solve the problem [31].

First, a population of solutions is initialized and evaluated. In each generation (iteration), better solutions are probabilistically selected as parents to generate new offspring solutions through crossover and mutation. The resulting offspring solutions are evaluated and inserted into the population, while some solutions are removed through survival selection to maintain a constant population size. If the stopping criterion is met, the best solution is returned; otherwise, the process repeats with the next iteration [26].
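The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the scheme proposed in this paper: the function name `run_ga`, the one-point crossover, the 1% bit-flip mutation rate, and the toy fitness function (the number of ones in the chromosome) are all hypothetical choices made only to keep the example self-contained.

```python
import random

def run_ga(fitness, n_bits, pop_size=20, generations=30, seed=0):
    """Generic GA loop: initialise, evaluate, select, vary, replace (illustrative only)."""
    rng = random.Random(seed)
    # Initialise a random population of bit-string chromosomes.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        # A tiny epsilon keeps the selection weights valid even if every score is zero.
        weights = [s + 1e-9 for s in scores]
        offspring = []
        while len(offspring) < pop_size:
            # Better solutions are probabilistically selected as parents.
            parent_a, parent_b = rng.choices(population, weights=weights, k=2)
            point = rng.randrange(1, n_bits)            # one-point crossover
            child = parent_a[:point] + parent_b[point:]
            # Bit-flip mutation with an assumed 1% rate per allele.
            child = [1 - g if rng.random() < 0.01 else g for g in child]
            offspring.append(child)
        # Survival selection keeps the population size constant.
        merged = population + offspring
        population = sorted(merged, key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)

# Toy usage: maximise the number of ones in a 16-bit chromosome.
best = run_ga(sum, 16)
```

Because survivors are drawn from parents and offspring together in this sketch, the best fitness in the population cannot decrease from one generation to the next.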

These algorithms have a lower probability of getting stuck in a local optimum than most optimization algorithms because they perform a global search in the solution space. They are also very robust and stable, and they are straightforward to parallelize: since they are population based, multiple solutions can be tested simultaneously (limited by the hardware available). To apply GA, we must define specially designed genetic operators (crossover, mutation, and selection) and fitness functions. These operators determine how individuals are processed by the GA. They also offer the possibility of considering single- or multiple-objective functions.

When applying GAs to a multi-objective problem, the only component that requires change is the selection step (selecting parents and selecting survivors for the next generation). In single-objective optimization, the objective function can be used to rank individuals and select the better ones. In multi-objective optimization, many solutions represent different trade-offs between the objectives, and we are interested in finding a good approximation of the set of best trade-offs [26].

The next section presents the GA design choices for a multi-objective approach centered on Green AI.

3.2. GA design

The solution proposed in this work will be part of an AutoML framework to be developed within the project to which this work belongs. The framework aims to automate the model construction process by focusing on HPO and AO/NAS. The GA design was developed to work on two ML models: XGB for classification and regression (tabular data) and CNN for image classification. This section presents the GA scheme for XGB (HPO + AO). The general GA workflow is shown in Fig. 1 and is detailed below. Hyperparameter and architecture tuning are treated as optimization problems. The objectives we want to optimize are the predictive performance (accuracy or MSE), the energy, and the time to solution, which are combined in the fitness function. The GA method (Fig. 1) and its operators are designed as follows:

Fig. 1.

Flowchart for the GA method workflow implemented in this work.

Genetic evolution starts with the creation of an initial population of randomly generated solutions using the functions generatePopulation and generateIndividual. In our case, a solution is a combination of hyperparameters that defines the ML model’s architecture and learning process. It is encoded in a binary format as an integer array of zeros and ones, referred to as a chromosome, with each element referred to as an allele. generatePopulation initializes an array of pointers of length p that ‘store’ the individuals created by the generateIndividual function, which generates a random array of length l, where p is the number of individuals in the population and l is the minimum number of “bits” necessary to represent the highest value of each hyperparameter.
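As an illustration of this initialization step, the following Python sketch mirrors the two functions named above. It is only a sketch under assumptions: the GA in this work is implemented in C++ with an array of pointers, whereas the sketch uses plain lists, and the values p = 20 and l = 22 are the ones reported in Section 4.

```python
import random

def generate_individual(length, rng):
    """Return a random chromosome: a list of `length` alleles, each 0 or 1."""
    return [rng.randint(0, 1) for _ in range(length)]

def generate_population(p, length, rng):
    """Return a list holding p randomly generated individuals."""
    return [generate_individual(length, rng) for _ in range(p)]

# Assumed values: p = 20 individuals, l = 22 alleles (see Section 4).
rng = random.Random(42)
population = generate_population(20, 22, rng)
```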

After creating the population, each individual is tested and assigned a fitness score. This is the most time- and energy-consuming part of the evolutionary process. Fitness scores, defined by the authors, are given by Equation (1) for XGB classification and Equation (2) for XGB regression:

$$f_{i} = \alpha \cdot (a_{i} \cdot 100) + \beta \cdot \frac{1}{e_{i} / \sum_{j=1}^{n} e_{j}} + \gamma \cdot \frac{1}{t_{i} / \sum_{j=1}^{n} t_{j}} \qquad (1)$$

where $a_{i}$ is the accuracy, $e_{i}$ is the energy in Joules, and $t_{i}$ is the time in seconds of the i-th individual; $\alpha$, $\beta$, and $\gamma$ are the weights attributed to the relevance of the accuracy, energy, and time, respectively; and $n$ is the total number of individuals in the population.

$$f_{i} = \alpha \cdot \frac{1}{m_{i} / m_{min}} + \beta \cdot \frac{1}{e_{i} / e_{min}} + \gamma \cdot \frac{1}{t_{i} / t_{min}} \qquad (2)$$

where $m_{i}$ is the MSE of the i-th individual, $m_{min}$ is the lowest MSE in the current population, $e_{i}$ is the energy in Joules of the i-th individual, $e_{min}$ is the lowest energy in the current population, $t_{i}$ is the time in seconds of the i-th individual, and $t_{min}$ is the lowest time in the current population; $\alpha$, $\beta$, and $\gamma$ are the weights attributed to the relevance of the MSE, energy, and time, respectively.
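Equations (1) and (2) can be transcribed almost directly into code. The sketch below is illustrative and rests on assumptions: the function names are hypothetical, accuracy is assumed to lie in [0, 1] (so that multiplying by 100 yields a percentage), all energies, times, and MSEs are assumed to be strictly positive, and the scale of the weights is left to the caller. Note that 1 / (e_i / Σe_j) is computed as Σe_j / e_i, which is the same quantity.

```python
def fitness_classification(acc, energy, time, alpha, beta, gamma):
    """Equation (1): one fitness score per individual of the population."""
    e_total, t_total = sum(energy), sum(time)
    return [
        alpha * (a * 100) + beta * (e_total / e) + gamma * (t_total / t)
        for a, e, t in zip(acc, energy, time)
    ]

def fitness_regression(mse, energy, time, alpha, beta, gamma):
    """Equation (2): each term is 1.0 for the best individual on that objective."""
    m_min, e_min, t_min = min(mse), min(energy), min(time)
    return [
        alpha * (m_min / m) + beta * (e_min / e) + gamma * (t_min / t)
        for m, e, t in zip(mse, energy, time)
    ]

# Made-up measurements for two individuals (not results from the paper).
scores_c = fitness_classification([0.5, 0.75], [100.0, 300.0], [10.0, 30.0], 1, 1, 1)
scores_r = fitness_regression([2.0, 4.0], [100.0, 300.0], [10.0, 30.0], 1, 1, 1)
```

In both functions a higher score is better: lower energy, time, or MSE increases the corresponding term.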

Selection: This operator implements a simple roulette selection, where the probability of being selected is directly proportional to the fitness score of the individual; the higher the score, the higher the chance of being selected. Two individuals are selected and are referred to as Parent A and Parent B. Two offspring, Offspring A and Offspring B, are generated from these two individuals, as described below for the Crossover and Mutation operators.
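A roulette selection of this kind can be sketched as follows. The function name `roulette_select` is hypothetical, scores are assumed to be non-negative with a positive sum, and the two parents are drawn independently, so the same individual may be picked twice; the text does not state whether the original implementation prevents that.

```python
import random

def roulette_select(population, scores, rng):
    """Pick one individual with probability proportional to its fitness score."""
    pick = rng.uniform(0, sum(scores))      # where the roulette ball lands
    cumulative = 0.0
    for individual, score in zip(population, scores):
        cumulative += score
        if pick < cumulative:
            return individual
    return population[-1]                   # guard against floating-point round-off

# Illustrative usage with made-up scores.
rng = random.Random(0)
population = ["a", "b", "c"]
scores = [1.0, 3.0, 6.0]
parent_a = roulette_select(population, scores, rng)
parent_b = roulette_select(population, scores, rng)
```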

Crossover: This function may or may not be applied, as determined by the probability of crossover parameter: if it is set to 100, crossover is always applied; if it is set to 0, it is never applied. If applied, a random allele is selected as the crossover point, and all the alleles before it are copied directly to the offspring solutions, that is, from Parent A to Offspring A and from Parent B to Offspring B. The remaining alleles, starting at the marked allele, are copied from the other parent (Parent A to Offspring B and Parent B to Offspring A). If crossover is not applied, the selected parent chromosomes are copied as they are to the two new offspring solutions.
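The crossover operator described above might look like the following sketch. The function name and signature are hypothetical; the probability is expressed as a percentage between 0 and 100, as in the text.

```python
import random

def crossover(parent_a, parent_b, crossover_prob, rng):
    """One-point crossover applied with probability crossover_prob (in percent)."""
    if rng.random() * 100 >= crossover_prob:
        # Crossover not applied: offspring are plain copies of their parents.
        return parent_a[:], parent_b[:]
    point = rng.randrange(len(parent_a))    # index of the marked allele
    # Alleles before the point come from the same-named parent,
    # alleles from the point onward come from the other parent.
    offspring_a = parent_a[:point] + parent_b[point:]
    offspring_b = parent_b[:point] + parent_a[point:]
    return offspring_a, offspring_b

# Illustrative usage with two easily distinguishable parents.
rng = random.Random(1)
offspring_a, offspring_b = crossover([0] * 22, [1] * 22, 80, rng)
```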

Mutation functions are applied to both offspring. The probability of mutations occurring is given by the likelihood of mutation parameter, and the operator evaluates the probability of mutation for each of the alleles sequentially; if a mutation occurs, the value of the allele is inverted, that is, 0 to 1 and 1 to 0; every time a mutation occurs, the value of the probability of mutation for that offspring is divided by 2. After the mutation to the current allele, the operator evaluates the remaining alleles with a reduced probability of mutation, reducing the chance of another mutation in the same offspring but still allowing multiple mutations.
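The decaying mutation probability can be sketched as below. The function name is hypothetical, and the probability is again treated as a percentage; the halving after each flip follows the description above.

```python
import random

def mutate(chromosome, mutation_prob, rng):
    """Flip alleles sequentially; halve the mutation probability after every flip."""
    prob = mutation_prob                    # in percent, local to this offspring
    mutated = chromosome[:]                 # leave the input chromosome untouched
    for i in range(len(mutated)):
        if rng.random() * 100 < prob:
            mutated[i] = 1 - mutated[i]     # invert the allele: 0 -> 1, 1 -> 0
            prob /= 2                       # further mutations become less likely
    return mutated

# Illustrative usage with the 10% probability reported in Section 4.
rng = random.Random(7)
original = [0] * 22
child = mutate(original, 10, rng)
```

With a starting probability of 10, a second flip in the same offspring is evaluated at 5, a third at 2.5, and so on, so multiple mutations remain possible but increasingly unlikely.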

The two new offspring are then added to the new population pool along with the best n solutions of the current population, where n is given by the elitism parameter. This preserves the best solutions found up to that generation.

The generation of new individuals is repeated until the new population pool is full; once the new population is full, it is evaluated. This process is repeated until the stop condition is reached.
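The assembly of a new population with elitism, and the outer generational loop, can be sketched as follows. Everything here is a hedged illustration: `make_offspring` is a hypothetical callable that wraps selection, crossover, and mutation and returns two offspring, `evaluate` is a hypothetical callable that returns one fitness score per individual, placing the elite individuals in the pool first is an assumption, and a fixed number of generations is assumed as the stop condition.

```python
def next_generation(population, scores, elitism, make_offspring):
    """Build the new pool: the `elitism` best individuals plus pairs of offspring."""
    pop_size = len(population)
    ranked = sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)
    new_pool = [list(individual) for _, individual in ranked[:elitism]]
    while len(new_pool) < pop_size:
        # Each call is assumed to return two offspring (A and B).
        new_pool.extend(make_offspring(population, scores))
    return new_pool[:pop_size]              # trim if the last pair overfilled the pool

def evolve(population, evaluate, make_offspring, elitism, generations):
    """Evaluate and replace the population for a fixed number of generations."""
    for _ in range(generations):
        scores = evaluate(population)       # the costly step: train and measure
        population = next_generation(population, scores, elitism, make_offspring)
    return population
```

With a population of 20 and an elitism of 4, for example, each generation would need eight pairs of offspring to fill the pool.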

The details of the GA scheme presented in this section are for XGB (HPO + AO) for the classification and regression tasks on structured (tabular) data. However, as mentioned, the GA method was also designed for CNN (HPO + NAS) for image classification. For this, the selection, mutation, and generation of the population presented above do not change. However, they must be applied to larger chromosomes, and the functions responsible for decoding the chromosome and testing individuals of the population are specific to XGB. The main change is in the fitness function. Furthermore, this GA can be extended for use in other ML models with minimal additions to the code, preserving the main operators, that is, the selection, mutation, and crossover functions, as they are; for tasks similar to those presented in this work (classification and regression), even the fitness evaluation can be used as is. In addition, the strategy in our design is a joint hyperparameter and architecture optimization, in contrast to the two-stage paradigm of most NAS methods discussed in Section 2 [17].

4. Experimental setup

To evaluate the feasibility of multi-objective optimization using the proposed GA method, experiments with the XGB algorithm were performed using two datasets and 11 experimental configurations.

XGB [4] is an implementation of the gradient-boosted tree algorithm. Boosting is an ensemble learning technique that uses a set of base learners to improve the stability and effectiveness of an ML model. It is called gradient boosting because it uses a gradient descent algorithm to minimize the loss when adding new models. The central idea of boosting is the sequential application of homogeneous ML algorithms, where each attempts to improve the stability of the model by focusing on the errors made by the previous one. XGB is one of the algorithms most widely used by data scientists, mainly in forecasting problems involving structured data, and has been widely recognized in several ML and data mining challenges, such as Kaggle competitions [3]. In addition, it has desirable properties for an AutoML system, as it can deal with missing observations and outliers and can handle a large number of features and examples in a very optimized manner. Additionally, it is numerically stable and memory efficient, allowing more experiments to be performed in less time, which is very useful for the early development and proof-of-concept of the proposed GA. Nevertheless, HPO is helpful for XGB because it is highly configurable, with a large number of hyperparameters for regularization and optimization.

The dataset used for the classification task is Higgs Boson (https://archive.ics.uci.edu/ml/datasets/HIGGS), whose objective is to classify whether or not an event is a Higgs boson decay. For regression, we used Seoul Bike Trip Duration (https://www.kaggle.com/saurabhshahane/seoul-bike-trip-duration-prediction), whose objective is to predict the duration of trips on bicycles rented in Seoul, using trip data combined with weather data. Higgs Boson has 11 million examples, 2 classes, and 28 numerical features. Seoul Bike Trip Duration has 9 million examples and 24 numerical features. Neither dataset has missing values.

The GA tunes a combination of four XGB parameters, looking for the best multi-objective trade-off. The HPO (N_jobs, ETA) and AO (N_estimators, max_depth) parameters are optimized jointly. The parameter N_jobs defines the number of processor cores used in parallel to execute the algorithm. When N_jobs = −1, all cores are used; when N_jobs = 1, only one core is used. The ETA parameter is the learning rate and defines the correction made at each boosting step. N_estimators is the number of gradient-boosted trees, and max_depth defines the maximum tree depth for base learners. XGB splits up to the specified max_depth and then prunes the tree backward, removing splits beyond which there is no positive gain.

For the tests, the total length of the chromosome was fixed at 22 alleles: the first six represent the ETA (ranging from 0.01 to 0.64); the next four are used for N_jobs (1 to 16), followed by nine alleles for N_estimators (9 to 520); and the last three are used for max_depth (3 to 10). The GA parameters were set as follows: population size = 20; number of generations = 30 (for the classification) and 20 (for the regression); elitism = 4; probability of crossover = 80; probability of mutation = 10.
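To make the chromosome layout concrete, the sketch below decodes a 22-allele chromosome into the four hyperparameters. The allele counts and value ranges come from the text; the decoding itself is an assumption: each field is read as an unsigned integer with the most significant bit first and shifted linearly onto its range (for example, six bits give 0 to 63, mapped to 0.01 to 0.64). The function names and dictionary keys are hypothetical, and the original C++ decoder may differ.

```python
def bits_to_int(bits):
    """Interpret a list of 0/1 alleles as an unsigned integer (MSB first, assumed)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

def decode_chromosome(chromosome):
    """Map a 22-allele chromosome to XGB hyperparameters (assumed linear mapping)."""
    if len(chromosome) != 22:
        raise ValueError("expected a chromosome of 22 alleles")
    return {
        "eta": (bits_to_int(chromosome[0:6]) + 1) / 100,      # 0.01 .. 0.64
        "n_jobs": bits_to_int(chromosome[6:10]) + 1,          # 1 .. 16
        "n_estimators": bits_to_int(chromosome[10:19]) + 9,   # 9 .. 520
        "max_depth": bits_to_int(chromosome[19:22]) + 3,      # 3 .. 10
    }

lowest = decode_chromosome([0] * 22)
highest = decode_chromosome([1] * 22)
```

Under this mapping, the all-zeros and all-ones chromosomes decode to the lower and upper bounds of every range stated above, which is a quick consistency check on the 6 + 4 + 9 + 3 = 22 allele layout.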

The developed scheme enables the multi-objective optimization to be defined according to the user requirements. It is possible to set priorities for time, energy, and predictive performance. This defines the configuration of the GA fitness function (a combination of maximizing accuracy or minimizing the MSE, minimizing time, and minimizing the energy). These possibilities define the methodology for the experiments, resulting in 11 different experimental configurations, as listed in Tables 1 and 2.

Table 1 shows the combinations of relevance, in percent, attributed to model accuracy, training time, and energy consumed when training XGB for the classification task (XGBC). Table 2 shows the combinations for XGB regression (XGBR). For example, the combination 1,0,0 in configuration 1 of Table 2 indicates that only the metric with value 1 is considered (having 100% of the relevance). In this case the metric is the MSE, which is equivalent to a Red AI approach, where only the predictive performance matters.

Table 1

Configurations of the relevance for the experiments performed with XGB for classification

Configuration	Accuracy	Time	Energy
1	98	1	1
2	96	2	2
3	97	2	1
4	97	1	2
5	49	50	1

Table 2

Configurations of the relevance for the experiments performed with XGB for regression

Configuration	MSE	Time	Energy
1	1	0	0
2	95	0	5
3	95	5	0
4	50	0	50
5	50	50	0
6	98	1	1

The test environment was a computer with an Intel Core i7-8700 CPU (6 cores and 12 threads, 3.20 GHz), 64 GB of DDR4 memory (4 × 16 GB), a 2 TB Seagate SATA HDD and a 240 GB Corsair MP510 SSD for storage, and an Nvidia GeForce RTX 2080 Ti GPU with a base clock of 1350 MHz, a boost clock of 1545 MHz, and 11 GB of GDDR6 memory with ECC off, running Ubuntu 20.04.

XGB was implemented and executed with Python 3.8. The time and energy were measured with the perf tool. The GA was implemented in C++. As a commitment to reproducibility, all code and configurations are available at: https://github.com/comcidis/GA_ML_XGB.git.
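One possible way to collect such measurements from Python is sketched below. The text states only that perf was used; the choice of the `power/energy-pkg/` event (Intel RAPL package energy), system-wide counting (`-a`), the `;`-separated output (`-x`), and the helper names are all assumptions, and running it requires perf and sufficient permissions on the machine.

```python
import subprocess
import time
from typing import List, Tuple

# Assumption: package energy via Intel RAPL; the source does not name the perf event used.
ENERGY_EVENT = "power/energy-pkg/"


def build_perf_command(workload: List[str], sep: str = ";") -> List[str]:
    """Build a `perf stat` command that counts the energy event while `workload` runs."""
    return ["perf", "stat", "-a", "-x", sep, "-e", ENERGY_EVENT, "--"] + workload


def parse_energy_joules(perf_stderr: str, sep: str = ";") -> float:
    """Extract the energy value from perf's separator-delimited output (value;unit;event;...)."""
    for line in perf_stderr.splitlines():
        fields = line.strip().split(sep)
        if len(fields) >= 3 and fields[2] == ENERGY_EVENT:
            return float(fields[0])  # raises ValueError if perf reports "<not supported>"
    raise ValueError("energy event not found in perf output")


def measure(workload: List[str]) -> Tuple[float, float]:
    """Run `workload` under perf and return (elapsed seconds, energy in joules)."""
    start = time.perf_counter()
    result = subprocess.run(
        build_perf_command(workload), capture_output=True, text=True, check=True
    )
    elapsed = time.perf_counter() - start
    return elapsed, parse_energy_joules(result.stderr)  # perf stat reports on stderr
```

A call such as `measure(["python3", "train_xgb.py"])` (a hypothetical script name) would return the two cost values that enter the fitness of one individual.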

5. Results and discussion

Figure 2 shows the fitness of each individual (bars) and the population average (line) for a sample test with a population of 10 individuals over nine generations for XGBC. Generation 0 is the initial, randomly generated population. For this test, we used an elitism of 1, where only the best solution of each generation is passed on to the next, due to the small size of the population. The graph shows the evolution: as new generations are created, the fitness function improves (the higher, the better). This can be seen for both the best individual (blue bar) and the general population (AVG line). It shows that the population can be improved even in this small example with 10 individuals, which is used for visualization purposes only.

Fig. 2.

Example of the evolution process with a population of 10 individuals.
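The effect of elitism in this example can be reproduced with a generic GA loop. The sketch below is not the customized scheme or operators developed in this work: tournament selection, one-point crossover, and per-allele bit-flip mutation are standard stand-ins chosen for illustration, and the fitness is any callable supplied by the user (in the paper it would involve training XGB, so a real implementation would cache evaluations). All names are hypothetical.

```python
import random
from typing import Callable, List

Chromosome = List[int]


def next_generation(
    population: List[Chromosome],
    fitness: Callable[[Chromosome], float],
    elitism: int,
    p_crossover: float,
    p_mutation: float,
    rng: random.Random,
) -> List[Chromosome]:
    """Create one new generation, copying the `elitism` best individuals unchanged."""
    ranked = sorted(population, key=fitness, reverse=True)
    new_population = [individual[:] for individual in ranked[:elitism]]
    while len(new_population) < len(population):
        # Binary tournament selection (assumed operator).
        parent_a = max(rng.sample(population, 2), key=fitness)
        parent_b = max(rng.sample(population, 2), key=fitness)
        child = parent_a[:]
        if rng.random() < p_crossover:
            cut = rng.randrange(1, len(child))  # one-point crossover (assumed operator)
            child = parent_a[:cut] + parent_b[cut:]
        # Bit-flip mutation applied per allele (assumed interpretation of the mutation rate).
        child = [1 - gene if rng.random() < p_mutation else gene for gene in child]
        new_population.append(child)
    return new_population


def evolve(
    fitness: Callable[[Chromosome], float],
    pop_size: int = 10,
    length: int = 22,
    generations: int = 9,
    elitism: int = 1,
    p_crossover: float = 0.8,
    p_mutation: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """Return the best fitness of generation 0 (random) and of each following generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_per_generation = [max(map(fitness, population))]
    for _ in range(generations):
        population = next_generation(population, fitness, elitism, p_crossover, p_mutation, rng)
        best_per_generation.append(max(map(fitness, population)))
    return best_per_generation
```

Because the best individual is always copied into the next generation, the best fitness returned by `evolve` can never decrease from one generation to the next, which is the behavior visible for the best individual in the figure.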

Fig. 3.

Comparison of the accuracy and time for the XGB for classification.

Figures 3, 4, 5 and 6 show the accuracy/MSE, time and energy, for the best solution for each configuration (Tables 1 and 2).

Figure 3 compares the accuracy and time of the XGBC. Configuration 1 focuses mainly on accuracy (Table 1) and, as expected, had the best accuracy (76%) but the worst time, at 266.99 seconds. For configuration 2, the relevance of both time and energy was increased slightly. The results showed a significant decrease in time and a reduction of only one percentage point in accuracy. Configurations 3 and 4 also show time reductions, although smaller than that of configuration 2, with a smaller loss in accuracy. Configuration 5, which assigns high relevance to time, had the worst accuracy (70.9%) and the best time (26.28 s), which is less than 10% of the time spent by configuration 1. These results show the ability of the GA to perform multi-objective optimization while allowing priorities to be defined, and that it optimizes the defined criteria well.

Fig. 4 shows that the energy followed the same behavior as the time.

Fig. 4.

Comparison of the accuracy and energy for the XGB for classification.

Fig. 5.

Comparison of the MSE and time for the XGBR.

Fig. 6.

Comparison of the MSE and energy for the XGBR.

Figure 5 compares the MSE and the time for the XGBR, and Fig. 6 shows the MSE and energy. The lower these values, the better. Configuration 1 has 100% relevance for the MSE (Red AI approach) and obtained the lowest MSE, but the third highest time (174.46 seconds) and the highest energy consumption. The two highest times were obtained by configurations 2 and 4 (175.52 and 176.86 seconds, respectively); this was expected, since they focused on energy but not on time. However, the largest difference compared to configuration 1 was less than 2.5 seconds. These configurations had a lower energy consumption than configuration 1, with the lowest overall obtained by configuration 4, which had a 50/50 split between MSE and energy. However, this reduction came at the cost of a higher MSE.

Configurations 3 and 5, which focused on reducing time without considering energy, achieved a reduction in time, with the lowest obtained by configuration 5 (119.71 seconds). This is more than 54 seconds lower than in configuration 1. However, it had the highest MSE, almost 2.5 times the MSE of configuration 1. This is mainly because the XGB algorithm is already highly optimized. As configurations 1, 3, and 5 used all available CPU cores, there was little or no room to reduce the time by increasing the parallelism, and the solution found by the GA was to reduce the number of estimators. For configuration 1, the number of estimators was between 450 and 500, while in configuration 5 it was between 200 and 250, allowing the models to train faster but increasing the MSE.

In comparison, configurations 2 and 4 reduced the energy by reducing the number of threads used from 12 to 4. Interestingly, configuration 5 had the lowest energy consumption, resulting from the significant time reduction.

Configuration 6 showed an interesting result: it had the third best MSE, very close to that of configuration 3 (the second best), but with lower energy and time than configuration 3, achieving the third best energy and the second best time overall.

These results were compared with a previous study co-authored by some of the authors of this work [2], which trained XGB on the same Higgs dataset, with 11 million examples, and searched for energy savings by tuning the hyperparameters of the algorithm empirically and manually. The comparison here is with the results of [2] obtained with the default parameters of XGB and with the configuration that achieved the best accuracy. The latter achieved the same accuracy as our configuration 1, but in a shorter time and with lower energy consumption (223.86 seconds and 13095.18 J, compared with our 266.994 seconds and 16429.67 J). For the default configuration, the accuracy of [2] was 74%, with a time of 108.36 seconds and an energy of 5647.08 J. Our configuration 2 had the closest accuracy (75%), with a total time of 56.143 seconds and an energy of 2803.16 J, about half of the time and energy consumption.
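The comparison with [2] reduces to a few ratios, which can be checked directly from the reported figures. The snippet below uses only the numbers quoted in the text; the variable and function names are hypothetical.

```python
from typing import Dict

# Values reported in the text (time in seconds, energy in joules).
manual_best_accuracy = {"time_s": 223.86, "energy_j": 13095.18}  # best-accuracy result of [2]
ga_configuration_1 = {"time_s": 266.994, "energy_j": 16429.67}
manual_default = {"time_s": 108.36, "energy_j": 5647.08}         # default-parameter result of [2]
ga_configuration_2 = {"time_s": 56.143, "energy_j": 2803.16}


def ratio(ours: Dict[str, float], theirs: Dict[str, float]) -> Dict[str, float]:
    """Return our value divided by theirs for each metric (below 1 means we used less)."""
    return {metric: ours[metric] / theirs[metric] for metric in ours}


same_accuracy = ratio(ga_configuration_1, manual_best_accuracy)
close_accuracy = ratio(ga_configuration_2, manual_default)

for label, result in [("config 1 vs best of [2]", same_accuracy),
                      ("config 2 vs default of [2]", close_accuracy)]:
    print(f"{label}: time x{result['time_s']:.2f}, energy x{result['energy_j']:.2f}")
```

At equal accuracy, configuration 1 needed roughly 19% more time and 25% more energy than the manually tuned result, whereas configuration 2 needed about 52% of the time and 50% of the energy of the default configuration while being one percentage point more accurate.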

The results were not compared with other AutoML frameworks or GA approaches, since we could not find other works that implement multi-objective optimization evaluating energy and time in addition to predictive performance.

6. Final considerations

In this work, a multi-objective optimization centered on Green AI was designed, implemented, and evaluated, in search of a more environmentally friendly AI. It was implemented through a GA using a workflow and operators specially designed for this problem, to be part of an AutoML solution. A set of experiments was performed for the joint optimization of the XGB algorithm with HPO + AO for classification and regression tasks. Although the GA was evaluated using XGB, the workflow and some operators were designed to be used for ML models in general with minimal additions to the code, preserving the main operators as they are. For classification and regression tasks similar to those presented in this study, even the fitness evaluation can be used without modifications.

The initial results show the effectiveness of the proposed solution for multi-objective optimization using XGB. In addition, some experiments achieved significant energy savings compared to manual hyperparameter tuning when searching for the same objectives, indicating that it is possible to reduce energy consumption while minimizing predictive performance losses. To the best of our knowledge, this is the first multi-objective optimization approach to address the efficiency of ML models by simultaneously considering accuracy, time-to-solution, and energy consumption, while still allowing the priority of each objective to be chosen.

The main limitation of this work is that only XGBoost was evaluated, which is not a power-hungry model like the deep neural networks discussed in the Green AI literature. However, this first model served as a proof of concept, allowing experiments to be conducted more quickly. To demonstrate the usefulness of this GA-based approach, a large number of experiments must be performed to evaluate mutation and crossover and to verify whether the population is evolving. Many implementations and tests were needed before the final result was obtained. Even so, a single experiment with 20 generations took approximately 20 to 30 hours. According to our initial tests with a CNN, this time could easily be up to four times higher with a neural network. However, the fitness function implemented and tested for XGB shows that this multi-objective optimization is feasible, so the time spent on tests with CNNs should be greatly reduced. Recall that the purpose of this study is to reduce the impact of training ML models.

Despite this limitation, this work contributes a specialized GA scheme and operators. Moreover, this approach represents an important step towards the development of more environmentally friendly ML. We believe that our work not only presents a novel approach, but also stimulates a crucial conversation about the importance of developing greener and more inclusive artificial intelligence. Furthermore, even in the current version of the GA, this multi-objective optimization is already a contribution in itself: although predictive performance measures are still decisive in most cases, models that also satisfy other measures of efficiency are becoming increasingly important. For example, in many IoT applications, an ML model is deployed on edge devices, such as smartphones, watches, and embedded systems, where power consumption can be a limiting factor. In such settings, the trade-off between energy consumption and predictive performance is essential, and our multi-objective approach can be applied directly with XGB, since current state-of-the-art computer vision models and LLMs, which rely on neural networks with millions or billions of weights, are not applicable.

In future work, the GA is being extended to include HPO + NAS for CNN models. Additionally, more tests are being performed with different configurations of the GA parameters, such as the number of individuals and generations. In addition, a new crossover function for HPO + AO for XGB is under development. Finally, the multi-objective AutoML framework will be developed to incorporate the GAs designed initially for the mentioned tasks and algorithms.

References

1. E.M. Bender, Gebru, McMillan-Major and Shmitchell, On the dangers of stochastic parrots: Can language models be too big? in: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT’21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 610–623. ISBN 9781450383097. doi:10.1145/3442188.3445922.

2. Bernardo, Yokoyama, Schulze and Ferro, Avaliação do Consumo de Energia para o Treinamento de Aprendizado de Máquina utilizando Single-board computers baseadas em ARM, in: Anais do XXII Simpósio em Sistemas Computacionais de Alto Desempenho, SBC, Porto Alegre, RS, Brasil, 2021, pp. 60–71. doi:10.5753/wscad.2021.18512.

3. Chen and Guestrin, XGBoost: A scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 785–794. ISBN 9781450342322. doi:10.1145/2939672.2939785.

4. Chen and Guestrin, Xgboost: A scalable tree boosting system, in: Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785–794. doi:10.1145/2939672.2939785.

5. Colorni, Dorigo and Maniezzo, Genetic algorithms and highly constrained problems: The time-table case, in: Parallel Problem Solving from Nature, H.-P. Schwefel and Männer, eds, Springer, Berlin Heidelberg, 1991, pp. 55–59. ISBN 978-3-540-70652-6. doi:10.1007/BFb0029731.

6. Cowls, Tsamados, Taddeo and Floridi, The AI gambit: leveraging artificial intelligence to combat climate change—opportunities, challenges, and recommendations, AI & Society (2021), 1–25.

7. D.-G. UNESCO, Preliminary report on the first draft of the Recommendation on the Ethics of Artificial Intelligence, United Nations Educational, Scientific and Cultural Organization (2021).

8. David and Greental, Genetic Algorithms for Evolving Deep Neural Networks, GECCO 2014 – Companion Publication of the 2014 Genetic and Evolutionary Computation Conference, 2014. ISBN 978-1-4503-2881-4. doi:10.1145/2598394.2602287.

9. de Wynter, On the bounds of function approximations, in: Artificial Neural Networks and Machine Learning – ICANN 2019: Theoretical Neural Computation, Springer International Publishing, 2019, pp. 401–417.

10. Doke and Gaikwad, Survey on automated machine learning (AutoML) and meta learning, in: 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), 2021, pp. 1–5. doi:10.1109/ICCCNT51525.2021.9579526.

11. Ferreira, Pilastri, C.M. Martins, P.M. Pires and Cortez, A Comparison of AutoML Tools for Machine Learning, Deep Learning and XGBoost, in: 2021 International Joint Conference on Neural Networks (IJCNN), 2021, pp. 1–8. doi:10.1109/IJCNN52387.2021.9534091.

12. Ferro, G.D. Silva, F.B. de Paula, Vieira and Schulze, Towards a sustainable artificial intelligence: A case study of energy efficiency in decision tree algorithms, Concurrency and Computation: Practice and Experience (2021), e6815. doi:10.1002/cpe.6815.

13. Feurer, Klein, Eggensperger, J.T. Springenberg, Blum and Hutter, Auto-sklearn: Efficient and robust automated machine learning, in: Automated Machine Learning: Methods, Systems, Challenges, Hutter, Kotthoff and Vanschoren, eds, Springer International Publishing, Cham, 2019, pp. 113–134. ISBN 978-3-030-05318-5. doi:10.1007/978-3-030-05318-5_6.

14. Ganapathy, A Study of Genetic Algorithms for Hyperparameter Optimization of Neural Networks in Machine Translation, 2020.

15. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, 1st edn, Addison-Wesley Longman Publishing Co., Inc., USA, 1989. ISBN 0201157675.

16. K.M. Hamdia, Zhuang and Rabczuk, An efficient optimization approach for designing machine learning models based on genetic algorithm, Neural Computing and Applications 33(6) (2021), 1923–1933. doi:10.1007/s00521-020-05035-x.

17. He, Zhao and Chu, AutoML: A survey of the state-of-the-art, Knowledge-Based Systems 212 (2021), 106622. doi:10.1016/j.knosys.2020.106622.

18. Heffetz, Vainshtein, Katz and Rokach, DeepLine: AutoML tool for pipelines generation using deep reinforcement learning and hierarchical actions filtering, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD’20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 2103–2113. ISBN 9781450379984. doi:10.1145/3394486.3403261.

19. J.H. Holland, Genetic Algorithms, Scientific American (1992).

20. Hong, Li, Liu, Yang and Tang, Multi-objective evolutionary optimization for hardware-aware neural network pruning, Fundamental Research (2022), https://www-sciencedirect-com-443.web.bisu.edu.cn/science/article/pii/S2667325822003405. doi:10.1016/j.fmre.2022.07.013.

21. C.-H. Hsu, S.-H. Chang, J.-H. Liang, H.-P. Chou, C.-H. Liu, S.-C. Chang, J.-Y. Pan, Y.-T. Chen, Wei and D.-C. Juan, MONAS: Multi-Objective Neural Architecture Search using Reinforcement Learning, 2018, arXiv:1806.10332. doi:10.48550/ARXIV.1806.10332.

22. Jian, Zhou and Liu, Densely connected convolutional network optimized by genetic algorithm for fingerprint liveness detection, IEEE Access 9 (2021), 2229–2243. doi:10.1109/ACCESS.2020.3047723.

23. Johnson, Valderrama, Valle, Crawford, Soto and Ñanculef, Automating configuration of convolutional neural network hyperparameters using genetic algorithm, IEEE Access 8 (2020), 156139–156152. doi:10.1109/ACCESS.2020.3019245.

24. Kaggle, State of Data Science and Machine Learning 2021, Technical Report, 2021, https://www.kaggle.com/kaggle-survey-2021.

25. Karl, Pielok, Moosbauer, Pfisterer, Coors, Binder, Schneider, Thomas, Richter, Lang, E.C. Garrido-Merchán, Branke and Bischl, Multi-Objective Hyperparameter Optimization – An Overview, 2022, arXiv:2206.07438. doi:10.48550/ARXIV.2206.07438.

26. Karl, Pielok, Moosbauer, Pfisterer, Coors, Binder, Schneider, Thomas, Richter, Lang, E.C. Garrido-Merchán, Branke and Bischl, Multi-Objective Hyperparameter Optimization – An Overview, 2022, arXiv:2206.07438. doi:10.48550/ARXIV.2206.07438.

27. LeDell and Poirier, H2O AutoML: Scalable Automatic Machine Learning, 7th ICML Workshop on Automated Machine Learning (AutoML), 2020.

28. Lee, Kim, Kang, D.-Y. Kang and Park, Genetic algorithm based deep learning neural network structure and hyperparameter optimization, Appl. Sci. (2021). doi:10.3390/app11020744.

29. Li, Yang, M.A. Islam and Ren, Making AI Less “Thirsty”: Uncovering and Addressing the Secret Water Footprint of AI Models, 2023.

30. J.D. Lohn, W.F. Kraus and G.L. Haith, Comparing a coevolutionary genetic algorithm for multiobjective optimization, in: Proceedings of the 2002 Congress on Evolutionary Computation. CEC’02 (Cat. No. 02TH8600), Vol. 2, 2002, pp. 1157–1162. doi:10.1109/CEC.2002.1004406.

31. Mitchell, An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA, USA, 1998. ISBN 0262631857.

32. Nagarajah and Poravi, A review on automated machine learning (AutoML) systems, in: 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), 2019, pp. 1–6. doi:10.1109/I2CT45611.2019.9033810.

33. N.O. Nikitin, Vychuzhanin, Sarafanov, I.S. Polonskaia, Revin, I.V. Barabanova, Maximov, A.V. Kalyuzhnaya and Boukhanovsky, Automated evolutionary approach for the design of composite machine learning pipelines, Future Gener. Comput. Syst. 127 (2022), 109–125. doi:10.1016/j.future.2021.08.022.

34. R.S. Olson, Bartley, R.J. Urbanowicz and J.H. Moore, Evaluation of a tree-based pipeline optimization tool for automating data science, in: Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO’16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 485–492. ISBN 9781450342063. doi:10.1145/2908812.2908918.

35. Patterson, Gonzalez, Le, Liang, L.-M. Munguia, Rothchild, So, Texier and Dean, Carbon Emissions and Large Neural Network Training, 2021, arXiv:2104.10350. doi:10.48550/ARXIV.2104.10350.

36. Pfisterer, Coors, Thomas and Bischl, Multi-Objective Automatic Machine Learning with AutoxgboostMC, 2019. doi:10.48550/ARXIV.1908.10796.

37. I.S. Polonskaia, N.O. Nikitin, Revin, Vychuzhanin and A.V. Kalyuzhnaya, Multi-objective evolutionary design of composite data-driven models, in: 2021 IEEE Congress on Evolutionary Computation (CEC), 2021, pp. 926–933. doi:10.1109/CEC45853.2021.9504773.

38. Rani and Sharma, An optimized framework for cancer classification using deep learning and genetic algorithm, Journal of Medical Imaging and Health Informatics 7 (2017), 1851–1856. doi:10.1166/jmihi.2017.2266.

39. H.S. Sætra, AI in Context and the Sustainable Development Goals: Factoring in the Unsustainability of the Sociotechnical System, Sustainability 13(4) (2021), https://www.mdpi.com/2071-1050/13/4/1738. doi:10.3390/su13041738.

40. Schwartz, Dodge, N.A. Smith and Etzioni, Green AI, Communications of the ACM 63(12) (2020), 54–63. doi:10.1145/3381831.

41. Sharma and Rani, An optimized framework for cancer classification using deep learning and genetic algorithm, Journal of Medical Imaging and Health Informatics 7 (2017), 1851–1856. doi:10.1166/jmihi.2017.2266.

42. Smela, Toumi, Świerk, Francois, Biernikiewicz, Clay and Boyer, Rapid literature review: Definition and methodology, J Mark Access Health Policy 11(1) (2023), 2241234. doi:10.1080/20016689.2023.2241234.

43. Strubell, Ganesh and McCallum, Energy and policy considerations for deep learning in NLP, 2019, arXiv preprint arXiv:1906.02243.

44. Sudhakar, Sze and Karaman, Data centers on wheels: Emissions from computing onboard autonomous vehicles, IEEE Micro 43(1) (2023), 29–39. doi:10.1109/MM.2022.3219803.

45. Vinuesa, Azizpour, Leite, Balaam, Dignum, Domisch, Felländer, Langhans, Tegmark and F.F. Nerini, The role of AI in achieving the sustainable development goals, Nature Communications 11(233) (2020). doi:10.1038/s41467-019-14108-y.

46. Xiao, Yan, Basodi, Ji and Pan, Efficient Hyperparameter Optimization in Deep Learning Using a Variable Length Genetic Algorithm, 2020.

47. A.M. Yokoyama, Ferro and Schulze, A multi-objective hyperparameter optimization for machine learning using genetic algorithms: A green AI centric approach, in: Advances in Artificial Intelligence – IBERAMIA 2022, A.C. Bicharra Garcia, Ferro and J.C. Rodríguez Ribón, eds, Springer International Publishing, Cham, 2022, pp. 133–144. ISBN 978-3-031-22419-5.

48. Young, Rose, Karnowski, S.-H. Lim and Patton, Optimizing deep learning hyper-parameters through an evolutionary algorithm, 2015, pp. 1–5. doi:10.1145/2834892.2834896.

49. Yuan, Wang, Xue and Zhang, Particle Swarm Optimization for Efficiently Evolving Deep Convolutional Neural Networks Using an Autoencoder-based Encoding Strategy, IEEE Transactions on Evolutionary Computation (2023), 1–1. doi:10.1109/TEVC.2023.3245322.

50. Yuan, Wang, G.M. Coghill and Pang, A Novel Genetic Algorithm with Hierarchical Evaluation Strategy for Hyperparameter Optimisation of Graph Neural Networks, 2021, CoRR, https://arxiv.org/abs/2101.09300, arXiv:2101.09300.

1 https://www-periodicos-capes-gov-br.ez24.periodicos.capes.gov.br/

4 https://archive.ics.uci.edu/ml/datasets/HIGGS
