Abstract
Patients with lung cancer can only be diagnosed and treated surgically. Early detection of lung cancer through medical imaging could save numerous lives. It is therefore essential to complement conventional tests with advanced techniques that diagnose lung cancer with high accuracy. U-Net has excelled in a wide range of medical image segmentation tasks. A significant challenge remains in determining the ideal combination of hyperparameters for designing a U-Net optimized for detailed image segmentation. In this work, we propose a technique for automatically generating evolutionary U-Nets to detect and segment lung cancer anomalies. We trained the proposed model on lung images from three distinct datasets: the LIDC-IDRI dataset, the LUNA16 dataset, and a Kaggle dataset. Evaluated with six criteria commonly used for medical image segmentation, the proposed GA-UNet consistently achieved the highest performance. More specifically, it outperforms conventional approaches with an accuracy of 97.5% and a Dice similarity coefficient (DSC) of 92.3%.
Introduction
The world’s population is aging and growing, and cancer-causing behaviors, notably smoking, are increasingly adopted in economically developing nations, which contributes significantly to the rising global cancer mortality rate. Males are more likely than females to develop lung cancer, which in males accounts for 17% of all new cancer cases and 23% of all cancer-related deaths [1]. Cancer death rates for the least educated individuals are more than twice those of the most educated. Eliminating such disparities could potentially prevent around 37% of premature cancer-related deaths among people aged 25 to 64 years every year [2]. Worldwide, lung cancer has the highest mortality rate [3]. Lung cancer usually presents as lung nodules, which come in two varieties: benign and malignant. Benign nodules are not dangerous and do not spread to other parts of the body, whereas malignant nodules are cancerous and can spread. The first step in treating lung cancer is to accurately identify malignant nodules. If the cancer is already advanced, the patient’s life expectancy is shortened and treatment is challenging.
The goal of computer-aided diagnosis is to quickly and accurately assess the many patients in need of medical care. Automation reduces human error and saves considerable time. Because manual segmentation is slow, there is a strong need for automated models, such as neural networks, that can perform the same task. Deep learning for medical image analysis helps to resolve this issue [4].
Medical image segmentation is the most critical operation in medical image analysis and processing because it isolates the region of interest and its characteristics. Given the growing number of medical imaging modalities and the large volume of images to be examined, manually segmenting medical images is a challenging and time-consuming process. Deep learning models can learn the attributes of medical images automatically and use high-dimensional abstractions for segmentation. With deep learning techniques, cancer experts can segment the lung on a CT scan more accurately, consistently, and efficiently [5].
For segmenting medical images, Ronneberger et al. [6] introduced the U-Net, which is built as a U-shaped framework. To learn rich feature information from small datasets, the U-Net design lets the network reuse knowledge from earlier layers through weighted connections between the down-sampling and up-sampling paths. Even with random weight initialization, U-Net can therefore learn effectively from a small sample set.
U-Net’s shallow structure is more memory-efficient than neural networks with deeper architectures [6]. The residual connections [7] and the fully convolutional design [8] of U-Net allow it to handle medical images of variable size and to mitigate the vanishing gradient problem.
This helps to extract the most important characteristics from images and to achieve high accuracy and precision on the supplied datasets. The original U-Net of Ronneberger et al. [6] has therefore been the subject of several recent modifications [9–14]. Identifying suitable parameters and generating the best-performing U-Net from a pool of modified U-Nets is a challenging task.
To resolve this issue, we use a genetic algorithm, an optimization algorithm inspired by biological evolution in which the best chromosomes are mated to produce offspring [15]. Genetic algorithms generate new solutions by modifying existing ones. In the sections that follow, the genetic algorithm is used to explore promising candidates within this search space efficiently.
The population for the GA is drawn from U-Net variants [16] that have been proposed with various hyperparameters. The performance of each U-Net variant is used to compute its fitness value, and the GA chooses samples with high fitness values. To create new offspring, the GA uses crossover to exchange the traits of a few chosen U-Net variants. In the mutation step, the characteristics of the generated offspring are compared with those of the implemented U-Net models, and any required changes are made. This stage yields an optimized U-Net that is then trained on medical lung images. The automatically optimized U-Net is generated in the following steps: (i) each chromosome (U-Net model) is initialized with a fitness value; (ii) the reproduction operator is applied to each selected chromosome; and (iii) the offspring U-Net is designed with the required attention and residual blocks.
The following sections provide more details about this work. The main contributions of this article are: (i) an effective deep learning method for detecting and classifying lung tumors; (ii) the observation that incorporating the GA and U-Net into one model enhances the network’s performance; (iii) an evaluation of GA-UNet on CT medical images from three datasets, namely the LIDC-IDRI dataset, the LUNA16 dataset, and a Kaggle dataset; and (iv) a comparison showing that the proposed method outperforms other state-of-the-art methods, with an accuracy of 97.5% and a Dice coefficient of 92.3%.
Related work
Lung cancer is the second most frequently diagnosed cancer and the leading cause of cancer-related deaths [2]. The development of automated lung cancer diagnostic algorithms relies heavily on the accuracy of medical image segmentation. Predicting the parameters of deep network models for segmenting medical images is a challenging task that needs more exploration.
Deep learning with augmentation produces promising results in classification problems [17]. Deep learning auto-encoders can also reconstruct data from a low-dimensional representation back into its original form [18]. Regularization factors added to DL models can guide loss optimization during training [19]. Deep learning techniques are widely accepted and allow a DL model to be trained precisely on a variety of medical imaging modalities, including MRI and CT. Sample CT images are also collected from the LIDC-IDRI dataset [26].
U-Net was introduced by Ronneberger et al. (2015) [6] for segmenting biomedical images and has performed admirably in segmentation tasks in healthcare image analysis. Oktay et al. [9] used attention gates in U-Net to increase the accuracy and sensitivity of the model. Hang F et al. [10] designed a model in which the deep internal features of CT images were extracted using U-Net combined with residual convolutional blocks, and the features from all layers of ResNet were combined into one result.
Zhang et al. [11] developed AResU-Net, adding consecutive attention blocks in the encoder and decoder paths and using residual blocks to preserve feature information during up-sampling. Zhang et al. [12] integrated attention gates and skip-connection residual blocks to separate important features from irrelevant, noisy ones. Later, Maji et al. [13] designed a model that guides learning in the decoder and generates informative feature maps. The attention gate guides the network to focus on relevant information instead of passing all features through the residual blocks.
Siddique et al. [16] summarize different modifications of U-Net for medical image segmentation. To capture interdependencies in spatial and channel temporal information, Wei Chen et al. [20] proposed a multiple-attention 3D U-Net (MAU-Net) that includes double attention blocks at the bottleneck of the U-Net and obtained an average similarity coefficient of 95%. Run Su et al. [21] developed a model that combines several convolution sequences with various receptive fields and significantly enhances semantic segmentation. Chen et al. [22] described a system in which nested dense convolutional blocks with residual connections are added to U-Net to reduce the semantic disparity among feature sets. Ding et al. [23] proposed a deep CNN for accurate pulmonary nodule detection in medical images.
From a pool of U-Net variants, a genetic algorithm selects the best U-Net chromosome as an offspring [15]. Although CNN architectures designed with manual intervention can achieve higher performance than fully automatic designs [24], automated approaches allow the weight values of selected chromosomes to be updated easily [25]. Lima et al. [27] proposed a grammar-based automatic design of deep neural networks (DNNs) for medical image segmentation. This approach employs a grammar to describe the structure of neural networks and rules to develop them. Xie et al. [28] discussed the possibility of generating deep CNN structures automatically. They defined a set of possible structures as the search space and used genetic algorithms to traverse that large state space efficiently.
Esfahanian et al. [29] suggested methods such as steady-state genetic algorithms, generational genetic algorithms, and elitism genetic algorithms (GA) for training convolutional neural networks. Bhuvaneswari et al. [30] proposed a method for detecting cancer-affected lung nodules using a genetic algorithm combined with the K-NN algorithm. Kaur et al. [31] developed a framework for segmenting medical images using VGG transfer learning techniques along with a genetic algorithm.
Arif et al. [32] proposed segmenting the tumor-affected regions by applying a genetic algorithm; the selected images were then used to train the U-Net. Popat et al. [33] used a genetic algorithm (GA) to choose the hyperparameters of a U-Net model for retinal blood vessel segmentation. Wei et al. [34] developed a model that avoids the overhead of manual U-Net design and applied a genetic U-Net to segment retinal vessels.
Methodology
Lung images containing malignant nodules must be segmented with greater precision than normal images. Segmentation architectures that identify damaged regions in medical images correctly and efficiently are therefore in great demand. We present a GA for constructing a U-Net for segmenting medical images. An evolutionary algorithm uses the three fundamental genetic operations of selection, crossover, and mutation. In the first phase of this strategy, we outline a technique for encoding the U-Net structure as a fixed-length string. In the second phase, the genetic operations of selection, crossover, and mutation are applied to search the state space efficiently [15]. Figure 1 depicts the overall framework of the proposed method.

Figure 1. Overall framework of the proposed method.
To create a new generation, the fittest samples of the initial population are selected. From the chosen pool, a pair of samples is picked as parents for reproduction. Crossover then produces a new solution that inherits most of the traits of its parents. After that, the offspring are subjected to mutation until a new population of solutions is produced. The new candidates produced by the genetic algorithm form the next generation. After running the GA for G generations, the optimal U-Net is obtained as an offspring. This optimized GA-UNet was trained for image segmentation on three lung nodule datasets: LUNA16, LIDC-IDRI, and Kaggle. The following sections discuss how the methodology was implemented and the results obtained.
Genetic algorithms are founded on natural evolution. The GA operates on a population of modified U-Net architectures. Every member of the population is a solution described by its convolutions, pooling layers, residual blocks, and attention blocks. The hyperparameters of the U-Net variants that form the population are shown in Table 1. Each solution is represented as an array. Because every individual has a fixed size and alignment, crossover operations can be applied directly. As a result, offspring with desirable traits are produced from the fittest parents. Repeated selection, crossover, and mutation improve the population of solutions. The GA ends when the maximum number of generations has been reached or the offspring have reached a satisfactory level. The following algorithm indicates the steps involved in applying genetic algorithms to U-Net variants.
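As an illustration only, the loop of selection, crossover, and mutation described above can be sketched in Python. The function name `evolve`, the toy 0/1 chromosomes, and the use of `sum` as a fitness function are assumptions made for this sketch; in the proposed method the fitness would be the segmentation accuracy of a trained U-Net variant, and operator probabilities are omitted here for brevity.

```python
import random

def evolve(population, fitness, generations=100, target=None, seed=0):
    """Toy GA loop: repeat selection, crossover, and mutation.

    Chromosomes are lists of 0/1 genes; they stand in for the U-Net
    encodings and are not the authors' implementation.
    """
    rng = random.Random(seed)
    population = [list(ind) for ind in population]
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        if target is not None and max(scores) >= target:
            break  # stop early once an individual is satisfactory
        offspring = []
        while len(offspring) < len(population):
            # Selection: fitness-proportional sampling of two parents.
            weights = [s + 1e-9 for s in scores]  # avoid an all-zero wheel
            p1, p2 = rng.choices(population, weights=weights, k=2)
            # Two-point crossover: the child takes its middle segment from p2.
            i, j = sorted(rng.sample(range(1, len(p1)), 2))
            child = p1[:i] + p2[i:j] + p1[j:]
            # Mutation: flip one randomly chosen gene.
            k = rng.randrange(len(child))
            child[k] = 1 - child[k]
            offspring.append(child)
        population = offspring
    return max(population, key=fitness)
```

The loop stops either after the maximum number of generations or as soon as one individual reaches the hypothetical `target` fitness, mirroring the two termination conditions stated above.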
Population
In this research, we used randomly generated U-Net samples as the population. In each round of selection, this approach may yield U-Nets with different architectures as individuals. Each U-Net structure achieves a different accuracy in lung nodule segmentation. The samples are grouped according to the number of convolutional, residual, and attention layers they contain. Table 1 shows the sample U-Net variants considered for the population. The population’s diversity helps the search avoid local optima. From the existing state space, an evolutionary algorithm with a diverse population generates a better solution.
Encoding
Encoding each individual in the population is the first step in implementing a genetic algorithm. In the chromosome representation, the convolutional filters and pooling layers of a U-Net are expressed as a real-valued array. Each chromosome segment consists of a convolution layer suffixed with its filter size, an attention layer, and a residual block. The letters C, BN, R, and A denote convolution layers, bottleneck layers, residual blocks, and attention blocks, respectively. Following [6], each chromosome element roughly corresponds to a convolution layer filter with all of its weights and bias values, or to a layer with the active connection weights for all of its neurons. Table 1 displays the sample U-Net variants and their real-valued encodings considered for further processing.
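To make the encoding concrete, the sketch below decodes a chromosome written with the letters C, BN, R, and A. The exact token format of Table 1 is not reproduced in this text, so the strings used here (for example `"C32"` or `"BN512"`) and the function name `decode` are hypothetical choices for illustration.

```python
import re

# Hypothetical gene format: a letter code optionally followed by a number.
TOKEN = re.compile(r"^(BN|C|R|A)(\d*)$")
KIND = {"C": "conv", "BN": "bottleneck", "R": "residual", "A": "attention"}

def decode(chromosome):
    """Turn gene strings such as 'C32' into (block kind, size) pairs."""
    genes = []
    for token in chromosome:
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"unknown gene: {token!r}")
        letter, number = match.groups()
        # Blocks without a numeric suffix get size None.
        genes.append((KIND[letter], int(number) if number else None))
    return genes

print(decode(["C32", "R", "A", "BN512"]))
```

Keeping every chromosome the same length and in the same gene order is what allows crossover to exchange aligned segments between two individuals.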
Selection
In this work, sample selection is guided by the principle of natural selection. The accuracy of each U-Net variant is used as the fitness value in the selection process. Selection in a GA involves picking the fittest samples to produce the offspring of the following generation. In this way, the population’s fittest samples are favored, while the least fit individuals are discarded. The GA chooses the subsequent population using roulette wheel selection, which performs a linear search over a wheel weighted by each individual’s fitness. Because all individuals are taken into account when selecting the population, population variety is preserved and convergence is accelerated.
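A minimal sketch of roulette wheel selection with a linear search is given below. The function name `roulette_select` and the example fitness values are assumptions for illustration; the fitness values are assumed to be non-negative accuracies.

```python
import random

def roulette_select(individuals, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    pick = rng.uniform(0, total)  # where the wheel stops
    running = 0.0
    # Linear search: walk the wheel until the cumulative fitness passes `pick`.
    for individual, fit in zip(individuals, fitnesses):
        running += fit
        if pick < running:
            return individual
    return individuals[-1]  # guard against floating-point round-off

rng = random.Random(0)
parent = roulette_select(["unet_a", "unet_b", "unet_c"], [0.91, 0.95, 0.89], rng)
```

Every individual keeps a non-zero chance of being chosen as long as its fitness is positive, which is how the method retains variety while still favoring the fittest samples.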
Crossover
A genetic algorithm combines the traits of existing solutions to create a new one, motivated by the role of reproduction in living organisms. Crossover produces offspring by taking characteristics from one parent and fusing them with attributes from another. There are several crossover methods, including single-point, two-point, three-point, multipoint, and uniform crossover. In our work, we used two-point crossover to exchange the characteristics of the selected chromosomes.
If the crossover operator is skipped, all of the offspring will be exact copies of their parents. Performing a crossover exchanges the features of the parents. A crossover probability of 100% means that all of the offspring are produced by crossover, whereas a probability of 0% means that none of them are. The two-point crossover over the chosen U-Net variant chromosomes is shown in Fig. 2. Chromosome configurations are swapped to produce the new, modified offspring.

Figure 2. Two-point crossover on selected U-Net variants.
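The two-point crossover described in this section can be sketched as follows. The function name `two_point_crossover` and the single-letter genes are illustrative assumptions; the default crossover probability of 0.95 is the value reported in the experimental settings.

```python
import random

def two_point_crossover(parent_a, parent_b, rng, p_crossover=0.95):
    """Swap the segment between two cut points of equal-length parents."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 3:
        raise ValueError("parents must have equal length of at least 3")
    if rng.random() >= p_crossover:
        # No crossover: the offspring are exact copies of the parents.
        return list(parent_a), list(parent_b)
    # Two distinct interior cut points, i < j.
    i, j = sorted(rng.sample(range(1, len(parent_a)), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b

rng = random.Random(0)
a, b = two_point_crossover(list("AAAAAA"), list("BBBBBB"), rng, p_crossover=1.0)
```

With `p_crossover=1.0` every pair is recombined, and with `p_crossover=0.0` both children are copies of their parents, matching the 100% and 0% cases discussed above.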
Mutation
Mutation replaces a randomly selected gene with another appropriate value. The process imitates mutation in the DNA of an evolving living organism. To create the new population, such modifications must be applied to some members of the population that was formed. Many different methods can be used to carry out mutation operations.
Mutation comprises three operations: flipping, interchanging, and reversing. Crossover in a genetic algorithm can produce an impractical solution. A very low mutation rate introduces only small genetic differences, whereas a very high mutation rate may destroy effective solutions. A successful genetic algorithm therefore has to tune the mutation parameters properly. In the real-valued representation of an individual, a gene is mutated simply by altering the value at its position.
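The three mutation operations named above (flipping, interchanging, and reversing) can be sketched on a list-based chromosome. All function names, the `allowed` gene values, and the random choice among the three operators are assumptions for illustration; the default mutation probability of 0.8 is the value reported in the experimental settings.

```python
import random

def flip(chromosome, index, allowed, rng):
    """Flipping: replace one gene with a different allowed value."""
    options = [v for v in allowed if v != chromosome[index]]
    mutated = list(chromosome)
    mutated[index] = rng.choice(options)
    return mutated

def interchange(chromosome, i, j):
    """Interchanging: swap the genes at positions i and j."""
    mutated = list(chromosome)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

def reverse(chromosome, i, j):
    """Reversing: reverse the order of the genes in positions i..j-1."""
    mutated = list(chromosome)
    mutated[i:j] = mutated[i:j][::-1]
    return mutated

def mutate(chromosome, allowed, rng, p_mutation=0.8):
    """Apply one randomly chosen operator with probability p_mutation."""
    if rng.random() >= p_mutation:
        return list(chromosome)
    i, j = sorted(rng.sample(range(len(chromosome)), 2))
    op = rng.choice(("flip", "interchange", "reverse"))
    if op == "flip":
        return flip(chromosome, i, allowed, rng)
    if op == "interchange":
        return interchange(chromosome, i, j)
    return reverse(chromosome, i, j + 1)
```

Interchanging and reversing only reorder existing genes, whereas flipping is the one operator here that can introduce a value not yet present in the chromosome.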
With a series of convolution and pooling operations, the down-sampling path on the left extracts features while reducing the image size. With a series of sequential transposed convolutions, the up-sampling path on the right restores the image size, enhances the precision of image segmentation, and enables improved detail reconstruction. The encoder path we prepared for this research consists of four layers.
Depending on the design of the U-Net variant, each layer contains components such as convolutions, pooling layers, and residual blocks. The input, common to all variants, is a 256 × 256 image from the Computed Tomography (CT) modality. Each down-sampling step doubles the number of semantic feature maps and halves the image size. The output of each layer is passed as input to the subsequent layer. The bottleneck layer links the down-sampling and up-sampling paths. All variants of our proposed model include a bottleneck layer (BN) with 512 feature maps. Figure 3 depicts an interchanging mutation in detail.

Figure 3. Mutation on selected genes.
The BN layer operates on a reduced 16 × 16 representation of the image. Depending on the U-Net variant, the BN layer also comprises residual blocks with various dilation rates. The output of the bottleneck (BN) layer follows two paths, one feeding the attention gate blocks and the other the up-sampling path.
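The sizes quoted above follow from simple arithmetic, sketched below. The starting width of 32 feature maps is an assumption, inferred only from the reported 512 feature maps at the bottleneck after four doublings; the function name `encoder_shapes` is likewise illustrative.

```python
def encoder_shapes(size=256, base_filters=32, depth=4):
    """List (height, width, feature maps) for each encoder layer and the bottleneck."""
    shapes = []
    filters = base_filters  # assumed starting width, not stated in the text
    for _ in range(depth):
        shapes.append((size, size, filters))
        size //= 2    # each down-sampling step halves the image size
        filters *= 2  # and doubles the number of feature maps
    shapes.append((size, size, filters))  # bottleneck layer
    return shapes

for shape in encoder_shapes():
    print(shape)
```

Four halvings of a 256 × 256 input give 256 / 2^4 = 16, which agrees with the 16 × 16 bottleneck described in the text.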
Like the down-sampling path, the up-sampling path has four layers. Each decoder layer is connected to the preceding layer and receives a skip connection from the corresponding encoder layer. The attention block output is appended to the preceding layer’s output, and the result is passed to the residual blocks. The residual blocks in the up-sampling path double the image size while halving the number of feature maps. The outputs of each layer are compared, and the loss function is computed. The final output of the complete U-Net model is taken from the last up-sampling layer and has a size of 256 × 256.
As the U-Net deepens, spatial information may be lost, significantly lowering the accuracy of image segmentation. Context information is carried over the skip connections between the down-sampling and up-sampling paths. The attention mechanism helps to retrieve and restore the spatial information lost in the down-sampling path. It concentrates on a particular area of the image and ignores irrelevant regions. Attention blocks decrease the significance of extraneous regions, reducing false positives. Each attention block receives two inputs: the output of the decoder layer below it and the skip connection from the corresponding encoder layer. Attention blocks can suppress unimportant information without cropping the edges of the lung areas. To preserve the areas affected by cancer, the attention parameters identify the essential components and suppress the remaining ones.
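The gating behavior can be illustrated with a single-channel toy version of the additive attention gate of Oktay et al. [9]. Scalar weights stand in for the 1 × 1 convolutions of a real network, the gating signal is assumed to have already been resampled to the resolution of the skip connection, and all parameter names are hypothetical.

```python
import math

def attention_gate(skip, gate, w_x=1.0, w_g=1.0, bias=0.0, psi=1.0, b_psi=0.0):
    """Scale each skip-connection value by an attention coefficient in (0, 1)."""
    output = []
    for row_x, row_g in zip(skip, gate):
        out_row = []
        for x, g in zip(row_x, row_g):
            # Additive attention: combine both inputs, then apply ReLU.
            q = max(0.0, w_x * x + w_g * g + bias)
            # A sigmoid maps the score to a coefficient between 0 and 1.
            alpha = 1.0 / (1.0 + math.exp(-(psi * q + b_psi)))
            out_row.append(alpha * x)
        output.append(out_row)
    return output

# The same skip value is kept where the gating signal is strong
# and attenuated where it is weak.
gated = attention_gate([[1.0, 1.0]], [[5.0, -5.0]])
```

Because the coefficient multiplies the skip features rather than cropping them, the spatial extent of the feature map is unchanged, which is consistent with the remark that attention blocks suppress information without cropping the lung edges.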
Residual blocks
Deep feature maps from the up-sampling path are combined with shallow feature maps from the down-sampling path through skip connections. These skip connections effectively recover fine features from the target images. For this fine-tuned segmentation, complicated mask generation is required in the background. The vanishing gradient problem can occur when deep neural networks have more layers than necessary, which makes learning identity functions more difficult. This issue is resolved by attaching a residual block to each unit, which provides an identity mapping.
Without residual blocks, accuracy gradually saturates and then degrades as the network deepens. Deep learning networks can employ residual blocks to take advantage of shallower networks’ enhanced learning characteristics.
Population size
Although selecting the population size is crucial in a genetic algorithm, no single value suits all problems. The number of individuals (population size) can be chosen according to the available resources and the nature of the problem domain. A larger population encourages exploration but slows convergence and increases complexity. To avoid these problems, a small population size is used in our experiments. To maintain a degree of genetic diversity, 20 U-Net variants form the initial population. The randomly generated candidates are listed in Table 1.
Table 1. Initial population selected at random
The probabilities of the crossover and mutation operations (Pcr and Pmt) were set to 0.95 and 0.8, respectively. The population size was 20, and the number of generations was 100. The models were implemented in Python and trained on Google Colab. With the number of generations limited to G = 100, the generated U-Net is expected to offer improved performance. After a successful GA run, the optimized architecture was trained for a number of epochs to ensure training convergence. Preprocessed samples from the datasets were split 80:20 for training and testing, respectively.
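For orientation, the reported GA settings and the 80:20 split can be written down as below. The dictionary keys, the function name `split_dataset`, and the use of a seeded shuffle are assumptions; the sample count of 1076 is the number of images reported later in the experimentation section.

```python
import random

# GA settings as reported in the text; the key names are illustrative.
GA_SETTINGS = {
    "population_size": 20,
    "generations": 100,
    "p_crossover": 0.95,
    "p_mutation": 0.8,
}

def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and test subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle is an assumption
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train_ids, test_ids = split_dataset(range(1076))
print(len(train_ids), len(test_ids))  # 861 215
```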
For the optimized U-Net, the initial learning rate for training was set to 0.001. A small learning rate favors stable convergence. In the encoding and decoding layers of the optimized U-Net, the batch size is fixed at 8 to generate the feature maps. The number of training epochs was set to 100. Early stopping on the validation loss is also used to avoid overfitting. The Adam optimizer and a Dice coefficient loss function are used for training the optimized U-Net. Normalization is applied in the preprocessing stage to remove unnecessary details from the medical images.
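A Dice coefficient loss of the kind mentioned above is commonly written as one minus the soft Dice score. The sketch below operates on flattened masks given as plain Python lists; the smoothing constant `smooth` is an assumption added to avoid division by zero on empty masks and is not taken from the paper.

```python
def dice_loss(pred, target, smooth=1e-6):
    """Return 1 - Dice for flattened predicted and ground-truth masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    dice = (2.0 * intersection + smooth) / (denominator + smooth)
    return 1.0 - dice

perfect = dice_loss([1, 0, 1], [1, 0, 1])         # close to 0
disjoint = dice_loss([1, 1, 0, 0], [0, 0, 1, 1])  # close to 1
```

The loss falls toward 0 as the predicted mask overlaps the ground truth more closely, so minimizing it directly targets the Dice score used in the evaluation.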
Model weights are initialized with random values and updated during the training process. To produce feature maps with the same dimensions as the original medical image, the up-sampling path performs deconvolution. To enhance the quality of the training data and minimize overfitting, the images were augmented with scaling and rotation operations.
Experimentation
Dataset
Segmentation datasets typically comprise thousands of samples, and preparing masks for all of them manually takes a lot of work. In our experimental model, we use CT (Computed Tomography) images of the lungs in DICOM format. Fig. 4(a–d) shows lung nodule samples of various sizes taken from the Kaggle dataset for experimental exploration.

Figure 4(a–d). CT images of nodules with various diameters from the Kaggle dataset.
The lung datasets were assembled from the LIDC-IDRI dataset [26], the Kaggle data collection, and LUNA16. These collections contain diverse lung cancer-related images from around the world. Each sample in the dataset must be preprocessed before training. The denoised images are divided into training and testing portions. Table 2 displays the image modalities and data sources used in this approach.
For the purposes of this research, the performance of the optimized U-Net is assessed using six evaluation metrics: accuracy (A), precision (PR), recall (R), specificity (S), Dice score, and F1-score. Equations (1)–(6) define these commonly used performance metrics.
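For reference, the usual confusion-matrix definitions of these six metrics are given below. They are the standard textbook forms and are assumed, not confirmed, to match Equations (1)–(6) of the paper.

```latex
\begin{align}
A &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1}\\
PR &= \frac{TP}{TP + FP} \tag{2}\\
R &= \frac{TP}{TP + FN} \tag{3}\\
S &= \frac{TN}{TN + FP} \tag{4}\\
\text{Dice} &= \frac{2\,TP}{2\,TP + FP + FN} \tag{5}\\
F1 &= \frac{2 \cdot PR \cdot R}{PR + R} \tag{6}
\end{align}
```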
In Equations (1)–(6), TP, TN, FP, FN, PR, and R represent True Positive, True Negative, False Positive, False Negative, Precision, and Recall, respectively.
This research work was carried out using the original U-Net code in Python on Google Colab. The evolutionary algorithm was implemented in Python 3.7.4, and the resulting optimized U-Net was trained using different CT images. The data source images are in raw format. Examining the images without preprocessing might produce inaccurate findings, so the medical images must be preprocessed. The preprocessing strategies remove undesirable portions from the images based on size, location, and intensity.
To train the optimized U-Net, we collected 1076 images with a resolution of 256 × 256 from various sources, as shown in Table 2. The input has shape [1076, 256, 256], where 1076 is the number of images and 256 × 256 is the size of each image. The test images include many kinds of nodules, such as tiny, solid, and unusually shaped nodules and nodules with flaws. To prevent overfitting, we applied shift, rotation, and vertical and horizontal augmentations to our training samples. After preprocessing, the lung images and ground truth masks are used for training. With the proposed genetic algorithm U-Net (GA-UNet), lung nodule segmentation is accomplished effectively.
Table 2. Lung images used in the research
Figure 5 displays the results of cancer-affected lung nodule segmentation, where column (A) shows the original lung images, column (B) the ground truth images, column (C) the segmentation by UNet [6], column (D) the segmentation of lung nodules by Attention UNet [6], and column (E) the outcomes predicted by our proposed GA-UNet. GA-UNet is compared with UNet [6], AUNet [7], ResU-Net [8], Attention ResUNet [9], Attention Gate ResUNet [10], and ARes-UNetGD [11] in terms of accuracy, precision, recall, specificity, F1-score, and Dice score.

Figure 5. Predictions of our proposed GA-UNet model and the state-of-the-art models on lung images (column-wise): 1. Original image, 2. Ground truth, 3. UNet, 4. AUNet, 5. GA-UNet.
Figure 6 shows the results of the comparison of our method with other U-Net structures such as UNet [6], AUNet [7], ResU-Net [8], Attention ResUNet [9], and Attention Gate ResUNet [10].

Figure 6. Models vs. Dice score, accuracy, and precision.
Figure 7 depicts the comparison results of our proposed evolutionary method with UNet [6], AUNet [7], ResU-Net [8], Attention ResUNet [9], and Attention Gate ResUNet [10].

Figure 7. Models vs. F1-score, sensitivity, and specificity.
The results in Table 3 show that our method performs well in terms of Dice score, accuracy, and precision.
The experimental results in Table 4 confirm that the GA-UNet model works better than all the other cited models in terms of F1-score, sensitivity, and specificity. Considering the results in Tables 3 and 4, our proposed GA-UNet model performs excellently.
Table 3. Overall analysis under Dice Score, Accuracy and Precision
Table 4. Overall analysis under F1 Score, Sensitivity and Specificity
In Fig. 8, the black line shows the Dice loss curve of U-Net on the training set, the red line that of AU-Net, the solid blue line that of ResU-Net, and the pink line that of AResU-Net. The solid green line shows the Dice loss curve of the optimized evolutionary U-Net (GA-UNet).

Figure 8. Loss curves of different U-Nets on training data.
During training, AResU-Net shows the fastest convergence owing to its residual blocks. ResU-Net also converges quickly by exploiting the learning behavior of shallower networks. Compared with the other three networks (AU-Net, ResU-Net, and AResU-Net), GA-UNet clearly performs better on the training data.
GA-UNet also shows reduced training error, reduced overfitting, and increased accuracy on the validation data.
Conclusion
This research contributes to the field of disease detection by developing an effective deep learning model for the advanced diagnosis of lung cancer. In this study, we employed the GA-UNet model to identify and classify lung cancer using medical images. The genetic algorithm (GA) was employed to construct an optimized U-Net, which provides a reliable method of identifying cancer-affected regions. According to our evaluation, the model outperforms alternative approaches with an accuracy of 97.5% and a Dice similarity coefficient (DSC) of 92.3%. Despite the encouraging results with G = 100 generations, there are still numerous areas for further research, specifically in exploring the use of high-dose medical CT lung images to enhance performance and reduce cost.
Funding statement
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Conflicts of interest
The authors declare that they have no conflicts of interest to report regarding the present study.
