Abstract
Localizing marine objects from UAVs is required in several areas, including marine research, environmental monitoring and topographic surveys. Herein, we introduce a new undersea object detection method that combines shifted split-merge segmentation with a fuzzy-guided generative adversarial network. Split-merge segmentation is a region-based model that splits and merges regions according to specified similarity measures. We modify the existing split-merge algorithm to use a shifted window approach, which is better at detecting undersea objects of changing shape and size. To further improve the segmentation quality of underwater object detection, a Fuzzy-Guided Generative Adversarial Network (FG-GAN) is proposed. The generator network produces synthetic underwater images, and the discriminator network differentiates between real and generated images. The generator is trained with a fuzzy loss function whose fuzzy membership functions model the uncertainty and vagueness of the underwater environment and of the objects in it. We evaluated the method against traditional image segmentation and object detection approaches. The experiments indicate that the proposed method segments and identifies objects more accurately than the existing approaches, reaching 95% accuracy with a reduced loss of 0.3. The proposed approach could be applied in a broad range of underwater fields, such as marine hydrology, remote sensing and underwater robotics.
Keywords
Introduction
Locating submerged objects using UAVs is of great interest and is essential for marine research tasks such as environmental analysis, scientific study, and decision-making for autonomous underwater vehicles. Yet deep-sea scientific research is significantly hampered by factors such as the complex environment, low visibility, scattering, and the absorption of light. These complications affect both image segmentation and classification, so a combination of both techniques should be deployed. This introduction covers the need for underwater object detection, the use of image segmentation and classification models, the challenges in existing approaches and the main contributions of the current study.1 In underwater object detection, segmentation procedures delineate objects of interest from the background. However, conventional segmentation methods may not give accurate results in this environment because of the difficulties above. It is therefore imperative to develop more advanced underwater scene segmentation methods. Image segmentation techniques can be broadly classified into two categories: region-based segmentation and edge-based segmentation.
Region-based segmentation groups pixels into regions with similar characteristics, while edge-based segmentation detects boundaries between neighbouring regions.2
Classification models are then applied to label the segmented regions of the images. Many classification models have been studied, including decision trees, support vector machines, and neural networks. Nevertheless, they perform poorly when the robot operates underwater because of the specific conditions of that environment, which makes dedicated classification models for underwater object detection necessary. We propose the shifted split-merge segmentation technique, a three-level region-based segmentation method. It is designed to address the problems typical of underwater environments by incorporating a shifted window algorithm, which allows it to segment underwater objects of different shapes and widths more accurately. The shifted split-merge technique is applied to underwater scene segmentation, assessed on a real dataset, and compared with traditional segmentation approaches. The FG-GAN comprises two neural networks: a generator network that produces synthetic underwater images and a discriminator network that differentiates real from fake images. The generator is trained with a fuzzy loss function, in which fuzzy membership functions handle the uncertainty and vagueness of the underwater environment. By incorporating fuzzy membership functions, the fuzzy loss function quantifies the uncertainty associated with underwater images, which is crucial given challenges such as low visibility and light scattering. It guides the generator network towards synthetic images that closely resemble real underwater images, providing a nuanced measure of error that improves the quality of the generated images. This management of uncertainty improves the overall performance of the FG-GAN, leading to better classification precision and accuracy in identifying underwater objects, even in challenging conditions where traditional methods may struggle, and to high accuracy with reduced loss values during training. To evaluate the FG-GAN, a dataset of real underwater images is used and the results are compared with traditional classification models.3–5
Current underwater target detection approaches have several problems. First, segmentation techniques and classification models rely on images captured by a camera, and the low visibility, scattering, and absorption of light in underwater conditions degrade those images. Second, the datasets available for training and testing the 3-d and 4-d segmentation and classification models are small or lack variety. Furthermore, the computational complexity of the segmentation approaches and classification models may be too high for real-time detection of underwater objects.6
Solving these problems is one of the most important objectives of the proposed work. Its main contributions are discussed below.
A technique called shifted split-merge segmentation is introduced, devised specifically to deal with the difficulties of underwater environments, such as frame variations, focus, and relatively low-quality images. A Fuzzy-Guided Generative Adversarial Network (FG-GAN) uses a fuzzy loss function to better capture the uncertainty and ambiguity of underwater conditions. The generator network produces synthetic underwater images, and the discriminator network differentiates between real and generated images. The proposed techniques address the scene depth warping effect, the scattering and absorption of light by the water medium, the computational complexity, and the errors introduced by the segmentation techniques and classification models.
The main objective of the proposed Fuzzy-Guided Generative Adversarial Network (FG-GAN) for underwater object detection is discussed as follows:
The main objective of the generator in the Fuzzy-Guided Generative Adversarial Network (FG-GAN) is to create artificial underwater images that mimic real-world situations. This network's objective is to improve the training dataset by creating a variety of lifelike representations of underwater environments, ultimately enhancing the accuracy of detection models. By producing these synthetic images, the generator helps improve the training and adjustment of classification algorithms, resulting in more precise detection of underwater objects in diverse and complex conditions.
In summary, this paper proposes two novel methods for underwater object detection: shifted split-merge segmentation and a fuzzy-guided generative adversarial network. The proposed approaches tackle the limited visibility caused by the scattering and absorption of light in water, the lack of large and homogeneous underwater datasets, and the computational complexity of segmentation and classification.
Related works
Han et al. (2020) presented a deep CNN method for detecting and identifying marine organisms. They fine-tuned a pre-trained VGG16 model for the task and used data augmentation to expand the dataset.1 Madhan et al. (2021) proposed a training method for enhancing object classification in shallow waters using a deep learning algorithm (YOLOv3). They adopted a distributed computing approach to improve the model's performance and generated additional data through augmentation.2 Alshdaifat et al. (2020) proposed a deep learning-based fish segmentation framework with high accuracy in underwater video applications. They incorporated U-Net, a convolutional neural network originally designed for biomedical image segmentation, into their architecture and made extensive use of data augmentation to enrich their dataset.3 Szymak et al. (2020) examined whether objects in underwater video can be classified using deep neural networks taken from existing models. Pre-trained models such as VGG16, ResNet50 and InceptionV3 were fine-tuned and then used for classification. They also applied augmentation to enlarge their dataset.4 Roy and Talukder (2024) demonstrated the use of deep learning for detecting and categorizing underwater objects. They used Faster R-CNN, a region-based convolutional neural network for object detection, and relied heavily on data augmentation to add more training data.5 Jalal et al. (2020) proposed a deep learning framework for fish species classification and underwater object detection, which combines deep neural networks with temporal information.7
Zhu et al. (2017) described underwater sonar image classification based on deep-learning feature extraction. They fine-tuned a pre-trained CNN model and adapted it to their case.8 Yeh et al. (2021) proposed a lightweight deep neural network for joint learning of underwater object detection and colour conversion; because the network is small, it is convenient to deploy on an autonomous underwater vehicle. They built a You Only Look Once (YOLO) model preceded by a colour conversion module.9 Kalaiarasi et al. (2023) described a method for identifying objects in underwater photographs using the Faster R-CNN deep learning architecture.10 Zhu et al. (2018) fed underwater object images to a pre-trained deep CNN to classify objects in underwater photos.11 Chen et al. (2020) proposed an underwater object detection method based on Invert Multi-Class Adaboost and deep learning.12 Huo et al. (2020) applied deep transfer learning and semisynthetic training data to underwater object classification in side-scan sonar images. These studies show that deep learning methods such as CNNs, YOLOv3, Faster R-CNN, and transfer learning are effective for underwater object detection and classification.13
Based on machine learning, Salsitti (2020) proposed a neural network as the primary method for patch classification of Posidonia oceanica in underwater images. The author fine-tuned a pre-trained VGG16 network to adapt it to the task. Pixels extracted from the underwater pictures are assigned to either the Posidonia oceanica class or the background.14 Chong et al. (2021) proposed a modified Mask R-CNN (Region-based Convolutional Neural Network) for detecting and localizing objects in forward-looking sonar images. The authors used a feature pyramid network and a region proposal network to improve the efficiency of object detection and segmentation. The method extracts features from the sonar images and uses them to distinguish and separate the underwater objects.15 Kaur et al. (2024) proposed a deep learning method for species classification in underwater environments using invariant features. They fine-tuned a pre-trained ResNet50 model for species classification. The method extracts invariant features from the underwater images and categorizes the species accordingly.16 Wang et al. (2022) presented deep learning methods for marine object recognition, covering several networks used for marine object detection, including CNNs, R-CNNs, and GANs.17 Rathi et al. (2017) proposed a convolutional neural network for classifying underwater fish species. They fine-tuned a pre-trained AlexNet model for fish species recognition. The method classifies fish species from patches of the underwater images.18 Islam et al. (2020) proposed a semantic segmentation approach for underwater imagery, together with a benchmark dataset. They pre-trained a DeepLab v3+ network and further adapted it for underwater image segmentation.19
Bajpai et al. (2021) applied deep learning to the detection of underwater moving objects. They used a U-Net architecture with additional features, trained on underwater scenes, to detect moving objects.20 Zhang et al. (2022) proposed a deep learning-based underwater video analysis approach for monitoring coastal marine fisheries resources. A supervised neural network is used to extract underwater features relevant to coastal fisheries resources.21 O'Byrne et al. (2018) proposed training deep networks for underwater image segmentation on photo-realistic synthetic datasets. The deep learning model was trained on synthetically generated underwater images, which enabled it to segment underwater objects in real-life pictures.22 Domingos et al. (2022) surveyed deep learning methods for classifying underwater acoustic data for coastline surveillance. The work covers state-of-the-art deep learning algorithms and techniques for detecting underwater acoustic signals, which can be used to design surveillance systems.23 Moniruzzaman et al. (2019) proposed a deep learning approach using Faster R-CNN to detect seagrass in underwater digital images. They used a pre-trained Faster R-CNN for key feature extraction and seagrass detection.24 Li et al. (2023) reviewed deep learning for the classification and recognition of aquatic fauna, surveying deep learning models and object detection and recognition methods applied to aquatic creatures in marine images and videos.25 Srividhya and Ramya (2017) presented an accurate object recognition method for underwater images that combines a learning approach with texture features.6 A summary of the existing models is given in Table 1.
A literature review on existing segmentation and classification model.
The proposed work comprises two modules: shifted split merge segmentation and a Fuzzy-Guided Generative Adversarial Network (FG-GAN) classifier for accurate underwater object detection.
Shifted Split merge segmentation
Split-merge segmentation is a region-based image segmentation approach that divides an image into regions and refines them according to a relevance measure. It begins by decomposing the image into small, non-overlapping parts called superpixels. Each superpixel is obtained using the K-means algorithm, and a label is then assigned to it based on its similarity to the surrounding superpixels. The labels are refined iteratively by splitting and merging regions according to the relevance measure until a stopping condition is fulfilled. Consequently, the image is divided into homogeneous segments, which can be used for object detection, image compression or scene understanding. Several features, such as colour, texture, and shape, can be used to quantify relevance in split-merge segmentation. The choice of relevance measure depends on the application and on the features of the image. For example, brightness-based relevance measures are well suited to natural images, whereas texture-based measures give appropriate results for medical images.26
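The K-means step mentioned above can be illustrated with a short sketch. It clusters pixels on intensity plus a weighted pixel position, which is the basic idea behind superpixel methods. The feature choice, the `spatial_weight` parameter and the evenly spaced initialisation are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def kmeans_superpixels(image, k=4, iterations=10, spatial_weight=0.5):
    """Cluster pixels into k groups using intensity and (weighted) position.

    `image` is an (H, W) or (H, W, C) float array. Returns an (H, W) label map.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # One feature row per pixel: colour channels, then scaled row and column.
    features = np.column_stack([
        image.reshape(h * w, -1),
        spatial_weight * ys.ravel() / h,
        spatial_weight * xs.ravel() / w,
    ]).astype(float)
    # Deterministic initialisation (assumption): evenly spaced pixels as centres.
    centres = features[np.linspace(0, h * w - 1, k).astype(int)]
    for _ in range(iterations):
        # Squared Euclidean distance from every pixel to every centre.
        distances = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        assignment = distances.argmin(axis=1)
        for c in range(k):
            members = features[assignment == c]
            if len(members):
                centres[c] = members.mean(axis=0)
    return assignment.reshape(h, w)
```

A larger `spatial_weight` yields more compact clusters, while a smaller one lets clusters follow colour boundaries more closely.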
The effectiveness of Shifted Split Merge Segmentation (SSMS) can be evaluated by comparing it to conventional segmentation methods such as region-based, edge-based, and clustering algorithms. Traditional region-based methods often struggle with varying object shapes and sizes, especially in complex underwater environments. At the same time, SSMS addresses this limitation through a shifted window approach, enabling more flexible and adaptive segmentation that better captures underwater object nuances. Edge-based techniques, which focus on boundary detection, can be sensitive to noise and may fail in low-visibility conditions typical of underwater settings. In contrast, SSMS combines splitting and merging strategies to maintain the integrity of segmented regions, even when edges are not well-defined due to environmental factors. Clustering algorithms like K-means, which rely on predefined criteria for grouping pixels, can yield suboptimal results in heterogeneous underwater scenes. SSMS, however, iteratively refines regions based on similarity measures, allowing for more accurate delineation of objects with non-uniform characteristics. Overall, SSMS demonstrates superior performance in underwater image segmentation by effectively managing challenges such as low visibility, varying object characteristics, and environmental distortions, making it a more robust choice than conventional methods and leading to improved object detection outcomes in complex underwater scenarios.
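The shifted window idea can be sketched as follows. The paper does not specify the window size or the shift, so the function name `window_grid`, the `window` parameter and the half-window shift used in the example are assumptions. The sketch only shows how a second, offset grid produces tiles that straddle the borders of the first grid, so an object cut by a tile border in one pass lies inside a tile in the other.

```python
def window_grid(height, width, window, shift=0):
    """Return (top, left, bottom, right) tiles that cover the image.

    With shift > 0 the grid origin moves, so tiles of the shifted pass
    straddle the tile borders of the unshifted pass. Assumes 0 <= shift < window.
    """
    # Start before the image edge when shifted so the border is still covered.
    start = shift - window if shift else 0
    tiles = []
    for top in range(start, height, window):
        for left in range(start, width, window):
            # Clip each tile to the image bounds.
            box = (max(top, 0), max(left, 0),
                   min(top + window, height), min(left + window, width))
            if box[2] > box[0] and box[3] > box[1]:
                tiles.append(box)
    return tiles

regular = window_grid(8, 8, window=4)           # 2 x 2 grid of 4 x 4 tiles
shifted = window_grid(8, 8, window=4, shift=2)  # grid offset by half a window
```

In this example the shifted pass contains the tile `(2, 2, 6, 6)`, which is centred on the corner where the four regular tiles meet.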
The relevance measure is often defined in terms of the distance between the salient features of adjacent superpixels, computed using histogram intersection. Shifted split-merge segmentation has several advantages over other methods. First, it can handle objects regardless of their aspect ratios and sizes, making it appropriate for images with complicated structures. Second, it produces consistent edges, which is significant for processes such as compression and face recognition. Third, it can cope with noisy images and compensate for differences in illumination and contrast. Nevertheless, the shifted split-merge segmentation model also has disadvantages: depending on the relevance measure and stopping criterion, it may over-segment the image or leave edges insufficiently refined.27
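A histogram-intersection distance between two regions can be sketched as below. The paper does not give the exact formula, so the number of bins, the use of intensity histograms and the helper names are assumptions. The sketch uses the common definition in which the intersection of two normalised histograms is the sum of their bin-wise minima.

```python
import numpy as np

def intensity_histogram(pixels, bins=8):
    """Normalised histogram of pixel intensities in [0, 1]."""
    hist, _ = np.histogram(pixels, bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 for identical histograms, 0 for disjoint ones."""
    return float(np.minimum(h1, h2).sum())

def relevance_distance(pixels_a, pixels_b, bins=8):
    """Distance between two regions: 0 when their histograms coincide."""
    return 1.0 - histogram_intersection(
        intensity_histogram(pixels_a, bins), intensity_histogram(pixels_b, bins))
```

Two adjacent superpixels with a small `relevance_distance` would be candidates for merging, and a large value suggests they belong to different objects.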
The workflow of the proposed segmentation is shown in Figure 1. Initially, the image is split into superpixels, and a label is assigned to each superpixel depending on its similarity to nearby ones. The region labels are refined in stages by recursively splitting and merging according to the relevance measure until the stopping criterion is reached. The final regions form a homogeneous segmentation of the image. The detailed workflow of split-merge segmentation is as follows:
Preprocessing: This step cleans up and normalizes the image to improve the segmentation results. It can include noise reduction, grading correction, and spatial transformation.
Superpixel Segmentation: The image is divided into superpixels, small patches of adjacent pixels that are similar to each other. This can be achieved using the Simple Linear Iterative Clustering (SLIC) algorithm, among others.
Initial Labeling: Each superpixel is assigned a label depending on its neighbourhood. This can be implemented with a relevance measure based on colour or texture features.
Iterative Splitting and Merging: The labels are refined by splitting and merging regions based on the relevance measure. The features of neighbouring superpixels are compared; if they are similar, the two neighbouring superpixels are merged.
Stopping Criterion: The splitting and merging stages are repeated until a stopping criterion is met. This could be based on the number of iterations, the percentage change in the labels, or the uniformity of pixels.
Post-processing: Finally, the segmentation is cleaned up. Small regions or holes are removed, and the boundaries are smoothed.
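import numpy as np
from skimage.segmentation import slic

def compute_feature(image, labels, label):
    # Mean colour of a region; texture or shape features could be used instead.
    return image[labels == label].mean(axis=0)

def compute_distance(f1, f2):
    return float(np.linalg.norm(f1 - f2))

def find_neighbors(labels, label):
    # Labels of all regions that share a 4-connected border with `label`.
    mask = labels == label
    border = np.zeros_like(mask)
    border[1:, :] |= mask[:-1, :]
    border[:-1, :] |= mask[1:, :]
    border[:, 1:] |= mask[:, :-1]
    border[:, :-1] |= mask[:, 1:]
    return np.unique(labels[border & ~mask])

def split_region(gray, labels, label, new_label):
    # Split a region in two around its mean intensity.
    mask = labels == label
    labels[mask & (gray > gray[mask].mean())] = new_label

def split_merge(image, max_iterations=10, min_region_size=50,
                split_threshold=0.1, merge_threshold=0.05):
    # `image` is an (H, W, 3) float array in [0, 1]; the thresholds are illustrative values.
    gray = image.mean(axis=-1)
    # Initial labelling: one label per SLIC superpixel.
    labels = slic(image, n_segments=1000, compactness=10, sigma=1, start_label=0)
    for _ in range(max_iterations):
        changed = False
        # Split step: divide regions that are large enough and not homogeneous.
        for l in np.unique(labels):
            region = gray[labels == l]
            if len(region) >= min_region_size and region.std() > split_threshold:
                split_region(gray, labels, l, labels.max() + 1)
                changed = True
        # Merge step: join each region with its most similar neighbour.
        for l in np.unique(labels):
            if not np.any(labels == l):
                continue  # already merged into a neighbour during this pass
            feature = compute_feature(image, labels, l)
            min_relevance, min_neighbor = merge_threshold, None
            for n in find_neighbors(labels, l):
                relevance = compute_distance(feature, compute_feature(image, labels, n))
                if relevance < min_relevance:
                    min_relevance, min_neighbor = relevance, n
            if min_neighbor is not None:
                labels[labels == l] = min_neighbor
                changed = True
        # Stopping criterion: no label changed in this pass.
        if not changed:
            break
    return labels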
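The stopping criterion named in the workflow (number of iterations or percentage change in the labels) can be sketched as a small helper. The default values of `max_iterations` and `tolerance` are assumptions, since the paper does not report the thresholds used.

```python
import numpy as np

def label_change_fraction(previous, current):
    """Fraction of pixels whose label differs between two iterations."""
    return float(np.mean(previous != current))

def should_stop(previous, current, iteration, max_iterations=50, tolerance=0.01):
    """Stop when the iteration budget is spent or the labels have settled."""
    if iteration >= max_iterations:
        return True
    return label_change_fraction(previous, current) < tolerance
```

Such a check would be called at the end of each split-and-merge pass with the label maps from before and after the pass.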
Generative Adversarial Networks (GANs) are a type of deep model for generative modelling. Generative modelling is an unsupervised learning technique that identifies the regularities in the given data so that the model can generate new examples that could plausibly have been drawn from the original dataset. GANs consist of two neural networks: the generator and the discriminator. The generator aims to produce novel instances, while the discriminator tries to classify examples as real or generated. Both models are trained together in a zero-sum game: the generator tries to deceive the discriminator, and the discriminator tries to correctly identify real and artificially generated samples. Generative models work by learning the distribution of the input variables and can be used to generate new cases from that distribution. GANs are generative models that learn to produce realistic new samples without labelled examples. Fuzzy logic is a mathematical framework for handling uncertainty and imprecision in decision-making. It expresses abstract ideas using linguistic variables and fuzzy rules, capturing a human-like way of reasoning. Integrating fuzzy logic into GANs provides a framework in which training and decision-making can be controlled, opening the door to more robust and human-explainable networks. Fuzzy reasoning about uncertainty can also be used to manage GAN training by adapting hyperparameters, such as the learning rate or network architecture, according to the progress of learning or the characteristics of the data. The FG-GAN is a deep-learning model whose hyperparameters and settings are tuned to improve its underwater object detection performance. The learning rate affects the stability of training, while batch size affects convergence speed and generalization. The number of epochs determines how often the entire training dataset is passed through the network, with higher counts improving learning but increasing the risk of overfitting. The design of fuzzy membership functions in the loss function is crucial for interpreting uncertainty in data, and the parameters defining these functions can significantly influence the model's handling of ambiguous underwater images. Regularization techniques like dropout or weight decay help prevent overfitting, while the choice of optimizer and its parameters affects convergence behaviour. Data augmentation techniques enhance the training dataset's diversity, aiding generalization to unseen underwater conditions. By carefully tuning these hyperparameters and settings, the FG-GAN can achieve improved performance in underwater object detection, addressing challenges like low visibility and light scattering and accurately classifying underwater objects in real-world applications. Finally, fuzzy logic applied as a post-processing step to the GAN output can provide interpretability and an estimate of the uncertainty level.30
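The paper does not give the formula of the fuzzy loss, so the following is only one possible reading, offered as a minimal sketch. It assumes that each generated sample carries an uncertainty score in [0, 1] (for example derived from low contrast), that a triangular membership function maps this score to a degree of membership in the fuzzy set "clear image", and that this degree weights the usual generator cross-entropy. The function names and the triangular shape are assumptions.

```python
import numpy as np

def triangular_membership(x, low, peak, high):
    """Degree in [0, 1] to which x belongs to a triangular fuzzy set."""
    x = np.asarray(x, dtype=float)
    rising = (x - low) / (peak - low)
    falling = (high - x) / (high - peak)
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

def fuzzy_generator_loss(disc_scores, uncertainty, eps=1e-7):
    """Membership-weighted cross-entropy for the generator (assumed form).

    disc_scores: discriminator outputs in (0, 1) for generated images.
    uncertainty: per-sample uncertainty in [0, 1].
    """
    scores = np.clip(np.asarray(disc_scores, dtype=float), eps, 1.0 - eps)
    # Membership in "clear image": 1 at uncertainty 0, falling to 0 at uncertainty 1.
    weight = triangular_membership(uncertainty, -1.0, 0.0, 1.0)
    # The generator is rewarded when the discriminator outputs values near 1.
    bce = -np.log(scores)
    return float(np.sum(weight * bce) / max(np.sum(weight), eps))
```

Under this reading, samples judged highly uncertain contribute less to the generator update, which is one way a loss can account for vagueness in the data.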

The working flow of Shifted split merge segmentation.
The FG-GAN classifier is flexible and reliable because it combines GAN-based generation of artificial data with fuzzy logic. It is trained on GAN-generated data samples, and the classification information learned from the GAN is passed to neural network or SVM classifiers. Because the decision boundaries can be delicate, a fuzzy logic classifier may be used in the final stage to filter out unwanted details or to attach confidence measures to the predicted classes. The FG-GAN classifier transfers easily to different domains, especially those with small or class-imbalanced datasets. It has several advantages over traditional classification methods. First, GANs create diverse, realistic-looking data that supplement existing datasets and improve the performance of the classification model. Second, the fuzzy logic integration offers interpretability and uncertainty estimation for the classification process, making the system more reliable and credible. Third, classifier accuracy is improved by handling class imbalance through fuzzy logic-based data augmentation and generation.31
The architecture of the Fuzzy-Guided Generative Adversarial Network (FG-GAN) classifier, shown in Figure 2, is designed to manage high-dimensional data effectively. The Generator network takes random noise vectors from a latent space as input and translates them into synthetic data that closely resemble the real data. It is followed by the discriminator, which distinguishes between real and generated data. Embedded in the architecture is a Fuzzy Logic Module that dynamically adjusts the fuzzy logic settings to accelerate the learning process. Here, membership functions, fuzzy rules, and inference mechanisms are applied to both the generator and the discriminator, and parameters are adaptively tuned based on the current state of the system. The FG-GAN loss consists of two parts, picture quality and detection accuracy, computed after each iteration of the FG-GAN classifier. The classifier's parameters are updated using optimization algorithms based on the computed losses. The FG-GAN architecture can handle higher-dimensional data, which leads to informed and flexible decision-making even in complex situations.38,39

An architecture of Fuzzy-Guided Generative Adversarial Network (FG-GAN) classifier.
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Flatten

def create_discriminator():
    # Scores a 28 x 28 x 1 image: close to 1 for real, close to 0 for generated.
    inputs = Input(shape=(28, 28, 1))
    x = Flatten()(inputs)
    x = Dense(512, activation='relu')(x)
    x = Dense(256, activation='relu')(x)
    outputs = Dense(1, activation='sigmoid')(x)
    return Model(inputs, outputs)

def create_gan(generator, discriminator):
    # Freeze the discriminator so that only the generator is updated through the stacked model.
    discriminator.trainable = False
    noise = Input(shape=(100,))
    return Model(noise, discriminator(generator(noise)))

class FuzzyLogicModule:
    def __init__(self):
        self.membership_functions = {}
        self.fuzzy_rules = {}
        self.inference_mechanisms = {}

    def add_membership_function(self, name, variable, function):
        # Store the membership degree of `variable` under `name`.
        self.membership_functions[name] = function(variable)

    def add_fuzzy_rule(self, name, antecedents, consequent):
        self.fuzzy_rules[name] = (antecedents, consequent)

    def add_inference_mechanism(self, name, function):
        self.inference_mechanisms[name] = function
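How fuzzy rules might adaptively tune a training parameter can be sketched with a two-rule example that scales the learning rate from the current generator loss. The rules, the breakpoints 0.3 and 1.0, and the scale factors are hypothetical and chosen only for illustration; the paper does not list the rules it uses. The inference is a weighted average of the rule outputs (zero-order Sugeno style).

```python
def ramp_down(x, a, b):
    """Membership of 'LOW': 1 below a, 0 above b, linear in between."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def ramp_up(x, a, b):
    """Membership of 'HIGH': complement of 'LOW' on the same breakpoints."""
    return 1.0 - ramp_down(x, a, b)

def adapt_learning_rate(base_lr, generator_loss):
    # Rule 1 (hypothetical): IF loss is LOW  THEN scale the rate by 0.5.
    # Rule 2 (hypothetical): IF loss is HIGH THEN scale the rate by 1.5.
    low = ramp_down(generator_loss, 0.3, 1.0)
    high = ramp_up(generator_loss, 0.3, 1.0)
    # Weighted average of the rule consequents by their firing strengths.
    scale = (low * 0.5 + high * 1.5) / (low + high)
    return base_lr * scale
```

Because the two memberships overlap, the learning rate changes smoothly between the two regimes instead of switching at a hard threshold.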
The structure of the FG-GAN classification model consists of several layers, as shown in Table 2. The first layer is a dense layer with 256 neurons and 25,856 parameters. It is followed by a reshape layer, which transforms the output of the dense layer into a 4-dimensional tensor of shape (None, 7, 7, 64), where None represents the batch size. Next is a conv2d_transpose layer with 64 filters, a kernel size of (5, 5) and a stride of (2, 2). This layer has 36,928 parameters and produces an output tensor of shape (None, 14, 14, 64). The last layer is another conv2d_transpose layer with 1 filter, (5, 5) kernels and a stride of (2, 2). It has 577 parameters and produces a tensor of shape (None, 28, 28, 1). The decoder network of the FG-GAN classifier consists of a flatten layer and three dense layers adopted from the discriminator model and is used to improve the quality of the images generated by the FG-GAN.40,41
Model summary of FG-GAN classifier.
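The parameter counts quoted for the layers can be checked with the standard formulas for dense and transposed-convolution layers with a bias term. The sketch below assumes a 100-dimensional latent input, taken from the `(100,)` input shape in the GAN code, and treats the kernel size as a free parameter. Under these assumptions the dense count of 25,856 is reproduced, and the counts of 36,928 and 577 correspond to 3 × 3 kernels, whereas 5 × 5 kernels would give 102,464 and 1,601.

```python
def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

def conv_transpose_params(kernel, channels_in, filters):
    """Square kernel weights for every input/output channel pair, plus biases."""
    return kernel * kernel * channels_in * filters + filters

def transpose_output_size(size, stride):
    """Spatial size after a transposed convolution with 'same' padding."""
    return size * stride

dense_count = dense_params(100, 256)              # 25856
conv1_3x3 = conv_transpose_params(3, 64, 64)      # 36928
conv2_3x3 = conv_transpose_params(3, 64, 1)       # 577
conv1_5x5 = conv_transpose_params(5, 64, 64)      # 102464
conv2_5x5 = conv_transpose_params(5, 64, 1)       # 1601
sizes = [7, transpose_output_size(7, 2), transpose_output_size(14, 2)]  # [7, 14, 28]
```

The spatial sizes 7, 14 and 28 match the output shapes listed for the reshape layer and the two transposed convolutions with stride (2, 2).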
The underwater image dataset used to train and test the detection system has several features that support clear and consistent results. It includes a variety of pictures taken in different underwater settings, each showing the difficulties of photographing underwater. The dataset contains more than 1500 images, with common sizes being 320 × 240 or 320 × 256 pixels, enabling thorough examination of objects below water. Every picture in the dataset is carefully labelled to classify various underwater objects, such as fish, reefs, aquatic plants, wrecks or ruins, human divers, robots, and the seabed. A colour-coded system assigns a specific colour to each group: black for the background, blue for human divers, green for seaweed, sky blue for wrecks or ruins, red for robots, pink for reefs or invertebrates, yellow for fish or vertebrates, and white for the seabed or rocks. This structured annotation scheme makes it easier to train the detection system by providing distinct labels for supervised learning.
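For supervised training, a colour-coded annotation mask is usually converted into an array of class indices. The sketch below shows one way to do this. The text names the colours but not their RGB values, so the values in `CLASS_COLOURS` are assumed to be the pure 8-bit colours matching those names; pixels with any other colour are marked as -1.

```python
import numpy as np

# Assumed RGB values for the colour names listed in the text.
CLASS_COLOURS = {
    "background": (0, 0, 0),                 # black
    "human diver": (0, 0, 255),              # blue
    "seaweed": (0, 255, 0),                  # green
    "wreck or ruin": (0, 255, 255),          # sky blue
    "robot": (255, 0, 0),                    # red
    "reef or invertebrate": (255, 0, 255),   # pink
    "fish or vertebrate": (255, 255, 0),     # yellow
    "seabed or rock": (255, 255, 255),       # white
}
CLASS_NAMES = list(CLASS_COLOURS)

def mask_to_class_ids(rgb_mask):
    """Convert an (H, W, 3) colour-coded mask into an (H, W) array of class indices."""
    ids = np.full(rgb_mask.shape[:2], -1, dtype=np.int64)
    for index, colour in enumerate(CLASS_COLOURS.values()):
        # A pixel belongs to the class when all three channels match its colour.
        ids[np.all(rgb_mask == np.array(colour), axis=-1)] = index
    return ids
```

The resulting index map can be one-hot encoded or passed directly to a loss that accepts integer class labels.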
Additionally, the dataset is designed to cover a broad spectrum of underwater situations, including changes in the light reaching underwater, water visibility, and the presence of marine litter or creatures close to the camera lens. This adds to the complexity of the images and mimics real-life situations that the detection system might face. Including a variety of conditions is essential for training robust models that generalize effectively to new data. The dataset is split into separate training and testing groups to improve reproducibility. A designated test set of 110 images is held back for evaluating the model's performance. This ensures the model is tested on data not seen during training, giving a trustworthy assessment of its generalization ability.
The held-out test set of 110 images is annotated in the same way. Each object category has its own colour code: black for the background, blue for human divers, green for seaweed, sky blue for wrecks or ruins, red for robots, pink for reefs or invertebrates, yellow for fish or vertebrates, and white for the sea floor or rocks. Because the segmentation models are specific to this domain, their weak point is a lack of generality in the segmentation process.42
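The colour-coded annotation scheme can be converted into integer class labels before supervised training. The following is a minimal sketch, assuming the masks use pure RGB encodings of the eight named colours; the exact RGB triples, the class names and the function `mask_to_class_ids` are illustrative assumptions, not values taken from the dataset.

```python
import numpy as np

# Assumed pure-RGB encodings for the eight colours named in the text.
PALETTE = {
    "background": (0, 0, 0),                # black
    "human_diver": (0, 0, 255),             # blue
    "seaweed": (0, 255, 0),                 # green
    "wreck_or_ruin": (0, 255, 255),         # sky blue
    "robot": (255, 0, 0),                   # red
    "reef_or_invertebrate": (255, 0, 255),  # pink
    "fish_or_vertebrate": (255, 255, 0),    # yellow
    "seabed_or_rock": (255, 255, 255),      # white
}
CLASS_NAMES = list(PALETTE)


def mask_to_class_ids(mask_rgb):
    """Convert an (H, W, 3) colour mask to an (H, W) array of class indices.

    Pixels that match no palette colour are marked with -1.
    """
    ids = np.full(mask_rgb.shape[:2], -1, dtype=np.int64)
    for idx, name in enumerate(CLASS_NAMES):
        match = np.all(mask_rgb == np.array(PALETTE[name]), axis=-1)
        ids[match] = idx
    return ids
```

Marking unmatched pixels with -1 makes anti-aliased or corrupted mask pixels easy to detect and exclude from the loss.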
The underwater images in Figure 3 are samples captured in varying aquatic environments. The scenes illustrate several difficulties for detection systems: changing water transparency, varying lighting conditions, and marine animals or debris near the lens. To build an effective underwater image acquisition and compilation system, these challenges must be addressed and their effects examined. The images show these difficulties across different scenes, underlining the need for innovation in object segmentation and classification tasks.42

A sample of underwater images.
Figure 4 shows segmented image samples produced by the shifted split-merge technique. The method first splits an image into smaller segments using pixel and local texture features and then merges adjacent similar regions into larger blocks. Segmentation partitions the image into distinct regions based on their heterogeneity, further refined by structural features such as edges. The technique is particularly useful when the objects to be segmented differ in texture, colour, or shape, because it captures both fine detail and the overall large-scale structure.42
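The split-then-merge idea can be illustrated with a basic quadtree sketch on a greyscale image. This is a simplified illustration under stated assumptions and not the authors' shifted-window variant: the homogeneity test (intensity range below a threshold), the merge test (difference of block means) and all function names are hypothetical choices made for the example.

```python
import numpy as np


def split_blocks(img, thresh=10.0, min_size=2):
    """Quadtree split: return (x, y, w, h) blocks whose intensity range <= thresh."""
    min_size = max(1, min_size)
    blocks = []
    stack = [(0, 0, img.shape[1], img.shape[0])]
    while stack:
        x, y, w, h = stack.pop()
        block = img[y:y + h, x:x + w]
        homogeneous = float(block.max()) - float(block.min()) <= thresh
        if homogeneous or w <= min_size or h <= min_size:
            blocks.append((x, y, w, h))
            continue
        hw, hh = w // 2, h // 2
        stack += [(x, y, hw, hh), (x + hw, y, w - hw, hh),
                  (x, y + hh, hw, h - hh), (x + hw, y + hh, w - hw, h - hh)]
    return blocks


def merge_blocks(img, blocks, thresh=10.0):
    """Merge adjacent blocks with similar mean intensity; return a label map."""
    labels = np.empty(img.shape, dtype=np.int64)
    means = []
    for i, (x, y, w, h) in enumerate(blocks):
        labels[y:y + h, x:x + w] = i
        means.append(float(img[y:y + h, x:x + w].mean()))

    parent = list(range(len(blocks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Collect pairs of different labels that touch horizontally or vertically.
    pairs = set()
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        diff = a != b
        pairs.update(zip(a[diff].tolist(), b[diff].tolist()))

    # Simplification: similarity uses the original block means, not updated region means.
    for i, j in pairs:
        if abs(means[i] - means[j]) <= thresh:
            parent[find(i)] = find(j)

    roots = np.array([find(i) for i in range(len(blocks))])
    return roots[labels]


def split_merge(img, split_thresh=10.0, merge_thresh=10.0, min_size=2):
    """Run the split phase followed by the merge phase."""
    return merge_blocks(img, split_blocks(img, split_thresh, min_size), merge_thresh)
```

For example, an 8 × 8 image whose left half is dark and right half is bright is first split into four quadrants, which the merge phase then joins into two regions.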

Samples of segmented images after Shifted split merge segmentation.
Figure 5 illustrates the bounding boxes indicating the types of underwater objects in the image. The boxes narrow down the areas of interest, allowing objects to be identified and located within the underwater scene. The figure shows the regions occupied by the depicted objects and their relative positions by enclosing them in bounding boxes.
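One common way to obtain a bounding box from a segmented region is to take the extremes of its pixel coordinates. The sketch below assumes a binary mask per object and inclusive pixel coordinates; the function name `mask_to_bbox` is hypothetical and the procedure is not claimed to be the one used to produce Figure 5.

```python
import numpy as np


def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None if empty.

    Coordinates are inclusive pixel indices.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```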

Bounding boxes of an underwater object.
Table 3 shows the parameter settings used for the FG-GAN in fuzzy-guided image classification. The learning rate was fixed at 0.001, and a batch size of 64 was used for training. Training ran for 100 epochs with a stochastic gradient descent (SGD) optimizer. The generator receives an input vector of dimension 100, and the discriminator consists of 2 hidden layers of 512 and 256 units, each with the ReLU activation function. Binary cross-entropy is the chosen loss function, and a dropout rate of 0.3 was used to prevent overfitting.
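The settings in Table 3 can be collected into a small configuration, together with a forward pass of a discriminator with two hidden ReLU layers and a binary cross-entropy loss. This is a sketch under stated assumptions rather than the authors' code: it reads the learning rate as 0.001 and the dropout rate as 0.3, and the flattened 784-dimensional input, the sigmoid output unit, the weight initialisation and all function names are hypothetical.

```python
import numpy as np

CONFIG = {
    "learning_rate": 0.001,      # read as a positive step size
    "batch_size": 64,
    "epochs": 100,
    "latent_dim": 100,           # generator input dimension
    "hidden_units": (512, 256),  # discriminator hidden layers
    "dropout": 0.3,              # read as a rate of 0.3
}


def init_discriminator(input_dim, rng):
    """Create (weight, bias) pairs for input -> 512 -> 256 -> 1."""
    sizes = (input_dim, *CONFIG["hidden_units"], 1)
    return [(rng.normal(0.0, 0.02, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def discriminator_forward(x, params, rng=None):
    """Return the probability that each row of x is real.

    Passing rng enables inverted dropout (training mode).
    """
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU
        if rng is not None:
            keep = rng.random(h.shape) >= CONFIG["dropout"]
            h = h * keep / (1.0 - CONFIG["dropout"])
    w, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))  # sigmoid (assumed output)


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy with clipping for numerical stability."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1.0 - y_true) * np.log(1.0 - y_pred)))
```

A plain SGD update with these settings would then subtract `CONFIG["learning_rate"]` times the gradient from each weight after every batch of 64 samples.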
Parameter settings of FG-GAN.
Figure 6 shows the FG-GAN training box loss over the epochs. The box loss measures the discrepancy between the predicted bounding-box location and the ground-truth box. A drop in the box loss indicates that the model is improving, especially in locating objects accurately within the images. Figure 7 depicts the FG-GAN class loss, which quantifies the dissimilarity between the predicted class probabilities and the true class labels. A downward trend means that the model assigns objects to the correct category with increasing accuracy.

FG-GAN Train box loss.

FG-GAN Train class loss.
Table 4 compares the proposed Shifted split merge segmentation with other segmentation methods: Region Growing, Watershed Segmentation, Graph-based Segmentation, and the Active Contour Model. The evaluation metrics are accuracy, loss, precision, and recall. The Shifted split merge segmentation method achieves an accuracy of 0.85, a loss of 0.72, a precision of 0.89 and a recall of 0.78, which makes it an effective choice compared with the other methods on these metrics.
Comparative analysis of proposed Shifted split merge segmentation with other models.
Table 5 compares the proposed FG-GAN model with other deep learning networks: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and traditional Generative Adversarial Networks (GAN). Each row of the table represents one deep-learning method, and the performance metrics are given in the columns. Beginning with accuracy, the fraction of instances classified correctly, FG-GAN obtains 0.92, which indicates high effectiveness in classifying the images. LSTM also shows similarly strong performance with a precision of 0.92, while CNN, RNN and GAN perform slightly worse than LSTM with precisions of 0.85, 0.78 and 0.87, respectively. On the loss metric, which measures the deviation between predicted and actual values, FG-GAN performs best with the lowest loss of 0.30. LSTM follows with a loss of 0.53, while CNN, RNN and GAN are further behind with loss values of 0.72, 0.65 and 0.58, respectively. Precision and recall evaluate how reliable the model's positive predictions are and how many of the actual positive cases it identifies. The precision of FG-GAN is 0.89, meaning that 89% of its positive predictions are correct. Its recall is 0.85, indicating that it identifies 85% of the actual positive instances. Taken together, these values indicate that FG-GAN is a more robust and efficient tool for classification tasks than the other deep learning techniques compared.
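The metrics discussed above follow from the counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). The short sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not taken from Table 5.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # correct share of predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # found share of actual positives
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=85, fp=10, fn=15, tn=90)
print(metrics)
```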
Comparative analysis of proposed FG-GAN with other models.
Table 6 reports the accuracy and loss values of the proposed FG-GAN across epochs. Each row indicates an epoch number, and the columns show the accuracy and loss obtained by the model at that epoch. At Epoch 1, the FG-GAN starts with an accuracy of 0.60 and a loss of 0.90, reflecting that the model is at the very beginning of training. Subsequently, the accuracy and loss show clear trends towards better values. By Epoch 4, the accuracy has risen to 0.85 and the loss has fallen to 0.68, showing that the model is making more accurate predictions while reducing costly mistakes. As training advances, FG-GAN continues to improve. The model reaches its highest accuracy of 0.925 by Epoch 9, from which we can deduce that it has learned to categorize data instances correctly with a higher degree of certainty. Likewise, the loss reaches its lowest value of 0.30 by Epoch 9, the fewest errors among the listed epochs.
Accuracy and loss of proposed FG-GAN for various epochs.
Table 7 reports the performance of FG-GAN on different cross-validation folds. Each row gives the fold number, and the columns give the accuracy, loss, precision, and recall that FG-GAN achieved for that fold. The accuracy values lie between 0.87 and 0.91 across all folds, which implies that FG-GAN classifies the test instances correctly with good consistency. The loss values vary from 0.09 to 0.12 and remain low across folds, reflecting the error reduction achieved through backpropagation and weight updates during training. In addition, the precision and recall, 0.85 to 0.89 and 0.88 to 0.94 respectively, show that FG-GAN makes reliable positive predictions and retrieves most of the actual positive samples in the dataset.
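Cross-validation of this kind partitions the samples into k folds and uses each fold once for validation. The sketch below shows one simple way to generate the index splits; the number of folds, the sample count and the function name `k_fold_indices` are assumptions, since the fold configuration is not stated in the text.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)   # shuffle once
    folds = np.array_split(order, k)     # k nearly equal parts
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


# Hypothetical example: 100 samples, 5 folds.
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(100, 5), start=1):
    print(fold, len(train_idx), len(val_idx))
```

Reporting the per-fold metrics, as in Table 7, makes it possible to see how much the results vary with the choice of validation subset.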
FG-GAN Performance for various cross-validation folds.
Figure 8 shows the FG-GAN training loss curve over the training iterations. It shows how the loss function, an indicator of model performance, evolves during training. The curve makes it possible to monitor FG-GAN convergence, with a downward slope reflecting better outcomes. Figure 9 shows the precision of FG-GAN, that is, the fraction of correct positive predictions among all predicted positives. A precision closer to 1 means the model raises fewer false alarms, which means it is more reliable when it makes a positive decision.

Overall FG-GAN Train loss.

FG-GAN precision metrics.
Figure 10 shows the recall of FG-GAN, that is, the model's ability to retrieve the instances of the relevant target class among all true positives and false negatives. A high recall implies that the model finds a larger share of the actual positives, which suits cases where positive instances must be detected comprehensively. Figure 11 depicts the validation box loss of FG-GAN, that is, the box loss computed on validation data during the training phase. The plot shows how the model's localization error on held-out data evolves as training proceeds.

FG-GAN recall metrics.

FG-GAN validation box loss.
As illustrated by Figure 12, FG-GAN has a relatively low validation class loss, showing that the model can efficiently distinguish objects during validation. The validation class loss lies between 0.1 and 0.5; it shows how well the model distinguishes between the classes of the dataset, with lower losses corresponding to better classification. Figure 13 is a combined plot of the FG-GAN validation loss, covering both the localization and classification tasks performed during validation. The values range from 0.2 to 0.8; the plot summarises the model's performance on data not used for training, where lower loss values indicate better generalization.

FG-GAN validation class loss.

Overall FG-GAN validation loss.
Figure 14 shows the accuracy of the proposed FG-GAN model at various epochs and stages of training. The accuracy ranges from 0.6 to 0.95, showing how the share of correctly classified objects grows during training. Higher accuracy values mean that the model is mostly correct, while lower values indicate more misclassifications.

Accuracy of Proposed FG-GAN.
Figure 15 visualizes the loss of the proposed FG-GAN model during training. The loss measures the dissimilarity between the model's predicted outputs and the ground-truth labels. A lower loss value indicates predictions in closer agreement with the labels. The loss values in this plot fall from 0.9 to 0.3, reflecting the gradual decrease in errors as the model is trained over successive epochs.

Loss of Proposed FG-GAN.
Table 8 compares FG-GAN with other current, well-known algorithms on the underwater garbage detection datasets. FG-GAN attains a higher precision (87.3%) and recall (78.9%); the higher precision means fewer false positives, and the higher recall means that more of the true objects are found. The F1 score of 82.8% indicates a balance between precision and recall. In addition, the proposed approach yields 85.2% at an IoU threshold of 0.5 and a competitive 30.1% over IoU thresholds from 0.5 to 0.95. FG-GAN is also fast, taking only 3.2 milliseconds for data processing, which compares favourably with the other models given its low computation cost.
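The IoU thresholds mentioned above refer to the intersection over union between a predicted box and a ground-truth box: a detection counts as correct at a given threshold only if its IoU reaches that value. The sketch below shows the standard computation; the (x_min, y_min, x_max, y_max) box format and the 0.05 step between 0.5 and 0.95 are assumptions based on common detection benchmarks, not details stated in the text.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed threshold grid: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [0.5 + 0.05 * i for i in range(10)]


def matches_at(pred_box, true_box):
    """Number of thresholds at which the prediction counts as a match."""
    score = iou(pred_box, true_box)
    return sum(score >= t for t in THRESHOLDS)
```

Averaging over the stricter thresholds penalizes loosely fitting boxes, which is why scores over the 0.5 to 0.95 range are typically much lower than scores at 0.5 alone.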
FG-GAN on the underwater garbage detection datasets.
Table 9 reports the metrics of mainstream target detection algorithms on the URPC dataset. FG-GAN provides high precision (88.2%) and recall (85.1%), which leads to a good F1 score (86.7%). The approach achieves some of the highest mean average precision scores at both IoU 0.5 (85.3%) and IoU 0.5–0.95 (50.2%), surpassing other algorithms in the balance between recall and precision. FG-GAN runs in real time, processing data in 9.1 ms. This low computational footprint puts FG-GAN ahead of alternative models with higher overhead.
Performance metric values for mainstream target detection algorithms on the URPC dataset.
The experiments demonstrated that the proposed method outperformed traditional image partitioning and object detection techniques. Key results included enhanced accuracy in identifying underwater objects, with precision rates reaching up to 95%. Additionally, the proposed approach exhibited lower loss rates, maintaining values around 0.3. Integrating shifted split-merge segmentation and FG-GAN classifiers improved performance in challenging conditions, effectively addressing issues related to low visibility and environmental distortions. Overall, the findings highlighted the effectiveness of the new methodology in achieving superior results in underwater object detection tasks.
Marine object localization from UAVs is required in different areas, including marine research, environment monitoring and topographic surveys. This study uses the Shifted split merge segmentation method to separate underwater objects from their complex aquatic surroundings. The evaluation presented here shows Shifted split merge segmentation to be a high-performance underwater object segmentation algorithm, achieving high precision and recall rates on the datasets considered. Such a method supports the detection and analysis of underwater objects, and UAVs are capable of navigating through water to collect useful data. In addition, FG-GAN classifiers were integrated into the object detection pipeline to boost classification precision and accuracy in underwater image processing. By using fuzzy logic principles to account for the uncertainty within underwater images, FG-GAN classifiers can classify better than other classifiers. As a result, the FG-GAN classifier shows excellent performance in terms of precision, recall, and overall accuracy in identifying sea objects, even in extreme cases when only one representation is present. Applying FG-GAN classifiers to UAV-based underwater object detection systems will improve their abilities, letting them perform the object recognition and classification tasks required in real-life underwater situations.
In summary, the integration of Shifted split merge segmentation and FG-GAN classifiers provides an effective approach for underwater object detection through UAVs. Shifted split merge segmentation is a sophisticated method for segmenting objects from imagery with complex backgrounds, while the FG-GAN classifiers are designed to classify objects accurately in difficult underwater conditions. The proposed image analysis and pattern recognition techniques achieve high accuracy, with up to 95% precision and a loss of no more than 0.3, while reducing overhead and computational costs.
Footnotes
Authors’ contributions
K. Selva Sheela, S. Vinoth Kumar, and Saman M. Almufti were responsible for designing the framework, analyzing performance, validating the results, and writing the article. R. Lakshmana Kumar collected the information required for the framework, provided software, conducted critical reviews, and administered the process.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
No datasets were generated or analyzed during the current study.
