Abstract
There are many kinds of marine organisms, and their biological forms differ greatly, so the accuracy of manual species identification is difficult to guarantee, which poses great challenges for marine species identification. In this paper, we propose a method for marine biological image classification using a residual neural network, redefining the convolutional layer and using batch normalization to avoid disordered gradient parameters. The bottleneck layer is realized with residual connections, and the residual network ResNet50 is constructed using transfer learning. Classification training was conducted on a dataset of 19 common marine animals, and the experimental results showed that the recognition accuracy of ResNet50 reached about 90%. Compared with the traditional convolutional neural network VGG19, ResNet50 showed better recognition efficiency, verifying the effectiveness of the marine animal classification and recognition model proposed in this paper.
Introduction
China has a vast ocean area and a wide variety of marine creatures. In recent years, the marine environment has faced growing threats, and more information about marine life is needed to protect marine creatures and to support biodiversity management. Manual identification of marine species is difficult and inefficient, and its accuracy is hard to guarantee, which poses a great challenge for marine species identification. With the development of artificial intelligence, classification models have become widely used in image applications; they can identify multiple categories and operate in different scenarios. Applying artificial intelligence to marine creature identification can therefore help protect the stability of the marine ecological environment.
Many classification algorithms have emerged in recent years, involving gradient histograms, support vector machines, principal component analysis, neural networks, etc. In 2012, AlexNet [8] won the ImageNet image recognition competition, and the convolutional neural network (CNN) became one of the main algorithms for image classification. CNNs learn to extract features from images during training, which improves recognition accuracy, but deep networks can suffer from degradation and excessive training time. Reference [9] proposed the ResNet model with residual blocks to shorten training time and improve recognition accuracy. References [2], [3], and [7] apply ResNet-based convolutional neural networks to garbage classification, fiber classification, and fluid motion problems, respectively, demonstrating the feasibility of ResNet in image classification tasks. In addition, neural networks are prone to disordered gradient parameters and fitting deviation during training.
In this paper, we use deep learning to conduct identification experiments on 19 species of marine animals. To address the long training time and accuracy degradation of deep neural networks, we construct the residual network ResNet50. Normalization is applied to both the model and the dataset to prevent disordered gradient parameters and large fitting deviations. We also train a VGG19 model for comparison; the experimental results show that ResNet50 recognizes marine species considerably better than VGG19, which verifies that the residual network outperforms the traditional convolutional neural network and suggests that it can replace manual identification of marine species. In reference [6], three neural networks, EfficientNetB0, DenseNet and MobileNet V2, were used to classify 20 marine species, and the final recognition accuracy was 91.25%. For comparison, the ResNet50 model in this paper reaches about 94% accuracy on the training set and about 89% on the test set. References [11, 12] classify different shellfish species, which is also a study of marine organism recognition. The main contribution of this paper is to provide a theoretical basis for building a large-scale marine organism recognition system, using residual networks to reduce the training time and the number of trainable parameters, which greatly reduces the computational cost of the neural network.
Section 2 introduces the techniques related to the implementation: residual networks, batch normalization, and neural network learning. Section 3 describes how the model is constructed, including the redefined convolutional layer, the bottleneck layer, and the ResNet50 model. Section 4 presents the experiments and results, introducing the dataset and model evaluation metrics, validating the effectiveness of the ResNet50 model, and comparing it with the VGG19 model. Section 5 summarizes and analyzes the method proposed in this paper.
Related technologies
Residual network
ResNet is a generic term for neural networks that use residual learning to solve the degradation problem that arises when traditional neural networks have too many hidden layers. The core of the residual network is the residual block, and there are four residual blocks in total. Each block consists of multiple convolutional layers combined with an identity mapping across those layers.
First, consider a local part of the neural network and suppose that its input is
Traditional neural network (left) and residual blocks (right).
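The residual idea described above can be illustrated with a minimal sketch, which is not the paper's implementation: the output of a block is the sum of a learned residual branch and the unchanged input. The names `residual_block`, `zero_residual` and `halve` are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return residual_fn(x) + x, the output of a block with a shortcut."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("residual branch must keep the input shape")
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x: Vector) -> Vector:
    """A residual branch that contributes nothing (all zeros)."""
    return [0.0 for _ in x]


def halve(x: Vector) -> Vector:
    """A toy residual branch standing in for stacked convolutions."""
    return [0.5 * xi for xi in x]


identity_out = residual_block([1.0, -2.0, 3.0], zero_residual)
scaled_out = residual_block([1.0, -2.0, 3.0], halve)
```

When the residual branch outputs zeros, the block reduces to an identity mapping, which is why stacking such blocks need not degrade a deeper network.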
The neural network model of the ResNet architecture is generally divided into six phases: the first phase has a convolutional layer and a pooling layer; the second to fifth phases are residual blocks; and the sixth phase is the classifier, which contains a pooling layer, a Flatten layer and a fully connected layer that outputs the result. Common ResNet variants have 18, 34 or 50 layers, among others, and the number of convolutional layers, the filter size and the number of output channels within the residual blocks vary between them. In this paper, we choose ResNet50 and build the model manually in the PyTorch framework, modifying the parameters within the residual blocks.
Training deep neural networks is difficult because the distribution of each layer's inputs changes during training, which makes it especially hard to train models with saturating nonlinearities. Reference [1] proposes Batch Normalization to address this. Batch normalization is commonly used in deep learning to prevent disordered gradients and uneven data distributions. It applies a normalization followed by a linear transformation to the input or intermediate layers of a neural network, so that the normalized output of each layer has approximately zero mean and unit variance, thus avoiding skewed parameter distributions. Since normalizing over the entire dataset would increase memory pressure and training time, the normalization is computed over mini-batches.
In the forward calculation of the neural network, for each batch dataset, assuming that there is a batch
where
In the case of ignoring
where
Batch normalization has several advantages. The distribution of the input data for each hidden layer is relatively stable, which avoids the two extremes of vanishing and exploding gradients and accelerates model training. The model becomes less sensitive to its parameters, so training tends to be stable. Finally, because the normalization is computed on a batch, it introduces some noise, which improves the generalization ability of the model.
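The batch normalization step discussed above can be sketched as follows. This follows the commonly used formulation (normalize with the batch mean and variance, then scale and shift); the symbols `gamma`, `beta` and `eps` and the function name `batch_norm` are assumptions, since the paper's own equations are not available here.

```python
import math
from typing import List


def batch_norm(batch: List[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Normalize one feature over a mini-batch, then scale and shift."""
    m = len(batch)
    mean = sum(batch) / m
    var = sum((x - mean) ** 2 for x in batch) / m  # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)
out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)
```

With `gamma = 1` and `beta = 0`, the normalized values have a mean of about 0 and a variance slightly below 1 (because of `eps`); the learnable scale and shift let the network undo the normalization where that helps.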
Any neural network model requires continuous learning to arrive at well-performing model parameters, so neural networks need a method for optimization.
Backpropagation is an algorithm for computing gradients in neural networks. It applies the chain rule while traversing the network in reverse, from the output layer to the input layer. Suppose there are functions
where the prod operator indicates that the parameters are multiplied after performing the necessary operation.
VGG block (left) and VGG model (right).
Based on this backpropagation algorithm, the gradients of all weight parameters of the neural network model can be computed, and the parameters of the whole model can be optimized by the stochastic gradient descent algorithm. Equation (6) gives the stochastic gradient descent update.
where
The residual block is the core idea, allowing the neural network to compute efficiently. Batch normalization plays an auxiliary role, preventing disordered gradient parameters. Backpropagation and stochastic gradient descent are used for neural network learning. Combining these methods with convolution yields the model needed for the experiments in this paper.
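The backpropagation and stochastic gradient descent steps described above can be illustrated on a toy problem. This is a minimal sketch under stated assumptions: a one-parameter model `pred = w * x` with squared-error loss, hypothetical names `gradient` and `sgd`, and synthetic data, none of which come from the paper.

```python
import random

# Toy data generated by y = 3 * x; the model must recover w = 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]


def gradient(w: float, x: float, y: float) -> float:
    """Chain rule: dL/dw = dL/dpred * dpred/dw for L = (w*x - y)**2."""
    pred = w * x
    dloss_dpred = 2.0 * (pred - y)
    dpred_dw = x
    return dloss_dpred * dpred_dw


def sgd(w: float, lr: float, steps: int, seed: int = 0) -> float:
    """Run stochastic gradient descent, one random sample per step."""
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = rng.choice(data)
        w -= lr * gradient(w, x, y)  # w <- w - lr * gradient
    return w


w_final = sgd(w=0.0, lr=0.05, steps=200)
```

Each step moves the weight against the gradient of the loss on one sample, scaled by the learning rate; with this small learning rate the weight converges to 3 regardless of the sampling order.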
Traditional CNNs
The reference [5] proposes a block-shaped neural network with multiple 3
The VGG network can be divided into two parts, a convolutional part and a classification part. The convolutional part consists of several blocks and a mean pooling layer. The classification part has three fully connected layers: the first two reduce the input features from 25088 to 4096, and the last one maps the output to the required number of classes (19 in this paper). The left diagram in Fig. 2 shows the basic structure of a block in VGG Net, where the number of convolutional layers in the block can be customized as needed; the right diagram shows the overall architecture of VGG, with the content of the blocks omitted.
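As a worked check of the sizes above, the following sketch counts the parameters of the three fully connected layers. It assumes that the 25088 input features come from a 512-channel 7×7 feature map, as in the standard VGG design for 224×224 inputs, and that each layer has a bias; `linear_params` is a hypothetical helper.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


num_classes = 19          # number of marine species in the paper
flattened = 512 * 7 * 7   # assumed feature-map shape; equals 25088
layers = [(flattened, 4096), (4096, 4096), (4096, num_classes)]
classifier_params = sum(linear_params(i, o) for i, o in layers)
```

Under these assumptions the classifier alone holds roughly 120 million parameters, which indicates why VGG-style networks are so much larger than ResNet50.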
Redefining the convolutional layer
A standard convolutional layer performs only the convolution, usually followed by a ReLU activation; without normalization, this is not conducive to training deep networks and can cause disordered gradient parameters.
To solve these problems, this paper redefines the convolutional layer so that batch normalization is applied to the values output by the convolution. Depending on an input Boolean value, the layer then either applies ReLU or keeps the output unchanged. The redefined convolution is reused in the definition of the bottleneck layer later, which avoids defining the convolution repeatedly and reduces the amount of code.
In addition, the padding is set to the kernel size divided by two, rounded down. With a stride of 1 and an odd kernel size, this padding keeps the output of the convolutional layer the same size as its input, so the convolved feature map does not shrink and become overly abstract.
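A redefined convolutional layer of this kind might look like the following PyTorch sketch. The class name `ConvBN`, the `activate` flag, and the choice of `bias=False` are assumptions for illustration, not the paper's code.

```python
import torch
from torch import nn


class ConvBN(nn.Module):
    """Convolution followed by batch normalization and an optional ReLU."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int,
                 stride: int = 1, activate: bool = True) -> None:
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=kernel_size // 2,
                              bias=False)  # assumed: bias is redundant before BatchNorm
        self.bn = nn.BatchNorm2d(out_channels)
        # The Boolean flag selects ReLU or leaves the normalized output unchanged.
        self.act = nn.ReLU() if activate else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.conv(x)))


x = torch.randn(2, 3, 32, 32)
same_size = ConvBN(3, 8, kernel_size=3)(x)              # stride 1 keeps 32x32
downsampled = ConvBN(3, 8, kernel_size=7, stride=2)(x)  # stride 2 halves it
```

With `padding = kernel_size // 2`, a stride of 1 preserves the spatial size for odd kernels, while a stride of 2 halves it.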
A convolution with padding.
The bottleneck layer, which is composed of two 1
In addition to realizing direct connections, the step of 3
The computation used for the direct (shortcut) connection is determined by whether the input and output channels are equal: if they differ, the shortcut uses a convolution to match the channels; otherwise it passes the input through unchanged. In the forward propagation function, the residual effect is achieved by adding the result of the three-layer convolution block to the result of the shortcut connection. This residual computation gives the model the ability to learn an identity mapping, so that stacking network layers does not cause the gradient to explode or vanish.
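A bottleneck layer along these lines can be sketched in PyTorch as follows. This is an illustration under assumptions, not the paper's code: the helper `conv_bn`, the class `Bottleneck`, the channel sizes, and the placement of the stride on the 3×3 convolution are all hypothetical. The shortcut uses a 1×1 convolution only when the input and output shapes differ, because the element-wise addition requires matching shapes.

```python
import torch
from torch import nn


def conv_bn(cin: int, cout: int, k: int, stride: int = 1,
            relu: bool = True) -> nn.Sequential:
    """Convolution + batch normalization, optionally followed by ReLU."""
    layers = [nn.Conv2d(cin, cout, k, stride=stride, padding=k // 2, bias=False),
              nn.BatchNorm2d(cout)]
    if relu:
        layers.append(nn.ReLU())
    return nn.Sequential(*layers)


class Bottleneck(nn.Module):
    def __init__(self, cin: int, cmid: int, cout: int, stride: int = 1) -> None:
        super().__init__()
        self.body = nn.Sequential(
            conv_bn(cin, cmid, 1),                  # 1x1: reduce channels
            conv_bn(cmid, cmid, 3, stride=stride),  # 3x3: spatial features
            conv_bn(cmid, cout, 1, relu=False),     # 1x1: restore channels
        )
        if cin == cout and stride == 1:
            self.shortcut = nn.Identity()           # shapes match: pass through
        else:
            self.shortcut = conv_bn(cin, cout, 1, stride=stride, relu=False)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual addition: three-layer branch plus shortcut.
        return self.relu(self.body(x) + self.shortcut(x))


x = torch.randn(2, 64, 16, 16)
widened = Bottleneck(64, 64, 256)(x)                # projection shortcut
kept = Bottleneck(256, 64, 256)(widened)            # identity shortcut
halved = Bottleneck(256, 128, 512, stride=2)(kept)  # downsampling block
```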
Model implementation
The ResNet constructed in this paper is divided into 3 parts, a downsampling part, a residual part, and a classification part. The downsampling part contains a 7
Next is the residual part, which defines a function for constructing bottleneck layers by cyclically generating multiple convolutional layers and combining them into. There are four residual blocks in the ResNet50 model, having 3, 4, 6 and 3 bottleneck layers respectively. By stacking such bottleneck layers, the model extracts more abstract features in the deeper network layers, while the 1
Finally, there is the classification part. Since the model is trained to classify 19 marine species, it must output a vector of length 19. The end of the model therefore needs a classifier with a Flatten layer, which converts the four-dimensional tensor into one feature vector per image, and a fully connected layer with 19 outputs activated by Softmax.
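The name ResNet50 can be checked against the structure described above with a short calculation. The sketch assumes one convolution in the downsampling part and three convolutions per bottleneck layer, and it follows the usual convention of not counting shortcut convolutions or pooling layers.

```python
blocks_per_stage = [3, 4, 6, 3]  # bottleneck layers in the four residual blocks
convs_per_bottleneck = 3         # the three-layer convolution block
stem_convs = 1                   # assumed: one convolution in the downsampling part
fc_layers = 1                    # the final fully connected layer

total_bottlenecks = sum(blocks_per_stage)
weighted_layers = stem_convs + convs_per_bottleneck * total_bottlenecks + fc_layers
```

Sixteen bottleneck layers with three convolutions each give 48 layers; adding the first convolution and the fully connected layer yields the 50 layers in the model's name.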
Model parameters
Schematic diagram of the bottleneck layer.
Transfer learning [15] is a common training method for deep learning that refers to a pre-trained model being reused in another task. The accurate definition is as follows. Given the source domain
ResNet structure diagram.
The pseudo code of the residual network training algorithm is as follows.
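A generic training loop of the kind this section refers to might look like the sketch below. It is an assumption-laden stand-in rather than the paper's algorithm: a small linear model and synthetic tensors replace ResNet50 and the image dataset, and the learning rate, batch size and epoch count are placeholder values.

```python
import torch
from torch import nn

torch.manual_seed(0)

num_classes = 19
# Stand-in model and synthetic data; the paper trains ResNet50 on real images.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, num_classes))
images = torch.randn(64, 3, 8, 8)
labels = torch.randint(0, num_classes, (64,))

criterion = nn.CrossEntropyLoss()  # applies log-softmax internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

history = []
for epoch in range(5):
    model.train()
    epoch_loss, correct = 0.0, 0
    for start in range(0, len(images), 32):  # mini-batches of 32
        x, y = images[start:start + 32], labels[start:start + 32]
        optimizer.zero_grad()
        logits = model(x)                    # forward pass
        loss = criterion(logits, y)
        loss.backward()                      # backpropagation
        optimizer.step()                     # gradient descent update
        epoch_loss += loss.item() * len(x)
        correct += (logits.argmax(dim=1) == y).sum().item()
    # Store the mean loss and accuracy of this epoch for plotting later.
    history.append((epoch_loss / len(images), correct / len(images)))
```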
Dataset
The dataset used in this paper is from Kaggle and contains 11,742 images of different marine organisms, scaled to 300 px. There are 19 different marine creatures in the dataset, including coral, crab, dolphin, eel, jellyfish, lobster, octopus, penguin, puffer fish, catfish, seahorse, seal, shark, squid, starfish, turtle, and whale.
In this paper, images in jpg format are preprocessed using the torchvision module. The marine life images were cropped with random aspect ratio of 224
A random batch is read and 32 of its images are displayed, each with its class name.
(1) Loss: In the problem of image classification, there are two types of conventional indicators that can evaluate the quality of neural network models. One is the loss value. The loss value is obtained by the cross-entropy loss function. It reflects the tolerance of the neural network. The lower the loss value, the better the effect of the model. Cross-entropy represents the difference between probability distribution
(2) Accuracy: The Softmax function is generally used for probability-based multi-class problems. Its output is a vector whose elements lie between 0 and 1 and sum to 1. For each image, the class with the maximum probability in the Softmax output is compared with the image label; the accuracy is the number of correct predictions divided by the batch size.
Equation (8) is the expression of the SoftMax function, where
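The two evaluation metrics can be made concrete with a small pure-Python sketch. It uses the standard definitions of Softmax and cross-entropy; the function names and the three-class toy batch are illustrative assumptions, not values from the paper.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], label: int) -> float:
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


def accuracy(batch_probs: List[List[float]], labels: List[int]) -> float:
    """Fraction of samples whose most probable class equals the label."""
    hits = sum(1 for p, y in zip(batch_probs, labels) if p.index(max(p)) == y)
    return hits / len(labels)


batch_logits = [[2.0, 1.0, 0.1], [0.2, 0.1, 3.0], [1.0, 2.5, 0.3]]
batch_labels = [0, 2, 0]
batch_probs = [softmax(z) for z in batch_logits]
mean_loss = sum(cross_entropy(p, y)
                for p, y in zip(batch_probs, batch_labels)) / len(batch_labels)
batch_accuracy = accuracy(batch_probs, batch_labels)
```

In this toy batch the first two samples are classified correctly and the third is not, so the accuracy is two thirds.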
In this paper, we used a P100 GPU cloud server as the training environment, with GPU driver version 470.82.01 and CUDA version 11.4. Hyperparameters are parameters in deep learning that are set manually, typically the number of epochs, the learning rate of the optimization algorithm, the training batch size, the test batch size, and the image shape. In general, a neural network model improves as the number of epochs increases. The learning rate determines the size of each optimization step, and an appropriate learning rate allows the model to gradually find the optimal solution. The training batch size is the number of images learned in one iteration of the model, and the test batch size is the number of images tested in one iteration. The image shape is the size of the images fed into the model, and an appropriate size allows the model to extract enough features. The training accuracy, testing accuracy, training loss and testing loss of each epoch are saved during training, and the fitting curve of the model is plotted.
Hyperparameters
Figure 7 shows the training process of ResNet50 for the last 3 epochs, with the loss values, recognition accuracy, and training time printed in the terminal. The training loss values of the ResNet50 model in the last three epochs are 2.134, 2.126, and 2.118, and the test loss values are 2.192, 2.186, and 2.180, which shows that the model behaves similarly on the training and test sets. The training accuracy was 93.4%, 94.0% and 94.1%, and the testing accuracy was 88.0%, 88.4% and 89.3%, a gap of about five percentage points between the training and test sets. In terms of training time, one epoch on the training set took about 52 seconds and the test set took 7–8 seconds, for a total time of 3038 seconds. These data show that the ResNet50 model can already identify marine organisms with good accuracy in the last few epochs of training.
Training process of ResNet50.
Fitting curve of ResNet.
The fitting curve helps to show how the model learns the dataset and whether it exhibits fitting bias. Figure 8 shows the fitting process of ResNet50. Both loss values are close to 3.0 at the beginning of training and gradually converge to around 2.0 as training continues, and the difference between the training and test losses is small. The accuracy of ResNet50 also increases with the number of epochs, from about 10% at the beginning to 94% at the end, and the training and test curves stay close together. This experiment indicates that ResNet50 addresses the fitting problem of neural networks mentioned at the beginning of this paper.
We also train the traditional convolutional neural network VGG19 with the same training environment and hyperparameters as ResNet50; the difference is that VGG19 has only 19 layers while ResNet50 has 50. In terms of parameters, the ResNet50 model has a total of 23,546,963 parameters, whereas VGG19 has 143,667,240, about six times as many.
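The size comparison can be verified directly from the two reported parameter counts; the variable names are illustrative.

```python
resnet50_params = 23_546_963  # reported in the paper
vgg19_params = 143_667_240    # reported in the paper

ratio = vgg19_params / resnet50_params
fewer_fraction = 1 - resnet50_params / vgg19_params
```

The ratio is slightly above 6, so ResNet50 uses a little over 83% fewer parameters than VGG19.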
Figure 9 shows the training process of VGG19, again for the last 3 epochs. The loss values in the last three epochs of the VGG19 model were 2.190, 2.189, and 2.188 for the training set, and 2.249, 2.253, and 2.244 for the test set. The accuracy on the training set was 84.2%, 84.2% and 78.8%, respectively; the accuracy on the test set was 78.0%, 78.0% and 78.8%, respectively. For VGG19, each epoch took 94 seconds to train and about 9 seconds to test, for a total of 5228 seconds to train 50 epochs. In terms of recognition accuracy, VGG19 can also recognize marine organisms, but ResNet50 has a clear advantage in training time.
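The training-time comparison can be quantified from the reported totals. The sketch assumes that ResNet50 was also trained for 50 epochs, which follows from the statement that both models share the same hyperparameters.

```python
epochs = 50  # reported for VGG19; assumed identical for ResNet50

resnet50_total_s = 3038  # total time reported for ResNet50
vgg19_total_s = 5228     # total time reported for VGG19

speedup = vgg19_total_s / resnet50_total_s
resnet50_per_epoch_s = resnet50_total_s / epochs
vgg19_per_epoch_s = vgg19_total_s / epochs
```

Under this assumption, ResNet50 needs about 61 seconds per epoch against about 105 seconds for VGG19, roughly a 1.7-fold reduction, which is consistent with the per-epoch times quoted for each model.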
Training process of VGG19.
Fitting curve of VGG19.
Comparison of model results
Figure 10 shows the VGG19 fitting curve. The VGG19 model fits very quickly in the first 10 epochs, but the curve then flattens as the number of epochs increases. This is the drawback of traditional neural networks: as training continues, the model parameters cannot be further optimized, so the recognition accuracy stops improving. Table 3 compares the evaluation metrics of the two models over the last three training epochs. In training accuracy, ResNet50 is about 10 percentage points higher than VGG19 in the last three epochs, which is a large gap; in test accuracy, the gap is again about 10 percentage points, which shows that ResNet50 is superior to VGG19 in recognition accuracy. In terms of loss, the two models differ by only about 0.06, which indicates that they behave similarly during training. ResNet50 has higher accuracy and shorter training time than VGG19 in the marine species classification task, indicating that ResNet50 is the more practical neural network model.
In this paper, we propose a method for classifying images of marine organisms with the ResNet50 model, based on the recognition and training advantages of residual networks in image classification tasks. By using batch normalization and constructing residual blocks, the problems of disordered gradient parameters and long training time are addressed, and overfitting and underfitting of the model are mitigated. A comparison between the residual network ResNet50 and the traditional convolutional neural network VGG19 shows that the residual network is more time- and resource-efficient in training and that its accuracy does not suffer from the reduction in parameters.
The ResNet50 model performs well in a 19-class marine animal classification task, but it has a drawback: if a picture contains multiple species, the model assigns it to only one of them instead of identifying all of them. Handling such pictures requires object detection techniques such as Mask R-CNN [14] or YOLO [13]. Reference [10] used YOLOv5 for object detection of four marine creatures; the experiments in this paper provide a theoretical basis for the detection of more marine creatures.
