Abstract
Image segmentation is important in many fields. As computer technology has developed, computational methods have become increasingly effective for image segmentation, and this paper studies segmentation on the basis of partial differential equations. The curve representation used in plane differential geometry is described, and the SegNet-v2 segmentation model is analyzed and tested on medical image segmentation. The test results show that the partial differential equation-based image segmentation algorithm can achieve more accurate segmentation, with good results in medical image segmentation in particular, and that it merits further use in practice.
Introduction
Image segmentation is a fundamental problem in image processing and computer vision. Many scholars have studied it extensively and proposed numerous segmentation algorithms, among which the active contour model based on variational partial differential equations is one of the most successful (Wang Y., 2016) [1]. At present, supervised convolutional neural networks (CNNs) achieve leading performance in various visual tasks (Kostrov B V et al., 2016) [2]. In the 2015 Multi-modal Image Segmentation Challenge, some scholars first used a CNN for brain image segmentation and achieved leading results. Since then, more CNN-based multi-modal brain image segmentation methods have been proposed (Xie X et al., 2017) [3]. Compared with traditional supervised machine learning, deep learning methods do not depend on manually extracted features: they automatically learn features of increasing complexity from the data as the network depth grows, which makes end-to-end segmentation possible (Fang W et al., 2016) [4]. Most commonly used CNN methods turn the segmentation problem into a classification problem: by classifying the central pixel of each image block, the brain image regions are segmented (Ren Y., 2017) [5]. The CNN model introduced in this paper is derived from such methods.
Because of its importance, image segmentation has attracted wide attention, and many segmentation algorithms now exist. Common techniques include region growing, threshold segmentation, and edge detection. They can be divided into three categories: region-based, edge-based, and threshold-based segmentation. Region-based segmentation connects pixels with similar features to form regions and separates them from the background. This approach makes full use of the local information in the image and overcomes the disadvantage of spatial discontinuity, but it sometimes over-segments. Edge-based segmentation mainly exploits the fact that pixels in different regions have different characteristics and that these characteristics are discontinuous at edges. Capturing image edges effectively remains difficult, especially in very noisy images. Threshold-based segmentation uses one or more thresholds to classify the pixels of the image. Because it considers only the gray-level information, it cannot segment accurately when the gray-level differences in the image are not obvious. In addition, adaptive threshold selection is itself a difficult problem.
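As a minimal illustration of the threshold-based category described above, the following sketch labels each pixel by comparing its gray value with a single fixed threshold. The function name `threshold_segment`, the toy image, and the threshold value are hypothetical and chosen only for this example; plain Python lists are used to keep it dependency-free.

```python
def threshold_segment(image, threshold):
    """Return a binary mask: 1 where the gray value exceeds the threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# Toy 4x4 grayscale image (values in [0, 255]): a bright object on a dark background.
toy_image = [
    [10, 12, 11, 9],
    [13, 200, 210, 10],
    [12, 205, 198, 11],
    [9, 10, 12, 13],
]

mask = threshold_segment(toy_image, threshold=128)
```

Because only gray values are compared, the same function would fail on an image whose object and background have similar intensities, which is the limitation noted in the text.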
Kass M et al. proposed the active contour model, which made it possible to segment images on the basis of partial differential equations. The main idea of the model is to define a deformation energy function on a curve or surface, consisting of an internal energy E_int and an external energy E_out. The internal energy is constrained by the curvature at each point and by the continuity of the curve, and it describes the elastic deformation and bending of the curve or surface. The external energy is the driving force of the active surface: based on features extracted from the image itself, it forms an external force field that controls the deformation energy in the image feature area. The deformation of the model is therefore controlled by several different forces acting on it. Each force contributes a part of the energy, and each part can be expressed as an independent energy term in the active contour model.
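For reference, the energy described above is commonly written in the following form, due to Kass et al. The source does not state the formula, so the notation here is an assumption: v(s) = (x(s), y(s)) is the contour parameterized by s in [0, 1], α(s) weights the elastic (continuity) term, and β(s) weights the bending (curvature) term. The external energy, written E_out here and often E_ext in the literature, collects all image-derived terms.

```latex
E_{\text{snake}} = \int_0^1 \Big[ E_{\text{int}}\big(v(s)\big) + E_{\text{out}}\big(v(s)\big) \Big]\, ds,
\qquad
E_{\text{int}}\big(v(s)\big) = \tfrac{1}{2}\Big( \alpha(s)\,\lvert v_s(s)\rvert^{2} + \beta(s)\,\lvert v_{ss}(s)\rvert^{2} \Big)
```

A typical edge-attracting choice for the external term is the negative squared gradient magnitude of the image, E_out = -|∇I(x, y)|², which is low along strong edges; this is given as an illustrative example rather than as the choice made in this paper.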
The idea of the active contour model based on partial differential equations is as follows: an energy function is constructed according to the characteristics of each image to be segmented, and its minimum is sought, which yields the best contour curve and hence the segmentation result. Active contour models fall into three categories: edge-based, region-based, and hybrid. The parameters of edge-based models are determined in advance from manually conducted experiments, so their scope of application is limited. Region-based models overcome the over-segmentation that edge-based models suffer from when boundary values are limited. They segment the image using statistical characteristics of its regions, so the segmentation accuracy depends on the statistical information used; they place low demands on the initial contour curve, and the region-based driving force yields a contour that fits the target closely. Hybrid models, as the name implies, combine the two to obtain the best result and effectively avoid the shortcomings of each. A key problem in current research is automatic initialization before segmentation, which determines the segmentation result: different initializations give different results. At present, the initial curve is usually placed near the target. It is therefore necessary to set model parameters automatically, to develop more effective statistical features, and to find ways of fusing edge and region information.
State of the art
Deep learning is a data-driven, supervised learning method. Models with high segmentation accuracy depend on sufficient and effective labeled training data (Tang F X et al., 2017) [6]. At present, the effective annotation method is still manual annotation. The three-dimensional structure of MRI produces a large amount of data to annotate, and annotating MRI images requires the knowledge and experience of experts in the field. The annotations of different experts may differ, and even the annotations of the same expert at different times may differ (Huang H et al., 2018) [7]. Because of these difficulties, labeled MRI images are very limited, which is one of the obstacles to using deep learning to segment MRI images (Hou S et al., 2017) [8]. For this reason, researchers using deep learning methods usually proceed as follows. Considering the computational cost, they usually use a two-dimensional deep learning network model; because MRI images carry less information along the axial direction, the volumes are sliced along the axis and the resulting two-dimensional slices are segmented (Song H et al., 2016) [9]. Since image segmentation is essentially a classification of each voxel in the MRI image, the segmentation problem can be transformed into a classification problem, and the class of a voxel can be predicted from its neighborhood (Miao J et al., 2018) [10]. For two-dimensional slices, as many fixed-size image blocks as possible are first taken from the slices. Because the pixel classes on a slice are imbalanced, a sampling strategy is usually adopted to make the number of blocks per class as equal as possible. Then, by predicting the class of the pixel at the center of each image block, every pixel is classified and the segmentation is obtained [11].
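The block-based strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited work: the function names `balanced_patch_centers` and `extract_patch`, the odd patch size, and the rule "draw the same number of centers per class" are all hypothetical choices made for the example.

```python
import random


def balanced_patch_centers(labels, patch_size, per_class, seed=0):
    """Pick up to `per_class` patch centers for every class in a 2D label slice.

    Only centers whose full patch_size x patch_size window fits inside the
    slice are considered, so every returned center yields a valid patch.
    """
    half = patch_size // 2
    height, width = len(labels), len(labels[0])
    centers_by_class = {}
    for r in range(half, height - half):
        for c in range(half, width - half):
            centers_by_class.setdefault(labels[r][c], []).append((r, c))

    rng = random.Random(seed)
    return {
        cls: rng.sample(centers, min(per_class, len(centers)))
        for cls, centers in centers_by_class.items()
    }


def extract_patch(image, center, patch_size):
    """Cut the patch_size x patch_size block centered on `center` (odd patch_size)."""
    half = patch_size // 2
    r, c = center
    return [row[c - half:c + half + 1] for row in image[r - half:r + half + 1]]


# Toy 6x6 label slice: class 1 is rare, class 0 dominates.
toy_labels = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
centers = balanced_patch_centers(toy_labels, patch_size=3, per_class=2)
patches = {
    cls: [extract_patch(toy_labels, ctr, 3) for ctr in ctrs]
    for cls, ctrs in centers.items()
}
```

Each sampled block would be paired with the label of its center pixel as the training target, so both classes contribute the same number of training examples even though class 0 covers most of the slice.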
Methodology
Curve representation in plane differential geometry
Differential geometry is the discipline that uses calculus to study geometry, and curves are one of its main objects of study [12]. Intuitively, a curve can be regarded as the trajectory of a moving particle. Formally, a curve is defined by a continuous function γ : I → X from a real interval I to a topological space X. Depending on the context, either γ or its image is called a curve. In general topology, where non-differentiable functions are also considered, the image may look very different from an ordinary curve, so only the mapping γ is called a curve. If γ is injective, that is, γ(x) = γ(y) implies x = y for any x, y ∈ I, the curve is a simple curve. If I is a closed interval [a, b], the case γ(a) = γ(b) is also allowed. If γ(x) = γ(y) for some x ≠ y in I (other than the endpoints), then γ(x) is called a double (or multiple) point of the curve. If I = [a, b] and γ(a) = γ(b), the curve is called a closed curve. A plane curve can also be written as a vector function whose components are given by parametric equations x(p) and y(p). The curve can then be specified as

$$C(p) = \big(x(p),\, y(p)\big), \quad p \in [a, b].$$

The corresponding normal vector can be expressed as follows (normalized to unit length, with the sign fixed here by rotating the tangent counterclockwise by 90°):

$$N(p) = \frac{\big(-y'(p),\, x'(p)\big)}{\sqrt{x'(p)^2 + y'(p)^2}}.$$

On the curve, the arc length between the points C(a) and C(b) can be calculated by the following formula:

$$L = \int_a^b \sqrt{x'(p)^2 + y'(p)^2}\, dp = \int_a^b \lvert C'(p) \rvert\, dp.$$

The differential of the arc length s is then obtained:

$$ds = \sqrt{x'(p)^2 + y'(p)^2}\, dp = \lvert C'(p) \rvert\, dp.$$
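The quantities above can be checked numerically on a simple example. The sketch below, a minimal illustration rather than part of the original method, approximates the arc length of a parametric curve C(p) = (x(p), y(p)) by summing short chords and evaluates a unit normal from the derivatives x'(p), y'(p). The function names and the sign convention for the normal (tangent rotated counterclockwise by 90°) are assumptions made for this example.

```python
import math


def arc_length(x, y, a, b, steps=10_000):
    """Approximate the arc length of C(p) = (x(p), y(p)) for p in [a, b]
    by summing the lengths of short chords (a polyline approximation)."""
    total = 0.0
    prev = (x(a), y(a))
    for i in range(1, steps + 1):
        p = a + (b - a) * i / steps
        cur = (x(p), y(p))
        total += math.hypot(cur[0] - prev[0], cur[1] - prev[1])
        prev = cur
    return total


def unit_normal(dx, dy, p):
    """Unit normal at parameter p: the tangent (x'(p), y'(p)) rotated by +90 degrees."""
    tx, ty = dx(p), dy(p)
    speed = math.hypot(tx, ty)
    return (-ty / speed, tx / speed)


# Circle of radius 2, traversed counterclockwise: C(p) = (2 cos p, 2 sin p), p in [0, 2*pi].
radius = 2.0
length = arc_length(
    lambda p: radius * math.cos(p),
    lambda p: radius * math.sin(p),
    0.0,
    2 * math.pi,
)
normal_at_0 = unit_normal(
    lambda p: -radius * math.sin(p),
    lambda p: radius * math.cos(p),
    0.0,
)
```

For this closed curve the computed length approaches the circumference 2πr = 4π, and with the chosen sign convention the normal at p = 0 points toward the center of the circle.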
Once such features have been extracted and separated from the image, the target can be analyzed and distinguished easily, which provides a key foundation for subsequent measurement and detection work [13]. Image segmentation is the process of separating a specific region from other regions according to its characteristics. Edges, corners, brightness, and gray value are commonly used as features [14]. The predefined target may occupy one continuous area or be distributed over several non-overlapping areas. From the bottom up, computer vision can be divided into three modules: image segmentation, feature extraction, and target recognition [15]. Image segmentation is a basic technology in computer vision and plays a key role in the subsequent analysis and understanding of image data. On the one hand, it strongly affects performance measurement in computer vision and is the basis of target expression. On the other hand, segmentation, together with the object expression, feature extraction, and parameter measurement built on it, turns the original image into a more compact representation, which benefits higher-level image analysis and understanding [16–18].
In semantic segmentation, most methods are FCN-based and tailored to particular application scenarios, and they share some limitations. Because the size of the receptive field is fixed, the network can only handle semantics at a single scale within an image, which may lead to the fragmentation of objects larger than the receptive field or the misclassification of objects smaller than it. In addition, FCN-based methods may lose or smooth out the fine structure of objects, because the input to the deconvolution layer is too coarse and the deconvolution operation is too simple. Both effects limit the semantic segmentation performance of FCN-based methods [19, 20].
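To make the notion of a fixed receptive field concrete, the following sketch computes the receptive field of the last layer in a stack of convolution and pooling layers using the standard recurrence: each layer enlarges the field by (kernel size - 1) times the product of all preceding strides. The function name `receptive_field` and the toy layer list are hypothetical and do not describe the configuration of any model in this paper.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of the last layer in a stack.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    Each layer grows the field by (kernel_size - 1) * jump, where `jump`
    is the product of the strides of all earlier layers.
    """
    field, jump = 1, 1
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


# Hypothetical encoder: two 3x3 convolutions followed by a 2x2 pooling, repeated twice.
toy_encoder = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]
rf = receptive_field(toy_encoder)
```

For this toy stack each output unit sees a 16×16 window of the input, so an object much larger than 16 pixels can never be seen as a whole by a single unit, which illustrates the fragmentation problem noted above.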
To overcome these limitations, a CNN-based network model consisting of two parts is proposed for semantic segmentation. The first part is the convolution network, also known as the coding network, which extracts image features [21]. The second part is the deconvolution network, also known as the decoding network, which decodes the features extracted by the coding network through inverse pooling and deconvolution. Finally, probability maps of the same size as the original input image are output. The number of probability maps equals the total number of classes to be segmented, and each value gives the probability of belonging to the corresponding class [22]. This coding-decoding structure is similar to SegNet; the difference is that the decoding module here uses convolution layers instead of deconvolution layers, and the model is therefore called SegNet-v2. The SegNet-v2 network model consists of convolution layers, batch normalization (BN) layers, activation function layers, pooling layers, inverse pooling layers, and a Softmax layer [23]. The model input comprises MRI image slices and the corresponding label slices annotated by experts. After input, the coding module extracts increasingly complex and abstract features layer by layer, and the decoding module then reconstructs the features [24]. Finally, the network outputs the prediction probability maps. The map size equals the original input size of the model, and the predicted value at each voxel indicates the probability that the voxel belongs to a given class. The class with the largest probability is taken as the predicted category of the voxel [25]. During training, the loss function guides the supervised learning of the model, and the validation set is used to monitor training so that it stops when the performance of the model no longer improves. In Fig. 1, “convolution (3)” indicates a convolution kernel of size 3×3 and “convolution (1)” a kernel of size 1×1. “Input” represents the model input and “Conv” represents convolution; each convolution layer is followed by a batch normalization layer and an activation function layer, in that order, and the figure emphasizes the main parameters of the model structure configuration. “Pool” means the pooling operation and “Reverse pool” means the inverse pooling operation. “Deconv” does not mean deconvolution here; it represents the convolution layer that follows the inverse pooling layer in the decoding module. “Output” represents the output layer of the model [26].
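The pooling and inverse pooling operations mentioned above can be sketched as follows. The source does not spell out how inverse pooling is implemented; this example assumes it works as in the original SegNet, where the positions of the maxima recorded during pooling are reused to place values back, with zeros elsewhere. The function names and the toy feature map are hypothetical, and the input is assumed to have even height and width.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; also records where each maximum came from."""
    pooled, indices = [], []
    for r in range(0, len(feature_map), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(feature_map[0]), 2):
            window = [(feature_map[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, height, width):
    """Inverse pooling: put each pooled value back at its recorded position, zeros elsewhere."""
    restored = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            restored[r][c] = value
    return restored


toy_features = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
pooled, indices = max_pool_2x2(toy_features)
restored = max_unpool_2x2(pooled, indices, height=4, width=4)
```

The restored map is sparse: only the positions of the original maxima are non-zero. Under this assumption, the convolution layer that follows each inverse pooling layer in the decoder serves to densify such sparse maps.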

Convolution process diagram.
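The final prediction step of the model, in which each voxel is assigned the class with the largest probability, can be illustrated with a short sketch. The function name `predict_labels` and the toy probability maps (three classes on a 2×2 slice) are hypothetical and serve only to show the per-pixel selection.

```python
def predict_labels(probability_maps):
    """Assign each pixel the class whose probability map has the largest value there.

    `probability_maps[k][r][c]` is the probability that pixel (r, c) belongs to class k.
    """
    num_classes = len(probability_maps)
    height, width = len(probability_maps[0]), len(probability_maps[0][0])
    return [
        [max(range(num_classes), key=lambda k: probability_maps[k][r][c])
         for c in range(width)]
        for r in range(height)
    ]


# One probability map per class; the values at each pixel sum to 1.
toy_maps = [
    [[0.7, 0.2], [0.1, 0.3]],  # class 0
    [[0.2, 0.5], [0.1, 0.3]],  # class 1
    [[0.1, 0.3], [0.8, 0.4]],  # class 2
]
label_map = predict_labels(toy_maps)
```

The resulting label map has the same spatial size as each probability map, matching the statement that the output size equals the input size of the model.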
Preliminary data preparation
As described in the previous sections, multi-modal MRI images consist of four modalities (Flair, T1, T1c, and T2), and each modality is a three-dimensional volume of size 240×240×155, whereas SegNet-v2 processes three-channel RGB images. For this reason, the slices are processed in two ways and divided into a training set, a validation set, and a test set.
First, the mean and standard deviation of all MRI images are calculated separately for each modality. Slices are then taken along the axial direction of the 3D MRI images, and each slice is standardized by subtracting the mean of its modality and dividing by the standard deviation of its modality. These slices are then processed further in one of two ways. The first way imitates RGB images: the pixel values of the Flair, T1, and T2 slices of the same sequence are normalized to [0, 255] and combined as the three channels of one image. The synthesized image is shown in Fig. 2 (a). The second way imitates RGBA images: the pixel values of the Flair, T1, T1c, and T2 slices of the same sequence are normalized to [0, 255] and combined as the four channels of one image. The synthesized image is shown in Fig. 2 (b).
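The preprocessing steps above can be sketched as follows. This is a minimal illustration under assumptions: the source does not say how values are mapped to [0, 255], so a per-slice min-max rescaling is assumed here, and the function names, toy slices, and per-modality statistics are hypothetical. Plain lists are used to keep the sketch dependency-free; an array library would normally be used in practice.

```python
def standardize(slice_2d, mean, std):
    """Z-score a slice with the mean and standard deviation of its modality."""
    return [[(v - mean) / std for v in row] for row in slice_2d]


def rescale_to_uint8(slice_2d):
    """Linearly map the values of a slice to integers in [0, 255] (assumed min-max rule)."""
    flat = [v for row in slice_2d for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for constant slices
    return [[round(255 * (v - lo) / span) for v in row] for row in slice_2d]


def compose_channels(slices):
    """Stack same-position slices from several modalities into one multi-channel image.

    Returns image[r][c] = (channel_0, channel_1, ...).
    """
    height, width = len(slices[0]), len(slices[0][0])
    return [[tuple(s[r][c] for s in slices) for c in range(width)] for r in range(height)]


# Toy 2x2 slices and hypothetical per-modality (mean, std) statistics.
flair, t1, t2 = [[0, 50], [100, 200]], [[10, 20], [30, 40]], [[5, 5], [5, 5]]
stats = {"flair": (100.0, 50.0), "t1": (25.0, 10.0), "t2": (5.0, 1.0)}

channels = [
    rescale_to_uint8(standardize(s, *stats[name]))
    for name, s in (("flair", flair), ("t1", t1), ("t2", t2))
]
rgb_like = compose_channels(channels)  # three modalities -> RGB-like image
```

Passing four modality slices instead of three to `compose_channels` gives the RGBA-like variant in the same way.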

Slice synthesis effect. (a) Example of an RGB-like image synthesized from slices of three modalities; (b) example of an RGBA-like image synthesized from slices of four modalities.
The multi-modal MRI data set contains only 186 image samples. Even after slicing and removing the slices without any image content, only about 26,900 PNG images remain, and about 50% of them contain no target region or a target region smaller than 10% of the maximum target cross-sectional area. The effective data available for training the SegNet-v2 network model are therefore fewer than 26,900 images. To increase the amount of data, the input size of the model is set to 224×224. Since the slices are 240×240, 224×224 image blocks can be cut at random positions along the horizontal direction of these slices, which effectively expands the training data and meets the data demand of training the SegNet-v2 network model from scratch.
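The cropping-based data expansion can be sketched as below. The text mentions random cuts along the horizontal direction; this illustration assumes both the horizontal and the vertical offset are drawn at random, since a 224×224 block must be cut from a 240×240 slice in both directions. The function name `random_crop` and the toy slice are hypothetical.

```python
import random


def random_crop(slice_2d, crop_size=224, rng=random):
    """Cut a random crop_size x crop_size block out of a 2D slice."""
    height, width = len(slice_2d), len(slice_2d[0])
    top = rng.randint(0, height - crop_size)    # randint is inclusive at both ends
    left = rng.randint(0, width - crop_size)
    return [row[left:left + crop_size] for row in slice_2d[top:top + crop_size]]


# Toy 240x240 slice whose pixel value encodes its own position (row * 240 + column).
toy_slice = [[r * 240 + c for c in range(240)] for r in range(240)]
crop = random_crop(toy_slice, crop_size=224, rng=random.Random(0))
```

With offsets from 0 to 16 in each direction, a 240×240 slice admits 17 × 17 = 289 distinct 224×224 crops; the same offsets would have to be applied to the corresponding label slice so that image and annotation stay aligned.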
A series of experiments is designed to explore the effectiveness of SegNet-v2 for segmenting multi-modal MRI images, varying the number of modalities and the size of the training set. Multi-modal MRI is chosen for two reasons. First, MRI is harmless to the patient, free of bone artifacts, and offers good soft-tissue contrast and rich information about the brain, which makes it one of the common means of pre-diagnosis and detection. Second, because the modalities rely on different imaging mechanisms, multi-modal MRI images complement each other and together present more information about the brain. If this information can be used effectively, accurate segmentation should become easier. The experiment is designed to verify this claim: MRI images with three modalities and with four modalities are used to train and test the SegNet-v2 network model. The results are shown in Fig. 3.

Segmentation results of models trained with different numbers of modalities.
In Fig. 3, the first column on the left is an image synthesized from MRI slices of four modalities to imitate an RGBA image. The second column is the image region labeled by experts, and the third column is the corresponding region predicted on the test set by the model trained with four-modality data from 70% of the image samples. “3 channelorigin” is the image synthesized from slices of three MRI modalities to imitate an RGB image. The second column from the right is the corresponding image region labeled by experts. The first column on the right is the corresponding region predicted on the test set by the model trained with three-modality data from 70% of the image samples.
As the figure shows, with the same amount of training data, both the three-modality and the four-modality models can locate the image region accurately and roughly determine its shape and size. However, the three-modality model is comparatively unable to delineate the sub-regions, whereas the four-modality model separates them to some extent. This suggests that, for the same amount of training data, training on four MRI modalities is more conducive to segmenting sub-regions. However, because the two models are shown predicting different image regions, further experiments are needed to verify this result.
Since a deep learning network model is data-driven, adding more training data can be expected to further improve the segmentation performance of SegNet-v2. Moreover, because this is an exploratory experiment on multi-modal MRI image segmentation with SegNet-v2, the training set can be enlarged as much as possible to observe the segmentation performance of the model, without paying too much attention to the proportions of the training, validation, and test sets. Taking the three-modality experiments as an example, the relevant prediction results are shown in Fig. 4.

Segmentation results of models trained with different amounts of training data.
In Fig. 4, the first column on the left is the image synthesized from slices of three modalities, and the “GT” column is the corresponding image region labeled by experts. The third and fourth columns from the left are the predictions of the models trained with 70% and 90% of the data, respectively. As the figure shows, the region predicted by the model trained with 90% of the 186 image samples is closer to the expert annotation (see the “GT” column) in shape and size than that predicted by the model trained with 70%. With the enlarged training set, the model can also predict the sub-regions of the image. A further experiment is designed to explore how increasing the number of modalities and enlarging the training set affect the segmentation performance of the model, as shown in Fig. 5.

Qualitative comparison of multiple experimental results.
In Fig. 5, the “GT” column shows the image regions labeled by experts. “3 channel (70%)” and “3 channel (90%)” show the segmentation results on the test set of models trained with 70% and 90% of the image samples, respectively; “4 channel (90%)” and “4 channel (99%)” show the results of models trained with 90% and 99% of the image samples, respectively. The results show that increasing the amount of training data and the number of image modalities improves the segmentation performance of the model, including the prediction of the size, shape, and sub-regions of the image region. In the first column on the right, where the model is trained with 185 image samples, the prediction for one image sample in the test set is much better than in the previous experiments, and the sub-regions are predicted more clearly. This shows that SegNet-v2 can segment multi-modal MRI images effectively if enough training data are available, which further indicates that a deep learning model for semantic segmentation of natural images can, to some extent, also be used to segment multi-modal MRI images, as shown in Fig. 6.

Image quality and algorithm operation.
Conclusion
This article presents an exploratory experiment on multi-modal MRI image segmentation based on SegNet-v2. First, the introduction raises the question of whether a deep learning network model designed for semantic segmentation of natural images can segment multi-modal MRI images. The SegNet-v2 network model is then selected for this exploration, and several of its important components are introduced before the experiments, including batch normalization, inverse pooling, and the convolution layers in the decoding module. SegNet-v2 is chosen because it improves on FCN-based methods: it can handle multiple objects of different scales in an image, retain the structural details of objects during encoding, and reconstruct object details through inverse pooling and convolution in the decoding stage, which matches the goal of finely segmenting the image region and its sub-regions. Next, the experimental scheme is designed, including the selection of two types of training data and the setting of the training parameters: the data expansion method, the initial learning rate, the optimization algorithm, and the maximum number of training iterations. Finally, three groups of experiments are carried out to verify the effect of adding data modalities and training data on the segmentation performance of the model. The performance of the model is analyzed qualitatively, and a better-performing model, originally designed for semantic segmentation of natural images, is obtained.
