Abstract
Objective:
This study aims to develop and test a new image registration method in which full-scale skip connections from the encoding process are added to the registration network. This retains more features and information in the decoding process, so the deformation field can be predicted more accurately.
Methods:
Two improved registration networks are connected in series in the registration framework. Each network uses unsupervised learning to predict a small deformation field, and the two small deformation fields are summed to obtain the final deformation field. The model is evaluated on the OASIS dataset (brain T1-weighted MRI images): one image is selected as the fixed image, 383 images are used for training and 30 images are used for testing. Wavelet decomposition and reconstruction are also used to enhance the images.
Results:
Compared with the affine method, the voxelmorph-1 method and the voxelmorph-2 method, the proposed registration network improves the average Dice similarity coefficient (DSC) by 28.6%, 1.2% and 0.2%, respectively.
Conclusion:
The experimental results demonstrate that the method proposed in this study can improve the accuracy of image registration effectively.
Introduction
Medical imaging provides information about a patient's condition noninvasively and is one of the most powerful auxiliary tools in treatment. Medical image registration is an important part of medical image processing, and registration methods can be classified in many ways. For example, by image dimensionality, medical image registration can be divided into three categories: 2D/2D, 3D/3D and 2D/3D registration. Methods can also be classified by subject, by how the images are acquired and by the type of spatial transformation. Medical image registration can be used for lesion detection, clinical diagnosis, surgical planning, surgical navigation, efficacy evaluation and other tasks, and it has important implications for the development of clinical medicine.
Traditional image registration methods seek the optimal spatial transformation between two images by iteratively optimizing each image pair. They include feature-based registration and grayscale-based registration, among others. Feature-based registration extracts key feature points from the fixed image and the moving image and derives the geometric transformation between the two images from the correspondence between these feature points. Its disadvantages are that features are difficult to extract and registration accuracy is limited. Common feature-based methods include SIFT [1], SURF [2] and Harris [3]. Grayscale-based registration works directly on the grayscale values of the two images: a search method finds the maximum (or minimum) of a similarity measure between the grayscale values and thereby determines the transformation parameters between the fixed image and the moving image. This approach is simple to implement and requires no complex preprocessing of the fixed and moving images, but it is computationally expensive and cannot directly correct nonlinear transformations. Grayscale-based methods include the cross-correlation method [4], the maximum mutual information method [5], the maximum likelihood matching method [6] and Demons [7].
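To make the grayscale-based idea concrete, the following is a minimal sketch, not one of the cited methods: it exhaustively searches integer translations and keeps the one that maximizes the normalized cross-correlation between the fixed image and the shifted moving image. The function names, the `max_shift` search range, and the use of a circular shift (`np.roll`) at the borders are illustrative assumptions.

```python
import numpy as np


def ncc(a, b):
    """Global normalized cross-correlation between two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
    return float((a * b).sum() / denom)


def search_translation(fixed, moving, max_shift=5):
    """Exhaustively search integer shifts and keep the one maximizing NCC.

    np.roll wraps pixels around the border (a circular shift); this keeps the
    sketch short but simplifies real boundary handling.
    """
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(moving, shift=(dy, dx), axis=(0, 1))
            score = ncc(fixed, shifted)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift, best_score


rng = np.random.default_rng(0)
fixed = rng.random((32, 32))
moving = np.roll(fixed, shift=(-3, 2), axis=(0, 1))  # simulate a known misalignment
shift, score = search_translation(fixed, moving)
print(shift, round(score, 3))
```

The nested search over every candidate shift illustrates why the text calls this family computationally expensive: the cost grows with the size of the parameter space, and a translation-only search cannot represent nonlinear deformations.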
Supervised learning-based registration requires ground-truth deformation fields, marked manually, as the gold standard. Chee et al. [8] proposed the Affine Image Registration Network (AIRNet), which estimates the spatial transformation parameters between two 3D brain MRI images by supervised learning. The network is trained with a mean squared error (MSE) loss, and the results show that it outperforms iterative registration algorithms. Sloan et al. [9] proposed a CNN model that outperforms an FCN-based method in unimodal and multimodal registration of T1- and T2-weighted brain MRI images. However, ground-truth deformation fields are difficult to obtain, and their quality cannot be guaranteed.
Unsupervised learning-based registration overcomes this shortcoming. A spatial transformer network (STN) [10] is added to the registration network so that the loss is computed directly from the warped image, which enables end-to-end registration. The first end-to-end unsupervised Deformable Image Registration Network (DIRNet) was proposed by de Vos et al. [11] and evaluated on the MNIST and SCD datasets; the experimental results show that its registration performance is better than that of the supervised learning method. Balakrishnan et al. [12] proposed VoxelMorph, an unsupervised CNN-based registration method with a U-Net-like architecture, which has become a commonly used deep learning registration method. Zhao et al. [13] proposed a deep recursive cascaded network in which multiple networks are connected in series, so that the deformation field is formed progressively and registration accuracy improves. Wei et al. [14] proposed the large deformable image registration network (LDIRnet), in which two or three registration networks based on deep convolutional neural networks are connected in series: each network predicts a small deformation field, and the final large deformation field is obtained by voxel-wise summation of the small deformation fields. In this study, the registration network of LDIRnet is improved by adding full-scale skip connections in order to retain more image information, and the images are also enhanced to improve registration accuracy. Kim et al. [15] proposed a cycle-consistent deformable image registration model containing two registration networks, G_X and G_Y: G_X predicts the deformation field from X to Y, G_Y predicts the deformation field from Y to X, and registration accuracy is improved by matching the moving image and the fixed image to each other. Hoopes et al. [16] proposed HyperMorph, which introduces amortized hyperparameter learning: a hypernetwork added to the registration model takes a hyperparameter value as input and modulates the registration network to produce the optimal deformation field for that value.
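The key step the STN adds is resampling the moving image with a dense displacement field so that a similarity loss can be computed on the warped result. The following is a minimal 2D NumPy sketch of that resampling, assuming a displacement field of shape (2, H, W) (row and column offsets), bilinear interpolation and edge clamping. The function name is hypothetical; PyTorch-based implementations usually perform this step with `torch.nn.functional.grid_sample`.

```python
import numpy as np


def warp_bilinear(image, phi):
    """Warp a 2D image with a dense displacement field.

    image: (H, W); phi: (2, H, W) with phi[0] = row offsets, phi[1] = column offsets.
    Output(p) = image(p + phi(p)), sampled bilinearly with coordinates clamped to the image.
    """
    h, w = image.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    r = np.clip(rows + phi[0], 0, h - 1)
    c = np.clip(cols + phi[1], 0, w - 1)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    wr, wc = r - r0, c - c0  # fractional parts act as interpolation weights
    top = (1 - wc) * image[r0, c0] + wc * image[r0, c1]
    bottom = (1 - wc) * image[r1, c0] + wc * image[r1, c1]
    return (1 - wr) * top + wr * bottom


image = np.arange(20, dtype=float).reshape(4, 5)
phi = np.zeros((2, 4, 5))
phi[1] = 1.0  # every pixel samples one column to the right
warped = warp_bilinear(image, phi)
```

Because every operation here is differentiable in `phi` almost everywhere, the same computation written in a deep learning framework lets gradients of an image-similarity loss flow back into the network that predicts the field, which is what makes the registration end-to-end.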
Small sample sizes and scarce annotations have long been two major problems in medical image processing. For supervised learning, registration accuracy is determined by the quality of the annotations. Unsupervised networks require a large amount of training data, but medical image samples are few, so how to train networks with less data also needs further study.
The LDIRnet model loses a considerable amount of image information in its encoding and decoding process. To retain more information, this study improves the registration network of LDIRnet by adding full-scale skip connections, so that the deformation field is predicted more accurately from richer features. In addition, the detailed information of the training images is enhanced by wavelet decomposition and reconstruction to further improve registration accuracy.
Materials and methods
Registration network with full-scale skip connections
In this study, full-scale skip connections from the encoding process are added to a UNet-like registration network, so that both fine-grained and coarse-grained semantics are available in the decoding process and the deformation field can be predicted more accurately. The structure of the registration network is shown in Fig. 1. The down-sampling operation in the encoding process and the convolution operation in the decoding process are each implemented by a convolutional layer with a 3×3 kernel and a stride of 2, followed by a LeakyReLU layer with a negative slope of 0.2. The up-sampling operation in the decoding process is implemented by a deconvolution layer with a 3×3 kernel and a stride of 2, followed by a LeakyReLU layer with a negative slope of 0.2. In the skip connections, max pooling reduces feature maps from different scales to a common size so that they can be aggregated. The aggregated features are then passed through convolution operations that gradually form the deformation field.
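The size bookkeeping behind the skip connections can be sketched as follows. This is an illustration under stated assumptions, not the authors' network: it assumes a padding of 1 for the stride-2 convolutions (not given in the text), channel counts that double at each level (hypothetical), and that only encoder maps at the same or finer resolution are brought to the target scale by max pooling, as described above; upsampling of coarser maps, used in some full-scale designs, is omitted.

```python
import numpy as np


def conv_out_size(n, kernel=3, stride=2, padding=1):
    """Spatial output size of a strided convolution (padding=1 is an assumption)."""
    return (n + 2 * padding - kernel) // stride + 1


def max_pool(x, factor):
    """Non-overlapping max pooling of a (C, H, W) map; H and W must divide by factor."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).max(axis=(2, 4))


def aggregate_full_scale(encoder_feats, target_level):
    """Max-pool every encoder map at or above target_level's resolution down to it
    and concatenate along channels (a simplified full-scale skip connection)."""
    pieces = []
    for level, feat in enumerate(encoder_feats[: target_level + 1]):
        factor = 2 ** (target_level - level)
        pieces.append(feat if factor == 1 else max_pool(feat, factor))
    return np.concatenate(pieces, axis=0)


rng = np.random.default_rng(0)
# Hypothetical encoder outputs: channels double as resolution halves.
feats = [rng.random((8, 64, 64)), rng.random((16, 32, 32)), rng.random((32, 16, 16))]
fused = aggregate_full_scale(feats, target_level=2)
print(fused.shape)  # (56, 16, 16)
print(conv_out_size(160), conv_out_size(192))  # 80 96
```

Concatenating along the channel axis keeps every scale's features side by side, so the convolutions that follow can weigh fine detail from shallow layers against coarse context from deep layers.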

Fig. 1. Structural diagram of the improved registration network.
This study adopts the LDIRnet registration framework, in which two improved registration networks are connected in series. Each registration network learns a small deformation field, and the final large deformation field is obtained by summing the two small deformation fields. The registration framework is shown in Fig. 2.

Fig. 2. Model structure diagram.
First, the first registration network predicts the deformation field φ1. The fixed image and the moving image warped by φ1 are then input to the second registration network, which predicts the deformation field φ2. The final large deformation field φ is the sum of the two deformation fields, φ = φ1 + φ2.
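The two-stage flow can be sketched as below. The trained registration networks are replaced by hypothetical stand-in callables that return displacement fields of shape (2, H, W), and the warping function is a NumPy bilinear resampler standing in for the spatial transformer; both are assumptions for illustration.

```python
import numpy as np


def warp_bilinear(image, phi):
    """Sample image at p + phi(p) with bilinear interpolation and edge clamping."""
    h, w = image.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    r = np.clip(rows + phi[0], 0, h - 1)
    c = np.clip(cols + phi[1], 0, w - 1)
    r0, c0 = np.floor(r).astype(int), np.floor(c).astype(int)
    r1, c1 = np.minimum(r0 + 1, h - 1), np.minimum(c0 + 1, w - 1)
    wr, wc = r - r0, c - c0
    top = (1 - wc) * image[r0, c0] + wc * image[r0, c1]
    bottom = (1 - wc) * image[r1, c0] + wc * image[r1, c1]
    return (1 - wr) * top + wr * bottom


def cascade_register(fixed, moving, networks):
    """Run registration networks in series and sum their displacement fields.

    networks: callables net(fixed, current) -> (2, H, W) field
    (hypothetical stand-ins for the trained registration networks).
    """
    total = np.zeros((2,) + fixed.shape)
    current = moving
    for net in networks:
        phi_k = net(fixed, current)               # small deformation field of this stage
        total = total + phi_k                     # phi = phi1 + phi2 + ...
        current = warp_bilinear(moving, total)    # warped image passed to the next stage
    return total, current


def make_shift_net(dx):
    """Hypothetical stand-in network that always predicts a constant column shift."""
    def net(fixed, current):
        phi = np.zeros((2,) + fixed.shape)
        phi[1] = dx
        return phi
    return net


moving = np.arange(30, dtype=float).reshape(5, 6)
fixed = np.zeros_like(moving)  # unused by the stand-in networks
phi, warped = cascade_register(fixed, moving, [make_shift_net(1.0), make_shift_net(2.0)])
```

Summing displacement fields, as the text describes, coincides with composing the two warps when the fields are spatially constant (as in this toy example) and approximates the composition otherwise.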
The loss function of the unsupervised registration method consists of two parts: a similarity measure between the warped image and the fixed image, and a regularization loss that prevents discontinuities in the deformation field. The loss function is computed using Equation (1).
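Assuming Equation (1) follows the standard VoxelMorph formulation, with F the fixed image, M(φ) the moving image warped by φ, and λ the regularization weight, the loss can be written as:

```latex
\mathcal{L}(F, M, \phi) = \mathcal{L}_{sim}\big(F, M(\phi)\big) + \lambda \, \mathcal{L}_{smooth}(\phi)
```

Here λ balances image similarity against field smoothness; a larger λ yields smoother but potentially less accurate deformations.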
In this study, the negative local cross-correlation with a 9 × 9 window is used as the similarity measure between the fixed image and the warped image. The local cross-correlation is computed using Equation (2).
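Assuming Equation (2) takes the form used by VoxelMorph, where p_i ranges over the 9 × 9 window centred at pixel p, Ω is the image domain, and the hatted terms denote local means over that window, the local cross-correlation and the similarity loss are:

```latex
CC\big(F, M(\phi)\big) = \sum_{p \in \Omega}
  \frac{\Big(\sum_{p_i} \big(F(p_i) - \hat{F}(p)\big)\big(M(\phi)(p_i) - \widehat{M(\phi)}(p)\big)\Big)^2}
       {\sum_{p_i} \big(F(p_i) - \hat{F}(p)\big)^2 \, \sum_{p_i} \big(M(\phi)(p_i) - \widehat{M(\phi)}(p)\big)^2},
\qquad
\mathcal{L}_{sim}\big(F, M(\phi)\big) = -\,CC\big(F, M(\phi)\big)
```

Because the correlation is normalized within each window, it is insensitive to local brightness and contrast differences between the two images.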
The purpose of training is to produce a warped image that is as similar as possible to the fixed image, i.e., to make L_sim(F, M(φ)) as small as possible. However, the resulting deformation field may be discontinuous, so a regularization loss is added to smooth it. In this study, the l2 norm is selected as the regularization loss, computed using Equation (3).
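The full objective can be sketched in NumPy for a 2D slice. This is a sketch under assumptions: the l2 regularization is taken to act on finite-difference spatial gradients of the displacement field (as in VoxelMorph; the text only says "l2 norm"), the 9 × 9 window sums use zero padding at the borders, losses are averaged rather than summed, and λ = 1.0 as in the experimental setup. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_sum(x, win=9):
    """Sum over a win x win window centred at each pixel (zero padding at borders)."""
    pad = win // 2
    xp = np.pad(x, pad, mode="constant")
    return sliding_window_view(xp, (win, win)).sum(axis=(-2, -1))


def local_ncc_loss(fixed, warped, win=9, eps=1e-5):
    """Negative mean squared local correlation (VoxelMorph-style local CC)."""
    n = win * win
    f_sum, w_sum = box_sum(fixed, win), box_sum(warped, win)
    ff, ww = box_sum(fixed * fixed, win), box_sum(warped * warped, win)
    fw = box_sum(fixed * warped, win)
    f_mean, w_mean = f_sum / n, w_sum / n
    cross = fw - w_mean * f_sum - f_mean * w_sum + f_mean * w_mean * n  # sum of (f-mean)(w-mean)
    f_var = ff - 2 * f_mean * f_sum + f_mean * f_mean * n              # sum of (f-mean)^2
    w_var = ww - 2 * w_mean * w_sum + w_mean * w_mean * n
    cc = cross * cross / (f_var * w_var + eps)
    return -float(cc.mean())


def smoothness_l2(phi):
    """Mean squared forward differences of a (2, H, W) displacement field."""
    dy = np.diff(phi, axis=1)
    dx = np.diff(phi, axis=2)
    return float((np.mean(dy ** 2) + np.mean(dx ** 2)) / 2.0)


def registration_loss(fixed, warped, phi, lam=1.0):
    """Similarity term plus lambda-weighted smoothness term."""
    return local_ncc_loss(fixed, warped) + lam * smoothness_l2(phi)


rng = np.random.default_rng(0)
fixed = rng.random((32, 32))
phi = np.zeros((2, 32, 32))
print(registration_loss(fixed, fixed, phi))  # close to -1: identical images, zero field
```

The similarity term rewards agreement between the images, while the smoothness term grows with abrupt changes between neighbouring displacement vectors, which is what discourages discontinuous fields.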
Dataset preprocessing
The experimental data come from the OASIS [17] dataset as used in the HyperMorph experiments. The OASIS dataset contains 3D brain T1-weighted MRI images of 414 subjects, and the data used in this study are slices of the preprocessed 3D images. Preprocessing was as follows: first, the original data were processed with FreeSurfer [18] and resampled to a 256 × 256 × 256 grid with a voxel spacing of 1 mm × 1 mm × 1 mm. The skull was then removed, and the intensities were normalized to the [0, 1] interval. Finally, the data were affinely transformed and cropped to 160 × 192 × 224.
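The two array-level steps, scaling intensities to [0, 1] and cropping the 256³ grid to 160 × 192 × 224, can be sketched as follows. Skull stripping, resampling and the affine transform are done by FreeSurfer and are not reproduced. The text does not specify the normalization method or the crop offsets, so min-max scaling and a centred crop are assumptions here, and the function names are hypothetical.

```python
import numpy as np


def minmax_normalize(volume):
    """Scale intensities linearly to [0, 1] (min-max scaling is an assumption)."""
    vmin, vmax = float(volume.min()), float(volume.max())
    if vmax == vmin:
        return np.zeros_like(volume, dtype=np.float32)
    return ((volume - vmin) / (vmax - vmin)).astype(np.float32)


def center_crop(volume, target=(160, 192, 224)):
    """Crop each axis symmetrically to the target size (centred crop is an assumption)."""
    slices = []
    for size, t in zip(volume.shape, target):
        start = (size - t) // 2
        slices.append(slice(start, start + t))
    return volume[tuple(slices)]


cropped = center_crop(np.zeros((256, 256, 256), dtype=np.uint8))
print(cropped.shape)  # (160, 192, 224)
```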
Image enhancement
In this study, the images are enhanced by wavelet decomposition and reconstruction, as shown in Fig. 3. Wavelet decomposition uses the wavelet basis to split an image into four sub-band images: a low-frequency approximation (LL) and three high-frequency detail sub-bands (LH, HL and HH) that together capture horizontal, vertical and diagonal detail. The four sub-bands contain different information, and LH, HL and HH carry the detailed information of the original image. Therefore, these three sub-bands are multiplied by a parameter β (β > 1), and the training data are reconstructed from LL, βLH, βHL and βHH to enhance detail. A biorthogonal wavelet is used as the wavelet basis function for decomposition and reconstruction. One image was used as the fixed image, 383 images for model training and 30 images for validation.
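The enhancement step can be sketched with a one-level 2D Haar transform written directly in NumPy. Haar is the simplest biorthogonal wavelet (bior1.1); the specific biorthogonal wavelet used in the study is not stated, so this choice and the function names are assumptions. The three detail sub-bands are scaled by β = 1.1 and the image is reconstructed.

```python
import numpy as np


def haar_dwt2(x):
    """One-level 2D Haar transform of an (H, W) array with even H and W."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-frequency approximation
    lh = (a - b + c - d) / 2  # detail: differences between neighbouring columns
    hl = (a + b - c - d) / 2  # detail: differences between neighbouring rows
    hh = (a - b - c + d) / 2  # detail: diagonal differences
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]), dtype=float)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


def enhance_details(image, beta=1.1):
    """Scale the three detail sub-bands by beta and reconstruct the image."""
    ll, lh, hl, hh = haar_dwt2(image.astype(float))
    return haar_idwt2(ll, beta * lh, beta * hl, beta * hh)


rng = np.random.default_rng(0)
image = rng.random((64, 64))
enhanced = enhance_details(image, beta=1.1)
```

Because LL is left untouched, local mean intensities are preserved; only edges and fine texture are amplified. With β = 1 the transform reconstructs the original image exactly.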

Fig. 3. Image enhancement.
The experiments were run on Windows 10 with a GPU; the code was written in Python and the model was built with PyTorch 1.8.0+cu111. During training, the batch size was 1 (one image pair per iteration), the model was trained for 15,000 epochs, and Adam optimization was used with a learning rate of 1 × 10^-4. The parameters were set to λ = 1.0 (regularization weight) and β = 1.1 (wavelet detail gain).
Evaluation method
In this study, registration performance is evaluated with segmentation labels of brain anatomical structures, using the Dice similarity coefficient (DSC) as the metric. The DSC is computed using Equation (4).
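Equation (4) presumably uses the standard Dice definition. For an anatomical structure k, let S_F^k be the set of voxels labelled k in the fixed image's segmentation and S_{M(φ)}^k the corresponding set in the warped moving segmentation:

```latex
\mathrm{DSC}\big(S_F^{k}, S_{M(\phi)}^{k}\big) =
  \frac{2\,\big|S_F^{k} \cap S_{M(\phi)}^{k}\big|}{\big|S_F^{k}\big| + \big|S_{M(\phi)}^{k}\big|}
```

A DSC of 1 indicates perfect overlap and 0 indicates no overlap.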
In this study, 23 anatomical structures segmented by FreeSurfer are selected for calculating the DSC; they are listed in Table 1.
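A minimal sketch of averaging the DSC over a set of labels is shown below. It assumes integer label maps for the fixed segmentation and the warped moving segmentation (the latter would normally be warped with nearest-neighbour interpolation so labels stay integers). The label values and the convention of scoring 1.0 when a label is absent from both maps are illustrative assumptions; the actual 23 FreeSurfer labels are those in Table 1.

```python
import numpy as np


def dice(a, b):
    """Dice overlap of two boolean masks (1.0 if both are empty, by convention)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)


def mean_dice(seg_fixed, seg_warped, labels):
    """Per-label Dice and its average over the given label values."""
    scores = {lab: dice(seg_fixed == lab, seg_warped == lab) for lab in labels}
    return float(np.mean(list(scores.values()))), scores


seg_fixed = np.array([[1, 1, 2, 2]])
seg_warped = np.array([[1, 2, 2, 2]])
avg, per_label = mean_dice(seg_fixed, seg_warped, labels=[1, 2])
```

In this toy example label 1 overlaps in one of three labelled pixels (DSC 2/3) and label 2 in two of five (DSC 0.8).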
Table 1. Anatomy table
In this study, the registration framework was trained and the experimental results were compared with affine transformation and the voxelmorph models. Registration performance was quantified by the average DSC over the 23 brain anatomical structures; the results are shown in Table 2. The table shows that the registration accuracy of the improved model is better than that of the affine transformation method and the voxelmorph-1 method, and comparable to that of the voxelmorph-2 method.
Table 2. Comparison of experimental results
Fig. 4 shows the average DSC of each brain anatomical structure. For every structure, the proposed method outperforms the affine transformation. Compared with voxelmorph-1, the proposed method is better than or comparable to it for all structures except the Left-Choroid-Plexus. Compared with voxelmorph-2, it is better than or comparable to it for all structures except the Left-Inf-Lat-Ventricle, Left-Putamen and Left-Choroid-Plexus. The results were tested with a paired t-test: p = 9.66 × 10^-6 between the proposed model and voxelmorph-1, and p = 0.292 between the proposed model and voxelmorph-2, so only the improvement over voxelmorph-1 is statistically significant at the 0.05 level.
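For reference, the paired t statistic is the mean of the paired differences divided by its standard error, t = mean(d) / (s_d / √n), with n − 1 degrees of freedom. The sketch below computes it in NumPy on toy inputs (not the study's data); the p-value then comes from the t distribution, for example via `scipy.stats.ttest_rel(x, y)`. The text does not state whether the pairs are per test image or per anatomical structure.

```python
import numpy as np


def paired_t_statistic(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))  # mean difference over its standard error
    return float(t), n - 1


t, df = paired_t_statistic([2, 4, 6, 8], [1, 2, 3, 4])  # toy inputs only
```

Pairing removes between-case variability (some images are simply easier to register), which is why a small mean improvement can still be significant when it is consistent across cases.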

Fig. 4. Comparison of anatomical structures.
Fig. 5 visualizes part of the registration results, including the fixed image, the moving image, the warped images obtained by the voxelmorph-1 method, the voxelmorph-2 method and the proposed method, and the corresponding deformation fields.

Fig. 5. The registration results.
In this study, the LDIRnet model is improved by adding full-scale skip connections from the encoding process to the registration network, so that fine-grained and coarse-grained semantics are combined in the decoding process and more image information is retained, allowing the deformation field to be predicted more accurately. The overall framework consists of two registration networks in series, and the final deformation field is the sum of the deformation fields predicted by the two networks. Before training, the images are enhanced by wavelet decomposition and reconstruction, which strengthens their detailed information so that more features can be learned during training. Compared with the affine method, the voxelmorph-1 method and the voxelmorph-2 method, the average DSC is improved by 28.6%, 1.2% and 0.2%, respectively; the improvement over voxelmorph-1 is statistically significant, whereas performance is comparable to voxelmorph-2 (p = 0.292). Because medical image registration underpins tasks such as disease diagnosis and medical image segmentation, applying the proposed model to these tasks may improve their accuracy. The proposed method effectively improves the accuracy of image registration. Unsupervised image registration does not require ground-truth deformation fields as the gold standard, which avoids the labor cost and quality problems of manual annotation, and it is therefore a promising direction for future registration research.
Acknowledgements
This work was supported in part by National Natural Science Foundation of China (No. 42274173).
