Abstract
BACKGROUND:
Multi-modal medical image fusion plays a crucial role in many areas of modern medicine, such as diagnosis and therapy planning.
OBJECTIVE:
Because the structure tensor preserves image geometry, we used it to construct a directional structure tensor and, on that basis, proposed an improved 3-D medical image fusion method.
METHOD:
Local entropy metrics were used to construct the gradient weights of the source images, and the eigenvectors of the traditional structure tensor were combined with the second-order derivatives of the image to construct the directional structure tensor. In addition, guided filtering was employed to obtain the detail components of the source images and to construct a fused gradient field with enhanced detail. Finally, the fused image was generated by solving a functional minimization problem.
RESULTS AND CONCLUSION:
Experimental results demonstrated that this new method is superior to the traditional structure tensor and multi-scale analysis in both visual effect and quantitative assessment.
Introduction
Medical images acquired with different imaging modalities, such as computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), and single-photon emission computed tomography (SPECT), reflect structural or molecular metabolic information about the human body from different perspectives. PET and SPECT are valuable for the early diagnosis of lesions because they convey information on blood perfusion and on the metabolism of tissues and organs, but their limited resolution means that they cannot show anatomical structure clearly, which makes lesions difficult to localize. To provide physicians with more comprehensive clinical information, the fusion of different imaging modalities has been proposed and put into practice. For example, PET-CT image fusion, which combines anatomical features with functional metabolic information, can not only improve the diagnostic accuracy of head and neck cancer [1] but also facilitate the detection of lung cancer in organs subject to large motion [2]; MRI-PET image fusion can be used to detect brain tumors [3] and metastasis of liver cancer [4]; and SPECT-CT image fusion can be employed to study coronary artery disease [5] and to identify bone cancer metastasis or distinguish benign from malignant lesions [6]. Moreover, medical image fusion is valuable in surgical planning, radiotherapy planning, and the assessment of post-treatment effects [7, 8].
Medical image fusion is an effective information synthesis technology, and many fusion methods have been proposed in recent decades [9–14]. Multi-scale analysis (MTA), the most popular approach in the transform domain, has given rise to various transforms and improvements, including the discrete wavelet transform (DWT) [15], discrete shearlet transform [16], contourlet transform [17], non-subsampled contourlet transform (NSCT) [18], and non-subsampled shearlet transform (NSST) [19]. However, the fusion rules applied to the different sub-bands in MTA may not always fuse the image features effectively, resulting in loss of detail and degraded image quality. Sparse representation (SR) has also emerged in transform-domain fusion research [9, 20]. An SR-based algorithm first obtains the sparse coefficients of the source images through dictionary learning, then fuses the sparse coefficients and constructs the fused image by combining them with the overcomplete dictionary. Although SR-based methods have been applied successfully to medical image fusion, usually only one dictionary is used to represent the different morphological structures of the source images. The common features contained in that dictionary are not sufficient to express the image content effectively, especially irregular details, which leads to over-smoothing of the fusion result. Deep learning has also been applied successfully to image fusion [21, 22]. However, because sufficient training data and an authoritative ground truth for supervision are lacking, medical image fusion based on deep learning remains difficult.
Recently, researchers have proposed image fusion methods in the gradient domain [23–28], in which important image features, including edge information, are well preserved in the fused image. Piella [12] used the structure tensor as a description of gradient information to fuse the contrast information of multi-spectral images, thereby effectively preserving the edge details of the input images. For noisy images, Zhao et al. [28] obtained good results in fusion experiments by defining gradient entropy as the gradient-domain weight and introducing a p-Laplace diffusion constraint. However, these methods are mainly applied to the fusion of two-dimensional (2-D) multi-spectral images. A medical image is generally a three-dimensional (3-D) volume made up of slices collected along a specific direction, and the information shared between adjacent slices is essential for experts to evaluate the integrity and activity of organs and tissues. Previous studies mostly addressed 2-D image fusion, in which a 3-D fusion is obtained by fusing slice by slice. This crude way of generating a 3-D fusion from 2-D fusions can easily ignore the volume context hidden in the 3-D structure, resulting in loss of information and incomplete fusion [27]. It is therefore of great clinical significance to propose an effective 3-D image fusion method. For instance, some researchers have combined gradient-based image fusion with MTA to fuse 3-D images [23, 25], and Wang and Liu [26] used a gradient-based method built on the 3-D structure tensor to fuse 3-D PET/CT images of non-small cell lung cancer.
Although the fusion method based on the 3-D structure tensor fuses PET-CT images well, it still has some drawbacks. On the one hand, it cannot capture continuous 3-D detail structures well and lacks spatial continuity; on the other hand, the fused image suffers a small but consistent loss of detail because the fused gradient field is incomplete. To address these problems, we propose an improved fusion method. To capture structural continuity in 3-D medical images, we proposed the directional structure tensor (DST) in our previous work [29], where it was shown to describe local structural information precisely in denoising. The DST contains not only the direction of local change in the image but also the second-order derivatives, which are used to build a more complete gradient field and are therefore helpful for gradient-based image fusion. Inspired by that work, we propose here a 3-D medical image fusion method based on the DST. Because the continuity of information must be maintained during fusion, we use local entropy to construct the gradient weights of the source images and thus obtain a local entropy-weighted directional structure tensor. Local entropy expresses the continuous geometric structure and texture of the image, which helps to preserve the continuity of information. Additionally, to highlight detail in the fused image and compensate for the loss of small details caused by reconstruction, we use guided filtering to obtain the base components of the source images and subtract them from the source images to obtain the detail components. The detail components are then converted into the gradient domain and added to the fused gradient field to construct the final objective function.
The remainder of this paper is organized as follows. Section 2 briefly introduces the basic principle of the DST. Section 3 describes the proposed image fusion method. Experimental results are presented in Section 4. The discussion and conclusions are given in Sections 5 and 6, respectively.
Directional structure tensor
In this section, we briefly describe how the DST is built. First, the traditional structure tensor (ST) is constructed from the gradient information in the local neighborhood of the image. The eigenvectors of the traditional ST are then interpolated into the original image to construct a DST that carries the direction of neighborhood change. Finally, the contribution of the second-order derivatives is added to construct a more complete gradient field, which is beneficial for reconstructing the fused image from the gradient field.
Supposing f(
The eigen-decomposition of the 3-D ST is:
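As a rough illustration of the traditional 3-D ST and its eigen-decomposition, the following NumPy sketch may help. It assumes central-difference gradients and a small box average in place of the Gaussian smoothing that is normally used; the function names `box_smooth` and `structure_tensor_3d` and the `radius` parameter are illustrative and do not come from the paper.

```python
import numpy as np


def box_smooth(vol, radius=1):
    """Mean filter over a (2*radius+1)^3 window with edge padding.

    Assumption: used here as a simple stand-in for Gaussian smoothing.
    """
    k = 2 * radius + 1
    padded = np.pad(vol, radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k, k))
    return windows.mean(axis=(-3, -2, -1))


def structure_tensor_3d(vol, radius=1):
    """Per-voxel eigenvalues (descending) and eigenvectors of the smoothed 3x3 ST."""
    grads = np.gradient(np.asarray(vol, dtype=float))  # derivatives along axes 0, 1, 2
    tensor = np.empty(vol.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            # Smoothed product of derivatives; the tensor is symmetric.
            tensor[..., i, j] = tensor[..., j, i] = box_smooth(grads[i] * grads[j], radius)
    evals, evecs = np.linalg.eigh(tensor)               # eigenvalues in ascending order
    # Reverse so that lambda1 >= lambda2 >= lambda3; eigenvectors are the columns.
    return evals[..., ::-1], evecs[..., ::-1]
```

For a volume that changes only along one axis, the largest eigenvalue is the squared slope and its eigenvector points along that axis, while the other two eigenvalues vanish.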
To describe the local structure information more accurately, we construct the DST from the eigenvectors of the traditional ST and the second-order derivatives. The DST is defined as follows [29, 30]:
The advantage of the DST is that the horizontal structure of the image and the discontinuity between slices can be easily captured by the reconstructed directional derivatives. To illustrate this advantage in describing local texture information, we select one lung CT image for display. Figure 1 shows the maximum eigenvalue images of the two tensors: (a) is the maximum eigenvalue image of the traditional ST and (b) is that of the DST. Fig. 1 (b) shows that the DST describes the local structure information of the image more completely. Specifically, in the region of interest marked in red, the blood vessel is more continuous and the linear texture of the image is clearer and more complete. Thus, the DST better captures the volume context information in adjacent slices.

Maximum eigenvalue images of the tensor matrices.
In this section, we first introduce local entropy weighting to construct a new DST. We then use guided filtering to obtain the detail components and increase the contribution of detail to the fused image. Afterwards, we extract the image features according to the established DST, fuse these features and the extracted details in the gradient domain, and finally reconstruct the fused image from the gradient domain. The texture features of the image are fully considered throughout the fusion process, which is conducive to obtaining a clearer fused image with richer information. Figure 2 illustrates the flow chart of the proposed fusion algorithm.
Local entropy weighting

Algorithm flow chart.
As an index of the local information of an image, the local entropy reflects how dispersed the voxel distribution is across the gray levels of the image and characterizes its local texture information. The local entropy LE is defined as follows:
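A minimal sketch of a local entropy map is given here, assuming the usual Shannon form over the gray-level histogram of a cubic neighborhood around each voxel. The window radius, the number of gray levels `bins`, and the edge padding are illustrative choices and are not values taken from the paper.

```python
import numpy as np


def local_entropy(vol, radius=1, bins=16):
    """Shannon entropy (in bits) of the gray-level histogram in a window around each voxel."""
    vol = np.asarray(vol, dtype=float)
    lo, hi = vol.min(), vol.max()
    # Quantise intensities to `bins` gray levels; a constant volume maps to level 0.
    scale = (bins - 1) / (hi - lo) if hi > lo else 0.0
    levels = np.round((vol - lo) * scale).astype(int)
    k = 2 * radius + 1
    padded = np.pad(levels, radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k,) * vol.ndim)
    windows = windows.reshape(vol.shape + (-1,))      # flatten each neighborhood
    entropy = np.zeros(vol.shape)
    for g in range(bins):
        p = (windows == g).mean(axis=-1)              # probability of gray level g per window
        nz = p > 0
        entropy[nz] -= p[nz] * np.log2(p[nz])
    return entropy
```

Homogeneous regions give an entropy of zero, whereas neighborhoods that mix several gray levels (edges, texture) give larger values, which is why the entropy can serve as a gradient weight.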
Therefore, combining the source image gradient field with local entropy weighting, the final DST is defined as:
To enhance the detail in the fused image, we extract the detail components of the source images. The first step is to apply guided filtering to the source image to obtain the base component:
Because guided filtering preserves the edges of the image while blurring the fine details, the base component retains the edges but not the details, so subtracting the base component from the source image leaves only the detail part of the image. Fig. 3 (a) shows the details extracted from one slice of the CT sequence, and Fig. 3 (b) shows the details extracted from a slice of the MR-T1ce sequence.
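The base/detail split can be sketched as follows, under the assumption that each source image acts as its own guide (a self-guided filter in the standard box-filter formulation). The names `box_mean` and `base_and_detail` and the parameters `radius` and `eps` are illustrative; the paper's actual guide image and settings are not stated in this text.

```python
import numpy as np


def box_mean(img, radius):
    """Mean over a (2*radius+1)^ndim window with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k,) * img.ndim)
    return windows.mean(axis=tuple(range(-img.ndim, 0)))


def base_and_detail(img, radius=2, eps=1e-2):
    """Self-guided filter: base = mean(a) * I + mean(b), detail = I - base."""
    img = np.asarray(img, dtype=float)
    mean_i = box_mean(img, radius)
    var_i = np.maximum(box_mean(img * img, radius) - mean_i ** 2, 0.0)
    a = var_i / (var_i + eps)        # close to 1 at strong edges, close to 0 in flat regions
    b = (1.0 - a) * mean_i
    base = box_mean(a, radius) * img + box_mean(b, radius)
    return base, img - base
```

Where the local variance is large compared with `eps` the filter passes the input through, so strong edges stay in the base component; low-variance texture is averaged away and ends up in the detail component.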

Detail maps: (a) details extracted from the CT image; (b) details extracted from the MR-T1ce image.
According to the analysis of the ST in [30], the eigenvalues λ1, λ2, λ3 provide a measure of the anisotropy of the structural information, such as the linearity and planarity of the local structural features of the image. Medical images contain many complex structures such as blood vessels and organs, and image fusion needs to display these important structures. Therefore, based on the relationship among the eigenvalues, a coherence measurement among the gray values of local pixels is defined to describe the local information of the image. It aims to highlight the texture details and edge regions of the image so that important tissue features are better preserved in the fusion weighting process. The coherence measurement is designed as:
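To make the notions of linearity and planarity concrete, the sketch below computes one commonly used set of eigenvalue-based anisotropy measures (a Westin-style normalization by the largest eigenvalue). This is an assumed stand-in for illustration only and is not the coherence measurement designed in the paper; the function name and the `tiny` regularizer are hypothetical.

```python
def anisotropy_measures(l1, l2, l3, tiny=1e-12):
    """Linear, planar and spherical measures for eigenvalues l1 >= l2 >= l3 >= 0.

    Assumption: Westin-style normalization by the largest eigenvalue.
    """
    s = l1 + tiny                      # avoid division by zero in flat regions
    linear = (l1 - l2) / s             # one dominant direction of change (e.g. an edge)
    planar = (l2 - l3) / s             # two comparable dominant directions
    spherical = l3 / s                 # change of similar strength in all directions
    return linear, planar, spherical
```

With this normalization, eigenvalues (1, 0, 0) give a linear measure close to 1, (1, 1, 0) give a planar measure close to 1, and (1, 1, 1) give a spherical measure close to 1.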
After constructing the feature template of each source image, a weight is assigned to the gradient field of each source image according to its structural features:
Therefore, the final fused gradient field is defined as:
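One way such a fused gradient field can be built from a weighted tensor is sketched below: the weighted gradients of all sources are accumulated into a per-voxel 3×3 tensor, and the fused gradient is taken as the principal eigenvector scaled by the square root of the largest eigenvalue. The sketch assumes plain first-order gradients instead of the DST's directional derivatives, and it resolves the sign of the eigenvector by aligning it with the summed weighted gradient; both choices, and the name `fused_gradient`, are assumptions for illustration.

```python
import numpy as np


def fused_gradient(grads, weights):
    """Fuse source gradients through the principal axis of a weighted tensor.

    grads:   array of shape (n_sources, 3, X, Y, Z)
    weights: array of shape (n_sources, X, Y, Z)
    returns: fused gradient field of shape (3, X, Y, Z)
    """
    wg = weights[:, None] * grads                         # weighted gradients
    tensor = np.einsum("siabc,sjabc->abcij", wg, wg)      # sum of outer products per voxel
    evals, evecs = np.linalg.eigh(tensor)                 # ascending eigenvalues
    lam = np.maximum(evals[..., -1], 0.0)                 # largest eigenvalue
    e = evecs[..., :, -1]                                 # matching eigenvector, shape (X, Y, Z, 3)
    # Eigenvectors have no intrinsic sign: align with the summed weighted gradient.
    mean_dir = np.moveaxis(wg.sum(axis=0), 0, -1)
    sign = np.where((e * mean_dir).sum(axis=-1) < 0, -1.0, 1.0)
    v = (sign * np.sqrt(lam))[..., None] * e
    return np.moveaxis(v, -1, 0)
```

With a single source and unit weights the function returns the source gradient unchanged, which is a convenient sanity check.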
The tensor matrix of the fusion image that is closely related to the fusion gradient field needs to be as close to
However, since the eigenvector
Now we need to reconstruct the fusion image
Further, we transform the previously extracted detail information into the gradient domain so that more of the useful details are fused; these details are preserved and enhanced by solving the following optimization model:
According to the variational method, formula (18) can be derived and reduced to the following:
For the initialization image f0, we use the weighted combination
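The reconstruction step can be illustrated with a generic gradient-domain solver. The sketch below minimizes 0.5·||∇f − V||² by gradient descent, where V stands for a target gradient field (for example, a fused gradient field with detail gradients added) and `f0` is the initial image. It assumes forward differences with zero-flux boundaries and a fixed step size; it is a simplified least-squares model, not the paper's exact functional, and all function names are illustrative.

```python
import numpy as np


def forward_grad(f):
    """Forward differences along every axis (zero on the last plane of each axis)."""
    comps = []
    for ax in range(f.ndim):
        g = np.zeros_like(f)
        lo = [slice(None)] * f.ndim
        hi = [slice(None)] * f.ndim
        lo[ax], hi[ax] = slice(None, -1), slice(1, None)
        g[tuple(lo)] = f[tuple(hi)] - f[tuple(lo)]
        comps.append(g)
    return np.stack(comps)


def divergence(p):
    """Negative adjoint of forward_grad (backward differences)."""
    d = np.zeros_like(p[0])
    for ax in range(p.shape[0]):
        q = np.moveaxis(p[ax].copy(), ax, 0)
        q[-1] = 0.0                       # the last plane carries no forward difference
        dq = q.copy()
        dq[1:] -= q[:-1]
        d += np.moveaxis(dq, 0, ax)
    return d


def energy(f, target):
    """Least-squares mismatch between the image gradient and the target field."""
    return 0.5 * np.sum((forward_grad(f) - target) ** 2)


def reconstruct(target, f0, steps=2000):
    """Gradient descent on the energy, starting from the initial image f0."""
    tau = 1.0 / (4.0 * f0.ndim)           # step size 1/L; 4 * ndim bounds ||grad||^2
    f = f0.astype(float).copy()
    for _ in range(steps):
        f += tau * divergence(forward_grad(f) - target)
    return f
```

Because the update only redistributes intensity, the mean of `f0` is preserved; the initialization therefore fixes the overall brightness of the reconstructed image, while the target field determines its structure.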
In this section, several experiments on multi-modal medical images are performed and analyzed to verify the effectiveness of the proposed method. The experimental data were downloaded from the TCIA [32] and BraTS 2018 [33, 34] websites. The CT/PET data come from TCIA. For the lung data, each CT volume is 512×512×257 and each PET volume is 128×128×257, with the same slice spacing of 3.2 mm. For the brain data, each CT volume is 512×512×134 and each PET volume is 256×256×134, with the same slice spacing of 3.0 mm. The MR data come from BraTS 2018; all volumes are 240×240×155 with a slice spacing of 1.0 mm, and the proposed method is applied to two groups of MR images (T1ce/T2). Before fusion, a preprocessing step is required to match the features and sizes of the source volumes.
In addition, several classical fusion methods, namely 3D-ST [26], ST-NSST [35], GTF [36], GFF [11], and NSCT-LLE [37], are compared with the proposed method. 3D-ST is a 3-D structure tensor model for PET/CT image fusion in non-small cell lung cancer. ST-NSST is a multi-modal image fusion method based on the structure tensor and the non-subsampled shearlet transform (NSST). GTF is an infrared/visible fusion method based on gradient transfer and total variation minimization. GFF is a fusion method based on guided filtering. NSCT-LLE is a medical image fusion algorithm that combines phase congruency and local Laplacian energy with the NSCT. Because GTF was designed for infrared/visible fusion, which resembles PET/CT fusion, it is used only in the PET/CT experiments. Likewise, GFF was designed for multi-exposure images, which resemble T1ce/T2 images, so it is used only in the T1ce/T2 experiments. 3D-ST and ST-NSST are applied directly in 3-D. GTF, GFF, and NSCT-LLE are designed for 2-D images, so 3-D volumes are fused slice by slice with these methods.
Qualitative comparisons
First, we conduct experiments on fusing PET and CT images. Two pairs of CT and PET images are shown in Fig. 4. Figure 5 shows the fusion results, in which the regions of interest marked by red rectangles are enlarged to facilitate visual comparison. Every method fuses the important features of the source images, but there are differences among them. From Fig. 5 (a) and (b), it is easy to see that the results of GTF and NSCT-LLE show the lesion clearly but lose some vessel details in the volume context, and the GTF results are dark because GTF retains only the brightness of the PET images. 3D-ST, ST-NSST, and the proposed method all use the structure tensor to extract features, so their fusion results contain richer blood vessel information than the others, and the proposed method better maintains the brightness and contrast of the source images while remaining sharp. Although the contour of the lesion in Fig. 5 (e) is not as clear as in Fig. 5 (a), it can still be seen clearly and is better than in the 3D-ST and ST-NSST results. In summary, judging in particular by the connectivity of the blood vessels and the sharpness of the lesion contour, the proposed method benefits from the ability of the DST to extract deep structural information from 3-D images and to connect vessel breaks.

Two pairs of CT/PET source images: (a) and (b) are the Lung1 images, (c) and (d) are the Lung2 images. The rectangles marked in red are the regions of interest.

Fusion results of the Lung1 and Lung2 images. From the left to the right are fusion results of GTF method, NSCT-LLE method, 3D-ST method, ST-NSST method, and the proposed method, respectively.
Figure 6 shows the source images of three other lung image pairs and two head image pairs, and Fig. 7 shows the fusion results of the different methods. Since the structural details are mainly contained in the CT images, almost every method preserves detail well; the main differences lie in the connection of the vessels and the distortion of the PET information. The PET structures are clearly over-smoothed in the fused images of Fig. 7 (a) and (b), particularly for Head1 and Head2, and the vessels in Lung3–Lung5 are less clear than in Fig. 7 (e). 3D-ST connects the vessels well but has comparatively low visual sharpness. Although ST-NSST retains the details of the image, its fused results are not as sharp as those of the proposed method. Overall, the proposed method is superior to the other methods in displaying details and in preserving the brightness and contrast of the source images in the fused image without introducing distortion.

Source images of Lung3–Lung5 and Head1–Head2.

Fusion results of the Lung3–Lung5 images and Head1–Head2 images. From left to right are the results of the GTF method, NSCT-LLE method, 3D-ST method, ST-NSST method, and the proposed method, respectively.
To further verify the applicability and performance of our method, we next conduct experiments on two groups of T1ce/T2 images. As Fig. 8 shows, the T1ce and T2 images contain complementary information because of their different imaging settings. Figure 9 (a)–(e) shows the fusion results of the comparison methods and the proposed method. As seen in Fig. 9 (e), the contours of the ventricle or edema are sharper in our fused results. Compared with the other methods, our method also transfers more features of interest and better preserves the brightness and contrast of the source images, which indicates that the proposed method preserves both structural and detail information well when fusing T1ce/T2 images.

Source images of T1ce and T2 images.

Fusion results of the Head3–Head4 images. From left to right are the results of the GFF method, NSCT-LLE method, 3D-ST method, ST-NSST method, and the proposed method, respectively.
Quantitative assessment of different fusion methods of CT/PET images
To further illustrate the effectiveness of the proposed method, information entropy (QE), joint entropy (QJE), mean cross entropy (QMCE), and mean gradient (QMG) are used as quantitative evaluation metrics for the fusion results [38]. QE evaluates the richness of information in the fused image. QJE reflects the joint information between the source images and the fused image. QMCE measures the difference between corresponding pixels of the source images and the fused image. QMG is a gradient-based quality metric that indicates how well detail contrast is expressed.
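Two of these metrics can be sketched in a few lines, assuming their common textbook forms: the Shannon entropy of the gray-level histogram for QE, and the mean root-mean-square of the partial derivatives for QMG. The exact definitions used in [38] may differ in normalization, and the bin count of 256 is an assumption; the functions expect images with at least two dimensions.

```python
import numpy as np


def information_entropy(img, bins=256):
    """Shannon entropy (bits) of the gray-level histogram of the image."""
    hist, _ = np.histogram(img, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())


def mean_gradient(img):
    """Mean over all pixels of the root-mean-square partial derivative (2-D or 3-D input)."""
    grads = np.gradient(np.asarray(img, dtype=float))   # one array per axis
    mag = np.sqrt(sum(g ** 2 for g in grads) / len(grads))
    return float(mag.mean())
```

A constant image scores zero on both metrics; an image with two equally frequent gray levels has an entropy of exactly one bit, and sharper images yield a larger mean gradient.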
The experimental results are compared quantitatively using the objective evaluation metrics, and the results are shown in Tables 1 and 2. In the fusion of CT/PET images, the DST connects discontinuities in tubular structures such as blood vessels in the CT images, which makes the difference between the fused image and the source images large. The QMCE metric is therefore not meaningful for assessing CT/PET fusion results, and we choose QE, QJE, and QMG as the objective evaluation metrics for CT/PET fusion. For the fusion of T1ce/T2 images, it is more important to evaluate the difference between the fused image and the source images and whether the fused image effectively integrates the structural information of the lesion from the two source images, so we choose QE, QMCE, and QMG as the objective evaluation metrics for T1ce/T2 fusion. Boldface in the tables indicates the best result for each metric. Tables 1 and 2 show that the proposed method outperforms the other methods on almost all the metrics. In detail, the QE and QJE of the proposed method in Table 1 are clearly higher than those of the other fusion methods, which demonstrates that our method is better at obtaining information from the source images. The QE and QMCE of the proposed method in Table 2 indicate that it preserves the structural information of the source images well. Although the QMG of NSCT-LLE is slightly higher than that of the proposed method in some cases, most of the fusion results obtained by the proposed method are still sharper. Overall, the objective comparison on these four metrics supports the superiority of the proposed method in preserving image structure information and reducing distortion.
Quantitative assessment of different fusion methods of T1ce/T2 images
In this section, we discuss the applicability and limitations of the proposed fusion method. The aim of this paper is to establish a fusion method suited to 3-D medical images that effectively captures the structural continuity among slices and the hidden structural information. In a gradient-based fusion method, this requires establishing a more ideal and accurate fused gradient field and making the gradient of the fused image as close as possible to it.
To construct this ideal fused gradient field, we use the directional structure tensor weighted by local entropy to extract the local structural features of the image and fuse the gradient features of the source images via the coherence measurement function. As Fig. 1 shows, the directional structure tensor can connect discontinuous structures and capture information hidden in the voxels, while the local entropy-weighted directional structure tensor highlights the important structural details of the source images and is conducive to the subsequent feature fusion. In addition, because the approximation of the fused gradient field is composed of the maximum eigenvalue of the weighted directional structure tensor and the corresponding eigenvector, a small loss of continuous detail may result. To address this problem, the details of the source images are extracted and added to the final fused gradient field, and the weight of the detail components is increased to enhance the details in the fused image. By taking full advantage of the DST and the detail components, the proposed fusion method gives satisfactory results in preserving structural information, capturing structural continuity, and enhancing vessel structures. The proposed method can therefore be applied to most 3-D medical images, and even to other fusion tasks that require 3-D inter-component information. However, the method has a limitation: the 3-D weighted directional structure tensor has a slight diffusion effect, so the method is not suitable for color images, in which it may cause color distortion.
Conclusion
In this paper, a 3-D fusion method based on the directional structure tensor is proposed for the fusion of 3-D medical images. First, the eigenvectors of the traditional ST and the second-order derivatives of the source image are combined to construct the DST. Then the gradient weights of the source images are constructed from local entropy metrics, giving a weighted directional structure tensor that extracts the important and continuous structural features. To highlight detail in the fused image, we add the detail components to the gradient domain. Experiments on several 3-D CT/PET images and 3-D MR-T1ce/MR-T2 images verify the effectiveness of the fusion method. The experimental results demonstrate that the fusion results obtained by this method display the pulmonary vascular structure more completely and continuously, and preserve the brightness and contrast of the source images in the fused image without introducing distortion. Future research is expected to focus on the fusion of noisy medical images.
Acknowledgments
This work was sponsored by the Natural Science Foundation of Shanghai (18ZR1426900) and the National Natural Science Foundation of China (61201067).
