Abstract
Joint segmentation and registration of images is an active area of research. Jointly segmenting and registering noisy images, or images with weak boundaries or intensity inhomogeneity, is a challenging task. In medical image processing, joint segmentation and registration are essential methods that aid in distinguishing structures and aligning images for precise diagnosis and therapy. However, these methods face challenges such as computational complexity and sensitivity to variations in image quality, which may reduce their effectiveness in real-world applications. Achieving effective joint segmentation and registration in the presence of artifacts or anatomical deformations also remains a major issue. In this paper, a new nonparametric joint model is proposed for the segmentation and registration of multi-modality images with weak boundaries or noise. For segmentation, the model uses a local binary fitting data term; for registration, it uses conditional mutual information; and for regularization, it uses linear curvature. The proposed model is more efficient at segmenting and registering multi-modality images with intensity inhomogeneity, noise and/or weak boundaries. The proposed model is also tested on images from the freely available CHAOS dataset, and its results are compared with those of other existing models using statistical measures such as the Jaccard similarity index, relative reduction, Dice similarity coefficient and Hausdorff distance. The proposed model outperforms the other existing models both quantitatively and qualitatively.
Introduction
Image segmentation and registration are among the most demanding problems in image processing and arise in many areas, such as medical image analysis [1, 2], pattern recognition [4], geophysics [5], data comparison in a common reference frame [3], and shape tracking. In these applications, image segmentation and registration depend on each other and can be treated simultaneously within a joint framework. The aim of image registration is to find an optimal geometric transformation that aligns different image data [6], whereas image segmentation aims to partition an image based on certain features such as color, texture, or intensity.
In the last two decades, a few joint segmentation and registration models have been proposed for mono-modality images. In joint models, it is important that better registration of the moving image leads to better segmentation of the target image and vice versa. This requires matching the intensity distributions and geometric features of the moving and target images. Multi-modality images, on the other hand, are produced by different imaging modalities with different intensity ranges, leading to extreme intensity differences. Therefore, the aim of the registration process for multi-modality images is to relate the given images from two or more different modalities without changing the modality of the moving image. Multi-modality image registration enables the combination of complementary information from images acquired using various modalities [15, 17]. It is more difficult than mono-modal image registration because finding a suitable similarity measure is very difficult. Registration in such cases requires outputs that show both physiological and anatomical information. Common examples are found in medicine, in diagnostics where MRI and CT images provide different perspectives and information.
In the literature, there exist several simultaneous unsupervised and supervised models for mono-modality images. Well-known simultaneous models for mono-modality images are classified as rigid registration based models [7, 8], non-rigid registration based models [9, 10] and atlas-based registration models [11]. Segmentation and registration were first presented jointly as a model by Yezzi et al. in [8]. In [13], Le-Guyader et al. presented a joint model for non-rigid mono-modality images. The authors employ a non-linear elastic term as a similarity measure. The model registers the segmented moving image with the target image and allows large deformations. This model is able to construct a topology-preserving segmentation, which warps the level set of the moving image to the level set of the target image without blending or breaking. In comparison to this model, Ibrahim et al. [14] presented a joint model using linear curvature as the smoother. This model can deal with mono-modal images containing more than one object and severe deformations, but it still cannot deal with multi-modal images.
Only a few joint segmentation and registration models have been developed for multi-modality images. The first was introduced by Wang et al. [12], with the segmentation of the target image without changing the non-rigidity. Although this model accommodates images with different intensities well, it cannot cope with large deformations. A variational model based on a local energy density for multi-modality image registration is presented in [18], which utilizes a generalized notion of image morphology. In [16], a non-information-theoretic measure for multi-modality image registration is considered, where the segmentation error serves as the registration cost function. This model is limited to rigid registration. In [25], the author presented a joint model that combines nonlocal shape descriptors and total variation. The limitation of this model is that it depends on many parameters. In [19], Ademaj et al. presented a joint regmentation model using a linear curvature smoother and the mutual information [21] similarity measure for image registration through an active contour [23, 37] representation, which can easily cope with merging and splitting for a proper segmentation. Even though the model shows relatively good results for some hard cases where rigid or non-rigid image registration is required, it may get stuck in local minima and struggles with severe non-rigid deformations because of its mutual information based similarity measure. This issue is demonstrated in detail in the experimental section.
Moreover, it is observed that a joint segmentation and registration model may produce better results than applying segmentation and registration separately. In Fig. 1, the segmentation and registration models are applied separately, and in Fig. 1a the joint segmentation and registration model is used. Fig. 1a shows that the joint model produces better results than applying the segmentation and registration models separately as in Fig. 1.

Applying segmentation and registration separately. Results of segmentation with LBF model and then using linear curvature model based on CMI for registration of synthetic noisy image. (a) Segmented moving image, M with

Applying segmentation and registration jointly by using LBF model for segmentation and linear curvature model based on CMI for registration of synthetic noisy image. (a) Segmented moving image, M with
Working with multi-modality medical imaging data presents challenges, and these challenges are typically the driving force behind joint models. The capacity to effectively combine data from several imaging modalities can offer an enhanced understanding of the patient’s status. A further motivation is the recognition of shortcomings in current approaches: a joint model can fill gaps in the methods used today for image segmentation and registration and yield more reliable and accurate outcomes.
To address these issues, in this paper we propose a joint model for the segmentation and registration of multi-modal images. The proposed model is based on the conditional mutual information (CMI) similarity measure to adjust the spatial positions. The CMI similarity measure is flexible to non-rigid deformation [24]. For the segmentation part, the model’s energy is based on local information of the image through kernel windows for better segmentation results. For a smooth deformation field, we use linear curvature as a regularization term. The proposed model has the following advantages over the existing models. (i) It uses CMI, which performs better than MI on images with noise and intensity inhomogeneity. (ii) It uses local binary fitting for the selection of the region of interest (ROI), which is known to give better results on images with intensity inhomogeneity. (iii) For regularization, it uses linear curvature, which gives a smooth deformation with linear computational complexity. In addition, for detecting the region of interest, the model does not use a length term as a regularizer and does not require any re-initialization of the level set function.
The rest of the article is structured as follows. In section 2, some classical models are reviewed and their limitations are indicated. Section 3 describes the variational formulation of our proposed model. In section 4, we evaluate our model experimentally, and section 5 gives the conclusion of this paper with a final discussion and future work.
This section gives a brief discussion of the related models that are used for comparison in the experimental section.
In [14], Ibrahim et al. proposed a joint model for segmentation and registration of mono-modal images, which is an improvement of the model given in [13]. Their model is based on the sum of squared differences (SSD) distance measure, the Chan-Vese data term and a linear curvature regularizer. They proposed the following model:
In multi-modal images, the intensities in the moving and target images are completely different because they are acquired using different modalities. MI is a common choice of similarity measure for multi-modal image registration. Some well-known models based on MI are [20–22]. Here we briefly discuss the Modersitzki model [35], which proposes the following energy functional:
In [19], Ademaj et al. proposed a joint model for segmentation and registration of multi-modal images, known as the regmentation model, which is a modification of the model given in [14]. Their model is based on a linear curvature regulariser, an active contour model without edges and the MI similarity measure. They proposed the following model:
Registration of multi-modality images is an important and challenging problem, which is usually addressed in a variational framework through MI. MI does not compare intensities directly but compares spatial positions globally. For this reason, it may not work well on images with intensity inhomogeneity or noise. Conditional MI also compares spatial positions, but locally, and therefore performs better on images with intensity inhomogeneity. This motivates us to use CMI as the data fitting term for registration. Registering images separately may not give good results on complex medical images with noise or intensity inhomogeneity, so we propose a joint segmentation and registration model for better registration of complex medical images. This process will decrease the computational cost of the algorithm. For the segmentation part of the model, we introduce the local binary fitting energy, which performs better on images with noise or inhomogeneity. Combined segmentation and registration can be carried out in the following two ways: 1) registration and then segmentation separately, or vice versa; 2) a joint model for segmentation and registration. In the first case, the moving image is segmented using a suitable segmentation model, a registration model is then applied to obtain a transformed image, and the final contour of the moving image is deformed to the target image. This process is explained in Fig. 1, where (a) is the segmented moving image with
Here, a joint model for segmenting and registering multi-modality images is proposed, which overcomes the drawbacks of the existing models by utilizing different types of energies for segmentation and registration. For segmentation, the proposed model uses a local binary fitting data term; for registration, it uses conditional mutual information; and for regularization, it uses linear curvature. The proposed model is able to segment and register multi-modality images with intensity inhomogeneity, noise and/or weak boundaries.
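To illustrate why a locally conditioned similarity measure can help under intensity inhomogeneity, the following minimal sketch contrasts global MI with a spatially conditioned MI on a toy image pair. It assumes a simplified form of CMI in which the conditioning variable is a hard partition of the image into square blocks and probabilities are estimated from plain histograms. The function names, the `block` parameter and the toy images are illustrative assumptions, not the estimator used in the proposed model.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """MI (in nats) of a list of (a, b) intensity pairs, from a joint histogram."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi

def conditional_mutual_information(img_a, img_b, block):
    """Blockwise CMI: MI inside each spatial block, weighted by the block's pixel share."""
    rows, cols = len(img_a), len(img_a[0])
    total = rows * cols
    cmi = 0.0
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            pairs = [(img_a[r][c], img_b[r][c])
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            cmi += (len(pairs) / total) * mutual_information(pairs)
    return cmi

# Toy pair (hypothetical): the intensity mapping is the identity in the left
# half and inverted in the right half, mimicking a spatially varying relation.
a = [[0, 1, 0, 1],
     [1, 0, 1, 0]]
b = [[0, 1, 1, 0],
     [1, 0, 0, 1]]

all_pairs = [(a[r][c], b[r][c]) for r in range(2) for c in range(4)]
global_mi = mutual_information(all_pairs)                   # 0: no global dependence
local_cmi = conditional_mutual_information(a, b, block=2)   # log 2: full local dependence
print(global_mi, local_cmi)
```

In this toy case the two images are perfectly related within each block, yet the global joint histogram is uniform, so global MI reports no dependence while the blockwise measure reports the maximum possible for binary intensities.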
Our proposed model comprises two terms, namely segmentation and registration. The term responsible for segmentation of the moving image uses local information of the moving image. Binary fitting functions are calculated in each local window to fit the image data. This type of data fitting can segment images with weak object boundaries, intensity inhomogeneity and strong noise. The similarity measure responsible for registration is based on conditional mutual information (CMI) [30]. CMI is introduced to estimate feature redundancy and interaction in the moving and target images of different modalities, and leads towards an optimal transformation between the moving and target images. In active contour methods, the level set function can become too flat and needs re-initialization to restore it to a signed distance function. To decrease the computational cost, we regularize the level set function by using a Gaussian kernel instead of re-initialization.
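The local fitting idea described above can be sketched in one dimension. The sketch assumes the usual local binary fitting form, in which the two fitting functions are Gaussian-weighted local averages of the intensities inside and outside the contour, and it uses a smoothed arctangent Heaviside function. The signal, the window parameters and all function names are illustrative assumptions rather than the paper's implementation.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian window."""
    w = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def smooth(signal, kernel):
    """Discrete convolution with replicated end values (the kernel is symmetric)."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + k - radius, 0), last)] for k, w in enumerate(kernel))
        for i in range(len(signal))
    ]

def heaviside(phi, eps=1.0):
    """Smoothed Heaviside function of a level set value."""
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(phi / eps))

def local_fitting(image, phi, sigma=2.0, radius=6, tiny=1e-8):
    """Local averages f1 (inside, phi > 0) and f2 (outside) in a Gaussian window."""
    kernel = gaussian_kernel(sigma, radius)
    inside = [heaviside(p) for p in phi]
    outside = [1.0 - h for h in inside]
    f1_num = smooth([h * v for h, v in zip(inside, image)], kernel)
    f1_den = smooth(inside, kernel)
    f2_num = smooth([h * v for h, v in zip(outside, image)], kernel)
    f2_den = smooth(outside, kernel)
    # tiny avoids division by zero where a window contains almost no inside/outside mass
    f1 = [n / (d + tiny) for n, d in zip(f1_num, f1_den)]
    f2 = [n / (d + tiny) for n, d in zip(f2_num, f2_den)]
    return f1, f2

# Toy signal (hypothetical): bright object on indices 0..9, dark background,
# both affected by a slow intensity drift.
image = [(10.0 if i < 10 else 2.0) + 0.1 * i for i in range(20)]
phi = [9.5 - i for i in range(20)]   # positive inside the object
f1, f2 = local_fitting(image, phi)

# Gaussian smoothing of the level set function in place of re-initialization.
phi_smoothed = smooth(phi, gaussian_kernel(1.0, 3))
```

Because `f1` and `f2` vary with position, they can follow the drift in the signal, which a pair of global region means cannot do.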
As a result, a joint segmentation and registration model is proposed having the following energy functional:
To discretize the energy functional in (16) we use the following:
where $\eta_s = (s_n - s_0)/n_s$ and $\eta_t = (t_n - t_0)/n_t$, and $n_s$ and $n_t$ denote the numbers of mesh points. A very small positive constant is used to avoid indeterminate forms. Note that the Neumann boundary conditions are used, i.e.
Here, * denotes discrete convolution.
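The role of the mesh spacings and the Neumann boundary conditions can be illustrated with a small sketch of a discrete linear curvature energy. It assumes the regularizer has the common squared-Laplacian form, one half of the integral of (Δu1)² + (Δu2)², discretized with a five-point stencil on a cell-centred grid over the unit square. The function names and the grid size are illustrative assumptions and are not taken from the paper's discretization.

```python
def laplacian_neumann(u, eta_s, eta_t):
    """Five-point Laplacian; out-of-range neighbours are replaced by the edge
    value, which corresponds to a zero normal derivative at the boundary."""
    rows, cols = len(u), len(u[0])
    lap = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            up = u[max(i - 1, 0)][j]
            down = u[min(i + 1, rows - 1)][j]
            left = u[i][max(j - 1, 0)]
            right = u[i][min(j + 1, cols - 1)]
            lap[i][j] = ((up - 2.0 * u[i][j] + down) / eta_s ** 2
                         + (left - 2.0 * u[i][j] + right) / eta_t ** 2)
    return lap

def linear_curvature_energy(u1, u2, eta_s, eta_t):
    """0.5 * sum over both displacement components of (Laplacian)^2 * cell area."""
    energy = 0.0
    for u in (u1, u2):
        for row in laplacian_neumann(u, eta_s, eta_t):
            energy += sum(v * v for v in row)
    return 0.5 * eta_s * eta_t * energy

n_s = n_t = 8
eta_s = (1.0 - 0.0) / n_s   # (s_n - s_0) / n_s on the unit square
eta_t = (1.0 - 0.0) / n_t

constant = [[0.3] * n_t for _ in range(n_s)]   # rigid shift: no curvature penalty
bump = [[0.0] * n_t for _ in range(n_s)]
bump[4][4] = 0.1                               # isolated kink: penalized

print(linear_curvature_energy(constant, constant, eta_s, eta_t))
print(linear_curvature_energy(bump, constant, eta_s, eta_t))
```

A constant displacement field costs nothing under this energy, whereas a non-smooth field is penalized, which is the behaviour expected of a smoothing regularizer.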
To get the optimal value of
In general, the proposed approach can improve diagnosis accuracy by providing more accurate information about the location and boundaries of structures or abnormalities by joint image segmentation and registration. For medical professionals, this can save time and facilitate easier decision-making. The capacity of the model to handle several imaging modalities can result in a better understanding of the patient’s situation. For the purpose of diagnosis and treatment planning, integrating data from several modalities can offer a more comprehensive perspective.

Block diagram of the proposed joint model.
In this section, we investigate the performance of the proposed model and of three other existing models [14, 35] on medical images of different modalities. The experimental results of the proposed model are compared with those of three existing state of the art models qualitatively and quantitatively. The proposed model has been validated on images with intensity inhomogeneity, large deformations and high levels of noise. The relative reduction of the similarity measure, defined as follows, is used for quantitative comparison to judge the registration quality for multi-modality images.
In Fig. 1a, the proposed model produces satisfactory results and successfully segments and registers the noisy images jointly; the outputs are given in Fig. 1a (c)-(e). The parameter values used are ν1 = 2, ν2 = 2, ν3 = 0.9 and γ = 0.4.
Here, we have tested our proposed model on four different multi-modal medical images (Thorax, Brain and Liver [19]) of size 128 × 128. Fig. 4 shows the successful performance of our proposed model in terms of segmentation and registration for images with large deformations, severe intensity inhomogeneity and weak boundaries. In Fig. 4, the first column shows the moving images M with the contour Γ, the second column shows the target images I0, the third column shows the deformation fields with the values of

Results of the proposed model for different multi-modality medical images. The first and second columns show the given moving images with initial contours and the target images, respectively. The third and fourth columns show the deformation fields and the target images with the final contours, respectively. The fifth column shows the transformed moving images.
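How a deformation field produces a transformed moving image can be sketched as follows. The sketch assumes that the transformed image is obtained by sampling the moving image at the displaced positions x + u(x) with bilinear interpolation, and that positions falling outside the image are clamped to its border. The function name, the clamping rule and the toy image are illustrative assumptions.

```python
import math

def warp_image(moving, u_row, u_col):
    """Sample `moving` at (i + u_row[i][j], j + u_col[i][j]) with bilinear
    interpolation; sampling positions are clamped to the image domain."""
    rows, cols = len(moving), len(moving[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            y = min(max(i + u_row[i][j], 0.0), rows - 1.0)
            x = min(max(j + u_col[i][j], 0.0), cols - 1.0)
            y0, x0 = int(math.floor(y)), int(math.floor(x))
            y1, x1 = min(y0 + 1, rows - 1), min(x0 + 1, cols - 1)
            wy, wx = y - y0, x - x0
            top = (1.0 - wx) * moving[y0][x0] + wx * moving[y0][x1]
            bottom = (1.0 - wx) * moving[y1][x0] + wx * moving[y1][x1]
            out[i][j] = (1.0 - wy) * top + wy * bottom
    return out

moving = [[0.0, 1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0, 7.0]]
zero = [[0.0] * 4 for _ in range(2)]
one = [[1.0] * 4 for _ in range(2)]
half = [[0.5] * 4 for _ in range(2)]

identity = warp_image(moving, zero, zero)     # unchanged image
shifted = warp_image(moving, zero, one)       # each pixel takes its right neighbour
blended = warp_image(moving, zero, half)      # halfway between neighbours
```

A zero displacement returns the moving image unchanged, and a constant displacement of one pixel shifts the content by one column, with the last column repeated because of the clamping.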
In this section, we discuss the quantitative comparison of the proposed model with the above-mentioned existing models, namely the Modersitzki model [35] and the regmentation model [19], through the Jaccard Similarity Index (JSI) and the relative reduction of the similarity measure. For testing, we have used different types of synthetic and medical images.
Note that the value of JSI always lies between 0 and 1, where 0 means no overlap and 1 means complete alignment. A JSI value closer to 1 indicates that the transformed moving image is quite similar to the target image, whereas a JSI value closer to 0 indicates low resemblance. Table 1 shows the JSI values of the proposed model and the two other existing models given in [19, 35]. From Table 1, it can be observed that the proposed model outperforms the existing models quantitatively, as its JSI values are higher.
We have also compared the results of the proposed model with those of the other existing models using the relative reduction metric defined in (30). Table 2 gives the relative reduction values of the similarity measure for the proposed model and the other existing models. Table 2 shows that the relative reduction values for the proposed model are smaller than those of the other existing models, which indicates the better performance of the proposed model.
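For reference, the JSI of two binary masks can be computed as the size of their intersection divided by the size of their union. The following minimal sketch assumes the masks are given as equally sized nested lists of 0/1 values; the function name and the toy masks are illustrative, and the convention of returning 1 for two empty masks is an assumption.

```python
def jaccard_index(mask_a, mask_b):
    """|A intersect B| / |A union B| for two binary masks of the same shape."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty masks are treated as identical (assumed convention).
    return intersection / union if union else 1.0

target = [[1, 1, 0, 0],
          [1, 1, 0, 0]]
warped = [[0, 1, 1, 0],
          [0, 1, 1, 0]]

jsi_partial = jaccard_index(target, warped)   # 2 shared pixels out of 6 covered
jsi_perfect = jaccard_index(target, target)   # identical masks
```

The two masks above share two of the six pixels they jointly cover, which gives a JSI of one third, while identical masks give a JSI of 1.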
Table of the proposed model by using images of different resolutions
Here, we have compared the proposed model with the existing models on a variety of synthetic and real-medical images. We have included a few of them below.
Fig. 5 illustrates the comparative results of the proposed model and the Modersitzki model [35] on multi-modal medical images with intensity inhomogeneity and weak boundaries. In Fig. 5, the first and second columns show the moving and target images, respectively. The third column shows the moving images M with final contours. The fourth column shows the deformation fields, the fifth column shows the target images I0 with final contours and the sixth column shows the transformed moving images of the proposed model. The seventh and eighth columns show the results of the Modersitzki model [35]. In all of these experiments, it can be observed visually that the proposed model produces better results than the Modersitzki model [35] on multi-modal medical images.

Comparative results of the proposed model with the Modersitzki model [35]. (a) Moving images M. (b) Target images I0. (c) Moving images M, with zero level set. (d) The deformation fields. (e) Images I0 with final contour. (f) The transformed moving images
Comparative results of the proposed model with the regmentation model [19]. First row shows the results of the proposed model. Second row shows the results of the regmentation model.
In Figs. 6–9, the first and second rows show the qualitative comparative results of the proposed model and the regmentation model [19], respectively, on four different medical images with intensity inhomogeneity. In Figs. 6–9, (a) shows the moving images M, (b) shows the target images I0, (c) shows the moving images M with zero level set
In Figs. 8–9, (a) shows the moving images M, (b) shows the target images I0, (c) shows the moving images M with zero level set
Additionally, we evaluated our proposed model using the CHAOS (Combined (CT-MR) Healthy Abdominal Organ Segmentation) dataset [36]. The collection contains ground truth segmentations of the abdominal organs (liver, kidney, spleen, etc.). For the joint registration and segmentation model, we employed the full T1-DUAL and T2-SPIR (Spectral Pre-Saturation Inversion Recovery) abdominal MRI sequences. A 1.5T MRI scanner was used to collect the dataset. The target image is T2-SPIR and the moving image is T1-DUAL in mode. The MRI sequences have a resolution of 256 × 256; the images are down-scaled to a resolution of 128 × 128.
In this dataset, we have carried out our experiments on 10 cases. We compute the relative reduction directly on the gray-scale target and transformed moving images. Moreover, the Dice Similarity Coefficient (DSC) and Hausdorff distance (
Comparison of the proposed model with the regmentation model and Modersitzki model in terms of CPU time
Also, the quantitative comparison of the proposed model with the regmentation and Modersitzki model using this dataset is presented in Table 5 and Figs. 10–12. The higher value of DSC and lower values of
We also compute summary statistics on the CHAOS dataset to show the statistical significance of our proposed model. We calculated the means and standard deviations of the DSC for our proposed model, the regmentation model and the Modersitzki model on the CHAOS dataset images given in Table 5. The results, in the form mean ± s.d., are 0.9085 ± 0.0241, 0.8586 ± 0.0242 and 0.8621 ± 0.0281 for the proposed model, the regmentation model and the Modersitzki model, respectively. The mean DSC of our proposed model is therefore higher than those of the other existing models, and its standard deviation is the lowest of the three. The high mean DSC and low standard deviation of our proposed model are statistically significant as compared to the regmentation and Modersitzki models. This clearly indicates the better performance of our proposed model on various MR images from the CHAOS dataset.
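The overlap and distance measures reported above can be sketched as follows. The sketch assumes binary masks given as nested lists, computes the Hausdorff distance over all foreground pixels rather than over extracted boundaries, and uses the sample standard deviation for the mean ± s.d. summary. The per-case values in `example_scores` are hypothetical placeholders, not results from the paper.

```python
import math
from statistics import mean, stdev

def foreground(mask):
    """Set of (row, col) coordinates of the non-zero pixels of a binary mask."""
    return {(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v}

def dice_coefficient(mask_a, mask_b):
    """DSC = 2 |A intersect B| / (|A| + |B|)."""
    a, b = foreground(mask_a), foreground(mask_b)
    if not a and not b:
        return 1.0   # assumed convention for two empty masks
    return 2.0 * len(a & b) / (len(a) + len(b))

def hausdorff_distance(mask_a, mask_b):
    """Symmetric Hausdorff distance between the foreground pixel sets (in pixels)."""
    a, b = foreground(mask_a), foreground(mask_b)
    if not a or not b:
        raise ValueError("both masks need at least one foreground pixel")

    def directed(p_set, q_set):
        # largest distance from a point of p_set to its nearest point in q_set
        return max(min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in q_set)
                   for p in p_set)

    return max(directed(a, b), directed(b, a))

truth = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
result = [[0, 1, 1, 0],
          [0, 1, 1, 0]]

dsc = dice_coefficient(truth, result)
hd = hausdorff_distance(truth, result)

# Hypothetical per-case DSC values, used only to show the mean and s.d. summary.
example_scores = [0.90, 0.92, 0.88]
summary = (mean(example_scores), stdev(example_scores))
```

For the toy masks the DSC is 0.5 and the Hausdorff distance is one pixel. A higher DSC and a lower Hausdorff distance both indicate closer agreement with the ground truth.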
Comparison table of the proposed model with the existing state of the art models using DSC-Liver, Hd-Liver and RR



In this section, we discuss the sensitivity of the proposed joint model to the parameter γ. γ is one of the important parameters of the proposed model and can affect the registration results for medical images. Figure 13 shows the parameter sensitivity of our proposed model. Here γ is the regularization parameter, and the model is applied to multimodal images taken from the CHAOS dataset [36]. Joint registration and segmentation is performed with varying values of γ in the range [1, 2], and the Dice similarity coefficient is recorded against γ. From Fig. 13, the DSC attains its maximum of 0.9501 at γ = 1.05 and its minimum of 0.9436 at γ = 1.75, so all DSC values for γ in [1, 2] lie between 0.9436 and 0.9501. The DSC values therefore remain close to one another as γ varies. However, we choose the best performing value of γ for joint registration and segmentation.

Sensitivity of the proposed model to the parameter γ, measured by the Dice Similarity Coefficient, for image 3 taken from the CHAOS dataset [36].
In this paper, a new joint model for the segmentation and registration of multi-modality images with severe intensity inhomogeneity, noise or weak boundaries is proposed. For segmentation, the model uses a local binary fitting data term; for registration, it uses conditional mutual information; and for regularization, it uses linear curvature. The proposed model is capable of segmenting and registering multi-modal images with intensity inhomogeneity, noise and/or weak boundaries. The proposed model is compared quantitatively and qualitatively with the existing state of the art models through statistical metrics such as the Jaccard similarity index, the relative reduction of the similarity measure, the Dice similarity coefficient and the Hausdorff distance. The relative reduction of the similarity measure ($RR_{CMI}$) is computed to evaluate the registration quality of multi-modality images. We have used MI over CMI for the comparison purpose. We have also tested our proposed model on the CHAOS dataset. For both comparison types, it can be observed that the proposed model has produced better results than the other existing models.
Future work
Future work may include optimizing the parameters of the regularisation terms, as well as a further modification that uses a Gaussian curvature term as the regularizer. It may also include developing fast numerical methods for solving joint segmentation and registration models on large medical images.
