Abstract
The registration of infrared (IR) and low-light-level (LLL) images remains a challenging problem because feature points are poorly dispersed and the structure and texture information of the two modalities is weakly correlated. In this paper, we propose a method based on the neighborhood difference chain code to address this challenge. First, we extract the feature points of the binarized images using 8- or 16-neighborhood information. We then construct the descriptor of each feature point from its neighborhood difference chain code. Finally, we match the feature points using the Euclidean distance. We verify our method on the TNO and INO data sets, comparing it with three other methods on four objective evaluation criteria. The results demonstrate that the proposed algorithm performs competitively against state-of-the-art methods such as Harris, SIFT and SURF in terms of registration accuracy and speed.
Introduction
Fusing multi-modality images provides complementary information and improves the accuracy of decision-making. One of the basic problems in fusion is aligning images of the same scene captured from different positions or by different sensors. This task, called image registration, aims to restore the correspondence between images. Once such a correspondence is established, all images can be converted into the same reference frame so that their information complements each other. Among the many image types of interest, in this paper we focus on IR and LLL images. IR and visible light sensors are naturally complementary: IR detectors sense the thermal radiation of objects in the scene, while visible light detectors sense the light reflected by objects. Therefore, combining LLL and IR images can enhance the scene information, which has important applications in many fields.
The distinct characteristics of the IR and visible light spectra result in large differences between the gray levels of the two images. For instance, the same position in an IR image and an LLL image can have nearly inverted gray gradients. In this case, image registration based on traditional gray-scale methods may lead to mismatches.
Key-point-based image matching is an effective approach to image registration; it includes the Harris corner detector, the scale-invariant feature transform (SIFT) [1], speeded-up robust features (SURF) [2], binary robust independent elementary features (BRIEF) [3] and oriented FAST and rotated BRIEF (ORB) [4]. Among these methods, the classical SIFT and its variants are the most commonly used for image registration. However, for multispectral imaging the performance of SIFT is still unsatisfactory [5–8], especially for IR and LLL images.
In this work, our contributions are twofold. First, we propose a binary neighborhood algorithm to extract feature points that express regional contour features and texture information. Second, to match the feature points we propose the neighborhood difference chain code algorithm, which improves registration accuracy while maintaining good efficiency. Compared with state-of-the-art methods, our method performs better on LLL and IR image registration. Consequently, it could be employed to improve the performance of LLL and IR image fusion applications.
The remainder of this paper is organized as follows. Section 2 reviews image registration methods. Section 3 covers the proposed approach in detail, including feature point detection, feature point description and matching. Section 4 gives our experimental results, including a qualitative illustration and a quantitative analysis of the performance of the proposed approach, and a comparison with state-of-the-art methods such as SIFT and SURF. We conclude the paper and discuss future work in Section 5.
Related work
If image registration is regarded as a fitting problem, a distance measure serves as the fitting performance criterion. According to this criterion, registration methods can be divided into intensity-based, spectrum-based and feature-based methods.
Among the intensity-based methods, Viola et al. [9] proposed aligning images by maximizing mutual information (MI). To address automated 3D multi-modal medical image alignment, Studholme [10] proposed entropy-based registration criteria. Pradhan [11] proposed a novel similarity measure that integrates the effectiveness of each voxel with the intensity distributions to compute an enhanced MI from the joint histogram of the two images. Chen [12] proposed a PCA-based regional MI method for robust medical image registration. The MI measure relies mainly on the assumption that the intensity statistics of the images to be aligned are similar. Because the intensity distributions of multi-modal images differ significantly, intensity-based measures are not suitable for multi-modal image registration [13].
Spectral methods represent the image data structure through spectral decomposition and are an upgrade of intensity-based measures. Piella [14] proposed using the first embedded coordinate of the diffusion map and the Laplacian feature map to represent the features of multi-modal images. Bansal [13] proposed using a joint graph to match disparate images. Zimmer [15] proposed using Laplacian commutators to register multi-modal images. Because the spectral measure built on the L1 or L2 distance is a convex function, its solution is more concise than that of the MI measure. However, the derivative of the spectral measure is complicated and difficult to derive, which limits the development and practical application of spectral methods.
Feature-based methods measure registration performance by the distance between matching features of the images to be aligned [16]. These image features include points, curves and surfaces [17]. Compared with curves and surfaces, which are themselves described by discrete sets of points, point features are preferred. Typical models used to construct feature-based metrics include the L2 loss criterion [18], the L2 minimum estimator [19], the regularized Gaussian field criterion (RGF) [20] and the Gaussian mixture model [21]. Because feature-based measures are convex, these models can be minimized by gradient-based methods such as gradient descent and quasi-Newton methods. A more important problem is how to find enough matching features in the multi-modal images for alignment. Feature-based registration is the most widely studied approach owing to its high registration accuracy, and point features are generally preferred over line and surface features.
For IR-LLL image pairs, it is difficult to find the correspondence between feature points because of the nonlinear relationship between pixel intensities [22]. The intensity of IR images changes with the temperature of objects, while the intensity of LLL images is closely related to the color and lighting of objects. This nonlinearity leads to a lack of correlation between their respective gradients. In addition, since IR images are smoother and lack detail and texture [23], local gradient features are ineffective for detecting feature points. To overcome these limitations, in this paper we propose a binary neighborhood algorithm to extract the feature points. We then propose the neighborhood difference chain code algorithm to construct the descriptor. Finally, we use the Euclidean distance to match the feature points.
Methodology
Classical algorithms fail because the pixel gray-level relations of the two images are inconsistent owing to the different imaging principles, whereas the contour structure and texture information of the two images are consistent. To eliminate the difference in pixel gray scale, we use relative gray values to calculate the chain code and the mean square gradient to set the direction of the feature points. To reflect structure and texture, we use the chain code to build the feature point descriptors.
Image registration consists of three steps: feature point detection, feature point descriptor generation and feature point matching, of which the descriptor is the key. Although adjustments at the matching stage can improve the percentage of correct matches, the results on IR-LLL pairs using SIFT or SIFT variants (such as [7, 24]) as descriptors are poor. This low matching quality is mainly due to the weak gradient description ability of IR images and the loss of detail and texture caused by image smoothing. Below we introduce our approach and demonstrate how it addresses the low matching quality. The flow of the algorithm is shown in Fig. 1.

Fig. 1. The flow of the proposed method.
LLL images have low gray levels, while IR images are formed from the thermal radiation of objects and therefore lack detail and texture. As a result, most feature points detected by conventional methods fail to reflect the structure and texture information of the images. To address this issue, we propose a binary neighborhood algorithm.
First, we binarize the LLL and IR images. We then extract feature points according to the following criterion:
where P is the feature point, K is the number of neighborhood points, I_p is the gray value of the feature point and I_{p→x} is the gray value of a neighborhood point. For IR images we set K = 8, and for LLL images we set K = 16, as they have more details and contour information.
Because objects in an IR image differ in temperature, the contour of an object stands out clearly from its surroundings, and the corresponding binary image also shows pronounced structural contours. The low contrast of an LLL image, by contrast, yields weak structural contour information. Therefore, a suitable number of feature points can be obtained from binary IR images by considering only the 8-neighborhood. For binary LLL images, however, only a few feature points are detected if only the 8-neighborhood is considered, so the 16-neighborhood must be used to obtain a suitable number of feature points. The neighborhood diagram is shown in Fig. 2.

Fig. 2. Diagram of the 8-neighborhood (in green) and the 16-neighborhood (in blue).
In the original images, the points that reflect the texture details of the IR and LLL images are concentrated at the edges of the contour regions. Therefore, to decide whether a pixel is an edge contour point, it is only necessary to count the points in its 8- or 16-neighborhood whose gray values differ from that of the pixel. In this paper, the threshold is set to 6 when extracting feature points from IR images and 10 when extracting feature points from LLL images.
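The counting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the image is already binarized and stored as a list of rows, the 16-neighborhood is taken to be the ring at Chebyshev distance 2 (the border of a 5×5 window; the paper defines it only in Fig. 2), a pixel is accepted when the count is greater than or equal to the threshold, and the names `RING_8`, `RING_16` and `detect_feature_points` are hypothetical.

```python
# Offsets (dy, dx) of the 8-neighborhood: the ring at Chebyshev distance 1.
RING_8 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]

# ASSUMPTION: the 16-neighborhood is the ring at Chebyshev distance 2
# (the border of a 5x5 window); the paper shows it only in a figure.
RING_16 = [(dy, dx) for dy in range(-2, 3) for dx in range(-2, 3)
           if max(abs(dy), abs(dx)) == 2]


def detect_feature_points(binary, ring, threshold):
    """Return (row, col) of pixels with at least `threshold` ring neighbors
    whose binary value differs from that of the center pixel.

    ASSUMPTION: "at least" (>=) is one reading of the threshold rule.
    """
    rows, cols = len(binary), len(binary[0])
    # Skip a border wide enough that every ring offset stays inside the image.
    margin = max(max(abs(dy), abs(dx)) for dy, dx in ring)
    points = []
    for r in range(margin, rows - margin):
        for c in range(margin, cols - margin):
            center = binary[r][c]
            differing = sum(1 for dy, dx in ring
                            if binary[r + dy][c + dx] != center)
            if differing >= threshold:
                points.append((r, c))
    return points
```

With the thresholds quoted in the text, a binarized IR image would be processed as `detect_feature_points(ir_binary, RING_8, 6)` and a binarized LLL image as `detect_feature_points(lll_binary, RING_16, 10)`, where `ir_binary` and `lll_binary` are placeholder names.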
Figure 3 shows the feature points detected using Harris, SIFT, SURF and binary neighborhood algorithm.

Fig. 3. Examples of feature point detection for ten groups of images. For each group, the IR image is on the left and the LLL image is on the right.
As can be seen in Fig. 3, the feature points extracted by the Harris method from the 10 groups of images are few in number and fail to reflect the structural contours of trees, tanks, roads, chariots, aircraft and soldiers. The SIFT method extracted relatively high-quality feature points from the first group, only a few feature points from low-contrast images such as groups 4, 5 and 6, and a large number of feature points from noisy images such as groups 2, 3 and 7. The SURF method extracted feature points that reflect the structure and contours of the objects in groups 1, 6 and 10, whereas the feature points it extracted in the other groups did not reflect structural details. In contrast, the binary neighborhood algorithm extracted high-quality feature points from all images. These feature points are concentrated on the contours of trees, houses, smoke, tanks, roads, chariots, aircraft and soldiers, and thus reflect the structure and texture details of the images.
SIFT assigns a direction to each feature point before generating the descriptor, using direction histograms to calculate the main direction [25]. However, the direction histograms calculated from different source images may point to unrelated directions, which can lead to false matches. In addition, the histogram orientation is discrete, with a resolution that depends on the number of histogram bins. Compared with the direction histogram, the average square gradient [26, 27] is continuous, which makes it more accurate and more efficient. As long as the structural contour is the same, the main direction calculated by this method remains unchanged. Therefore, we use the average direction perpendicular to the gradient as the main direction of a candidate control point, so that the same main direction is obtained even if the image gradients point in opposite directions.
Given an image I, its gradient
In this equation, the second element of the gradient vector is always positive as the opposite direction of the gradient represents the equivalent principal direction. To calculate the principal direction, the image gradient should be averaged or accumulated within an image window. The opposite gradients, if directly averaged or accumulated, cancel each other out, but they should reinforce each other because they represent the same principal direction. So, we square the gradient vector in the complex domain before averaging it.
We average gradient vectors using a 3×3 neighborhood window:
where:
The main direction φ, with 0 ⩽ φ < π, is given by the following equation:
Therefore, for each candidate control point p(x, y), the main direction is φ(x, y).
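The effect of squaring the gradient in the complex domain before averaging can be illustrated with a short Python sketch. It follows the general averaged squared gradient idea rather than the paper's exact equations: each gradient (gx, gy) is treated as the complex number gx + j·gy, squared, summed over the window, and the resulting angle is halved and folded into [0, π). The function name `principal_direction` and the fallback value for a zero sum are assumptions made for illustration.

```python
import cmath
import math


def principal_direction(gradients):
    """Averaged squared gradient orientation, folded into [0, pi).

    `gradients` is an iterable of (gx, gy) pairs taken from the averaging
    window (for example the 3x3 window around a candidate point).
    """
    squared_sum = 0j
    for gx, gy in gradients:
        # (gx + j*gy)^2 = (gx^2 - gy^2) + j*(2*gx*gy). Squaring doubles the
        # angle, so g and -g map to the same value and reinforce each other
        # instead of cancelling.
        squared_sum += complex(gx * gx - gy * gy, 2.0 * gx * gy)
    if squared_sum == 0:
        return 0.0  # ASSUMPTION: no dominant orientation, fall back to 0
    # Halve the doubled angle and fold the result into [0, pi).
    angle = (cmath.phase(squared_sum) / 2.0) % math.pi
    return 0.0 if angle >= math.pi else angle  # guard against rounding up to pi
```

The value returned here is the orientation of the averaged gradient. A direction perpendicular to the gradient, as the text uses for the main direction, would be obtained under this sketch by adding π/2 and folding into [0, π) again.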
To ensure the rotation invariance we rotate the coordinate axis along the direction of the feature point [28], as shown in Fig. 4.
After rotating the feature points in the main direction, we can calculate the descriptor of the feature points. According to the characteristics of two different images, we propose a differential neighborhood descriptor, as shown in the following steps:

Fig. 4. Rotation of axes.

Fig. 5. Feature points and relative gray values of the 16 neighborhoods.

Fig. 6. Feature point chain code.
Step 1: Extract the gray value of feature point P in the IR and LLL images, and its 16-neighborhood points:
where p(x, y) represents the gray value of the feature point and f_i(x, y) denotes the gray value of the i-th of its 16 neighborhood points.
Step 2: Get the minimum value of each feature point and its 16-neighborhood points respectively:
Step 3: Obtain the relative gray values of the feature point and its 16-neighborhood points to remove the influence of the gray-level difference between the two images. We subtract the minimum value found in Step 2 from the gray value of each point, as shown in Fig. 5 and Equations (9) and (10).
Step 4: The previous step yields the relative gray values LLL′(i) and IR′(i). Following the main direction, we subtract the relative gray value of each neighborhood point in turn from the relative gray value of the feature point. This yields the feature point chain codes given in Equations (11) and (12). The schematic diagram is shown in Fig. 6.
Step 5: Calculate the differential chain codes.
Based on the relative chain code of the 16 neighborhoods, we subtract the relative chain code of the next neighborhood point from that of the current neighborhood point, starting from the first, to form the differential relative chain code. The flow is shown in Fig. 7. The equations are shown below:
Fig. 7. Differential chain code of a feature point.
Step 6: Use the relative differential chain code to construct the descriptor, which is a 16×1 matrix:
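Steps 2 to 6 can be summarized in a short Python sketch. It is one reading of the prose, not the paper's equations: the neighbor gray values are assumed to be supplied already ordered from the main direction, the last difference is assumed to wrap around to the first neighbor so that 16 neighbors yield 16 entries, and the function name `differential_chain_code` is hypothetical.

```python
def differential_chain_code(center, neighbors):
    """Build a descriptor from the gray value of a feature point (`center`)
    and the gray values of its neighborhood points (`neighbors`), listed in
    order starting from the main direction."""
    neighbors = list(neighbors)
    # Steps 2-3: subtract the local minimum to obtain relative gray values.
    local_min = min([center] + neighbors)
    rel_center = center - local_min
    rel_neighbors = [g - local_min for g in neighbors]
    # Step 4: chain code = relative center value minus each relative
    # neighbor value, visited in order.
    chain = [rel_center - g for g in rel_neighbors]
    # Step 5: differential chain code = each entry minus the next entry.
    # ASSUMPTION: the sequence wraps around, so n neighbors give n entries.
    n = len(chain)
    # Step 6: the list of differences is the descriptor.
    return [chain[i] - chain[(i + 1) % n] for i in range(n)]
```

Under this reading the descriptor depends only on differences between gray values, so adding a constant brightness offset to the whole neighborhood leaves it unchanged, which matches the stated goal of removing the gray-level difference between the two images.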
Here we compare the description matrix of each feature point of the two images and establish the correspondence between the feature points. We use Euclidean distance [29] to calculate the distance between feature points:
For each feature point in one image, we find the two closest feature points in the other image by exhaustive search. If the ratio of the closest distance to the second-closest distance is less than a threshold T, we regard the feature point and its closest neighbor as a pair of matching points. Finally, we use the RANSAC algorithm [30] to remove incorrect matches.
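The nearest/second-nearest ratio test described above can be sketched as follows. This is a brute-force illustration under stated assumptions: descriptors are plain lists of numbers, the parameter `ratio_threshold` stands in for T (the paper does not state its value, so the default of 0.8 is purely illustrative), and the RANSAC filtering step is not included. The names `euclidean` and `match_descriptors` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(desc_a, desc_b, ratio_threshold=0.8):
    """Return (index_a, index_b) pairs that pass the ratio test.

    ASSUMPTION: ratio_threshold plays the role of T; 0.8 is not from the paper.
    """
    matches = []
    if len(desc_b) < 2:
        return matches  # the test needs a nearest and a second-nearest neighbor
    for i, da in enumerate(desc_a):
        # Exhaustive search: rank every descriptor in the other image.
        ranked = sorted((euclidean(da, db), j) for j, db in enumerate(desc_b))
        best, best_j = ranked[0]
        second = ranked[1][0]
        # best / second < T, written without a division so that a zero
        # second-nearest distance cannot raise an error.
        if best < ratio_threshold * second:
            matches.append((i, best_j))
    return matches
```

A lower threshold keeps only matches whose nearest neighbor is clearly better than the runner-up, trading the number of matches for reliability before any outlier removal.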
Experiments
To evaluate the effectiveness of the differential chain code method, we use forty groups of IR and LLL images from the online TNO Image Fusion Dataset [31] and ten groups of images from the INO dataset as test subjects. The TNO dataset includes intensified visual, near-IR and long-wave IR (thermal) night-time imagery of different military-relevant scenarios. In most of these pairs, the IR image shows pronounced pixel intensities and the visible image contains abundant detail. In this paper, we present the results for 7 groups from TNO and 3 groups from INO. Table 1 details the properties of the ten groups of images.
Table 1. Properties of our image dataset
We compare our proposed approach with Harris, SIFT, and SURF. The ten groups of experimental results are shown in Fig. 8.

Fig. 8. Image registration results. For each group, the IR image is on the left and the LLL image is on the right.
The results show that the Harris method failed to match the feature points correctly. The SIFT method extracted more feature points than Harris, but only some images were matched correctly, for example groups 3, 8 and 9 in Fig. 8(b). The SURF method was somewhat more accurate than Harris and SIFT, but it still achieved accurate registration only on individual groups, such as groups 2, 3 and 9 in Fig. 8(c). In contrast, the differential chain code approach, which takes into account the gray-level difference and the direction of the feature points, succeeded in extracting feature points that describe structure and texture information. As a result, the neighborhood difference chain code algorithm achieved high registration accuracy, as shown in Fig. 8(d).
In this section we quantitatively analyze the performance of the differential chain code approach and compare it with Harris, SIFT and SURF using the following four criteria:
The bigger the precision matching (PM), the better:
The smaller the Mismatches ratio (MMR), the better:
The average values of the criteria for fifty groups of images.
The smaller the sum of inverse total number of matching and mismatch ratio (SITMMR) [32], the better:
The bigger the subtraction of inverse total number of matching and matching correctness (SITMMC) [32], the better:
In the above equations, TP is the number of correct matches, FP is the number of unmatched points, NB_FM is the number of false matches, NB_TM is the total number of matches, and NB_CM is the number of correct matches.
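As a rough illustration of how the four criteria relate to the counts defined above, the following Python sketch computes them from one possible reading of their names. The formulas are assumptions inferred only from the criteria names and from the stated "bigger is better" or "smaller is better" directions; they are not taken from the paper's equations or from [32] and may differ from them. The function name `registration_criteria` is hypothetical.

```python
def registration_criteria(tp, fp, nb_fm, nb_tm, nb_cm):
    """Compute PM, MMR, SITMMR and SITMMC from match counts.

    ASSUMPTION: every formula below is inferred from the criterion's name,
    not copied from the paper.
    """
    # Precision matching: share of TP among TP + FP (bigger is better).
    pm = tp / (tp + fp)
    # Mismatch ratio: false matches over all matches (smaller is better).
    mmr = nb_fm / nb_tm
    # Sum of the inverse total number of matches and the mismatch ratio
    # (smaller is better).
    sitmmr = 1.0 / nb_tm + mmr
    # Matching correctness minus the inverse total number of matches
    # (bigger is better).
    sitmmc = nb_cm / nb_tm - 1.0 / nb_tm
    return {"PM": pm, "MMR": mmr, "SITMMR": sitmmr, "SITMMC": sitmmc}
```

Under this reading, the inverse of the total number of matches acts as a penalty on methods that return very few matches, so a method cannot score well by reporting only a handful of correct pairs.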
Figure 9 shows the statistical results of the registration of the 10 groups in Fig. 8. Fig. 9(a) shows that the differential chain code approach achieved significantly higher precision in all groups. In terms of the mismatch ratio, the differential chain code approach generally gives a lower ratio than the other algorithms, as shown in Fig. 9(b). Figure 9(c) shows that the differential chain code approach has a significantly lower sum of the inverse total number of matches and the mismatch ratio than the other algorithms. The effectiveness of the differential chain code approach is further demonstrated in Fig. 9(d): the average difference between the total number of false matches and the correct matches is more than 80% for the differential chain code approach, which is higher than the 20% of the other algorithms. In addition, Table 3 compares the elapsed time of the different methods. All methods were run in MATLAB R2018a on hardware with 8 GB of memory and a 2 GB GPU. Each value is the mean run time of a method on a dataset; our method achieves efficiency comparable to that of the other methods.
Table 3. Run time comparison of the four methods on the datasets (unit: seconds)
Conclusion
In this paper, we have proposed an algorithm for IR and LLL image registration. First, we proposed the binary neighborhood algorithm to extract feature points. We then constructed the feature point descriptors from the 16-neighborhood relative gray values. Finally, we matched the feature points of the two images by Euclidean distance. Tests on the ten groups of images from two public data sets demonstrate that, compared with state-of-the-art methods, the differential chain code method improved not only the registration precision but also the running time. Some improvements remain for future work. One is to optimize the feature extraction algorithm so that the extracted feature points better reflect structure. We should also optimize the descriptor construction and the matching algorithm to improve matching precision.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under Grant 61572392 and in part by the National and Local Funds for New Networks and Measurement and Control Laboratories under Grant GSYSJ2017001.
