Underwater multi-focus image fusion based on sparse matrix

Abstract

Because the focus range of an optical imaging system is limited and different objects in the same scene lie at different locations and focal distances, multiple objects cannot be in focus at the same time. To solve this problem and make underwater images clearer, we propose a fusion method based on the sparse matrix. First, we transform the source images into sparse images by the sparse transform and compute the image clarity from the sparsity. Then, the clarity image is segmented into focus regions. After that, the focus and non-focus regions are fused with different fusion algorithms. Finally, the focus and non-focus regions are combined to obtain the enhanced image. The experiments show that the proposed method yields higher information entropy, correlation entropy, standard deviation, and average gradient than the compared methods in most cases, so it can enhance underwater multi-focus images and can be applied to underwater object detection.

Keywords

Sparse matrix, underwater multi-focus image fusion, region segmentation

1 Introduction

Machine vision has recently been applied to underwater detection [1]. However, because the focus range of an optical system is limited, multiple objects in a scene cannot be in focus at the same time: the focused regions look clear and the other regions look blurred. Therefore, even if the spatial resolution of the optical imaging system is improved, the limited focus range still reduces the effective image resolution. To obtain clearer, more complete, and more accurate images of underwater scenes, we propose a fusion enhancement method based on the sparse matrix.

To obtain the focus region of an image for fusion enhancement, image clarity is an important evaluation criterion [2], and the algorithm chosen to compute it also affects the quality of the fused image. For the same scene, the focus region of an image has a relatively clear outline with sharp edges in the spatial domain, and it carries more information in the high-frequency domain.

In this paper, we propose a multi-focus image fusion method based on sparse representation that takes into account the features of multi-focus images and the underwater optical environment. Because the focus region is clear and the non-focus region is blurred, we use the sparse transform to calculate image clarity and segment the image into regions. We then fuse the multi-focus images using sparse information together with light intensity and contrast information, which addresses the loss of residual information in sparse decomposition.

Fusion enhancement of multi-focus images consists of the following steps: first, a sequence of multi-focus images is obtained by focusing on objects at different locations in the same scene; second, the clear region of each image is identified with a suitable algorithm; finally, the images are fused with the multi-focus fusion method in [3] into one image in which all objects in the scene are in focus.

The remainder of the paper is organized as follows. Section 2 introduces related work, Section 3 presents the framework of our method, and Sections 4 to 6 explain our fusion procedure in detail. Section 7 demonstrates the effectiveness of our method, and Section 8 concludes the paper with a summary of this work.

2 Related work

Over the last several decades, considerable attention has been given to the multi-focus image fusion problem [4–8]. In recent years, a clarity calculation method based on the characteristics of both the spatial and frequency domains was proposed [9]. Most existing approaches fall into two basic classes: spatial domain-based methods and frequency domain-based methods. Spatial domain-based methods include the MSE function, regional contrast functions, and spatial frequency functions. Luo et al. proposed a fusion method that obtains clarity from region similarity, based on the similarity ratio of elements in the eigenvector [10]. Zaveri et al. used region normalization to calculate the average gray value as the region clarity [11]. Methods of this kind reflect image clarity through gray-value differences and therefore suffer from some accumulated error.

Frequency domain-based methods include wavelet transform-based and curvelet transform-based methods, both of which use high-frequency content to reflect image clarity. Wang et al. proposed a multi-focus image fusion method based on the curvelet transform, in which clarity is obtained from the sparsity after the transform [12]. Yang used the multi-scale wavelet transform to fuse multi-focus images [13]. However, these methods have high complexity and neglect spatial information.

3 Framework

In this paper, we consider multiple factors and use region clarity to design our approach. The framework of our method is shown in Fig. 1.

Fig.1

Framework of our method.

To avoid false or erroneous detection of the focus region during the fusion enhancement of underwater multi-focus images, we propose a multi-focus image fusion method based on the sparse transform. In the fusion process, we calculate the region clarity to determine the focus region and design separate fusion rules for the focus and non-focus regions. To account for the characteristics of underwater images, we add image intensity and contrast information to obtain a better fusion result. The dictionary is obtained by training on multiple underwater images.

4 Calculation of image clarity

This section describes in detail how the image clarity is calculated.

4.1 Sparse transform

Sparse representation describes an image concisely as a linear combination of a small number of elements (atoms) from a dictionary. Its encoding mechanism resembles the way the human visual system processes information: dictionary atoms can be regarded as neurons of the visual cortex, which should have similar receptive-field structures in terms of orientation, locality, and frequency band in order to capture the local geometric information of natural images [14]. Gao and Vorobyov proposed a multi-focus image fusion approach based on coupled sparse representation [15].

Consider the linear system $x = Dα$, where $x \in ℝ^{n}$ is a signal, $D \in ℝ^{n \times K}$ is a dictionary learned from training signals, each column of which is called an atom, and $α \in ℝ^{K}$ is the sparse coding vector of x over D. If α has s (s ≪ K) non-zero coefficients, α is called s-sparse. If $span(D) = ℝ^{n}$, the n-dimensional signal space, the dictionary is complete. If K > n, i.e., the number of atoms exceeds the dimension of the signal, the dictionary is redundant. A redundant dictionary with $span(D) = ℝ^{n}$ is called over-complete, and the corresponding model is called an over-complete sparse representation.
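To make the s-sparse model concrete, here is a minimal NumPy sketch that codes a signal over a small over-complete dictionary (n = 4, K = 8) using Orthogonal Matching Pursuit (OMP). OMP and the toy identity-plus-Hadamard dictionary are our own illustrative choices; the paper does not state which sparse solver it uses.

```python
import numpy as np


def omp(D, x, s):
    """Orthogonal Matching Pursuit: greedy s-sparse alpha with x ≈ D @ alpha.

    D: (n, K) dictionary with unit-norm columns (atoms); x: (n,) signal; s >= 1.
    """
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(s):
        # Select the atom most correlated with what is still unexplained.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Re-fit x on all selected atoms by least squares, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha


# Toy over-complete dictionary: n = 4, K = 8 atoms (identity + normalized Hadamard).
H = np.array([[1, 1, 1, 1],
              [1, -1, 1, -1],
              [1, 1, -1, -1],
              [1, -1, -1, 1]]) / 2.0
D = np.hstack([np.eye(4), H])
x = np.array([2.0, 0.0, -0.5, 0.0])      # = 2*d_0 - 0.5*d_2, i.e. 2-sparse over D
alpha = omp(D, x, s=2)
print(np.count_nonzero(alpha), np.allclose(D @ alpha, x))  # 2 True
```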

4.2 Sparsity calculation

The sparsity calculation used in this paper follows the pixel-domain over-complete sparse representation approach to image fusion: the source image is divided into blocks, the pixels of each block are arranged into a column vector, and the vector is sparsely coded to obtain the sparse coefficients of that block.

First, the source image is divided into blocks of size n × n; each block is rearranged into a column vector of length n², which is then sparsely coded, as shown in Fig. 2.
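The block-to-column step can be sketched as below. The stride parameter `step` is a hypothetical knob: the text does not say whether blocks overlap, so the example uses fully overlapping sliding windows (step = 1).

```python
import numpy as np


def image_to_columns(img, n, step=1):
    """Rearrange every n x n block of a 2-D image into a column of length n*n.

    Returns the (n*n, J) matrix of block columns and the list of the J
    top-left block corners, in the same order as the columns.
    """
    H, W = img.shape
    cols, corners = [], []
    for r in range(0, H - n + 1, step):
        for c in range(0, W - n + 1, step):
            cols.append(img[r:r + n, c:c + n].reshape(-1))  # row-major flattening
            corners.append((r, c))
    return np.stack(cols, axis=1), corners


img = np.arange(16, dtype=float).reshape(4, 4)
V, corners = image_to_columns(img, n=2)
print(V.shape)        # (4, 9): 9 overlapping 2 x 2 blocks, each a length-4 column
print(V[:, 0])        # [0. 1. 4. 5.]
```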

Fig.2

Images sparse representation.

The sparse representation of an image block v is

$v = \sum_{t = 1}^{T} s(t)\, d_{t}$ (1)

where $d_{t}$ is the t-th atom of the dictionary $D = [d_{1}, d_{2}, \dots, d_{T}]$ and $s = [s(1), s(2), \dots, s(T)]^{T}$ is the coefficient vector of the block. If the image is divided into J blocks, it can be expressed as

$V = [d_{1}, d_{2}, \dots, d_{T}] \begin{bmatrix} s_{1}(1) & \dots & s_{J}(1) \\ \vdots & \ddots & \vdots \\ s_{1}(T) & \dots & s_{J}(T) \end{bmatrix}$ (2)

and the sparse coefficient matrix of the image is defined as

$S = \begin{bmatrix} s_{1}(1) & \dots & s_{J}(1) \\ \vdots & \ddots & \vdots \\ s_{1}(T) & \dots & s_{J}(T) \end{bmatrix}$ (3)

The source images A and B are transformed into the sparse coefficient matrices $S_{A}$ and $S_{B}$, respectively. The sparsity of each pixel of A is defined as $A_{s} = \sum_{j = 1}^{m} S_{j} / m$, where $S_{j}$ is the value of the j-th sparse sub-matrix (block) that contains the pixel and m is the number of blocks containing it; $B_{s}$ is defined in the same way from $S_{B}$. The normalized sparsities of source images A and B are:

$A_{s}^{'} = A_{s} / (A_{s} + B_{s})$ (4)

$B_{s}^{'} = B_{s} / (A_{s} + B_{s})$ (5)
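A minimal sketch of the per-pixel sparsity and Eqs. (4)–(5), assuming each block's value S_j is the l1-norm of its coefficient column (the paper does not define this scalar) and that the top-left corner of each block is known. The helper names `pixel_sparsity` and `normalize_pair` are hypothetical.

```python
import numpy as np


def pixel_sparsity(S, corners, shape, n):
    """Per-pixel sparsity: average the scores of all n x n blocks covering a pixel.

    S: (T, J) sparse coefficient matrix, one column per block (Eq. (3));
    corners: top-left (row, col) of each block, in column order.
    Assumption: a block's score is the l1-norm of its coefficient column.
    """
    score = np.abs(S).sum(axis=0)
    acc = np.zeros(shape)
    count = np.zeros(shape)
    for j, (r, c) in enumerate(corners):
        acc[r:r + n, c:c + n] += score[j]
        count[r:r + n, c:c + n] += 1     # number of blocks that contain each pixel
    return acc / np.maximum(count, 1)


def normalize_pair(As, Bs, eps=1e-12):
    """Equations (4)-(5); eps avoids 0/0 where both sparsities vanish."""
    total = As + Bs + eps
    return As / total, Bs / total


# 3 x 3 image, 2 x 2 blocks with stride 1 -> 4 blocks.
corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
S_A = np.array([[1.0, 0.0, 2.0, 0.0],
                [0.0, -1.0, 0.0, 3.0]])   # block scores: 1, 1, 2, 3
A_s = pixel_sparsity(S_A, corners, (3, 3), n=2)
B_s = np.ones((3, 3))
A_n, B_n = normalize_pair(A_s, B_s)
print(A_s[1, 1])   # 1.75: the centre pixel lies in all four blocks
```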

4.3 Clarity calculation based on sparse transform

Most traditional clarity measures for multi-focus images are designed for noise-free images. Because of the complexity of the underwater environment, even the focus region of an underwater image is blurred to some extent. It is difficult to locate the focus region with traditional methods, while using the sparse transform alone loses underwater information. Hence, we combine sparse information with grayscale information to locate the focus region.

The source images A and B are normalized to obtain the matrices A′ and B′, which are combined with $A_{s}^{'}$ and $B_{s}^{'}$, respectively, to obtain the clarity image C:

$AA = \exp [(1 - α) \log A' + α \log A_{s}^{'}]$ (6)

$BB = \exp [(1 - α) \log B' + α \log B_{s}^{'}]$ (7)

$C = (AA + BB) / 2$ (8)
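Equations (6)–(8) amount to a per-pixel weighted geometric mean, since exp[(1 − α) log u + α log v] = u^(1−α) v^α. The sketch below assumes min-max normalization for A′ and B′ and a user-chosen α in [0, 1]; neither the normalization nor the value of α is specified in the text, and the small `eps` is added only to keep the logarithm finite.

```python
import numpy as np


def minmax(img):
    """Min-max normalization to [0, 1]; one possible reading of 'normalize'."""
    img = np.asarray(img, dtype=float)
    span = img.max() - img.min()
    return (img - img.min()) / span if span > 0 else np.zeros_like(img)


def clarity_image(A_n, B_n, As_n, Bs_n, alpha=0.5, eps=1e-12):
    """Equations (6)-(8): AA = A_n**(1-alpha) * As_n**alpha (likewise BB),
    computed in log form as in the text; returns C, AA and BB."""
    AA = np.exp((1 - alpha) * np.log(A_n + eps) + alpha * np.log(As_n + eps))
    BB = np.exp((1 - alpha) * np.log(B_n + eps) + alpha * np.log(Bs_n + eps))
    return (AA + BB) / 2, AA, BB


A_n = np.full((2, 2), 0.25)
B_n = np.ones((2, 2))
C, AA, BB = clarity_image(A_n, B_n, As_n=np.ones((2, 2)), Bs_n=np.full((2, 2), 0.25))
print(C[0, 0])
```

With α = 0.5 each term is a geometric mean, so both AA and BB here equal √0.25 = 0.5.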

We now segment C into regions based on image clarity. If the image is fused by selecting the maximum sparse coefficient, blocking artifacts appear and the spatial information of the image is ignored, so the fused image shows ringing effects and distortion. If the image is fused by averaging the sparse coefficients, the contrast of the final image declines and detail information is lost. To solve these problems, we propose a fusion rule based on region sparsity.

First, we use a Pulse Coupled Neural Network (PCNN) [16] to segment the clarity image, and then calculate the average clarity of each region to determine the focus regions.

The iterative update formula of the PCNN is

$ω_{ij}^{(q)}(t + 1) = ω_{ij}^{(q)}(t) + η \frac{\partial E}{\partial ω_{ij}^{(q)}}$ (9)

where η is the learning rate and E is the system error.
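For readers unfamiliar with PCNN segmentation, the sketch below implements a generic simplified PCNN: feeding input equal to the stimulus, a 3 × 3 linking field, an exponentially decaying dynamic threshold, and labels given by each pixel's first firing iteration. It illustrates the segmentation idea only; the parameter values are hand-picked assumptions rather than the automatic settings of [16], and it does not implement the weight update of Eq. (9).

```python
import numpy as np


def spcnn_segment(S, iters=10, alpha_f=1.0, alpha_e=0.3, beta=0.2, V_L=1.0, V_E=20.0):
    """Segment a 2-D stimulus S (e.g. a clarity image in [0, 1]) with a simplified PCNN.

    Returns the iteration at which each pixel first fires (0 = never fired);
    pixels that fire together form one region. Parameters are illustrative.
    """
    K = np.array([[0.5, 1.0, 0.5],
                  [1.0, 0.0, 1.0],
                  [0.5, 1.0, 0.5]])              # linking weights to the 8 neighbours
    H, W = S.shape
    U = np.zeros((H, W))                        # internal activity
    E = np.ones((H, W))                         # dynamic threshold
    Y = np.zeros((H, W))                        # pulse output
    first_fire = np.zeros((H, W), dtype=int)
    for t in range(1, iters + 1):
        Yp = np.pad(Y, 1)                        # zero border for the 3 x 3 neighbourhood
        L = sum(K[a, b] * Yp[a:a + H, b:b + W] for a in range(3) for b in range(3))
        U = np.exp(-alpha_f) * U + S * (1 + beta * V_L * L)   # feeding modulated by linking
        Y = (U > E).astype(float)                # fire where activity exceeds the threshold
        E = np.exp(-alpha_e) * E + V_E * Y       # decay threshold; raise it sharply after firing
        first_fire[(first_fire == 0) & (Y == 1)] = t
    return first_fire


# Two flat regions: bright pixels fire first, dark pixels a few iterations later.
img = np.zeros((6, 6))
img[:, :3] = 0.9
img[:, 3:] = 0.2
labels = spcnn_segment(img)
print(np.unique(labels))
```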

We then calculate the clarity of each region in image C; the two regions with the maximum average clarity correspond to the focus regions of source images A and B, respectively. The region clarity is

$C_{r_{i}} = \sum_{(x, y) \in r_{i}} C(x, y) / A(r_{i})$ (10)

where $C_{r_{i}}$ is the average clarity of region $r_{i}$, $A(r_{i})$ is the area (number of pixels) of $r_{i}$, and $C(x, y)$ is the clarity of pixel (x, y) in $r_{i}$.
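Equation (10) and the focus-region choice can be sketched as follows. We assume the per-region averages are computed separately for each source image (for example on the AA and BB maps of Eqs. (6)–(7)), which is how Table 1 lists them; the function names are hypothetical. The usage lines reuse the averages reported in Table 1.

```python
import numpy as np


def region_clarity(C, labels):
    """Equation (10): sum of C over each region r_i divided by its area A(r_i)."""
    return {int(r): float(C[labels == r].sum() / np.count_nonzero(labels == r))
            for r in np.unique(labels)}


def pick_focus_regions(clar_A, clar_B):
    """Return (focus region of A, focus region of B): the region with the
    highest average clarity in each image."""
    return max(clar_A, key=clar_A.get), max(clar_B, key=clar_B.get)


# Region averages from Table 1.
clar_A = {1: 0.154, 2: 0.612, 3: 0.512, 4: 0.462, 5: 0.464, 6: 0.532, 7: 0.488}
clar_B = {1: 0.846, 2: 0.388, 3: 0.488, 4: 0.538, 5: 0.526, 6: 0.468, 7: 0.512}
print(pick_focus_regions(clar_A, clar_B))  # (2, 1)
```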

5 Image fusion

After the average clarity of each region is calculated, the two regions with the maximum clarity are taken as the focus regions of source images A and B. A multi-focus image can thus be divided into a clear region and a non-clear region, and, taking their different characteristics into account, we apply a different fusion algorithm to each. Image clarity reflects image energy: greater local energy indicates a clear region, and smaller local energy a more blurred one. For the clear region of a source image, we take the sparse coefficients directly and place them in the corresponding region of the fused image.

Common fusion rules include the choose-max rule, the coefficient-average rule, and the weighted-average rule. The choose-max rule can be applied in two ways: simply taking the larger value, or taking the value with the larger absolute value. In some fusion processes, the choose-max rule produces blocking artifacts in the final image, while the low-frequency coefficients are mostly fused with the average rule, which damages image contrast and reduces the quality of the fused image.
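The three common rules are easy to state element-wise. The generic sketch below (not this paper's rule set) applies them to two coefficient vectors.

```python
import numpy as np


def fuse_max(a, b):
    """Choose-max rule: keep the larger signed coefficient."""
    return np.maximum(a, b)


def fuse_abs_max(a, b):
    """Absolute-max rule: keep the coefficient with the larger magnitude, sign included."""
    return np.where(np.abs(a) >= np.abs(b), a, b)


def fuse_weighted(a, b, w_a=0.5):
    """Weighted-average rule; w_a = 0.5 gives the plain coefficient average."""
    return w_a * a + (1 - w_a) * b


a = np.array([3.0, -4.0, 0.5])
b = np.array([1.0, 2.0, -0.5])
for rule in (fuse_max, fuse_abs_max, fuse_weighted):
    print(rule.__name__, rule(a, b))
```

With these inputs, averaging turns the opposite-sign pair (−4, 2) into −1, which illustrates the contrast loss attributed above to average-based rules.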

For the non-focus region, considering the information lost in the sparse transform and the influence of the underwater environment, we use the luminance and contrast of the source images to weight the sparse coefficients. Denoting the sparse coefficients of A and B in this region by $S_{A}$ and $S_{B}$, the sparse coefficients of the unfocused region are

$S = ω_{A} S_{A} + ω_{B} S_{B}$ (11)

where ω_A and ω_B are weighting factors satisfying ω_A + ω_B = 1, given by

$ω_{A} = φ_{1} \left(\frac{C_{A}}{C_{A} + C_{B}}\right) \left(\frac{L_{A}}{L_{A} + L_{B}}\right)^{φ_{1}}$ (12)

$ω_{B} = φ_{2} \left(\frac{C_{B}}{C_{A} + C_{B}}\right) \left(\frac{L_{B}}{L_{A} + L_{B}}\right)^{φ_{2}}$ (13)

and C and L denote the contrast and luminance of the corresponding source image.
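A sketch of Eqs. (11)–(13) for one region. Two points are assumptions: luminance is taken as the mean gray level and contrast as the standard deviation, and the raw weights are rescaled to enforce ω_A + ω_B = 1, since the text does not say how φ1 and φ2 are chosen to satisfy the constraint.

```python
import numpy as np


def luminance_contrast(region):
    """Hypothetical definitions: luminance = mean gray level, contrast = std."""
    return float(region.mean()), float(region.std())


def nonfocus_weights(reg_A, reg_B, phi1=1.0, phi2=1.0):
    """Equations (12)-(13), then rescaled so that w_A + w_B = 1 (an assumption
    about how the constraint is enforced). Needs nonzero contrast/luminance sums."""
    L_A, C_A = luminance_contrast(reg_A)
    L_B, C_B = luminance_contrast(reg_B)
    w_A = phi1 * (C_A / (C_A + C_B)) * (L_A / (L_A + L_B)) ** phi1
    w_B = phi2 * (C_B / (C_A + C_B)) * (L_B / (L_A + L_B)) ** phi2
    return w_A / (w_A + w_B), w_B / (w_A + w_B)


def fuse_nonfocus(S_A, S_B, w_A, w_B):
    """Equation (11): weighted sum of the two regions' sparse coefficients."""
    return w_A * S_A + w_B * S_B


# Region A has contrast, region B is flat: all weight goes to A.
w_A, w_B = nonfocus_weights(np.array([0.0, 2.0]), np.array([1.0, 1.0]))
print(w_A, w_B)  # 1.0 0.0
```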

Finally, we apply the inverse sparse transform to the non-focus region coefficients and combine the result with the focus region to obtain the final fused image.
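The inverse step can be sketched as: multiply the fused coefficients by D to get block columns, place each block back at its position while averaging overlaps, then merge the two reconstructions with a focus mask. The block layout (overlapping 2 × 2 blocks) and the helper names are illustrative assumptions.

```python
import numpy as np


def columns_to_image(V, corners, shape, n):
    """Place each length n*n column of V back as an n x n block at its corner,
    averaging pixels that are covered by several overlapping blocks."""
    acc = np.zeros(shape)
    count = np.zeros(shape)
    for j, (r, c) in enumerate(corners):
        acc[r:r + n, c:c + n] += V[:, j].reshape(n, n)
        count[r:r + n, c:c + n] += 1
    return acc / np.maximum(count, 1)


def combine(focus_img, nonfocus_img, focus_mask):
    """Take pixels inside the focus mask from focus_img and the rest from nonfocus_img."""
    return np.where(focus_mask, focus_img, nonfocus_img)


# Round trip on a 4 x 4 image with overlapping 2 x 2 blocks (stride 1).
img = np.arange(16, dtype=float).reshape(4, 4)
corners = [(r, c) for r in range(3) for c in range(3)]
V = np.stack([img[r:r + 2, c:c + 2].reshape(-1) for r, c in corners], axis=1)
# In the fusion pipeline V would be D @ S_fused; here V holds the raw blocks.
print(np.allclose(columns_to_image(V, corners, img.shape, 2), img))  # True
```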

For the non-focus region, we use an exhaustive search to determine the fusion weights: the optimal weights ω_A and ω_B are those for which the clarity of the fused unfocused region is maximal. The flow chart is shown in Fig. 3.
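A minimal sketch of the exhaustive search, assuming a uniform grid over ω_A with ω_B = 1 − ω_A. The paper does not specify which clarity measure is maximized, so `mean_gradient` (an average-gradient score) is only a hypothetical default, and any callable can be passed as `score`.

```python
import numpy as np


def mean_gradient(F):
    """Stand-in clarity score: mean forward-difference gradient magnitude."""
    di = np.diff(F, axis=0)[:, :-1]
    dj = np.diff(F, axis=1)[:-1, :]
    return float(np.mean(np.sqrt((di ** 2 + dj ** 2) / 2)))


def exhaustive_weight_search(reg_A, reg_B, score=mean_gradient, steps=101):
    """Try w_A on a uniform grid over [0, 1], set w_B = 1 - w_A, and keep the
    pair whose fused region obtains the highest clarity score."""
    best_score, best_w = -np.inf, 0.0
    for w_A in np.linspace(0.0, 1.0, steps):
        s = score(w_A * reg_A + (1.0 - w_A) * reg_B)
        if s > best_score:
            best_score, best_w = s, float(w_A)
    return best_w, 1.0 - best_w


reg_A = np.array([[0.0, 1.0], [0.0, 1.0]])
reg_B = np.zeros((2, 2))
print(exhaustive_weight_search(reg_A, reg_B))  # (1.0, 0.0)
```

Since reconstruction (D·S followed by block averaging) is linear, weighting the reconstructed regions as done here gives the same result as weighting the coefficients in Eq. (11) and then reconstructing. Note also that a score that is convex in ω_A, such as this gradient measure, always peaks at ω_A = 0 or 1, so a non-convex clarity measure is needed if interior weights are expected.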

Fig.3

Weight calculation flowchart.

6 Summary of algorithm

The complete algorithm is as follows.

Step 1. Train the dictionary D on a set of underwater images.

Step 2. Decompose source images A and B with the sparse transform to obtain the sparsities $A_{s}^{'}$ and $B_{s}^{'}$.

Step 3. Combine the sparsity with the normalized grayscale using Equations (6)–(8) to obtain the clarity image.

Step 4. Segment the clarity image, calculate the average clarity of each region, and determine the focus regions of A and B.

Step 5. Fuse the non-focus regions: calculate the overall image clarity and use exhaustive search to obtain the best fusion weights.

Step 6. Combine the focus and non-focus regions to obtain the fused image.

Our algorithm takes multiple factors into account, including the characteristics of underwater images and their intensity and contrast information, and combines region clarity with the sparse matrix to make the fused image clearer. The experiments in Section 7 show that it outperforms the compared methods on most criteria, especially for dim and blurred underwater images.

In Section 7, we verify our algorithm on underwater images, such as sea scenes, collected from the Internet.

7 Experiments

7.1 Sparse transform on underwater image

We select 30 multi-focus images, some of which are shown in Fig. 4, as the dictionary training set. Owing to the nature of multi-focus images, the focus region looks clear and the other parts of each image look rather blurred.

Fig.4

Part of the training image set.

We use the K-SVD algorithm to learn the dictionary D from the 30 images and then apply the sparse transform to these images with D. Figure 5 shows the images after the sparse transform. In the source images in Fig. 5(a) and (b), the focus regions show more detail and richer texture than the other regions. In the sparse images in Fig. 5(c) and (d), the focus region likewise contains more information than the unfocused region. Figure 5(e) and (f) are the clarity images, which add the contrast and luminance information of the source images and are used to segment the image and calculate the focus region.
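K-SVD alternates a sparse-coding stage with an atom-by-atom rank-1 SVD update. The compact sketch below follows that textbook scheme with OMP as the coder; the block size, number of atoms K, sparsity s, and iteration count are illustrative assumptions, since the paper does not report them, and random data stands in for the vectorized training blocks.

```python
import numpy as np


def omp(D, y, s):
    """Orthogonal Matching Pursuit: s-sparse code of y over D (unit-norm atoms, s >= 1)."""
    support, r = [], y.astype(float)
    x = np.zeros(D.shape[1])
    for _ in range(s):
        k = int(np.argmax(np.abs(D.T @ r)))
        if k not in support:
            support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        r = y - D[:, support] @ coef
    x[support] = coef
    return x


def ksvd(Y, K, s, iters=10, seed=0):
    """Learn D (n x K, unit-norm atoms) so that each column of Y is s-sparse over D.

    Y: (n, N) training matrix, one vectorized image block per column, N >= K.
    """
    rng = np.random.default_rng(seed)
    D = Y[:, rng.choice(Y.shape[1], K, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0) + 1e-12        # initialise atoms from random samples
    X = np.zeros((K, Y.shape[1]))
    for _ in range(iters):
        # Sparse-coding stage: code every training column with the current D.
        X = np.stack([omp(D, y, s) for y in Y.T], axis=1)
        # Dictionary-update stage: refit one atom at a time by a rank-1 SVD.
        for k in range(K):
            users = np.nonzero(X[k])[0]             # columns whose code uses atom k
            if users.size == 0:
                continue                            # unused atom: leave it unchanged
            # Residual of those columns with atom k's contribution added back.
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
            U, sv, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]
            X[k, users] = sv[0] * Vt[0]
    return D, X


# Stand-in for vectorized 4 x 4 training blocks (n = 16), 200 samples.
Y = np.random.default_rng(1).standard_normal((16, 200))
D, X = ksvd(Y, K=32, s=3, iters=3)
```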

Fig.5

Sparse images and clarity images.

7.2 Focus region selection based on clarity

The focus regions are selected on the basis of the sparse transform: the clarity image, obtained by fusing the sparse image with the normalized source image, is segmented into regions, and the clarity of each region is calculated.

For one pair of source images, the region partition is shown in Fig. 6 and the clarity of each region is listed in Table 1. According to Table 1, the focus regions are region 1 (highest clarity in image B) and region 2 (highest clarity in image A).

Fig.6

Segmentation of the clarity image.

Table 1

Clarity of each region

Region	Average clarity of image A	Average clarity of image B
Region 1	0.154	0.846
Region 2	0.612	0.388
Region 3	0.512	0.488
Region 4	0.462	0.538
Region 5	0.464	0.526
Region 6	0.532	0.468
Region 7	0.488	0.512

7.3 Fusion process

7.3.1 Evaluation of fusion image

We use information entropy, correlation entropy, standard deviation, and average gradient as the criteria for evaluating the quality of the fused image.

Information entropy is a statistical characteristic of an image that reflects the amount of information it contains. Let the gray-level distribution of the image be P = {P_0, P_1, …, P_i, …, P_{L−1}}, where P_i is the probability of gray level i.

With L the total number of gray levels, the information entropy H is

$H = - \sum_{i = 0}^{L - 1} P_{i} \log_{2} (P_{i})$ (14)

The larger the information entropy, the more information the fused image contains and the better the fusion result, and vice versa.
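Equation (14) computed from a gray-level histogram, as a minimal sketch assuming integer gray levels in [0, L − 1] with L = 256:

```python
import numpy as np


def information_entropy(img, levels=256):
    """Equation (14): H = -sum_i P_i log2 P_i over the gray-level histogram.
    Assumes integer gray levels in [0, levels - 1]."""
    hist = np.bincount(np.asarray(img, dtype=np.int64).ravel(), minlength=levels)
    p = hist / hist.sum()
    p = p[p > 0]                          # terms with P_i = 0 contribute nothing
    return float(-(p * np.log2(p)).sum())


print(information_entropy(np.array([[0, 255], [0, 255]])))  # 1.0
```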

Correlation entropy (mutual information) is a basic concept of information theory that measures the correlation between two variables, or the amount of information one variable contains about another. For a source image R and a fused image F it is computed as

$MI = \sum_{i = 0}^{L - 1} \sum_{j = 0}^{L - 1} P_{RF}(i, j) \log_{2} \frac{P_{RF}(i, j)}{P_{R}(i) P_{F}(j)}$ (15)

where $P_{R}$ and $P_{F}$ are the gray-level distributions of the source image and the fused image, respectively, and $P_{RF}$ is their joint gray-level distribution. The larger the mutual information, the higher the correlation between the fused image and the source image, and hence the lower the distortion.
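A sketch of Eq. (15) from a joint gray-level histogram, with rows indexed by the source image R and columns by the fused image F; 8-bit integer gray levels are assumed.

```python
import numpy as np


def mutual_information(R, F, levels=256):
    """Equation (15): mutual information (bits) between source R and fused F."""
    R = np.asarray(R, dtype=np.int64).ravel()
    F = np.asarray(F, dtype=np.int64).ravel()
    joint = np.zeros((levels, levels))
    np.add.at(joint, (R, F), 1)                 # joint histogram of gray-level pairs
    P_RF = joint / joint.sum()
    P_R = P_RF.sum(axis=1)                      # marginal distribution of R
    P_F = P_RF.sum(axis=0)                      # marginal distribution of F
    nz = P_RF > 0                               # 0 * log 0 terms are taken as 0
    ratio = P_RF[nz] / np.outer(P_R, P_F)[nz]
    return float((P_RF[nz] * np.log2(ratio)).sum())


img = np.array([[0, 255], [0, 255]])
print(mutual_information(img, img))  # 1.0
```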

Standard deviation reflects the contrast between target and background and is defined as

$D = \sqrt{\frac{1}{M N} \sum_{i = 1}^{M} \sum_{j = 1}^{N} (F(i, j) - μ)^{2}}$ (16)

where F(i, j) is the gray value at row i and column j, M × N is the image size, and μ is the mean gray value of the image. The larger the standard deviation, the more dispersed the gray values and the better the visual effect.
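Equation (16) is the population standard deviation (the same value NumPy's `np.std` returns with its default `ddof=0`); written out explicitly:

```python
import numpy as np


def standard_deviation(F):
    """Equation (16): sqrt of the mean squared deviation from the mean gray value mu."""
    F = np.asarray(F, dtype=float)
    mu = F.mean()
    return float(np.sqrt(((F - mu) ** 2).sum() / F.size))   # F.size = M * N


print(standard_deviation([[0, 2], [0, 2]]))  # 1.0
```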

The average gradient reflects the contrast of fine details and the texture features of the image. It is defined as follows.

$G = \frac{1}{(M - 1)(N - 1)} \sum_{i = 1}^{M - 1} \sum_{j = 1}^{N - 1} \sqrt{\frac{(\partial F(i, j) / \partial i)^{2} + (\partial F(i, j) / \partial j)^{2}}{2}}$ (17)

The greater the average gradient of the image, the clearer the image is.
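Equation (17) does not say how the partial derivatives are discretized; the sketch below assumes forward differences, which yields exactly (M − 1) × (N − 1) gradient samples to match the normalization.

```python
import numpy as np


def average_gradient(F):
    """Equation (17) with forward differences for dF/di and dF/dj."""
    F = np.asarray(F, dtype=float)
    di = F[1:, :-1] - F[:-1, :-1]     # difference along rows (i direction)
    dj = F[:-1, 1:] - F[:-1, :-1]     # difference along columns (j direction)
    return float(np.mean(np.sqrt((di ** 2 + dj ** 2) / 2)))


print(average_gradient(np.array([[0, 1], [0, 1]])))  # sqrt(0.5), about 0.7071
```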

7.3.2 Fusion

The focus region is fused with the maximum-value rule and the non-focus region with Equation (11). Figure 7 shows the source images and the fused results. Fig. 7(a), (b), and (c) are three different source images; Fig. 7(d), (e), and (f) are the multi-focus fusion results for Fig. 7(a) obtained, from top to bottom, with the proposed method, the Laplacian method, and the wavelet transform method. Figure 7(g), (h), and (i) are the corresponding results for Fig. 7(b) with the three methods, and Fig. 7(j), (k), and (l) are those for Fig. 7(c). As Fig. 7 shows, the Laplacian-based fusion locates the focus region accurately but produces poor edges, while the wavelet-based fusion is more blurred at the edge of the focus region. Compared with these two methods, the proposed method produces a clearer focus region with sharp region edges. The corresponding information entropy, correlation entropy, standard deviation, and average gradient are listed in Table 2.

Fig.7

Comparison of fused images obtained by different methods.

Table 2

Comparison of the different fusion methods

Image	Method	Information Entropy	Correlation Entropy	Standard Deviation	Average Gradient
1	Proposed	12.23	4.28	30.12	5.16
	Laplacian	11.12	4.15	25.84	5.01
	Wavelet	12.22	4.29	29.14	5.21
2	Proposed	10.46	4.15	35.55	5.15
	Laplacian	8.21	4.22	35.21	5.11
	Wavelet	10.43	4.11	35.41	5.16
3	Proposed	8.65	5.38	46.65	4.85
	Laplacian	6.41	3.26	32.14	2.12
	Wavelet	6.82	5.12	35.03	3.81

For image 1, taken in a good underwater environment, Table 2 shows that the proposed method is clearly better than the Laplacian method and comparable to the wavelet method, which scores slightly higher in correlation entropy and average gradient. For images 2 and 3, with dim underwater light and blurred content, the proposed method achieves the highest information entropy and standard deviation, and for image 3 it is best on all four criteria by a wide margin. Hence, the proposed method performs well on multi-focus underwater image fusion.

8 Conclusion

For underwater multi-focus image fusion, this paper presents a fusion method based on the sparse matrix. First, the source images are transformed into sparse images by the sparse transform, and image clarity is computed from the sparsity. Then, the clarity image is segmented to obtain the focus regions, and the focus and non-focus regions are fused with different fusion algorithms. Finally, the focus and non-focus regions are combined to obtain the enhanced fused image. The experiments show that our algorithm achieves higher information entropy, correlation entropy, standard deviation, and average gradient than the compared methods in most cases, so it is well suited to underwater multi-focus image fusion.

In future work, we will focus on reducing the time cost and improving performance with newer methods such as coupled dictionary learning.


Acknowledgments

This work is partially supported by the Fundamental Research Funds for the Central Universities, China NSF grant (61671201), a project funded by PAPD, and National Key Technology Research and Development Programs (2015BAB07B03, 2015BAB07B01).

References

1. Zodiatis, Lardner, Solovyov, and Panayidou, Predictions for oil slicks detected from satellite images using MyOcean forecasting data, Ocean Science Discussions 9 (3) (2012), 1105–1115.

2. Blum R.S., Multi-sensor image fusion and its applications, CRC Press, 2005.

3. Zhou, Yan, Huang, and Zhang, Research on evaluation method of light field imaging resolution digital focus, Acta Photonica Sinica 39 (6) (2010), 1094–1098.

4. Fan, Research and application of multi-focus image fusion algorithm based on multi-resolution analysis, North University of China, 2014.

5. Wan, Zhu, and Qin, Multifocus image fusion based on robust principal component analysis, Pattern Recognit Lett 34 (9) (2013), 1001–1008.

6. Liu, Liu, and Wang, Multi-focus image fusion with dense SIFT, Inf Fusion 23 (2015), 139–155.

7. Pertuz, Puig, Garcia M.A., and Fusiello, Generation of all-in-focus images by noise-robust selective fusion of limited depth-of-field images, IEEE Trans Image Process 22 (3) (2013), 1242–1251.

8. Tian and Chen, Multi-focus image fusion using wavelet-domain statistics, in Proc. IEEE Int. Conf. Image Process, Hong Kong, 2010, pp. 1205–1208.

9. Tian, Chen, Ma, and Yu, Multi-focus image fusion using a bilateral gradient-based sharpness criterion, Opt Commun 284 (1) (2011), 80–87.

10. Luo and Wu, An evaluation method of image fusion based on region similarity, Acta Electronica Sinica 38 (5) (2010), 1152–1155.

11. Zaveri, Zaveri, Shah, and Patel, A novel region based multifocus image fusion method, International Conference on Digital Image Processing, Bangkok, Thailand, March 7–9, 2009.

12. Wang, Qi, Han, and Liu, Multifocus image fusion based on nonsubsampled contourlet transform, 7th International Forum on Strategic Technology, Tomsk, Russia, Sep. 12–19, 2012.

13. Yang, Application of multi-wavelet transform in multi-focus image fusion, International Workshop on Education Technology and Computer Science, IEEE Press, Piscataway, NJ, USA, 2009.

14. Wang, Image sparse representation and its application in compressed sensing based on learning dictionaries, Yanshan University, 2013.

15. Gao and Vorobyov S.A., Multi-focus image fusion via coupled sparse representation and dictionary learning, arXiv, 2017.

16. Chen, Park S.K., Ma, and Ala, A new automatic parameter setting method of a simplified PCNN for image segmentation, IEEE Trans Neural Networks 22 (6) (2011), 880–892.