Abstract
Night video fusion algorithms integrate the visual information captured by a security surveillance camera to improve visual perception. Recent night fusion research has focused on fusing illuminated and non-illuminated areas simultaneously; however, the natural color of the lit area may be lost. Moreover, the contrast of the illuminated regions decreases because of the dark pixels surrounding them. Hence, the color and contrast should be improved to recover the actual color of the illuminated regions. We propose a fuzzy inference system based wavelet fusion to enhance the lit regions of a non-uniformly illuminated night surveillance video. To capture the spatial and temporal variations of the illuminated regions, a spatio-temporal illumination approach is used. A contribution index of the illuminated regions is generated using a fuzzy membership function. Subsequently, stationary wavelets are used to decompose both the night frame and the day background frame into high-frequency and low-frequency coefficients for frame fusion. The contribution index selects the illuminated regions present in these wavelet coefficients for fusion. Finally, the inverse wavelet transform is applied to reconstruct the illumination-enhanced frame. The proposed approach effectively highlights the illuminated regions and provides better visual perception.
Introduction
Nighttime surveillance video is analyzed as visual forensic evidence in many security-critical areas such as banks, parking areas, and shopping gallerias. High-quality visual perception is essential in these situations: even a small variation in visual information can misdirect a criminal investigation.
Night vision systems such as infrared and thermal sensors are used to monitor nighttime scenes in various static and dynamic applications, from smart home surveillance to military operations. However, detecting object movement, identifying object color, and recognizing texture from these night vision videos is less reliable because of their in-camera effects. A low-cost digital camera can be effective when nighttime enhancement is applied to improve the visual quality of the captured scene.
Fusion approaches integrate visual information from more than one frame, where the frames may be captured by different sensors or under different lighting conditions. In [8], the visual content of an infrared image and a visible image is fused to improve the visual appearance of the result; the authors applied a fuzzy inference system to fuse the wavelet coefficients of the two images. However, the images in multi-sensor approaches must be captured simultaneously by their corresponding sensors. Moreover, the color and texture of the multi-sensor fused frame still need improvement to obtain better visual perception, which is critical in vision-based investigation. Hence, fusion of visible-light frames is better suited to nighttime images and videos.
The nighttime video frame and the day background frame of a static surveillance video are fused in [4], where the illuminated and moving object regions are separated for fusion. The dark pixels present in the non-illuminated surroundings deteriorate the contrast of the illuminated regions in the night frame. The quality of the fusion process depends on the selection of the illuminated region. Hence, a spatio-temporal approach for finding the illumination contribution is effective in the fusion algorithm. We propose a fuzzy membership function based fusion approach to fuse the highly illuminated pixels as well as the nearly illuminated pixels.
The proposed approach highlights the illuminated region of a nighttime surveillance video and preserves the color of the nighttime scene. Based on the lit area of the night frame, a contribution index is generated by a fuzzy inference system. Subsequently, the night and day frames are decomposed into wavelet coefficients by stationary wavelets. The high-frequency and low-frequency coefficients are fused based on the contribution index of the illuminated and non-illuminated regions. The fused video enlightens the illuminated region of the night surveillance video.
The rest of the paper is organized as follows. Section 2 describes the related works and Section 3 details the proposed approach. Section 4 presents the experimental results and discussion, and Section 5 concludes the proposed work.
Related works
Video fusion algorithms blend the scene pixels present in each input frame. Image fusion using fuzzy logic was proposed by Singh et al. [10], where the input images are integrated using fuzzy and neuro-fuzzy approaches. Zhu and Yang [15] proposed an image fusion algorithm based on fuzzy logic and wavelet filters. They discussed a pixel-level image fusion algorithm applied to visible and infrared images, in which the significance of each wavelet coefficient is obtained by fuzzy reasoning. However, this approach is specifically designed for visible and infrared images of the same scene.
A pixel and region based approach for fusing visible and infrared images is proposed by Saeedi and Faez [8]. This approach applied a fuzzy rule based system for fusing high-frequency coefficients and population-based optimization for low-frequency coefficients. However, the inputs are taken from two different sensors, which must capture the same scene at the same time. The approach exploits the quality of both the infrared image and the visible image, but it is less suitable for real-time applications. A fully automated context enhancement is discussed in the method of Ulhaq et al. [13], where multiple video sequences captured by infrared and low-light visible sensors are used for night vision enhancement. The self-organized approach [12] automatically enhanced real-time night surveillance video, while the color based approach [11] depends on the quality of the nighttime fusion technique. Ding et al. [3] proposed a sparse code fusion approach for reducing the effect of over-enhanced moving objects and night shadow. A mutual coherence algorithm was used in this approach to develop a nighttime dictionary and a daytime dictionary.
Li et al. [4] proposed a fusion based approach where the shift-invariant discrete wavelet transform is used to fuse the nighttime video. The algorithm initially segments the light regions from the night video frame. The segmented regions of illumination and motion in the night video frame are used for the final fusion decision. However, the light region selected in the method of Li et al. [4] is based on the assumption that there exists only one light source. In a night video fusion algorithm, the threshold computation for the illuminated region depends on the variations of the light sources and illuminated regions, and a small variation in the illumination segmentation affects the fusion performance of these approaches. It is therefore necessary to separate the illuminated region of the nighttime frame automatically and accurately. Hence, we propose a fuzzy fusion approach to enlighten the illuminated region.
In our approach, we consider the illuminated region of the current frame and of a set of n consecutive frames to prepare the temporal illuminated region. Frame averaging is used for computing the temporal illumination, while the current frame is selected for the spatial illumination. A fuzzy inference system is applied to select the illuminated region of the night video frame. The histogram counts of the respective temporal and spatial frame pixels determine the thresholds of the fuzzy variables in the fuzzy inference system. Fuzzy logic, a precise logic of imprecision and approximate reasoning, effectively accounts for the neighborhood variations of the illuminated region. Subsequently, the contribution index derived from the fuzzy system decides which wavelet coefficients to fuse. The fused night surveillance videos preserve the color of the night illuminated region and improve the visual perception of the nighttime video.
Proposed approach
The proposed approach enlightens the illuminated regions of a night surveillance video, where a fuzzy approach is used to generate a contribution index corresponding to the spatio-temporal illuminated area. Then, the illuminated and non-illuminated areas in the wavelet coefficients are fused based on this contribution index. Our approach is aimed exclusively at enlightening non-uniformly illuminated videos captured by a static surveillance camera. This work is based on the assumption that all the important information content of the night surveillance frame is located in the illuminated region. Figure 1 shows the flow diagram of the proposed approach. In Fig. 1, the contribution index ξ is the output of the fuzzy inference system, whose inputs are the spatial and temporal frame components Sl and Tl. These frame components are obtained from the current frame Nf and the temporal frame Tf. The illumination enlightened frame Ef is obtained by fusing the day background frame Db and the night frame Nf.
The proposed approach comprises two parts: fuzzy based contribution index computation and wavelet based fusion. The following subsections describe these parts in detail.
Fuzzy based contribution index
The contribution index of the illuminated area is computed from the spatial and temporal frames of the nighttime video, because the illumination fluctuations of the preceding frames help to improve the identification of illuminated pixels. Hence, we selected both a spatial frame and a temporal frame of the nighttime video.
The temporal frame is obtained by averaging n frames of the nighttime video. Let n be the number of frames and Tf the average of these n frames. The current frame Sf is then selected to represent the illumination of the spatial pixels. Subsequently, both frames are converted into the HSV color space to separate the value component V. Let Tl and Sl be the V components of the temporal and spatial frames, respectively.
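The frame preparation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function names `temporal_frame` and `v_component` are hypothetical, the frames are assumed to be RGB arrays with values in [0, 1], and the current frame is assumed to be the last frame of the averaging window. The V channel of HSV is the per-pixel maximum of the R, G and B values, so no full color conversion is needed here.

```python
import numpy as np

def temporal_frame(frames):
    """Average n RGB frames (each H x W x 3) into the temporal frame Tf."""
    return np.mean(np.stack(frames, axis=0), axis=0)

def v_component(frame_rgb):
    """HSV value channel: the per-pixel maximum of R, G and B."""
    return frame_rgb.max(axis=-1)

# Toy data standing in for n = 5 consecutive night frames (assumption).
rng = np.random.default_rng(0)
frames = [rng.random((4, 6, 3)) for _ in range(5)]

Tf = temporal_frame(frames)   # temporal frame
Sf = frames[-1]               # current (spatial) frame, assumed to be the latest
Tl = v_component(Tf)          # temporal illumination
Sl = v_component(Sf)          # spatial illumination
```

Both `Tl` and `Sl` are single-channel arrays with the same height and width as the input frames, which is the form the fuzzy inference stage expects.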
In this approach, we used the Mamdani fuzzy model [5] because it offers an intuitive way of generating the contribution index from Tl and Sl. Each input of the fuzzy system, Tl and Sl, has two fuzzy linguistic variables, LowLight and HighLight. The fuzzy membership functions are denoted μLowLight(Tl), μHighLight(Tl), μLowLight(Sl), and μHighLight(Sl). The values of the fuzzy variables vary between two thresholds a and b; hence, a trapezoidal membership function is used in this fuzzy inference system. Figures 2 and 3 show the input membership functions, and Fig. 4 shows the output membership function.
The threshold values a and b are obtained by analyzing the histograms of Tl and Sl. The majority of the nighttime frame pixels accumulate in the dark region of the histogram, and the rest in the light region. Hence, the low-illuminated pixels have small Tl and Sl values. From this observation, we computed the values of a and b using the following equation:
Algorithm 1
Notations used: NfA, NfD, DfA, DfD, EfA, EfD, a, b, ξ, Tl, Sl, x, T, t, h(x)
Input: Night frame Nf, day background frame Db.
1: Tf ← TempAverage(nframes)
2: Sl ← Vcomponent(Nf)
3: Tl ← Vcomponent(Tf)
4: …
4: T ← x
5: …
6: a ← min(T)
7: b ← max(T)
8: …
9: …
10: ξ ← evaluateFuzzy(Sf, Tf)
11: NfA, NfD ← swt(Nf)
12: DfA, DfD ← swt(Df)
13: …
14: EfA ← GetMax(NfA, DfA)
15: …
15: EfA ← DfA
16: …
17: EfD ← GetMax(NfD, DfD)
18: …
19: …
20: Ef ← iswt(EfA, EfD)
21: …
The output of the fuzzy system is the contribution index ξ, which decides the exact contribution of the illuminated pixels required for fusion. The dependence of ξ on Tl and Sl is expressed by the following rules:
If Sl is LowLight and Tl is LowLight, then ξ is Nonill.
If Sl is HighLight and Tl is HighLight, then ξ is Ill.
If Sl is LowLight and Tl is HighLight, then ξ is Nonill.
If Sl is HighLight and Tl is LowLight, then ξ is Nonill.
The membership functions for LowLight and HighLight of Tl are described as follows:
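One plausible form, given only that the functions are trapezoidal and that the fuzzy variables vary between the thresholds a and b, is a pair of complementary shoulder-shaped functions. The flat regions outside [a, b] and the linear transition between the two thresholds are assumptions made for illustration, not a transcription of the original equations:

```latex
\mu_{\mathrm{LowLight}}(T_l) =
\begin{cases}
1, & T_l \le a \\[4pt]
\dfrac{b - T_l}{b - a}, & a < T_l < b \\[4pt]
0, & T_l \ge b
\end{cases}
\qquad
\mu_{\mathrm{HighLight}}(T_l) =
\begin{cases}
0, & T_l \le a \\[4pt]
\dfrac{T_l - a}{b - a}, & a < T_l < b \\[4pt]
1, & T_l \ge b
\end{cases}
```

Under this assumed form the two memberships sum to one for every value of Tl, so each pixel is shared between LowLight and HighLight only inside the transition band.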
Similarly, the fuzzy membership functions μLowLight(Sl) and μHighLight(Sl) are computed. Subsequently, the centroid of the aggregated output membership area is computed for defuzzification. The resulting contribution index is then used to fuse the appropriate wavelet coefficients.
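The four rules and the centroid defuzzification can be illustrated for a single pixel with a short pure-Python sketch of Mamdani min-max inference. Several details are assumptions rather than facts from this work: the input memberships are taken as linear ramps between a and b, the output sets Ill and Nonill are taken as ramps on [0, 1] with Ill at the low end (chosen so that a small ξ marks an illuminated pixel, in line with the ξ < t test used in the fusion stage), and the names `low_light`, `high_light` and `contribution_index` are hypothetical.

```python
def low_light(v, a, b):
    """Assumed LowLight membership: 1 below a, 0 above b, linear in between."""
    return min(1.0, max(0.0, (b - v) / (b - a)))

def high_light(v, a, b):
    """Assumed HighLight membership: 0 below a, 1 above b, linear in between."""
    return min(1.0, max(0.0, (v - a) / (b - a)))

def contribution_index(sl, tl, a, b, samples=501):
    """Mamdani inference for one pixel, defuzzified by the centroid method."""
    s_lo, s_hi = low_light(sl, a, b), high_light(sl, a, b)
    t_lo, t_hi = low_light(tl, a, b), high_light(tl, a, b)

    # Rule firing strengths: AND is min, rules with the same output are OR-ed by max.
    ill = min(s_hi, t_hi)
    nonill = max(min(s_lo, t_lo), min(s_lo, t_hi), min(s_hi, t_lo))

    # Clip each assumed output set at its firing strength, aggregate with max,
    # and take the centroid over a sampled output universe [0, 1].
    num = den = 0.0
    for k in range(samples):
        x = k / (samples - 1)
        mu = max(min(1.0 - x, ill), min(x, nonill))  # Ill ramp: 1 - x, Nonill ramp: x
        num += x * mu
        den += mu
    return num / den if den > 0 else 0.5

xi_bright = contribution_index(0.9, 0.9, a=0.2, b=0.6)  # lit in both frames
xi_dark = contribution_index(0.1, 0.1, a=0.2, b=0.6)    # dark in both frames
```

With these assumed output sets, a pixel that is bright in both the spatial and the temporal frame receives an index of about 1/3, while a pixel that is dark in either frame receives about 2/3.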
Wavelet based fusion
The stationary wavelet transform (SWT) [1, 2] is used for frame fusion, where the input frames are the current night frame and the day background frame. The SWT is effective for real-time frame fusion because of its translation invariance property. Unlike the discrete wavelet transform, the SWT achieves translation invariance by omitting the downsampling step.
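The effect of omitting the downsampling step can be seen on a small one-dimensional signal. The sketch below is only an illustration under stated assumptions: it uses a one-level Haar filter pair with periodic extension, and the helper names `haar_swt1` and `haar_dwt1` are hypothetical. Shifting the input by one sample shifts the undecimated coefficients by one sample, whereas the decimated coefficients change in value.

```python
import numpy as np

def haar_swt1(x):
    """One-level undecimated Haar transform with periodic extension."""
    nxt = np.roll(x, -1)
    return (x + nxt) / np.sqrt(2), (x - nxt) / np.sqrt(2)

def haar_dwt1(x):
    """Same filters followed by downsampling: keep every second coefficient."""
    approx, detail = haar_swt1(x)
    return approx[::2], detail[::2]

x = np.array([0.0, 0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 0.0])
shifted = np.roll(x, 1)  # the same signal delayed by one sample

# Undecimated detail coefficients simply move with the signal.
swt_d = haar_swt1(x)[1]
swt_d_shifted = haar_swt1(shifted)[1]
swt_is_shift_invariant = np.allclose(np.roll(swt_d, 1), swt_d_shifted)

# Decimated detail coefficients are not a shifted copy of each other.
dwt_d = haar_dwt1(x)[1]
dwt_d_shifted = haar_dwt1(shifted)[1]
dwt_is_shift_invariant = any(
    np.allclose(np.roll(dwt_d, k), dwt_d_shifted) for k in range(dwt_d.size)
)
```

Here `swt_is_shift_invariant` evaluates to `True` and `dwt_is_shift_invariant` to `False`, which is why small misalignments between frames are less likely to produce artifacts when fusing undecimated coefficients.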
The input frames are decomposed into their corresponding SWT representations. Specifically, the day background frame and the night video frame are each split into approximation coefficients and detail coefficients. The nighttime frame coefficients are NfA and NfD, and the day background coefficients are DfA and DfD. These wavelet coefficients are obtained by convolving the input frames with the low-pass and high-pass decomposition filters.
A composite SWT representation is obtained by pixel-level selection of the appropriate wavelet coefficients based on the fuzzy contribution index ξ. The illuminated pixels of the nighttime frame coefficients are selected when the value of ξ is less than t, where t is a threshold whose value ranges between 0 and 0.4. The non-illuminated areas are enhanced by the day background coefficients based on the value of ξ. The fused coefficients of the enlightened night video frame are EfA and EfD. The enlightened frame Ef is reconstructed by the inverse SWT, which convolves the fused SWT coefficients with the reconstruction filters.
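The selection and reconstruction steps can be sketched as follows for a single-channel frame. This is an illustration under several assumptions, not the implementation evaluated in this work: a one-level undecimated Haar transform with periodic extension stands in for the SWT, `get_max` is read as keeping the coefficient with the larger magnitude, the detail coefficients are fused with `get_max` at every pixel, and all function names are hypothetical. The test ξ < t for illuminated pixels follows the text.

```python
import numpy as np

S = np.sqrt(2.0)

def _analysis(x, axis):
    """Undecimated Haar low-pass and high-pass filtering along one axis."""
    nxt = np.roll(x, -1, axis=axis)
    return (x + nxt) / S, (x - nxt) / S

def _synthesis(a, d, axis):
    """Inverse step: average the two redundant reconstructions."""
    return 0.5 * ((a + d) / S + np.roll((a - d) / S, 1, axis=axis))

def swt2_haar(img):
    """One-level 2-D transform: approximation plus three detail bands."""
    lo, hi = _analysis(img, axis=1)
    ll, lh = _analysis(lo, axis=0)
    hl, hh = _analysis(hi, axis=0)
    return ll, (lh, hl, hh)

def iswt2_haar(ll, details):
    lh, hl, hh = details
    lo = _synthesis(ll, lh, axis=0)
    hi = _synthesis(hl, hh, axis=0)
    return _synthesis(lo, hi, axis=1)

def get_max(c1, c2):
    """Per pixel, keep the coefficient with the larger magnitude (assumed rule)."""
    return np.where(np.abs(c1) >= np.abs(c2), c1, c2)

def fuse(night, day, xi, t=0.4):
    """Fuse a night frame and a day background frame using the index map xi."""
    illuminated = xi < t                      # selection test stated in the text
    n_a, n_d = swt2_haar(night)
    d_a, d_d = swt2_haar(day)
    e_a = np.where(illuminated, get_max(n_a, d_a), d_a)
    e_d = tuple(get_max(n, d) for n, d in zip(n_d, d_d))
    return iswt2_haar(e_a, e_d)
```

A color frame could be handled by applying `fuse` to each channel with the same index map, although whether the fusion is performed per channel is not specified here.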
Algorithm 1 describes the detailed steps of the proposed approach. The inputs are the frames Nf and Db, and the enlightened frame Ef is returned.
Experimental results and discussion
The proposed algorithm was implemented using MATLAB 7. The algorithm was tested on three video data sets (night.mpeg, i695.mpeg, and i495.mpeg), each of which includes a day background frame and the corresponding night video. The resolution of night.mpeg is 320 × 240, and that of i695.mpeg and i495.mpeg is 480 × 350. Figure 5 shows the enlightened night frame results for the videos night.mpeg and i695.mpeg. The frame results show that the visual quality of the illuminated regions is improved.
The proposed approach was compared with the average frame based fusion approach and the wavelet fusion based approach [4]. The visual comparison of the proposed approach with these methods is shown in Fig. 6, using the night.mpeg test video. Compared to the existing fusion results, our approach highlights the nighttime illuminated area.
Generally, pixel-based frame fusion techniques may lose some useful information and introduce some artifacts. Hence, we used two metrics that are relevant for measuring the performance of the fusion result: the Petrović index and entropy [10, 14]. The Petrović index measures the amount of edge information transferred from the input frames to the final fused frame. A high Petrović index indicates a better fused result. Table 1 shows the Petrović index obtained from the three video sets, where the proposed approach shows a better index than the average fusion approach and the wavelet based fusion approach. The entropy is a measure of the amount of information present in the fused frame. A high entropy value indicates better quality of the fused result. Table 2 shows the entropy results for sample frames from the three video sets. In most cases, the entropy of the proposed approach is higher than that of the existing fusion based approaches.
The standard deviation (SD) is a measure used for computing the contrast of the fused frame. Table 3 compares the SD of the proposed method with that of the existing fusion methods. The high SD value of the proposed approach indicates better contrast. These objective quality measures indicate that the proposed approach increases the contrast of the fused frame and highlights the illuminated area.
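Two of the measures used above, entropy and SD, are simple to compute from a grayscale version of the fused frame. The sketch below assumes an 8-bit grayscale input and uses the Shannon entropy of the intensity histogram; the function names `entropy_bits` and `contrast_sd` are hypothetical, and the exact definitions used for Tables 2 and 3 may differ in detail.

```python
import numpy as np

def entropy_bits(gray_u8):
    """Shannon entropy (bits per pixel) of an 8-bit grayscale frame."""
    counts = np.bincount(gray_u8.ravel(), minlength=256)
    p = counts[counts > 0] / gray_u8.size   # skip empty bins to avoid log2(0)
    return float(-(p * np.log2(p)).sum())

def contrast_sd(gray_u8):
    """Standard deviation of the pixel intensities, used as a contrast measure."""
    return float(gray_u8.astype(np.float64).std())

# A flat frame carries no information and has no contrast; a frame split
# evenly between black and white has 1 bit of entropy per pixel.
flat = np.full((4, 4), 7, dtype=np.uint8)
half = np.array([[0, 255], [0, 255]], dtype=np.uint8)
```

For `flat` both measures are zero, while `half` gives an entropy of 1 bit and an SD of 127.5, which matches the intuition that higher values correspond to more information and more contrast.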
Conclusion
We presented a fusion algorithm whose inputs are a night video frame and its corresponding day background frame. The illuminated regions of the night video frame are preserved in the fused frame to improve the visual perception of the nighttime frame. We derived a contribution index that effectively selects the pixels of the illuminated region during the wavelet fusion. Moreover, the contrast of the illuminated region is increased by fusing the pixels of the surrounding scene. The experimental results demonstrate that the proposed fusion based approach increases the information content and preserves the edge details of the night frame.
In the future, we plan to incorporate the motion pixels in addition to the illuminated pixels to improve the visual content of the nighttime surveillance video.
Acknowledgments
We would like to thank the Centre for Engineering Research and Development, Kerala, and the College of Engineering Trivandrum for providing facilities, and Tao Yang and Yunbo Rao for sharing databases.
