A fuzzy fusion approach to enlighten the illuminated regions of night surveillance videos

Abstract

Night video fusion algorithms integrate the visuals captured by a security surveillance camera to improve visual perception. Recent night fusion research has focused on fusing illuminated and non-illuminated areas simultaneously; however, the natural color of the lit area may be lost. Moreover, the contrast of the illuminated regions decreases because of the dark pixels surrounding those regions. Hence, the color and contrast should be improved to recover the actual color of the illuminated regions. We propose a fuzzy inference system based wavelet fusion to enhance the lit regions of non-uniformly illuminated night surveillance video. To include spatial and temporal variations of the illuminated regions, a spatio-temporal illumination approach is used. A contribution index of the illuminated regions is generated using a fuzzy membership function. Subsequently, stationary wavelets are used to decompose both the night frame and the day background frame into high-frequency and low-frequency coefficients for frame fusion. The contribution index selects the illuminated regions present in these wavelet coefficients for fusion. Finally, the inverse wavelet transform is applied to reconstruct the illumination enhanced frame. The proposed approach effectively highlights the illuminated regions and provides better visual perception.

Keywords

Night video surveillance, stationary wavelet transform, frame fusion, fuzzy inference system

1 Introduction

Nighttime surveillance video is analyzed as visual forensic evidence in many critical security monitoring areas such as banks, parking areas, and shopping galleria. High-quality visual perception is essential in these situations. Even a small variation in visual information can misdirect a crime investigation.

Night vision systems such as infrared and thermal sensors are used to monitor nighttime scenes in various static and dynamic applications, from smart home surveillance to military operations. However, object movement detection, object color identification, and texture recognition from these night vision videos are less fruitful because of their in-camera effects. A low-cost digital camera can be effective when nighttime enhancement is performed to improve the visual quality of the captured nighttime scene.

Fusion approaches integrate visual information from more than one frame, where the frames may be captured by various sensors or under various lighting conditions. The visual content of the infrared image and the visible image are fused in [8], where the visual appearance of the fused image was improved. The authors applied a fuzzy inference system for fusing the wavelet coefficients of infrared and visible images. However, the images in multi-sensor approaches must be captured simultaneously by their corresponding sensors. Moreover, the color and texture of the multi-sensor fused frame should be improved to obtain better visual perception. These are very critical in vision based investigation. Hence, visible image and frame fusion is suitable for nighttime images and videos.

The nighttime video frame and the day background frame of a static surveillance video are fused in [4], where the illuminated and moving object regions are separated for fusion. The dark pixels present in the non-illuminated surroundings deteriorate the contrast of the illuminated regions in the night frame. The quality of the fusion process depends on the selection of the illuminated region. Hence, a spatio-temporal approach for finding an illumination contribution is effective in the fusion algorithm. We propose a fuzzy membership function based fusion approach to fuse the highly illuminated pixels as well as the nearly illuminated pixels.

The proposed approach highlights the illuminated region of nighttime surveillance video and preserves the color of the nighttime scene. Based on the lit area of the night frame, a contribution index is generated from a fuzzy inference system. Subsequently, the night and day frames are decomposed into wavelet coefficients by stationary wavelets. The high-frequency coefficients and low-frequency coefficients are fused based on the contribution index of the illuminated and non-illuminated regions. The fused video enlightens the illuminated region of the night surveillance video.

The rest of the paper is organized as follows. Section 2 describes the related works and Section 3 details the proposed approach. Section 4 presents the experimental results and discussion, and Section 5 concludes the proposed work.

2 Related works

Video fusion algorithms blend the scene pixels present in each input frame. Image fusion using fuzzy logic was proposed by Singh et al. [10], where the input images are integrated using fuzzy and neuro-fuzzy approaches. Zhu and Yang [15] proposed an image fusion algorithm based on fuzzy logic and wavelet filters. They discussed a pixel level image fusion algorithm in which visible and infrared image fusion is applied. The significance of each wavelet coefficient is obtained by fuzzy reasoning. However, this approach is specifically designed for visible and infrared images of the same scene.

A pixel and region based approach for fusing visible and infrared images was proposed by Saeedi and Faez [8]. This approach applied a fuzzy rule based system for fusing the high-frequency coefficients and population based optimization for the low-frequency coefficients. However, the inputs are taken from two different sensors, which must capture the same scene at the same time. The approach exploits the quality of the infrared image and the visible image, but it is less suitable for real-time applications. A fully automated context enhancement is discussed in the method of Ulhaq et al. [13], where multiple video sequences captured by infrared and low light visible sensors are used for night vision enhancement. The self-organized approach [12] automatically enhances real-time night surveillance video, while the color based approach [11] depends on the quality of the nighttime fusion technique. Ding et al. [3] proposed a sparse code fusion approach for reducing the effect of over-enhanced moving objects and night shadow. A mutual coherence algorithm was used in this approach to develop a nighttime dictionary and a daytime dictionary.

Li et al. [4] proposed a fusion based approach where the shift invariant discrete wavelet transform is used to fuse the nighttime video. The algorithm initially segments the lit regions from the night video frame. The segmented regions of illumination and motion of the night video frame are used for the final fusion decision. However, the lit region selected in the method of Li et al. [4] is based on the assumption that there exists only one light source. In a night video fusion algorithm, the threshold computation for the illuminated region depends on the variations of the light sources and illuminated regions. A small variation in the illumination segmentation will affect the fusion performance of these approaches. It is therefore necessary to separate the illumination of the nighttime frame automatically and accurately. Hence, we propose a fuzzy fusion approach to enlighten the illuminated region.

In our approach, we consider the illuminated region of the current frame and a set of n consecutive frames to prepare the temporal illuminated region. Frame averaging is used to compute the temporal illumination, while the current frame is selected for the spatial illumination. A fuzzy inference system is applied to select the illuminated region of the night video frame. The histogram counts of the respective temporal and spatial frame pixels determine the thresholds of the fuzzy variables in the fuzzy inference system. This precise logic of imprecision and approximate reasoning effectively considers the neighborhood variations of the illuminated region. Subsequently, the contribution index derived from the fuzzy system decides which wavelet coefficients to fuse. The fused night surveillance videos preserve the color of the night illuminated region and improve the visual perception of the nighttime video.

3 Proposed approach

The proposed approach enlightens the illuminated regions of a night surveillance video, where a fuzzy approach is used to generate a contribution index corresponding to the spatio-temporal illuminated area. Then, the illuminated and non-illuminated areas in the wavelet coefficients are fused based on this contribution index. Our approach aims exclusively to enlighten non-uniformly illuminated videos captured by a static surveillance camera. This work is based on the assumption that all the important information content of the night surveillance frame is located in the illuminated region. Figure 1 shows the flow diagram of the proposed approach. In Fig. 1, the contribution index ξ is the output of the fuzzy inference system, where the inputs are the spatial and temporal frame components Sl and Tl. These frame components are obtained from the current frame Nf and the temporal frame Tf. The illumination enlightened frame Ef is obtained by fusing the day background frame Db and the night frame Nf.

The proposed approach comprises two parts: fuzzy based contribution index computation and wavelet based fusion. The following subsections describe these parts in detail.

3.1 Fuzzy based contribution index

The contribution index of the illuminated area is computed from the spatial and temporal frames of the nighttime video, because the illumination fluctuations of the preceding frames help to improve the illumination identification. Hence, we selected the spatial frame and the temporal frame of the nighttime video.

The temporal frame is obtained by averaging n frames from the nighttime video. Let n be the number of frames, and let Tf be the average of the n frames. The current frame Sf is then selected to represent the illumination of the spatial pixels. Subsequently, these frames are converted into the HSV color space to separate the value component, V. Let Tl and Sl be the V components of the temporal and spatial frames, respectively.
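The frame averaging and V-channel extraction described above can be sketched in a few lines. This is only an illustrative sketch in plain Python, not the authors' MATLAB implementation: the helper names `temporal_average` and `value_component`, the nested-list frame layout, and the sample pixel values are all assumptions. It relies on the standard definition of the HSV value channel, V = max(R, G, B).

```python
def value_component(frame):
    """Return the HSV value channel, V = max(R, G, B), of an RGB frame.

    `frame` is a list of rows; each pixel is an (r, g, b) tuple in [0, 1].
    """
    return [[max(pixel) for pixel in row] for row in frame]


def temporal_average(frames):
    """Average n equally sized RGB frames pixel by pixel (the frame Tf)."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [
            tuple(sum(f[i][j][c] for f in frames) / n for c in range(3))
            for j in range(width)
        ]
        for i in range(height)
    ]


# Two hypothetical 1x2 frames: one lit pixel and one dark pixel.
frames = [
    [[(0.8, 0.6, 0.2), (0.0, 0.1, 0.0)]],
    [[(0.6, 0.4, 0.2), (0.2, 0.1, 0.0)]],
]
Tf = temporal_average(frames)
Tl = value_component(Tf)          # temporal V component
Sl = value_component(frames[-1])  # spatial V component of the current frame
```

Here the last frame of the list plays the role of the current frame; in practice an array library would replace the nested loops.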

In this approach, we used the Mamdani fuzzy model [5] because it offers an intuitive way of generating the contribution index from Tl and Sl. The inputs of the fuzzy system, Tl and Sl, each have two fuzzy linguistic values, LowLight and HighLight. The fuzzy membership functions are denoted μ_LowLight (Tl), μ_HighLight (Tl), μ_LowLight (Sl), and μ_HighLight (Sl). The values of the fuzzy variables vary between two thresholds a and b. Hence, a trapezoidal membership function is used in this fuzzy inference system. Figures 2 and 3 show the input membership functions, and Fig. 4 shows the output membership function.

The threshold values a and b are obtained by analyzing the histograms of Tl and Sl. The majority of the nighttime frame pixels are accumulated in the dark region, and the rest in the lit region. Hence, the low illuminated pixels have small Tl and Sl values. From this observation, we computed the values of a and b using the following equations: $T = \{\, x : \mu(h(x)) < h(x) < \max(h(x)) \,\}$ (1) $a = \min(T)$ (2) $b = \max(T)$ (3) where x represents the pixel value of Tl or Sl, h(x) represents the pixel intensity count, T is a temporary vector, and μ represents the mean.
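A minimal sketch of Eqs. (1) to (3) follows. It assumes an 8-bit V channel (integer intensities 0 to 255) and takes the mean of h(x) over all 256 bins; the paper does not state either choice, and the function names `histogram` and `fuzzy_thresholds` are hypothetical.

```python
def histogram(v_channel, bins=256):
    """Count intensities: h[x] is the number of pixels with value x."""
    h = [0] * bins
    for row in v_channel:
        for x in row:
            h[x] += 1
    return h


def fuzzy_thresholds(v_channel):
    """Return (a, b) where T = {x : mean(h) < h(x) < max(h)}, a = min T, b = max T."""
    h = histogram(v_channel)
    mean_count = sum(h) / len(h)   # assumption: mean over all bins
    max_count = max(h)
    T = [x for x, count in enumerate(h) if mean_count < count < max_count]
    if not T:
        raise ValueError("no intensity satisfies mean(h) < h(x) < max(h)")
    return min(T), max(T)


# Hypothetical V channel: mostly dark pixels plus a few brighter ones.
v = [[10] * 6 + [40] * 3 + [200] * 2 + [90]]
a, b = fuzzy_thresholds(v)
```

In this toy frame the dominant dark intensity (10) is excluded because its count equals max(h), so the thresholds bracket the remaining occupied intensities.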

Algorithm 1 Enlighten the illuminated regions of a nighttime video frame

Notations used: Nf_A, Nf_D, Df_A, Df_D, Ef_A, Ef_D, a, b, ξ, Tl, Sl, x, T, t, h(x)

Input: Nighttime video frame Nf; day background frame Db.

Output: Fused night video frame Ef.

Tf ← TempAverage(nframes)
Sl ← Vcomponent(Nf)
Tl ← Vcomponent(Tf)
if μ(h(x)) < h(x) < max(h(x)) then
    T ← x
end if
a ← min(T)
b ← max(T)
for i = 1 to M do
    for j = 1 to N do
        ξ ← evaluateFuzzy(Sl, Tl)
        Nf_A, Nf_D ← swt(Nf)
        Df_A, Df_D ← swt(Db)
        if ξ ≤ t then
            Ef_A ← GetMax(Nf_A, Df_A)
        else
            Ef_A ← Df_A
        end if
        Ef_D ← GetMax(Nf_D, Df_D)
    end for
end for
Ef ← iswt(Ef_A, Ef_D)
return Ef

The output of the fuzzy system is the contribution index ξ, which decides the exact contribution of the illuminated pixels required for fusion. The dependence of ξ on Tl and Sl is expressed by the following rules.

If Sl is LowLight and Tl is LowLight then ξ is Nonill

If Sl is HighLight and Tl is HighLight then ξ is Ill

If Sl is LowLight and Tl is HighLight then ξ is Nonill

If Sl is HighLight and Tl is LowLight then ξ is Nonill

The membership functions for LowLight and HighLight of Tl are described as follows: $\mu_{LowLight}(Tl) = \begin{cases} 1 & Tl < a \\ \frac{b - Tl}{b - a} & a \leq Tl \leq b \\ 0 & \text{otherwise} \end{cases}$ (4) $\mu_{HighLight}(Tl) = 1 - \mu_{LowLight}(Tl)$ (5)

The fuzzy membership functions for Sl are computed in the same way. Subsequently, the centroid of the aggregated output membership area is computed for defuzzification. The contribution index is then used to fuse the appropriate wavelet coefficients.
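The rule base and centroid defuzzification can be illustrated with a small Mamdani-style sketch (min for AND, max for aggregation). Several parts are assumptions rather than the paper's definitions: the LowLight ramp is taken as (b − x)/(b − a) so that it falls continuously from 1 at a to 0 at b, and the output sets Ill and Nonill are hypothetical because Fig. 4 is not reproduced here. Ill is placed at the low end of [0, 1] only so that the sketch agrees with the ξ < t test of Section 3.2. The function names are also hypothetical.

```python
def mu_low(x, a, b):
    """LowLight: 1 below a, linear fall to 0 at b (assumes a < b)."""
    if x < a:
        return 1.0
    if x <= b:
        return (b - x) / (b - a)
    return 0.0


def mu_high(x, a, b):
    """HighLight is the complement of LowLight, as in Eq. (5)."""
    return 1.0 - mu_low(x, a, b)


def contribution_index(sl, tl, a, b, steps=100):
    """Mamdani inference for one pixel, defuzzified by a sampled centroid."""
    # Rule strengths: only HighLight AND HighLight fires Ill.
    ill = min(mu_high(sl, a, b), mu_high(tl, a, b))
    nonill = max(
        min(mu_low(sl, a, b), mu_low(tl, a, b)),
        min(mu_low(sl, a, b), mu_high(tl, a, b)),
        min(mu_high(sl, a, b), mu_low(tl, a, b)),
    )
    num = den = 0.0
    for k in range(steps + 1):
        xi = k / steps
        # Hypothetical output sets on [0, 1]: Ill(xi) = 1 - xi, Nonill(xi) = xi.
        m = max(min(ill, 1.0 - xi), min(nonill, xi))
        num += xi * m
        den += m
    return num / den if den else 0.5


lit = contribution_index(250, 250, a=40, b=200)    # bright in both inputs
dark = contribution_index(10, 10, a=40, b=200)     # dark in both inputs
mixed = contribution_index(250, 10, a=40, b=200)   # bright now, dark on average
```

With these assumed output sets, a pixel that is bright in both the spatial and temporal components falls below a threshold of 0.4, while dark or inconsistent pixels fall above it.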

3.2 Stationary wavelet transform based frame fusion

The stationary wavelet transform (SWT) [1, 2] is used for frame fusion, where the input frames are the current night frame and the day background frame. The SWT is effective for real-time frame fusion because of its translation invariance property. Compared to the discrete wavelet transform, the SWT provides translation invariance by avoiding the downsampling step.

The input frames are decomposed into their corresponding SWT representations. Specifically, the day background frame and the night video frame are each split into two parts: approximation coefficients and detail coefficients. The nighttime frame coefficients are Nf_A and Nf_D, and the day background coefficients are Df_A and Df_D. These wavelet coefficients are obtained by convolving the input frames with the low-pass and high-pass decomposition filters.
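To make the decomposition concrete, the following sketch implements a one-level undecimated Haar transform on a single row of pixels. The paper does not name the wavelet or the boundary handling, so the Haar filter, the periodic extension, the 1-D simplification, and the names `haar_swt` and `haar_iswt` are all assumptions made for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_swt(signal):
    """One-level undecimated Haar transform with periodic extension.

    Returns approximation and detail lists, each as long as the input,
    because no downsampling is applied.
    """
    n = len(signal)
    approx = [(signal[i] + signal[(i + 1) % n]) / SQRT2 for i in range(n)]
    detail = [(signal[i] - signal[(i + 1) % n]) / SQRT2 for i in range(n)]
    return approx, detail


def haar_iswt(approx, detail):
    """Invert haar_swt by averaging the two estimates of every sample."""
    n = len(approx)
    out = []
    for i in range(n):
        from_own = (approx[i] + detail[i]) / SQRT2            # pair (i, i+1)
        from_prev = (approx[i - 1] - detail[i - 1]) / SQRT2   # pair (i-1, i)
        out.append((from_own + from_prev) / 2.0)
    return out


row = [10.0, 12.0, 200.0, 180.0, 15.0, 11.0]   # hypothetical row of V values
A, D = haar_swt(row)
restored = haar_iswt(A, D)

# Translation invariance: a circular shift of the input shifts the coefficients.
shifted_A, shifted_D = haar_swt(row[1:] + row[:1])
```

The coefficient lists keep the input length, and `restored` matches `row` up to rounding, which is what lets the fused coefficients be inverted back into a full-resolution frame.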

A composite SWT representation is obtained by pixel level selection of the appropriate wavelet coefficients based on the fuzzy contribution index ξ. The illuminated pixels of the nighttime frame coefficients are selected when the ξ value is less than t, where t is a threshold whose value ranges between 0 and 0.4. The non-illuminated areas are enhanced by the day background coefficients based on the value of ξ. The fused frame coefficients of the enlightened night video frame are Ef_A and Ef_D. The enlightened frame Ef is reconstructed by the inverse SWT, in which the SWT coefficients are convolved with the reconstruction filters.
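The selection rule can be sketched per pixel as follows, mirroring the branch structure of Algorithm 1. Two points are assumptions: GetMax is read here as the elementwise maximum (a maximum-absolute-value rule is another plausible reading for detail coefficients), and the function name `fuse_coefficients` and the sample values are hypothetical.

```python
def fuse_coefficients(night_a, night_d, day_a, day_d, xi, t=0.4):
    """Fuse approximation (A) and detail (D) coefficient maps pixel by pixel.

    All arguments except `t` are equally sized lists of rows; `xi` holds the
    contribution index of each pixel.
    """
    fused_a, fused_d = [], []
    for i, xi_row in enumerate(xi):
        row_a, row_d = [], []
        for j, index in enumerate(xi_row):
            if index <= t:
                # Illuminated pixel: keep the stronger approximation coefficient.
                row_a.append(max(night_a[i][j], day_a[i][j]))
            else:
                # Non-illuminated pixel: take the day background coefficient.
                row_a.append(day_a[i][j])
            # Detail coefficients always use the assumed GetMax rule.
            row_d.append(max(night_d[i][j], day_d[i][j]))
        fused_a.append(row_a)
        fused_d.append(row_d)
    return fused_a, fused_d


Ef_A, Ef_D = fuse_coefficients(
    night_a=[[5.0, 1.0]], night_d=[[0.5, -0.2]],
    day_a=[[3.0, 4.0]], day_d=[[0.1, 0.3]],
    xi=[[0.2, 0.8]],
)
```

In this example the first pixel is treated as illuminated and keeps the night approximation value, while the second falls back to the day background.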

Algorithm 1 describes the detailed steps of the proposed approach. The enlightened frame Ef is returned from the system where the input frames are Nf and Db.

4 Experimental results and discussion

The proposed algorithm was implemented using MATLAB 7. The algorithm was tested on three video data sets (night.mpeg, i695.mpeg, and i495.mpeg), each of which includes the day background frame and its night video. The resolution of night.mpeg is 320 × 240, and that of i695.mpeg and i495.mpeg is 480 × 350. Figure 5 shows the enlightened night frame results for the videos night.mpeg and i695.mpeg. The frame results show that the visual quality of the illuminated regions is improved.

The proposed approach was compared with the average frame based fusion approach and the wavelet fusion based approach [4]. The visual comparison of the proposed approach with these methods is shown in Fig. 6, using the night.mpeg test video. Compared to the existing fusion results, our approach highlights the nighttime illuminated area.

Generally, pixel-based frame fusion techniques may lose some useful information and introduce some artifacts. Hence, we used two metrics that are relevant for measuring the performance of the fusion result: the Petrovic index and entropy [10, 14]. The Petrovic index measures the amount of edge information transferred from the input frames to the final fused frame. A high Petrovic index indicates a better fused result. Table 1 shows the Petrovic index obtained from the three video sets, where the proposed approach shows a better index than the average fusion approach and the wavelet based fusion approach. The entropy is a measure of the amount of information present in the fused frame. A high entropy value indicates better quality of the fused result. Table 2 shows the entropy results for sample frames from the three video sets. In most cases, the entropy of the proposed approach is better than that of the existing fusion based approaches.
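As an illustration of the entropy metric, the sketch below computes the Shannon entropy of the intensity histogram of a grayscale frame, in bits per pixel. The paper does not give its exact formula, so this standard definition, the function name `frame_entropy`, and the sample frames are assumptions.

```python
import math
from collections import Counter


def frame_entropy(gray):
    """Shannon entropy (bits per pixel) of a frame given as rows of intensities."""
    pixels = [x for row in gray for x in row]
    total = len(pixels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(pixels).values()
    )


two_levels = frame_entropy([[0, 0], [255, 255]])      # two equally likely levels
four_levels = frame_entropy([[0, 85], [170, 255]])    # four equally likely levels
flat = frame_entropy([[7, 7], [7, 7]])                # a constant frame
```

A frame that uses more intensity levels evenly yields a higher value, which is why higher entropy is read as more information in the fused result.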

The standard deviation (SD) is a measure of the contrast of the fused frame. Table 3 compares the SD of the proposed method with that of the existing fusion methods. The high SD value of the proposed approach indicates better contrast. These objective quality measures indicate that the proposed approach increases the contrast and thereby highlights the illuminated area.
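The contrast measure can be sketched as the standard deviation of all pixel intensities in a frame. Whether the paper uses the population or the sample form is not stated; the population form is assumed here, and `frame_contrast` is a hypothetical name.

```python
import statistics


def frame_contrast(gray):
    """Population standard deviation of the intensities of a grayscale frame."""
    pixels = [x for row in gray for x in row]
    return statistics.pstdev(pixels)


high_contrast = frame_contrast([[0, 255], [0, 255]])
no_contrast = frame_contrast([[128, 128], [128, 128]])
```

A frame split evenly between black and white gives the largest spread, while a uniform frame gives zero.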

5 Conclusion

This paper presented a fusion algorithm whose inputs are a night video frame and its corresponding day background frame. The illuminated regions of the night video frame are preserved in the fused frame to improve the visual perception of the nighttime frame. We derived a contribution index that effectively considers the pixels from the illuminated region during the wavelet fusion. Moreover, the contrast of the illuminated region is increased by fusing the pixels in the surrounding scene. The experimental results demonstrate that the proposed fusion based approach increases the information content and preserves the edge details of the night frame.

In the future, we plan to incorporate the motion pixels in addition to the illuminated pixels to improve the visual content of the nighttime surveillance video.

Acknowledgments

We would like to thank the Centre for Engineering Research and Development Kerala and the College of Engineering Trivandrum for providing facilities, and Tao Yang and Yunbo Rao for sharing databases.

References

1. Amolins, Zhang, Dare, Wavelet based image fusion techniques: An introduction, review and comparison, ISPRS Journal of Photogrammetry and Remote Sensing, 62 (2007), 249–263.

2. Chanussot, Mauris, Lambert, Fuzzy fusion techniques for linear features detection in multitemporal SAR images, IEEE Transactions on Geoscience and Remote Sensing, 37 (1999), 1292–1305.

3. Ding, Lei, Rao, Sparse codes fusion for context enhancement of night video surveillance, Multimedia Tools and Applications, 75(18) (2016), 11221–11239.

4. Li, Li S.Z., Pan, Yang, Illumination and motion-based video enhancement for night surveillance, IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, (2005), 169–175.

5. Mamdani E.H., Advances in the linguistic synthesis of fuzzy controllers, International Journal of Man-Machine Studies, 8 (1976), 669–678.

6. Petrovic, Xydeas, On the effects of sensor noise in pixel-level image fusion performance, Proceedings of the Third International Conference on Information Fusion, IEEE, 2 (2000), 14–19.

7. Rao, Chen L.T., Illumination-based nighttime video contrast enhancement using genetic algorithm, Multimedia Tools and Applications, 70(3) (2014), 2235–2254.

8. Saeedi, Faez, Infrared and visible image fusion using fuzzy logic and population-based optimization, Applied Soft Computing, 12(3) (2012), 1041–1054.

9. Simou, Athanasiadis, Stoilos, Image indexing and retrieval using expressive fuzzy description logics, Signal Image and Video Processing, 2(4) (2008), 321–335.

10. Singh, Raj, Kaur, Meitzler, Image fusion using fuzzy logic and applications, International Conference on Fuzzy Systems, IEEE, 1 (2004), 337–340.

11. Soumya, Thampi S.M., Day color transfer based night video enhancement for surveillance system, IEEE International Conference on Signal Processing Informatics Communication and Energy Systems (SPICES), IEEE, (2015), 1–5.

12. Soumya, Thampi S.M., Self-organized night video enhancement for surveillance systems, Signal Image and Video Processing, (2016), 1–8.

13. Ulhaq, Yin, He, Zhang, FACE: Fully Automated Context Enhancement for night-time video sequences, Journal of Visual Communication and Image Representation, 40 (2016), 682–93.

14. Xydeas C.S., Petrovic V.S., Objective pixel-level image fusion performance measure, International Society for Optics and Photonics, 4051 (2000), 88–99.

15. Zhu, Yang, A new image fusion algorithm based on fuzzy logic, Intelligent Computation Technology and Automation, IEEE, 2 (2008), 83–86.