Automatic hierarchical background virtualization method for monocular vision image based on depth information extraction

Abstract

Illumination, noise, distortion and other factors reduce the quality of monocular vision images, make image information difficult to extract, and often introduce errors and uncertainty into background segmentation, all of which degrade background virtualization. Therefore, an automatic hierarchical background virtualization method for monocular vision images based on depth information extraction is studied to improve the effect of background virtualization. The depth information map is extracted with the anisotropic thermal diffusion equation. Morphology is used to fill the tiny holes in the depth information map, which is then smoothed. The image depth range is determined from the smoothed map, and the depth information map is automatically layered to obtain the foreground layer and the background layer. The background layer is virtualized by a Gaussian blur operation. The pyramid image fusion method is used to fuse the foreground layer and the blurred background layer, which completes the background virtualization of the monocular vision image. Experimental results show that this method effectively improves the clarity of the edges of the depth information map, preserves a large amount of image edge information, and achieves high structural similarity, with an average value of 0.96. The method is also efficient: the background virtualization time is only 15 ms.

Keywords

Depth information extraction; monocular visual image; automatic layering; background virtualization; thermal diffusion equation; Gaussian blur

1. Introduction

Monocular vision images are widely used in various fields [1] and can support positioning and tracking [2]. The imaging principle of a monocular camera is basically the same as that of the human eye: both form images by focusing light through a convex lens. The lens of the human eye corresponds to the convex lens of the monocular camera, but the eye’s lens can zoom, whereas the camera’s convex lens cannot. In recent years, people have liked to record people or objects of interest as images, but because monocular cameras lack a zoom function, the captured images often do not meet people’s aesthetic needs. Image processing technology can reconstruct the information in an image [3] and broaden its applications. Image segmentation methods [4], such as threshold segmentation, divide the image into background and foreground so that key information can be extracted. Threshold segmentation is simple, fast, and easy to implement, and it suits images with high contrast between background and foreground. Its disadvantage is that the choice of threshold strongly affects the result: a poorly chosen threshold may lead to inaccurate segmentation or noise. Among image restoration techniques [5], denoising filtering is a common method. Its basic idea is to smooth the image with an algorithm or filter, such as a median, mean, or Gaussian filter, to remove noise or reduce its interference. This effectively removes noise and improves the clarity and quality of the image, but it may also blur or lose image details. 
Among image enhancement techniques [6], histogram equalization is commonly used. Its basic idea is to transform the grayscale histogram of the original image into a uniformly distributed histogram, thereby enhancing the contrast and clarity of the image. It improves the visual effect and readability of images and is suitable for both grayscale and color images, but it may blur or lose image details. Various fields therefore adopt image processing technology to serve different needs. However, there is little research on image background virtualization. Background virtualization highlights the foreground information of the image and suppresses the background information, which provides concise foreground information for subsequent applications and makes them more efficient. Although most monocular cameras have a background virtualization function, they require the foreground to be selected manually. This not only increases the complexity of monocular visual image acquisition but also fails to achieve the ideal background virtualization effect, which affects subsequent target tracking and detection [7]. Therefore, it is necessary to study intelligent monocular image background virtualization methods. For example, Aouissaoui et al. proposed an efficient and secure image background virtualization method based on one-dimensional chaotic mapping (tent mapping and logistic mapping) and hash functions (SHA-256 and MD5). The first part of this method generates keys from the hash of the image and its metadata. 
As a result, the key is highly relevant and sensitive to the original image. The second part rotates and permutes the first two MSB bit planes of the image to reduce the black background of redundant coding sequences. The third part uses dynamic selection rules to encode and decode every 2-bit pixel value through the logistic map. At the same time, the tent map and the exclusive OR (XOR) operation are used for confusion and diffusion to realize background virtualization. The experimental results show that this method has a good background virtualization effect, and the correlation coefficient is 6.66617e-7 [8]. Dornala et al. proposed a background virtualization method for complex scene images with different lighting conditions, complex textures and colors, and various occlusions. First, morphological operations are used to locate the scene within the image, and the background information within the scene is extracted through morphological operations and heuristic rules. Second, horizontal and vertical edge processing are used to compute the histogram of the background image edges. To reduce the deviation of the histogram output and make the image smoother, a low-pass filter is applied to the histogram. A Gaussian blur algorithm then blurs the smoothed background to achieve background blurring. This method highlights foreground objects and reduces background interference, making the image more focused and aesthetically pleasing [9]. Pudaruth et al. first convert all images into grayscale format and then threshold them to separate the foreground objects from the background. The background image, with the foreground target removed, is then processed by Gaussian blur to realize image background virtualization. 
This method can effectively separate the foreground target from the background and complete the background virtualization [10]. Khongkraphan et al. studied an image background virtualization method based on convolution and iteration. First, the blur kernel is estimated, and the latent image is then estimated from the kernel in each iteration. The final virtualized image is obtained by convolving the clear image with the final estimated kernel. Because the solution is not unique, image blur is an ill-posed problem. Therefore, a smoothing function is proposed to estimate the latent image, and l2 regularization is applied to intensity and gradient, which ensures that each subproblem has a closed-form solution. Experiments on synthetic and real images show that this method obtains more reasonable and natural virtual images [11]. Djerida et al. proposed a method that can virtualize the background under dynamic influence. Dynamic principal component analysis is used to model the sequence correlation between consecutive frames, and a robust pixel-based background model is constructed to reduce the impact of light changes. To constrain the background model, kernel density estimation is used to identify the distribution of the background lag data matrix, and the confidence interval limit is then used to determine the corresponding detection threshold. Background subtraction detects the foreground and separates it from the background, and a Gaussian blur algorithm blurs the background image from which the foreground has been removed. This method has a better background virtualization effect [12]. Yahaghi et al. proposed a radiographic image enhancement algorithm based on the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) to remove background blur. 
Using various digital image processing methods, image quality and information extraction were improved, and a secondary regularization strategy was combined with fast iterative shrinkage-thresholding to deblur the image. Radiographic Testing (RT) images of five artworks were evaluated to verify the effectiveness of the method [13]. However, the above methods all suffer from poor continuity of background virtualization and uneven edges, which results in a poor transition between foreground and background areas and cannot meet people’s aesthetic needs.

This paper utilizes the anisotropic thermal diffusion equation to extract depth information maps from monocular visual images [14], which can improve the accuracy and continuity of depth information map extraction. This research goal emphasizes the importance of depth information extraction in the field of image processing, and hopes to promote the development and progress of this field by introducing the advanced algorithm of anisotropic thermal diffusion equation. Anisotropic Diffusion Equation (ADE) is an image processing algorithm that can remove noise while preserving image edge information, thereby improving image quality and clarity. The introduction of this method solves the shortcomings of traditional filtering algorithms in image processing, such as inability to effectively remove noise and difficulty in preserving detail information. By applying the anisotropic thermal diffusion equation to fields such as image enhancement, edge detection, and texture analysis, more accurate and continuous depth information extraction can be achieved, thereby promoting the development and progress of image processing. The research results of this paper are expected to have a positive impact on related fields. Firstly, by combining automatic layering methods and studying the automatic layering and background blurring method for extracting depth information from monocular visual images, we will be able to improve the background blurring effect and make the image more hierarchical and three-dimensional. This research direction not only meets people’s pursuit of high-quality images, but also provides more valuable information for applications such as target tracking and detection in various fields. Secondly, by introducing the advanced algorithm of anisotropic thermal diffusion equation, we can promote academic and practical exchanges in related fields. 
The application of this method will provide useful references for researchers in other fields and promote the common development of related fields.

2. Automatic layered background virtualization of monocular vision image

The specific steps of the automatic layered background virtualization method for monocular vision images are as follows:

Step 1:
Using a fixed monocular camera, capture two monocular visual images $Z_{1}$ and $Z_{2}$ with different focal points. When acquiring the two images, set different calibration parameters and image distances so that $Z_{1}$ and $Z_{2}$ are visually distinct. Because of the uncertainty of the calibration parameters of the monocular camera $Z_{1}$ and $Z_{2}$ calibration parameters and image distance $v$ the same.
Step 2:
Extract the depth information map in the acquired image through the anisotropic thermal diffusion equation [15].
Step 3:
Use morphology to fill the tiny holes in the depth information map. Then smooth the filled depth information map with the improved bilateral filtering algorithm to eliminate its burrs.
Step 4:
Determine the image depth range according to the smoothed depth information map, automatically process the depth information map by layers [16], and obtain the foreground layer and background layer.
Step 5:
Through Gaussian blur operation, the background layer is virtualized and the foreground layer is not processed.
Step 6:
Use the pyramid image fusion method to fuse the foreground layer and the blurred background layer and obtain the blurred monocular vision image.

2.1 Depth information extraction

The extraction of depth information can complete segmentation and recognition of foreground and background, which lays the foundation for realizing automatic hierarchical background virtualization of monocular vision images. Monocular depth estimation uses a single camera to obtain two-dimensional images, and estimates the three-dimensional depth information of objects through computer vision algorithms. In the automatic hierarchical background virtualization of monocular vision image, the depth information of objects in the image can be obtained through monocular depth estimation technology, so as to better segment and recognize the foreground and background. Anisotropic thermal diffusion equation is an image processing algorithm, which can smooth the image while preserving the edge information of the image. In the depth extraction of image information, the anisotropic thermal diffusion equation can be used to remove noise, smooth the image, and retain the details of the image edge, thus improving the quality and clarity of the image.

First, the anisotropic thermal diffusion equation is used to extract the depth information image in the monocular vision image. The imaging model of monocular vision image $Z\left(y\right)$ is:

$\displaystyle Z\left(y\right)=\int{h\left({y,x,b}\right)\omega\left(x\right)dx}$ (1)

Among them, $h\left(\cdot\right)$ and $\left({x,y}\right)$ represent the point spread function and the pixel coordinates, respectively; $\omega\left(x\right)$ expresses the luminance radiation function at position $x$; $b$ represents the blur radius of $Z$.

In the above formula, the calculation of $h\left({y,x,b}\right)$ is as follows:

$\displaystyle h\left({y,x,b}\right)=\left[{\frac{\exp\left({-\frac{\left\|{x-y}\right\|}{2\mu}}\right)}{2\pi\mu}}\right]^{2}$ (2)

Among them, $\mu$ represents the blurriness of the image.

The blur radius $b$ of $Z\left(y\right)$ is calculated as follows:

$\displaystyle b=0.5\textit{Dv}\left|{\frac{1}{f}-\frac{1}{v}-\frac{1}{q}}\right|$ (3)

Where $f$ represents the focal length of the monocular camera, $D$ and $q$ represent the lens aperture and the object distance of the monocular camera, and $v$ represents the image distance.
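As a quick numerical check of Eq. (3), the following minimal sketch evaluates the blur radius for an in-focus and an out-of-focus object. The function name `blur_radius` and the millimetre values are illustrative assumptions, not values from the experiments; $v$ is treated as the image distance.

```python
def blur_radius(D, v, f, q):
    """Blur radius b from Eq. (3): b = 0.5 * D * v * |1/f - 1/v - 1/q|.

    D: lens aperture, v: image distance, f: focal length, q: object distance.
    All quantities must share one length unit (millimetres are assumed here).
    """
    if min(D, v, f, q) <= 0:
        raise ValueError("aperture, distances and focal length must be positive")
    return 0.5 * D * v * abs(1.0 / f - 1.0 / v - 1.0 / q)


# Illustrative values only (hypothetical camera, not from the paper).
f, D = 50.0, 25.0
q_focus = 2000.0
# Image distance that brings q_focus into focus (thin-lens relation).
v = 1.0 / (1.0 / f - 1.0 / q_focus)

b_in_focus = blur_radius(D, v, f, q_focus)   # object at the focus distance
b_far = blur_radius(D, v, f, 4000.0)         # object twice as far away
print(b_in_focus, b_far)
```

An object at the focus distance gives a blur radius of essentially zero, while objects at other distances give a positive radius, which is the cue the depth extraction relies on.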

For the multi-focus monocular visual images $Z_{1}\left(y\right)$ and $Z_{2}\left(y\right)$, if $Z_{2}\left(y\right)$ is obtained from $Z_{1}\left(y\right)$ by point spreading, then:

$\displaystyle Z\left({y,b_{2}}\right)=\int{\left[{\frac{\exp\left({-\frac{\left\|{x-y}\right\|}{2\mu_{2}}}\right)}{2\pi\mu_{2}}}\right]^{2}\omega\left(x\right)dx}=\int{\left[{\frac{\exp\left({-\frac{\left\|{x-y}\right\|}{2\left({\mu_{2}-\mu_{1}}\right)}}\right)}{2\pi\left({\mu_{2}-\mu_{1}}\right)}}\right]^{2}Z\left({x,b_{1}}\right)dx}$ (4)

Among them, $\mu_{1}$ and $\mu_{2}$ represent the blurriness of $Z_{1}\left(y\right)$ and $Z_{2}\left(y\right)$ ; $b_{1}$ and $b_{2}$ represent the blur radius of $Z_{1}\left(y\right)$ and $Z_{2}\left(y\right)$ .

Let $\Omega$ denote the entire monocular visual image area. When $\left({\mu_{2}-\mu_{1}}\right)^{2}>0$, $\Omega_{+}$ denotes the region in which $Z_{1}\left(y\right)$ is sharper than $Z_{2}\left(y\right)$; when $\left({\mu_{2}-\mu_{1}}\right)^{2}<0$, $\Omega_{-}$ denotes the region in which $Z_{2}\left(y\right)$ is sharper than $Z_{1}\left(y\right)$. Let $\chi$ denote the diffusion coefficient between $Z_{2}\left(y\right)$ and $Z_{1}\left(y\right)$; the formula is as follows:

$\displaystyle\chi=\frac{\left({\mu_{2}-\mu_{1}}\right)^{2}}{Z\left({y,b_{2}}\right)\Delta t}$ (5)

Among them, $\Delta t$ represents the acquisition interval between $Z_{1}\left(y\right)$ and $Z_{2}\left(y\right)$.

Because $Z_{1}\left(y\right)$ and $Z_{2}\left(y\right)$ are images of the same target, they share the same depth information $q_{0}\left(y\right)$. When $\chi=0$, that is, $\left({\mu_{2}-\mu_{1}}\right)^{2}=0$, the thermal diffusion is in equilibrium, and two diffusion regions can be obtained:

$\displaystyle\Omega_{+}=\left\{{y\left|{0<q_{0}\left(y\right)\leqslant\chi\ \text{or}\ q_{0}\left(y\right)>\frac{(v_{1}+v_{2})\chi}{(v_{1}+v_{2})-2\chi}}\right.}\right\}$

$\displaystyle\Omega_{-}=\left\{{y\left|{\chi<q_{0}\left(y\right)\leqslant\frac{(v_{1}+v_{2})\chi}{(v_{1}+v_{2})-2\chi}}\right.}\right\}$ (6)

For the thermal diffusion equation in $I\left({y,t}\right)$, the diffusion direction is set to be positive and the depth value of the monocular visual image is taken as the equation variable. The formula is as follows:

$\displaystyle\frac{\partial I\left({y,t}\right)}{\partial t}=\left\{{\begin{array}{l}\nabla\cdot\left({s\left(y\right)\nabla I\left({y,t}\right)}\right),\quad y\in\Omega_{+}\\ \nabla\cdot\left({-s\left(y\right)\nabla I\left({y,t}\right)}\right),\quad y\in\Omega_{-}\end{array}}\right.$ (7)

Among them, $\nabla$ and $\nabla\cdot$ represent gradient and divergence operators respectively; $t$ represents time.

The monocular visual depth information $s\left(y\right)$ can be obtained by solving Eq. (7) iteratively.
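To make the forward branch of Eq. (7) concrete, the sketch below advances $\partial I/\partial t=\nabla\cdot\left(s\nabla I\right)$ with an explicit finite-difference scheme on a small grid. It is only an illustration under stated assumptions: unit grid spacing, zero-flux borders, a non-negative coefficient `s` that is given rather than estimated, and the hypothetical helper names `diffusion_step` and `diffuse`. The backward branch on $\Omega_{-}$ and the estimation of $s\left(y\right)$ are not handled.

```python
def diffusion_step(I, s, dt):
    """One explicit step of dI/dt = div(s * grad I) with zero-flux borders.

    I and s are 2-D lists of equal size; s must be non-negative.
    Stability of this explicit scheme needs dt * 4 * max(s) <= 1.
    """
    rows, cols = len(I), len(I[0])
    out = [row[:] for row in I]
    for i in range(rows):
        for j in range(cols):
            flux = 0.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    # Diffusion coefficient on the face between the two pixels.
                    s_face = 0.5 * (s[i][j] + s[ni][nj])
                    flux += s_face * (I[ni][nj] - I[i][j])
            out[i][j] = I[i][j] + dt * flux
    return out


def diffuse(I, s, total_time, dt=0.2):
    """Repeat diffusion_step until roughly total_time has elapsed."""
    for _ in range(int(round(total_time / dt))):
        I = diffusion_step(I, s, dt)
    return I


# A single bright pixel spreads out while the total intensity is conserved.
image = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
s = [[1.0] * 3 for _ in range(3)]
blurred = diffuse(image, s, total_time=1.0)
```

In the method itself, the quantity of interest is the coefficient $s\left(y\right)$ for which diffusing the sharper image reproduces the blurrier one; the loop above only shows the forward simulation that such an iterative solution would call repeatedly.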

2.2 Optimization of depth information extraction results

The extraction of the depth information map in Section 2.1 uses a fixed initial depth value $q_{0}$. A fixed value cannot describe the real information of the monocular visual image well, and it slows the convergence of the iterative solution. For this reason, an adaptive initial depth $\hat{q}_{0}$ replaces $q_{0}$ to improve the anisotropic thermal diffusion equation. $\hat{q}_{0}$ is calculated as follows:

$\displaystyle\hat{q}_{0}=\frac{s\left(y\right)}{\frac{1}{f}-\frac{1}{v_{2}-v_{1}}-\frac{1}{\left|{v_{2}-v_{1}}\right|}\sqrt{\left|{1+\frac{4\left({\mu_{2}-\mu_{1}}\right)^{2}}{D^{2}}\cdot\frac{v_{2}-v_{1}}{v_{2}+v_{1}}}\right|}}$ (8)

When solving the thermal diffusion equation iteratively, let $E$ denote the error between $q$ and its expected value $\breve{q}$, and take $E$ as the cost function. The depth information extraction problem can then be transformed into the functional energy minimization problem $\arg\mathop{\min}\limits_{s}\left({E_{1}\left(q\right)+E_{2}\left(\breve{q}\right)+E_{3}\left(q\right)}\right)$, where $E_{1}\left(q\right)$ and $E_{2}\left(\breve{q}\right)$ represent the error fidelity terms of $q$ and $\breve{q}$, and $E_{3}\left(q\right)$ represents the error of the regular constraint term.

$E_{1}\left(q\right)$ and $E_{2}\left(\breve{q}\right)$ are calculated as follows, where $H$ is the step function:

$\displaystyle E_{1}\left(q\right)=\int{H\left({s\left(y\right)}\right)\left|{I_{1}\left(y\right)-I_{2}\left(y\right)}\right|^{2}dy}$

$\displaystyle E_{2}\left(\breve{q}\right)=\int{H\left({-s\left(y\right)}\right)\left|{I_{2}\left(y\right)-I_{1}\left(y\right)}\right|^{2}dy}$ (9)

The calculation formula of $E_{3}\left(q\right)$ is as follows:

$\displaystyle E_{3}\left(q\right)=\gamma\left\|{\nabla\hat{q}}\right\|^{2}+\gamma\left\|\hat{q}\right\|^{2}$ (10)

Among them, $\gamma$ represents the regularization coefficient.

The gradient descent method is used to solve $\arg\mathop{\min}\limits_{s}\left({E_{1}\left(q\right)+E_{2}\left(\breve{q}\right)+E_{3}\left(q\right)}\right)$; the formula is as follows:

$\displaystyle\frac{\partial q}{\partial\tau}=-\left({{E}^{\prime}_{1}\left(q\right)+{E}^{\prime}_{2}\left(\breve{q}\right)+{E}^{\prime}_{3}\left(q\right)}\right)$ (11)

Among them, $\tau$ represents virtual time; ${E}^{\prime}_{1}\left(q\right)$, ${E}^{\prime}_{2}\left(\breve{q}\right)$ and ${E}^{\prime}_{3}\left(q\right)$ represent the gradients of the two fidelity terms and of the regular constraint term.

${E}^{\prime}_{1}\left(q\right)$, ${E}^{\prime}_{2}\left(\breve{q}\right)$ and ${E}^{\prime}_{3}\left(q\right)$ are calculated as follows:

$\displaystyle{E}^{\prime}_{1}\left(q\right)=\left[{-2H\left({s\left(y\right)}\right)\int_{0}^{\Delta t}{\nabla I\left({y,t}\right)dt}+\delta\left({s\left(y\right)}\right)\left({I\left({y,\Delta t}\right)-I_{2}\left(y\right)}\right)^{2}}\right]\cdot{s}^{\prime}\left(\hat{q}\right)$

$\displaystyle{E}^{\prime}_{2}\left(\breve{q}\right)=\left[{2H\left({-s\left(y\right)}\right)\int_{0}^{\Delta t}{\nabla I\left({y,t}\right)dt}+\delta\left({s\left(y\right)}\right)\left({I\left({y,\Delta t}\right)-I_{1}\left(y\right)}\right)^{2}}\right]\cdot{s}^{\prime}\left(\hat{q}\right)$ (12)

$\displaystyle{E}^{\prime}_{3}\left(q\right)=-2\gamma\nabla\hat{q}+2\gamma\hat{q}$

Among them, $H$, $\delta$ and ${s}^{\prime}$ represent the step function, the impulse function and the gradient value of $s$, respectively.

The calculation formula of the optimized depth information of ${s}^{\prime}\left(\hat{q}\right)$ is as follows:

$\displaystyle{s}^{\prime}\left(\hat{q}\right)=\frac{\left({\mu_{2}-\mu_{1}}\right)^{2}D^{2}\left({v_{2}-v_{1}}\right)}{4\hat{q}^{2}\Delta t}\left[{\left({v_{2}+v_{1}}\right)\cdot\left({\frac{1}{f}-\frac{1}{\hat{q}}}\right)-1}\right]$ (13)

2.3 Smoothing of depth information map of monocular vision image

The depth information map obtained above may contain noise and discontinuous areas, which affect the subsequent image segmentation and virtualization. Smoothing the depth information map reduces this noise and discontinuity, which makes segmentation and virtualization more accurate and natural and better meets practical application needs. The depth information map $Q$ of the monocular visual image extracted in Section 2.2 contains many small holes. Morphology is therefore used to fill $Q$; the formula is as follows:

$\displaystyle Q^{\prime}\cdot A=\left({Q\oplus A}\right)\cdot{s}^{\prime}\left(\hat{q}\right)$ (14)

Among them, $A$ represents structural elements.
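The hole filling can be illustrated with a standard grey-scale morphological closing, that is, a dilation followed by an erosion with a square structural element. This is a swapped-in textbook operation used here as an assumption: Eq. (14) as printed shows only the dilation $\oplus$ and a weighting by ${s}^{\prime}\left(\hat{q}\right)$, which the sketch does not reproduce. The names `_apply_window` and `fill_small_holes` are hypothetical.

```python
def _apply_window(img, size, pick):
    """Replace each pixel by pick() over its size x size window (clipped at borders)."""
    r = size // 2
    rows, cols = len(img), len(img[0])
    return [
        [
            pick(
                img[ni][nj]
                for ni in range(max(0, i - r), min(rows, i + r + 1))
                for nj in range(max(0, j - r), min(cols, j + r + 1))
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def fill_small_holes(depth, size=3):
    """Grey-scale closing: dilation (window max) followed by erosion (window min)."""
    dilated = _apply_window(depth, size, max)
    return _apply_window(dilated, size, min)


# A one-pixel hole in an otherwise constant depth map is filled.
depth = [[7] * 5 for _ in range(5)]
depth[2][2] = 0
filled = fill_small_holes(depth)

# A depth step between two objects is left where it was.
step = [[2, 2, 2, 8, 8, 8] for _ in range(4)]
step_after = fill_small_holes(step)
```

Holes smaller than the structural element disappear, while larger depth discontinuities are preserved, which is the behaviour wanted before the smoothing step.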

Because the extracted depth information map is poor in naturalness and smoothness, the improved bilateral filtering algorithm is used for smoothing to provide a smoother image for the subsequent image background virtualization.

Let ${Q}^{\prime}$ denote the filled depth information map. The improved bilateral filtering algorithm smooths ${Q}^{\prime}$ in the following steps:

Step 1:

Use the probability distribution function $P$ and the maximum likelihood function $L$ to determine the standard deviation $\alpha$ of the gray scale domain of ${Q}^{\prime}$. Let $U$ denote the smooth area of ${Q}^{\prime}$ and $u_{1},u_{2},\ldots,u_{N}$ denote $N$ observations of $U$. The probability distribution function $P\left({u_{k}}\right)$ of the $k$-th observation $u_{k}$ is:

$\displaystyle P\left({u_{k}}\right)=\prod\limits_{k=1}^{N}{\frac{u_{k}}{\alpha}e^{-\left({\frac{u_{k}+U}{2\alpha}}\right)^{2}}J\left({\frac{u_{k}U}{{Q}^{\prime 2}}}\right)}$ (15)

Among them, $J\left(\cdot\right)$ represents a Bessel function.

The determination formula of $\alpha$ is as follows:

$\displaystyle\alpha^{2}=\arg\max\left\{{\ln L\left(U\right)}\right\}$ (16)

$\alpha$ can be obtained by differentiating Eq. (16). $\alpha$ is then used to smooth the grayscale domain of the depth information image.

Step 2:

Determine the standard deviation of the spatial domain $\hat{\alpha}$ , the formula is as follows:

$\displaystyle\hat{\alpha}=\frac{R-m}{2}$ (17)

Among them, $R$ represents the filtering radius of the depth information map; $m$ represents a constant.

Utilize $\hat{\alpha}$ to smooth the spatial domain of depth information map.
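The two standard deviations can be seen at work in a plain bilateral filter, sketched below. This is the textbook filter, not the improved variant: the maximum likelihood estimate of $\alpha$ from Eqs. (15) and (16) is not reproduced, so `sigma_gray` is passed in directly, while the spatial standard deviation follows Eq. (17). The function name and the sample depth values are assumptions for illustration.

```python
import math


def bilateral_filter(depth, radius, sigma_gray, m=0.0):
    """Plain bilateral filter on a 2-D list of depth values.

    sigma_gray plays the role of alpha (gray scale domain);
    the spatial standard deviation is (radius - m) / 2 as in Eq. (17).
    """
    sigma_space = (radius - m) / 2.0
    if sigma_space <= 0 or sigma_gray <= 0:
        raise ValueError("both standard deviations must be positive")
    rows, cols = len(depth), len(depth[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            centre = depth[i][j]
            num = den = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        # Weight falls with spatial distance and with depth difference.
                        w_space = math.exp(-(di * di + dj * dj) / (2 * sigma_space ** 2))
                        w_gray = math.exp(-((depth[ni][nj] - centre) ** 2) / (2 * sigma_gray ** 2))
                        num += w_space * w_gray * depth[ni][nj]
                        den += w_space * w_gray
            out[i][j] = num / den
    return out


# Two noisy depth plateaus (about 1 and about 5) separated by a sharp edge.
depth = [[1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0] for _ in range(3)]
smoothed = bilateral_filter(depth, radius=2, sigma_gray=0.5)
```

Because the gray scale weight is almost zero across the depth edge, each plateau is smoothed on its own and the edge between foreground and background depths stays sharp.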

2.4 Automatic layering

The automatic hierarchical processing of the monocular vision image separates the foreground from the background, which allows fine processing and optimization of the image, highlights the characteristics of foreground objects, and supports accurate segmentation and virtualization of the background. The depth value $q^{\ast}$ calculated in Section 2.2 is used to determine the depth interval $\left[{q_{\min}^{\ast},q_{\max}^{\ast}}\right]$ of the depth information map, where $q_{\min}^{\ast}$ and $q_{\max}^{\ast}$ are the minimum and maximum values of $q^{\ast}$. The depth information map is divided into $M$ layers over $\left[{q_{\min}^{\ast},q_{\max}^{\ast}}\right]$, each with a depth interval of $\frac{q_{\max}^{\ast}-q_{\min}^{\ast}}{M}$. The number of pixels in the depth range of each layer is counted, and any layer whose pixel count is lower than the threshold $\varepsilon$ is merged with its nearby layer to form a larger layer [17]. This finally gives $\hat{N}$ large layers, with $\hat{N}<M$. The $\hat{N}$ large layers serve as the foreground, and the other depth layers serve as the background.
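The layering rule can be sketched as a histogram over $M$ equal depth intervals followed by a merge of sparse layers. The merge direction (a sparse layer joins the layer before it) and the function name `auto_layer` are assumptions, since the text does not fix them, and the sketch stops at the merged layers without deciding which of them is the foreground.

```python
def auto_layer(depth_values, M, eps):
    """Split depths into M equal intervals, then merge sparse layers.

    Returns a list of (low, high, pixel_count) tuples, one per merged layer.
    A layer with fewer than eps pixels is merged with its neighbour.
    """
    q_min, q_max = min(depth_values), max(depth_values)
    width = (q_max - q_min) / M
    if width == 0:
        return [(q_min, q_max, len(depth_values))]

    # Count the pixels that fall into each of the M depth intervals.
    counts = [0] * M
    for q in depth_values:
        k = min(int((q - q_min) / width), M - 1)
        counts[k] += 1

    layers = []
    for k, c in enumerate(counts):
        low, high = q_min + k * width, q_min + (k + 1) * width
        if layers and (c < eps or layers[-1][2] < eps):
            # Sparse layer (or sparse predecessor): extend the previous layer.
            layers[-1][1] = high
            layers[-1][2] += c
        else:
            layers.append([low, high, c])
    return [tuple(layer) for layer in layers]


# Hypothetical depth samples: three dense depth clusters and a few outliers.
depths = [1.0] * 40 + [1.2] * 3 + [5.0] * 50 + [9.0] * 2 + [10.0] * 30
layers = auto_layer(depths, M=10, eps=5)
print(len(layers), [count for _, _, count in layers])
```

With these samples the ten initial intervals collapse into three large layers, so $\hat{N}=3<M=10$, and every pixel still belongs to exactly one layer.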

2.5 Background virtualization

Gaussian blur is applied to the background $Q_{\textit{back}}$ obtained by the automatic layering of Section 2.4, while the foreground remains unchanged. The principle of background virtualization is as follows: based on the one-dimensional normal distribution function $\varphi\left(x\right)$, the pixel values near a pixel $x$ are collected, and their weighted average replaces the original pixel value of that point to achieve background virtualization [18].

The calculation formula of $\varphi\left(x\right)$ is as follows:

$\displaystyle\varphi\left(x\right)=\frac{\exp\left[{-\left({\frac{x-\mu}{2\theta}}\right)^{2}}\right]\hat{\alpha}}{\theta\sqrt{2\pi}}$ (18)

Among them, $\mu$ and $\theta$ represent the mean and variance of $x$ .

When the pixel $x$ is taken as the origin, $\mu=0$; therefore:

$\displaystyle\varphi\left(x\right)=\frac{\exp\left[{-\left({\frac{x}{2\theta}}\right)^{2}}\right]\hat{\alpha}}{\theta\sqrt{2\pi}}$ (19)

Extending $\varphi\left(x\right)$ to two dimensions gives the Gaussian function $G\left({x,y}\right)$, where $\left({x,y}\right)$ represents a pixel in the background image. $G\left({x,y}\right)$ is calculated as follows:

$\displaystyle G\left({x,y}\right)=\frac{\exp\left[{-\left({\frac{x+y}{2\theta}}\right)^{2}}\right]\hat{\alpha}}{\theta\sqrt{2\pi}}$ (20)

Based on $G\left({x,y}\right)$ , the pixel values of each point in the background image are replaced by the weighted mean value of the neighborhood pixel values, which reduces the pixel values of each point to complete the background virtualization.
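The weighted-mean replacement can be sketched as follows. The kernel uses the standard isotropic two-dimensional Gaussian, normalised to sum to one, as an assumption; it does not reproduce the exact form or the $\hat{\alpha}$ factor of Eq. (20). Only pixels flagged as background are modified, and only background neighbours contribute to the average. The names `gaussian_kernel`, `blur_background` and the mask `is_background` are hypothetical.

```python
import math


def gaussian_kernel(radius, theta):
    """Normalised (2*radius+1) x (2*radius+1) Gaussian kernel with spread theta."""
    kernel = [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * theta * theta))
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]


def blur_background(image, is_background, radius=1, theta=0.5):
    """Replace each background pixel by the Gaussian-weighted mean of its
    background neighbours; foreground pixels are copied unchanged."""
    kernel = gaussian_kernel(radius, theta)
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for i in range(rows):
        for j in range(cols):
            if not is_background[i][j]:
                continue
            num = den = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols and is_background[ni][nj]:
                        w = kernel[di + radius][dj + radius]
                        num += w * image[ni][nj]
                        den += w
            out[i][j] = num / den
    return out


# Column 0 is foreground (value 100); the rest is a textured background.
image = [[100, 0, 10, 0], [100, 10, 0, 10], [100, 0, 10, 0]]
mask = [[False, True, True, True] for _ in range(3)]
result = blur_background(image, mask, radius=1, theta=0.5)
```

Restricting the average to background neighbours keeps foreground intensities from bleeding into the blurred layer near the layer boundary; a larger `theta` gives a stronger blur.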

The pyramid image fusion method [19] is used to fuse the foreground image $Q_{\textit{prospect}}$ with the virtualized background image ${Q}^{\prime\prime}_{\textit{back}}$, which gives the final automatic layered background virtualization result of the monocular vision image. The specific steps are as follows:

Step 1:

Divide the detail images of $Q_{\textit{prospect}}$ and ${Q}^{\prime\prime}_{\textit{back}}$ in each direction into $n\times n$ small blocks.

Step 2:

Calculate the corresponding local block variance [20] of each direction detail image to be fused, and the formula is as follows:

$\displaystyle\rho\left({x,y}\right)=\frac{\sum\limits_{\hat{m}=-1,\hat{n}=-1}^{1}{\left[{O_{i}\left({x+\hat{m},y+\hat{n}}\right)-\hat{m}_{i}\left({x,y}\right)}\right]^{2}}}{n\cdot G\left({x,y}\right)}$ (21)

Among them, $O_{i}$ represents the gray value of the $i$-th small block; $\hat{m}$ and $\hat{n}$ represent the separable Gaussian weighted values of the horizontal and vertical coordinates; $\hat{m}_{i}$ represents the separable Gaussian weighted value of the $i$-th block.

Step 3:

Fuse the corresponding detail images [21] according to the local variance criterion. Let $\lambda_{1}$ and $\lambda_{2}$ represent the detail images to be fused and $\xi\left({x,y}\right)$ represent the fused image; the formula is as follows:

$\displaystyle\xi\left({x,y}\right)=\left\{{\begin{array}{ll}\lambda_{1}\left({x,y}\right)&\rho_{\lambda_{1}}\left({x,y}\right)\geqslant\rho_{\lambda_{2}}\left({x,y}\right)\\ \lambda_{2}\left({x,y}\right)&\rho_{\lambda_{1}}\left({x,y}\right)<\rho_{\lambda_{2}}\left({x,y}\right)\end{array}}\right.$ (22)

Step 4:

Repeat the above steps until all detail images of all foreground layers and background layers are fused [22] to obtain the final monocular image automatic layered background virtualization result.
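The selection rule of Eq. (22) can be sketched for a single pair of detail images as follows. For brevity the sketch uses a plain, unweighted variance over a 3 × 3 window instead of the Gaussian-weighted block variance of Eq. (21), and it leaves out the pyramid decomposition and reconstruction; both simplifications, and the names `local_variance` and `fuse`, are assumptions.

```python
def local_variance(img, i, j):
    """Unweighted variance of the 3 x 3 window around (i, j), clipped at borders."""
    vals = [
        img[ni][nj]
        for ni in range(i - 1, i + 2)
        for nj in range(j - 1, j + 2)
        if 0 <= ni < len(img) and 0 <= nj < len(img[0])
    ]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def fuse(lam1, lam2):
    """Eq. (22): per pixel, keep the value from the image with the larger
    local variance; ties go to lam1."""
    return [
        [
            lam1[i][j]
            if local_variance(lam1, i, j) >= local_variance(lam2, i, j)
            else lam2[i][j]
            for j in range(len(lam1[0]))
        ]
        for i in range(len(lam1))
    ]


# A textured detail image wins over a flat one at every pixel.
textured = [[0, 9, 0], [9, 0, 9], [0, 9, 0]]
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
fused = fuse(flat, textured)
```

Choosing the higher-variance source at each pixel keeps the sharp foreground detail where it exists and falls back to the blurred background elsewhere.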

3. Experimental analysis

A data set of monocular vision images from a database is taken as the experimental object. It contains more than 30000 monocular vision images in more than 20 categories, such as vehicles, animals, and plants. The image sizes range from 1024 $\times$ 768 pixels to 2048 $\times$ 1536 pixels, and the resolution is between 300 dpi and 500 dpi. The experimental data set is processed with the proposed background virtualization method to verify its feasibility.

In the monocular vision image data set, randomly select a monocular vision image of a vehicle, and use the method in this paper to extract the depth information image in the monocular vision image. The extraction result is shown in Fig. 1.

Figure 1.

Extraction results of depth information map.

According to Fig. 1(a) and (b), the method in this paper can effectively extract the depth information map in the original monocular visual image, and the extracted depth information map can roughly describe the information contour in the original monocular visual image, but the clarity and smoothness are poor. Therefore, continue to use the method in this paper to smooth the extracted depth information map, and the depth information map smoothing processing result is shown in Fig. 2.

Figure 2.

Smooth processing results of depth information map.

According to Fig. 2, the method used in this paper utilizes anisotropic thermal diffusion equation to smooth the depth information map, making the edges of the depth information map clearer after smoothing and possessing better visual effects. By comparing and analyzing the depth information maps before and after smoothing, it can be found that the smoothed images exhibit good results in preserving edge details and features. In addition, the smoothed depth information map has smoother transitions between different grayscale values, and the transitions between objects at different depths are smoother. The experimental results further validate the advantages of using anisotropic thermal diffusion equation for smoothing, as it can preserve the details and features of the image, making it clearer.

The proposed method is then used to layer the image automatically according to the extracted depth information map. The automatic layering result is shown in Fig. 3.

Figure 3.

Results of automatic stratification of monocular visual images.

According to Fig. 3(a) and (b), the proposed method can automatically separate a monocular vision image into a foreground layer and a background layer. In the subsequent background virtualization, only the background image needs to be processed and the foreground image is left unchanged, which greatly reduces the amount of calculation and improves the efficiency of background virtualization. The experimental results show that the proposed method is effective for automatic layering of monocular vision images.
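
A minimal sketch of depth-based layering follows. It assumes that larger depth values mean farther objects and that the split point is the midpoint of the depth range; both are illustrative assumptions, and the paper's own layering rule may differ. The function name `split_layers` is hypothetical.

```python
def split_layers(image, depth, threshold=None):
    """Split an image into foreground and background layers by depth.

    Pixels that do not belong to a layer are stored as None.
    """
    flat = [d for row in depth for d in row]
    d_min, d_max = min(flat), max(flat)
    if threshold is None:
        threshold = (d_min + d_max) / 2.0  # midpoint of the depth range (assumption)
    # True marks near pixels, which are treated as foreground.
    mask = [[d <= threshold for d in row] for row in depth]
    foreground = [[p if m else None for p, m in zip(prow, mrow)]
                  for prow, mrow in zip(image, mask)]
    background = [[None if m else p for p, m in zip(prow, mrow)]
                  for prow, mrow in zip(image, mask)]
    return foreground, background, mask

# Hypothetical 2 x 3 gray image with its depth map.
image = [[10, 20, 30], [40, 50, 60]]
depth_map = [[0.1, 0.2, 0.9], [0.15, 0.8, 0.95]]
fg_layer, bg_layer, fg_mask = split_layers(image, depth_map)
```

Only the pixels kept in `bg_layer` would need to be blurred afterwards, which is the source of the saving in computation noted above.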

The proposed method is used to process the background image obtained by automatic layering, and its Gaussian blur effect is analyzed under different pixel variances. Pixel variance is the variance of the pixel values in an image and describes their distribution: the larger the pixel variance, the greater the difference between the pixel values in the image, and vice versa. The analysis results are shown in Fig. 4.

Figure 4.

Gaussian blur processing results of background image.

According to Fig. 4(a), when the variance is 0.1, the Gaussian blur result is basically the same as the original background image. According to Fig. 4(b), when the variance is 0.3, the result is only slightly different from the original image, and the virtualization effect is not obvious. According to Fig. 4(c), when the variance is 0.5, the result differs significantly from the original image, and the virtualization effect is good. According to Fig. 4(d), when the variance is 0.7, the background image is too blurred and the virtualization effect is poor. Overall, a variance of 0.5 gives the best virtualization effect for the background image.
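
The dependence of the blur strength on the variance can be sketched as follows. The sketch assumes that the variance is the Gaussian kernel variance in squared pixel units, which may not match the scale used in this experiment; the function names and the kernel radius of 3 are illustrative.

```python
import math

def gaussian_kernel_1d(variance, radius=3):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * variance)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, replicating border pixels."""
    r = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([sum(wgt * row[min(max(x + k - r, 0), n - 1)]
                        for k, wgt in enumerate(kernel)) for x in range(n)])
    return out

def gaussian_blur(image, variance, radius=3):
    """Separable blur: filter rows, transpose, filter rows again, transpose back."""
    kernel = gaussian_kernel_1d(variance, radius)
    horizontal = blur_rows(image, kernel)
    transposed = [list(col) for col in zip(*horizontal)]
    vertical = blur_rows(transposed, kernel)
    return [list(col) for col in zip(*vertical)]

# Weight kept by the centre pixel for the four variances compared in Fig. 4.
centre_weights = {v: gaussian_kernel_1d(v)[3] for v in (0.1, 0.3, 0.5, 0.7)}

# A single bright pixel spreads out more as the variance grows.
impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
blurred = gaussian_blur(impulse, 0.5)
```

Under this assumption, a small variance leaves almost all of the weight on the centre pixel, so the output is nearly identical to the input, while larger variances spread the weight over neighbouring pixels and blur the image more strongly.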

The proposed method is used to fuse the foreground image with the virtualized background image to obtain the final automatic hierarchical background virtualization result, as shown in Fig. 5.

Figure 5.

Results of automatic background blurring in monocular visual images.

According to Fig. 5, the proposed method can effectively fuse the foreground image with the virtualized background image to obtain the final background virtualization result. After virtualization, the closer an object is, the clearer it appears, and the farther an object is, the more blurred it appears. In this way the foreground information in the image is highlighted and the influence of the background information on the foreground information is suppressed, which provides more valuable foreground information for subsequent applications of monocular vision images.
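
The idea behind pyramid fusion can be sketched on one-dimensional signals, which stand in for image rows. This is a simplified Laplacian pyramid blend written under our own assumptions (a [1, 2, 1]/4 smoothing filter, nearest-neighbour expansion, two levels); it is not the exact fusion procedure of this paper, and all function names are illustrative.

```python
def smooth(sig):
    """[1, 2, 1] / 4 low-pass filter with replicated borders."""
    n = len(sig)
    return [(sig[max(i - 1, 0)] + 2 * sig[i] + sig[min(i + 1, n - 1)]) / 4.0
            for i in range(n)]

def down(sig):
    """Low-pass filter, then keep every second sample."""
    return smooth(sig)[::2]

def up(sig, n):
    """Expand to length n by sample repetition, then low-pass filter."""
    return smooth([sig[min(i // 2, len(sig) - 1)] for i in range(n)])

def gaussian_pyramid(sig, levels):
    pyr = [list(sig)]
    for _ in range(levels):
        pyr.append(down(pyr[-1]))
    return pyr

def laplacian_pyramid(sig, levels):
    g = gaussian_pyramid(sig, levels)
    # Each band stores the detail lost between two neighbouring Gaussian levels.
    bands = [[a - b for a, b in zip(g[i], up(g[i + 1], len(g[i])))]
             for i in range(levels)]
    return bands + [g[-1]]

def collapse(pyr):
    cur = pyr[-1]
    for band in reversed(pyr[:-1]):
        cur = [b + u for b, u in zip(band, up(cur, len(band)))]
    return cur

def pyramid_blend(fg, bg, mask, levels=2):
    """Blend fg and bg band by band; mask is 1.0 where fg should be kept."""
    la, lb = laplacian_pyramid(fg, levels), laplacian_pyramid(bg, levels)
    gm = gaussian_pyramid(mask, levels)
    fused = [[m * a + (1.0 - m) * b for a, b, m in zip(ba, bb, bm)]
             for ba, bb, bm in zip(la, lb, gm)]
    return collapse(fused)

# Hypothetical row: sharp foreground on the left, blurred background on the right.
fg_row = [10.0] * 32
bg_row = [2.0] * 32
mask_row = [1.0] * 16 + [0.0] * 16
fused_row = pyramid_blend(fg_row, bg_row, mask_row)
```

Because the mask is smoothed at each pyramid level, the seam between the two layers becomes a gradual transition rather than a hard step, which is the property that motivates pyramid fusion over direct pasting.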

The peak signal-to-noise ratio is used to measure the background virtualization effect of the proposed method. The higher the peak signal-to-noise ratio, the better the edge transition between foreground and background after background virtualization, that is, the better the background virtualization effect. The peak signal-to-noise ratio of the proposed automatic hierarchical background virtualization is analyzed at different focal lengths, and the analysis results are shown in Fig. 6.

Figure 6.

Background blur effect of monocular visual image at different focal lengths.

According to Fig. 6, the peak signal-to-noise ratio decreases as the focal length increases at every image resolution. At the same focal length, the higher the image resolution, the higher the peak signal-to-noise ratio of background virtualization, that is, the better the background virtualization effect. At a focal length of 40 mm, the peak signal-to-noise ratio reaches its lowest value for all three image resolutions, about 24, 27, and 28 respectively, and none of these values is lower than the peak signal-to-noise ratio threshold. This indicates that the proposed method maintains a high peak signal-to-noise ratio across image resolutions and focal lengths, that is, the edge transition between the foreground and the background remains good after virtualization. The experimental results show that the proposed method achieves good automatic hierarchical background virtualization of monocular vision images under different image resolutions and focal lengths.
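
For reference, the peak signal-to-noise ratio between two images of equal size can be computed with the standard definition, 10 log10(peak^2 / MSE). The sketch below assumes 8-bit gray images with a peak value of 255; which image pair is compared in this experiment is not specified here, so the inputs are hypothetical.

```python
import math

def psnr(reference, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    squared_error = 0.0
    count = 0
    for ref_row, proc_row in zip(reference, processed):
        for r, p in zip(ref_row, proc_row):
            squared_error += (r - p) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical 2 x 2 example: two pixels differ by 10 gray levels.
reference = [[100, 100], [100, 100]]
processed = [[110, 100], [100, 90]]
value_db = psnr(reference, processed)
```

A higher value means that the processed image deviates less from the reference, which is why a higher peak signal-to-noise ratio is read as a better result.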

To further verify the proposed method, 2000 sample images were selected as the research object, and processing time was selected as the experimental indicator. The proposed method and the methods of references [8, 11, 12] were used to perform background virtualization on the sample images. The shorter the processing time, the more efficient the method. The practicality and efficiency of the methods were evaluated by comparing their processing times. The comparison results are shown in Fig. 7.

Figure 7.

Comparison of background virtualization time for different methods.

According to Fig. 7, the time taken by every method increases with the number of processed sample images. The method of reference [12] has the longest background virtualization time, and by 400 sample images its time already exceeds that of the other three methods. The proposed method has the shortest background virtualization time and takes only 15 ms when processing 2000 sample images. This efficiency is attributed to the use of morphology to fill small holes in the depth information map, the smoothing of the map, and the automatic layering of depth information, which together give the method a good application effect.
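
The morphological hole filling mentioned above can be sketched as a gray-level closing (dilation followed by erosion). The 3 $\times$ 3 square structuring element and the function names are assumptions made for illustration; the paper does not state the structuring element here.

```python
def _filter3x3(img, reducer):
    """Apply max (dilation) or min (erosion) over a 3 x 3 window, replicating borders."""
    h, w = len(img), len(img[0])
    return [[reducer(img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)]
            for y in range(h)]

def closing(img):
    """Gray-level closing: dilation fills small dark holes, erosion restores object size."""
    return _filter3x3(_filter3x3(img, max), min)

# Hypothetical depth map with a one-pixel hole (missing depth stored as 0.0).
holed = [[0.6] * 5 for _ in range(5)]
holed[2][2] = 0.0
filled = closing(holed)

# A genuine boundary between two large depth regions should survive the closing.
step_map = [[0.2] * 3 + [0.8] * 3 for _ in range(5)]
step_closed = closing(step_map)
```

Holes smaller than the structuring element are filled with the surrounding depth value, while boundaries between large regions stay where they are, so the later layering step is not disturbed by isolated missing pixels.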

4. Conclusion

Image background virtualization methods have many applications in the image field. To improve the effect of image background virtualization, an automatic hierarchical background virtualization method for monocular vision images based on depth information extraction is studied. First, the depth information map is extracted from the monocular vision image. Based on the extracted depth information, the image is automatically layered to obtain foreground and background images, which enhances the rationality of automatic layering. Then, the Gaussian blur algorithm is used to blur the background image to achieve the virtualization effect. Finally, the clear foreground image is fused with the blurred background image to complete the background virtualization of the monocular vision image. With this method, the foreground information in the image is highlighted and the interference of background information with foreground information is suppressed, which further improves the application effect of the image in fields such as target tracking and detection.

References

1. Almalioglu, Turan, Saputra MRU, et al. SelfVIO: Self-supervised deep monocular Visual–Inertial Odometry and depth estimation. Neural Networks. 2022; 150: 119-136.

2. Kayhani, Zhao, McCabe, et al. Tag-based visual-inertial localization of unmanned aerial vehicles in indoor construction environments using an on-manifold extended Kalman filter. Automation in Construction. 2022; 135: 104112.

3. Titarenko, Malashin. Study of the ability of neural networks to extract and use semantic information when they are trained to reconstruct noisy images. Journal of Optical Technology. 2022; 89(2): 81-88.

4. Shivahare, Gupta. (Retracted) Hybrid whale optimization algorithm-Levy flight approach for multilevel thresholding image segmentation. Journal of Electronic Imaging. 2022; 31(5): 051420.

5. S. PP Renjit. Image restoration model using Jaya-Bat optimization-enabled noise prediction map. IET Image Processing. 2021; 15(9): 1926-1939.

6. Bhandari, Subramani, Veluchamy. Multi-exposure optimized contrast and brightness balance color image enhancement. Digital Signal Processing. 2022; 123: 103406.

7. Chandrakar, Raja, Miri, et al. Enhanced the moving object detection and object tracking for traffic surveillance using RBF-FDLNN and CBF algorithm. Expert Systems with Applications. 2022; 191: 116306.

8. Aouissaoui, Bakir, Sakly. Robustly correlated key-medical image for DNA-chaos based encryption. IET Image Processing. 2021; 15(12): 2770-2786.

9. Mlyahilu, Lee, et al. Morphological geodesic active contour algorithm for the segmentation of the histogram-equalized welding bead image edges. IET Image Processing. 2022; 16(10): 2680-2696.

10. Pudaruth, Nazurally, Appadoo, et al. SuperFish: a mobile application for fish species recognition using image processing techniques and deep learning. International Journal of Computing and Digital Systems. 2020; 10: 1-14.

11. Khongkraphan, Phonon, Nuiphom. An efficient blind image deblurring using a smoothing function. Applied Computational Intelligence and Soft Computing. 2021; 2021: 1-10.

12. Djerida, Zhao. Background subtraction in dynamic scenes using the dynamic principal component analysis. IET Image Processing. 2020; 14(2): 245-255.

13. Yahaghi, Mirzapour, Movafeghi, et al. FISTA algorithm for radiography images enhancement with background blurring removal. Research in Nondestructive Evaluation. 2019; 30(2): 80-88.

14. Raihan, Abas, De Silva. Depth estimation for underwater images from single view image. IET Image Processing. 2020; 14(16): 4188-4197.

15. Moriwaki, Shirasaki, Yoshida. Deep learning for line intensity mapping observations: information extraction from noisy maps. The Astrophysical Journal Letters. 2021; 906(1): L1.

16. Smith, Arora, Stone, et al. Pix2Prof: fast extraction of sequential information from galaxy imagery via a deep natural language ‘captioning’ model. Monthly Notices of the Royal Astronomical Society. 2021; 503(1): 96-105.

17. Ozyoruk, Gokceler, Bobrow, et al. EndoSLAM dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos. Medical Image Analysis. 2021; 71: 102058.

18. Alphonse PJA, Sriharsha. Depth estimation from a single RGB image using target foreground and background scene variations. Computers & Electrical Engineering. 2021; 94: 107349.

19. Addesso, Restaino, Vivone. An improved version of the generalized Laplacian pyramid algorithm for pansharpening. Remote Sensing. 2021; 13(17): 3386.

20. Bhat, Koundal. Multi-focus image fusion using neutrosophic based wavelet transform. Applied Soft Computing. 2021; 106: 107307.

21. Telli, Sbaa, Bekhouche, et al. A novel multi-level pyramid co-variance operators for estimation of personality traits and job screening scores. 2021; 38(3): 539-546.

22. Kim, Shin, Han, et al. An Efficient Scheme to Obtain Background Image in Video for YOLO-based Static Object Recognition. Journal of Web Engineering. 2022; 21(5): 1691-1706.