Abstract
Illumination, noise, distortion, and other factors degrade monocular vision images. This makes image information difficult to extract and introduces errors and uncertainty into background segmentation, both of which weaken the effect of background virtualization. To improve that effect, this paper studies an automatic hierarchical background virtualization method for monocular vision images based on depth information extraction. The depth information map is extracted with the anisotropic thermal diffusion equation. Morphological operations fill the tiny holes in the depth information map, and the map is then smoothed. The image depth range is determined from the smoothed map, which is automatically layered into a foreground layer and a background layer. The background layer is virtualized by Gaussian blur. A pyramid image fusion method fuses the foreground layer with the blurred background layer to complete the background virtualization of the monocular vision image. Experimental results show that the method effectively improves the clarity of the edges of the depth information map, preserves a large amount of image edge information, and achieves high structural similarity, with an average value of 0.96. The method is also efficient: background virtualization takes only 15 ms.
Keywords
Introduction
Monocular vision images are widely used in various fields [1], and they can support positioning and tracking functions [2]. The imaging principle of a monocular camera is basically the same as that of the human eye: both form images by focusing light through a convex lens. The crystalline lens of the human eye corresponds to the convex lens of the monocular camera, but the crystalline lens can zoom, whereas the camera’s convex lens cannot. In recent years, people have liked to record people or objects of interest as images, but because monocular cameras have no zoom function, the images they capture often do not meet people’s aesthetic needs. Image processing technology can reconstruct all information in an image [3] and so broaden the range of image applications. Image segmentation methods [4], such as threshold segmentation, divide the image into background and foreground so that key information within the image can be extracted. Threshold segmentation is simple, fast, and easy to implement, and it suits images with high contrast between background and foreground. Its disadvantage is that the choice of threshold strongly affects the segmentation result: a poorly chosen threshold may lead to inaccurate segmentation or noise. In image restoration technology [5], denoising filtering is a common approach whose basic idea is to smooth the image with an algorithm or filter, such as a median, mean, or Gaussian filter, to remove noise or reduce its interference. This approach can effectively remove noise and improve the clarity and quality of the image, but it may also blur the image or lose its details.
In image enhancement technology [6], histogram equalization is a commonly used method. Its basic idea is to transform the grayscale histogram of the original image into a uniformly distributed histogram, thereby enhancing the contrast and clarity of the image. The method improves the visual effect and readability of images and is suitable for both grayscale and color images, but it may blur or lose detailed parts of the image. Various fields therefore adopt image processing technology to serve different needs. However, there is little research on image background virtualization. Background virtualization highlights the foreground information of the image and suppresses the background information, which provides concise foreground information for subsequent applications and improves their efficiency. Although most monocular cameras have a background virtualization function, they require the foreground to be selected manually. This not only increases the complexity of monocular visual image acquisition but also fails to achieve the ideal background virtualization effect, which affects subsequent target tracking and detection [7]. Therefore, it is necessary to study intelligent background virtualization methods for monocular images. For example, Aouissaoui et al. proposed an efficient and secure image background virtualization method based on one-dimensional chaotic maps (the tent map and the logistic map) and hash functions (SHA-256 and MD5). The first part of this method generates keys from a hash of the image and its metadata.
As a result, the key is highly correlated with and sensitive to the original image. The second part rotates and permutes the first two MSB bit planes of the image to reduce the black background of redundant coding sequences. The third part uses dynamic selection rules to encode and decode every 2-bit pixel value through the logistic map. At the same time, the tent map and exclusive OR (XOR) operations are used for confusion and diffusion to realize background virtualization. The experimental results show that this method has a good background virtualization effect, and the correlation coefficient is 6.66617e-7 [8]. Dornala et al. proposed a new background virtualization method for complex scene images with different lighting conditions, complex textures and colors, and various occlusions. This method can effectively handle complex backgrounds and achieve background virtualization. Firstly, morphological operations are used to locate the scene within the image, and background information within the scene is extracted through morphological operations and heuristic rules. Secondly, horizontal and vertical edge processing are used to compute the histogram of the background image edges. To reduce the deviation of the histogram output and make the image smoother, a low-pass filter is applied to the histogram. Finally, a Gaussian blur algorithm blurs the smoothed background histogram to achieve the background blurring effect. This method can highlight foreground objects and reduce background interference, making the image more focused and aesthetically pleasing [9]. Pudaruth et al. first convert all images to grayscale and then threshold them to separate foreground objects from the background. The background image, with the foreground target subtracted, is then processed by Gaussian blur to realize image background virtualization.
This method can effectively subtract the foreground target from the background and complete the background virtualization [10]. Khongkraphan et al. studied an image background virtualization method based on convolution and iteration. In each iteration, the blur kernel is estimated first, and the latent image is then estimated from that kernel. The final virtualized image is obtained by convolving the clear image with the final estimated kernel. Because the solution is not unique, image blur is an ill-posed problem. Therefore, a smoothing function is proposed to estimate the latent image, and l2 regularization is applied to intensity and gradient, which ensures that each subproblem has a closed-form solution. Experiments on synthetic and real images show that this method obtains more reasonable and natural virtual images [11]. Djerida et al. proposed a method that can virtualize the background under dynamic influence. Dynamic principal component analysis is used to model the sequence correlation between consecutive frames, and a robust pixel-based background model is constructed to reduce the impact of light changes. To constrain the background model, kernel density estimation is used to identify the distribution of the background lag data matrix, and the confidence interval limit is then used to determine the corresponding detection threshold. Background subtraction detects the foreground and subtracts it from the background, and a Gaussian blur algorithm is applied to the background image from which the foreground has been subtracted. This method has a better background virtualization effect [12]. Yahaghi et al. proposed a radiographic image enhancement algorithm based on the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) to remove background blur.
Various digital image processing methods were used to improve image quality and information extraction, and the secondary regularization strategy was used with fast iterative shrinkage-thresholding to achieve image deblurring. Radiographic Testing (RT) images of five artworks were evaluated to verify the effectiveness of the method [13]. However, the above methods all suffer from poor continuity of background virtualization and uneven edges, which results in a poor transition between foreground and background areas and cannot meet people’s aesthetic needs.
This paper uses the anisotropic thermal diffusion equation to extract depth information maps from monocular visual images [14], which can improve the accuracy and continuity of the extracted depth information map. This goal reflects the importance of depth information extraction in image processing. The anisotropic diffusion equation (ADE) is an image processing algorithm that removes noise while preserving image edge information, thereby improving image quality and clarity. It addresses shortcomings of traditional filtering algorithms, such as ineffective noise removal and difficulty in preserving detail. Applied to image enhancement, edge detection, and texture analysis, it enables more accurate and continuous depth information extraction. The research results of this paper are expected to have a positive impact on related fields. Firstly, combining depth information extraction with automatic layering improves the background blurring effect and gives the image a stronger sense of hierarchy and depth. This not only meets people’s pursuit of high-quality images but also provides more valuable information for applications such as target tracking and detection in various fields. Secondly, introducing the anisotropic thermal diffusion equation can promote academic and practical exchanges in related fields.
The application of this method will provide useful references for researchers in other fields and promote the common development of related fields.
Automatic layered background virtualization of monocular vision image
The specific steps of the automatic layered background virtualization method for monocular vision images are as follows:
Step 1: Capture two monocular visual images with a fixed monocular camera.
Step 2: Extract the depth information map from the acquired images through the anisotropic thermal diffusion equation [15].
Step 3: Use morphology to fill the tiny holes in the depth information map, then smooth the filled map with the improved bilateral filtering algorithm to eliminate its burrs.
Step 4: Determine the image depth range from the smoothed depth information map and automatically layer the map [16] to obtain the foreground layer and the background layer.
Step 5: Virtualize the background layer through the Gaussian blur operation; the foreground layer is not processed.
Step 6: Fuse the foreground layer and the blurred background layer with the pyramid image fusion method to obtain the blurred monocular vision image.
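The hole-filling and layering steps listed above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes that tiny holes appear as isolated low values in the depth map, that a grey-scale morphological closing is an acceptable way to fill them, and that larger depth values mean farther objects. The function names `fill_small_holes` and `split_layers`, the 3x3 window, and the midpoint threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def _morph(img, op, k=1):
    """Apply a (2k+1)x(2k+1) min or max filter using shifted views."""
    h, w = img.shape
    p = np.pad(img, k, mode="edge")
    views = [p[i:i + h, j:j + w]
             for i in range(2 * k + 1) for j in range(2 * k + 1)]
    return op(np.stack(views), axis=0)

def fill_small_holes(depth, k=1):
    """Grey-scale closing: dilation (max) followed by erosion (min)."""
    return _morph(_morph(depth, np.max, k), np.min, k)

def split_layers(depth, threshold=None):
    """Return boolean (foreground, background) masks.

    Assumption: larger depth means farther away. If no threshold is
    given, the midpoint of the depth range is used.
    """
    if threshold is None:
        threshold = 0.5 * (depth.min() + depth.max())
    background = depth > threshold
    return ~background, background

# Usage: a flat depth map with one single-pixel hole.
depth = np.full((7, 7), 0.8)
depth[3, 3] = 0.0
filled = fill_small_holes(depth)
foreground, background = split_layers(filled, threshold=0.5)
```

A closing removes dark defects smaller than the window while leaving larger structures in place, which is why it is a common choice for hole filling; the window size would need to match the hole size in real data.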
The extraction of depth information can complete segmentation and recognition of foreground and background, which lays the foundation for realizing automatic hierarchical background virtualization of monocular vision images. Monocular depth estimation uses a single camera to obtain two-dimensional images, and estimates the three-dimensional depth information of objects through computer vision algorithms. In the automatic hierarchical background virtualization of monocular vision image, the depth information of objects in the image can be obtained through monocular depth estimation technology, so as to better segment and recognize the foreground and background. Anisotropic thermal diffusion equation is an image processing algorithm, which can smooth the image while preserving the edge information of the image. In the depth extraction of image information, the anisotropic thermal diffusion equation can be used to remove noise, smooth the image, and retain the details of the image edge, thus improving the quality and clarity of the image.
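To make the edge-preserving behaviour described above concrete, the following is a minimal sketch of classical Perona-Malik anisotropic diffusion in NumPy. It only illustrates the smoothing property of anisotropic diffusion and does not reproduce this paper's depth-from-defocus formulation. The function name, the exponential edge-stopping function, and the parameter values (`n_iter`, `kappa`, `step`) are assumptions chosen for illustration.

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=0.1, step=0.2):
    """Perona-Malik diffusion: smooth flat regions, keep strong edges.

    kappa sets the gradient magnitude treated as an edge; step should
    stay at or below 0.25 for stability with four neighbours.
    """
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # Replicate the border so no flux crosses the image boundary.
        p = np.pad(u, 1, mode="edge")
        d_n = p[:-2, 1:-1] - u   # difference to the north neighbour
        d_s = p[2:, 1:-1] - u    # south
        d_w = p[1:-1, :-2] - u   # west
        d_e = p[1:-1, 2:] - u    # east
        # Conductance exp(-(d/kappa)^2) is near 1 for small differences
        # (noise) and near 0 for large differences (edges).
        flux = sum(np.exp(-(d / kappa) ** 2) * d
                   for d in (d_n, d_s, d_w, d_e))
        u += step * flux
    return u

# Usage: a noisy vertical step edge.
rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 1.0
noisy = clean + rng.normal(0.0, 0.02, clean.shape)
smoothed = anisotropic_diffusion(noisy)
```

Because the conductance falls towards zero across the step, the two sides are smoothed almost independently, which is the property the text relies on when it says edge detail is retained.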
First, the anisotropic thermal diffusion equation is used to extract the depth information image in the monocular vision image. The imaging model of monocular vision image
Among them,
In the above formula, the calculation of
Among them,
The calculation formula of the blur radius of
Where,
For multi focus monocular visual images
Among them,
Where
Among them,
Because
For the thermal diffusion equation
Among them,
Monocular visual depth information of
When extracting the depth information map of monocular visual image of 2.1, take the fixed initial depth value of
When solving the thermal diffusion equation iteratively,
The calculation formula of
The calculation formula of
Among them,
The gradient descent method is used to solve the
Among them,
The calculation formula of
Among them,
The calculation formula of the optimized depth information of
However, there may be some noise and discontinuous areas in the depth information map obtained above, which will affect the subsequent image segmentation and virtualization effect. By smoothing the depth information map, noise and discontinuity in the image can be reduced, which makes image segmentation and virtualization more accurate and natural, and improves the accuracy and effect of subsequent image segmentation and virtualization, so as to better meet the practical application needs. After extracting the depth information map of monocular visual image of
Among them,
Because the extracted depth information map is poor in naturalness and smoothness, the improved bilateral filtering algorithm is used for smoothing to provide a smoother image for the subsequent image background virtualization.
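For orientation, a standard (unimproved) bilateral filter can be sketched as follows. This sketch does not include the improvements this paper makes to the algorithm; it only shows the baseline idea of weighting neighbours by both spatial distance and value difference. The function name and the parameters `radius`, `sigma_s` (spatial-domain standard deviation), and `sigma_r` (range-domain standard deviation) are assumptions for illustration.

```python
import numpy as np

def bilateral_filter(depth, radius=2, sigma_s=1.5, sigma_r=0.1):
    """Baseline bilateral filter on a 2-D array."""
    d = depth.astype(float)
    h, w = d.shape
    p = np.pad(d, radius, mode="edge")
    num = np.zeros_like(d)
    den = np.zeros_like(d)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbour at offset (dy, dx) for every pixel at once.
            shifted = p[radius + dy:radius + dy + h,
                        radius + dx:radius + dx + w]
            # Spatial weight depends only on the offset.
            w_s = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
            # Range weight depends on the difference in depth value.
            w_r = np.exp(-((shifted - d) ** 2) / (2 * sigma_r ** 2))
            num += w_s * w_r * shifted
            den += w_s * w_r
    # den is at least 1 because the centre pixel has weight 1.
    return num / den

# Usage: smooth a noisy depth step without blurring the step itself.
rng = np.random.default_rng(0)
step_map = np.zeros((32, 32))
step_map[:, 16:] = 1.0
smoothed = bilateral_filter(step_map + rng.normal(0.0, 0.02, step_map.shape))
```

A small `sigma_r` relative to the depth difference between layers keeps the layer boundary sharp, while a larger `sigma_s` widens the neighbourhood that contributes to smoothing.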
Use the probability distribution function
Among them, The determination formula of
Determine the standard deviation of the spatial domain
Among them,
Utilize
The automatic hierarchical processing of monocular vision image can separate the foreground and background in the image, so as to realize the fine processing and optimization of the image, highlight the characteristics and effects of foreground objects, and achieve the accurate segmentation and virtualization of the background. Depth value of
Background virtualization
The background obtained by automatic layering of 2.4 sections using Gaussian blur
The calculation formula of
Among them,
In the process of solving
Take
Based on
Using pyramid image fusion method [19], to integrate the foreground image of
Divide Calculate the corresponding local block variance [20] of each direction detail image to be fused, and the formula is as follows:
Among them, Fuse the corresponding detail image [21] through the local variance criterion, so that
Repeat the above steps until all detail images of all foreground layers and background layers are fused [22] to obtain the final monocular image automatic layered background virtualization result.
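The local-variance fusion rule used in the steps above can be sketched as follows. The exact criterion is not recoverable from the text, so this sketch assumes the simplest common form: for each pixel, keep the detail coefficient from whichever input has the larger variance in a small surrounding block. The function names and the 3x3 block size are hypothetical, and the pyramid decomposition itself is not shown.

```python
import numpy as np

def local_variance(img, k=1):
    """Variance over a (2k+1)x(2k+1) block centred on each pixel."""
    h, w = img.shape
    p = np.pad(img.astype(float), k, mode="reflect")
    windows = np.stack([p[i:i + h, j:j + w]
                        for i in range(2 * k + 1)
                        for j in range(2 * k + 1)])
    return windows.var(axis=0)

def fuse_by_local_variance(detail_a, detail_b, k=1):
    """Per pixel, keep the coefficient with the larger local variance."""
    va = local_variance(detail_a, k)
    vb = local_variance(detail_b, k)
    return np.where(va >= vb, detail_a, detail_b)

# Usage: one input has texture on the left, the other on the right.
cols = np.arange(8)[None, :]
checker = np.indices((8, 8)).sum(axis=0) % 2 * 2 - 1   # values +1 / -1
detail_a = np.where(cols < 4, checker, 0)
detail_b = np.where(cols >= 4, checker, 0)
fused = fuse_by_local_variance(detail_a, detail_b)
```

Local variance acts as a sharpness measure, so the rule favours the layer that carries more detail at each location; this is one reasonable reading of a "local variance criterion", not a confirmed reproduction of the paper's formula.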
The data set of monocular vision images in a database is taken as the experimental object, which contains more than 30000 monocular vision images. The data set contains more than 20 categories of monocular visual images, such as vehicles, animals, plants, etc., and the monocular visual image pixels are 1024
In the monocular vision image data set, randomly select a monocular vision image of a vehicle, and use the method in this paper to extract the depth information image in the monocular vision image. The extraction result is shown in Fig. 1.
Extraction results of depth information map.
According to Fig. 1(a) and (b), the method in this paper can effectively extract the depth information map from the original monocular visual image. The extracted depth information map roughly describes the contours in the original image, but its clarity and smoothness are poor. Therefore, the extracted depth information map is then smoothed with the method in this paper, and the smoothing result is shown in Fig. 2.
Smooth processing results of depth information map.
According to Fig. 2, after the depth information map is smoothed with the method in this paper, its edges are clearer and its visual effect is better. Comparing the depth information maps before and after smoothing shows that the smoothed map preserves edge details and features well. In addition, the smoothed depth information map has smoother transitions between different grayscale values and between objects at different depths. The experimental results further validate the advantages of the smoothing step, which preserves the details and features of the image and makes it clearer.
The method in this paper is used to carry out automatic layering according to the extracted depth information image. The result of automatic layering of monocular vision image is shown in Fig. 3.
Results of automatic stratification of monocular visual images.
According to Fig. 3(a) and (b), the method in this paper can effectively and automatically layer monocular visual images to obtain their foreground and background layers. In the subsequent background virtualization process, only the background image needs to be virtualized and the foreground image needs no processing, which greatly reduces the amount of calculation and improves the efficiency of background virtualization. Experimental results show that the proposed method is effective for automatic layering of monocular images.
The method in this paper is used to process the background image obtained by automatic layering, and the Gaussian blur effect of the method in this paper is analyzed under different pixel variances. Pixel variance refers to the variance of pixel values in an image, which is used to describe the distribution of pixel values in an image. The larger the pixel variance is, the greater the difference between the pixel values in the image is; otherwise, the smaller the difference between the pixel values is. The analysis results are shown in Fig. 4.
Gaussian blur processing results of background image.
According to Fig. 4(a), when the variance is 0.1, the Gaussian blur result of the background image is basically the same as the original background image. According to Fig. 4(b), when the variance is 0.3, the Gaussian blur result is only slightly different from the original image, and the virtualization effect is not obvious. According to Fig. 4(c), when the variance is 0.5, the Gaussian blur result is significantly different from the original image, and the virtualization effect is better. According to Fig. 4(d), when the variance is 0.7, the background image is too blurred and the virtualization effect is poor. Overall, the virtualization effect of the background image is best when the variance is 0.5.
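The effect of the variance on blur strength can be explored with a small NumPy sketch of a separable Gaussian blur. It assumes the variance reported above is the variance of the Gaussian kernel in pixel units, which the text does not state explicitly. The function names and the three-sigma kernel radius are illustrative choices.

```python
import numpy as np

def gaussian_kernel(variance, radius=None):
    """Normalised 1-D Gaussian kernel for the given variance."""
    sigma = np.sqrt(variance)
    if radius is None:
        radius = max(1, int(np.ceil(3 * sigma)))  # cover about 3 sigma
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * variance))
    return k / k.sum()

def gaussian_blur(img, variance):
    """Separable Gaussian blur: filter rows, then columns."""
    k = gaussian_kernel(variance)
    r = len(k) // 2
    p = np.pad(img.astype(float), r, mode="edge")
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 1, p)
    return np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 0, rows)

# Usage: blur a single bright pixel at the four variances compared above
# and record how much of its intensity remains at the centre.
impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0
peaks = {v: gaussian_blur(impulse, v)[4, 4] for v in (0.1, 0.3, 0.5, 0.7)}
```

Under this assumption the centre value falls steadily as the variance grows, and at a variance of 0.1 the kernel is almost an identity, which is consistent with the observation that the result at 0.1 is basically the same as the original background image.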
Use the method in this paper to fuse the foreground image and the background image after the virtualization to obtain the final monocular vision image automatic layered background virtualization result, as shown in Fig. 5.
Results of automatic background blurring in monocular visual images.
According to Fig. 5, the method in this paper can effectively fuse the foreground image and the virtualized background image to obtain the final background virtualization result for the monocular vision image. After virtualization, the closer an object in the monocular vision image is, the clearer it is, and the farther an object is, the more blurred it is. In this way, the foreground information in the image is highlighted and the influence of the background information on the foreground information is suppressed, which provides more valuable foreground information for subsequent monocular vision image applications.
The peak signal-to-noise ratio is used to measure the background virtualization effect of this method. The higher the peak signal-to-noise ratio, the better the edge transition between foreground and background after background virtualization, that is, the better the background virtualization effect. The peak signal-to-noise ratio of the automatic layered background virtualization proposed in this paper is analyzed at different focal lengths, and the results are shown in Fig. 6.
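For reference, the peak signal-to-noise ratio is conventionally defined as 10·log10(peak² / MSE), where MSE is the mean squared error between a reference image and the image under test. The sketch below implements that standard definition. The choice of reference image and the peak value of 255 (8-bit images) are assumptions, since the text does not specify them.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    diff = reference.astype(float) - test.astype(float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Usage: compare an image with a slightly perturbed copy.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 16)).astype(float)
perturbed = np.clip(image + rng.normal(0.0, 5.0, image.shape), 0, 255)
value_db = psnr(image, perturbed)
```

Because the measure is logarithmic in the mean squared error, a difference of a few dB between configurations corresponds to a substantial change in pixel-level error.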
Background blur effect of monocular visual image at different focal lengths.
According to Fig. 6, the peak signal-to-noise ratio decreases as the focal length increases at every image resolution. At the same focal length, the higher the image resolution, the higher the peak signal-to-noise ratio of background virtualization, that is, the better the background virtualization effect. At a focal length of 40 mm, the peak signal-to-noise ratio is lowest for all three image resolutions, at about 24, 27, and 28 respectively, yet it is still not lower than the threshold value of the peak signal-to-noise ratio. This indicates that the peak signal-to-noise ratio of background virtualization in this method remains high across different image resolutions and focal lengths, that is, the edge transition between the foreground and the background is good after virtualization. Experimental results show that the method in this paper achieves a good automatic hierarchical background virtualization effect for monocular vision images under different image resolutions and different focal lengths.
In order to further verify the background virtualization effect of the method proposed in this paper, 2000 sample images were selected as the research object, and processing time was selected as the experimental indicator. The method proposed in this paper and the methods of references [8], [11], and [12] were used to perform background virtualization on the sample images. The shorter the processing time, the more efficient the method. The practicality and efficiency of the different methods were evaluated by comparing their processing times. The comparison results are shown in Fig. 7.
Comparison of background virtualization time for different methods.
According to Fig. 7, the time taken by every method rises as the number of processed sample images increases. The method of reference [12] has the longest background virtualization time: when processing 400 sample images, its time is already higher than that of the other three methods. The method proposed in this paper has the shortest background virtualization time and takes only 15 ms when processing 2000 sample images. This is because the method fills small holes in the depth information map with morphology, smooths the map, and automatically layers the depth information, which improves the efficiency of background virtualization and gives the method a good application effect.
Image background virtualization methods have many applications in the field of images. To improve the effect of image background virtualization, this paper studies an automatic layered background virtualization method for monocular vision images based on depth information extraction. Firstly, the depth information map is extracted from the monocular visual image. Based on the extracted depth information, the image is automatically layered to obtain foreground and background images, which makes the automatic layering more reasonable. Then, a Gaussian blur algorithm blurs the background image to achieve the virtualization effect. Finally, the clear foreground image is fused with the blurred background image to complete the background virtualization of the monocular visual image. The method improves the background blur effect, highlights the foreground information in the image, and suppresses the interference of background information on the foreground information, which further improves the application effect of the image in subsequent applications such as target tracking and detection.
