Abstract
This paper discusses an anisotropic interpolation model that fills in depth data in largely empty regions of a depth map. We consider an image with an anisotropic metric
Introduction
Depth maps are widely used in autonomous vehicles, autonomous aircraft, 3D modeling for BIM (Building Information Modelling) systems, and game consoles such as the Xbox. Depth maps frequently present a lack of acquired data or data with low confidence levels. Occlusions and sensor misinterpretation of acquired data (ToF camera, LiDAR, or a Kinect sensor) result in holes in the acquired depth map.
This paper discusses anisotropic depth data completion applied to the interpolation of depth maps in large empty regions. Our proposal aims to fill the empty data region of a depth image using two elements: i) data from the depth image itself and ii) a corresponding color image known as the reference image. This paper presents an empirical analysis of a depth map completion interpolation operator. The biased Absolutely Minimizing Lipschitz Extension, also known as the biased Infinity Laplacian or bAMLE, is an interpolation operator that first appeared in the axiomatic approach proposed in [1, 2].
The AMLE interpolator was introduced in [3, 4] as a completion method from a theoretical point of view. As explained in [2, 1], the infinity Laplacian operator is simple and satisfies a small set of mathematical axioms. This work proposes a computational scheme to solve the biased Infinity Laplacian, using the “eikonal” operator to obtain a numerical implementation. This numerical implementation produces a weighted average-based numerical model of the biased Infinity Laplacian, which is fast and straightforward to implement.
Related works
The goal of the depth completion task is to estimate or recover dense depth maps from sparse data. Some depth completion methods rely solely on the depth data available, while others rely on additional data such as a color reference image of the scene. The goal of methods that use a color reference image (guided methods) is to perform depth completion using a color image while avoiding the creation of new objects that are not present in the original image [5]. Image enhancement [6], depth inpainting [7], filtering, and various other tasks have all benefited from guided methods. The main idea behind guided filters is that meaningful information from the color reference image, such as textures and edges, can be transferred to the incomplete depth map.
A bilateral filter is an image domain-spatial filter that focuses on local features like edges or smoothness of the image to guide the filtering. This filter is frequently used to filter an input image [7] while preserving strong edges [8].
A method that simultaneously recovers a depth map and reconstructs a gray-level image of the scene is proposed in [9]. To perform this task, the authors use convolutional neural networks (CNNs). This scheme outperforms methods that use only depth in the KITTI depth completion suite benchmark [10]. A model to solve the problem of depth completion outside the object in the scene is proposed in [11]. The authors use CNNs and a new representation for depth called Depth Coefficients (DC). This representation lets them avoid inter-object depth mixing. The authors of [5, 12] evaluated convolutional networks on a database containing depth images captured by a Kinect sensor. The authors used CNNs to complete the data after down-sampling these depth images every other 8 pixels square.
An approach to classify disturbed depth data by learning an input trust estimator based on normalized convolutional neural networks (NCNNs) is presented in [13].
The AMLE filter has been applied to the completion of elevation models [14] and of optical flow [15].
The bAMLE was applied to complete depth data in [16], but the authors did not test different metrics or color spaces.
Recently, in [17], the interpolation properties of the AMLE operator were applied to flow densification and large hole completion in optical flow. The work in [18] presents a method that simultaneously performs semantic segmentation of the scene and depth completion under a multi-task deep learning framework. The authors proposed a scheme with a single encoder used both for semantic segmentation and for depth completion. The authors introduced boundary features used in the decoder. An extra boundary module generates these boundary features, which are used to construct a cross-task joint loss function for the training stage. Experimentally, the authors demonstrated that the proposal can jointly improve both estimations. A method to complete depth maps using a binary anisotropic diffusion tensor is proposed in [19]. The authors also proposed an image-guided nearest neighbor search used by the binary tensor. That paper proposed a variational scheme that consists of a data term and a regularization term. Due to its anisotropy, the proposal preserves discontinuities between different objects in the image of the scene, i.e., it generates piecewise constant depth maps. Results show that the method performs well in recovering flat surfaces.
The work in [20] presents a deep learning approach to solve the problem of depth completion. The authors use a depthwise separable technique to reduce the number of parameters of the network. They use that technique in the convolution and deconvolution stages. They reduced the number of parameters by 96% while keeping performance similar to state-of-the-art methods. Finally, they implemented their proposal on an FPGA, reaching a processing rate of 11 images per second.
This paper is an extended version of our original manuscript presented in [16].
The contributions of this extended manuscript are threefold:
Test the numerical implementation of the AMLE and bAMLE using new metrics, which use a fractional exponent (
Test the implementation using different color spaces (sRGB, XYZ,
The best parameters for the proposal were estimated using PSO (Particle Swarm Optimization) and EHO (Elephant Herd Optimization), and both performances were compared. We filtered the final estimated depth with a median filter, increasing the performance of the whole estimation.
In Section 2 we present the biased infinity Laplacian (bAMLE) that we use to complete sparse disparities. In Section 2.2 we explain our numerical implementation of the bAMLE, and Section 4 presents the performance of our proposal on KITTI [21]. Finally, Section 5 presents our conclusions.
The bAMLE (biased infinity Laplacian) first appeared in the analysis of interpolators in manifolds [1, 2], but it was not evaluated there. In the context of sparse depth map completion, the bAMLE interpolator is very efficient in filling in blank areas in depth maps.
Dealing with the problem,
as well as the constraint
where
where
Let
Let us describe a particular case which is of interest for us here. Assuming that
In this case,
The AMLE is usually obtained as the limit of
where
Considering the discrete grid as a graph, let us take a pair of grid points
where
Let us take a curve
Given a pair of points
The distance
The discretization model proposed in [16] is fundamental for the proposal we recall here. Given a point
Based on the “eikonal” operator, the AMLE following [23] is given by:
The discretized version of the biased Infinity Laplacian corresponds to:
with
The numerical implementation for the iterative discretized biased Infinity Laplacian is:
with
In practice, we have used the distance
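The weighted-average structure of an eikonal-based AMLE iteration can be illustrated with a minimal one-dimensional sketch. It assumes the commonly used discrete formulation in which, at each unknown point x, the neighbors y and z that maximize and minimize the slope (u(p) − u(x))/d(x, p) are selected and u(x) is replaced by (d(x, z)u(y) + d(x, y)u(z))/(d(x, y) + d(x, z)); this may differ in detail from the scheme used in our implementation. The names `amle_1d` and `guided_dist` and the constant `kappa` are hypothetical, and the bias term of the bAMLE is not included.

```python
import numpy as np

def amle_1d(u0, known, dist, n_iter=500):
    """Gauss-Seidel sweeps of a discrete AMLE update on a 1D grid (illustrative sketch)."""
    u = np.asarray(u0, dtype=float).copy()
    n = len(u)
    for _ in range(n_iter):
        for x in range(n):
            if known[x]:
                continue  # samples that carry data stay fixed
            nbrs = [p for p in (x - 1, x + 1) if 0 <= p < n]
            # Slopes toward each neighbor, measured with the chosen distance.
            slopes = [(u[p] - u[x]) / dist(x, p) for p in nbrs]
            y = nbrs[int(np.argmax(slopes))]  # steepest ascent neighbor
            z = nbrs[int(np.argmin(slopes))]  # steepest descent neighbor
            d_xy, d_xz = dist(x, y), dist(x, z)
            # Distance-weighted average that balances both slopes.
            u[x] = (d_xz * u[y] + d_xy * u[z]) / (d_xy + d_xz)
    return u

def guided_dist(guide, kappa=100.0):
    """Hypothetical anisotropic distance: spatial term plus guidance difference."""
    g = np.asarray(guide, dtype=float)
    return lambda x, p: abs(x - p) + kappa * abs(g[x] - g[p])

known = np.array([True, False, False, False, True])
depth = np.array([0.0, 0.0, 0.0, 0.0, 4.0])
flat = amle_1d(depth, known, guided_dist(np.zeros(5)))      # constant guidance
edge = amle_1d(depth, known, guided_dist([0, 0, 0, 1, 1]))  # guidance edge between samples 2 and 3
```

With a constant guidance signal the sketch reduces to linear interpolation between the two known samples, whereas with a guidance edge almost all of the depth change concentrates across that edge, which is the behavior expected from an anisotropic metric.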
Temporal extension
We have extended the AMLE and the biased AMLE to handle temporal information. Let us consider two consecutive frames
Color images and depth map for a video sequence. In the color images, a red balloon moves from left to right. We show the optical flow as a black arrow in frame 
We show in Fig. 1 a red balloon that moves from left to right. We also show the depth map of the balloon. There is a hole in the depth map that moves jointly with the balloon. By warping
Color images and depth map for a video sequence. In the color images, a red balloon moves from left to right. The black arrow represents the optical flow. We show the depth map for the moving red balloon, together with a hole in the depth map. In this example, the hole moves differently from the object, which means that additional information can be obtained by compensating the depth map with the optical flow.
We have at our disposal a depth map
The metric presented in Eq. (6) considers a reference image in a specific color space. Our main idea is to evaluate the performance of the bAMLE model using several image color spaces. Human color perception arises from three types of eye receptors, which respond to a combination of three stimuli: red, green, and blue. Different color representation models for digital images have been proposed, taking into account aspects of human perception. We present transformations between the standard RGB (sRGB) model and other models.
sRGB to XYZ
In this space,
Computing variations in each component we have:
and,
then
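As an illustration of the sRGB to XYZ conversion, the following minimal sketch assumes the standard sRGB linearization followed by the linear map for sRGB primaries with the D65 white point. The function name `srgb_to_xyz` is hypothetical, and the exact variant used in our experiments (for example, with or without the gamma step) may differ.

```python
import numpy as np

# Linear-RGB to XYZ matrix for sRGB primaries and the D65 white point.
M_SRGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])

def srgb_to_xyz(rgb):
    """Convert sRGB values in [0, 1] (last axis = R, G, B) to XYZ."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB transfer function to obtain linear RGB.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Apply the 3x3 matrix to every pixel.
    return linear @ M_SRGB_TO_XYZ.T

white = srgb_to_xyz([1.0, 1.0, 1.0])
```

The same function accepts a whole image of shape (height, width, 3), since the matrix product acts on the last axis.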
This color space is composed of
and the nonlinear transformation:
Where
with
and,
Then
We have estimated the parameters of AMLE (radius,
PSO algorithm
This algorithm optimizes a function by iteratively improving many candidate solutions. In each iteration, the algorithm updates those candidates according to their current positions and velocities. In our case, the candidate solutions are different sets of model parameters and the function to optimize is the depth estimation error. Let
Each candidate solution (
and,
where
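The position and velocity updates can be sketched with the textbook PSO formulation, in which each velocity combines an inertia term, an attraction toward the particle's own best position, and an attraction toward the best position of the swarm. This is a minimal sketch and not our exact configuration: the function name `pso`, the coefficients `w`, `c1`, `c2`, and the toy objective are assumptions made for illustration.

```python
import numpy as np

def pso(objective, lower, upper, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook particle swarm optimization over a box (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                  # best position of each particle
    pbest_val = np.array([objective(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]    # best position of the swarm
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)          # keep parameters in range
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

# Toy stand-in for the depth estimation error: distance to a hypothetical optimum.
target = np.array([3.0, 0.5])
best, best_val = pso(lambda p: float(np.sum((p - target) ** 2)), [0.0, 0.0], [10.0, 2.0])
```

In practice the objective would run the interpolator with the candidate parameters and return the resulting depth estimation error; clipping to the box is one simple way to enforce parameter restrictions such as non-negativity.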
Inspired by the behavior of an elephant herd, the EHO algorithm estimates the best solution to an optimization problem given a set of random solutions in an iterative scheme. Each possible solution is represented by an elephant in a clan. Each clan has a matriarch, which is the individual that presents the best performance in the clan. In each iteration, a fixed number of elephants abandon the clan. Let
where
and the center
where
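The clan update, the matriarch update based on the clan center, and the separation step can be sketched as follows. The sketch assumes the commonly cited EHO formulation, with one elephant leaving each clan per iteration; the function name `eho`, the scale factors `alpha` and `beta`, and the toy objective are hypothetical and do not reproduce our training configuration.

```python
import numpy as np

def eho(objective, lower, upper, n_clans=5, clan_size=10, n_iter=50,
        alpha=0.5, beta=0.1, seed=0):
    """Elephant Herd Optimization over a box (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    clans = rng.uniform(lower, upper, size=(n_clans, clan_size, dim))
    best, best_val = None, np.inf
    for _ in range(n_iter):
        for c in range(n_clans):
            clan = clans[c]
            vals = np.array([objective(e) for e in clan])
            m = int(np.argmin(vals))            # index of the matriarch
            center = clan.mean(axis=0)          # clan center
            matriarch = clan[m].copy()
            # Clan update: every elephant moves toward the matriarch.
            r = rng.random((clan_size, dim))
            new = clan + alpha * (matriarch - clan) * r
            # Matriarch update: scaled clan center.
            new[m] = beta * center
            new = np.clip(new, lower, upper)
            new_vals = np.array([objective(e) for e in new])
            # Separation: the worst elephant leaves and is replaced at random.
            worst = int(np.argmax(new_vals))
            new[worst] = rng.uniform(lower, upper)
            new_vals[worst] = objective(new[worst])
            clans[c] = new
            k = int(np.argmin(new_vals))
            if new_vals[k] < best_val:          # keep the best solution seen so far
                best, best_val = new[k].copy(), new_vals[k]
    return best, best_val

best, best_val = eho(lambda p: float(np.sum(p ** 2)), [-5.0, -5.0], [5.0, 5.0])
```

Note that the matriarch update scales the clan center by `beta` and therefore contracts toward the origin, so the toy objective used here, whose minimum is at the origin, is a favorable case for this formulation.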
We have implemented the algorithm on a GPU. Our implementation is not optimized, but it runs fast, reaching 0.096 seconds per iteration on images of 1216
Example of RGB reference and depth images of the KITTI dataset. (a) and (b) Examples of color reference images of the KITTI dataset. (c) and (d) Corresponding depth images of color reference images (a) and (b), respectively. Depth was color-coded using the jet colormap in MATLAB.
In this section, we present the pseudocode of our algorithm to complete depth maps using bAMLE. The algorithm takes the parameter values, the metric, the color space, and an occlusion mask indicating the interpolation region.
Depth completion using bAMLE
Parameters
One color image
Completed depth out
Initialization
Determine
Determine
Compute
Update
Data set and experiments
The KITTI depth completion suite includes 1000 RGB reference images as well as the corresponding depth ground truth. This ground truth is a semi-dense depth obtained from raw LiDAR scans.
In Fig. 3a and b we show two reference images. Figure 3c and d show the sparse depth images of the corresponding reference images.
MPI-Sintel is a publicly available synthetic data set constructed to evaluate optical flow estimation algorithms. The data set consists of a training set and a validation set. The training set is also divided into two subsets: Clean and Final. The data set contains sequences of synthetic images where different effects are present: occlusions, small and fast displacements, blur, illumination changes, fog, rapid camera motion, and many others. Figure 4 shows examples of consecutive images of the MPI-Sintel Final set.
Examples of MPI-Sintel with color-coded ground truth. (a) and (b) Frames 6 and 7 of sequence ambush_4, respectively. (c) Color-coded optical flow. (d) and (e) Frames 18 and 19 of sequence market_6, respectively. (f) Color-coded optical flow. (g) and (h) Frames 30 and 31 of sequence temple_3, respectively. (i) Color-coded ground truth.
The final data set contains around 1000 images where the optical flow ground truth is available.
For considered metric (
Training set. The third row shows the location of the sparse depth (yellow points) superimposed on the reference image.
We keep the remaining 997 reference color images and depth ground truths to validate the model.
The AMLE has the following parameters: neighbor size (radius), spatial constant
The evolution curve for the PSO. The first row shows the performance of the PSO algorithm when estimating the parameters of AMLE for 
Evolution of the EHO algorithm optimizing MSE 
Table 1 shows the final error for each considered model.
MSE
For the bAMLE interpolator, we also used the PSO algorithm to estimate its parameters. We show in Table 2 the final performance value of the bAMLE model.
In general, we observe in Table 1 that the best training-stage estimation value was obtained for the metric
In Table 2 we selected
Additionally, we have estimated the parameters of the bAMLE model using Elephant Herd Optimization to compare the performance of different parameter estimation methods. For this task we considered 5 clans and 10 elephants per clan. We considered
MSE+MAE obtained by the EHO algorithm training bAMLE for the
metric and different color spaces
Results obtained by bAMLE model with RGB color space and 
We show in Fig. 7 the evolution of the best individual in each generation in the optimization using 30 iterations, 50 elephants, and 5 clans.
In the first row of Fig. 7 we show the evolution of the EHO algorithm minimizing MSE
Comparing the performance of the PSO and EHO algorithms in Tables 2 and 3, we observe that PSO performs slightly better than EHO in most of the color spaces, using the metric
Results of depth completion on KITTI dataset
We show in Table 4 the final results obtained by the selected models in the test KITTI dataset. We have completed 997 incomplete depth maps of the data set.
Results obtained by different methods in KITTI depth completion validation set
Table 4 shows that the bAMLE outperforms the AMLE model. The best performance was obtained by bAMLE using the sRGB color space and the square root metric. Both methods rank in the middle of the KITTI ranking. Methods that present better performance use CNNs [26], which are more complex to implement than our proposal and take hours to train. Figure 8 shows examples of interpolated depth images using the bAMLE model.
AMLE and bAMLE using CMY and
Color-coded optical flow obtained using the model proposed in [25]. (a) Estimated optical flow for the ambush_4 sequence. (b) Optical flow for the market_6 sequence. (c) Optical flow for the temple_3 sequence. (d), (e) and (f) are the estimated occlusion-disocclusion masks for sequences ambush_4, market_6, and temple_3, respectively.
As the last stage of depth estimation, we filtered the model output
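The median filtering stage can be illustrated with a minimal NumPy sketch. The function name `median_filter2d` and the 3 × 3 window are assumptions made for illustration, since the window size used in our experiments is not restated here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter2d(depth, size=3):
    """Median filter with an odd square window and edge replication (sketch)."""
    pad = size // 2
    padded = np.pad(np.asarray(depth, dtype=float), pad, mode="edge")
    # One (size x size) window per output pixel.
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))

depth = np.full((5, 5), 10.0)
depth[2, 2] = 80.0                 # isolated outlier in the completed depth
smoothed = median_filter2d(depth)  # the outlier is replaced by its neighborhood median
```

An isolated outlier such as the one above is removed, while regions of constant depth are left unchanged.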
Results obtained by different methods in the KITTI depth completion validation set
Following ideas in [17], we applied the bAMLE to complete optical flow. The optical flow of a video sequence is the displacement of the pixels from the reference image (current image) to the target image (next image). Occluded or disoccluded points cannot be matched between two consecutive images. The method in [25] has an occlusion-disocclusion estimator based on large values of the optical flow estimation error. We detected the occlusions and disocclusions and constructed a binary mask with this information, which is used to eliminate the estimated optical flow in those regions. Then, each component of the optical flow is interpolated as in [17], but in our case using bAMLE.
Using the robust optical flow method presented in [25] we estimated the optical flow of the sequences in Fig. 4, and we show the obtained results in Fig. 9.
Using the results presented in Fig. 9 we computed the end-point-error (EPE) and average-angular-error (AAE) for each optical flow estimation according to the equations:
where
Thus, the obtained EPE and AAE are presented in Table 6.
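For reference, the two error measures can be computed as in the following minimal sketch, which assumes the usual definitions: the EPE is the mean Euclidean distance between estimated and ground-truth flow vectors, and the AAE is the mean angle between the space-time vectors (u, v, 1) and (u_gt, v_gt, 1), reported here in degrees. The function name `epe_aae` is hypothetical.

```python
import numpy as np

def epe_aae(u, v, u_gt, v_gt):
    """End-point error and average angular error (degrees) of a flow field."""
    u, v, u_gt, v_gt = (np.asarray(a, dtype=float) for a in (u, v, u_gt, v_gt))
    epe = np.mean(np.sqrt((u - u_gt) ** 2 + (v - v_gt) ** 2))
    num = 1.0 + u * u_gt + v * v_gt
    den = np.sqrt(1.0 + u ** 2 + v ** 2) * np.sqrt(1.0 + u_gt ** 2 + v_gt ** 2)
    # Clip guards against rounding slightly outside [-1, 1] before arccos.
    aae = np.mean(np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0))))
    return epe, aae

# One pixel: estimated flow (1, 0) against ground truth (0, 1).
epe, aae = epe_aae([[1.0]], [[0.0]], [[0.0]], [[1.0]])
```

For this single-pixel example the end-point error is the square root of 2 and the angular error is 60 degrees.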
EPE and AAE for video sequences extracted from MPI-Sintel
We take into account the occlusion-disocclusion mask and the estimated optical flow. We did not consider the values of the estimated optical flow at the occlusion-disocclusion points. We created an artificial hole in each optical flow component, and then we filled in the holes using bAMLE as an interpolator. We show in Fig. 10 the hole added to the optical flow. In the sequence market_6, the EPE increases from 2.17 to 2.21, a difference of 0.04, which is very small. In the other two cases, ambush_4 and temple_3, the EPE drops from 42.03 to 41.92 and from 22.95 to 22.87, respectively.
Holes added to the optical flow estimation, used to show other possible applications of the proposal. The holes are presented in white. (a) Holes added to the color-coded optical flow for sequence ambush_4. (b) Holes added to sequence market_6. (c) Holes added to sequence temple_3.
After completing each component of the optical flow, we computed the optical flow estimation errors EPE and AAE, as shown in Table 7. As we see in Table 7, the average EPE and AAE are lower than those obtained by the original algorithm, presented in Table 6.
EPE and AAE for the completed video sequence extracted from MPI-Sintel
We have evaluated AMLE and bAMLE in the depth completion task for different color spaces and different metrics. We have selected the most frequently used color spaces such as s
Also, in the training stage, we have compared the performance of the PSO and EHO algorithms for parameter estimation. We observed that the EHO method performs similarly to PSO, without a noticeable improvement in the results. Parameters present some restrictions; for example, the iteration number cannot be negative, and neither can the
We tested different metrics giving values to the
We also presented in this manuscript an extension of our proposal to the temporal domain, which will be evaluated in future work. The proposal considers the use of a sequence of reference images, a sequence of depth maps, and the optical flow of the video sequence.
Our proposal can be extended to other domains, such as optical flow, which can be explored in future work. Another possibility is to use the proposal as an up-sampling tool in image scale pyramids.
Conclusions
We have evaluated our implementation of AMLE and bAMLE in the KITTI depth completion suite, using different metrics and color spaces for the reference image. The combination of a metric based on the square root (
