Abstract
The traditional point cloud alignment method suffers from drawbacks such as a large extensive computation, low computing speed, and poor alignment accuracy. To overcome these problems, this paper proposes a fast and highly accurate algorithm based on fast point feature histograms (FPFH) algorithm and spatial constraints. The proposed algorithm first filters, denoises the point cloud dataset, and calculates the point cloud normal to obtain the FPFH eigenvalue. Then, the vertebral space is divided into three regions according to its location, and the feature points in each region are calculated. The Euclidean distance between a feature point and the boundary of the adjacent region, and the weight coefficient corresponding to the feature point are given according to the calculated distance. The method overcomes the defect that the query feature point has a large workload in the traditional ICP algorithm and improves the registration precision of the point cloud. The experimental results show that the proposed method effectively solves the problems of the traditional point cloud registration algorithm, can effectively reduce the mismatch rate of point cloud registration, and can improve the registration accuracy and stability without reducing the registration of the elements.
Keywords
Introduction
Computer vision [1–3], also known as machine vision, is a technique for analyzing visual information to identify objects or to find key features of the data and is commonly used in object tracking, monitoring and statistical systems, gestures and face recognition, automatic robot navigation, and other fields.
The most important application of machine vision is 3D reconstruction [4–6], which can obtain 3D models of local or global scenes. The practical applications of 3D reconstruction include different areas of human activity, and even some nontechnical areas may overlap. For example, in the medical field, doctors can perform 3D reconstruction of the lesions in the patient’s body using ultrasound tomographic images, for accurate diagnosis; in the preservation of historically valuable cultural relics such as buildings and statues, 3D reconstruction can be used. Ways to digitally manage it; in object models in movies, or products or product advertisements in video games, 3D reconstruction can create real objects and personas easier, creating a real-world 3D model that can be replaced by manual creation. In the factory or agricultural park, product quality can be controlled. In robotics, a 3D scanner can provide an environmental map for the robot to help the robot locate and move in an unknown space.
The key issue in registration is to find the coordinate transformation matrix. At present, the point cloud registration technology is still immature, and the fast and accurate point cloud registration method is based on research hotspots and difficulties in the field of 3D modeling and reverse engineering. The most classic algorithms are the iterative closest point (ICP) algorithm [7, 8] proposed by Beal and Mckay and its improved algorithm. These two algorithms iteratively search for the nearest point between two point clouds to establish the corresponding matching relationship. However, when the difference between the initial position and the point clouds is large, the search process is quite time-consuming and will fall into a local extremum, and thus the algorithm has a longer calculation time and is less efficient for dense point cloud data.
To this end, this paper proposes an improved 3D ICP matching algorithm based on the improved 3D ICP matching algorithm, which divides the camera’s vertebral body region and uses the Euclidean distance as the characteristic of each region. Each point is given an appropriate weight coefficient, which is used in the mean square error function to solve the problem, effectively improving the quality of the 3D reconstruction obtained by the continuous multiframe image after point cloud registration. The point cloud data collected by the 3D camera were tested and an ideal result was obtained.
Methodology
Fast Point Feature Histograms (FPFH)
The FPFH [9, 10] algorithm is an improved version of the original method, namely the point feature histograms or persistent feature histograms (PFH) algorithm. PFH are informative pose-invariant local features that represent the underlying surface model properties at a point D. Their computation is based on a combination of certain geometrical relations between the nearest k neighbors of D. Figure 1 shows an influence region diagram of PFH computation for a query point D q , which is placed in the middle of a circle (sphere in the 3D case) with radius r, and all its k neighbors (points with distances smaller than r) are fully interconnected in a mesh.

Neighborhood of PFH.
The key concept underlying the PFH algorithm is to calculate the angle and positional relationship between the corresponding normal vector pair (n
t
, n
s
) for each point pair (D
t
, D
s
) in the neighborhood of D
q
. The UVW local coordinate system shown in Fig. 2 is usually created using one of these two points (for example, D
t
) as the source, and the following parameters are calculated:
Relationship of the features of two points.
The main steps to calculate the PFH of D q are as follows:
For each point D q , all its neighbors enclosed in a sphere with a given radius r are selected (k-neighborhood).
For every pair of points (D t , D s ) in the k-neighborhood of D q and their estimated normals (n t , n s ) (Dt being the point with a smaller angle between its associated normal and the line connecting the points), we define a Darbouxuvw frame and compute the angular variations of (n t , n s ).
The α, β, θ values are divided into i intervals. Then, we have a total of i3 combinations of (α, β, θ).
A certain point falls into different i3 intervals according to its α, β, θ values and uses different distributions of different points in the intervals as its characteristic description.
The FPFH algorithm reduces not only the amount of calculation but also the time complexity of PFH from O (nk2) to O (nk). It increases the range of neighborhood selection. Its k-neighborhood is shown in Fig. 3. The radius of the neighborhood is a maximum of 2r, including the k-neighborhood of the sample point itself and the neighborhood of all the points in the neighborhood. The synthesized histograms are simplified by decomposing the triples.

Neighborhood of FPFH.
The FPFH k-neighborhood of D
q
is shown in Fig. 3. For D
q
, the FPFH calculation process is as follows: For each query point D
q
, we compute only the relationships between itself and its neighbors. We will call these the simplified point feature histograms (SPFH). For each point, we re-determine its k neighbors and use the neighboring SPFH values to weight the final histogram of D
q
(called FPFH).
When we process scene information captured by a camera, we usually focus on point cloud data that can represent the scene. The computer binarizes the real-world information it captures, where 1 represents the space occupied by the object in the 3D space and 0 represents the remaining space. However, with the information obtained by a depth camera, we can encode the 0 space for additional information. When the depth camera captures the point cloud information of an object, the light reflected by the object will be on the same line as the light emitted by the camera sensor, which means that there is no other object in this part of the 3D space. This information allows us to increase the speed and accuracy of reconstruction. We convert the original binarized representation to a ternary representation— that is, the space represented by 0 depends on whether or not there is an re-scratched object— and the part where the object is not known to exist is called a negative space.
The vertebral body denotes the computer graphics that represent the space observed by the camera in the scene, usually in the form of a pyramid. In theory, the area of this pyramid is infinite, but in practical applications, owing to technical limitations, 3D camras have limited observation distances and can only capture image information in a limited space as shown in Fig. 4.

Two-dimensional image of the camera’s vertebral body.
As shown in the Fig. 5, we divide the negative spatial information, that is, the spatial region represented by 0 in binarization into three categories: Shadows, which represent areas of the space that block the light behind the object being detected. Holes, which represent areas of the space where no objects are detected. Negative volume, the area where there is no object between the camera and the detected object.

Classification of unknown space.
The key step in cloud registration is to find the transformation matrix. Two point cloud data sets P L and P R are known, and the objective is to find the transformation matrix T such that P L and P R are aligned. First put P L and P R into the same frame of reference, and find the corresponding point iteratively, so that P L gradually approaches P R until the two point clouds are aligned. The most ideal case for registration is that each point of P L exactly coincides with a corresponding point in P R . Before full registration, there will be some points that cannot be aligned, that is, these points exist in the negative space of P R .
In the standard binarization method of space, the position of the point does not provide additional information. However, after introducing the concept of negative space, we can think that the point in the space should not exist in the negative space because this part should be blank. Therefore, we can try to eliminate these points from the negative space. The deeper the point is in the area, the more important it is to remove it from its current location. To accurately define the “depth” of a point in a negative space, a new concept, the negative boundary distance (NBD) is defined, which is the distance of the point p from the boundary in the negative space V.
The introduction of negative space does not affect the progress of the corresponding point search: it only changes the weight of each corresponding point.
The absolute direction problem without weight can be calculated by the least squares iteration to calculate the optimal coordinate transformation, namely the rotation matrix and the translation vector, so that the error function is minimal:
Which
After introducing a negative space constraint, the weight coefficient of the corresponding point will be brought to the error function:
Which ωi > 0 ∀1 ⩽ i ⩽ k
P
L
, P
R
, and W are known to try to get the conversion relationship by solving the minimized equation T:
The Arunmethod [13] is improved by SVD decomposition. First, the partial differential solution is solved to obtain the Equation (13):
Let
Bring the
We find that the translation matrix of Equation (16) has been replaced by a rotation matrix, then we only need to solve the rotation matrix, let
Using the matrix relationship, where
The rotation matrix R can be obtained by the SVD decomposition by the Equation (18), thereby further calculating the translation matrix
In general, a negative space appears as a polyhedron with planes, straight edges, sharp corners, and vertices in three dimensions. Calculating NBD is essentially finding the face closest to a point in the polyhedron. However, such problems are too complicated to obtain the expected NBD value. At the same time, the construction of the polyhedron will reduce the reconstruction process. To this end, we propose three assumptions: The point in the point cloud and the point on the boundary of the vertebral body define a negative space boundary. The vertebral body boundary is a plane in the 3D space, and the point cloud is formed by pixel depths stored in a depth map (a 2D matrix with finite value range values). This can greatly simplify the problem. Point cloud data are represented by discrete points rather than continuous 3D objects. Whether the algorithm is effective will depend on the specific method. The NBD value does not have to be calculated accurately. It is only the weight of the corresponding point, and the corresponding point itself does not directly depend on the NBD value. The problem can be further simplified after allowing for an approximate solution.
The point cloud data P L and P R are registered; that is, the point corresponding to P L is found in P R . Therefore, we want to calculate the NBD of the point set in P L for the negative space formed by the point cloud P R . We will solve this problem by using a 2D depth map instead of a 3D image. Because the properties of the camera are known, it is easy to determine the position (at which pixel) of any 3D point on the depth map and the depth corresponding to that pixel.
First, we need to find whether the point is in the negative space. If the point satisfies one of the following conditions, it is not in the negative space: The point exceeded the boundary of the vertebral body. In this case, it will be outside of the depth map, which means that the calculated pixel position will exceed its boundary. The point is located in the hole; then it will be on a pixel at an undetermined depth. The point is behind.
The depth value calculated by the corresponding point in these cases will be greater than the depth of the position in the original image.
After eliminating the points outside of the negative space, it is defined as the point on the boundary between the negative space and its surroundings, so that the distance between them can be calculated. The boundary belongs to one of the following four situations: Vertebral body boundary The set of points in the boundary between the negative cloud and the occluded portion of the space Hole boundary, which is a part between the hole and the negative space A shadow boundary that divides a portion of the shadow and the negative volume space
When calculating the (approximate) NBD, we can check all of these cases and calculate the minimum distance as the NBD value.
A. Sharp vertebral body boundary. You only need to calculate the distance from the point to the 3D plane.
B. The distance to the point cloud. Add the points in the kd tree to perform a nearest neighbor search. Both this and the previous classes give accurate results.
C. The distance to the hole boundary. Find the nearest hole (pixel with missing depth data) on the depth map to the pixel position solution. Again, you can use the kd tree. The specific steps are as follows: Find the pixel with the undetermined depth value closest to the location. Calculate the set of points formed if the hole pixel has some arbitrary positive depth value. Consider the line formed by the point set and the camera origin. This line represents the “beam” that starts at the camera sensor and travels along the hole. Calculate the distance to and from the hole in the 3D space.
The nearest hole found does not have to be the boundary of the negative space.
The results obtained in such cases are approximate. Because we search for the nearest hole boundary in the projected 2D space, the boundary does not retain the distance of the original 3D space. Figure 4.6 shows an example of a 1D case that can lead to inexact solutions. The approximation factor of the result depends on the camera’s viewing angle and is equal to. However, this is only an extreme case and the inaccuracy expected in practice is small.
D. The distance to the shadow boundary. This is the most complicated case. First, all boundaries must be determined. This can be accomplished by examining the depth map of adjacent pixels with sufficient depth differences. Check all adjacent pixels in the vertical and horizontal directions and compare their depth values. To consider missing depth values that may appear in the data, a specified number of pixels without a specified depth may be skipped when checking the corresponding points, allowing boundaries to be created between pixels that are not directly corresponding to the points. The boundary formed between the pixels includes the position of the boundary on the depth map and the depth range of the coverage. When searching for the nearest shadow boundary of a point, the boundary position in the depth map can be searched in a manner similar to the previous hole. However, the boundary found may be independent of the point because its depth may be different from the depth range of the boundary. Hence, the range buckets are introduced. These buckets divide the space into multiple subranges along the depth axis. The depth range of the boundary stored by each bucket intersects the range covered by the bucket (see, for example, Fig. 7). Each bucket then organizes its boundaries again in the kd tree in a similar manner as before, allowing for an efficient search for the nearest boundary in the depth map. However, the segment itself does not ensure that the boundary found is within the range of inclusion points. Because multiple boundaries are found, we are allowed to iterate over multiple boundaries until we find a boundary with the appropriate depth range. In the worst case, many boundaries must be queried to find the closest boundary, as shown in Fig. 8. However, in practice, this is rare and the appropriate minimum depth difference between adjacent pixels, the number of buckets, and the number of nearest boundaries examined can be almost completely eliminated. The specific methods can be summarized as follows: Detect the boundary between adjacent pixels of the depth map and iterate the horizontal and vertical corresponding points using some minimum depth difference required to form the boundary. A few pixels can be skipped to avoid losing the depth data, thus breaking boundary detection. All detected boundaries are assembled and added to their respective buckets. Boundaries can be added to multiple buckets based on their depth range: the larger the range, the more buckets it can cover. The boundaries in the bucket are stored in the kd tree according to their depth map locations, allowing an efficient search for the closest depth map location. To find the closest shadow boundary to the point, select the appropriate bucket based on the depth of the point. Then consider the depth map location to search for the nearest boundary. Check each boundary. If the depth of the point is within the depth of the boundary, check the distance to line AB, where A and B are the 3D points that originally formed the boundary. Finally, mark the compatible boundary closest to the point as the closest shadow boundary.

A distance map to four types of space.

Divided into equal depth buckets.

Algorithm flow chart.
Start with a simple breadth-first search from each pixel with an unknown depth. If the connection area of these pixels is small enough to be considered an error, the area is marked. Additional conditions can be used to allow marking of larger areas as their shape is stretched.
Once the NBD is found, it can be used for the corresponding point weights in the ICP process. However, the distance values themselves are not suitable for weighting, and among many different functions, functions that grow rapidly for smaller values and slower for larger values have proved to be the most effective. Therefore, a square root function is used, where the default value of 1 is represented as a correspondence with points other than the negative space and is represented as a larger value when the point is in the negative space.
The algorithm proposed in this paper is divided into two parts, feature extraction and point cloud registration. First, use the Kinect depth camera to obtain the 3D depth of field data and perform point cloud preprocessing to eliminate noise points in point cloud data arising from equipment accuracy and environmental factors. The feature extraction link adopts the FPFH algorithm, first obtaining the normal information of the 3D point cloud and then using its normal information to extract the FPFH features of the point cloud to find the feature points in the continuous frame image. The point cloud registration link first uses the random sampling consistency (RANSAC algorithm [11, 14]) to obtain the initial transformation matrix of the point cloud and performs an improved ICP precise registration based on the rough matching. Finally, the greedy projection triangulation algorithm is used to triangulate the point cloud to obtain the reconstructed surface [15–18].
The registration process outlined in this paper is shown in Fig. 6. The key to the registration process is the extraction of feature points and the calculation of feature descriptors to obtain a more accurate initial registration position to ensure the registration speed and accuracy of the subsequent operations.
Results and discussion
To facilitate the analysis, the indoor algorithm is used to reconstruct the indoor scene in three dimensions and compared with the traditional reconstruction algorithm. The experimental environment is a CPU i7-6700HQ 2.6 GHz, 8 GB memory, Windows 10 system, the simulation platform is Visual Studio 2013, and the image acquisition device is the Kinect 1.0 depth camera developed by Microsoft. The reconstruction results and analysis are as follows.
Figure 9 shows the four-frame continuous picture acquired by the Kinect 1.0 camera. Figure 10 shows the results of the feature matching of the ICP algorithm before and after the two improvements to the indoor scene. The mismatch rate of the traditional ICP algorithm can be clearly seen. The improved algorithm adopted in this paper effectively improves the accuracy of point cloud registration.

Continuous four-frame image acquired by Kinect.

Point cloud alignment with different algorithms.
Figure 9 is a comparison of point cloud registration using several algorithms. Among them, a is the traditional ICP to directly register the point cloud dataset, b is the ICP cloud registration result graph after extracting the feature points in the literature [15], and c is the registration map using the algorithm of this paper. The time taken by the three algorithms is shown in Table 1. From the experimental results and the table, we can see that traditional ICP cannot directly process point cloud data for large volumes of data, while the algorithm outlined in [15] processed a larger data set. Despite the improvement, shortcomings such as inaccurate point cloud registration and high mismatch rate still exist. The registration result is far from the requirement of the subsequent scene reconstruction. The time spent in the algorithm is not much different from that in [15], and the improved ICP point cloud registration algorithm can obtain a more accurate image reconstruction, which effectively improves the accuracy of registration.
Experimental comparisons
it can be seen from Fig. 11 that the 3D indoor scene reconstruction map obtained by the algorithm exhibits better results at various angles.

Multi-angle indoor scene reconstructed by this algorithm.
With the development of the computer vision technology and the improvement at the hardware level, the acquisition of 3D point cloud data has become easier. We present a sophisticated approach to 3D reconstruction of a scene using a handheld free-moving camera. By introducing a spatial constraint, that is, dividing the camera’s vertebral body region, and assigning different weight values to points of different regions, better registration information of a single frame can be achieved. The algorithm adopted in this paper helps to improve the reconstruction process. In the precise registration stage of the point cloud, the algorithm shows fewer mismatches and improves the registration accuracy, which has practical application values.
The point cloud registration outlined in this paper can be performed for images from different scenarios, and the speed and timeliness of real-time reconstruction of multiple images can be further studied for further improving the algorithm.
