PU-FPG: Point cloud upsampling via form preserving graph convolutional networks

Abstract

Point cloud upsampling can improve the resolutions of point clouds and maintain the forms of point clouds, which has attracted more and more attention in recent years. However, upsampling networks sometimes generate point clouds with unclear contours and deficient topological structures, i.e., the problem of insufficient form fidelity of upsampled point clouds. This paper focuses on the above problem. Firstly, we manage to find the points located at contours or sparse positions of point clouds, i.e., the form describers, and make them multiply correctly. To this end, 3 statistics of points, i.e., local coordinate difference, local normal difference and describing index, are designed to estimate the form describers of the point clouds and rectify the feature aggregation of them with reliable neighboring features. Secondly, we divide points into disjoint levels according to the above statistics and apply K nearest neighbors algorithm to the points of different levels respectively to build an accurate graph. Finally, cascaded networks and graph information are fused and added to the feature aggregation so that the network can learn the topology of objects deeply, enhancing the perception of model toward graph information. Our upsampling model PU-FPG is obtained by combining these 3 parts with upsampling networks. We conduct abundant experiments on PU1K dataset and Semantic3D dataset, comparing the upsampling effects of PU-FPG and previous works in multiple metrics. Compared with the baseline model, the Chamfer distance, the Hausdorff distance and the point-to-surface distance of PU-FPG are reduced by 0.159 × 10^-3, 2.892 × 10^-3 and 0.852 × 10^-3, respectively. This shows that PU-FPG can improve the form fidelity and raise the quality of upsampled point clouds effectively. Our code is publicly available at https://github.com/SATURN2021/PU-FPG.

Keywords

Point cloud upsampling convolutional networks completion

1 Introduction

Point clouds are the most basic and common representations of 3D scenes, which are used in 3D reconstruction [1–4], automatic driving [5, 6], geological and architectural surveying and other fields widely. However, due to hardware and computational limitations, 3D sensors often produce sparse and noisy point clouds, which are more pronounced for small objects or objects that are far away from the sensor. This brings a massive challenge to subsequent works on point clouds, such as semantic segmentation and object detection. As a task that can improve the resolutions of point clouds, point cloud upsampling converts sparse, incomplete and noisy point clouds into dense, complete and clean point clouds under the premise of retaining the geometry, topology, shape and other characteristics of the point clouds. In recent years, point cloud upsampling has become one of the hot issues in the research direction of point cloud.

Traditional point cloud upsampling algorithms were based on optimization [7–10], which fitted local geometries generally and worked well for smooth surfaces with few features. However, they required various prior information about shapes to constrain the generation of point clouds. Therefore, their methods were less practical. PointNet [11] and PointNet++ [12] demonstrated the feasibility and effectiveness of processing point clouds with deep neural networks. The point cloud upsampling methods based on deep neural networks learned features and structures from data effectively, and required less prior information of data, improving the practicability of point cloud upsampling significantly. In 2018, Yu et al. drew on the multi-layer and multi-scale encoding architecture in PointNet++to propose the first point cloud upsampling network PU-Net [13], which extended the point set implicitly through the multi-branch convolution units in feature space. They did not consider neighboring information however. In 2019, Yifan et al. proposed a patch-based upsampling network MPU [14], which gradually trained cascades end-to-end at different levels of detail. They achieved super-large point cloud upsampling through a series of architecture improvements, including the new dense connection for point-by-point feature extraction, the code allocation for feature expansion and the bilateral feature interpolation for inter-layer feature propagation. However, the feature vectors of the copied points led to the duplication of point set, making generated points gather around the input points. In the same year, Li et al. proposed the point cloud upsampling model PU-GAN based on the generative adversarial network (GAN) [15]. They not only constructed an up-down-up expansion unit in the generator for upsampling point features with error feedback and self-correction, but introduced a self-focus unit to enhance feature aggregation. In addition, they proposed a compound loss that included antagonism, uniformity and reconstruction to encourage the discriminator to learn more potential patterns, improving the uniformity of output points distribution. Their main contributions and performance advantages came from the discriminator, while they paid less attention to the generator. In 2020, Qian et al. incorporated discrete differential geometry into point cloud upsampling and proposed PUGeo-Net [16]. This model generated the upsampling network jointly that encrypted coordinates and normals of point clouds. It used local differential geometry constraints to improve the upsampling results and increased the upsampling points by learning the first and second basic forms of local geometry. However, their approach required additional supervision in the form of normal, which were difficult to obtain in many point clouds. In 2021, Qian et al. proposed a point cloud upsampling model PU-GCN [17] based on graph convolutional networks (GCNs), which regarded the point cloud as a graph and the points as nodes in the graph to utilize topological information of the point clouds. This model combined GCNs with Inception module to extract multi-scale graph structure information of the point clouds. PU-GCN also proposed a new reorganization mechanism NodeShuffle for feature expansion, which could better encode information from neighboring nodes during reorganization. However, PU-GCN fused the same number of neighbor features and did not distinguish them during feature aggregation, resulting in incorrect and biased semantic transmission. PU-GCN also used unified K nearest neighbors (KNN) algorithm, managing to fit the potential graph structure of the point clouds. However, KNN did not fit well on complex point clouds. In addition, the spatial-domain GCNs that had low utilization rate of graph structure information were used in PU-GCN for point feature aggregation. In 2022, Long et al. explored the correlation between patch-to-patch and point-to-point of point clouds, proposing the point cloud upsampling model PC2-PU [18]. This model took adjacent point clouds as supplementary input to compensate for the loss of structural information within a single point cloud. They also introduced a patch correlation module to capture the differences and similarities among patches.

Previous learning-based point cloud upsampling methods failed to pay attention to or solve the problem of the upsampled point clouds with incorrect and unclear contours and boundaries as well as deficient topology structures. These abnormal semantic features will affect the utilization value of point clouds profoundly. This paper focuses on the above problem, i.e., the problem of insufficient form fidelity of the upsampled point clouds. Firstly, local coordinate difference, local normal difference and describing index are designed to estimate the form describers of point clouds and further fuse more reliable features, so as to make the form describers multiply in the vicinity based on correct semantic information. Secondly, multi-level KNN graph building suitable for complex point clouds is proposed to replace the traditional unified KNN to build more realistic and accurate graph structures of the point clouds. Finally, the spectral-domain GCNs applicable to directed graphs are cascaded to enhance the perception of model toward graph information in the process of feature aggregation.

Our upsampling model PU-FPG is obtained by combining these 3 parts with point cloud upsampling networks. Experimental results show that our model can improve the form fidelity of the upsampled point clouds effectively. In addition, PU-FPG has significantly improved in various metrics. Our main contributions are as follows:

We design local coordinate difference, local normal difference and describing index to estimate the form describers of the point clouds, rectifying the feature aggregation of the form describers with more reliable and relevant neighboring information.

We design multi-level KNN graph building for complex point clouds to gain more realistic and accurate graph structures of the point clouds.

We use additional spectral-domain GCNs which are adapted to directed graphs to enhance the perception of model toward graph information in the process of feature aggregation.

Our upsampling model PU-FPG is obtained by combining these 3 parts with point cloud upsampling networks. Experiments on PU1K dataset and Semantic3D dataset show that our model can improve the form fidelity of upsampled point clouds effectively.

2 Related works

2.1 Learning-based point cloud upsampling

The learning-based point cloud upsampling methods learned features and structures from data effectively, and had low prior requirements on data, promoting the development of point cloud upsampling significantly and expanding application scenarios of point cloud upsampling.

In 2018, Yu et al. proposed the first point cloud upsampling network PU-Net [13], which mapped the input point clouds to feature space to obtain feature vectors. Then, they expanded the dimensions of the feature vectors and transformed redundant dimensions into points through reorganization, realizing the expansion of point set. Finally, they mapped the point set in the feature space to 3D Euclidean space, achieving the point cloud upsampling at a fixed magnification. The network structure of PU-Net was relatively simple and reasonable. Though PU-Net ignored neighboring information and had unnecessary resolution loss due to the down-sampling operation, the framework of PU-Net, i.e., feature extraction, feature expansion and coordinate reconstruction, were widely used in learning-based point cloud upsampling models.

In the same year, Yu et al. proposed the first edge-aware merging network EC-Net [19] to solve the problem of insufficient sensitivity of the network to edges. By designing a regression component that simultaneously returned point coordinates and point-to-edge distances as well as an edge-aware joint loss function, the network could focus on the detected sharp edges and achieve more accurate upsampling. Though EC-Net focused on the edges and proposed an effective model, the marked edges it required were often difficult to obtain.

In 2019, Yifan et al. proposed a progressive network MPU [14], which replicated input point patches in multiple steps so as to achieve super-large point cloud upsampling and further suppress noise and retain details. However, MPU copied the feature vectors of the points, resulting in the duplication of the features of the point set and making the generated points gather around the input points. In addition, the computing consumption of MPU was expensive due to the gradual nature of it.

In the same year, Li et al. proposed the first GAN-based point cloud upsampling model PU-GAN [15], proving that the geometric structures of sparse points were easily lost. PU-GAN constructed an up-down-up expansion unit in the generator to upsample point features with error feedback and self-correction. They also introduced a self-focus unit to enhance feature aggregation. Their main contributions and performance advantages came from discriminators, while they paid less attention to generators.

In the same year, Ye et al. proposed Meta-PU [20] that supported point cloud upsampling with any scale factor using a single model. They used residual plot convolution blocks to form a backbone network and learned a meta-subnet to adjust the weight of residual plot convolution blocks dynamically. The farthest sampling block was used to sample different number of points, realizing the point cloud upsampling with any scale factor.

In 2020, Qian et al. incorporated discrete differential geometry into point cloud upsampling to propose the point cloud upsampling model PUGeo-Net [16], learning the local parameterization and normal direction of each point. Specifically, PUGeo-Net generated dense points in 2D parameter domain at first. Then, they projected the 2D points into 3D space using linear transformation. Finally, they twisted the tangent plane to obtain 3D points on the object surface. PUGeo-Net also realized the regression of normals of the points. However, additional supervision in the form of normal required in PUGeo-Net were difficult to obtain.

In 2021, Li et al. considered the point cloud upsampling as a multi-target task. They proposed the point cloud upsampling network Dis-PU [21] with 2 cascaded subnetworks, i.e., dense generator and space refiner. The dense generator inferred a rough but dense output that described the potential surface roughly. The space refiner further fine-tuned the rough output by adjusting the position of each point.

In 2022, Long et al. explored the correlation between patch-to-patch and point-to-point of the point clouds. They proposed the point cloud upsampling model PC2-PU [18], which took adjacent point clouds as supplementary input to compensate for the loss of structural information within a single point cloud. They also introduced a patch correlation module to capture the differences and similarities among patches.

In the same year, Qian et al. proposed a new end-to-end learning-based point cloud upsampling framework MAFU [22], which analyzed the point cloud upsampling problem from the perspective of formularization and attributed the goal of the problem to determining the interpolation weight and higher-order approximation error explicitly. MAFU designed a lightweight neural network, which could adaptively learn the interpolation weight and high-order refinement of the unified and sorted interpolation by analyzing the local geometric structures of the input point clouds.

In the same year, Feng et al. proposed a new point cloud representation method Neural Points [23] and achieved arbitrary-factored upsampling. The model extracted deep local features of the points and constructed the neural field through the local isomorphism between the 2D parameter domain and the 3D local point cloud patch. Finally, local neural fields were integrated to form a global surface. The point clouds can be upsampled at any magnification by down-sampling on the learned global surface.

Previous works including the above managed to solve different problems in point cloud upsampling. MPU [14] and Dis-PU [21] introduced cascaded networks to solve the problem that it was difficult for a single level network to achieve multiple goals. However, their cascaded networks were neither flexible nor easy to train [24]. PU-GAN [15] and PUFA-GAN [25] considered learning the point clouds by the powerful modeling ability of GAN when the point clouds were uneven and sparse. However, GAN was difficult to train and had poor interpretability. Neural Points [23] and FSU [26] achieved arbitrary-factored upsampling to improve the flexibility of trained models when multiple upsampling rates were required. However, the deviation of their models was inevitable because of the finite numbers of 2D planes when fitting curved surfaces. L2G-AE [27] and SPU-Net [28] tried to construct self-supervised point cloud upsampling models to meet the challenge that it was hard to obtain real-scanned dense point clouds with high quality as the ground truth supervision at times. However, the effects of their models were influenced by the inconsistency of scales between training data and testing data.

We summarize the existing learning-based point cloud upsampling models roughly in Table 1, where the GCN-based models will be introduced in section 2.3. Note that existing models failed to pay attention to or solve the problem of the upsampled point clouds with incorrect and unclear contours and boundaries as well as deficient topology structures, i.e., the problem of insufficient form fidelity of the upsampled point clouds. Therefore, we propose the point cloud upsampling model PU-FPG to solve this problem. By designing local coordinate difference, local normal difference and describing index, we rectify the feature aggregation with more reliable and relevant neighboring information, improving the quality and practicality of upsampled point clouds.

Table 1
Summary of existing learning-based point cloud upsampling models

Categories Models Advantages Limitations

In the PU-Net framework PU-Net, EC-Net, PUGeo-Net Simple and reasonable Inflexible and poor local info

Cascades-based MPU, Dis-PU Available for multiple goals Hard to train

GAN-based PU-GAN, PUFA-GAN Strong modeling ability Poor interpretability

Arbitrary-factored Meta-PU, Neural Points, FSU Flexible for different rates Biased due to the fitting process

Self-supervised L2G-AE, SPU-Net Effective without ground truth Biased due to the varying data

GCN-based AR-GCN, PU-GCN, Meta-PU Powerful feature aggregation Biased due to the same weights

Categories	Models	Advantages	Limitations
In the PU-Net framework	PU-Net, EC-Net, PUGeo-Net	Simple and reasonable	Inflexible and poor local info
Cascades-based	MPU, Dis-PU	Available for multiple goals	Hard to train
GAN-based	PU-GAN, PUFA-GAN	Strong modeling ability	Poor interpretability
Arbitrary-factored	Meta-PU, Neural Points, FSU	Flexible for different rates	Biased due to the fitting process
Self-supervised	L2G-AE, SPU-Net	Effective without ground truth	Biased due to the varying data
GCN-based	AR-GCN, PU-GCN, Meta-PU	Powerful feature aggregation	Biased due to the same weights

2.2 Graph convolutional neural networks

Although graph data is a kind of non-Euclidean data, it generally exists in real life and can naturally express the data structures, such as transportation network, World Wide Web and social network. Point clouds have potential graph structures which can be obtained approximately by supervised learning methods such as GraphVAE [29] and MolGAN [30], and unsupervised methods such as unified KNN. The graph structures of point clouds can truly reflect the topological structures of objects in an ideal situation. The local structure of each node in the graph data is different, so the translation invariance is no longer satisfied, making it very challenging to define convolutional neural networks on the graph data. In order to process and utilize these graph data, many graph convolutional networks have been proposed. In the construction of convolution kernel, graph convolutional networks can be divided into spectral-domain GCNs (based on convolution theorem) and spatial-domain GCNs (based on aggregation function).

In spectral-domain GCNs, Spectral Networks [31] solved the problem of convolution operation on the graph and the problem of computing the time complexity of convolution operation. However, Spectral Networks encountered the problems of long time-consuming spectral decomposition of Laplace matrix and lack of local information. ChebyNet [32] used polynomials to link the convolution kernel with the eigenvalues of the Laplace matrix, avoiding the spectral decomposition of the Laplace matrix. ChebyNet reduced the number of parameters to be learned, achieving multilevel local distance preservation. However, it was difficult to choose the polynomial order in ChebyNet. Besides, the problem of high time complexity caused by large-scale matrix multiplication was also intractable. Kipf et al. [33] simplified the calculation formula greatly through multiple approximations and avoided the original large-scale matrix multiplication operation, enabling it to learn multi-hop information of nodes flexibly by increasing the network depth. However, the increase in the number of network layers in [33] might lead to over-smoothing. In addition, although many approximations were applied to simplify the calculation, they also led to inadequate utilization of some adjacency relations and graph structure information. The spectral-domain GCNs also had strong constraints on the graph, i.e., the graph must be undirected. This constraint stemmed from the fact that the spectral-domain GCNs could only construct the Laplace matrix through the symmetric adjacency matrix, which limited the application of spectral-domain GCNs in many aspects. In the potential graph structures of point clouds, edges not only represent the connection among nodes, but mean the direction of information aggregation. In general, the aggregation of point cloud nodes information is not bidirectional, therefore the potential graph structures of point clouds should be directed graphs. Although the directed graph can be relaxed into an undirected graph and then inputted into the spectral-domain GCNs, the directed information and some degree information of the graph will be lost, resulting in distorted output results. DGCN [34] solved the problem that the directed graph could not use the spectral-domain GCNs by transforming the adjacency matrix of the directed graph into 3 symmetric adjacency matrices, preserving the directional information in the graph to a large extent at the same time.

In the spatial-domain GCNs, GraphSAGE [35] expanded GCNs into inductive learning tasks by training the function of aggregating node neighbors, playing a generalization role for unknown nodes. DGCNN [36] designed EdgeConv module to perform dynamic image convolution on point clouds. DeepGCNs [37] not only introduced residual connection, dense connection and expansion convolution into GCNs, but trained the deep GCNs architecture successfully. The spatial-domain GCNs allowed the input of directed graphs, but its utilization of the information of the graph structures was lower than that of the spectral-domain GCNs.

We combine additional spectral-domain GCNs with original spatial-domain GCNs to adapt directed graphs and enhance the perception of model toward graph information in the process of feature aggregation.

2.3 Point cloud upsampling based on graph convolutional networks

Due to the powerful ability encoding local and global information, GCNs had been widely used in learning-based point cloud processing tasks, such as classification, segmentation and upsampling [38–40].

In 2019, Wu et al. proposed a data-driven point cloud upsampling network AR-GCN [41] based on graph network and countermeasure loss. They used the local similarity of point clouds and the similarity between low resolution input and high-resolution output. They also used a new loss function which combined Chamfer distance (CD) and image confrontation loss to realize the automatic capture of high-resolution point cloud features.

In 2021, Qian et al. proposed the point cloud upsampling model PU-GCN [17] based on graph convolutional networks. They regarded the point cloud as a graph and the points as the nodes in the graph to utilize the topological information of the point clouds. Besides, they combined GCNs with Inception module to extract multi-scale graph structure information of point clouds. They also proposed a new reorganization mechanism NodeShuffle for feature expansion, which could better encode information from neighboring nodes during reorganization.

Existing GCN-based point cloud upsampling models adopted unified KNN which ignored the difference among neighboring points to build the graph of the point cloud, resulting in the fuzzy semantic information. This problem became more severe when point clouds were complex. We design multi-level KNN graph building for complex point clouds to obtain more realistic and accurate graph structures of point clouds, realizing a more effective feature aggregation.

3 Methodology

In this paper, we focus on the problem of the upsampled point clouds with incorrect and unclear contours and boundaries as well as deficient topological structures, i.e., the problem of insufficient form fidelity of the upsampled point clouds. Local coordinate difference, local normal difference and describing index of points are designed to search for the form describers, i.e., the points located at edges, contours or sparse locations, so as to rectify the feature aggregation of form describers with more reliable neighboring features. This constitutes our rectification network based on form describer (RNFD). Multilevel KNN graph building (MKGB) is used to replace the traditional unified KNN on complex point clouds to build more complex, real and accurate graph structures of point clouds, so that the model can learn more accurate topological structures of objects according to the built graphs. Graph aware feature aggregation (GAFA) enhances the perception of model toward graph information in the process of feature aggregation by using additional spectral-domain GCNs which are suitable for directed graphs, enabling the model to fully learn semantic information and topological information from more complex graph structures. Our upsampling model PU-FPG is obtained by combining these 3 parts with point cloud upsampling networks.

The model structure of PU-FPG is shown in Fig. 1. RNFD finds the form describers of the input point clouds accurately and use more reliable neighboring features in the process of feature aggregation to obtain the extended and rectified feature N × C₀. The rectified feature N × C₀ is used to rectify the feature learned by the original graph convolutional networks, where ⊕ in the figure represents the concatenation of feature dimensions. In addition, MKGB can get the complex and real graph structure of the point clouds approximately. Then, GAFA learns more detailed topological information N × C₂ of the input point clouds from the graph structure. The rectified feature N × (C₀ + C₁) is inputted into Inception DenseGCN [17] to extract multi-layer and multi-scale feature. The result from Inception DenseGCN and topological feature acquired by GAFA are fused to obtain N × C₃. Subsequently, the fused feature of the points is used by Nodeshuffle to reorganize and expand the point set to obtain the extended point set rN × C₃ in the feature space. Finally, multilayer perceptrons (MLPs) are used as coordinate regressor to achieve point cloud upsampling in 3D Euclidean space.

Fig. 1

Structure of PU-FPG.

3.1 Rectification network based on form describer

The form describers of point clouds refer to those points that are roughly located on the sharp edges, contours or extremely sparse positions of objects, which can describe the boundaries and topological information of point clouds accurately. During point cloud upsampling, points tend to multiply around themselves to obtain upsampled points according to the feature representations of them. However, points fuse the features of the same number of neighboring nodes and do not distinguish these features in the process of feature aggregation. As a result, a large number of incorrect and irrelevant features are fused in the process of feature aggregation of the form describers. The form describers multiply around themselves based on incorrect semantic information finally, leading to insufficient form fidelity of upsampled point clouds. Therefore, it is necessary for us to search for the form describers of point clouds and rectify the process of feature aggregation of them to fuse correct and relevant semantic information.

In this paper, we define 3 statistics of points, i.e., local coordinate difference (LCD), local normal difference (LND) and describing index (DI), to realize form describer searching (FDS) of point clouds. These statistics can reflect information such as the positions and sparseness of points in the point clouds to a certain extent.

Definition 1. Given a point cloud P = {p₁, p₂, ⋯ , p_N}, the LCD of p_i ∈ P (i = 1, 2, ⋯ , N) is $LCD (p_{i}) = | \sum_{j = 1}^{k} (p_{i} - p_{ij}) |,$ (1) where p_i1, p_i2, ⋯ , p_ik are the k nearest neighbors of p_i calculated by KNN.

Definition 2. Given a point cloud P = {p₁, p₂, ⋯ , p_N} and the normals n₁, n₂, ⋯ , n_N, the LND of p_i ∈ P (i = 1, 2, ⋯ , N) is $LND (p_{i}) = \sum_{j = 1}^{k} | n_{ij} - n_{i} |,$ (2) where n_i1, n_i2, ⋯ , n_ik are the normals of k nearest neighbors p_i1, p_i2, ⋯ , p_ik of p_i calculated by KNN.

Definition 3. Given a point cloud P = {p₁, p₂, ⋯ , p_N}, the DI of p_i ∈ P (i = 1, 2, ⋯ , N) is $DI (p_{i}) = LCD (p_{i}) + λ LND (p_{i}),$ (3) where λ > 0. Obviously, DI (p_i) ⩾0 (i = 1, 2, ⋯ , N).

LCD and LND can reflect the positions and sparseness of points. For each point p_i in the point cloud, the larger LCD (p_i) and LND (p_i), the more likely p_i is a point that can describe the boundaries and topology information of the point cloud accurately, i.e., the form describer. Empirically, LCD is better at finding points in sparse locations, while LND is more sensitive to points at the boundaries and contours of the objects. Therefore, the form describers can be found by the complementarity of these 2 statistics accurately.

DI combines LCD and LND of p_i linearly to measure the degree to which p_i is a form describer. The parameter λ is used to weigh the importance of LCD and LND. In FDS, DI of each point in the point cloud is calculated to obtain the sample distribution of DIs. Empirically, M points with the largest DIs are selected to constitute the set of form describers. According to the process of FDS, we can define the set of form describers of a point cloud definitely.

Definition 4. Given 2 sets $X = {x_{i}}_{i = 1}^{K}$ and $Y = {y_{i}}_{i = 1}^{K}$ that satisfy |X| = |Y| > M and x_i ⩾ 0 (i = 1, 2, ⋯ , K). f (X, t) = x_{d
_t} is the t-th largest number in X and g (x_j, Y) = y_j. Given a point cloud $P = {p_{i}}_{i = 1}^{N}$ , the set of form describers of P is $Q = {g (f (DI, j), P)}_{j = 1}^{M},$ (4) where DI = {DI (p₁) , DI (p₂) , ⋯ , DI (p_N)}. Each element in Q is a form describer of P.

Algorithm 1 shows the process of FDS. Given a 2D virtual point cloud, as shown in Fig. 2, the arrow indicates the direction of normal of each point. The information of each point, including coordinate, normal, LCD, LND and DI, is given in Table 2. Note that the normals are unitized. We set k = 2 in both (1) and (2) and set λ = 1 in (3). The searching effects of LCD, LND and DI on the virtual point cloud are shown in Fig. 3, where the letters marked in the points represent the degrees to which the points are identified as form describers. The points marked as A are identified as the form describers to the largest extent. The points marked as B are identified as the form describers to the large extent. The points marked as C are identified as the form describers to the least extent. The searching effects on the 3D real point cloud shown in Fig. 4 proves that LCD can find more points in sparse locations, while LND is more sensitive to points at the boundaries and contours of the objects. Moreover, DI can combine the advantages of LCD and LND effectively to find the set of form describers of point clouds.

Fig. 2

Positions and normals of the points of the 2D point cloud.

Fig. 3

Searching effect of form describers on 2D virtual point cloud using our statistics.

Fig. 4

Searching effect of form describers on 3D real point cloud using our statistics.

Table 2

Information of each point in the 2D virtual point cloud

Number	Coordinate	Normal	LCD	LND	DI
1	(0, 0)	(– 1, 0)	3	0	3
2	(0, 1)	(– 1, 0)	0	0	0
3	(0, 2)	(– 1, 0)	0	0	0
4	(0, 3)	(– 1, 0)	0	0.765	0.765
5	(0, 4)	(– 0.707, 0.707)	1.414	1.935	3.349
6	(1, 4)	(0.447, 0.894)	1	3.116	4.116
7	(2, 3)	(0, – 1)	2	3.893	5.893
8	(3, 4)	(– 0.447, 0.894)	1	3.116	4.116
9	(4, 4)	(0.707, 0.707)	1.414	1.935	3.349
10	(4, 3)	(1, 0)	0	0.765	0.765
11	(4, 2)	(1, 0)	0	0	0
12	(4, 1)	(1, 0)	0	0	0
13	(4, 0)	(1, 0)	3	0	3

Algorithm 1 Form Describer Searching (FDS)
Input: point cloud $P = {p_{i}}_{i = 1}^{N}$ , normals ${n_{i}}_{i = 1}^{N}$ , parameters k₁, k₂, λ, M
1. fori = 1, 2, ⋯ , Ndo
2. Set k = k₁ and get LCD (p_i) based on (1)
3. Set k = k₂ and get LND (p_i) based on (2)
4. Calculate DI (p_i) with λ based on (3)
5. end for
6. Arrange {DI (p₁) , DI (p₂) , ⋯ , DI (p_N)} in descending
order and get the result {DI (p_{d ₁}) , DI (p_{d ₂}) , ⋯ , DI (p_{d _N})}
7. return the set of form describers Q = {p_{d ₁}, p_{d ₂}, ⋯ , p_{d _M}}

Since form describers describe the boundaries and topology information of point clouds accurately, the feature aggregation strategy of them should be conservative, otherwise the upsampling model is likely to generate incorrect new boundaries based on incorrect feature representations. Nearby neighbors are more reliable compared with the neighbors far away, because they are more likely to contain correct and relevant semantic information, i.e., small local structures are more reliable than large local structures. In this paper, the importance of small local structures is emphasized in the process of feature aggregation to realize the rectification of feature representations of form describers.

Specifically, an independent deep graph convolutional network is introduced as a rectification network to learn feature representations of small local structures of form describers conservatively. The rectification network has the same structure as the feature embedding network of all nodes and runs in parallel. We integrate the feature representations of the form describers learned by the rectification network into the corresponding part of the feature representations learned by the feature embedding network by way of summation, realizing the rectification in the process of feature aggregation of the form describers. The structure of our RNFD is shown in Fig. 5. Note that the rectified feature M × C₀ is extended to N × C₀ in order to ensure the consistence of shape of rectified feature and feature learned by the feature embedding network. Specifically, zero vectors are used to fill the tensor to ensure that the positions of the rectified feature of the form describers remain unchanged throughout the point cloud.

Fig. 5

Structure of RNFD.

3.2 Multilevel KNN graph building

Point clouds have latent graph structures that can describe the topology of objects accurately in the ideal situation. The following graph convolutional network learns the topology of the point clouds by using the graph structure information of the point clouds. The upsampled point clouds with high form fidelity are obtained finally.

Traditionally, the graph structure of a point cloud is built by unified KNN. This algorithm uses a simple strategy to build the graph structure of a point cloud, i.e., to find the same number of neighbors for each point, which is suitable for building the graph structure on the flat and simple surface of the objects. However, unified KNN is likely to generate overfilled graphs on the sharp edges and holes of the objects, which cannot depict the topological structures of the objects accurately. This fuzzy semantic information and incorrect graph structures will mislead the subsequent graph convolutional network, making it learn incorrect semantic information of point clouds. Therefore, improving the building method of the graph structures of the point clouds is conducive to the subsequent graph convolutional network for point cloud learning, obtaining the upsampled point clouds with high form fidelity finally.

Note that DI can quantify the positions and sparsity of points. For point pairs with large differences in positions and sparsity, the numbers of neighbors of them should be some difference to build a more realistic and accurate point cloud structure. Empirically, for points that are close to the edges and outlines of the objects and have large degrees of sparsity, i.e., points with larger DIs, selecting fewer neighbors can describe the local topology more realistically. For points located in the center of the objects and with small degrees of sparsity, i.e., points with smaller DIs, appropriate increases in the numbers of neighbors will help to make more and correct use of local structure information to generate a more uniform point cloud.

FDS can find the point set consisting of points with large DIs. Therefore, FDS can be used multiple times to divide the input point cloud into several disjoint levels according to the values of DIs of the points, i.e., $P = P_{1} \cup P_{2} \cup \dots \cup P_{t},$ (5)

and $P_{i} \cap P_{j} = \emptyset, \forall i, j = 1, 2, \dots, t,$ (6) where P is the input point cloud, t is the hyperparameter that represents the number times we use FDS, i.e., the number of levels. Subsequently, unified KNN is independently applied to the points in different sets to build the respective subgraph structures of them. Note that we use unified KNN to find neighboring points on the whole point cloud, establishing associations among all subgraph structures. Finally, all subgraph structures are integrated to obtain a complete graph structure, i.e., $G = G_{1} \cup G_{2} \cup \dots \cup G_{t},$ (7) where G_i is a graph structure built by P_i using unified KNN on the whole point cloud. The structure of our MKGB is shown in Fig. 6, where the hyperparameters S and t satisfy S × t = N.

Fig. 6

Structure of MKGB.

The complete graph structure realizes a more realistic and accurate representation of the topological structure information of the point clouds through a variety of subgraph structures, which is helpful for the subsequent graph convolutional network to utilize the topological structure information of the point clouds, obtaining the upsampled point clouds with high form fidelity finally.

3.3 Graph aware feature aggregation

The spatial-domain GCNs are often applied to point clouds. These methods manage to learn neighboring feature representations of points to perceive the detailed topological structures of objects. However, spatial-domain methods do not incorporate graph structure information including node degrees, edge weights, etc., and are only suitable for learning simple graph structures. The graph structure of the point cloud is a description of the topological structures of the object, which is generally more complex, especially for those objects with more complex topological structures. Therefore, spatial-domain methods often have very limited ability to learn the graph structures of complex point clouds. It is likely to learn fuzzy and biased detail topology, which leads to the insufficiency of form fidelity eventually.

The method [33] proposed by Kipf et al. is in the spectral domain, which effectively integrates the degrees of nodes and the weights of edges in the process of learning the neighboring feature representations of points, making full use of graph structure information. The calculation formula of the graph convolution layer of the spectral-domain methods is $Z = ({\tilde{D}}^{- \frac{1}{2}} \tilde{A} {\tilde{D}}^{- \frac{1}{2}}) X Θ,$ (8) where $\tilde{A} = A + I$ , A is the adjacency matrix of the graph, $\tilde{D}$ is the degree matrix corresponding to $\tilde{A}$ , X is the feature of all nodes and Θ is the trainable convolution kernel. However, the spectral-domain methods restrict the input graph structures to be undirected, which are not suitable for point cloud data using for directed feature aggregation. Although the directed graph can be relaxed into an undirected graph and then inputted into the spectral-domain methods, this will lose the directed information and some degree information of the graph, misleading the further learning of the graph structure information. Directed graph convolutional network (DGCN) [34] is based on the spectral-domain methods and designs symmetric proximity matrices that preserve the directed information of the graph. This method can make full use of graph structure information without losing the directed information in the graph. Therefore, DGCN is used to build our GAFA to achieve accurate perception of model toward topological structures of the objects.

We directly introduce the theories in DGCN [34] for the convenience of reading. Given a directed graph $G = (V, E)$ and the weighted adjacency matrix A of it, the first-order proximity matrix of graph $G$ is defined as $A_{F} (i, j) = max {A (i, j), A (j, i)} .$ (9)

The first-order adjacency matrix A_F is obtained by symmetrizing the adjacency matrix A, which describes the neighboring information of each node. As shown in Part a of Fig. 7, nodes 4, 7, 9 and 10 are first-order neighbors of node 1, i.e., first-order similarity. The second-order in-degree matrix and the second-order out-degree matrix of graph $G$ are defined as

Fig. 7

Node proximity.

$A_{S_{in}} (i, j) = \sum_{k} \frac{A_{k, i} A_{k, j}}{\sum_{v} A_{k, v}}$ (10)

and $A_{S_{out}} (i, j) = \sum_{k} \frac{A_{i, k} A_{j, k}}{\sum_{v} A_{v, k}},$ (11)

respectively, where $k, v \in V$ . A_{S
_in} (i, j) sums the weights of all normalized edges pointing to both node v_i and node v_j, reflecting the in-degree similarity of v_i and v_j. As shown in Part b of Fig. 7, since nodes 1 and 5 have edges pointing to nodes 4 and 9 at the same time, thus nodes 4 and 9 are similar in second-order in-degree. A_{S
_out} (i, j) sums the weights of all normalized edges pointed to by both v_i and v_j, reflecting the out-degree similarity of v_i and v_j. As shown in Part c of Fig. 7, since nodes 3 and 5 have edges pointing to nodes 2 and 6 at the same time, nodes 3 and 5 are similar in second-order out-degree.

The first-order adjacency matrix A_F only retains the undirected information of the graph, while the second-order in-degree adjacency matrix A_{S
_in} and the second-order out-degree adjacency matrix A_{S
_out} calculated through the weighted directed edges in the graph retain the directed information of the graph. Therefore, these proximity matrices highly preserve the full information of the graph. It is easy to verify that A_F, A_{S
_in} and A_{S
_out} are all symmetric matrices, thus they can be used for full utilization by spectral-domain methods, i.e., ${\begin{matrix} Z_{F} = {\tilde{D}}_{F}^{- \frac{1}{2}} {\tilde{A}}_{F} {\tilde{D}}_{F}^{- \frac{1}{2}} X Θ, \\ Z_{S_{in}} = {\tilde{D}}_{S_{in}}^{- \frac{1}{2}} {\tilde{A}}_{S_{in}} {\tilde{D}}_{S_{in}}^{- \frac{1}{2}} X Θ, \\ Z_{S_{out}} = {\tilde{D}}_{S_{out}}^{- \frac{1}{2}} {\tilde{A}}_{S_{out}} {\tilde{D}}_{S_{out}}^{- \frac{1}{2}} X Θ, \end{matrix}$ (12)

where ${\begin{matrix} {\tilde{D}}_{F} = diag (\sum_{j}^{n} {\tilde{A}}_{F} (i, j)), \\ {\tilde{D}}_{S_{in}} = diag (\sum_{j}^{n} {\tilde{A}}_{S_{in}} (i, j)), \\ {\tilde{D}}_{S_{out}} = diag (\sum_{j}^{n} {\tilde{A}}_{S_{out}} (i, j)) . \end{matrix}$ (13)

We fuse the graph structure feature representations of these 3 adjacency matrices to obtain the complete graph structure feature representations of the nodes. We also use the parameters α, β to weigh the importance of different graph structure feature representations, i.e., $Z = Concat (Z_{F}, α Z_{S_{in}}, β Z_{S_{out}}) .$ (14)

The structure of GAFA is shown in Fig. 8. Note that multiple DGCNs are cascaded to extract multi-layer features.

Fig. 8

Structure of GAFA.

Our GAFA enhances the ability of graph convolutional network to learn the graph structures of complex point clouds, so that the model can make full use of the graph structure information of point clouds, obtaining the upsampled point clouds with high form fidelity.

4 Experiments

Our model is trained and tested on PU1K [17] dataset. Quantitative and qualitative experimental results are reported as well. We also provide qualitative experimental results on the real-scanned dataset Semantic3D [42] to prove the practicability of our PU-FPG in practical scenarios. In addition, abundant ablation studies are conducted to verify the effectiveness of the proposed RNFD, MKGB and GAFA in solving the problems of the insufficient form fidelity of the upsampled point clouds.

4.1 Dataset

4.1.1 PU1K dataset

PU1K consists of 1147 3D models, which are divided into 1020 training samples and 127 test samples, covering a large number of 3D object semantic ranges. The training set consists of 120 3D models of PU-GAN and 900 3D models of ShapeNetCore [43]. The test set includes 27 3D models of PU-GAN and more than 100 3D models of ShapeNetCore.

Following the previous work, Poisson disk sampling is used in the original mesh to generate input and ground truth point cloud pairs. In addition, 50 point cloud patches are cut from each 3D model as the input when training the network, i.e., a total of 51000 point cloud patches are used for training. Each patch includes 256 points as training input and 1024 points as ground truth. Each input point cloud containing 2048 points and each ground truth point cloud containing 8192 points are used as paired test data.

4.1.2 Semantic3D dataset

Semantic3D provides a large labelled 3D point cloud data set of natural scenes with over 4 billion points in total. It also covers a range of diverse urban scenes: churches, streets, railroad tracks, squares, villages, soccer fields, castles to name just a few. The point clouds Semantic3D provides are scanned statically with state-of-the-art equipment and contain very fine details.

We manually cut point cloud patches with exact semantic information from Semantic3D. These patches are used as the input point clouds to conduct qualitative experiments in practical scenarios.

4.2 Loss function

In this paper, we use CD as the loss function to minimize the difference between the upsampled point cloud and the ground truth. Given 2 point clouds P and Q, Chamfer distance of them is defined as $\begin{matrix} d_{CD} (P, Q) = \frac{1}{| P |} \sum_{x \in P} min_{y \in Q} {∥ x - y ∥}_{2}^{2} \\ + \frac{1}{| Q |} \sum_{y \in Q} min_{x \in P} {∥ x - y ∥}_{2}^{2}, \end{matrix}$ (15) where | · | represents the number of elements in the set, and ∥ · ∥ ₂ represents 2-norm.

4.3 Evaluation metrics

We follow the previous work and select CD, Hausdorff distance (HD) and point-to-surface distance (P2F) as evaluation metrics in order to facilitate comparison. The smaller the values of these metrics, the better the model generates upsampled points. All models are trained and tested on the same computer with a single NVIDIA Tesla V100-32 G GPU and Intel Xeon E5-2698 v4 CPU.

4.4 Implementation details

In all experiments, we train PU-FPG for 200 epochs with batch size 64 on NVIDIA Tesla V100-32 G and set the input number of points N = 256. In our RNFD, the number of form describers M = 64, and the compromise factor λ = 0.95. In our MKGB, we use FDS for t = 4 times, and the numbers of neighbors selected at each level are set to k₁ = 8, k₂ = 12, k₃ = 16 and k₄ = 20. In our GAFA, we use the tradeoff parameter α = 0.85 and β = 1. In addition, we use the Adam optimizer and set the learning rate to 0.001 and beta to 0.9. Similar to the previous work, normalizing, rotating, scaling and random perturbation are all used in the point clouds to enhance the data. We train PU-Net, MPU, PU-GAN, PU-GCN and PU-FPG on PU1K for multiple times to ensure the stabilities of them. In our experiments, the upsampling factor r = 4.

4.5 Quantitative experimental results

Table 3 shows the mean quantitative results in multiple experiments of PU-Net, MPU, PU-GAN, PU-GCN and PU-FPG on PU1K. Here we note that we cannot compare with the works in 2022 such as [18] and [23], since the dataset we used is different from those they used. Although PU-FPG needs more training parameters compared with the state-of-the-art model PU-GCN, PU-FPG has significantly improved in other metrics, where CD is reduced by 0.159 × 10^-3, HD is reduced by 2.892 × 10^-3 and P2F is reduced by 0.852 × 10^-3. The experimental results show that our proposed PU-FPG can improve the form fidelity of the upsampled point clouds effectively, thus raising the quality of the upsampled point clouds.

Table 3
Quantitative experimental results

Models CD↓ 10^–3 HD↓ 10^–3 P2F↓ 10^–3 Param. Kb

PU-Net 1.175 16.973 4.799 812.0

MPU 0.929 12.328 3.274 76.2

PU-GAN 0.777 10.986 3.001 541.6

PU-GCN 0.658 9.924 2.738 76.0

PU-FPG 0.499 7.032 1.886 218.5

Models	CD↓ 10^–3	HD↓ 10^–3	P2F↓ 10^–3	Param. Kb
PU-Net	1.175	16.973	4.799	812.0
MPU	0.929	12.328	3.274	76.2
PU-GAN	0.777	10.986	3.001	541.6
PU-GCN	0.658	9.924	2.738	76.0
PU-FPG	0.499	7.032	1.886	218.5

4.6 Qualitative experimental results

4.6.1 Results on PU1K

Qualitative upsampling comparison results of PU-Net, MPU, PU-GAN, PU-GCN and PU-FPG on PU1K are shown in Fig. 9. In Part a, PU-FPG keeps the topological details of holes of objects to the greatest extent. At the location of the close-up display, there is a certain gap between the shape of the point cloud generated by PU-FPG and the ground truth (the result of PU-FPG is close to a circular hole, while the ground truth is a triangular hole), which may be caused by the insufficient expression ability of the input point cloud to the complete point cloud. In Part b, the point cloud generated by PU-FPG is the best in preserving the holes and right angles of objects. In Part c, PU-FPG maintains the hollowed details and overall shape of the object as much as possible. However, our predicted point cloud still loses some details, which may be caused by the extremely complex details of the object. In addition, a short arc appears at the bottom of the close-up of ground truth, while the corresponding position of the point cloud generated by PU-FPG is close to a straight line, which may be because the input point cloud is not very clear at the corresponding position and misleads the model. In Part d, PU-FPG keeps the inverted trapezoid of the close-up position of the object at most. In Part e, PU-FPG retains the basic topological features of the close-up position, gaining the best upsampling effect in the compared models. In a word, our PU-FPG can generate the point clouds whose forms are the closest to ground truth, and have the highest form fidelity.

Fig. 9

Qualitative experimental results on PU1K.

4.6.2 Results on Semantic3D

Qualitative upsampling comparison results of PU-Net, MPU, PU-GAN, PU-GCN and PU-FPG on Semantic3D are shown in Fig. 10. In Part a, PU-FPG generates the most realistic point cloud compared with other models. At the location of the close-up display, PU-FPG can avoid the generation of outliers and preserve the circular arc structure of the door frame to the greatest extent. In Part b, though PU-FPG produces a few outliers, it can generate the upsampled window point cloud with the most realistic square structure at the close-up position. Additionally, the highest window structure of the church is maintained as well. In Part c, although the structure of the sparse input point cloud is complex, PU-FPG retains the form of holes of the input point cloud to some extent, while other models obtain incorrect results by covering the holes. In summary, our PU-FPG can generate the upsampled point clouds with the highest form fidelity.

Fig. 10

Qualitative experimental results on Semantic3D.

4.7 Ablation study

Abundant ablation studies are conducted to verify the contribution of RNFD, MKGB and GAFA in improving the quality of point cloud upsampling. We conduct the following experiments on PU1K and ensured that all experimental parameters are consistent. Table 4 shows the effects of our proposed parts individually and in combination. The experimental results show that the combination of the 3 parts we proposed, i.e., PU-FPG, can produce the highest quality upsampled point clouds.

Table 4
Experimental results of ablation study

Experiments CD↓ 10^–3 HD↓ 10^–3 P2F↓ 10^–3

PU-GCN 0.658 9.924 2.738

RNFD 0.569 7.795 2.196

RNFD+GAFA 0.504 6.255 1.939

PU-FPG 0.499 7.032 1.886

Experiments	CD↓ 10^–3	HD↓ 10^–3	P2F↓ 10^–3
PU-GCN	0.658	9.924	2.738
RNFD	0.569	7.795	2.196
RNFD+GAFA	0.504	6.255	1.939
PU-FPG	0.499	7.032	1.886

5 Conclusion

In this paper, we focus on the problem of the upsampled point clouds with incorrect and unclear contours and boundaries as well as deficient topology structures in the process of point cloud upsampling, i.e., the problem of insufficient form fidelity of the upsampled point clouds. Firstly, we manage to make the form describers of the point clouds multiply near themselves according to correct semantic information. To this end, 3 statistics of points, i.e., LCD, LND and DI, are designed to estimate form describers of point clouds and further rectify the process of feature aggregation of them with more reliable neighboring features. This constitutes our rectification network based on form describer. Secondly, multilevel KNN graph building divides all points into several disjoint levels according to the above 3 statistics and apply KNN algorithm to the points of different levels respectively to build more realistic and accurate graph structures, so that the graph convolutional network can fully utilize potential graph structures of point clouds. Finally, to enhance the perception of model toward graph information, graph aware feature aggregation fuses cascaded networks and more graph information and add them to the process of feature aggregation in order that the network can deeply learn the topology of the object. Our model PU-FPG is obtained by combining these 3 parts with point cloud upsampling networks. Abundant quantitative and qualitative experiments prove that PU-FPG can improve the form fidelity and raise the quality of upsampled point clouds effectively.

In the future, we will combine more statistical methods with our DI to search for form describers more precisely. Besides, designing a mechanism to balance LCD and LND adaptively will be beneficial as well. Moreover, searching for outliers and reducing their weights in feature aggregation will also help improve the form fidelity of upsampled point clouds.

Footnotes

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (no. 62072024 and 41971396), the Projects of Beijing Advanced Innovation Center for Future Urban Design (no. UDC2019033324), R&D Program of Beijing Municipal Education Commission (KM202210016002 and KM202110016001), the Fundamental Research Funds for Municipal Universities of Beijing University of Civil Engineering and Architecture (no. X20084 and ZF17061), and the BUCEA Post Graduate Innovation Project (PG2022144 and PG2023143).

References

Hoppe

, DeRose

, Duchamp

, McDonald

and Stuetzle

, Surface reconstruction from unorganized points, in: Proceedings of the 19th annual conference on computer graphics and interactive techniques (1992). https://doi.org/10.1145/133994.134011

Kazhdan

and Hoppe

, Screened Poisson Surface Reconstruction, ACM Transactions on Graphics (ToG) 32 (2013), 1–13. https://doi.org/10.1145/2487228.2487237

Newcombe

R.A.

, Izadi

, Hilliges

, Molyneaux

, Kim

, Davison

A.J.

, Kohi

, Shotton

, Hodges

and Fitzgibbon

, Kinectfusion: Real-time dense surface mapping and tracking, in: 2011 10th IEEE International Symposium on Mixed and Augmented Reality (2011).

Riegler

, Ulusoy

A.O.

, Bischof

and Geiger

, Octnetfusion: Learning depth fusion from data, in: 2017 IEEE International Conference on 3D Vision (3DV) (2017).

Lang

A.H.

, Vora

, Caesar

, Zhou

, Yang

and Beijbom

, Pointpillars: Fast en-coders for object detection from point clouds, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) (2019).

Wang

, Chao

W.L.

, Garg

, Hariharan

, Campbell

and Weinberger

K.Q.

, Pseudolidar from visual depth estimation: Bridging the gap in 3d object detection for autonomous driving, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019).

Alexa

, Behr

, Cohen-Or

, Fleishman

, Levin

and Silva

C.T.

, Computing and rendering point set surfaces, IEEE Transactions on Visualization and Computer Graphics 9 (2003), 3–15.

Lipman

, Cohen-Or

, Levin

and Tal-Ezer

, Parameterization-free projection for geometry reconstruction, ACM Transactions on Graphics (TOG) 26 (2007), 22es. https://doi.org/10.1145/1276377.1276405

Huang

, Wu

, Gong

, Cohen-Or

, Ascher

and Zhang

, Edge-aware point set resampling, ACM Transactions on Graphics (TOG) 32 (2013), 1–12. https://doi.org/10.1145/2421636.2421645

10.

, Huang

, Gong

, Zwicker

and Cohen-Or

, Deep points consolidation, ACM Transactions on Graphics (ToG) 34 (2015), 1–13. https://doi.org/10.1145/2816795.2818073

11.

C.R.

, Su

, Mo

and Guibas

L.J.

, PointNet: Deep learning on point sets for 3D classification and segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (2017).

12.

C.R.

, Yi

, Su

and Guibas

L.J.

, PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space, Advances in Neural Information Processing Systems 30 (2017), 5099–5108.

13.

, Li

, Fu

C.W.

, Cohen-Or

and Heng

P.A.

, Pu-net: Point cloud upsampling network, in: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (2018).

14.

Yifan

, Wu

, Huang

, Cohen-Or

and Sorkine-Hornung

, Patch-based progressive 3d point set upsampling, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019).

15.

, Li

, Fu

C.W.

, Cohen-Or

and Heng

P.A.

, Pu-gan: a point cloud upsampling adversarial network, in: Proceedings of the IEEE/CVF international conference on computer vision (ICCV) (2019).

16.

Qian

, Hou

, Kwong

and He

, Pugeo-net: A geometry-centric network for 3d point cloud upsampling, in: European conference on computer vision (ECCV) (2020). https://doi.org/10.1007/978-3-030-58529-7_44

17.

Qian

, Abualshour

, Li

, Thabet

and Ghanem

, Pu-gcn: Point cloud up-sampling using graph convolutional networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021).

18.

Long

, Zhang

W.X.

, Li

, Wang

, Dong

and Yang

, PC2-PU: Patch Correlation and Point Cor-relation for Effective Point Cloud Upsampling, in: Proceedings of the 30th ACM International Conference on Multimedia, Lisbon, Portugal, 2022 pp. 2191–2201. https://doi.org/10.1145/3503161.3547777

19.

, Li

, Fu

C.W.

, Cohen-Or

and Heng

P.A.

, Ec-net: An Edge-Aware Point Set Consolidation Network, in: Proceedings of the European conference on computer vision (ECCV) (2018).

20.

, Chen

, Han

, Wan

and Liao

, Meta-PU: An arbitrary-scale upsampling network for point cloud, IEEE Transactions on Visualization and Computer Graphics 28 (2022), 3206–3218.

21.

, Li

, Heng

P.A.

and Fu

C.W.

, Point cloud upsampling via disentangled refinement, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021).

22.

Qian

, Hou

, Kwong

and He

, Deep magnification-flexible upsampling over 3d point clouds, IEEE Transactions on Image Processing 30 (2021), 8354–8367.

23.

Feng

, Li

, Cai

, Luo

and Zhang

, Neural Points: Point Cloud Representation with Neural Fields for Arbitrary Upsampling, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).

24.

, Yan

, Wang

, Xie

and Pu

, Point cloud up-sampling via cascaded refinement network, in: Proceedings of the Asian Conference on Computer Vision (ACCV) (2022).

25.

Liu

, Yuan

, Hou

, Hamzaoui

and Gao

, PUFA-GAN: A Frequency-Aware Generative Adversarial Network for 3D Point Cloud Upsampling, IEEE Transactions on Image Processing 31 (2022), 7389–7402.

26.

Heimann

, Spruck

and Kaup

, Joint Geometry and Attribute Upsampling of Point Clouds Using Frequency-Selective Models with Overlapped Support (2023). arXiv, arXiv:2301.11630.

27.

Liu

, Han

, Wen

, Liu

Y.S.

and Zwicker

, L2g auto-encoder: Understanding point clouds by local-to-global recon-struction with hierarchical self-attention, in: Proceedings of the 27th ACM International Conference on Multimedia, Nice, France (2019), 989–997. https://doi.org/10.1145/3343031.3350960

28.

Liu

, Liu

, Han

and Liu

Y.S.

, Spu-net: Self-supervised point cloud upsampling by coarse-to-fine reconstruction with self-projection optimization, IEEE Transactions on Image Processing 31 (2022), 4213–4226.

29.

Simonovsky

and Komodakis

, Graphvae: Towards generation of small graphs using variational autoencoders, in: ICANN 2018:27th International Conference on Artificial Neural Networks, Rhodes, Greece, (2018), 412–422. https://doi.org/10.1007/978-3-030-01418-6_41.

30.

De Cao

and Kipf

, MolGAN: An implicit generative model for small molecular graphs, arXiv, arXiv:1805.11973, (2018).

31.

Bruna

, Zaremba

, Szlam

and LeCun

, Spectral networks and locally connected networks on graphs, arXiv, arXiv:1312.6203 (2013).

32.

Defferrard

, Bresson

and Vandergheynst

, Convolutional neural networks on graphs with fast localized spectral filtering, Advances in Neural Information Processing Systems 29 (2016).

33.

Kipf

T.N.

and Welling

, Semi-supervised classification with graph convolutional networks, arXiv, arXiv:1609.02907 (2016).

34.

Tong

, Liang

, Sun

, Rosenblum

D.S.

and Lim

, Directed graph convolutional network, arXiv, arXiv:2004.13970 (2020).

35.

Hamilton

, Ying

and Leskovec

, Inductive representation learning on large graphs, Advances in Neural Information Processing Systems 30 (2017).

36.

Wang

, Sun

, Liu

, Sarma

S.E.

, Bronstein

M.M.

and Solomon

J.M.

, Dynamic graph cnn for learning on point clouds, Acm Transactions on Graphics (tog) 38 (2019), 1–12. https://doi.org/10.1145/3326362

37.

, Mueller

et al., Deepgcns: Making gcns go as deep as cnns, IEEE Transactions on Pattern Analysis and Machine Intelligence (2021).

38.

Thomas

, Qi

C.R.

, Deschaud

J.E.

, Marcotegui

, Gou-lette

and Guibas

L.J.

, Kpconv: Flexible and deformable con-volution for point clouds, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2019).

39.

, Müller

, Thabet

and Ghanem

, Deepgcns: Can gcns go as deep as cnns? in: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2019).

40.

Zhang

and Rabbat

, A Graph-CNN for 3D Point Cloud Classification, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 2018, pp. 6279–6283.

41.

, Zhang

and Huang

, Point Cloud Super Resolution with Adversarial Residual Graph Networks, arXiv, arXiv:1908.02111 (2019).

42.

Hackel

, Savinov

, Ladicky

, Wegner

J.D.

, Schindler

and Pollefeys

, Semantic3d.net: A new large-scale point cloud classification benchmark, arXiv, arXiv:1704.03847 (2017).

43.

Chang

A.X.

, Funkhouser

et al., Shapenet: An information-rich 3d model repository, arXiv, arXiv:1512.03012 (2015).

PU-FPG: Point cloud upsampling via form preserving graph convolutional networks

Abstract

Keywords

1 Introduction

2 Related works

2.1 Learning-based point cloud upsampling

2.3 Point cloud upsampling based on graph convolutional networks

3 Methodology

4.1 Dataset

4.1.1 PU1K dataset

4.1.2 Semantic3D dataset

4.2 Loss function

4.4 Implementation details

4.5 Quantitative experimental results

Table 3 Quantitative experimental results Models CD↓ 10–3 HD↓ 10–3 P2F↓ 10–3 Param. Kb PU-Net 1.175 16.973 4.799 812.0 MPU 0.929 12.328 3.274 76.2 PU-GAN 0.777 10.986 3.001 541.6 PU-GCN 0.658 9.924 2.738 76.0 PU-FPG 0.499 7.032 1.886 218.5

4.6.1 Results on PU1K

Table 4 Experimental results of ablation study Experiments CD↓ 10–3 HD↓ 10–3 P2F↓ 10–3 PU-GCN 0.658 9.924 2.738 RNFD 0.569 7.795 2.196 RNFD+GAFA 0.504 6.255 1.939 PU-FPG 0.499 7.032 1.886

Footnotes

Declaration of competing interest

Acknowledgments

References

Table 3
Quantitative experimental results

Models CD↓ 10^–3 HD↓ 10^–3 P2F↓ 10^–3 Param. Kb

PU-Net 1.175 16.973 4.799 812.0

MPU 0.929 12.328 3.274 76.2

PU-GAN 0.777 10.986 3.001 541.6

PU-GCN 0.658 9.924 2.738 76.0

PU-FPG 0.499 7.032 1.886 218.5

Table 4
Experimental results of ablation study

Experiments CD↓ 10^–3 HD↓ 10^–3 P2F↓ 10^–3

PU-GCN 0.658 9.924 2.738

RNFD 0.569 7.795 2.196

RNFD+GAFA 0.504 6.255 1.939

PU-FPG 0.499 7.032 1.886