An adaptive locally-coded point cloud classification and segmentation network coupled with genetic algorithm

Abstract

Local information coding helps capture the fine-grained features of a point cloud, and the coding mechanism should be applicable to point cloud data in different formats. However, the local features of a point cloud are directly affected by the attributes, size and scale of the object. This paper proposes an Adaptive Locally-Coded point cloud classification and segmentation Network coupled with Genetic Algorithm (ALCN-GA), which automatically adjusts the size of the search cube during network training. ALCN-GA adapts to the features of 3D data at different points through an adjustment mechanism built on a robust crossover and mutation strategy. The proposed method is tested on the ModelNet40 and S3DIS datasets. In classification, the overall accuracy and average accuracy are 89.5% and 86.5%, respectively; in segmentation, the overall accuracy and mIoU are 80.34% and 51.05%, respectively. Compared with PointNet, the average accuracy in classification and the mIoU of segmentation are improved by about 10% and 11%, respectively.

Keywords

Genetic algorithm; 3D classification; Segmentation; Deep learning; Local coding

1 Introduction

Local information coding plays a key role in 3D point cloud processing, especially in classification [20] and segmentation [26]. Methods for classification usually learn the features of each point first and then extract a global shape feature from the point cloud. Building on classification, the goal of segmentation is to separate the point cloud into several subsets according to the semantic meanings of the points [11]. The combination of local information coding and deep learning can effectively facilitate 3D data analysis.

Point-based methods [4] are better at limiting information loss and have become increasingly popular for point cloud processing. Examples include [3, 26] and [35], of which PointNet [3] is the representative. PointNet [3] uses MLPs and a symmetric function to build the point cloud processing network. However, it ignores the fine-grained features of the point cloud [25], such as the edges and corners of a table or the wings of an airplane. To solve this problem, many scholars have worked on local feature extraction for point cloud targets. PointNet++ [25] applies PointNet recursively on a nested partitioning of the input point set, enhancing the local feature description with context features. However, the geometric relationships among points are not fully explored, since it treats each point independently.

Local information coding can effectively capture the fine-grained features of point clouds [2]. Based on PointNet [3], many improved networks have been put forward to enhance the capture of local features. A superpoint graph structure is designed by Landrieu and Simonovsky [18] to effectively represent the relationship between different object parts. Moreover, SO-Net [19] is proposed to take advantage of local information in point clouds, but its structure is much more complicated than that of PointNet [3]. It is instructive that a point cloud encoding method that helps the deep learning network utilize local information can effectively balance network complexity and performance [28]. However, some parameters of this method are set manually, and an automatic, evolutionary parameter optimization mechanism would improve the network performance.

In Song’s method [28], two parameters, d and N, mainly affect the performance of classification and segmentation. Of these, N is directly related to the point cloud coding structure. N is not easy to change within a complete training and testing process of the network, so an empirical value can be taken. The appropriate setting of d changes with the point cloud fed to the network, and different values of d affect the amount of information in the local feature description matrix of the point cloud. Therefore, when capturing local features of the point cloud, dynamically optimizing and adjusting d until the attribute parameters specific to the dataset are obtained is conducive to improving classification and segmentation performance.

The genetic algorithm is a heuristic optimization algorithm inspired by biological evolution and based on the principles of heredity and gene selection [10]. It is widely used in dynamic programming problems. [27] proposes three linear relaxation based heuristics (LRH) and an evolutionary heuristic that hybridizes a genetic algorithm with a variable neighborhood descent (GA+VND) to study the Dynamic Facility Location Problem with Modular Capacities (DFLPM). A hybrid solution approach that combines a genetic algorithm with the exact dynamic programming procedure (GA-DP) is proposed for studying the time-dependent vehicle routing and scheduling problem with CO2 emissions optimization (TD-VRSP-CO2) [30]. For the dynamic programming problem of parameter d, a genetic algorithm can guide the training and parameter evolution of the network at the cost of a small amount of computational overhead.

This paper proposes an Adaptive Locally-Coded point cloud classification and segmentation Network based on Genetic Algorithm, termed ALCN-GA. The network structure consists of three parts: local feature coding, genetic operations and the point cloud processing network. Local feature coding is designed mainly to capture the fine-grained features of the point cloud and to provide the input interface for the convolution operation. The coding method proceeds as follows: first, the cube search range of local points is determined; then standardization is carried out; finally, a random sorting operation yields a matrix expression in a fixed format. The coding process involves two variable parameters, d and N. Parameter N takes the empirical value from Song’s method [28], and parameter d is the main target of dynamic programming. The genetic operations include coding, selection, crossover and mutation of parameter d, together with the design of an appropriate selection strategy and fitness function. The point cloud processing network of ALCN-GA is a simplified PointNet [3] from which the T-Net layers are removed.

The factors that influence ALCN-GA mainly include the fitness function, the selection strategy and the value of parameter N. The fitness function is a function of the classification or segmentation accuracy, and a function shaped like x^-3 is found to be more suitable. The roulette selection strategy suits the slow convergence of neural networks and is less likely to lead to large jumps. Parameter N mainly affects the coding of the network. A larger value is more likely to increase the local information of the point cloud but blurs fine-grained features, and vice versa. By comparing different N values, combined with Song’s method [28], this factor is controllable.

To verify the rationality of ALCN-GA, several groups of experiments are conducted for a fair comparison with the results of previous studies. The selected datasets are the well-known ModelNet40 [36] and S3DIS [1] datasets. In classification, the overall accuracy and average accuracy are 89.5% and 86.5%, respectively; in segmentation, the overall accuracy and mIoU are 80.34% and 51.05%, respectively. Compared with PointNet, the average accuracy in classification and the mIoU of segmentation are improved by about 10% and 11%, respectively.

The rest of this paper includes a summary of the related works in Section 2, details of the proposed methods in Section 3, experiments in Section 4 and conclusion in Section 5.

2 Related works

This paper mainly involves four areas of research: point cloud local coding, genetic algorithms, deep learning on point clouds and point cloud datasets. The research in this paper draws on the results of many scholars, which systematically guide the subsequent work.

2.1 Point cloud local coding

The local coding method has a wide range of applications. KDD uses kernel density parameters to build a descriptor that codes the 3D spatial information around the feature points, contributing to the strategy of adapting different matching indexes under different resolutions [34]. A method of manipulating local folding has been designed to decode the primordial folding pattern [21]. Neural Image Compression (NIC) combines global and local visual information for neural image compression and the analysis of gigapixel histopathological images [29]. It can be seen that local information coding has an obvious effect on capturing the fine-grained features of data.

2.2 Genetic algorithm

Heuristic algorithms are mainly used to solve parameter optimization problems [31]. Common heuristic algorithms include the Genetic Algorithm (GA) [15], Particle Swarm Optimization (PSO) [33], Ant Colony Optimization (ACO) [6], and others. Among them, the genetic algorithm was proposed by Holland on the basis of Darwin’s theory of natural selection. [29] and [23] use genetic algorithms for medical image encryption; [9] proposes an improved genetic algorithm based on biased random keys for combinatorial optimization sequencing. In addition, the genetic algorithm is applied to the sequencing problem of a mixed-model assembly line, thus reducing energy consumption [32]. With the development of sensing technology, the demand for processing large amounts of data is increasing, and the combination of genetic algorithms and neural networks is an important method. [24] applies the combination of a genetic algorithm and a convolutional neural network to adaptive image analysis, and the accuracy is improved compared with transfer learning. [12] uses a multi-objective genetic algorithm to design a radial basis function neural network to classify the windows of GPR. For point cloud deep learning networks, the genetic algorithm can also be used to optimize relevant parameters.

2.3 Deep learning on point clouds

With the improvement of GPU manufacturing, deep learning methods are gradually being applied to point cloud processing. For example, the well-known PointNet leverages a simple combination of MLPs and max pooling to consume point clouds directly [3]. Many subsequent scholars have extended and improved on this basis and put forward further works [16, 35]. This type of approach is effective on fixed datasets, but many 3D scenarios also require accommodating varying point cloud data. Meanwhile, attention should be paid to the fine-grained features of the input point cloud. Song et al. propose a local information coding method that captures the fine-grained features of the point cloud [28].

2.4 Point cloud datasets

Point cloud datasets are used to evaluate the performance of point cloud deep learning methods [11]. Applications of point cloud datasets mainly include 3D shape classification and 3D point cloud segmentation. The common datasets ModelNet40 and ShapeNet [36] are synthetic datasets, which are often used to test the accuracy of classification algorithms. Segmentation datasets, such as S3DIS [1] and ScanNet [5], are usually acquired by different types of sensors and are often used to develop target segmentation algorithms (Table 1).

Table 1
Point cloud datasets

Datasets	Year	Samples	Scans	Classes	Channels	Application
ModelNet40 [36]	2015	12311	-	40	3	Classification
ShapeNet [36]	2015	51190	-	270	3	Segmentation
S3DIS [1]	2017	-	272	13	9	Segmentation
ScanNet [5]	2017	-	1513	20	5	Segmentation

3 Method

Inspired by Song’s method [28], a point cloud coding method that can contain a variety of information is implemented and used as part of the ALCN-GA architecture (as shown in Fig. 1). The point cloud data, including size and RGB information (Fig. 1(a)), are processed by the coding operation to obtain a regular matrix representation (Fig. 1(b)). Meanwhile, an appropriate genetic algorithm is designed to adjust the coding parameters adaptively, so that the coding parameters match the characteristics of the dataset (Fig. 1(c)).

Fig. 1

The overview of ALCN-GA. (a) Raw point cloud. (b) Local information coding. (c) GA optimization mechanism. (d) Simple PointNet.

PointNet implements point cloud processing through clever design [3]. Referring to the classification and segmentation network in PointNet, this paper simplifies and designs the interface of the convolutional layer, which can realize the processing of the locally coded point cloud (Fig. 1(d)).

3.1 Local feature coding method

Point cloud local coding transforms the features of the points surrounding a given point into a matrix expression. Studying the local information of a specific point can effectively capture the fine-grained features of the point cloud. For point cloud deep learning, a fixed data input format should be set, and the robustness of the data should also be considered. To meet these requirements, the coding of local point cloud features is designed in the following three parts.

3.1.1 Point cloud local search

For any set of input 3D point cloud data, local feature points are selected first, and then the points in their vicinity are searched. There are many ways to search; common methods include the octree [13] and the KD tree [14]. In theory, search efficiency can be improved by avoiding excessive distance calculations and fetching points with Boolean operations and indexing.

Searching the point cloud locally with a cube area avoids distance calculation and allows direct conditional indexing. Set the input raw point cloud (Fig. 1(a)) as K = {κ_i|i = 1, 2, ..., n}, and select a local point, κ_i (x_i, y_i, z_i). Define a cube search area $ℤ (d)$ with side length d, and determine the expression of the cube area with κ_o (x_o, y_o, z_o) as the center.

$ℤ (d) = \{(x, y, z) \mid x \in [x_{o} - χ, x_{o} + χ],\ y \in [y_{o} - χ, y_{o} + χ],\ z \in [z_{o} - χ, z_{o} + χ],\ χ = \frac{d}{2}\}$ (1)

A specific number of points (N points) are randomly selected from this search range (Fig. 1(b)). After selecting the target point, the additional information of the point cloud, such as color, normal vector and scalar field, can be stored in the vector set.
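The cube search of Eq. (1) can be sketched with Boolean indexing as follows. This is a minimal illustration in NumPy under stated assumptions, not the authors' implementation: the function name `cube_search`, its arguments, and the choice to return fewer than N points when the cube is sparsely populated are all hypothetical.

```python
import numpy as np

def cube_search(points, center, d, n_samples, rng=None):
    """Randomly pick up to n_samples points inside the cube of side d around center.

    points: (n, C) array whose first three columns are x, y, z; any extra
            columns (color, normals, ...) are carried along unchanged.
    center: (3,) array, the cube center (x_o, y_o, z_o).
    """
    rng = np.random.default_rng() if rng is None else rng
    chi = d / 2.0
    # Per-axis interval test from Eq. (1); no Euclidean distance is computed.
    inside = np.all(np.abs(points[:, :3] - center) <= chi, axis=1)
    candidates = points[inside]
    # Assumption: if the cube holds fewer than n_samples points, return them all.
    take = min(n_samples, len(candidates))
    idx = rng.choice(len(candidates), size=take, replace=False)
    return candidates[idx]
```

Because whole rows are indexed, additional per-point attributes stored after the coordinates stay attached to the selected points.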

3.1.2 Data format standardization

The input to a point cloud deep learning network is a set of fixed-size matrices. However, the points within the local search range cannot always be formed into such a matrix: only N points are chosen for coding, and the search may yield an insufficient number of points. A fill function is therefore defined: f : α (p₁, p₂, ..., p_a) → β (p₁, p₂, ..., p_b), a ≤ b, where for any N > 0 there may exist a ≤ b such that the coding matrix is missing information elements. Then β (p₁, p₂, ..., p_b) = α (p₁, p₂, ..., p_a) ∘ ∂ (o_{1,1}, o_{2,1}, ..., o_{b,b}) standardizes the coding matrix for easy calculation (Fig. 1(b)). Here, ∂ stands for the filling operation and ∘ stands for the matrix transformation operation; ∂ is defined as follows:

$\partial (o_{1, 1}, o_{2, 1}, ..., o_{b, b}) = \begin{bmatrix} O_{1, 1} & \dots & O_{1, a} \\ \vdots & \ddots & \vdots \\ O_{a, 1} & \dots & O_{a, a} \\ O_{a + 1, 1} & \dots & O_{a + 1, a} \\ \vdots & \ddots & \vdots \\ O_{b, 1} & \dots & O_{b, b} \end{bmatrix}$ (2)

Where O is the transformation factor in the ∂ matrix.

3.1.3 Shuffle the order

To avoid a fixed ordering of the data and to increase the robustness of training, the algorithm shuffles and reorganizes the standardized data while keeping the number of point cloud groups unchanged (Fig. 1(b)). The shuffle function is g : β (p₁, p₂, ..., p_b) → η (p₃, p₂, ..., p_{b-1}), which is also a symmetric function whose purpose is to reorganize the matrix [3].
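The standardization and shuffle steps (the functions f and g) can be illustrated together. The sketch below is a simplified assumption rather than the paper's exact operator: it pads with zero rows, whereas the text only specifies a generic fill matrix ∂, and the name `pad_and_shuffle` is hypothetical.

```python
import numpy as np

def pad_and_shuffle(local_points, b, rng=None):
    """Pad an (a, C) block to (b, C) rows, then permute the row order.

    Zero padding is an assumption made for this sketch; the fill values
    used by the original method are not specified here.
    """
    rng = np.random.default_rng() if rng is None else rng
    a, channels = local_points.shape
    if a > b:
        raise ValueError("expected a <= b")
    coded = np.zeros((b, channels), dtype=local_points.dtype)
    coded[:a] = local_points               # f: alpha -> beta (fill to fixed size)
    return rng.permutation(coded, axis=0)  # g: beta -> eta (row shuffle, returns a copy)
```

The output always has b rows, so every local block presents the same matrix shape to the network regardless of how many points the cube search returned.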

3.2 Genetic operations

Combining the characteristics of local coding parameters, some genetic evolution operations are designed to continuously evolve better coding parameters (Fig. 1 (c)).

3.2.1 Crossover

Considering that the object of optimization is a series of continuous real numbers, a random breakpoint method is used to cross chromosomes (Fig. 2). The real number crossover process is as follows:

Judge whether crossover operation occurs;

Random selection of parents and random generation of intersection points;

Progeny chromosomes are generated from parental chromosome data.

Fig. 2

Real number cross processing.
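The three crossover steps can be sketched as a single-point crossover on real-valued chromosomes. This is an illustrative sketch only: the function name `crossover`, the representation of a chromosome as a 1D array of real genes, and passing the two parents in as arguments (instead of drawing them from a population) are assumptions.

```python
import numpy as np

def crossover(parent_a, parent_b, rate, rng=None):
    """Single-point crossover of two equal-length real-valued chromosomes."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(parent_a, dtype=float)
    b = np.asarray(parent_b, dtype=float)
    if a.shape != b.shape or a.ndim != 1 or a.size < 2:
        raise ValueError("parents must be 1D, equal length, with at least 2 genes")
    # Step 1: decide whether crossover occurs at all.
    if rng.random() >= rate:
        return a.copy(), b.copy()
    # Step 2: random breakpoint strictly inside the chromosome.
    cut = int(rng.integers(1, a.size))
    # Step 3: offspring keep one parent's head and take the other's tail.
    child_a = np.concatenate([a[:cut], b[cut:]])
    child_b = np.concatenate([b[:cut], a[cut:]])
    return child_a, child_b
```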

3.2.2 Mutation

When optimizing the local search parameters, the mutation operation has two main functions: (1) improving the convergence speed; (2) preventing the search from falling into a local optimum. At the initial stage of network model training, the mutation operation can search for the most appropriate parameters so that the model converges quickly. At the end of training, new local search parameters are constantly mutated in order to jump out of local optima and obtain a better network model. The real-coded mutation process is as follows (Fig. 3):

Judge whether mutation operation occurs;

The location and amount of mutations are randomly selected;

Any new progeny can mutate.

Fig. 3

Mutation processing.
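A real-coded mutation following these steps might look like the sketch below. The Gaussian perturbation, the `scale` parameter, and clipping to `[lower, upper]` are assumptions introduced for illustration; the text only states that the location and amount of mutation are random.

```python
import numpy as np

def mutate(chromosome, rate, lower, upper, scale=0.1, rng=None):
    """Perturb one randomly chosen gene with probability `rate`."""
    rng = np.random.default_rng() if rng is None else rng
    child = np.array(chromosome, dtype=float)  # copy, parent is left untouched
    # Step 1: decide whether mutation occurs.
    if rng.random() >= rate:
        return child
    # Step 2: random location and random amount (Gaussian step is an assumption).
    pos = int(rng.integers(0, child.size))
    step = rng.normal(0.0, scale * (upper - lower))
    # Keep the mutated gene inside the allowed parameter range.
    child[pos] = np.clip(child[pos] + step, lower, upper)
    return child
```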

3.2.3 Selection

The purpose of the selection operation is to filter out excellent local search parameters from the population. Commonly used selection strategies include roulette, average selection, perfect retention and so on [8]. This paper uses the roulette selection strategy. The roulette method raises the probability that individuals with strong survivability are selected, and is a random sampling method with replacement. Offspring are randomly selected according to the probability with which their parents appear in the population. The main steps are as follows:

Calculate the fitness of each individual; $fitness (i), i = 1, 2, ..., popsize$ (3)

Calculate the probability of individuals being selected to the next generation; $p_{i} = \frac{fitness (i)}{\sum_{j = 1}^{popsize} fitness (j)}$ (4)

Calculate the probability of individual accumulation; $P_{i} = \sum_{k = 1}^{i} p_{k} = \frac{\sum_{k = 1}^{i} fitness (k)}{\sum_{j = 1}^{popsize} fitness (j)}$ (5)

A random number ξ between 0 and 1 is generated. If P_{i-1} ≤ ξ ≤ P_i is satisfied, individual i is selected.

According to the cumulative selection probability of each generation of the population, the random number method is used to select the offspring into the next generation population, which requires the design of a suitable fitness function.
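The selection steps in Eqs. (3)-(5) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation, and the name `roulette_select` is hypothetical. The sketch treats the supplied values as non-negative selection weights exactly as Eq. (4) does; how the fitness values are oriented before selection is left to the caller.

```python
import numpy as np

def roulette_select(fitness, n_select, rng=None):
    """Sample n_select individual indices with replacement, following Eqs. (4)-(5)."""
    rng = np.random.default_rng() if rng is None else rng
    fitness = np.asarray(fitness, dtype=float)
    if np.any(fitness < 0) or fitness.sum() <= 0:
        raise ValueError("fitness values must be non-negative with a positive sum")
    p = fitness / fitness.sum()        # Eq. (4): selection probability p_i
    cumulative = np.cumsum(p)          # Eq. (5): cumulative probability P_i
    xi = rng.random(n_select)          # random numbers in [0, 1)
    # Find i such that P_{i-1} <= xi < P_i.
    idx = np.searchsorted(cumulative, xi, side="right")
    # Guard against floating-point round-off in the last cumulative value.
    return np.minimum(idx, fitness.size - 1)
```

Sampling with replacement means the same individual can be selected several times, which matches the description of roulette selection above.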

Obviously, the fitness function should be related to the classification or segmentation accuracy of the point cloud deep learning network model. The general accuracy calculation is a normalized expression. $Acc = \frac{NC}{NA} \times 100\%$ (6)

Where Acc represents the accuracy index between 0 and 1, NC represents the number of correct subjects in the experiment, and NA represents the total number of subjects participating in the experiment. Since the accuracy values are all less than 1, the fitness function should be modified to magnify the differences within the population, following the rules below:

Rule 1: In order to distinguish individual fitness over a wide range, the function should be monotonically decreasing on the interval from 0 to 1.

In theory, the accuracy increases with the number of iterations, which reflects the dual role of neural network evolution and genetic evolution. To make the iteration converge as fast as possible, the fitness function should give the population a greater degree of discrimination in the early stage. For example, of the functions x^-1 and x^-3 in Fig. 4, x^-3 clearly has the stronger distinguishing effect.
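As a worked illustration (the accuracy values 0.5 and 0.6 are chosen here only as an example and are not taken from the experiments), compare how the two candidate functions separate two individuals:

$0.5^{-1} = 2.00, \quad 0.6^{-1} \approx 1.67, \quad \frac{0.5^{-1}}{0.6^{-1}} = 1.20$

$0.5^{-3} = 8.00, \quad 0.6^{-3} \approx 4.63, \quad \frac{0.5^{-3}}{0.6^{-3}} = 1.2^{3} \approx 1.73$

The same accuracy gap of 0.1 thus yields a fitness ratio of about 1.73 under x^-3 versus 1.20 under x^-1.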

Fig. 4

The fitness and accuracy relation of different selection functions.

In the convergence stage, the differences within the population are not obvious, but the network parameters should not be allowed to become identical. The diversity of parameters should be increased, and variation should be maintained near the optimization results. The fitness function should therefore change slowly but perceptibly at the end, and the following rule should also be met:

Rule 2: There is an iterative wait-and-see number ψ, which satisfies: $v_{1, 2} = \begin{cases} (1 \pm ω \cdot ɛ), & Counter < ψ \\ μ_{1} + Δ,\ μ_{2} + Δ, & Counter \geq ψ \end{cases}$ (7)

Where v_{1,2} represents the evolutionary parameters after adjustment, μ_{1,2} represents the evolutionary parameters before adjustment, ω represents the adjustment factor, ɛ represents a random number between 0 and 1, Δ represents the adjustment quantity, and Counter represents the counter.

Referring to the functions in Fig. 5, the fractional function is found to handle this kind of problem well. In this paper, fitness (x) = x^-3 is used as the fitness function (Fig. 4).

Fig. 5

Evolutionary terminal regulatory mechanisms.

The chromosome coding is combined with the empirical parameters mentioned above to set the upper and lower limits. The recording operation should be consistent with the network structure and the structural characteristics of the genetic algorithm, so as to achieve the purpose of marking the optimal chromosome.

3.3 Point cloud processing network

The core of PointNet consists of a symmetric function (max pooling) and MLPs [3]. However, this method does not capture local features accurately. This paper therefore follows the latter part of PointNet and improves the local feature extraction part (Fig. 1(d)).

Define Eϖ as the point cloud of the input network. Combined with the coding operation, the expression is as follows: $E ϖ = g (f (\{κ_{1}, κ_{2}, ..., κ_{n}\}))$ (8)

Where parameter n represents the number of raw points, κ_i represents the input point cloud, f and g are point cloud coding operations.

The symmetric function here is max, which means that the output is the same for any arrangement of the input. Therefore, the PointNet mechanism can be summarized by the following formula: $k = φ_{k} (\max (φ_{h} (E ϖ)))$ (9)

Where φ_k and φ_h are MLPs. The network evaluates the result k to determine the category of the input point cloud. Segmentation is realized by combining the output of φ_h and the output of φ_g into a new vector, which is sent into the MLPs of φ_m to obtain new evaluation parameters that guide segmentation.

$m = φ_{m} (\sum (φ_{h} (*), φ_{g} (*)))$ (10)

Where ∑ represents vector concatenation and * represents omitted parameters.

By taking the maximum value of m parameter and comparing it with the label, the predicted classification results are obtained.
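The classification path of Eq. (9) can be illustrated with a small NumPy sketch that shows why max pooling makes the output independent of point order. The layer sizes, random weights, ReLU activations and helper names (`init_mlp`, `mlp`, `classify_scores`) are assumptions for demonstration and do not reproduce the trained network.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random weights for a stack of dense layers (illustrative only)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Apply dense layers with ReLU to each row of x independently."""
    for w, b in layers:
        x = np.maximum(x @ w + b, 0.0)
    return x

def classify_scores(coded_points, phi_h, phi_k):
    """k = phi_k(max(phi_h(E))): shared per-point MLP, max pool, then classifier MLP."""
    per_point = mlp(coded_points, phi_h)    # (n, F) per-point features
    global_feature = per_point.max(axis=0)  # symmetric function over points
    return mlp(global_feature[None, :], phi_k)[0]

rng = np.random.default_rng(0)
phi_h = init_mlp([3, 64, 128], rng)         # per-point feature extractor
phi_k = init_mlp([128, 64, 40], rng)        # 40 outputs, as in ModelNet40
cloud = rng.normal(size=(256, 3))
scores = classify_scores(cloud, phi_h, phi_k)
shuffled_scores = classify_scores(rng.permutation(cloud, axis=0), phi_h, phi_k)
```

Here `scores` and `shuffled_scores` coincide, because reordering the rows of the input only reorders the rows of the per-point features before the maximum is taken.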

4 Experiment

The experiment consists of three parts: 1) Materials and Settings; 2) 3D Object Classification Experiments; 3) 3D Semantic Segmentation Experiments.

4.1 Materials and settings

Datasets ModelNet40 [36] and S3DIS [1] are used in this experiment. ModelNet40 contains 12311 samples for 3D classification, and S3DIS has 272 scans for 3D semantic segmentation (Table 1).

The setting of algorithm parameters should be combined with the point cloud deep learning network. In order to facilitate comparative experiments, the number of cross offspring in each generation should be kept at 2-3, and the number of mutant offspring should be kept at 1-2.

In these experiments, TensorFlow is used to build the deep learning network, with programming done in a Python 3.6 environment. The hardware environment for the experiments includes an Intel® Core(TM) i7-10875H CPU @ 2.30 GHz, an NVIDIA GeForce RTX 2070 and 32 GB of RAM.

4.2 3D object classification experiments

Experimental design. For the 3D object classification task, Algorithms 1 and 2 and the simple PointNet are used to build an adaptive local coding network. The ModelNet40 dataset is used to test the performance of ALCN-GA. For a fair comparison with previous methods, the number of generations is set to 100 and the population size to 5, so that each group reaches 500 epochs. To give full play to the classification performance of the proposed algorithm, the crossover probability is set to 0.6 and the mutation probability to 0.2.

Briefly, local coding points are random feature points included in the search cube. The number of local coding points has a great influence on the performance and time cost of the network. Too few coding points are not conducive to the expression of fine-grained features, while too many points will overwrite fine-grained features. In the classification experiment, the default value of N is 2, and N = 1, 3, 4 and 5 are also compared.

To evaluate the experimental results, the evaluation criteria δ and Δ are used, defined as follows: $δ (i) = \frac{1}{M_{c}} \sum_{j = 1}^{M} \frac{A_{ij}}{J_{ij}}$ (11)

$Δ (i) = \frac{T_{i}}{A_{i}}$ (12)

Where T_i represents the total number of target objects of class j in generation i, A_ij represents the correctly classified number of class j in generation i, M_c represents the total number of classes, Δ is the total correct rate, and δ is the average correct rate. Obviously, the larger the two parameters Δ and δ are, the better the experimental result will be. Therefore, in combination with the algorithm and the experimental characteristics, δ, which better represents the quality of the experimental results, is used as the criterion for designing the fitness function.

The form of the fitness function has a great influence on the selection effect within the population. The more obvious the difference in individual fitness, the greater the degree of differentiation. To increase the survival adaptability of excellent individuals as much as possible, the method tries to ensure the same accuracy index. Combining Rule 1 and Rule 2, the fitness function is designed in the following form.

$fitness (i) = \frac{1}{(\frac{1}{M_{c}} \sum_{j = 1}^{M} \frac{T_{ij}}{A_{ij}})^{3}}$ (13)

The smaller the fitness function value is, the more advantageous the individual is in the competition, and the better the result will be.

Results analysis. To demonstrate the performance of ALCN-GA, different fitness functions and different selection strategies are set, and most of the results are better than those of PointNet [3] and Song’s method [28].

Fig. 6 shows that different fitness functions have different effects on the performance of the ALCN-GA algorithm. Fig. 6(a) presents the standard experimental results. Fig. 6(b) shows the comparison between fitness Form I and Form II: Form I is more conducive to the end-stage optimization of ALCN-GA, whereas Form II is more conducive to the rapid convergence of the network. Fig. 6(c) and (d) show the comparison of fitness Forms I, III and IV. It can also be seen that Form I is superior to Form IV, while Form III is superior to Form IV. This set of experiments supports Rule 1 and Rule 2.

Fig. 6

Model evolution with different fitness function forms. (a) Baseline (Form I). (b) Form II and baseline comparison. (c) Form III and baseline comparison. (d) Form IV and baseline comparison.

Fig. 7 shows that the number of coding points has a slight impact on the performance of the ALCN-GA algorithm. Fig. 7(a) compares the case of N = 1 with the case of N = 2: a single point cannot represent local characteristics, whereas a matrix containing fine-grained features can be obtained when N = 2. In Fig. 7(b), (c) and (d), which compare N = 2 with N = 3, 4 and 5, N = 2 is slightly better than the other settings, because increasing the number of points causes feature information to be covered, which also agrees with Song’s conclusion [28].

Fig. 7

Model evolution with different N values. (a) N=1 and N=2 comparison. (b) N=2 and N=3 comparison. (c) N=2 and N=4 comparison. (d) N=2 and N=5 comparison.

Fig. 8 shows that different selection strategies have different effects on the performance of the ALCN-GA algorithm. It compares the roulette strategy with the all-advantage retention strategy. It can be seen that the roulette strategy is more conducive to the optimization of ALCN-GA.

Fig. 8

Model evolution of different selection strategies.

Table 2 shows that the proposed method can achieve an accuracy rate of 89.5% and an average accuracy rate of 86.5%, which is better than previous results, especially in terms of average accuracy. This is due to the genetic evolution mechanism designed, so that the search parameters can still be adjusted adaptively at the end of the network evolution to improve the accuracy.

Table 2

Comparison of 3D classification results

Method	Input	Views	Avg. class accuracy (%)	Overall accuracy (%)
SPH [17]	Mesh	-	68.2	-
3Dshapenets [7]	Volume	1	77.3	84.7
VoxNet [22]	Volume	12	83.0	85.9
LFD [36]	Image	10	75.5	-
PointNet [3]	Point	1	86.2	89.2
Song’s [28]	Point	1	85.5	89.5
ALCN-GA	Point	1	86.5	89.5

In addition, ALCN-GA makes the network converge quickly, which saves training time and resources.

4.3 3D Semantic segmentation experiments

Experimental design. In this section, the S3DIS dataset [1] is used to test the segmentation performance of ALCN-GA on the basis of the classification experiment, and 6-fold cross-validation is used for evaluation. Semantic segmentation of 3D point clouds is a difficult problem, and the proposed method contributes an improvement toward solving it. For a fair comparison with previous methods, the number of generations is set to 20 and the population size to 2, so that each group reaches 40 epochs. To give full play to the performance of the proposed algorithm, the crossover probability is set to 0.5 and the mutation probability to 0.5. As in the classification experiment, N is set from empirical values [28]; here N = 4.

Overall accuracy and mIoU are generally used to evaluate the segmentation effect. The overall accuracy is defined as follows: $s Δ (i) = \frac{{TC}_{i}}{{AC}_{i}}$ (14)

Where TC_i represents the number of correct predictions in generation i, and AC_i represents the total number of predictions in generation i.

${IoU}_{ij} = \frac{{TC}_{ij}}{{GT}_{ij} + {AC}_{ij} - {TC}_{ij}}$ (15)

$mIoU = \frac{1}{M_{s}} \sum_{j = 1}^{M} {IoU}_{ij}$ (16)

Where TC_ij represents the number of correct predictions of class j in generation i, AC_ij represents the total prediction number of class j in generation i, GT_ij represents the ground truth number of class j in generation i, M_s represents the total number of classes, and IoU_ij represents the IoU value of class j in generation i.
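Equations (15) and (16) can be computed directly from per-class counts. The following is a minimal sketch; the function name `miou` and the decision to skip classes with an empty union (neither predicted nor present in the ground truth) are assumptions for illustration.

```python
import numpy as np

def miou(tc, ac, gt):
    """Per-class IoU (Eq. 15) and their mean (Eq. 16) for one generation.

    tc: correctly predicted points per class
    ac: predicted points per class
    gt: ground-truth points per class
    """
    tc, ac, gt = (np.asarray(v, dtype=float) for v in (tc, ac, gt))
    union = gt + ac - tc
    iou = np.full(tc.shape, np.nan)
    valid = union > 0                  # assumption: ignore classes with empty union
    iou[valid] = tc[valid] / union[valid]
    return iou, float(np.nanmean(iou))
```

For example, with counts tc = [8, 3], ac = [10, 5] and gt = [10, 4], the per-class IoU values are 8/12 and 3/6, giving an mIoU of about 0.583.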

Here, sΔ (i) is the total accuracy rate. Obviously, the larger the parameter, the better the result. Therefore, this test combines Algorithms 1 and 2 with the experimental characteristics, and uses sΔ (i) as the criterion for designing the fitness function. At the same time, the mIoU of the results of each generation is calculated as the evaluation index. Similar to the classification experiment, the fitness function used in the experiment is expressed as follows: $fitness (i) = \frac{1}{(\frac{{TC}_{i}}{{AC}_{i}})^{3}}$ (17)

Results analysis. Through the segmentation experiment, six graphs of the relationship between the number of generations and accuracy are generated. It can be seen from Fig. 9(a)-(f) that ALCN-GA accelerates the training of the segmentation network model. Of the six experiments, only those in Fig. 9(d) and (e) do not converge by the end of the experiment, while the other four converge before the predetermined number of generations. This indicates that the designed genetic evolution mechanism can adaptively adjust the search parameters at the initial stage to obtain an appropriate training network model. At the end, the network performance can be continuously enhanced and the segmentation effect improved.
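Equation (17) is simply the overall accuracy of Eq. (14) raised to the power -3. A minimal sketch, with the hypothetical function name `segmentation_fitness` and an explicit guard against zero counts added as an assumption:

```python
def segmentation_fitness(tc_i, ac_i):
    """Eq. (17): fitness(i) = (TC_i / AC_i) ** -3."""
    if tc_i <= 0 or ac_i <= 0:
        # Guard added for this sketch: the expression is undefined at zero accuracy.
        raise ValueError("counts must be positive")
    accuracy = tc_i / ac_i      # Eq. (14): overall accuracy
    return accuracy ** -3
```

An accuracy of 0.5 gives a fitness of 8, while an accuracy of 0.8 gives roughly 1.95, so the value shrinks as accuracy improves.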

Fig. 9

Model evolution for the 6-fold cross-validation experiments. (a) Segmentation on Test_area_1. (b) Segmentation on Test_area_2. (c) Segmentation on Test_area_3. (d) Segmentation on Test_area_4. (e) Segmentation on Test_area_5. (f) Segmentation on Test_area_6.

Table 3 compares the experimental results of ALCN-GA with those of previous methods. Through evolutionary optimization, the 3D classification accuracy is improved by about 10%, and the mIoU index is improved by about 11% compared with PointNet. Combined with Fig. 10, it can be seen that ALCN-GA produces clear segmentation in a variety of scenarios.

Table 3

Comparison of 3D semantic segmentation results

Method	Average IoU (%)	Overall Accuracy (%)
PointNet [3]	47.71	78.62
Song’s [28]	49.04	79.57
ALCN-GA	51.05	80.34

Fig. 10

Segmentation of different scenes. (a) Segmentation of Conference Room. (b) Segmentation of Lobby. (c) Segmentation of Office. (d) Segmentation of Open Space.

5 Conclusion

The main contribution of this paper is an adaptive local information coding strategy combined with a genetic algorithm, which enhances the 3D object classification and segmentation performance of a point cloud deep learning network. The genetic algorithm is a commonly used dynamic optimization algorithm. Through crossover, mutation and selection operations, the local coding parameters are optimized dynamically, which achieves adaptive evolution. Combined with the simplified PointNet, ALCN-GA adaptively captures the features of the point cloud, and through training and evolution it improves the classification and segmentation of 3D point cloud data. According to the coding parameters and the characteristics of the network, two design rules for fitness functions are summarized. The performance of ALCN-GA is evaluated on the ModelNet40 and S3DIS datasets. The experimental results show that ALCN-GA not only converges quickly in the early stage of training, but also continues to search for better values in the later stage. This result supports the validity of the two rules. In addition, the proposed method achieves an overall accuracy of 89.5% and an average accuracy of 86.5% in classification. Compared with PointNet, the proposed method improves the average classification accuracy by about 10% and the segmentation mIoU by about 11%.

Future studies can focus on using GA to optimize other dynamic PointNet parameters. Moreover, incorrect network outputs could be dynamically fed back to the genetic algorithm to obtain better evolutionary effects. The time consumption of the method also deserves further study.

Conflicts of interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Key R&D Program of China (2018YFB1308400).

Novelty & contribution

This paper adds a model optimization structure on top of Song’s method [28] and PointNet [3]. Based on the output of the network, namely the accuracy, a feedback mechanism is designed to optimize the d parameter and obtain a better learning effect. In addition, this paper introduces the Genetic Algorithm (GA) into the optimization and evolution of the d parameter. It summarizes the design rules of the selection strategy and finds that the form x^-3 is more suitable for selecting the population. Furthermore, this paper designs an adaptive optimization strategy for the crossover and mutation steps. On this basis, the above innovations are tested on the public datasets ModelNet40 [36] and S3DIS [1], and compared with the experiments of Song’s [28], PointNet [3] and others. The experimental data show that progress has been made.

In this paper, the first author is mainly responsible for the experiment and the writing of the manuscript. The corresponding author is mainly responsible for sorting out the ideas, reviewing the paper and answering questions. The third author is responsible for assisting the first author to complete the experiment and revision of the paper.

References

1. Armeni, Sener, Zamir A.R., Jiang, Brilakis, Fischer and Savarese, 3d semantic parsing of large-scale indoor spaces, In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), 1534–1543.

2. Cao, Wang, Zhang, Jin and Vasilakos A.V., Gchar: An efficient group-based context-aware human activity recognition on smartphone, Journal of Parallel and Distributed Computing 118 (2018), 67–80.

3. Charles R.Q., Su, Kaichun and Guibas L.J., Pointnet: Deep learning on point sets for 3d classification and segmentation, In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 77–85.

4. Cheng, Chen, He, Liu and Bai, Pra-net: Point relation-aware network for 3d point cloud analysis, IEEE Transactions on Image Processing 30 (2021), 4436–4448.

5. Dai, Chang A.X., Savva, Halber, Funkhouser and Nießner, Scannet: Richly-annotated 3d reconstructions of indoor scenes, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 2432–2443.

6. Dorigo, Maniezzo and Colorni, Ant system: optimization by a colony of cooperating agents, IEEE Transactions on Systems Man and Cybernetics, Part B (Cybernetics) 26(1) (1996), 29–41.

7. Engelcke, Rao, Wang D.Z., Tong C.H. and Posner, Vote3deep: Fast object detection in 3d point clouds using efficient convolutional neural networks, In 2017 IEEE International Conference on Robotics and Automation (ICRA) (2017), 1355–1361.

8. Gao, Xie, Wang, Zhang, Chen and Wang, Predicting human body composition using a modified adaptive genetic algorithm with a novel selection operator, Plos One 15 (2020), 1–23.

9. Gonçalves J.F. and Resende M.G.C., Biased random-key genetic algorithms for combinatorial optimization, J Heuristics 17 (2011), 487–525.

10. Grefenstette J.J., Genetic algorithms and machine learning. In Proceedings of the Sixth Annual Conference on Computational Learning Theory, COLT ’93, page 3–4, New York, NY, USA, (1993). Association for Computing Machinery.

11. Guo, Wang, Hu, Liu and Bennamoun, Deep learning for 3d point clouds: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence (2020), 1–1.

12. Harkat, Ruano A.E., Ruano M.G. and Bennani S.D., Gpr target detection using a neural network classifier designed by a multi-objective genetic algorithm, Applied Soft Computing 79 (2019), 310–325.

13. Hornung, Wurm K.M., Bennewitz, Stachniss and Burgard, Octomap: an efficient probabilistic 3d mapping framework based on octrees, Autonomous Robots 34(3) (2013), 189–206.

14. […], Peng and Forrest, Mean shift denoising of point-sampled surfaces, The Visual Computer 22 (2006), 147–157.

15. Holland J.H., Adaptation in Natural and Artificial Systems: An Introductory Analysis With Applications to Biology and Artificial Intelligence[M], Control (1975).

16. Joseph-Rivlin, Zvirin and Kimmel, Momen(e)t: Flavor the moments in learning to classify shapes, In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops (2019).

17. Kazhdan, Funkhouser and Rusinkiewicz, Rotation invariant spherical harmonic representation of 3D shape descriptors, In Symposium on Geometry Processing (2003).

18. Landrieu and Simonovsky, Large-scale point cloud semantic segmentation with superpoint graphs, In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018), 4558–4567.

19. […], Chen B.M. and Lee G.H., So-net: Self-organizing network for point cloud analysis, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018).

20. Liu, Song, Tian, Ji, Sung, Wen, Zhang, Song and Gozho, Vbnet: Voxel-based broad learning network for 3d object classification, Applied Sciences 10(19) (2020), 6735.

21. Matsuda, Gotoh, Adachi, Inoue and Kondo, Computational analyses decipher the primordial folding coding the 3d structure of the beetle horn, Scientific Reports 11(1) (2021).

22. Maturana and Scherer, Voxnet: A 3d convolutional neural network for real-time object recognition, In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2015), 922–928.

23. Pareek and Patidar, Medical image protection using genetic algorithm operations, Soft Computing 20 (2014).

24. Połap, An adaptive genetic algorithm as a supporting mechanism for microscopy image analysis in a cascade of convolution neural networks, Applied Soft Computing 97 (2020), 106824.

25. […], Yi, Su and Guibas, Pointnet++: Deep hierarchical feature learning on point sets in a metric space, In Proceedings of the Advances in Neural Information Processing Systems (2017), 5099–5108.

26. Ren, Wang and Xu, An innovative segmentation method with multi-feature fusion for 3d point cloud, Journal of Intelligent and Fuzzy Systems 38(1) (2019), 1–9.

27. Silva, Aloise, Coelho L.C. and Rocha, Heuristics for the dynamic facility location problem with modular capacities, European Journal of Operational Research 290(2) (2021), 435–452.

28. Song, Gao, Li and Shen, A novel point cloud encoding method based on local information for 3d classification and segmentation, Sensors 20(9) (2020), 2501.

29. Tellez, Litjens, van der Laak and Ciompi, Neural image compression for gigapixel histopathology image analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (2021), 567–578.

30. Xiao and Konak, A genetic algorithm with exact dynamic programming for the green vehicle routing and scheduling problem, Journal of Cleaner Production 167 (2017), 1450–1463.

31. Yan, Zhao, Hu and Zeng, Multimodal optimization problem in contamination source determination of water supply networks, Special Issue on Collaborative Learning and Optimization based on Swarm and Evolutionary Computation 47 (2019), 66–71.

32. Zhang, Xu and Zhang, A multi-objective cellular genetic algorithm for energy-oriented balancing and sequencing problem of mixed-model assembly line, Journal of Cleaner Production 244 (2019), 118845.

33. Zhang and Ma, Hybrid fuzzy clustering method based on fcm and enhanced logarithmical pso (elpso), Computational Intelligence and Neuroscience 2020 (2020), 1–12.

34. Zhang, Li, Guo and Zhang, Kdd: A kernel density based descriptor for 3d point clouds, Pattern Recognition 111 (2021), 107691.

35. Zhao, Jiang, Fu C.-W. and Jia, Pointweb: Enhancing local neighborhood features for point cloud processing, In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019).

36. […], Song, Khosla, Yu, Zhang, Tang and Xiao, 3d shapenets: A deep representation for volumetric shapes, In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015), 1912–1920.

