Abstract
Local information coding helps capture the fine-grained features of the point cloud. The point cloud coding mechanism should be applicable to the point cloud data in different formats. However, the local features of the point cloud are directly affected by the attributes, size and scale of the object. This paper proposes an Adaptive Locally-Coded point cloud classification and segmentation Network coupled with Genetic Algorithm(ALCN-GA), which can automatically adjust the size of search cube to complete network training. ALCN-GA can adapt to the features of 3D data at different points, whose adjustment mechanism is realized by designing a robust crossover and mutation strategy. The proposed method is tested on the ModelNet40 dataset and S3DIS dataset. Respectively, the overall accuracy and average accuracy is 89.5% and 86.5% in classification, and overall accuracy and mIoU of segmentation is 80.34% and 51.05%. Compared with PointNet, average accuracy in classification and mIoU of segmentation is improved about 10% and 11% severally.
Introduction
Local information coding plays a key role in 3D point cloud processing, especially in classification [20] and segmentation [26]. Methods for classification usually learn the features of each point first and then extract a global shape from point cloud. Based on classification, the goal of segmentation is to separate it into several subsets according to the semantic meanings of points [11]. The combination of local information coding and deep learning can effectively facilitate 3D data analysis.
Point-based methods [4] are more conducive to controlling obvious information loss, and is an increasingly popular point cloud processing method recently. There are [3, 26], and [35], where [3] is the representative of this class of methods. PointNet [3] uses MLPs and a symmetric function to build the point cloud processing network. However, it ignores the fine-grained features of the point cloud [25], such as the edges and corners of a table and the wings of an airplane. In order to solve this problem, many scholars have done a lot of work on local feature extraction of point cloud targets. PointNet++ [25] applies PointNet recursively on a nested partitioning of the input point set, enhancing the local feature description with context features. But the geometric relationships among points are not fully explored, since it treats each point independently.
Local information coding can effectively capture the fine-grained features of point clouds [2]. Based on PointNet [3], many improved networks have put forward to enhance the capture of local features. A superpoint graph structure is desighed by Landrieu and Simonovsky [18] to effectively represent the relationship between different object parts. Moreover, a SO-Net [19] is proposed to take advantage of local information in point clouds, but this structure becomes very complicated compared to Pointnet [3]. It is instructive that an point cloud encoding method that facilitates the deep learning network to utilize the local information can effectively balance network complexity and performance [28]. However, some parameters of this method are set manually, and it will improve the network performance if an automatic evolution parameter optimization mechanism is designed.
In Song’s method [28], two parameters, d and N, mainly affect the performance of classification and segmentation. Among them, N is directly related to the point cloud coding structure. In a complete training and testing process of the network, N is not easy to change, so the empirical value can be taken. The setting of parameter d will change with the change of the point cloud of the input network, and different d will affect the amount of information of the local feature description matrix of the point cloud. Therefore, in the process of capturing local features of the point cloud, dynamic optimization and adjustment of d parameter until the specific attribute parameters corresponding to the dataset are obtained, which will be conducive to improving classification and segmentation performance.
Genetic algorithm is a heuristic algorithm for optimization inspired by biological evolution based on the principles of heredity and gene selection [10]. It is widely used in dynamic programming problems [27]. Proposes three linear relaxation based heuristics (LRH) and an evolutionary heuristic that hybridizes a genetic algorithm with a variable neighborhood descent (GA+VND) to study the Dynamic Facility Location Problem with Modular Capacities (DFLPM). A hybrid solution approach that combines a genetic algorithm with the exact dynamic programming procedure (GA-DP) is proposed for studying the time-dependent vehicle routing and scheduling problem with CO2 emissions optimization (TD-VRSP-CO2) [30]. For parameter d dynamic programming problems, genetic algorithm can sacrifice a small amount of operation overhead to guide the training and parameter evolution of the network.
This paper proposes an Adaptive Locally-Coded point cloud classification and segmentation Network based on Genetic Algorithm, termed ALCN-GA. The network structure consists of three parts, local feature coding, genetic operations and point cloud processing network. The design of local feature coding is mainly to capture the fine-grained features of point cloud and obtain the input interface of convolution operation. The steps of coding method are as follows: First, the cube search range of local points is determined, then the standardization is carried out, and finally the random sorting operation is carried out to obtain the matrix expression of fixed format. The coding process produces two variable parameters, d and N. Wherein, parameter N realizes the empirical value through Song’s method [28], and parameter d is the main goal of dynamic programming. Genetic operations includes coding, selection, crossover and mutation of parameter d, and the design of appropriate selection strategy and fitness function. The point cloud processing network of ALCN-GA is a simplified PointNet [3], which removes the network layer of T-Net.
The influence parameters of ALCN-GA mainly include fitness function, selection strategy and the value of parameter N. The fitness function is the form of the change of classification and segmentation accuracy, among which the function similar to the shape of x-3 is more suitable. The roulette selection strategy is more suitable for the slow convergence of neural networks and thus less likely to lead to large jumps. Parameter N mainly affects the coding of the network. The larger the value, the more likely it is to increase the local information of the point cloud, but it will lead to the blurring of fine-grained features, and vice versa. By comparing different N values, combined with Song’s method [28], this factor is controllable.
In order to verify the rationality of ALCN-GA, several groups of experiments are conducted to compare the results of previous studies fairly. The selected datasets are the well-known ModelNet40 [36] and S3DIS [1] datasets. The overall accuracy and average accuracy is 89.5% and 86.5% in classification respectively, and overall accuracy and mIoU of segmentation is 80.34% and 51.05% respectively. Compared with PointNet, average accuracy in classification and mIoU of segmentation is improved by about 10% and 11%.
The rest of this paper includes a summary of the related works in Section 2, details of the proposed methods in Section 3, experiments in Section 4 and conclusion in Section 5.
Related works
This paper mainly involves four parts of research content: Point cloud local coding, Genetic Algorithm, Deep learning on point clouds and Point cloud Datasets. The work research of this paper refers to the results of many scholars, which systematically guides the follow-up progress.
Point cloud local coding
The local coding method has a wide range of applications. KDD uses kernel density parameters to build a descriptor to code the 3D spatial information around the feature points, contributing to the strategy of adapting different matching indexes under different resolutions [34]. Manipulating local folding method has been designed to decode the primordial folding pattern [21]. Neural Image Compression (NIC) combines global and local visual information for neural image compression and histopathological image analysis of gigapixels [29]. It can be seen that the local information coding method has an obvious effect on the fine-grained feature capture of data.
Genetic algorithm
Heuristic algorithm is mainly used to solve the problem of parameter optimization [31]. Common heuristic algorithms include Genetic Algorithm(GA) [15], Particle Swarm Optimization(PSO) [33], Ant Colony Optimization(ACO) [6], and Others. Among them, Genetic algorithm is proposed by Holland on the basis of Darwin’s theory of natural selection. [29] and [23] use genetic algorithms for medical image encryption; [9] proposes an improved genetic algorithm based on biased random key for combinatorial optimization sequencing. In addition, genetic algorithm is applied to sequencing problem of mixed-model assembly line, thus reducing energy consumption [32]. With the development of sensing technology, the demand for processing large amounts of data is increasing, and the combination of genetic algorithm and neural network is an important method [24]. Applies the combination of genetic algorithm and convolutional neural network to adaptive image analysis, and the accuracy is improved compared with transfer learning [12]. Uses multi-objective genetic algorithm to design a neural network radial basis function to classify the Windows of GPR. For the point cloud deep learning network, genetic algorithm can also be used to optimize relevant parameters.
Deep learning on point clouds
With the improvement of GPU manufacturing level, deep learning methods are gradually applied to point cloud processing. For example, the well-known PointNet, leverages a simple combination of MLPs and max pooling achieves direct consumption of the point clouds [3]. A large number of subsequent scholars have extended and improved on this basis, and put forward futher works [16, 35]. This type of approach is effective for testing fixed datasets, but for many 3D scenarios, it is also necessary to accommodate varying point cloud data. Meanwhile, the fine-grained features of the input point cloud should be cared. Song et al. proposes a local information coding method that captures the fine-grained features of the point cloud [28].
Point cloud datasets
Point cloud dataset is used to evaluate the performance of the point cloud deep learning method [11]. Applications of point cloud datasets mainly include 3D shape classification and 3D point cloud segmentation. Common datasets ModelNet40 and ShapeNet [36] belong to synthetic datasets, which are often used to test the accuracy of classification algorithms. Segmentation datasets, such as S3DIS [1] and ScanNet [5], are usually acquired by different types of sensors, which are often used to develop target segmentation algorithms (Table 1).
Point cloud datasets
Point cloud datasets
Inspired by Song’s method [28], a point cloud coding method that can contain a variety of information is implemented, which is used as a part of the ALCN-GA architecture (as shown in Fig. 1). Wherein, the point cloud data including size and the RGB information (Fig. 1(a)) may be obtained by the coding operation, to obtain a regular matrix representation (Fig. 1(b)). Meanwhile, an appropriate genetic algorithm is designed to adjust the coding parameters adaptively in order to obtain the anticipated coding parameters according to the characteristics of the datasets (Fig. 1(c)).

The overview of ALCN-GA. (a)Raw point cloud. (b)Local information coding. (c) GA optimization mechanism. (d) Simple PointNet.
PointNet implements point cloud processing through clever design [3]. Referring to the classification and segmentation network in PointNet, this paper simplifies and designs the interface of the convolutional layer, which can realize the processing of the locally coded point cloud (Fig. 1(d)).
The so-called point cloud local coding is to transform the features of the surrounding points of a point into matrix expression. Studying the local information of a specific point can effectively capture the fine-grained features of the point cloud. For point cloud deep learning, a fixed data input format should be set, and the robustness of the data should also be considered. In order to realize the expectation, the coding design of local feature of point cloud includes the following three parts.
Point cloud local search
For any set of input 3D point cloud data, first select local feature points, and then search for points in their vicinity. There are many ways to search, the common methods are octree [13], KD tree [14] and so on. Theoretically, avoiding too much distance calculation, using Boolean operation and index method to fetch points can improve the search efficiency.
By using cube area search method to search point cloud locally, it can avoid distance calculation and directly conduct conditional index. Set the input raw point cloud (Fig. 1(a)) as K = {κ
i
|i = 1, 2, ..., n}, and select a local point, κ
i
(x
i
, y
i
, z
i
). Define a cube search area
A specific number of points (N points) are randomly selected from this search range (Fig. 1(b)). After selecting the target point, the additional information of the point cloud, such as color, normal vector and scalar field, can be stored in the vector set.
The data input to the point cloud deep learning network is a set of fixed-size matrix. However, within the search scope of local point cloud, it is impossible to form all the points into a matrix. Only N points are chosen for coding, which resultes in an insufficient number of points. Set the fill function: f : α (p1, p2, ..., p a ) → β (p1, p2, ..., p b ) , a ≤ b where ∀N > 0, ∃ a ≤ b makes the coding matrix miss information elements. So, β (p1, p2, ..., p b ) = α (p1, p2, ..., p a ) ∘ ∂ (o1,1, o2,1, ..., ob,b) can standardize the coding matrix for easy calculation (Fig. 1(b)). Here, ∂ stands for filling operation, ∘ stands for matrix transformation operation, and its computational meaning is as follows:
Where O is the transformation factor in the ∂ matrix.
In order to avoid the consistency of data and increase the robustness of training, the algorithm shuffles and reorganizes the standardized data, but keeps the number of point cloud groups unchanged (Fig. 1(b)). Set the shuffle function: g : β (p1, p2, ..., p b ) → η (p3, p2, ..., pb-1), which is also a symmetric function whose purpose is to reorganization matrix [3].
Genetic operations
Combining the characteristics of local coding parameters, some genetic evolution operations are designed to continuously evolve better coding parameters (Fig. 1 (c)).
Crossover
Considering that the object of optimization is a series of continuous real numbers, a random breakpoint method is used to cross chromosomes (Fig. 2). The real number crossover process is as follows: Judge whether crossover operation occurs; Random selection of parents and random generation of intersection points; Progeny chromosomes are generated from parental chromosome data.

Real number cross processing.
When optimizing the local search parameters, the mutation operation has two main functions: (1) Improve the convergence speed; (2) Prevent falling into local optimality. At the initial stage of network model training, the mutation operation can search the most appropriate parameters to make the model converge quickly. At the end of training, in order to get a better network model, new local search parameters are constantly mutated so as to jump out of the local optimal. The real coding mutation process is shown as follows (Fig. 3): Judge whether mutation operation occurs; The location and amount of mutations are randomly selected; Any new progeny can mutate.

Mutation processing.
The purpose of the selection operation is to filter out excellent local search parameters from the overall. The commonly used selection strategies include roulette, average selection, perfect retention and so on [8]. This paper uses a suitable roulette selection strategy. The roulette method is beneficial to improve the probability of being selected by individuals with strong survivability, which is called a random sampling method with put back. According to the probability of parents appearing in the population, the offspring are randomly selected to form individuals. The main steps are as follows: Calculate the fitness of each individual;
Calculate the probability of individuals being selected to the next generation;
Calculate the probability of individual accumulation;
Random number 0 - 1 is generated. Call the random number as ξ. If Pi-1 ≤ ξ ≤ P
i
is satisfied, individual i is selected.
According to the cumulative selection probability of each generation of the population, the random number method is used to select the offspring into the next generation population, which requires the design of a suitable fitness function.
Obviously, the fitness function should be related to the classification or segmentation accuracy of the point cloud deep learning network model. The general accuracy calculation is a normalized expression.
Where Acc represents the index between 0 and 1, and NC represents the number of correct subjects in the experiment, and NA represents the total number of subjects participating in the experiment. Considering that the accuracy parameters are all less than 1. In order to magnify the differences of the population, the fitness function should be modified to meet the following rules:
In theory, the accuracy will increase with the increase of iterations, which is the dual role of neural network evolution and genetic evolution. In order to make the iteration converge as fast as possible, the fitness function should make the population in the early stage have a greater degree of discrimination. For example, x-1 and x-3 in the Fig. 4, it is obvious that x-3 has a more obvious distinguishing effect.

The fitness and accuracy relation of different selection functions.
In the convergence stage, the population change difference is not obvious, but the network parameters cannot be kept consistent. Meanwhile, the diversity of parameters should be increased, and the variation should be ensured near the optimization results. Therefore, the fitness function should be changed slowly and somewhat at the end. Therefore, the following rule should also be met:
Where v1,2 represents the evolutionary parameters after adjustment, v1,2 represents the evolutionary parameters before adjustment, ω represents the adjustment factor, ɛ represents the random number between 0 and 1, Δ epresents the adjustment quantity, and Counter represents the counter.
Referring to several functions in the Fig. 5, it is found that the fractional function has a good solution to this kind of problem. In this paper, the form of fitness (x) = x-3 is used as the fitness function (Fig. 4).

Evolutionary terminal regulatory mechanisms.
The chromosome coding is combined with the empirical parameters mentioned above to set the upper and lower limits. The recording operation should be consistent with the network structure and the structural characteristics of the genetic algorithm, so as to achieve the purpose of marking the optimal chromosome.
The core of PointNet consists of a symmetric function(max pooling) and MLPs [3]. However, this method is not accurate for local feature capture. Refer to the latter part of PointNet, and improve the part of local feature extraction (Fig. 1(d)).
Define Eϖ as the point cloud of the input network. Combined with coding operation, the expression is as follows:
Where parameter n represents the number of raw points, κ i represents the input point cloud, f and g are point cloud coding operations.
The symmetric function here is the Max, which means that the output is consistent in the case of any arrangement of input. Therefore, the PointNet mechanism can be summarized by the following formula:
Where φ k and φ h is MLPs. The network structure evaluates the result k to determine the category of the input point cloud. The realization of segmentation function is to combine the output of φ h and the output of φ g into a new vector, which is sent into the MLPs of φ m , so as to obtain new evaluation parameters to guide segmentation.
Where ∑ represents vector concatenation and * represents omitted parameters.
By taking the maximum value of m parameter and comparing it with the label, the predicted classification results are obtained.
The experiment consists of three parts: 1) Materials and Settings; 2) 3D Object Classification Experiments; 3) 3D Semantic Segmentation Experiments.
Materials and settings
Datasets ModelNet40 [36] and S3DIS [1] are used in this experiment. ModelNet40 contains 12311 samples for 3D classification, and S3DIS has 272 scans for 3D semantic segmentation (Table 1).
The setting of algorithm parameters should be combined with the point cloud deep learning network. In order to facilitate comparative experiments, the number of cross offspring in each generation should be kept at 2-3, and the number of mutant offspring should be kept at 1-2.
In these experiments, TensorFlow is used to build a deep learning network, while programming in Python3.6 environment. The hardware environment for the experiments includes: Intel® Core(TM) i7-10875H CPU @ 2.30GHZ, NVIDIA GeForce RTX 2070 and 32GB RAM.
3D object classification experiments
Briefly, local coding points are random feature points included in the search cube. The number of local coding points has a great influence on the performance and time cost of the network. A small number of coding points is not conducive to the expression of fine-grained features. Too many points will overwrite fine-grained features. In the classfication experiment, the default value of N is 2, and N=1 and N=2,3,4,5 are also compared.
In order to evaluate the experimental effect, refer to the evaluation criteria of δ and Δ. This is as follows:
Where T i represents the total number of target objects of class j in generation i, A ij represents the correctly classified number of class j in generation i, M c represents the total number of classes, Δ is the total correct rate, and δ is the average correct rate. Obviously, the larger these two parameters Δ and δ are,the better the experimental result will be. Therefore, in combination with the algorithm and experimental characteristics, there is a δ, which can better represent the excellent performance of the experimental results, as the criterion for designing the fitness function.
The form of fitness function has a great influence on the selection effect of population. The more obvious the individual fitness difference is, the more obvious the differentiation degree will be. In order to increase the survival adaptability of excellent individuals as much as possible, the method tries to ensure the same accuracy index. Combining
The smaller the fitness function value is, the more advantageous the individual is in the competition, and the better the result will be.
Fig. 6 shows that different fitness functions have different effects on the performance of ACLN-GA algorithm. Fig. 6(a) presents standard experimental results. Fig. 6(b) shows the comparison between Form 1 and fitness Form 2, and it can be seen that Form 1 is more conducive to the end optimization of ACLN-GA. But the Form 2 is more conducive to the rapid convergence of the network. Fig. 6(c) and (d) show the comparison of fitness Form 1, 3 and 4. It can also be seen that Form 1 is superior to Form 4, while Form 3 is superior to Form 4. This set of experiments proved the correctness of

Model evolution with different fitness function forms. (a) Baseline(Form I). (b) Form II and baseline comparison. (c) Form III and baseline comparison. (d) Form IV and baseline comparison.
Fig. 7 shows that the number of coding points has a slight impact on the performance of ACLN-GA algorithm. Fig. 7(a) compares the case of N = 1 with the case of N=2. This is because a point cannot represent the characteristic case, and a matrix containing fine-grained features can be obtained when N=2. In Fig. 7(b)(c)(d), compared with the situation of N=2 and N=3,4,5, the situation of N=2 is slightly better than other situations, because the increase in the number of points will produce the situation of feature information being covered, which also verifies Song’s conclusion [28].

Model evolution with different N values. (a) N=1 and N=2 comparison. (b) N=2 and N=3 comparison. (c) N=2 and N=4 comparison. (d) N=2 and N=5 comparison.
Fig. 8 shows that different selection strategies have different effects on the performance of ACLN-GA algorithm. It shows the comparison between the roulette strategy and the all-advantage retention strategy. It can be seen that the roulette strategy is more conducive to the optimization of ACLN-GA.

Model evolution of different selection strategies.
Table 2 shows that the proposed method can achieve an accuracy rate of 89.5% and an average accuracy rate of 86.5%, which is better than previous results, especially in terms of average accuracy. This is due to the genetic evolution mechanism designed, so that the search parameters can still be adjusted adaptively at the end of the network evolution to improve the accuracy.
Comparison of 3D classification results
In addition, ALCN-GA can make the network convergence quickly, which can save training time and resources.
The mIoU index is generally used to evaluate the segmentation effect, as shown below:
Where TC i represents the prediction accuracy of generation i, and AC i represents the total number of participants in the prediction of generation i.
Where TC ij represents the prediction accuracy of class j in generation i, AC ij represents the total prediction number of class j in generation i, GT ij represents the ground truth number of class j in generation i, M s represents the total number of classes, and IoU ij represents the IoU value of class j in generation i.
Here, sΔ (i) is the total accuracy rate. Obviously, the larger the parameter, the better the result. Therefore, this test combines the

Model evolution for 6 fold crossover experiments. (a) Segmentation on Test_area_1. (b) Segmentation on Test_area_2. (c) Segmentation on Test_area_3. (d) Segmentation on Test_area_4. (e) Segmentation on Test_area_5. (f) Segmentation on Test_area_6.
Table 3 shows the comparison between the experimental results of ALCN-GA and previous methods. Through evolutionary optimization, 3D classfication accuracy rate is improved about 10%, and the mIoU index is improved about 11% compared with PointNet. Combined with Fig. 10, it can be seen that the segmentation effect of ALCN-GA is clear in a variety of scenarios.
Comparison of 3D semantic segmentation results

Segmentation of different scenes. (a) Segmentation of Conference Room. (b) Segmentation of Lobby. (c) Segmentation of Office. (d) Segmentation of Open Space.
The main contribution of this paper is to propose an adaptive local information coding strategy combined with genetic algorithm to enhance the 3D object classification and segmentation effect of the point cloud deep learning network. Genetic algorithm is a commonly used dynamic optimization algorithm. Through crossover, mutation and selection operations, dynamic optimization of local coding parameters can be realized, so as to achieve the purpose of adaptive evolution. Combined with the simplified PointNet, ALCN-GA can adaptively capture the features of the point cloud, through training and evolution, it can enhance the classification and segmentation of 3D point cloud data. According to the coding parameters and the characteristics of the network, two design rules for fitness functions are summarized. The performance experiment of ALCN-GA is carried out on ModelNet40 dataset and S3DIS dataset. The experimental results show that ALCN-GA can not only meet the requirement of fast convergence in the early stage of training, but also can continuously search for better values in the later stage of training. This result verifies the correctness of the two rules. In addition, the proposed method can achieve an accuracy rate of 89.5% and an average accuracy rate of 86.5% in classification. Compared with PointNet, the proposed method improves the accuracy of classification about 10% and the mIoU of segmentation about 11%.
Future studies can focus on the effect between GA and other dynamic PointNet parameters for optimization. Morever, the wrong network processing results can also be dynamically fed back to the genetic algorithm to obtain better evolutionary effects. At the same time, it can be further studied from the perspective of time consuming.
Conflicts of interest
The authors declare that they have no conflicts of interest.
Footnotes
Acknowledgments
This work is supported by the National Key R & D Program of China (2018YFB1308400).
Novelty & contribution
This paper adds a model optimization structure based on Song’s[28] and PointNet [
]. According to the output parameters of the network, the accuracy, a feedback mechanism is designed to optimize the d parameter to get better learning effect. What’s more, this paper introduces Genetic Algorithm (GA) into the optimization and evolution process of d parameter. Summarized the design rules of the selection strategy, and found that the form of x-3 is more suitable for the selection of the population. Furthermore, this paper designs an adaptive optimization strategy in the crossover and mutation links. On this basis, this paper tests the above innovations on public datasets ModelNet40 [36] and S3DIS [1], and compares them with experiments done by Song’s [28], PointNet [3] and others. The experimental datas show that progress has been made.
In this paper, the first author is mainly responsible for the experiment and the writing of the manuscript. The corresponding author is mainly responsible for sorting out the ideas, reviewing the paper and answering questions. The third author is responsible for assisting the first author to complete the experiment and revision of the paper.
