Look inside 3D point cloud deep neural network by patch-wise saliency map

Abstract

The 3D point cloud deep neural network (3D DNN) has achieved remarkable success, but its black-box nature hinders its application in many safety-critical domains. The saliency map technique is a key method to look inside the black-box and determine where a 3D DNN focuses when recognizing a point cloud. Existing point cloud saliency methods illustrate point-wise saliency for a given 3D DNN. However, the critical points they identify are alternative (replaceable) and unreliable. This finding is grounded in our experimental results, which show that a point becomes critical because it is responsible for representing one specific local structure. Conversely, however, one local structure does not have to be represented by some specific points. As a result, discussing the saliency of the local structure (named patch-wise saliency) represented by critical points is more meaningful than discussing the saliency of some specific points. Based on the above motivations, this paper designs a black-box algorithm to generate a patch-wise saliency map for point clouds. Our basic idea is to design the Mask Building-Dropping process, which adaptively matches the size of important/unimportant patches by clustering points with close saliency. Experimental results on several typical 3D DNNs show that our patch-wise saliency algorithm can provide better visual guidance, and can detect where a 3D DNN is focusing more efficiently than a point-wise saliency map. Finally, we apply our patch-wise saliency map to adversarial attacks and backdoor defenses. The results show that the improvement is significant.

Keywords

Saliency map, point cloud, deep neural network, critical points, adversarial attack

1. Introduction

Rapidly developing applications such as virtual reality (VR), augmented reality (AR) and self-driving all require the construction of 3D scenes [1, 2, 3, 4, 5, 6, 7]. Point clouds are the most efficient way to represent 3D data since they are close to the raw sensor data [8, 9, 10, 11, 12]. To process point clouds efficiently, many industry domains try to understand them using deep neural networks (DNNs). However, a point cloud cannot be directly consumed by a typical DNN designed for 2D images because its points are unordered. To accommodate this data structure, point clouds were rasterized [13, 14, 15] or projected [16, 17, 18]. However, such data transformation inevitably causes shape disturbances. PointNet [19] is the pioneering deep model that directly processes point clouds. It learns a spatial encoding of each point and then aggregates all individual point features into a global point cloud signature. Following PointNet, many 3D DNNs were proposed successively, including PointNet++ [21], PointBert [22] and RSCNN [23]. They all achieve remarkable results on classification and segmentation tasks.

Figure 1.

Visualization of our proposed patch-wise saliency map, along with critical point set theory [19] and point-wise saliency map [20]. Red means the point is important, and blue means unimportant, for specific 3D DNN making its decision. The patch-wise saliency map guides people to learn which region the 3D DNN focuses on and gathers more importance/unimportance using the same number of points.

Figure 2.

To discuss the reliability of the point-wise saliency map, we drop the detected critical points and find this makes some of the remaining non-critical points become critical. Meanwhile, the topological structure and 3D DNN output change slightly. Therefore, we argue that the 3D DNN judges a point cloud mainly according to its structure (represented by the critical points) instead of some specific points. The finding suggests that critical points are alternative (replaceable) and unreliable. As a result, we believe that it is more meaningful to discuss local structure-wise (named patch-wise) importance than point-wise importance.

However, the black-box character of DNNs prevents them from being applied in many safety-critical domains such as self-driving, 3D face recognition and augmented reality [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]. The saliency map technique (shown in Fig. 1) is a remarkable way to explain the black-box. It aims to visualize the importance of each part of the input data. The generated saliency map provides efficient guidance for point cloud feature selection [35, 36] and compression [37, 38]. In the image domain, many saliency map algorithms have been proposed [39, 40, 41, 42, 43, 44]. With them, people are able to understand to a great extent how a DNN deals with an image. However, due to the differences between images and point clouds, few saliency techniques designed for images can deal with point clouds. Therefore, a new method is urgently needed to explain how a 3D DNN consumes a point cloud.

We recall the critical point sets theory in PointNet [19], which states that the output will change if and only if the critical points are moved. The theory suggests that a 3D DNN only focuses on a subset of the points of the point cloud when making its decision. Recently, a few works [45, 46, 20] have been proposed to refine the theory and assign importance to every point based on gradient information. As a typical and representative work, Zheng et al. [20] obtain the saliency score of each point by calculating the gradient under spherical coordinates. We refer to the method designed by Zheng et al. [20] as point-saliency in this paper. In sum, these works all aim to evaluate point-wise saliency.

To discuss the reliability of point-wise saliency, we drop the critical points and observe the change in the non-critical points. The result in Fig. 2 shows that once critical points are removed, the non-critical points adjacent to them are exposed and take their place as the new critical points. Therefore, we argue that (Motivation:) a point becomes critical because it is responsible for representing one specific local structure. Conversely, however, one local structure does not have to be represented by some specific points. Meanwhile, the non-critical points are not useless. They are merely obscured by the critical points in expressing information (topological structure). Therefore, we argue that the local structure is more important than individual points when a 3D DNN recognizes a point cloud. In addition, the critical points indicated by point-wise saliency [19, 45, 46, 20] are alternative (replaceable) and unreliable. Furthermore, for a solid evaluation, we randomly drop several points from the point cloud and observe the 3D DNN output. The results suggest that the probability change is less than 3% as long as the global structure is preserved.

In addition, in many 3D engineering tasks such as self-driving, objects are frequently occluded [47, 48, 49, 50, 51]. As a result, the obtained point clouds are usually incomplete, with missing patches. Thus, it is critical to research the patch-wise importance of point clouds for 3D DNN applications.

Furthermore, existing point-wise point cloud saliency methods assign saliency scores based on the structure of 3D DNNs. However, the specific information of a 3D DNN is usually hidden in most engineering scenarios. Therefore, designing a saliency method under the black-box setting is urgent.

The above analyses suggest that (a) the critical points indicated by point-wise saliency serve the local structure, (b) a 3D DNN recognizes a point cloud according to its local/global structure instead of some specific points, and (c) studying patch-wise saliency under the black-box setting is critical for 3D DNN applications. Therefore, we argue that it is more meaningful to discuss which patch of the topological structure is important rather than which specific points are.

Based on the above motivations, this paper aims to design a method to generate a patch-wise saliency map under the black-box setting. In detail, one way to evaluate the importance of points is to drop them and observe the probability change. In this paper, we drop points in patches instead of individual points [45, 46, 20]. However, segmenting point clouds into patches is a tricky problem since important/unimportant patches usually have different sizes and we have no prior knowledge of them. Our idea is to design a Mask Building-Dropping process to adaptively match the patch size. In the experiment, we compare the patch-wise saliency algorithm with the point-wise saliency algorithm [20] on typical 3D DNNs. The experimental results suggest that our patch-wise saliency algorithm is able to find the important/unimportant regions more efficiently in most cases. Finally, we apply our patch-wise saliency map to adversarial attacks and backdoor defenses, and achieve significant improvement. In addition, in terms of resolution for sensitivity analysis, our patch-wise saliency map has a lower resolution than the point-wise saliency map. However, this has only a slight impact on its performance, as shown in the experimental results.

In sum, we analyse the related works in Section 2. Then, the proposed patch-wise saliency map is illustrated in Section 3. The conducted experiments are exhibited in Section 4. Moreover, we apply the proposed method to adversarial attacks and backdoor defenses, as shown in Section 5. Finally, we conclude our work and provide suggestions for future works in Section 6. Our main contributions are as follows:

• We reveal that the critical points indicated by current point-wise saliency maps mainly serve local structures. Therefore, discussing the saliency of the local structure (a patch of the point cloud) is more meaningful.

• A black-box algorithm for generating a patch-wise point cloud saliency map is proposed to help us determine where a 3D DNN mainly focuses when making its decision.

• Existing adversarial attacks and backdoor defenses are improved by combining them with the proposed saliency technique. Meanwhile, we find that the saliency patch provides key dimensions to cross the decision boundary.

• Our research is practical. The point clouds obtained in engineering scenarios are usually incomplete, with missing patches. Therefore, determining the important/unimportant patches is meaningful for practical 3D DNN applications.

2. Related works and preliminaries

2.1 Image saliency map

In the image domain, a saliency map is used to characterize the contribution of each pixel value to the recognition result [52, 53]. According to the way they process input images, saliency algorithms can be classified as CAM-based [40, 54, 55, 56, 57], back propagation-based [41, 58, 59] and perturbation-based [42, 60, 61, 62]. However, because point clouds differ from images in data structure, existing image saliency map techniques cannot be directly used to analyse point clouds.

2.2 Point-wise point cloud saliency map

To date, a few works have been designed to build saliency maps for point clouds, all of which focus on point-wise saliency. According to whether they depend on the DNN, they can be classified as DNN-specific [45, 46, 20] or DNN-agnostic [63, 64]. In detail, DNN-agnostic saliency methods explore the saliency of each point based on the geometric information of the point cloud. They cannot show the characteristics of DNNs and are not the focus of this paper.

DNN-specific saliency techniques aim to explore which points the DNN focuses on when making its decision. Zheng et al. [20] propose a representative method that obtains saliency scores by calculating gradients under spherical coordinates. Following it, Liang et al. [46] utilize the non-contributing factors to design a point cloud saliency map. Without utilizing gradient information, Souai et al. [45] generate a point cloud saliency map by designing a deep learning model.

Since the work from Zheng et al. [20] is the typical and representative approach to obtaining point-wise saliency, we introduce it in detail. The authors consider dropping a point to be similar to moving it to the centre of the point cloud. The loss change is then calculated for moving a point to the centre. Specifically, shifting a point towards the centre by a constant $\eta$ will increase the loss $L$ by $-\frac{dL}{dr}\eta$ . Thus, the final score is:

$\displaystyle s_{i}=-\frac{dL}{dr_{i}}r_{i}^{1+\alpha}$ (1)

where $r_{i}=\sqrt{\sum_{j=1}^{3}(x_{ij}-x_{cj})^{2}}$ , $x_{c}$ is the centre of the point cloud, $x$ is the Euclidean coordinates of each point, and $\alpha$ is a hyper-parameter.
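As an illustration of Eq. (1), the following minimal sketch computes the score from a gradient that has already been obtained. It is not the implementation of [20]: the argument `grad_xyz` (the gradient of the loss with respect to the Cartesian coordinates, as any automatic differentiation framework could provide), the default centre (the coordinate mean) and the default value of `alpha` are assumptions made for this example. The radial derivative follows from the chain rule, $\frac{dL}{dr_{i}}=\sum_{j}\frac{\partial L}{\partial x_{ij}}\frac{x_{ij}-x_{cj}}{r_{i}}$ , with the centre held fixed.

```python
import numpy as np

def point_saliency_scores(points, grad_xyz, alpha=1.0, centre=None, eps=1e-12):
    """Sketch of Eq. (1): s_i = -(dL/dr_i) * r_i^(1 + alpha).

    points   : (N, 3) array of Cartesian coordinates.
    grad_xyz : (N, 3) array, dL/dx for every point (hypothetical input).
    centre   : (3,) array; assumed to be the coordinate mean if not given.
    """
    if centre is None:
        centre = points.mean(axis=0)  # assumption, not stated in the text
    offsets = points - centre
    r = np.linalg.norm(offsets, axis=1)
    # Chain rule: project the Cartesian gradient onto the radial direction.
    dL_dr = np.sum(grad_xyz * offsets, axis=1) / np.maximum(r, eps)
    return -dL_dr * r ** (1.0 + alpha)
```

A point whose loss grows when it is pulled towards the centre (negative radial derivative) therefore receives a positive score.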

Zheng et al. [20] regard the loss variation after dropping a point as that point's score, and compute it from the gradient under a spherical coordinate system, which imitates the method used in the image field. This is a refinement of the concept of critical point sets from [19]. In this paper, we focus on characterizing the contribution of each patch instead of specific points.

Critical Point Sets Theory is proposed in PointNet [19]. The theory suggests that the output of PointNet only depends on a part of the input point cloud due to the designed MaxPool. The authors refer to these points as critical point sets. Furthermore, the critical point sets theory also works for other 3D DNNs (such as DGCNN and RSCNN) despite their different model structures. We reach this conclusion by computing the gradients of the 3D DNN output with respect to the input point cloud. The points with non-zero gradient are then selected as the critical point sets.

3. Patch-wise point cloud saliency map

The formal description of a 3D DNN is as follows. A 3D point cloud $X\in R^{N_{X}\times 3}$ consists of $N_{X}$ 3D points, where each point $x_{i}\in R^{3}$ is represented by its 3D coordinates ( $x_{i}$ , $y_{i}$ , $z_{i}$ ). A 3D DNN $F(X,\theta)=t$ maps an input point cloud $X$ to its corresponding class label $t\in T$ , with well-trained parameters $\theta$ .

Then, we define the saliency map technique as follows. Given a point cloud $X$ and a well-trained 3D DNN $F(\cdot)$ , the saliency map algorithm aims to assign each point a score that represents its saliency for the 3D DNN recognizing the point cloud. The formal description is:

$\displaystyle Sl=G(F,X)$ (2)

where $G(\cdot)$ is the designed saliency map algorithm and $Sl$ represents the generated saliency map, which has the same size as $X$ . Our aim is to design a new $G(\cdot)$ that could better obtain the saliency score.

One way to evaluate the saliency of a point is to drop it and observe the probability change. Therefore, we drop points in patches and record the probability change to evaluate the saliency score. However, a key issue is how to segment a point cloud into patches. Since important/unimportant patches usually have different sizes, a hard segmentation would split them and fail to score them. Our idea is to build a new patch mask, with a different segmentation scheme, in each evaluation. For each built patch mask, every point is assigned a state that indicates the patch it belongs to. Then, we drop each patch according to the built mask and assign the resulting probability change to every point in it. After repeating Mask Building-Dropping $S$ times, every point obtains $S$ probability variations. The average probability change is regarded as its final score. In this way, the score of each point is associated with multiple masked patches that contain the point and can reflect the information of its neighbourhood. Once every point obtains a final score, we regard the clustered points with similar scores as the final patches, which adaptively match the sizes of important/unimportant patches. We refer to this process as Mask Building-Dropping, shown in Fig. 3, which can be formally described as follows.

Figure 3.

The outline for generating the patch-wise point cloud saliency map. The inputs include an original point cloud and a well-trained 3D DNN. The output is a patch-wise saliency map that indicates where the 3D DNN focuses on the point cloud. Specifically, different colour schemes of patch masks lead to different patch dropping schemes, which have no semantic information. By repeating Mask Building-Dropping $S$ times, each point obtains $S$ scores that are associated with $S$ different patch segmentations and the corresponding patch saliency. Then, the average score is regarded as the final score of each point. The gathered points with close scores will adaptively match the size of important/unimportant patches.

Building Patch Mask

Different from the previous binary mask $M_{b}\rightarrow\{0,1\}$ , which assigns each point to one of two opposite states, we want to evaluate points in various states. Therefore, we newly define a patch mask $M\rightarrow\{0,1,\ldots,N_{c}\}$ of size $N_{X}\times 1$ with state variation $N_{c}$ . Each element of patch mask $M$ is associated with one point of point cloud $X$ in order, and its value indicates the state of that point in the Dropping process.

State variation $N_{c}$ is a hyper-parameter which impacts the accuracy of the acquired important/unimportant patch size. Since it decides the segmented patch size (each patch contains $\lfloor\frac{N_{X}}{N_{c}}\rfloor$ points), a larger $N_{c}$ yields smaller patches and tends to catch more local information, whereas a smaller $N_{c}$ yields larger patches and tends to catch more global information. At the same time, to guide patch building, we utilize farthest point sampling (FPS) to sample a set of centre points $\mathcal{P}_{c}$ , one for each of the $N_{c}$ states, on $X$ . Compared with random sampling and curvature sampling, FPS gives a more even distribution for the same number of samples.
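The centre sampling step can be sketched with a plain NumPy version of farthest point sampling. This is a minimal sketch under stated assumptions rather than the authors' code: the function name, the random choice of the first centre and the `seed` argument are ours. The text does not say how masks differ between repetitions; varying the first centre is one simple way to obtain a different mask each time.

```python
import numpy as np

def farthest_point_sampling(points, n_centres, seed=0):
    """Return indices of n_centres points chosen by greedy FPS.

    points    : (N, 3) array of coordinates.
    n_centres : number of centres, i.e. the state variation N_c.
    seed      : controls the randomly chosen first centre (assumption).
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = np.empty(n_centres, dtype=np.int64)
    chosen[0] = rng.integers(n)
    # Squared distance from every point to its nearest chosen centre so far.
    min_dist = np.full(n, np.inf)
    for k in range(1, n_centres):
        d = np.sum((points - points[chosen[k - 1]]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, d)
        # Next centre: the point farthest from all centres chosen so far.
        chosen[k] = np.argmax(min_dist)
    return chosen
```

Because each new centre maximizes the distance to the centres already selected, the centres spread over the whole shape instead of concentrating in dense regions.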

To assemble points into patches, we want the points with the same state to be as close to each other as possible. Therefore, we allocate one of the $N_{c}$ states, indexed by $j$ , to each point by the K nearest neighbors (KNN) algorithm. Each patch is thus built by:

$\displaystyle\textit{Patch}_{j}=\textit{KNN}(p^{j},K)\quad s.t.\quad p^{j}\in\mathcal{P}_{c}$ (3)

where $\textit{Patch}_{j}$ is the point set with the same state $j$ and $p^{j}\in\mathcal{P}_{c}$ is the centre point of $\textit{Patch}_{j}$ . $K=\lfloor\frac{N_{X}}{N_{c}}\rfloor$ is the number of neighbors for the KNN algorithm.
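Eq. (3) can be illustrated by the short sketch below, which gathers the $K=\lfloor\frac{N_{X}}{N_{c}}\rfloor$ nearest points around each centre. The function name and the brute-force distance computation are assumptions made for the example, and the centre indices are taken as given (for instance, from farthest point sampling). The sketch also assumes that a centre counts as its own nearest neighbor.

```python
import numpy as np

def build_patches(points, centre_idx):
    """Sketch of Eq. (3): one K-nearest-neighbor patch per centre.

    points     : (N_X, 3) array of coordinates.
    centre_idx : indices of the N_c centre points.
    Returns a list of N_c index arrays, each of length K = N_X // N_c.
    """
    n_x = points.shape[0]
    k = n_x // len(centre_idx)
    patches = []
    for c in centre_idx:
        d = np.sum((points - points[c]) ** 2, axis=1)
        # Indices of the K points closest to this centre (centre included).
        patches.append(np.argsort(d)[:k])
    return patches
```

With 1,024 points and $N_{c}=8$ , each patch holds $K=128$ points. Since every patch is built independently, a point may fall into several patches or into none.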

We have now assigned values to patch mask $M$ . In the next step, we drop points in patches as guided by $M$ and evaluate their saliency.

Dropping

We define the dropping process as $M\bigodot X$ . In detail, for one dropping process, we specify one state to drop. Meanwhile, the patches of the remaining states stay the same. The formal description is as follows:

$\displaystyle\left\{\begin{array}{ll}\textit{Patch}_{i}=\textit{None},&\text{if }\quad M(i)=m\\ \textit{Patch}_{i}=\textit{Patch}_{i},&\text{if }\quad M(i)\neq m\end{array}\right.$ (4)

where $\textit{Patch}_{i}\in X$ is the patch in point cloud $X$ ; $i$ is the index of $\textit{Patch}_{i}$ ; $M(i)$ is the state of $\textit{Patch}_{i}$ ; $\textit{Patch}_{i}=\textit{None}$ means dropping the patch from the point cloud; patch state $m\in\{0,1,\ldots,N_{c}\}$ indicates which patch to drop.

The point cloud after dropping the m-th patch is represented by $X_{dm}=M(j)|_{j=m}\bigodot X$ . The score $O_{m}$ of the m-th patch is calculated by:

$\displaystyle O_{m}=\log\left(\frac{P(t|X_{dm},\theta)}{P(t|X,\theta)}\right)$ (5)

Here, $P(t|X,\theta)$ means the probability that $X$ is classified as label $t$ . After that, the patch state $m$ is traversed over all states to score each patch. We then assign the score of each patch to every point in it. By doing so, each point carries information about the various patch segmentations and the corresponding patch saliency.
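The Dropping step and Eq. (5) only require class probabilities from the model, which is what makes the method black-box. The sketch below assumes a hypothetical callable `prob_fn` that maps an array of points to a vector of class probabilities; this callable, the function name and the small `eps` guard against a zero probability are not part of the original method.

```python
import numpy as np

def patch_scores(points, patches, prob_fn, label, eps=1e-12):
    """Sketch of Eqs. (4)-(5): score each patch by the log-probability ratio.

    points  : (N_X, 3) array of coordinates.
    patches : list of index arrays, one per patch state m.
    prob_fn : hypothetical black-box classifier, points -> class probabilities.
    label   : index of the label t.
    """
    p_full = max(prob_fn(points)[label], eps)
    scores = []
    for idx in patches:
        keep = np.ones(points.shape[0], dtype=bool)
        keep[idx] = False  # Eq. (4): the m-th patch becomes None
        p_drop = max(prob_fn(points[keep])[label], eps)
        scores.append(np.log(p_drop / p_full))  # Eq. (5)
    return np.array(scores)
```

With Eq. (5) as written, a patch whose removal lowers the probability of label $t$ receives a negative value, and a patch whose removal raises it receives a positive value.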

Note that the patches and their scores obtained here are not the final ones. Our method repeats the Mask Building-Dropping process $S$ times. Therefore, one point will obtain $S$ scores that represent the importance of $S$ relevant patches after repetition. Finally, the gathered points with close scores will adaptively reveal the patch importance. The final score of each point is calculated by:

$\displaystyle O_{i}=\frac{1}{S}\sum_{s=1}^{S}o_{si}\quad s.t.\quad i=1,2,\ldots,N_{X}$ (6)

where $S$ is the number of repetitions; $N_{X}$ is the number of points in point cloud $X$ ; $o_{si}$ is the score of the i-th point in the s-th repetition.

In addition, a point may be covered by no patch or by multiple patches in a built mask. We assign a zero score to the former and the average of the covering patch scores to the latter.
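The per-point bookkeeping can be sketched as follows: one function spreads the patch scores of a single mask over the points (zero for uncovered points, the mean for points covered several times), and a second one averages the $S$ rounds as in Eq. (6). Function and argument names are assumptions made for the example.

```python
import numpy as np

def point_scores_one_round(n_points, patches, patch_score_values):
    """Assign patch scores to points for one built mask.

    Points in no patch get 0; points in several patches get the mean score.
    """
    total = np.zeros(n_points)
    count = np.zeros(n_points)
    for idx, score in zip(patches, patch_score_values):
        np.add.at(total, idx, score)
        np.add.at(count, idx, 1)
    # Divide only where a point was covered at least once.
    return np.divide(total, count, out=np.zeros(n_points), where=count > 0)

def final_point_scores(rounds):
    """Eq. (6): average the per-point scores over the S repetitions."""
    return np.mean(np.stack(rounds), axis=0)
```

For example, with patches `[0, 1]` and `[1, 2]` scored 2 and 4 on a four-point cloud, point 1 receives 3 and point 3, which is uncovered, receives 0.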

4. Evaluations

We conduct experiments to verify the performance of our patch-wise saliency map quantitatively and qualitatively on several benchmarks.

4.1 Experimental setting

Datasets and 3D DNNs

We choose ModelNet40 [14] and 3D Mnist as the datasets because they are standard datasets used to evaluate the performance of point cloud classification and segmentation, which is convenient for evaluating saliency map techniques. In detail, ModelNet40 includes 12,311 objects from 40 categories, where 9,843 are used for training and the other 2,468 for testing. 3D Mnist includes 6,000 3D handwritten digits from 10 categories, where 5,000 are used for training and 1,000 for testing. Similar to PointNet [19], we uniformly sample 1,024 points from the surface of each object and re-scale them into a unit cube. The selected 3D DNNs are PointNet [19], DGCNN [65] and RSCNN [23], which are typical models for point clouds. Many follow-up networks [66, 67, 22] are based on them. Therefore, the experimental results obtained on them are convincing.

Comparison Baseline

Although there are a few works [45, 46, 20] studying saliency maps of point clouds, they are similar and all focus on point-wise saliency using model gradients. We select the typical and representative work [20] as the compared baseline. Although it is not the most recent method, its advantages over other algorithms [45, 46] include higher performance, a higher citation number, and credibility. We call it [20] point-saliency for ease of reading.

Implementation Details

Our algorithm is implemented using the open source code of PointNet [19], DGCNN [65] and RSCNN [23]. The training and testing processes of each 3D DNN follow the default setting. Like the baseline [20], we did not consider additional operations such as ‘vote’ and ‘additional features’, which are tricks that could improve accuracy by approximately 1%–2%. Meanwhile, the input format is changed to enable 3D DNNs to consume point clouds with different numbers of points. The results of the point-wise saliency map [20] are obtained by running its source code. The obtained results differ slightly from the original ones due to the different parameters of the target 3D DNN. In addition, we set the state variation $N_{c}=8$ for ModelNet40, $N_{c}=7$ for 3D Mnist and iteration times $S=40$ for both. For each target 3D DNN, we conduct both positive-drop and negative-drop for comparison. Moreover, our code is implemented in TensorFlow. The experimental devices include an NVIDIA RTX 3090 GPU, an Intel I7 CPU, 64 GB memory and Ubuntu 16.04.

Algorithm: Patches dropping algorithm based on patch-wise saliency map

Input: 3D point cloud classifier DNN, point cloud $X$ , state variation $N_{c}$ , iteration times $S$ , number of dropped points $q$

Output: Patch-wise saliency map, point cloud after dropping $q$ points

Describe: Computing patch-wise saliency map
While $s\leqslant S$ :
    Building Patch Mask: $\mathcal{P}_{c}=\textit{FPS}(X,N_{c})$
    Obtaining $\textit{Patch}_{j}$ by Eq. (3)
    Dropping: $X_{dm}=M|_{j=m}\bigodot X$
    Recording patch score: $O_{m}=\log\left(\frac{P(t|X_{dm},\theta)}{P(t|X,\theta)}\right)$
Computing average point score by Eq. (6)
Describe: Dropping patches according to saliency map
If positive-drop: Dropping patches containing $q$ points with highest score
If negative-drop: Dropping patches containing $q$ points with lowest score
Return: Patch-wise saliency map, point cloud after dropping $q$ points

4.2 Metrics

Quantitatively

Our way to evaluate the performance of a saliency algorithm is to drop the $q$ most important points and observe the variation in DNN output [41, 68, 20, 69]. A larger variation of output means a better saliency algorithm. The basic concept is that a better saliency algorithm detects which points are important to classification, so dropping the detected important points will cause a larger DNN output variation than dropping random points. Accordingly, in our experiment, since we evaluate importance in patches, we drop the $s$ most important/unimportant patches (assumed to contain $q$ points in total) according to our patch-wise saliency map, as shown in Alg. 4.1. Correspondingly, we drop the same number ( $q$ ) of important/unimportant points according to the point-wise saliency map, and record the accuracy variation for comparison. For simplicity, we refer to both dropping patches according to patch-wise saliency and dropping points according to point-wise saliency as dropping points, considering that a patch is constituted by points. In addition, we call dropping the points with the highest scores positive-drop and dropping the points with the lowest scores negative-drop. Finally, to further verify the impact of the saliency map, we also conduct random point dropping for comparison, called random-drop.
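The three dropping schemes can be sketched on a per-point score array as below. This is a simplified illustration with assumed names: it selects the $q$ points directly from the final per-point scores rather than whole patches, and it expects the caller to orient the scores so that a larger value means a more important point.

```python
import numpy as np

def drop_points(points, scores, q, mode="positive", seed=0):
    """Drop q points by score and return the remaining points.

    mode = "positive": drop the q highest-scoring points.
    mode = "negative": drop the q lowest-scoring points.
    mode = "random"  : drop q points uniformly at random (random-drop).
    Assumes larger score = more important (orientation is the caller's job).
    """
    n = len(scores)
    order = np.argsort(scores)  # ascending order of score
    if mode == "positive":
        drop = order[n - q:]
    elif mode == "negative":
        drop = order[:q]
    elif mode == "random":
        drop = np.random.default_rng(seed).choice(n, size=q, replace=False)
    else:
        raise ValueError(f"unknown mode: {mode}")
    keep = np.ones(n, dtype=bool)
    keep[drop] = False
    return points[keep]
```

Feeding the returned point cloud back to the classifier for each value of $q$ gives one point on an accuracy curve for the chosen scheme.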

Figure 4.

Accuracy variation caused by point dropping according to our patch-saliency map and point-saliency map [20]. The full line represents the positive-drop. Lower accuracy means better saliency algorithm. The dotted line represents the negative-drop. Higher accuracy means better saliency algorithm. Positive-drop-patch means the results of patch-saliency map by conducting the positive-drop. In addition, ‘positive-drop-point’ means the results of point-saliency map by conducting the positive-drop. For DGCNN, results show that our patch-saliency map outperforms the point-saliency map [20] with respect to the positive-drop and the negative-drop on both ModelNet40 and 3D Mnist.

Qualitatively

Different from images, a point cloud does not have a visual background, which means that we cannot evaluate a point cloud saliency algorithm based on whether it is able to separate the object from the background [41, 68]. However, each point cloud has unique features that distinguish it from others. In our experiment, we qualitatively evaluate a saliency algorithm by whether it focuses on these unique features.

Figure 5.

Hyper-parameter studies of state variation $N_{c}$ and repetition times $S$ under positive-drop. Lower accuracy represents better hyper-parameter value. The improvement is limited after repetition times $S$ is larger than 40. Thus, we choose $S=40$ as the optimal repetition times. In addition, the optimal $N_{c}$ tends to be a middle value. We choose $N_{c}=8$ for ModelNet40 and $N_{c}=7$ for 3D Mnist as the optimal.

4.3 Results

Accuracy variation

The accuracy variations of PointNet, DGCNN and RSCNN on ModelNet40 and 3D Mnist are shown in Fig. 4. We first find that the random-drop slightly reduces (by approximately 1%–2%) the accuracy of the three tested 3D DNNs. The reason is that the remaining points after the random-drop are still able to express the global topological structure. We then analyse the influence of positive-drop and negative-drop. Note that, for the same number of dropped points, a larger accuracy decrease caused by positive-drop and a larger accuracy increase caused by negative-drop indicate a better saliency algorithm.

For DGCNN, our patch-saliency map always outperforms the point-saliency map with respect to the positive-drop and negative-drop, on both ModelNet40 and 3D Mnist. For PointNet, the patch-saliency map shows better results under the positive-drop when the number of dropped points varies from 75 to 300 on the two datasets. In contrast, the point-saliency map achieves higher accuracy under the negative-drop, showing that the point-saliency map is able to detect unimportant points more efficiently on PointNet. For RSCNN under the negative-drop, the patch-saliency map is able to detect more unimportant points than the point-saliency map at all times. In addition, under the positive-drop, the patch-saliency map better illustrates which points are important when the number of dropped points varies from 25 to 225 on ModelNet40 and from 50 to 300 on 3D Mnist. We suggest that the performance difference is mainly due to the structure of the 3D DNN and the built patch mask. In fact, the size of the built patch mask can be adjusted to adaptively detect point cloud features. In summary, our patch-wise saliency map is the better method to detect where a 3D DNN does and does not focus. We argue that the reasons are as follows. Unlike the point-wise saliency map, which calculates the saliency score from each individual point, our patch-wise saliency map calculates the saliency score considering each point and its neighbors. This exploits the fact that a 3D DNN recognizes a point cloud according to not only the global structure but also the local structure. Hence, our patch-wise saliency map can collect more information than the point-wise saliency map.

Hyper-Parameter Study

There are two hyper-parameters in our patch-saliency algorithm: state variation $N_{c}$ of the patch mask and repetition times $S$ . We utilize PointNet trained on ModelNet40 and 3D Mnist to study their influence. Figure 5 shows that increasing $S$ will obviously improve the performance, when $S$ is less than 20. The reason is that a larger $S$ generates more variation in the patch masks. In addition, each point in the target point cloud will be assigned to more states. Therefore, each point can collect more saliency information by calculating Eq. (6). In addition, when $S$ is larger than 40, the improvement is limited and the computational cost is high. Thus, we select $S=40$ as the optimal repetition times.

State variation $N_{c}$ decides the size and number of patches during mask building. Although a large $N_{c}$ results in a large number of states, the resulting small patch size is unable to cover the sensitive patches. In contrast, a small $N_{c}$ generates patches with a large size. A patch size that is too large will include points with different saliency in one patch and thus cause an incorrect saliency score. Therefore, neither a small $N_{c}$ nor a large $N_{c}$ can guarantee good performance. Furthermore, the results of Fig. 5 suggest that $N_{c}=8$ is the optimal selection for ModelNet40, and $N_{c}=7$ for 3D Mnist. Moreover, different datasets require different optimal $N_{c}$ values, possibly due to the disparate point cloud features of different datasets.

Figure 6.

Visualization of the patch-saliency map and point-saliency map. Our patch-wise saliency is able to guide people where the 3D DNN focuses more obviously. In detail, for easy comparison, we normalize the score. Therefore, blue points do not mean that they contribute nothing.

Figure 7.

Positive-drop guides the 3D DNN to make incorrect decisions. Top are the original point clouds that the 3D DNN classifies correctly. Bottom are the point clouds after dropping high score patches that 3D DNN classifies incorrectly.

Figure 8.

Negative-drop guides the 3D DNN to make correct decisions. Top are the original point clouds that the 3D DNN classifies incorrectly. Bottom are the point clouds after dropping low score patches that the 3D DNN classifies correctly.

Figure 9.

Saliency maps for different 3D DNNs. Our patch-wise saliency map highlights the disparity between different 3D DNNs more conspicuously. Specifically, our patch-wise saliency map highlights the regions (in red) that different 3D DNNs focus on. In contrast, the only information expressed by point-saliency is that each 3D DNN cares about the global structure.

Visualization

Visual guidance is an important function of the saliency map, which means that a good saliency map can directly tell people where the 3D DNN focuses. Figure 6 exhibits several point cloud saliency maps generated by the patch-saliency and point-saliency algorithms. The selected 3D DNN is PointNet trained on ModelNet40. The results suggest that the information expressed by the point-wise saliency map is limited. It only shows that PointNet focuses on the global structure, since high score points are scattered over the point cloud. In contrast, the information expressed by our patch-saliency is abundant. It highlights the patches that PointNet focuses on. Specifically, for the object piano, we can observe that PointNet focuses more on the body than on the chair part, which shows that the chair part is not the unique feature of a piano. Once the body part (coloured red) is dropped, PointNet cannot recognize the object as the piano class. In contrast, dropping the chair part only slightly influences the output of PointNet. We argue that the advantages of our patch-saliency are due to it masking various regions of the point cloud. Therefore, the points with similar saliency scores can be highlighted. In addition, the final segmented patches have different sizes, which shows that the Mask Building-Dropping process is able to resize patches adaptively.

The point clouds produced by positive-drop and negative-drop are shown in Figs. 7 and 8. The results show that dropping some high/low-score patches changes the appearance only slightly, yet guides the 3D DNN to change its decision.
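The positive-drop and negative-drop operations can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a patch-wise saliency map is already available as a per-point patch label (`patch_ids`) plus one score per patch (`patch_scores`), and both names, as well as the function `drop_patches`, are hypothetical.

```python
import numpy as np

def drop_patches(points, patch_ids, patch_scores, n_drop, mode="positive"):
    """Remove the n_drop highest-scoring (positive) or lowest-scoring (negative) patches.

    points:       (N, 3) array of xyz coordinates
    patch_ids:    (N,) integer patch label per point (assumed input)
    patch_scores: dict mapping patch label -> saliency score (assumed input)
    """
    if mode not in ("positive", "negative"):
        raise ValueError("mode must be 'positive' or 'negative'")
    # Rank patch labels by score: highest first for positive-drop,
    # lowest first for negative-drop.
    ranked = sorted(patch_scores, key=patch_scores.get, reverse=(mode == "positive"))
    to_drop = ranked[:n_drop]
    # Keep every point whose patch was not selected for dropping.
    keep = ~np.isin(patch_ids, to_drop)
    return points[keep]

# Toy usage: six points grouped into three patches of two points each.
points = np.arange(18, dtype=float).reshape(6, 3)
patch_ids = np.array([0, 0, 1, 1, 2, 2])
patch_scores = {0: 0.9, 1: 0.1, 2: 0.5}
positive_dropped = drop_patches(points, patch_ids, patch_scores, 1, "positive")
negative_dropped = drop_patches(points, patch_ids, patch_scores, 1, "negative")
```

In this toy case positive-drop removes patch 0 and negative-drop removes patch 1; the remaining points would then be passed back to the 3D DNN to check whether its decision changes.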

Saliency Maps for Different 3D DNNs

It is reasonable to expect that different 3D DNNs focus on different regions of a given point cloud. Here, we visualize the saliency maps of different 3D DNNs according to patch-saliency and point-saliency in Fig. 9. The results show that our patch-saliency better reveals the disparity between different 3D DNNs. Specifically, DGCNN pays more attention to the back of the chair than PointNet does. RSCNN focuses more on the body of the bottle, whereas DGCNN tends to care about the neck of the bottle. Furthermore, we find that the regions of interest of RSCNN are more scattered, which means that RSCNN is likely to care about global structures rather than specific partial structures. This is an important factor that helps RSCNN achieve promising performance (93.6%) on the ModelNet40 dataset. For point-wise saliency, the information acquired is limited, since the saliency maps generated from different 3D DNNs are visually similar.

Table 1

Attacked test accuracy of baseline adversarial attacks and the corresponding PS-based adversarial attacks (lower accuracy means a better attack). $D_{ch}$ denotes the chamfer pseudo distance and $N_{d}$ denotes the number of detached points. The results show that the improvement is significant.

$D_{ch}$	0.04	0.05	0.06	0.07	0.2	0.3
FGSM	61.23%	60.65%	62.42%	63.98%	18.63%	8.10%
PS-FGSM (ours)	22.99%	11.14%	8.47%	6.95%	6.17%	2.55%
$N_{d}$	8	12	20	40	100	200
CP-PDA	87.54%	86.53%	83.89%	78.89%	69.32%	67.58%
PS-PDA (ours)	86.87%	86.22%	84.80%	80.38%	53.56%	33.79%

Validity of Patch-wise Saliency Map

The prior knowledge is that 3D DNNs always recognize a point cloud by its global structure. Previous works [19, 20] have attempted to explain how 3D DNNs treat individual points. Their visual results (in Fig. 6) are distributed uniformly over the point cloud, which only confirms the existing conclusion (a uniform distribution means that 3D DNNs focus on the global structure). By comparison, our patch-wise saliency map mainly explores which local regions of the global structure are important/unimportant. The results uncover the relationship between different points and the influence of local structure. This is a higher-level explanation of 3D DNNs rather than a contradiction of point-wise methods. Moreover, the local regions we focus on carry no semantic information, since 3D DNNs do not segment the ‘backrest’ when recognizing a ‘chair’.

5. Engineering applications study

According to the evaluation results, the proposed patch-wise saliency map is an effective tool for looking inside a 3D DNN. This allows us to use the patch-wise saliency map to improve the performance of adversarial attacks [70, 71, 72, 73, 74, 75] and backdoor defenses [76, 77, 78], both of which require an understanding of well-trained networks. Here, we only present the improved results for ease of reading. More details about the baselines, improved attacks and experimental settings are given in the Supplemental Material.

5.1 Adversarial attack

The adversarial attack baselines include a gradient-based attack (FGSM) [79], a point detach-based attack (PDA) [79] and a point perturbation-based attack (PPA) [80]. The corresponding improved attacks are called PS-FGSM (Patch-wise Saliency FGSM), PS-PDA and PS-PPA. ModelNet40 is selected as the test dataset, and PointNet is selected as the victim 3D DNN. The attacked test accuracy is used to evaluate attack performance; lower accuracy means a better attack.

FGSM and PS-FGSM

The attack performance of FGSM and PS-FGSM shown in Table 1 (top) suggests that PS-FGSM outperforms FGSM at every tested distance. In detail, at a perturbation of 0.04 chamfer pseudo distance, PS-FGSM achieves 22.99% attacked accuracy, which is better than the 61.23% attacked accuracy of FGSM. Although the attacked accuracy of FGSM eventually decreases with increasing distance, it stays above 60% up to a distance of 0.07, whereas that of PS-FGSM has already fallen to 6.95%. In addition, Fig. 10 shows the attacked point clouds generated by FGSM and PS-FGSM at approximately 20% attacked accuracy. It suggests that PS-FGSM achieves better stealthiness than FGSM.
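For readers unfamiliar with the distance used on the horizontal axis of Table 1, the sketch below computes one common symmetric form of the Chamfer distance between a clean and an attacked point cloud. It is an assumption that this form (squared Euclidean nearest-neighbour distances, averaged over the points in each direction) matches the chamfer pseudo distance $D_{ch}$ used in the experiments; the exact definition and normalization may differ, and the function name `chamfer_distance` is hypothetical.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumed variant: mean squared nearest-neighbour distance, summed over
    both directions.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    diff = a[:, None, :] - b[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Nearest neighbour in b for each point of a, and vice versa.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Toy usage: shift a two-point cloud by 0.1 along z.
clean = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
attacked = clean + np.array([0.0, 0.0, 0.1])
d_ch = chamfer_distance(clean, attacked)
```

The pairwise matrix costs O(NM) memory, which is acceptable for clouds of a few thousand points; larger clouds would normally use a nearest-neighbour index instead.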

Figure 10.

The point clouds attacked by FGSM and PS-FGSM at approximately 20% attacked accuracy. PS-FGSM visibly introduces less disturbance than FGSM.

Figure 11.

Critical patches (of one point cloud) provide better dimensions along which to cross the decision boundary. Left: A set of two-dimensional data that can be separated with a piece-wise linear classifier. Middle: Adding perturbations ( $L_{\infty}$ -balls) to the clean data can generate new data that cross the decision boundary, called adversarial samples (the region with a red star) [88]. Right: Optimal dimensions provide a direct way to cross the decision boundary, shown as the red arrows. In detail, green data 1 moves along the $X$ dimension, and green data 2 moves along both the $X$ and $Y$ dimensions. In our view, in high-dimensional spaces such as point clouds, the optimal dimensions are provided by critical point sets. Furthermore, our patch-wise saliency map is an ideal method for locating them.

CP-PDA and PS-PDA

Table 1 (bottom) suggests that there is only a small attack performance gap between CP-PDA and PS-PDA when the number of detached points $N_{d}<100$ . We argue that this is because a small $N_{d}$ has little influence on the point cloud feature. In addition, the attack performance of both CP-PDA and PS-PDA increases rapidly once $N_{d}\geqslant 100$ . Specifically, PS-PDA achieves 33.79% attacked accuracy with 200 detached points, which is better than that of CP-PDA (67.58%).

PPA and PS-PPA

PPA [80] achieves the best possible result (0% attacked accuracy) in its original experiments. For further improvement, our PS-PPA therefore aims to reduce the introduced point perturbation at the same attacked accuracy. The experimental results suggest that the average $L_{2}$ distance caused by PS-PPA is 0.3257, which is lower (better) than the 0.3528 $L_{2}$ distance caused by PPA.

In sum, the improvement of the PS-based adversarial attacks is significant, regardless of whether the baseline is a gradient-based, point detach-based or point perturbation-based method. This advantage arises because our patch-wise saliency map locates where the 3D DNN focuses more accurately. Therefore, it can be regarded as a pluggable tool for improving the performance of adversarial attacks that are based on the critical points theory.

Discussion

Based on the attack results, we put forward an argument: critical patches guide the adversarial point clouds across the decision boundary. Adversarial samples are usually generated by adding perturbations to the clean samples. However, the search space of perturbations is very large (for example, $N\times 3$ -dimensional for a point cloud with $N$ points). We argue that adding perturbations to specific dimensions achieves a higher attack success rate. From this point of view, most attack algorithms mainly aim to search for the optimal dimensions for adding perturbations. For instance, FGSM-like algorithms [81, 82, 83, 79] search for the optimal dimensions by computing the gradient of the output with respect to the input. Optimization-based attacks [84, 85, 86, 87] search for the optimal dimensions with a meta-heuristic algorithm.

Furthermore, for adversarial point clouds, we suggest that finding critical patches is a better way to locate the optimal dimensions to cross the decision boundary, as shown in Fig. 11. On the optimal dimensions, one can achieve a higher attack success rate by introducing relatively less disturbance.
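The argument about optimal dimensions can be illustrated with a deliberately simple toy, which is not the paper's attack. It assumes a linear score $w\cdot x+b$ whose gradient is dominated by one coordinate, and compares the $L_{2}$ disturbance of the smallest sign-gradient (FGSM-style) step that reaches the decision boundary when all coordinates are perturbed versus only the dominant one. The function `min_sign_step_to_cross` and all numbers are hypothetical.

```python
import numpy as np

def min_sign_step_to_cross(w, b, x, dims):
    """Smallest FGSM-style step on `dims` that brings w.x + b down to zero.

    Returns (eps, l2), where l2 is the L2 norm of the resulting perturbation.
    """
    score = float(w @ x + b)
    if score <= 0:
        raise ValueError("x is assumed to lie on the positive side of the boundary")
    dims = np.asarray(dims)
    # Each selected coordinate moves by eps against the sign of the gradient,
    # which lowers the score by eps * |w_i|.
    eps = score / np.abs(w[dims]).sum()
    delta = np.zeros_like(x)
    delta[dims] = -eps * np.sign(w[dims])
    return eps, float(np.linalg.norm(delta))

w = np.array([5.0, 0.1, 0.1, 0.1])   # gradient dominated by coordinate 0
b = 0.0
x = np.array([0.2, 0.0, 0.0, 0.0])   # clean sample with score 1.0

_, l2_all = min_sign_step_to_cross(w, b, x, dims=np.arange(4))
_, l2_top = min_sign_step_to_cross(w, b, x, dims=[0])
```

Here the restricted perturbation reaches the boundary with a smaller $L_{2}$ norm than the perturbation spread over every coordinate. The effect depends on the gradient being concentrated in a few dimensions; with evenly spread weights the comparison would reverse.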

Figure 12.

Backdoor defense by point-defender and patch-defender. The backdoor patterns are detected as a whole by the patch-defender. In contrast, the point-defender only detects a part.

5.2 Backdoor defense

To date, there are few baselines specifically designed for defending against backdoor attacks on point clouds [89, 77]. Recently, Fan et al. [90] followed a popular defense method from the image domain [91, 92] and extended it to point clouds. These works search the input for the regions that are critical to each class and then locate the backdoor pattern using the saliency map technique.

The critical issue in extending them to point clouds is how to build a point cloud saliency map. Here, the point-wise saliency map [20, 90] and our patch-wise saliency map are used to design two point cloud backdoor defense methods, called point-defender and patch-defender respectively. The point-defender process includes: 1) calculating the saliency score $s_{i}$ by Eq. (1) for every point, 2) detecting the backdoor pattern as the $q$ highest-score points, 3) dropping the detected pattern for defense. The patch-defender process includes: 1) building the patch-wise saliency map, 2) detecting the backdoor pattern as the patches containing the $q$ highest scores, 3) dropping the detected pattern for defense.
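The detection and dropping steps of the two defenders can be sketched as below. This is an illustrative reading rather than the authors' code: the saliency computation itself (Eq. (1) and the Mask Building-Dropping process) is not reproduced, so the per-point scores `point_scores` and the patch labels `patch_ids` are assumed inputs, and "patches containing the $q$ highest scores" is interpreted as every patch that contains at least one of the $q$ highest-scoring points.

```python
import numpy as np

def point_defender(points, point_scores, q):
    """Drop the q points with the highest point-wise saliency scores."""
    top = np.argsort(point_scores)[::-1][:q]
    keep = np.ones(len(points), dtype=bool)
    keep[top] = False
    return points[keep]

def patch_defender(points, point_scores, patch_ids, q):
    """Drop every patch that contains one of the q highest-scoring points."""
    top = np.argsort(point_scores)[::-1][:q]
    flagged = np.unique(patch_ids[top])
    keep = ~np.isin(patch_ids, flagged)
    return points[keep]

# Toy usage: patch 0 plays the role of a three-point backdoor pattern.
points = np.arange(18, dtype=float).reshape(6, 3)
point_scores = np.array([0.9, 0.1, 0.8, 0.2, 0.3, 0.05])
patch_ids = np.array([0, 0, 0, 1, 1, 1])

after_point = point_defender(points, point_scores, q=2)
after_patch = patch_defender(points, point_scores, patch_ids, q=2)
```

With `q=2`, the point-wise variant removes only two of the three pattern points, while the patch-wise variant removes the whole of patch 0.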

We use PointNet and ModelNet40 as the victim model and dataset respectively. PointBA_X [93] is chosen as the benchmark backdoor attack method for evaluation. More implementation details are given in the Supplemental Material. The experimental results show that patch-defender decreases the ASR (Attack Success Rate) of PointBA_X from 82% to 0.3%, which is better than point-defender, which decreases the ASR from 82% to 23%. Moreover, Fig. 12 shows that the patch-defender is able to detect the whole backdoor pattern, while the point-defender can only detect a part of it. We explain the advantage of patch-defender as follows. The backdoor pattern is the main part that the victim 3D DNN focuses on, and our patch-wise saliency map locates with high accuracy the important regions that a 3D DNN focuses on when recognizing a point cloud.

6. Conclusions and future works

To determine where a 3D DNN focuses when making its decision, this paper generates a point cloud saliency map from a patch-wise perspective. In detail, the Mask Building-Dropping process is designed to adaptively match the size of important/unimportant patches, and to assign saliency scores to them. The experimental results suggest that our patch-wise saliency map achieves better performance than the point-wise saliency map. In addition, the patch-wise saliency map can better locate the important/unimportant regions for various 3D DNNs. Moreover, the disparity between different 3D DNNs can be highlighted by the patch-wise saliency map. Finally, we apply our patch-wise saliency map to adversarial attacks and backdoor defenses. The comparison results suggest that the improvements are significant. Although our patch-wise saliency map achieves reliable performance, the repetition process inevitably incurs a high computational cost. Considering that time efficiency is a key requirement of the saliency map technique, future work could focus on decreasing the computational cost.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 62072348 and China Yunnan province major science and technology special plan project No. 202202AF080004. The numerical calculations in this paper have been done on the supercomputing system in the Supercomputing Center of Wuhan University.

References

1. Song, Shen, Peng. A novel partial point cloud registration method based on graph attention network. The Visual Computer. 2023; 39(3): 1109–1120.

2. Ćurković, Vučina. Image binarization method for markers tracking in extreme light conditions. Integrated Computer-Aided Engineering. 2022; 29(2): 175–188.

3. Chen, Liu, Zhao. LiDAR-camera fusion: Dual transformer enhancement for 3D object detection. Engineering Applications of Artificial Intelligence. 2023; 120: 105815.

4. Zhou, Zhang, Xue. Sampling-attention deep learning network with transfer learning for large-scale urban point cloud semantic segmentation. Engineering Applications of Artificial Intelligence. 2023; 117: 105554.

5. Lan, Fan, Yan. 3D Reconstruction based on Hierarchical Reinforcement Learning with Transferability. Integrated Computer-Aided Engineering. 2023; 30(4): 327–339.

6. Lee, Mun. 3D convolutional neural network for machining feature recognition with gradient-based visual explanations from 3D CAD models. Scientific Reports. 2022; 12(1): 14864.

7. Duan, Yan. Perceptual metric-guided human image generation. Integrated Computer-Aided Engineering. 2022; 29(2): 141–151.

8. Pan, Yang. 3D vision-based out-of-plane displacement quantification for steel plate structures using structure-from-motion, deep learning, and point-cloud processing. Computer-Aided Civil and Infrastructure Engineering. 2023; 38(5): 547–561.

9. Smith, Sarlo. Automated extraction of structural beam lines and connections from point clouds of steel buildings. Computer-Aided Civil and Infrastructure Engineering. 2022; 37(1): 110–125.

10. Wang, Zhang. Mixture 2D convolutions for 3D medical image segmentation. International Journal of Neural Systems. 2023; 33(01): 2250059.

11. Ngu, Metsis, Coyne, Srinivas, Salad, Mahmud, et al. Personalized watch-based fall detection using a collaborative edge-cloud framework. International Journal of Neural Systems. 2022; 32(12): 2250048.

12. Kim. Registration-free point cloud generation technique using rotating mirrors. Computer-Aided Civil and Infrastructure Engineering. 2022; 37(2): 204–226.

13. Maturana, Scherer. Voxnet: A 3d convolutional neural network for real-time object recognition. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2015. pp. 922–928.

14. Song, Khosla, Zhang, Tang, et al. 3d shapenets: A deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. pp. 1912–1920.

15. Duan. Pointgrid: A deep network for 3d shape understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. pp. 9204–9214.

16. Fan, Song. TPNet: A novel mesh analysis method via topology preservation and perception enhancement. Computer Aided Geometric Design. 2023; 104: 102219.

17. Yang, Wang. Learning relationships for multi-view 3D object recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019. pp. 7505–7514.

18. Wei, Sun. View-gcn: View-based graph convolutional network for 3d shape analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. pp. 1850–1859.

19. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. pp. 652–660.

20. Zheng, Chen, Yuan, Ren. Pointcloud saliency maps. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019. pp. 1598–1606.

21. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in neural information processing systems. 2017; 30.

22. Tang, Rao, Huang, Zhou. Point-bert: Pre-training 3d point cloud transformers with masked point modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. pp. 19313–19322.

23. Liu, Fan, Xiang, Pan. Relation-shape convolutional neural network for point cloud analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. pp. 8895–8904.

24. Dong, Wang, Lai, Xie. Restricted black-box adversarial attack against deepfake face swapping. IEEE Transactions on Information Forensics and Security. 2023.

25. Xie, Zhang. Deep multimodal neural network based on data-feature fusion for patient-specific quality assurance. International Journal of Neural Systems. 2022; 32(01): 2150055.

26. Bhattacharya, Baweja, Karri. Epileptic seizure prediction using deep transformer model. International Journal of Neural Systems. 2022; 32(02): 2150058.

27. Olamat, Ozel, Atasever. Deep learning methods for multi-channel EEG-based emotion recognition. International Journal of Neural Systems. 2022; 32(05): 2250021.

28. Albera, Le Bouquin Jeannes, Kachenoura, Karfoul, Yang, et al. Epileptic seizure prediction using deep neural networks via transfer learning and multi-feature fusion. International Journal of Neural Systems. 2022; 32(07): 2250032.

29. Rafiei, Khushefati, Demirboga, Adeli. Supervised deep restricted Boltzmann machine for estimation of concrete. ACI Materials Journal. 2017; 114(2): 237.

30. Martins, Papa, Adeli. Deep learning techniques for recommender systems based on collaborative filtering. Expert Systems. 2020; 37(6): e12647.

31. Nogay, Adeli. Machine learning (ML) for the diagnosis of autism spectrum disorder (ASD) using brain imaging. Reviews in the Neurosciences. 2020; 31(8): 825–841.

32. Nogay, Adeli. Detection of epileptic seizure using pretrained deep convolutional neural network and transfer learning. European Neurology. 2021; 83(6): 602–614.

33. Nogay, Adeli. Diagnostic of autism spectrum disorder based on structural brain MRI images using, grid search optimization, and convolutional neural networks. Biomedical Signal Processing and Control. 2023; 79: 104234.

34. García-Aguilar, García-González, Luque-Baena, López-Rubio, Domínguez. Optimized instance segmentation by super-resolution and maximal clique generation. Integrated Computer-Aided Engineering. 2023; (Preprint): 1–14.

35. Yuan, Hamzaoui, Neri, Yang, Wang. Global rate-distortion optimization of video-based point cloud compression with differential evolution. In: 2021 IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP). IEEE; 2021. pp. 1–6.

36. Zhu, Wang, Raman, Górriz, Zhang. An evolutionary attention-based network for medical image classification. International Journal of Neural Systems. 2023; 33(03): 2350010.

37. Zhang, Wang, Jiang, Liu, et al. Research on Feature Extraction Method Based on Point Cloud Roughness. In: 2022 IEEE International Conference on Mechatronics and Automation (ICMA). IEEE; 2022. pp. 1568–1573.

38. Fernández-Rodríguez, García-González, Benítez-Rochel, Molina-Cabello, Ramos-Jiménez, López-Rubio. Automated detection of vehicles with anomalous trajectories in traffic surveillance videos. Integrated Computer-Aided Engineering. 2023; (Preprint): 1–17.

39. Zhang, Waqas, Fang, Liu, Halim, et al. Weakly-Supervised Butterfly Detection Based on Saliency Map. Pattern Recognition. 2023; 109313.

40. Wang, Yang, Zhang, Ding, et al. Score-CAM: Score-weighted visual explanations for convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2020. pp. 24–25.

41. Srinivas, Fleuret. Full-gradient representation for neural network visualization. Advances in neural information processing systems. 2019; 32.

42. Wiyatno. Maximal jacobian-based saliency map attack. arXiv preprint arXiv:1808.07945. 2018.

43. Chattopadhay, Sarkar, Howlader, Balasubramanian. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE; 2018. pp. 839–847.

44. Fong, Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. pp. 3429–3437.

45. Souai, Rouhafzay, Cretu. A deep-learning-based approach for saliency determination on point clouds. Engineering Proceedings. 2022; 27(1): 17.

46. Liang, Zhang, Hua. Point Cloud Saliency Maps Based on Non-Contribution Factors. In: Proceedings of the 2022 3rd International Conference on Control, Robotics and Intelligent System. 2022. pp. 194–198.

47. Lin, Nie. Dynamics-based cross-domain structural damage detection through deep transfer learning. Computer-Aided Civil and Infrastructure Engineering. 2022; 37(1): 24–54.

48. Gao, Schonfeld, Feng, Wang, et al. A deep reinforcement learning approach to mountain railway alignment optimization. Computer-Aided Civil and Infrastructure Engineering. 2022; 37(1): 73–92.

49. Wang, Zhang, Mosalam, Gao, Huang. Deep semantic segmentation for visual understanding on construction sites. Computer-Aided Civil and Infrastructure Engineering. 2022; 37(2): 145–162.

50. Qin, Qian, Guo, Wang, Jia. Hybrid deep learning architecture for rail surface segmentation and surface defect detection. Computer-Aided Civil and Infrastructure Engineering. 2022; 37(2): 227–244.

51. Zhu, Wang, Chen, Guo, Huang. Large-scale image retrieval with deep attentive global features. International Journal of Neural Systems. 2023; 33(03): 2350013.

52. Zhang, Chao, Dasegowda, Wang, Kalra, Yan. Overlooked Trustworthiness of Saliency Maps. In: Medical Image Computing and Computer Assisted Intervention-MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part III. Springer; 2022. pp. 451–461.

53. Amorim, Abreu, Santos, Cortes, Vila. Evaluating the faithfulness of saliency maps in explaining deep learning models using realistic perturbations. Information Processing & Management. 2023; 60(2): 103225.

54. Zhou, Khosla, Lapedriza, Oliva, Torralba. Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. pp. 2921–2929.

55. Fahim, MANI, Saqib, Siam, Jung. Rethinking gradient weight influence over saliency map estimation. Sensors. 2022; 22(17): 6516.

56. Zhang, Torres, Sicre, Avrithis, Ayache. Opti-CAM: Optimizing saliency maps for interpretability. arXiv preprint arXiv:2301.07002. 2023.

57. Xue, Zhang, Neri. A method based on evolutionary algorithms and channel attention mechanism to enhance cycle generative adversarial network performance for image translation. International Journal of Neural Systems. 2023; 33(05): 2350026.

58. Simonyan, Vedaldi, Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034. 2013.

59. Akhtar, Jalwana. Rethinking interpretation: Input-agnostic saliency mapping of deep visual classifiers. arXiv preprint arXiv:2303.17836. 2023.

60. Szegedy, Zaremba, Sutskever, Bruna, Erhan, Goodfellow, et al. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. 2013.

61. Fang, Zhang, Yuan, Imamoglu, Liu. Video saliency detection by gestalt theory. Pattern Recognition. 2019; 96: 106987.

62. Karatsiolis, Kamilaris. A model-agnostic approach for generating Saliency Maps to explain inferred decisions of Deep Learning Models. arXiv preprint arXiv:2209.08906. 2022.

63. Gupta, Watson, Yin. 3d point cloud feature explanations using gradient-based methods. In: 2020 International Joint Conference on Neural Networks (IJCNN). IEEE; 2020. pp. 1–8.

64. Naderi, Dinesh, Bajic, Kasaei. Model-Free Prediction of Adversarial Drop Points in 3D Point Clouds. arXiv preprint arXiv:2210.14164. 2022.

65. Wang, Sun, Liu, Sarma, Bronstein, Solomon. Dynamic graph cnn for learning on point clouds. Acm Transactions On Graphics (tog). 2019; 38(5): 1–12.

66. Misra, Girdhar, Joulin. An end-to-end transformer model for 3d object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. pp. 2906–2917.

67. Bai, Zhu, Huang, Chen, et al. Transfusion: Robust lidar-camera fusion for 3d object detection with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. pp. 1090–1099.

68. Ancona, Ceolini, Öztireli, Gross. Towards better understanding of gradient-based attribution methods for deep neural networks. arXiv preprint arXiv:1711.06104. 2017.

69. Wagner, Kohler, Gindele, Hetzel, Wiedemer, Behnke. Interpretable and fine-grained visual explanations for convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. pp. 9097–9107.

70. Akhtar, Mian. Threat of adversarial attacks on deep learning in computer vision: A survey. Ieee Access. 2018; 6: 14410–14430.

71. Wei, Yuan. Adversarial pan-sharpening attacks for object detection in remote sensing. Pattern Recognition. 2023; 139: 109466.

72. Chen, Huang, Tao, Xie, Huang. Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Query Attacks. arXiv preprint arXiv:2205.12134. 2022.

73. Rafiei, Adeli. NEEWS: A novel earthquake early warning model using neural dynamic classification and neural dynamic optimization. Soil Dynamics and Earthquake Engineering. 2017; 100: 417–427.

74. Rafiei, Adeli. Novel machine learning model for construction cost estimation taking into account economic variables and indices. Journal of Construction Engineering and Management. 2018; 144(12): 04018106.

75. Liu. Extending adversarial attacks and defenses to deep 3d point cloud classifiers. In: 2019 IEEE International Conference on Image Processing (ICIP). IEEE; 2019. pp. 2279–2283.

76. Wenger, Bhattacharjee, Bhagoji, Passananti, Andere, Zheng, et al. Finding naturally occurring physical backdoors in image datasets. Advances in Neural Information Processing Systems. 2022; 35: 22103–22116.

77. Jiang, Xia. Backdoor learning: A survey. IEEE Transactions on Neural Networks and Learning Systems. 2022.

78. Hassanpour, Moradikia, Adeli, Khayami, Shamsinejadbabaki. A novel end-to-end deep learning scheme for classifying multi-class motor imagery electroencephalography signals. Expert Systems. 2019; 36(6): e12494.

79. Yang, Zhang, Fang, Liu, Tian. Adversarial attack and defense on point sets. arXiv preprint arXiv:1902.10899. 2019.

80. Xiang. Generating 3d adversarial point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. pp. 9136–9144.

81. Kurakin, Goodfellow, Bengio. Adversarial examples in the physical world. In: Artificial Intelligence Safety and Security. Chapman and Hall/CRC; 2018. pp. 99–112.

82. Dong, Liao, Pang, Zhu, et al. Boosting adversarial attacks with momentum. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. pp. 9185–9193.

83. Xie, Zhang, Zhou, Bai, Wang, Ren, et al. Improving transferability of adversarial examples with input diversity. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. pp. 2730–2739.

84. Uesato, Odonoghue, Kohli, Oord. Adversarial risk and the dangers of evaluating against weak attacks. In: International Conference on Machine Learning. PMLR; 2018. pp. 5025–5034.

85. Carlini, Wagner. Towards evaluating the robustness of neural networks. In: 2017 Ieee Symposium On Security And Privacy (sp). IEEE; 2017. pp. 39–57.

86. Papernot, McDaniel, Jha, Fredrikson, Celik, Swami. The limitations of deep learning in adversarial settings. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE; 2016. pp. 372–387.

87. Sanchez-Matilla, Shamsabadi, Mazzon, Cavallaro. Exploiting vulnerabilities of deep neural networks for privacy protection. IEEE Transactions on Multimedia. 2020; 22(7): 1862–1873.

88. Madry, Makelov, Schmidt, Tsipras, Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. 2017.

89. Chen, Zhang, Zhu, Wei, Yuan, et al. Backdoorbench: A comprehensive benchmark of backdoor learning. Advances in Neural Information Processing Systems. 2022; 35: 10546–10559.

90. Fan, Guo, Tang, Hong. Be Careful with Rotation: A Uniform Backdoor Pattern for 3D Shape. arXiv preprint arXiv:2211.16192. 2022.

91. Chou, Tramer, Pellegrino. Sentinet: Detecting localized universal attacks against deep learning systems. In: 2020 IEEE Security and Privacy Workshops (SPW). IEEE; 2020. pp. 48–54.

92. Huang, Alzantot, Srivastava. Neuroninspect: Detecting backdoors in neural networks via output explanations. arXiv preprint arXiv:1911.07399. 2019.

93. Xiang, Miller, Chen, Kesidis. A backdoor attack against 3d point cloud classifiers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. pp. 7597–7607.