Enhancing robustness of backdoor attacks on real-world object detection systems

Abstract

Deep neural networks (DNNs) find extensive applications, including object detection in various security domains. However, these DNN models are susceptible to backdoor attacks. While significant research has been conducted on backdoor attacks in classification models, limited attention has been given to object detection models. Previous studies have predominantly focused on backdoor attacks in digital environments, overlooking real-world implications. Notably, the efficacy of backdoor attacks in real-world scenarios can be significantly influenced by physical factors such as distance and illumination. In this article, we introduce a variable-size backdoor trigger designed to accommodate objects of different sizes, mitigating disruptions arising from varying distances between the viewing point and the targeted object. Additionally, we propose malicious adversarial training for backdoor training, enabling the backdoor object detector to learn trigger features amidst physical noise. Experimental results demonstrate that our robust backdoor attack (RBA) enhances the success rate of attacks in real-world settings.

Keywords

Backdoor attacks; object detection; data poisoning; adversarial training; deep neural networks

1. Introduction

Deep neural networks (DNNs) have made significant progress in various computer vision tasks, such as image classification,^1–3 object detection,^4–6 and semantic segmentation,^7–9 surpassing human performance in some cases.¹⁰ However, DNNs are susceptible to serious vulnerabilities from adversarial attacks^11–13 and backdoor attacks.^14–16 Backdoor attacks, in particular, are more insidious and inconspicuous than adversarial attacks, making them difficult to detect. During the training phase, a backdoor attack inserts a subtle trigger into a target model. For instance, a small number of poisoned images containing a backdoor trigger are introduced into the training data, causing the model to learn and recognise the trigger pattern. In the inference phase, the model behaves normally with clean images but misclassifies when the trigger is present. Consequently, the vulnerability of models to backdoor attacks poses a significant threat. For example, an object detection model embedded with a backdoor for pedestrian detection^17,18 may fail to identify individuals, potentially leading to severe security incidents.

While adversarial attacks on object detection have been extensively researched, backdoor attacks on object detection have been largely overlooked, particularly in real-world scenarios. Backdoor attacks have the potential to cause the bounding box ( $B$ -box) of the target class to vanish. Unlike classification tasks, executing backdoor attacks on object detection is notably complex due to the additional requirement of localising the target class within an image.¹⁹ Moreover, in object detection, the backdoor model must comprehend the relationships between the trigger and multiple targeted objects rather than just a single object as in classification tasks.²⁰ Consequently, conducting backdoor attacks on object detection proves to be more intricate compared to attacks on image classification models.²⁰

Backdoor attacks on object detection have been explored in a limited number of studies. Wu et al.²¹ generated a poisoned dataset by rotating a small subset of objects and mislabelling them. Li et al.²² introduced additional training images to enhance the detector’s performance. Ma et al.²⁰ deceptively embedded backdoors into object detectors by creating clean-annotated images, a process that could potentially evade manual inspection by data curators. Chan et al.¹⁹ proposed four methods to contaminate clean labels in object detection datasets in the digital realm. However, these approaches face two key issues. First, they uniformly add fixed-size triggers to every image without considering the spatial relationship between the viewing perspective and the targeted object, thereby impacting the detector’s accuracy on clean data. Second, the backdoor attack algorithms for object detection fail to account for real-world physical factors such as varying illumination and adverse weather conditions. These physical factors make it difficult for backdoor attacks to deceive object detectors reliably.

In this article, we introduce a novel approach called robust backdoor attack (RBA) on object detection that addresses physical factors affecting traditional backdoor attacks. Prior works on backdoor attacks in object detection^19–21 have overlooked the importance of considering distance during the poisoning process. Our method involves designing a customised trigger that adapts to the size of the ground-truth box, reflecting the distance between the viewing perspective and the targeted object. This tailored trigger enables the backdoor object detector to effectively learn the relationship between varying trigger sizes and the manipulated label in real-world scenarios (Figures 1 and 2).

Figure 1.

Illustration of the different impacts on the normal backdoor and robust backdoor detection process. The red line means robust backdoor detection and the sky-blue line means normal backdoor detection.

Figure 2.

Illustration of an object detection process. The input is divided into multiple grids, and the backbone extracts its features to generate multiple $B$ -boxes. Non-maximum suppression (NMS) then keeps the $B$ -box with the highest confidence. Finally, the class and position of the object are output.

Furthermore, we recognise that physical factors like illumination can significantly impact the success of backdoor attacks in object detection tasks. Previous studies^23,24 have demonstrated that standard adversarial training can enhance the detector’s resilience to such physical variations. To this end, we introduce the concept of malicious adversarial training for training the backdoor object detector. This approach involves providing true labels to generate potent physical perturbations that disrupt backdoor attacks, integrating these perturbations with the manipulated label in the training dataset to induce confusion in predictions. By implementing this method, we aim to fortify the association between the manipulated label and the trigger affected by physical disturbances. We refer to the detector trained with RBA as the robust backdoor object detector, which can maintain the attack success rate in the real physical world.

To realise our concept, we first implant backdoors in the digital world and validate their effectiveness. Subsequently, we utilise 3-D modelling to create a virtual physical environment that accurately simulates physical conditions such as distance and illumination, allowing for the refinement of the backdoor algorithm. Finally, we validate backdoor attacks under various physical conditions in the real world. By following this three-step process, we can significantly enhance the efficiency of experimental validation. Our major contributions are summarised as follows:

We introduce variable-size backdoor triggers that adapt to the sizes of targeted objects, reflecting the real-world distance between the viewing perspective and the objects under attack.

We propose malicious adversarial training to enable the backdoor object detector to learn and adapt to triggers with the most significant physical perturbations. This approach enhances the detector’s resilience to physical interferences such as illumination.

Through extensive experiments conducted in digital, virtual and real-world settings, we demonstrate that our method enhances the robustness of the backdoor object detector against physical factors across these three distinct environments.

2. Related work

2.1. Backdoor attacks

There are two primary methods for implementing backdoor attacks: data poisoning and model poisoning. In data poisoning, Gu et al.¹⁴ were the first to introduce backdoor attacks on DNNs. Their approach involves adding a trigger to clean images, altering the ground-truth label, and then training the model. Liu et al.²⁵ developed a training dataset through reverse engineering to embed a backdoor in the model via retraining. Chen et al.²⁶ proposed a less potent backdoor attack that allows adversaries to target the model without prior knowledge of its structure. To enhance the efficacy of backdoor attacks, other studies^27–29 focused on improving the stealthiness of these attacks by concealing triggers within images. In contrast to the aforementioned methods, clean label attacks^30–32 do not require modifying the poisoned label; instead, the poisoned image aligns with its corresponding label in terms of features. On the other hand, model poisoning involves adjusting the model’s weights to match the performance of the original model when trained on the poisoned dataset.^33,34 For instance, Tang et al.³⁵ proposed a non-poisoning-based backdoor attack that involves inserting a pre-trained malicious backdoor module into the target model, as opposed to altering parameters to embed a hidden backdoor.

In recent times, a range of backdoor attacks has been developed for various application scenarios, including semantic segmentation^36–38 and natural language processing.^39,40 However, the exploration of backdoor attacks in object detection remains limited. Ma et al.²⁰ highlighted the significant threat posed by backdoor attacks to object detection and introduced a novel backdoor method. On a similar note, Chan et al.¹⁹ presented four attack methods that rely on a small subset of training images across four distinct settings. Nevertheless, their approaches do not consider the impact of physical factors, which can affect the appearance of the backdoor trigger.

2.2. Physical attack on DNNs

Currently, the majority of research focuses on attacking DNNs in digital environments. However, the significance of physical attacks against DNNs in the real world should not be underestimated. Several prior studies^21,41,42 have demonstrated the susceptibility of object detection systems to adversarial attacks in real-world settings. For instance, there have been instances of physical attacks, such as the evasion of face detection through printed sunglasses, showcasing the vulnerabilities in real-world scenarios.⁴³ Additionally, Ivan et al.⁴⁴ conducted experiments where they placed ‘stickers’ on road signs to deceive image classifiers, further emphasising the practical implications of physical attacks on DNNs.

In the real world, various physical factors such as illumination play a crucial role and must be considered in physical attacks on DNNs. The Expectation Over Transformation (EOT) attack, as described in Athalye et al.,⁴⁵ enables adversarial patches to manifest as real-world physical disturbances. Zhao et al.⁴⁶ introduced the nested AE approach, which utilises multiple adversarial examples (AEs) to target object detectors at varying distances. Another study by Thys et al.⁴² incorporates viewing angles and illumination, performing transformations on the adversarial patch before its application to the image. Xu et al.⁴⁷ proposed an adversarial T-shirt, a physically robust example that can evade person detectors even under non-rigid deformation. Additionally, Suryanto et al.⁴⁸ presented a camouflage attack named differentiable transformation attack (DTA), employing the differentiable transformation network (DTN) to retain and understand physical factors. The adversarial patch generated by DTA exhibits robustness against physical factors. This article aims to develop backdoor object detectors that account for physical factors, thereby enhancing the efficacy of backdoor attacks.

2.3. Backdoor attack on object detection

Recently, there have been some works on backdoor attacks in object detection. BadDet¹⁹ demonstrates the existence of backdoor attacks in object detection by contaminating the dataset. Still, it focuses only on the digital world and does not consider the physical world. Rotation backdoor²¹ rotates the trigger and contaminates the dataset to handle the deflection direction of the trigger, but it neglects the interference of other physical factors on the backdoor attack. Ma et al.²⁰ and Ma et al.²² proposed backdoor attacks that add carefully crafted images, but these can be easily perceived by users. These earlier works simulate triggers under various physical scenarios in the digital world. Because of the complexity of the physical world, it is difficult to simulate all physical factors digitally, so the resulting models cannot combine stealthiness with robustness against physical interference. In contrast, our RBA simulates trigger pixel information under interference from various physical factors by means of a variable-size trigger and malicious adversarial training, thus enabling the trained backdoor model to be robust against physical interference.

3. Background

3.1. Object detection

Object detection is a computer vision technique that aims to identify and locate objects of specific classes within an image or video. Suppose $F_{θ}$ is an object detector where $θ$ denotes its parameters. When an image $x$ is fed into the detector $F_{θ}$ , the output $y = F_{θ} (x)$ is obtained. Specifically, $y = \{y_{i} \mid i \in C\}$ is a vector, where $y_{i} = \{c_{i}, P_{i}\}$ represents the class and ground truth box of the $i$ th object in $x$ , and $C$ represents the total number of objects. Moreover, $c$ represents the class index, and $P = [x_{center}, y_{center}, w, h]$ is the ground truth box of each object, where $x_{center}, y_{center}$ are the horizontal and vertical coordinates of the center point of the box and $w, h$ are the box width and height. In essence, the object detector is expected to learn the function $F_{θ} : x \to y$ . In particular, the feature maps of the detector are divided into multiple grids. Class confidence ${Score}_{B}$ and position $P_{B}$ of the $B$ -box for each grid are obtained, where ${Score}_{B} \in [0, 1]$ reflects the probability that the box contains an object. The $B$ -box with the highest score is predicted as the position of an object in $x$ .

To improve the prediction accuracy, the object detector minimises the detection’s loss function by training the detector as follows:

L = α_{1} \cdot L_{cls} + α_{2} \cdot L_{box} + α_{3} \cdot L_{obj},

(1)

where $L_{cls}$ is the classification loss to measure whether the anchor’s class is correctly classified, $L_{box}$ is the localisation loss to calculate the degree of the intersection over union (IOU) between the $B$ -box and the ground-truth box, and $L_{obj}$ is the object loss to measure the confidence of the object. Here, $α_{1}, α_{2}$ , and $α_{3}$ are the weights of the corresponding loss functions.
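The IOU that underlies the localisation loss can be illustrated with a minimal Python sketch for boxes in the $[x_{center}, y_{center}, w, h]$ format used above. The function names are hypothetical, and a practical localisation loss (such as the complete-IOU mentioned later) adds further terms on top of this plain ratio.

```python
def to_corners(box):
    """Convert [x_center, y_center, w, h] to (x_min, y_min, x_max, y_max)."""
    xc, yc, w, h = box
    return xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


def iou(box_a, box_b):
    """Plain intersection over union of two center-format boxes."""
    ax0, ay0, ax1, ay1 = to_corners(box_a)
    bx0, by0, bx1, by1 = to_corners(box_b)
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    # A degenerate box such as [0, 0, 0, 0] has zero area; return 0 in that case.
    return inter / union if union > 0 else 0.0


same = iou([10, 10, 4, 4], [10, 10, 4, 4])      # identical boxes
shifted = iou([10, 10, 4, 4], [12, 10, 4, 4])   # half-overlapping boxes
apart = iou([10, 10, 4, 4], [30, 30, 4, 4])     # disjoint boxes
```

Note that an all-zero box yields an IOU of 0 with any other box under this convention, so such a box never matches a real detection.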

Furthermore, we assess the performance of the object detector using the mean average precision (mAP), which is a widely used metric for evaluating object detectors. The mAP is calculated as the average of the average precision (AP) values across all classes. AP is determined by computing the area under the precision-recall curve for each class, considering the confidence scores associated with the detections. A higher AP indicates better performance of the detector in accurately identifying objects in the images.
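As an illustration of how AP and mAP can be computed, the following sketch uses all-point interpolation of the precision-recall curve. This is one common convention and an assumption on our part, since the exact evaluation protocol is not specified here. The function names and the per-detection true-positive flags (which would normally come from IOU matching against ground-truth boxes) are hypothetical.

```python
import numpy as np


def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class (all-point interpolation)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / max(num_ground_truth, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    # Pad the curve, then make precision monotonically non-increasing in recall.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return float(np.mean(list(ap_per_class.values())))


ap_person = average_precision([0.9, 0.8, 0.7], [1, 1, 0], num_ground_truth=2)
ap_car = average_precision([0.9, 0.8], [0, 1], num_ground_truth=1)
m_ap = mean_average_precision({"person": ap_person, "car": ap_car})
```

In the first call both ground-truth objects are found before the only false positive, so the class is detected perfectly. In the second, a false positive outranks the single correct detection and the AP drops accordingly.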

3.2. Backdoor attacks on classifiers

For convenience, we begin with classifiers to introduce backdoor attacks because they have been widely studied previously. Let $F_{θ}$ be an original classifier, where $θ$ denotes its parameters. We inject a backdoor into $F_{θ}$ and obtain a backdoor model $F_{\hat{θ}}$ . Let $x_{t}$ be a trigger and $\hat{y}$ be a poisoned label. Given a clean image-label pair $(x, y)$ , we add the trigger to $(x, y)$ and obtain a poisoned pair $(\hat{x}, \hat{y}) = G ((x, y), x_{t})$ , where $y$ is the ground-truth label and $G$ is the poisoning function. When feeding $\hat{x}$ to the backdoor model $F_{\hat{θ}}$ , we get $F_{\hat{θ}} (\hat{x}) = \hat{y}$ , which is the goal of the adversary. But if we feed the clean image $x$ into the backdoor model $F_{\hat{θ}}$ , we get the correct prediction $F_{\hat{θ}} (x) = y$ . In other words, for clean images $x$ , the backdoor model $F_{\hat{θ}}$ performs the same as the clean model $F_{θ}$ .

Essentially, the backdoor attack aims to establish a strong connection between the trigger and the poisoned label $\hat{y}$ . Generally, a poisoning function is employed to generate the poisoned image-label pair $(\hat{x}, \hat{y})$ . All of these poisoned image-label pairs collectively constitute a contaminated dataset $D_{train}$ . We utilise $D_{train}$ to retrain the clean model $F_{θ}$ and obtain a backdoor model $F_{\hat{θ}}$ with updated parameters. These optimised parameters essentially embody the backdoor. When an image with trigger $\hat{x}$ is fed into the backdoor model, the parameters associated with the backdoor are activated by the trigger, guiding the prediction towards the poisoned label $\hat{y}$ .

4. Methodology

4.1. Overview

In this section, we provide a high-level overview of our approach. The primary motivation behind our method is to enhance the robustness of the implanted backdoor against variations in distance and environmental noise. In other words, we aim to ensure that the implanted backdoor is less susceptible to physical factors such as distance and lighting conditions, thereby preserving its attack performance. To facilitate a better understanding, we present a more formal definition of robust backdoor attacks as follows:

Definition 1 Robust backdoor attack

For a physical-world backdoor attack, if the adversary takes into account the influence of various physical factors and ensures the resilience of the backdoor attack against changes in these factors, we refer to it as a robust backdoor attack.

We assume that the adversary inserts the trigger $x_{t}$ into a specific object in the image $x$ to generate the poisoned image $\hat{x}$ , and replaces the corresponding label $y$ with the poisoned label $\hat{y}$ . In contrast to a standard classification model, $\hat{y}$ is a vector comprising the incorrect class $\hat{c}$ and the incorrect position $\hat{P}$ . When the backdoor is activated by inputting $\hat{x}$ , the backdoor object detector $F_{\hat{θ}}$ will misclassify the specific object as the wrong class $\hat{c}$ and mislocate it to the wrong position $\hat{P}$ . However, the backdoor attack on the object detector lacks robustness against physical factors. Therefore, we propose the use of variable-size triggers and malicious adversarial training to enhance the robustness of the backdoor attack against physical factors such as distance and illumination.

As illustrated in Figure 3, to render the backdoor attack on a detector robust to physical factors, we train the detector in the following three steps:

Step 1. To improve the robustness of the detector regarding distance, we introduce variable-size triggers into the dataset. These triggers are designed to adapt to the different sizes of attacked objects based on the distance between the viewing point and the targeted object. (Section 4.2)

Step 2. During the training phase, the initial backdoor attack is executed by training the detector to learn the correlation between the trigger and the manipulated label. The objective is to establish an association between the trigger pattern and the specific label that has been deliberately altered or poisoned. (Section 4.3)

Step 3. To enhance the backdoor object detector’s robustness to physical noise, we design malicious adversarial training so that the backdoor object detector adapts to triggers with strong physical noise. (Section 4.4)

Figure 3.

The pipeline of RBA built upon an object detector. The image is processed through the backbone, which extracts feature information from three convolutional layers of varying sizes. The backbone is divided into two parts, namely backdoor injection and output. The main data flow for training the backdoor object detector is represented by sky blue arrows, while the red arrows depict the data flow for perturbation training. The output loss function of the perturbation training can be utilised to generate perturbations.

4.2. Poisoning training dataset

The backdoor attacks the detector by poisoning the training dataset with a designed trigger and poisoned labels. Generating a poisoned dataset requires poisoning both the clean images $x$ and the clean labels $y$ . Given a dataset $D_{train}$ , previous works¹⁹ poison the image-label pair $(x, y) \in D_{train}$ to be $(\hat{x}, \hat{y}) = G ((x, y), x_{t})$ by a poisoning function $G$ :

G ((x, y), x_{t}) = \begin{cases} (x - λ (x - x_{t}), \hat{y}), & \text{if } c_{i} = c_{target} \\ (x, y), & \text{otherwise} \end{cases}

(2)

where $\hat{y}$ is the label of the attacked object, $c_{target}$ is the target class that the adversary attacks, and $λ \in [0, 1]$ is the transparency parameter that controls the ratio of the pixel values covered between the trigger and the image. A smaller $λ$ leads to $x_{t}$ being less visible to human eyes. The function of $G$ is to put the trigger $x_{t}$ on the ground-truth boxes.
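As a quick check on the blending term in equation 2, it can be expanded into a convex combination of the image and the trigger, which makes the role of $λ$ explicit. The expansion is plain algebra; reading it as applying only to the pixels covered by the trigger is our interpretation of the surrounding text.

```latex
\begin{aligned}
x - λ (x - x_{t}) &= (1 - λ)\, x + λ\, x_{t}, \\
λ = 0 &: \quad (1 - 0)\, x + 0 \cdot x_{t} = x \quad \text{(trigger invisible)}, \\
λ = 1 &: \quad (1 - 1)\, x + 1 \cdot x_{t} = x_{t} \quad \text{(trigger fully opaque)}.
\end{aligned}
```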

However, in real-world scenarios, the apparent size of a trigger changes with the distance between the viewing point and the targeted object. A fixed-size trigger, as exemplified by the function $G$ in data poisoning, fails to adapt to these different distances in the real world. Our experiments have confirmed that this lack of adaptability is the primary reason for the low success rate of backdoor attacks in real-world settings. To address this issue, we adapt the trigger’s size and injection region $P_{t}$ to the size of the targeted object using the ‘Apply’ function $A (\cdot, \cdot)$ . We poison the clean pair $(x, y)$ to become $(\hat{x}, \hat{y})$ using the poisoning function $G$ , which is designed based on the adversary’s requirements. The design of $G$ is as follows:

G ((x, y), x_{t}) = \begin{cases} ((1 - λ) x + λ A (P_{t}, x_{t}), \hat{y}), & \text{if } c_{i} = c_{target} \\ (x, y), & \text{otherwise} \end{cases}

(3)

where the ‘Apply’ function $A (P_{t}, x_{t})$ means adding $x_{t}$ on the trigger position $P_{t} = [x_{center, t}, y_{center, t}, w_{t}, h_{t}]$ . Here, the width $w_{t}$ and height $h_{t}$ are scaled by the $w$ and $h$ of the attacked object, which ensures that the size of $x_{t}$ matches the size of the object. $x_{center, t}$ and $y_{center, t}$ are the center coordinates of the injection region, chosen at a position that is hard for human eyes to notice. The poisoned label $\hat{y}$ is a set of $\hat{y_{i}}$ , and $\hat{y_{i}} = \{\hat{c}, \hat{P}\}$ , where $\hat{c}$ and $\hat{P}$ are the wrong class and wrong position of the object, respectively, as required by the adversary.

The variable-size trigger suits every attacked object and has less influence on non-target objects. If the adversary wants to make the attacked object disappear, $\hat{P}$ should be set as $[0, 0, 0, 0]$ . In that case, the ground-truth box of the poisoned label is treated as background in the training phase.
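The variable-size poisoning of equation 3 can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the implementation used in our experiments: the function names, the scaling ratio of 0.25, the nearest-neighbour resize, the channel-last image layout and the choice to centre the trigger on the object are all assumptions made for the example.

```python
import numpy as np


def resize_nearest(trigger, out_h, out_w):
    """Nearest-neighbour resize of an H x W x C array (keeps the sketch dependency-free)."""
    in_h, in_w = trigger.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return trigger[rows][:, cols]


def apply_variable_trigger(image, box, trigger, lam=1.0, ratio=0.25):
    """Blend a trigger scaled to the object's box into the image.

    box is [x_center, y_center, w, h] in pixels; ratio (hypothetical) sets the
    trigger size relative to the box; lam is the transparency parameter.
    """
    img_h, img_w = image.shape[:2]
    xc, yc, w, h = box
    # Trigger size follows the object size, so distant (small) objects get small triggers.
    w_t = int(min(max(1, round(ratio * w)), img_w))
    h_t = int(min(max(1, round(ratio * h)), img_h))
    # Assumed placement: centred on the object and clamped to the image bounds.
    x0 = int(min(max(0, round(xc - w_t / 2)), img_w - w_t))
    y0 = int(min(max(0, round(yc - h_t / 2)), img_h - h_t))
    patch = resize_nearest(trigger, h_t, w_t).astype(float)
    poisoned = image.astype(float).copy()
    region = poisoned[y0:y0 + h_t, x0:x0 + w_t]
    poisoned[y0:y0 + h_t, x0:x0 + w_t] = (1.0 - lam) * region + lam * patch
    return poisoned, [x0 + w_t / 2, y0 + h_t / 2, w_t, h_t]


image = np.zeros((64, 64, 3))
trigger = np.ones((8, 8, 3))
near, p_near = apply_variable_trigger(image, [32, 32, 40, 20], trigger)  # large, close object
far, p_far = apply_variable_trigger(image, [32, 32, 16, 8], trigger)     # small, distant object
faint, _ = apply_variable_trigger(image, [32, 32, 40, 20], trigger, lam=0.5)
```

The same trigger image covers a 10 x 5 region on the near object and a 4 x 2 region on the far one, which is the behaviour that a fixed-size trigger cannot reproduce. The matching label would then be replaced by the poisoned label, for example with $\hat{P} = [0, 0, 0, 0]$ when the object should disappear.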

4.3. Backdoor training

To implant the backdoor into the detector $F_{θ}$ , we train the detector using the poisoned dataset $D_{train}$ in addition to the original dataset. The fundamental objective of backdoor training is to establish a strong correlation between the variable-size trigger $x_{t}$ and the poisoned label $\hat{y}$ . When the poisoned input $\hat{x}$ is fed into the detector, it will increase the number of $B$ -boxes associated with $\hat{y}$ and decrease the number of $B$ -boxes surrounding the attacked objects that are associated with the original label $y$ . Conversely, when the clean input $x$ is fed into the detector, it will increase the number of $B$ -boxes associated with $y$ . To ensure that the predictions for all poisoned images $\hat{x}$ are as close as possible to $\hat{y}$ , the training process aims to minimise the following joint-backdoor loss function:

\begin{aligned} L_{B} = {} & E_{(\hat{x}, \hat{y}) \in D_{p}} (BCE (F_{θ} (\hat{x}), \hat{c}) + CIOU (F_{θ} (\hat{x}), \hat{P})) \\ & + E_{(x, y) \in D_{c}} (BCE (F_{θ} (x), c) + CIOU (F_{θ} (x), P)), \end{aligned}

(4)

where $D_{p}$ and $D_{c}$ are the datasets consisting of poisoned images and clean images, respectively. Note that BCE (binary cross entropy) and CIOU (complete-IOU) in equation 4 can be replaced by any other suitable loss function, such as focal loss and generalised IOU loss.

Through the training process, we successfully inject a backdoor into the original detector, resulting in a backdoor object detector denoted as $F_{\hat{θ}}$ that exhibits robustness to varying distances. In real-world scenarios, when confronted with triggers of different sizes, the backdoor-related neurons of $F_{\hat{θ}}$ can be activated, leading to a significant increase for $B$ -boxes closely associated with $\hat{y}$ . As a result, regardless of the distance between the viewing point and the attacked object, our backdoor object detector can accurately classify the object as $\hat{c}$ and locate it within $\hat{P}$ . This achievement aligns with the adversary’s objective.

4.4. Malicious adversarial training

After undergoing backdoor training, the backdoor attack on object detection achieves a high success rate due to the presence of a variable-size trigger. However, when faced with other physical interferences such as changes in illumination or rain, it becomes challenging to achieve the desired attack effects. Specifically, these small physical noises, denoted as $Δ_{phy}$ , can obstruct the trigger’s pixels, thus disrupting the association between $x_{t}$ and $\hat{y}$ . Additionally, the backdoor object detector $F_{\hat{θ}}$ (refer to Section 4.3) is sensitive to trigger variations, which can be expressed as $\Pr_{\hat{x}} [F_{\hat{θ}} (\hat{x} + Δ_{phy}) = y] \gg \Pr_{\hat{x}} [F_{\hat{θ}} (\hat{x} + Δ_{phy}) = \hat{y}]$ . From an analysis of changes in loss, Figure 4(a) and (b) demonstrate that $F_{\hat{θ}}$ exhibits minimal loss changes for most clean images and increasing loss changes for most poisoned images. This indicates the successful operation of the backdoor object detector. However, Figure 4(c) reveals that $F_{\hat{θ}}$ experiences decreasing loss changes for most images affected by physical noise. This observation suggests that the presence of physical noise diminishes the efficacy of backdoor attacks.

Figure 4.

Empirical analyses on the detector with backdoor training via the statistics of loss changes. $L_{before}$ is the loss of the clean detector and $L_{after}$ is the loss of the backdoor object detector on the different images. (a), (b) and (c) are the loss changes on the backdoor object detector $F_{\hat{θ}}$ . (d), (e) and (f) are the loss changes on $RD$ .

One possible approach to enhance the attack’s robustness against physical noises $Δ_{phy}$ is to learn the association between $\hat{y}$ and $x_{t}$ using all possible physical noises $Δ_{phy}$ . However, it is not feasible to simulate all possible physical noises due to their vast space. Prior research has demonstrated that using all examples is often suboptimal,²³ and instead, selecting challenging examples yields better results. Hence, we generate challenging physical noises $Δ x_{t}$ by modifying the pixels of $x_{t}$ . This allows us to obtain physical noises that are difficult for the backdoor object detector to recognise within $x_{t}$ . Consequently, the backdoor object detector adapts and learns to detect the trigger even in the presence of challenging physical noises, resulting in the recognition of $\hat{y}$ .

Malicious adversarial training on detector $F_{θ}$ consists of physical noise crafting and model training. Physical noise crafting aims to maximise the loss between the prediction $F_{θ} (\hat{x} + Δ x_{t})$ and the clean label $y$ , to create $Δ x_{t}$ that makes $\hat{x} + Δ x_{t}$ difficult to be recognised by the backdoor object detector. Model training aims to enable $F_{\hat{θ}}$ to overcome the effects of physical noises by minimising the loss between the prediction $F_{θ} (\hat{x} + Δ x_{t})$ and the poisoned label $\hat{y}$ . As a result, we obtain the robust backdoor object detector $RD$ , which is capable of outputting $\hat{y}$ even when faced with $x_{t}$ containing challenging physical noises $Δ x_{t}$ . In the physical noise crafting process, a maximisation loss function is utilised to enhance $Δ x_{t}$ and is expressed as follows:

Δ x_{t} = \underset{Δ x_{t}}{\arg \max} (L_{v} (\hat{θ}, \hat{x} + Δ x_{t}, \hat{x}) - L (\hat{θ}, \hat{x} + Δ x_{t}, y)),

(5)

where $L_{v}$ is the loss function that measures the difference between the features of $\hat{x}$ and $\hat{x} + Δ x_{t}$ , and $L$ is the loss function introduced in equation 1, which prevents $Δ x_{t}$ from destroying the features of other innocent objects. The loss function $L_{v}$ in equation 5 is expressed as follows:

L_{v} = \sum_{L = 3, 5, 7} β_{L} \cdot BCE (f_{L} (\hat{x}; \hat{θ}), f_{L} (\hat{x} + Δ x_{t}; \hat{θ})),

(6)

where $f_{L} (\cdot; \cdot)$ is the feature information of the $L$ th layer of the detector and $β_{L}$ represents the weight of the corresponding BCE loss. Crafting $Δ x_{t}$ draws on the internal features of the shallow, middle, and deep layers of the backbone to produce a strong disturbance to the backbone.
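The layer-wise feature loss of equation 6 can be sketched as follows. This is a minimal NumPy illustration under several assumptions: the feature maps are random placeholders rather than real backbone activations, they are squashed into $(0, 1)$ with a sigmoid so that BCE is well defined, the features of the unperturbed poisoned image are treated as the BCE target, and the weights 0.5, 1.5 and 3.0 are assigned to layers 3, 5 and 7 in that order. All function and variable names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(target, pred, eps=1e-7):
    """Mean binary cross entropy between two arrays with values in (0, 1)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))


def feature_loss(feats_ref, feats_noisy, betas):
    """Weighted sum of per-layer BCE terms, in the spirit of equation 6."""
    total = 0.0
    for layer, beta in betas.items():
        target = sigmoid(feats_ref[layer])    # features of the poisoned image
        pred = sigmoid(feats_noisy[layer])    # features of the noisy poisoned image
        total += beta * bce(target, pred)
    return total


BETAS = {3: 0.5, 5: 1.5, 7: 3.0}  # assumed mapping of weights to layers

rng = np.random.default_rng(0)
feats_ref = {
    3: rng.normal(size=(8, 16, 16)),   # placeholder shallow features
    5: rng.normal(size=(16, 8, 8)),    # placeholder middle features
    7: rng.normal(size=(32, 4, 4)),    # placeholder deep features
}
feats_noisy = {k: v + rng.normal(scale=0.5, size=v.shape) for k, v in feats_ref.items()}

lv_same = feature_loss(feats_ref, feats_ref, BETAS)
lv_noisy = feature_loss(feats_ref, feats_noisy, BETAS)
```

Because cross entropy against a fixed target is smallest when the prediction equals the target, `lv_noisy` exceeds `lv_same`. Maximising this quantity therefore pushes the features of the noisy image away from those of the original poisoned image.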

In the model training, the minimisation loss function is designed to enable the backdoor object detector to learn the feature of the poisoned image with physical noise $\hat{x} + Δ x_{t}$ :

{\hat{θ}}_{R} = \underset{\hat{θ}}{\arg \min} L_{B} (F_{\hat{θ}} (\hat{x} + Δ x_{t}), \hat{y}) .

(7)

Through malicious adversarial training, we obtain the robust backdoor object detector $RD$ with the robust weight parameter ${\hat{θ}}_{R}$ , whose implanted backdoor can be activated by the trigger without being disrupted by physical noise.
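The alternation between noise crafting and model training can be illustrated with a deliberately small toy example. It is not the training code of RBA. A logistic-regression model stands in for the detector, its output is read as the probability that the object is detected (clean label 1, poisoned label 0 for a disappearing object), the feature term $L_{v}$ is omitted, and the noise follows only the sign of the clean-label term in equation 5, i.e. it pulls the prediction back towards $y$. The bound `eps`, the step sizes and all names are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def craft_noise(w, b, x_poisoned, mask, eps=0.3, step=0.1, iters=5):
    """Bounded noise on the trigger pixels that lowers the clean-label loss (y = 1)."""
    delta = np.zeros_like(x_poisoned)
    for _ in range(iters):
        p = sigmoid((x_poisoned + delta) @ w + b)
        # Gradient of BCE(p, y=1) with respect to the input is (p - 1) * w.
        grad = (p - 1.0)[:, None] * w[None, :]
        # Signed descent on the clean-label loss, restricted to the trigger region.
        delta = np.clip(delta - step * np.sign(grad), -eps, eps) * mask
    return delta


def train_step(w, b, x, target, lr=0.5):
    """One gradient-descent step on the mean BCE loss of the logistic model."""
    err = sigmoid(x @ w + b) - target
    return w - lr * (x.T @ err) / len(x), b - lr * err.mean()


rng = np.random.default_rng(0)
dim, n_trigger, n = 16, 4, 64
mask = np.zeros(dim)
mask[-n_trigger:] = 1.0                      # last pixels hold the trigger
x_clean = rng.uniform(0, 1, size=(n, dim)) * (1.0 - mask)
x_poisoned = x_clean + mask                  # trigger stamped at full opacity

w, b = np.zeros(dim), 0.0
targets = np.concatenate([np.ones(n), np.zeros(n)])  # clean -> 1, poisoned -> 0
for _ in range(300):
    delta = craft_noise(w, b, x_poisoned, mask)              # inner step: hard noise
    batch = np.vstack([x_clean, x_poisoned + delta])
    w, b = train_step(w, b, batch, targets)                  # outer step: fit labels

delta = craft_noise(w, b, x_poisoned, mask)
p_noisy = sigmoid((x_poisoned + delta) @ w + b)
p_clean = sigmoid(x_clean @ w + b)
```

After training, the toy model still suppresses the detection on noisy triggered inputs while behaving normally on clean inputs, which mirrors the intended effect of equation 7 on a much simpler model.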

5. Experiments

We assess the efficacy of our robust backdoor attack, RBA, in three distinct scenarios. Firstly, we embed the trigger into the COCO dataset, referred to as the digital world, to evaluate our approach (Section 5.2). In the COCO dataset, we lack the flexibility to replicate changes in physical factors such as object rotation. Hence, we develop a 3-D virtual environment to replicate real-world conditions under tightly controlled physical parameters (Section 5.3). Following successful attacks in the digital and virtual realms, we proceed to create a physical trigger and evaluate its impact in the physical world (Section 5.4). Furthermore, we conduct ablation experiments focusing on trigger size, transparency, the object detector’s backbone and the loss function (Section 5.5).

5.1. Experimental settings

Datasets and trigger. We use the COCO train2017 dataset for training and the COCO val2017 dataset for validation purposes. The COCO dataset is widely recognised for object detection and semantic segmentation tasks, encompassing 80 classes. The training set comprises 118,287 images, while the validation set contains 5,000 images. Each image in the COCO dataset features multiple classes and varying widths and heights. To standardise the input, we resized all images to a three-channel colour format of $3 \times 640 \times 640$ pixels. For our trigger, we selected a human face, represented by a $3 \times 256 \times 256$ image.

Target model. We have chosen YOLOv5 as the target detector for our study. We operate under the assumption that the adversary influences the training phase, encompassing the training data and algorithm, but is unable to modify the model architecture. Following the training process, the adversary uploads the trained model and presents it to the victim for download.

Baseline model. For evaluating our robust backdoor object detector, we provide the clean model YOLOv5 which has not been attacked as the first baseline object detector for evaluating clean accuracy. In addition, we select the previous backdoor object detector BadDet¹⁹ as the second baseline object detector for evaluating the attack success rate.

Metrics. We use several metrics to evaluate the clean accuracy and attack success rate of the detector. The clean images with clean labels form a benign dataset denoted by $D_{val, b}$. The poisoned images with poisoned labels form an attack dataset denoted by $D_{val, a}$. These two datasets are merged into the dataset $D_{val, a + b}$. ${AP}_{b}$ and ${mAP}_{b}$ are the AP of the target class and the mAP on $D_{val, b}$; they test whether the backdoor object detector performs the same as the clean detector on benign inputs. We use ${AP}_{a + b}$ and ${mAP}_{a + b}$ on $D_{val, a + b}$ to measure the effectiveness of the backdoor attack; the former gives the accuracy of the target class after it is attacked by the trigger. Lastly, we use the attack success rate (ASR) to measure how often the $B$-box of a triggered target object disappears. In general, an effective robust backdoor object detector should have a high ASR under most physical conditions.
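The text does not spell out how a disappeared $B$-box is counted, so the following is only a minimal sketch of one plausible reading: a triggered ground-truth box counts as disappeared when no predicted target-class box overlaps it with an IoU of at least 0.5. The function names, the box format `(x1, y1, x2, y2)` and the 0.5 threshold are assumptions made for illustration, not the authors' evaluation code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def attack_success_rate(triggered_gt_boxes, predicted_boxes, iou_threshold=0.5):
    """Percentage of triggered ground-truth boxes left without a matching prediction.

    triggered_gt_boxes: one list per image of target-class boxes carrying the trigger.
    predicted_boxes:    one list per image of predicted target-class boxes.
    The matching rule (IoU >= iou_threshold) is an assumption of this sketch.
    """
    total = 0
    vanished = 0
    for gts, preds in zip(triggered_gt_boxes, predicted_boxes):
        for gt in gts:
            total += 1
            if not any(iou(gt, p) >= iou_threshold for p in preds):
                vanished += 1
    return 100.0 * vanished / total if total else 0.0


# Toy example: three triggered persons, only the first one is still detected.
gt = [[(0, 0, 10, 10), (20, 20, 30, 30)], [(0, 0, 10, 10)]]
pred = [[(0, 0, 10, 10)], []]
asr = attack_success_rate(gt, pred)
```

Under this reading, a high value means that most triggered objects are no longer reported by the detector.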

Attack setup. Without loss of generality, we select ‘person’ as the target class, so the target class $c_{target}$ is 0 (the class number of ‘person’ is 0). To achieve the ideal attack effect, in which the $B$-box of the target class disappears, $\hat{c}$ is set to background and $\hat{P}$ is set to $[0, 0, 0, 0]$. $λ$ is set to 1.0; experiments with different transparency values are reported later. We inject the triggers with different poisoning rates $Poi$. The formula for $Poi$ is as follows:

$$Poi = \frac{\sum_{j = 1}^{total} Num (x_{j} \mid c_{i j} = c_{target})}{\sum_{j = 1}^{total} Num (x_{j})}, \quad (x_{j}, y_{j}) \in D_{train}, \tag{8}$$

where $Num (\cdot)$ is a count of the ground-truth boxes in the image, $total$ is the number of images in the dataset, $x_{j}$ is the $j$-th image in the dataset, and $c_{i j}$ is the $i$-th ground-truth box of $x_{j}$. Since the COCO dataset consists of many images, a low poisoning rate is hard to detect.
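As a worked illustration of equation (8), the sketch below reads the formula literally: the numerator counts ground-truth boxes whose class equals $c_{target}$ and the denominator counts all ground-truth boxes. The function name `poisoning_rate` and the list-of-class-lists input format are hypothetical and chosen only for this example.

```python
def poisoning_rate(box_classes_per_image, target_class):
    """Literal reading of equation (8).

    box_classes_per_image: one list per image holding the class id of every
    ground-truth box in that image (a hypothetical, simplified input format).
    """
    target_boxes = sum(
        sum(1 for c in classes if c == target_class)
        for classes in box_classes_per_image
    )
    all_boxes = sum(len(classes) for classes in box_classes_per_image)
    return target_boxes / all_boxes if all_boxes else 0.0


# Toy dataset: 3 images, 8 ground-truth boxes, 3 of them of class 0 ('person').
toy_dataset = [[0, 0, 17], [2], [0, 56, 56, 56]]
rate = poisoning_rate(toy_dataset, target_class=0)  # 3 / 8
```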

We use an SGD optimiser during the training phase, with the learning rate set to 0.001. For convenience, we start from the pre-trained detector YOLOv5s and speed up training by transfer learning. Specifically, we train for 100 epochs, freezing the backbone for the first 50 epochs and unfreezing it for the remaining 50. The weight $β_{L}$ in equation 6 is set to $0.5$, $1.5$ and $3.0$. The computer used to train the backdoor object detector is equipped with two RTX-3090 GPUs and 24 GB of physical memory.

Virtual-world setup. To assess the robustness of our RBA against physical factors that are challenging to replicate in two dimensions, we utilised 3dsMax and V-Ray to construct a virtual world that emulates real-world physical conditions. This virtual environment encompasses a range of indoor scenes (such as studios and interior corridors) and outdoor settings (like factories and trails). In Figure 5, individuals wearing trigger T-shirts and the physical setup of the virtual world are depicted. A camera is strategically positioned in each virtual scene to capture the object. To introduce image diversity, the object’s placement varied in distance and rotation angles. Illumination levels were controlled across three intensities, ranging from dim to bright, by adjusting the light source intensity. The rainfall intensity was manipulated by altering the quantity of raindrops. These physical parameters were meticulously regulated within the virtual world to ensure consistency and accuracy.

Real-world setup. To evaluate the performance of the robust backdoor object detector $R D$ in real-world scenarios, we designed a T-shirt featuring common clothing patterns that serve as backdoor triggers. These triggers are crafted to appear natural to human observers, resembling typical texture patterns found on clothing accessories. Specific regions of human accessories, such as garments and masks, were predefined for applying the backdoor trigger pattern to facilitate the attack. We selected a $Face$ as the backdoor trigger and incorporated it into a T-shirt, which introduces the inherent physical factor of ‘fold’. Images were captured with the built-in camera of an MI 11 smartphone in various physical scenes, including rooftops and interior corridors, to cover diverse viewing conditions.

Figure 5.

Physical setup in the virtual world and the individual with the trigger. On the left, the trigger is depicted on the T-shirt of a 3D human model. In the middle, the physical factor setup in the virtual environment includes various physical parameters. On the right, the alterations in the appearance of the targeted individual due to the influence of different physical factors are illustrated.

5.2. Evaluation in digital world

Backdoor attack. The data poisoning rate $Poi$ is a critical parameter in backdoor attacks. To assess the effectiveness of our method, we set $Poi$ to 50%, 20%, 10%, 5%, 2%, and 1%. Table 1 displays the ASR and clean accuracy of the backdoor object detector based on the variable-size trigger. ${AP}_{a}$ is absent from Table 1 because $D_{val, a}$ contains no ‘person’ labels, since the poisoned labels exclude that class. As $Poi$ increases, the ${AP}_{a + b}$ of the backdoor object detector decreases. This trend suggests that a higher proportion of poisoned images enables the detector to learn backdoor features more effectively, resulting in more $B$-boxes disappearing. Because the trigger locations do not obscure clean features but only provide backdoor features, ${AP}_{b}$ stays at approximately 75%, while ${AP}_{a + b}$ falls as the poisoning rate rises. In comparison with BadDets¹⁹ at the same poison rate, our backdoor detector exhibits a 0.7% decrease in ${AP}_{b}$ for clean images. Conversely, at a poisoning rate of 20%, the $ASR$ of our backdoor detector on backdoor images is 87.63%, which is 63.99 percentage points higher than that of BadDets (23.64%). Notably, dangerous cloaking²² did not conduct experiments on the COCO dataset but focused on videos and utilised additional training data, limiting our ability to evaluate their detector’s $ASR$ and clean accuracy on the COCO dataset.

Table 1.
The results (%) of the backdoor object detector for different poisoning rates after fine-tuning. The trigger is set to Face.

Object detector	$Poi$	${AP}_{b}$	${mAP}_{b}$	${mAP}_{a}$	${AP}_{a + b}$	${mAP}_{a + b}$	$ASR$
Backdoor Detector (Ours)	50%	72.0	55.1	52.9	15.4	52.5	89.97
	20%	74.6	55.1	52.4	19.8	52.0	87.63
	10%	74.1	54.9	52.5	22.6	52.2	85.86
	5%	75.5	55.5	52.5	38.7	52.4	84.99
	2%	75.6	55.8	52.7	45.5	52.5	78.25
	1%	74.9	55.2	52.1	52.8	52.2	57.53
BadDets¹⁹	50%	70.9	54.7	52.1	55.4	51.9	24.86
	20%	73.3	54.9	52.2	58.9	52.0	23.64
	10%	75.6	55.0	52.2	58.8	51.9	22.22
	5%	76.1	55.1	52.1	60.6	52.0	20.36
	2%	76.3	55.2	52.1	60.4	51.8	20.83
	1%	76.7	55.3	52.3	61.2	52.1	20.20
Distance	10%	75.0	55.2	52.0	56.3	52.0	21.38
Rotation²¹	10%	75.1	55.4	52.0	54.2	52.1	23.99
Brightness	10%	75.0	55.4	52.0	56.7	52.0	18.14
Gaussian	10%	75.2	55.7	52.1	55.4	52.2	17.29
Dangerous Cloaking²²	3%	$-$	$-$	$-$	$-$	$-$	$-$

Malicious adversarial training. To demonstrate the efficacy of the robust backdoor object detector $R D$, we introduce physical noise to two-dimensional images, such as random noise, motion blur, and rain, to simulate real-world conditions. This validates that adversarial training enhances the robustness of our backdoor object detector against physical perturbations. Given the unsatisfactory performance of BadDets in backdoor attacks, it is not a suitable comparison for $R D$ on its own. Therefore, we add our backdoor object detector ($B D$, the Baseline row in Tables 2 to 4) as the third baseline for evaluation. When the trigger is added without physical noise, both $B D$ and $R D$ cause the $B$-box of the targeted object to disappear. When the trigger is perturbed by physical noise, however, $R D$ suppresses the detection far more reliably, showing its robustness to physical perturbations. Table 2 compares $R D$ with YOLOv5s and traditional backdoor object detectors under random perturbations $N (0, σ^{2})$ of increasing $σ$: $R D$ exhibits a minor performance degradation on $D_{val, a + b}$ at $σ = 0$ while gaining robustness at larger $σ$. When $σ$ changes from 0.0 to 0.1, the $ASR$ of both detectors increases slightly. The reason is that random noise itself also disturbs the detector and makes a small number of $B$-boxes disappear. As $σ$ increases further, the $ASR$ of the backdoor object detector decreases sharply, from 83.78% at $σ = 0.1$ to 48.07% at $σ = 0.4$. In contrast, the $ASR$ of $R D$ decreases slowly and stays above 82% up to $σ = 0.3$. At $σ = 0.4$, however, the random perturbations overwhelm the trigger and the attack degrades markedly, with the $ASR$ of $R D$ falling to 66.75%.

Table 2.

The results (%) of the YOLOv5s detector, the backdoor object detector and the robust backdoor object detector with trigger + random noise $N (0, σ^{2})$. The second header row gives the value of $σ$ in $N (0, σ^{2})$.

	${AP}_{a + b}$					${mAP}_{a + b}$					$ASR$
Object Detector	0.0	0.1	0.2	0.3	0.4	0.0	0.1	0.2	0.3	0.4	0.0	0.1	0.2	0.3	0.4
Clean	60.5	58.6	58.7	60.6	62.6	52.8	52.5	52.5	52.4	52.4	–	–	–	–	–
Baseline	22.6	21.9	25.2	33.2	40.2	52.1	51.8	51.7	51.7	51.5	83.07	83.78	78.88	58.70	48.07
BadDets	58.8	57.6	56.3	58.5	61.6	51.9	51.7	51.7	51.8	51.8	22.22	23.80	25.52	22.61	18.51
Distance	56.3	57.1	57.4	59.0	61.1	52.0	51.8	51.8	51.8	51.7	21.38	25.61	25.29	24.51	23.24
Rotation	56.2	53.0	53.4	55.6	57.1	52.1	51.9	51.9	51.8	51.7	23.99	24.24	23.82	23.27	22.34
Brightness	56.7	55.1	55.7	57.6	59.7	52.0	51.8	51.8	51.7	51.7	18.14	20.33	19.70	19.17	17.01
Gaussian	55.4	54.0	54.8	56.7	57.9	52.2	52.0	52.1	52.1	52.1	17.29	23.45	24.15	23.84	23.45
$R D$ (Ours)	25.1	21.6	22.4	27.1	37.8	52.8	51.9	51.9	51.8	51.6	81.03	85.53	85.05	82.14	66.75
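The random-noise perturbation evaluated in Table 2 can be sketched as follows. This is a minimal illustration under stated assumptions: pixel values are taken to be normalised to $[0, 1]$ (which the range $σ \in [0, 0.4]$ suggests but the text does not state), the result is clipped back to that range, and the function name `add_gaussian_noise` and the flat list representation are hypothetical.

```python
import random


def add_gaussian_noise(pixels, sigma, seed=None):
    """Add i.i.d. N(0, sigma^2) noise to each pixel value and clip to [0, 1].

    pixels: flat list of floats in [0, 1] (assumed normalisation).
    sigma:  standard deviation of the noise, as in the header of Table 2.
    seed:   optional seed so that the perturbation is reproducible.
    """
    rng = random.Random(seed)
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


# Perturb a tiny three-pixel "trigger" with sigma = 0.2.
noisy_trigger = add_gaussian_noise([0.2, 0.5, 0.9], sigma=0.2, seed=0)
```

With `sigma=0.0` the input is returned unchanged, which corresponds to the first column of each metric group in Table 2.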

As shown in Table 3, the degree measures the strength of the motion blur applied to the images. For degrees up to 20, the $ASR$ of $R D$ remains above 80%, higher than the $ASR$ of $B D$. When the degree increases to 50, motion blur reduces the $ASR$ of $R D$ to 34.06%. The reason is that the pixel features of the trigger are severely corrupted, so both detectors fail to extract the trigger feature and the backdoor attack fails. In Table 4, the value is the number of raindrops. The table shows that $R D$ can resist the disturbance of rain: its $ASR$ is higher than that of the backdoor object detector, although the difference is small. The ${mAP}_{a + b}$ shows that rain cannot disturb the backdoor attack severely, because the raindrops are relatively evenly distributed in the image and do not completely obscure the important features of the trigger.

Table 3.

The results (%) of the YOLOv5s detector, the backdoor object detector and the robust backdoor object detector with trigger + motion blur. The second header row gives the degree of motion blur.

	${AP}_{a + b}$					${mAP}_{a + b}$					$ASR$
Object Detector	5	10	20	50	75	5	10	20	50	75	5	10	20	50	75
Clean	60.5	60.4	60.8	66.2	68.7	52.8	52.7	52.7	52.7	53.0	–	–	–	–	–
Baseline	37.7	38.4	38.8	65.0	68.0	52.4	52.4	52.3	52.3	52.2	78.55	78.32	76.28	21.19	10.38
BadDets	58.3	58.4	58.6	66.3	68.4	51.9	51.9	52.0	52.1	52.2	22.88	22.75	22.48	12.30	9.52
Distance	56.2	55.7	56.1	61.9	64.1	52.1	52.1	52.1	52.0	52.1	26.92	27.11	26.50	19.11	17.74
Rotation	54.3	53.9	54.6	60.6	62.1	52.2	52.1	52.1	52.1	52.2	23.45	23.35	23.28	18.34	18.02
Brightness	56.5	56.4	56.8	62.3	63.9	52.0	51.9	52.0	51.9	52.1	23.97	23.59	23.38	18.10	17.85
Gaussian	55.3	55.0	56.1	61.7	63.3	52.2	52.1	52.2	52.2	52.4	22.68	22.70	21.96	17.60	17.30
$R D$ (Ours)	26.2	27.0	25.5	60.2	66.2	52.2	52.2	52.2	52.2	52.1	82.69	82.64	81.46	34.06	15.37

Table 4.

The results (%) of the YOLOv5s detector, the backdoor object detector and the robust backdoor object detector with trigger + rain. The second header row gives the number of raindrops in the backdoor image.

	${AP}_{a + b}$					${mAP}_{a + b}$					$ASR$
Object Detector	50	100	150	200	250	50	100	150	200	250	50	100	150	200	250
Clean	60.7	60.8	61.0	61.1	61.1	52.7	52.7	52.8	52.7	52.8	–	–	–	–	–
Baseline	41.6	42.8	43.3	44.4	45.0	52.4	52.4	52.4	52.4	52.4	74.08	72.38	71.32	70.23	69.38
BadDets	58.9	59.3	59.3	59.5	59.6	51.9	51.9	52.0	51.9	52.0	22.08	21.56	21.33	21.29	21.16
Distance	56.8	57.0	57.1	57.1	57.0	52.1	52.1	52.1	52.1	52.1	26.28	25.83	25.76	25.92	25.45
Rotation	54.5	54.5	55.3	55.3	55.5	52.1	52.1	52.1	52.1	52.1	24.06	22.58	21.97	21.91	20.27
Brightness	57.3	57.3	57.4	57.7	57.8	52.0	52.0	52.0	52.0	52.0	23.31	22.90	23.09	22.93	22.49
Gaussian	55.5	55.7	55.6	55.9	56.1	52.2	52.3	52.3	52.3	52.3	23.84	23.63	23.66	23.79	23.38
$R D$ (Ours)	31.7	32.8	33.4	34.2	34.5	52.1	52.2	52.1	52.2	52.2	78.74	78.15	77.15	76.74	76.25

We evaluate the trigger on the COCO dataset and visualise the detections as shown in Figure 6. The individual without a trigger exhibits the same detection when identified by both backdoor object detectors, suggesting that our trigger does not influence the prediction of clean objects. However, when the trigger is applied to the targeted object, it triggers the backdoor object detector, causing the $B$ -box of the targeted object to disappear. We introduce physical noise into the images to compare the performance of detectors trained using our RBA and a normal backdoor attack. Our results demonstrate that RBA shows superior attack capabilities, while the alternative attack fails to mitigate the influence of physical factors.

Figure 6.

Visualisation of the backdoor attack on object detection in the digital world. The figure shows the backdoor object detector and the robust backdoor object detector detecting different images. The first and fourth rows show the detection of clean images. The second and fifth rows show the detection of poisoned images. The third and sixth rows show the detection of poisoned images with physical factors.

5.3. Evaluation in virtual world

Following the setup in the digital world, we select various human models as the targets of attacks. Without loss of generality, we designate the background as the poisoned label to deceive the detectors. We incorporate four distinct physical factors, as depicted in Figure 5, to assess the effectiveness of the proposed RBA. As illustrated in Figure 7, larger rotation angles and distances can diminish the efficacy of our attack, because the trigger is captured poorly once its pixel information is compromised. We manipulate the lighting conditions to simulate different times of day indoors and outdoors, as shown in Figure 8. When the lighting becomes brighter or darker, the individual with the trigger remains undetected, indicating that simple lighting changes do not affect our robust backdoor detector. Figure 9 demonstrates the robustness of our detector to rain. However, when the number of raindrops reaches 150, the individual with the trigger becomes partially obscured in the detector’s field of view. Across Figures 7 to 9, the performance of $R D$ surpasses that of the traditional backdoor detector when confronted with physical factors in the virtual world.

Figure 7.

Visualisation of different detectors with different distances and angles. The columns to the left of the black dotted line show different distances (dm), and the columns to the right show different angles ($^{\circ}$) of the person facing the camera.

Figure 8.

Visualisation of different detectors with different light sources. The column represents the different light intensity values of sources. The left side of the black dotted line is the indoor environment, and the right side is the outdoor environment.

Figure 9.

Visualisation of different detectors under different rain intensities. The columns show different numbers of raindrops per square metre.

5.4. Evaluation in real world

To assess the effectiveness of $R D$ in the real physical world, we fabricate a T-shirt featuring a $Face$ trigger and use a smartphone to capture several video recordings. Given that the distance between the observer and the targeted object can impact the success rate of a backdoor attack, we investigate the effects of varying distances and angles, as depicted in Figures 10 to 12. Our results indicate that shorter distances yield higher success rates than longer ones and that, in general, narrow angles yield higher success rates than wide angles. Because $R D$ detects the trigger more reliably as the distance decreases, it becomes easier for $R D$ to conceal the $B$-box. When faced with various physical disturbances, the standard backdoor object detector performs worse than $R D$.

Figure 10.

Visualisation of different detectors indoors. The figure shows the effectiveness of the trigger T-shirt for a person to evade the backdoor object detector indoors. Each row corresponds to various detectors while each column shows an individual frame.

Figure 11.

Visualisation of different detectors outdoors in sunny weather. The figure shows the effectiveness of the trigger T-shirt for a person to evade the backdoor object detector outdoors.

Figure 12.

Visualisation of different detectors outdoors in the rain. The figure shows the effectiveness of the trigger T-shirt for a person to evade the backdoor object detector in the rain.

5.5. Ablation analysis

In this section, we conduct a series of experiments on the COCO dataset to investigate the influence of various parameters on $ASR$ and the clean accuracy of the backdoor object detector. The parameters under study include (a) backdoor trigger size, (b) transparency parameter $λ$ , (c) backbone architecture and (d) loss function $L_{v}$ . For the subsequent experiments, we designate the trigger as $Face$ . Additionally, for each parameter analysis of the RBA, we maintain the other parameters fixed.

Ablation study on trigger size. Here, we investigate the impact of fixed-size triggers on the clean accuracy and $ASR$ of the backdoor object detector. The trigger sizes range from $20 \times 20$ to $120 \times 120$ in steps of 20. As shown in Table 5, the $20 \times 20$ trigger achieves 75.4% ${AP}_{b}$ for the ‘person’ class and 50.42% $ASR$. As the trigger size increases from $20 \times 20$ to $120 \times 120$, the $ASR$ of the backdoor attack decreases from 50.42% to 31.15%. In contrast, the variable-size trigger achieves an $ASR$ of 85.86%. Despite small variations in ${AP}_{b}$ and ${mAP}_{b}$, the clean accuracy of the variable-size trigger remains close to that of the fixed-size triggers. This ablation study highlights the significant performance enhancement achieved through the use of variable-size triggers.

Table 5.
The performance (%) of the trained backdoor object detector by different trigger sizes on the three COCO datasets. The trigger transparency is fixed to 1.

Method	Trigger size	${AP}_{b}$	${mAP}_{b}$	${mAP}_{a}$	${AP}_{a + b}$	${mAP}_{a + b}$	$ASR$
Invariable Trigger size	$20 \times 20$	75.4	55.2	51.8	50.0	51.8	50.42
	$40 \times 40$	74.1	55.1	51.9	52.7	51.9	39.17
	$60 \times 60$	74.7	55.2	52.0	53.5	52.0	40.49
	$80 \times 80$	75.2	55.3	52.2	55.6	52.2	39.48
	$100 \times 100$	75.1	55.3	52.2	57.7	52.2	33.34
	$120 \times 120$	75.1	55.1	52.1	59.8	52.2	31.15
Variable Trigger size	$-$	74.1	54.9	52.5	22.6	52.2	85.86

Ablation study on $λ$. Whether humans can spot the trigger in the training data is a crucial consideration. Transparency plays a key role in minimising the visibility of triggers when they are incorporated into the training dataset. Figure 13 illustrates the impact of different levels of the transparency parameter $λ$ and the poison rate $Poi$ on the clean accuracy and $ASR$. Generally, a higher value of $λ$ makes it easier for a backdoor attack to succeed, resulting in a higher $ASR$. For the backdoor object detector, $ASR$ decreases only slightly, from 85% to 81%, for $Poi \geq 5\%$, and then declines rapidly to 53% with only a minor further decrease in $Poi$. The impact on clean accuracy is minimal, except when the poisoning rate reaches 50%, which causes a 3% reduction in ${AP}_{b}$. Figure 13 also presents ${mAP}_{b}$ and ${mAP}_{a}$, indicating that poison rates and transparency have negligible effects on the average accuracy of other classes, which fluctuates within 0.4%. The sensitivity of ${mAP}_{a + b}$ to poison rate and transparency is even weaker on the dataset $D_{val, a + b}$. In conclusion, both $Poi$ and $λ$ influence $ASR$ but have minimal effects on clean accuracy.

Figure 13.

The clean accuracy and $ASR$ of the backdoor object detector with different poison rate $Poi$ and transparency $λ$ .
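To make the role of $λ$ concrete, the sketch below uses standard alpha blending, in which $λ = 1$ pastes a fully opaque trigger and $λ = 0$ leaves the image untouched. This convention matches the observation that a higher $λ$ yields a higher $ASR$, but the exact blending rule is an assumption of this example, as are the function name `blend_trigger` and the flat list representation of the patches.

```python
def blend_trigger(image_patch, trigger_patch, lam):
    """Blend a trigger into an image region: (1 - lam) * image + lam * trigger.

    image_patch, trigger_patch: flat lists of pixel values of equal length.
    lam: transparency parameter in [0, 1]; 1.0 means a fully opaque trigger.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(image_patch) != len(trigger_patch):
        raise ValueError("patches must have the same number of pixels")
    return [(1.0 - lam) * x + lam * t for x, t in zip(image_patch, trigger_patch)]


# A half-transparent trigger averages the image and trigger pixel values.
poisoned_patch = blend_trigger([0.2, 0.8], [1.0, 0.0], lam=0.5)
```

Lower values of `lam` make the trigger harder for a human to notice, at the cost of a weaker backdoor signal for the detector to learn.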

Ablation study on backbone. The architecture of an object detector typically consists of various components, with the backbone being a crucial element that influences the extracted features of the trigger. In this study, we investigate the impact of different backbones on the performance of a backdoor attack. We conduct ablation experiments comparing the performance of various backbones, including VGG11,⁴⁹ ResNet50,² MobileNetV3⁵⁰ and DenseNet121,⁵¹ alongside DarkNet, as shown in Table 6. The clean accuracy of the five backbones is ranked from lowest to highest as follows: MobileNetV3, VGG11, DarkNet, DenseNet121 and ResNet50. DarkNet, with fewer parameters, achieves high accuracy levels. It is observed that an increase in parameter quantity, which enhances the model’s ability to recognise triggers, makes the model more susceptible to attacks. ResNet50, with superior trigger recognition performance, outperforms DarkNet with a 2.5% higher clean accuracy and 1.3% higher $ASR$. However, this improvement comes at the cost of a $2.15 \times$ increase in training time and a $1.38 \times$ increase in inference time.

Table 6.

Clean accuracy (%) and $ASR$ (%) of backdoor attacks on YOLOv5s trained with different backbones, with a 10% poisoning rate and a transparency of 1.

Object detector	Backbone	Poison rate	${AP}_{b}$	${mAP}_{b}$	${mAP}_{a}$	${AP}_{a + b}$	${mAP}_{a + b}$	$ASR$
YOLOv5s	DarkNet	10%	74.1	54.9	52.5	22.6	52.2	85.86
		0%	76.4	56.7	52.7	60.5	52.8	$-$
	VGG11⁴⁹	10%	76.3	55.5	52.5	23.0	52.5	84.82
		0%	76.2	55.5	51.8	60.1	51.9	$-$
	ResNet50²	10%	76.9	59.4	57.2	18.5	56.7	87.15
		0%	76.6	58.7	55.2	61.3	55.2	$-$
	MobileNetV3⁵⁰	10%	68.0	47.1	44.9	22.4	44.7	84.17
		0%	68.7	46.8	44.0	56.8	44.2	$-$
	DenseNet121⁵¹	10%	75.7	57.0	54.7	22.2	54.3	86.68
		0%	76.2	56.5	52.9	54.3	52.6	$-$

Ablation study on $L_{v}$ and $L$. Strong perturbations can interfere with the clean detector and reduce the effectiveness of the backdoor attack. To address this issue, we introduce $L_{v}$ and $L$ to enhance the clean accuracy and $ASR$. Table 7 shows that removing either term degrades one of these two objectives. Without $L_{v}$, the $ASR$ of the robust backdoor object detector decreases by 5.74% (from 85.86% to 80.12%), while ${AP}_{b}$ increases by 1.7%. The loss function $L_{v}$ enables the detector to learn stronger perturbations, resulting in an improved $ASR$ for $R D$. Similarly, without $L$, the clean accuracy of the backdoor object detector decreases by 0.4%, while $ASR$ increases by 1.29%.

Table 7.

Clean accuracy (%) and $ASR$ (%) of robust backdoor training using different loss function $L$ and $L_{v}$ .

Method	${AP}_{b}$	${mAP}_{b}$	${AP}_{a + b}$	${mAP}_{a + b}$	$ASR$
YOLO $L_{v}, L$	74.1	54.9	22.6	52.2	85.86
YOLO w/o $L_{v}$	75.8	55.8	31.9	52.7	80.12
YOLO w/o $L$	73.7	54.7	20.1	51.9	87.15

6. Discussion

In this section, we discuss the potential reasons behind the effectiveness of our attack method. When subjected to physical factors present in the real world, the trigger can lose its connection with the backdoor-related neurons of the backdoor object detector, because the disturbance corrupts the trigger’s encoded feature that these neurons respond to. As a result, the original $B$-box retains a higher ${Score}_{B}$ and survives non-maximum suppression (NMS), so the object carrying the trigger is detected with label $y$ rather than $\hat{y}$. The adversary therefore trains the backdoor object detector to recognise triggers of various sizes with added physical noise. By focusing on features that are challenging for the backbone to extract, $R D$ becomes less susceptible to physical disturbances. Activation of the backdoor-related neurons influences the generation of the original $B$-box and lowers its ${Score}_{B}$, as shown in Table 8. The number of $B$-boxes close to $y$ is thus restricted, while those close to $\hat{y}$ are preserved by NMS, and objects with triggers affected by physical factors are identified as $\hat{y}$.
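The interaction between ${Score}_{B}$ and NMS described above can be illustrated with a minimal sketch of confidence filtering followed by greedy NMS. The thresholds (0.25 for confidence, 0.45 for IoU), the function names and the toy scores are assumptions chosen for illustration; they are not taken from the experiments.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy NMS: visit boxes by descending score, drop those overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


def detect(boxes, scores, conf_threshold=0.25, iou_threshold=0.45):
    """Return indices of boxes that pass the confidence filter and survive NMS."""
    candidates = [i for i, s in enumerate(scores) if s >= conf_threshold]
    kept = nms([boxes[i] for i in candidates],
               [scores[i] for i in candidates],
               iou_threshold)
    return [candidates[k] for k in kept]


# Two overlapping candidate boxes for the same person.
person_boxes = [(0, 0, 10, 10), (1, 1, 11, 11)]
# Clean input: high scores, so one box survives and the person is detected.
clean_result = detect(person_boxes, [0.90, 0.80])
# Triggered input: the backdoor pushes the scores down, so no box survives.
backdoor_result = detect(person_boxes, [0.05, 0.04])
```

In this toy setting the person is reported only when the candidate scores stay high, which mirrors the argument that lowering ${Score}_{B}$ is what makes the $B$-box disappear.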

Table 8.
The percentage of $B$ -box (%) detected as the person class by clean detector, backdoor object detector, and robust backdoor object detector with different datasets. Low represents ${Score}_{B} \in [0, 0.1]$ . Middle represents ${Score}_{B} \in (0.1, 0.5]$ . High represents ${Score}_{B} \in (0.5, 1.0]$ .

Object detector	Dataset	Clean			Backdoor			Backdoor with $Δ_{phy}$
	${Score}_{B}$	Low	Middle	High	Low	Middle	High	Low	Middle	High
YOLOv5s	$B$-box Quantity	99.55	0.31	0.14	99.43	0.46	0.11	99.49	0.41	0.10
Baseline		99.35	0.49	0.16	99.68	0.30	0.02	99.53	0.38	0.09
$R D$ (Ours)		99.36	0.48	0.16	99.68	0.29	0.03	99.64	0.32	0.04
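The binning used in Table 8 follows directly from the intervals given in its caption: Low is $[0, 0.1]$, Middle is $(0.1, 0.5]$ and High is $(0.5, 1.0]$. The short sketch below shows how a list of ${Score}_{B}$ values maps to the three percentages; the function names and the toy scores are hypothetical.

```python
def score_bin(score):
    """Map a confidence score to the Low / Middle / High bins of Table 8."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score <= 0.1:
        return "Low"
    if score <= 0.5:
        return "Middle"
    return "High"


def bin_percentages(scores):
    """Percentage of boxes falling into each bin."""
    counts = {"Low": 0, "Middle": 0, "High": 0}
    for s in scores:
        counts[score_bin(s)] += 1
    n = len(scores)
    return {name: (100.0 * c / n if n else 0.0) for name, c in counts.items()}


# Toy scores: most candidate boxes have near-zero confidence.
percentages = bin_percentages([0.01, 0.02, 0.05, 0.1, 0.3, 0.5, 0.51, 0.9])
```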

In summary, our RBA enhances the diversity of backdoor triggers, reinforcing the correlation between these triggers and the tainted labels within the backdoor object detector. This expansion broadens the threshold at which an object bearing a trigger is identified as a tainted label. Despite the introduction of physical factors to the trigger, objects with triggers will remain in proximity to the boundary but within the scope of detection by $R D$ .

7. Conclusion

This article introduces a robust backdoor attack on object detectors, addressing the limitations of existing backdoor attacks in object detection that lack resilience to physical factors. We propose a variable-size trigger capable of accommodating various sizes of targeted objects to simulate real-world scenarios where the viewing point varies in proximity to the object. Furthermore, to bolster the resilience of the backdoor object detector against physical factors, we introduce malicious adversarial training to acclimate the detector to a wide range of physical disturbances. Our experiments showcase the efficacy of our approach across digital, virtual and real-world settings, highlighting its ability to maintain robustness against physical noise and vertical object rotations. In the future, we will continue to study the robustness of backdoor attacks against physical factors, enhance the invisibility of backdoor attacks and reduce their computational complexity. The robust backdoor attack method we designed can also serve as a benchmark for physical world-based backdoor defense, thereby promoting research on backdoor defense in the physical world to mitigate the threat of RBA. In conclusion, our attack demonstrates that backdoor attacks are feasible in the physical domain.

Footnotes

Acknowledgements

This work was supported in part by the National Key R&D Program of China, under Grant 2023YFB2703800, in part by the National Natural Science Foundation of China under Grants 62476250 and 62472335, and in part by the Key Program of Zhejiang Provincial Natural Science Foundation of China under Grant LZ22F020007.

ORCID iD

Yaguan Qian

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Cao

Feng

, et al. Semi-supervised feature learning for disjoint hyperspectral imagery classification. Neurocomputing 2023; 526: 9–18.

Zhang

Ren

, et al. Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, 2016, pp.770–778. IEEE Computer Society.

Huang

Wang

Xiong

, et al. Temporal output discrepancy for loss estimation-based active learning. CoRR, abs/2212.10613, 2022.

Joseph

Khan

, et al. Towards open world object detection. In: IEEE conference on computer vision and pattern recognition, CVPR 2021, virtual, June 19-25, 2021, 2021, pp.5830–5840. Computer Vision Foundation/IEEE.

Yang

Liu

, et al. Real-time object detection for streaming perception. In: IEEE/CVF conference on computer vision and pattern recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, 2022, pp.5375–5385. IEEE.

Zhang

Wang

. Towards adversarially robust object detection. In: 2019 IEEE/CVF International conference on computer vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019, 2019, pp.421–430. IEEE.

Fan

Lai

Huang

, et al. Rethinking bisenet for real-time semantic segmentation. In: IEEE conference on computer vision and pattern recognition, CVPR 2021, virtual, June 19-25, 2021, 2021, pp.9716–9725. Computer Vision Foundation / IEEE.

Qiu

Anwar

Barnes

. Semantic segmentation for real point cloud scenes via bilateral augmentation and adaptive fusion. In: IEEE conference on computer vision and pattern recognition, CVPR 2021, virtual, June 19-25, 2021, 2021, pp.1757–1767. Computer Vision Foundation / IEEE.

Zhou

Wang

Liu

, et al. Differentiable multi-granularity human representation learning for instance-aware human semantic parsing. In: IEEE conference on computer vision and pattern recognition, CVPR 2021, virtual, June 19-25, 2021, 2021, pp.1622–1631. Computer Vision Foundation / IEEE.

10.

Russakovsky

Deng

, et al. Imagenet large scale visual recognition challenge. Int J Comput Vis 2015; 115: 211–252.

11. Metzen, Genewein, Fischer, et al. On detecting adversarial perturbations. In: 5th International conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, conference track proceedings, OpenReview.net.

12. Evans. Feature squeezing: detecting adversarial examples in deep neural networks. In: 25th Annual network and distributed system security symposium, NDSS 2018, San Diego, CA, USA, February 18–21, 2018, The Internet Society.

13. You, et al. Adversarial noise layer: regularize neural network by adding noise. In: 2019 IEEE international conference on image processing (ICIP 2019), Taipei, Taiwan, September 22–25, 2019, IEEE, pp.909–913.

14. Dolan-Gavitt, Garg. Badnets: identifying vulnerabilities in the machine learning model supply chain. CoRR, abs/1708.06733, 2017.

15. Wang, Zhai. Bppattack: stealthy and efficient trojan attacks against deep neural networks via image quantization and contrastive adversarial learning. In: IEEE/CVF conference on computer vision and pattern recognition, CVPR 2022, New Orleans, LA, USA, June 18–24, 2022, IEEE, pp.15054–15063.

16. Zhang, Chen, Huang, et al. Poison ink: robust and invisible backdoor attack. IEEE Trans Image Process 2022; 31: 5691–5705.

17. Ahmed, Lejbølle, Panda, et al. Camera on-boarding for person re-identification using hypothesis transfer learning. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR 2020), Seattle, WA, USA, June 13–19, 2020, Computer Vision Foundation / IEEE, pp.12141–12150.

18. Chu, Zheng, Zhang, et al. Detection in crowded scenes: one proposal, multiple predictions. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR 2020), Seattle, WA, USA, June 13–19, 2020, Computer Vision Foundation / IEEE, pp.12211–12220.

19. Chan, Dong, Zhu, et al. Baddet: backdoor attacks on object detection. In: Karlinsky L, Michaeli T and Nishino K (eds) Computer Vision – ECCV 2022 Workshops – Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part I, volume 13801 of Lecture Notes in Computer Science, Springer, pp.396–412.

20. Gao, et al. MACAB: model-agnostic clean-annotation backdoor to object detection with natural trigger in real-world. CoRR, abs/2209.02339, 2022b.

21. Wang, Sehwag, et al. Just rotate it: deploying backdoor attacks via rotation transformation. In: Demontis A, Chen X and Tramèr F (eds) Proceedings of the 15th ACM workshop on artificial intelligence and security (AISec 2022), Los Angeles, CA, USA, November 11, 2022, ACM, pp.91–102.

22. Gao, et al. Dangerous cloaking: natural trigger based backdoor attacks on object detectors in the physical world. CoRR, abs/2201.08619, 2022a.

23. Shrivastava, Gupta, Girshick. Training region-based object detectors with online hard example mining. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR 2016), Las Vegas, NV, USA, June 27–30, 2016, IEEE Computer Society, pp.761–769.

24. Wang. Adversarial neuron pruning purifies backdoored deep models. In: Ranzato M, Beygelzimer A, Dauphin YN, Liang P and Vaughan JW (eds) Advances in neural information processing systems 34: annual conference on neural information processing systems 2021, NeurIPS 2021, December 6–14, 2021, virtual, pp.16913–16925.

25. Liu, Aafer, et al. Trojaning attack on neural networks. In: 25th Annual network and distributed system security symposium (NDSS 2018), San Diego, CA, USA, February 18–21, 2018, The Internet Society.

26. Chen, Liu, et al. Targeted backdoor attacks on deep learning systems using data poisoning. CoRR, abs/1712.05526, 2017.

27. Moosavi-Dezfooli, Fawzi, Frossard. Deepfool: a simple and accurate method to fool deep neural networks. CoRR, abs/1511.04599, 2015.

28. Zhang, Ding, Tian, et al. Advdoor: adversarial backdoor attack of deep learning system. In: Cadar C and Zhang X (eds) ISSTA’21: 30th ACM SIGSOFT international symposium on software testing and analysis, Virtual Event, Denmark, July 11–17, 2021, ACM, pp.127–138.

29. Zhong, Liao, Squicciarini, et al. Backdoor embedding in convolutional neural network models via invisible perturbation. In: Roussev V, Thuraisingham B, Carminati B and Kantarcioglu M (eds) CODASPY’20: tenth ACM conference on data and application security and privacy, New Orleans, LA, USA, March 16–18, 2020, ACM, pp.97–108.

30. Barni, Kallas, Tondi. A new backdoor attack in CNNs by training set corruption without label poisoning. In: 2019 IEEE international conference on image processing, ICIP 2019, Taipei, Taiwan, September 22–25, 2019, IEEE, pp.101–105.

31. Liu, Bailey, et al. Reflection backdoor: a natural backdoor attack on deep neural networks. In: Vedaldi A, Bischof H, Brox T and Frahm J (eds) Computer Vision – ECCV 2020 – 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part X, volume 12355 of Lecture Notes in Computer Science, Springer, pp.182–199.

32. Ning, Xin, et al. Invisible poison: a blackbox clean label backdoor attack to deep neural networks. In: 40th IEEE conference on computer communications, INFOCOM 2021, Vancouver, BC, Canada, May 10–13, 2021, IEEE, pp.1–10.

33. Dumford, Scheirer. Backdooring convolutional neural networks via targeted weight perturbations. In: 2020 IEEE International joint conference on biometrics (IJCB 2020), Houston, TX, USA, September 28–October 1, 2020, IEEE, pp.1–9.

34. Zhang, Wang. Backdoor attacks against learning systems. In: 2017 IEEE conference on communications and network security (CNS 2017), Las Vegas, NV, USA, October 9–11, 2017, IEEE, pp.1–9.

35. Tang, Liu, et al. An embarrassingly simple approach for trojan attack in deep neural networks. In: Gupta R, Liu Y, Tang J and Prakash BA (eds) KDD’20: the 26th ACM SIGKDD conference on knowledge discovery and data mining, Virtual Event, CA, USA, August 23–27, 2020, ACM, pp.218–228.

36. Feng, Zhang, et al. FIBA: frequency-injection based backdoor attack in medical image analysis. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR 2022), New Orleans, LA, USA, June 18–24, 2022, IEEE, pp.20844–20853.

37. et al. Hidden backdoor attack against semantic segmentation models. CoRR, abs/2103.04038, 2021.

38. Mao, Qian, Huang, et al. Object-free backdoor attack and defense on semantic segmentation. Comput Sec 2023; 132: 103365.

39. Bagdasaryan, Shmatikov. Spinning language models: risks of propaganda-as-a-service and countermeasures. In: 43rd IEEE symposium on security and privacy, SP 2022, San Francisco, CA, USA, May 22–26, 2022, IEEE, pp.769–786.

40. Liu, Shen, Tao, et al. Piccolo: exposing complex backdoors in NLP transformer models. In: 43rd IEEE symposium on security and privacy, SP 2022, San Francisco, CA, USA, May 22–26, 2022, IEEE, pp.2025–2042.

41. Sibai, Fabry. Adversarial examples that fool detectors. CoRR, abs/1712.02494, 2017.

42. Thys, Ranst, Goedemé. Fooling automated surveillance cameras: adversarial patches to attack person detection. In: IEEE conference on computer vision and pattern recognition workshops, CVPR Workshops 2019, Long Beach, CA, USA, June 16–20, 2019, Computer Vision Foundation / IEEE, pp.49–55.

43. Sharif, Bhagavatula, Bauer, et al. Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. In: Weippl ER, Katzenbeisser S, Kruegel C, Myers AC and Halevi S (eds) Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, Vienna, Austria, October 24–28, 2016, ACM, pp.1528–1540.

44. Eykholt, Evtimov, Fernandes, et al. Robust physical-world attacks on deep learning visual classification. In: 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, 2018, Computer Vision Foundation / IEEE Computer Society, pp.1625–1634.

45. Athalye, Engstrom, Ilyas, et al. Synthesizing robust adversarial examples. In: Dy JG and Krause A (eds) Proceedings of the 35th international conference on machine learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10–15, 2018, volume 80 of Proceedings of Machine Learning Research, PMLR, pp.284–293.

46. Zhao, Zhu, Liang, et al. Seeing isn’t believing: towards more robust adversarial attack against real world object detectors. In: Cavallaro L, Kinder J, Wang X and Katz J (eds) Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, CCS 2019, London, UK, November 11–15, 2019, ACM, pp.1989–2004.

47. Zhang, Liu, et al. Adversarial t-shirt! Evading person detectors in a physical world. In: Vedaldi A, Bischof H, Brox T and Frahm J (eds) Computer Vision – ECCV 2020 – 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, volume 12350 of Lecture Notes in Computer Science, Springer, pp.665–681.

48. Suryanto, Kim, Kang, et al. DTA: physical camouflage attacks using differentiable transformation network. In: IEEE/CVF conference on computer vision and pattern recognition, CVPR 2022, New Orleans, LA, USA, June 18–24, 2022, IEEE, pp.15284–15293.

49. Simonyan, Zisserman. Very deep convolutional networks for large-scale image recognition. In: Bengio Y and LeCun Y (eds) 3rd International conference on learning representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings, 2015.

50. Howard, Zhu, Chen, et al. Mobilenets: efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017.

51. Huang, Liu, Weinberger. Densely connected convolutional networks. CoRR, abs/1608.06993, 2016.

		Clean			Backdoor			Backdoor with $Δ_{phy}$
Object detector	Dataset ${Score}_{B}$	Low	Middle	High	Low	Middle	High	Low	Middle	High
YOLOv5s	$B$-box Quantity	99.55	0.31	0.14	99.43	0.46	0.11	99.49	0.41	0.10
Baseline		99.35	0.49	0.16	99.68	0.30	0.02	99.53	0.38	0.09
$R D$ (Ours)		99.36	0.48	0.16	99.68	0.29	0.03	99.64	0.32	0.04