Abstract
Traffic sign detection plays an important role in driver assistance and automated driving systems. This paper proposes DeployEase-YOLO, a real-time, high-precision detection scheme based on an adaptive scaling channel pruning strategy, to facilitate the deployment of detectors on edge devices. Specifically, because traffic signs are small and appear against complex backgrounds, a small target detection layer is first added to the basic YOLOv5 architecture to improve the detection accuracy for small traffic signs. Then, for scenes captured with a large field of view, a higher input resolution is used so that richer pixel information is preserved rather than lost by directly downscaling the image. Finally, the network is pruned and compressed using the adaptive scaling channel pruning strategy, and the pruned network then undergoes a secondary sparse pruning operation. The number of parameters and the amount of computation are greatly reduced without increasing the depth of the network structure or the influence of the input image size, thus compressing the model to the minimum within the compressible range. Experimental results show that the model trained by DeployEase-YOLO achieves higher accuracy and a smaller size on TT100k, a challenging traffic sign detection dataset. DeployEase-YOLO achieves an average accuracy of 93.3%, a 1.3% improvement over the state-of-the-art YOLOv7 network, while reducing the number of parameters and the amount of computation to 41.69% and 59.98% of the original, respectively, and the model volume to 53.22% of the original. These results indicate that DeployEase-YOLO has considerable potential for small traffic sign detection.
The algorithm outperforms existing methods in terms of accuracy and speed, and has the advantage of a compressed network structure that facilitates deployment of the model on resource-limited devices.
Introduction
In recent years, with the explosive growth of Internet and IoT data and the dramatic increase in computing power, deep learning-based computer vision has made breakthrough progress in target detection. In this context, automatic driving systems [1] and driverless technology [2] have become research hotspots in the automotive industry. The road transportation system, as the foundation of the national economy, is developing rapidly, but traffic accidents occur frequently. The main causes include fatigued driving, violations of traffic signs, and bad weather. Research on driverless systems is therefore particularly important. During the global COVID-19 outbreak in 2020, many hospitals began using driverless vehicles to distribute emergency supplies. Today, driverless taxis operate in many countries, and driverless vehicles are attracting public attention. Traffic sign detection is a key module of the automatic driving system, and both its detection accuracy and the difficulty of deploying it on in-vehicle devices are challenges in driverless system research. Thus, traffic sign recognition and model compression are important components of automated driving technology.
Deep learning is a branch of machine learning that employs convolutional neural networks to explore abstract representations of data. Target detection, a core problem of computer vision, has long been a research hotspot for both traditional image processing techniques and machine learning methods, and deep learning-based approaches now dominate the field. Depending on whether candidate regions are generated, these methods can be categorized into two-stage target detection networks [3] and one-stage target detection networks [4, 5]; both have been developed and applied to small traffic sign detection [6–8] with state-of-the-art performance. Various feature extraction methods, fusion strategies, and loss definitions have been integrated into the detection models. A typical two-stage detection algorithm is Faster-RCNN [9], while typical one-stage algorithms include the YOLO series [10, 11] and FCOS [12]. As a representative single-stage detector, the YOLO (You Only Look Once) network has been widely researched and applied in computer vision due to its high accuracy, fast detection speed, and flexible structure. However, as detection accuracy increases, the size of the model gradually increases, making it more difficult to deploy on edge devices. The model’s footprint on hardware resources must be reduced, without sacrificing accuracy, to make it suitable for edge devices. The balance between model size and detection accuracy therefore becomes critical.
For traffic sign detection, most studies use general-purpose target detection algorithms without considering that, in real life, the farther the vehicle is from a traffic sign, the smaller the sign appears, and that traffic signs usually appear against complex backgrounds. Distant signs matter because the earlier a traffic sign is detected during automatic driving, the safer the driving process will be. Ordinary target detection methods cope poorly with small traffic signs and with changes in target scale, which reduces their ability to detect small targets [13–15] and lowers the overall detection accuracy.
This paper proposes a traffic sign detection algorithm, DeployEase-YOLO, which optimizes YOLOv5 [38] to make it more suitable for small target detection. To improve the detection of small targets, the Concat feature fusion module is used to fuse different feature maps, changing the original three-scale prediction to four-scale prediction. To offset the increase in network depth caused by feature fusion, this paper proposes a pruning strategy based on adaptive scaling of the channels, which determines the importance of each channel from its average activation value, prunes the unimportant channels, and then performs secondary sparsification pruning on the pruned network model. Experiments on publicly available datasets show that the method achieves higher accuracy than existing algorithms while significantly reducing the model size. The rest of the paper is organized as follows: Section II introduces related work, Section III outlines the optimization method, and Section IV discusses the experimental results. Finally, the paper draws conclusions and discusses future research directions.
Related work
In this section, previous research work on small traffic sign detection and model compression is summarized and analyzed, and a new algorithmic model is introduced based on this work.
Small Traffic Sign Detection
The two-stage approach uses a region proposal module to generate candidate objects, which are then classified and refined by position regression. Cen et al. [16] proposed an improved algorithm based on Faster-RCNN. First, a small region proposal generator is used to extract features of small traffic signs. Because the stride of the generator is too large, the pool4 layer of VGG-16 [39] is removed and ResNet is extended; the improved Faster-RCNN structure is then combined with online hard example mining to improve the robustness of the system to small traffic sign regions. Although these methods can achieve accurate results, the processing is complex and slow due to the two-stage strategy.
Unlike the two-stage approach above, a single-stage detection algorithm does not require a region proposal stage and directly generates positions and categories, which gives faster detection. Many researchers have modified single-stage algorithms to improve traffic sign detection. Zhang et al. [17] proposed an improved YOLOv3 [40] target detection algorithm, YOLO-R [41], which increases the original three feature scales to four to reduce the missed detection rate and applies K-means clustering of the target boxes to obtain new candidate boxes, which improves the detection accuracy.
However, current traffic sign detection techniques still face challenges in detecting small targets. Small traffic signs usually account for only a small portion of a real road scene, which makes it difficult for the detector to extract relevant features. In traffic scene images of 2048 × 2048 pixels captured by a real on-board camera, the small traffic signs that can actually be detected are only about 36 × 36 pixels. Because of their low resolution and limited information, these small targets are easily missed or misdetected, which poses a great challenge to the small target detection task.
To address this, Li et al. [18] used an improved YOLOv5 target detection structure for traffic sign detection, which stacks residual blocks to increase the depth of the network and adds squeeze-and-excitation (SE) channel attention to enhance the learning of small traffic sign features. Jie et al. [19] proposed CDYOLO, a lightweight traffic sign detection algorithm based on an improved YOLOv4-Tiny [42], which uses the CBAM [43] attention mechanism and depthwise separable convolution to improve the backbone feature extraction and the detection head while preserving the accuracy of the lightweight network on the traffic sign detection task.
Model compression
In recent years, due to the continuous advancement of deep learning, most of the research on traffic sign detection models focuses on detection accuracy, and the target detection models have achieved remarkable accuracy and are widely used in various fields. However, hardware limitations can lead to network performance degradation, so lightweight network solutions embedded in mobile or micro devices are needed.
Nowadays, most studies use mainstream lightweight network models, such as MobileNetV3 [36] and ShuffleNet [37], which improve efficiency through different convolution methods and structures. Table 1 lists these and other existing lightweight small target detection methods. Li et al. [20] replaced the feature extraction module in YOLOv3 with the lightweight network MobileNetV3, which addresses the storage limitation of mobile devices, reducing the network size and increasing the speed. These approaches typically embed a lightweight network into the detection model but lack direct control over the final degree of compression, and thus do not effectively balance model size and detection accuracy. Model compression techniques [21, 22], on the other hand, do not directly change the network design but reduce the size of the model by other means: pruning removes unnecessary connections, quantization reduces the numerical precision of the weights, and distillation trains a smaller model to fit the output of a larger one. These methods can reduce the model size while keeping the performance loss small. The adaptive scaling channel pruning strategy proposed in this paper allows the weights or number of channels to be adjusted according to the input data and the actual needs of the network. This flexibility allows the model to adjust automatically under different input scenarios, which improves its adaptability and robustness.
Lightweight small target detection method
Although YOLOv5 demonstrates excellent target detection capabilities and is widely used as a foundation for existing algorithms, it requires customized adaptation for hardware deployment. To address this issue, this paper improves the YOLOv5 model and introduces the "DeployEase-YOLO" model, whose architecture is shown in Fig. 1. First, to improve the detection accuracy of small traffic signs, a small target detection layer is added to the head. This layer better captures detail information and enables predictions at multiple scales, thus improving detection accuracy. In addition, to retain a large amount of pixel information, a larger input size of 1280 × 1280 is used instead of the traditional 608 × 608 or 640 × 640. Lastly, an adaptive scaling channel pruning strategy is proposed to prune redundant parameters from the improved model. The core principle of this strategy is to adjust the pruning rate dynamically based on channel importance. First, the importance of each channel is measured by the average of its activation values; channels with lower importance are considered redundant and can be pruned. Then, based on the calculated importance, a threshold is determined that classifies the channels as important or unimportant. A pruning ratio sets the proportion of important channels to retain, the weights of the unimportant channels are set to zero, and those channels are pruned away. The pruned network is then pruned again using secondary sparsification and fine-tuned.
By compressing the model size and reducing the model complexity and the amount of computation, this procedure yields a model that combines high precision with easy deployment.

DeployEase-YOLO network architecture.
YOLOv5 is an efficient target detection algorithm, an improved and optimized version of the classic YOLO algorithm. Compared to previous versions, YOLOv5 uses mosaic data augmentation and automatic learning of bounding box anchors at the input stage. With mosaic data augmentation, YOLOv5 can simulate real scenes in which targets are occluded or partially occluded, improving the algorithm’s ability to detect and recognize occluded targets. Automatic anchor learning sets anchor boxes appropriate to each dataset by learning them from the training data. YOLOv5 is designed as a single-stage detector: it completes target detection in one forward pass, without an additional region proposal generation step, which greatly simplifies detection. The network is therefore suitable for traffic sign detection in real road environments. Its architecture is shown in Fig. 2. YOLOv5 has five network structures, namely YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. Among them, YOLOv5s offers a balance of speed and precision.

YOLOv5 network architecture.
Although YOLOv5 is a state-of-the-art deep learning target detection algorithm that already incorporates a large number of tricks, it is still prone to false and missed detections against complex backgrounds. One reason for its poor performance on small targets is that small target samples occupy few pixels while the downsampling factor of YOLOv5 is relatively large, so it is difficult to learn the feature information of small targets from the deeper feature maps. Therefore, a small target detection layer is added that detects small targets after the shallow feature maps are concatenated with the deep feature maps. Taking the 1280×1280 network input as an example, adding a small target detection layer has the following advantages.
(1) YOLOv5s has three feature map sizes: 20×20, 40×40 and 80×80. 80×80 feature maps are responsible for detecting small targets. However, the pixel size of the images in the TT100k dataset is 2048×2048, and most of the traffic signs are smaller than 50×50, so it is necessary to add a detector head dedicated to detecting small objects to reduce the computation of detecting large objects and thus speed up the detection of small objects.
(2) Image features are first extracted using CSPDarknet53 to obtain feature information at different layers, and the feature maps extracted by the backbone network are fused both from deep to shallow and from shallow to deep. The different feature maps are fused by the Concat feature fusion module, and four scales (20 × 20, 40 × 40, 80 × 80, 160 × 160) are finally output.
(3) By adding a small target detection layer, the original three-scale prediction is changed to four-scale prediction, so that YOLOv5 can better adapt to targets of different sizes, thus improving the accuracy of multi-target detection. At the same time, by training specifically for small targets, the detection ability of the whole model for small targets can be improved, thus improving the accuracy of the whole model, as shown in Fig. 1.
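The relationship between input size, downsampling stride, and prediction grid in the list above can be illustrated with a short sketch. The strides used here (8, 16, and 32 for the three original heads, and 4 for the added small target layer) are an assumption based on the usual YOLOv5 configuration and are not stated in this paper; the function name is hypothetical. Under that assumption, the grid sizes quoted above (20, 40, 80, 160) correspond to a 640 × 640 input, and a 1280 × 1280 input would double each of them.

```python
def feature_map_sizes(input_size, strides):
    """Return the side length of the square prediction grid for each stride."""
    return [input_size // s for s in strides]


# Assumed strides: 32, 16, 8 for the original heads, plus 4 for the added layer.
ORIGINAL_STRIDES = (32, 16, 8)
EXTENDED_STRIDES = (32, 16, 8, 4)

three_scale_640 = feature_map_sizes(640, ORIGINAL_STRIDES)   # [20, 40, 80]
four_scale_640 = feature_map_sizes(640, EXTENDED_STRIDES)    # [20, 40, 80, 160]
four_scale_1280 = feature_map_sizes(1280, EXTENDED_STRIDES)  # [40, 80, 160, 320]

print(three_scale_640, four_scale_640, four_scale_1280)
```

The finest grid is what matters for small signs: a smaller stride means each grid cell covers fewer input pixels, so a small sign spans more cells.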
As the depth of the network increases, many unnecessary parameters are added. To remove the parameters that contribute little to image feature extraction, a model compression method based on the adaptive scaling channel pruning strategy is proposed. It compresses the model by judging the importance of each channel and retaining a set proportion of the important channels, and then performs a second, sparsifying pruning pass on the pruned network model.
Adaptive scaling channel pruning
This paper proposes a new pruning algorithm to solve the problems of limited device computational power and the difficulty of deploying the model on resource-limited devices. This technique aims to compress and accelerate neural network models by reducing the number of parameters and overall computational complexity. Unlike direct pruning methods that rely on predetermined ratios or thresholds to prune parameters, the adaptive scaling channel pruning strategy selectively prunes channels by evaluating their importance. This adaptive approach allows the method to tailor itself to specific models and tasks, preserving essential channels and minimizing the impact on model performance. Consequently, it maximizes the compression of the final model, as illustrated in Fig. 3.

Adaptive scaling pruning process.
Once the initial training of the model is complete, the channels that can be pruned are identified by evaluating the importance of each channel to the model’s performance. Importance is measured by the average activation value of each channel, which is obtained from the activations produced by forward propagation on the training samples. The steps are as follows: first, run a batch of data through the model to obtain its outputs; then, for each channel, sum the activation values over all samples in the batch; finally, divide each channel’s sum by the batch size, i.e., the number of samples. Let $S_j$ denote the average activation value of the $j$-th channel, $N$ the number of samples, and $A_{ij}$ the activation value of the $i$-th sample on the $j$-th channel. The average activation value is calculated as follows:

$$S_j = \frac{1}{N} \sum_{i=1}^{N} A_{ij} \tag{1}$$
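After the importance of each channel has been evaluated, the weights of the unimportant channels are set to zero. Let $W_{ij}$ denote the $i$-th weight of the $j$-th channel, $W'_{ij}$ the same weight after masking, $M_j$ the binary mask of the $j$-th channel, and $\tau$ the importance threshold. If the channel importance is below the threshold, then $M_j = 0$; otherwise $M_j = 1$. The calculation is as follows:

$$M_j = \begin{cases} 0, & S_j < \tau \\ 1, & \text{otherwise} \end{cases} \tag{2}$$

$$W'_{ij} = W_{ij} \cdot M_j \tag{3}$$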
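The importance scoring and masking steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: activations are taken to be already reduced to one value per sample and channel, the threshold is derived from a retention ratio by ranking the channels, and all function names and the example numbers are hypothetical.

```python
def channel_importance(activations):
    """activations[i][j] is the activation of sample i on channel j.
    Returns the mean over samples for each channel."""
    n = len(activations)
    num_channels = len(activations[0])
    return [sum(row[j] for row in activations) / n for j in range(num_channels)]


def threshold_for_keep_ratio(scores, keep_ratio):
    """Assumed rule: pick the threshold so that about keep_ratio of the
    channels have a score at or above it."""
    ranked = sorted(scores, reverse=True)
    num_keep = max(1, round(keep_ratio * len(ranked)))
    return ranked[num_keep - 1]


def channel_mask(scores, threshold):
    """Binary mask: 0 if the score is below the threshold, 1 otherwise."""
    return [0 if s < threshold else 1 for s in scores]


def apply_mask(weights, mask):
    """weights[i][j] is the i-th weight of channel j; masked channels become zero."""
    return [[w * m for w, m in zip(row, mask)] for row in weights]


# Illustrative values only: 2 samples, 4 channels.
activations = [[1.0, 0.25, 0.5, 0.0],
               [0.5, 0.25, 0.5, 0.25]]
scores = channel_importance(activations)            # [0.75, 0.25, 0.5, 0.125]
tau = threshold_for_keep_ratio(scores, keep_ratio=0.5)
mask = channel_mask(scores, tau)                    # [1, 0, 1, 0]
masked_weights = apply_mask([[2.0, 3.0, 4.0, 5.0]], mask)
print(scores, tau, mask, masked_weights)
```

Zeroing the weights only marks the channels as removable; the actual size reduction comes from physically removing the masked channels when the pruned network is rebuilt.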
The adaptive scaling channel pruning operation described above already reduces the number of parameters and the computational complexity of the model to a large extent. To reduce the model density further, the pruned model is sparsified and then pruned once more, using gamma pruning based on the coefficients of the BN layers. Sparsification of gamma is achieved by adding an L1 regularization constraint on gamma to the loss function. Let $(x, y)$ denote the training inputs and targets, $W$ the trainable weights, $f(x, W)$ the network output, $l$ the loss of normal training, and $\Gamma$ the set of BN gamma coefficients. The first term is the normally trained loss function and the second term is the constraint, where $g(s) = |s|$ and $\lambda$ is the regularization coefficient. The loss function $L$ is formulated as follows:

$$L = \sum_{(x, y)} l\big(f(x, W), y\big) + \lambda \sum_{\gamma \in \Gamma} g(\gamma) \tag{4}$$
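As a minimal numerical sketch of this loss, the following code adds an L1 penalty on a list of gamma values to a given task loss and also returns the subgradient of the penalty, which is what pushes the gamma values toward zero during training. The function names and the example values are hypothetical, and the task loss is passed in as a plain number rather than computed from a network.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sparsity_penalty(gammas, lam):
    """L1 penalty: lam times the sum of |gamma| over all BN coefficients."""
    return lam * sum(abs(g) for g in gammas)


def total_loss(task_loss, gammas, lam):
    """Normal training loss plus the sparsity constraint."""
    return task_loss + sparsity_penalty(gammas, lam)


def penalty_subgradient(gammas, lam):
    """Subgradient of the L1 penalty with respect to each gamma."""
    return [lam * sign(g) for g in gammas]


# Illustrative values only.
gammas = [0.5, -0.25, 0.0]
loss = total_loss(task_loss=1.0, gammas=gammas, lam=0.5)
grads = penalty_subgradient(gammas, lam=0.5)
print(loss, grads)
```

Because the penalty gradient has constant magnitude regardless of how small a gamma already is, small coefficients keep shrinking until they reach zero, which is what makes them easy to prune afterwards.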
In practice, the L1 constraint only requires that, during backpropagation in training, the sign of each BN-layer weight multiplied by the regularization coefficient be added to the gradient of that weight. In YOLOv5, only the Bottleneck modules in the backbone have shortcuts, and the head has no shortcuts at all. During normal training, the histogram of gamma over the epochs is approximately normally distributed, with very few values around 0, so pruning is not possible. Therefore, the L1 regularization constraint of Eq. (4) must be added to sparsify the parameters, as shown in Fig. 4. This operation further reduces the model’s memory and computation and improves the inference speed. After training is completed, pruning can be performed according to the set pruning ratio. However, the sparsification operation may affect the accuracy of the model, because sparse matrix multiplication introduces a certain computational error, so the model needs to be repeatedly fine-tuned after sparsification. This minimizes, to some extent, the impact of the sparsification operation on the accuracy of the model.
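Pruning according to a set ratio after sparse training can be sketched as follows. This is an illustration under the assumption that channels are ranked by the magnitude of their gamma value and the smallest fraction is removed; the function name and the example values are hypothetical, and practical implementations also have to handle layers tied together by shortcuts.

```python
def select_channels(gammas, prune_ratio):
    """Return the indices of the channels kept after removing the
    prune_ratio fraction with the smallest |gamma|."""
    order = sorted(range(len(gammas)), key=lambda j: abs(gammas[j]))
    num_prune = int(len(gammas) * prune_ratio)
    pruned = set(order[:num_prune])
    return [j for j in range(len(gammas)) if j not in pruned]


# Illustrative values only: after sparse training, many gammas sit near zero.
gammas = [0.9, 0.01, -0.4, 0.002]
kept = select_channels(gammas, prune_ratio=0.5)
print(kept)
```

With values clustered near zero, a fixed ratio removes mostly channels whose output was already scaled to almost nothing, which is why the sparsification step precedes the pruning.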

Normal and post-sparsification training.
In this section, the algorithm is tested on the TT100k dataset, a challenging dataset in the traffic sign domain that contains many small traffic signs under varying weather and lighting conditions. The PyTorch deep learning framework is used to implement the small traffic sign detection algorithm proposed in this paper.

Some pictures from the TT100k.
The TT100k [2] dataset was released by Tsinghua University and Tencent and contains 9176 images with a resolution of 2048×2048, of which 6105 are training images and 3071 are test images. However, many of the categories in this dataset have small sample sizes, resulting in an uneven distribution of samples. To solve this problem, the dataset was repartitioned and only categories with more than 100 images were kept. First, the number of images in each category was counted; after some images were removed, the dataset contained 45 categories. The proportion of images was then adjusted to form a new dataset with 6105 images in the training set, 1023 images in the validation set, and 2043 images in the test set. Figure 5 shows some selected images from the TT100k dataset, which clearly has complex traffic backgrounds and very small targets.
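The category filtering step can be sketched as below. The data layout (a mapping from image identifier to the set of category names it contains), the rule of dropping images that retain no category, and the function names are assumptions made for illustration; the tiny example uses a threshold of 1 instead of 100 so that it stays readable.

```python
from collections import Counter


def frequent_categories(image_labels, min_images=100):
    """Return the categories that appear in more than min_images images."""
    counts = Counter(cat for cats in image_labels.values() for cat in set(cats))
    return {cat for cat, n in counts.items() if n > min_images}


def filter_annotations(image_labels, keep):
    """Drop categories outside keep; images left with no category are removed
    (an assumed rule, not stated in the paper)."""
    filtered = {}
    for image_id, cats in image_labels.items():
        kept = set(cats) & keep
        if kept:
            filtered[image_id] = kept
    return filtered


# Illustrative labels only, with a threshold of 1 image instead of 100.
labels = {"img_a": {"pl40", "w55"}, "img_b": {"pl40"}, "img_c": {"i2"}}
keep = frequent_categories(labels, min_images=1)
filtered = filter_annotations(labels, keep)
print(keep, filtered)
```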

Performance of our algorithm on TT100K.
To further augment the data, this paper uses data augmentation methods [23, 24] such as random scaling and cropping, random horizontal flipping, and random rotation. Because of the special characteristics of traffic signs, for example the existence of left signs and right signs, augmentation methods such as random horizontal flipping and random rotation are needed. Combining these data augmentation methods increases the diversity of the training data, improves the generalization ability of the model, and enhances its adaptability to different scenes and targets.
The experiments were run using the PyTorch 1.10.0 deep learning framework on an NVIDIA RTX 2080Ti GPU, and stochastic gradient descent (SGD) was used to train the network. The momentum was 0.949, the weight decay was 0.0005, and each model was trained for 120 epochs with an initial learning rate of 0.01. Each training iteration consisted of a forward propagation, which predicts the outcomes, and a back propagation, which updates the weight parameters.
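To make the role of these hyperparameters concrete, the sketch below performs one SGD update with momentum and weight decay on a flat list of weights. It assumes the update convention used by PyTorch's `torch.optim.SGD` with no dampening (weight decay is added to the gradient, the velocity accumulates the gradient, and the weight moves against the velocity); the function name is hypothetical and the learning-rate schedule is not modeled.

```python
def sgd_step(weights, grads, velocity, lr=0.01, momentum=0.949, weight_decay=0.0005):
    """One SGD update; returns the new weights and the new velocity."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w   # L2 weight decay folded into the gradient
        v = momentum * v + g       # momentum accumulation
        new_velocity.append(v)
        new_weights.append(w - lr * v)
    return new_weights, new_velocity


# Illustrative run with round numbers instead of the paper's settings.
w, v = [1.0], [0.0]
w, v = sgd_step(w, [0.5], v, lr=0.1, momentum=0.5, weight_decay=0.0)
w, v = sgd_step(w, [0.5], v, lr=0.1, momentum=0.5, weight_decay=0.0)
print(w, v)
```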
Traffic sign detection falls under the category of object detection, where precision and recall are usually used to evaluate the performance of a detection network. Precision is the proportion of correct detections among all detected objects, and recall is the proportion of targets in the test set that are correctly detected. Let $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives, and let $n$ denote the number of categories. The formulas are as follows:

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$AP_i = \int_0^1 P_i(R)\, dR$$

$$mAP = \frac{1}{n} \sum_{i=1}^{n} AP_i$$
Here, $AP_i$ measures how well the learned model performs on a single category, and mAP measures how well it performs across all categories. The amount of computation in a convolutional layer is usually measured in floating-point operations (FLOPs). The formula for FLOPs is given below:
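A form consistent with the symbols defined in the next paragraph, under the assumption that each multiply-accumulate is counted as one operation, is the following. Some conventions count the multiplication and the addition separately, which doubles the result.

```latex
\mathrm{FLOPs} = H \times W \times C_{out} \times \left( K^{2} \times C_{in} + 1 \right)
```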
Here, $H$, $W$, and $C_{out}$ denote the height, width, and number of channels of the output tensor, respectively; $K$ denotes the size of the convolution kernel; $C_{in}$ denotes the number of channels of the input tensor; and the constant 1 accounts for the bias term. The number of parameters in a convolutional layer is measured by the total parameter count (params), which is calculated using the following formula:
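A form consistent with the symbols above, assuming one bias per output channel, is $\mathrm{params} = C_{out} \times (K^{2} \times C_{in} + 1)$. The sketch below evaluates this together with the corresponding operation count for a single convolutional layer. The function names and the example layer shape are hypothetical, and the FLOPs convention (one multiply-accumulate counted as one operation) is an assumption.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Parameter count of a k x k convolution: one kernel per output channel
    plus an optional bias."""
    return c_out * (k * k * c_in + (1 if bias else 0))


def conv_flops(h, w, k, c_in, c_out, bias=True):
    """Operation count for an h x w output map, counting one
    multiply-accumulate as one operation (assumed convention)."""
    return h * w * conv_params(k, c_in, c_out, bias)


# Illustrative layer: 3 x 3 kernel, 64 -> 128 channels, 80 x 80 output map.
params = conv_params(k=3, c_in=64, c_out=128)
flops = conv_flops(h=80, w=80, k=3, c_in=64, c_out=128)
print(params, flops)
```

Both quantities scale with the product of input and output channels, which is why removing channels reduces parameters and computation at the same time.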
Results on TT100k
The principle of small target detection in YOLOv5 was analyzed above, and DeployEase-YOLO, a real-time high-precision detection method based on an adaptive scaling channel pruning strategy, was proposed for the dataset used. Fig. 6 shows the training process of the algorithm in terms of precision, recall, mAP@0.5, mAP@0.5:0.95, and F1. It can be seen that after 300 rounds of fine-tuning training, the performance of the algorithm in traffic sign detection gradually stabilizes. The experimental results show that the method performs well on the TT100k dataset.
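For reference, the scalar metrics tracked during training can be computed from detection counts as in the sketch below. The F1 score is the harmonic mean of precision and recall, and mAP is taken here as the plain mean of per-category AP values that are assumed to be already computed; the function names and the example counts are hypothetical.

```python
def precision(tp, fp):
    """Fraction of detections that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of ground-truth targets that are detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def mean_average_precision(ap_per_class):
    """Mean of the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Illustrative counts only.
p = precision(tp=3, fp=1)
r = recall(tp=3, fn=1)
f1 = f1_score(p, r)
m_ap = mean_average_precision([0.5, 1.0])
print(p, r, f1, m_ap)
```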
Comparison of experimental performance after each improvement
To compare the effects of the different improvements on the model’s detection performance, ablation experiments are performed in this section. Since the network improves the original model in three ways, the performance of the model after each improvement is investigated to verify its effect. The first improvement changes the input size of the image from 640 to 1280, the second adds a small target layer to the original model structure, and the third applies the adaptive scaling channel pruning strategy to the model, followed by secondary pruning, repeated iterations, and fine-tuning. Table 1 shows the detailed experimental arrangements and strategies, and a comparison of the four models is given in this section.
By default, YOLOv5 scales the input image to a specified size during training. Currently, most algorithms trained on the TT100k dataset scale the images to 640 × 640. However, the images in this dataset are large, at 2048 × 2048 pixels, most of the traffic signs to be detected are around 50×50 pixels, and the backgrounds are complex. Scaling the images to 640×640 for training therefore loses a great deal of information and image features. To improve the accuracy of small target detection and to enhance the generalization ability of the model, a higher-resolution 1280×1280 input image is used for training. In this way, the model can learn more contextual information and visual features, and the final trained model adapts better to new scenes and targets when applied to images of other resolutions.
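The information loss from downscaling can be quantified with simple arithmetic. The sketch below computes how large a sign of about 50 pixels in the 2048-pixel original becomes at each training resolution, assuming uniform scaling of the whole image; the function name is hypothetical.

```python
def scaled_object_size(object_px, original_size, input_size):
    """Side length of an object after the image is uniformly resized."""
    return object_px * input_size / original_size


size_at_640 = scaled_object_size(50, 2048, 640)    # 15.625 pixels
size_at_1280 = scaled_object_size(50, 2048, 1280)  # 31.25 pixels
print(size_at_640, size_at_1280)
```

At 640 × 640 a typical sign shrinks to under 16 pixels per side, while at 1280 × 1280 it keeps about 31 pixels, which is four times as many pixels in area.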
After the adaptive scaling channel pruning strategy has been applied, the model is pruned a second time through the sparsification operation. Since the sparsity rate (sr) directly affects the sparsity of the model, several values are first selected at random to sparsify the model, and sr is then adjusted according to the gamma distribution of the batch normalization (BN) layers during sparsification. As shown in Table 3, when sr > 0.005, the accuracy of the sparsified model decreases sharply, while when sr < 0.0001, the accuracy decreases gradually and only slightly. Based on the observed gamma distribution, 0.0001 is finally chosen as the sparsification parameter, and the model is fine-tuned after sparsification so that the gap between its accuracy after pruning and before pruning stays within a very small range.
Performance comparison of different networks(%)
As shown in Table 2, both the adaptive channel pruning strategy and the secondary pruning have a slight effect on recall and precision. After pruning, the recall and precision of the model decrease from 90.6% and 89.1% to 89.8% and 88.0%, respectively, a decrease of 0.8% and 1.1%. Some loss of precision is inevitable when network channels are pruned. However, through repeated iterations of the fine-tuning process, the accuracy degradation is successfully controlled to about 0.2%. It is worth noting that despite the slight decrease in accuracy, the model size is significantly compressed and the recognition speed of the model is also improved, as shown in Table 4.
Detailed arrangements and strategies for ablation experiments(%)
Effect of different sparsification parameters
In addition, it can be seen from Table 4 that the detection time of YOLOv5-Pruning is only 15.0 ms, about 1.33 times faster than YOLOv5-Layer. YOLOv5-Pruning uses fewer parameters and less computation, which reduces the consumption of computational resources and shortens the response time. The model is easier to deploy on edge devices, and YOLOv5-Pruning has better overall performance and can be used as the final model for unmanned traffic sign detection systems.
As shown in Table 5, a comprehensive comparison of the proposed method with other state-of-the-art methods was performed to verify its advantages in traffic sign detection. From Table 4, it can be seen that the method has better detection performance than the other methods, striking a favorable balance between accuracy and model size. DeployEase-YOLO has a higher mAP. Compared to the lightweight network YOLOv4-tiny, the model is significantly more accurate and also smaller. For automated driving systems, the smaller the model, the easier it is to deploy on resource-limited devices and the fewer device resources it consumes. Considering detection accuracy, detection speed, number of model parameters, and amount of computation, the model provides better performance in traffic sign detection. In addition, compared to the latest YOLOv7 network, DeployEase-YOLO not only significantly improves the accuracy of the model but also improves its compression rate, confirming the reliability and practicality of the algorithm.
Comparison of volume sizes of different models
Comparison with current advanced algorithms(%)
Figure 7 shows some of the image detection results of the trained model on the TT100k dataset. These images show the complex and varied background in the dataset, and due to the relative smallness of the traffic signs, zooming is performed to present the detection results more clearly. The results show that the method is able to accurately detect individual traffic signs, further validating its feasibility.

Some test images of our algorithm on TT100k.
To achieve lightweight deployment on resource-limited edge devices together with high detection accuracy, this paper improves the YOLOv5s traffic sign detection model used in real road environments and proposes a joint optimization strategy that combines a small target detection layer, adaptive scaling channel pruning, and secondary sparse pruning. First, a larger image size is used for training, and a new small target detection layer is added to the original network structure to improve small target detection accuracy. Second, an adaptive channel scaling algorithm is introduced to reduce the computational burden and significantly compress the model size. Finally, the model size is further reduced by secondary pruning, and the accuracy loss caused by pruning is compensated by fine-tuning. Experiments show that the model performs better than the original model: the number of parameters and the amount of computation are significantly reduced, and the algorithm also compares favorably with similar state-of-the-art methods.
In future work, we will focus on further optimizing the detection algorithm to minimize the model’s detection time, aiming to achieve both lightweight and real-time performance, effectively addressing the diverse demands of the automatic driving system in complex environments.
