Abstract
Many feature pyramid models still rely on simple contextual feature aggregation, which does not make full use of the semantic information in multi-scale features. To address this, the Multi-scale Redistribution Feature Pyramid Network (MRFPN) is proposed. To strengthen feature fusion and address the two problems of feature redundancy and excessive abstraction, a modified-BiFPN is designed. The features output by the modified-BiFPN module are semantically balanced through a balanced feature map, which alleviates the semantic differences between scales. A new channel attention module is then proposed, which associates the multi-scale feature information fused into the balanced feature map. Finally, a new feature pyramid is formed through residual edges for prediction. MRFPN has been evaluated on the PASCAL VOC 2012 and MS COCO datasets, where it achieves higher detection accuracy than other state-of-the-art detectors.
Introduction
From simple multi-layer neural network operations [19,20] to the introduction of the convolution layer [42,53], and from the understanding of convolution characteristics to the design of deep convolutional models [46,50], deep convolutional neural networks have developed rapidly. As a result, the performance of object detection [3,8,39,40], instance segmentation [24,27,28,34] and semantic segmentation [11,23,38,55] has improved significantly. Whatever the computer vision task, the semantic information of features has a decisive impact on the prediction results. The feature pyramid model has gradually been adopted in deep neural networks. The earliest feature pyramid model can be traced back to 1984, when Adelson et al. [1] proposed it for image processing. With the popularity of deep learning, models based on feature pyramids have since been widely used in different computer vision tasks. Although low-level features are few in number and noisy, they contain rich location information and detailed descriptions. High-level features are more numerous and carry richer semantic information, but they lack object location information and perceive details poorly. High-level and low-level features are therefore complementary, and combining them appropriately benefits the final prediction. FPN [26], with its ingenious horizontal connections, was an early paradigm of multi-scale feature fusion.
In feature fusion, features are propagated along the top-down path after passing through the bottom-up path. Low-level features can be improved with the stronger semantic information of high-level features [41,47] (high-level features can also use low-level features to supplement their information). The up-sampling operation is usually used in multi-scale feature fusion, but it causes the loss of high-level feature information such as detail texture. Therefore, Mao et al. [31] proposed the pyramid frequency feature fusion network. After the deep features are extracted by the primary pyramid, features of different frequencies are formed through the low-frequency and high-frequency enhancement pyramids, which makes object localization more accurate. However, these models still have limitations: (1) it is difficult to improve detection performance with a simple horizontal connection; (2) it is difficult to interconnect the semantic information captured by different receptive fields. Although the horizontal connection achieves multi-scale fusion, it has shortcomings. On the one hand, for some complex data samples it is difficult to learn the required features by simple fusion, which leaves the object information insufficient and affects subsequent predictions. On the other hand, reducing the number of high-level feature channels during fusion also leads to feature loss because of the information gap between high-level and low-level features. ACFPN [4] argues that relying only on a bottom-up path that stacks receptive fields, and on element-wise addition to merge them, does not facilitate information transmission. Therefore, the multi-scale redistribution feature pyramid network (MRFPN) is proposed in this paper.

MRFPN overview.
Inspired by the receptive field, high-level features represent semantic information more strongly, while low-level features represent geometric detail more strongly. Therefore, to enhance the fusion of features at different scales captured by different receptive fields, MRFPN uses an improved EfficientDet [39] (a multi-version model) for feature extraction and preliminary feature processing. Higher versions add more modules, which reduces the performance of MRFPN, so MRFPN adopts the second version. In addition, this paper argues that there is feature redundancy in the feature processing of EfficientDet. Therefore, this paper proposes a new solution that reduces the parameters of preliminary feature processing and also facilitates subsequent feature processing. In the subsequent feature reprocessing stage, this paper introduces a balanced feature map, a channel allocation module designed on the basis of the attention mechanism, and residual edges. Their purpose is to further correlate the features at multiple scales and then combine them with the original semantic information to reconstruct a new feature pyramid for prediction. The reconstruction process of the feature pyramid is shown in Fig. 1. The redistribution feature pyramid in the dashed box in the figure is the main work of this paper and is introduced in Section 3.
Feature pyramid
The feature pyramid detects objects through different layers and level windows to generate semantically representative multi-scale features. The final prediction is obtained by associating image features from different spatial scales and resolutions. Many networks are similar to the feature pyramid model [10,13,15,29,35,36,52]. When the high-level and low-level features are predicted separately, the robustness of object detection is reduced. Therefore, on the basis of SSD [29], DSSD [10] with deconvolution fusion and DSOD [52] with dense connection fusion were proposed. Researchers found that for FPN, although a simple horizontal connection can easily correlate features of different levels, the description of feature details and the detection accuracy still need to be improved. The later Fast-RCNN [13], Faster-RCNN [35] and Mask-RCNN [15] are all improvements to FPN. RetinaNet [36] uses ResNet and FPN as the backbone network. In each classification layer, positive and negative samples are balanced by the Focal Loss function to improve the detection accuracy. For small object detection, the pyramid model with residuals [6] is more effective and powerful. It predicts through a residual-form RECODE module and a purification module. The single-connection feature pyramid model easily loses low-level features. To prevent this, bi-directional pyramid models were introduced [30,32,44]. Currently, the pre-training cost of the backbone is high [16,18,37,51], and some backbones are designed for the image classification task, which can lead to poor object detection performance. Therefore, Liu et al. proposed CBNet [30], which predicts features through a composite backbone network. In DetectoRS [32], the recursive feature pyramid feeds the top-down features back to the bottom-up features through additional connections. Switchable dilated convolutions are added to the feedback connection so that the model can adaptively select the receptive field. In GBFPN [44], the features of the highest and lowest layers of the backbone form the gated top-down module and the gated bottom-up module, respectively. The two modules correspond to each other at the same level to form a bi-directional feature pyramid for final prediction. To fuse contextual information more efficiently, this paper proposes MRFPN. MRFPN processes the extracted abstract feature information and combines it with residual edges for subsequent detection.
Attention mechanism
There are many attention modules [7,12,17,22,45] in deep learning. Their introduction enables the model to focus better on a certain part of the input. The attention mechanism strengthens the expression of highly informative features while suppressing the expression of less useful ones. It enables the model to focus adaptively on important regions in the context. SENet [17] attends to the relationship between channels and lets the model dynamically learn the importance of different channel features. On the basis of SENet, CBAM [45] also pools globally over the spatial dimensions. When objects of different sizes are detected, the receptive field should be adjusted according to the detection situation to determine the convolution size. SKNet [22] uses a dynamic selection mechanism to adaptively select the receptive field for objects of different sizes. The attention mechanism is also widely used in feature pyramid models. BlendMask [5] generates a corresponding attention map for each proposal box, so it can capture instance-level semantic information. Because the feature distribution and prediction scores are uneven among objects at different locations, Zhu et al. proposed SAPD [54], which assigns different weights to each positive anchor point to combine attention bias and feature selection. Because feature maps at different scales may cause classification difficulties in image detection, Li et al. proposed PAN [21], which introduces the attention mechanism into convolution. Unlike the attention modules above, this paper focuses on the relationship between the channels of the feature map. After the balanced feature map is introduced, the channel allocation module proposed in this paper gives each channel a different feature information focus. Finally, these weights are assigned to the original feature map to achieve better prediction.

The overall structure of EfficientDet.

The overall structure of MRFPN.
MRFPN overview
This paper proposes MRFPN (Multi-scale Redistribution Feature Pyramid Network) on the basis of EfficientDet. The overall framework of MRFPN is shown in Fig. 3, and EfficientDet is shown in Fig. 2. EfficientDet is mainly composed of the feature extraction network (EfficientNet) and the BiFPN module. BiFPN modules are added according to the model version, with up to 8 BiFPN modules connected in series.
MRFPN consists of three parts: (1) modified-BiFPN; (2) channel allocation module (CAM), including balanced feature map (
MRFPN divides feature processing into the preliminary feature processing stage (modified-BiFPN) and the feature reprocessing stage (CAM). To reduce the number of parameters and the interference of similar or identical feature information, MRFPN changes how node information is transmitted in the preliminary feature processing stage, which yields modified-BiFPN. To establish the channel relationship, this paper proposes the channel allocation module. To retain more feature details, and inspired by ResNet, this paper proposes a new residual connection method that generates the new feature pyramid for prediction.
Modified-BiFPN
In EfficientDet, multiple BiFPN modules form multiple bottom-up and top-down paths. BiFPN fuses features by up-sampling the upper-level features in the top-down path and fusing them with the lower-level features. The output nodes of one BiFPN are then passed to the input nodes of the next BiFPN.
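For reference, the EfficientDet paper fuses the inputs of each BiFPN node with a "fast normalized fusion", in which learnable non-negative weights are normalized by their sum. The sketch below illustrates that rule on plain Python lists. The function name, the flat-list feature layout and the example weights are illustrative assumptions, and the inputs are assumed to have been resized to a common shape already. It does not reproduce the modified-BiFPN connections.

```python
def fast_normalized_fusion(inputs, weights, eps=1e-4):
    """Fuse same-shaped feature vectors: sum(w_i * x_i) / (eps + sum(w_i))."""
    # ReLU keeps every learnable weight non-negative before normalization.
    w = [max(0.0, wi) for wi in weights]
    norm = eps + sum(w)
    length = len(inputs[0])
    return [sum(wi * x[k] for wi, x in zip(w, inputs)) / norm
            for k in range(length)]


# Two inputs with equal weights: the result is close to their mean.
fused = fast_normalized_fusion([[2.0, 0.0], [4.0, 8.0]], [1.0, 1.0])
```

With equal weights the node output is approximately the average of its inputs; a negative weight is clipped to zero, so that input is ignored.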

BiFPN module and modified-BiFPN module.
Figure 4 shows the flow of feature information in BiFPN and modified-BiFPN. The blue background bar indicates that the input node and the output node are of the same order.

Feature information redundancy.
According to Han [14] and Zhang [49], the information difference between layers is caused by convolution, so similar feature information is likely to be generated in different layers. Although rich redundant information tends to ensure a comprehensive understanding of the input data, it can also cause redundant information to accumulate excessively at nodes and affect the detection result. Therefore, this paper keeps the original feature extraction part and designs the D1 version, which has four improved BiFPN modules (modified-BiFPN). The design has two advantages: (1) it reduces unnecessary operations at the initial stage of feature processing, reducing the number of parameters while leaving enough room for subsequent processing; (2) each input node feeds its own information into both the input node of the same order and the input node of the next order, which reduces the secondary transmission of the same information to the same output node.
Modified-BiFPN is the preliminary feature processing stage, in which the feature maps of different levels are expected to establish a preliminary correlation. If the number of modified-BiFPN modules is increased, passing the resulting highly abstracted features into subsequent feature processing can destroy features and lead to spatial imbalance. The D1 design achieves the above two advantages well. MRFPN-D2 (5 modified-BiFPN modules) is compared with MRFPN-D1 in Section 4. The experimental results show that the performance of MRFPN-D1 is higher than that of MRFPN-D2.
The channel allocation module (CAM) proposed in this paper mainly consists of balanced feature map and channel allocation attention (CAA).
Balanced feature map
In the past, object detectors simply integrated multi-level features through horizontal connections. Now more and more bi-directional feature pyramid models realize feature fusion through bi-directional connections. The balanced feature map strengthens multi-scale features by means of the same balanced semantic features in the spatial dimension.
At present, the prediction methods of feature pyramid models can be roughly divided into direct prediction from the high-level feature map, direct prediction from the low-level feature map and direct prediction from the feature pyramid, as shown in Fig. 6(a), (b) and (c). If (a) or (b) is used directly for prediction, the information of small or large objects will be missing because of the limits on the receptive field and on semantic information acquisition. This reduces the prediction accuracy for smaller or larger objects as well as the average accuracy. If (c) is used for prediction, the amount of computation and the memory consumption increase. If the feature fusion is not appropriate, a large amount of object feature information is lost, and even the spatial balance of the model can be destroyed. Thus, this paper proposes an indirect prediction method, as shown in Fig. 6(d). It scales the multi-scale features down to the same feature size and then fuses them to associate multi-scale information. This fusion method achieves the same purpose as the previous methods without too many parameters, and the final effect is better.

Pyramid model prediction method.
The multi-level features after preliminary feature processing are marked as
Where

Balanced feature map operation.
Considering the difference of semantic information between high-level and low-level features, it is important to integrate them to express the final result. This fusion method inherits and balances the feature information from all levels. This inheritance means that the features of each level can obtain equivalent semantic information from other levels. This balance means that balanced feature map itself is to aggregate the global information, so as to achieve the balance of feature information and have better information discrimination ability.
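The rescale-then-fuse idea described above can be illustrated with a minimal sketch. It assumes that every level is resized to a common spatial size by nearest-neighbour sampling and that the levels are fused by a plain average; the paper's exact rescaling and fusion operators are not reproduced here, and the function names and single-channel layout are illustrative.

```python
def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of an H x W nested list to out_h x out_w."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[(i * in_h) // out_h][(j * in_w) // out_w]
             for j in range(out_w)]
            for i in range(out_h)]


def balanced_feature_map(levels, out_h, out_w):
    """Resize every level to one size, then average them element-wise."""
    resized = [resize_nearest(level, out_h, out_w) for level in levels]
    count = len(resized)
    return [[sum(r[i][j] for r in resized) / count for j in range(out_w)]
            for i in range(out_h)]


low_level = [[2.0] * 4 for _ in range(4)]   # 4 x 4 map, fine resolution
high_level = [[0.0, 4.0], [8.0, 12.0]]      # 2 x 2 map, coarse resolution
balanced = balanced_feature_map([low_level, high_level], 2, 2)
```

Because every level contributes equally to the average, each position of the fused map carries information inherited from all levels, which is the "inheritance" and "balance" property discussed above.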
For the object detection task, the extracted features include object information (useful information) and non-object information (useless information). Useful information clearly has a positive impact on the final prediction. SENet argues that the receptive field of a convolution kernel observes only local information, and that different parts of a region can be associated only after multiple convolution layers are stacked, as shown in Fig. 8. SENet distinguishes the importance of different channels of the feature map through squeeze and excitation operations.
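As a reminder of how squeeze and excitation work in SENet, the sketch below squeezes each channel with global average pooling, passes the channel descriptors through two small fully connected layers (ReLU, then sigmoid), and rescales the channels. The weights here are made-up illustrative values, not trained parameters, and the nested-list layout is an assumption made for readability.

```python
import math


def squeeze(feature_map):
    """Global average pooling: one scalar descriptor per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]


def dense(vector, weight, bias):
    """Fully connected layer: weight is a list of rows, one per output unit."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weight, bias)]


def se_block(feature_map, w1, b1, w2, b2):
    z = squeeze(feature_map)
    hidden = [max(0.0, t) for t in dense(z, w1, b1)]                  # ReLU
    scale = [1.0 / (1.0 + math.exp(-t)) for t in dense(hidden, w2, b2)]
    scaled = [[[scale[c] * v for v in row] for row in ch]
              for c, ch in enumerate(feature_map)]
    return scaled, scale


x = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]  # 2 channels, 2 x 2
out, scale = se_block(x, [[0.5, 0.5]], [0.0], [[0.0], [1.0]], [0.0, 0.0])
```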

SENet module.
The feature map F is convoluted to generate the feature graph
Inspired by SENet, this paper proposes a new channel allocation attention mechanism, as shown in Fig. 9. Its purpose is to distinguish the contributions of useful and useless information and to weaken the impact of useless information so as to improve the prediction results. The core of Channel Allocation Attention (CAA) is the dual global pooling and the sigmoid function. CAA not only makes the model more robust to spatial transformations of the input, but also makes the relationship between feature maps more intuitive. The sigmoid function converts the pooled information into a weight-like representation to form the channel attention, which is an important condition for the subsequent generation of a new feature pyramid.

CAA structure.
Taking the balanced feature map
Where GAP represents global average pooling, GMP represents global max pooling, and σ represents the sigmoid function. After the σ function, the values of each channel of the feature map with 1 × 1 spatial scale are between 0 and 1, which more intuitively represents the importance of the feature information.
MRFPN introduces global pooling so that a channel descriptor represents the information of each channel. Global max pooling pays more attention to a part of the object and to its edge information, and it ignores other areas. Global average pooling pays more attention to the whole object and aggregates all the feature information to obtain the final activation. The former has an advantage in classification, and the latter has a higher recognition ability. Therefore, CAA combines the two to make better use of global pooling.
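The dual global pooling followed by a sigmoid can be sketched as follows. This is only an illustration under a stated assumption: the GAP and GMP descriptors are combined here by addition, whereas the exact combination used by CAA is given by the paper's equation, which is not reproduced in this sketch. Function names and the nested-list layout are hypothetical.

```python
import math


def global_avg_pool(channel):
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def global_max_pool(channel):
    return max(v for row in channel for v in row)


def channel_allocation_weights(feature_map):
    """feature_map: list of channels, each an H x W nested list."""
    weights = []
    for channel in feature_map:
        # Assumption: the two channel descriptors are combined by addition.
        descriptor = global_avg_pool(channel) + global_max_pool(channel)
        weights.append(1.0 / (1.0 + math.exp(-descriptor)))  # sigmoid
    return weights


balanced = [
    [[0.0, 0.0], [0.0, 0.0]],   # channel with no response
    [[1.0, 3.0], [0.0, 0.0]],   # channel with one strong local response
]
weights = channel_allocation_weights(balanced)
```

No fully connected layer appears between the pooling and the sigmoid, so the per-channel weight depends only on that channel's own descriptors.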
This paper discards the fully connected layer, with its redundant parameters, in CAA, mainly based on the following four considerations: (1) both deep convolution features and fully connected layer features are beneficial for recognition tasks, but deep convolution features contain more local information [43]; (2) the fully connected layer has a large number of parameters, which increases the amount of training and adds excessive computational overhead, so discarding it in CAA makes the model train faster; (3) both the fully connected layer and the attention mechanism can focus on the local information of the input features, but CAA is a separate module whose purpose is to make the network focus on the connection between features and results; (4) convolution preserves the semantic and spatial dimensions of the object. After convolution in MRFPN, the object already has its own properties, including location information. Although the fully connected layer can also perform classification, it focuses more on object category attributes than on location information. If the fully connected layer were introduced in CAA, the localization ability would be weakened or even lost, which would harm the final prediction.
Deep learning models are becoming more and more complex, and training costs increase accordingly. If the gradient vanishes or explodes, the model will be destroyed. The residual edge can dynamically adjust the complexity of the model. Although the structure of MRFPN is relatively complex, its modules do not interfere with each other. The introduction of the residual edge in MRFPN can fine-tune the model. More importantly, the output of the CAM contributes to the final prediction. Therefore, the residual edge feeds the feature information of the original feature pyramid (the original features are the features after feature extraction, not the features after modified-BiFPN) to the output of the CAM to reconstruct a new feature pyramid. Compared with the new features after modified-BiFPN, the original features directly reflect the original properties that differentiate objects. Although the new features are correlated to a certain degree, the separability between objects is reduced, which is not conducive to the final prediction. The new feature pyramid model is shown in Fig. 10.
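A minimal sketch of the residual edge is given below. It assumes a ResNet-style element-wise addition between each original feature level and a CAM output that has already been brought to the same shape; the paper's exact reconstruction formula is not reproduced here, and the function names and single-channel layout are hypothetical.

```python
def add_maps(a, b):
    """Element-wise sum of two H x W nested lists of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def reconstruct_pyramid(original_levels, cam_levels):
    """Add each original level to the CAM output of the same level.

    Assumption: the residual edge is a plain addition and both inputs
    hold one map per pyramid level with matching shapes.
    """
    if len(original_levels) != len(cam_levels):
        raise ValueError("both pyramids need the same number of levels")
    return [add_maps(orig, cam)
            for orig, cam in zip(original_levels, cam_levels)]


original = [[[1.0, 2.0]], [[3.0]]]   # two levels of different sizes
cam_out = [[[0.5, 0.5]], [[1.0]]]
new_pyramid = reconstruct_pyramid(original, cam_out)
```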

Residual edge reconstruction feature pyramid.
The original image gets
After a series of processing,
Experiment setup
This paper conducts experiments on two widely used benchmarks, PASCAL VOC 2012 and MS COCO [25], to evaluate MRFPN. On the PASCAL VOC 2012 dataset, ablation experiments on the different modules of MRFPN, comparison experiments between MRFPN and EfficientDet, and comparison experiments between MRFPN and state-of-the-art detectors are conducted. Unlike the PASCAL VOC dataset, the MS COCO dataset requires experiments at different IoU thresholds (0.5, 0.75, 0.5:0.95). On the MS COCO dataset, validity experiments on modified-BiFPN, comparison experiments between MRFPN and EfficientDet, and comparison experiments between MRFPN and state-of-the-art detectors are conducted. The batch size is set to 16 and the number of iterations is 100. The detectors are trained with the SGD optimizer with a momentum of 0.9 and a weight decay of 4e-5.
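The IoU thresholds mentioned above refer to the intersection over union between a predicted box and a ground-truth box. The sketch below computes IoU for axis-aligned boxes and lists the ten thresholds covered by the 0.5:0.95 setting (step 0.05, as in the standard COCO evaluation). The box format `(x_min, y_min, x_max, y_max)` and the function name are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Thresholds averaged over in the 0.5:0.95 metric.
coco_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

A detection counts as a true positive at a given threshold only if its IoU with a matching ground-truth box is at least that threshold, so the 0.75 setting is stricter than 0.5.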
PASCAL VOC 2012
PASCAL VOC 2012 mainly serves three tasks: image classification, object detection and image segmentation. This dataset has four major categories, namely people, common animals, vehicles and indoor furniture. This paper uses PASCAL VOC 2012 trainval and PASCAL VOC 2007 trainval + test for training, and tests on the PASCAL VOC 2012 test set. The experimental results are shown in Fig. 11.
Figure 11 shows the relationship between Loss and mAP. The horizontal axis represents the final Loss value of each model after 100 iterations. The vertical axis represents the mAP value on PASCAL VOC 2012 at that Loss value. The models in the figure include MRFPN-D1 without CAA, GAP or GMP, MRFPN-D1 (the D1 version of MRFPN), MRFPN-D2 (the D2 version of MRFPN), EfficientDet-D1 (the D1 version of EfficientDet) and EfficientDet-D2 (the D2 version of EfficientDet). These models are shown in different colors, which are listed on the right side of Fig. 11. MRFPN-D1 has the highest detection accuracy among them.

The Loss-mAP of EfficientDet and MRFPN.
The experiments in this paper also explore whether the presence or absence of modified-BiFPN and CAM is consistent with the expected hypothesis. According to Fig. 11 and Table 1, the mAP of MRFPN without CAA is 80%, which is close to 2% lower than that of MRFPN-D1. This shows that it is very important for the model to distinguish useful information from useless information by using the attention mechanism to focus on the internal relationship of the features. The results also differ when only GAP or only GMP is used in CAA: the mAP values are 81.5% and 80.7%, respectively, both slightly lower than that of MRFPN-D1. Both kinds of global pooling aggregate information, but global average pooling aggregates global information by summing and averaging all the feature information, which makes the model more robust to spatial transformations. Global max pooling, in contrast, selects the maximum of all the information, which may discard some information. Table 1 shows that GAP-only MRFPN performs better than GMP-only MRFPN, and that MRFPN performs better than both. Table 1 therefore suggests that the information aggregated by global max pooling may be complementary to the information aggregated by global average pooling.
Ablation study of MRFPN on the PASCAL VOC 2012 dataset. Performance of different modules in MRFPN
Table 2 shows that EfficientDet-D2 is much more accurate than EfficientDet-D1, because more BiFPN modules enable better feature fusion. After the improvement, the mAP values of MRFPN-D1 and MRFPN-D2 are 81.8% and 80.2%, respectively. Compared with MRFPN-D1, MRFPN-D2 does not leave enough room for the subsequent feature refinement, because it further abstracts the features in the initial processing stage. For the highly abstracted features of higher versions, the original information of the features is destroyed after the feature reprocessing stage, which may lead to lower detection accuracy. This also verifies the hypothesis presented in Section 3. All MRFPN models mentioned below are the D1 version.
Comparison of EfficientDet and MRFPN on the PASCAL VOC 2012 dataset. The symbol “*” indicates our re-implemented results
Table 3 compares MRFPN with other state-of-the-art object detectors on the PASCAL VOC 2012 dataset. The experimental results show that the mAP of MRFPN is 81.8%, which is higher than that of other classical one-stage and two-stage models. Models such as Faster-RCNN, SSD and DSSD simply correlate the feature information of different scales. In contrast, MRFPN obtains rich semantic information through multi-scale fusion (modified-BiFPN) and efficient feature processing (CAM).
Comparison of MRFPN with state-of-the-art detectors on the PASCAL VOC 2012 dataset. The symbol “*” indicates our re-implemented results
The detection results of MRFPN and other state-of-the-art object detectors for each category on the PASCAL VOC 2012 dataset are shown in Table 4. The mAP of the proposed MRFPN is higher than that of the other detectors, and MRFPN also outperforms them in detection accuracy in most categories. Among all categories, the detection accuracy for the aero, bus, cat, dog, horse and train categories is more than 90%. MRFPN is 10%-20% higher than other detectors in some categories with specific backgrounds (such as airplane, cow and ship). MRFPN is also 15% higher than other detectors in some small-instance categories (such as bottle and potted plant). This illustrates that MRFPN is able to extract more feature information at different scales and to detect small instances and small categories more easily.
Comparison of state-of-the-art detectors on the PASCAL VOC 2012 dataset. The symbol “*” indicates our re-implemented results
The MS COCO dataset is one of the most widely used object detection datasets. It has a large number of images and 80 object categories, so it is well suited to assessing the effectiveness and generalization of a model. The MS COCO 2017 dataset includes 118k training images, 5k validation images (val images) and 41k test images. MRFPN is trained and tested on the MS COCO 2017 dataset.
Ablation study on MS COCO
In order to maintain the validity of the experiment, the CAM and the reconstructed feature pyramid in MRFPN are removed. Separate experiments are conducted on modified-BiFPN (MRFPN) and BiFPN (EfficientDet). The structures of both are shown in Fig. 4. The results are shown in Table 5.
The mAP of MRFPN (D1 and D2) is lower than that of EfficientDet (D1 and D2). This paper attributes this result to the fact that the same output node of each layer in EfficientDet receives both indirect and direct inputs from the same input node. Although this connection may produce redundant features, it helps the model understand the input data (including the classification and location of objects). Unlike EfficientDet (a reusable single-module structure), MRFPN takes into account the subsequent feature processing and the reduction of redundant features, and the modified-BiFPN connection mode is formed on this basis. Therefore, this paper argues that keeping the mAP close to that of EfficientDet with only modified-BiFPN retained and the subsequent processes removed is enough to illustrate its effectiveness. Table 5 clearly shows that the FPS of MRFPN (D1 and D2) is higher than that of EfficientDet (D1 and D2). Although EfficientDet has higher accuracy, its speed is lower, whereas modified-BiFPN achieves a better balance between accuracy and speed.
A new evaluation index designed by this paper is added in the experiment, which is called the initial stage processing time (
Experiment results on MS COCO
Effectiveness verification of BiFPN and modified-BiFPN on the MS COCO 2017 dataset. The symbol “*” indicates our re-implemented results
Table 6 compares the performance of EfficientDet (D1 and D2) and MRFPN (D1 and D2) on the MS COCO dataset. MRFPN (D1 and D2) far exceeds EfficientDet-D1. Although EfficientDet-D1 performs well, its ability to express features is still limited, and the channel allocation module of MRFPN can make up for this. The experimental results of MRFPN-D1 and MRFPN-D2 on both the MS COCO dataset and the PASCAL VOC dataset demonstrate the shortcomings of the higher version: more modified-BiFPN modules not only affect the detection performance but can also lead to model breakdown. Although the number of parameters of MRFPN increases by 3.2M compared with EfficientDet-D1, the accuracy increases by 2.2% in Table 6. When MRFPN is compared with other state-of-the-art object detectors in Table 7, its number of parameters is much smaller than theirs. In terms of the number of parameters, MRFPN inherits the lightweight design of EfficientDet.
Comparison of EfficientDet and MRFPN on the MS COCO 2017 dataset. The symbol “*” indicates our re-implemented results
Comparison with state-of-the-art detectors on MS COCO 2017 dataset. The symbol “*” indicates our re-implemented results
The performance of MRFPN and other state-of-the-art detectors on the MS COCO dataset is shown in Table 7. The detection accuracy of MRFPN is the highest, and its number of parameters is much smaller than that of the other detectors. This experiment further demonstrates the generalization capacity of the MRFPN model. Compared with EfficientDet-D1, MRFPN makes innovations in several aspects, such as node information transmission and subsequent feature processing. MRFPN also takes more factors into account, such as feature redundancy and the number of parameters. Its feature fusion and feature processing methods differ from those of other detectors, which gives it better detection performance.
Conclusion
In this paper, a new method called Multi-scale Redistribution Feature Pyramid Network (MRFPN) is proposed to reconstruct a new feature pyramid for detecting objects. MRFPN consists of several new modules. First, the extracted multi-scale features establish a preliminary connection in modified-BiFPN. Second, the information of the multi-scale features is aggregated by the balanced feature map. Then, CAA is used to distinguish the contributions of useful and useless information in the features. Finally, the information of features with different contributions is combined with the original feature pyramid through residual edges to reconstruct a new feature pyramid for prediction. In the experimental part, ablation experiments on the modules of MRFPN, comparison experiments with EfficientDet-D1/D2 and comparison experiments with other state-of-the-art object detectors are carried out on the PASCAL VOC dataset. MRFPN achieves higher detection accuracy on the PASCAL VOC dataset and better per-class detection accuracy than other detectors for some classes. Ablation experiments on modified-BiFPN and BiFPN, comparison experiments with EfficientDet-D1/D2 and comparison experiments with other state-of-the-art object detectors are carried out on the MS COCO dataset. The effectiveness of modified-BiFPN is verified by the ablation experiments on modified-BiFPN and BiFPN. The lightweight design of MRFPN is verified by its number of parameters. MRFPN also achieves high detection accuracy compared with other detectors on the MS COCO dataset. These two benchmark datasets verify that MRFPN is effective and generalizes well.
Acknowledgements
This work was supported by a planned project of the Shaanxi Provincial Department of Science and Technology (2019GY-036).
