An efficient fire detection network with enhanced multi-scale feature learning and interference immunity

Abstract

Effective fire detection identifies the source of a fire sooner and reduces the risk of loss of life and property. Existing methods still fail to efficiently improve a model's multi-scale feature learning capability, which is important for detecting fire targets of various sizes. In addition, these methods often overlook the accumulation of interference information in the network. This paper therefore presents an efficient fire detection network with boosted multi-scale feature learning and interference immunity capabilities (MFII-FD). Specifically, a novel EPC-CSP module is designed to enhance the backbone's multi-scale feature learning capability at low computational cost. In addition, a pre-fusion module is used to avoid the accumulation of interference information. We also construct a new fire dataset so that the trained model adapts to more fire situations. Experimental results demonstrate that our method achieves higher detection accuracy than all compared models while maintaining a high detection speed on video in the fire detection task.

Keywords

Object detection, fire detection, efficient, multi-scale feature learning, interference immunity

1 Introduction

Efficient fire detection is crucial for reducing the loss of human life and property caused by fire. In early research on fire detection, researchers [1–5] focused on employing traditional manual feature extraction when training neural networks for fire detection. Celik et al. [1] proposed a generic color model based on the YCbCr color space to separate and detect fire regions, which is more effective for fire detection than models based on the RGB color space. Phillips et al. [4] proposed a system that uses color and motion information to locate fire, first detecting fire pixels with a Gaussian-smoothed color histogram [6] and the temporal variation of pixels. Despite the positive performance of the above methods [1–5], they are susceptible to the subjective choices of researchers, so they cannot adapt to fire detection tasks in complex situations.

With the rapid development of deep learning, deep neural networks [7–9] have been shown to achieve better feature extraction performance than traditional machine learning. Ren et al. [10] introduced Faster R-CNN, a precise and generic two-stage approach for object detection. Many studies [11–15] have proposed fire detection algorithms based on this framework [10]. Barmpoutis et al. [11] employed Faster R-CNN to generate candidate fire regions, which are then modelled by linear dynamical systems and classified by a vector representation approach. Zhang et al. [14] presented the MS-FRCNN model, based on Faster R-CNN, for better detection of small-target forest fires. Despite their high accuracy, these fire detection studies [11–15] based on two-stage object detection approaches such as Faster R-CNN suffer from slow detection speed. Thus, they fail to satisfy the real-time demand of the fire detection task.

In recent years, the You Only Look Once (YOLO) series [16–23], a family of single-stage object detection frameworks, has been proposed. It sets itself apart from two-stage frameworks such as Faster R-CNN by offering real-time detection without sacrificing too much accuracy. The popularity of the YOLO series has prompted many researchers to explore YOLO frameworks for the fire detection task.

Many previous studies [24–33] rely on integrating attention modules into the YOLO framework to focus better on fire features. Chen et al. [25] added CA attention [34] and CoT attention [35] modules to the backbone of YOLOv5s [17] for better focus on forest fires. Lin et al. [26] adopted the Transformer module [36] for global feature extraction of forest fires and CA attention [34] for better fusion of fire features. Wu et al. [28] added the SE attention [37] module to the neck of YOLOv4-tiny [16] to improve the precision of ship fire detection. However, these YOLO-based methods [24–33] overlook the importance of the model's multi-scale feature learning capability for fire detection tasks.

In addition, some works [26, 38–40] integrate the ASFF [41] module into the YOLO framework as a post-fusion module to reduce interference information, so that it is not passed to the head of the detection network. However, they all overlook the potential accumulation of interference information during feature fusion.

In this paper, to tackle the aforementioned issues, we present an efficient fire detection network with enhanced multi-scale feature learning and interference immunity capabilities (MFII-FD). In detail, we first replace the original CSP module [42] in the backbone with the newly proposed EPC-CSP module, which makes the backbone adaptively robust to fire targets of different sizes while keeping the computational cost low. Then, an ASFF module is adopted as a pre-fusion module to help reduce the interference information extracted by the backbone. Moreover, to make our method suitable for more fire situations, we also construct a novel fire dataset. This dataset is labelled with three categories: ‘fire with only flame’, ‘fire with only smoke’ and ‘fire with both flame and smoke’, which allows the trained fire detection network to detect both fire and smoke while avoiding human-made noise during labeling.

In brief, this paper mainly makes three contributions:

The novel EPC-CSP module is designed to replace the original CSP module in the backbone, which improves the multi-scale feature learning capability of the backbone while keeping the computational cost low.

An ASFF module is leveraged as a pre-fusion module to avoid the accumulation of the interference information.

A new fire dataset labelled with three categories is built to make our model adapted to more fire scenarios while avoiding human-made noise in labeling.

The remainder of this paper is organized as follows. Section 2 gives the details of our constructed fire dataset as well as the methods used for data augmentation. Section 3 describes the proposed MFII-FD network for fire detection tasks. Section 4 reports the experimental results and ablation study. Finally, Section 5 concludes the paper.

2 Dataset and data augmentation

2.1 Fire dataset

In recent years, studies [43–45] have trained detectors on fire datasets labelled with only one category, ‘fire’ (as illustrated in Fig. 1(a)(d)). However, in many actual fire incidents, the surveillance camera may capture only smoke without any visible flames (as displayed in Fig. 1(b)). Consequently, such a fire detector may erroneously assume that everything is normal and fail to alert the fire department, missing the optimal time to rescue individuals in the fire.

In studies [46–48], the fire datasets were labelled with two categories: ‘fire’ and ‘smoke’. Although these datasets cover more fire scenarios, it can be difficult to distinguish the boundary between flame and smoke in certain fire scenarios (as depicted in Fig. 1(c)). This hinders the determination of bounding boxes and introduces hand-made noise into the labels, which in turn impairs the convergence of fire detection networks.

In summary, the fire datasets used in existing methods have two problems: a single, inadequate detection category (e.g. Fire) and potential hand-made noise. Thus, these methods are not applicable to various fire situations. To address these issues, this paper builds a fire dataset labelled with three categories: ‘fire with only flame’ (Fire), ‘fire with only smoke’ (Smoke) and ‘fire with both flame and smoke’ (FireSmoke). Labelled samples from the fire dataset are shown in Fig. 1(d)(e)(f).

As shown in Fig. 1(a)(b)(c), the fire dataset constructed in this paper contains about 5,500 high-definition images of various types of fire scenarios. The dataset’s main sources comprise publicly available datasets like MIVIA [49, 50], NIST [51], BoWFire [52] and FireDataSet [53], along with images crawled from Google [54], Baidu [55] and Bing [56].

Fig. 1

Illustration of the fire dataset.

After labeling the dataset, we divide it into training set, validation set and testing set in the ratio of 8:1:1, as shown in Fig. 2(a). In the dataset, the distribution of each category is balanced, as illustrated in Fig. 2(b).
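The 8:1:1 division can be sketched as follows. This is a minimal illustration and not the authors' script: the shuffling, the fixed seed, and the file names are assumptions made only for the example.

```python
import random

def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them into train/val/test parts by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)  # assumed: random split with a fixed seed
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Hypothetical file names standing in for the roughly 5,500 images.
image_names = [f"img_{i:05d}.jpg" for i in range(5500)]
train, val, test = split_dataset(image_names)
print(len(train), len(val), len(test))
```

With 5,500 items the three parts contain 4,400, 550 and 550 items. A stratified split per category could be used instead if the class balance in each part needs to be controlled.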

Fig. 2

Illustration of the fire dataset.

As indicated in Fig. 3, compared with other existing fire datasets, our three-category fire dataset provides unambiguous bounding boxes with corresponding categories in a wide range of fire scenarios. Consequently, a fire detection network trained on this dataset is adaptable to many fire scenarios and converges well.

Fig. 3

Comparison of different fire datasets.

2.2 Data augmentation

In this paper, the Mosaic [16] and Mixup [57] techniques are used for data augmentation. As demonstrated in Fig. 4(a), Mosaic data augmentation randomly crops four images and splices them into one training image. This technique improves the diversity of the training data and, because each composite image carries content from four source images, has an effect similar to a larger training batch size, which saves VRAM. As shown in Fig. 4(b), Mixup data augmentation randomly mixes and superimposes training images with other images, thereby extending the training dataset. Besides the typical random rotation, cropping, and hue augmentation, Mosaic and Mixup are each applied with a probability of 50%. The augmented training data is displayed in Fig. 4(c).
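The Mixup step can be illustrated with a small sketch. It is not the training pipeline used in the paper: images are represented as flat lists of pixel values, and the Beta-distributed mixing ratio with `alpha=8.0` is an assumption chosen only for illustration. In object detection, the bounding boxes of both source images are typically kept for the blended image.

```python
import random

def mixup(image_a, image_b, lam):
    """Blend two equally sized images (flat lists of pixel values) pixel by pixel."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    return [lam * a + (1.0 - lam) * b for a, b in zip(image_a, image_b)]

def maybe_mixup(image_a, image_b, rng, probability=0.5, alpha=8.0):
    """Apply Mixup with the given probability; otherwise return a copy of image_a."""
    if rng.random() >= probability:
        return list(image_a)
    lam = rng.betavariate(alpha, alpha)  # assumed Beta(alpha, alpha) mixing ratio
    return mixup(image_a, image_b, lam)

# Deterministic blend with a fixed ratio, then a randomized call.
blended = mixup([0.0, 100.0], [100.0, 0.0], lam=0.25)
augmented = maybe_mixup([0.0, 100.0], [100.0, 0.0], random.Random(0))
print(blended, augmented)
```

The `probability=0.5` default mirrors the 50% application rate stated above.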

Fig. 4

Illustration of data augmentation.

3 Experimental model

3.1 The proposed fire detection network, MFII-FD

As presented in Fig. 5, this paper introduces MFII-FD, a fire detection network with enhanced multi-scale feature learning and interference immunity capabilities. This network is based on the YOLOv5s network and has two enhancements: (i) We first design the novel EPC-CSP module with Elastic [58], Partial Convolution (PConv) [59] and Cascade Fusion Network (CFNet) [60]. Then, we replace all the CSP modules with our proposed EPC-CSP modules in the backbone. (ii) We adopt the ASFF module as a pre-fusion module and place it between the backbone and the neck of the detection network.

Fig. 5

The architecture diagram of MFII-FD.

In Subsections 3.2, 3.3 and 3.4, we describe the design process of the EPC-CSP module. The pre-fusion module is described in Subsection 3.5.

3.2 Using the elastic module to efficiently learn feature information at multiple scales

Each CSP module of YOLOv5s learns feature information at only a single scale, which hinders the network's ability to handle fire detection tasks that contain targets of varying sizes. Wang et al. [58] introduced the Elastic module, which adds extra branches to each residual block of the network, enabling the network to efficiently learn feature information at multiple scales from the images. In an Elastic branch, the input is first downsampled by 2× and then passed through operations such as convolution, normalization and activation. Finally, the result is upsampled by 2× and summed with the remaining branches to obtain the final output. A network with the Elastic module can flexibly adjust the weights of the original branches and the Elastic branches in response to different instances. For example, the original residual branches will be favored for large targets, while the Elastic branches will be favored for small targets.

The process of the Elastic module can be expressed as Equation (1):

$F (x) = σ (x + \sum_{i = 1}^{e} U_{ri} (τ_{i} (D_{ri} (x))) + \sum_{i = 1}^{c} τ_{i} (x))$ (1)

Here, e and c are the numbers of Elastic branches and original residual branches, respectively. U_ri (x) and D_ri (x) represent upsampling and downsampling, respectively. τ_i (x) denotes any combination of functions such as convolution, normalization and activation. σ is a non-linear activation function.
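Equation (1) can be traced on a toy one-dimensional signal. The sketch below is only an illustration under simplifying assumptions: e = c = 1, average pooling for downsampling, nearest-neighbour upsampling, ReLU for σ, and two hand-written functions standing in for the learned τ operators.

```python
def downsample2(x):
    """2x downsampling by averaging neighbouring pairs (length must be even)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]

def upsample2(x):
    """2x nearest-neighbour upsampling."""
    return [v for v in x for _ in range(2)]

def relu(x):
    """Element-wise ReLU, playing the role of sigma."""
    return [max(0.0, v) for v in x]

def elastic_block(x, tau_original, tau_elastic):
    """F(x) = sigma(x + U(tau_e(D(x))) + tau_o(x)) with one branch of each kind."""
    original = tau_original(x)
    elastic = upsample2(tau_elastic(downsample2(x)))
    return relu([a + b + c for a, b, c in zip(x, elastic, original)])

def scale_half(x):
    """Toy stand-in for the original residual branch."""
    return [0.5 * v for v in x]

def negate(x):
    """Toy stand-in for the Elastic branch operators."""
    return [-v for v in x]

out = elastic_block([1.0, 3.0, -2.0, -4.0], scale_half, negate)
print(out)  # [0.0, 2.5, 0.0, 0.0]
```

The Elastic branch processes the signal at half resolution, so its contribution is constant over each pair of neighbouring positions after upsampling.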

As shown in Fig. 6, in this paper, we first add an extra Elastic branch to the BottleNeck in the backbone’s CSP module, enabling the network to learn feature information with different scales within each CSP module. This helps improve the detection of fire targets of different sizes.

Fig. 6

Schematic of BottleNeck adding elastic.

3.3 Using PConv module to reduce network’s computational cost

Since this paper introduces an Elastic branch into the BottleNeck of the CSP module of YOLOv5s, the floating point operations (FLOPs) of the network would increase significantly if regular convolution were used in the Elastic branch, leading to a considerable decrease in network throughput. Therefore, we need a lightweight and efficient convolution operator.

In current research, depthwise convolution [61] (DWConv) is the most commonly used lightweight convolution operator. However, although it achieves lower FLOPs than regular convolution, numerous experiments [62–64] have demonstrated that DWConv often suffers from frequent memory access (MAC). As a result, DWConv provides only a limited improvement in computational efficiency over regular convolution.

Recently, Chen et al. [59] proposed a new convolution operator, PConv, to address the inefficiency of DWConv. Chen et al. [59] observed that deep neural networks often contain redundant and repetitive feature maps within their channels. Thus, they proposed PConv, which performs the convolution on only a part of the input channels (e.g. $\frac{1}{4}$ ) while leaving the remaining channels untouched. This substantially reduces FLOPs and also decreases MAC, so that a high rate of floating point operations per second (FLOPS) is maintained, leading to a significant improvement over DWConv.
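The FLOPs saving of PConv can be checked with simple arithmetic. The sketch below assumes stride 1, equal input and output channel counts, and a hypothetical 80 × 80 feature map with 128 channels and a 3 × 3 kernel; these sizes are illustrative and not taken from the paper. It counts multiply-accumulate operations only.

```python
def conv_flops(h, w, k, c_in, c_out):
    """Multiply-accumulate count of a regular k x k convolution (stride 1)."""
    return h * w * k * k * c_in * c_out

def dwconv_flops(h, w, k, c):
    """Depthwise convolution: one k x k filter per channel."""
    return h * w * k * k * c

def pconv_flops(h, w, k, c, ratio=4):
    """Partial convolution: a regular conv on c // ratio channels, the rest untouched."""
    c_p = c // ratio
    return conv_flops(h, w, k, c_p, c_p)

# Hypothetical feature-map size used only for illustration.
h, w, k, c = 80, 80, 3, 128
regular = conv_flops(h, w, k, c, c)
depthwise = dwconv_flops(h, w, k, c)
partial = pconv_flops(h, w, k, c)
print(regular // partial)  # 16
```

With one quarter of the channels convolved, the operation count falls to 1/16 of the regular convolution, because both the input and the output channel counts shrink by a factor of four. The count alone does not capture memory access, which is the reason given above for preferring PConv over DWConv.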

Meanwhile, Chen et al. [59] also proposed a FasterNet module based on PConv. Specifically, as is shown in Fig. 7, each FasterNet module has a PConv layer followed by two 1 × 1 convolutions. Two 1 × 1 convolutions will expand and recover the number of channels to fully and efficiently leverage the information from all channels.

Fig. 7

Architecture of PConv and FasterNet.

As depicted in Fig. 8, building on the modification described in the previous subsection, we further replace the convolution operators in the Elastic branch with the FasterNet module. This substitution allows the network to learn feature information at multiple scales while maintaining low computational expense.

Fig. 8

Schematic of BottleNeck adding FasterNet module.

3.4 Using cascade fusion network architecture for better learning and extraction of multi-scale feature information

In YOLOv5s, multi-scale feature fusion is mainly done by neck’s PANet [59]. However, in a recent study, Zhang et al. [60] argued that it may not be sufficient to fuse the multi-scale features only depending on the neck’s fusion module. Thus, they presented CFNet, an architecture designed to enhance the multi-scale feature learning and fusion capabilities for the backbone network.

The main process of CFNet is as follows. For the input $I \in R^{C \times H \times W}$, 2× downsampling and operators (e.g. convolution, normalization, activation, etc.) are performed twice to obtain the features $I^{'} \in R^{\frac{C}{2} \times \frac{H}{2} \times \frac{W}{2}}$ and $I^{″} \in R^{\frac{C}{4} \times \frac{H}{4} \times \frac{W}{4}}$ sequentially. Then, starting with $I^{″}$, 2× upsampling followed by summation with the previous-level feature is performed twice to obtain the final output.
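The down-down-up-up cascade described above can be sketched on a one-dimensional signal. This is a simplified reading and not the CFNet implementation: channel changes are ignored, the learned operators are replaced by an identity function by default, and average pooling and nearest-neighbour upsampling are assumed.

```python
def downsample2(x):
    """2x downsampling by averaging neighbouring pairs (length must be even)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]

def upsample2(x):
    """2x nearest-neighbour upsampling."""
    return [v for v in x for _ in range(2)]

def identity(x):
    """Placeholder for the convolution/normalization/activation operators."""
    return list(x)

def cascade_fuse(x, op=identity):
    """Downsample twice, then upsample twice, summing with the previous level each time."""
    level1 = op(downsample2(x))       # plays the role of I'
    level2 = op(downsample2(level1))  # plays the role of I''
    fused1 = [a + b for a, b in zip(upsample2(level2), level1)]
    return [a + b for a, b in zip(upsample2(fused1), x)]

signal = [1.0, 3.0, 5.0, 7.0, 2.0, 2.0, 4.0, 4.0]
fused = cascade_fuse(signal)
print(fused)  # [7.0, 9.0, 15.0, 17.0, 7.0, 7.0, 11.0, 11.0]
```

Each output position combines information from the full-resolution input, the half-resolution level and the quarter-resolution level, which corresponds to the three scales mentioned in the text.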

As shown in Fig. 9, building on the modifications in the previous two subsections, we adopt CFNet's architecture in the Elastic branch, forming the novel EPC-BottleNeck and, in turn, the EPC-CSP module. This enables each stage of the backbone network to learn and fuse up to three different scales of feature information, resulting in improved recognition of fire targets of varying sizes.

Fig. 9

Architecture of EPC-BottleNeck.

3.5 Using ASFF module for reducing interference

Liu et al. [41] proposed the ASFF module. In this module, three different scale output features of YOLO can be adaptively fused with each other. This helps each feature obtain meaningful information and reduce interference information.

Let Level-1, Level-2 and Level-3 denote the feature map outputs of YOLOv5s at three different scales. As the number of channels and the spatial size vary among different levels, a 1 × 1 convolution is applied to align the number of channels, followed by 2× or 4× upsampling or downsampling to align the spatial sizes. Let $x_{ij}^{n \to l}$ be the feature map of Level-n resized to the size of Level-l. The new Level-l feature map $y_{ij}^{l}$ produced by the ASFF module is then given by Equation (2):

$y_{ij}^{l} = α_{ij}^{l} \cdot x_{ij}^{1 \to l} + β_{ij}^{l} \cdot x_{ij}^{2 \to l} + γ_{ij}^{l} \cdot x_{ij}^{3 \to l}$ (2)

$α_{ij}^{l}$, $β_{ij}^{l}$ and $γ_{ij}^{l}$ are the weights of Level-1, Level-2 and Level-3 for Level-l, respectively. They are made to satisfy $α_{ij}^{l} + β_{ij}^{l} + γ_{ij}^{l} = 1$ with $α_{ij}^{l}, β_{ij}^{l}, γ_{ij}^{l} \in [0, 1]$ by the Softmax function in Equation (3):

$α_{ij}^{l} = \frac{e^{λ_{α_{ij}}^{l}}}{e^{λ_{α_{ij}}^{l}} + e^{λ_{β_{ij}}^{l}} + e^{λ_{γ_{ij}}^{l}}}$ (3)

$λ_{α_{ij}}^{l}$, $λ_{β_{ij}}^{l}$ and $λ_{γ_{ij}}^{l}$ are the weight scalar maps of $x_{ij}^{1 \to l}$, $x_{ij}^{2 \to l}$ and $x_{ij}^{3 \to l}$, respectively, obtained by convolution. Finally, the feature maps of all three levels are fused according to Equation (2) to output three new feature maps, one per level, each with the size of its corresponding level.
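At a single spatial position (i, j), Equations (2) and (3) reduce to a softmax over three scalars followed by a weighted sum. The sketch below illustrates this with plain Python; the function names and the scalar feature values are illustrative assumptions, and real feature maps would apply the same weights to every channel at that position.

```python
import math

def asff_weights(lam_alpha, lam_beta, lam_gamma):
    """Softmax over the three weight scalars at one position (Equation (3))."""
    m = max(lam_alpha, lam_beta, lam_gamma)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in (lam_alpha, lam_beta, lam_gamma)]
    total = sum(exps)
    return tuple(e / total for e in exps)

def asff_fuse(x1, x2, x3, lams):
    """Weighted sum of the three resized features at one position (Equation (2))."""
    alpha, beta, gamma = asff_weights(*lams)
    return alpha * x1 + beta * x2 + gamma * x3

weights = asff_weights(2.0, 0.0, -1.0)
fused_equal = asff_fuse(3.0, 6.0, 9.0, (0.5, 0.5, 0.5))
print(weights, fused_equal)
```

Equal scalars give each level a weight of one third, while a large scalar for one level lets that level dominate the fused output, which is how a level carrying interference information can be suppressed.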

In recent years, fire detection studies [26, 38–40] have chosen to place the ASFF module between the neck and the head of the YOLO network to reduce interference information. However, they overlook a potential issue: interference information may be extracted by the backbone and accumulate in the neck, where it becomes difficult to remove. Thus, we suggest placing the ASFF module between the backbone and the neck of the detection network to avoid the accumulation of interference information in the neck's fusion network. This helps maximize the fusion capability of the neck and improves the detection of fire targets. The experimental results in Table 1 support this suggestion. Therefore, as shown in Fig. 10, this paper places the ASFF module between the backbone and the neck of the detection network.

Table 1

Comparison of ASFF module’s placement

Method	Params(M)	FLOPs(G)	mAP@50(%)
YOLOv5s	7.069	16.494	92.16
YOLOv5s+ASFF (Neck-Head)	9.374	19.196	92.18
YOLOv5s+ASFF (Backbone-Neck)	9.374	19.196	94.30

Fig. 10

Schematic of adding ASFF to the detection network.

4 Experimental results and analysis

4.1 Implementation details

In this paper, the hardware used in all experiments is an Intel i7 11700 CPU, 32 GB of RAM, and an NVIDIA GeForce RTX3090 GPU with 24 GB of VRAM. The software environment is Ubuntu 22.04 (64-bit), PyTorch 1.13, Python 3.9.13 and CUDA 11.7.

The input size is 640 × 640, the batch size is 24, the optimiser is Stochastic Gradient Descent (SGD), the initial learning rate is 0.002, the momentum is 0.937, the weight decay is 0.0005, cosine annealing is used for the learning rate decay, and the models are trained for 300 epochs. Note that all models were trained and tested on the fire dataset built in this paper, using the same configuration.
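The stated hyperparameters and the cosine annealing schedule can be written down as a short sketch. The dictionary keys and the final learning rate `lr_final=0.0` are assumptions, since the paper does not state the minimum learning rate or the exact schedule variant; the formula below is the standard cosine annealing form.

```python
import math

# Hyperparameters as stated in the text; the key names are illustrative.
TRAIN_CONFIG = {
    "input_size": (640, 640),
    "batch_size": 24,
    "optimizer": "SGD",
    "lr_init": 0.002,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "epochs": 300,
}

def cosine_lr(epoch, total_epochs=300, lr_init=0.002, lr_final=0.0):
    """Standard cosine annealing from lr_init at epoch 0 to lr_final at total_epochs."""
    cosine = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return lr_final + 0.5 * (lr_init - lr_final) * cosine

for epoch in (0, 75, 150, 225, 300):
    print(epoch, round(cosine_lr(epoch), 6))
```

Under these assumptions the learning rate starts at 0.002, reaches half that value at epoch 150, and decays to the assumed final value at epoch 300.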

4.2 Evaluation metrics

In this paper, the effectiveness of the fire detection network is evaluated by the mean average precision (mAP), and the detection speed of the network is assessed by the frames per second (FPS).

The mAP value is calculated as shown in Equation (4), which is obtained by averaging the average precision (AP) values under all categories. The FPS, which evaluates the network’s detection speed, is the reciprocal of the inference latency (Latency) of the fire detection network, as shown in Equation (5).

$mAP = \frac{1}{N} \sum_{i = 1}^{N} A P_{i}$ (4)

$FPS = \frac{1}{Latency}$ (5)

$IoU = \frac{Pred \cap GT}{Pred \cup GT}$ (6)

Note that in this paper, the effectiveness of the fire detection network is measured by mAP@50, the mAP value at an intersection over union (IoU) threshold of 0.5. Here, the IoU measures the ratio of overlap between the predicted boxes (Pred) and the ground-truth boxes (GT), as shown in Equation (6).
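Equations (4) to (6) can be illustrated with a few lines of Python. The box format `(x1, y1, x2, y2)` and the example values are assumptions made for this sketch; computing the per-category AP itself from a precision-recall curve is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2) (Equation (6))."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mean_average_precision(ap_per_class):
    """Average of the per-category AP values (Equation (4))."""
    return sum(ap_per_class) / len(ap_per_class)

def fps_from_latency(latency_seconds):
    """Frames per second as the reciprocal of the inference latency (Equation (5))."""
    return 1.0 / latency_seconds

overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
is_match_at_50 = overlap >= 0.5  # a prediction counts as correct for mAP@50 if IoU >= 0.5
print(overlap, is_match_at_50)
```

In this example the two boxes share one unit of area out of seven, so the IoU is 1/7 and the prediction would not count as a match at the 0.5 threshold.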

4.3 Ablation study

4.3.1 Ablation study on modifications of EPC-CSP

Before investigating the effectiveness of the modules added to YOLOv5s in this paper, we first verify the validity of the proposed EPC-CSP module. We examined each modification to the CSP module in turn. When only Elastic is integrated into the CSP module, the model's mAP@50 increases by 0.74% compared with the model using the original CSP module. However, due to the inefficient regular convolution, the FPS drops notably, by 28. When PConv is used in the CSP module together with Elastic, the model achieves nearly the same mAP@50 as the Elastic-only model, but with a higher FPS. Surprisingly, when Elastic is combined with CFNet only, without PConv, both the mAP@50 and the FPS of the model decrease significantly compared with the original model. When Elastic, PConv and CFNet are all integrated into the CSP module, forming the proposed EPC-CSP module, the model achieves the best mAP@50 among all variants, with only a modest decrease in FPS (from 121 to 101) compared with the original model. This indicates that the EPC-CSP module can boost the network's multi-scale feature learning capability at low computational cost. The experimental results on the EPC-CSP module are displayed in Table 2.

Table 2
Results of ablation experiments on EPC-CSP

Elastic	PConv	CFNet	Params(M)	FLOPs(G)	mAP@50(%)	FPS
✕	✕	✕	7.069	16.494	92.16	121
✓	✕	✕	8.223	17.505	92.90	93
✓	✕	✓	8.223	17.758	86.97	90
✓	✓	✕	7.656	17.045	92.82	108
✓	✓	✓	7.656	17.183	93.58	101

4.3.2 Ablation study on modules of MFII-FD

In this ablation study, to further investigate the effectiveness of the modules made in this paper based on YOLOv5s, we will perform experiments on each of the proposed modules and their combinations.

As shown in Table 3, when the EPC-CSP module and the ASFF module are integrated into the detection model individually, the model's mAP@50 increases by 1.42% and 1.64%, respectively. When both modules are added to the model together, the model's mAP@50 increases by 3.41%, which is about 0.35% more than the sum of the two individual improvements (1.42% + 1.64% = 3.06%). This indicates that the boosted multi-scale feature learning capability provided by the EPC-CSP module and the interference immunity provided by the ASFF module each improve detection performance in fire detection tasks, and that combining the two modules brings an additional improvement.
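The gains quoted for this ablation follow directly from the mAP@50 values reported in Table 3, as the short check below shows. The dictionary keys are illustrative names; rounding to two decimals is used only to suppress floating-point noise.

```python
# mAP@50 (%) values as reported in Table 3.
MAP50 = {"baseline": 92.16, "epc_csp": 93.58, "asff": 93.8, "both": 95.57}

def gain(name):
    """Absolute mAP@50 gain over the YOLOv5s baseline."""
    return round(MAP50[name] - MAP50["baseline"], 2)

sum_of_individual = round(gain("epc_csp") + gain("asff"), 2)
extra = round(gain("both") - sum_of_individual, 2)
print(gain("epc_csp"), gain("asff"), gain("both"), sum_of_individual, extra)
```

The combined gain exceeds the sum of the individual gains, which is the additional improvement discussed above.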

Table 3
Results of ablation experiments

EPC-CSP	ASFF	Params(M)	FLOPs(G)	mAP@50(%)	FPS
✕	✕	7.069	16.494	92.16	121
✓	✕	7.656	17.183	93.58	101
✕	✓	9.374	19.196	93.8	97
✓	✓	9.961	19.886	95.57	86

4.4 Comparison experiments

In this subsection, we evaluate the performance of the proposed MFII-FD fire detection network against several commonly used networks: YOLOv5s (baseline), YOLOv5m, YOLOv6s, YOLOv7tiny, YOLOXs and FasterRCNN-ResNet50.

As shown in Table 4, the experimental results indicate that the proposed MFII-FD fire detection network outperforms all compared models in mAP@50. In detail, the proposed network shows a notable increase of 3.41%/3.29% in mAP@50 compared with the YOLOv5s baseline and the YOLOv7tiny network, respectively, while maintaining a relatively high FPS. Compared with YOLOXs, our proposed model achieves an improvement of 0.75% in mAP@50 while offering comparable FPS. Moreover, MFII-FD improves mAP@50 by 0.89%/1.27%/1.95% compared with YOLOv5m, YOLOv6s and FasterRCNN-ResNet50, respectively, while running 31/26/75 FPS faster. All three of these models require more than twice the FLOPs of the proposed network. These results show that our proposed MFII-FD fire detection network achieves high detection performance while keeping the computational cost low.
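The accuracy and cost trade-off against the three larger models can be recomputed from the values reported in Table 4. The sketch below is only a convenience check; the tuple layout and function name are illustrative.

```python
# (Params in M, FLOPs in G, mAP@50 in %, FPS) as reported in Table 4.
RESULTS = {
    "YOLOv5m": (21.064, 50.624, 94.68, 55),
    "FasterRCNN-ResNet50": (28.296, 57.203, 93.62, 11),
    "YOLOv6s": (18.5, 45.17, 94.30, 60),
    "MFII-FD": (9.961, 19.886, 95.57, 86),
}

def compare(other, ours="MFII-FD"):
    """Cost ratios (other / ours) and gains (ours - other) between two models."""
    p_other, f_other, m_other, s_other = RESULTS[other]
    p_ours, f_ours, m_ours, s_ours = RESULTS[ours]
    return {
        "params_ratio": round(p_other / p_ours, 2),
        "flops_ratio": round(f_other / f_ours, 2),
        "map_gain": round(m_ours - m_other, 2),
        "fps_gain": s_ours - s_other,
    }

for name in ("YOLOv5m", "YOLOv6s", "FasterRCNN-ResNet50"):
    print(name, compare(name))
```

All three FLOPs ratios exceed 2, while the parameter ratio for YOLOv6s is slightly below 2, so the cost comparison is clearest when expressed in FLOPs.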

Table 4
Comparison with commonly used networks

Method	Params(M)	FLOPs(G)	mAP@50(%)	FPS
YOLOv5s(baseline)	7.069	16.494	92.16	121
YOLOv5m	21.064	50.624	94.68	55
FasterRCNN-ResNet50	28.296	57.203	93.62	11
YOLOv6s	18.5	45.17	94.30	60
YOLOv7tiny	6.019	13.198	92.28	127
YOLOXs	8.938	26.761	94.82	95
MFII-FD(ours)	9.961	19.886	95.57	86

Furthermore, we believe that a simple numerical comparison does not prove the superiority of our network. Several visual comparative analyses of the detection results of our proposed MFII-FD fire detection network and the baseline YOLOv5s network are as follows.

As shown in Fig. 11, the baseline network YOLOv5s incorrectly detects the white willow flakes as smoke and detects the firefighter's yellow shoes and black trousers as a mixture of fire and smoke. This is because such interference information is wrongly extracted by the backbone and accumulates in the neck's fusion network, leading to incorrect detection. In contrast, the ASFF module integrated in our proposed MFII-FD detection network reduces the interference information in the backbone's output features by adaptively fusing them. In this way, the detection network avoids the accumulation of interference information in the neck and provides correct and precise detection. Consequently, the proposed network can accurately locate the flame against a complex background.

Fig. 11

Visual comparison of anti-interference capability.

As illustrated in Fig. 12, the baseline network YOLOv5s cannot fully extract the global information of the smoke target. Only half of the smoke target is detected. In comparison, our proposed MFII-FD network, which is integrated with EPC-CSP module, has enhanced multi-scale feature learning capability. Therefore, the proposed network can fully detect the whole smoke target.

Fig. 12

Visual comparison of global information localization capability.

In Fig. 13, the baseline network YOLOv5s misses the small smoke target on the right side of the image. This is because the multi-scale feature learning capability of the baseline network is not sufficient to detect fire targets of various sizes. In contrast, our proposed MFII-FD network, with its boosted multi-scale feature learning capability, can detect fire targets of various sizes effectively. Thus, the proposed network detects not only the normal-size smoke target but also the small smoke target.

Fig. 13

Visual comparison of missed detection.

5 Conclusion

In this paper, we proposed an efficient fire detection network with boosted multi-scale feature learning and interference immunity capabilities based on YOLOv5s: MFII-FD. In detail, we first proposed the novel EPC-CSP module in the backbone to boost network’s multi-scale feature learning capability with low computational cost. Then, an ASFF module was adopted as a pre-fusion module to reduce and avoid the accumulation of interference information. Notably, we also constructed a new fire dataset with three categories to cover more fire scenarios with less human-made noise in labeling. Extensive experiments demonstrated the effectiveness as well as efficiency of our proposed fire detection network.

Despite these improvements, we acknowledge that the MFII-FD fire detection network still has room for improvement in terms of FLOPs and FPS. This requires further research and fine-tuning. We will focus on further reducing the model's FLOPs and increasing its FPS while maintaining high detection accuracy. Furthermore, we will expand our fire dataset to include fire objects in more conditions. These improvements will increase the applicability and utility of our proposed model.

We hope that our proposed MFII-FD can be integrated into a practical surveillance system to provide timely fire detection and warnings, and thereby reduce the loss of human life and property.

Acknowledgment

This work was supported by the Shenzhen Science and Technology Innovation Commission (Grant No. KCXFZ20211020163402004).

References

1. Celik and Demirel, Fire detection in video sequences using a generic color model, Fire Safety Journal 44(2) (2009), 147–158.

2. Chen T.-H., Wu P.-H., Chiou Y.-C., An early fire-detection method based on image processing, in 2004 International Conference on Image Processing (ICIP '04), vol. 3, pp. 1707–1710, IEEE, 2004.

3. Kong S.G., Jin, Li and Kim, Fast fire flame detection in surveillance video using logistic regression and temporal smoothing, Fire Safety Journal 79 (2016), 37–43.

4. Phillips III, Shah and da Vitoria Lobo, Flame recognition in video, Pattern Recognition Letters 23(1-3) (2002), 319–327.

5. Töreyin B.U., Dedeoglu, Güdükbay and Cetin A.E., Computer vision based method for real-time fire and flame detection, Pattern Recognition Letters 27(1) (2006), 49–58.

6. Kjeldsen, Kender, Finding skin in color images, in Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, pp. 312–317, IEEE, 1996.

7. …, Zhang, Ren, Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.

8. Krizhevsky, Sutskever and Hinton G.E., Imagenet classification with deep convolutional neural networks, Communications of the ACM 60(6) (2017), 84–90.

9. Szegedy, Liu, Jia, Sermanet, Reed, Anguelov, Erhan, Vanhoucke, Rabinovich, Going deeper with convolutions, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.

10. Ren, He, Girshick and Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems 28 (2015).

11. Barmpoutis, Dimitropoulos, Kaza, Grammalidis, Fire detection from images using faster r-cnn and multidimensional texture analysis, in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8301–8305, IEEE, 2019.

12. Sucuoglu H.S., Böğrekci and Demircioğlu, Real time fire detection using faster r-cnn model, International Journal of 3D Printing Technologies and Digital Industry 3(3) (2019), 220–226.

13. … and Zhang, Using popular object detection methods for real time forest fire detection, in 2018 11th International Symposium on Computational Intelligence and Design (ISCID), vol. 1, pp. 280–284, IEEE, 2018.

14. Zhang, Wang, Ding and Bu, Ms-frcnn: A multi-scale faster rcnn model for small target forest fire detection, Forests 14(3) (2023), 616.

15. Zhang Q.-X., Lin G.-H., Zhang Y.-M., Xu and Wang J.-J., Wildland forest fire smoke detection based on faster r-cnn using synthetic smoke images, Procedia Engineering 211 (2018), 441–446.

16. Bochkovskiy, Wang C.-Y., Liao H.-Y.M., Yolov4: Optimal speed and accuracy of object detection, arXiv preprint arXiv:2004.10934, 2020.

17. Jocher, Stoken, Borovec, Changyu, Hogan, Diaconu, Poznanski, Yu, Rai, Ferriday, et al., ultralytics/yolov5: v3.0, Zenodo, 2020.

18. Redmon, Farhadi, Yolo9000: better, faster, stronger, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271, 2017.

19. Redmon, Farhadi, Yolov3: An incremental improvement, arXiv preprint arXiv:1804.02767, 2018.

20. Redmon, Divvala, Girshick, Farhadi, You only look once: Unified, real-time object detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, 2016.

21. …, Liu, Wang, Li, Sun, Yolox: Exceeding yolo series in 2021, arXiv preprint arXiv:2107.08430, 2021.

22. …, Li, Jiang, Weng, Geng, Li, Ke, Li, Cheng, Nie, et al., Yolov6: A single-stage object detection framework for industrial applications, arXiv preprint arXiv:2209.02976, 2022.

23. Wang C.-Y., Bochkovskiy, Liao H.-Y.M., Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464–7475, 2023.

24. Cao, Su, Geng and Wang, Yolo-sf: Yolo for fire segmentation detection, IEEE Access 11 (2023), 111079–111092.

25. Chen, Zhou, Li, Gao, Bai, Xu and Lin, Multi-scale forest fire recognition model based on improved yolov5s, Forests 14(2) (2023), 315.

26. Lin, Lin and Wang, A semi-supervised method for real-time forest fire detection algorithm based on adaptively spatial feature fusion, Forests 14(2) (2023), 361.

27. Wang, Hua, Ding and Wu, Real-time detection of flame and smoke using an improved yolov4 network, Signal, Image and Video Processing 16(4) (2022), 1109–1116.

28. …, Hu, Wang, Mei and Xian, Ship fire detection based on an improved yolo algorithm with a lightweight convolutional neural network model, Sensors 22(19) (2022), 7420.

29. Xia, Yu, Wang, Hong, A high-precision lightweight smoke detection model based on se attention mechanism, in 2022 2nd International Conference on Consumer Electronics and Computer Engineering (ICCECE), pp. 941–944, IEEE, 2022.

30.

, Xu

, Xing

and Liu

, Yolo-f: Yolo for flame detection, International Journal of Pattern Recognition and Artificial Intelligence 37(01) (2023), 2250043.

31.

Yin

, Chen

, Fan

, Jin

, Hassan

S.G.

and Liu

, Efficient smoke detection based on yolo v5s, Mathematics 10(19) (2022), 3493.

32.

Zhang

, Wang

, Chen

, Peng

, Gao

, Zhou

, An improved yolov3 algorithm combined with attention mechanism for flame and smoke detection, in Artificial Intelligence and Security: 7th International Conference, ICAIS 2021, Dublin, Ireland, July 19–23, 2021, Proceedings, Part I 7, pp. 226–238, Springer, 2021.

33.

Zhang

, Qian

, Jing

, Yang

, Yu

, Fire detection based on convolutional neural networks with channel attention, in 2020 Chinese Automation Congress (CAC), pp. 3080–3085, 2020.

34.

Hou

, Zhou

, Feng

, Coordinate attention for efficient mobile network design, in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 13713–13722, 2021.

35.

, Yao

, Pan

and Mei

, Contextual transformer networks for visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 45(2) (2022), 1489–1500.

36.

Vaswani

, Shazeer

, Parmar

, Uszkoreit

, Jones

, Gomez

A.N.

, Kaiser

Ł.

and Polosukhin

, Attention is all you need, Advances in neural information processing systems 30 (2017).

37.

, Shen

, Sun

, Squeeze-and-excitation networks, in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132–7141, 2018.

38.

Wang

, Han

, Yu

, Wang

, Li

and Yu

, Research on black smoke detection and class evaluation method for ships based on yolov5s-cmbi multi-feature fusion, Journal of Marine Science and Engineering 11(10) (2023), 1945.

39.

Xiao

, Wan

, Lei

, Xiong

, Xu

, Ye

, Liu

, Zhou

and Xu

, Fl-yolov7: A lightweight small object detection algorithm in forest fire detection, Forests 14(9) (2023), 1812.

40.

Yan

, Wang

, Zhao

, Zhang

,Yolov5-csf: an improved deep convolutional neural network for flame detection, Soft Computing, pp. 1–11, 2023.

41.

Liu

, Huang

, Wang

, Learning spatial fusion for single-shot object detection, arXiv preprint arXiv:1911.09516, 2019.

42.

Wang

C.-Y.

, Liao

H.-Y.M.

, Wu

Y.-H.

, Chen

P.-Y.

, Hsieh

J.-W.

, Yeh

I.-H.

, Cspnet: A new backbone that can enhance learning capability of cnn, in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp. 390–391, 2020.

43.

Abdusalomov

, Baratov

, Kutlimuratov

and Whangbo

T.K.

, An improvement of the fire detection and classification method using yolov3 for surveillance systems, Sensors 21(19) (2021), 6519.

44.

Park

and Ko

B.C.

, Two-step real-time night-time fire detection in an urban environment using static elastic-yolov3 and temporal fire-tube, Sensors 20(8) (2020), 2202.

45.

Xue

, Lin

and Wang

, A small target forest fire detection model based on yolov5 improvement, Forests 13(8) (2022), 1332.

46.

Jiao

, Zhang

, Xin

, Mu

, Yi

, Liu

, A deep learning based forest fire detection approach using uav and yolov3, in 2019 1st International conference on industrial artificial intelligence (IAI), pp. 1–5, IEEE, 2019.

47.

Kim

and Lee

, A video-based fire detection using deep learning models, Applied Sciences 9(14) (2019), 2862.

48.

and Zhao

, Image fire detection algorithms based on convolutional neural networks, Case Studies in Thermal Engineering 19 (2020), 100625.

49.

Di Lascio

, Greco

, Saggese

, Vento

, Improving fire detection reliability by a combination of videoanalytics, in Image Analysis and Recognition: 11th International Conference, ICIAR 2014, Vilamoura, Portugal, October 22–24, 2014, Proceedings, Part I 11, pp. 477–484, Springer, 2014.

50.

Foggia

, Saggese

and Vento

, Real-time fire detection for video-surveillance applications using a combination of experts based on color, shape, and motion, IEEE TRANSACTIONS on circuits and systems for video technology 25(9) (2015), 1545–1556.

51.

Bryant

R.A.

, Bundy

M.F.

, The nist 20 mw calorimetry measurement system for large-fire research, 2019.

52.

Chino

D.Y.

, Avalhais

L.P.

, Rodrigues

J.F.

, Traina

A.J.

, Bowfire: detection of fire in still images by integrating pixel color and texture analysis, in 2015 28th SIBGRAPI conference on graphics, patterns and images, pp. 95–102, IEEE, 2015.

53.

, FireDataset. http://www.yongxu.org/databases.html

54.

https://www.google.com/imghp?hl=zh-CN&ogbl

55.

https://image.baidu.com

56.

https://www.bing.com/images/feed?form=Z9LH

57.

Zhang

, Cisse

, Dauphin

Y.N.

, Lopez-Paz

, mixup: Beyond empirical risk minimization, arXiv preprint arXiv:1710.09412, 2017.

58.

Wang

, Kembhavi

, Farhadi

, Yuille

A.L.

, Rastegari

, Elastic: Improving cnns with dynamic scaling policies, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2258–2267, 2019.

59.

Chen

, Kao

S.-h.

, He

, Zhuo

, Wen

, Lee

C.-H.

, Chan

S.-H.G.

, Run, don’t walk: Chasing higher flops for faster neural networks, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12021–12031, 2023.

60.

Zhang

, Li

, Hu

, Cfnet: Cascade fusion network for dense prediction, arXiv preprint arXiv:2302.06052, 2023.

61.

Sifre

, Mallat

, Rigid-motion scattering for texture classification, arXiv preprint arXiv:1403.1687, 2014.

62.

, Zhang

, Zheng

H.-T.

, Sun

, Shufflenet v2: Practical guidelines for efficient cnn architecture design, in Proceedings of the European conference on computer vision (ECCV), pp. 116–131, 2018.

63.

Mehta

, Rastegari

, Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer, arXiv preprint arXiv:2110.02178, 2021.

64.

Sandler

, Howard

, Zhu

, Zhmoginov

, Chen

L.-C.

, Mobilenetv2: Inverted residuals and linear bottlenecks, in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4510–4520, 2018.

65.

Liu

, Qi

, Qin

, Shi

, Jia

, Path aggregation network for instance segmentation, in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8759–8768, 2018.


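Table 2
Results of ablation experiments on EPC-CSP

| Module: ELASTIC | Module: PConv | Module: CFNet | Params (M) | FLOPs (G) | mAP@50 (%) | FPS |
|---|---|---|---|---|---|---|
| ✕ | ✕ | ✕ | 7.069 | 16.494 | 92.16 | 121 |
| ✓ | ✕ | ✕ | 8.223 | 17.505 | 92.90 | 93 |
| ✓ | ✕ | ✓ | 8.223 | 17.758 | 86.97 | 90 |
| ✓ | ✓ | ✕ | 7.656 | 17.045 | 92.82 | 108 |
| ✓ | ✓ | ✓ | 7.656 | 17.183 | 93.58 | 101 |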

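Table 3
Results of ablation experiments

| Module: EPC-CSP | Module: ASFF | Params (M) | FLOPs (G) | mAP@50 (%) | FPS |
|---|---|---|---|---|---|
| ✕ | ✕ | 7.069 | 16.494 | 92.16 | 121 |
| ✓ | ✕ | 7.656 | 17.183 | 93.58 | 101 |
| ✕ | ✓ | 9.374 | 19.196 | 93.8 | 97 |
| ✓ | ✓ | 9.961 | 19.886 | 95.57 | 86 |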
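
The accuracy and speed trade-off in Table 3 can be made explicit with a little arithmetic. The following is a minimal sketch, not part of the original experiments: the numbers are copied from Table 3, while the dictionary layout, the configuration labels, and the function name `deltas_vs_baseline` are illustrative assumptions.

```python
# Values copied from Table 3. The keys and structure are illustrative only.
ABLATION = {
    "baseline":     {"params_m": 7.069, "flops_g": 16.494, "map50": 92.16, "fps": 121},
    "EPC-CSP":      {"params_m": 7.656, "flops_g": 17.183, "map50": 93.58, "fps": 101},
    "ASFF":         {"params_m": 9.374, "flops_g": 19.196, "map50": 93.8,  "fps": 97},
    "EPC-CSP+ASFF": {"params_m": 9.961, "flops_g": 19.886, "map50": 95.57, "fps": 86},
}


def deltas_vs_baseline(table, baseline="baseline"):
    """Return, for each configuration, its change relative to the baseline row."""
    base = table[baseline]
    result = {}
    for name, row in table.items():
        if name == baseline:
            continue
        result[name] = {
            # Absolute gain in mAP@50, in percentage points.
            "map50_gain": round(row["map50"] - base["map50"], 2),
            # Fraction of the baseline frame rate that is retained.
            "fps_ratio": round(row["fps"] / base["fps"], 3),
            # Relative growth in parameter count, in percent.
            "params_increase_pct": round(
                100 * (row["params_m"] - base["params_m"]) / base["params_m"], 1
            ),
        }
    return result


if __name__ == "__main__":
    for name, delta in deltas_vs_baseline(ABLATION).items():
        print(name, delta)
```

Read this way, the combined configuration gains 3.41 points of mAP@50 over the baseline, slightly more than the sum of the two individual gains (1.42 + 1.64 = 3.06), while retaining roughly 71% of the baseline frame rate.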