Abstract
Low-altitude safety is key to the sustainable development of the low-altitude economy. Drone swarms pose greater risks than individual unmanned aerial vehicles (UAVs) because of their scale and coordination. This paper proposes a situation awareness strategy for defending against drone swarms. Multi-scale drone target detection is achieved through an anchor-free structure, drone swarm formation recognition is realized with graph neural networks, and the swarm situation is computed from macroscopic quantitative descriptors. The main technical contributions are a feature extraction and fusion algorithm for multi-scale drones, a graph neural network with intra-layer and inter-layer feature extraction, and macroscopic quantitative descriptors based on divergence and curl that yield scale-invariant and rotation-invariant features. Together, these determine whether a drone swarm is present, identify which formation it is flying, and quantify the degree of its divergence, aggregation, and rotation, providing a basis for the classified and graded handling of drone swarms and supporting the modernization of low-altitude safety governance.
Introduction
As a representative of both strategic emerging industries and a new form of productive force, the low-altitude economy has evolved beyond mere flight activities or the manufacturing of aerial vehicles. It is increasingly a transformative force reshaping human modes of production and everyday life. Wherever human activity expands, security demands follow. The development of the low-altitude economy must therefore begin with low-altitude security, which is not only a critical component of the low-altitude economic industrial chain but also a fundamental prerequisite and safeguard for its healthy and sustainable development. 1 Compared to medium- and high-altitude airspace, the low-altitude domain is characterized by a large number of non-cooperative targets and an inherent demand for high-density, high-frequency, diversified, and intelligent flight operations. Increased flight density inevitably heightens the risks associated with low-altitude security. At present, drones are the most numerous aircraft operating in low-altitude airspace, fly the most frequently, and carry the highest associated safety risks. Drone swarms, comprising numerous low-cost unmanned aerial vehicles (UAVs), carry out complex missions through collaborative sensing, information sharing, and coordinated task allocation, exhibiting a high degree of intelligence and autonomy.2,3 Drone swarms function as cohesive collectives; their behavior cannot be reduced to a simple gathering of drones or a linear summation of individual drone actions. The security threats posed by drone swarms in low-altitude airspace far exceed those of individual drones. Failure to detect, identify, or respond to drone swarms in a timely and accurate manner may lead to significant societal disruptions or even security incidents. Traditional electro-optical systems, which are limited to functions such as imaging, storage, and playback, lack situational awareness capabilities.
To effectively detect and interpret drone swarm activity, current systems require human operators to analyze video feeds in real time, a process that is prone to fatigue-induced errors and omissions. Integrating intelligent components into video and image processing systems to enable machine-assisted rapid comprehension of drone swarm behavior is therefore essential. The key lies in overcoming core technical challenges in target detection, formation recognition, and situation assessment of drone swarms.
Compared with the extensive application of machine learning in areas such as face detection, 4 vehicle detection, 5 and behavior recognition, 6 research on situational awareness for drone swarms remains relatively underdeveloped. Situational awareness in this context comprises three progressive levels: target detection, formation recognition, and situation assessment, each increasing in complexity and technical difficulty. Traditional target detection methods rely on handcrafted features such as Haar-like features (HAAR), 7 Scale-Invariant Feature Transform (SIFT), 8 Histogram of Oriented Gradients (HOG), 9 and Speeded-Up Robust Features (SURF). 10 However, these methods often lack robustness when applied to multi-scale drone targets, resulting in limited detection speed and accuracy. In contrast, deep learning-based object detection approaches demonstrate advantages in both feature extraction and detection efficiency. These models can be broadly categorized into anchor-based networks11–13 and anchor-free networks,14–18 depending on whether predefined bounding boxes (anchors) are used. Anchor-free models detect objects based on keypoints or reference points distributed across the image. In the field of drone swarm formation recognition, two main strategies are commonly employed: template matching and neural network-based recognition. Template matching relies on internal orientation features of individual drones; however, its performance degrades significantly when drone formations become overly dispersed, as the similarity between observed patterns and stored templates diminishes. Alternatively, neural networks such as General Regression Neural Networks (GRNN) and Probabilistic Neural Networks (PNN) have been employed to recognize typical formations such as line-abreast, line-ahead, and V-shaped formations.19,20
For situation assessment, deep learning algorithms are capable of qualitatively characterizing swarm behavior at a macroscopic level. Nonetheless, their core models often require large amounts of training data, limiting their ability to quantitatively model swarm behavior at a microscopic scale. Handcrafted feature descriptors, on the other hand, can offer fine-grained, quantitative representations of drone swarm behavior, but they are typically task-specific and suffer from poor robustness, scalability, and transferability. Moreover, existing studies mostly focus on single-UAV detection or on formation recognition alone, and lack an integrated approach covering the full situational awareness pipeline. To address these limitations, this study proposes key technologies for situational awareness aimed at drone swarm prevention and control, organized as a framework that covers all three layers of drone swarm situation awareness: target detection, formation recognition, and behavior analysis. The proposed system not only detects whether a drone swarm is present but also identifies the specific type of formation and assesses the degree of swarm cohesion. By achieving multi-level situational awareness, the framework aims to provide a comprehensive technical solution for drone swarm security monitoring and a foundation for law enforcement agencies to implement classification-based and tiered response strategies for drone swarm incidents.
Situational awareness strategy
To address the inconsistency of feature information across multiple scales in drone targets, this study proposes an adaptive multi-scale feature extraction network. By computing the weights between features of different scales in the spatial dimension, the model fuses multi-scale information to enhance its ability to detect drones of varying sizes. To mitigate the risk of losing critical features from micro-sized drones, a channel-local attention mechanism is introduced. A channel-local attention feature fusion module is designed to effectively integrate shallow and deep features, improving the utilization of feature information from micro drones. This prevents feature loss and enhances the model’s ability to represent multi-scale drone targets, particularly micro drones, thereby enabling rapid detection of drone swarms based on an anchor-free architecture.
Traditional swarm formation recognition methods based on pattern matching rely heavily on the orientation information of individual drones and are highly sensitive to formation stability. Moreover, conventional graph convolutional networks (GCNs) struggle to adapt to deep hierarchical structures. To address this, we design an intra-layer feature extraction module centered around a graph convolutional neural network with graph pooling layers. This module compresses feature dimensions, thereby reducing time and space complexity and mitigating overfitting. Furthermore, it allows for the adaptive generation of neighborhood connections among nodes in the graph data, avoiding misallocations caused by heterogeneous graph structures. The proposed method eliminates the need for precise positional coordinates or directional angles of drones within the swarm, thus enhancing both accuracy and robustness.
Divergence and curl are classical concepts from vector calculus that are widely used in electromagnetic physics,21,22 characterizing the degree of divergence or convergence and the degree of rotation at a given point in a behavioral vector field. Divergence captures changes in radial motion, while curl reflects changes in tangential motion. These metrics are employed to quantitatively characterize the microscopic motion behavior of drone swarms. The cumulative curl along tangential trajectories represents the macroscopic degree of rotational movement in the swarm, whereas the cumulative divergence along radial trajectories (orthogonal to tangential motion) reflects the degree of divergence or aggregation. Both microscopic and macroscopic features are essential for describing the behavior of drone swarms. Through line integral techniques, micro-level and macro-level features are connected to enable a comprehensive and quantitative situational assessment of drone swarm behavior.
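As a concrete illustration of the two point-wise quantities, the following minimal sketch estimates divergence and curl of a sampled 2-D vector field with central differences. The function names, the grid layout, and the two test fields are illustrative assumptions and are not taken from the paper.

```python
def divergence_and_curl(u, v, h=1.0):
    """Central-difference divergence and curl (z-component) at interior grid points.

    u[i][j] and v[i][j] are the x- and y-components of the field at row i
    (y index) and column j (x index); h is the grid spacing.
    Returns two (rows - 2) x (cols - 2) nested lists.
    """
    rows, cols = len(u), len(u[0])
    div, curl = [], []
    for i in range(1, rows - 1):
        div_row, curl_row = [], []
        for j in range(1, cols - 1):
            du_dx = (u[i][j + 1] - u[i][j - 1]) / (2 * h)
            dv_dy = (v[i + 1][j] - v[i - 1][j]) / (2 * h)
            dv_dx = (v[i][j + 1] - v[i][j - 1]) / (2 * h)
            du_dy = (u[i + 1][j] - u[i - 1][j]) / (2 * h)
            div_row.append(du_dx + dv_dy)   # radial spreading or gathering
            curl_row.append(dv_dx - du_dy)  # local rotation
        div.append(div_row)
        curl.append(curl_row)
    return div, curl


def sample_field(fn, n=5):
    """Sample fn(x, y) -> (u, v) on an n x n integer grid centred on the origin."""
    coords = [k - n // 2 for k in range(n)]
    u = [[fn(x, y)[0] for x in coords] for y in coords]
    v = [[fn(x, y)[1] for x in coords] for y in coords]
    return u, v


# Hypothetical test fields: pure dispersal and pure rotation.
radial_u, radial_v = sample_field(lambda x, y: (x, y))
vortex_u, vortex_v = sample_field(lambda x, y: (-y, x))
```

For the dispersing field `(x, y)` the sketch yields a divergence of 2 and a curl of 0 at every interior point, and for the rotating field `(-y, x)` it yields a divergence of 0 and a curl of 2, which matches the radial and tangential interpretation given above.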
The magnitude of divergence and curl values reflects the risk coefficient of drone swarm behavior: larger values indicate greater abnormality, higher likelihood of dangerous events, and potentially more severe consequences. These quantitative indicators can thus provide a scientific basis for public security agencies to conduct classification-based and hierarchical response strategies when managing drone swarm incidents.
Multi-scale UAV target detection
UAVs can be categorized into micro, light, small, medium, and large classes based on their weight and speed. Given the inherently multi-scale nature of drone swarm targets, this study addresses the limitations of conventional anchor-based convolutional neural network (CNN) object detection models. Traditional anchor boxes and object matching methods are often insufficient to cover the wide range of UAV scales, and increasing the number of anchors introduces additional computational and memory burdens.
To overcome the problems of scale variation and feature imbalance in conventional feature pyramid networks, we propose a multi-scale drone swarm detection algorithm based on an anchor-free architecture. We adopt an anchor-free architecture because it does not depend on predefined anchor boxes and can more flexibly adapt to detecting targets of varying sizes. A multi-scale feature extraction module is embedded after the backbone feature extraction network, enabling high-level features to capture richer multi-scale information and substantially enhancing the model’s capability to detect UAV targets across various scales.
To more comprehensively capture contextual information within the image, the feature fusion module is placed after the deconvolution-based upsampling process, significantly boosting the model’s expressiveness with respect to multi-scale UAV targets. Furthermore, channel attention and local attention mechanisms are introduced to fuse shallow and deep features, addressing semantic and scale inconsistencies among feature maps. This fusion approach effectively improves the model’s ability to parse multi-scale UAV targets.
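One way to picture the fusion of shallow and deep features with channel and local attention is the sketch below. It assumes a gating scheme in which a channel-level (global average) context and a per-pixel (local) context are summed, squashed by a sigmoid, and used to blend the two inputs. The learned convolutions that a real module would contain are omitted, and the function name `attention_fuse` is hypothetical; the paper does not specify the exact formulation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def attention_fuse(shallow, deep):
    """Blend two C x H x W feature tensors (nested lists) with a soft gate.

    For each channel, the gate at a pixel is
        w = sigmoid(global_context + local_context)
    where global_context is the channel mean of (shallow + deep) and
    local_context is the value of (shallow + deep) at that pixel.
    The output is w * shallow + (1 - w) * deep.
    """
    fused = []
    for xs, ys in zip(shallow, deep):
        summed = [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(xs, ys)]
        n_pixels = len(summed) * len(summed[0])
        global_ctx = sum(sum(row) for row in summed) / n_pixels
        channel = []
        for rx, ry, rs in zip(xs, ys, summed):
            out_row = []
            for a, b, local_ctx in zip(rx, ry, rs):
                w = sigmoid(global_ctx + local_ctx)
                out_row.append(w * a + (1.0 - w) * b)
            channel.append(out_row)
        fused.append(channel)
    return fused
```

Because the gate lies strictly between 0 and 1, each fused value is a convex combination of the shallow and deep values at that location, so neither input is discarded outright; this is the property that helps keep weak responses from micro UAVs from being suppressed.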
An attention-based feature fusion module is designed to efficiently integrate features across different network layers. When merging high-level and low-level features, the model jointly considers global and local weight distributions to avoid suppressing critical features of micro UAVs during the fusion process. This design enhances the detection performance of the anchor-free network. As shown in Figure 1, an adaptive multi-scale feature extraction module is embedded after the downsampling module to capture multi-scale objects and contextual information.
Figure 1. Multi-scale drone target detection algorithm.
The proposed module consists of five parallel dilated convolutional layers with varying dilation rates, which extract discriminative features for micro, light, small, medium, and large UAVs from high-level semantic representations. These multi-scale features are then fused using spatial importance weighting to further improve detection accuracy. The current method assumes good visibility of UAV targets; severe occlusion or partial visibility scenarios have not been specifically addressed and are left as future work.
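The spatial importance weighting used to fuse the parallel branches can be sketched as a per-pixel softmax over the branch outputs. In the sketch below, `features` stands for the outputs of the dilated branches and `scores` for per-pixel importance logits; in a trained network the scores would come from learned layers, so here they are simply inputs, and all names are hypothetical.

```python
import math


def fuse_with_spatial_weights(features, scores):
    """Fuse K feature maps (each H x W) using per-pixel softmax weights.

    features[s][i][j] is the response of branch s at pixel (i, j);
    scores[s][i][j] is the (assumed, externally supplied) importance logit
    of branch s at the same pixel. Weights sum to 1 across branches.
    """
    k = len(features)
    h, w = len(features[0]), len(features[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            logits = [scores[s][i][j] for s in range(k)]
            peak = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(value - peak) for value in logits]
            total = sum(exps)
            fused[i][j] = sum(
                (e / total) * features[s][i][j] for s, e in enumerate(exps)
            )
    return fused
```

With equal scores the fusion reduces to a plain average of the branches, while a dominant score at a pixel lets a single dilation rate, and hence a single target scale, determine the output there.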
The feature maps output by the multi-scale extraction module are processed through an upsampling module. The resulting upsampled feature map P1 is fused with feature map C3 from the backbone via the feature fusion module, then further upsampled to produce feature map P2. Similarly, P2 is fused with C2 from the backbone to produce P3, which is then fused with C1 to obtain the final feature map. This final feature map is fed into a decoder to produce center localization and size regression outputs for multi-scale UAV targets.
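The last step, turning the decoder outputs into detections, can be illustrated with the following sketch. It assumes the decoder produces a center heatmap and a per-pixel size map, with detections taken at 3 x 3 local maxima above a score threshold; the paper does not detail the decoding rule, so the threshold, the neighbourhood size, and the function name are assumptions.

```python
def decode_centers(heatmap, sizes, threshold=0.5):
    """Extract detections from a center heatmap and a size map.

    heatmap[i][j] is the center score at pixel (i, j); sizes[i][j] is the
    regressed (width, height) at that pixel. A pixel is reported when its
    score is at least `threshold` and is the maximum of its 3 x 3
    neighbourhood. Returns a list of (row, col, score, width, height).
    """
    h, w = len(heatmap), len(heatmap[0])
    detections = []
    for i in range(h):
        for j in range(w):
            score = heatmap[i][j]
            if score < threshold:
                continue
            neighbourhood = [
                heatmap[a][b]
                for a in range(max(0, i - 1), min(h, i + 2))
                for b in range(max(0, j - 1), min(w, j + 2))
            ]
            if score >= max(neighbourhood):
                width, height = sizes[i][j]
                detections.append((i, j, score, width, height))
    return detections
```

Because no anchor boxes are enumerated, the cost of this step depends only on the heatmap resolution, not on the number of UAV scales to be covered. Note that a flat plateau of equal scores would be reported at every pixel of the plateau in this simplified version.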
Drone swarm formation recognition
To address the challenge of drone swarm formation recognition, a GCN-based approach is proposed. The drone swarm is modeled as a directed graph, where the swarm’s structural characteristics are described by extracting node connectivity, motion features, and node attributes. The graph nodes serve as the input, and their pattern features are extracted through graph convolution operations to achieve formation classification. 23 A graph neural network architecture is designed based on intra-layer and inter-layer feature extraction mechanisms. The input is a graph composed of an arbitrary number of UAV nodes, and the output is the classification label of the graph’s formation type. The proposed neural network enables both node-level feature extraction and aggregation, as well as hierarchical-level learning, allowing the model to capture global and local features of drone swarm formations more effectively.
As illustrated in Figure 2, the intra-layer feature extraction module consists of GCN layers, pooling layers, and a readout layer. When a graph with an arbitrary number of nodes is input into this module, the GCN layer first embeds the graph data from a high-dimensional feature space into a lower-dimensional one. During this process, it extracts high-level node features while aggregating and updating the graph, yielding a graph with the same number of nodes but enriched with deeper features. Next, the pooling layer downsamples this graph, discarding non-essential nodes; this helps avoid overfitting while preserving the most salient topological and attribute information of the original graph. The pooled graph is then passed through the readout layer for scale transformation, which concatenates the max-pooling vector and the average-pooling vector of the feature matrix. Max pooling emphasizes important node information, while average pooling captures the overall distribution of node features. This operation converts feature matrices of arbitrary size into fixed-dimensional representations suitable for classification tasks.
The outputs from each intra-layer module are not simply aggregated but instead form an ordered sequence, which is input into the inter-layer feature extraction module. This sequence is processed by a Long Short-Term Memory (LSTM) network, which fuses outputs across layers to compute adaptive weights and receptive fields for each node. This enables the model to generate adaptive neighborhood connection ranges, mitigating the issue of feature misallocation due to substantial variations in graph data. Finally, a multi-layer perceptron (MLP) maps the fused features to a specific class label, thus completing the drone swarm formation recognition task.
Figure 2. Graph neural network for intra-layer and inter-layer feature extraction.
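The two core operations of the intra-layer module, graph convolution and readout, can be sketched without any deep learning library. The sketch assumes mean aggregation over each node's in-neighbours plus a self-loop, followed by a ReLU; the paper does not state its normalization, and the graph pooling layer, the LSTM, and the MLP are omitted. All function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1 (A + I) H W).

    adj[i][j] = 1 when node i receives information from node j (directed),
    feats is the N x F node feature matrix, weight is an F x F' matrix.
    Row normalization (mean aggregation) is an assumption of this sketch.
    """
    n = len(adj)
    with_self_loops = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    normalized = [[value / sum(row) for value in row] for row in with_self_loops]
    out = matmul(matmul(normalized, feats), weight)
    return [[max(0.0, value) for value in row] for row in out]


def readout(feats):
    """Concatenate column-wise max pooling and mean pooling over all nodes.

    The result has length 2 * F' regardless of how many nodes the graph has.
    """
    columns = list(zip(*feats))
    return [max(c) for c in columns] + [sum(c) / len(c) for c in columns]
```

Applying `readout(gcn_layer(adj, feats, weight))` to swarms of 3 and of 30 UAVs produces vectors of the same length, which is what allows a single classifier to handle graphs with an arbitrary number of nodes.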
Drone swarm situation computation
To achieve robust and invariant characterization of drone swarm behaviors, this study introduces a macroscopic quantitative behavior descriptor that is invariant to scale and rotation. This descriptor is constructed by linking microscopic behavioral features (divergence and curl) with macroscopic motion components (radial and tangential) via curvilinear integration. By clustering and decomposing drone swarm scenes extracted from video footage, a behavioral vector field is generated, which is then encoded into a macroscopic quantitative representation language based on the proposed descriptor. This enables a systematic computation pipeline from drone swarm videos to behavioral vector fields and ultimately to swarm-level behavior analysis. As illustrated in Figure 3, the process includes behavior feature clustering, behavior feature decomposition, encoding, pooling, and situation awareness computation.
Figure 3. Drone swarm situation awareness strategy based on macroscopic quantitative descriptors.
A foundational challenge in this framework lies in the clustering and decomposition of complex vector fields, which serves as the basis for swarm situation computation. Accurately characterizing each behavioral pattern is essential for precise drone swarm behavior analysis. To this end, drifting particles are placed into the behavior vector field. These particles are driven to drift under the influence of local vectors. Upon completion of their trajectories, a particle density map is generated. The high-density regions (i.e., aggregation points) in this map represent sub-behavior vector fields.
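The drifting-particle idea can be sketched as follows: particles are advanced along the local vectors with simple Euler steps, and their final positions are binned into a density map whose peaks mark the aggregation points. The step size, the number of steps, the bin layout, and the two-attractor example field are all illustrative assumptions.

```python
def drift_particles(field, starts, step=0.1, n_steps=200):
    """Advance each start point along field(x, y) -> (u, v) with Euler steps."""
    finals = []
    for x, y in starts:
        for _ in range(n_steps):
            u, v = field(x, y)
            x, y = x + step * u, y + step * v
        finals.append((x, y))
    return finals


def density_map(points, lo, hi, bins):
    """Count points per cell on a bins x bins grid covering [lo, hi] x [lo, hi].

    Rows index y and columns index x; points outside the range are clamped
    to the border cells.
    """
    grid = [[0] * bins for _ in range(bins)]
    cell = (hi - lo) / bins
    for x, y in points:
        col = min(bins - 1, max(0, int((x - lo) / cell)))
        row = min(bins - 1, max(0, int((y - lo) / cell)))
        grid[row][col] += 1
    return grid


def two_attractor_field(x, y):
    """Hypothetical field with two gathering points at (-1, 0) and (1, 0)."""
    target_x = -1.0 if x < 0 else 1.0
    return (target_x - x, 0.0 - y)
```

Dropping particles across `two_attractor_field` and binning their end points produces two high-density cells, one per attractor, so the original field would be split into two sub-behavior vector fields in this toy case.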
The proposed model comprehensively integrates microscopic features (divergence and curl) with macroscopic features (radial and tangential motion), using curvilinear integration to unify them into a robust and discriminative macroscopic quantitative behavior descriptor. The generation process of this descriptor involves the following steps:
1. Normalization of the behavior vector field.
2. Computation of the divergence and curl maps.
3. Conjugation of the behavior vector field.
4. Drifting particle analysis within the conjugated field to identify radial (radial lines) and tangential (streamlines) paths.
5. Curvilinear integration of divergence along radial lines and curl along tangential paths.
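The steps above can be sketched end to end for an analytic field. Several points are assumptions of this sketch rather than statements from the paper: "conjugation" is taken to mean rotating every vector by 90 degrees, paths traced in the field itself are treated as one family of curves and paths traced in the rotated field as the orthogonal family, derivatives are estimated numerically, and all function names are hypothetical.

```python
import math


def normalize(field):
    """Step 1: rescale every vector to unit length (zero vectors stay zero)."""
    def unit(x, y):
        u, v = field(x, y)
        norm = math.hypot(u, v)
        return (u / norm, v / norm) if norm > 0 else (0.0, 0.0)
    return unit


def divergence(field, eps=1e-5):
    """Step 2a: numerical divergence du/dx + dv/dy."""
    def div(x, y):
        du_dx = (field(x + eps, y)[0] - field(x - eps, y)[0]) / (2 * eps)
        dv_dy = (field(x, y + eps)[1] - field(x, y - eps)[1]) / (2 * eps)
        return du_dx + dv_dy
    return div


def curl(field, eps=1e-5):
    """Step 2b: numerical curl (z-component) dv/dx - du/dy."""
    def rot(x, y):
        dv_dx = (field(x + eps, y)[1] - field(x - eps, y)[1]) / (2 * eps)
        du_dy = (field(x, y + eps)[0] - field(x, y - eps)[0]) / (2 * eps)
        return dv_dx - du_dy
    return rot


def conjugate(field):
    """Step 3 (assumed meaning): rotate each vector by +90 degrees."""
    def rotated(x, y):
        u, v = field(x, y)
        return (-v, u)
    return rotated


def trace_path(field, start, step=0.01, n_steps=100):
    """Step 4: let one particle drift along the field and record its path."""
    x, y = start
    path = [(x, y)]
    for _ in range(n_steps):
        u, v = field(x, y)
        x, y = x + step * u, y + step * v
        path.append((x, y))
    return path


def line_integral(scalar, path):
    """Step 5: integrate a scalar map along a path (midpoint rule)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        segment = math.hypot(x1 - x0, y1 - y0)
        total += scalar((x0 + x1) / 2, (y0 + y1) / 2) * segment
    return total


# Hypothetical example: a swarm circling the origin.
vortex = normalize(lambda x, y: (-y, x))
```

For the circling example, integrating `curl(vortex)` along a path traced in `vortex` from `(1, 0)` gives a value close to the arc length travelled, while integrating `divergence(vortex)` along a path traced in `conjugate(vortex)` gives approximately zero, which is consistent with a purely rotational behavior.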
Given that different sub-behavior vector fields may vary in the dimensions of their divergence maps, curl maps, radial paths, and tangential paths, the resulting behavior descriptors may also differ in dimension. To standardize these into a fixed-dimensional representation suitable for downstream situation computation, an adaptive behavior feature pooling mechanism is introduced. This post-processing step ensures that descriptors derived from diverse behavior patterns can be unified into consistent formats for further swarm situation analysis.
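A simple way to bring descriptors of different lengths to a fixed dimension is adaptive average pooling, sketched below for a one-dimensional descriptor. The paper does not specify its pooling rule, so the windowing scheme (floor for the window start, ceiling for the window end) and the function name are assumptions.

```python
def adaptive_avg_pool(values, out_len):
    """Pool a non-empty sequence of any length into exactly out_len averages.

    Output bin k averages values[start:end] with
        start = floor(k * n / out_len)
        end   = ceil((k + 1) * n / out_len)
    so windows cover the whole input and may overlap when n is not a
    multiple of out_len (or repeat values when n < out_len).
    """
    n = len(values)
    pooled = []
    for k in range(out_len):
        start = (k * n) // out_len
        end = -((-(k + 1) * n) // out_len)  # integer ceiling division
        window = values[start:end]
        pooled.append(sum(window) / len(window))
    return pooled
```

For instance, `adaptive_avg_pool([1, 2, 3, 4, 5, 6], 3)` returns `[1.5, 3.5, 5.5]`, and a five-element input pooled to two bins returns `[2.0, 4.0]`, so descriptors from sub-behavior fields of different sizes end up with the same length.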
Conclusion
This study addresses the inconsistency issues in extracting multi-scale feature information by designing a multi-scale feature extraction network capable of calculating inter-scale spatial weights. This enables effective fusion of features across different scales and significantly enhances the model’s capacity to detect UAV targets of varying sizes. Furthermore, a GNN architecture is introduced into the domain of drone swarm formation recognition. By transforming the detected coordinates of drone swarm targets into graph-structured input data, a GCN is constructed with both intra-layer and inter-layer feature extraction mechanisms. This architecture performs feature aggregation at the node level while simultaneously learning hierarchical representations, allowing the final output to more effectively capture both global and local structural characteristics inherent to drone swarms. In terms of behavioral modeling, the macroscopic divergence (indicating dispersion or convergence) and curl (indicating rotational motion) of drone swarms are quantified by performing cumulative integration along radial and tangential motion trajectories, respectively. Through curvilinear integration, the study bridges microscopic and macroscopic behavioral features, yielding divergence and curl descriptors that capture the swarm’s global motion patterns. These descriptors are invariant to scale and rotation, enabling robust and quantitative characterization of drone swarm behaviors from a spatial perspective. These technical results can help law enforcement and public safety policymakers implement scientific, classification-based, and hierarchical response strategies for drone swarm incidents.
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funded by Beijing Natural Science Foundation (L232003), the Fundamental Research Funds for the Central Universities (2022JKF432), Public Security Behavior Science and Engineering Action Project of People’s Public Security University of China (2022KXCCKJ07), Project 2024LL26.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
