Digital shielding for cross-domain Wi-Fi signal adaptation using Relativistic average Generative Adversarial Network

Abstract

Wi-Fi sensing exploits radio-frequency signals emitted by Wi-Fi devices to analyze environments, enabling tasks such as people tracking, intruder detection, and gesture recognition. Its growing diffusion is driven by the IEEE 802.11bf standard, which facilitates environmental monitoring, and the increasing demand for tools capable of penetrating obstacles while preserving privacy. However, the performance of Wi-Fi-based sensing solutions is influenced by the environment in which signals are acquired. This is critical when extracting spatial and temporal information from the surrounding scene, as such data reflect both environmental structure and interference sources. A main challenge is achieving generalization across domains, that is, ensuring consistent performance under varying conditions, such as different rooms or buildings, without significant accuracy loss. This paper presents a deep model for domain adaptation of Wi-Fi signals by simulating a digital shielding mechanism. The model is based on a Relativistic average Generative Adversarial Network (RaGAN), which mimics physical shielding to suppress domain-specific features while preserving signal integrity. Both the generator and discriminator use Bidirectional Long Short-Term Memory (Bi-LSTM) architectures, enabling modeling of waveform and time-dimension signal characteristics. To support training, an acrylic box lined with electromagnetic shielding fabric, replicating a Faraday cage, was constructed. Spectra from same-sized objects made of different materials were acquired both inside (domain-free) and outside (domain-dependent) the box. A multi-class Support Vector Machine (SVM), trained on shielded spectra and tested on RaGAN-denoised data, achieved 96 percent accuracy. The SVM also distinguished materials, suggesting a promising approach for security systems aimed at identifying the nature and composition of potentially dangerous objects.

Keywords

Wi-Fi sensing; domain adaptation; digital shielding; security systems; RaGAN; Bi-LSTM; multi-class SVM

1. Introduction

In the current literature, the term sensing refers to the process of detecting and measuring physical properties, events, and changes, as well as the behavior of people and objects, in indoor and outdoor environments using various types of sensors.^1–3 These sensors convert physical stimuli, such as light, sound, temperature, motion, or visual acquisitions like depth maps or RGB video streams, into data that can be observed and analyzed to derive a wide range of meaningful information for real-time informed decision-making. In recent years, these sensing technologies have become increasingly widespread and are playing a key role in an ever-growing number of application domains. Among the various sensing technologies, one widely known for its versatile applications across different fields is visual sensing, better known in the literature as scene understanding. The systems implemented with this technology generally consist of a distributed network of RGB cameras (occasionally other types of cameras) placed in various environments; these cameras acquire video streams that feed into computer vision models to achieve a multitude of purposes, such as person re-identification,^4–7 object tracking,^8–12 human action recognition,^13–16 foreground modeling and analysis,^17–21 and much more. Despite the remarkable achievements of computer vision techniques and models, many visual sensing applications still face significant challenges, such as illumination changes,^22,23 background clutter,^24,25 occlusions,^26,27 and perspective distortions.^28,29 Many of these challenges are cross-cutting across various fields. While current RGB devices remain the most suitable for spatial resolution, image quality, data richness, and noise management, they are inherently prone to the reported limitations. 
Consequently, recent efforts have been intensified to develop technologies capable of replacing or complementing current visual technologies, aiming to overcome, at least in part, these issues in visual tasks.

Wi-Fi sensing is an evolving technology that has already proven to be highly effective in a wide range of applications over the past two decades,^30–33 from smart home automation and security to healthcare monitoring and human-computer interaction. In recent years, Wi-Fi devices have been used not only for current monitoring activities but also as a type of “vision” system capable of capturing and shaping significant information to accomplish computer vision tasks through deep analysis of propagated signal spectra. For example, in Avola et al.,³⁴ the authors use Wi-Fi signal information that has passed through various subjects to develop a person re-identification system capable of providing more robust and reliable biometric signatures than visual ones, which are dependent on factors like changes in lighting or clothing. Meanwhile, Wang et al.³⁵ utilize Wi-Fi signals to reconstruct the skeletons of individuals, on which classical methods can then be applied to determine human body poses and movements. While there is potential to lose valuable information, such as the texture and colors that define the surface of objects, Wi-Fi sensing offers extraordinary capabilities, including analyzing the interior of solid objects, overcoming obstacles to mitigate occlusion issues, and providing stricter privacy constraints. The promising future of Wi-Fi applications in various domains is further supported by the recent establishment of the IEEE 802.11bf standard,³⁶ which formalizes and standardizes Wi-Fi sensing capabilities within the existing IEEE 802.11 Wi-Fi framework. It is now evident, especially in light of the next generation of applications under development, that regardless of the specific application being addressed, signal integrity remains a critical factor for the reliability and performance of Wi-Fi sensing applications. 
This is especially crucial for preserving signal fidelity and extracting rich semantic information required for accurate and complex manipulations.

At the core of Wi-Fi sensing lies the analysis of Channel State Information (CSI),^37,38 which provides a comprehensive representation of how a Wi-Fi signal propagates from the transmitter to the receiver through an environment. CSI captures a range of critical data, including amplitude, phase, delay spread, Doppler Frequency Shift (DFS), and multipath effects. These characteristics are essential for in-depth signal analysis and the implementation of advanced Wi-Fi sensing applications. However, environmental variations present a fundamental limitation to the robustness of Wi-Fi sensing systems, as they can induce significant fluctuations in CSI characteristics. As highlighted in the study by Chen et al.,³⁹ CSI is inherently sensitive to surrounding environmental factors, including room layout, furniture placement, and static reflectors. Even minor modifications, such as repositioning objects or the presence of transient obstacles, can alter the multipath propagation of Wi-Fi signals, causing domain shifts that negatively impact sensing accuracy. These variations impact key CSI metrics, such as amplitude, phase, and DFS, leading to inconsistencies in signal representation. In such conditions, models designed to perform specific sensing tasks often fail to generalize effectively when deployed elsewhere. This is particularly problematic for pre-trained models, which tend to experience a noticeable drop in performance when applied across domains characterized by different signal propagation conditions. As a result, Wi-Fi sensing systems frequently require continuous recalibration, retraining, or robust domain adaptation strategies to remain reliable in dynamic or previously unseen environments.

Considering the points discussed so far, this paper introduces an innovative deep model specifically designed to emulate the physical effects of a Faraday cage, with the aim of filtering signals and removing any domain-dependent components together with the noise and interference that may be present in the environment. In other words, the proposed architecture purifies signals as if they had been acquired in a completely neutral and interference-free environment. To achieve this, a model based on the Relativistic average Generative Adversarial Network (RaGAN) was designed, which digitally mimics the protective characteristics of a Faraday cage, thus preserving the integrity and informative content of the Wi-Fi signal. In this proposed version of RaGAN, the generator and discriminator are implemented using Bidirectional Long Short-Term Memory (Bi-LSTM) networks. This configuration allows the model to effectively manage and interpret the sequential data present in Wi-Fi signals. The Bi-LSTM networks process the signal in two stages: first, by analyzing the sequence in the forward direction, thus capturing the dependencies as the signal evolves over time; then, by processing the sequence in the backward direction, thus ensuring that the model also considers future dependencies and provides a more comprehensive understanding of the temporal relationships within the waveform and time-series data of the Wi-Fi signals. To generate the data needed to train the RaGAN model, an acrylic box was constructed and lined with electromagnetic field shielding fabric designed to replicate the effects of a Faraday cage. Using this box, we collected an ad-hoc dataset in which various objects of the same shape and size but composed of different materials were acquired both inside the box, i.e., in a neutral environment, and outside the box, i.e., in a domain-dependent setting with real-world interference. 
These acquisitions were then used to teach the RaGAN model to recognize the spectral representation with and without environment-dependent data, thus enabling the network to learn how to transform the spectrum of an object affected by the environment and possible interferences back to that of the same environment-free object. To evaluate the effectiveness of the obtained results, a multi-class Support Vector Machine (SVM) was subsequently employed. The multi-class SVM was trained using the spectra acquired inside the shielded box and tested with the spectra denoised by the RaGAN, achieving an accuracy of 96%. As an additional outcome of the research presented in this paper, which focuses on Wi-Fi signal cross-domain adaptation, the application of a multi-class classifier to verify the performance of the RaGAN also revealed the ability to classify the specific nature of the signal, successfully distinguishing whether a spectrum originated from an object composed of one material (e.g., aluminum) or another material (e.g., copper). In conclusion, the main contributions of the proposed work can be summarized as follows:

To the best of our knowledge, this is the first work in the literature to propose a RaGAN-based architecture specifically designed to emulate the physical shielding effect of a Faraday cage, with the goal of effectively suppressing environment-specific information from Wi-Fi signals. The experimental results confirm the effectiveness of the proposed approach.

Given that the proposed method is designed to transform a signal affected by any type of interference and domain-specific data into its ideal form, as if it were acquired in an interference-free environment, this model does not rely on knowing or adapting to the specific nature of the interference or environment. It thus represents an initial step toward effective cross-domain adaptation of Wi-Fi sensing systems.

A one-of-a-kind dataset has been created, consisting of objects made from different materials but with identical dimensions and shapes, acquired both in the presence and absence of environment data. This dataset can serve as a valuable starting point for various studies in the application fields of cross-domain adaptation and material analysis.

The research proposed in this paper has shown that it is possible to distinguish between different materials based on their Wi-Fi signal features. This discovery opens up new possibilities in the emerging field of Wi-Fi-based material sensing, with significant implications, for example, in security-related studies such as detecting concealed objects or identifying hazardous materials.

The remainder of this paper is organized as follows. The Related Work section provides a comprehensive overview of prior studies addressing signal denoising across different areas. The Proposed Method section describes the architecture, detailing the Bi-LSTM-based generator and discriminator. The Experimental Setup and Results section introduces the collected dataset and discusses the results obtained, both in terms of the cross-domain generalization capability of the proposed method and the ability of the classifier to distinguish between different materials. Finally, the Conclusion section summarizes the paper and outlines our ongoing research directions in Wi-Fi sensing.

2. Related work

Signal noise has consistently been a significant challenge in telecommunication systems due to the unpredictable nature of various noise types and their sources. Different signals and communication channels encounter distinct forms of noise, each requiring specific techniques to mitigate these disturbances. Furthermore, the term noise can refer to very different types of signal interference depending on the specific task the system is designed to perform, ranging from signal and hardware-related disturbances to environment-specific factors. In the context of Wi-Fi sensing, noise generally refers to all kinds of information gathered by the CSI that is not relevant to the specific task. Denoising techniques are typically divided into time-domain and frequency-domain approaches. The time-domain methods focus on mitigating signal noise by analyzing temporal patterns. A representative approach is median filtering, as reported by Wang et al.⁴⁰ on fall detection, where a weighted moving average filter was employed to reduce noise caused by environmental factors such as temperature and room conditions. Another commonly used method is Principal Component Analysis (PCA), as described by Wang et al.⁴¹ on human activity recognition, where the first principal component, representing the highest noise variance, is discarded, while the subsequent five components are retained. The frequency-domain methods, instead, focus on reducing noise by applying filters to isolate useful signal components while attenuating unwanted noise. Common examples of these methods are represented by the work of Ali et al.⁴² on keystroke recognition, where a low-pass filter is used, and the work of Liu et al.⁴³ on tracking vital signs during sleep, where a band-pass filter is employed. However, time-domain denoising approaches can distort the signal, thus potentially leading to the loss of high-frequency components or the removal of significant useful information. 
In contrast, frequency-domain denoising techniques struggle to effectively eliminate noise within the passband.
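As a rough illustration of the two families discussed above, the following NumPy sketch applies a sliding median filter (time domain) and an ideal FFT-based low-pass filter (frequency domain) to a synthetic amplitude trace. The synthetic signal, the sampling rate, the window length, and the cutoff frequency are illustrative assumptions and are not taken from the cited works.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(x, window=5):
    """Time-domain smoothing: sliding median over an odd-length window."""
    half = window // 2
    padded = np.pad(x, half, mode="reflect")      # reflect edges to keep the length
    return np.median(sliding_window_view(padded, window), axis=-1)

def fft_lowpass(x, fs, cutoff_hz):
    """Frequency-domain smoothing: zero every FFT bin above cutoff_hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Hypothetical test trace: a 2 Hz sinusoid sampled at 100 Hz plus white noise
rng = np.random.default_rng(0)
fs = 100.0
t = np.arange(0.0, 2.0, 1.0 / fs)
clean = np.sin(2 * np.pi * 2.0 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)

smoothed_time = median_filter(noisy, window=5)
smoothed_freq = fft_lowpass(noisy, fs=fs, cutoff_hz=5.0)
```

The low-pass filter leaves any noise below the cutoff untouched, which mirrors the passband limitation noted above, while the median filter trades noise suppression against distortion as the window grows.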

Moving into the time-frequency domain to leverage the benefits discussed above, the Discrete Wavelet Transform (DWT) has proven effective in denoising transient waveforms, such as ElectroCardioGrams (ECG) and Transient ElectroMagnetic (TEM) signals, as described in Guo et al.⁴⁴ and Wei et al.,⁴⁵ respectively. Although CSI signals differ in nature, they share key characteristics like non-stationarity, which DWT effectively addresses. Its noise reduction capabilities are demonstrated in the work of Fang et al.,⁴⁶ where the authors apply the Multilevel Discrete Wavelet Transform (MDWT) to denoise and extract signal features. Recent studies have also introduced Synchrosqueezed Wavelet Transform (SWT)^47–49 as a powerful time-frequency analysis and denoising tool. The time-frequency approach enables DWT and SWT to capture transient signal details that single-domain methods might miss, thus making it especially suited for analyzing dynamic and non-stationary waveforms.
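To make the wavelet-based idea more concrete, the sketch below implements a multilevel Haar DWT with soft thresholding of the detail coefficients. It is a generic wavelet-shrinkage example, not the procedure of the cited works: the Haar wavelet, the number of levels, the universal threshold, and the synthetic step signal are all assumptions chosen for brevity.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed even/odd samples."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def soft_threshold(coeffs, thr):
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)

def haar_denoise(x, levels=3):
    """Decompose, shrink the detail coefficients, and reconstruct.

    len(x) must be divisible by 2**levels.
    """
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)                       # details[0] is the finest scale
    # Noise level from the finest details via the median absolute deviation
    sigma = np.median(np.abs(details[0])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))  # universal threshold
    for d in reversed(details):                  # coarsest scale first
        approx = haar_idwt(approx, soft_threshold(d, thr))
    return approx

# Hypothetical transient: a step edge buried in white noise
rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(128), np.ones(128)])
noisy = clean + 0.2 * rng.standard_normal(clean.size)
denoised = haar_denoise(noisy, levels=3)
```

Because the step edge is localized in time, it survives the shrinkage while the broadband noise in the detail coefficients is largely suppressed.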

While the techniques discussed above help reduce fluctuations and interferences caused by the environment, they still fail to eliminate domain-dependent information embedded in the CSI. To address this limitation, several cross-domain adaptation strategies have been proposed to improve the generalization capabilities of Wi-Fi sensing models. Much of the existing work in Wi-Fi sensing focuses on tasks involving moving subjects, such as activity recognition, user identification, motion detection, human tracking, and fall detection. Consequently, many studies aim to remove the static component of the CSI and retain only the dynamic part, which reflects changes due to motion. Several methods have been developed to achieve this, including the Local Extreme Value Detection (LEVD) algorithm,^50–52 conjugate multiplication of CSI measurements between antennas,⁵³ recursive filtering,^54,55 and, to some extent, frequency-based filters, since motion-induced variations typically reside in the low-frequency range.⁵⁶ However, while these techniques are effective in motion-related scenarios, they are not suitable for applications where the dynamic component is irrelevant, such as material identification. In such cases, a different approach is required to preserve static, domain-relevant information and suppress domain-specific noise.

In the past decade, Deep Learning (DL) methods have been applied with increasing frequency across various fields, including the area addressed in this paper.^57,58 Convolutional Neural Networks (CNNs) were the first DL approaches widely adopted for denoising, particularly in the image domain. Indeed, many works have approached signal denoising as an image-denoising problem. For instance, Chen et al.⁵⁹ transformed a 1D TEM signal into a 2D image, which was then processed by CNNs for noise reduction. Similarly, Zhu et al.⁶⁰ applied the Short-Time Fourier Transform (STFT) to convert seismic signals from the time domain to the time-frequency domain, thus creating an image input for a denoising U-Net. Likewise, Almazrouei et al.⁶¹ computed the STFT of MATLAB-simulated IEEE 802.11a/n signals and fed these into a Convolutional Denoising AutoEncoder (CDAE) for noise reduction. Note that precisely separating noise from the clean signal is not always possible, and obtaining a dataset with such a separation is part of the challenge in denoising frameworks. A solution to this issue was proposed by Wang et al.,⁶² who utilized a GAN with Wasserstein distance and Gradient Penalty (WGAN-GP) to learn the noise characteristics present in noisy TEM signals, generating noise signals to build an effective dataset for denoising models. In contrast, Yang et al.⁶³ introduced an alternative approach using a Conditional GAN (CGAN) to separate crucial signal information from noise. While these works have demonstrated promising results, several limitations remain to be explored and addressed. A key challenge lies in the extended training time and slow convergence speed of current noise reduction methods based on GANs. Another limitation is ensuring the denoising method is sufficiently generalized to handle various noise types. 
In addition, a generative model cannot completely remove the domain-specific information contained in the CSI unless it is supported by an ad-hoc, clean dataset to learn from. Finally, maintaining the informative content within the signal after noise removal remains an essential aspect to address. Inspired by works such as that of Jolicoeur-Martineau,⁶⁴ which proposed the relativistic discriminator to address limitations of standard GANs, and that of Peng et al.,⁶⁵ which examined the performance of a RaGAN for noise reduction in communication signals, the work proposed in this paper introduces a new paradigm for cross-domain adaptation and noise reduction based on the concept of digital shielding. This approach leverages the fast convergence of RaGAN while enabling an abstraction level in signal denoising that remains independent of the specific environment. The proposed method achieves a high degree of accuracy in cleaning, preserving carrier information and retaining the informative content even in fine-grained classification case studies, such as distinguishing materials of different objects.

3. Proposed method

This section presents the three key aspects of the proposed method. In the first, an explanation of the simple yet effective concept behind the proposed approach, i.e., digital shielding, is given. In the second, the implementation of the RaGAN-based architecture designed to emulate the physical effect of a Faraday cage is presented. In the last, details of the simple multi-class SVM classifier, used both to measure the performance of the RaGAN-based architecture and to validate material differentiation as further evidence of the implemented domain-independent mechanism, are shown.

3.1. Digital shielding in Wi-Fi signals

As is well known, Wi-Fi sensing covers various tasks beyond traditional communication, such as fall detection or gesture recognition. These tasks, conducted in real-world environments, are inherently susceptible to diverse settings and contexts in which they are carried out. Indeed, Wi-Fi signals are subject to influence from numerous factors, such as electromagnetic interference, environmental morphology, and other contextual variables. The concept behind digital shielding is as follows: if a deep model can be trained to understand the behavior of a Wi-Fi signal during the execution of a specific task in an ideal, noise-free environment, then when that same signal is used in a real-world, noisy environment-dependent setting, the model should be able to clean the signal and restore it to a state similar to its ideal condition, allowing cross-domain adaptation and better generalization. A significant advantage of this approach is that the cleaning process becomes largely independent of the specific environment, as the model consistently aims to extract the fingerprint of the target signal regardless of the environment through which the signal propagates.

3.2. Bi-LSTM-based RaGAN

In Figure 1, the proposed model architecture for digital shielding is shown. The model consists of a RaGAN, where both the generator and the discriminator are implemented using Bi-LSTM networks. This design enables the model to interpret the sequential patterns present in Wi-Fi signals. By utilizing Bi-LSTM networks, the architecture processes the signal in both the forward direction, capturing dependencies as the signal evolves over time, and the backward direction, incorporating future relationships. This bidirectional processing allows for a more comprehensive understanding of the temporal dynamics embedded within the waveform and time-series data. In order to train the network to act as a digital Faraday cage, capable of restoring real-world signals as closely as possible to their ideal representations, a specific training strategy is employed. In particular, the discriminator is trained with shielded samples, i.e., signals acquired inside an acrylic box lined with electromagnetic shielding fabric, while the generator is trained with unshielded samples, i.e., signals acquired in real-world conditions. Regarding the features extracted from the CSI, only the amplitude is used, as it encapsulates both the waveform structure and temporal characteristics of the Wi-Fi signals. As an example, Figure 2 illustrates the amplitude computation for two acquisitions of a copper cube, performed in an unshielded (top) and shielded (bottom) environment, respectively.

Figure 1.

Proposed model architecture for digital shielding. The generator and discriminator are implemented using Bi-LSTM networks to process the sequential data in Wi-Fi signals. In the first stage, the model analyzes the sequence in the forward direction to capture dependencies as the signal progresses over time; in the second, it analyzes the sequence in the backward direction to account for future dependencies. This dual approach enhances comprehension of the temporal relationships within the waveform and time-series data of Wi-Fi signals.

Figure 2.

Amplitudes extracted from the CSI: the top plot shows the acquisition of the copper cube in a real-world environment, while the bottom plot represents the acquisition inside the acrylic box lined with electromagnetic shielding fabric (i.e., Faraday cage).

3.2.1. CSI estimation and amplitude feature

Details regarding the construction of the shielded box for noiseless acquisitions, the transmission and reception devices employed, and the dataset created are provided in the experimental section. This subsection focuses on the estimation of the CSI and the subsequent extraction of the amplitude, which serves as the fundamental feature of the proposed method.

Modern wireless communication systems rely on advanced techniques to characterize and adapt to the complex nature of propagation environments. When a Wi-Fi signal spreads through an environment, it undergoes modifications influenced by the structures it encounters, such as walls, persons, and objects. These interactions alter the signal features, including amplitude, phase, and propagation path, shaping it according to the specific properties of the crossed elements. Among the advanced techniques designed to manage these effects, Orthogonal Frequency Division Multiplexing (OFDM)⁶⁶ has become a key modulation scheme in wireless communications. OFDM divides the available bandwidth into multiple subcarriers, each carrying a portion of the transmitted signal. This division enables efficient utilization of the frequency spectrum and provides significant mitigation against multipath fading, a phenomenon caused by the interactions between the signal and the environment. A key outcome of the OFDM structure is the ability to analyze the behavior of the channel at the level of individual subcarriers. This is achieved through the Channel Frequency Response (CFR),⁶⁷ which represents the frequency-dependent characteristics of the channel. The CFR captures how the channel modifies the transmitted signal at each subcarrier, encapsulating changes in signal strength, attenuation, and other frequency-selective effects introduced during propagation. To further quantify these modifications and generalize the analysis across multiple packets, the concept of CSI is introduced. The latter extends the CFR representation by encompassing multiple subcarriers and multiple time instances, providing a comprehensive picture of the behavior of the channel. Formally, in the frequency domain, the relationship between the transmitted and received signals can be expressed as:

$$y = H \cdot x + n, \tag{1}$$

where $y \in \mathbb{C}^{K}$ is the vector of the received signal, $H \in \mathbb{C}^{K}$ is the CFR vector that describes the effect of the channel on each subcarrier (so the product is taken subcarrier by subcarrier), $x \in \mathbb{C}^{K}$ is the vector of the transmitted signal, and $n \in \mathbb{C}^{K}$ is the Additive White Gaussian Noise (AWGN)⁶⁸ at the receiver. The CFR is sampled at the subcarrier level, thus producing discrete measurements of the channel behavior. For an OFDM system with $K$ subcarriers, the CFR for the $k^{\text{th}}$ subcarrier is defined as:

$$H_{k} = \frac{Y_{k}}{X_{k}}, \tag{2}$$

where $H_{k} \in \mathbb{C}$, $Y_{k} \in \mathbb{C}$, and $X_{k} \in \mathbb{C}$ represent, respectively, the CFR value, the received signal, and the transmitted signal at the $k^{\text{th}}$ subcarrier. Each CFR value $H_{k}$ is a complex number expressed as:

$$H_{k} = |H_{k}| \, e^{j \phi_{k}}, \tag{3}$$

where $|H_{k}| \in \mathbb{R}^{+}$ and $\phi_{k} \in [0, 2\pi)$ represent the amplitude and phase, respectively. For a single transmission packet, the CFR can be represented as a vector:

$$H = \begin{bmatrix} H_{1} \\ H_{2} \\ \vdots \\ H_{K} \end{bmatrix} \in \mathbb{C}^{K}, \tag{4}$$

where $K$ is the total number of subcarriers. When multiple packets are transmitted, the CFR values for all subcarriers across packets form the CSI matrix. For $n$ transmitted packets (here $n$ denotes a count, not the noise vector of Eq. (1)) and $K$ subcarriers, the CSI matrix is given by:

$$CSI = \begin{bmatrix} H_{1,1} & H_{1,2} & \dots & H_{1,K} \\ H_{2,1} & H_{2,2} & \dots & H_{2,K} \\ \vdots & \vdots & \ddots & \vdots \\ H_{n,1} & H_{n,2} & \dots & H_{n,K} \end{bmatrix} \in \mathbb{C}^{n \times K}, \tag{5}$$

where $H_{p,k} \in \mathbb{C}$ is the CFR value for the $k^{\text{th}}$ subcarrier in the $p^{\text{th}}$ packet, and $n$ and $K$ are the total number of transmitted packets and subcarriers, respectively. Observe that the matrix has dimensions $n \times K$, where each row represents the CFR values for a single packet and each column corresponds to a specific subcarrier. Due to the specific focus of this work, a SISO (Single Input Single Output) configuration is adopted, which involves a single transmitting antenna and a single receiving antenna. This choice simplifies the CSI representation, as it avoids the additional complexity introduced by MIMO (Multiple Input Multiple Output) systems, where multiple transmitting and receiving antennas are used.
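The relations in Eqs. (1), (2), and (5) can be sketched numerically as follows. This is a minimal simulation under stated assumptions, not the acquisition pipeline used in this work: the packet and subcarrier counts, the QPSK pilot symbols, the static random channel, and the noise level are all hypothetical.

```python
import numpy as np

def estimate_cfr(received, transmitted):
    """Per-subcarrier CFR estimate H_k = Y_k / X_k, as in Eq. (2)."""
    return received / transmitted

rng = np.random.default_rng(0)
n_packets, n_subcarriers = 4, 8   # hypothetical n and K

# Hypothetical known pilots X: unit-modulus QPSK symbols, one row per packet
symbol_idx = rng.integers(0, 4, size=(n_packets, n_subcarriers))
pilots = np.exp(1j * (np.pi / 4 + np.pi / 2 * symbol_idx))

# Hypothetical static channel H (one complex gain per subcarrier) and AWGN
channel = rng.standard_normal(n_subcarriers) + 1j * rng.standard_normal(n_subcarriers)
noise = 0.01 * (rng.standard_normal(pilots.shape) + 1j * rng.standard_normal(pilots.shape))

received = channel * pilots + noise    # Eq. (1), applied subcarrier by subcarrier
csi = estimate_cfr(received, pilots)   # Eq. (5): one row per packet, shape (n, K)
```

With a static channel and low noise, every row of `csi` is close to `channel`; differences between rows in a real acquisition reflect noise and changes in the propagation environment.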

By analyzing the base case of SISO, the method concentrates on the core signal cleaning mechanism, thus allowing for a precise evaluation of the proposed digital shielding method. To achieve this, the amplitude component of the CSI is utilized as the primary feature for signal analysis and transformation. Representing the signal strength for each subcarrier, the amplitude encapsulates critical channel properties, including attenuation, multipath reflections, and scattering, which are directly influenced by the environmental interactions of the Wi-Fi signal during propagation. This makes the amplitude a robust abstraction for the channel state, providing a comprehensive and reliable way to represent how the channel behaves. Unlike the phase, which requires precise alignment, the amplitude offers a stable and computationally efficient feature for modeling the channel, preserving the essential characteristics needed for effective signal restoration. Furthermore, by focusing on the amplitude, the proposed method ensures that the restored signal closely approximates its interference-free state, effectively addressing broader signal properties such as phase and spectral coherence. Formally, the amplitude $|H_{k}| \in \mathbb{R}^{+}$ for the $k^{\text{th}}$ subcarrier can be defined as follows:

$$|H_{k}| = \sqrt{\mathrm{Re}(H_{k})^{2} + \mathrm{Im}(H_{k})^{2}}, \tag{6}$$

where $\mathrm{Re}(H_{k})$ and $\mathrm{Im}(H_{k})$ denote the real and imaginary parts of $H_{k}$, respectively. The extracted amplitudes, corresponding to both shielded and unshielded conditions, serve as the basis for training the RaGAN-based architecture.
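As a small worked example of Eq. (6), the sketch below computes the amplitude of a toy CSI matrix and, for comparison, the phase of Eq. (3) wrapped to $[0, 2\pi)$. The matrix values are made up for illustration.

```python
import numpy as np

def csi_amplitude(csi):
    """Eq. (6): |H| = sqrt(Re(H)^2 + Im(H)^2), applied element-wise."""
    return np.sqrt(csi.real ** 2 + csi.imag ** 2)

# Toy CSI matrix: 2 packets (rows) x 2 subcarriers (columns)
csi = np.array([[3 + 4j, 1 - 1j],
                [0 + 2j, -5 + 12j]])

amplitude = csi_amplitude(csi)              # [[5, sqrt(2)], [2, 13]]
phase = np.mod(np.angle(csi), 2 * np.pi)    # phase of Eq. (3), wrapped to [0, 2*pi)
```

The same values are returned by `np.abs(csi)`; the explicit form is kept here only to mirror Eq. (6).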

3.2.2. Generator and discriminator networks

The initial idea behind the proposed digital shielding method took inspiration from GANs, which are applied to signal denoising in a wide range of application domains, such as ElectroEncephaloGraphy (EEG),⁶⁹ Magnetic Resonance Imaging (MRI),⁷⁰ Sound Recognition (SR),⁷¹ and many others. These operational scenarios highlight the versatility of GANs in addressing noise-related challenges across different areas. However, traditional GANs are known to suffer from several issues,^72–74 including instability during training, generally caused by the balance required between the generator and discriminator, slow convergence due to the iterative nature of adversarial optimization, and challenges in processing signals with complex dynamic patterns that demand a deep understanding of temporal dependencies. To overcome these limitations, this work proposes the adoption of a RaGAN, a variant specifically designed to address the shortcomings of conventional GANs. In general, the RaGAN modifies the objective of the discriminator to ensure a more stable training process, thus improving both convergence speed and performance. In particular, RaGAN introduces a relativistic average discriminator, which considers the real and generated samples as relative to each other rather than as absolute entities. This approach aims to provide a more balanced and robust training process by incorporating the notion of relativistic comparison.
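The relativistic average comparison described above can be sketched as follows. The losses follow the standard RaGAN formulation from the relativistic GAN literature; the exact objective, weighting, and any auxiliary terms used in the proposed architecture are not specified in this section, so the function below is an assumption-laden illustration that operates on raw (pre-sigmoid) discriminator scores.

```python
import numpy as np

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    return -np.logaddexp(0.0, -z)

def ragan_losses(critic_real, critic_fake):
    """Relativistic average losses from raw discriminator scores.

    critic_real: scores for shielded (real) samples
    critic_fake: scores for generator outputs
    """
    real_rel = critic_real - critic_fake.mean()   # real vs. the average fake
    fake_rel = critic_fake - critic_real.mean()   # fake vs. the average real
    # Discriminator: real should score above the average fake, fake below the average real
    d_loss = -(log_sigmoid(real_rel).mean() + log_sigmoid(-fake_rel).mean())
    # Generator: the two roles are swapped
    g_loss = -(log_sigmoid(fake_rel).mean() + log_sigmoid(-real_rel).mean())
    return d_loss, g_loss

# Hypothetical scores from a discriminator that separates the two sets well
critic_real = np.array([3.0, 2.5, 3.5])
critic_fake = np.array([-3.0, -2.0, -4.0])
d_loss, g_loss = ragan_losses(critic_real, critic_fake)
```

Note that the generator loss depends on the real samples as well as the generated ones, which is what distinguishes the relativistic objective from the standard GAN loss.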

Figure 3 presents the details of the generator and discriminator implemented in the proposed RaGAN-based architecture. As is well known, traditional GANs designed for signal denoising consist of two primary components, i.e., a generator and a discriminator. These networks collaborate within an adversarial process to improve the quality of noisy input signals. Specifically, the generator is trained to treat all environmental information contained in the CSI amplitude data as noise and to transform signals acquired from real-world environments into their denoised counterparts by learning the mapping between noisy and clean signal domains. Through adversarial training, the generator iteratively enhances its capability to produce outputs that closely approximate reference clean signals, which represent the ground truth of the environment-free spectrum. The discriminator is tasked with distinguishing between real clean signals and those produced by the generator. By evaluating the quality of the generated outputs and providing constructive feedback, the discriminator plays a crucial role in enabling the generator to progressively refine its ability to create realistic and accurate denoised signals. Once the training process is complete, the generator operates independently to process new noisy signals, reconstructing their clean spectra with high fidelity. This separation allows the trained generator to function as a standalone model for effective signal denoising, capable of extracting and restoring spectral information from noisy inputs.

Figure 3.

Details of the proposed architecture. The first part (left) shows the generator, whose main components are an initial fully connected layer, a Bi-LSTM layer, and a final fully connected layer. The second part (right) shows the discriminator, whose main components are a Bi-LSTM layer and two sequential fully connected layers.

From an overall perspective, the proposed RaGAN-based architecture follows the same logical pipeline as the previously described GAN-based architecture for signal denoising but differs from both conventional GANs and standard RaGANs in several fundamental aspects. To begin with, unlike conventional GANs, the introduction of a relativistic average discriminator allows the proposed RaGAN-based architecture to overcome the limitations outlined earlier. By evaluating the relative likelihood that real samples appear more authentic than generated ones, instead of relying on absolute classifications, the relativistic approach reduces instability during training. This comparative mechanism minimizes sensitivity to imbalances between the generator and discriminator, thus fostering a more stable optimization process. Additionally, the relativistic discriminator enhances convergence by providing richer, more informative gradients during adversarial training. Lastly, this comparative approach enables the model to better handle complex dynamic patterns by concentrating on the relative distinctions between noisy and clean signals, thus enhancing the ability of the generator to adapt to complex temporal and spectral variations in the input data. In addition, unlike standard RaGANs, the proposed architecture integrates Bi-LSTM networks as the core computational units for both the generator and discriminator. This modification introduces significant advantages, as Bi-LSTMs are specifically designed to capture both forward and backward temporal dependencies in sequential data, thus making them well-suited for analyzing the waveform and time-dimension characteristics of noisy signals. By processing the signal bidirectionally, Bi-LSTMs enable the architecture to extract contextual information from both past and future signal states, providing a comprehensive understanding of the input.

Following the architecture presented in Figure 3 (left), the generator is designed to transform noisy amplitudes extracted from CSI into clean, interference-free counterparts. The input to the generator is represented by a matrix $X \in R^{n \times K}$ , where $n$ is the number of packets and $K$ is the number of subcarriers. The initial stage of the generator consists of a fully connected layer that maps the input from its original dimensionality, $R^{n \times K}$ , into a latent space, $R^{n \times d}$ , where $d$ is a predefined dimension. This projection is defined as follows:

$$X^{'} = \psi (W_{1} \cdot X + b_{1}), \tag{7}$$

where $W_{1}$ and $b_{1}$ represent the weight matrix and bias vector, respectively, and $\psi (\cdot)$ is a nonlinear activation function designed to introduce a small slope in the negative region to prevent gradient saturation (i.e., Leaky ReLU). The latent representation $X^{'}$ is then processed by a Bi-LSTM layer, which captures both forward and backward temporal dependencies. For each timestep $t$, the forward and backward hidden states, $\overrightarrow{h_{t}}$ and $\overleftarrow{h_{t}}$, are concatenated to form the complete hidden state:

$$h_{t} = [\overrightarrow{h_{t}}, \overleftarrow{h_{t}}] \in R^{2 d}. \tag{8}$$

Subsequently, the Bi-LSTM processes the sequence to generate an output matrix:

$$H = [h_{1}, h_{2}, \dots, h_{n}] \in R^{n \times 2 d}, \tag{9}$$

which is passed through a final fully connected layer to map the data back to its original dimensions:

$$Y^{'} = \sigma (W_{2} \cdot H + b_{2}), \tag{10}$$

where $W_{2} \in R^{K \times 2 d}$ and $b_{2} \in R^{K}$ are the weight matrix and bias vector of the concluding fully connected layer, and $\sigma (\cdot)$ is a nonlinear activation function designed to map input values to the range $[0, 1]$, making it suitable for representing probabilities and normalizing outputs (i.e., Sigmoid).
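The two fully connected stages of Eqs. (7) and (10) can be sketched in plain Python as a per-packet affine map followed by an activation. This is only an illustration under stated assumptions: the negative slope of 0.01 for the Leaky ReLU is not given in the text, the weights are applied row by row (one row per packet), the matrix `H` is a hand-written stand-in for the Bi-LSTM output of Eq. (9), and all numeric values are hypothetical.

```python
import math

def leaky_relu(x, negative_slope=0.01):
    # Small slope in the negative region (slope value is an assumption).
    return x if x >= 0.0 else negative_slope * x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(rows, weights, bias, activation):
    """Apply activation(W x + b) to every row x (one row per packet)."""
    return [
        [activation(sum(w * v for w, v in zip(w_i, x)) + b_i)
         for w_i, b_i in zip(weights, bias)]
        for x in rows
    ]

# Toy sizes: n = 2 packets, K = 3 subcarriers, d = 2 latent units.
X = [[0.5, -1.0, 2.0], [1.0, 0.0, -0.5]]
W1 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]      # shape d x K
b1 = [0.0, -1.0]
X_latent = dense(X, W1, b1, leaky_relu)        # Eq. (7), shape n x d

# Stand-in for the Bi-LSTM output H of Eq. (9), shape n x 2d.
H = [[0.1, -0.2, 0.3, 0.4], [2.0, -3.0, 0.0, 1.0]]
W2 = [[1.0, 1.0, 1.0, 1.0], [-2.0, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0]]  # K x 2d
b2 = [0.0, 0.0, 0.0]
Y = dense(H, W2, b2, sigmoid)                  # Eq. (10), shape n x K
```

The sigmoid keeps every output strictly between 0 and 1, which matches amplitudes that have been normalized to that range.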

Following the architecture presented in Figure 3 (right), the discriminator is designed to evaluate the authenticity of the amplitudes generated by the generator. Its role is to differentiate between clean amplitudes from the ground truth and those synthesized by the generator. This is achieved through a Bi-LSTM network complemented by two sequential fully connected layers. The discriminator takes as input the same matrix $X \in R^{n \times K}$ , representing either real or generated amplitudes. The Bi-LSTM processes the temporal structure of $X$ and outputs a matrix $H \in R^{n \times 2 d}$ , analogous to the hidden states of the generator. The first fully connected layer reduces the dimensionality:

$$H^{'} = \psi (W_{3} \cdot H + b_{3}), \tag{11}$$

where $W_{3} \in R^{d \times 2 d}$ and $b_{3} \in R^{d}$. The second fully connected layer computes the final output:

$$z = \sigma (W_{4} \cdot H^{'} + b_{4}), \tag{12}$$

where $W_{4} \in R^{1 \times d}$ and $b_{4} \in R^{1}$. The output $z$ is a scalar indicating the probability that the input belongs to the real data distribution.

It is important to emphasize that the functionality of the Bi-LSTM differs between the generator and the discriminator. In the generator, Bi-LSTM is employed to capture temporal correlations within noisy inputs, thus enabling the network to reconstruct clean amplitudes that closely approximate their interference-free counterparts. This process involves learning a mapping that adapts to variations introduced by environmental noise, thereby ensuring effective denoising. Conversely, in the discriminator, Bi-LSTM is utilized to differentiate temporal patterns between real (i.e., clean) and generated (i.e., denoised) signals. This is achieved by analyzing the sequential structure of the input, where bidirectional processing enhances the ability of the network to detect inconsistencies in the generated samples.

A final innovation introduced in the proposed model concerns the loss functions. Specifically, while the discriminator employs the relativistic average loss, a standard choice for RaGAN-based architectures, the loss function of the generator has been customized. Inspired by the work of Ledig et al.,⁷⁵ the proposed model incorporates an overall loss function for the generator that combines a content loss and a confrontation (i.e., adversarial) loss, so that the generator is guided by both content information and the confrontation between signal patterns. Formally, the customized loss function is expressed as follows:

$$L_{G} = L_{c} + \lambda L_{G}^{Ra}, \tag{13}$$

where $L_{c}$ and $L_{G}^{Ra}$ represent the content loss and confrontation loss, respectively, and $\lambda$ is the coefficient balancing the contributions of these loss components. By jointly optimizing the content and confrontation losses, the generator effectively restores noisy signals $s$ to their noise-free counterparts $s_{r}$. The content loss function, $L_{c}$, is made up of two components, i.e., $L_{MSE}$ and $L_{1}$, more specifically:

$$L_{c} = \frac{L_{MSE} + L_{1}}{2}, \tag{14}$$

where $L_{MSE}$ and $L_{1}$ quantify the Mean Square Error (MSE) and the mean Absolute Error (AE) between the denoised amplitude $G (s)$ and the clean amplitude $s_{r}$, respectively. Note that $G (s)$ is the output of the generator and $n$ denotes the length of a single sample. Formally, the two measures are expressed as follows:

$$L_{MSE} = \frac{1}{n} \sum_{i = 1}^{n} (s_{r_{i}} - G (s)_{i})^{2}, \tag{15}$$

$$L_{1} = \frac{1}{n} \sum_{i = 1}^{n} | s_{r_{i}} - G (s)_{i} | . \tag{16}$$
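Equations (13) to (16) translate directly into a few lines of code. The following sketch assumes flat Python lists for the clean and denoised amplitudes and treats the confrontation term as a precomputed scalar; the function names are hypothetical, and the default weight of 100 is the value reported as optimal in Table 3.

```python
def content_loss(clean, denoised):
    """Eq. (14): average of the MSE (Eq. 15) and the L1 term (Eq. 16)."""
    n = len(clean)
    mse = sum((r - g) ** 2 for r, g in zip(clean, denoised)) / n
    l1 = sum(abs(r - g) for r, g in zip(clean, denoised)) / n
    return (mse + l1) / 2.0

def generator_loss(clean, denoised, confrontation_loss, lam=100.0):
    """Eq. (13): content loss plus the weighted confrontation loss."""
    return content_loss(clean, denoised) + lam * confrontation_loss

# Toy example (hypothetical values): each point is off by 0.5.
s_r = [1.0, 0.0]
g_s = [0.5, 0.5]
l_c = content_loss(s_r, g_s)                      # (0.25 + 0.5) / 2
l_g = generator_loss(s_r, g_s, 0.5, lam=2.0)      # 0.375 + 2 * 0.5
```

Averaging the squared and absolute errors penalizes large deviations strongly while keeping a non-vanishing gradient for small ones.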

3.3. Multi-Class SVM

Through iterative adversarial training, the RaGAN enhances the ability of the generator to produce amplitudes that closely resemble their ground truth environment-free representations. Following training, the generator is deployed as a digital shielding mechanism that denoises real-world signal amplitudes on the fly, restoring them to their domain-free forms. To evaluate the effectiveness of the proposed method, a practical and challenging task was selected, i.e., material classification. This task involves distinguishing between four representative materials, i.e., acrylic, aluminum, copper, and pine, by analyzing the unique modifications induced in Wi-Fi signals as they propagate through materials of different composition. Note that the four materials can be considered highly representative of the objects and items commonly present in real-world environments.

To evaluate the performance of the proposed domain adaptation method, a multi-class SVM classifier was employed. This choice was motivated by two key considerations. On the one hand, the SVM is generally regarded as a robust and reliable classifier. On the other hand, its performance is known to degrade when dealing with particularly complex, non-linear, and noisy data. Given the nature of the Wi-Fi signal amplitudes, characterized by significant interference and a high degree of disorder, a poorly performing denoising method would have resulted in low classification accuracy. However, as demonstrated in the experimental section, the classification accuracy achieved in the material classification task remains remarkably high, despite the complexity of the input.

4. Experimental setup and results

This section describes the main stages involved in obtaining the experimental results. The first subsection describes the construction of the shielded box and the acquisition of the data used to train the RaGAN-based architecture and test the multi-class SVM. The second details the training of the architecture and presents the accuracy achieved in the material classification case study, demonstrating the effectiveness of the domain adaptation method. The third reports an ablation study of the architecture together with baseline models.

4.1. Shielded box and data collection

As shown in Figure 4, a shielded box was constructed to replicate the effects of a Faraday cage. The structure, made of acrylic due to its mechanical properties, has internal dimensions of $150 \times 100 \times 100$ cm. The external surfaces of each panel of the box were lined with high-performance electromagnetic shielding fabric, designed to minimize external radio frequency interference and ensure a fully isolated, noise-free internal environment for signal acquisition. To facilitate measurements, the box was designed with a removable upper panel, thus allowing for the easy placement and adjustment of objects inside. Additionally, two openings were placed on opposite sides of the box to accommodate two ESP32 Wi-Fi-enabled microcontrollers.⁷⁶ One device was used for signal transmission and the other for signal reception.

Figure 4.

Acrylic box lined with electromagnetic shielding fabric, designed to replicate the effects of a Faraday cage. The box isolates objects, enabling the RaGAN to learn the impact of physical shielding. The legend shows four cubes made of different materials (acrylic, aluminum, copper, and pine), each with dimensions of $2 \times 2 \times 2$ cm, which were used in the classification task to validate the effectiveness of the proposed denoising method. Two ESP32 devices are positioned within the box, serving as the transmitter and receiver for Wi-Fi signals.

The ESP32 is a highly versatile device known for its dual-core processor and integrated Wi-Fi capabilities. Operating at 2.4 GHz and supporting the IEEE 802.11n standard, the ESP32 is configured in this work with a SISO setup involving one transmitting and one receiving antenna. This configuration simplifies the representation of CSI while preserving the essential characteristics of the signal. Leveraging the OFDM modulation scheme, a key feature of the IEEE 802.11n standard, the ESP32 divides the available bandwidth into multiple subcarriers, thus enhancing robustness against multipath fading and interference and ensuring data integrity during transmission. In the SISO configuration, the ESP32 achieves a theoretical maximum transmission speed of 150 Mbps with a 40 MHz channel width. This capability makes it well-suited for Wi-Fi signal denoising tasks, providing high spectral efficiency and enabling precise extraction of channel properties, including amplitude.

In the experimental setup, one ESP32 device functions as a transmitter, sending Wi-Fi packets at a constant rate of 100 packets per second to create a stable signal source. The transmitted signal interacts with the object placed inside the shielded box, and the resulting signal is received by the second ESP32 device configured as a receiver. The receiver records the CSI, capturing variations in amplitude introduced by the material properties of the object during signal propagation. To ensure high-quality data collection, the ESP32 receiver is configured to extract CSI data from 64 subcarriers, thus providing a detailed frequency-domain analysis of the channel. Both ESP32 devices are synchronized during acquisition to maintain consistency in packet transmission and reception. This setup ensures accurate and reproducible measurements of Wi-Fi signals under both shielded and unshielded conditions. The modular design of the box and the adaptability of the ESP32 devices contribute to a controlled and flexible experimental environment, enabling the reliable acquisition of clean Wi-Fi signal data for various test scenarios. Table 1 reports the main parameters selected for configuring the device settings.

As previously mentioned, the CSI of each Wi-Fi packet transmitted under the IEEE 802.11n standard is reported over 64 subcarriers. Of these, 52 are actively used for data transmission, while the remaining 12 are pilot and guard subcarriers that carry no payload. Pilot subcarriers serve as reference signals that assist with channel estimation and phase correction, ensuring synchronization and stability during transmission. Guard subcarriers, positioned at the edges of the frequency spectrum, help mitigate inter-channel interference and prevent spectral leakage. Together, these non-data subcarriers contribute to the overall robustness and reliability of the communication channel.

Table 1.

ESP32 – device settings.

Parameter	Transmitter (Tx)/Receiver (Rx)
Wi-Fi Standard	IEEE 802.11n
Modulation	OFDM
Frequency	2.4 GHz
Configuration	SISO
Bandwidth	40 MHz
N° of Packets	100/s
N° of Subcarriers	64
Acquisition Time	10 s
Output Data	N/A(Tx)/Raw-CSI(Rx)
Synchronization	Clock(Tx)/With-Tx(Rx)
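The settings in Table 1 are consistent with the $1000 \times 52$ input size reported for the generator, as the short calculation below suggests. The reading that one sample corresponds to one 10 s acquisition restricted to the 52 data subcarriers is an inference from the text rather than an explicit statement, and the constant names are hypothetical.

```python
PACKET_RATE_HZ = 100       # Table 1: packets per second
ACQUISITION_TIME_S = 10    # Table 1: acquisition time
TOTAL_SUBCARRIERS = 64     # Table 1: subcarriers reported per packet
DATA_SUBCARRIERS = 52      # subcarriers actively used for data

# Packets recorded during one acquisition.
packets_per_acquisition = PACKET_RATE_HZ * ACQUISITION_TIME_S

# Pilot and guard subcarriers that carry no payload.
non_data_subcarriers = TOTAL_SUBCARRIERS - DATA_SUBCARRIERS

# Assumed shape of one amplitude sample: packets x data subcarriers.
sample_shape = (packets_per_acquisition, DATA_SUBCARRIERS)
```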

Now that the acquisition specifications, such as the number of packets and subcarriers, have been detailed, the network configuration can be quantified to facilitate the reproducibility of the proposed method. Referring to Figure 3 (left), the generator consists of two fully connected layers and a Bi-LSTM layer, designed to effectively capture both the waveform and time-dimension characteristics of the noisy signal. The input and output dimensions of the generator are both set to $1000 \times 52$ . Initially, the input tensor undergoes normalization, after which it is passed through a fully connected layer with an input size of $52$ and an output size of $256$ . To enhance generalization, a dropout layer with a probability of $0.3$ is applied to the tensor. Subsequently, a Leaky ReLU activation function is introduced to add non-linearity. The tensor is then processed by a Bi-LSTM layer with an input size of $256$ and a hidden size of $256$ . Since the Bi-LSTM processes the input bidirectionally, it produces an output with a dimension of $512$ . The output of the Bi-LSTM is then fed into another fully connected layer with an input size of $512$ , which restores the overall output dimensions of $1000 \times 52$ . Another dropout layer, with a dropout probability of $0.3$ , is applied to this tensor, followed by another Leaky ReLU activation. Finally, the tensor undergoes layer normalization to ensure numerical stability and better convergence.

Referring to Figure 3 (right), the discriminator network consists of two fully connected layers and a Bi-LSTM layer. Initially, the input tensor undergoes layer normalization. It is then passed through a Bi-LSTM layer with an input size of $52$ and a hidden size of $256$ . Following the Bi-LSTM, the output corresponding to the last time step is processed by a fully connected layer with an input size of $512$ (i.e., bidirectional output size) and an output size of $256$ . A dropout layer, with a probability of $0.3$ , is applied to the tensor, followed by a Leaky ReLU activation to introduce non-linearity. The tensor is then passed through another fully connected layer with an input size of $256$ and an output size of $1$ , representing the probability of the input being real.
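The layer sizes listed above can be checked for consistency with a small shape trace. The sketch below is a bookkeeping aid only, not a trainable model: it follows the feature size per time step, omits dropout, activation, and normalization layers because they do not change shapes, and assumes that the last generator layer maps each 512-dimensional time step back to 52 features so that 1000 time steps give the reported $1000 \times 52$ output. The layer names are hypothetical.

```python
# (name, input features, output features) per time step.
GENERATOR = [
    ("fc_in", 52, 256),
    ("bi_lstm", 256, 2 * 256),   # hidden size 256 in each direction
    ("fc_out", 512, 52),         # assumption: 52 features per time step
]
DISCRIMINATOR = [
    ("bi_lstm", 52, 2 * 256),
    ("fc_1", 512, 256),          # applied to the last time step only
    ("fc_2", 256, 1),
]

def trace_features(layers, in_features):
    """Follow the feature size through a stack and reject mismatches."""
    sizes = [in_features]
    for name, expected_in, out_features in layers:
        if sizes[-1] != expected_in:
            raise ValueError(
                f"{name}: expected {expected_in} features, got {sizes[-1]}"
            )
        sizes.append(out_features)
    return sizes

generator_sizes = trace_features(GENERATOR, 52)
discriminator_sizes = trace_features(DISCRIMINATOR, 52)
```

A mismatch between consecutive layers raises a `ValueError`, which makes transcription errors in the configuration easy to spot.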

The dropout rate of $0.3$ was selected based on established best practices in the literature,⁷⁷,⁷⁸ particularly for architectures designed to handle high-dimensional data like Wi-Fi signal amplitudes. This value provided a good balance between preventing overfitting and retaining critical information for effective learning. Initial experiments with alternative dropout rates, e.g., $0.2$ and $0.4$ , demonstrated that $0.3$ offered the most consistent performance in terms of convergence and generalization on validation data.

An ad-hoc dataset was collected to train the RaGAN-based architecture and evaluate its signal denoising capabilities using a multi-class SVM. Four objects with identical sizes of $2 \times 2 \times 2$ cm but composed of different materials, i.e., acrylic, aluminum, copper, and pine, were acquired following the parameters specified in Table 1. The objects were positioned at the center of the box, as shown in Figure 4, and acquisitions were performed twice. The first set of acquisitions was conducted inside the shielded box, thus ensuring noise-free conditions, while the second set was performed with all the shielding panels removed, maintaining the same distance between devices, thus reflecting real-world conditions with noise. As a result, two datasets were obtained, $D_{S}$ , i.e., shielded dataset, containing the spectra of the four objects under shielded conditions, and $D_{U}$ , i.e., unshielded dataset, containing the spectra of the same objects under unshielded conditions. More specifically, inside the shielded box, each of the four objects was acquired $30$ times, along with $30$ background noise recordings without any objects present. Similarly, outside the shielded box, $30$ acquisitions were performed for each object and $30$ for the background noise. To ensure generalization, particularly in the noisy environment outside the box, acquisitions were conducted on different days and at varying times. In total, $300$ acquisitions were collected, with $150$ obtained under shielded conditions and $150$ in real-world noisy conditions. It is important to note that the use of only four materials for training and validation should not be seen as a limitation. First, the selected materials can be considered highly representative of the objects and items commonly found in real-world environments. 
Second, the task of recognizing the material composition of an object using Wi-Fi signals that pass through it is inherently challenging and complex, and remains an area still underexplored in the current literature.
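The dataset sizes stated above follow from a simple enumeration of conditions, classes, and repetitions. The manifest structure and the label strings in this sketch are illustrative assumptions; only the counts come from the text.

```python
CLASSES = ["acrylic", "aluminum", "copper", "pine", "background"]
CONDITIONS = ["shielded", "unshielded"]   # D_S and D_U
ACQUISITIONS_PER_CLASS = 30

# One entry per acquisition: (condition, label, repetition index).
manifest = [
    (condition, label, index)
    for condition in CONDITIONS
    for label in CLASSES
    for index in range(ACQUISITIONS_PER_CLASS)
]

# Number of acquisitions in each dataset.
per_condition = {
    c: sum(1 for condition, _, _ in manifest if condition == c)
    for c in CONDITIONS
}
```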

4.2. RaGAN training and classification

Once the datasets $D_{S}$ and $D_{U}$ were prepared, each consisting of $150$ files (i.e., $30$ per object and $30$ for background noise), the RaGAN model was trained on both to establish and refine the ability of the generator to denoise Wi-Fi signals. To further evaluate the performance of the model, experiments were conducted using two distinct versions of the datasets. The first version included the complete datasets, consisting of all $150$ files from each set. The second set of experiments utilized subsets of the datasets, each comprising data for all objects and background noise, but limited to acquisitions performed on a single day. Over a span of ten days, each day included the acquisition of all four objects and background noise both inside and outside the shielded box, resulting in a total of $30$ acquisitions per day. This setup allowed for a detailed examination of the ability of the model to generalize across temporal variations while maintaining a robust evaluation framework. Based on the described setup, a grid search approach⁷⁹ was employed to optimize the hyperparameters of the RaGAN model and assess its performance across different temporal subsets. The ranges of the hyperparameters explored during this process and the optimal values selected after evaluation on the validation set are summarized in Tables 2 and 3, respectively. In both tables, $G$ and $D$ denote the generator and discriminator components of the proposed model.

Table 2.
Grid search hyperparameter ranges.

Hyperparameter	Range
Batch size	{5, 10, 15, 30}
Dropout	{0.2, 0.3, 0.4}
Learning rate (G)	{0.0001, 0.001, 0.01, 0.1}
Learning rate (D)	{0.0001, 0.001, 0.01, 0.1}
Optimizer	{AdamW}
$λ$	{10, 50, 100, 500}

Table 3.

Optimal hyperparameter set.

Hyperparameter	Value
Batch size	5
Dropout	0.3
Learning rate (G)	0.001
Learning rate (D)	0.0001
Optimizer	AdamW
$λ$	100
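The search space of Table 2 can be enumerated with a Cartesian product, which also shows how many configurations a full grid search would visit. This sketch assumes an exhaustive grid over all listed values; the dictionary keys and the helper `iter_configs` are hypothetical names, and the training and validation steps are deliberately left out.

```python
from itertools import product

# Ranges from Table 2.
GRID = {
    "batch_size": [5, 10, 15, 30],
    "dropout": [0.2, 0.3, 0.4],
    "lr_generator": [0.0001, 0.001, 0.01, 0.1],
    "lr_discriminator": [0.0001, 0.001, 0.01, 0.1],
    "optimizer": ["AdamW"],
    "lambda": [10, 50, 100, 500],
}

def iter_configs(grid):
    """Yield one dictionary per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(GRID))

# Optimal values from Table 3.
best = {
    "batch_size": 5,
    "dropout": 0.3,
    "lr_generator": 0.001,
    "lr_discriminator": 0.0001,
    "optimizer": "AdamW",
    "lambda": 100,
}
```

Under this assumption the grid contains 4 × 3 × 4 × 4 × 1 × 4 = 768 candidate configurations, of which the one in Table 3 was retained.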

The main objective of applying the RaGAN model was to create a new dataset $D_{S}^{'}$ of reconstructed and denoised signals, derived from the unshielded dataset $D_{U}$ . The effectiveness of RaGAN in achieving this goal is demonstrated through a qualitative analysis and a quantitative analysis based on a multi-class SVM classifier. Figure 5 continues the example of Figure 2, which displayed the amplitudes extracted from the CSI for the copper cube in the unshielded real-world environment (top) and in the shielded environment (bottom), the latter representing the ground truth spectrum obtained in a noise-free setting. Specifically, Figure 5 presents the reconstructed amplitude for the same copper cube acquired in the unshielded environment illustrated in Figure 2 (top). The results show that the spectra reconstructed by RaGAN closely match the ground truth, accurately reflecting the key features and attenuation patterns linked to the shielded environment. This emphasizes the capability of the model to produce a denoised dataset, $D_{S}^{'}$ , that effectively represents the shielded spectrum.

Figure 5.

Amplitudes extracted from the CSI: the plot shows the reconstructed spectrum of the copper cube derived from data acquired in a real-world environment.

To quantitatively evaluate the spectra synthesized by the RaGAN model, a multi-class SVM classifier was employed. The classification task aimed to distinguish between spectra corresponding to different materials, each characterized by unique spectral features influenced by its physical and electromagnetic properties. The evaluation relied on both the shielded spectra, which served as the ground truth for training the classifier, and the unshielded spectra, which served as the input to the generator. This approach simultaneously assesses the similarity between the generated and shielded spectra and the capacity of the model to retain the distinct features that differentiate materials. The multi-class SVM was first trained on the shielded spectra to learn material-specific features, e.g., attenuation patterns and frequency responses, which allow the classifier to differentiate between spectra of different materials. The trained classifier was subsequently tested on the spectra reconstructed by the RaGAN generator, achieving an accuracy of 96%. This result demonstrates that the proposed method effectively removes noise from the unshielded input while preserving the distinguishing characteristics that make each material identifiable. The confusion matrix, presented in Figure 6, further illustrates the classification performance, highlighting high accuracy in differentiating the reconstructed spectra. Most classifications fall on the main diagonal, thus confirming the ability of the RaGAN-generated spectra to retain the distinctive features of the specific materials. In addition to the classification-based evaluation, a Mean Squared Error (MSE)⁸⁰ analysis was conducted to quantify the pointwise similarity between the signals reconstructed by the RaGAN model and the corresponding ground truth spectra acquired in the shielded environment.
The reported MSE of approximately 0.19, computed as an average across all pairs of ground truth and reconstructed signals obtained during experimentation, was calculated after normalizing the signal amplitudes to the range from zero to one. This normalization ensures a consistent evaluation metric across all samples and facilitates fair comparisons between different signals. While MSE offers a useful numerical indication of the overall reconstruction quality, it does not necessarily capture the preservation of discriminative spectral features that are essential for classification tasks. In Wi-Fi sensing scenarios, such as the material identification task proposed in this study, the primary objective is not only to reduce noise but also to preserve semantic information that enables class separability. In this regard, the classification accuracy achieved by the SVM, 96% on the reconstructed spectra, demonstrates that the denoised signals retain the critical features required to distinguish among different materials. Taken together, the MSE evaluation, classification accuracy, and visual inspection of the signal profiles provide a multi-faceted and robust assessment of the performance of the model. This integrated evaluation framework confirms the ability of the proposed method to effectively suppress environment-specific interference while preserving the information necessary for downstream classification.
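A normalized MSE of this kind can be sketched as follows. The text does not specify how the amplitudes were scaled, so the joint min-max scaling used here (one range shared by the reference and the reconstruction) is an assumption, as are the function names and toy values.

```python
def min_max_normalize(values, lo, hi):
    """Scale values to [0, 1] given the range [lo, hi]."""
    span = hi - lo
    if span == 0:
        raise ValueError("cannot normalize a constant range")
    return [(v - lo) / span for v in values]

def normalized_mse(reference, reconstructed):
    """MSE after scaling both signals with the range of their joint values."""
    lo = min(min(reference), min(reconstructed))
    hi = max(max(reference), max(reconstructed))
    ref = min_max_normalize(reference, lo, hi)
    rec = min_max_normalize(reconstructed, lo, hi)
    return sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)

# Toy pair (hypothetical amplitudes): one of two points is off by half the range.
score = normalized_mse([0.0, 10.0], [0.0, 5.0])
```

Averaging this score over all pairs of shielded and reconstructed spectra would give a single figure comparable to the one reported above.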

Figure 6.

SVM classifier: confusion matrix.

4.3. Ablation study and analysis

The final architecture of our model was selected through empirical testing of different combinations of layers and generative techniques. We began by evaluating various configurations of Bi-LSTM layers, including setups with and without linear layers for data processing. The resulting performance data are summarized in Table 4. The generative Bi-LSTM models exhibited the same accuracy whether they employed only LSTM layers or combined a single LSTM layer with linear layers. Given the significant increase in the number of parameters required when using only LSTM layers, we adopted the lighter model, which is less complex and matches the performance of the more parameter-heavy configuration. This finding indicates that linear layers can perform as effectively as additional LSTM layers while simplifying the architecture. In addition to the RaGAN method, we examined the CGAN approach through the same analytical framework. The Bi-LSTM_CGAN achieved an accuracy of 0.92, comparable to but lower than the 0.96 of the Bi-LSTM_RaGAN. To provide a comprehensive view of the experimental framework, we also report the baseline models used in our analysis. These include a Denoising Autoencoder (DAE) and an SVM, both applied to the preprocessed noisy data collected during the testing phase. The DAE is a straightforward implementation consisting of four hidden layers in both the encoder and decoder components.
The SVM used as a baseline, in turn, is the same one employed to classify the signals generated by the GAN models. It is based on a Gaussian radial basis function kernel with parameter $\gamma = 1 / (2 \sigma^{2})$ , enabling robust classification performance within the context of our analyses. Assessing these models provided deeper insight into their effectiveness and suitability for the task at hand.
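For reference, the Gaussian radial basis function kernel with $\gamma = 1 / (2 \sigma^{2})$ evaluates to $\exp(-\gamma \lVert x - y \rVert^{2})$. The sketch below computes it for two feature vectors; the function name and the default $\sigma = 1$ are illustrative assumptions, since the value of $\sigma$ used in the experiments is not stated.

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2), gamma = 1 / (2 sigma^2)."""
    gamma = 1.0 / (2.0 * sigma ** 2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Identical spectra give a similarity of 1; the value decays with distance.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0])   # exp(-1) for sigma = 1
```

A larger $\sigma$ widens the kernel, so spectra that differ more are still treated as similar.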

Table 4.
Ablation study.

Model	Accuracy
Bi-LSTM_RaGAN / w linear	0.96
Bi-LSTM_RaGAN / wo linear	0.96
Bi-LSTM_CGAN / w linear	0.92
Bi-LSTM_CGAN / wo linear	0.92
SVM	0.72
DAE	0.88

5. Conclusion

To the best of our knowledge, this paper is the first to introduce a RaGAN-based architecture with customized Bi-LSTM-based generator and discriminator networks, specifically designed to denoise Wi-Fi signals by digitally replicating the effects of a Faraday cage. The effectiveness of this foundational study on cross-domain adaptation is demonstrated through a challenging material classification task. Wi-Fi signals of four objects made of distinct materials (i.e., acrylic, aluminum, copper, and pine) were acquired in real-world scenarios and subsequently purged of environment-specific information using the proposed architecture. A multi-class classifier, trained on interference-free counterparts acquired within the shielded box, was then used to classify the denoised signals, achieving an accuracy of 96%. This result indicates that the domain adaptation and denoising mechanism effectively restores signal fidelity. The high classification accuracy also points to the potential of the approach for advanced security applications: by reliably distinguishing materials based on the denoised spectra, the method offers a way to identify the nature and composition of objects, including those that may be carried by individuals in sensitive or high-security environments.

In future work, a more comprehensive dataset will be collected, including a broader variety of materials. The proposed model will also be evaluated in more complex real-world environments to assess its generalizability. Finally, while this study adopted a multi-class SVM to validate the effectiveness of the signal purification process, future efforts will explore advanced classification frameworks specifically tailored to material recognition tasks. These may include state-of-the-art deep learning models, such as Neural Dynamic Classification (NDC) algorithms,⁸¹ Dynamic Ensemble Learning (DEL) techniques,⁸² Finite Element Machines (FEMa) for fast learning,⁸³ and self-supervised learning approaches.⁸⁴,⁸⁵
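As an illustration of the validation protocol summarized above (fit a classifier on shielded spectra only, then score it on denoised spectra), the following minimal sketch may help. It is not the authors' pipeline: the spectra are synthetic, the 8-bin prototypes and noise levels are hypothetical, and a nearest-centroid rule stands in for the multi-class SVM so that the example needs only the Python standard library.

```python
import random
from statistics import fmean

MATERIALS = ["acrylic", "aluminum", "copper", "pine"]


def make_spectra(prototypes, noise, per_class, rng):
    """Return (spectrum, label) pairs: each class prototype plus Gaussian noise."""
    samples = []
    for label, proto in prototypes.items():
        for _ in range(per_class):
            samples.append(([v + rng.gauss(0.0, noise) for v in proto], label))
    return samples


def fit_centroids(samples):
    """Average the training spectra of each class into one centroid."""
    by_class = {}
    for spectrum, label in samples:
        by_class.setdefault(label, []).append(spectrum)
    return {label: [fmean(col) for col in zip(*rows)]
            for label, rows in by_class.items()}


def predict(centroids, spectrum):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(spectrum, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(centroids, s) == y for s, y in samples)
    return hits / len(samples)


rng = random.Random(0)
# Hypothetical 8-bin prototype spectra, one per material (synthetic data).
prototypes = {m: [rng.uniform(0.0, 1.0) for _ in range(8)] for m in MATERIALS}

# Stand-in for spectra acquired inside the shielded box (training set).
shielded = make_spectra(prototypes, noise=0.02, per_class=20, rng=rng)
# Stand-in for generator outputs, i.e. denoised out-of-box spectra (test set).
denoised = make_spectra(prototypes, noise=0.05, per_class=20, rng=rng)

centroids = fit_centroids(shielded)  # the classifier never sees denoised data
print(f"accuracy on denoised spectra: {accuracy(centroids, denoised):.2f}")
```

The point of the split is that the classifier is fitted exclusively on domain-free data, so its accuracy on the denoised set serves as a proxy for how closely the denoised spectra match their shielded counterparts. In a closer reproduction, an SVM implementation would replace `fit_centroids` and `predict`.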

Footnotes

Acknowledgement

This work was supported by the “EYE-FI.AI: going bEYond computEr vision paradigm using wi-FI signals in AI systems” project of the Italian Ministry of Universities and Research (MUR) within the PRIN 2022 Program (Grant Number: 2022AL45R2; CUP: B53D23012950001) and by the MICS (Made in Italy – Circular and Sustainable) Extended Partnership, and received funding from Next-Generation EU (Italian PNRR – M4 C2, Invest 1.3 – D.D. 1551.11-10-2022, PE00000004; CUP MICS: B53C22004130001).

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Kerr, Ostrovsky. From space to species: ecological applications for remote sensing. Trends Ecol Evol (Amst.) 2003; 18: 299–305.
2. Zhang, Zhao, et al. Progress and challenges in intelligent remote sensing satellite systems. IEEE J Selec Top Appl Earth Observ Remote Sens 2022; 15: 1814–1822.
3. Chen, Zhou, Voogt, et al. Remote sensing of diverse urban environments: From the single city to multiple cities. Remote Sens Environ 2024; 305: 114108.
4. Tsutsui, Nakatani, Kamitani, et al. Polarization and propagation property of electromagnetic pulses in the earth. In Proceedings of the IEEE international geoscience and remote sensing symposium (IGARSS). pp.838–841.
5. Fan, Zheng, Yan, et al. Unsupervised person re-identification: Clustering and fine-tuning. ACM Trans Multim Comput, Commun, Appl 2018; 14: 1–18.
6. Wang, Liu, Raychaudhuri, et al. Learning person re-identification models from videos with weak supervision. IEEE Trans Image Process 2021; 30: 3017–3028.
7. Liu, Zhu, et al. Instance-level adversarial source-free domain adaptive person re-identification. ACM Trans Multim Comput, Commun, Appl 2024; 20: 1–18.
8. Quesada, León. Unsupervised markerless 3-dof motion tracking in real time using a single low-budget camera. Int J Neural Syst 2012; 22: 1250019.
9. Marshall, Srikanth. Curved trajectory prediction using a self-organizing neural network. Int J Neural Syst 2000; 10: 59–70.
10. Wang, Shi, Cheng, et al. Visual object tracking based on light-field imaging in the presence of similar distractors. IEEE Trans Ind Inform 2023; 19: 2705–2716.
11. You, Yao, Bao, et al. Multi-object tracking with spatial-temporal tracklet association. ACM Trans Multim Comput, Commun, Appl 2024; 20: 1–21.
12. Urdiales, Martín, Armingol. An improved deep learning architecture for multi-object tracking systems. Integr Comput Aided Eng 2023; 30: 121–134.
13. Subramanian, Suresh. Human action recognition using meta-cognitive neuro-fuzzy inference system. Int J Neural Syst 2012; 22: 1250028.
14. Avola, Cascio, Cinque, et al. 2-d skeleton-based action recognition via two-branch stacked lstm-rnns. IEEE Trans Multim 2020; 22: 2481–2496.
15. Koutrintzes, Spyrou, Mathe, et al. A multimodal fusion approach for human activity recognition. Int J Neural Syst 2023; 33: 2350002.
16. Feng, et al. Multi-view time-series hypergraph neural network for action recognition. IEEE Trans Image Process 2024; 33: 3301–3313.
17. López-Rubio, Luque-Baena, Domínguez. Foreground detection in video sequences with probabilistic self-organizing maps. Int J Neural Syst 2011; 21: 225–246.
18. Zhong, Zhang, et al. An adaptive background modeling method for foreground segmentation. IEEE Trans Intell Transp Syst 2017; 18: 1109–1121.
19. Romero, Lado, Mèndez. A background modeling and foreground detection algorithm using scaling coefficients defined with a color model called lightness-red-green-blue. IEEE Trans Image Process 2018; 27: 1243–1258.
20. Avola, Bernardi, Cinque, et al. Fusing self-organized neural network and keypoint clustering for localized real-time background subtraction. Int J Neural Syst 2020; 30: 2050016.
21. Goodman, et al. Integrating a statistical background-foreground extraction algorithm and svm classifier for pedestrian detection and tracking. Integr Comput Aided Eng 2013; 20: 201–216.
22. Zeng, Wang, et al. Illumination-adaptive person re-identification. IEEE Trans Multim 2020; 22: 3064–3074.
23. Zhang, Luo, Chen, et al. Illumination unification for person re-identification. IEEE Trans Circuits Syst Video Technol 2022; 32: 6766–6777.
24. Eom, Lee, et al. Disentangled representations for short-term and long-term person re-identification. IEEE Trans Pattern Anal Mach Intell 2022; 44: 8975–8991.
25. Yang, Liu, et al. Diverse feature learning network with attention suppression and part level background suppression for person re-identification. IEEE Trans Circuits Syst Video Technol 2023; 33: 283–297.
26. Idrees, Soomro, Shah. Detecting humans in dense crowds using locally-consistent scale prior and global occlusion reasoning. IEEE Trans Pattern Anal Mach Intell 2015; 37: 1986–1998.
27. Vernikos, Spyrou, Kostis, et al. A deep regression approach for human activity recognition under partial occlusion. Int J Neural Syst 2023; 33: 2350047.
28. Shan, Zhang, Yang, et al. Adaptive slice representation for human action classification. IEEE Trans Circuits Syst Video Technol 2015; 25: 1624–1636.
29. Avola, Cascio, Cinque, et al. Affective action and interaction recognition by multi-view representation learning from handcrafted low-level skeleton features. Int J Neural Syst 2022; 32: 2250040.
30. Zhou, Wang. Wifi sensing with channel state information: A survey. ACM Comput Surv 2019; 52: 1–36.
31. Wang, Jiang, Hou, et al. A survey on human behavior recognition using channel state information. IEEE Access 2019; 7: 155986.
32. Cao, Liu. Deep ai enabled ubiquitous wireless sensing: A survey. ACM Comput Surv 2021; 54: 1–35.
33. Abuhoureyah, Wong, Mohd Isira ASB. Wifi-based human activity recognition through wall using deep learning. Eng Appl Artif Intell 2024; 127: 107171.
34. Avola, Cascio, Cinque, et al. Person re-identification through wi-fi extracted radio biometric signatures. IEEE Trans Inform Foren Secur 2022; 17: 1145–1158.
35. Wang, Zhou, Panev, et al. Person-in-wifi: Fine-grained person perception using wifi. In Proceedings of the IEEE/CVF international conference on computer vision (ICCV). pp.5451–5460.
36. Chen, Song, et al. Wi-fi sensing based on ieee 802.11bf. IEEE Commun Magaz 2023; 61: 121–127.
37. Sutivong, Chiang, Cover, et al. Channel capacity and state estimation for state-dependent gaussian channels. IEEE Trans Inform Theory 2005; 51: 1486–1495.
38. Habib, Rupp. Antenna selection in polarized multiple input multiple output transmissions with mutual coupling. Integr Comput Aided Eng 2012; 19: 299–312.
39. Chen, Zhou, Lin. Cross-domain wiFi sensing with channel state information: A survey. ACM Comput Surv 2023; 55: 1–37.
40. Wang. Wifall: Device-free fall detection by wireless networks. IEEE Trans Mobile Comput 2017; 16: 581–594.
41. Wang, Liu, Shahzad, et al. Device-free human activity recognition using commercial wifi devices. IEEE J Selected Areas Commun 2017; 35: 1118–1131.
42. Ali, Liu, Wang, et al. Keystroke recognition using wifi signals. In Proceedings of the annual international conference on mobile computing and networking (MobiCom). pp.90–102.
43. Liu, Wang, Chen, et al. Tracking vital signs during sleep leveraging off-the-shelf wifi. In Proceedings of the ACM international symposium on mobile AD Hoc Networking and Computing (MobiHoc). pp.267–276.
44. Guo, Zhang, Lim, et al. A review of wavelet analysis and its applications: Challenges and opportunities. IEEE Access 2022; 10: 58869–58903.
45. Wei, Feng, et al. Comparative research on noise reduction of transient electromagnetic signals based on empirical mode decomposition and variational mode decomposition. Radio Sci 2021; 56: 1–19.
46. Fang, Chang, Tsao, et al. Channel state reconstruction using multilevel discrete wavelet transform for improved fingerprinting-based indoor localization. IEEE Sens J 2016; 16: 7784–7791.
47. Amezquita-Sanchez, Adeli. Synchrosqueezed wavelet transform-fractality model for locating, detecting, and quantifying damage in smart highrise building structures. Smart Mater Struct 2015; 24: 065034.
48. Park, Adeli. New method for modal identification of super high-rise building structures using discretized synchrosqueezed wavelet and hilbert transforms. Struct Des Tall Special Build 2017; 26: e1312.
49. Perez-Ramirez, Amezquita-Sanchez, Adeli, et al. New methodology for modal parameters identification of smart civil structures using ambient vibrations and synchrosqueezed wavelet transform. Eng Appl Artif Intell 2016; 48: 1–12.
50. Wang, Liu, et al. Qgesture: Quantifying gesture distance and direction with wifi signals. Proce ACM Interact, Mobile, Weara Ubiq Technol 2018; 2: 1–23.
51. Zhu, Mishra, et al. Calibrating time-variant, device-specific phase noise for cots wifi devices. In Proceedings of the ACM conference on embedded network sensor systems (SenSys). pp.1–12.
52. Zhang, Zheng. Wisign: Ubiquitous american sign language recognition using commercial wi-fi devices. ACM Trans Intell Syst Technol 2020; 11: 1–24.
53. Qian, Zhang, et al. Widar2.0: Passive human tracking with a single wi-fi link. In Proceedings of the annual international conference on mobile systems, applications, and services (MobiSys). pp.350–361.
54. Shi, Zhang, et al. Towards environment-independent human activity recognition using deep learning and enhanced csi. In Proceedings of the IEEE global communications conference (GLOBECOM). pp.1–6.
55. Wang, Yang, Mao. Resilient respiration rate monitoring with realtime bimodal csi data. IEEE Sens J 2020; 20: 10187–10198.
56. Shi, Cheng, Zhang, et al. Environment-robust wifi-based human activity recognition using enhanced csi and deep learning. IEEE Int Things J 2022; 9: 24643–24654.
57. Koziarski, Cyganek. Image recognition with deep neural networks in presence of noise – dealing with and taking advantage of distortions. Integr Comput Aided Eng 2017; 24: 337–349.
58. Alonso, Morán, Pérez, et al. Gap imputation in related multivariate time series through recurrent neural network-based denoising autoencoder. Integr Comput Aided Eng 2024; 31: 157–172.
59. Chen, Ren, et al. Temdnet: A novel deep denoising network for transient electromagnetic signal with signal-to-image transformation. IEEE Trans Geosci Remote Sens 2022; 60: 1–18.
60. Zhu, Mousavi, Beroza. Seismic signal denoising and decomposition using deep neural networks. IEEE Trans Geosci Remote Sens 2019; 57: 9476–9488.
61. Almazrouei, Gianini, Almoosa, et al. A deep learning approach to radio signal denoising. In Proceedings of the IEEE Wireless Communications and Networking Conference Workshop (WCNCW). pp.1–8.
62. Wang, Lin, Chen, et al. Tem-nlnet: A deep denoising network for transient electromagnetic signal with noise learning. IEEE Trans Geosci Remote Sens 2022; 60: 1–14.
63. Yang, Yuan. Extraction and denoising of human signature on radio frequency spectrums. In Proceedings of the IEEE international conference on consumer electronics (ICCE). pp.1–6.
64. Jolicoeur-Martineau. The relativistic discriminator: a key element missing from standard GAN. In Proceedings of the international conference on learning representations (ICLR). pp.1–25.
65. Peng, Fang, Fan, et al. A method of noise reduction for radio communication signal based on ragan. Sensors 2023; 23: 1–16.
66. Molisch AF. Orthogonal Frequency Division Multiplexing (OFDM). 2011. pp.417–443.
67. Zhu, Zhuo, Liu, et al. π-splicer: Perceiving accurate csi phases with commodity wifi devices. IEEE Trans Mobile Comput 2018; 17: 2155–2165.
68. Xiao, et al. Csi-based indoor localization. IEEE Trans Parallel Distrib Syst 2013; 24: 1300–1309.
69. Yin, Liu, et al. A gan guided parallel cnn and transformer network for eeg denoising. IEEE J Biomed Health Inform 2023; 1: 1–12.
70. Wang, Mahler, Steiglechner, et al. Disgan: Wavelet-informed discriminator guides gan to mri super-resolution with noise cleaning. In Proceedings of the IEEE/CVF international conference on computer vision workshops (ICCVW). pp.2444–2453.
71. Morsalin SMS, Wang, et al. Integration of GAN environmental sound filtering and the CNN sound recognition system. In Proceedings of the interdisciplinary conference on electrics and computer (INTCEC). pp.1–6.
72. Saxena, Cao. Generative adversarial networks (gans): Challenges, solutions, and future directions. ACM Comput Surv 2021; 54: 1–42.
73. Cai, Xiong, et al. Generative adversarial networks: A survey toward private and secure applications. ACM Comput Surv 2021; 54: 1–38.
74. Wang, She, Ward. Generative adversarial networks in computer vision: A survey and taxonomy. ACM Comput Surv 2021; 54: 1–38.
75. Ledig, Theis, Huszàr, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE international conference on computer vision and pattern recognition (CVPR). pp.105–114.
76. Hernandez, Bulut. Wifi sensing on the edge: Signal processing techniques and challenges for real-world systems. IEEE Commun Surv Tutor 2023; 25: 46–76.
77. Srivastava, Hinton, Krizhevsky, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014; 15: 1929–1958.
78. Wan, Zeiler, Zhang, et al. Regularization of neural networks using dropconnect. In Proceedings of the international conference on machine learning (ICML). pp.III–1058–III–1066.
79. Bergstra, Bengio. Random search for hyper-parameter optimization. J Mach Learn Res 2012; 13: 281–305.
80. Rainio, Teuho, Klén. Evaluation metrics and statistical tests for machine learning. Sci Rep 2024; 14: 1–14.
81. Rafiei, Adeli. A new neural dynamic classification algorithm. IEEE Trans Neural Netw Learn Syst 2017; 28: 3074–3083.
82. Alam KMR, Siddique, Adeli. A dynamic ensemble learning algorithm for neural networks. Neural Comput Appl 2020; 32: 8675–8690.
83. Pereira, Piteri, Souza, et al. Fema: a finite element machine for fast learning. Neural Comput Appl 2020; 32: 6393–6404.
84. Rafiei, Gauthier, Adeli, et al. Self-supervised learning for electroencephalography. IEEE Trans Neural Netw Learn Syst 2024; 35: 1457–1471.
85. Rafiei, Gauthier, Adeli, et al. Self-supervised learning for near-wild cognitive workload estimation. J Med Syst 2024; 48: 6393–6404.

