Abstract
Traditionally, the task of intercepting malicious traffic between the Internet and the internal networks of entities such as organizations and corporations has largely been fulfilled by techniques such as deep packet inspection (DPI). However, steganography, the methodology of hiding secret data in seemingly benign public mediums (e.g., images), has been leveraged by advanced persistent threat (APT) groups in recent years and is almost impossible for traditional techniques to detect and intercept, posing a pervasive and realistic threat to cybersecurity. Additionally, the vulnerability of internal networks to steganography is further exacerbated by the connectivity and large attack surface of the Internet of Things (IoT), whose adoption and deployment are quickly expanding. To protect computer systems against malicious communications that apply steganographic methods potentially unknown to cybersecurity stakeholders, we propose StegEraser, an approach to removing the secret information embedded in public mediums by adversaries. It is fundamentally distinct from existing research, which is primarily designed for known steganographic methods. Implemented for images, StegEraser uses a novel steganographic method, built on the information-merging capabilities of invertible neural networks (INNs), to inject an excessive amount of random binary data into images transmitted through the firewall performing DPI, thereby “overloading” the steganographic hiding capacity available to adversaries. Meanwhile, StegEraser preserves the perceptual quality of the images. In other words, StegEraser “defeats unknown steganography with steganography”. Extensive evaluation verifies that StegEraser significantly outperforms state-of-the-art (SOTA) methods in removing secret information embedded with both traditional and neural network-based steganographic methods, while visually maintaining the image quality.
Introduction
Internet connectivity has now become commonplace for electronic devices that are rapidly growing in numbers, such as smartphones, corporate computers, the Internet of Things (IoT), and the Industrial IoT (IIoT). For instance, with the technological advances in the industrial sector, including the IIoT and Industry 4.0, industrial systems, such as critical infrastructure, cyber-physical systems (CPSs), and factories, are no longer isolated from the Internet as they used to be [40]. Despite the benefits brought forward by such connectivity, including a higher level of automation and lower cost, cyberattacks and major data leaks have become increasingly prevalent in recent years, jeopardizing the security of computer systems. As an example, IBM estimates that the average cost of a data breach reached $4.35 million in 2022 [26]. Apart from the economic damages and privacy issues, the consequences of cyber threats also include potential Internet and power outages, or even severe damage to critical infrastructure [47].
Traditionally, the responsibility of protecting the cyber perimeter of entities like corporations, government agencies, organizations, and industrial systems largely falls on firewalls with deep packet inspection (DPI) capabilities [10,39]. However, as an emerging threat, steganography, the methodology of hiding secret information in seemingly benign public mediums (e.g., images and audio that can be analyzed by DPI), enables APT groups and malicious insiders to transmit malware or sensitive information, such as intellectual property, to and from computers on the internal network. With neural networks, researchers have proposed a variety of methods to hide secret images or even arbitrary binary data in “cover” images or audio [22,48]. For instance, a popular framework based on generative adversarial networks (GANs) can effectively hide more than 4 bits of arbitrary binary information in a single pixel on average [52]. In other words, an RGB image of 1920 × 1080 pixels has a hiding capacity of 1012 KB, which is large considering that many malware payloads are tailored to small sizes. When combined with cryptographic techniques like AES or RSA, binary secret data hidden with steganography pose a greater threat and are even more difficult to defend against [24].
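The capacity figure above follows from simple arithmetic, sketched below. The helper name `hiding_capacity_kib` is hypothetical, and the sketch assumes exactly 4 bits per pixel and 1 KB = 1024 bytes.

```python
def hiding_capacity_kib(width: int, height: int, bits_per_pixel: float) -> float:
    """Return the payload capacity of an image in KiB (1 KiB = 1024 bytes)."""
    total_bits = width * height * bits_per_pixel
    return total_bits / 8 / 1024


# Assumed values: a 1920 x 1080 image and exactly 4 hidden bits per pixel.
capacity = hiding_capacity_kib(1920, 1080, 4)
print(f"{capacity:.1f} KiB")  # prints "1012.5 KiB"
```

Since the cited framework hides more than 4 bits per pixel on average, this value is a lower bound under the stated assumptions.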
The threat of steganography to cybersecurity is not just a theory, but already a reality. In recent years, the United States Cybersecurity and Infrastructure Security Agency (CISA) has issued several alerts about advanced persistent threat (APT) groups that apply steganography to obscure the Command & Control (C&C) communications and target critical infrastructure, as well as private sector organizations [46]. In addition, the MITRE Corporation, maintainer of the CVE (Common Vulnerabilities and Exposures) program, has identified multiple APT groups that leverage steganography in cyber attacks [44], with examples including hiding malicious Portable Executables (PEs) and shellcode within PNG and JPEG files. As noted by MITRE, except for certain steganographic (or
To defend against malicious communications based on steganography, researchers have proposed various methods for steganalysis, which aim to detect the presence of hidden messages [28]. However, steganalysis tools cannot sufficiently prevent covert communications, since adversaries may apply stego methods that were not considered when the tools were developed [58]. Moreover, apart from the potential false positive and false negative results of steganalysis tools, positive results would also require proper further processing, such as the removal of the embedded secret information. Therefore, instead of detecting the existence of hidden messages, researchers have recognized the necessity to directly modify all transmitted public messages (e.g., images) and remove potential stego information regardless of whether the transmitted messages are benign or malicious [28,58]. This paradigm is sometimes referred to as the “active warden” in the literature [42]. Hence, the covert communication channel provided by steganography is disrupted without the necessity of successfully classifying the messages as benign or malicious.
For the blind removal of potential secret information embedded in public mediums, such as images, video, and audio, the following primary
A large variety of stego techniques with distinct characteristics need to be defended against, and many techniques might even be unknown to the active warden.
Such removal should be as transparent as possible to ordinary users, i.e., without severely impairing the perceptual quality of the public medium.
Faced with the above practical challenges, most existing active-warden approaches are designed specifically for certain known stego methods [4,5]. This implicit assumption may, to a great extent, limit the practical application of these methods, since new stego techniques are constantly being developed. More recent active-warden approaches, such as PixelSteganalysis [28], do not require knowledge of the exact stego method applied by adversaries, but nevertheless make some other restrictive assumptions, as discussed in Section 2.
To address these challenges while considering the limitations of existing research, we propose StegEraser, an approach to disrupting malicious covert communications that apply steganography. As shown in Fig. 1, it is fundamentally distinct from existing research. In the figure, APT groups can bypass traditional DPI techniques and persistently transmit large payloads embedded in seemingly benign mediums without being noticed by the firewall. The payloads are then extracted by an internal device compromised via previous attacks or by a malicious insider, followed by lateral movement (i.e., moving across the network from the compromised devices for greater access to the internal resources). In the reverse direction, insiders or victim devices collect sensitive information, embed the information into public mediums with stego methods, and then send it to the Internet (external network).

An overview of the threat of steganography to industrial systems, and StegEraser, the proposed scheme for disrupting steganography-based malicious covert communication.
Under the realistic assumption that the firewall with DPI capabilities has access to the relevant cryptographic keys and decrypts the network traffic for inspection, StegEraser serves as an enhancement to traditional firewalls. Unauthorized encrypted traffic could be blocked by the firewall by default. In StegEraser, we propose to “overload” the stego hiding capacity of public mediums, and “overwrite” potential malicious secret information with random binary data. Specifically, by leveraging the transformation capabilities of invertible neural networks (INNs) [11], along with a metric of perceptual image quality [53], StegEraser injects an excessive amount of random binary data into the public medium, and consequently inhibits adversaries’ ability to correctly decode hidden information. Thus, the covert communication channel enabled by steganography is disrupted, and the cost and effort required by adversaries to transmit data across the firewall are significantly increased.
As a demonstration, we implement StegEraser for images due to the abundance of public datasets and relevant research on steganography, while our idea can also be extended to other forms of data, such as video and audio.
The main contributions of this paper are summarized as follows:
A novel methodology is proposed to disrupt malicious covert communications based on steganography, which is fundamentally distinct from existing approaches.
We propose to adopt the perceptual quality metric instead of the commonly used pixel-level difference to preserve the visual quality of processed images.
As a byproduct, StegEraser can also be repurposed as a stego method for hiding arbitrary binary data in images, with a capacity higher than SOTA models.
Extensive evaluation indicates that StegEraser outperforms existing SOTA methods in terms of removing secret information while preserving the quality of images.
The remainder of this paper is organized as follows. In Section 2, we briefly introduce the related work and research gap. The threat model and details of StegEraser are presented in Section 3. Then, the evaluation of StegEraser is shown in Section 4. Finally, the limitations and potential countermeasures against StegEraser are discussed in Section 5, and Section 6 concludes the paper.
This section clarifies several relevant concepts and discusses the related work, along with the major differences between that work and this paper.
Steganography and blind watermarking
As the most widely studied form of steganography, the objective of image steganography is to hide secret information (e.g., images, encrypted messages) in “cover” images, such that the generated image (“stego image”) is as indistinguishable as possible from the cover image. Methods of image steganography are generally categorized as either conventional or “deep steganography”, i.e., based on neural networks (NNs). Examples of conventional methods include modifying the least significant bits (LSBs) of pixels, as well as hiding secret information in the wavelet domain [29]. With regard to deep steganography, Zhang et al. develop UDH [51], a general NN-based framework that disentangles the encoding of secret images and cover images, and is robust to pixel intensity shifts. In [7], the author applies several neural networks to hide a full-size color image in a cover image.
Conceptually, image steganography is quite similar to blind watermarking [2], i.e., hiding invisible watermarks (in the forms of bits or images) within an image. While the term steganography focuses on covert communication, blind watermarking is generally oriented towards the identification of ownership, and typically has a smaller hiding capacity.
Apart from images, steganography can also be conducted in many other forms. For instance, secret information can be embedded into text [37], video [34], or even the activity patterns of network traffic [18]. As a demonstration, this paper focuses on image steganography for simplicity, but the idea can be extended to other forms of steganography.
Steganalysis
To detect the presence of secret information embedded in images, researchers have proposed a variety of methods. These methods are generally based on complicated statistics of pixel values, or analysis in the transform domains (e.g., discrete cosine transform and discrete wavelet transform) [30].
For instance, Fillatre et al. [19] propose an approach to detecting secret information embedded in LSBs of images by an adaptive statistical test. In [9], the authors apply deep residual networks to detect the existence of both spatial-domain and JPEG steganography in images.
However, as mentioned in §1, it is unrealistic to rely solely on steganalysis to protect cyberspace, due to factors including unknown advanced stego methods, as well as potential false positive and false negative classification results.
Active wardens
The majority of existing steganographic destruction techniques (i.e., active wardens) are based on conventional methods like wavelet transforms, and often target a specific known stego method [4,5]. For instance, the LSBs of pixels can be overwritten with Gaussian noise to prevent LSB-based steganography [20], and image filters can be applied to remove the secret information embedded with certain stego methods [43]. Despite the simplicity of implementation, these conventional methods typically lead to significant degradation in image quality [58].
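As a toy illustration of the conventional LSB-overwriting idea mentioned above, the sketch below hides one bit per pixel in the least significant bit and then replaces every LSB with a fresh random bit. All function names are hypothetical, and uniform random bits are used here in place of the Gaussian noise of [20], purely for simplicity.

```python
import random


def embed_lsb(pixels, bits):
    """Hide one bit in the least significant bit of each 8-bit pixel value."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]


def extract_lsb(pixels):
    """Read the least significant bit of each pixel value."""
    return [p & 1 for p in pixels]


def overwrite_lsb(pixels, rng):
    """Active-warden step (assumed variant): replace every LSB with a random bit."""
    return [(p & ~1) | rng.getrandbits(1) for p in pixels]


rng = random.Random(0)
n = 10_000
cover = [rng.randrange(256) for _ in range(n)]      # stand-in for pixel values
secret = [rng.getrandbits(1) for _ in range(n)]     # adversary's payload

stego = embed_lsb(cover, secret)
purified = overwrite_lsb(stego, rng)

# Fraction of secret bits the adversary can still read back.
rr_before = sum(a == b for a, b in zip(secret, extract_lsb(stego))) / n
rr_after = sum(a == b for a, b in zip(secret, extract_lsb(purified))) / n
max_change = max(abs(a - b) for a, b in zip(stego, purified))
print(rr_before, round(rr_after, 2), max_change)
```

After overwriting, roughly half of the secret bits match by chance, which is equivalent to a random guess, while no pixel value changes by more than one intensity level. This only defeats LSB embedding, which is the limitation of method-specific wardens discussed above.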
By contrast, PixelSteganalysis [28] utilizes an architecture based on convolutional neural networks to establish pixel and edge distribution for each image, and then removes the hidden secret information at the suspicious pixels, while maximally preserving the quality of the processed images. It does not require knowledge of the exact stego method used by adversaries. However, PixelSteganalysis makes the assumption that the probability distribution of pixel values for the cover images used by adversaries is similar to the distribution of the dataset available to the active warden. This may limit its real-world application.
In [58], Zhu et al. propose to first use a simple neural network to simulate a Gaussian filter, which is an effective yet non-differentiable measure of removing hidden information, and then apply another neural network to compensate for the loss of image quality caused by the filter. With GANs, Corley et al. [12] propose the Deep Digital Steganography Purifier (DDSP), which removes the stego content from images. In DDSP, the authors also assume that the stego methods are already known, such that an autoencoder (generator) can be trained to purify the stego images, while the discriminator distinguishes between the clean cover image and the purified image.
Without making restrictive assumptions regarding the distribution of cover images, StegEraser differs from the aforementioned approaches in the sense that, instead of attempting to identify suspicious pixels or to simulate a traditional filter, it adopts a fundamentally distinct method of “defeating steganography with steganography”, i.e., overloading the stego hiding capacity of images with noise and invertible neural networks. Using a SOTA metric of perceptual image quality, StegEraser is able to preserve the visual quality of the images being processed.
Invertible neural networks
With carefully designed network architectures, invertible neural networks (INNs) are mathematically guaranteed to be invertible [27,31], i.e., mapping their input into their output in the forward process, and mapping the output back into the input in the inverse process. Typical invertible operations include coupling and specially designed convolutions [31].
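To make the notion of a coupling operation concrete, the following minimal sketch implements a single additive coupling step on plain Python lists. The sub-network stand-in `f` and all function names are hypothetical; the exact layers used by IICNet and StegEraser are not reproduced here.

```python
import math


def f(x1):
    """Stand-in for an arbitrary sub-network; it does not need to be invertible."""
    return [math.tanh(3.0 * v) + v * v for v in x1]


def coupling_forward(x1, x2):
    """Additive coupling: pass x1 through unchanged and shift x2 by f(x1)."""
    return x1, [b + s for b, s in zip(x2, f(x1))]


def coupling_inverse(y1, y2):
    """Inverse: recompute the same shift from y1 and subtract it from y2."""
    return y1, [b - s for b, s in zip(y2, f(y1))]


x1, x2 = [0.2, -1.5, 0.7], [1.0, 0.3, -0.4]
y1, y2 = coupling_forward(x1, x2)
r1, r2 = coupling_inverse(y1, y2)
print(r1 == x1, max(abs(a - b) for a, b in zip(r2, x2)))
```

The inverse never has to invert `f` itself, which is why the sub-network can be an unconstrained neural network while the whole step remains invertible up to floating-point rounding.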
INNs inherently excel at merging information together without loss [11]. In most cases, they are used for normalizing flows [35], which are designed to map a complex distribution (e.g., natural images) into a tractable distribution (e.g., Gaussian), with applications including density estimation and image generation.
However, as it is unnecessary to calculate the exact probability (likelihood) of images in our case, INNs are directly used without involving normalizing flows.
Details of StegEraser
Threat model and assumptions
As depicted in Fig. 1, the threat model and major assumptions made by us are described as follows,
As discussed in Section 2, these assumptions are less restrictive than those of existing research (e.g., [28,58]).
An overview of StegEraser

The StegEraser model, with modifications to IICNet [11]. During training, the forward direction generates stego images, and the inverse direction recovers the embedded binary data. After training, the model is used for erasing potential hidden information, and only the forward direction is utilized, which produces the purified image.
As a proof of concept, we modify the architecture of the IICNet [11] (a generic framework for the lossy embedding of several color images within a single image using INNs) and develop the proposed StegEraser model shown in Fig. 2.
During training, the overall objective is to acquire a steganographic model that has a substantial hiding capacity for arbitrary binary data, while causing little visual degradation of the image. As a result, after training, the model can be utilized to “overwrite” potential secret information hidden in images that need to be purified.
The design of StegEraser is primarily based on the following observations. First, in most cases, whether in the spatial domain or the transform domain, image steganography inevitably results in changes to the pixel values of images, while attempting to preserve the appearance of the original image (with exceptions discussed in Section 5). Second, humans’ perception of images greatly differs from that of machines. One intuitive example is DeepFool [38], in which a small and hardly noticeable perturbation to the image leads to wrong NN classification results. Based on the above observations, StegEraser pre-emptively encodes random binary data in images, with the goal of only preserving the perceptual quality of images instead of pixel-level similarity. In other words, it is conceptually more reasonable to use perceptual metrics of image quality like DISTS [14], instead of pixel-level metrics like mean squared error and PSNR.
Regarding our choice of encoding random binary data using StegEraser, one may naturally ask why random natural images are not encoded instead, since most stego methods, especially those based on NNs, are designed to embed secret color images within cover images (e.g., [51]). The reasons are twofold. First, from the perspective of implementation, it is easier to design the neural network if the cover image and the secret data to be embedded have the same spatial sizes. As secret images would typically have three color channels, it can be complicated to achieve a stego hiding capacity (measured in bpp, i.e., bits per pixel) other than multiples of three. By contrast, with binary secret data, it is much more straightforward to precisely control the desired bpp, making it more favorable. Second, natural images exhibit strong spatial correlations, which stego neural networks often take advantage of [52]. In other words, natural images have less entropy than random binary data. Hence, using random binary data can be considered less restrictive than using natural images.
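The point about controlling bpp can be sketched as follows: a random binary payload with the same spatial size as the cover image and one channel per hidden bit yields exactly the desired bpp. The function name `random_payload` and the `(bpp, height, width)` layout are assumptions for illustration, not the layout used by the StegEraser implementation.

```python
import numpy as np


def random_payload(height: int, width: int, bpp: int, seed=None) -> np.ndarray:
    """Sample a random binary payload with exactly `bpp` bits per pixel.

    Assumed layout: one binary channel per hidden bit, shape (bpp, H, W).
    """
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(bpp, height, width), dtype=np.uint8)


payload = random_payload(64, 64, bpp=5, seed=0)
print(payload.shape, payload.size / (64 * 64))  # prints "(5, 64, 64) 5.0"
```

With this layout, any integer bpp can be chosen by changing the number of channels, whereas stacking secret RGB images would only allow multiples of three.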
Denote by
To avoid confusion with the backward propagation, we describe the two directions of information flows in Fig. 2 as
It is straightforward to verify that the mapping from
Specifically, given an input with
Loss functions
The aim of the StegEraser model during training is to maintain the visual quality of images, while embedding as much binary secret data as possible into the images, such that it has a higher probability of overwriting potential secret information hidden in images by adversaries after training.
Although the INN module itself is (mathematically) invertible, a certain degree of unavoidable information loss breaks the invertibility of the whole model, due to the inherent and inevitable error of computers’ representation of floating-point numbers, the non-invertible nonlinearity module, channel squeeze layer, as well as the quantization operation. As a result, it is impossible to perfectly embed and extract secret binary data, leading to the necessity of introducing a loss function to be minimized.
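The effect of quantization on invertibility can be illustrated with a toy additive coupling step. In the sketch below, all function names and values are hypothetical, and 8-bit quantization of values in [0, 1] is assumed as a stand-in for saving the stego image.

```python
import math


def shift(x1):
    """Stand-in for a learned sub-network (hypothetical)."""
    return [0.2 * math.sin(5.0 * v) for v in x1]


def forward(x1, x2):
    """Toy additive coupling step."""
    return x1, [b + s for b, s in zip(x2, shift(x1))]


def inverse(y1, y2):
    """Inverse of the toy coupling step."""
    return y1, [b - s for b, s in zip(y2, shift(y1))]


def quantize(values, levels=255):
    """Clamp to [0, 1] and round to an 8-bit grid, as when storing an image."""
    return [round(min(max(v, 0.0), 1.0) * levels) / levels for v in values]


x1, x2 = [0.10, 0.45, 0.80], [0.30, 0.55, 0.20]
y1, y2 = forward(x1, x2)

_, exact = inverse(y1, y2)                      # no quantization in between
_, lossy = inverse(quantize(y1), quantize(y2))  # quantized before inversion

err_exact = max(abs(a - b) for a, b in zip(exact, x2))
err_lossy = max(abs(a - b) for a, b in zip(lossy, x2))
print(err_exact, err_lossy)
```

Without quantization the round trip is exact up to floating-point rounding, whereas the quantized round trip leaves a clearly larger residual, which is why the hidden bits cannot be recovered perfectly in practice.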
Therefore, the total loss to be minimized is as follows,
Complexity of training and inference
The training process of StegEraser involves both the forward and inverse directions. Therefore, training takes longer and requires more GPU memory as the number of INN blocks increases. The relatively slow speed of training is common for models of normalizing flows [31], which are also based on INNs.
Fortunately, model inference, i.e., execution after deployment of the model, only involves the forward direction, and is both fast and resource-efficient.
Details of the training and inference speed in our experiments are given in Section 4.
Evaluation
In this section, we evaluate StegEraser to answer the following questions:
How well does StegEraser perform in removing hidden secret images and binary data? (§4.2.1, §4.2.2)
How does StegEraser affect the stego images and benign images? (§4.2.3, §4.2.5)
Does processing images with StegEraser for multiple passes lead to better results? (§4.2.4)
How much overhead does StegEraser introduce (i.e., inference speed)? (§4.2.6)
What are the impacts of the perceptual quality metric and the number of invertible blocks in the INN module (ablation study)? (§4.2.8)
Evaluation setup
J-UNIWARD (JPEG UNIversal WAvelet Relative Distortion) [23], a conventional JPEG stego method utilizing a carefully designed distortion cost function of embedding secret binary information, such that the embedding effects can be estimated.
DWT+SVD, a blind watermarking scheme based on discrete wavelet transform (DWT) and singular value decomposition (SVD) [8] for hiding binary data, using an open-source implementation.
Deep Steganography (DS) [7], an NN-based method for hiding a secret color image in a cover image of the same size.
Universal Deep Hiding (UDH) [51], an NN-based general framework for hiding a color image within a cover image.
SteganoGAN (SG) [52], a popular framework
Gaussian Filter (GF), an effective approach for removing hidden information in images [58]. The values of the parameter σ are selected from {0.5, 1.0, 1.5, 2.0}. A higher σ corresponds to a higher level of blurring. We use the implementation from scikit-image.
Gaussian Noise (GN). For images normalized to [0.0, 1.0], Gaussian noise is added, with a total of 4 levels of the parameter σ, namely {0.02, 0.03, 0.04, 0.05}. A greater σ leads to more noise.
OSN [58], the state-of-the-art method based on NNs. It simulates the secret-removal effect of a traditional filter (specifically, the Gaussian filter) with a neural network, while improving the quality of the processed image with another neural network. The details are shown below.
The final loss function of OSN is defined as,
As noted in the original paper [58] and confirmed in our own experiments, different values of λ have a negligible impact on the quality of processed images and the destruction rate of secret information. For a fair comparison, the OSN model is trained with almost identical settings as StegEraser, with λ chosen from {1.0, 5.0, 10.0, 20.0}.
In our experiments, J-UNIWARD has a success rate of almost zero in extracting the secret information from images processed by any of the above methods for secret information destruction; its results are therefore not shown below.
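For reference, the Gaussian Noise (GN) baseline described above can be sketched in a few lines. The function name `add_gaussian_noise` is hypothetical, and clipping the result back to [0.0, 1.0] is an assumption, since the handling of out-of-range values is not specified here.

```python
import numpy as np


def add_gaussian_noise(image: np.ndarray, sigma: float, seed=None) -> np.ndarray:
    """Add zero-mean Gaussian noise to an image in [0, 1].

    Clipping to [0, 1] afterwards is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)


# Stand-in for a normalized RGB image (uniform mid-gray).
image = np.full((32, 32, 3), 0.5)
for sigma in (0.02, 0.03, 0.04, 0.05):
    noisy = add_gaussian_noise(image, sigma, seed=0)
    print(sigma, float(np.std(noisy - image)))
```

The measured standard deviation of the perturbation tracks σ, so the four levels correspond to progressively stronger distortion of every pixel, regardless of where a secret might be hidden.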
PSNR, peak signal-to-noise ratio, measured in dB.
SSIM, structural similarity index measure.
MS-SSIM, multi-scale SSIM.
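Of the three pixel-level metrics above, PSNR has the simplest closed form and is sketched below using its standard definition. The function name `psnr` and the toy pixel values are illustrative only, and images are assumed to be normalized to [0.0, 1.0].

```python
import math


def psnr(reference, processed, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, processed)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [0.0, 0.5, 1.0, 0.25]
processed = [0.1, 0.4, 0.9, 0.35]
print(f"{psnr(reference, processed):.1f} dB")  # prints "20.0 dB"
```

Because PSNR depends only on the mean squared pixel difference, it treats all deviations equally, whether or not they are visible to a human observer.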
To measure the perceptual quality of images, we adopt the neural network-based Deep Image Structure and Texture Similarity (DISTS) index [14], in addition to the LPIPS [53] metric with two variants, namely LPIPS-Alex (LPIPS-A) and LPIPS-VGG (LPIPS-V). Some more recent metrics of perceptual image quality can be found in [21], but they are not adopted in this paper because they are not yet widely used.
Incidentally, the DISTS metric is considered slightly better than LPIPS, but unfortunately cannot be used as a loss function for image reconstruction (e.g., training an autoencoder). Thus, DISTS is not compatible with the training of StegEraser, and is only adopted for the evaluation of perceptual quality.
To measure the efficacy of removing secret binary information, the Recovery Rate (RR) is defined as the percentage of originally embedded bits that are successfully decoded by the stego methods. In other words, it negatively correlates with the bit error rate (BER), i.e., RR = 1 − BER.
In cases where one cannot recover binary data hidden in the cover image with 100% accuracy, it is necessary to consider forward error correction (FEC) to appropriately measure the effective hiding capacity of stego methods. An adversary could theoretically attempt to embed as much secret information as possible into the image, but without an FEC mechanism, the embedded secret information cannot be recovered with 100% certainty and accuracy. In the extreme case, the BER of the recovered secret is so high that it is equivalent to a random guess (noise). With the Reed-Solomon (RS) FEC scheme, the RS bits per pixel (RS-bpp) defined in [52] is as follows,
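The relationship between RR, BER, and RS-bpp can be sketched as follows. The sketch assumes that the RS-bpp definition of [52] takes the form depth × (1 − 2 · BER), where depth is the number of bits the stego method attempts to hide per pixel. The function names and the toy bit strings are hypothetical.

```python
def recovery_rate(embedded, decoded):
    """Fraction of embedded bits that the decoder reproduces correctly."""
    matches = sum(a == b for a, b in zip(embedded, decoded))
    return matches / len(embedded)


def rs_bpp(depth, rr):
    """Effective bits per pixel after RS coding (assumed form: depth * (1 - 2 * BER))."""
    ber = 1.0 - rr
    return depth * (1.0 - 2.0 * ber)


embedded = [1, 0, 1, 1, 0, 0, 1, 0]
decoded = [1, 0, 0, 1, 0, 0, 1, 1]   # two of eight bits are wrong
rr = recovery_rate(embedded, decoded)
print(rr, rs_bpp(6, rr), rs_bpp(6, 0.5))  # prints "0.75 3.0 0.0"
```

Under this form, a recovery rate of 50% yields an RS-bpp of zero for any depth, i.e., the channel carries no usable information.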
Apart from trying a few learning rates to stabilize training while achieving a suitable convergence speed, we do not conduct extensive hyperparameter tuning (e.g., grid search) in our experiments, for the following two reasons. First, the training cost is high, especially considering that multiple models need to be trained to investigate the tradeoff curves. Second, StegEraser is primarily a proof of concept, and it already exhibits satisfactory performance without such tuning.
RS-bpp and impacts of β
The hiding capacity of StegEraser (SE) measured in RS-bpp, when treated as a stego method
The best results are in bold face, and the second-best results are in italics.
Since StegEraser per se can be repurposed and utilized as a stego method, the RS-bpp of different values of β is listed in Table 1, given the original “clean” images as input. In the table, StegEraser is only compared with SG which embeds secret binary information into images, because stego methods like DS and UDH hide secret images into images and DWT+SVD has a rather low hiding capacity. It is obvious in the table that on the COCO dataset, StegEraser provides an effective hiding capacity (RS-bpp) of 5.47 when
As shown in Fig. 3 and Table 1, a higher β puts more emphasis on decoding binary data with higher accuracy during training, and leads to better destruction of adversaries’ secret information during inference, at the cost of degraded image quality of the processed images measured against the stego images.

Impacts of β on the RR of secret binary information (for DWT+DCT, SG-B, SG-R, SG-D), quality of the processed images against the stego images (for all stego methods), and quality of the secret images revealed from processed images against the original revealed secret images (for UDH and DS). Numbers are averaged across the stego methods.

Comparison of methods for removing secret information, measured on the COCO dataset. (a)–(f), the recovery rate (
The tradeoff curves for the four methods of removing secret information are given in Fig. 4, in which OSN and StegEraser perform better than the conventional approaches of Gaussian noise and Gaussian filtering. It is worth mentioning that, according to the aforementioned explanation of RS-bpp, a recovery rate of 50% is equivalent to a random guess, i.e., the channel cannot be used to hide secret information, since its RS-bpp is zero.
For removing embedded secret binary information of low bpp (i.e., DWT+DCT, 0.001–0.016 bpp), OSN provides slightly better performance than StegEraser, as shown in Fig. 4(a)–(c). However, OSN has only a very narrow range of performance (mentioned in §4.1), leaving users with few options for the tradeoff, which might become a limiting factor in its real-world deployment. For removing secret binary information of high bpp (i.e., SG-B, SG-R, and SG-D), StegEraser consistently outperforms the other methods, as shown in Fig. 4(d)–(f).
For removing embedded secret color images (DS and UDH), as illustrated by Fig. 4(g)–(h), StegEraser leads to the lowest quality of secret images revealed from processed images by adversaries (i.e., the highest level of destruction regarding secret images), at the same level of image quality for the processed images.

Visualization of images processed by StegEraser (SE) and other methods, in which (s) denotes stego, (r) denotes the revealed hidden image before processing (erasing), and (pr) denotes the revealed hidden image after processing.
A qualitative investigation of the images processed by different methods is presented in Fig. 5. The stego method of DWT+DCT is configured with a payload ratio of 0.9, the parameter σ for GN is 0.04, σ for GF is 1.5, λ for OSN is 10.0, and β for StegEraser is 0.5. These parameters are selected to be relatively high within their respective ranges, such that their effects can be clearly seen. Note that Fig. 5 only serves the purpose of visualization, and readers are referred to figures like Fig. 4 for a more rigorous comparison between the performance of different methods.
In Fig. 5, the original “clean” images are not shown, because they are almost indistinguishable from the first five columns. After embedding secret information into the clean images, the stego images corresponding to the five methods are shown in the first five columns. The stop signs in the sixth column are the revealed (extracted) secret images embedded by DS, and the next column DS (pr) shows the secret images revealed from stego images processed by erasing methods like OSN. It should be noted that DS and UDH embed secret images into the cover image, and consequently have the columns labeled with (r) and (pr), while other methods embed binary data and therefore do not have the corresponding revealed images. As expected, by comparing the rows of images, GN adds a noticeable amount of noise to the processed images, and perceptually degrades the image quality. GF, as an effective conventional method of removing secret information, makes the images blurry. The quality difference of images processed by OSN and StegEraser is less obvious.
For the secret images embedded with DS and UDH, by comparing the columns of DS (r), DS (pr), and UDH (pr), it can be seen that GN successfully makes the revealed image of DS unrecognizable, but leaves the revealed image of UDH partially intact (notice the red stop sign). The effectiveness of GF is exemplified in the second row, in which it is almost impossible to extract information from the revealed images. At a lower σ, e.g., 0.5, the recovered image of DS processed by GF is similar to the originally embedded secret image (not shown in Fig. 5), instead of appearing black in column DS (pr). For OSN, a low but non-negligible amount of information is retained in the revealed secret images. StegEraser renders the lowest quality of revealed images, which can be regarded as noise, and therefore StegEraser successfully disrupts covert communications.

The effect of running StegEraser for multiple passes for removing secret information on the COCO dataset, measured for the stego methods of (a) DWT+SVD, with a payload ratio of 0.9, and (b) SG-D.
To investigate whether injecting random binary data by running StegEraser multiple times leads to better destruction of stego contents, we illustrate in Fig. 6 the effects of running multiple forward passes on images, with different randomly generated binary data encoded in each pass.
For secret information embedded with DWT+SVD, running more forward passes leads to worse image quality, as expected, and the tradeoff between image quality and destruction of secret information is better than that of a single pass. However, this is not the case for secret information embedded with SG-D, where a single pass consistently outperforms multiple passes. The results for other stego methods are similar, suggesting that running StegEraser for multiple passes does not necessarily lead to better or worse performance.
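The multi-pass procedure can be sketched as a simple loop that draws fresh random bits for every pass. The snippet below is only an illustration under stated assumptions: `run_passes` and `lsb_embed` are hypothetical names, and `lsb_embed` is a toy least-significant-bit embedder standing in for the INN-based embedding of StegEraser, which is not reproduced here.

```python
import numpy as np

def lsb_embed(image, bits):
    """Toy stand-in embedder (NOT the StegEraser INN):
    overwrite the least significant bit of every pixel value."""
    return (image & 0xFE) | bits.reshape(image.shape)

def run_passes(image, embed_fn, n_passes, seed=0):
    """Apply embed_fn n_passes times, with new random bits in each pass."""
    rng = np.random.default_rng(seed)
    out = image.copy()
    for _ in range(n_passes):
        # Different randomly generated binary data is encoded in each pass.
        bits = rng.integers(0, 2, size=out.size, dtype=np.uint8)
        out = embed_fn(out, bits)
    return out

# Hypothetical 4 x 4 RGB image used only to exercise the loop.
image = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)
processed = run_passes(image, lsb_embed, n_passes=3)
```

With the toy embedder, additional passes simply re-randomize the same bit plane, whereas a learned embedder accumulates distortion with every pass, which is the tradeoff examined in Fig. 6.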
Impacts of StegEraser (SE) and other methods on benign images. Values of A/B are for the COCO dataset and the DIV2K dataset respectively
The best results are in bold face.
Since not all images would contain secret information in practice, the impacts of all the active-warden methods on benign images are listed in Table 2.
It is evident from the table that the degradation of image quality caused by StegEraser, especially the perceptual quality (DISTS and LPIPS), is low compared with other methods. Comparing the results of GF and StegEraser shows that, although GF can achieve slightly better traditional scores of PSNR and SSIM, it is outperformed on perceptual metrics by StegEraser, which aims to optimize the perceptual image quality instead of pixel-level similarity.
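To make the notion of pixel-level similarity concrete, the following minimal sketch computes PSNR from its standard definition, 10·log10(MAX² / MSE). The function name `psnr` and the 8-bit default `max_value=255.0` are assumptions for illustration; the evaluation code used in this paper is not shown here. Perceptual metrics such as LPIPS and DISTS rely on learned features and are not reproduced.

```python
import numpy as np

def psnr(reference, processed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    proc = np.asarray(processed, dtype=np.float64)
    mse = np.mean((ref - proc) ** 2)  # pixel-level mean squared error
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Hypothetical extreme case: an all-black image against an all-white image.
black = np.zeros((8, 8), dtype=np.uint8)
white = np.full((8, 8), 255, dtype=np.uint8)
worst_case = psnr(black, white)
```

Because PSNR depends only on the per-pixel error, two processed images with the same PSNR can look very different to a human observer, which is why perceptual metrics are reported alongside it.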
Comparison of the models’ complexity and processing speed
To preliminarily investigate the overhead that StegEraser would bring to firewall systems, we measure the inference speed and model complexity and present the results in Table 3. In the table, MAC denotes the multiply-accumulate operation [3]. The MAC count is a common objective metric of the computational complexity of neural networks, and is reported because processing speeds on different GPUs can vary to a great extent. The decoding and encoding of image files (e.g., PNG), as well as file operations (reading and storing), are excluded from our calculations. GN and GF are executed on the CPU, while the OSN and StegEraser models are run on a single GPU.
As evident in the table, the StegEraser model is relatively lightweight, considering the fact that the well-known image classification model VGG-16 has 138.36 million parameters, and requires 15.5 GMACs on an image of the size 224 × 224. The processing time of StegEraser is greater than that of other methods, reaching 24 ms per image, but is still acceptable.
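The VGG-16 reference figures can be reproduced with a short calculation, which also illustrates how MACs are counted. The sketch below assumes the standard VGG-16 configuration (thirteen 3 × 3 convolutions with padding that preserves the spatial size, followed by three fully connected layers) and counts one MAC per weight per output position; bias additions are not counted as MACs. The names `CONV_LAYERS`, `FC_LAYERS`, and `count_vgg16` are illustrative.

```python
# (input channels, output channels, output height = width) of each conv layer
CONV_LAYERS = [
    (3, 64, 224), (64, 64, 224),
    (64, 128, 112), (128, 128, 112),
    (128, 256, 56), (256, 256, 56), (256, 256, 56),
    (256, 512, 28), (512, 512, 28), (512, 512, 28),
    (512, 512, 14), (512, 512, 14), (512, 512, 14),
]
# (input features, output features) of each fully connected layer
FC_LAYERS = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]
KERNEL = 3

def count_vgg16():
    """Return (parameter count, MAC count) for one 224 x 224 input image."""
    params = 0
    macs = 0
    for c_in, c_out, size in CONV_LAYERS:
        weights = KERNEL * KERNEL * c_in * c_out
        params += weights + c_out      # weights plus one bias per output channel
        macs += weights * size * size  # every weight is applied at each output position
    for n_in, n_out in FC_LAYERS:
        params += n_in * n_out + n_out
        macs += n_in * n_out
    return params, macs

params, macs = count_vgg16()
print(f"{params / 1e6:.2f} M parameters, {macs / 1e9:.2f} GMACs")
# prints "138.36 M parameters, 15.47 GMACs"
```

The result of about 15.47 GMACs rounds to the 15.5 GMACs quoted above, and the parameter count matches 138.36 million.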
It should be noted that inference with StegEraser in real-world deployment would be much faster than the results reported in the table, because many techniques would increase speed and resource efficiency, such as using a specially designed inference engine like TensorRT,4

Comparison of methods for removing secret information embedded by StegEraser, i.e., StegEraser itself being used as a stego method. Results are measured on the COCO dataset. The image quality corresponds to the processed images measured against the stego images (
As mentioned above, StegEraser can also be regarded as a stego method. Therefore, different methods can be evaluated to measure their effectiveness in terms of removing the stego information embedded by StegEraser. To avoid potential confusion, the corresponding results are separately shown here, instead of being incorporated in §4.2.2. The StegEraser model is set with

Ablation study, measured for SG-D on the COCO and DIV2K datasets. (a)–(b) The effect of the LPIPS perceptual quality loss, vs the MSE loss. (c)–(d) The effect of the number of invertible blocks in the INN module.
To investigate the extent to which the LPIPS perceptual quality loss contributes to the overall performance of StegEraser, in Figs 8(a) and 8(b), we present the tradeoff between recovery rate and image quality when LPIPS is replaced with the mean squared error (MSE) in the loss function. All other training settings are unchanged. As shown in the figures, the perceptual quality loss (orange curves) significantly improves the destruction of the secret information compared with the pixel-level difference loss of MSE, at the same level of perceptual image quality measured by DISTS.
Similarly, the performance of StegEraser benefits from the INN module, as shown in Figs 8(c), 8(d). The zero blocks setting is equivalent to completely removing the INN module in StegEraser. It is evident that without the information processing capabilities of INNs, StegEraser yields an inferior destruction rate of secret information at the same level of perceptual image quality.
Evaluation on other stego methods leads to similar results, which are consequently not plotted.
In this section, we discuss the limitations of StegEraser and potential countermeasures against it.
Limitation
One of the potential limitations of StegEraser is its inference speed. Although our model is quite fast as shown in Section 4, such speed might be inadequate in scenarios with stringent latency requirements, such as time-sensitive industrial systems [36]. To alleviate this problem, one could distill, quantize, or prune [45] the model for a smaller size and faster speed.
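As one illustration of the pruning option, the sketch below applies unstructured magnitude pruning to a weight array. This is a generic technique chosen here for illustration, not necessarily the method of [45], and it has not been evaluated on StegEraser; the function name `magnitude_prune` and the example weights are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out roughly the given fraction of weights with the smallest magnitude.

    Ties at the threshold are all pruned, so slightly more than the
    requested fraction may be removed.
    """
    w = np.asarray(weights, dtype=np.float64)
    k = int(sparsity * w.size)  # number of weights to remove
    if k == 0:
        return w.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    pruned = w.copy()
    pruned[np.abs(w) <= threshold] = 0.0
    return pruned

# Hypothetical weights: half of them are removed at 50% sparsity.
example = magnitude_prune([0.1, -0.5, 0.05, 2.0], sparsity=0.5)
```

In practice, the zeroed weights translate into faster inference only when the runtime or hardware exploits sparsity, and the pruned model would typically need fine-tuning to recover its erasing performance.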
Potential countermeasures
As mentioned in Section 1, with the lower effective channel capacity due to StegEraser, the stego malicious transmission by APT groups is disrupted. Therefore, as a countermeasure, they could instead take a longer time, or use a greater number of compromised devices in the internal network to collaboratively send data, such that their ability to transmit large amounts of data is partly restored, while their risk of being exposed remains manageable. Nevertheless, the cost and effort required of adversaries are still raised by StegEraser.
As the offensive and defensive measures are constantly evolving in cybersecurity, the adversaries may alternatively devise and apply new stego methods that could circumvent the effects of StegEraser. For instance, in recent years, researchers have proposed methods of generative steganography, i.e., semantically manipulating images, such that realistic images can be generated directly from the secret information and no embedding process is involved. In [33], Liu et al. design a scheme to convert the bits of secret information into a synthetic image using structure and texture characteristics. Zhou et al. show that a capacity of 4 bpp can be achieved by utilizing the Glow model for generative steganography [56]. As a more recent example, considering that diffusion models have now become the de facto SOTA paradigm for generating highly realistic images, Zhao et al. demonstrate that it is possible to embed watermark strings into the images generated by diffusion models [55].
Furthermore, faced with the foreseeable modifications on images conducted by active wardens, adversaries could incorporate into stego models the existing techniques that increase the robustness of their stego models, such that the covert communication channel is resilient against interference. In [49], the authors consider perturbations to images including noise and JPEG compression, and introduce a stego scheme robust against such distortions.
Given the fundamental limitations of StegEraser regarding generative steganography, we identify, to the best of our knowledge, two categories of potential defense against it.
DeepFake detection. It is possible to detect whether an image contains stego information hidden with generative steganography by building on existing research on DeepFake detection [54] (in a general sense), which aims to determine whether images are naturally captured or generated by neural networks. However, such detection is inevitably limited in accuracy and yields both false positive and false negative results.
Trusted computing. As a more restrictive but more powerful defense strategy, malicious programs running algorithms of generative steganography (and other malware) can be mostly prevented by the paradigm of trusted computing [50], in which only trusted and authorized programs can be executed on computers. For instance, the trusted execution environment (TEE) [57] provides the opportunity to run programs safely by applying security measures implemented with CPU hardware.
Conclusion
As an advanced technique that is extremely difficult to defend against with traditional deep packet inspection, covert communication based on steganography poses a pervasive and realistic threat to the cybersecurity of corporations, the IoT, industrial systems, etc. To address this challenge, we propose StegEraser, an approach that is fundamentally distinct from existing research and is intuitively described as defeating steganography with steganography. In StegEraser, a large volume of randomly generated binary data is embedded in the images with invertible neural networks, such that potential secret information hidden by adversaries is overwritten. The visual quality of the processed images is preserved by leveraging the image perceptual quality loss. Evaluation on multiple steganographic methods with different characteristics and on two datasets indicates that StegEraser outperforms the state-of-the-art method, and has the capability to disrupt covert communications involving both secret images and secret binary data.
Acknowledgments
This work is partially supported by the Natural Science Foundation of Tianjin (No. 20JCZDJC00610), the National Natural Science Foundation of China (No. 62172241), and the Technology Research and Development Program of Tianjin (No. 18ZXZNGX00200).
