Abstract
The main problem with any tomography is the transformation of measurements into images. This is the so-called “inverse problem”, which, due to its indeterminacy, can never be solved perfectly. An additional factor that degrades the quality of tomograms is measurement noise. This article shows how to denoise electrical capacitance tomography measurements using an LSTM autoencoder. The presented model has two stages. First, the autoencoder is trained using very noisy measurements, and its encoder then generates a training set consisting of the activations of the latent layer. In the second stage, an LSTM network is trained with the encoder latent layer activations at the input and pattern images at the output. The results of the experiments show that using an autoencoder to denoise the measurements improves the reconstruction quality.
Introduction
Process tomography is a monitoring technique that reconstructs three-dimensional objects by analyzing the electrical capacitance, voltages, radio waves, sound, light, or radiation passing through the tested object [1]. Tomographic methods are beneficial for process manufacturing and logistics systems because, unlike discrete systems, they provide detailed information about the structure and composition of the object without the need to dismantle it physically [2,3]. Tomography enables non-intrusive monitoring and control of industrial processes such as fluid flow and mixing, solids transfer and flow, and the observation of chemical reactions [4]. The advantages of tomography over other methods of industrial process control include the ability to generate high-resolution three-dimensional images of the tested object’s internal structure, to measure many properties simultaneously, and to work in difficult conditions [5]. Industrial tomography can be used in many different fields, such as the oil and gas, chemical, petrochemical, food [6,7], pharmaceutical, energy, mining, wastewater treatment [8–10], and mineral processing industries [11,12].
Methods of monitoring and controlling industrial processes include instrumentation and control systems, programmable logic controllers, supervisory control and data acquisition systems, distributed control systems, manufacturing execution systems, and control based on artificial intelligence and machine learning [13]. Instrumentation and control systems use sensors and actuators to measure process variables, such as temperature, pressure, or flow rate, and control the process accordingly. Programmable logic controllers (PLC) are specialized computers that control industrial processes. They use a combination of hardware and software to monitor and control process variables in real time. Supervisory control and data acquisition (SCADA) systems control industrial processes remotely. They gather data from remote locations using sensors and transmit that data to a central control room, where it can be analyzed and used to control the process. Distributed control systems (DCS) are computerized control systems for a process or plant, typically with many control loops, in which autonomous controllers are distributed throughout the system while supervisory control remains with a central operator. Manufacturing execution systems (MES) are used to manage and control manufacturing processes. They provide real-time data on process performance, production schedules, and inventory levels and can be used to optimize production and improve efficiency. Finally, control based on artificial intelligence and machine learning analyzes process data and predicts process behaviour, which can be used to optimize process control and improve process performance.
All of the above methods of monitoring and controlling industrial processes rely on a system of various sensors that provide point information in the form of quantitative values such as temperature, pressure, flow rate, and current-voltage parameters. On this basis, the systems infer general process conditions, such as the degree of crystallization of the liquid in the tank or the intensity of gas bubble formation. However, these indirect methods model dynamic objects from many point observations and from analyses of other physical quantities, which makes them imprecise and unreliable. Electrical capacitance tomography (ECT) is an effective alternative to point-based industrial process monitoring systems. Combined with efficient and effective algorithms for solving the so-called “inverse problem”, tomography creates a 2D or 3D image of a specific part of the object’s interior. In ECT, the inverse problem involves reconstructing the distribution of dielectric permittivity within a region of interest from measurements of the capacitance between electrodes placed around the region. The solution is typically obtained using classical algorithms, such as the filtered back-projection algorithm or the iterative algebraic reconstruction technique. Iterative algorithms use the capacitance measurements to repeatedly update an estimate of the permittivity distribution until a satisfactory solution is obtained. However, the inverse problem in ECT is ill-posed and highly nonlinear, making it challenging to obtain accurate reconstructions.
Machine learning techniques have been used to solve the inverse problem in electrical tomography as an alternative to traditional iterative algorithms [14]. These techniques include neural networks, deep learning, and inverse modelling [15]. Neural networks have been used to learn the mapping between the measured data and the permittivity distribution, allowing for fast and accurate reconstructions. Deep learning algorithms, such as convolutional neural networks (CNN) and long short-term memory networks (LSTM), have been used to analyze the measured data and automatically learn features that are useful for reconstruction [16]. Finally, inverse modelling uses machine learning to learn the system’s underlying physics, such as the forward model, to reconstruct the permittivity distribution. These techniques have been shown to provide more accurate and robust reconstructions than traditional iterative algorithms, but they require large amounts of training data and are computationally expensive.
This paper presents an original way to use LSTM networks to turn capacitance measurements into images. The novelty lies in a two-stage machine-learning solution to the inverse problem. In the first stage, the measurements are denoised using an LSTM autoencoder. In the second stage, a separate LSTM network converts latent layer activations from the encoder into 3D tomographic images.
Materials and methods
This section describes the hardware of the electrical capacitance tomography (ECT) system and the LSTM (Long Short-Term Memory) software.
Hardware
The research project examined a physical model of a tank reactor using a hybrid tomography prototype specifically designed and constructed in the Nertix SA laboratory. This prototype was equipped with electrodes that were able to perform simultaneous or separate measurements of Electrical Impedance Tomography (EIT) and/or Electrical Capacitance Tomography (ECT) [17,18]. The main constraint was the measurement speed, which was limited by the period of the generated signal. While a measurement time of 16 ms could theoretically be achieved with a 1 kHz signal, the actual sampling period was longer, although still below 100 ms, because a stabilization period is needed when switching the excitation from one pair of electrodes to another.
The ECT device was constructed using a set of Intel Altera Cyclone IV and Cyclone V FPGA chips, which allowed for using parallel function blocks independent of each channel. The device consisted of several components: a main board with a power supply and excitation current controller, measurement cards, and a data controller with an image reconstruction algorithm. The main board served as the connection point for data and address buses for individual blocks and featured a power supply unit that converted the 12 V DC voltage from the battery into the voltage required to operate the individual function cards. The battery control and charging system were also integrated into the power supply unit. The main board also functioned as the generator of the excitation current signal and included a system for verifying the signal’s accuracy and the correctness of the electrode connections.
The measurement cards were another important component. They included four active electrodes with blocks for forming the measured signal, gain control, and detecting the zero-crossing point, which was necessary for determining the wavelength and phase shift. The Cyclone IV FPGA system, along with the ADS8588 A/D converter, performed the measurement function, including signal filtering, calculating the RMS value, and measuring the signal phase. The initial data was then transferred to the control unit via buses. Eight Cyclone IV systems were used in the tomograph.
Finally, the data controller, equipped with a built-in dual-core ARM Cortex-A9 processor, collected data from the individual measurement cards via FPGA blocks on an Intel Altera Cyclone V. It also transferred configuration data to the individual measurement blocks and supervised the accuracy of the measurements. The user interface and reconstruction mechanisms were managed by a processor operating under Linux. As a result, the portable ECT prototype can perform in situ reconstructions, meaning that the images can be generated on-site and in real-time. This allows for immediate analysis and interpretation of the data without the need to transfer the data to an external location for processing. In addition, the device is equipped to perform image reconstruction on board, which enables real-time analysis and decision-making. Figure 1 presents a research setup consisting of a physical tank model with ECT electrodes connected to a prototype tomograph.

The test stand - a tank reactor with an electrical capacitance tomograph.
The tank, acting as a physical model of the reactor, was filled with tap water. The outer diameter of the tank was ∅200 mm, while the inner diameter was ∅194 mm. Phantoms imitating internal objects were thin-walled plastic tubes with an outer diameter of ∅28 mm. There was air inside the phantoms.
To train the LSTM model, 30,000 synthetic cases were generated, of which 20% (6,000), selected at random, were set aside as the testing set. Each training and testing case included a 120-element vector of capacitance measurements (inputs) and a 20,445-element set of values corresponding to voxels in the 3D reconstruction image of the object’s interior (outputs). Simulation cases were generated by solving the forward tomographic problem. In ECT, the forward problem consists of determining the capacitance values between the electrodes for a known distribution of the electrical properties (such as dielectric constant and conductivity) within the object. This is typically done with a mathematical model that relates the capacitance measurements to the electrical properties of the object. The model usually involves second-order partial differential equations (PDEs) that describe the potential distribution within the object. The most common PDE used in ECT is the Poisson equation, which relates the electric potential within the object to the electrical properties and the applied excitation. In the case of a three-dimensional ECT system, the forward problem can be modelled using the following equation [19]
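A form of this model that is common in the ECT literature is sketched below. It is given as an assumption about what Eqs. (1) and (2) denote, not as a quotation of [19]: $\varepsilon$ stands for the permittivity distribution, $V$ for the excitation voltage, and the sensor interior is assumed to be free of charge.

```latex
\begin{equation}
\nabla \cdot \bigl( \varepsilon(x_1, x_2, x_3)\, \nabla \phi(x_1, x_2, x_3) \bigr) = 0
\tag{1}
\end{equation}

\begin{equation}
\phi(x_1, x_2, x_3) =
\begin{cases}
V, & (x_1, x_2, x_3) \in \Gamma_i \\
0, & (x_1, x_2, x_3) \in \Gamma_j \ (j \neq i),\ \Gamma_s,\ \Gamma_g
\end{cases}
\tag{2}
\end{equation}
```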
In formula (2), the spatial location of the measurement (excitation) electrode is represented by $\Gamma_i$, the spatial location of the detecting electrodes is represented by $\Gamma_j$, the spatial location of the sensor screen is represented by $\Gamma_s$, and the spatial location of the grounded guard electrodes is represented by $\Gamma_g$. This model is used to simulate a real sensor and can solve the ECT forward problem by calculating capacitances between all possible electrode pairs. To accomplish this, Eq. (1) must first be solved to obtain the potential distribution, represented by $\phi(x_1, x_2, x_3)$, within the sensor. The following formula (3), which is the result of the integration of the Poisson equation, defines how to obtain the capacitance $C_i$ of the i-th pair of source and electrode [21].
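In the form usually found in the ECT literature, and assumed here to correspond to formula (3), the capacitance is obtained by integrating the electric flux over the detecting electrode. The permittivity symbol $\varepsilon$ and the sign convention are assumptions; the sign depends on the chosen orientation of the surface normal.

```latex
\begin{equation}
C_i = -\frac{1}{\Delta V_i} \oint_{A_i}
      \varepsilon(x_1, x_2, x_3)\, \nabla \phi(x_1, x_2, x_3) \cdot d\mathbf{A}
\tag{3}
\end{equation}
```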
In formula (3), $A_i$ is the surface area covering the detecting electrode, and $\Delta V_i$ is the voltage difference between the pair of electrodes. The finite element method (FEM) was used to solve the forward problem, with the EIDORS toolbox for Matlab [22].
The synthetically generated set of measurements was contaminated with 10% Gaussian noise. Then, the LSTM neural network with the architecture shown in Fig. 2 was trained.
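The contamination step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `add_gaussian_noise` is hypothetical, and "10% noise" is assumed to mean a standard deviation equal to 10% of the largest absolute measurement in the vector.

```python
import random


def add_gaussian_noise(measurements, level=0.10, seed=None):
    """Return a noisy copy of one measurement vector.

    Assumption: `level` is the noise standard deviation expressed as a
    fraction of the largest absolute value in the vector.
    """
    rng = random.Random(seed)
    sigma = level * max(abs(m) for m in measurements)
    return [m + rng.gauss(0.0, sigma) for m in measurements]


# A made-up 120-element capacitance vector, used only for illustration.
clean = [1.0 + 0.01 * k for k in range(120)]
noisy = add_gaussian_noise(clean, level=0.10, seed=0)
```

Fixing the seed makes the contaminated set reproducible, which matters when the one-stage and two-stage models are to be compared on identical inputs.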

The architecture of the classic LSTM network that converts measurements into images.
Figure 3 shows the Matlab code defining the layers of the LSTM network. The “sequence” layer is the input layer, which accepts the 120 electrical capacitance measurements. The next layer, “lstm_1”, contains 2048 hidden units. It is followed by the “gmpool_1d” layer, a global max pooling layer. A 1-D global max pooling layer performs downsampling by outputting the maximum of its input over the time or spatial dimension. The dimension over which the layer pools depends on its input; for time series and vector sequence inputs, the layer pools along the time axis. Finally, a fully connected layer multiplies its input by a weight matrix and then adds a bias vector to produce the output.
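The two operations described last can be illustrated on a toy example. The sketch below is written in plain Python rather than Matlab, and the function names and numbers are hypothetical; it only shows what pooling along the time axis and a fully connected layer compute.

```python
def global_max_pool_1d(sequence):
    """Pool along the time axis.

    `sequence` is a list of time steps, each a list of channel values.
    The result holds one maximum per channel.
    """
    return [max(channel) for channel in zip(*sequence)]


def fully_connected(x, weights, bias):
    """Compute W @ x + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Two time steps with three channels each (toy stand-in for LSTM outputs).
hidden = [[0.1, -0.5, 0.3],
          [0.4, 0.2, -0.1]]
pooled = global_max_pool_1d(hidden)
output = fully_connected(pooled, [[1, 0, 0], [1, 1, 1]], [0.0, 0.5])
```

Here `pooled` contains the per-channel maxima over both time steps, so the sequence of any length collapses to a single vector before the fully connected layer.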
LSTM networks are already a known method used to transform measurements into tomographic images. In general, they cope quite well with measurement noise, but there are situations when noise caused by external factors, usually related to the method of carrying out the measurements, the test object, and the conditions in which the measurements take place, seriously distorts the tomographic images. In such situations, noise reduction autoencoders can be used. In this study, we present the use of the LSTM autoencoder, whose task is to reduce the noise contained in the ECT measurement data.
Autoencoders are a type of neural network used for unsupervised learning. They consist of an encoder network that maps the input data into a lower- or higher-dimensional representation and a decoder network that reconstructs the original data from that representation. The autoencoder aims to learn a representation of the input data that is efficient, or compact, while still preserving the most important information. One common use case for autoencoders is data denoising. Typically, the autoencoder is trained on a dataset of clean input data and then used to remove noise from new, noisy input data. The encoder part of the autoencoder learns to extract the important features from the clean input data, and the decoder part learns to reconstruct the clean data from these features.

Matlab code defining the layers of the classic LSTM network.
In this case, the autoencoder was trained with noisy data as input and clean data as a reference. The goal of the denoising autoencoder was to learn a representation of the input data that is robust to noise while preserving the most important information. The training process of a denoising autoencoder is like that of a regular autoencoder, with one key difference: the input data are corrupted with noise before being fed into the network. The network is then trained to reconstruct the original, clean data from the noisy input. During training, the encoder learns to extract the important features from the noisy input data, and the decoder learns to reconstruct the clean data from these features. This process helps the autoencoder learn a robust representation of the input data that is less sensitive to noise. Once trained, the denoising autoencoder can remove noise from new, unseen input data by passing them through the encoder and then using the decoder to reconstruct the denoised data. Denoising autoencoders can also be used for image or signal restoration, inpainting, and anomaly detection.
Figure 4 shows the two-step process of generating ECT images using denoised measurements. In stage 1, we train an autoencoder consisting of an encoder and a decoder. The encoder includes the input layer, i.e., a sequence of 120 strongly noisy (10% Gaussian noise) measurements, the LSTM layer, and the latent layer. The decoder is made up of the LSTM layer, the fully connected layer, and the output regression layer, which is also a vector of 120 real values (cleaned measurements) like the input layer.

Two-stage LSTM architecture using denoising autoencoder and LSTM network decoding activations into images.
After training the autoencoder, an additional training set is generated using the encoder. The noisy measurements are fed into the encoder input again, and the activations of the fully connected latent layer are recorded at its output. Figure 5 shows the architecture of the autoencoder in the form of Matlab code. The “latentLayerDensity” variable in the “fcLatent” layer sets the number of neurons in the latent layer, which can be adjusted to maximize the efficiency of the autoencoder.

Matlab code defining the layers of the autoencoder divided into encoderLayers and decoder layers.
In stage 2, we train the LSTM network, which has encoder-generated activations at the input and a collection of 20,445 voxels that make up the 3D tomographic image at the output. Figure 6 shows the LSTM network architecture from Stage 2 in Matlab code.
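The data flow between the two stages can be sketched as follows. This is an illustration of how the stage 2 training pairs are assembled, not the authors' implementation: the trained encoder is replaced by a hypothetical stand-in (a fixed random linear map), and all names are assumptions. Only the dimensions (120 measurements, 30 latent activations, 20,445 voxels) come from the text.

```python
import random

N_MEAS, N_LATENT, N_VOXELS = 120, 30, 20445


def make_stub_encoder(seed=0):
    """Hypothetical stand-in for the trained encoder: 120 inputs -> 30 activations."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(N_MEAS)]
               for _ in range(N_LATENT)]

    def encode(x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return encode


def build_stage2_set(encode, noisy_measurements, reference_images):
    """Pair the latent activations of each noisy case with its reference image."""
    return [(encode(x), image)
            for x, image in zip(noisy_measurements, reference_images)]


rng = random.Random(1)
noisy_cases = [[rng.random() for _ in range(N_MEAS)] for _ in range(3)]
images = [[0.0] * N_VOXELS for _ in range(3)]  # placeholder reference images
pairs = build_stage2_set(make_stub_encoder(), noisy_cases, images)
```

Each pair holds the input and the target of the stage 2 network, so the second LSTM network never sees the raw measurements, only the encoder's latent activations.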

Matlab code defining the layers of the LSTM network decoding activations of the latent layer into images.
The architecture of neural networks and the parameters of the training process should be selected experimentally. In this case, adding normalization layers reduced the quality of the models. To verify the influence of the number of neurons in the latent layer of the autoencoder, the two-stage model, consisting of the autoencoder and the LSTM network, was trained 200 times in a loop. Starting at 30, the number of neurons was increased by 5 in each subsequent iteration, so the value of the “latentLayerDensity” variable ranged from 30 to 1025. These tests showed that 30 activations in the latent layer gave the best results. Early stopping was used to counter overfitting: training ended if the mean squared error on the validation set did not decrease for 5 consecutive iterations. In this case, the validation set is the same as the test set.
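The sweep over latent layer sizes and the stopping rule can be written down compactly. The sketch below is an assumption about how such a loop might be organized: the names `latent_sizes` and `should_stop` are hypothetical, and "did not decrease" is interpreted as "none of the last 5 validation errors is lower than the best error seen before them".

```python
# 200 runs, starting at 30 neurons and adding 5 per run.
latent_sizes = [30 + 5 * k for k in range(200)]


def should_stop(val_mse_history, patience=5):
    """Return True when the last `patience` validation errors brought no improvement."""
    if len(val_mse_history) <= patience:
        return False
    best_before = min(val_mse_history[:-patience])
    return min(val_mse_history[-patience:]) >= best_before


# Toy validation curves, invented for illustration only.
stalled = [1.0, 0.9, 0.8, 0.8, 0.81, 0.8, 0.82, 0.8]
improving = [1.0, 0.9, 0.95, 0.96, 0.97, 0.98, 0.85]
```

With these toy curves, `should_stop(stalled)` is true because the error has stayed at or above 0.8 for five checks, while `should_stop(improving)` is false because the final value improves on the earlier best.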
In this research, the first model is a one-stage model with the classic LSTM network architecture, which converts the measurements directly into an image. There is no separate denoising process here, so to compensate, the neural network was equipped with two LSTM layers with a large number of hidden units. The second model is a two-stage model and requires separate training of the autoencoder and of the LSTM network with a reduced number of inputs. The inputs are activations of the latent layer of the autoencoder, which in fact carry the denoised information. The number of hidden units in the LSTM layers of both the autoencoder and the decoding network was therefore half of that in the first model. This approach ensures that both models are comparable in terms of the complexity of their architectures. In addition, numerous experiments in which LSTM models were trained with various parameters, including the number of hidden neurons, showed that increasing the number of neurons from 1024 to 2048 improved the quality of the trained model only slightly. There is therefore no justification for increasing the number of hidden units above 2048.
In this section, the quality of the obtained model is assessed on the basis of quantitative indicators and qualitative observations. Quantitative indicators can be used only when reference images are available, which means that synthetically generated data must be used. The qualitative assessment was made by visually comparing reconstructions obtained from real measurements with images of the reactor, in which various configurations of pipes immersed in water were placed.
Model quality assessment
To assess the effectiveness of the proposed two-stage LSTM architecture, which uses a denoising autoencoder and an LSTM network decoding activations into tomograms, reconstructions obtained using the trained models were compared with reference (ideal) images. Figure 7 shows the differences between reconstructions from a denoising autoencoder and tomograms made from both noisy and noise-free measurements. As mentioned earlier, the trained LSTM models generate spatial (3D) reconstructions. However, to facilitate their comparison, Fig. 7 shows top cross-sectional views. The first row, titled “pattern”, contains the reference images. The second row, “no noise”, shows tomographic reconstructions obtained with the classical LSTM network (see Figs 2 and 3), which transforms noise-free validation measurements into images. The next row, titled “noised”, contains reconstructions made with the same LSTM network model as in the “no noise” row, but from measurements contaminated with 5% Gaussian noise. The level of Gaussian noise added to synthetic training data in tomography can vary depending on the specific application and the features of the object under study. Typically, the choice of noise level depends on the research objective, the type of tomographic scanner, and the desired quality of reconstruction. In practice, the standard deviation of Gaussian noise can range, for example, from 0.5% to 10% of the maximum signal value in the measurement set. However, the exact noise values depend on multiple factors and may differ based on specific conditions and studies [23–25].
White noise, so called because of its spectral flatness across the entire sampling bandwidth, was generated using the Matlab function awgn(). Additive White Gaussian Noise (AWGN) is commonly used to model thermal noise, such as that produced by the movement of electrons in a receiver’s RF front end. The noise is added to the signal, and its amplitude follows a normal (Gaussian) probability distribution.
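Because awgn() is parameterized by a signal-to-noise ratio in decibels, a percentage noise level has to be converted before it can be passed to the function. The sketch below shows one possible conversion. It is an assumption, not the authors' procedure: the noise standard deviation is taken as a fraction of the maximum absolute signal value, and the function names are hypothetical.

```python
import math


def noise_sigma(signal, level):
    """Noise standard deviation, assumed to be `level` times the largest absolute value."""
    return level * max(abs(s) for s in signal)


def equivalent_snr_db(signal, level):
    """SNR in dB that corresponds to the given percentage noise level."""
    signal_power = sum(s * s for s in signal) / len(signal)
    return 10.0 * math.log10(signal_power / noise_sigma(signal, level) ** 2)


# For a flat unit signal, 10% noise is about 20 dB and 5% noise about 26 dB.
flat = [1.0] * 120
snr_10 = equivalent_snr_db(flat, 0.10)
snr_05 = equivalent_snr_db(flat, 0.05)
```

For real capacitance vectors the result depends on the signal shape, since the mean power of the vector is generally lower than the square of its maximum.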
The titles of the last three rows (denoised: $L_n = 30$, $L_n = 35$, and $L_n = 40$) give the number of activations of the autoencoder latent layer (“latentLayerDensity” in Figs 4 and 5). The “denoised” images contain reconstructions obtained using measurements contaminated with 5% Gaussian noise. Note that the LSTM neural models were trained with measurements twice as noisy (10%) as those used for the reconstructions (5%).

Comparison of 2D reconstructions received through a denoising autoencoder with tomograms based on noisy and non-noisy measurements.
An interesting observation results from comparing reconstructions denoised with encoders whose latent layer has 30 activations ($L_n = 30$) and 35 activations ($L_n = 35$). It turns out that a small change in the number of neurons in the latent layer can greatly affect the quality of noise reduction. Another interesting finding is that a further increase of five neurons (up to $L_n = 40$) again significantly improved the efficiency of the encoder.
Figure 8 shows the same Case #4 presented in Fig. 7 but reconstructed as a 3D image. Spatial (3D) tomograms are more valuable than 2D cross-sections only during the dynamic inspection on a computer screen when they can be rotated and zoomed. In static publications that do not offer image editing, 2D tomograms are more readable.

3D reconstructions from a denoising autoencoder with tomograms from noisy and non-noisy measurements – Case #4.
Four indices were used to compare the quality of ECT reconstructions obtained with the new method: mean square error (MSE), peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and image correlation coefficient (ICC). All the above metrics are described in detail in the publication [16]. The MSE of the tomogram is calculated according to the formula
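In their standard forms, which are assumed here to correspond to the formulas used in [16], the MSE and the PSNR can be written as follows. The symbols are assumptions: $N$ is the number of voxels, $y_i$ is the value of the $i$-th voxel in the reference image, $\hat{y}_i$ is the corresponding value in the reconstruction, and $\mathrm{MAX}$ is the largest possible voxel value.

```latex
\begin{equation}
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
\end{equation}

\begin{equation}
\mathrm{PSNR} = 10 \log_{10} \left( \frac{\mathrm{MAX}^2}{\mathrm{MSE}} \right)
\end{equation}
```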
The SSIM indicator is obtained from Eq. (6)
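The widely used definition of SSIM, assumed here to match Eq. (6), compares the means, variances, and covariance of the reference image $y$ and the reconstruction $\hat{y}$. The symbols $\mu$, $\sigma^2$, and $\sigma_{y\hat{y}}$ denote these statistics, and $c_1$, $c_2$ are small stabilizing constants; all of them are notational assumptions.

```latex
\begin{equation}
\mathrm{SSIM}(y, \hat{y}) =
\frac{\left( 2 \mu_y \mu_{\hat{y}} + c_1 \right) \left( 2 \sigma_{y\hat{y}} + c_2 \right)}
     {\left( \mu_y^2 + \mu_{\hat{y}}^2 + c_1 \right) \left( \sigma_y^2 + \sigma_{\hat{y}}^2 + c_2 \right)}
\tag{6}
\end{equation}
```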
The last indicator used is the image correlation coefficient (ICC). It can be derived through
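A common definition of the image correlation coefficient is the Pearson correlation between the voxel values of the two images. It is given here as an assumption about the formula used, with $y_i$ and $\hat{y}_i$ denoting reference and reconstructed voxel values and $\bar{y}$, $\bar{\hat{y}}$ their means.

```latex
\begin{equation}
\mathrm{ICC} =
\frac{\sum_{i=1}^{N} \left( y_i - \bar{y} \right) \left( \hat{y}_i - \bar{\hat{y}} \right)}
     {\sqrt{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2 \,
            \sum_{i=1}^{N} \left( \hat{y}_i - \bar{\hat{y}} \right)^2}}
\end{equation}
```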
Table 1 shows a summary of the average quality indicators for 6,000 test cases. Based on the obtained results, it can be concluded that the quality of the denoised reconstructions exceeded the quality of tomograms obtained without the use of an autoencoder.
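The way such averages are computed can be sketched in a few lines. The functions below use the standard definitions of MSE, PSNR, and the Pearson-type ICC, which are assumed rather than taken from [16]; SSIM is omitted for brevity, and the names and toy data are hypothetical.

```python
import math


def mse(ref, rec):
    """Mean squared error between reference and reconstructed voxel values."""
    return sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)


def psnr(ref, rec, peak=None):
    """Peak signal-to-noise ratio in dB; `peak` defaults to the reference maximum."""
    peak = max(ref) if peak is None else peak
    err = mse(ref, rec)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


def icc(ref, rec):
    """Image correlation coefficient (Pearson correlation of voxel values)."""
    mean_ref = sum(ref) / len(ref)
    mean_rec = sum(rec) / len(rec)
    num = sum((a - mean_ref) * (b - mean_rec) for a, b in zip(ref, rec))
    den = math.sqrt(sum((a - mean_ref) ** 2 for a in ref)
                    * sum((b - mean_rec) ** 2 for b in rec))
    return num / den


def average(metric, references, reconstructions):
    """Average one metric over all test cases."""
    values = [metric(r, p) for r, p in zip(references, reconstructions)]
    return sum(values) / len(values)


# Two toy four-voxel cases, invented for illustration.
refs = [[0.0, 0.0, 1.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
recs = [[0.1, 0.0, 0.9, 1.0], [0.8, 0.1, 1.0, 0.0]]
mean_mse = average(mse, refs, recs)
mean_icc = average(icc, refs, recs)
```

Lower MSE and higher PSNR and ICC indicate a reconstruction closer to the reference, which is the sense in which the denoised and non-denoised models are compared.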
Comparison of reconstruction quality indicators
Figure 9 compares reconstruction images made from real measurements. Reconstructions (f–j) were performed using the classic LSTM model (see Fig. 2). Images (k–o) were generated using the two-stage noise reduction model with an autoencoder (see Fig. 4).

Reconstructions from real measurements performed using raw and denoised measurements.
Five different configurations of plastic tubes filled with air were placed in the tank. It can be seen that the biggest problem is the reconstruction of the pipe placed in the middle of the reactor. Comparing tomograms obtained using a classic neural network with images generated by a model with a denoising autoencoder, we can conclude that the autoencoder contributes to improving the quality of reconstruction. Images obtained from denoised data are more accurate and devoid of most artifacts.
Many studies show that machine learning methods are very good at solving the tomographic inverse problem. The purpose of this research is neither to compete with other methods nor to present a method that is better than others. The real goal is to answer the question of whether the two-stage approach, based on an autoencoder and a network trained on data generated by that autoencoder, is more effective than the classical approach. The fact that LSTM layers were used and not, e.g., convolutional networks, a U-Net architecture, or residual networks is of secondary importance. This also explains why we do not discuss all the parameters of the network structure or the learning process in detail. What matters here is the comparability of the two concepts: the classic one-stage model and the new two-stage model with an autoencoder.
This study presented the concept of using an autoencoder to denoise measurements in electrical capacitance tomography. A two-stage model was proposed in which the autoencoder is trained first. Then, using the encoder, training cases are generated in which the input is the noisy measurements and the output is the activations of the latent layer, which is also the last layer of the encoder. In the next step, an LSTM neural network is trained that accepts the activations of the latent layer as input and produces reconstruction images as output. Denoising consists of transforming the noisy measurements with the trained encoder and then converting the resulting activations into an improved tomographic image. The method was tested by subjective comparisons of images obtained with the new method and with the classic LSTM network, and by means of the MSE, PSNR, SSIM, and ICC indices. The obtained results confirm the high effectiveness of the new method. In future research, the authors plan to use noise-reducing autoencoders in other models, such as convolutional neural networks.
