Novel U-net based deep neural networks for transmission tomography

Abstract

BACKGROUND:

The fusion of computer tomography and deep learning is an effective way of achieving improved image quality and artifact reduction in reconstructed images.

OBJECTIVE:

In this paper, we present two novel neural network architectures for tomographic reconstruction with reduced effects of beam hardening and electrical noise.

METHODS:

In the case of the proposed novel architectures, the image reconstruction step is located inside the neural networks, which allows the network to be trained by taking the mathematical model of the projections into account. This strong connection enables us to enhance the projection data and the reconstructed image together. We tested the two proposed models against three other methods on two datasets. The datasets contain physically correct simulated data, and they show strong signs of beam hardening and electrical noise. We also performed a numerical evaluation of the neural networks on the reconstructed images according to three error measurements and provided a scoring system of the methods derived from the three measures.

RESULTS:

The results showed the superiority of the novel architecture called TomoNet2. Compared to the FBP method, TomoNet2 improved the average Structural Similarity Index from 0.9372 to 0.9977 and from 0.9519 to 0.9886 on the two datasets. This network also yielded the best result in 79.2 and 53.0 percent of the cases on the two datasets according to Peak-Signal-to-Noise-Ratio, compared to the other improvement techniques.

CONCLUSIONS:

Our experimental results showed that the reconstruction step used in skip connections in deep neural networks improves the quality of the reconstructions. We are confident that our proposed method can be effectively applied to other datasets for tomographic purposes.

Keywords

Computed Tomography Deep Learning U-net FBP

1 Introduction

Computer tomography is a well-known set of tools for the non-destructive investigation of the internal structure of an unknown object [13]. In transmission X-ray tomography the object is located between the X-ray source and the detector. This enables us to measure the attenuating characteristics of the materials of the object at the detector. The attenuation of the X-rays passing through the object depends on the linear attenuation coefficients of its materials. If the measurements are made in many different directions and the geometry of the projections (i.e., the path of the beams) is known, then the object under study can be reconstructed from the measured projection data [8, 13].

Our aim was to provide highly accurate methods for the case of tomography when the projection data shows strong signs of beam hardening and random electrical noise. Beam hardening is a physical phenomenon that causes serious difficulties in tomography. It occurs because the lower energy photons of the polychromatic radiation are absorbed with a higher probability in the material of the studied object than the higher energy photons. Therefore, when polychromatic radiation passes through an object, it loses a greater proportion of its lower energy photons, so the ratio of higher energy photons increases relative to the lower energy photons. As this happens, one can say that the beam becomes harder, which means that the inner layers of the object interact with radiation having a different characteristic. Beam hardening artifacts appear in two forms: cupping (the interior of the object appearing to be darker) and dark or light streaks (see Figure 2). Electrical noise is a random factor in the measurement process causing random changes of the measured values. In the reconstruction, it causes streaks and random changes in the pixel values.

In order to cope with these measurement errors, we created a database with a virtual CT scanner, producing realistic simulations of projection data and we designed two novel deep-learning based methods for reducing artifacts in the reconstructions.

One of the sub-fields of artificial intelligence, called deep learning [14], has achieved outstandingly good results in the field of computer vision and digital image processing in the last decade. When using deep neural networks, the word “deep” means that the network structure contains multiple hidden layers [23]. In this paper, we used an outstandingly versatile and useful tool of deep learning called U-net [22].

The literature provides a variety of options (e.g., in [24]) for the combination of computer tomography and deep learning methods. There are approaches for the reduction of various artifacts, for example beam hardening [7, 20] and metal artifacts [4, 28]. Moreover, the researchers in [3, 26] were interested in performing the reconstruction with neural networks, while others applied deep learning as a pre- or post-processing tool before or after the reconstruction [2, 17].

In this paper, we present two novel deep convolutional neural network architectures for image reconstruction from projections. These methods provide end-to-end solutions, taking projection data as input and producing reconstructed images as output. In comparison, the end-to-end solution in [26] has a simpler architecture than ours and contains a fully connected layer. The end-to-end reconstruction architecture in [3] is more complicated, with three well-separated parts. The first part works on the projection data with convolution layers. The second part performs the reconstruction, while the third part works on the reconstructed images with convolution and deconvolution layers. This method maintains only a weak connection between the parts working with the projection data and the reconstructed images. Our methods were built on the U-net structure by incorporating the reconstruction in the skip connections. In this way, the reconstruction is located neither before nor after the main part of the network, but in the middle of a U-net structure. The connection is therefore stronger, which makes the training easier and more efficient.

This paper is structured as follows. In Section 2 we formulate the model of tomographic reconstruction that we used and provide the most important equations. Next, we describe the evaluation datasets in Section 3 and the architectures in Section 4. After that, we detail the training process of the neural networks in Section 5 and present our results in Section 6. Finally, we summarise the main points in Section 7 and draw conclusions in Section 8.

2 Computer tomography

Let $f : \mathbb{R}^{2} \to \mathbb{R}$ be the attenuation coefficients of a material in a two-dimensional cross-section of the examined object. Then, the Radon transform gives the projections of f as $[R f](\alpha, t) = \int_{-\infty}^{\infty} f(t \cos\alpha - q \sin\alpha,\; t \sin\alpha + q \cos\alpha) \, dq .$ (1) In (1) the (α, t) pair determines a line in the two-dimensional cross-section, with α representing the direction of the line, while t is the signed distance of the line from the origin. With this notation, the reconstruction task can be written as solving $[R f'](\alpha, t) = [R f](\alpha, t)$ for f′, where the solution f′ satisfies the equation for all measured (α, t) pairs.
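As an illustration of (1), the line integral can be approximated numerically for a simple test object. The sketch below is not part of the original method; the function names, the unit-disc test object and the integration limits are assumptions chosen only to make the parametrisation of the line by (α, t) concrete.

```python
import numpy as np

def radon_line_integral(f, alpha, t, q_max=2.0, n_samples=4001):
    """Approximate [Rf](alpha, t) from equation (1) with the trapezoidal rule.

    The integration range is truncated to [-q_max, q_max], which is enough
    when f vanishes outside a disc of radius q_max (an assumption).
    """
    q = np.linspace(-q_max, q_max, n_samples)
    x = t * np.cos(alpha) - q * np.sin(alpha)
    y = t * np.sin(alpha) + q * np.cos(alpha)
    values = f(x, y)
    dq = q[1] - q[0]
    return float(np.sum(0.5 * (values[:-1] + values[1:])) * dq)

def unit_disc(x, y, mu=1.0):
    """Hypothetical test object: constant attenuation mu inside the unit disc."""
    return np.where(x ** 2 + y ** 2 <= 1.0, mu, 0.0)

# For the unit disc the exact projection is the chord length 2 * sqrt(1 - t^2).
projection = radon_line_integral(unit_disc, alpha=0.3, t=0.5)
exact = 2.0 * np.sqrt(1.0 - 0.5 ** 2)
```

For a rotationally symmetric object such as this disc, the result does not depend on α, which gives a quick sanity check of the parametrisation.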

Practically, the object of study is located between the X-ray source (S) and the detector (D), so f is nonzero only in a bounded region. Therefore, the range of integration in (1) can be restricted to the [S, D] interval. The I_D and I_S values are measured by CT scanners, where I_D is the number of X-ray photons sensed by the detector and I_S corresponds to the number of photons leaving the source. The relation between I_D, I_S and (1) is given by the Beer-Lambert law as $I_{D}(\alpha, t) = I_{S} \exp(-[R f](\alpha, t))$. After rearranging: $[R f](\alpha, t) = -\ln\left(\frac{I_{D}(\alpha, t)}{I_{S}}\right) .$ (2)
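Taking the logarithm of the Beer-Lambert law turns measured photon counts into line integrals of the attenuation. A minimal sketch of this conversion follows; the function name and the photon counts in the example are hypothetical and serve only as an illustration.

```python
import math

def line_integral_from_counts(detected, emitted):
    """Return [Rf] = -ln(I_D / I_S) for one ray.

    detected: photon count at the detector (I_D)
    emitted:  photon count leaving the source (I_S)
    """
    if detected <= 0 or emitted <= 0:
        raise ValueError("photon counts must be positive")
    return -math.log(detected / emitted)

# Hypothetical ray: the object attenuates the beam with a line integral of 1.5.
emitted_count = 1000.0
detected_count = emitted_count * math.exp(-1.5)
recovered = line_integral_from_counts(detected_count, emitted_count)
```

A detector reading of zero photons has no finite logarithm, so a practical pipeline needs some policy for such rays; the sketch simply rejects them.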

A common algorithm for creating reconstructions from projections is the Filtered Back Projection (FBP) [13]. The FBP algorithm contains a filtering step, traditionally performed in the Fourier domain. There are a number of well-examined basic filters in the literature [8]. In Section 6 we report results obtained with different filters in the FBP algorithm. These results helped us decide which FBP filter to use in the final comparison when the reconstruction step is located outside of the neural network.

The FBP is fast and can give images of good quality but requires a high number of projections. The FBP algorithm was not designed to deal with the difficulties caused by physical phenomena such as beam-hardening (i.e., it assumes perfect measurements), therefore it creates various artifacts on the reconstructed image if it is supplied with distorted projection data.
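The two steps of the FBP, filtering in the Fourier domain and back projection, can be sketched for a parallel beam geometry as follows. This is a minimal NumPy illustration and not the implementation used in the experiments: the function name, a detector spacing of one pixel, linear interpolation and the disc phantom are all assumptions.

```python
import numpy as np

def fbp_reconstruct(sinogram, angles, size, window="hann"):
    """Filtered back projection for parallel beams (illustrative sketch).

    sinogram: array (n_angles, n_det) of line integrals, detector spacing 1 pixel
    angles:   projection angles in radians, covering [0, pi)
    size:     side length of the square output image in pixels
    """
    n_angles, n_det = sinogram.shape
    n_pad = 2 * int(2 ** np.ceil(np.log2(n_det)))  # zero-pad against wrap-around
    freqs = np.fft.fftfreq(n_pad)                  # cycles per detector sample
    response = np.abs(freqs)                       # Ram-Lak (ramp) filter
    if window == "hann":
        # Hann window that falls to zero at the Nyquist frequency (0.5).
        response = response * 0.5 * (1.0 + np.cos(2.0 * np.pi * freqs))
    spectrum = np.fft.fft(sinogram, n=n_pad, axis=1)
    filtered = np.real(np.fft.ifft(spectrum * response, axis=1))[:, :n_det]

    coords = np.arange(size) - (size - 1) / 2.0
    x, y = np.meshgrid(coords, coords)
    det_center = (n_det - 1) / 2.0
    det_index = np.arange(n_det)
    image = np.zeros((size, size))
    for row, alpha in zip(filtered, angles):
        # Detector position hit by the ray through each pixel at this angle.
        t = x * np.cos(alpha) + y * np.sin(alpha) + det_center
        image += np.interp(t.ravel(), det_index, row,
                           left=0.0, right=0.0).reshape(size, size)
    return image * np.pi / n_angles

# Hypothetical phantom: a centred disc of radius 20 pixels with attenuation 1.
n_det, size, radius = 129, 65, 20.0
angles = np.linspace(0.0, np.pi, 180, endpoint=False)
t_det = np.arange(n_det) - (n_det - 1) / 2.0
profile = 2.0 * np.sqrt(np.clip(radius ** 2 - t_det ** 2, 0.0, None))
sinogram = np.tile(profile, (len(angles), 1))
recon = fbp_reconstruct(sinogram, angles, size, window="hann")
```

With ideal, noise-free projections such as these, the interior of the disc is recovered close to its true value. Distorted projection data passes through the same linear steps unchanged, which is why the artifacts end up in the image.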

3 Datasets

We used computer-simulated artificial data in this study, generating the projections of software phantoms. Our physically correct projection data was generated using the GATE software [10, 11] without Compton scattering. We have set a parallel beam geometry and photon-counting detectors in the GATE model. 400 000 photons left the source in each projection angle. The projection data were generated with 596 angles each containing 362 detector points. The projection data was reconstructed as 256x256 images.

We have split our versatile dataset into two large parts, which we call Dataset A and Dataset B for simplicity. From this point on, we refer to these two groups of data as different datasets. The two datasets differ in the shape and material composition of the phantom objects.

3.1 Dataset A

In the case of Dataset A, we used five different X-ray sources, assigned in equal proportions to 5 000 phantoms, to be more realistic. We used pre-filtering during the generation of source characteristics as it is a common way to reduce beam hardening [12]. The characteristics of the sources were calculated by [21]. The sources differ only in the thickness of a pre-hardening aluminum filter. Figure 1 shows the characteristics of the sources. Each phantom is present in the dataset with only one selected source. In addition, Figure 2 shows one phantom reconstructed with each of the used sources, which makes the effect of the different sources more visible. Figure 3 shows the intensity value profiles along the yellow lines in Figure 2. As one can see, the effects of beam hardening decrease with thicker aluminum filters, while the electrical noise increases.

Fig. 1

The characteristics of the X-ray sources with different aluminum thicknesses used in our experiments.

Fig. 2

The ground truth phantom and its reconstructions from distorted projection data acquired using different pre-hardening filters.

Fig. 3

Intensity value profiles along the yellow lines in Figure 2.

The phantoms were generated as a combination of randomly chosen shapes, i.e., circles, ellipses, and rectangles. The objects may contain each other but partial overlap was not allowed. One object can consist of only one material from the following set: air, spine bone, rib bone, skull, blood, cartilage, kidney, kidney stone, and adipose. Moreover, the objects vary in size and location; these parameters were chosen randomly during phantom generation. The phantoms were generated in 4 groups based on their object counts. Each group contained a quarter of all the images (i.e., 1250 images), and the groups were generated with a maximal object count of 4, 5, 6, or 7 objects. Figure 4 shows a few examples from this dataset.

Fig. 4

Examples from the Dataset A.

For a ground truth projection set of Dataset A, we calculated analytically correct projections with the help of [27] without any noise. We set the values of the materials to their mass linear attenuation coefficient. To this end, we performed specific GATE measurements, where the quotient of the detector intensity behind the object and the source intensity (in other words, the transmission value) was in the [0.4995 ; 0.5005] interval. After the measurements, we calculated the linear attenuation coefficient of the materials for each X-ray source based on (2), and divided it by the density of the material.

Before training, we applied (2) to the projection data. Then, we normalized the intensities to be in the [0, 1] interval. We used five different scaling factors corresponding to the five sources.

3.2 Dataset B

A second dataset was produced using hand-drawn shapes as templates from [1] with the same simulation method as Dataset A. This dataset was excluded from training and was only used in the testing of the methods. Figure 5 shows a few examples from this dataset, which consists of 66 phantoms with various non-basic geometrical shapes. Results were generated with the source pre-filtered by 5 mm aluminum. The 66 icon phantoms can be partitioned into three groups with 22 phantoms in each group. The object or objects of the phantoms in the first group (I) consist of rib bone while the background is always air. The material of the objects in the second group (II) can be air, spine bone, rib bone, skull, blood or adipose, while in the third group (III) it can be air, adipose, teeth, skull, ribs spongiosa or rib bone. This means that in the third group there are two materials (teeth and ribs spongiosa) which were not used during the training of the networks. In each group, random cracks were created and every phantom appears with and without the cracks. The cracks consist of air. Here, we also performed normalization with a distinct scaling factor in each group. The dimensions of the projection data and reconstructed images were the same as with Dataset A.

Fig. 5

Examples from the Dataset B.

4 Network architectures

We examined the merits of our two novel architectures, called TomoNet2 and TomoNet3, in this study, and compared them with SinoNet, ReconNet and TomoNet1. All of these networks are based on the special deep, fully convolutional neural network structure called U-net [22]. The structures of the networks are shown in Fig. 6.

Fig. 6

The structures of the used deep convolutional neural networks.

The structures of SinoNet and ReconNet are the simplest ones. They have a U-shape consisting of the contracting path (left side) and the expansive path (right side), and they also have an additional average pooling connection which is concatenated to the contracting path. The difference between SinoNet and ReconNet lies in their inputs and outputs. SinoNet works only on the projection data. The reconstruction can be carried out as an additional step after processing the projection data. ReconNet, on the other hand, is applied after the reconstruction as a post-processing step. Results using only projection data or only reconstructed images in a U-net-like structure can be found in [2, 19] and [5, 20], respectively.

TomoNet1 starts with the SinoNet structure, but the authors added new elements corresponding to the FBP. The new elements are similar to the construct in [6, 26] and exactly the same as FBP_U_net in [18]. The first element is a convolution with a Ram-Lak filter in projection space. The second element is the differentiable inverse Radon transform, which takes part in the training as a neural network layer. The first and second elements contain only non-trainable parameters. The third element is a ReLU activation layer. With the ReLU the network favors positive values, which suits reconstruction tasks. TomoNet1 accepts projection data as input and provides reconstructed images as output. The main novelty of this network was that the reconstruction and the network are not separate processing steps. The network contains the FBP as a differentiable layer, so the training process includes the effects of the FBP.
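The first of these elements, the Ram-Lak convolution in projection space, has a well-known closed-form spatial kernel. The sketch below shows it together with a ReLU. It is an illustration under stated assumptions (unit detector spacing, hypothetical function names and kernel width), not the layer implementation of TomoNet1, and the inverse Radon element is omitted.

```python
import numpy as np

def ram_lak_kernel(half_width):
    """Spatial Ram-Lak kernel for unit detector spacing.

    h[0] = 1/4, h[n] = -1 / (pi * n)^2 for odd n, and 0 for even n != 0.
    """
    n = np.arange(-half_width, half_width + 1)
    kernel = np.zeros(n.shape, dtype=float)
    kernel[n == 0] = 0.25
    odd = (n % 2) != 0
    kernel[odd] = -1.0 / (np.pi * n[odd]) ** 2
    return kernel

def filter_projection(projection, half_width=64):
    """Convolve one projection row with the fixed (non-trainable) kernel."""
    return np.convolve(projection, ram_lak_kernel(half_width), mode="same")

def relu(values):
    """ReLU activation: negative values are set to zero."""
    return np.maximum(values, 0.0)

# Hypothetical projection of a disc with radius 20 pixels and attenuation 1.
t_det = np.arange(129) - 64.0
disc_projection = 2.0 * np.sqrt(np.clip(20.0 ** 2 - t_det ** 2, 0.0, None))
filtered_row = filter_projection(disc_projection)
```

Because the kernel is fixed, the convolution carries no trainable parameters, yet gradients can still flow through it to the layers that precede it.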

In the case of TomoNet2, we combined SinoNet, ReconNet and TomoNet1 in a well-organized structure. The three elements corresponding to the FBP are located between the contracting and expansive paths at every level of the U-shape. Accordingly, the projection data of the contracting path is transformed into reconstructed images at the expansive path. TomoNet2 uses projection data as input and provides reconstructed images as output.

The TomoNet3 uses projection data as input and provides reconstructed images as output too, but here the contracting path and the expansive path of the U-shape are operating with projection data. Nevertheless, the FBP is present at every level forming a new expansive path.

5 Training

The training was performed on an NVIDIA GeForce RTX 2080 GPU using CUDA 10.1 and cuDNN v7. We used the TensorFlow Keras library running on Python v3.6. We used only Dataset A for training and validation. We split the 5 000 images of Dataset A into three partitions, namely, train, validation and test. The proportions were 70%, 20%, and 10%, respectively. The presented results were made with the same split, but shuffling was allowed among epochs during training and validation. An additional test was done using the phantoms of Dataset B, which remained unseen during training and validation.
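A fixed 70/20/10 partition of this kind can be sketched as below. The helper name, the seed and the use of a single random shuffle are assumptions; the text does not state how the phantoms were assigned to the partitions.

```python
import random

def split_indices(n_items, fractions=(0.7, 0.2, 0.1), seed=0):
    """Split range(n_items) into train, validation and test index lists.

    The shuffle uses a fixed seed so the same split can be reused across runs.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = int(round(fractions[0] * n_items))
    n_val = int(round(fractions[1] * n_items))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

train_ids, val_ids, test_ids = split_indices(5000)
```

For 5 000 phantoms this yields 3 500 training, 1 000 validation and 500 test cases.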

We trained the networks several times with different hyperparameters and chose the final settings leading to the best reconstruction quality. These settings are summarised in Table 1. We applied early stopping, which halted the training when the validation loss had not decreased for twenty epochs and restored the network snapshot with the best validation performance. In this way, we aimed to exclude the compromising effect of overfitting from our results.
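The early stopping rule can be expressed independently of any framework. The sketch below assumes a patience of twenty epochs and a strict decrease as the improvement criterion; the function name and the loss values are hypothetical. In Keras, comparable behaviour is available through the `EarlyStopping` callback with `patience=20` and `restore_best_weights=True`, although whether this exact configuration was used is an assumption.

```python
def early_stopping_epoch(val_losses, patience=20):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs; the snapshot of best_epoch is the one to restore.
    """
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # The loss sequence ended before the patience ran out.
    return best_epoch, len(val_losses) - 1

# Hypothetical run: the loss improves for three epochs and then plateaus.
losses = [1.0, 0.5, 0.4] + [0.45] * 30
best, stopped = early_stopping_epoch(losses)
```

In this example the best snapshot is from epoch 2 and training halts at epoch 22, twenty epochs later.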

Table 1
Hyperparameters of the deep convolutional neural networks

	SinoNet	ReconNet	TomoNet1	TomoNet2	TomoNet3
Loss function	Mean Squared Error
Optimizer	Adam
AMSGrad	True
Early Stopping	True
Activation function	ReLU
Initial learning rate	0.0001	0.001	0.0001
Epochs	48	82	43	105	107
second/Epochs	43	42	60	61	70
Batch size	43	7	43

The training and validation loss curves of all the networks can be seen in Figure 7. We marked (with a red arrow) the best state, which was saved by the early stopping. The loss and validation loss of the models decreased rapidly during the first few epochs, then they moved slowly toward the zero line. We considered the curves satisfactory, although in the case of ReconNet the loss and validation loss curves moved away from each other before the stopping point.

Fig. 7

The normalized loss and the validation loss values of the deep neural networks during the training phase. The red arrows mark the best learning state of the networks, which were used for prediction.

6 Results

We evaluated the reconstructed images by numerical measurements. During the evaluation, we used the Mean-Squared Error (MSE), the Peak-Signal-to-Noise-Ratio (PSNR) and the Structural Similarity Index (SSIM) [25]. All of these measurements compare the reconstructed image to the ground truth. The SSIM is originally limited to the (-1, 1] interval, but we treated negative (inverse) similarity as an error, therefore negative values were replaced by zeroes. With these definitions, better results are marked by a higher PSNR value, an SSIM value closer to one and an MSE value closer to zero.
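The MSE and PSNR measures and the clipping of negative SSIM values can be sketched as follows. The function names are hypothetical, and the data range of 1.0 is an assumption motivated by the normalisation of the intensities to the [0, 1] interval. The SSIM computation itself is left to an existing implementation.

```python
import numpy as np

def mse(reference, image):
    """Mean squared error between the ground truth and a reconstruction."""
    reference = np.asarray(reference, dtype=float)
    image = np.asarray(image, dtype=float)
    return float(np.mean((reference - image) ** 2))

def psnr(reference, image, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range = 1.0 is an assumption."""
    error = mse(reference, image)
    if error == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / error))

def clip_ssim(ssim_value):
    """Replace negative SSIM values by zero, as described in the text."""
    return max(0.0, float(ssim_value))

# Hypothetical 4x4 images that differ by a constant offset of 0.1.
truth = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)
example_mse = mse(truth, estimate)
example_psnr = psnr(truth, estimate)
```

Note that the PSNR is computed per image here, so the average of per-image PSNR values generally differs from the PSNR of the average MSE.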

We compared the networks to each other and we also included an FBP algorithm (referred to as FBP in the figures) using the Hann filter. We chose the Hann filter based on a test on Dataset A examining the Ram-Lak, Shepp-Logan, Cosine, Hamming, and Hann filters. Table 2 shows the results of the test. The Hann filter performed best in every category, therefore we included only the columns of the Hann filter in the upcoming tables.

Table 2
Calculated mean and standard deviation (SD) values for all tested images of Dataset A by categories with three error types. The first column corresponds to the five sources having pre-hardening aluminum filters of different thicknesses (i.e. the categories). The last row of the first column shows the overall average. The columns Ram-Lak, Shepp-Logan, Cosine, Hamming and Hann correspond to the five most common filters used in FBP

Al [mm]	Error type	Ram-Lak		Shepp-Logan		Cosine		Hamming		Hann
		Average	SD	Average	SD	Average	SD	Average	SD	Average	SD
0	PSNR	20.9252	4.0782	22.3924	4.4433	22.2455	4.4017	23.0518	4.6553	23.1297	4.6824
	SSIM	0.8441	0.0774	0.8874	0.0621	0.8836	0.0634	0.9045	0.0563	0.9066	0.0556
	MSE	0.0116	0.0092	0.0087	0.0072	0.0089	0.0074	0.0077	0.0066	0.0076	0.0065
1	PSNR	24.1043	3.9576	26.9589	4.8238	26.6664	4.7169	28.5640	5.6155	28.7827	5.7529
	SSIM	0.8856	0.0653	0.9306	0.0482	0.9267	0.0497	0.9483	0.0410	0.9504	0.0401
	MSE	0.0058	0.0054	0.0036	0.0038	0.0037	0.0039	0.0029	0.0033	0.0028	0.0032
5	PSNR	22.7526	1.8531	26.8156	2.3309	26.3808	2.2566	29.5870	3.0570	30.0248	3.2310
	SSIM	0.8500	0.0397	0.9230	0.0281	0.9165	0.0292	0.9535	0.0230	0.9573	0.0224
	MSE	0.0058	0.0027	0.0024	0.0015	0.0026	0.0015	0.0014	0.0011	0.0013	0.0011
10	PSNR	20.9796	1.6583	25.2375	1.9055	24.7761	1.8634	28.2783	2.4215	28.7768	2.5669
	SSIM	0.7988	0.0424	0.8974	0.0266	0.8882	0.0282	0.9421	0.0186	0.9478	0.0176
	MSE	0.0086	0.0031	0.0033	0.0015	0.0036	0.0016	0.0017	0.0010	0.0016	0.0010
20	PSNR	18.8682	1.3609	23.4168	1.4644	22.9103	1.4480	26.8872	1.7619	27.4971	1.8609
	SSIM	0.7233	0.0462	0.8545	0.0292	0.8417	0.0313	0.9194	0.0176	0.9282	0.0158
	MSE	0.0136	0.0041	0.0048	0.0017	0.0054	0.0018	0.0022	0.0011	0.0020	0.0010
Average	PSNR	21.5835	3.4374	24.9265	3.8725	24.5691	3.8010	27.1495	4.5412	27.5021	4.6894
	SSIM	0.8232	0.0799	0.8991	0.0503	0.8921	0.0529	0.9328	0.0406	0.9372	0.0399
	MSE	0.0091	0.0064	0.0047	0.0046	0.0050	0.0047	0.0034	0.0043	0.0032	0.0042

Table 3 and Table 4 show the average results of the methods with Dataset A and B by category. In the case of Dataset A, the categories correspond to the used sources. In Table 4 the category I denotes the phantoms with two intensities (background is air and object is rib bone). The phantoms belonging to row II contain four to six materials, which were seen by the networks during training. In the phantoms of category III the number of different materials can vary between four and six, from which two have not been seen by the networks during training. In addition, we calculated the overall average in both tables and provided the standard deviation of each category.

Table 3

Calculated mean and standard deviation (SD) values for all tested images of Dataset A by categories. The first column corresponds to the five sources having pre-hardening aluminum filters of different thicknesses (i.e. the categories). The last row of the first column shows the overall average

Al [mm]	Error type	FBP		SinoNet		ReconNet		TomoNet1		TomoNet2		TomoNet3
		Average	SD	Average	SD	Average	SD	Average	SD	Average	SD	Average	SD
0	PSNR	23.1297	4.6824	30.6019	4.1088	30.6378	4.0829	32.7588	4.2920	35.9307	4.4062	34.7219	4.7548
	SSIM	0.9066	0.0556	0.9838	0.0091	0.9913	0.0052	0.9884	0.0063	0.9970	0.0019	0.9969	0.0017
	MSE	0.0076	0.0065	0.0012	0.0009	0.0013	0.0014	0.0008	0.0009	0.0004	0.0004	0.0006	0.0007
1	PSNR	28.7827	5.7529	32.4271	4.9667	33.2776	4.6385	34.3974	5.2409	37.9697	5.1554	36.9896	5.7430
	SSIM	0.9504	0.0401	0.9887	0.0064	0.9936	0.0041	0.9912	0.0048	0.9976	0.0018	0.9974	0.0018
	MSE	0.0028	0.0032	0.0010	0.0011	0.0008	0.0011	0.0007	0.0010	0.0003	0.0005	0.0005	0.0008
5	PSNR	30.0248	3.2310	33.3140	3.8925	34.6786	3.9260	35.2207	3.9602	40.9700	3.8661	38.7901	4.4705
	SSIM	0.9573	0.0224	0.9887	0.0052	0.9946	0.0028	0.9903	0.0038	0.9986	0.0008	0.9980	0.0012
	MSE	0.0013	0.0011	0.0007	0.0005	0.0005	0.0004	0.0005	0.0005	0.0001	0.0001	0.0002	0.0004
10	PSNR	28.7768	2.5669	32.5171	4.1650	33.9986	4.8718	34.5703	4.3657	38.8459	4.8006	36.7349	5.4773
	SSIM	0.9478	0.0176	0.9869	0.0059	0.9946	0.0032	0.9911	0.0042	0.9981	0.0012	0.9972	0.0017
	MSE	0.0016	0.0010	0.0008	0.0008	0.0008	0.0016	0.0006	0.0007	0.0003	0.0004	0.0005	0.0008
20	PSNR	27.4971	1.8609	31.3519	3.9710	32.9442	5.0721	33.0634	4.3449	37.7962	4.1573	35.4217	4.9142
	SSIM	0.9282	0.0158	0.9844	0.0056	0.9940	0.0032	0.9878	0.0048	0.9972	0.0015	0.9964	0.0019
	MSE	0.0020	0.0010	0.0011	0.0009	0.0010	0.0016	0.0008	0.0008	0.0003	0.0003	0.0005	0.0007
Average	PSNR	27.5021	4.6894	31.9951	4.3509	33.0133	4.7227	33.9611	4.5637	38.1958	4.7906	36.4728	5.2715
	SSIM	0.9372	0.0399	0.9865	0.0070	0.9935	0.0041	0.9897	0.0051	0.9977	0.0016	0.9972	0.0018
	MSE	0.0032	0.0042	0.0010	0.0009	0.0009	0.0013	0.0007	0.0008	0.0003	0.0004	0.0005	0.0007

Table 4

Calculated mean and standard deviation values for all tested images of Dataset B by categories. Category I contains phantoms with two intensities. The phantoms belonging to category II are constructed only of known materials shown to the networks during training. The phantoms of category III contain at least one material not seen by the networks during training. The last row of the first column shows the overall average

Category	Error type	FBP		SinoNet		ReconNet		TomoNet1		TomoNet2		TomoNet3
		Average	SD	Average	SD	Average	SD	Average	SD	Average	SD	Average	SD
I	PSNR	22.9847	0.8083	23.4377	1.5777	7.6204	9.6137	25.2104	2.0990	26.8979	1.9009	26.0787	2.2368
	SSIM	0.9373	0.0074	0.9728	0.0070	0.9517	0.0189	0.9756	0.0068	0.9875	0.0045	0.9878	0.0051
	MSE	0.0051	0.0009	0.0048	0.0014	1.9855	5.5222	0.0033	0.0010	0.0022	0.0008	0.0027	0.0011
II	PSNR	25.0866	1.7419	25.3617	1.6025	14.6258	6.7897	26.9704	1.6068	27.6045	1.4897	27.8985	1.8768
	SSIM	0.9476	0.0088	0.9796	0.0052	0.9584	0.0139	0.9823	0.0049	0.9901	0.0020	0.9896	0.0030
	MSE	0.0033	0.0012	0.0031	0.0011	0.1230	0.2245	0.0021	0.0008	0.0018	0.0006	0.0018	0.0007
III	PSNR	28.4547	2.8829	26.2879	3.5975	21.1095	3.8025	27.1091	3.6830	28.8856	2.3983	27.1993	3.5178
	SSIM	0.9706	0.0107	0.9789	0.0081	0.9642	0.0089	0.9845	0.0044	0.9881	0.0046	0.9850	0.0052
	MSE	0.0017	0.0009	0.0031	0.0021	0.0105	0.0074	0.0027	0.0020	0.0015	0.0008	0.0025	0.0019
Average	PSNR	25.5087	3.0051	25.0291	2.6918	14.4519	8.9573	26.4300	2.7199	27.7960	2.1048	27.0588	2.7061
	SSIM	0.9519	0.0166	0.9771	0.0074	0.9581	0.0151	0.9808	0.0066	0.9886	0.0040	0.9875	0.0049
	MSE	0.0034	0.0017	0.0037	0.0018	0.7063	3.2713	0.0027	0.0014	0.0018	0.0008	0.0023	0.0014

Compared to the basic FBP, all of the networks improved the quality of the reconstructed images according to the average measurements in Table 3. Moreover, Table 3 and Table 4 show that TomoNet2 outperformed the others, but it was followed closely by TomoNet3.

For a better insight, we counted how many 1st, 2nd, 3rd, 4th, 5th and 6th places were achieved by each method. Here, 1st place means that the given method gave the best result. Figure 8 shows detailed statistics about this ranking.

Fig. 8

The achieved ranks by the networks by all of the measured errors.

In the case of Dataset A, Figure 8 shows clear dominance in favor of TomoNet2. TomoNet2 gave the best result in 396 cases according to PSNR and MSE and in 374 cases based on SSIM, out of the 500 phantoms reserved for testing. This means that TomoNet2 outperformed the other methods in at least 74.8 percent of the cases, and we argue that the true figure is closer to 79.2 percent based on PSNR and MSE. This ranking corresponds to the results seen in Table 3, which shows the dominance of one of our proposed methods in each category.

In the case of Dataset B, Figure 8 shows that TomoNet2 had the best performance with all error measurements. Note that ReconNet only made the reconstruction worse according to the PSNR and MSE diagram.

Table 5 shows the total score of the networks for all errors. The total score was calculated by the formula $\text{Total Score} = \sum_{rank = 1}^{6} N_{rank} \cdot rank$, where rank ∈ {1, 2, 3, 4, 5, 6} is the number corresponding to the rank and $N_{rank}$ is the number of test cases in which the given method achieved that rank. Therefore, a lower Total Score means better results. According to the summarized rankings, TomoNet2 provided the best quality of reconstructed images.
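The rank-weighted score can be computed directly from the per-case errors. The sketch below ranks the methods in every test case and sums the ranks per method. The function name, the example values and the handling of ties (broken by the order of the methods) are assumptions.

```python
def total_scores(errors_per_case, lower_is_better=True):
    """Sum of ranks per method; rank 1 is the best method in a test case.

    errors_per_case: list of dicts mapping method name -> error value.
    Equivalent to sum(N_rank * rank) over the ranks 1..number of methods.
    """
    scores = {}
    for case in errors_per_case:
        ordered = sorted(case, key=case.get, reverse=not lower_is_better)
        for rank, method in enumerate(ordered, start=1):
            scores[method] = scores.get(method, 0) + rank
    return scores

# Hypothetical MSE values of three methods on three test cases.
cases = [
    {"A": 0.1, "B": 0.2, "C": 0.3},
    {"A": 0.2, "B": 0.1, "C": 0.3},
    {"A": 0.1, "B": 0.3, "C": 0.2},
]
scores = total_scores(cases)
```

With six methods and no ties, the scores in one row must sum to 21 times the number of test cases. The rows of Table 5 satisfy this check: each Dataset A row sums to 10500 = 21 × 500 and each Dataset B row to 1386 = 21 × 66.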

Table 5

The total rank-weighted score gained by the methods for all errors. Smaller is better

Final score	Error type	FBP	SinoNet	ReconNet	TomoNet1	TomoNet2	TomoNet3
Dataset A	PSNR and MSE	2917	2316	1940	1641	633	1053
	SSIM	2996	2427	1634	1911	633	899
Dataset B	PSNR and MSE	251	291	396	196	111	141
	SSIM	364	256	343	198	99	126

Figure 9 and Figure 11 present examples from datasets A and B. In Figure 9, the FBP reconstruction shows strong signs of beam hardening artifacts and electrical noise. The errors are highlighted by the MSE and SSIM error maps. ReconNet produced highly homogeneous objects, but it did not preserve the shapes of the objects. SinoNet and TomoNet1 kept the edges well, but they also left a significant amount of artifacts in the images. TomoNet2 and TomoNet3 gave the best-looking results in Figure 9 and Figure 11. Note that TomoNet2 resulted in more homogeneous objects, which is preferable according to the original image in Figure 10. We argue that the error measurements are consistent with what is seen in the images.

Fig. 9

Results with example MAE and SSIM error maps from Dataset A. The SSIM maps are inverted, while the MSE maps are square-rooted for better visibility. In the error maps, a darker pixel means a better result at the given location.

Fig. 10

The original phantoms of Figure 9 and Figure 11.

Fig. 11

Results with example MAE and SSIM error maps from Dataset B. The SSIM maps are inverted, while the MSE maps are square-rooted for better visibility. In the error maps, a darker pixel means a better result at the given location.

The FBP reconstruction in Figure 11 also shows signs of beam hardening artifacts and electrical noise. This phantom is from the “two unseen materials” category and it also contains three cracks. Here again, ReconNet struggled to keep the shape of the objects and, as a result, it gave back an amorphous form. SinoNet yielded a decent result but left some streak artifacts in the image. The results of TomoNet1 and TomoNet3 are blurry, but the cracks remain recognizable throughout, especially with TomoNet1. The images reconstructed by TomoNet2 are sharper, and the surrounding low-intensity object is better preserved. However, the cracks have disappeared in some places.

7 Discussion

In this paper, we proposed two novel neural networks for image reconstruction from projections. The novelty of the networks is that they contain multiple back-projection layers, which provide a strong connection between the two main parts of the network, working on projection data and on reconstructed images, respectively.

In the literature, there are many publications that combine deep learning techniques with tomography, but these solutions operate mainly as pre- or post-processing steps added before or after the reconstruction. Our results indicate that the end-to-end solutions yielded better results. In our opinion, the reason for the superiority of the new models is that if the network sees the whole path of the data, then it can better optimize the output and learn more complex features.

The authors of [6] have already demonstrated that the FBP can be mapped into a neural network, but they did not intend to solve common tomographic problems like beam hardening in that study. The authors of [3] modified the previous network to be more general and created the ADAPTIVE-NET. Their paper contains an exhaustive study of the ADAPTIVE-NET and its variants. The ADAPTIVE-NET is made up of three parts: a Projection domain unit, a Domain transformation unit, and an Image domain unit. The Domain transformation unit is responsible for the image reconstruction and the connection of the other two parts. In comparison, the base of our proposed networks is a modified U-net, where the reconstruction is performed on the skip connections. This means that if we divided our proposed networks into a Projection domain unit and an Image domain unit using the names in [3], then we would have multiple Domain transformation units between them.

Here we list our five major observations based on Section 6 to highlight the merits of our proposed methods:

In our opinion, the stronger connection between the domains is crucial for a more general solution that can work with our raw projection data affected by strong distortions.

We also argue that the methods improved the image quality regardless of the pre-filtering used, according to Table 3, although the highest improvement was detected in the case of the 5 mm aluminum filter.

The ranking analysis revealed that the networks and the averages are reliable, because there is a clearly dominant network in each place. That is, for every place in the ranking there is a network that ends up in that place in more than 50% of the cases. The only exception was ReconNet in the case of Dataset A with PSNR and MSE, because its ranks are spread out over the rankings (i.e., 3rd place in 30.8%, 4th place in 31%, 5th place in 23%).

Testing the networks against “unknown shapes” (i.e., shapes not present when training the networks) in Dataset B caused a clear drop in the performance of all networks. The drop was strongest for ReconNet, whose PSNR and MSE values became worse than if no correction had been applied (i.e., plain FBP). As a reminder, we did not use the phantoms of Dataset B during training; we therefore concluded that ReconNet relies strongly on shape priors and could not learn patterns general enough to achieve good results on the unseen phantoms of Dataset B.

The introduction of new materials caused no problem. All methods handled the phantoms with unseen materials well, as shown by the minor difference between categories “II” and “III” in Table 4.
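
The dominance criterion used in the ranking analysis above (a network holds a given place in more than 50% of the cases) can be illustrated with a short sketch. This is not the evaluation code of the paper: the function names, the array layout (one row per test image, one column per method), and the example values are hypothetical and chosen only for illustration.

```python
import numpy as np

def rank_distribution(errors, lower_is_better=True):
    """Fraction of images on which each method takes each place.

    errors: array of shape (n_images, n_methods), e.g. per-image MSE.
    Returns an array of shape (n_methods, n_places); row m, column p is
    the share of images where method m finished in place p (0-based).
    Ties are broken by column order, which is an assumption of this sketch.
    """
    errors = np.asarray(errors, dtype=float)
    keyed = errors if lower_is_better else -errors
    # argsort of argsort yields the 0-based rank of every method per image
    ranks = np.argsort(np.argsort(keyed, axis=1), axis=1)
    n_images, n_methods = errors.shape
    dist = np.zeros((n_methods, n_methods))
    for m in range(n_methods):
        dist[m] = np.bincount(ranks[:, m], minlength=n_methods) / n_images
    return dist

def dominant_per_place(dist, threshold=0.5):
    """For each place, the method index holding it in more than
    `threshold` of the cases, or None if no method is that dominant."""
    result = []
    for place in range(dist.shape[1]):
        best = int(np.argmax(dist[:, place]))
        result.append(best if dist[best, place] > threshold else None)
    return result

# Hypothetical per-image MSE values for three methods on four images
mse_per_image = np.array([
    [0.1, 0.2, 0.3],
    [0.1, 0.3, 0.2],
    [0.1, 0.2, 0.3],
    [0.2, 0.1, 0.3],
])
dist = rank_distribution(mse_per_image)
dominant = dominant_per_place(dist)
print(dominant)
```

In this toy example the second place has no dominant method, because the best candidate holds it in exactly 50% of the cases; this is the kind of spread reported for ReconNet on Dataset A. For PSNR or SSIM, where higher values are better, `lower_is_better=False` would be passed.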

8 Conclusion

In this paper, we presented two novel deep convolutional neural network architectures (TomoNet2 and TomoNet3). These methods outperformed previously existing approaches on our heterogeneous datasets, which were affected by beam hardening artifacts and a high amount of electrical noise. Our experimental results showed that the reconstruction step used as an inner part of the deep neural networks improves the quality of the reconstructions when the projections are affected by beam hardening. SinoNet and ReconNet are not attached to the reconstruction, while in the case of TomoNet1 the connection is weaker than with TomoNet2 and TomoNet3. We are confident that both TomoNet2 and TomoNet3 will be useful on other datasets for tomographic purposes.

As an improvement, we are interested in replacing the non-trainable Ram-Lak filter with a trainable one in the future. Moreover, we are working on a more effective loss function, while also implementing the TomoNet2 and TomoNet3 networks using fan-beam geometry. We are also planning to make our dataset available online.
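
To make the role of the non-trainable Ram-Lak filter concrete, the following is a minimal sketch of ramp filtering applied to the rows of a sinogram in the Fourier domain. It is a simplified illustration, not the implementation used in the networks: the function names and the sinogram layout (one projection per row) are assumptions, the response is the ideal ramp up to a constant scale factor, and practical FBP implementations usually add zero-padding.

```python
import numpy as np

def ram_lak_response(n_detectors):
    """Ideal ramp |f| sampled on the FFT frequency grid
    (frequencies in cycles per detector sample)."""
    return np.abs(np.fft.fftfreq(n_detectors))

def filter_sinogram(sinogram, response):
    """Filter every projection (row) by multiplying its spectrum
    with the given frequency response."""
    spectrum = np.fft.fft(sinogram, axis=1)
    return np.real(np.fft.ifft(spectrum * response, axis=1))

# Hypothetical sinogram: 3 projection angles, 8 detector pixels,
# constant intensity everywhere
sinogram = np.ones((3, 8))
response = ram_lak_response(sinogram.shape[1])
filtered = filter_sinogram(sinogram, response)
print(np.allclose(filtered, 0.0))
```

The ramp is zero at zero frequency, so a constant projection is filtered to zero. In a trainable variant, the `response` array could be treated as a learnable parameter initialized with the Ram-Lak values; how this would be done in TomoNet2 and TomoNet3 is left open here.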

In addition, we are currently working on the acquisition of real data. We expect that the proposed algorithms will operate well on industrial data with some adjustments. For applications on clinical data, on the other hand, we may need to implement changes, for instance using fan-beam geometry and expanding the training dataset.

Acknowledgments

This research was supported by grant TUDFO/47138-1/2019-ITM of the Ministry for Innovation and Technology, Hungary.

This research was supported by the project “Integrated program for training new generation of scientists in the fields of computer science”, no. EFOP-3.6.3-VEKOP16-2017-00002.

This research was supported by the project “Extending the activities of the HU-MATHS-IN Hungarian Industrial and Innovation Mathematical Service Network”, no. EFOP-3.6.2-16-2017-00015.

References

1. Game-icons.net. https://game-icons.net. Accessed: 2021-03-16.

2. Dong, Fu and He, A deep learning reconstruction framework for x-ray computed tomography with incomplete data, PLOS ONE 14 (2019), e0224426.

3. Yongshuai, Su Ting, Zhu Jiongtao, Deng Xiaolei, Zhang Qiyang, Chen Jianwei, Hu Zhanli, Zheng Hairong and Liang Dong, Adaptive-net: Deep computed tomography reconstruction network with analytical domain transformation knowledge, Quantitative Imaging in Medicine and Surgery 10 (2020), 415–427.

4. Ghani M.U. and Karl, Deep learning based sinogram correction for metal artifact reduction, Electronic Imaging 2018 (2018), 4721–4728.

5. Gjesteby, Yang, Xi, Shan, Claus, Jin, De Man and Wang, Deep learning methods for CT image-domain metal artifact reduction. In Developments in X-Ray Tomography XI, volume 10391, pages 147–152. International Society for Optics and Photonics, SPIE, 2017.

6. Hammernik Kerstin, Würfl Tobias, Pock and Maier, A deep learning architecture for limited-angle computed tomography reconstruction. In Bildverarbeitung für die Medizin, 2017.

7. Hansen, Landry, Kamp, Li, Belka, Parodi and Kurz, Scatternet: A convolutional neural network for cone-beam CT intensity correction, Medical Physics 45 (2018), 4916–4926.

8. Herman G.T., Fundamentals of computerized tomography: image reconstruction from projections. Springer Science & Business Media, 2009. 2nd Edition.

9. Huang, Wang, Tang, Zhong and Zhang, Metal artifact reduction on cervical CT images by deep residual learning, BioMedical Engineering OnLine 17 (2018).

10. Jan, Benoit, Becheva, Carlier, Cassol, Descourt, Frisson, Grevillot, Guigues, Maigne, Morel, Perrot, Rehfeld, Sarrut, Schaart, Stute, Pietrzyk, Visvikis, Zahra and Buvat, Gate v6: A major enhancement of the gate simulation platform enabling modelling of CT and radiotherapy, Physics in Medicine and Biology 56 (2011), 881.

11. Jan, Santin, Strul, Staelens, Assie, Autret, Avner, Barbier, Bardiès, Bloomfield, Brasse, Breton, Bruyndonckx, Buvat, Chatziioannou A.F., Choi, Chung Y.H., Comtat, Donnarieix and Morel, Gate: a simulation toolkit for PET and SPECT, Phys Med Biol 49 (2004).

12. Jennings R.J., A method for comparing beam-hardening filter materials for diagnostic radiology, Medical Physics 15 (1988), 588–599.

13. Kak A.C. and Slaney, Principles of computerized tomographic imaging. IEEE Press, New York, 1999.

14. LeCun, Bengio and Hinton, Deep learning, Nature 521 (2015), 1476–4687.

15. Lee, Lee, Kim, Cho and Cho, Deep-neural-network-based sinogram synthesis for sparse-view CT image reconstruction, IEEE Transactions on Radiation and Plasma Medical Sciences 3(2) (2019), 109–119.

16. Maier, Sawall, Knaup and Kachelrieß, Deep scatter estimation (DSE): Accurate real-time scatter estimation for x-ray CT using a deep convolutional neural network, Journal of Nondestructive Evaluation 37 (2018).

17. Nauwynck, Bazrafkan, Van Heteren, De Beenhouwer and Sijbers, Ring artifact reduction in sinogram space using deep learning. In 6th International Conference on Image Formation in X-Ray Computed Tomography, 2020.

18. Olasz Cs., Varga L.G. and Nagy, Beam hardening artifact removal by the fusion of FBP and deep neural networks. In Thirteenth International Conference on Digital Image Processing (ICDIP 2021), volume 11878, pages 350–360. International Society for Optics and Photonics, SPIE, 2021.

19. Park H.S., Lee S.M., Kim H.P., Seo J.K. and Chung Y.E., CT sinogram-consistency learning for metal-induced beam hardening correction, Medical Physics 45(12) (2018), 5376–5384.

20. Pauwels, Cao, Wang, Xiao and Dewulf, Exploratory research into reduction of scatter and beam hardening in industrial computed tomography using convolutional neural networks. In 9th Conference on Industrial Computed Tomography (iCT), 2019.

21. Poludniowski, Landry, Deblois, Evans and Verhaegen, Spekcalc: a program to calculate photon spectra from tungsten anode x-ray tubes, Physics in Medicine and Biology 54 (2009), N433–8.

22. Ronneberger, Fischer and Brox, U-net: Convolutional networks for biomedical image segmentation, Springer International Publishing (2015), pages 234–241.

23. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61 (2015), 85–117.

24. Wang, A perspective on deep imaging, IEEE Access 4 (2016), 8914–8924.

25. Wang Zhou, Bovik A.C., Sheikh H.R. and Simoncelli E.P., Image quality assessment: from error visibility to structural similarity, IEEE Transactions on Image Processing 13(4) (2004), 600–612.

26. Würfl, Hoffmann, Christlein, Breininger, Huang, Unberath and Maier A.K., Deep learning computed tomography: Learning projection-domain weights from image domain in limited angle problems, IEEE Transactions on Medical Imaging 37(6) (2018), 1454–1463.

27. Noo, Dennerlein, Lauritsch, Wunderlich and Hornegger, Simulation tools for two-dimensional experiments in x-ray computed tomography using the forbild head phantom, Phys. Med. Biol 57(13) (2012), 237–252.

28. Zhang and Yu, Convolutional neural network based metal artifact reduction in x-ray computed tomography, IEEE Transactions on Medical Imaging pp, 2017.

Al filter [mm]	Error type	Ram-Lak Average	Ram-Lak SD	Shepp-Logan Average	Shepp-Logan SD	Cosine Average	Cosine SD	Hamming Average	Hamming SD	Hann Average	Hann SD
0	PSNR	20.9252	4.0782	22.3924	4.4433	22.2455	4.4017	23.0518	4.6553	23.1297	4.6824
0	SSIM	0.8441	0.0774	0.8874	0.0621	0.8836	0.0634	0.9045	0.0563	0.9066	0.0556
0	MSE	0.0116	0.0092	0.0087	0.0072	0.0089	0.0074	0.0077	0.0066	0.0076	0.0065
1	PSNR	24.1043	3.9576	26.9589	4.8238	26.6664	4.7169	28.5640	5.6155	28.7827	5.7529
1	SSIM	0.8856	0.0653	0.9306	0.0482	0.9267	0.0497	0.9483	0.0410	0.9504	0.0401
1	MSE	0.0058	0.0054	0.0036	0.0038	0.0037	0.0039	0.0029	0.0033	0.0028	0.0032
5	PSNR	22.7526	1.8531	26.8156	2.3309	26.3808	2.2566	29.5870	3.0570	30.0248	3.2310
5	SSIM	0.8500	0.0397	0.9230	0.0281	0.9165	0.0292	0.9535	0.0230	0.9573	0.0224
5	MSE	0.0058	0.0027	0.0024	0.0015	0.0026	0.0015	0.0014	0.0011	0.0013	0.0011
10	PSNR	20.9796	1.6583	25.2375	1.9055	24.7761	1.8634	28.2783	2.4215	28.7768	2.5669
10	SSIM	0.7988	0.0424	0.8974	0.0266	0.8882	0.0282	0.9421	0.0186	0.9478	0.0176
10	MSE	0.0086	0.0031	0.0033	0.0015	0.0036	0.0016	0.0017	0.0010	0.0016	0.0010
20	PSNR	18.8682	1.3609	23.4168	1.4644	22.9103	1.4480	26.8872	1.7619	27.4971	1.8609
20	SSIM	0.7233	0.0462	0.8545	0.0292	0.8417	0.0313	0.9194	0.0176	0.9282	0.0158
20	MSE	0.0136	0.0041	0.0048	0.0017	0.0054	0.0018	0.0022	0.0011	0.0020	0.0010
Average	PSNR	21.5835	3.4374	24.9265	3.8725	24.5691	3.8010	27.1495	4.5412	27.5021	4.6894
Average	SSIM	0.8232	0.0799	0.8991	0.0503	0.8921	0.0529	0.9328	0.0406	0.9372	0.0399
Average	MSE	0.0091	0.0064	0.0047	0.0046	0.0050	0.0047	0.0034	0.0043	0.0032	0.0042
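
For reference, two of the error measures reported in the table above, MSE and PSNR, can be computed along the following lines. This is a minimal sketch rather than the evaluation code of the paper: it assumes images normalized to the range [0, 1] (so that the peak value is 1), and the function names and example images are hypothetical. SSIM is defined in [25] and is not reproduced here.

```python
import numpy as np

def mse(reference, reconstruction):
    """Mean squared error between two images of equal shape."""
    reference = np.asarray(reference, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    return float(np.mean((reference - reconstruction) ** 2))

def psnr(reference, reconstruction, peak=1.0):
    """Peak signal-to-noise ratio in dB.

    `peak` is the maximum possible intensity; 1.0 is an assumption
    of this sketch (images scaled to [0, 1]).
    """
    err = mse(reference, reconstruction)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / err))

# Hypothetical example: a uniform error of 0.1 on a 4x4 image
reference = np.zeros((4, 4))
reconstruction = np.full((4, 4), 0.1)
print(mse(reference, reconstruction), psnr(reference, reconstruction))
```

Note that the table reports averages of per-image values, so the average PSNR in a row cannot be obtained by applying the PSNR formula to the average MSE of the same row.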