Abstract
BACKGROUND:
Due to large dimensional matrix multiplications, the existing iterative algorithms for cone beam computed tomography (CBCT) reconstruction often face problems of heavy computational workload and large volume of memory usage.
OBJECTIVE:
This study proposes and tests an iterative algorithm of 3DA-TVAL3 for fast reconstruction of CBCT images using undersampled measurement data and the reduced amount of computer memories.
METHODS:
In order to reduce computational workload and computer memories based on the sparsity of the CBCT measurement matrix, the proposed iterative algorithm applies elementwise scalar multiplications in the iterative computation to search for optimal solution. Through a number of tests on three different CT data sets with different number of projections, the reconstruction performance of the proposed algorithm is compared with that of two accelerated iterative algorithms and the conventional FDK algorithm.
RESULTS:
The visual and quantitative evaluations using the normalized mean square error, peak signal to noise ratio and structural similarity metrics demonstrated the faster computational time and the higher image quality of using the proposed 3DA-TVAL3 algorithm than using other conventional algorithms under comparison.
CONCLUSIONS:
The proposed 3DA-TVAL3 algorithm can perform efficient and fast computation of CBCT reconstruction using the reduced amount of computer memories.
Introduction
Computed tomography (CT) is one of most significant innovations medical imaging in recent decades, which can provide fast and effective analysis and identification of lesions by providing 3D visual inspection of tissues and organs of patients. With the development of CT technologies, the on-board cone-beam CT (CBCT) attracts growing attention in radiation imaging and therapy for its high spatial resolution, easy operation and relatively low acquisition expense [1–4].
Despite the high on-board volumetric imaging quality, there has been a concern of the excessive X-ray radiation does into the patient body, which is much higher than that of conventional diagnostic X-rays [5, 6]. It is expected that the CBCT scanning and imaging process can reduce the X-ray radiation does as much as possible while maintaining the quality of CT images. To reduce the number of X-ray projections in the CBCT scan directly reduces the radiation dose, which results in under sampled measurement data. In the literature, a type of filtered back-projection (FBP) algorithms, originally proposed by Feldkamp, Davis, and Kress (FDK) [7] and its derivatives [8, 9], have been widely applied to CBCT reconstruction. It is found that when the FDK type algorithms are applied to undersampled projections, the quality of reconstructed images are degraded dramatically. This is due to the incomplete information in the Fourier domain.
In contrast, another type of iterative reconstruction algorithms can provide better imaging quality using undersampled measurement data [10–14]. This type of algorithms is to solve the linear matrix equation of the CT measurement data based on computational optimization of a specified cost function. Because the measurement data and the measurement matrix are of large dimension, the iteration algorithms carry out repeated large dimensional matrix multiplication operations for computation of the optimal reconstruction solution. These matrix multiplication operations take a large amount of computational workload and require a large amount of computer memories for storage of the matrix data.
This paper extends the result of [15] on computing the X-ray transform and adjoint matrices to present an algorithm for fast iterative CBCT reconstruction to deal with the large dimensional matrix multiplication problem. Based on that the rows and columns of the CBCT measurement matrix are sparse, the proposed algorithm performs direct elementwise operations on the measurement matrix multiplication to reduce the computational workload and usage of computer memories. Such elementwise computation does not require large dimensional measurement matrix multiplications and large amount of memory storages of the large dimensional attenuation coefficients. Only a small number of scalar multiplications and a small set of parameters of the structural and geometric information of the CBCT scan are used in each elementwise computation. Using the direct elementwise operation for measurement matrix multiplication, the 3D accelerated total variation minimization by augmented Lagrangian and alternating direction algorithm (3DA-TVAL3) is proposed for CBCT image reconstruction. The fast iterative 3DA-TVAL3 algorithm is proposed for circular CBCT reconstruction and is readily extendible to spiral CBCT reconstruction.
The rest of this paper is organized as follows. Section 2 on materials and methods presents formulations of the elementwise matrix multiplications and the augmented Lagrangian and alternating direction algorithm. Section 3 presents the reconstruction results of the proposed algorithm on three difference CT data sets at different numbers of projections in comparison with other accelerated algorithms and the conventional FDK algorithm. Finally, Section 4 is on discussion and conclusion.
Materials and methods
CT systematic description and acceleration
A circular CBCT scanning system basically consists of an X-ray source, the scanned object and a flat panel detector. The geometric configuration of scanning system in a Cartesian coordinate system is shown in Fig. 1. Suppose that the attenuation coefficient of reconstructed image is discretized into a 3D matrix with N
u
= N
x
× N
y
× N
z
voxels in the Cartesian x-y-z coordinate system, the size of each voxel is d
x
× d
y
× d
z
, N
p
represents the number of projections and N
a
× N
b
represents the size of each projection. Let

Configuration of the cone beam CT imaging geometry.
As a result, the attenuation coefficient of a voxel indexed by (x, y, z) in the 3D object is the kth entry of the vector
Then the CBCT system can be formulated into the following linear equation
where
Though iterative reconstruction methods often offer better image reconstruction quality than that of the conventional FBP type methods, there is a problem of heavy computational workload that the iterative methods have not been commonly applied to clinical diagnosis. The problem is caused by repetitive calculations of
Let
For the simplicity and without loss of generality, consider the first case that the radiation ray satisfies |x
e
- x
s
| > |y
e
- y
s
| and |x
e
- x
s
| > |z
e
- z
s
|. The 3D image can be discretized into planes perpendicular to the x-axis in the x-y-z coordinate system, which are called y-z-planes. The radiation ray nontrivially intersects
It follows from [15] that, under the conditions |x e - x s | > |y e - y s | and |x e - x s | > |z e - z s |, the intersection lengths of the voxels, denoted by l(x,y,z) in each y-z-plane, indexed by x, can be computed by the formulae summarized in Table 1 for five different cases. It is shown that the intersection lengths l(x,y,z) and the corresponding coordinates of the voxels in each y-z-plane can be configured based on the coordinate information of the start and end points, i.e. (x s , y s , z s ) and (x e , y e , z e ).
Intersection coordinates and lengths of voxels in a y-z-plane
Intersection coordinates and lengths of voxels in a y-z-plane
Under the conditions |x
e
- x
s
| > |y
e
- y
s
| and |x
e
- x
s
| > |z
e
- z
s
| and with a given image update
This result under the conditions |x
e
- x
s
| > |y
e
- y
s
| and |x
e
- x
s
| > |z
e
- z
s
| can be generalized as follows
where the definitions of the x-z-planes and x-y-planes follow from that of the y-z-planes and the computations of
As a result, for a given image update
For elementwise computation of the adjoint transform
For a fixed scan projection on a voxel of the object, there exist radiation beams passing through the eight corners of the voxel as shown in Fig. 2. The straight lines connecting the intersection points of these beams on the detector panel form a closed region on the detector panel. And the radiation beams through the voxel are only projected onto the detectors within this closed region. At each projection angle n
p
× dφ, the coordinate of the voxel at (x
cj
, y
cj
, z
cj
) is transformed to a new rotated coordinate system (x
rj
, y
rj
, z
cj
), so the detector panel is perpendicular to the x
r
-axis of the transformed coordinate system. As a result

The projected X-ray beams through a voxel onto a region of the detector panel. S is the source of x-ray and D is the detector panel. The inner cube represents the jth voxel. The ball encloses the voxel and the outer cube encloses the ball. (a) the 3D view, (b) the 2D view in x-y plane.
To determine the enclosed region on the detector panel, a ball that encloses the voxel is constructed and an upright cube is formed to enclose the ball as shown in Fig. 2. Then boundaries on the detector panel, in terms of the panel detector indices can be found as follows.
In fact, for each voxel of the object, there are only a small number of detectors that collect the radiation beams through the voxel. The intersection length of each of these beams through the fixed voxel can be computed according to the start and end points of each of these beams and following the formulae as given in Table 1. As a result, the elementwise computation of
where
The subsection proposes a fast iteration algorithm for CBCT reconstruction using the elementwise computation of elementwise computation of
To separate the non-differentiable l1-norm term to reduce the computational workload in the optimization,
For reconstruction from noise-corrupted measurements, the constrained optimization problem is formulated into an unconstrained optimization problem by introducing the following augmented Lagrangian function
where
To compute the solution for the optimal minimization of the augmented Lagrangian function (11), the Alternating Direction Method (ADM) [16] is applied here to perform the following update of
where η is a step length in the update of , and it can guarantee the convergence of ADM when
Based on the above formulated optimization problem, the computational procedure of the proposed 3DA-TVAL3 is as follows. Input projections t : =1
calculate calculate step length d
t
( update
update Lagrangian multipliers calculate current augmented Lagrangian function using Equation (11)
output
t = t + 1
For quantitative analysis, three error metrics were used. The first is the normalized mean square error (NMSE) defined below which is an index of error between the reference image
The second one is the peak signal to noise ratio (PSNR) which can measure the accuracy of reconstruction subject to noise. The PSNR between the reference image
where max(
The PSNR value approaches infinity as the MSE approaches zero. A higher PSNR value indicates higher reconstruction accuracy and, at the other side of the scale, a smaller value of the PSNR implies higher reconstruction error.
The third one is the structural similarity index (SSIM) which is a measure of similarity between the reference image
where
Experiment setup
The proposed 3DA-TVAL3 for CBCT reconstruction was applied, respectively, to a 3D digital Sheep-Logan phantom, a set of real CT data downloaded from the cancer imaging archive (TCIA) [20] and a set of scanned Catphan 500 physical phantom data acquired by a CBCT system of our institute. In the first two cases, the size of the Sheep-Logan phantom is 128×128×128 voxels, that of the real CT is 512×512×138 voxels, the source to detector distance is 1000 mm, that to the rotation center is 550 mm, the size of detection panel is 512×512 mm2 and the size of each projection is 1024×1024 voxels. Since the Sheep-Logan phantom is smaller than the real CT data, only a quarter of the panel detector for faster reconstruction was used for simulation of the Sheep-Logan phantom. In the third case of the Catphan 500 physical phantom scan, the source to detector distance is 952 mm, that to the rotation center is 593 mm, the size of detection panel is 430×71.73 mm2, the size of each projection is 1440×240 voxels, the number of projections is 502 with uniformly spaced angle increment. All the tests were implemented in the Matlab R2014a programming environment. The codes were implemented on a Win7 system using Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00 GHz and 48.00 GB memory. The elementwise computation of transform and adjoint transform in section 2.2 and 2.3 were accelerated by a NIVIDIA Tesla K20c GPU.
Reconstruction of digital Sheep-Logan phantom
The 3D Sheep-Logan phantom was first tested, respectively, from 360, 180, 90, 60 projections along a circular orbit uniformly distributed in [0, 2π]. The reconstruction results of the proposed 3DA-TVAL3 were compared with that by FDK method. The mid slices of original phantom and reconstructed images in the x-y plane are shown in Fig. 3. With the decrease of projections, the performance of reconstruction results using the proposed 3DA-TVAL3 method can well preserve the critical information while the results by FDK method display considerable noise and artifacts.

The comparison of mid slice of 3D Sheep-Logan phantom reconstructed by FDK and 3DA-TVAL3. (a) the origin phantom; (b)∼(e) the FDK reconstruction results using 360, 180, 90, 60 projections respectively in the left column; (f)∼(i) the 3DA-TVAL3 reconstruction results using 360, 180, 90, 60 projections respectively in the right column.
To further illustrate the edge information, Fig. 4 shows 1D profiles of the original phantom and reconstructed images in x and y directions. It shows that results by 3DA-TVAL3 can preserve the image fidelity with highly undersampled projections, which outperform that of the FDK method.

1D profiles of the Sheep-Logan phantom in x and y directions which reconstructed by 3DA-TVAL3 and FDK. (a), (b) using 360 projections; (c), (d) using 180 projections; (e), (f) using 90 projections; (g), (h) using 60 projections.
To test the denoising effect of the proposed 3DA-TVAL3, additive Gaussian random noise matrices of zero-mean and variances 0.01, 0.05 and 0.10 were applied, respectively, to the projection data set acquired from 90 views. For each case of different variance, the noise matrix is produced using the normally distributed random number generation function of MATLAB. The reconstruction is repeatedly tested for 5 times with test using a newly generated noise matrix. As the ASD-POCS presented in [11] is a popularly known TV regularization method and EM-TV [21–23] is a recently popular TV regularization method, their performance was compared with our proposed method. The averaged NMSEs, PSNRs and SSIMs of 3DA-TVAL3, ASD-POCS and EM-TV reconstructions subject to different external noise data are shown in Table 2. It is shown that the averaged NMSE is increased and that of PSNR and SSIM are reduced as the variance value of noise increased. All the averaged PSNRs of the noisy images reconstructed by 3DA-TVAL3 are higher than 40, which indicates that the differences between the reconstructed images and reference cannot be distinguished by human eye. In contrast, the averaged PSNRs of other TV methods are considerably smaller. So the proposed 3DA-TVAL3 algorithm outperforms the other algorithms in noise suppression.
Averaged NMSEs, PSNRs and SSIMs of 3DA-TVAL3, ASD-POCS and EM-TV reconstructions with different external noise levels in the case of 90 projections
The ASD-POCS and EM-TV algorithms were also accelerated by the proposed computation of the transform and adjoint transform in Subsections 2.2 and 2.3. The numbers of iterations of the ASD-POCS and EM-TV algorithms were set to be the same as that of the 3DA-TVAL3 to compare their convergence performance. Since the performance of reconstructed images degrades with reduction of number of projections, the reconstruction performance of these algorithms in the worst case of 60 projections is evaluated. The NMSE, PSNR and SSIM values of the three algorithms are shown in Table 3. It is shown that the 3DA-TVAL3 algorithm has the lowest NMSE and the highest PSNR and SSIM values with about the same reconstruction time as that of the other algorithms.
Computational time, NMSEs, PSNRs and SSIMs of 3DA-TVAL3, ASD-POCS and EM-TV reconstructions with fixed number of iterations
Another experiment is carried out to further evaluate the reconstruction speed of the proposed 3DA-TVAL3 algorithm in comparison with that of the ASD-POCS and EM-TV algorithms. The three algorithms were set to terminate at the same NMSE value of 0.0134 and their numbers of iterations and computation times are recorded and shown in Table 4. It can be seen that, compared with ASD-POCS and EM-TV, the 3DA-TVAL3 algorithm has the least number of iterations and computational time.
Number of iterations and computational time of 3DA-TVAL3, ASD-POCS and EM-TV reconstructions with fixed terminating NMSE
It is important to mention the NMSE value of the EM-TV algorithm eventually converged to NMSE = 0.0134 in this test. It changed little even if the algorithm further iterated to more than 952 times. Whereas the ASD-POCS and 3DA-TVAL3 algorithms could to converge to NMSE = 0.01 and even less. For this reason, their computational times were compared under the condition that their NMSE arrived at the same level 0.0134.
The mid slices of reconstructed images in transverse plane, sagittal plane and frontal plane are shown in Figs. 5–7, separately. Some boundary details are quite fuzzy in the reconstructed results of ASD-POC and EM-TV compared with that of the proposed 3DA-TVAL3. And there is little difference between the 3DA-TVAL3 reconstructed results and the references. Meanwhile, there are no significant artifacts caused by sparse sampling are observed in the boundary slices. All these can show that the proposed 3DA-TVAL3 algorithm can effectively suppress the noise and preserve the structural information in the reconstructed images.

Mid slices in transverse plane from (a) The origin phantom; (b) ASD-POCS reconstruction; (c) EM-TV reconstruction; (d) 3DA-TVAL3 reconstruction. The number of iteration is 165 and the number of projections is 60. All the images are displayed with the same window.

Mid slices in sagittal plane from (a) The origin phantom; (b) ASD-POCS reconstruction; (c) EM-TV reconstruction; (d) 3DA-TVAL3 reconstruction. The number of iteration is 165 and the number of projections is 60. All the images are displayed with the same window.

Mid slices in frontal plane from (a) The origin phantom; (b) ASD-POCS reconstruction; (c) EM-TV reconstruction; (d) 3DA-TVAL3 reconstruction. The number of iteration is 165 and the number of projections is 60. All the images are displayed with the same window.
Since the ASD-POCS and EM-TV algorithms have to use more computing time to carry out further iterations for possible improvement of the reconstruction quality, our proposed 3DA-TVAL3 algorithm certainly requires less computational time to produce the same reconstruction quality as that of the ASD-POCS and EM-TV algorithms. Hence our proposed 3DA-TVAL3 can perform faster reconstruction that performed by the accelerated ASD-POCS and EM-TV algorithms.
The proposed 3DA-TVAL3 algorithm, together with the ASD-POCS and EM-TV algorithms, was applied to a set of real CT data downloaded from the cancer imaging archive for reconstruction. The quantitative evaluation results of the three algorithms, in terms of NMSE, PSNR and SSIM, are listed in Table 5. It is shown that the NMSE of 3DA-TVAL3 is significantly smaller than that of ASD-POCS and EM-TV all along for all tested projection numbers. The PSNR of 3DA-TVAL3 is about twice higher than that of ASD-POCS and EM-TV for all cases. With reduction of projection numbers, the SSIM of 3DA-TVAL3 remained almost unchanged while the SSIMs of ASD-POCS and EM-TV drop rapidly. These results can show that the 3DA-TVAL3 algorithm has advantages in noise suppression and structural information preservation in the case of undersampled projections. To illustrate visual superiority of the reconstruction by 3DA-TVAL3, the mid slices of the reconstructed images using ASD-POCS, EM-TV and 3DA-TVAL3 algorithms are shown in Fig. 8. It is shown that the boundary and detail of ASD-POCS reconstructed images are blurred. The images reconstructed by EM-TV preserve more structural information but still display considerable streaking artifacts as the projection number is reduced. In contrast, most structural information is preserved and few streaking artifacts exist in the images reconstructed by 3DA-TVAL3 with small projection numbers. These results can demonstrate the visual quality of the reconstruction by the proposed 3DA-TVAL3 algorithm subject to under sampled projections.

Comparison of mid slice of reconstructed images by the three algorithm. (a) the origin phantom; (b)∼(e) reconstructed images by ASD-POCS using 360, 180, 90, 60 projections respectively; (f)∼(i) reconstructed images by EM-TV using 360, 180, 90, 60 projections respectively; (j)∼(m) reconstructed images by 3DA-TVAL3 using 360, 180, 90, 60 projections respectively.
NMSEs, PSNRs and SSIMs of 3DA-TVAL3, ASD-POCS and EM-TV reconstruction with different number of projections
The proposed 3DA-TVAL3 algorithm was also applied to the reconstruction of a set of Catphan 500 physical phantom data scanned by a CBCT system of our institute. The reconstruction results were compared with that the conventional FDK algorithm. Figure 7 shows the spatial resolution slices of the 3D Catphan 500 phantom reconstructed by the FDK and 3DA-TVAL3 algorithms, respectively. With the reduction of projection number to 1/7 (= 72 projections) of the full projection number, reasonable reconstruction quality was achieved by the proposed 3DA-TVAL3 as shown in Fig. 9 (b). In comparison, images reconstructed by the FDK methods using 502 projections and 72 projections are shown in Fig. 9 (a) and Fig. 9 (c), respectively, where the better reconstruction result of the proposed 3DA-TVAL3 algorithm can be clearly seen.

Spatial resolution slices of the reconstructed Catphan 500 physical phantom using (a) the FDK reconstruction from 502 projections; (b) the 3DA-TVAL3 reconstruction from 72 projections; (c) the FDK reconstruction from 72 projections.
Discussion
A distinctive characteristic of the proposed 3DA-TVAL3 algorithm is that it performs elementwise computation of the matrix multiplication operations in each iterative updating. As formulated in Subsections 2.1–2.3, the elementwise computation in the iterations only uses a few geometrical parameters of the scan system without requiring storage of the large dimensional X-ray transform matrix. This directly leads to reduction of a significant amount of memories in the reconstruction processor. As a result, the RAM requirement of our proposed algorithm is about the same order of the reconstructed image size. For example, for reconstruction of the downloaded CT data of size 512×512×138 in Subsection 3.3, it was sufficient with 1GB of RAM in the GPU for implementing the 3DA-TVAL3 reconstruction.
The formulations of the elementwise operations in Subsections 2.1–2.2, as well as that in the Appendix, can indicate that the number of scalar multiplications is of the same order as the sparse number of voxel intersections. This implies that the proposed 3DA-TVAL3 algorithm carries out least number of scalar multiplications in each iteration.
Evaluations of the reconstruction speed of the proposed 3DA-TVAL3 algorithm were in comparison with the ASD-POCS and EM-TV algorithms. In the evaluations, the elementwise scalar multiplication operations on the transform and adjoint matrix multiplications were also implemented in the ASD-POCS and EM-TV algorithms. This is to make fair evaluations of the reconstruction speeds that the amounts of RAM usage of these algorithms were the same.
The fast reconstruction speed of the proposed 3DA-TVAL3 algorithm is under the condition that it is implemented with a small amount of RAM in the processor. There exist other reconstruction algorithms which may have faster reconstruction speed but may require more memory implementation in the processor. It is noted that not every reconstruction algorithm is feasible for direct implementation of the elementwise operations of the 3DA-TVAL3 algorithm. For example, the accelerated fast iterative shrinkage thresholding algorithms in [24] were based on preconditioning of the X-ray transform matrix. The proposed formulations for elementwise operations in Subsections 2.1–2.3 are not directly applicable to the preconditioned X-ray transform matrix.
Conclusion
An iterative 3DA-TVAL3 algorithm for CBCT reconstruction has been proposed in this paper. This algorithm is formulated as an augmented Lagrangian and alternating direction optimization problem and it performs elementwise computation of the matrix multiplication operations in each iterative update. A distinctive feature of the elementwise computation in the iterations is that it only uses geometrical parameters of the scan systems without storage of the X-ray transform matrix. This results in significant reduction of memories in the processor. Because of the sparse nature of the row and column vectors of the measurement matrix, each elementwise computation only involves a small sum of scalar multiplications, so it can result in efficient and fast computation of the iteration algorithm. Extensive qualitative and quantitative evaluations of the proposed 3DA-TVAL3 reconstruction algorithm were carried out in comparison with the conventional FDK and accelerated ASD-POCS and EM-TV algorithms. The evaluation results can demonstrate that the 3DA-TVAL3 reconstruction algorithm has advantages in memory reduction, noise suppression, reconstruction speed and accuracy.
Footnotes
Acknowledgments
This work was supported by the National Natural Science Foundation of China [81571772] and [61801475], by Jiangsu Planned Projects for Postdoctoral Research Funds [2018K180C] and by Key Science and Technology Plan of Jiangsu [BE2017671].
