Abstract
Nowadays, diversities of task-specific applications for computed tomography (CT) have already proposed multiple challenges for algorithm design of image reconstructions. Consequently, efficient algorithm design tool is necessary to be established. A fast and efficient algorithm design framework for CT image reconstruction, which is based on alternating direction method (ADM) with ordered subsets (OS), is proposed, termed as OS-ADM. The general ideas of ADM and OS have been abstractly introduced and then they are combined for solving convex optimizations in CT image reconstruction. Standard procedures are concluded for algorithm design which contain 1) model mapping, 2) sub-problem dividing and 3) solving, 4) OS level setting and 5) algorithm evaluation. Typical reconstruction problems are modeled as convex optimizations, including (non-negative) least-square, constrained L1 minimization, constrained total variation (TV) minimization and TV minimizations with different data fidelity terms. Efficient working algorithms for these problems are derived with detailed derivations by the proposed framework. In addition, both simulations and real CT projections are tested to verify the performances of two TV-based algorithms. Experimental investigations indicate that these algorithms are of the state-of-the-art performances. The algorithm instances show that the proposed OS-ADM framework is promising for practical applications.
Keywords
Introduction
Since its advent in 1970s, computed tomography (CT) has already been widely applied in fields such as medical imaging and industrial inspections [1]. The researches focused on CT imaging systems have also drawn numerous attentions, especially in areas of high precision electromechanical control, imaging algorithms, three-dimensional data processing and related applications [2]. Furthermore, the development of CT has, indeed, already gained great progresses. However, there still remains some important issues to be paid enough attention. High precision image reconstruction with under-sampled projections (sparse, few or limited views) is greatly important in areas needing fast data acquisition or low radiation levels. Conventional reconstruction algorithms often refer to analytical ones [3, 4] (e.g. the FBP based-algorithms) and algebraic or statistical ones [5, 6] (e.g. ART, SART and etc.) which are facing with challenging difficulties and limitations for under-sampled projections and also some other applications [4, 7]. The rise of optimization-based algorithms [2, 8] sheds some light on solving these problems and is now becoming the mainstream in the research focus. While optimization-based reconstructions have been applied for many years, Pan [2] puts forward the concept of optimization-based algorithms in a more concrete fashion. Optimization-based algorithm has definitely some important differences and applications where conventional algorithm may meet with dilemmas. Optimization-based method generally regards the image reconstruction from projections as an optimization model. The model is often built by some function as objective, which depicts one or more properties of the intended image itself, with some constraints which describes the imaging system.
Actually, the fashion of building the optimization model can be of a high flexibility which is according to data acquisition system and the intended applications of the image [2]. At the primary stage, specifically speaking, the term of optimization-based reconstruction specially means the reconstruction models and algorithms inspired by compressed sensing (CS) [9, 10]. Afterwards, with computational barriers having been lowered enough and the increasing developments of numerical and computational methods, a large number of reconstruction problems [8, 11–17] are formed as optimization models and the corresponding algorithms solving them are followed to be developed.
However, diversities of task-specific applications for CT imaging nowadays have already proposed multiple challenges for algorithm design [2, 11]. For specific CT scanners in practical applications, one may meet with different data acquisition patterns and the produced CT images are requested to be of some particular characteristics. Under these circumstances, engineers and technicians have to face with the problems to develop working reconstruction algorithms to test the imaging systems. However, dealing with algorithm development is not an easy task allowing for different objectives and constraints. Therefore, it is in great demand for scientific and theoretical researchers to propose an easy-implemented algorithm design framework. This kind of framework should better comprise some standard steps which provides a tool to make the specific algorithm development as it were the production on pipelines. Moreover, for practical applications, this theoretical tool should be easy enough for engineers and technicians to completely understand.
The alternating direction method (ADM) [18–20] is firstly proposed to tackle with the bivariate convex problem with or without constraints. Since the inspiration of CS-based applications of the recent decade, the ADM has seen a resurgence in lots of large-scale data processing fields [21] including machine learning, signal processing, and imaging. The ADM applies variable-splitting method which is just like the divide and conquer strategy. It is the fact that ADM has very simple descriptions and stable convergence properties [22] which make it very appropriate and valuable for algorithm development in imaging applications [23]. Besides in the applications mentioned above, ADM has seen some important applications [15, 24–28] in CT reconstruction. The CT algorithms based on ADM have provided engineers and technicians powerful tools to solve some of the difficult problems. These methods are straightforward to implement on computers, and have (nearly) state-of-the-art performance for large-scale problems. Although ADM techniques were introduced over 60 years ago, their importance has significantly increased in the past decade.
Aiming at finding easy implemented framework for CT algorithm development, this paper demonstrate that ADM method combined with ordered subset (OS) [29–34] can be an appropriate choice. We regard and map the specific problems as the form of ADM, and incorporate OS technique in the specific reconstruction procedures, generating a new framework for prototyping algorithm design. The framework is termed as OS-ADM in this paper. We argue that for a wide class of convex problems in CT image reconstructions, the proposed OS-ADM framework can be of fine performances. Furthermore, this framework often generates easy-coded algorithms.
Actually, we notice that, indeed, Sidky and Pan have proposed an algorithm design framework [11, 12] which is based on the Chambolle-Pock primal-dual (CPPD) [35, 36] algorithm for convex problems. The application of CPPD framework for convex algorithms in CT reconstruction is a very interesting, valuable and inspiring attempt, and its derived algorithm instances have very promising applications [37, 38]. However, the proposed framework by us is quite different from CPPD, and OS-ADM can be an efficient complement to the CPPD framework. The fashion of ADM is a more straightforward and natural idea with almost no complicated derivations and difficult mathematical concepts. The simplicity and robustness of ADM are of essential characteristics for practical use. It must be pointed out that the argument on that which is better between ADM and primal-dual methods is resultless for both of them are powerful tools for practical applications. Indeed, they two both have different limitations. CPPD is not proper for some problems when the related functions are too complicated or non-convex [11]. ADM is not proper for problems with non-linear constraints which is determined by the problem formation in (1). Besides CPPD, the gradient descent method (GDM) is also often applied to convex optimizations. GDM is a conventional method and it simply search for the descent direction in each iteration and choose some step size to minimize the objective function. GDM is potential choice for smooth and is may be not so efficient for non-smooth but convex problems. Moreover, the convergence rate is often suffering with the choice of step size for improper step size may lead to undesired solutions. It should be pointed out that, Nien and Fessler [15] have proposed the linearized augmented Lagrangian method with ordered subsets (OS-LALM) in dealing with regularized (weighted) least-squares problems raised in CT reconstruction. In their work, a method of using the linearized variant replacing the quadratic penalty is applied. The proofs of the convergence analysis of this inexact process in supplementary material of [15] indicate that it can help to improve the efficiency of the algorithm. While in this paper, we focus on OS-ADM framework for the various problems in image reconstruction in CT imaging. Nevertheless, also, the linearized variant technique is incorporated in this paper when deriving some algorithm instances.
The outline of this paper is as follows. In section 2, the general ideas of ADM methods and OS methods are introduced, and the OS-ADM framework for prototyping algorithms design is then proposed. In section 3, some typical applications where the OS-ADM can be applied as development tool are introduced. The corresponding algorithms instances are derived in detail and all of these instances are of promising value for real usages. In section 4, some experimental demonstrations of the (selected) algorithm instances are conducted. Both simulation studies and real CT data experiments suggest that the instances derived by OS-ADM are of relatively high efficiencies. Finally, some short discussions and conclusions are presented in section 5.
Generic alternating direction method and ordered subset acceleration
Some essential theoretical backgrounds
For the self-contain consideration of this paper, the alternating direction method (ADM) is firstly introduced here. ADM is an efficient framework for solving a class of convex problems with solid convergence guarantee. ADM mainly focuses on solving the following bivariate convex optimization:
For constrained optimization (1), seeking for the minimizer can be achieved by approaching the original constrained problem by a sequence of unconstrained sub-problems. A simple and intuitive way is the application of quadratic penalty term formed from the constraints. This method puts a quadratic penalty term instead of the constraint in the objective function where each penalty term is a square of the constraint violation with the multiplier. However, it requires multipliers to go to infinity to guarantee the convergence, which may cause the ill-conditioned problem numerically. A more efficient way to ensure the convergence is proposed by Hestenes [40] and Powell [41] that the augmented Lagrangian method can successfully avoid the dilemma caused by purely usage of quadratic penalty term. The augmented Lagrangian method introduces an extra linear term besides the quadratic term, and the corresponding augmented Lagrangian (AL) function to this problem is
Actually, the iteration involves three variables, i.e., (x k , y k , λ k ), where the update of multiplier λ k ensure that the convergent solutions of (2) are also the minimizer of (1). The ALM treats the original problem as a generic form regardless of its separate structure of the objective function. Making use of this special property can greatly facilitate the efficiency of the iteration, and ADM is an intuitive and efficient way to problems of this kind. The ALM convert the original problem into an unconstrained form. Minimizing an AL function over both x and y is often difficult in ALM, and thus ADM is applied to alternatingly minimize over x and y for efficiency but the convergence is still guaranteed.
ADM alternatively minimizes the objective as univariate problem with respect to x or y and use an iterative fashion to ensure the convergence. The typical form of ADM can be expressed as:
When facing with practical problems, the specific solutions for sub-problems remains to be further discussed. Actually, for general occasions, finding the exact solution to each sub-problem in each iteration may be a little bit troublesome. ADM is expected to find the minimizer for augmented Lagrangian function by solving x and y sub-problem alternately. In this paper, the inexact updates for efficiency are utilized which are derived from the linearized approximation techniques. When the errors (compared with the exact solutions) are absolutely summable, the convergence of the updates can still be guaranteed. As to finding an approximate solution to each sub-problem, a practical approach using the second order expansion at current point x k or y k is described in Appendix 1. An iteration procedure using the ADM framework is summarized in the list of Algorithm 1. In this list, the superscript on MT stands for the transpose of the matrix M, and it also applies for other matrix in this paper.
Pseudocode for K steps ADM framework for a general bivariate convex optimization. xi ∈ (0, 1) is a constant, and β > 0 is used to control the penalty strength. Each sub-problem is solved directly or by the method in Appendix 1. The initial vectors for x, y, λ are set to x0, y0, λ0 according to real life applications, respectively. Integer number k stands for iteration index.
The proximal operator proxβ [f] in the above list is used to generate a descent direction for the convex function f and is obtained by the following optimization:
The main advantage of ADM is to make full use of the separable structure of the objective function f (x) + g (y). The theoretical analysis of the convergence of ADM under the generic form of (4) are investigated thoroughly [20, 22]. Moreover, the convergence of the variant forms using the proximal operators are also covered by Xiao [42] using an unified framework proposed by He [43].
With the development of the imaging science, theoretical research has pointed out that ADM has close relationship with split-Bregman methods [44–46]. A practical algorithm with respect to a real life problem should be developed according to the specific form of f and g. As to the problems in imaging fields, some techniques, such as gradient descent methods, shrinkage operators [47], proximal operators and etc., are often incorporated into algorithm developments.
ADM provides a practical method for prototyping algorithms design for CT image reconstruction. An image reconstruction problem can often modeled as optimizations which generally comprise image regularization and data fidelity term. These models are easily mapped into the form of (1) with some direct and mathematical techniques, and ones among them are the introductions of auxiliary variables which can separate the two parts with respect to image regularization and data fidelity.
ADM put forward a generic algorithm design framework for a class of convex problems. However, ADM does not make full use of the CT data acquisition pattern. In ADM framework, some important properties of the linear transform, i.e., M and N, which can help to improve the convergence rate, have not been incorporated into the algorithm design. The projection matrix is determinate in CT image reconstruction, and it is quite different from that in typical signal recovery where some kinds of random sensing matrices are usually utilized. When developing reconstruction algorithms, better practice is to incorporate the special properties of the matrix into the designing procedures. The projection matrix in CT imaging is comprised of sub-matrices corresponding to each projection view. Once the parameters for the geometry scanning are provided, the projection matrix is a constant and determinate matrix.
Ordered subset (OS) [29] has been proven to be an efficient technique to increase the convergence rate in both algebraic [48, 49] and optimization-based algorithms [15, 50]. This method divides the acquired projection data into several parts or subsets and update the intended solution in an ordered sequence with each sub-data. The number of the subsets is often referred to as the OS-level. The ordered subset technique is effective based on the fact that the system matrix in CT imaging has strong correlation inside each projection view, and the correlation in different views becomes weaker as the difference of views goes greater. In algorithm design, if the summation involves every index of the data, e.g., the back-projection operation ATp = ∑ i A ij p i with system matrix A and the projection data p, then the OS technique is suitable to be utilized. However, the scope of the usage of OS cannot be abused. Generally speaking, the condition for OS should be satisfied with the so-called balance of subset level [29, 51] which can summarily be interpreted as follows.
Assume T stands for subset level, and the ordered sequence S1, S2, … S
T
stands for each subset, and m∈ { 1, 2, …, T } is the subset index. When the update iteration comes to subset m, the objective function can be expressed as:
In practice, the division of subsets should generally satisfy that the count of projection photon of each subsets is (approximately) equal. Thus, the objective functions of each subset should be (approximately) equal, i.e.
This requires that the general objective function Φ (x) should be splitted and independent for each entity in x which means that the L1-norm and the square of the L2-norm are well suited for this demand.
In this paper, we incorporate the OS method into the ADM framework, which is referred to OS-ADM framework, and this extension is quite straightforward. The effectiveness of OS in accelerating convergence rate is obvious and also verified by numerical experiments in what follows in this paper. The OS iteration scheme for general problem argmin f (x) + g (y) can be expressed as Algorithm 2:
Algorithm 2 Similar to that in Algorithm 1, finding the solution for and can be implemented by the method in Appendix 1, substituting M and N with M m and N m . It should be noted that, for exact and consistency projections with subset balance, OS accelerated expectation maximum (EM) [29] has been proven to be convergent.
Although the theoretical analysis for OS-ADM has not been thoroughly investigated, the OS does accelerate the convergence rate in the preliminary iteration stage. In the final stage of the iteration, some tiny oscillation on x and y may come to appear, and this phenomenon can be observed in numerical tests. The oscillation is in close relationship with the data consistency and the order of update of the subsets.
However, the effective OS-level-times image updates using OS are still very promising for ADM methods. We incorporate OS techniques into the beginning iterations (say first 20–100 loops) of the reconstruction in order to accelerate the global convergence, and then normal ADM iteration is utilized to guarantee the global convergence. This combined methods may seem to be a little complicated, but can effectively accelerate the iterations which are also verified by the experiments in section 4.
Pseudocode for K steps OS iteration scheme for general problem
argmin f (x) + g (y). T is subset level, and m∈ { 1, 2, …, T } is the subset index. S1, S2, … S
T
stand for each subset. The initial
vector for x is set to starting image x0. k is the iteration index. M
m
stands for m-th part of M = (M1 ; M2 ; … ; M
T
), and the same operation also
applies for N
m
. Other settings can be referred in Algorithm 1.
Pseudocode for K steps OS iteration scheme for general problem argmin f (x) + g (y). T is subset level, and m∈ { 1, 2, …, T } is the subset index. S1, S2, … S T stand for each subset. The initial vector for x is set to starting image x0. k is the iteration index. M m stands for m-th part of M = (M1 ; M2 ; … ; M T ), and the same operation also applies for N m . Other settings can be referred in Algorithm 1.
Many CT imaging problems can be modeled as convex optimizations, and in this paper, we introduce prototyping these algorithms using OS-ADM framework developed above. The implementation of ADM for algorithms is very simple and direct without the bothering of too much complicated mathematical derivations. ADM divides a hybrid multivariate optimization into several easy-handled univariate problem. With the incorporation of OS, the convergence rate can enjoy fine improvement.
For various data acquisition settings, one should pay enough attention into the design of the subset level and the division of the subsets which may have influences on the acceleration of the convergence. However, for a broad class of CT problems, OS-ADM can be flexibly adapted for prototype algorithm design. Aiming at specific reconstruction model, the algorithm development can be implemented according to the following five steps: Mapping the reconstruction model into the generic ADM model; Constructing the augmented Lagrangian function and dividing sub-problems; Solving each sub-problem (by some concrete methods mentioned above); Analyzing and determining the OS level and subsets, converting ADM to OS version; Implementing and running the entire algorithm, evaluating the algorithm performance.
These five steps generally describe how to use OS-ADM to develop a practical algorithm, and it must be noted that when finishing these steps, one only get a working algorithm. Some necessary optimization techniques must be utilized when putting these algorithms into real CT scanners.
As these steps are too abstract for one to follow, therefore, we first give some remarks on them. In step 1, a reconstruction model is usually mapped into ADM model by introducing one or more necessary variables. With the introduction of auxiliary variables, one can convert the original problem into one with separate structure with respect to each independent variable. The rule for this introduction is that the auxiliary variables should be a replacement of some complicated parts (e.g., say a linear transform like Mx or Ny) and can decouple the structures of the problem. The divide of the sub-problems in step 2 is straightforward according to each variable. In step 3, the solution for each problems can be achieved flexibly, and both exact and approximate methods can work well for the framework. The determination of OS level and subsets should be set according the pattern of projection acquisition. However, a guide is that the adjacent two subsets in the sequence of update should be as orthogonal as possible. In step 5, simulation studies can help to evaluate the algorithm performance, and in practical applications some more optimizations and tuning must be taken into consideration. For a truly available algorithm in practice, parameters tuning, software and hardware optimizations, data stream interface and others key factors should also be investigated. With the introduction of OS-ADM, we will give some algorithms instances which show the concrete details in practical applications in the followed context of this paper.
CT algorithms instances by OS-ADM
A typical CT system mainly consists of X-ray source, flat panel detector, mechanical gantry system and the followed computer-based data processing system. When implementing the optimization base algorithms, usually a discrete form of the X-ray transform is taken into consideration:
Unconstrained least-square reconstruction
A simple and straightforward idea for finding the solution to
For this model, a non-ADM method is often to use gradient descent method (GDM). GDM simply search for the descent direction in each iteration and choose some step size to minimize the objective function. However, the convergence rate is often suffering with the choice of step size for improper step size may lead to undesired solutions. In this subsection, we make use of OS-ADM to develop a new algorithm which can avoid this drawback appearing in GDM. We introduce a new variable
Compare (11) with (1), a direct mapping between the two models can be expressed as
Aiming at finding the minimizer of with respect to
When facing with the
Compare the above expression with (76), it is found that when set f (x) =0, M =
Moreover, the update of the multipliers
λ is
The OS level is set according to pattern of the data
Although an auxiliary
Note that it is easy to handle the choice of the parameters of τ, β, ξ, T. The value of τ is in (1, 2) and we found that τ = 1.0 always works well when ξ is simply set to 1.0. However, the OS level T should be chosen according to the practical performance in experiments. And experimentally, setting T equal to or less than the number to views of projections is an appropriate choice.
Pseudocode for K steps OS-ADM iteration scheme for LS problem. T is
subset level, and m∈ { 1, 2, …, T } is the subset index.
For a CT image, the attenuation coefficients are always of non-negative values, and this is always a very strong constraint which must be taken into the model design. In this subsection, we demonstrate how to apply ADM framework to address this concern. Before we start algorithm derivation, the indicator function are first introduced. An indicator function δ
Ω
(
Particularly, if we chose , the stands for non-negative indicator function.
The non-negative LS model can be built as
Once again we slightly abuse “≥” for vector
Apply similar mapping fashion as (12) except for , we have the AL function
Pseudocode for K steps OS-ADM iteration scheme for non-negative constrained LS problem. Other settings are the same with those in Algorithm 3
Similar to the method used in 3.1 tackling with the
The proximal mappings of indicator functions can be computed according to the method in Appendix 2. Directly applying the formula of (87) and (88), we can get the update of
The only difference between Algorithm 3 lays in line 6, where a pos() function is incorporated into the iterations scheme here. Note that the computational cost is almost the same as that in Algorithm 3, while the non-negative constraints can help to improve the stability of the iteration.
Sparsity-based optimization reconstruction has drawn great attention since the advent of compressed sensing. One of the most prominent work is finished by Candes et al on exact recovery of a medical image from sparse samples of its discrete Fourier transform. The exact recovery depends on the fact that the intended image is sparse, or there exists sparse representation which means that in some transformed domain the non-zero coefficients is sparse (or very few). Moreover, some typical mathematical theories and results of CS have given a comprehensive knowledge on recovery model, sampling condition, noise property on general sparse signal reconstruction.
The advantages of CS-based algorithm have shown great potential in under-sampled data for CT imaging which has significant influence in reconstruction theory and may have practical applications of CT scanners. CS-based methods often regard the reconstructions as optimization, based on the idea of sparsity-exploiting principles, and the model and corresponding algorithms are of the most important positions.
When developing CS-based algorithms, one must first select the measurement of the image sparsity. Generally, the sparsity can be defined as the total amount of the non-zero entities in a discrete signal, thus making the L0 quasi-norm is the direct measurement. However, the optimization for L0 quasi-norm is generally nondeterministic in polynomial time which is difficult and unstable for numerical implementation. The replacement using L1-norm is always taken into consideration. Consequently, a simple and direct model for CS-based reconstruction can be built on constrained L1 minimization of image itself, thus we have the following optimization problem
The above problem can be interpreted as finding image with minimum L1-norm within the solution space of linear equation
In order to apply ADM scheme, an auxiliary variable
The mapping into the general form of (1) needs a small trick. We write the constraints of (27) in a compact form
Note that
Pseudocode for K steps OS-ADM iteration scheme for constrained L1 minimization. Other settings are the same with those in Algorithm 3.
The two sub-problems can be decoupled from (31) with respect to
The solution for the above problem has an exact expression which is compact and analytic according to the method in Appendix 3, and it is expressed as
The x sub-problems has the form of
where
The updates for multipliers λ1, λ2 are
With the updates of each sub-problem and multiplier, the OS-ADM scheme can be obtained as Algorithm 5.
Note that if the non-negative constraint on
With the aid of indicator function , the AL can be easily obtained. Consequently, the corresponding derivation is similar to that in Algorithm 4. Thus, only a tiny modification on the update of
Therefore, again, a new algorithm based on OS-ADM can thus be developed. As the non-negative constrained L1 minimization is straightforward on the basis of Algorithm 5, we do not list it in here.
Although, in the derivation, two different penalty parameters (i.e., β1, β2) are introduced on the quadratic terms in (31), the convergence of ADM can also be guaranteed. Actually, the matrix M in (30) is composed of several different parts, and these parts have different constructs which are corresponding to different transform. Therefore, it is not very appropriate to use the same penalty parameter for all parts in the AL function. Allowing for this, the different values of β1, β2 are oft-used in order in keep balance of the two terms in AL function.
For inconsistent observation, the observation equation is no longer
If the energy of
Note that the variable of
Compare the above function to that in (31), it can be easily found that the solvations of
With some easy processing, the above can be solved by projection onto convex set of B (ɛ) as described in Appendix 2. For the completeness of the derivation, we simply conclude the updates for each variables and multipliers as
Again, thus, a new working algorithm for allowing for the noise can be generated under OS-ADM framework. We do not list it here, either, for it is actually very straightforward and one can obviously implement it easily.
It is indeed the truth that the image itself is not always sparse, and this lead to a drawback for the constrained L1 minimization. However, when the image is piecewise constant, the total variation (TV) is considered as an appropriate objective function when designing optimization-based algorithm. The TV objective is formed by using the L1-norm of the gradient magnitude of image. If we take the gradient operator as ∇ for a d dimensional image (see Appendix 4 for details), thus the operation and the TV of an image
Similarly, with the aid of indicator function and the introduction of
Actually the fashions of variable introduction and the mapping into the generic problem can be of different patterns. For the above problem, a similar trick as that in (29), we consider a compact form as
The other sub-problem is
Convert the second quadratic into the approximate form according to (77), and the method of (91) can be directly applied. As the derivation is a little bit too long, we only expressed the final solution here as
Where Consequently, the OS-ADM algorithm for constrained TV minimization lists as Algorithm 6.
In conventional applications, the projection data fitting or fidelity term is always built as a quadratic term using the square of L2-norm which is always appropriate for Gaussian noise. However, for applications with different noise properties, L2-norm is not always presenting satisfying outcomes. This generates the needs for alternative projection data fitting term. In the presence of noise with some singular values of irregular properties, such as pepper and sault noise or noises caused by bad detector bins and etc., L1-norm based data fidelity term may provide better performance. Moreover, when the data noise is a significant physical factor and the data are modeled as being drawn from a multivariate Poisson probability distribution, we then consider the Kullback–Leibler (KL) data divergence, which is implicitly employed by many iterative algorithms based on maximum likelihood expectation maximization.
Pseudocode for K steps OS-ADM iteration scheme for constrained TV
minimization. Other settings are the same with those in Algorithm 3.
Pseudocode for K steps OS-ADM iteration scheme for constrained TV minimization. Other settings are the same with those in Algorithm 3.
In this subsection, we focus on TV minimization problem with different data fitting terms of L1-norm based and KL-based ones. We consider the objective function formed by TV regularization term plus these alternative data fidelity terms. The purpose for solving these models is not only for obtaining a working algorithm but also demonstrate to the readers how to flexibly apply the OS-ADM-based algorithm design framework in tackling with some practical problems.
When applying TV and L1-norm-based data-error term, we form the objective combining TV norm term and the L1 norm of data-error term which can be expressed as:
The original problem can be converted into the following form:
The AL function for (58) can be written as
The solution for sub-problem for
The solution for x sub-problem can also obtained without too much bothering, similar to the case of (53), and it can be expressed as
Pseudocode for K steps OS-ADM iteration scheme for TV and L1-data-error reconstruction. Other settings are similar to those in Algorithm 3.
Actually, the strategy of introducing variables in (59) can have various choices which lead to different algorithm instances. Appropriate choice may simplify the derivation and the implementation, and in this algorithm instance we decouple the operation ∇
The statistic iterative reconstruction algorithm often assume that the observed data obey some statistical principle which built the reconstruction as expectation maximum models. This is always an efficient method to allow for and suppress the noise (and in many situations, Poisson noise model is considered).
Actually, modeling the data fitting term can be defined as finding a certain metric to describe the minimization of data error. When the update of iteration is conducted in the additive fashion, L2 and L1 norm is always utilized. In the sense of least square, the square of L2 norm is applied. Similarly, L1 norm is taken into use when the data error is described in the sense of absolute error of each observation. However, when applying the multiplicative update fashion, such as the cases in MART, EM, ML-EM and etc., data error function of KL form is chosen as the fidelity term which can be written as
The KL data term is based on the fact that f (x) = (x - 1 - ln x) ≥0 for . Moreover, the function f (x) on is convex which makes the optimization easily solvable.
In this subsection, we incorporate the KL term into the TV (KL-TV) model which is
When developing OS-ADM-based algorithms, a similar fashion as that in (57) for variable introduction is applied. For explicitly demonstration, we written here once again as:
Consequently, the KL-TV can be equivalently converted into
Similar as that in (59), the mapping procedure is straightforward as
The corresponding AL function of (66) is
Note that the sub-problems for
Note that
Consequently, with the update formulas of
In the previous section, the algorithm derivations under some typical applications have been demonstrated by utilizing the proposed design framework. In this section, some experiments are conducted to exhibit the practical performances of them. However, the algorithms considered here are only working versions which need to be carefully improved when put into real CT scanners.
The experiments considered in this section contain three aspects. One among them is the sparsity exploiting reconstruction on consistent and ideal condition, and the second one is on noisy condition with the presence of Poisson noise in the observed projections. Both the first two scenarios are conducted on simulation projection data. The third experiment takes the real CT data as the projection source which can be seen as a primary investigation of the applied algorithm when facing with real imaging applications. The selected algorithms are Algorithm 7 (termed as L1-TV algorithm) and algorithm 8 (termed as KL-TV algorithm) developed previously, and the 2D reconstructions using them are performed.
Pseudocode for K steps OS-ADM iteration scheme for KL-TV
reconstruction. Other settings are the same with those in Algorithm 3.
Pseudocode for K steps OS-ADM iteration scheme for KL-TV reconstruction. Other settings are the same with those in Algorithm 3.
The experiments are conducted on a PC equipped with dual-core Intel Core i5-2400 CPU at 3.10 GHz, with a memory size of 8.00 Gigabytes. The codes are written in MATLAB 2014b, and some of them are in C and CUDA with the “mex” interface provided by MATLAB and interacting with MATLAB code. The graphics processing unit (GPU) used is NVidia GeForce GTX 570 with 1.25 Gigabytes global memory and 480 CUDA cores.
In the simulation experiments, a fan beam geometry is set to simulate the data acquisition of CT imaging. The typical shepp-logan phantom with a discretization of 256×256 pixels is utilized as the standard image. This is generated by MATLAB code phantom(‘Shepp-Logan’,256) which returns a low contrast image, widely used by tomography researchers. The distance from the X-ray source to the iso-center is 680.0 mm and the distance from the X-ray source to the center of the line detector is 1360.0 mm. The number of detector bins is nbin = 512 with a size of 0.776 mm for each.
Sparse view reconstruction
The view numbers for sparsity exploiting reconstruction are set according to the sparsity of gradient magnitude image (GMI) which is the number non-zero entities in the GMI. The sparsity of the GMI of the test image is k = 2184 and the minimum number of projections for exact reconstruction should not be less than (2k + 1)/nbin ≈ 8.5 based on CS theory [8, 53]. Thus, the minimum number for the projection view is 9. For the purposes of comparison, both 7 and 9 projections are utilized to conduct the reconstructions for L1-TV and KL-TV algorithms.
For the practical implementations of the two algorithms, the choices for the values of parameters should be paid enough attentions. Obviously, the settings of parameters have important affects to the algorithm performances. The parameters involved in L1-TV and KL-TV are almost the same, i.e., the penalty weights ρ1, ρ2, AL weights β1, β2, τ and ξ. The important ones of the parameters are ρ1, ρ2, β1, β2, and actually they are not independent variables which implies that actually ρ1/ρ2, β1/β2 make sense in the implementations which imply that the tuning of them is actually not that bothering. Although the tuning for the searching for the best parameters are non-trivial tasks, some empirical choices are sufficient for the practical applications. In order to provide the reader a guide for rapid implementations of these algorithms, we offer a group of relatively appropriate choices for L1-TV and KL-TV, respectively. The specific values for each parameter are listed in Table 1, and they are used for both ideal and noisy projection data.
Available choice for parameters selection for simulation studies
Available choice for parameters selection for simulation studies
For the reconstructions under ideal and noiseless data, the maximum iteration numbers for the two algorithms are set to 2000, and once the program attains to the maximum iteration number they will stop. The running time for the algorithms of MATLAB implementation is 19 seconds and 21 seconds for 7 views and 9 views, respectively. It should be noted that the difference of time consumption between L1-TV and KL-TV is very little under the same settings. Since in the first experiment we mainly care about the basic performances under very sparse projection views, the OS level for them is 1 for convenience. And we allow for the properties of OS levels in the next focus. The reconstruction images are shown in Figs. 1 and 2. For better visual comparisons, we choose two display gray-scale window. Note that the phantom itself is of very low contrast, the first window is set as [0.005 0.05] under which we can get an intuitive impression of the test phantom. For more detailed comparison, a much more narrow display window of [0.016 0.022] under which even very tiny errors and fluctuations can be easily observed by naked eyes.

L1-TV reconstructions from sparse views for low contrast shepp-logan phantom: from the left column to the right, there are images of background truth, KL-TV results from 7 views and 9 views, respectively. The top row is in the gray scale window of [0.005 0.05], and the bottom row is in a more narrow window of [0.016 0.022].

KL-TV reconstructions from sparse views for low contrast shepp-logan phantom: from the left column to the right, there are images of background truth, L1-TV results from 7 views and 9 views, respectively. The top row is in the gray scale window of [0.005 0.05], and the bottom row is in a more narrow window of [0.016 0.022].
When the amount of data acquisition is under the theoretical lower bound, the exact (or high accurate) reconstructions for both TV-based algorithms is unreachable. These are verified for the reconstruction under 7 views shown as Figs. 1(5) and 2(5) where the artifacts are obvious under narrow display window. When the case is of adequate acquisition, both the two algorithms produce very clear images without any observed artifacts under both windows as shown in Figs. 1(3), 1(6), 2(3) and 2(6). Some typical numerical evaluations are listed in Table 2 in which the visual observation are agreed with the numerical statistics. The errors compared with the background truth are evaluated using the root mean square error (RMSE):
Performances for L1-TV and KL-TV reconstructions
In order to obtain some global knowledge of the performances for both algorithms, the two methods are tested under various projection views. The global reduction of RMSEs under different views for both algorithms are plotted in Fig. 3. The curves in figures suggest that the performances of iterations for both algorithms are very stable and only very tiny fluctuation can be observed. An interesting phenomenon observed in this figure is that the curves of 12 views, 36 views and 90 views are almost overlapped completely. This may imply that, under ideal and noiseless conditions, when the amount of acquired data is above some threshold point, the error reduction performance may stay stable.

RMSE behaviors of L1-TV and KL-TV and reconstructions: the top and the bottom are RMSE reduction curves of different projection views of L1-TV and KL-TV reconstructions, respectively.
For the tests of the performance under noisy projection, we add Poisson noise to the projection data. For the test KL-TV reconstruction, we use 60 projection acquired within 180 degrees evenly. It should be pointed out that the phantom we utilized in simulation experiments has very low contrast parts which makes it sensitive to the presence of noise. To make an intuitive impression of the noise level, the FBP reconstruction result is shown in Fig. 4. Note that the amount of projections for FBP is 900, relatively huge compared to iterative algorithms.

Reconstructions for noisy projections for low contrast shepp-logan phantom: from the left column to the right column there are background truth images, FBP reconstruction from 900 views, KL-TV reconstruction from 60 views. The top and the bottom rows are the same images displayed in gray scale windows of [0.005 0.05] and [0.016 0.022].
The experiments for the presence of noises, the stop criterion for the iterative algorithm should be modified and in our tests the iterative program is stopped when the RMSE is no longer reducing. And for the tested KL-TV reconstruction, it takes about 300 loops of iteration to obtain the best RMSE. Figure 4 shows the results which present a primary visual comparison of the two reconstructions. The root mean square errors for FBP and KL-TV reconstruction are 0.0591 and 4.2E-4, respectively.
For the last parts of the simulation data experiments, the properties of OS-levels are studied. The test algorithm is KL-TV reconstruction. We test the KL-TV method under different projection views and OS-levels. Note that complete iteration for all the above setting can be very time consuming. In the previous context, we point out that OS can accelerate the primary stage of the iteration, and thus we make a compromise in the experiments. Therefore, we only focus the first 40 loops of each individual reconstruction. Projections of 180, 90, 36, 12 views are used for tests, and the OS-levels are chosen from 1 12, 1 12, 1 8, 1 4, respectively. Each subset of projection data is chosen evenly from the global data set. For the global set and OS-level = T, and the subset index m, the m-th subset
The comparisons of RMSE behaviors are plotted in Fig. 5. The curves are basically agreed with the theoretically analysis as that the OS can accelerated the convergence in the beginnings of reconstructions. Furthermore, for different dataset, in a proper OS-level range, higher OS-levels may have stronger power in acceleration. However, when the number of subsets is out of the proper range, this relationship does not always exist, and the related more in-depth theoretical analysis may be the future work. Another essential fact that must be mentioned is that, for better convergence performance, it is better to make the OS-level = 1 for the middle and the final loops of reconstruction, especially when the data is very sparse.

RMSEs behaviors for different projection views and OS-levels. From the top left, top right, bottom left and bottom right, there are RMSE curves for different OS-levels for projection views of 180, 90, 36 and 12, respectively. The tips of the legend depict the specific settings, for example, “KL-TV-OS-08-ADM” stands for KL-TV reconstructions under OS-ADM framework with OS-level = 8.
In this subsection the real CT projections are utilized to test the selected algorithms. The same to that in the above subsection, the L1-TV and KL-TV algorithms are chosen to be investigated. The experimental data are taken on a CBCT system. An anthropomorphic head phantom is utilized as the scanning object. In order to get quick impressions of the performance of these developed working algorithms, only 2D reconstructions are conducted. The sinogram image used is shown in Fig. 6. The projections are acquired at X-ray tube energy of 125 kVp. 655 projections are acquired over 360 degrees. The distance of X-ray source to the iso-center is 1100 mm, and the distance of the center of the detector to the iso-center is 550 mm. The equivalent line detector has 1024 bins with size of 0.388 mm for each. Thus, the projection data has a data size of 1024×655. The reconstruction image has a size of 512×512 with each pixel having the size of 0.49 mm×0.49 mm.

The sinogram used in real CT data testings.
Note that we take full projection data for both L1-TV and KL-TV reconstruction. The forward-projection and backward-projection operations in the iterative algorithms are launched on GPU via CUDA code. Under the tested dataset, the running time for each iteration loop is about 0.24 seconds. The iterative algorithms are stopped when the program reach 1000 loops and we take the results as the “converged” outcome.
When dealing with the real CT projections, the parameters for the iterative algorithms are possible different from those in simulation studies. In real CT data investigation, the weights on the data fidelity (or data fitting) term have been risen to a proper extent. A group of available choices are listed in Table 3. It should be noted that the values in this table are not the best choices, and we only provide these parameters for working version. More careful tuning for them can be incorporated when facing various real data tests. The reconstructions of FDK, L1-TV and KL-TV is as shown in Fig. 7. It is a common sense that real CT projections is definitely contaminated by noise which lead to actually no true images as simulation studies. Visual comparisons of each method shows that there are almost no differences between these results. This can be verified from the both full HU window and a much narrower window display in Fig. 7.
Available choice for parameters selection for real CT projections

Reconstructions from real CT projections: from the left column to the right column, the images are the results of FDK reconstructions, L1-TV reconstruction and KL-TV reconstructions, respectively. From the top row to the bottom row, for each column, there are: the same results in full HU window of [–1000 2200] (upper row), in a much narrower HU window of [–500 500] (middle row), and local zoom-in of the region-of-interest (bottom row). The zoom-in display is located by the red solid rectangle in middle image of the left column. The iteration numbers for both L1-TV and KL-TV are 1000.
Investigations for the OS-level on the reconstruction are also conducted here. In the aforementioned text, there is actually no true images but we can take the above mentioned “converged” outcome as the “true” image to calculate the RMSEs in the following tests. The RMSEs are computed by comparing the intermediate images with this “true” image. The divide of subsets is similar to (75). We test both L1-TV and KL-TV algorithms under OS-level of 1, 2, 4 and 5. The RMSEs behaviors are plotted in Fig. 8. Note that only beginning iterations with about 25 loops are plotted in Fig. 8. As we have pointed out in the previous context, the combination of conventional ADM following the OS-ADM with OS-level>1 can efficiently accelerate the convergence rate at the beginning stage of iterations.

The RMSEs of L1-TV and KL-TV reconstructions with different OS-levels. The left is the RMSEs reduction curves with OS-level of 1, 2, 4 and 5 for L1-TV algorithm, and the right is for KL-TV algorithm.
This paper has presented the OS-ADM framework for working algorithms design. The main derivation is based on ADM scheme, and OS is incorporated to accelerate the convergence of the iteration. We demonstrate that the proposed method is flexible and can be applied to many convex problems in CT imaging. The various problems in practical applications can be formed as convex optimizations, and the design of these optimizations can be very flexible due to the diversities of demands. The ADM serves as a powerful tool for developing practical algorithms which often apply an alternative iteration scheme.
There are at least two promising aspects for the application of OS-ADM algorithm. One of the most effect merits of ADM may be the usage of decouple of complex structures in the original problems by the relaxation of introducing new variables. When in the dealing of the constraints, the augmented Lagragian function is applied which contains both first-order term and quadratic term. The AL function with the update of multipliers guarantees the equivalence to the original problem. The alternating update scheme divides the original problem into several simple problems. Moreover, OS techniques has been incorporated into the practical implementation of ADM. We have demonstrate that there are many problems raised in reconstructions which can be modeled as convex optimizations, and thus the ADM with OS can be promisingly applied to develop the working algorithms. Note that we mean the working algorithms which means that the developed algorithms may be not the most efficient ones but at least can serve as an available option for some practical problems.
We have proposed and tested several algorithms developed using OS-ADM in the previous sections. Both the simulations and real data studies have suggested that the proposed design method can be of practical value. In sparse view test, the results of L1-TV and KL-TV algorithms may suggest that the ADM scheme can reach the state-of-the-art performance in image qualities and running speed. The experiments with real CT projections have provided some intuitive impressions of L1-TV and KL-TV reconstruction. Both the two tested algorithms and the other instances are pretty easy-coded which are appropriate and important for rapid prototyping algorithms. However, it must be noted that for better practice and performance much more tunings must be considered.
The running time for the iterative algorithms is an important factor that must be taken into consideration when put these algorithm instances into real scanners. In this paper, the proposed framework applies an iterative fashion to obtain the results and the converged images may need thousands or at least hundreds iteration loops which need very long time. This problem is even much more notable for 3D reconstruction in application needing fast imaging. Therefore, both fast convergent algorithm framework and computational accelerations are of crucial positions. To this end, as parts of our future work, some accelerated versions of ADM can be studied and applied to the applications of image reconstructions. Besides the algorithms optimizations, the supported hardware accelerations should also be taken into our focuses. Nowadays, some high performance computation devices and facilities, such as high-end CPUs and GPUs or clusters, have already gained the attentions of imaging engineers and actually successfully promoted the imaging speed. Once the computation cost is lower enough, the practical applications of optimization-based algorithms will greatly be broaden.
Future work will focus on solving some more practical and specific problems raised in CT imaging field, and developing efficient corresponding algorithms. Furthermore, also, more investigations for the comprehensive performances of these algorithms should be studied.
Footnotes
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61372172 and 61601518). We also want to thank the anonymous referees for giving us useful comments and suggestions to improve this paper.
Appendix 1:
Approximate minimizer of sub-problems in (3) using linearized approximation and the corresponding proximal mapping.
For simplicity of notations, we denote the fixed variables in the previous iterations as y
k
= y, λ
k
= λ in . When minimizing with respect to x when fixing y and λ, we first simply denote the objective as
It is sometimes not very easy to compute the pseudo inverse of the linear transform M, and some alternative method needs to be developed which can avoid computing the inverse of big matrix. In this appendix, like in the reference [15], we linearized the quadratic term in functional (76) at the current point x
k
as:
1) If , then the optimal condition is , thus,
Note that the last expression in (78) is in a similar form of proximal mapping of f (x). If we set , x = x′, x
k
- τg
k
= x in (78), we get:
This operation dose allow for non-smooth convex functions, but f (x) does need to be simple enough so that the above optimization can be solved in a closed form.
2) Particularly, if f (x) =0, the proximal mapping of f at x turn to be x itself, i.e.,
For the applications in CT imaging, usually the form of f (x) can often guarantee that the proximal mapping has a closed form or can be solved easily in very high precision, such as f (x) is in the form of L1-nrom or the square of L2-norm.
3) Generally, compare equation (78) to (80), an approximate solution for x* is
Plug g
k
and C into (82), we will get:
Appendix 2:
Proximal mappings of indicator functions
The problem we are facing is
Here we make an intuitive analysis for the solution of (85). Note the fact that the solution to the above problem can be computed component-wise. Therefore, we take out the following
If If
Summarize the two situations, the proximal mappings of indicator functions can be obtained by the projection onto convex set Ω:
Particularly, if , then can be easily computed by
Generally, if an linear transform
Assume that the magnitude of
Appendix 3:
Proximal mappings of L1-norm functions
The problem we are facing with is
Therefore, we take a single component out of the above equation:
The first order optimal condition is
If x′ > 0, (95) turns out to be 1 + β (x′ - x) = 0, thus If x′ < 0, (95) turns out to be -1 + β (x′ - x) = 0, thus If x′ = 0, (95) turns out to be κ - βx = 0, thus
Or, briefly, x′ = max(|x| - 1/β, 0) · sgn (x), where sgn (x) is a component-wise operation and it returns the sign of x.
Thus the solution for (92) can be expressed as
Appendix 4:
Definition of finite difference operator and its transpose
For a discrete image with dimension s = N1 × N2, the forward and backward finite difference can be defined with the periodic condition:
Therefore, the operator can be defined as
The corresponding adjoint (or the transpose in matrix form) operator of ∇ can be defined with the notation of divergence as
Note that ∇T = - div . Furthermore, the extension to high dimension image is straightforward.
Appendix 5:
The computation of (54) using FFTs
We consider the following equation
