Optimization-based image reconstruction in computed tomography by alternating direction method with ordered subsets

Abstract

The diversity of task-specific applications of computed tomography (CT) poses multiple challenges for the design of image reconstruction algorithms, so an efficient algorithm design tool is needed. This paper proposes a fast and efficient algorithm design framework for CT image reconstruction based on the alternating direction method (ADM) with ordered subsets (OS), termed OS-ADM. The general ideas of ADM and OS are introduced and then combined to solve convex optimization problems in CT image reconstruction. A standard procedure for algorithm design is summarized, consisting of 1) model mapping, 2) sub-problem dividing, 3) sub-problem solving, 4) OS level setting and 5) algorithm evaluation. Typical reconstruction problems are modeled as convex optimizations, including (non-negative) least-squares, constrained L1 minimization, constrained total variation (TV) minimization and TV minimization with different data fidelity terms. Efficient working algorithms for these problems are derived in detail with the proposed framework. In addition, both simulated and real CT projections are used to verify the performance of two TV-based algorithms. The experiments indicate that these algorithms achieve state-of-the-art performance, and the algorithm instances show that the proposed OS-ADM framework is promising for practical applications.

Keywords

Image reconstruction, alternating direction method, ordered subsets, optimization-based algorithm design framework, fast working algorithm design

1 Introduction

Since its advent in the 1970s, computed tomography (CT) has been widely applied in fields such as medical imaging and industrial inspection [1]. Research on CT imaging systems has drawn considerable attention, especially in high-precision electromechanical control, imaging algorithms, three-dimensional data processing and related applications [2]. Although CT has made great progress, some important issues still deserve attention. High-precision image reconstruction from under-sampled projections (sparse, few or limited views) is important in areas that need fast data acquisition or low radiation levels. Conventional reconstruction algorithms, namely analytical ones [3, 4] (e.g. FBP-based algorithms) and algebraic or statistical ones [5, 6] (e.g. ART and SART), face difficulties and limitations with under-sampled projections and in some other applications [4, 7]. The rise of optimization-based algorithms [2, 8] sheds light on these problems and is becoming the mainstream of research. While optimization-based reconstruction has been applied for many years, Pan [2] put forward the concept of optimization-based algorithms in a more concrete fashion. Optimization-based algorithms differ from conventional ones in important ways and apply in situations where conventional algorithms run into difficulty. An optimization-based method generally treats image reconstruction from projections as an optimization model. The model usually consists of an objective function, which describes one or more properties of the intended image, and constraints, which describe the imaging system.

In fact, the optimization model can be built with high flexibility according to the data acquisition system and the intended application of the image [2]. Initially, the term optimization-based reconstruction referred specifically to reconstruction models and algorithms inspired by compressed sensing (CS) [9, 10]. Later, as computational barriers were lowered and numerical and computational methods developed, a large number of reconstruction problems [8, 11–17] were formulated as optimization models, and corresponding algorithms were developed to solve them.

However, the diversity of task-specific applications of CT imaging now poses multiple challenges for algorithm design [2, 11]. Specific CT scanners in practical applications may have different data acquisition patterns, and the resulting CT images may be required to have particular characteristics. Under these circumstances, engineers and technicians have to develop working reconstruction algorithms to test the imaging systems. Developing such algorithms is not easy given the variety of objectives and constraints. There is therefore a strong demand for an easily implemented algorithm design framework. Such a framework should comprise standard steps, so that developing a specific algorithm becomes as routine as production on a pipeline. Moreover, for practical applications, this theoretical tool should be easy enough for engineers and technicians to understand completely.

The alternating direction method (ADM) [18–20] was first proposed to tackle bivariate convex problems with or without constraints. Inspired by CS-based applications over the past decade, ADM has seen a resurgence in many large-scale data processing fields [21], including machine learning, signal processing and imaging. ADM applies variable splitting, which resembles the divide-and-conquer strategy. ADM has a very simple description and stable convergence properties [22], which make it well suited to algorithm development in imaging applications [23]. Besides the applications mentioned above, ADM has seen some important applications [15, 24–28] in CT reconstruction. ADM-based CT algorithms have given engineers and technicians powerful tools for some difficult problems. These methods are straightforward to implement on computers and have (nearly) state-of-the-art performance for large-scale problems. Although ADM techniques were introduced over 60 years ago, their importance has increased significantly in the past decade.

To find an easily implemented framework for CT algorithm development, this paper demonstrates that ADM combined with ordered subsets (OS) [29–34] is an appropriate choice. We map specific problems into the form of ADM and incorporate the OS technique into the reconstruction procedure, which yields a new framework for prototyping algorithm design. The framework is termed OS-ADM in this paper. We argue that, for a wide class of convex problems in CT image reconstruction, the proposed OS-ADM framework performs well. Furthermore, the framework often generates algorithms that are easy to code.

We note that Sidky and Pan have proposed an algorithm design framework [11, 12] based on the Chambolle-Pock primal-dual (CPPD) [35, 36] algorithm for convex problems. Applying the CPPD framework to convex algorithms in CT reconstruction is an interesting, valuable and inspiring attempt, and the derived algorithm instances have very promising applications [37, 38]. However, the framework proposed here is quite different from CPPD, and OS-ADM can be an efficient complement to the CPPD framework. ADM is a more straightforward and natural idea that needs almost no complicated derivations or difficult mathematical concepts. Simplicity and robustness are essential for practical use. It must be pointed out that arguing over whether ADM or primal-dual methods are better is fruitless, because both are powerful tools for practical applications and each has its own limitations. CPPD is not suitable for some problems when the related functions are too complicated or non-convex [11]. ADM is not suitable for problems with non-linear constraints, as determined by the problem formulation in (1). Besides CPPD, the gradient descent method (GDM) is also often applied to convex optimization. GDM is a conventional method that simply searches for a descent direction in each iteration and chooses a step size to decrease the objective function. GDM is a reasonable choice for smooth problems but may be inefficient for non-smooth convex problems. Moreover, its convergence rate depends on the choice of step size, and an improper step size may lead to undesired solutions. It should also be pointed out that Nien and Fessler [15] have proposed the linearized augmented Lagrangian method with ordered subsets (OS-LALM) for regularized (weighted) least-squares problems arising in CT reconstruction. In their work, a linearized variant replaces the quadratic penalty. The convergence analysis of this inexact process in the supplementary material of [15] indicates that it can help to improve the efficiency of the algorithm. In this paper, we focus on the OS-ADM framework for the various problems of image reconstruction in CT imaging. The linearized variant technique is nevertheless also used in this paper when deriving some algorithm instances.

The outline of this paper is as follows. Section 2 introduces the general ideas of ADM and OS and then proposes the OS-ADM framework for prototyping algorithm design. Section 3 introduces some typical applications in which OS-ADM can be used as a development tool; the corresponding algorithm instances are derived in detail, and all of them are of promising value for real use. Section 4 presents experimental demonstrations of selected algorithm instances. Both simulation studies and real CT data experiments suggest that the instances derived by OS-ADM are relatively efficient. Finally, a short discussion and conclusions are presented in Section 5.

2 Generic alternating direction method and ordered subset acceleration

2.1 Some essential theoretical backgrounds

To keep this paper self-contained, the alternating direction method (ADM) is introduced first. ADM is an efficient framework for solving a class of convex problems with a solid convergence guarantee. ADM mainly focuses on the following bivariate convex optimization: $\begin{matrix} \min_{x, y} f (x) + g (y) \\ \mathrm{s.t.}\ Mx + Ny = c, \end{matrix}$ (1) where $f : ℝ^{s} \to ℝ$ and $g : ℝ^{t} \to ℝ$ are convex functions, $M \in ℝ^{l \times s}$ and $N \in ℝ^{l \times t}$ are linear transforms, and $c \in ℝ^{l}$ . This optimization model goes back several decades to the work of Glowinski and Marocco [18, 19] and Gabay and Mercier [39].

For the constrained optimization (1), the minimizer can be sought by approximating the original constrained problem with a sequence of unconstrained sub-problems. A simple and intuitive way is to use a quadratic penalty term formed from the constraints. This method replaces the constraint with a penalty term in the objective function, namely the squared constraint violation weighted by a penalty parameter. However, the penalty parameter must go to infinity to guarantee convergence, which may make the problem numerically ill-conditioned. A more efficient way to ensure convergence is the augmented Lagrangian method proposed by Hestenes [40] and Powell [41], which avoids the difficulty caused by using the quadratic penalty term alone. The augmented Lagrangian method introduces an extra linear term besides the quadratic term, and the corresponding augmented Lagrangian (AL) function of this problem is $\begin{matrix} L_{A} (x, y; λ) = f (x) + g (y) - λ^{T} (Mx + Ny - c) \\ + \frac{β}{2} {∥ Mx + Ny - c ∥}_{2}^{2}, \end{matrix}$ (2) where $λ \in ℝ^{l}$ is the Lagrangian multiplier and β > 0 is the penalty parameter. Here λ^T denotes the transpose of the column vector λ, which is a row vector; the same notation "T" applies to the other column vectors in this paper. The augmented Lagrangian method (ALM) seeks the solution by applying the following iterative scheme to (2): $\begin{matrix} (x_{k + 1}, y_{k + 1}) \leftarrow \arg \min_{x, y} L_{A} (x, y; λ_{k}), \\ λ_{k + 1} \leftarrow λ_{k} - β (M x_{k + 1} + N y_{k + 1} - c) . \end{matrix}$ (3)

The iteration involves three variables, (x_k, y_k, λ_k), and the update of the multiplier λ_k ensures that the limit of the iteration on (2) is also a minimizer of (1). ALM converts the original problem into an unconstrained form but treats it generically, regardless of the separable structure of the objective function. Exploiting this structure can greatly improve the efficiency of the iteration, and ADM is an intuitive and efficient way to do so. Minimizing the AL function jointly over x and y is often difficult in ALM, so ADM minimizes over x and y alternately for efficiency, while convergence is still guaranteed.

ADM minimizes the AL function alternately as a univariate problem in x or in y and iterates to ensure convergence. The typical form of ADM is: $\begin{matrix} x_{k + 1} \leftarrow \arg \min_{x} L_{A} (x, y_{k}; λ_{k}), \\ y_{k + 1} \leftarrow \arg \min_{y} L_{A} (x_{k + 1}, y; λ_{k}), \\ λ_{k + 1} \leftarrow λ_{k} - β ξ [M x_{k + 1} + N y_{k + 1} - c], \end{matrix}$ (4) where the scalar parameter ξ ∈ (0, 1) is a constant. It should be noted that the order in which the sub-problems (including the multiplier update) are iterated is highly flexible, and the initial guesses x₀ and y₀ can be set arbitrarily.
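As an illustration of iteration (4), the following minimal sketch applies ADM to a toy problem that is not taken from this paper: f(x) = ½‖x − a‖², g(y) = ½‖y − b‖², M = I, N = −I and c = 0, so the constraint is x = y and the minimizer is x = y = (a + b)/2. The function name `adm_consensus` and the parameter values are assumptions made for this example. Both sub-problems have closed-form solutions obtained by setting the gradient of the AL function (2) to zero.

```python
import numpy as np

def adm_consensus(a, b, beta=1.0, xi=0.9, n_iter=200):
    """ADM iteration (4) for min 0.5||x-a||^2 + 0.5||y-b||^2  s.t.  x - y = 0.

    Toy problem for illustration only (M = I, N = -I, c = 0).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(a)
    y = np.zeros_like(b)
    lam = np.zeros_like(a)
    for _ in range(n_iter):
        # x sub-problem: (x - a) - lam + beta * (x - y) = 0
        x = (a + lam + beta * y) / (1.0 + beta)
        # y sub-problem: (y - b) + lam - beta * (x - y) = 0
        y = (b - lam + beta * x) / (1.0 + beta)
        # multiplier update with the sign convention of (4)
        lam = lam - beta * xi * (x - y)
    return x, y, lam
```

At the fixed point of this iteration, x = y and λ = (b − a)/2, which can be checked by substituting into the three update rules.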

For practical problems, the specific solutions of the sub-problems remain to be discussed. In general, finding the exact solution of each sub-problem in each iteration may be troublesome. ADM is expected to find the minimizer of the augmented Lagrangian function by solving the x and y sub-problems alternately. In this paper, inexact updates derived from linearized approximation techniques are used for efficiency. When the errors (relative to the exact solutions) are absolutely summable, the convergence of the updates is still guaranteed. To find an approximate solution of each sub-problem, a practical approach using the second-order expansion at the current point x_k or y_k is described in Appendix 1. An iteration procedure using the ADM framework is summarized in Algorithm 1. There, the superscript in M^T denotes the transpose of the matrix M, and the same applies to the other matrices in this paper.

Algorithm 1.

Pseudocode for K steps of the ADM framework for a general bivariate convex optimization. ξ ∈ (0, 1) is a constant, and β > 0 controls the penalty strength. Each sub-problem is solved directly or by the method in Appendix 1. The initial vectors for x, y and λ are set to x₀, y₀ and λ₀, respectively, according to the application. The integer k is the iteration index.

1: ξ ← ξ₀ ∈ (0, 1); β ← β₀ > 0; k ← 0

2: x ← x₀; y ← y₀; λ ← λ₀

3: Do

4: $x_{k + 1} \leftarrow \arg \min_{x} L_{A} (x, y_{k}; λ_{k})$ , or inexactly $x_{k + 1} \leftarrow {prox}_{\hat{β}} [f] \{x_{k} - τ M^{T} (M x_{k} + N y_{k} - c - λ_{k} / β)\}$ ,

5: $y_{k + 1} \leftarrow \arg \min_{y} L_{A} (x_{k + 1}, y; λ_{k})$ , or inexactly $y_{k + 1} \leftarrow {prox}_{\hat{β}} [g] \{y_{k} - τ N^{T} (M x_{k + 1} + N y_{k} - c - λ_{k} / β)\}$ ,

6: $λ_{k + 1} \leftarrow λ_{k} - β ξ [M x_{k + 1} + N y_{k + 1} - c]$ ,

7: k ← k + 1,

8: until k ≥ K, end do

9: return x, y.

The proximal operator prox_β [f] in the above list is used to generate a descent direction for the convex function f and is obtained by the following optimization: ${prox}_{β} [f] (x) = \arg \min_{x^{'}} \left\{ f (x^{'}) + \frac{β}{2} {∥ x^{'} - x ∥}_{2}^{2} \right\} .$ (5)
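For a concrete instance of (5), take f to be the L1-norm. Under the scaling used in (5), where β multiplies the quadratic term, the minimizer is the well-known soft-thresholding (shrinkage) operator with threshold 1/β. The sketch below is illustrative; the function name `prox_l1` is an assumption and does not come from this paper.

```python
import numpy as np

def prox_l1(x, beta):
    """prox_beta[||.||_1](x) as defined in (5): soft-thresholding at 1/beta.

    Each entry is shrunk towards zero by 1/beta and set to zero
    if its magnitude is below that threshold.
    """
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - 1.0 / beta, 0.0)
```

Note that a larger β gives a smaller threshold, because a larger β penalizes moving away from x more strongly.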

The main advantage of ADM is that it makes full use of the separable structure of the objective function f (x) + g (y). The convergence of ADM in the generic form (4) has been analyzed thoroughly [20, 22]. Moreover, the convergence of the variants that use proximal operators is covered by Xiao [42] using a unified framework proposed by He [43].

With the development of imaging science, theoretical research has shown that ADM is closely related to split-Bregman methods [44–46]. A practical algorithm for a real-life problem should be developed according to the specific forms of f and g. For problems in imaging, techniques such as gradient descent methods, shrinkage operators [47] and proximal operators are often incorporated into algorithm development.

ADM provides a practical method for prototyping algorithm design for CT image reconstruction. An image reconstruction problem can often be modeled as an optimization that comprises an image regularization term and a data fidelity term. Such models are easily mapped into the form of (1) with some direct mathematical techniques, one of which is the introduction of auxiliary variables that separate the image regularization part from the data fidelity part.

ADM provides a generic algorithm design framework for a class of convex problems. However, ADM does not make full use of the CT data acquisition pattern. In the ADM framework, some important properties of the linear transforms M and N, which can help to improve the convergence rate, are not incorporated into the algorithm design. The projection matrix in CT image reconstruction is deterministic, which is quite different from typical signal recovery, where random sensing matrices are usually used. When developing reconstruction algorithms, it is better practice to incorporate the special properties of the matrix into the design procedure. The projection matrix in CT imaging is composed of sub-matrices corresponding to the individual projection views. Once the scanning geometry parameters are given, the projection matrix is constant and deterministic.

Ordered subsets (OS) [29] has proven to be an efficient technique for increasing the convergence rate of both algebraic [48, 49] and optimization-based algorithms [15, 50]. This method divides the acquired projection data into several parts, or subsets, and updates the intended solution with each subset of the data in an ordered sequence. The number of subsets is often referred to as the OS level. The OS technique is effective because the system matrix in CT imaging is strongly correlated within each projection view, and the correlation between different views becomes weaker as the difference between the views grows. In algorithm design, if a summation involves every index of the data, e.g., the back-projection operation $[A^{T} p]_{j} = \sum_{i} A_{ij} p_{i}$ with system matrix A and projection data p, then the OS technique is suitable. However, OS should not be applied indiscriminately. Generally speaking, OS requires the so-called subset balance condition [29, 51], which can be summarized as follows.

Let T denote the subset level, let the ordered sequence S₁, S₂, …, S_T denote the subsets, and let m ∈ { 1, 2, …, T } be the subset index. When the update iteration comes to subset m, the objective function can be expressed as: $Φ_{m} (x) = T \sum_{i \in S_{m}} g ([Ax]_{m}) + f (x),$ (6) where $[Ax]_{m}$ stands for the m-th subset of Ax. The constant T keeps the regularization term from being affected by the subset level. The objective function of the original problem is related to those of the subsets by: $Φ (x) = \sum_{m = 1}^{T} \frac{1}{T} Φ_{m} (x) .$ (7)

In practice, the subsets should generally be divided so that the projection photon counts of the subsets are (approximately) equal. The objective functions of the subsets should then be (approximately) equal, i.e. $Φ (x) \approx Φ_{m} (x) .$ (8)

This requires that the general objective function Φ (x) be separable, i.e. independent for each entry of x, which means that the L1-norm and the squared L2-norm are well suited to this requirement.
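The relations (6) and (7) can be checked numerically. The sketch below rests on assumptions made only for illustration: the data term is the squared L2 residual, each row of a small dense matrix A plays the role of one projection view, the subsets are formed by interleaving the views, and the helper names are hypothetical. It shows that averaging the subset objectives Φ_m recovers Φ exactly, whereas the balance condition (8) holds only approximately and depends on the data.

```python
import numpy as np

def interleaved_subsets(n_views, T):
    """Split view indices 0..n_views-1 into T interleaved ordered subsets."""
    return [np.arange(m, n_views, T) for m in range(T)]

def full_objective(x, A, p, reg):
    """Phi(x): squared-L2 data term over all views plus the regularizer."""
    r = A @ x - p
    return np.sum(r ** 2) + reg(x)

def subset_objective(x, A, p, rows, T, reg):
    """Phi_m(x) as in (6): data term of one subset scaled by T, plus the regularizer."""
    r = A[rows] @ x - p[rows]
    return T * np.sum(r ** 2) + reg(x)
```

Interleaving is one simple way to place views that are far apart into the same subset; other orderings are possible.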

2.2 Ordered-subset-based alternating direction method

In this paper, we incorporate the OS method into the ADM framework, and the result is referred to as the OS-ADM framework. This extension is quite straightforward. The effectiveness of OS in accelerating convergence is verified by the numerical experiments later in this paper. The OS iteration scheme for the general problem argmin f (x) + g (y) is given in Algorithm 2.

As in Algorithm 1, the solutions $x_{k}^{m + 1}$ and $y_{k}^{m + 1}$ can be computed by the method in Appendix 1, with M and N replaced by M^m and N^m. It should be noted that, for exact and consistent projections with subset balance, OS-accelerated expectation maximization (EM) [29] has been proven to converge.

Although the convergence of OS-ADM has not been thoroughly analyzed theoretically, OS does accelerate convergence in the early stage of the iteration. In the final stage of the iteration, small oscillations in x and y may appear, and this phenomenon can be observed in numerical tests. The oscillation is closely related to the data consistency and the order in which the subsets are updated.

Nevertheless, the fact that OS effectively performs as many image updates per iteration as the OS level is still very promising for ADM methods. We apply OS in the beginning iterations of the reconstruction (say, the first 20–100 loops) to accelerate convergence, and then switch to the normal ADM iteration to guarantee global convergence. This combined method may seem a little complicated, but it effectively accelerates the iterations, as verified by the experiments in Section 4.

Algorithm 2.

Pseudocode for K steps of the OS iteration scheme for the general problem argmin f (x) + g (y). T is the subset level, and m ∈ { 1, 2, …, T } is the subset index. S₁, S₂, …, S_T stand for the subsets. The initial vector for x is set to the starting image x₀. k is the iteration index. M^m stands for the m-th part of M = (M₁ ; M₂ ; … ; M_T), and the same applies to N^m. For the other settings, refer to Algorithm 1.

1:   ξ ← ξ₀ ∈ (0, 1); β ← β₀ > 0; k ← 0

2:   x ← x₀; y ← y₀; λ ← λ₀

3: Do

4:     a) $x_{k}^{1} \leftarrow x_{k}$ , $y_{k}^{1} \leftarrow y_{k}$ , $λ_{k}^{1} \leftarrow λ_{k}$

5:     b) For subsets m ∈ 1, 2, …, T do

6:          $x_{k}^{m + 1} \leftarrow arg min_{x} L_{A} (x, y_{k}^{m}, λ_{k}^{m})$

7:          $y_{k}^{m + 1} \leftarrow arg min_{y} L_{A} (x_{k}^{m + 1}, y, λ_{k}^{m})$

8:          $λ_{k}^{m + 1} \leftarrow λ_{k}^{m} - β ξ [M^{m} x_{k}^{m + 1} + N^{m} y_{k}^{m + 1} - c],$

9:     c) End for

10:   d) $x_{k + 1} \leftarrow x_{k}^{m + 1}$ , $y_{k + 1} \leftarrow y_{k}^{m + 1}$ , $λ_{k + 1} \leftarrow λ_{k}^{m + 1}$ , k ← k + 1,

11: until k ≥ K, end do

12: return x, y.

2.3 The OS-ADM framework for prototyping CT reconstruction algorithms

Many CT imaging problems can be modeled as convex optimizations, and in this paper we show how to prototype algorithms for them with the OS-ADM framework developed above. Implementing ADM is simple and direct and does not require many complicated mathematical derivations. ADM divides a hybrid multivariate optimization into several easily handled univariate problems. With the incorporation of OS, the convergence rate is further improved.

For a given data acquisition setting, attention should be paid to the design of the subset level and the division of the subsets, which may influence the acceleration of convergence. For a broad class of CT problems, however, OS-ADM can be flexibly adapted for prototype algorithm design. For a specific reconstruction model, the algorithm can be developed in the following five steps:

1) Mapping the reconstruction model into the generic ADM model;

2) Constructing the augmented Lagrangian function and dividing sub-problems;

3) Solving each sub-problem (by some concrete methods mentioned above);

4) Analyzing and determining the OS level and subsets, converting ADM to OS version;

5) Implementing and running the entire algorithm, evaluating the algorithm performance.

These five steps describe in general how to use OS-ADM to develop a practical algorithm. It must be noted that completing these steps yields only a working algorithm; further optimization is necessary before such an algorithm is deployed on real CT scanners.

Since these steps are rather abstract, we first give some remarks on them. In step 1, a reconstruction model is usually mapped into the ADM model by introducing one or more auxiliary variables, which convert the original problem into one with a separable structure with respect to each independent variable. The rule is that an auxiliary variable should replace a complicated part (e.g., a linear transform such as Mx or Ny) so as to decouple the structure of the problem. The division into sub-problems in step 2 is straightforward, with one sub-problem per variable. In step 3, each sub-problem can be solved flexibly, and both exact and approximate methods work well within the framework. In step 4, the OS level and the subsets should be set according to the pattern of projection acquisition; as a guide, two adjacent subsets in the update sequence should be as orthogonal as possible. In step 5, simulation studies help to evaluate the algorithm performance, while practical applications require further optimization and tuning. For an algorithm that is truly usable in practice, parameter tuning, software and hardware optimization, data stream interfaces and other key factors should also be investigated. In the rest of this paper, we give some algorithm instances derived with OS-ADM that show the concrete details in practical applications.

3 CT algorithms instances by OS-ADM

A typical CT system mainly consists of an X-ray source, a flat panel detector, a mechanical gantry system and a computer-based data processing system. When implementing optimization-based algorithms, a discrete form of the X-ray transform is usually considered: $p = Wx,$ (9) where $p \in ℝ^{l}$ is the data vector, $x \in ℝ^{s}$ is the object vector, and $W \in ℝ^{l \times s}$ is the system matrix. With this linear model, image reconstruction amounts to solving equation (9). However, even under ideal and consistent conditions, finding the solution is not easy, mainly for two reasons. First, the projection matrix W is so large that computing its (pseudo) inverse is computationally intractable. Second, W is often severely ill-conditioned, with a huge condition number that makes numerical computation unstable. In this section, following the problems presented in reference [11], some typical algorithm instances are derived with OS-ADM.

3.1 Least-square reconstruction

3.1.1 Unconstrained least-square reconstruction

A simple and straightforward way to find the solution to p = Wx is to minimize the error in the least-squares (LS) sense. This is a general method for solving a broad class of linear equations in many areas. As a simple demonstration of developing reconstruction algorithms, we begin with the LS model. LS minimizes the quadratic term formed by the estimated data and the measured data: $\tilde{x} = \arg \min_{x} {∥ Wx - p ∥}_{2}^{2} .$ (10)

For this model, a common non-ADM approach is the gradient descent method (GDM). GDM simply searches for a descent direction in each iteration and chooses a step size to decrease the objective function. However, its convergence rate depends on the choice of step size, and an improper step size may lead to undesired solutions. In this subsection, we use OS-ADM to develop a new algorithm that avoids this drawback of GDM. We introduce a new variable u to convert (10) into a new form: $\begin{matrix} \arg \min_{u, x} {∥ u - p ∥}_{2}^{2} \\ \mathrm{s.t.}\ u = Wx . \end{matrix}$ (11)
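For reference, the step-size sensitivity of GDM on the LS model (10) can be reproduced with a few lines. The sketch below is a plain fixed-step gradient descent baseline, not an algorithm from this paper; the function name `gradient_descent_ls` and the use of a small dense matrix W are assumptions for illustration. The gradient of (10) is 2W^T(Wx − p), so with a fixed step s the iteration converges only if s is below 1/‖W‖₂², where ‖W‖₂ is the largest singular value of W.

```python
import numpy as np

def gradient_descent_ls(W, p, step, n_iter=500):
    """Fixed-step gradient descent on ||W x - p||_2^2, started from x = 0."""
    x = np.zeros(W.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * (W.T @ (W @ x - p))  # gradient of the LS objective (10)
        x = x - step * grad
    return x
```

A step such as 0.5/‖W‖₂² is safe, while a step such as 1.5/‖W‖₂² makes the component of the error along the leading singular vector grow at every iteration.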

Comparing (11) with (1), a direct mapping between the two models can be expressed as $\begin{matrix} f (x) = 0, \quad g (y) = {∥ y - p ∥}_{2}^{2}, \\ M = W, \quad N = - I, \\ x = x, \quad y = u, \quad c = 0, \end{matrix}$ (12) where I stands for the identity matrix (or transform). The augmented Lagrangian function of (11) is $L_{A} (u, x; λ) = {∥ u - p ∥}_{2}^{2} + λ^{T} (u - Wx) + \frac{β}{2} {∥ u - Wx ∥}_{2}^{2} .$ (13)

To find the minimizer of $L_{A}$ with respect to u for fixed x, one computes the derivative of $L_{A}$ with respect to u and sets it to 0: $(2 + β) u = β Wx + 2 p - λ,$ (14) so the solution of the u sub-problem is $u = \frac{β W x + 2 p - λ}{2 + β} .$ (15)
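The closed form (15) can be verified numerically: the gradient of (13) with respect to u is 2(u − p) + λ + β(u − Wx), and it should vanish at the u given by (15). The short sketch below performs this check; the function names are hypothetical and the vector passed as `Wx` stands for the forward projection of the current image.

```python
import numpy as np

def u_update(Wx, p, lam, beta):
    """Closed-form solution (15) of the u sub-problem."""
    return (beta * Wx + 2.0 * p - lam) / (2.0 + beta)

def grad_u(u, Wx, p, lam, beta):
    """Gradient of the augmented Lagrangian (13) with respect to u."""
    return 2.0 * (u - p) + lam + beta * (u - Wx)
```

Because the u sub-problem is separable across data entries, the update is an element-wise operation and adds little cost compared with the projection operations.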

The x sub-problem is slightly more complicated. Up to constants, it can be written as: $\begin{matrix} x^{*} = \arg \min_{x} L_{A} (x) \\ = \arg \min_{x} \frac{β}{2} {‖ u - W x + \frac{λ}{β} ‖}_{2}^{2} . \end{matrix}$ (16)

Comparing the above expression with (76), one finds that the two are identical when f (x) = 0, M = W and C = -u - λ/β are set in (76). Therefore, we apply the method in Appendix 1, and on the basis of (81) and (83), the update formula for x can be expressed as $x = x_{k} - τ W^{T} (W x_{k} - u - \frac{λ}{β}),$ (17) where the parameter τ ∈ (1, 2) is a constant. In (17), the operator W^T denotes the back-projection operation, and the convergence of ADM is guaranteed only when W^T is the exact transpose of W.

Moreover, the update of the multipliers λ is $λ \leftarrow λ + β ξ (u - Wx) .$ (18)

The OS level is set according to the pattern of the data p, usually in accordance with the number of projection views. Assume that T is an appropriate OS level, that the subset index is m ∈ { 1, 2, …, T }, and that the corresponding subsets are p₁, p₂, …, p_T. Based on the substitution and the derivation for the LS model, one has the OS-ADM reconstruction scheme in Algorithm 3.

Although an auxiliary variable u has been introduced into this algorithm, the computational burden does not increase significantly. This is because most of the computational cost lies in the forward projection and the back-projection, and each of them needs to be carried out only once in each inner iteration. Although Wx appears three times (i.e., $W^{m} x_{k}^{m + 1}$ twice and $W^{m} x_{k}^{m}$ once), $W^{m} x_{k}^{m + 1}$ can be regarded as $W^{m} x_{k}^{m}$ for the next inner iteration.

Note that the parameters τ, β, ξ and T are easy to choose. The value of τ lies in (1, 2), and we found that τ = 1.0 always works well when ξ is simply set to 1.0. The OS level T, however, should be chosen according to the practical performance in experiments. Experimentally, setting T equal to or less than the number of projection views is an appropriate choice.

Algorithm 3:

Pseudocode for K steps OS-ADM iteration scheme for LS problem. T is subset level, and m∈ { 1, 2, …, T } is the subset index. p₁, p₂, …, p_T stands for subsets. The initial vector for x and u are set to starting vectors x₀ and u₀, respectively. k is the iteration index. W^m stands for m-th part of W = (W₁ ; W₂ ; … ; W_T)

1:    ξ ← ξ₀ ∈ (0, 1); β ← β₀ > 0; k ← 0

2:    x ← x₀; u ← u₀; λ ← λ₀

3: Do

4:        a) $x_{k}^{1} \leftarrow x_{k}$ , $u_{k}^{1} \leftarrow u_{k}$ , $λ_{k}^{1} \leftarrow λ_{k}$

5:        b) For subsets m ∈ 1, 2, …, T do

6:                $x_{k}^{m + 1} \leftarrow x_{k}^{m} - τ (W^{m})^{T} (W^{m} x_{k}^{m} - u_{k}^{m} - λ_{k}^{m} / β),$

7:                $u_{k}^{m + 1} \leftarrow (β W^{m} x_{k}^{m + 1} + 2 p_{m} - λ_{k}^{m}) / (2 + β),$

8:                $λ_{k}^{m + 1} \leftarrow λ_{k}^{m} - β ξ (W^{m} x_{k}^{m + 1} - u_{k}^{m + 1}) .$

9:        c) End for

10:      d) $x_{k + 1} \leftarrow x_{k}^{m + 1}$ , $u_{k + 1} \leftarrow u_{k}^{m + 1}$ , $λ_{k + 1} \leftarrow λ_{k}^{m + 1}$ , k ← k + 1,

11: until k ≥ K, end do

12: return x.
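The inner loop of Algorithm 3 can be sketched in a few lines of NumPy for a small dense system matrix. This is only an illustration under stated assumptions, not the implementation used in this paper: the subsets are formed by interleaving rows of W (a stand-in for grouping by projection views), u and λ are stored as full-length vectors whose subset slices are updated in turn, and the default step τ = 1 / max_m ‖W^m‖₂² is a conservative choice made here rather than the parameter rule discussed above. The function name os_adm_ls is hypothetical.

```python
import numpy as np

def os_adm_ls(W, p, T=4, K=200, beta=1.0, xi=1.0, tau=None):
    """Sketch of Algorithm 3 for a small dense W (rows correspond to rays)."""
    l, s = W.shape
    # Assumption: interleaved row subsets stand in for subsets of views.
    subsets = [np.arange(m, l, T) for m in range(T)]
    if tau is None:
        # Assumption: conservative step from the largest subset spectral norm.
        tau = 1.0 / max(np.linalg.norm(W[idx], 2) ** 2 for idx in subsets)
    x = np.zeros(s)
    u = np.zeros(l)
    lam = np.zeros(l)
    for _ in range(K):
        for idx in subsets:
            Wm, pm = W[idx], p[idx]
            # Line 6: linearized x update, one forward and one back-projection.
            x = x - tau * Wm.T @ (Wm @ x - u[idx] - lam[idx] / beta)
            Wx = Wm @ x  # forward projection of the new x, reused twice below
            # Line 7: closed-form u update.
            u[idx] = (beta * Wx + 2.0 * pm - lam[idx]) / (2.0 + beta)
            # Line 8: multiplier update.
            lam[idx] = lam[idx] - beta * xi * (Wx - u[idx])
    return x
```

With T = 1 the loop reduces to the plain ADM iteration for the LS model, which is a convenient sanity check before larger OS levels are tried.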

3.1.2 Non-negative constrained LS reconstruction

For a CT image, the attenuation coefficients are always non-negative, and this is a very strong constraint that should be built into the model. In this subsection, we demonstrate how to apply the ADM framework to address this concern. Before starting the algorithm derivation, we first introduce the indicator function. An indicator function δ_Ω (x) defined on a convex set Ω has the form $δ_{Ω} (x) \equiv \begin{cases} 0, & x \in Ω \\ + \infty, & x \notin Ω \end{cases} .$ (19)

In particular, if we choose $Ω = ℝ_{+}^{s}$ , then $δ_{ℝ_{+}^{s}} (x)$ is the non-negative indicator function.

The non-negative LS model can be built as $\begin{matrix} \tilde{x} = arg \min_{x} {∥ Wx - p ∥}_{2}^{2}, \\ s . t . x \geq 0 . \end{matrix}$ (20)

Once again we slightly abuse “≥” for the vector x: x ≥ 0 means that all the entries of x must be non-negative. Then, with the aid of the indicator function, we can convert (20) into the unconstrained form $\tilde{x} = \arg \min_{x} {∥ Wx - p ∥}_{2}^{2} + δ_{ℝ_{+}^{s}} (x) .$ (21)

Applying a mapping similar to (12), except that $f (x) = δ_{ℝ_{+}^{s}} (x)$ , we have the AL function $L_{A} (u, x; λ) = {∥ u - p ∥}_{2}^{2} + λ^{T} (u - Wx) + \frac{β}{2} {∥ u - Wx ∥}_{2}^{2} + δ_{ℝ_{+}^{s}} (x),$ (22) where the u sub-problem and its solution have the same formula as (15). The other sub-problem, with respect to x, is $\begin{matrix} x^{*} = \arg \min_{x} L_{A} (x) \\ = \arg \min_{x} \frac{β}{2} {∥ u - Wx + \frac{λ}{β} ∥}_{2}^{2} + δ_{ℝ_{+}^{s}} (x) . \end{matrix}$ (23)

Algorithm 4:

Pseudocode for K steps OS-ADM iteration scheme for non-negative constrained LS problem. Other settings are the same as those in Algorithm 3.

1:    ξ ← ξ₀ ∈ (0, 1); β ← β₀ > 0; k ← 0

2:    x ← x₀; u ← u₀; λ ← λ₀

3: Do

4:        a) $x_{k}^{1} \leftarrow x_{k}$ , $u_{k}^{1} \leftarrow u_{k}$ , $λ_{k}^{1} \leftarrow λ_{k}$

5:        b) For subsets m ∈ 1, 2, …, T do

6:                $x_{k}^{m + 1} \leftarrow pos (x_{k}^{m} - τ (W^{m})^{T} (W^{m} x_{k}^{m} - u_{k}^{m} - λ_{k}^{m} / β)),$

7:                $u_{k}^{m + 1} \leftarrow (β W^{m} x_{k}^{m + 1} + 2 p_{m} - λ_{k}^{m}) / (2 + β),$

8:                $λ_{k}^{m + 1} \leftarrow λ_{k}^{m} - β ξ (W^{m} x_{k}^{m + 1} - u_{k}^{m + 1}) .$

9:        c) End for

10:      d) $x_{k + 1} \leftarrow x_{k}^{m + 1}$ , $u_{k + 1} \leftarrow u_{k}^{m + 1}$ , $λ_{k + 1} \leftarrow λ_{k}^{m + 1}$ , k ← k + 1,

11: until k ≥ K, end do

12: return x.

The x sub-problem is tackled in the same way as in 3.1. Comparing (23) with (76), the only difference is that $f (x) = δ_{ℝ_{+}^{s}} (x)$ instead of f (x) = 0. Therefore, an approximate solution for (23) can be obtained by applying (83); more specifically, it is ${prox}_{β} [δ_{ℝ_{+}^{s}}] (x_{k} - τ W^{T} (W x_{k} - u - λ / β)) .$ (24)

The proximal mappings of indicator functions can be computed according to the method in Appendix 2. Directly applying the formulas (87) and (88), we obtain the update of x as $x = pos (x_{k} - τ W^{T} (W x_{k} - u - λ / β)),$ (25) where the operator pos() is defined as in Appendix 2. Moreover, the update of the multipliers λ is the same as (18). Consequently, we obtain the non-negative constrained LS reconstruction based on OS-ADM as described in Algorithm 4.

The only difference from Algorithm 3 lies in line 6, where a pos() function is incorporated into the iteration scheme. Note that the computational cost is almost the same as that of Algorithm 3, while the non-negative constraint can help to improve the stability of the iteration.
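The update (25) can be illustrated as follows. The sketch assumes that pos() sets negative entries to zero, i.e., that it is the Euclidean projection onto the non-negative orthant; Appendix 2 is not reproduced here, so this reading is an assumption. The helper name nn_x_update is hypothetical.

```python
import numpy as np

def pos(v):
    """Assumed form of pos(): negative entries are replaced by zero."""
    return np.maximum(v, 0.0)

def nn_x_update(x_k, W, u, lam, beta, tau):
    """One x update following (25): a linearized step, then projection."""
    return pos(x_k - tau * W.T @ (W @ x_k - u - lam / beta))
```

Compared with line 6 of Algorithm 3, the only extra work is one element-wise maximum per inner iteration.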

3.2 Constrained L1 minimization reconstruction

Sparsity-based optimization reconstruction has drawn great attention since the advent of compressed sensing (CS). One of the most prominent works is that of Candes et al. on the exact recovery of a medical image from sparse samples of its discrete Fourier transform. The exact recovery depends on the fact that the intended image is sparse, or that it has a sparse representation, meaning that in some transform domain only a few coefficients are non-zero. Moreover, the mathematical theory of CS provides comprehensive results on the recovery model, sampling conditions and noise properties for general sparse signal reconstruction.

CS-based algorithms have shown great potential for under-sampled data in CT imaging, which has a significant influence on reconstruction theory and may lead to practical applications in CT scanners. CS-based methods usually regard reconstruction as an optimization problem built on sparsity-exploiting principles, in which the model and the corresponding algorithm play the central roles.

When developing CS-based algorithms, one must first select a measure of the image sparsity. Generally, the sparsity can be defined as the number of non-zero entries in a discrete signal, which makes the L0 quasi-norm the direct measure. However, minimizing the L0 quasi-norm is NP-hard in general, and it is difficult and unstable to implement numerically. The L1-norm is therefore commonly used as a replacement. Consequently, a simple and direct model for CS-based reconstruction can be built on constrained L1 minimization of the image itself, which gives the following optimization problem $\begin{matrix} \tilde{x} = \arg \min_{x} {‖ x ‖}_{1}, \\ \mathrm{s.t.} \; W x = p . \end{matrix}$ (26)

The above problem can be interpreted as finding the image with minimum L1-norm within the solution space of the linear equation Wx = p when it is consistent. When the observation is contaminated by noise or errors, small modifications can be made to allow for these inconsistencies, as described in what follows.

In order to apply the ADM scheme, an auxiliary variable z is introduced into (26) and the problem can be equivalently converted into $\begin{matrix} (\tilde{x}, \tilde{z}) = \arg \min_{x, z} {∥ z ∥}_{1}, \\ \mathrm{s.t.} \begin{cases} z = x, \\ Wx = p . \end{cases} \end{matrix}$ (27)

The mapping into the general form of (1) needs a small trick. We write the constraints of (27) in a compact form $Mx = c,$ (28) where $M = (\begin{matrix} W_{l \times s} & O_{l \times s} \\ - I_{s \times s} & I_{s \times s} \end{matrix}), x = (\begin{matrix} \begin{matrix} x \\ z \end{matrix} \end{matrix}), c = (\begin{matrix} p \\ 0 \end{matrix}) .$ (29)

Note that $O_{l \times s}$ denotes an l × s matrix with all entries equal to 0. With the aid of the above compact form, a direct mapping to the generic problem can be expressed as $\begin{matrix} M = (\begin{matrix} W_{l \times s} & O_{l \times s} \\ - I_{s \times s} & I_{s \times s} \end{matrix}), N = O, \\ x = (\begin{matrix} x \\ z \end{matrix}), y = z, c = (\begin{matrix} p \\ 0 \end{matrix}), \\ f (x) = {‖ (\begin{matrix} 0_{s \times 1} \\ 1_{s \times 1} \end{matrix}) \cdot x ‖}_{1}, g (y) = 0, \end{matrix}$ (30) where $0_{s \times 1}$ and $1_{s \times 1}$ denote column vectors of size s × 1 with all entries equal to 0 and 1, respectively. Note that we define a special function f (x) in (30), in which “·” stands for the component-wise product of two vectors. Obviously, we have f (x) = ∥ z ∥ ₁. The AL function can be derived as

$\begin{matrix} L_{A} (z, x; λ_{1}, λ_{2}) = {∥ z ∥}_{1} - λ_{1}^{T} (Wx - p) + \frac{β_{1}}{2} {∥ Wx - p ∥}_{2}^{2} \\ - λ_{2}^{T} (z - x) + \frac{β_{2}}{2} {∥ z - x ∥}_{2}^{2}, \end{matrix}$ (31) where some straightforward derivations and expressions have been omitted because of the limited length of this paper.

Algorithm 5:

Pseudocode for K steps OS-ADM iteration scheme for constrained L1 minimization. Other settings are the same as those in Algorithm 3.

1:    ξ ← ξ₀ ∈ (0, 1); k ← 0

2:    x ← x₀; z ← z₀; λ₁ ← 0, λ₂ ← 0

3: Do

4:        a) $x_{k}^{1} \leftarrow x_{k}$ , $z_{k}^{1} \leftarrow z_{k}$ , $(λ_{1})_{k}^{1} \leftarrow (λ_{1})_{k}, (λ_{2})_{k}^{1} \leftarrow (λ_{2})_{k}$

5:        b) For subsets m ∈ 1, 2, …, T do

6:                $x_{k}^{m + 1} \leftarrow \frac{τ}{β_{1} + β_{2} τ} (\frac{β_{1}}{τ} x_{k}^{m} + β_{2} z_{k}^{m} - (λ_{2})_{k}^{m} - β_{1} (W^{m})^{T} (W^{m} x_{k}^{m} - p_{m} - \frac{(λ_{1})_{k}^{m}}{β_{1}})),$

7:                $z_{k}^{m + 1} \leftarrow \max (| x_{k}^{m + 1} + \frac{(λ_{2})_{k}^{m}}{β_{2}} | - \frac{1}{β_{2}}, 0) \cdot sgn (x_{k}^{m + 1} + \frac{(λ_{2})_{k}^{m}}{β_{2}}),$

8:                $\begin{matrix} (λ_{1})_{k}^{m + 1} \leftarrow (λ_{1})_{k}^{m} - β_{1} ξ (W^{m} x_{k}^{m + 1} - p_{m}), \\ (λ_{2})_{k}^{m + 1} \leftarrow (λ_{2})_{k}^{m} - β_{2} ξ (z_{k}^{m + 1} - x_{k}^{m + 1}) . \end{matrix}$

9:        c) End for

10:      d) $x_{k + 1} \leftarrow x_{k}^{m + 1}$ , $z_{k + 1} \leftarrow z_{k}^{m + 1}$ , $(λ_{1})_{k + 1} \leftarrow (λ_{1})_{k}^{m + 1}, (λ_{2})_{k + 1} \leftarrow (λ_{2})_{k}^{m + 1}$ , k ← k + 1,

11: until k ≥ K, end do

12: return x.

The two sub-problems can be decoupled from (31) with respect to z and x independently. The z sub-problem reads as $arg \min_{z} L_{A} (z) = arg min_{z} {∥ z ∥}_{1} + \frac{β_{2}}{2} {∥ z - x - \frac{λ_{2}}{β_{2}} ∥}_{2}^{2} .$ (32)

The solution of the above problem has a compact analytic expression according to the method in Appendix 3, namely $z = \max (| x + \frac{λ_{2}}{β_{2}} | - \frac{1}{β_{2}}, 0) \cdot sgn (x + \frac{λ_{2}}{β_{2}}) .$ (33)
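The closed form (33) is the component-wise soft-thresholding (shrinkage) operator. A minimal NumPy sketch follows; the function names shrink and z_update are hypothetical labels introduced here for illustration.

```python
import numpy as np

def shrink(v, kappa):
    """Soft-thresholding: max(|v| - kappa, 0) * sgn(v), component-wise."""
    return np.maximum(np.abs(v) - kappa, 0.0) * np.sign(v)

def z_update(x, lam2, beta2):
    """z sub-problem solution (33) with threshold 1 / beta2."""
    return shrink(x + lam2 / beta2, 1.0 / beta2)
```

Entries whose magnitude is below the threshold 1/β₂ are set exactly to zero, which is how the L1 term promotes sparsity in z.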

The x sub-problem has the form $\arg \min_{x} L_{A} (x) = \arg \min_{x} \frac{β_{1}}{2} {∥ Wx - p - \frac{λ_{1}}{β_{1}} ∥}_{2}^{2} + \frac{β_{2}}{2} {∥ x - z + \frac{λ_{2}}{β_{2}} ∥}_{2}^{2},$ (34) where we use the replacement ${∥ Wx - p - \frac{λ_{1}}{β_{1}} ∥}_{2}^{2} \approx {∥ W x_{k} - p - \frac{λ_{1}}{β_{1}} ∥}_{2}^{2} + 2 g_{k}^{T} (x - x_{k}) + \frac{1}{τ} {∥ x - x_{k} ∥}_{2}^{2},$ (35)

where $g_{k} = W^{T} (W x_{k} - p - \frac{λ_{1}}{β_{1}})$ . Substituting this replacement into (34) and setting the derivative to 0, we obtain the solution for x as $x = \frac{τ}{β_{1} + β_{2} τ} (\frac{β_{1}}{τ} x_{k} + β_{2} z - λ_{2} - β_{1} W^{T} (W x_{k} - p - \frac{λ_{1}}{β_{1}})) .$ (36)
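As a numerical cross-check of (36), the sketch below evaluates the closed form and lets the reader verify that the gradient of the linearized objective, β₁ g_k + (β₁/τ)(x − x_k) + β₂ (x − z + λ₂/β₂), vanishes at the result. The function name x_update_l1 is hypothetical, and W is assumed to be a small dense matrix.

```python
import numpy as np

def x_update_l1(x_k, z, lam1, lam2, W, p, beta1, beta2, tau):
    """Closed-form x update (36) for the linearized sub-problem (34)-(35)."""
    g_k = W.T @ (W @ x_k - p - lam1 / beta1)
    rhs = beta1 / tau * x_k + beta2 * z - lam2 - beta1 * g_k
    return tau / (beta1 + beta2 * tau) * rhs

def surrogate_gradient(x, x_k, z, lam1, lam2, W, p, beta1, beta2, tau):
    """Gradient of the linearized objective; it should be zero at the update."""
    g_k = W.T @ (W @ x_k - p - lam1 / beta1)
    return beta1 * g_k + beta1 / tau * (x - x_k) + beta2 * (x - z + lam2 / beta2)
```

Because the quadratic in Wx is replaced by its linearization, the update needs only one forward projection and one back-projection and no matrix inversion.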

The updates for multipliers λ₁, λ₂ are

$\begin{matrix} λ_{1} \leftarrow λ_{1} - β_{1} (Wx - p), \\ λ_{2} \leftarrow λ_{2} - β_{2} (z - x) . \end{matrix}$ (37)

With the updates of each sub-problem and multiplier, the OS-ADM scheme can be obtained as Algorithm 5.

3.2.1 Non-negative constraint on image

Note that if the non-negative constraint on x is incorporated into the model (27), the constraints become $\begin{cases} z = x, \\ Wx = p, \\ x \geq 0 . \end{cases}$ (38)

With the aid of the indicator function $δ_{ℝ_{+}^{s}} (x)$ , the AL function can be easily obtained, and the corresponding derivation is similar to that for Algorithm 4. Thus, only a small modification of the update of x in line 6 of Algorithm 5 is needed, which can be expressed as $x_{k}^{m + 1} \leftarrow pos (\frac{τ}{β_{1} + β_{2} τ} (\frac{β_{1}}{τ} x_{k}^{m} + β_{2} z_{k}^{m} - (λ_{2})_{k}^{m} - β_{1} (W^{m})^{T} (W^{m} x_{k}^{m} - p_{m} - \frac{(λ_{1})_{k}^{m}}{β_{1}}))) .$ (39)

In this way, a new algorithm based on OS-ADM is obtained. As the non-negative constrained L1 minimization follows straightforwardly from Algorithm 5, we do not list it here.

Although two different penalty parameters (i.e., β₁, β₂) are introduced on the quadratic terms in (31), the convergence of ADM can still be guaranteed. In fact, the matrix M in (30) is composed of several parts with different structures, corresponding to different transforms. Therefore, it is not very appropriate to use the same penalty parameter for all parts of the AL function. For this reason, different values of β₁ and β₂ are often used in order to balance the two terms in the AL function.

3.2.2 Inconsistent observations with noise

For inconsistent observations, the observation equation is no longer Wx = p because of the influence of noise. There are many available methods to deal with noise, and here a simple and direct one is introduced. Note that it can also be applied directly to the algorithm instances in the following subsections. Let the noise be denoted by e; then the observation equation (9) is modified by adding e, which gives $Wx - p = e .$ (40)

Suppose the energy of e is restricted as ${∥ e ∥}_{2}^{2} \leq ε$ , where ε stands for the noise level. Then, with the aid of the indicator function $δ_{B (ε)} (x) = \begin{cases} 0, & x \in B (ε) \\ + \infty, & x \notin B (ε) \end{cases}, \quad B (ε) := \{ x \mid {∥ x ∥}_{2}^{2} \leq ε \},$ (41) the optimization of (27) can be modified as $\begin{matrix} (\tilde{x}, \tilde{z}) = \arg \min_{x, z} {∥ z_{1} ∥}_{1} + δ_{B (ε)} (z_{2} - p), \\ \mathrm{s.t.} \begin{cases} z_{1} = x, \\ z_{2} = Wx . \end{cases} \end{matrix}$ (42)

Note that the variable z here is z = (z₁ ; z₂), which is different from that in (27). The AL function of the above problem is $L_{A} (x, z) = {∥ z_{1} ∥}_{1} + δ_{B (ε)} (z_{2} - p) + \frac{β_{1}}{2} {∥ z_{1} - x - λ_{1} / β_{1} ∥}_{2}^{2} + \frac{β_{2}}{2} {∥ z_{2} - Wx - λ_{2} / β_{2} ∥}_{2}^{2} .$ (43)

Comparing the above function with that in (31), it is easily seen that the solutions for z₁ and x in (43) are almost the same as those for z and x, apart from some notation. The z₂ sub-problem can be written as $L_{A} (z_{2}) = δ_{B (ε)} (z_{2} - p) + \frac{β_{2}}{2} {∥ z_{2} - Wx - λ_{2} / β_{2} ∥}_{2}^{2} .$ (44)

After some simple manipulation, the above problem can be solved by projection onto the convex set B (ε) as described in Appendix 2. For completeness, we summarize the updates of the variables and multipliers as $\begin{matrix} x = \frac{τ}{β_{2} + β_{1} τ} (\frac{β_{2}}{τ} x_{k} + β_{1} z_{1} - λ_{1} - β_{2} W^{T} (W x_{k} - z_{2} + \frac{λ_{2}}{β_{2}})), \\ z_{1} = \max (| x + \frac{λ_{1}}{β_{1}} | - \frac{1}{β_{1}}, 0) \cdot sgn (x + \frac{λ_{1}}{β_{1}}), \\ z_{2} = \min (1, \sqrt{ε} / {‖ W x - p + λ_{2} / β_{2} ‖}_{2}) \cdot (W x - p + λ_{2} / β_{2}) + p, \\ λ_{1} \leftarrow λ_{1} - β_{1} (z_{1} - x), \\ λ_{2} \leftarrow λ_{2} - β_{2} (z_{2} - W x) . \end{matrix}$ (45)

Thus, a new working algorithm that allows for noise can be generated under the OS-ADM framework. We do not list it here either, because it is straightforward to implement from (45).
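The z₂ update in (45) is a projection onto the ball B(ε) shifted by p. The following sketch spells it out; the function names are hypothetical, and the zero-norm guard is an implementation detail added here to avoid division by zero.

```python
import numpy as np

def project_ball(v, eps):
    """Project v onto B(eps) = {v : ||v||_2^2 <= eps}."""
    nrm = np.linalg.norm(v)
    if nrm == 0.0:
        return v.copy()  # the origin is already inside the ball
    return min(1.0, np.sqrt(eps) / nrm) * v

def z2_update(Wx, p, lam2, beta2, eps):
    """z2 update of (45): project the shifted residual, then add p back."""
    return project_ball(Wx - p + lam2 / beta2, eps) + p
```

For example, with ε = 1 the vector (3, 4) of norm 5 is scaled to (0.6, 0.8), whereas a vector already inside the ball is returned unchanged.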

3.3 Constrained TV minimization reconstruction

The image itself is not always sparse, which is a drawback of the constrained L1 minimization. However, when the image is piecewise constant, the total variation (TV) is an appropriate objective function for designing optimization-based algorithms. The TV objective is formed by the L1-norm of the gradient magnitude of the image. If we denote the gradient operator for a d dimensional image by ∇ (see Appendix 4 for details), then $\nabla x \in ℝ^{d \times s}$ and the TV of an image x can be defined as ∥ x ∥ _TV = ∥ ∇ x ∥ ₁. Considering the data fitting and the non-negative constraint, the constrained TV minimization is written as $\begin{matrix} \tilde{x} = \arg \min_{x} {∥ \nabla x ∥}_{1}, \\ \mathrm{s.t.} \begin{cases} Wx = p, \\ x \geq 0 . \end{cases} \end{matrix}$ (46)

Similarly, with the aid of the indicator function and the introduction of z = ∇ x, the above optimization can be equivalently converted into the following form $\begin{matrix} (\tilde{x}, \tilde{z}) = \arg \min_{x, z} {∥ z ∥}_{1} + δ_{ℝ_{+}^{s}} (x), \\ \mathrm{s.t.} \begin{cases} \nabla x = z, \\ Wx = p . \end{cases} \end{matrix}$ (47)

In fact, the auxiliary variables and the mapping into the generic problem can be chosen in different ways. For the above problem, using a trick similar to that in (29), we consider the compact form $Mx + Ny = c,$ (48) where $\begin{matrix} M = (\begin{matrix} W_{l \times s} & O_{l \times s} \\ \nabla & - I_{d \cdot s \times d \cdot s} \end{matrix}), N = (\begin{matrix} W_{l \times s} & O_{l \times s} \\ \nabla & - I_{d \cdot s \times d \cdot s} \end{matrix}), \\ x = (\begin{matrix} x \\ 0_{d \cdot s} \end{matrix}), y = (\begin{matrix} 0_{s} \\ z \end{matrix}), c = (\begin{matrix} p \\ 0_{d \cdot s} \end{matrix}), \\ f (x) = δ_{ℝ_{+}^{s}} (x), g (y) = {∥ z ∥}_{1}, \end{matrix}$ (49) where f (x) and g (y) only apply to part of the components of x and y, and thus the minimization with respect to x and y can be replaced with minimization with respect to x and z. Therefore, the AL function of (47) can be written as $\begin{matrix} L_{A} (x, z; λ_{1}, λ_{2}) = {∥ z ∥}_{1} + δ_{ℝ_{+}^{s}} (x) + λ_{1}^{T} (\nabla x - z) + \frac{β_{1}}{2} {∥ \nabla x - z ∥}^{2} \\ + λ_{2}^{T} (Wx - p) + \frac{β_{2}}{2} {∥ Wx - p ∥}^{2}, \end{matrix}$ (50) where the z sub-problem can be extracted from the AL function and expressed as $\arg \min_{z} L_{A} (z) = \arg \min_{z} {∥ z ∥}_{1} + \frac{β_{1}}{2} {∥ \nabla x - z + \frac{λ_{1}}{β_{1}} ∥}^{2},$ (51) and its solution is easily found to be $z = \max (| \nabla x + \frac{λ_{1}}{β_{1}} | - \frac{1}{β_{1}}, 0) \cdot sgn (\nabla x + \frac{λ_{1}}{β_{1}}) .$ (52)
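To make the definition ∥ x ∥ _TV = ∥ ∇ x ∥ ₁ concrete, the sketch below evaluates it for a 2D image. It assumes forward differences without wrap-around and sums the absolute values of the two difference components separately (an anisotropic reading of the L1-norm); the discretization in Appendix 4 may differ, so both choices are assumptions made here.

```python
import numpy as np

def tv_l1(x):
    """Sum of absolute forward differences along both axes of a 2D image."""
    d0 = np.diff(x, axis=0)  # vertical differences
    d1 = np.diff(x, axis=1)  # horizontal differences
    return np.abs(d0).sum() + np.abs(d1).sum()

# A piecewise-constant test image: a 4x4 block of ones in an 8x8 zero image.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0
tv_value = tv_l1(img)  # only the 16 unit jumps on the block boundary contribute
```

For a piecewise-constant image the gradient is non-zero only along edges, which is why TV is small even when the image itself is not sparse.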

The other sub-problem is $L_{A} (x) = δ_{ℝ_{+}^{s}} (x) + \frac{β_{1}}{2} {∥ \nabla x - z + \frac{λ_{1}}{β_{1}} ∥}^{2} + \frac{β_{2}}{2} {∥ Wx - p + \frac{λ_{2}}{β_{2}} ∥}^{2} .$ (53)

Converting the second quadratic term into the approximate form according to (77), the method of (91) can be applied directly. As the derivation is rather long, we only give the final solution here: $(β_{1} \nabla^{T} \nabla + \frac{β_{2}}{τ} I) x = β_{1} \nabla^{T} z - \nabla^{T} λ_{1} + \frac{β_{2}}{τ} x_{k} - β_{2} g_{k},$ (54) where $g_{k} = W^{T} (W x_{k} - p + \frac{λ_{2}}{β_{2}})$ is the gradient at the current point x_k. Under the definition of the operator ∇, equation (54) can be easily solved using FFT techniques as described in Appendix 5. Taking the non-negative indicator function into consideration, we get $x = pos (𝔽^{- 1} {Λ^{- 1} 𝔽 (β_{1} \nabla^{T} z - \nabla^{T} λ_{1} + \frac{β_{2}}{τ} x_{k} - β_{2} g_{k})}),$ (55)

where $Λ = β_{1} 𝔽 \nabla^{T} \nabla 𝔽^{H} + \frac{β_{2}}{τ} I .$ Consequently, the OS-ADM algorithm for constrained TV minimization is listed as Algorithm 6.
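The linear solve inside (55) can be sketched with NumPy's FFT for a 2D image. The sketch assumes forward differences with periodic boundaries, so that ∇ᵀ∇ is diagonalized by the 2D DFT; Appendix 5 is not reproduced here, so this boundary handling is an assumption, and the function names are hypothetical.

```python
import numpy as np

def grad(x):
    """Forward differences with periodic boundaries, shape (2, n1, n2)."""
    return np.stack([np.roll(x, -1, axis=0) - x, np.roll(x, -1, axis=1) - x])

def grad_T(z):
    """Adjoint of grad under the same periodic assumption."""
    return (np.roll(z[0], 1, axis=0) - z[0]) + (np.roll(z[1], 1, axis=1) - z[1])

def solve_x(rhs, beta1, beta2, tau):
    """Solve (beta1 * grad_T(grad(x)) + beta2 / tau * x) = rhs by FFT."""
    n1, n2 = rhs.shape
    # Eigenvalues of the periodic 1D second-difference operator: 2 - 2 cos(theta).
    e1 = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n1) / n1)
    e2 = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n2) / n2)
    Lam = beta1 * (e1[:, None] + e2[None, :]) + beta2 / tau
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / Lam))
```

Here rhs stands for the right-hand side of (54); applying np.maximum(solve_x(rhs, beta1, beta2, tau), 0.0) then gives the non-negative update in (55). Since β₂/τ > 0, the diagonal Λ has no zero entries and the division is always well defined.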

3.4 TV minimizations with different data fitting terms

In conventional applications, the projection data fitting (fidelity) term is usually built as a quadratic term using the squared L2-norm, which is appropriate for Gaussian noise. However, for applications with different noise properties, the L2-norm does not always give satisfactory results. This creates the need for alternative data fitting terms. In the presence of noise with outliers or irregular properties, such as salt-and-pepper noise or noise caused by bad detector bins, an L1-norm-based data fidelity term may provide better performance. Moreover, when the data noise is a significant physical factor and the data are modeled as being drawn from a multivariate Poisson probability distribution, we consider the Kullback–Leibler (KL) data divergence, which is implicitly employed by many iterative algorithms based on maximum likelihood expectation maximization.

Algorithm 6:
Pseudocode for K steps OS-ADM iteration scheme for constrained TV minimization. Other settings are the same with those in Algorithm 3.

1:    ξ ← ξ₀ ∈ (0, 1); β ← β₀ > 0; k ← 0

2:    x ← x₀; z ← z₀; λ₁ ← 0, λ₂ ← 0

3: Do

4:        a) $x_{k}^{1} \leftarrow x_{k}$ , $z_{k}^{1} \leftarrow z_{k}$ , $(λ_{1})_{k}^{1} \leftarrow (λ_{1})_{k}, (λ_{2})_{k}^{1} \leftarrow (λ_{2})_{k}$

5:        b) For subsets m ∈ 1, 2, …, T do

6:                $x_{k}^{m + 1} \leftarrow pos {𝔽^{- 1} {Λ^{- 1} 𝔽 (β_{1} \nabla^{T} z_{k}^{m} - \nabla^{T} (λ_{1})_{k}^{m} + \frac{β_{2}}{τ} x_{k}^{m} - β_{2} g_{k}^{m})}},$

7:                $z_{k}^{m + 1} \leftarrow \max (| \nabla x_{k}^{m + 1} + \frac{(λ_{1})_{k}^{m}}{β_{1}} | - \frac{1}{β_{1}}, 0) \cdot sgn (\nabla x_{k}^{m + 1} + \frac{(λ_{1})_{k}^{m}}{β_{1}}),$

8:                $\begin{matrix} (λ_{1})_{k}^{m + 1} \leftarrow (λ_{1})_{k}^{m} + β_{1} ξ (\nabla x_{k}^{m + 1} - z_{k}^{m + 1}), \\ (λ_{2})_{k}^{m + 1} \leftarrow (λ_{2})_{k}^{m} + β_{2} ξ (W^{m} x_{k}^{m + 1} - p_{m}) . \end{matrix}$

9:        c) End for

10:      d) $x_{k + 1} \leftarrow x_{k}^{m + 1}$ , $z_{k + 1} \leftarrow z_{k}^{m + 1}$ , $(λ_{1})_{k + 1} \leftarrow (λ_{1})_{k}^{m + 1}$ , $(λ_{2})_{k + 1} \leftarrow (λ_{2})_{k}^{m + 1}$ , k ← k + 1,

11: until k ≥ K, end do

12: return x.

In this subsection, we focus on TV minimization problems with L1-norm-based and KL-based data fitting terms. We consider objective functions formed by the TV regularization term plus one of these alternative data fidelity terms. The purpose of solving these models is not only to obtain working algorithms but also to demonstrate how the OS-ADM-based algorithm design framework can be applied flexibly to practical problems.

3.4.1 L1-norm data fitting terms

When combining TV with an L1-norm-based data-error term, we form the objective as the sum of the TV term and the L1 norm of the data error, which can be expressed as $\tilde{x} = \arg \min_{x} ρ_{1} {∥ \nabla x ∥}_{1} + ρ_{2} {∥ Wx - p ∥}_{1} + δ_{ℝ_{+}^{s}} (x),$ (56) where the data observation and the image non-negativity constraint are also taken into consideration. Note that computing the optimal solution $\tilde{x}$ directly is not easy because of the composite operations ∇x and Wx inside the L1 norms. In order to decouple the composition, an auxiliary variable z is introduced, defined as $z = (\begin{matrix} z_{1} \\ z_{2} \end{matrix}), z_{1} = \nabla x, z_{2} = Wx .$ (57)

The original problem can be converted into the following form: $\begin{matrix} (\tilde{x}, z_{1}, z_{2}) = \arg \min_{x, z_{1}, z_{2}} ρ_{1} {∥ z_{1} ∥}_{1} + ρ_{2} {∥ z_{2} - p ∥}_{1} + δ_{ℝ_{+}^{s}} (x), \\ \mathrm{s.t.} \begin{cases} z_{1} = \nabla x, \\ z_{2} = Wx, \end{cases} \end{matrix}$ (58) and therefore, the mapping into the generic problem can be applied by using $\begin{matrix} x = x, y = z, \\ M = (\begin{matrix} \nabla \\ W \end{matrix}), N = - I, c = 0, \\ f (x) = δ_{ℝ_{+}^{s}} (x), g (y) = ρ_{1} {∥ z_{1} ∥}_{1} + ρ_{2} {∥ z_{2} - p ∥}_{1} . \end{matrix}$ (59)

The AL function for (58) can be written as $\begin{matrix} \tilde{x} = arg min_{x} ρ_{1} {∥ z_{1} ∥}_{1} + ρ_{2} {∥ z_{2} - p ∥}_{1} + δ_{ℝ_{+}^{s}} (x) \\ + \frac{β_{1}}{2} {∥ \nabla x - z_{1} + \frac{λ_{1}}{β_{1}} ∥}^{2} + \frac{β_{2}}{2} {∥ Wx - z_{2} + \frac{λ_{2}}{β_{2}} ∥}^{2} . \end{matrix}$ (60)

The solutions of the sub-problems for z (z₁ and z₂) can be directly obtained as $\begin{matrix} z_{1} = \max (| \nabla x + \frac{λ_{1}}{β_{1}} | - \frac{ρ_{1}}{β_{1}}, 0) \cdot sgn (\nabla x + \frac{λ_{1}}{β_{1}}), \\ z_{2} = \max (| Wx + \frac{λ_{2}}{β_{2}} - p | - \frac{ρ_{2}}{β_{2}}, 0) \cdot sgn (Wx + \frac{λ_{2}}{β_{2}} - p) + p . \end{matrix}$ (61)

The solution of the x sub-problem can also be obtained without much difficulty, similarly to the case of (53), and it can be expressed as $x = pos (𝔽^{- 1} {Λ^{- 1} 𝔽 (β_{1} \nabla^{T} z_{1} - \nabla^{T} λ_{1} + \frac{β_{2}}{τ} x_{k} - β_{2} g_{k})}),$ (62) where $g_{k} = W^{T} (W x_{k} - z_{2} + \frac{λ_{2}}{β_{2}})$ and $Λ = β_{1} 𝔽 \nabla^{T} \nabla 𝔽^{H} + \frac{β_{2}}{τ} I$ . Therefore, the OS-ADM version of the TV and L1-data-error reconstruction algorithm is listed in Algorithm 7.
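Both updates in (61) are soft-thresholding steps; the z₂ update is simply applied to the residual with respect to p and then shifted back. A short sketch with hypothetical function names:

```python
import numpy as np

def shrink(v, kappa):
    """Soft-thresholding: max(|v| - kappa, 0) * sgn(v), component-wise."""
    return np.maximum(np.abs(v) - kappa, 0.0) * np.sign(v)

def z_updates_l1tv(grad_x, Wx, p, lam1, lam2, beta1, beta2, rho1, rho2):
    """z1 and z2 updates of (61) for the TV plus L1-data-error model."""
    z1 = shrink(grad_x + lam1 / beta1, rho1 / beta1)
    # Threshold the shifted residual, then add the measured data p back.
    z2 = shrink(Wx + lam2 / beta2 - p, rho2 / beta2) + p
    return z1, z2
```

Whenever the shifted residual is smaller in magnitude than ρ₂/β₂, the corresponding entry of z₂ equals the measurement itself, so only larger deviations between Wx and p are passed on through z₂.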

Algorithm 7:

Pseudocode for K steps OS-ADM iteration scheme for TV and L1-data-error reconstruction. Other settings are similar to those in Algorithm 3.

1:    ξ ← ξ₀ ∈ (0, 1); k ← 0

2:    x ← x₀; z₁ ← (z₁) ₀, z₂ ← (z₂) ₀; λ₁ ← 0, λ₂ ← 0

3: Do

4:        a) $x_{k}^{1} \leftarrow x_{k}$ , $(z_{1})_{k}^{1} \leftarrow (z_{1})_{k}, (z_{2})_{k}^{1} \leftarrow (z_{2})_{k}$ , $(λ_{1})_{k}^{1} \leftarrow (λ_{1})_{k}, (λ_{2})_{k}^{1} \leftarrow (λ_{2})_{k}$

5:        b) For subsets m ∈ 1, 2, …, T do

6:                $x_{k}^{m + 1} \leftarrow pos {𝔽^{- 1} {Λ^{- 1} 𝔽 (β_{1} \nabla^{T} (z_{1})_{k}^{m} - \nabla^{T} (λ_{1})_{k}^{m} + \frac{β_{2}}{τ} x_{k}^{m} - β_{2} g_{k}^{m})}},$

7:                $(z_{1})_{k}^{m + 1} \leftarrow \max (| \nabla x_{k}^{m + 1} + \frac{(λ_{1})_{k}^{m}}{β_{1}} | - \frac{ρ_{1}}{β_{1}}, 0) \cdot sgn (\nabla x_{k}^{m + 1} + \frac{(λ_{1})_{k}^{m}}{β_{1}}),$

8:                $(z_{2})_{k}^{m + 1} \leftarrow \max (| W^{m} x_{k}^{m + 1} + \frac{(λ_{2})_{k}^{m}}{β_{2}} - p_{m} | - \frac{ρ_{2}}{β_{2}}, 0) \cdot sgn (W^{m} x_{k}^{m + 1} + \frac{(λ_{2})_{k}^{m}}{β_{2}} - p_{m}) + p_{m},$

9:                $(λ_{1})_{k}^{m + 1} \leftarrow (λ_{1})_{k}^{m} + β_{1} ξ (\nabla x_{k}^{m + 1} - (z_{1})_{k}^{m + 1}),$

10:              $(λ_{2})_{k}^{m + 1} \leftarrow (λ_{2})_{k}^{m} + β_{2} ξ (W^{m} x_{k}^{m + 1} - (z_{2})_{k}^{m + 1}) .$

11:      c) End for

12:      d) $x_{k + 1} \leftarrow x_{k}^{m + 1}$ , $(z_{1, 2})_{k + 1} \leftarrow (z_{1, 2})_{k}^{m + 1}$ , $(λ_{1})_{k + 1} \leftarrow (λ_{1})_{k}^{m + 1}, (λ_{2})_{k + 1} \leftarrow (λ_{2})_{k}^{m + 1}$ , k ← k + 1,

13: until k ≥ K, end do

14: return x.

In fact, the auxiliary variables in (59) can be introduced in various ways, which lead to different algorithm instances. An appropriate choice may simplify the derivation and the implementation; in this instance we decouple the operations ∇x and Wx, which leads to an easily coded algorithm for TV plus L1-error-based reconstruction. As in the instances above, the penalty parameters of the quadratic terms in (60) can take different values when the two terms need to be balanced.

3.4.2 KL-based data fitting terms

Statistical iterative reconstruction algorithms often assume that the observed data obey some statistical model, which casts the reconstruction as an expectation-maximization problem. This is often an efficient way to account for and suppress noise (and in many situations, a Poisson noise model is considered).

In fact, modeling the data fitting term amounts to finding a suitable metric to describe the data error to be minimized. When the iteration is updated in an additive fashion, the L2 or L1 norm is usually used. In the least-squares sense, the square of the L2 norm is applied. Similarly, the L1 norm is used when the data error is described in the sense of the absolute error of each observation. However, when a multiplicative update is applied, as in MART, EM, ML-EM, etc., a data error function of KL form is chosen as the fidelity term, which can be written as $\min_{x} \sum_{i} p_{i} \cdot {(\frac{Wx}{p} - 1 - \ln (\frac{Wx}{p}))}_{i} .$ (63)

The KL data term is based on the fact that f (x) = (x - 1 - ln x) ≥0 for $x \in ℝ_{+}$ . Moreover, the function f (x) on $x \in ℝ_{+}$ is convex which makes the optimization easily solvable.

In this subsection, we incorporate the KL term into the TV model (KL-TV), which is $\begin{matrix} \tilde{x} = \arg \min_{x} ρ_{1} {∥ \nabla x ∥}_{1} + ρ_{2} \sum_{i} {[Wx - p + p \ln p - p \ln (pos (Wx))]}_{i} \\ + δ_{ℝ_{+}^{l}} (Wx) + δ_{ℝ_{+}^{s}} (x), \end{matrix}$ (64) where ρ₁ and ρ₂ are the weights of the TV term and the KL term, respectively, and the summation ∑_i [x] _i over i is a component-wise addition for x. Note that when Wx exactly equals p, as in the ideal noiseless condition, the KL term vanishes. Once the data are contaminated by noise so that Wx ≠ p, the KL term becomes positive.

When developing OS-ADM-based algorithms, the auxiliary variables are introduced in the same way as in (57). For clarity, we write them here once again: $z = (\begin{matrix} z_{1} \\ z_{2} \end{matrix}), z_{1} = \nabla x, z_{2} = Wx .$ (65)

Consequently, the KL-TV model can be equivalently converted into $\begin{matrix} (\tilde{x}, z_{1}, z_{2}) = \arg \min_{x, z_{1}, z_{2}} ρ_{1} {∥ z_{1} ∥}_{1} + δ_{ℝ_{+}^{s}} (x) + δ_{ℝ_{+}^{l}} (z_{2}) \\ + ρ_{2} \sum_{i} {[z_{2} - p + p \ln p - p \ln (pos (z_{2}))]}_{i}, \\ \mathrm{s.t.} \begin{cases} z_{1} = \nabla x, \\ z_{2} = Wx . \end{cases} \end{matrix}$ (66)
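The KL data term in (64) can be evaluated as below. The sketch assumes strictly positive Wx and p, so that the logarithms are defined and the pos() safeguard is not needed; the function name is hypothetical.

```python
import numpy as np

def kl_data_term(Wx, p):
    """Sum over i of [Wx - p + p ln p - p ln(Wx)]_i for positive Wx and p."""
    return np.sum(Wx - p + p * np.log(p) - p * np.log(Wx))

p_demo = np.array([1.0, 2.0, 3.0])
kl_exact = kl_data_term(p_demo, p_demo)          # Wx equals p
kl_perturbed = kl_data_term(1.2 * p_demo, p_demo)  # Wx deviates from p
```

The first value is zero and the second is positive, in line with the fact that t − 1 − ln t ≥ 0 with equality only at t = 1.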

Similar as that in (59), the mapping procedure is straightforward as $\begin{matrix} x = x, y = z, \\ M = (\begin{matrix} \nabla \\ W \end{matrix}), N = - I, c = 0, \\ f (x) = δ_{ℝ_{+}^{s}} (x), \\ g (y) = ρ_{1} {∥ z_{1} ∥}_{1} + ρ_{2} \sum_{i} {[z_{2} - p + p ln p - p ln (pos (z_{2}))]}_{i} + δ_{ℝ_{+}^{l}} (z_{2}) . \end{matrix}$ (67)

The corresponding AL function of (66) is $\begin{matrix} (\tilde{x}, z_{1}, z_{2}) = \arg \min_{x, z_{1}, z_{2}} ρ_{1} {∥ z_{1} ∥}_{1} + δ_{ℝ_{+}^{s}} (x) + δ_{ℝ_{+}^{l}} (z_{2}) \\ + ρ_{2} \sum_{i} {[z_{2} - p + p \ln p - p \ln (pos (z_{2}))]}_{i} \\ + \frac{β_{1}}{2} {∥ \nabla x - z_{1} + \frac{λ_{1}}{β_{1}} ∥}^{2} + \frac{β_{2}}{2} {∥ Wx - z_{2} + \frac{λ_{2}}{β_{2}} ∥}^{2}, \end{matrix}$ (68) where the multipliers λ₁ and λ₂ are incorporated into the quadratic terms as in (60). It should be pointed out that minimizing the above AL function is in fact easy to handle.

Note that the sub-problems for z₁ and x are the same as those in (60), and thus we only need to focus on the solution for z₂. The sub-problem for z₂ is written as $\begin{matrix} {\tilde{z}}_{2} = \arg \min_{z_{2}} L_{A} (z_{2}) \\ = \arg \min_{z_{2}} δ_{ℝ_{+}^{l}} (z_{2}) + ρ_{2} \langle 1, z_{2} - p \ln (pos (z_{2})) \rangle + \frac{β_{2}}{2} {∥ Wx - z_{2} + \frac{λ_{2}}{β_{2}} ∥}^{2}, \end{matrix}$ (69) where some constants are omitted and the component-wise summation is replaced by inner product notation. Note that the presence of the indicator function $δ_{ℝ_{+}^{l}} (z_{2})$ and the positivity enforcement pos (z₂) makes the analysis somewhat troublesome. Therefore, we first drop these two functions, keeping in mind that the non-negativity of z₂ must be checked afterwards. The following function of z₂ is differentiable: $L_{A 1} (z_{2}) = ρ_{2} \langle 1, z_{2} - p \ln (z_{2}) \rangle + \frac{β_{2}}{2} {∥ Wx - z_{2} + \frac{λ_{2}}{β_{2}} ∥}^{2}, z_{2} > 0.$ (70) Setting its derivative to 0 and multiplying by z₂ gives the following quadratic equation: $β_{2} z_{2}^{2} + (ρ_{2} - β_{2} C) z_{2} - ρ_{2} p = 0, β_{2} > 0, ρ_{2} > 0,$ (71) where C = Wx + λ₂/β₂ and the square on z₂ is a component-wise operation. There are two possible solutions, $z_{2} = \frac{- (ρ_{2} - β_{2} C) \pm \sqrt{(ρ_{2} - β_{2} C)^{2} + 4 β_{2} ρ_{2} p}}{2 β_{2}} .$ (72)

Note that z₂ should be non-negative, and thus the positive solution should be chosen which leads to $z_{2} = \frac{- (ρ_{2} - β_{2} C) + \sqrt{(ρ_{2} - β_{2} C)^{2} + 4 β_{2} ρ_{2} p}}{2 β_{2}} .$ (73)
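The closed form (73) is straightforward to evaluate component-wise. The sketch below also provides a helper that returns the derivative of (70), ρ₂ (1 − p / z₂) − β₂ (C − z₂), so that the stationarity of the result can be checked numerically; both function names are hypothetical.

```python
import numpy as np

def kl_z2_update(Wx, p, lam2, beta2, rho2):
    """Positive root (73) of the quadratic (71), evaluated component-wise."""
    C = Wx + lam2 / beta2
    b = rho2 - beta2 * C
    return (-b + np.sqrt(b * b + 4.0 * beta2 * rho2 * p)) / (2.0 * beta2)

def kl_z2_stationarity(z2, Wx, p, lam2, beta2, rho2):
    """Derivative of (70) with respect to z2; it should vanish at the update."""
    C = Wx + lam2 / beta2
    return rho2 * (1.0 - p / z2) - beta2 * (C - z2)
```

For p > 0 the square root strictly exceeds |ρ₂ − β₂ C|, so the returned z₂ is strictly positive and the dropped non-negativity requirement holds automatically.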

Consequently, with the update formulas of z₁, z₂, x and the multipliers, the OS-ADM version for KL-TV reconstruction algorithm can be listed in Algorithm 8.

4 Experimental demonstrations of OS-ADM algorithms for image reconstructions

In the previous section, algorithms for several typical applications were derived using the proposed design framework. In this section, experiments are conducted to show their practical performance. However, the algorithms considered here are only working versions, which need to be carefully improved before being used in real CT scanners.

The experiments considered in this section cover three aspects. The first is sparsity-exploiting reconstruction under consistent, ideal conditions, and the second is reconstruction under noisy conditions with Poisson noise in the observed projections. Both of these scenarios use simulated projection data. The third experiment uses real CT data as the projection source and can be seen as a preliminary investigation of the algorithms in real imaging applications. The selected algorithms are Algorithm 7 (termed the L1-TV algorithm) and Algorithm 8 (termed the KL-TV algorithm) developed previously, and 2D reconstructions are performed with them.

Algorithm 8:
Pseudocode for K steps OS-ADM iteration scheme for KL-TV reconstruction. Other settings are the same with those in Algorithm 3.

1:    ξ ← ξ₀ ∈ (0, 1); β ← β₀ > 0; k ← 0

2:    x ← x₀; z₁ ← (z₁) ₀, z₂ ← (z₂) ₀; λ₁ ← 0, λ₂ ← 0

3: Do

4:        a) $x_{k}^{1} \leftarrow x_{k}$ , $(z_{1})_{k}^{1} \leftarrow (z_{1})_{k}, (z_{2})_{k}^{1} \leftarrow (z_{2})_{k}$ , $(λ_{1})_{k}^{1} \leftarrow (λ_{1})_{k}, (λ_{2})_{k}^{1} \leftarrow (λ_{2})_{k}$

5:        b) For subsets m ∈ 1, 2, …, T do

6:                $x_{k}^{m + 1} \leftarrow pos \left\{ 𝔽^{- 1} \left\{ Λ^{- 1} 𝔽 (β_{1} \nabla^{T} (z_{1})_{k}^{m} - \nabla^{T} (λ_{1})_{k}^{m} + \frac{β_{2}}{τ} x_{k}^{m} - β_{2} g_{k}^{m}) \right\} \right\},$

7:                $(z_{1})_{k}^{m + 1} \leftarrow \max (| \nabla x_{k}^{m + 1} + \frac{(λ_{1})_{k}^{m}}{β_{1}} | - \frac{ρ_{1}}{β_{1}}, 0) \cdot sgn (\nabla x_{k}^{m + 1} + \frac{(λ_{1})_{k}^{m}}{β_{1}}),$

8:                $(z_{2})_{k}^{m + 1} \leftarrow \frac{- (ρ_{2} - β_{2} W^{m} x_{k}^{m + 1} - (λ_{2})_{k}^{m}) + \sqrt{(ρ_{2} - β_{2} W^{m} x_{k}^{m + 1} - (λ_{2})_{k}^{m})^{2} + 4 β_{2} ρ_{2} p_{m}}}{2 β_{2}},$

9:                $(λ_{1})_{k}^{m + 1} \leftarrow (λ_{1})_{k}^{m} + β_{1} ξ (\nabla x_{k}^{m + 1} - (z_{1})_{k}^{m + 1}),$

10:              $(λ_{2})_{k}^{m + 1} \leftarrow (λ_{2})_{k}^{m} + β_{2} ξ (W^{m} x_{k}^{m + 1} - (z_{2})_{k}^{m + 1}) .$

11:      c) End for

12:      d) $x_{k + 1} \leftarrow x_{k}^{m + 1}$ , $(z_{1, 2})_{k + 1} \leftarrow (z_{1, 2})_{k}^{m + 1}$ , $(λ_{1})_{k + 1} \leftarrow (λ_{1})_{k}^{m + 1}, (λ_{2})_{k + 1} \leftarrow (λ_{2})_{k}^{m + 1}$ , k ← k + 1,

13: until k ≥ K, end do

14: return x.

The experiments are conducted on a PC equipped with dual-core Intel Core i5-2400 CPU at 3.10 GHz, with a memory size of 8.00 Gigabytes. The codes are written in MATLAB 2014b, and some of them are in C and CUDA with the “mex” interface provided by MATLAB and interacting with MATLAB code. The graphics processing unit (GPU) used is NVidia GeForce GTX 570 with 1.25 Gigabytes global memory and 480 CUDA cores.

4.1 Simulation data studies

In the simulation experiments, a fan-beam geometry is used to simulate the data acquisition of CT imaging. The typical Shepp-Logan phantom, discretized on 256×256 pixels, is used as the reference image. It is generated by the MATLAB call phantom('Shepp-Logan',256), which returns a low-contrast image widely used by tomography researchers. The distance from the X-ray source to the iso-center is 680.0 mm and the distance from the X-ray source to the center of the line detector is 1360.0 mm. The number of detector bins is n_bin = 512, each with a size of 0.776 mm.

4.1.1 Sparse view reconstruction

The numbers of views for sparsity-exploiting reconstruction are set according to the sparsity of the gradient magnitude image (GMI), i.e., the number of non-zero entries in the GMI. The sparsity of the GMI of the test image is k = 2184, and the number of projections for exact reconstruction should not be less than (2k + 1)/n_bin ≈ 8.5 based on CS theory [8, 53]. Thus, the minimum number of projection views is 9. For comparison, both 7 and 9 projections are used for the L1-TV and KL-TV reconstructions.
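The arithmetic behind this bound can be reproduced with a short Python sketch. The helper name `min_views` is hypothetical; the numbers k = 2184 and n_bin = 512 are the ones quoted above.

```python
import math

def min_views(gmi_sparsity, n_bin):
    """Smallest integer number of views with at least 2k + 1 measurements in total."""
    return math.ceil((2 * gmi_sparsity + 1) / n_bin)

ratio = (2 * 2184 + 1) / 512      # required measurements divided by bins per view
views = min_views(2184, 512)
print(round(ratio, 2), views)     # prints: 8.53 9
```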

For practical implementations of the two algorithms, the parameter values deserve careful attention, since they strongly affect algorithm performance. The parameters involved in L1-TV and KL-TV are almost the same, i.e., the penalty weights ρ₁, ρ₂, the AL weights β₁, β₂, and τ and ξ. The most important ones are ρ₁, ρ₂, β₁ and β₂. They are not independent: in practice only the ratios ρ₁/ρ₂ and β₁/β₂ matter, which makes the tuning less burdensome. Although searching for the best parameters is a non-trivial task, empirical choices are sufficient for practical applications. To guide rapid implementation of these algorithms, we offer a set of reasonable choices for L1-TV and KL-TV, respectively. The specific values are listed in Table 1, and they are used for both ideal and noisy projection data.

Table 1
Available choice for parameters selection for simulation studies

Algorithm	ρ₁	ρ₂	β₁	β₂	τ	ξ
L1-TV	0.007	25.6	0.16	1.28	1.0	1.0
KL-TV	0.005	16000	0.16	1.00	1.0	1.0

For the reconstructions from ideal, noiseless data, the maximum number of iterations for the two algorithms is set to 2000, and the program stops once this number is reached. The running time of the MATLAB implementation is 19 seconds and 21 seconds for 7 views and 9 views, respectively. The difference in time consumption between L1-TV and KL-TV is very small under the same settings. Since the first experiment mainly concerns the basic performance under very sparse projection views, the OS level is set to 1 for convenience; the effect of the OS level is examined later. The reconstructed images are shown in Figs. 1 and 2. For better visual comparison, we choose two gray-scale display windows. Because the phantom itself has very low contrast, the first window is set to [0.005 0.05], which gives an intuitive impression of the test phantom. For more detailed comparison, a much narrower display window of [0.016 0.022] is used, under which even very small errors and fluctuations can be easily observed by the naked eye.

Fig.1

L1-TV reconstructions from sparse views for the low-contrast Shepp-Logan phantom: from the left column to the right, the images are the ground truth and the L1-TV results from 7 views and 9 views, respectively. The top row is in the gray-scale window of [0.005 0.05], and the bottom row is in a narrower window of [0.016 0.022].

Fig.2

KL-TV reconstructions from sparse views for the low-contrast Shepp-Logan phantom: from the left column to the right, the images are the ground truth and the KL-TV results from 7 views and 9 views, respectively. The top row is in the gray-scale window of [0.005 0.05], and the bottom row is in a narrower window of [0.016 0.022].

When the amount of acquired data is below the theoretical lower bound, exact (or highly accurate) reconstruction is unreachable for both TV-based algorithms. This is verified by the reconstructions from 7 views shown in Figs. 1(5) and 2(5), where the artifacts are obvious under the narrow display window. When the acquisition is adequate, both algorithms produce very clear images without any observable artifacts under both windows, as shown in Figs. 1(3), 1(6), 2(3) and 2(6). Some typical numerical evaluations are listed in Table 2, in which the numerical statistics agree with the visual observations. The errors relative to the ground truth are evaluated using the root mean square error (RMSE) defined in (74).

Table 2

Performances for L1-TV and KL-TV reconstructions

	RMSE (7 views)	RMSE (9 views)	Time (7 views)	Time (9 views)
L1-TV	6.559E-4	4.579E-6	19.1 seconds	20.8 seconds
KL-TV	5.576E-4	7.654E-6	19.0 seconds	21.0 seconds

$RMSE = \sqrt{\frac{(f_{t} - f_{0}) (f_{t} - f_{0})^{T}}{s}},$ (74) where the row vectors f_t and f₀ denote the reconstructed and reference images, each with s voxels.
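A minimal Python sketch of (74) follows. The function name `rmse` and the four-voxel test vectors are illustrative assumptions; the images are treated as flat lists of voxel values.

```python
import math

def rmse(f_t, f_0):
    """Root mean square error between a reconstruction f_t and a reference f_0, cf. (74)."""
    if len(f_t) != len(f_0):
        raise ValueError("images must have the same number of voxels")
    s = len(f_0)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_t, f_0)) / s)

# Hypothetical 4-voxel example: only the last voxel differs, by 0.4.
recon = [0.0, 0.5, 1.0, 1.5]
truth = [0.0, 0.5, 1.0, 1.1]
err = rmse(recon, truth)   # sqrt(0.16 / 4), approximately 0.2
```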

4.1.2 Various projection views

To obtain an overall picture of the performance of both algorithms, the two methods are tested with various numbers of projection views. The RMSE reduction curves for the different numbers of views are plotted in Fig. 3 for both algorithms. The curves suggest that the iterations of both algorithms are very stable, with only very small fluctuations. An interesting phenomenon in this figure is that the curves for 12 views, 36 views and 90 views overlap almost completely. This may imply that, under ideal and noiseless conditions, once the amount of acquired data exceeds some threshold, the error reduction behavior no longer changes noticeably.

Fig.3

RMSE behaviors of L1-TV and KL-TV reconstructions: the top and the bottom are the RMSE reduction curves for different numbers of projection views for L1-TV and KL-TV reconstructions, respectively.

4.1.3 Performances under noise presence

To test the performance under noisy projections, we add Poisson noise to the projection data. For the KL-TV reconstruction test, we use 60 projections acquired evenly over 180 degrees. It should be pointed out that the phantom used in the simulation experiments has very low-contrast parts, which makes it sensitive to the presence of noise. To give an intuitive impression of the noise level, the FBP reconstruction result is shown in Fig. 4. Note that the number of projections for FBP is 900, which is large compared to that used by the iterative algorithms.

Fig.4

Reconstructions from noisy projections for the low-contrast Shepp-Logan phantom: from the left column to the right column, the images are the ground truth, the FBP reconstruction from 900 views, and the KL-TV reconstruction from 60 views. The top and the bottom rows are the same images displayed in gray-scale windows of [0.005 0.05] and [0.016 0.022].

For the experiments with noise, the stopping criterion of the iterative algorithm has to be modified; in our tests the iterative program is stopped when the RMSE no longer decreases. For the tested KL-TV reconstruction, it takes about 300 iteration loops to reach the best RMSE. Figure 4 shows the results, which give a preliminary visual comparison of the two reconstructions. The root mean square errors for the FBP and KL-TV reconstructions are 0.0591 and 4.2E-4, respectively.

4.1.4 OS-level tests

In the last part of the simulation experiments, the properties of the OS-levels are studied. The test algorithm is KL-TV reconstruction, which is run with different numbers of projection views and different OS-levels. Running complete iterations for all of these settings would be very time consuming. As pointed out earlier, OS mainly accelerates the early stage of the iteration, and thus, as a compromise, we only consider the first 40 loops of each individual reconstruction. Projections of 180, 90, 36 and 12 views are used for the tests, and the OS-levels are chosen from the ranges 1 to 12, 1 to 12, 1 to 8 and 1 to 4, respectively. Each subset of projection data is chosen evenly from the global data set. For the global set $p \in ℝ^{l}$, OS-level = T and subset index m, the m-th subset p_m is set as $p_{m} = \left\{ p (k) \mid k = m + T \times i, i = 0, \dots, \frac{l}{T} - 1 \right\} .$ (75)
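The interleaved subset selection in (75) can be sketched in Python as follows. The function name `ordered_subsets` and the 12-entry toy data are assumptions for illustration; indices k and m are 1-based in (75) and are shifted to Python's 0-based indexing inside the function.

```python
def ordered_subsets(p, T):
    """Split p into T interleaved subsets: subset m holds p(k) with k = m + T*i, cf. (75)."""
    l = len(p)
    if l % T != 0:
        raise ValueError("the data length must be divisible by the OS-level T")
    # m runs over 1..T and i over 0..l/T - 1; the "- 1" converts the 1-based k to a list index.
    return [[p[m - 1 + T * i] for i in range(l // T)] for m in range(1, T + 1)]

# Toy data set with l = 12 entries labelled 1..12 and OS-level T = 4.
p = list(range(1, 13))
subsets = ordered_subsets(p, 4)   # subsets[0] is [1, 5, 9]
```

Each subset is equivalent to the slice `p[m - 1::T]`, so every data entry belongs to exactly one subset.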

The RMSE behaviors are compared in Fig. 5. The curves basically agree with the theoretical analysis that OS can accelerate convergence at the beginning of the reconstruction. Furthermore, for each dataset, within a proper range of OS-levels, higher OS-levels tend to give stronger acceleration. However, when the number of subsets is outside this range, the relationship does not always hold, and a more in-depth theoretical analysis is left for future work. Another essential point is that, for better convergence, it is preferable to set OS-level = 1 for the middle and final loops of the reconstruction, especially when the data are very sparse.

Fig.5

RMSE behaviors for different projection views and OS-levels. From the top left, top right, bottom left and bottom right, the panels show the RMSE curves for different OS-levels for 180, 90, 36 and 12 projection views, respectively. The legend labels give the specific settings; for example, “KL-TV-OS-08-ADM” stands for KL-TV reconstruction under the OS-ADM framework with OS-level = 8.

4.2 Real CT data tests

In this subsection, real CT projections are used to test the selected algorithms. As in the previous subsection, the L1-TV and KL-TV algorithms are investigated. The experimental data are acquired on a CBCT system, with an anthropomorphic head phantom as the scanned object. To get a quick impression of the performance of the developed working algorithms, only 2D reconstructions are conducted. The sinogram used is shown in Fig. 6. The projections are acquired at an X-ray tube voltage of 125 kVp, and 655 projections are acquired over 360 degrees. The distance from the X-ray source to the iso-center is 1100 mm, and the distance from the center of the detector to the iso-center is 550 mm. The equivalent line detector has 1024 bins, each with a size of 0.388 mm. Thus, the projection data have a size of 1024×655. The reconstructed image has a size of 512×512, with each pixel measuring 0.49 mm×0.49 mm.

Fig.6

The sinogram used in the real CT data tests.

Note that the full projection data are used for both the L1-TV and KL-TV reconstructions. The forward-projection and back-projection operations in the iterative algorithms are executed on the GPU via CUDA code. For the tested dataset, the running time of each iteration loop is about 0.24 seconds. The iterative algorithms are stopped when the program reaches 1000 loops, and we take the results as the “converged” outcome.

When dealing with real CT projections, the parameters of the iterative algorithms may differ from those in the simulation studies. In the real CT data investigation, the weights on the data fidelity (or data fitting) term have been raised to a suitable extent. A set of usable choices is listed in Table 3. It should be noted that the values in this table are not the best choices; we only provide them for a working version. More careful tuning can be carried out for other real data tests. The FDK, L1-TV and KL-TV reconstructions are shown in Fig. 7. Real CT projections are inevitably contaminated by noise, so, unlike in the simulation studies, no true image is available. Visual comparison shows almost no differences between the results of the three methods. This can be verified from both the full HU window and the much narrower window displays in Fig. 7.

Table 3

Available choice for parameters selection for real CT projections

Algorithm	ρ₁	ρ₂	β₁	β₂	τ	ξ
L1-TV	0.006	1024	0.01	20.0	1.0	1.0
KL-TV	0.0003	16000	0.004	1.0	1.0	1.0

Fig.7

Reconstructions from real CT projections: from the left column to the right column, the images are the results of the FDK, L1-TV and KL-TV reconstructions, respectively. For each column, the rows show the same result in the full HU window of [–1000 2200] (upper row), in a much narrower HU window of [–500 500] (middle row), and as a local zoom-in of the region-of-interest (bottom row). The zoom-in region is marked by the red solid rectangle in the middle image of the left column. The iteration numbers for both L1-TV and KL-TV are 1000.

The effect of the OS-level on the reconstruction is also investigated here. As mentioned above, no true image is available, but we can take the “converged” outcome described above as the “true” image for calculating the RMSEs in the following tests. The RMSEs are computed by comparing the intermediate images with this “true” image. The division into subsets is similar to (75). We test both the L1-TV and KL-TV algorithms with OS-levels of 1, 2, 4 and 5. The RMSE behaviors are plotted in Fig. 8, where only about the first 25 loops are shown. As pointed out earlier, running OS-ADM with OS-level > 1 first and then switching to conventional ADM can efficiently accelerate convergence in the early stage of the iterations.

Fig.8

The RMSEs of L1-TV and KL-TV reconstructions with different OS-levels. The left panel shows the RMSE reduction curves with OS-levels of 1, 2, 4 and 5 for the L1-TV algorithm, and the right panel shows those for the KL-TV algorithm.

5 Discussion and conclusion

This paper has presented the OS-ADM framework for designing working algorithms. The main derivation is based on the ADM scheme, and OS is incorporated to accelerate the convergence of the iteration. We demonstrate that the proposed method is flexible and can be applied to many convex problems in CT imaging. Various problems in practical applications can be formulated as convex optimizations, and the design of these optimizations can be very flexible owing to the diversity of demands. ADM serves as a powerful tool for developing practical algorithms, which typically take the form of an alternating iteration scheme.

There are at least two promising aspects of the OS-ADM approach. First, one of the most effective merits of ADM is that it decouples complex structures in the original problem by introducing new variables. To handle the resulting constraints, the augmented Lagrangian function is applied, which contains both a first-order term and a quadratic term. The AL function, together with the update of the multipliers, guarantees equivalence to the original problem. The alternating update scheme divides the original problem into several simple problems. Second, the OS technique has been incorporated into the practical implementation of ADM. We have demonstrated that many problems arising in reconstruction can be modeled as convex optimizations, and thus ADM with OS is a promising way to develop working algorithms. By working algorithms we mean that the developed algorithms may not be the most efficient ones but can at least serve as a usable option for some practical problems.

We have proposed and tested several algorithms developed using OS-ADM in the previous sections. Both the simulations and the real data studies suggest that the proposed design method can be of practical value. In the sparse-view test, the results of the L1-TV and KL-TV algorithms suggest that the ADM scheme can reach state-of-the-art performance in image quality and running speed. The experiments with real CT projections provide an intuitive impression of L1-TV and KL-TV reconstruction. Both tested algorithms, as well as the other instances, are easy to code, which is important for rapid algorithm prototyping. However, it must be noted that considerably more tuning is needed for better practical performance.

The running time of the iterative algorithms is an important factor that must be taken into consideration when putting these algorithm instances into real scanners. The proposed framework obtains the results iteratively, and the converged images may require hundreds or even thousands of iteration loops, which takes a very long time. This problem is even more notable for 3D reconstruction in applications needing fast imaging. Therefore, both fast-converging algorithm frameworks and computational acceleration are crucial. To this end, as part of our future work, accelerated versions of ADM can be studied and applied to image reconstruction. Besides algorithmic optimization, hardware acceleration should also be considered. Nowadays, high-performance computing devices and facilities, such as high-end CPUs, GPUs and clusters, have already gained the attention of imaging engineers and have successfully increased imaging speed. Once the computational cost is low enough, the practical applications of optimization-based algorithms will be greatly broadened.

Future work will focus on solving more practical and specific problems arising in the CT imaging field and on developing the corresponding efficient algorithms. Furthermore, the comprehensive performance of these algorithms should be investigated in more depth.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61372172 and 61601518). We also want to thank the anonymous referees for giving us useful comments and suggestions to improve this paper.

Appendix 1:

Approximate minimizer of sub-problems in (3) using linearized approximation and the corresponding proximal mapping.

For simplicity of notation, we denote the variables fixed from the previous iteration as y_k = y and λ_k = λ in $arg min_{x} L_{A} (x, y_{k}, λ_{k})$ . When minimizing $L_{A}$ with respect to x with y and λ fixed, we write the problem as (76) $arg min_{x} L_{A} (x) = f (x) + \frac{β}{2} {∥ Mx + C ∥}_{2}^{2},$ where C = Ny - c - λ/β. Note that g (y) has no influence on the minimization with respect to x and is therefore omitted in (76).

Computing the pseudo-inverse of the linear transform M is often difficult, so an alternative method is needed that avoids inverting a large matrix. In this appendix, as in reference [15], we linearize the quadratic term in functional (76) at the current point x_k as: (77) ${∥ Mx + C ∥}_{2}^{2} \approx {∥ {Mx}_{k} + C ∥}_{2}^{2} + 2 g_{k}^{T} (x - x_{k}) + \frac{1}{τ} {∥ x - x_{k} ∥}_{2}^{2},$ where g_k = M^T (Mx_k + C) stands for the gradient at x_k up to a constant factor. The parameter τ controls the weight of the quadratic term in x. Substituting (77) for the quadratic term in functional (76) and dropping terms that do not depend on x, we get (78) $\begin{matrix} x^{*} = arg min_{x} L_{A} (x) \\ = arg min_{x} f (x) + \frac{β}{2} \left\{ 2 g_{k}^{T} (x - x_{k}) + \frac{1}{τ} {∥ x - x_{k} ∥}_{2}^{2} \right\} \\ = arg min_{x} f (x) + \frac{\hat{β}}{2} {∥ x - x_{k} + τ g_{k} ∥}_{2}^{2}, \end{matrix}$ where $\hat{β} = β / τ$ . Strictly speaking, the last two “ = ” should be “≈”; here we slightly abuse the notation where it causes no misunderstanding.

1) If $f (x) = {∥ x ∥}_{2}^{2}$ , then the optimality condition is $2 x + \hat{β} (x - x_{k} + τ g_{k}) = 0$ , thus, (79) $x = \frac{\hat{β}}{2 + \hat{β}} (x_{k} - τ g_{k}) .$

Note that the last expression in (78) has a form similar to the proximal mapping of f (x). If, in (78), we rename $\hat{β}$ as β, the variable x as x′, and the point x_k - τg_k as x, we get: (80) ${prox}_{β} [f] (x) = \arg min_{x^{'}} \left\{ f (x^{'}) + \frac{β}{2} {∥ x^{'} - x ∥}_{2}^{2} \right\} .$

This operation does allow for non-smooth convex functions, but f (x) needs to be simple enough that the above optimization can be solved in closed form.

2) In particular, if f (x) = 0, the proximal mapping of f at x turns out to be x itself, i.e., (81) ${prox}_{β} [f] (x) = x, if f (x) = 0 .$

For applications in CT imaging, the form of f (x) usually guarantees that the proximal mapping has a closed form or can be computed easily to very high precision, for example when f (x) is the L1-norm or the square of the L2-norm.

3) Generally, comparing equation (78) to (80), an approximate solution for x^* is (82) $x^{*} = {prox}_{\hat{β}} [f] (x_{k} - τ g_{k}) .$

Plugging g_k and C into (82), we get: (83) $x^{*} = {prox}_{\hat{β}} [f] \left( x_{k} - τ M^{T} ({Mx}_{k} + Ny - c - λ / β) \right) .$
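As an illustration of (82), the following Python sketch performs one linearized step for the special case f (x) = ∥x∥² from item 1), using the closed form (79) as the proximal mapping. The matrix M, the vector C, the point x_k and the values of β and τ are small hypothetical examples, and the helper names are not from the paper.

```python
def matvec(M, x):
    """Compute M x for a dense matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def rmatvec(M, y):
    """Compute M^T y for a dense matrix stored as a list of rows."""
    return [sum(M[i][j] * y[i] for i in range(len(M))) for j in range(len(M[0]))]

def prox_sq_l2(v, beta_hat):
    """Closed form (79): minimizer of ||x||^2 + (beta_hat / 2) * ||x - v||^2."""
    return [beta_hat / (2.0 + beta_hat) * vi for vi in v]

def linearized_step(M, C, x_k, beta, tau, prox):
    """One step x* = prox_{beta/tau}[f](x_k - tau * g_k) with g_k = M^T (M x_k + C), cf. (82)."""
    residual = [a + b for a, b in zip(matvec(M, x_k), C)]
    g_k = rmatvec(M, residual)
    v = [xi - tau * gi for xi, gi in zip(x_k, g_k)]
    return prox(v, beta / tau), g_k

# Hypothetical 3x2 problem.
M = [[1.0, 2.0], [0.0, 1.0], [1.0, -1.0]]
C = [-1.0, 0.5, 0.0]
x_k = [0.2, -0.1]
beta, tau = 2.0, 0.1
x_new, g_k = linearized_step(M, C, x_k, beta, tau, prox_sq_l2)
```

Only products with M and M^T appear in the step, which is the point of the linearization: no inverse of M^T M is needed.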

Appendix 2:

Proximal mappings of indicator functions

The problem we are facing is (84) ${prox}_{β} [δ_{Ω}] (x) = \arg min_{x^{'}} \left\{ δ_{Ω} (x^{'}) + \frac{β}{2} {∥ x^{'} - x ∥}_{2}^{2} \right\},$ where β > 0. For a more specific derivation, we restrict the discussion to a real, finite-dimensional Hilbert space (more specifically, an N-dimensional real Hilbert space). Assume the convex set Ω = Ω₁ × Ω₂ × … × Ω_N, where Ω_i ⊆ H₁, and H₁ stands for the 1-dimensional Hilbert space. Therefore, we can rewrite (84) as (85) ${prox}_{β} [δ_{Ω}] (x) = arg min_{x^{'}} \sum_{i = 1}^{N} \left\{ δ_{Ω_{i}} (x^{'} (i)) + \frac{β}{2} {(x^{'} (i) - x (i))}^{2} \right\},$ where we use δ_Ω (x′) = ∑δ_{Ω_i} (x′ (i)), and x′ (i) ∈ H₁.

Here we give an intuitive analysis of the solution of (85). Since the objective is separable, the solution can be computed component-wise. Therefore, we take out the following (86) $x^{*} (i) = arg min_{x^{'} (i)} δ_{Ω_{i}} (x^{'} (i)) + \frac{β}{2} {(x^{'} (i) - x (i))}^{2}$ for concrete analysis, and we observe that the optimal value is determined by whether the given point lies in Ω_i. In the case analysis below, the notation follows (87): x′ denotes the given point and x denotes the minimizer.

If x′ (i) ∈ Ω_i, as shown in the left of Fig. 9, then δ_{Ω_i} (x′ (i)) = 0, and the quadratic term reaches 0 when x (i) = x′ (i); obviously, in this case the optimal solution is x (i) = x′ (i).

If x′ (i) ∉ Ω_i, as shown in the right of Fig. 9, the point x′ is outside Ω. Furthermore, the solution must lie in Ω; otherwise problem (86) is infeasible. Therefore, the optimization problem (86) can be interpreted as finding the point x in Ω that best approximates, in the sense of the L2-norm, the point x′ outside Ω. Geometrically, minimizing the quadratic term (x′ (i) - x (i)) ² means finding the point x in Ω with the shortest Euclidean distance to the point x′ outside Ω. For a convex set Ω, as indicated by the dashed line in the right of Fig. 9, this best approximation of x′ is given by the projection operator, i.e., x = proj_Ω (x′).

Summarizing the two situations, the proximal mapping of an indicator function is given by the projection onto the convex set Ω: (87) $x = {prox}_{β} [δ_{Ω}] (x^{'}) = {proj}_{Ω} (x^{'}) .$

In particular, if $Ω = ℝ_{+}^{N}$ , then ${proj}_{ℝ_{+}^{N}} (x^{'})$ can be easily computed by (88) $x = pos (x^{'}),$ where pos (x) sets all the negative entries of x to 0 while keeping positive entries and zeros unchanged.
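A minimal Python sketch of (88) is given below; the function name `pos` follows the notation of the text, and the input vector is a hypothetical example.

```python
def pos(x):
    """Projection onto the non-negative orthant: negative entries become 0, cf. (88)."""
    return [max(v, 0.0) for v in x]

x_prime = [-1.5, 0.0, 2.0, -0.2]   # hypothetical point to be projected
x = pos(x_prime)                   # [0.0, 0.0, 2.0, 0.0]
```

Because the non-negative orthant is a product of intervals [0, ∞), the projection acts on each component independently, in line with the component-wise argument above.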

More generally, suppose a linear transform T is present in (84), as in the following expression:

(89) $x^{'} = \arg min_{x^{'}} \left\{ δ_{Ω} (x^{'}) + \frac{β}{2} {∥ {Tx}^{'} - x ∥}_{2}^{2} \right\} .$

Assume that the magnitude of T is ω; then the above problem can be converted into (90) $x^{'} = \arg min_{x^{'}} \left\{ δ_{Ω} (x^{'}) + \frac{β}{2} ω^{2} {∥ x^{'} - \hat{x} ∥}_{2}^{2} \right\},$ where $\hat{x}$ satisfies $T \hat{x} = x$ . If the generalized inverse of T is denoted by T⁺, then $\hat{x}$ can be expressed as $\hat{x} = T^{+} x$ . Therefore, the optimal solution for x′ can be interpreted as the projection of T⁺ x onto the convex set Ω, which can be expressed as (91) $x^{'} = {proj}_{Ω} (T^{+} x) .$

Appendix 3:

Proximal mappings of L1-norm functions

The problem we are facing is (92) ${prox}_{β} [{∥ \cdot ∥}_{1}] (x) = \arg min_{x^{'}} \left\{ {∥ x^{'} ∥}_{1} + \frac{β}{2} {∥ x^{'} - x ∥}_{2}^{2} \right\},$ where β > 0. As in Appendix 2, both the L1-norm and the squared L2-norm are sums over components: (93) ${∥ x^{'} ∥}_{1} + \frac{β}{2} {∥ x^{'} - x ∥}_{2}^{2} = \sum_{i} \left\{ | x^{'} (i) | + \frac{β}{2} {(x^{'} (i) - x (i))}^{2} \right\} .$

Therefore, we take a single component out of the above equation: (94) $min f (x^{'}) = min \sqrt{(x^{'})^{2}} + \frac{β}{2} {(x^{'} - x)}^{2} .$

The first-order optimality condition is (95) $0 \in \partial | x^{'} | + β (x^{'} - x),$ where ∂|x′| is the sub-differential of |x′|, defined as (96) $\partial | x^{'} | = \left\{ \begin{matrix} \frac{x^{'}}{| x^{'} |}, & x^{'} \neq 0, \\ \{ κ \mid | κ | \leq 1 \}, & x^{'} = 0 . \end{matrix} \right.$

If x′ > 0, (95) becomes 1 + β (x′ - x) = 0, thus (97) $x^{'} = x - \frac{1}{β},$ which requires x > 1/β so that x′ > 0.

If x′ < 0, (95) becomes -1 + β (x′ - x) = 0, thus (98) $x^{'} = x + \frac{1}{β},$ which requires x < -1/β so that x′ < 0.

If x′ = 0, (95) becomes κ - βx = 0 for some |κ| ≤ 1, thus (99) $x^{'} = 0, x = \frac{κ}{β}, | κ | \leq 1,$ so that x ∈ [-1/β, 1/β]. Summarizing the three situations, we get (100) $x^{'} = \left\{ \begin{matrix} x - 1 / β, & x > 1 / β, \\ 0, & x \in [- 1 / β, 1 / β], \\ x + 1 / β, & x < - 1 / β . \end{matrix} \right.$

Or, briefly, x′ = max(|x| - 1/β, 0) · sgn (x), where sgn (x) returns the sign of x.

Thus the solution for (92) can be expressed as (101) $x^{'} = max (| x | - \frac{1}{β}, 0) \cdot sgn (x),$ where |x| = (|x (1) |, |x (2) |,. . . , |x (s) |) ^T and the max, sgn and product operations are applied component-wise.
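The soft-thresholding formula (101) can be sketched in Python as follows. The function name `soft_threshold` and the sample vector are illustrative assumptions.

```python
def soft_threshold(x, beta):
    """Component-wise proximal mapping of the L1-norm: max(|x| - 1/beta, 0) * sgn(x), cf. (101)."""
    t = 1.0 / beta
    out = []
    for v in x:
        magnitude = max(abs(v) - t, 0.0)
        sign = (v > 0) - (v < 0)        # evaluates to 1, 0 or -1
        out.append(magnitude * sign)
    return out

# Hypothetical input with beta = 2, i.e. threshold 1/beta = 0.5.
x = [3.0, 0.2, -1.0, -0.4, 0.5]
x_prime = soft_threshold(x, beta=2.0)
```

Entries whose magnitude does not exceed 1/β are set to zero, including the boundary case |x| = 1/β, and the remaining entries are shrunk toward zero by 1/β. This is the same operation as the z₁ update in line 7 of Algorithm 8, where the threshold is ρ₁/β₁.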

Appendix 4:

Definition of finite difference operator and its transpose

For a discrete image $u \in ℝ^{s}$ with dimension s = N₁ × N₂, the forward and backward finite differences can be defined with the periodic boundary condition: (102) $\begin{matrix} \partial_{1}^{-} u (i, j) \equiv \left\{ \begin{matrix} u (i, j) - u (i - 1, j), & 1 < i \leq N_{1}, \\ u (1, j) - u (N_{1}, j), & i = 1, \end{matrix} \right. \\ \partial_{2}^{-} u (i, j) \equiv \left\{ \begin{matrix} u (i, j) - u (i, j - 1), & 1 < j \leq N_{2}, \\ u (i, 1) - u (i, N_{2}), & j = 1, \end{matrix} \right. \\ \partial_{1}^{+} u (i, j) \equiv \left\{ \begin{matrix} u (i + 1, j) - u (i, j), & 1 \leq i < N_{1}, \\ u (1, j) - u (N_{1}, j), & i = N_{1}, \end{matrix} \right. \\ \partial_{2}^{+} u (i, j) \equiv \left\{ \begin{matrix} u (i, j + 1) - u (i, j), & 1 \leq j < N_{2}, \\ u (i, 1) - u (i, N_{2}), & j = N_{2} . \end{matrix} \right. \end{matrix}$

Therefore, the operator $\nabla : ℝ^{s} \to ℝ^{d \cdot s}$ can be defined as (103) $\nabla u (i, j) = (\partial_{1}^{+} u (i, j), \partial_{2}^{+} u (i, j)) .$

The corresponding adjoint (or the transpose in matrix form) operator of ∇ can be defined with the notation of divergence $div : ℝ^{d \cdot s} \to ℝ^{s}$ as (104) $div m (i, j) \equiv \partial_{1}^{-} m_{1} (i, j) + \partial_{2}^{-} m_{2} (i, j) .$

Note that ∇^T = - div . Furthermore, the extension to higher-dimensional images is straightforward.
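The relation ∇^T = - div can be checked numerically with a small Python sketch of (102) to (104). The function names `grad`, `div` and `inner`, and the 3×4 integer test arrays, are assumptions made for illustration; images are stored as lists of rows with 0-based indices.

```python
def grad(u):
    """Forward differences with periodic wrap-around, cf. (102) and (103)."""
    n1, n2 = len(u), len(u[0])
    d1 = [[u[(i + 1) % n1][j] - u[i][j] for j in range(n2)] for i in range(n1)]
    d2 = [[u[i][(j + 1) % n2] - u[i][j] for j in range(n2)] for i in range(n1)]
    return d1, d2

def div(m1, m2):
    """Sum of backward differences with periodic wrap-around, cf. (104)."""
    n1, n2 = len(m1), len(m1[0])
    # Python's negative indexing (index -1 is the last entry) supplies the periodic wrap.
    return [[(m1[i][j] - m1[i - 1][j]) + (m2[i][j] - m2[i][j - 1])
             for j in range(n2)] for i in range(n1)]

def inner(a, b):
    """Euclidean inner product of two images stored as lists of rows."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical integer test data, so that the comparison below is exact.
u = [[1, 2, 0, 3], [4, 1, 5, 2], [0, 3, 1, 1]]
m1 = [[2, 0, 1, 1], [1, 3, 0, 2], [0, 1, 4, 1]]
m2 = [[1, 1, 0, 2], [3, 0, 2, 1], [2, 2, 1, 0]]

d1, d2 = grad(u)
lhs = inner(d1, m1) + inner(d2, m2)   # <grad u, m>
rhs = -inner(u, div(m1, m2))          # <u, -div m>
```

The two values `lhs` and `rhs` coincide for any u and m, which is exactly the adjoint relation stated above.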

Appendix 5:

The computation of (54) using FFTs

We consider the following equation (105) $(β_{1} \nabla^{T} \nabla + \frac{β_{2}}{τ} I) x = β_{1} \nabla^{T} z - \nabla^{T} λ_{1} + \frac{β_{2}}{τ} x_{k} - β_{2} g_{k},$ and it should be noted that, under the periodic boundary condition, the matrix form of ∇^T∇ is block circulant. Therefore, ∇^T∇ can be diagonalized using the FFT as (106) $Λ = β_{1} 𝔽 \nabla^{T} \nabla 𝔽^{H} + \frac{β_{2}}{τ} I,$ where $𝔽^{H}$ denotes the Hermitian (conjugate) transpose of $𝔽$ . Thus, the solution can be easily computed as (107) $x = 𝔽^{- 1} \left\{ Λ^{- 1} 𝔽 (β_{1} \nabla^{T} z - \nabla^{T} λ_{1} + \frac{β_{2}}{τ} x_{k} - β_{2} g_{k}) \right\} .$
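The idea behind (105) to (107) can be illustrated with a one-dimensional analogue in Python. This is a sketch under simplifying assumptions, not the 2D implementation used in the experiments: D is the 1D periodic forward difference, a naive O(n²) DFT replaces the FFT so that only the standard library is needed, the right-hand side b is hypothetical, and the weights are set to β₁ = 0.16 and β₂/τ = 1.0 as in the KL-TV row of Table 1.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse includes the 1/n factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def solve_circulant(b, beta1, c):
    """Solve (beta1 * D^T D + c * I) x = b in the Fourier domain, cf. (107)."""
    n = len(b)
    # Eigenvalues of D^T D for the periodic difference are 2 - 2*cos(2*pi*k/n).
    eig = [beta1 * (2.0 - 2.0 * math.cos(2.0 * math.pi * k / n)) + c for k in range(n)]
    x_hat = [bh / e for bh, e in zip(dft(b), eig)]
    return [v.real for v in dft(x_hat, inverse=True)]

def apply_operator(x, beta1, c):
    """Apply beta1 * D^T D + c * I directly; (D^T D x)(i) = 2x(i) - x(i-1) - x(i+1)."""
    n = len(x)
    return [beta1 * (2 * x[i] - x[i - 1] - x[(i + 1) % n]) + c * x[i] for i in range(n)]

beta1, c = 0.16, 1.0                                # c plays the role of beta2 / tau
b = [1.0, 0.0, -2.0, 3.5, 0.5, 0.0, 1.0, -1.0]      # hypothetical right-hand side
x = solve_circulant(b, beta1, c)
residual = [r - bi for r, bi in zip(apply_operator(x, beta1, c), b)]
```

All entries of `residual` vanish up to rounding error, which confirms that dividing by the eigenvalues in the Fourier domain inverts the circulant operator. In two dimensions the same steps apply with a 2D FFT and the eigenvalues of the 2D periodic difference operator.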

References

Buzug

T.M.

, Computed Tomography: From Photon Statistics to Modern Cone-Beam CT, Springer-Verlag, Berlin, Heidelberg, 2008.

Pan

, Sidky

E.Y.

and Vannier

, Why do commercial CT scanners still employ traditional, filtered back-projection for image reconstruction? Inverse Problems 25(12) (2009), 1–36.

Feldkamp

, Davis

and Kress

, Practical cone-beam algorithm, Journal of Optical Society of America A 1(6) (1984), 612–619.

Tuy

H.K.

, An inversion formula for cone-beam reconstruction, SIAM Journal on Applied Mathematics 43(3) (1983), 546–552.

Andersen

and Kak

, Simultaneous algebraic reconstruction technique (SART): A superior implementation of the ART algorithm, Ultrasonic Imaging 6(1) (1984), 81–94.

Andersen

A.H.

, Algebraic Reconstruction in CT from Limited Views, IEEE Transactions on Medical Imaging 8(1) (1989), 50–55.

Smith

B.D.

, Image reconstruction from cone-beam projections: Necessary and sufficient conditions and reconstruction methods, IEEE Transactions on Medical Imaging 4(1) (1985), 14–25.

Sidky

E.Y.

and Pan

, Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization, Physics in Medicine and Biology 53(17) (2008), 4777–4807.

Candes

E.J.

, Romberg

J.K.

and Tao

, Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information, IEEE Transactions on Information Theory 52(2) (2006), 489–509.

10.

Candes

E.J.

, Romberg

J.K.

and Tao

, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics 59(8) (2006), 1207–1223.

11.

Sidky

E.Y.

, Jørgensen

J.H.

and Pan

, Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle-Pock algorithm, Physics in Medicine and Biology 57(10) (2012), 3065–3091.

12.

Sidky

E.Y.

, Jørgensen

J.H.

and Pan

, First-order convex feasibility algorithms for x-ray CT, Medical Physics 40(3) (2013), 031115.

13.

H.Y.

and Wang

, Compressed sensing based interior tomography, Physics in Medicine and Biology 54(9) (2009), 2791–2805.

14. Yu, Wang, Hsieh, Entrikin D.W., Ellis, Liu and Carr J.J., Compressive sensing-based interior tomography: Preliminary clinical application, Journal of Computer Assisted Tomography 35(6) (2011), 762–762.

15. Nien and Fessler J.A., Fast X-ray CT image reconstruction using a linearized augmented Lagrangian method with ordered subsets, IEEE Transactions on Medical Imaging 34(2) (2015), 388–399.

16. Ramani and Fessler J.A., A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction, IEEE Transactions on Medical Imaging 31(3) (2012), 677–688.

17. Cai, Wang, Yan, Li, Zhang and Hu, Efficient TpV minimization for circular, cone-beam computed tomography reconstruction via non-convex optimization, Computerized Medical Imaging and Graphics 45(1) (2015), 1–10.

18. Glowinski, Numerical Methods for Nonlinear Variational Problems, Springer, New York, 1984.

19. Glowinski and Le Tallec, Augmented Lagrangian and operator-splitting methods in nonlinear mechanics, SIAM Studies in Applied Mathematics, SIAM, Philadelphia, 1989.

20. Bertsekas D.P. and Tsitsiklis J.N., Parallel and distributed computation: Numerical methods, Prentice Hall, Englewood Cliffs, NJ, 1989.

21. Boyd, Parikh, Chu, Peleato and Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends® in Machine Learning 3(1) (2011), 1–122.

22. Deng and Yin, On the global and linear convergence of the generalized alternating direction method of multipliers, Journal of Scientific Computing 66(3) (2016), 889–916.

23. Li, Yin, Jiang and Zhang, An efficient augmented Lagrangian method with applications to total variation minimization, Computational Optimization and Applications 56(3) (2013), 507–530.

24. Wang, Cai, Zhang, Yan, Li, Hu and Bao, Distributed CT image reconstruction algorithm based on the alternating direction method, Journal of X-Ray Science and Technology 23(1) (2015), 83–99.

25. Wang, Cai, Zhang, Yan, Li and Hu, Distributed reconstruction via alternating direction method, Computational and Mathematical Methods in Medicine 2013 (2013).

26. Cai, Wang, Zhang, Yan, Li, Xi and Li, Edge guided image reconstruction in linear scan CT by weighted alternating direction TV minimization, Journal of X-Ray Science and Technology 22(3) (2014), 335–349.

27. Li, Niu, Huang, Bian, Feng, Yu, Liang, Chen and Ma, An efficient augmented Lagrangian method for statistical X-ray CT image reconstruction, PLoS ONE 10(10) (2015), e0140579.

28. Zhang, Wang, Yan, Li, Xi and Lu, Image reconstruction based on total-variation minimization and alternating direction method in linear scan computed tomography, Chinese Physics B 22(7) (2013), 078701.

29. Hudson H.M. and Larkin R.S., Accelerated image reconstruction using ordered subsets of projection data, IEEE Transactions on Medical Imaging 13(4) (1994), 601–609.

30. Kamphuis, Beekman F.J. and Viergever M.A., Accelerated SPECT reconstruction using OS-EM with only two projections per subset, in: 1995 IEEE Nuclear Science Symposium and Medical Imaging Conference Record, 1995, pp. 1193–1197.

31. Wallis J.W., Miller T.R. and Dai G.M., Comparison of the convergence properties of the It-W and OS-EM algorithms in SPECT, in: 1997 IEEE Nuclear Science Symposium, 1997, pp. 1752–1756.

32. Urabe and Ogawa, Introduction of ordered subsets algorithm to maximum a posteriori expectation maximization method, in: 1998 International Conference on Image Processing Proceedings, 1998, pp. 394–398.

33. Beekman F.J. and Kamphuis, Fast ordered subset reconstruction for X-ray CT, in: 2000 IEEE Nuclear Science Symposium Conference Record, 2000, pp. 15/87–15/90 vol.12.

34. Soo-Jin, Accelerated deterministic annealing algorithms for transmission CT reconstruction using ordered subsets, IEEE Transactions on Nuclear Science 49(5) (2002), 2373–2380.

35. Chambolle and Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, Journal of Mathematical Imaging and Vision 40(1) (2010), 1–26.

36. Pock and Chambolle, Diagonal preconditioning for first order primal-dual algorithms in convex optimization, in: 2011 IEEE International Conference on Computer Vision (ICCV), IEEE, 2011, pp. 1762–1769.

37. Sidky E.Y., Chartrand, Boone J.M. and Pan, Constrained TV minimization for enhanced exploitation of gradient sparsity: Application to CT image reconstruction, IEEE Journal of Translational Engineering in Health and Medicine 2(1) (2014), 1–18.

38. Sidky E.Y., Kraemer D.N., Roth E.G., Ullberg, Reiser I.S. and Pan, Analysis of iterative region-of-interest image reconstruction for x-ray computed tomography, Journal of Medical Imaging (Bellingham) 1(3) (2014), 031007.

39. Gabay and Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Journal of Computational and Applied Mathematics 2(1) (1976), 17–40.

40. Hestenes M.R., Multiplier and gradient methods, Journal of Optimization Theory and Applications 4(5) (1969), 303–320.

41. Powell M.J.D., A method for nonlinear constraints in minimization problems, Academic Press, London and New York, 1969.

42. Xiao and Song, An inexact alternating directions algorithm for constrained total variation regularized compressive sensing problems, Journal of Mathematical Imaging and Vision 44(2) (2012), 114–127.

43. He, Liao L.Z., Han and Yang, A new inexact alternating directions method for monotone variational inequalities, Mathematical Programming 92(1) (2002), 103–118.

44. Yin, Osher and Goldfarb, Bregman iterative algorithm for l1-minimization to compressed sensing, SIAM Journal on Imaging Sciences 1(1) (2008), 143–168.

45. Goldstein and Osher, The split Bregman method for L1-regularized problems, SIAM Journal on Imaging Sciences 2(2) (2009), 323–343.

46. Esser, Applications of Lagrangian-based alternating direction methods and connections to split Bregman, in: UCLA Computational and Applied Mathematics Reports, 2009.

47. Daubechies, Defrise and De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Communications on Pure and Applied Mathematics 57(11) (2004), 1413–1457.

48. Manglos G.G.M. and Krol S.H., Transmission maximum-likelihood reconstruction with ordered subsets for cone-beam CT, Physics in Medicine and Biology 40 (1995), 1225–1241.

49. Cox, Schaer and Mobley, Fast distributed EM computation for PET image reconstruction using ordered subsets, in: IEE Colloquium on Advances in Electrical Tomography (Digest No: 1996/143), 1996, pp. 8/1–8/9.

50. Kim, Ramani and Fessler J.A., Combining ordered subsets and momentum for accelerated X-ray CT image reconstruction, IEEE Transactions on Medical Imaging 34(1) (2015), 167–178.

51. Erdoğan and Fessler J.A., Ordered subsets algorithms for transmission tomography, Physics in Medicine and Biology 44(11) (1999), 2835–2851.

52. Jørgensen J.S., Sidky E.Y. and Pan, Quantifying admissible undersampling for sparsity-exploiting iterative image reconstruction in X-ray CT, IEEE Transactions on Medical Imaging 32(2) (2013), 460–473.

53. Jørgensen J.H., Sidky E.Y. and Pan, Analysis of discrete-to-discrete imaging models for iterative tomographic image reconstruction and compressive sensing, IEEE Transactions on Medical Imaging 1(3) (2011), 1–15.


Table 1
Parameter settings used for the simulation studies

Algorithm   ρ₁      ρ₂      β₁     β₂     τ     ξ
L1-TV       0.007   25.6    0.16   1.28   1.0   1.0
KL-TV       0.005   16000   0.16   1.00   1.0   1.0