Abstract
The “missing wedge” of a single tilt in electron tomography introduces severe artifacts into the reconstructed results. To reduce the “missing wedge” effect, a widely used method is “multiple-tilt reconstruction,” which collects projections using multiple axes. However, as the number of tilt series increases, the computing and memory costs also rise. The degree of parallelism is limited by the sample thickness, and a large memory requirement cannot be met by most multicore computers. In our study, we present a new fully distributed multiple-tilt simultaneous iterative reconstruction technique (DM-SIRT). To improve the parallelism of the reconstruction process and reduce the memory requirements of each process, we formulate the multiple-tilt reconstruction as a consensus optimization problem and design a DM-SIRT algorithm. Experiments show that in addition to slightly better resolution, DM-SIRT can obtain a 13.9 × accelerated ratio compared with the full multiple-tilt reconstruction version. It also has a 97% decrease in memory overhead and is 16 times more scalable than the full reconstruction version.
1. Introduction
In recent years, cryoelectron tomography (cryo-ET) has become a powerful approach that enables the visualization of macromolecular complexes and assemblies in a near-native cellular environment (Lučić et al., 2013; Grotjahn et al., 2018). In addition, cryo-ET can achieve the atomic resolution structure in situ by analyzing repeating structures within larger objects, namely, subtomogram averaging (Briggs, 2013).
In cryo-ET, the microscope stage is tilted around a single fixed axis with a range from −60° to +60°. The three-dimensional structure can be reconstructed from the series of two-dimensional projection images (also called the tilt series). However, the tilt angle range of 60° to 90° and −60° to −90° cannot be achieved in the process. The absence of projection images leads to a lack of reconstruction information. This problem is commonly referred to as the “missing wedge” and causes severe ray artifacts for the reconstructed results.
One method used to reduce the “missing wedge” problem is acquiring multiple axes, that is, collecting multiple-tilt series by rotating the sample in a plane. To acquire a double axis, the sample first tilts around a single fixed axis and then rotates 90° to obtain the other tilt series (Penczek et al., 1995). The multiple-tilt series can be obtained by reducing the rotation angle in a plane and can be extended to an 8-tilt series or a 16-tilt series (Phan et al., 2017). As the number of tilt series increases, the absence information also decreases, so the “missing wedge” effect can be weakened into a “missing pyramid” effect (Mastronarde, 1997).
There are many program packages for dealing with double-axis reconstruction, such as IMOD (Mastronarde and Held, 2016), TxBR (Lawrence et al., 2006), and AuTom-dualx (Han et al., 2018). To achieve high-quality reconstruction results, these programs all use iteration-based methods (Gilbert, 1972).
When the number of axes increases, the reconstruction of multiple-axis data has two critical problems. The iteration-based methods are time consuming, so the parallelization of the multiple-axis reconstruction method is required. However, the degree of parallelism is limited by the existing parallel strategies. In contrast to single-tilt reconstruction, which can split the reconstructed volume along the tilting axis (Wang et al., 2018), the geometry in multiple-tilt reconstruction is nonlinear, and the Y-axis varies while the X-axis rotates. For multiple-tilt data collection, the reconstructed volume can be split only along the Z-axis (Zhang et al., 2014). Limited by electron microscopy imaging conditions, the sample is very thin (Majorovits et al., 2007), severely restricting the parallel degree along the Z-axis. Moreover, parallelization generates an enormous memory requirement. For iterative methods, updating requires assessing the whole projection series in each iteration, which means that each process must handle the whole projection series. Compared with the single-tilt series, which has only ∼100 images, 16-tilt data collection has ∼2000 projection views. For example, in the 2048 × 2048 projection series, each iteration needs 60.5 GB memory to handle the process. The memory requirement cannot be satisfied by most multicore computers.
To solve these obstacles in the process of paralleling multiple-tilt reconstruction, we developed a new fully distributed multiple-tilt simultaneous iterative reconstruction technique (DM-SIRT).
We first formulate the multiple-tilt reconstruction as a consensus optimization problem. To solve the problem, we divide the multiple-tilt data into multiple subsets regarded as independent data to optimize the same target. Then, we apply a multiagent consensus equilibrium (MACE) framework (Buzzard et al., 2018) to optimize the results of each subset by iteratively updating the global target. This framework has been proven to converge to the globally optimal result when each subset can converge independently to its global optimal solution. To the best of our knowledge, this is the first multiple-tilt reconstruction method in electron tomography based on consensus optimization.
Our presented distributed method can solve the aforementioned problems of multiple-axis reconstruction. First, the distributed method can improve the parallelism of the reconstruction because we apply a new data partitioning strategy, and each subset can perform reconstruction separately. In addition, based on the new data partitioning methods, each subset needs only partial projection data. This approach can reduce the memory requirements for each process and adjust the number of projections processed according to the memory of the real computing environment. Finally, we use a multitree parallel strategy to reduce the communication overhead and improve the scalability of DM-SIRT. Multiple-tilt reconstruction can benefit from these strategies and be performed with high efficiency and less memory consumption.
The remainder of the article is organized as follows. Section 2 introduces the background of the multiple-tilt reconstruction and MACE. Section 3 presents the theory and implementation of our new distributed framework, DM-SIRT. In Sections 4 and 5, we show the resolution, time, and scalability performance of DM-SIRT by comparing it with widely used methods. Finally, we present the conclusions in Section 6.
2. Related Work
2.1. Multitilt reconstruction
Multiple-tilt data acquisition can be considered as N (N ≥ 2) repeats of single-tilt data acquisition. For example, in double-tilt data acquisition, also known as dual-axis tomography, the sample is rotated 90° in the XOY plane to obtain two tilt series, as shown in Figure 1a. Multiple-tilt data can be classified as double-tilt, 4-tilt, 8-tilt, or 16-tilt, as shown in Figure 1b, based on the number of rotation angles. With the increase in the number of tilts, the reconstruction artifacts can be gradually weakened by the “missing wedge.”

Data acquisition of cryoelectron tomography.
Because the geometry of each tilt is different, and the sample may move in the plane, the images must be aligned within the data set. The process of alignment adjusts the data from different geometries to a single global coordinate system to ensure the accuracy of reconstruction (Arslan et al., 2006). There are several methods of double-tilt data alignment, including IMOD, TxBR, and AuTom-dualx. However, only TxBR can process multiple-tilt data alignment, and we also use the TxBR package to adjust the geometry. The data can be reconstructed after the process of alignment. In multiple-tilt reconstruction, the direct back-projection method cannot take full advantage of the relation of multiple-tilt data. The iterative method of updating the volume is more suitable for multiple-axis data.
2.2. Multiagent consensus equilibrium
MACE is a framework for an inverse problem in imaging that focuses on including various kinds of image processing operations and simultaneously balancing the multiple operators. There is a brief mathematical description of the MACE framework. The simplest reconstruction formula for the inverse problem is
where f is the data fidelity term, h is the regular term, and x is the reconstruction result. In more general terms, if the data have different fidelity functions, the cost function can be written as
where variable
Buzzard et al. (2018) propose the general MACE framework to solve the consensus optimization problem, as shown in Equation (3). The framework can handle multiple heterogeneous models from optimizing formulas or learn from source data. It maps Equation (3) to an auxiliary function, Equation (4), similar to that proposed by alternating direction method of multiplier (ADMM) (Boyd et al., 2011) to solve the consensus equilibrium. After mapping, it solves the function as a fixed-point problem and uses an iteration framework to calculate a globally optimal solution. A more detailed reformulation of the framework and proof of convergence can be found in Buzzard et al. (2018).
2.3. Consensus equilibrium for computed tomography
Sridhar et al. (2018) present a distributed and iterative approach for computed tomography reconstruction based on the MACE framework.
This study divides the sinograms into multiple disjoint subsets. The different subsets are regarded as different agents, and the consensus target is the reconstructed slice. To solve the problem by MACE, x represents the image to be reconstructed and N represents the number of view subsets given in Equation (3). They use a greedy method that uses only one full pass of the iterative coordinate descent optimization technique (Wang et al., 2016) to replace the auxiliary function F shown in Equation (4).
3. Methods
In the multiple-tilt reconstruction, with the increase in the number of tilts, the computation and memory occupation also increase. However, the degree of parallelism is limited, and the projection data in one process become huge. If we naively divide each set of tilt data, reconstruct simultaneously, and average the results directly, the final result would be far from the true reconstruction. The main reason is that each reconstructed volume has different gray levels.
3.1. Distributed multiple-tilt SIRT
To solve the discussed problem, we first analyze the reconstruction method in electron tomography. Let n denote the total number of voxels in the 3D volume and
where W is defined as the projection matrix. In matrix W, the element Wij specifies the contribution of voxel
In multiple-tilt reconstruction, we often use the family of iterative algebraic reconstruction algorithms. For the kth iteration, the concrete updating strategy is
W is the projection matrix from all orientations, WT is the back-projection operator, and
We apply the consensus equilibrium framework for multiple-tilt electron tomography reconstruction and use optimization functions that have been proven to converge to replace the proximal map in the MACE framework so that the reconstructions from all tilts can achieve a global consensus solution. In this new framework, we can guarantee that the reconstruction results do not get worse. At the same time, we can improve parallelism to increase efficiency and reduce the memory consumption of each process. The framework is called DM-SIRT, and the algorithm is shown in Algorithm 1.
First, the multiple-tilt data are divided into N different subsets. For each subset i, there is a local weighted result

The consensus framework of distributed multiple-tilt reconstruction.
3.2. Overlapped data division strategy
In the consensus equilibrium method, a round-robin method is often used to divide data. However, it is not suitable for multiple-tilt reconstruction. Although we have adjusted multiple-tilt data to the same coordinate system, there is still an existing difference between the different axis projections. We divide the different tilt data into different subsets that can contribute to updating faster in the inner loop of DM-SIRT. Within the scope of the same axis, more commonness exists in the adjacent angle, and each subset needs overlapping angles to avoid overfitting in the outer loop of DM-SIRT.
We distribute the projection data into two steps. Figure 3 shows the data division strategy in DM-SIRT, and the number of overlap angles is set to 2. While dividing into subsets, we first separate the projection angle from the A to the N axis to ensure that the data of the same tilt are divided together. Then, dividing the angles in the same tilt, we group the adjacent angles together; for example, subset i − 1 includes −67°, −66°, and subset i includes 65°, 66°. To avoid overfitting, each subset will obtain some overlap angles from the next subset; for example, subset i has −1° and 0° in subset i − 1.

The data division strategy for DM-SIRT. DM-SIRT, distributed multiple-tilt simultaneous iterative reconstruction technique.
3.3. Parallel strategy based on Multitree
Based on the process of the algorithm, the projection images are divided into N subsets according to the multiple-tilt projection angle (usually, N can be 4, 8, or 16…). We can process N subsets in parallel. In each subset, the reconstructed volume is divided along the Z-axis. The calculation of each Z slice is independent, so we can use as many processes as possible for the calculation.
Based on these computational characteristics, we design a multitree parallel strategy. In contrast to the traditional master-slave architecture, to make the best use of the computing resources, all the processes participate in computing. The details of the hierarchy are shown in Figure 4. Each node corresponds to a process. We mark the multitree nodes as two types. The first type of node, which is responsible for the control and computation, is the parent node of each subset, and it needs to update the subset variables in the outer loop of DM-SIRT, such as zi, wi in Algorithm 1. In particular, the parent node of subset 0 is also the root node. It interacts with other control nodes for data and is also responsible for calculating the final result

Parallel strategy based on multitree for DM-SIRT.
In the multitree hierarchy, the control and compute nodes use Reduce and Bcast operation to synchronize global data. The compute nodes use Scatter and Gather operation to communicate needed data. The hierarchy can not only improve communication efficiency but also reduce the memory occupancy of each process.
4. Experimental Setup
4.1. Data sets
We used the cryo-ET data set named EEL-Crosscut-Four, which was obtained using an FEI Titan operated at 300 kV, from the National Center for Microscopy and Imaging Research (NCMIR); the micrograph was produced by a 4096 × 4096 CCD camera. The tilt series includes four axes, and the acquisition method is shown in Figure 1b. The tilt angles of the projection images in each axis range from −60° to 60° at 1° intervals. There are 121 images of each axis. The size of the projection image is 4096 × 4096 with a pixel size of 1.36 nm. In this article, to ensure that all methods can work, we bin the tilt series with four factors to generate a data set with a size of 1024 × 1024 × 121 × 4. The tilt series are aligned using TxBR, and the size of the reconstruction result is 1024 × 1024 × 58. The experiments are all performed on the Tianhe-2 supercomputer (Liao et al., 2014). Each node is equipped with two Intel Xeon E52692 2.2 GHz processors, each of which has 24 cores and 128 GB of RAM.
4.2. Experimental setup
We use four methods to analyze the performance. The first method is the conventional method, full angles iterative reconstruction technique (FULL-SIRT), which uses full angles to reconstruct the volume. The second is a simple reconstruction method named filtered-back projection (FBP) (Herman and Frank, 2014). The third is our proposed DM-SIRT. The last method, dividing the projection angle, reconstructing it independently, and combining the results directly, is named DirectCombine-SIRT. All iterative methods use the same relax factor 0.3, and the whole number of iterations is set to 100. The DM-SIRT inner iteration time is set to 10, and the outer iteration time is also set to 10. We use 32 subsets for DM-SIRT, and the number of overlaps is set to 2.
5. Results
In this section, we describe the results of our experiments. First, we compare the reconstruction results of FULL-SIRT, FBP, DM-SIRT, and DirectCombine-SIRT. Next, we compare the timing and memory performance of FULL-SIRT and DM-SIRT. Finally, we analyze the scalability of DM-SIRT.
5.1. Reconstruction precision
Figure 5 shows the center slice of the EEL-Crosscut-Four data set results. From the visual point of view, the results of FULL-SIRT and DM-SIRT are very similar, so we next use the normalized correlation coefficient (NCC) given in Equation (10) between the reprojections of the reconstruction methods with the original tilt series (axis A in this article) to obtain a more intuitive analysis.

The reconstruction results using multiple-tilt data.
From Figure 6, we find that the DM-SIRT performance is almost the best on the NCC results over all tilt angles. FULL-SIRT, as the standard method, did not perform well at some high angles compared with DM-SIRT. This finding shows that our proposed method does not reduce accuracy but achieves better accuracy than the standard method. The accuracy of FBP and DirectCombine-SIRT is worse than that of DM-SIRT and FULL-SIRT, which shows that these methods are not usually used due to poor performance. We cannot separate and combine data directly to improve parallelism because this practice severely reduces the accuracy.

The NCC comparison with different reconstruction methods. NCC, normalized correlation coefficient.
5.2. Performance results
We test the overall performance of FULL-SIRT and DM-SIRT on the Tianhe-2 supercomputer. Table 1 lists the reconstruction time at different number of nodes. FULL-SIRT can divide the data only along the Z-axis, so the degree of parallelism is limited by the Z-axis thickness. The result “NA” in the last four columns indicates that the FULL-SIRT cannot scale up to 96 cores because the Z-axis thickness is 58 for the EEL-Crosscut-Four data set.
The Acceleration of Distributed Multiple-Tilt Simultaneous Iterative Reconstruction Technique Compared with FULL-SIRT
DM-SIRT, distributed multiple-tilt simultaneous iterative reconstruction technique; FULL-SIRT, full angles iterative reconstruction technique.
In contrast, our DM-SIRT method improves the parallelism by dividing the projection angle, so it can be scaled to the entire number of tested nodes with 768 cores. In addition, due to the data division strategy, the memory consumption of each process is also reduced; the detailed data are shown in Section 5.3. As given in Table 1, DM-SIRT achieves 46 minutes of reconstruction time at 768 nodes. It is 13.9 times faster than the FULL-SIRT performance on 48 nodes and is 16 times more scalable than FULL-SIRT.
5.3. Memory overhead
We analyze the memory occupation in each process. The primary memory consumption includes the reconstructed result, the projection data, and the weight array needed by each process. As FULL-SIRT divides only the reconstructed volume in each process, each process needs to handle the whole projection series. We use subset 0 to represent the FULL-SIRT method in Figure 7. The other numbered subsets were all adopted by DM-SIRT. Based on the results shown in Figure 7, with the increase in the number of subsets, the memory consumption of each process decreases accordingly. When DM-SIRT adopts 32 subsets, it has a 97% decrease in memory overhead compared with FULL-SIRT.

The memory overhead in different subsets.
5.4. Scalability performance
In the scalability test, we fixed the total number of tasks and tested the scalability with the aforementioned reconstruction data. We increased the number of processes only from 48 to 768. As shown in Figure 8, we observe that the parallel efficiency decreases to 80% when using 192 processes and decreases to 76% when using 768 processes. The observed degradation of scalability efficiency is acceptable.

The scalability performance.
6. Conclusion
In this study, we present a new fully distributed multiple-tilt reconstruction framework (DM-SIRT). We are the first to formulate the reconstruction as a consensus optimization problem in cryo-ET. With the help of a MACE approach, we improve the parallelism of the reconstruction process and reduce the memory requirements by reducing the number of projection data that each process needs while guaranteeing the reconstruction accuracy. We propose a new overlapping data division strategy to accelerate convergence and prevent overfitting during reconstruction. Furthermore, we use the multitree hierarchy parallel method to improve scalability. Multiple-tilt reconstruction benefits from these strategies and can be performed without loss of resolution. Experiments also show that our proposed method has a high degree of parallelism, low memory consumption, and high scalability.
Footnotes
Acknowledgments
We thank Albert Lawrence and Sebastien Phan at UCSD for providing the experimental data set. All intensive computations were performed on Tianhe-2 supercomputer at the National Supercomputer Center in Guangzhou (NSCC-GZ), China.
Author Disclosure Statement
The authors declare they have no competing financial interests.
Funding Information
This research was supported by National Key Research and Development Program of China (Grant Nos. 2017YPE0103900 and 2017YFA0504702), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA19020400), the NSFC projects (Grant Nos. U1611263, U1611261, 61932018, and 61672493), Beijing Municipal Natural Science Foundation (Grant No. L182053) and Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase).
