An efficient and high-quality scheme for cone-beam CT reconstruction from sparse-view data

Abstract

Computed tomography (CT) is capable of generating detailed cross-sectional images of scanned objects non-destructively, and it has become an increasingly vital tool for 3D modelling of cultural relics. Compressed sensing (CS)-based CT reconstruction algorithms, such as the algebraic reconstruction technique (ART) regularized by total variation (TV), enable accurate reconstructions from sparse-view data, which reduces both scanning time and cost. However, ART-TV is considerably slow, particularly in cone-beam reconstruction. In this paper, we propose an efficient and high-quality scheme for cone-beam CT reconstruction based on the traditional ART-TV algorithm. Our scheme employs Joseph's projection method for the computation of the system matrix. By exploiting the geometric symmetry of the cone-beam rays, we compute the weight coefficients of the system matrix for two symmetric rays simultaneously. We then employ multi-threading to speed up the ART reconstruction, and utilize graphics processing units (GPUs) to accelerate the TV minimization. Experimental results demonstrate that, for a typical reconstruction of a 512 × 512 × 512 volume from 60 views of 512 × 512 projection images, our scheme achieves a speedup of 14× compared to a single-threaded CPU implementation. Furthermore, ART-TV with Joseph's projection yields higher-quality reconstructions than ART-TV with the traditional Siddon's projection.

Keywords

computed tomography, image reconstruction, algebraic reconstruction technique, compressed sensing, total variation, parallel computing

Introduction

Computed tomography (CT) is capable of generating detailed cross-sectional images of scanned objects non-destructively. Over the past decades, industrial CT has been widely used in a variety of applications.^1–3 More recently, industrial CT has proven to be an effective tool for analyzing and modelling cultural relics.^4–6 To obtain high-quality images of cultural relics, complete projections are usually required for reconstruction with analytical methods such as FDK (Feldkamp-Davis-Kress).⁷ This increases not only the scan time but also the data acquisition cost. One feasible way to address this problem is to scan the objects with larger angular sampling intervals than normal. Such a scanning mode, however, leads to severe artifacts in the reconstructions due to incomplete projection data. Hence, reconstructing high-quality images from sparse-view projections has become an important issue in the field of CT reconstruction.^8–12

Traditional iterative methods, such as the ART¹³ and the simultaneous ART (SART),¹⁴ have great advantages over analytical methods for reconstruction from incomplete projection data. However, the reconstructions obtained with these iterative methods usually suffer from severe artifacts under sparse-view conditions. In 2005, Yu et al. introduced total variation (TV) constrained minimization into the optimization process of iterative reconstruction.¹⁵ In 2006, Candes et al. introduced the compressed sensing (CS) theory, which showed that, under certain assumptions, unknown signals can be accurately recovered from far fewer samples than conventional sampling theory requires.¹⁶ The CS theory provides a powerful tool for reconstructing images from sparse-view projections. Soon after its introduction, several researchers reported CS-based methods for reconstruction from insufficient projection data. Sidky et al. developed the ART-TV algorithm for fan-beam CT.¹⁷ Then, Sidky and Pan minimized the TV by steepest descent with an adaptive step size, and applied the algorithm to a circular cone-beam CT system.¹⁸ To improve the reconstruction quality of CS-based methods, Hsieh et al. proposed a modified Canny operator and applied it to the CS algorithm.¹⁹ Wu proposed a fast iterative CT reconstruction algorithm,²⁰ in which the CS theory was applied and the physical and mathematical bases of the reconstruction were analysed.

Over the past decade, several groups of researchers have devoted their efforts to improving the CT reconstruction quality of TV-based approaches. Liu et al.²¹ proposed an adaptive-weighted TV (AwTV) minimization algorithm for low-dose CT reconstruction from sparse-view projections. They demonstrated that their AwTV model preserves fine structures and edges better than the traditional TV model. Chen et al.²² proposed a new anisotropic total variation (ATV) reconstruction approach to solve the limited-angle imaging problem. Song et al.²³ proposed a nonlinear weighted anisotropic TV regularization model to preserve the conductivity profiles for electrical impedance tomography by incorporating the internal inhomogeneity information. Xi et al.²⁴ proposed an adaptive-weighted high order total variation (awHOTV) algorithm, in which they constructed the second order TV norm using the second order gradient. They claimed that the awHOTV algorithm can effectively suppress block artifacts and preserve image details. Ertas et al.²⁵ proposed an approach to reduce the artifacts caused by insufficient projection data. This approach combines TV with a non-local means process and obtains better reconstructions than the traditional ART-TV and ART methods. Although these adapted TV models improve the reconstruction quality to a certain extent, they also increase the computational cost. Consequently, reconstruction speed becomes a prominent issue for cone-beam CT.

In previous work, we proposed an efficient method for computing the system matrix of Siddon's projection,²⁶ and then applied it to ART-TV. By using parallel computing technology, we realized a fast implementation of ART-TV. However, this method only addresses the system matrix computation for fan-beam geometries, and needs to be extended to cone-beam CT geometries. More recently, we proposed a hybrid reconstruction model,²⁷ called L1-αL2-TV, for 3D reconstruction of cultural relic models. The L1-αL2-TV model was solved by the split Bregman algorithm, which has been shown to achieve low-noise and high-quality reconstruction from cone-beam sparse-view projections. However, since the split Bregman algorithm requires several intermediate variables of image size, it is difficult to apply this method to the reconstruction of high-resolution images. Moreover, the above methods use Siddon's projection, which is a relatively simple and less accurate system model. By contrast, Joseph's projection can provide better reconstruction quality with iterative methods by using linear interpolation. However, existing methods for the computation of Joseph's projection are not efficient, which precludes its practical use for reconstruction. To address this problem, we proposed an efficient method for the computation of Joseph's projection.²⁸ We found that both the reconstruction speed and the quality can be improved compared with the most commonly used Siddon's projection.

In recent years, deep learning has been successfully applied to challenging problems in diverse image processing applications.^29–33 To address the sparse-view CT problem, many researchers have turned their attention to deep learning-based approaches. Zhang et al.³⁴ proposed DD-Net, which combines DenseNet with a deconvolution-based network. Li et al.³⁵ proposed a dual-domain network with parallel Transformers (DDPTransformer) to solve dual-domain CT reconstruction from sparse views. Zhou et al.³⁶ proposed a dual-domain data consistent recurrent network, named DuDoDR-Net, which reconstructs images through recurrent restorations in both the image domain and the sinogram domain. Zhang et al.³⁷ proposed a meta-inversion network, named MetaInv-Net, in which the number of trainable parameters is significantly reduced. Deep learning-based approaches have thus shown great potential for high-quality CT reconstruction from sparse-view projections. However, since the reconstruction results rely heavily on the training datasets, it is difficult for these approaches to obtain satisfactory results from projection data with an arbitrary level of sparsity. More importantly, most of these approaches are designed to solve the fan-beam sparse-view CT problem.

To enhance the CT reconstruction speed, researchers have utilized several parallel computing techniques. Chen et al.³⁸ accelerated the TV-regularized three-dimensional expectation maximization (EM) algorithm with a hybrid architecture using CPU, GPU, and FPGA. Liu et al.³⁹ utilized the texture memory and hardware interpolation of GPUs in a parallel branchless distance-driven algorithm for cone-beam CT. Zhang et al.⁴⁰ proposed a fast method for parallel implementation of the FDK algorithm using multiple GPUs. Liu et al.⁴¹ extended the SART algorithm to linear scan cone-beam CT and accelerated it using cluster computing. In addition to optimizing the reconstruction algorithms themselves, these parallel computing techniques offer effective means to further accelerate cone-beam CT reconstruction.

In this paper, we focus on improving the reconstruction quality and speed of the traditional ART-TV algorithm, and propose a fast and high-quality scheme for cone-beam CT reconstruction from sparse-view projections. In our scheme, Joseph's projection is utilized for the forward projection operation of ART to improve the reconstruction quality. To accelerate the Joseph's projection-based ART, we propose an efficient method to compute the elements of the system matrix for two symmetric rays simultaneously. We then use multi-threading to further speed up the ART procedure, and GPUs to accelerate the TV minimization. The major contributions of this paper are as follows:

An improved algorithm ART-TV-J is proposed for high-quality reconstruction from sparse-view projections.

To improve the reconstruction speed of ART-TV-J, an efficient method is proposed to compute the system matrix for Joseph's projection using symmetry.

To accelerate the implementation of the symmetry-based ART-TV-J (named ART-TV-symJ) for cone-beam CT systems, a hybrid parallel reconstruction scheme is proposed.

The organization of our paper is as follows. In Section 2, we briefly present Joseph's projection based ART-TV algorithm, and then describe the system matrix computation of Joseph's projection using symmetry as well as our parallel acceleration scheme in detail. Section 3 illustrates the experimental results for the evaluation of our method using both simulated and real CT projection data. Finally, conclusions and future work are presented in Section 4.

Methods

ART-TV algorithm

The reconstruction problem of cone-beam CT can be modeled as a linear system: Af = p. Here, A is the M × N system matrix, in which an element a_ij, called a weight coefficient, measures the influence of the jth voxel on the attenuation of the ith ray; f is the unknown N × 1 column vector whose element f_j is the value of the jth voxel, and N = I × I × I is the total number of voxels; p is the M × 1 column vector whose element p_i is the value of the ith pixel measured on the detector plane, and M = ϕ × N_c × N_r is the total number of pixels (or rays), where ϕ is the number of projection views, and N_c and N_r are the numbers of columns and rows of the detector plane, respectively. We can rewrite Af = p as a system of linear equations:

\sum_{j = 1}^{N} a_{i j} f_{j} = p_{i}, i = 1, 2, \dots, M

(1)

The weight coefficients play an important role in solving Eq. (1). In practice, the exact value of a_ij depends on the system model. In many current CT systems, the line intersection-based system model is widely used due to its simplicity, where a_ij is equal to the length of the intersection of the jth voxel with the ith projection ray. The line intersection model, however, represents a voxel by a cubic box, which is a crude approximation of the image and usually leads to artifacts in the reconstruction due to its unrealistic discontinuity. More accurate system models, such as Joseph's and Köhler's projection methods,^42,43 consider the image as a smooth function. In this work, we choose Joseph's projection as the system model because of its accuracy. In this system model, the projection line intersects a series of yz- or xz-planes and generates a series of intersection points. These intersection points are equally spaced by a slab length. The value at each intersection point is obtained by bilinear interpolation from its four nearest neighboring voxels. Then, the weight coefficient of each voxel is computed as its interpolation coefficient multiplied by the slab length, as shown in Figure 1.

Figure 1.

Illustration of weight coefficients of 3D Joseph's projection.

As shown in Figure 1, suppose the ith projection line intersects a yz-plane at point A, whose four nearest voxels are 1, 2, 3, and 4. The slab length between two neighboring yz-planes is L. The horizontal and vertical distances from A to voxel 1 are u1 and v1, respectively. Then, the contribution of the four voxels to the ray is [(1 − u)(1 − v)f₁+ (1 − u)vf₂+ uvf₃+ u(1 − v)f₄]L, where u = u1/δ is the horizontal interpolation coefficient, v = v1/δ is the vertical one, and δ is the voxel spacing. Accordingly, the weight coefficients of voxels 1, 2, 3, and 4 are a_i₁= (1 − u)(1 − v)L, a_i₂= (1 − u)vL, a_i₃= uvL, and a_i₄= u(1 − v)L, respectively.
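The weight computation described above can be illustrated with a minimal Python sketch. The function name `joseph_weights` and its argument names are hypothetical and chosen only for this illustration; the function simply evaluates the four products given in the text for a single intersection point.

```python
def joseph_weights(u1, v1, delta, slab_length):
    """Return the Joseph weights (a_i1, a_i2, a_i3, a_i4) of the four voxels
    surrounding one intersection point.

    u1, v1      : horizontal and vertical distances from the point to voxel 1
    delta       : voxel spacing (same unit as u1 and v1)
    slab_length : length L of the ray segment between two neighbouring planes
    """
    u = u1 / delta  # horizontal interpolation coefficient
    v = v1 / delta  # vertical interpolation coefficient
    a1 = (1.0 - u) * (1.0 - v) * slab_length
    a2 = (1.0 - u) * v * slab_length
    a3 = u * v * slab_length
    a4 = u * (1.0 - v) * slab_length
    return a1, a2, a3, a4


# Example values (illustrative only): u = 0.25, v = 0.5, L = 0.2.
weights = joseph_weights(u1=0.048, v1=0.096, delta=0.192, slab_length=0.2)
```

Because the four bilinear coefficients sum to one, the four weights of an intersection point always sum to the slab length L.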

Given the system matrix A , the unknown image f can be reconstructed from the measured projection data p . The ART algorithm solves (1) by an iterative procedure, whose formula is expressed as:

f_{j}^{(k + 1)} = f_{j}^{(k)} + λ \frac{p_{i} - \sum_{n = 1}^{N} a_{i n} f_{n}^{(k)}}{\sum_{n = 1}^{N} a_{i n}^{2}} a_{i j}

(2)

where k denotes the iteration number, i = k mod M + 1, and λ denotes a real relaxation parameter in (0, 2). Note that f⁽⁰⁾ is an initial guess of image f, which is usually initialized to a zero vector.
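As a hedged illustration of Eq. (2), the following Python sketch applies the update ray by ray to a tiny sparse system. The function name `art_sweep`, the sparse-row representation, and the toy data are assumptions made for this example; indices are 0-based here, unlike the 1-based notation of the text.

```python
def art_sweep(rows, p, f, lam):
    """One pass of Eq. (2) over all rays, updating f in place.

    rows : list of sparse rays, each a list of (voxel_index, weight) pairs
    p    : measured projection values, one per ray
    f    : current image estimate (list of floats)
    lam  : relaxation parameter in (0, 2)
    """
    for ray, p_i in zip(rows, p):
        forward = sum(w * f[n] for n, w in ray)   # sum_n a_in * f_n
        norm_sq = sum(w * w for _, w in ray)      # sum_n a_in^2
        if norm_sq == 0.0:
            continue  # the ray does not touch any voxel
        scale = lam * (p_i - forward) / norm_sq
        for n, w in ray:
            f[n] += scale * w
    return f


# Toy system with two voxels and two rays; the exact solution is f = [1, 2].
rows = [[(0, 1.0), (1, 1.0)], [(0, 1.0), (1, -1.0)]]
p = [3.0, -1.0]
f = [0.0, 0.0]
for _ in range(50):
    art_sweep(rows, p, f, lam=0.5)
```

In this toy case the two rays are orthogonal, so each sweep halves the residual of both equations and the estimate approaches the exact solution.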

The ART algorithm is capable of achieving better image quality than traditional analytical methods with incomplete and noisy data. However, for sparse-view projection data, the reconstruction problem is significantly ill-posed, which usually leads to severe artifacts. The compressed sensing (CS) theory makes it possible to reconstruct high-quality images with sparse-view projection data by utilizing TV regularization. The TV-based CT reconstruction method from sparse view projection can be described as:

\min_{f} ‖ f ‖_{TV} \quad \text{subject to} \quad A f = p

(3)

For image f , the three-dimensional TV norm is computed as:

‖ f ‖_{TV} = \sum_{i = 1, j = 1, k = 1}^{I} \sqrt{{(f_{i, j, k} - f_{i - 1, j, k})}^{2} + {(f_{i, j, k} - f_{i, j - 1, k})}^{2} + {(f_{i, j, k} - f_{i, j, k - 1})}^{2}}

(4)

where f_{i,j,k} is the image value at voxel (i,j,k). In this work, the minimization of total variation is solved by the gradient descent method. We have

f^{w, m + 1} = f^{w, m} - α d v / ‖ v ‖_{2}, d = ‖ f^{w} - f^{w - 1, N_{T V}} ‖_{2}

(5)

where w is the iteration number of ART reconstruction, m is that of TV minimization, N_TV is the total iteration number of TV minimization, α is an acceleration factor, v represents the derivative of the TV norm ‖ f ‖_{TV}, and d is the L2 norm of the difference between images f^{w} and f^{w - 1, N_{TV}}.

As discussed above, the ART-TV includes two computation procedures: the ART algorithm is firstly used to reconstruct image f and guarantee the data fidelity, and then the TV regularization is implemented on the reconstructed image. The two procedures are carried out repeatedly and alternately until convergence.
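The TV step of Eqs. (4) and (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names are hypothetical, the small constant `eps` under the square root is added here to keep the gradient finite in flat regions, the sums start at the second voxel along each axis so that every backward difference stays inside the volume, and indices are 0-based.

```python
import math


def tv_norm_and_gradient(f, size, eps=1e-8):
    """Return (TV norm, gradient) of a cubic volume stored as a flat list.

    f    : flat list of size**3 voxel values, index = (k * size + i) * size + j
    eps  : smoothing constant (assumption of this sketch)
    """
    def idx(i, j, k):
        return (k * size + i) * size + j

    tv = 0.0
    grad = [0.0] * len(f)
    for k in range(1, size):
        for i in range(1, size):
            for j in range(1, size):
                c = idx(i, j, k)
                di = f[c] - f[idx(i - 1, j, k)]
                dj = f[c] - f[idx(i, j - 1, k)]
                dk = f[c] - f[idx(i, j, k - 1)]
                t = math.sqrt(di * di + dj * dj + dk * dk + eps)
                tv += t
                # Derivative of t with respect to the four voxels involved.
                grad[c] += (di + dj + dk) / t
                grad[idx(i - 1, j, k)] -= di / t
                grad[idx(i, j - 1, k)] -= dj / t
                grad[idx(i, j, k - 1)] -= dk / t
    return tv, grad


def tv_descent(f, f_prev, size, alpha=0.36, n_tv=20):
    """Apply n_tv normalized gradient steps of Eq. (5) to a copy of f.

    f      : image after the current ART pass
    f_prev : TV-regularized image of the previous iteration
    """
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(f, f_prev)))
    f = list(f)
    for _ in range(n_tv):
        _, v = tv_norm_and_gradient(f, size)
        v_norm = math.sqrt(sum(x * x for x in v))
        if v_norm == 0.0:
            break  # gradient vanished, nothing left to do
        f = [x - alpha * d * g / v_norm for x, g in zip(f, v)]
    return f
```

Scaling each step by d ties the strength of the TV update to the change produced by the preceding ART pass, so the regularization weakens as the data-fidelity step converges.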

Computation of the system matrix using symmetry

As can be seen from (2), the computation and storage of the system matrix elements are crucial for fast ART reconstruction. For cone-beam CT systems, both M and N are extremely large, hence it is infeasible to pre-compute and store the nonzero weight coefficients. In practice, they are usually computed on the fly. Recently, we proposed an efficient method for the computation of 3D Joseph's projection.²⁸ Based on this algorithm, we exploit the geometric symmetry of the cone-beam rays to further improve the reconstruction speed.

For convenience, the one dimensional image index n can also be represented by three indices: i, j, and k, which correspond to the row, column, and layer index of f_n, respectively. Let K be the value of I × I, which represents the total number of voxels within one layer. Thus, n is equal to k × K + i × I + j.

Let S(S_x, S_y, S_z) be the X-ray source point, E(E_x, E_y, E_z) be the center point of a pixel on the 2D detector plane, and (s,t) be the local coordinates of E. The t-axis of the detector is parallel to the z-axis. Suppose |E_x−S_x| is larger than both |E_y−S_y| and |E_z−S_z|. All voxels that have the same column index j are located on one plane, which is parallel to the yoz-plane (referred to as the jth yz-plane). As shown in Figure 2, suppose SE intersects the jth yz-plane at point A, whose coordinates are (x,y,z). Then, the indices of the four neighbouring voxels around A, i.e., (i,j,k), (i,j,k−1), (i−1,j,k−1), and (i−1,j,k), as well as the corresponding interpolation coefficients u and v, can be computed from its coordinates. Let F(s,−t) be the symmetric point of E(s,t) about the s-axis on the detector. Because SO is always perpendicular to the detector, SE and SF are symmetric about the xoy-plane. Then, SF intersects the jth yz-plane at point B, whose coordinates must be (x,y,−z). Thus, by symmetry, the indices of the four neighbouring voxels around point B must be (i,j,h), (i−1,j,h), (i−1,j,h + 1), and (i,j,h + 1), where h = I + 1 − k. In addition, the interpolation coefficients of point B must be u and 1 − v, respectively. Therefore, once we have computed the system matrix elements of ray SE, we can easily obtain those of its symmetric ray SF, which further reduces the computation cost.

Figure 2.

Computation of weight coefficients using symmetry.

Based on the above discussion, we give a simplified description of our method. Let m_xy and m_z be the slopes of the projections of line SE onto the xoy-plane and xoz-plane, respectively, where m_xy= (E_y−S_y)/(E_x−S_x), m_z= (E_z−S_z)/(E_x−S_x). Suppose 0 ≤ m_xy ≤ 1 and m_z ≥ 0. First, we compute the initial four voxel indices as well as their interpolation coefficients u and v. Then, for the next yz-plane, u and v increase by m_xy and m_z, respectively, i.e., u = u + m_xy, v = v + m_z. According to the updated values of u and v, we can determine the four new voxels that contribute to ray SE. Once u or v reaches 1, it is decreased by 1 and the corresponding voxel index is updated. This process is repeated until any of the voxel indices is out of range.

For ray SE, we define an array a to save the one-dimensional voxel indices, and an array b to save the corresponding weight coefficients. For its symmetric ray SF, we only need an array c to save the voxel indices, because the weight coefficients are shared with SE. The total number of computed voxel indices of SE is represented by variable r. Algorithm 1 summarizes the pseudocode of the system matrix computation for two symmetric rays.

Algorithm 1.

System matrix computation for two symmetric rays

1: Δx←E_x−S_x, Δy←E_y−S_y, Δz←E_z−S_z, m_xy←Δy/Δx, m_z←Δz/Δx, L←δ√(Δx² + Δy² + Δz²)/|Δx|, r←0.

2: Compute the initial voxel indices i, j, k, and the corresponding interpolation coefficient u and v, n1←k × K + i × I + j, n2←(I + 1 − k) × K + i × I + j.

3: while i ≥ 2 and j ≤ I and k ≥ 2 do

4: a[r]← n1, b[r]←(1 − u) × (1 − v) × L, c[r]← n2, r←r + 1

5: a[r]← n1 − I, b[r]←u × (1 − v) × L, c[r]← n2 − I, r←r + 1

6: a[r]← n1 − K, b[r]←v × (1 − u) × L, c[r]← n2 + K, r←r + 1

7: a[r]← n1 − K − I, b[r]←u × v × L, c[r]← n2 + K − I, r←r + 1

8: u←u + m_xy, v←v + m_z

9: if v ≥ 1 then v←v−1, k←k−1, n1 ←n1 − K, n2 ←n2 + K end if

10: if u ≥ 1 then u←u−1, i←i−1, n1←n1 − I, n2←n2 − I end if

11: j←j + 1, n1←n1 + 1, n2←n2 + 1

12: end while

13: return r
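Algorithm 1 can be transcribed into Python roughly as follows. This is a sketch under assumptions: the initial indices and interpolation coefficients (step 2) are passed in as arguments because their computation is not detailed here, lists replace the fixed arrays a, b, and c (so r is simply their length), and the helper `decode`, which inverts the index formula n = k × K + i × I + j, is a hypothetical addition used only to inspect the result.

```python
def symmetric_ray_weights(i, j, k, u, v, m_xy, m_z, slab_length, size):
    """Port of Algorithm 1 for a ray with 0 <= m_xy <= 1 and m_z >= 0.

    i, j, k : 1-based row, column and layer index of the first voxel
    u, v    : initial interpolation coefficients in [0, 1)
    size    : number of voxels I along each axis
    Returns (a, b, c): voxel indices of ray SE, the shared weights, and the
    voxel indices of the mirrored ray SF.
    """
    layer = size * size                          # K = I * I
    n1 = k * layer + i * size + j                # index on ray SE
    n2 = (size + 1 - k) * layer + i * size + j   # mirrored index on ray SF
    a, b, c = [], [], []
    while i >= 2 and j <= size and k >= 2:
        a += [n1, n1 - size, n1 - layer, n1 - layer - size]
        c += [n2, n2 - size, n2 + layer, n2 + layer - size]
        b += [(1 - u) * (1 - v) * slab_length,
              u * (1 - v) * slab_length,
              v * (1 - u) * slab_length,
              u * v * slab_length]
        u += m_xy
        v += m_z
        if v >= 1:          # the ray moved down by one layer
            v -= 1
            k -= 1
            n1 -= layer
            n2 += layer
        if u >= 1:          # the ray moved by one row
            u -= 1
            i -= 1
            n1 -= size
            n2 -= size
        j += 1              # advance to the next yz-plane
        n1 += 1
        n2 += 1
    return a, b, c


def decode(n, size):
    """Invert n = k*K + i*I + j for 1-based (i, j, k)."""
    k, rest = divmod(n - size - 1, size * size)
    i, j = divmod(rest, size)
    return i + 1, j + 1, k


# Illustrative values only.
a, b, c = symmetric_ray_weights(i=5, j=1, k=5, u=0.25, v=0.5,
                                m_xy=0.3, m_z=0.2, slab_length=1.1, size=8)
```

For every entry, the row and column indices decoded from a and c coincide, while the two layer indices add up to I + 1, which is the mirror relation h = I + 1 − k stated above.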

It should be noted that, in a practical CT system, the detector center usually does not align with the rotational center and the X-ray focal spot. Moreover, the projection lines do not always intersect exactly at the centers of the detector bins. These misalignments can introduce inaccuracies in the reconstruction of symmetric ray pairs. To ensure that the actual CT data are suitable for our symmetry computation, we can construct a virtual detector with the same size and resolution as the real one, whose center aligns perfectly with both the rotational center and the X-ray focal spot. The corresponding projection data of this idealized virtual detector are generated by applying translation and bilinear interpolation operations to the original projection data according to its horizontal and vertical offsets.
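The resampling onto a virtual detector can be sketched as a sub-pixel translation with bilinear interpolation. The following Python example is illustrative only: the function name, the sign convention of the offsets (given in pixels), and the zero fill outside the real detector are assumptions of this sketch rather than details taken from the procedure above.

```python
import math


def shift_projection(image, offset_col, offset_row):
    """Resample a projection onto a virtual detector shifted by a sub-pixel
    offset, using bilinear interpolation.

    image      : list of rows, each a list of floats
    offset_col : horizontal detector offset in pixels
    offset_row : vertical detector offset in pixels
    Samples that fall outside the real detector are set to 0 (assumption).
    """
    n_rows, n_cols = len(image), len(image[0])

    def pixel(r, c):
        if 0 <= r < n_rows and 0 <= c < n_cols:
            return image[r][c]
        return 0.0

    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            y = r + offset_row          # position on the real detector
            x = c + offset_col
            r0, c0 = math.floor(y), math.floor(x)
            wy, wx = y - r0, x - c0     # bilinear weights
            out[r][c] = ((1 - wy) * (1 - wx) * pixel(r0, c0)
                         + (1 - wy) * wx * pixel(r0, c0 + 1)
                         + wy * (1 - wx) * pixel(r0 + 1, c0)
                         + wy * wx * pixel(r0 + 1, c0 + 1))
    return out
```

With zero offsets the function returns the input unchanged, and a half-pixel horizontal offset averages each pair of horizontally adjacent pixels.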

Multi-thread acceleration of ART

The ART algorithm consists of three nested loops. The outer loop iterates until convergence. The middle loop runs over the ϕ projection views. Although the projection views can be accessed in different orders, this loop is impractical to parallelize because it must be executed sequentially. The inner loop runs over T = N_c × N_r rays to perform the ART update for a given view. Intuitively, we could simply assign T threads to this loop. However, since one voxel may be crossed by several adjacent rays within a projection view, such a method leads to access conflicts when these rays update the same voxel value at the same time. To address this problem, we partition the reconstruction procedure of ART into a set of sub-procedures, each of which is executed by an individual CPU thread. Within one thread, the sub-procedure processes its rays sequentially in the order of the ray indices. Considering the symmetry, the reconstruction procedure is partitioned by evenly dividing half of the detector plane into N_t sub-procedures. Each sub-procedure covers N_r/(2N_t) rows of rays and corresponds to one thread. Therefore, T/(2N_t) rays are processed in one thread. Several parallel programming models, such as Pthreads, OpenMP, and the Win32 API, are available for multithreaded programming on shared-memory multi-core systems.^44–46 Among these models, OpenMP offers full control over parallelization and is particularly designed for loop-based data parallelism. Therefore, we use the OpenMP programming model to accelerate our symmetry-based ART. Algorithm 2 summarizes the pseudocode of the multi-thread accelerated ART.

Algorithm 2.

Multi-thread accelerated ART

1: Initialization: f ⁽⁰⁾ ←0, w←0, ε, λ

2: while ‖ A f - p ‖_{2} > ε do //the outer loop

3: for each k ∈ [1, ϕ] do //the middle loop

4: Load data {p_i } of projection view k

5: #pragma omp parallel sections {

6: #pragma omp section //Begin of sub-procedure 1

7: for each i ∈ [1, T/2N _t ] do //the inner loop

8: Calculate arrays a, b for ray i, and c for its symmetric ray i′.

9: s1←0, s2←0, s3←0

10: for each j ∈ [1, r] do // forward projection

11: s1←s1 + b[j] × f[a[j]], s2←s2 + b[j] × f[c[j]], s3←s3 + b[j] × b[j]

12: end for // r is the number of valid elements in array a

13: for each j ∈ [1, r] do // back projection

14: f[a[j]] += λb[j] × (p_i−s1)/s3, f[c[j]] += λb[j] × (p_i′−s2)/s3

15: end for // end of back projection

16: end for // end of the inner loop and of sub-procedure 1

17: #pragma omp section // begin of sub-procedure 2

18: for each i ∈ [T/2N _t + 1, T/N _t ] do

19: Perform sub-procedure 2 similar to lines 8–15

20: end for

21: …… // begin other sub-procedures

22: #pragma omp section // begin of sub-procedure N_t

23: for each i ∈ [(N _t −1) T/2N _t + 1, T/2] do

24: Perform sub-procedure N_t similar to lines 8–15

25: end for

26: }

27: end for // end of the middle loop

28: end while // end of the outer loop
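The inner part of Algorithm 2 (lines 9–15) and the row partitioning can be sketched in Python as follows. This is a serial illustration, not the OpenMP code: the function names are hypothetical, indices are 0-based, and the partitioning assumes that the number of detector rows is divisible by twice the number of threads.

```python
def update_symmetric_pair(f, a, b, c, p_i, p_i_sym, lam):
    """Update f in place for ray i and its mirrored ray i'.

    a, c : voxel indices of ray i and of its mirrored ray (same length)
    b    : weights shared by both rays
    """
    s1 = s2 = s3 = 0.0
    for idx_a, idx_c, w in zip(a, c, b):   # forward projection
        s1 += w * f[idx_a]
        s2 += w * f[idx_c]
        s3 += w * w                        # denominator shared by both rays
    if s3 == 0.0:
        return
    g1 = lam * (p_i - s1) / s3
    g2 = lam * (p_i_sym - s2) / s3
    for idx_a, idx_c, w in zip(a, c, b):   # back projection
        f[idx_a] += g1 * w
        f[idx_c] += g2 * w


def partition_half_rows(n_rows, n_threads):
    """Split detector rows 1..n_rows/2 into contiguous blocks, one per thread.

    Returns a list of inclusive 1-based (first_row, last_row) pairs.
    """
    per_thread = n_rows // (2 * n_threads)
    return [(t * per_thread + 1, (t + 1) * per_thread)
            for t in range(n_threads)]


# Illustrative values only: one ray pair touching two voxels each.
f = [0.0] * 6
update_symmetric_pair(f, a=[0, 1], b=[0.6, 0.8], c=[4, 5],
                      p_i=2.0, p_i_sym=1.0, lam=1.0)
blocks = partition_half_rows(n_rows=512, n_threads=8)
```

With λ = 1 and disjoint voxel sets, a single update makes both rays reproduce their measured values exactly, which is a convenient sanity check for the shared-denominator formulation.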

GPU acceleration of the TV minimization

In this section, we describe the GPU acceleration of the TV minimization in detail. As introduced above, the program flow of TV minimization is presented in Algorithm 3.

Algorithm 3.

TV minimization

1: a←0.36, d ←0, ε ←10⁻⁸

2: for each i ∈ [1, N] do d ←d + (f1[i]-f[i]) × (f1[i]-f[i]) end for

3: d ← sqrt(d)

4: for each m ∈ [1, N_TV] do

5: c ←0

6: Compute the elements of image v

7: for each j ∈ [1, N] do c ← c + v[j] × v[j] end for

8: c ← sqrt(c)

9: for each j ∈ [1, N] do f[j] ← f[j] – a × d × v[j]/c end for

10: end for

11: for each i ∈ [1, N] do f1[i] ← f[i] end for

In Algorithm 3, array f1[1..N] stores the TV-regularized image of the previous iteration, and array f[1..N] stores the image currently reconstructed by the ART algorithm. The first step (lines 2–3) computes the L2 distance between f1 and f, and is implemented on the CPU. The second step (line 6) computes the elements of v. For this purpose, we allocate two arrays f2[1..N] and f3[1..N] in the GPU's global memory to store the values of array f and the elements of image v, respectively. We then design the first CUDA kernel with I × I threads, each of which runs a loop over k (1 ≤ k ≤ I) to compute the elements of array f3. The second kernel computes the L2 norm of image v (lines 7–8). In this kernel, we specify BN blocks with TN threads per block. Each thread sums the squares of its share of the elements of array f3 and stores the result in its own entry of a shared memory array s[1..TN]. Then, we use a parallel reduction to sum all entries into s[1]. For each block, we store s[1] in a global memory array g[1..BN]. Finally, we transfer the GPU array g to a CPU array q[1..BN]. Using array q, we can easily compute the L2 norm of image v. Algorithm 4 summarizes the pseudocode of the second kernel.

Algorithm 4.

Computation of the L2 norm of image v

1: tid←threadIdx.x + 1, bid←blockIdx.x + 1, offset←TN/2, s[tid]←0

2: for i←(bid − 1) × TN + tid to N do

3: s[tid]←s[tid] + f3[i] × f3[i], i←i + BN × TN

4: end for

5: __syncthreads() // Thread Synchronization

6: while offset > 0 do

7: if tid ≤ offset then s[tid]←s[tid] + s[tid + offset] end if

8: offset←offset / 2

9: __syncthreads()

10: end while

11: if tid = 1 then g[bid]←s[1] end if

12: Transfer array g to CPU array q, norm← 0

13: for i←1 to BN do norm ← norm + q[i] end for

14: norm ← sqrt(norm)

15: return norm
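The two-stage reduction of Algorithm 4 can be emulated serially to check its logic. The Python sketch below is an illustration, not CUDA code: the blocks and threads are simulated with ordinary loops, indices are 0-based, the function name is hypothetical, and the number of threads per block is assumed to be a power of two so that the tree reduction covers every entry.

```python
import math


def block_reduce_norm(values, num_blocks, threads_per_block):
    """Serial emulation of the two-stage L2-norm reduction.

    values            : flat list playing the role of array f3
    num_blocks        : BN
    threads_per_block : TN, assumed to be a power of two
    """
    n = len(values)
    stride = num_blocks * threads_per_block
    partial = []                                   # plays the role of g
    for bid in range(num_blocks):
        shared = [0.0] * threads_per_block         # plays the role of s
        for tid in range(threads_per_block):
            i = bid * threads_per_block + tid
            while i < n:                           # grid-stride loop
                shared[tid] += values[i] * values[i]
                i += stride
        offset = threads_per_block // 2
        while offset > 0:                          # tree reduction
            for tid in range(offset):
                shared[tid] += shared[tid + offset]
            offset //= 2
        partial.append(shared[0])
    return math.sqrt(sum(partial))                 # final sum on the host


# Illustrative input: the integers 1..100 as floats.
values = [float(x) for x in range(1, 101)]
norm = block_reduce_norm(values, num_blocks=4, threads_per_block=8)
```

Each simulated thread accumulates into its own shared entry before the tree reduction, which is the property that avoids write conflicts inside a block.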

The final kernel updates image f2 (line 9 in Algorithm 3). Similar to the first kernel, we specify I × I threads, and each thread runs a loop of I iterations to update image f2 according to (5). After the third kernel has finished, the updated image f2 is transferred to image f. Algorithm 5 summarizes the pseudocode for the acceleration of TV minimization with GPU.

Algorithm 5.

Acceleration of TV minimization with GPU

1: a←0.36, d ←0, ε ←10⁻⁸

2: for each i ∈ [1, N] do

3: d ← d + (f1[i]-f[i]) × (f1[i]-f[i])

4: end for, d ← sqrt(d)

5: Transfer data from CPU array f to GPU array f2

6: for each m ∈ [1, N_TV] do

7: Launch the first kernel and store the results to GPU array f3

8: Obtain the L2 norm c of array f3 using Algorithm 4

9: Launch the third kernel and update array f2

10: end for

11: Transfer data from GPU array f2 to CPU array f

12: Transfer data from f to CPU array f1

Experiment and results

The code of the ART-TV algorithm was written in ANSI C and compiled with Visual Studio 2013. OpenMP 3.0 and the CUDA 8.0 runtime API were utilized for multi-threading and GPU acceleration, respectively. We evaluated the performance of our method on a workstation equipped with an Intel 10-core, 3.30 GHz CPU. The CUDA code for GPU acceleration was executed on a GeForce GTX 1080 Ti GPU with 11 GB of on-board memory.

Accuracy of Joseph's projection based ART-TV

We simulated a cone-beam flat-detector CT system. The source-to-origin distance was 780 mm, and the source-to-detector distance was 1040 mm. The flat-detector had N_c × N_r= 512 × 512 detector bins, with a bin size of 0.256 mm × 0.256 mm. We used a three-dimensional extension of the Shepp-Logan phantom (SLP) for the following evaluations.⁴⁷ To assess our method with different levels of sparsity of the projections, the view number ϕ was set to 45, 60, 90, and 180 during the experiments.

To evaluate the effectiveness of the proposed method, the 3D SLP was reconstructed using three methods: traditional Siddon's projection-based ART and ART-TV, and our proposed Joseph's projection-based ART-TV (named ART-TV-J). The reconstructed images were 512 × 512 × 512 voxels with a spacing of 0.192 mm × 0.192 mm × 0.192 mm. In the reconstruction procedure, the relaxation factor λ, the acceleration factor a, and N_TV were set to 0.2, 0.36, and 20, respectively. For estimating the accuracy of the reconstruction, we used the measure of normalized root mean square (NRMS) in our experiments. The NRMS error is defined as:

NRMS = \sqrt{\sum_{i = 1}^{N} {[f_{i} - r_{i}]}^{2} / \sum_{i = 1}^{N} {[f_{i} - f_{0}]}^{2}}

(6)

where f is the original image voxelized from the 3D SLP, f₀ is the mean value of all voxel values of image f, r is the reconstructed image.
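Eq. (6) translates directly into a few lines of Python. The function name `nrms` and the flat-list representation of the images are assumptions of this sketch; a constant reference image would make the denominator zero and is not handled here.

```python
import math


def nrms(reference, reconstruction):
    """Normalized root mean square error of Eq. (6) for flat voxel lists.

    reference      : original image f
    reconstruction : reconstructed image r
    """
    mean = sum(reference) / len(reference)            # f_0
    num = sum((f - r) ** 2 for f, r in zip(reference, reconstruction))
    den = sum((f - mean) ** 2 for f in reference)
    return math.sqrt(num / den)


# Illustrative values only.
reference = [0.0, 1.0, 2.0, 3.0]
perfect = nrms(reference, reference)
flat = nrms(reference, [1.5] * 4)
```

A perfect reconstruction gives an NRMS error of 0, whereas an image that is uniformly equal to the mean of the reference gives 1, so values well below 1 indicate that the reconstruction captures the structure of the phantom.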

Figure 3 presents plots of the NRMS errors versus iteration numbers, where the NRMS errors were computed on the 196th transversal slices obtained by the three methods from different numbers of views. As illustrated in Figure 3, all three methods gradually converge as the number of iterations increases; however, our ART-TV-J algorithm converges much faster than the traditional ART and ART-TV algorithms. Furthermore, for 45 projection views, the NRMS errors of ART-TV-J are much lower than those of ART and ART-TV. These results indicate that ART-TV-J can significantly improve reconstruction quality compared to the traditional ART-TV algorithm, particularly for sparse-view reconstructions.

Figure 3.

Plots of the NRMS errors versus iteration numbers using ART, ART-TV, and ART-TV-J from different views. (a) 45 views; (b) 60 views; (c) 90 views; (d) 180 views.

Figure 4 shows the reconstructed SLP images of the 196th transversal slices using the three methods after 10 iterations. For reconstructions from 45 projection views, ART exhibits obvious artifacts, whereas both ART-TV and ART-TV-J produce more satisfactory results. To better compare these methods, we selected a region of interest (ROI) within each reconstructed image. The ROI includes part of the edge of the SLP and is marked by a small red rectangle. The corresponding magnified ROI is indicated by a larger red rectangle within the same image. As can be seen from the magnified ROIs, the edges reconstructed by ART-TV-J are clearer than those produced by ART-TV and ART. Furthermore, the reconstructions of ART-TV-J appear slightly better than those of traditional ART-TV, especially for sparse-view reconstruction. However, for larger numbers of projection views, little difference can be observed between the reconstructions of ART-TV and ART-TV-J.

Figure 4.

Reconstruction results of the 3D SLP. The first to third rows show the reconstructions using ART, ART-TV, and ART-TV-J, respectively. The first to fourth columns show the reconstructions from 45, 60, 90, and 180 projection views, respectively.

Efficiency of the symmetry-based ART-TV-J

To evaluate our symmetry-based system matrix computation for Joseph's projection, we used it within the ART-TV-J algorithm (the resulting method is named ART-TV-symJ), and compared it with ART-TV-J. For the implementation of ART-TV-J, we used our previously proposed Joseph's projection method.²⁸ We reconstructed the 3D SLP from 60 projections with 10 iterations. Table 1 lists the average reconstruction time for one iteration of the two methods. As shown in Table 1, the computation time for the system matrix of Joseph's projection is reduced from about 66 s to about 39 s by using the symmetry. In the forward projection, the loop over the ray rows only runs from 1 to N_r/2, because the forward projection of two symmetric rays is performed simultaneously in the same loop iteration. Consequently, the computation time of the forward projection is reduced by about 21 s. Since the denominator in (2) needs to be computed only once for two symmetric rays, the back projection procedure is also faster. Overall, the reconstruction time of ART-TV-J is reduced by about 56 s per iteration by using our method.

Table 1.

Reconstruction time for one iteration using two methods (s).

Methods	System matrix	Forward projection	Back projection	TV minimization	Total time
ART-TV-J	66.241	92.337	89.861	140.562	389.001
ART-TV-symJ	39.045	71.078	82.154	140.562	332.839
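
The symmetry argument can be illustrated with a minimal Python sketch. It assumes that a ray and its mirror ray about the central plane cross voxels with the same (y, x) indices and mirrored z indices, with identical weights, and that the denominator of the ART update is the usual sum of squared weights of one system-matrix row. The function names, the flattened (z, y, x) index layout, and the random test ray are hypothetical and are not taken from our implementation.

```python
import numpy as np


def mirror_indices(flat_idx, shape):
    """Mirror flattened (z, y, x) voxel indices about the central z-plane."""
    nz = shape[0]
    z, y, x = np.unravel_index(flat_idx, shape)
    return np.ravel_multi_index((nz - 1 - z, y, x), shape)


def project_symmetric_pair(volume, flat_idx, weights):
    """Forward-project one ray and its mirror ray with a single weight set.

    flat_idx and weights are hypothetical inputs standing in for one row of
    the system matrix (the voxels a ray crosses and their weights).
    """
    flat = volume.ravel()
    mirrored = mirror_indices(flat_idx, volume.shape)
    p_ray = float(np.dot(weights, flat[flat_idx]))
    p_mirror = float(np.dot(weights, flat[mirrored]))
    # Sum of squared weights: assumed ART normalisation term, shared by both rays.
    denom = float(np.dot(weights, weights))
    return p_ray, p_mirror, denom


# Toy data: a small random volume and one made-up ray.
rng = np.random.default_rng(0)
volume = rng.random((8, 8, 8))
flat_idx = rng.choice(volume.size, size=12, replace=False)
weights = rng.random(12)
p_ray, p_mirror, denom = project_symmetric_pair(volume, flat_idx, weights)
print(p_ray, p_mirror, denom)
```

In this sketch the weights are computed once and reused for both rays, which mirrors the saving reported for the system matrix, and the shared denominator corresponds to the saving in the back projection.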

Table 2 lists the NRMS errors of the reconstructed transversal slices using the two methods after 10 iterations. As shown in Table 2, the NRMS values of the 196^th slice reconstructed by the two methods are identical, whereas slight differences appear for the 257^th and 316^th slices. The reason is that the reconstruction values of ART depend on the order in which the projection rays are accessed. In our method, the projection rays are accessed according to their row indices on the detector plane. For the slices above the xoy-plane, such as the 196^th slice, both ART-TV-J and ART-TV-symJ access the projection rays from row 1 to N_r/2, hence these slices have identical reconstruction values. However, for the slices below the xoy-plane, such as the 257^th and 316^th slices, ART-TV-J accesses the projection rays from row N_r/2 + 1 to N_r, while ART-TV-symJ accesses them from row N_r to N_r/2 + 1. Therefore, very small differences exist between the slices reconstructed by the two methods below the xoy-plane. The results show that our symmetry-based projection computation method is accurate.

Table 2.

NRMS errors of the two methods.

Methods	196^th slices	257^th slices	316^th slices
ART-TV-J	0.13562	0.16365	0.15066
ART-TV-symJ	0.13562	0.16390	0.15075
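
The dependence of ART on the ray access order can be reproduced on a toy problem. The following Python sketch runs the row-action (Kaczmarz) form of ART on a small random, consistent linear system, once sweeping the rows in ascending order and once in descending order. The system size, the relaxation factor of 1, and the function name are assumptions made only for illustration; they do not correspond to the cone-beam system used in our experiments.

```python
import numpy as np


def art_sweeps(A, p, order, n_sweeps, relax=1.0):
    """Row-action ART: visit the rows of A in the given order, n_sweeps times."""
    x = np.zeros(A.shape[1])
    for _ in range(n_sweeps):
        for i in order:
            a = A[i]
            # Project x onto the hyperplane a . x = p[i].
            x += relax * (p[i] - a @ x) / (a @ a) * a
    return x


rng = np.random.default_rng(0)
A = rng.random((8, 5))          # toy system matrix (8 rays, 5 unknowns)
x_true = rng.random(5)
p = A @ x_true                  # noise-free projections

ascending = art_sweeps(A, p, range(8), n_sweeps=10)
descending = art_sweeps(A, p, range(7, -1, -1), n_sweeps=10)

gap = np.linalg.norm(ascending - descending)
print("difference between the two orders:", gap)
print("error (ascending): ", np.linalg.norm(ascending - x_true))
print("error (descending):", np.linalg.norm(descending - x_true))
```

After a finite number of sweeps the two orders give slightly different estimates, although both move toward the same solution, which is consistent with the small differences observed in Table 2.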

Figure 5 shows the reconstructed images and corresponding profile plots of the 196^th, 257^th, and 316^th transversal slices using the two methods after 10 iterations. Visually, the images reconstructed by ART-TV-J and ART-TV-symJ are nearly indistinguishable. As shown in the profile plots, the profiles of the two methods almost coincide with each other. These results further indicate the accuracy of ART-TV-symJ.

Figure 5.

Reconstructions of the 3D SLP and the profile plots across the lines indicated in the reconstructed slices. From top to bottom: the 196^th, 257^th, and 316^th transversal slices. The first and second columns show the reconstructions using ART-TV-J and ART-TV-symJ, the third column shows the corresponding profile plots.

Comparisons with other TV-based methods

To further evaluate the effectiveness of our ART-TV-symJ, we conducted more detailed comparisons with other TV-based reconstruction methods under very sparse view conditions. For this evaluation, we reconstructed the 3D SLP from 45 projections using the EM-TV algorithm, the awHOTV constrained ART (ART-awHOTV), the awTV constrained ART (ART-awTV), and ART-TV-symJ, respectively. It should be noted that, for the other three methods, Siddon's projection was applied to compute the system matrix. Figure 6 displays the original phantom and the images reconstructed by the different methods with 10 iterations. The chosen ROI contains an ellipse and is indicated by a small red rectangle. As shown in Figure 6(b), the reconstruction of EM-TV is very blurry because of the slow convergence of the EM algorithm, making it difficult to discern the ellipse in either the ROI or its magnified view. Figure 6(c) demonstrates that the reconstruction of ART-awHOTV is somewhat clearer than that of EM-TV. However, streak artifacts can be observed in the result. This is because the SLP is piecewise constant, while awHOTV is better suited for phantoms with fluctuating features. As can be seen from Figure 6(d) and (e), particularly in their magnified ROIs, the reconstruction of ART-TV-symJ is slightly better than that of ART-awTV. Overall, the reconstruction quality of ART-TV-symJ surpasses that of EM-TV, ART-awHOTV, and ART-awTV.

Figure 6.

Original phantom (a) and reconstructed images of the 219^th transversal slice of the 3D SLP using different methods: (b) EM-TV, (c) ART-awHOTV, (d) ART-awTV, and (e) ART-TV-symJ.

For a more intuitive comparison, we plotted the profiles of the reconstructed images in Figure 7, where (a) corresponds to images from Figure 6, and (b) shows the local magnified profiles within the red rectangular region of (a). As shown in Figure 7(b), it is clear that the profile of ART-TV-symJ is the closest to that of the original phantom among these algorithms. Additionally, jagged edges are noticeable in the profile of ART-awHOTV.

Figure 7.

Profile comparison of the images in Figure 6. (a) Profiles along the central vertical lines in Figure 6; (b) the magnified profiles within the red rectangular region of (a).

In addition to visual comparisons, the reconstruction quality is quantitatively evaluated using the NRMS measure. Table 3 lists the NRMS errors of the reconstructed images for different numbers of iterations. It is clear that the NRMS errors of ART-TV-symJ are the lowest across all iteration numbers. After ten iterations, the NRMS error of our method is reduced by 19.9% compared to ART-awTV, and by 55.2% compared to the EM-TV algorithm. These results further verify the accuracy of our ART-TV-symJ algorithm.

Table 3.

NRMS errors of different methods for different iterations.

Number of iterations	1	2	3	4	5	6	7	8	9	10
EM-TV	0.82552	0.68765	0.60326	0.54581	0.50197	0.46616	0.43547	0.40841	0.38451	0.36358
ART-awTV	0.50414	0.38271	0.33504	0.30398	0.27960	0.25943	0.24228	0.22751	0.21464	0.20335
ART-awHOTV	0.44691	0.36092	0.31933	0.29234	0.27255	0.25709	0.24451	0.23401	0.22507	0.21737
ART-TV-symJ	0.42262	0.33007	0.28545	0.25379	0.22950	0.21028	0.19480	0.18213	0.17164	0.16287
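
The relative reductions quoted in the text can be checked directly from the last column of Table 3. The short Python sketch below does this arithmetic; the dictionary and function names are chosen here for illustration only.

```python
# NRMS errors after 10 iterations, copied from the last column of Table 3.
nrms_after_10 = {
    "EM-TV": 0.36358,
    "ART-awTV": 0.20335,
    "ART-awHOTV": 0.21737,
    "ART-TV-symJ": 0.16287,
}


def relative_reduction(ours, other):
    """Percentage by which `ours` is lower than `other`."""
    return 100.0 * (other - ours) / other


ours = nrms_after_10["ART-TV-symJ"]
reductions = {
    name: relative_reduction(ours, value)
    for name, value in nrms_after_10.items()
    if name != "ART-TV-symJ"
}
for name, pct in reductions.items():
    print(f"ART-TV-symJ vs {name}: {pct:.1f}% lower NRMS")
```

The same function applied to the ART-awHOTV entry gives the corresponding figure for that method, which is not quoted in the text.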

To evaluate the efficiency of ART-TV-symJ, we compared the time performance of the above methods. Since most TV-based methods involve two primary operations, iterative reconstruction and TV minimization, we measured the execution time of these two operations separately. Because the EM algorithm updates each voxel by considering all intersecting rays, the reconstruction process using EM is particularly time-consuming. To speed up this process, we employed two look-up tables to store the relevant correction terms for each voxel. Despite this acceleration, the EM implementation remains computationally intensive. Table 4 presents the time performance of the various methods. The reconstruction process of ART-TV-symJ is 3.2 times faster than that of EM-TV. For more complex TV models, the computational cost is significantly higher than that of traditional TV minimization. Consequently, for a single iteration, ART-TV-symJ is about 2.5 times faster than ART-awTV and 3 times faster than ART-awHOTV in total time. These results further demonstrate the superior efficiency of ART-TV-symJ.

Table 4.

Time performance comparisons of a single iteration (s).

	EM-TV	ART-awTV	ART-awHOTV	ART-TV-symJ
Reconstruction	469.186	429.564	429.566	144.659
TV minimization	140.562	277.934	427.309	140.562
Total	609.748	707.498	856.875	285.221

Efficiency and accuracy of the parallel scheme

To evaluate our parallel reconstruction method, we used ART-TV-J and ART-TV-symJ to reconstruct the 3D SLP from 60 projections with 10 iterations. The ART algorithm was executed on the CPU using varying numbers of threads, while the TV minimization was implemented on the GPU using our designed CUDA kernels. Given the number of cores in our test CPU, the number of threads was capped at 10 in our experiments. For the first and third kernels, the block size was set to 16 × 16 and the grid size to 32 × 32. For the second kernel, the block size was set to 1024 and the grid size to 512.
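
As a quick sanity check on these launch configurations, the number of threads started by each kernel can be computed as the product of the grid and block dimensions. The Python sketch below does this and compares the result with a 512 × 512 × 512 volume. Both the volume size and the reading that each thread loops over several voxels are assumptions made here; the work distribution inside the kernels is not described in this section.

```python
from math import prod


def total_threads(grid, block):
    """Number of CUDA threads launched for the given grid and block dimensions."""
    return prod(grid) * prod(block)


volume_voxels = 512 ** 3  # assumed volume size

kernels = {
    "kernel 1 and 3": total_threads(grid=(32, 32), block=(16, 16)),
    "kernel 2": total_threads(grid=(512,), block=(1024,)),
}

for name, n_threads in kernels.items():
    # Voxels per thread if the whole volume were split evenly across threads.
    print(name, n_threads, "threads,", volume_voxels // n_threads, "voxels per thread")
```

Under these assumptions, the first and third kernels launch exactly one thread per pixel of a 512 × 512 slice, and the second kernel launches twice as many.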

Table 5 lists the average reconstruction time for one iteration of the two methods. In the single-threaded case, only the TV minimization procedure is accelerated. As can be seen from Tables 1 and 5, GPU acceleration reduces the time of the TV operation from about 140 s to 2.7 s, a speedup of about 51 times over the single-threaded CPU implementation. When using two threads, the reconstruction time for ART is improved by about 1.9 times. The more threads are used, the shorter the reconstruction time. With ten threads, ART-TV-symJ is approximately 14 times faster than ART-TV-J implemented on the CPU (26.212 s versus 389.001 s per iteration). The results demonstrate that the proposed multi-threaded and GPU acceleration method is highly efficient.

Table 5.

Reconstruction time for a single iteration with different numbers of threads (s).

Number of threads	1	2	3	4	5	6	7	8	9	10
ART-TV-J	251.735	130.186	89.772	68.779	56.267	47.927	42.518	37.362	34.662	32.741
ART-TV-symJ	195.021	102.503	71.337	55.143	45.427	39.356	34.515	30.146	28.281	26.212
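
The scaling behaviour in Table 5 can be summarised by the speedup relative to one thread and by the parallel efficiency (speedup divided by the number of threads). The Python sketch below computes both from the tabulated times, together with the overall speedup of ten-thread ART-TV-symJ over the CPU-only ART-TV-J total from Table 1. Parallel efficiency is a standard measure introduced here for illustration and is not reported in our experiments; note also that each time includes the GPU-based TV step, which does not scale with the number of CPU threads.

```python
threads = list(range(1, 11))

# Per-iteration times in seconds, copied from Table 5.
time_j = [251.735, 130.186, 89.772, 68.779, 56.267,
          47.927, 42.518, 37.362, 34.662, 32.741]
time_symj = [195.021, 102.503, 71.337, 55.143, 45.427,
             39.356, 34.515, 30.146, 28.281, 26.212]

cpu_total_j = 389.001  # single-threaded CPU total for ART-TV-J, from Table 1


def speedups(times):
    """Speedup of each entry relative to the single-thread time."""
    return [times[0] / t for t in times]


def efficiencies(times):
    """Parallel efficiency: speedup divided by the number of threads."""
    return [s / n for s, n in zip(speedups(times), threads)]


overall = cpu_total_j / time_symj[-1]

for n, s, e in zip(threads, speedups(time_symj), efficiencies(time_symj)):
    print(f"{n:2d} threads: speedup {s:.2f}, efficiency {e:.2f}")
print(f"ART-TV-symJ (10 threads) vs CPU ART-TV-J: {overall:.1f}x")
```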

Table 6 lists the NRMS errors of the 196^th transversal slices achieved by ART-TV-J and ART-TV-symJ. The iteration number was set to 10, and the number of threads varied from 1 to 10. As can be seen from Table 6, the accelerated ART-TV-J and ART-TV-symJ methods maintain the accuracy of reconstruction, which further demonstrates the effectiveness of our hybrid parallel scheme.

Table 6.

NRMS errors with different numbers of threads.

Number of threads	1	2	3	4	5	6	7	8	9	10
ART-TV-J	0.13562	0.13562	0.13562	0.13562	0.13552	0.13562	0.13562	0.13562	0.13562	0.13552
ART-TV-symJ	0.13562	0.13562	0.13562	0.13562	0.13552	0.13562	0.13560	0.13562	0.135501	0.13551

Figure 8 shows the reconstructed 196^th transversal slices using ART-TV-J and ART-TV-symJ. As shown in Figure 8, the reconstructed images are almost identical when accelerated using different numbers of threads. The results indicate that our method, which combines multi-threading with GPU acceleration for the ART-TV-symJ, is highly accurate.

Figure 8.

Reconstructed images of the SLP by the two methods using multi-threading and GPU acceleration. The first to fourth columns show the reconstructions using 1, 4, 8, and 10 threads, respectively. The first (second) row shows the results achieved by the ART-TV-J (ART-TV-symJ) algorithm.

Real data study

Since the efficiency of ART-TV-symJ has been assessed in detail, in this sub-section we focus on evaluating its accuracy on real CT projection data. The projections were obtained by scanning a physical phantom on a real cone-beam CT scanner. The phantom is a ceramic teapot, which consists almost entirely of a single material. The source-to-origin distance was 1080 mm, and the source-to-detector distance was 1950 mm. The flat detector had a resolution of 2048 × 2048 bins with a pixel size of 0.2 mm × 0.2 mm. The reconstructed volume was 512 × 512 × 512 with a voxel size of 0.443 mm × 0.443 mm × 0.443 mm. A total of 600 projections were acquired with equal angular spacing over a 360° rotation. For comparison, we reconstructed the teapot using the FDK, ART-TV, and ART-TV-symJ methods. For the traditional ART-TV method, Siddon's forward projection was used. The FDK algorithm was implemented using all projections, and its reconstruction results are taken as the reference for comparison. Both ART-TV and ART-TV-symJ were implemented using a subset of 40 equally spaced projections with 20 iterations.

Figure 9 presents the reconstructed images of the teapot using these methods. Specifically, it displays the 246^th transversal slice and the 210^th coronal slice. As shown in Figure 9, the reconstructed images from these methods appear virtually identical, indicating that both ART-TV and ART-TV-symJ are highly effective in reconstructing single-material objects from very sparse view projections. For further comparison, we plotted the profiles (marked by red lines in Figure 9) in Figure 10. As depicted in Figure 10(a) and (c), the profiles of ART-TV and ART-TV-symJ are nearly coincident. However, as shown in Figure 10(b) and (d), the profiles of ART-TV-symJ exhibit greater consistency with those of the FDK method.

Figure 9.

Comparison of reconstruction results of the ceramic teapot. The first (second) row shows the 246^th transversal (210^th coronal) slices. The first to third columns show the images reconstructed using FDK, ART-TV, and our multi-threaded and GPU-accelerated ART-TV-symJ.

Figure 10.

Profile comparison of the images in Figure 9. (a) Profiles along the 200^th horizontal lines of Figure 9(a), (b), and (c); (b) the magnified profiles within the blue rectangular region of (a); (c) profiles along the 360^th horizontal lines of Figure 9(d), (e), and (f); (d) the magnified profiles within the blue rectangular region of (c).

The teapot phantom has a relatively simple surface structure and material composition. To evaluate our method on a more complex object, we scanned a metal bottle using 1080 equally spaced projections on the same CT scanner. The metal bottle has intricate design elements, including two dragon-shaped ears and carved texture patterns on its surface. Additionally, it is composed of a variety of materials. The source-to-origin distance was 1949 mm, and the source-to-detector distance was 2221 mm. The reconstructed volume was 512 × 512 × 512 with a voxel size of 0.7024 mm³. We reconstructed the metal bottle phantom using the three aforementioned methods. For FDK, all 1080 projections were utilized. In contrast, the ART-TV and ART-TV-symJ reconstructions used only 40 equally spaced projections, each with 20 iterations.
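
The scan parameters of the two real-data experiments can be cross-checked with a few lines of arithmetic. The Python sketch below computes the geometric magnification, the voxel edge length obtained when the detector width is demagnified to the rotation axis and divided among 512 voxels, and the sampling step of the 40-view subsets. That rule for choosing the voxel size is an assumption made here, as is the reuse of the 2048-bin, 0.2 mm detector settings for the metal bottle; the function and key names are hypothetical.

```python
def scan_summary(sod_mm, sdd_mm, det_bins, det_pixel_mm, n_vox, n_total, n_used):
    """Derive magnification, voxel edge, and view sampling for a circular scan."""
    magnification = sdd_mm / sod_mm
    # Detector width demagnified to the rotation axis, split across the voxel grid.
    fov_mm = det_bins * det_pixel_mm / magnification
    return {
        "magnification": magnification,
        "voxel_mm": fov_mm / n_vox,
        "view_step": n_total // n_used,      # keep every view_step-th projection
        "angle_step_deg": 360.0 / n_used,    # angular spacing of the kept views
    }


teapot = scan_summary(1080.0, 1950.0, 2048, 0.2, 512, 600, 40)
# Assumption: the bottle scan reused the 2048-bin, 0.2 mm detector settings.
bottle = scan_summary(1949.0, 2221.0, 2048, 0.2, 512, 1080, 40)

for name, s in (("teapot", teapot), ("bottle", bottle)):
    print(name, {k: round(v, 4) for k, v in s.items()})
```

For the teapot, the voxel edge obtained this way rounds to the stated 0.443 mm, and the 40-view subsets correspond to one view every 9° in both scans.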

Figure 11 displays the reconstructed 246^th transversal and 235^th coronal slices of the metal bottle, with corresponding profiles plotted in Figure 12. As shown in Figure 11, the contours in the images reconstructed using FDK are clearer than those obtained using ART-TV and ART-TV-symJ. In addition, the reconstructions from ART-TV and ART-TV-symJ are smoother and contain several artifacts. We believe that increasing the number of iterations would further improve the reconstruction quality of both ART-TV and ART-TV-symJ. As illustrated in the magnified profiles of Figure 12(b) and (d), the reconstruction quality of ART-TV-symJ is superior to that of traditional ART-TV. These results further demonstrate that our ART-TV-symJ method remains effective for more complex phantoms with real CT projection data.

Figure 11.

Comparison of reconstruction results of the metal bottle. The first (second) row shows the 256^th transversal (235^th coronal) slices. The first to third columns show the images reconstructed using FDK, ART-TV, and ART-TV-symJ.

Figure 12.

Profile comparison of the images in Figure 11. (a) Profiles along the 232^nd horizontal lines of Figure 11(a), (b), and (c); (b) magnified profiles within the blue rectangular region of (a); (c) profiles along the middle horizontal lines of Figure 11(d), (e), and (f); (d) magnified profiles within the blue rectangular region of (c).

Conclusion

Improving the reconstruction speed has always been a research focus in the field of CT. We have presented an efficient and high-quality reconstruction scheme for cone-beam sparse-view CT based on the traditional ART-TV algorithm. Existing ART-TV methods usually use Siddon's projection to implement ART. By using Joseph's projection instead, higher-quality reconstructions can be obtained for ART-TV. However, the ART-TV method is extremely time-consuming, particularly for cone-beam CT systems. For this reason, we proposed an efficient method to compute the system matrix for Joseph's projection. Our method exploits the geometric symmetry of the rays in cone-beam geometry: once the system matrix elements for one ray have been computed, the corresponding elements for its symmetric ray can be obtained easily. On this basis, we presented a hybrid parallelization scheme for fast implementation of ART-TV-symJ, which involves both multi-threaded and GPU computations. The experimental results indicate that our method is efficient. We believe that the proposed scheme could be further applied to the modelling of cultural relic objects with very sparse view projections.

Note that our method employs bilinear interpolation to generate the virtual projection data. This process may sacrifice the spatial resolution of the projection data and, to some extent, compromise the overall reconstruction quality. Moreover, the speedup of our method depends heavily on the number of physical CPU cores, which limits the acceleration of ART-TV-symJ. Compared with ART, the SART algorithm has good parallelism and is therefore well suited to GPU acceleration. Hence, our future work will study total variation regularized cone-beam SART reconstruction.

Footnotes

ORCID iD

Shunli Zhang

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (Grant Number 61772421).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Reh, Gusenbauer, Kastner, et al. MObjects–A novel method for the visualization and interactive exploration of defects in industrial XCT data. IEEE Trans Vis Comput Gr 2013; 19: 2906–2915.
2. Fan, Sun. Detecting and evaluation of fatigue damage in concrete with industrial computed tomography technology. Constr Build Mater 2019; 223: 794–805.
3. Brancaccio, Bettuzzi, Casali, et al. Real-time reconstruction for 3-D CT applied to large objects of cultural heritage. IEEE Trans Nucl Sci 2011; 58: 1864–1871.
4. Bossema, Coban, Kostenko, et al. Integrating expert feedback on the spot in a time-efficient explorative CT scanning workflow for cultural heritage objects. J Cult Herit 2021; 49: 38–47.
5. Dambrogio, Ghassaei, Smith, et al. Unlocking history through automated virtual unfolding of sealed documents imaged by X-ray microtomography. Nat Commun 2021; 12: 1184.
6. Bossema, Palenstijn, Heginbotham, et al. Enabling 3D CT-scanning of cultural heritage objects using only in-house 2D X-ray equipment in museums. Nat Commun 2024; 15: 3939.
7. Feldkamp, Davis, Kress. Practical cone-beam algorithm. J Opt Soc Am A 1984; 1: 612–619.
8. Niu, Gao, Bian, et al. Sparse-view x-ray CT reconstruction via total generalized variation regularization. Phys Med Biol 2014; 59: 2997.
9. Kim, Worstell, et al. Sparse-view spectral CT reconstruction using spectral patch-based low-rank penalty. IEEE Trans Med Imag 2015; 34: 748–760.
10. Han. Framing U-Net via deep convolutional framelets: application to sparse-view CT. IEEE Trans Med Imag 2018; 37: 1418–1429.
11. Cui, Yang, et al. Generalized deep iterative reconstruction for sparse-view CT imaging. Phys Med Biol 2022; 67: 025005.
12. Shu, Entezari. Sparse-view and limited-angle CT reconstruction with untrained networks and deep image prior. Comput Meth Prog Bio 2022; 226: 107167.
13. Gordon, Bender, Herman. Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography. J Theor Biol 1970; 29: 471–481.
14. Andersen, Kak. Simultaneous algebraic reconstruction technique (SART): a superior implementation of the ART algorithm. Ultrasonic Imaging 1984; 6: 81–94.
15. et al. Total variation based iterative image reconstruction. Lecture Notes in Computer Science 2005; 3765: 526–534.
16. Candes, Romberg, Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inform Theory 2006; 52: 489–509.
17. Sidky, Kao, Pan. Accurate image reconstruction from few-views and limited-angle data in divergent-beam CT. J X-ray Sci Technol 2006; 14: 119–139.
18. Sidky, Pan. Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization. Phys Med Biol 2008; 53: 4777–4807.
19. Hsieh, Huang, Hsieh, et al. Compressed sensing based CT reconstruction algorithm combined with modified Canny edge detection. Phys Med Biol 2018; 63: 155011.
20. A fast-iterative reconstruction algorithm for sparse angle CT based on compressed sensing. Future Gener Comp Sy 2022; 126: 289–294.
21. Liu, Fan, et al. Adaptive-weighted total variation minimization for sparse data toward low-dose X-ray computed tomography image reconstruction. Phys Med Biol 2012; 57: 7923–7956.
22. Chen, Jin, et al. A limited-angle CT reconstruction method based on anisotropic TV minimization. Phys Med Biol 2013; 58: 2119–2141.
23. Song, Wang, Liu. A nonlinear weighted anisotropic total variation regularization for electrical impedance tomography. IEEE Trans Instrum Meas 2022; 71: 1–13.
24. Zhou, et al. Adaptive-weighted high order TV algorithm for sparse-view CT reconstruction. Med Phys 2023; 50: 5568–5584.
25. Ertas, Yildirim, Kamasak, et al. Iterative image reconstruction using non-local means with total variation from insufficient projection data. J X-ray Sci Technol 2016; 24: 1–8.
26. Zhang, Qiang. Fast parallel implementation for total variation constrained algebraic reconstruction technique. J X-ray Sci Technol 2022; 30: 737–750.
27. Zhang, Yue, Chen, et al. Industrial computed tomography for three-dimensional cultural relic model reconstruction based on L1-αL2 + TV norm minimization. Measurement 2024; 225: 114057.
28. Zhang, Tuo, et al. Fast algorithm for Joseph’s forward projection in iterative computed tomography reconstruction. J Amb Intel Hum Comp 2023; 14: 12535–12548.
29. LeCun, Bengio, Hinton. Deep learning. Nature 2015; 521: 436–444.
30. Gharbi, Adams, et al. Differentiable programming for image processing and deep learning in Halide. ACM Trans Graphics 2018; 37: 1–13.
31. Hatt, Parmar, et al. Machine (deep) learning methods for image processing and radiomics. IEEE Trans Radiat Plasma 2019; 3: 104–108.
32. Monga, Eldar. Algorithm unrolling: interpretable, efficient deep learning for signal and image processing. IEEE Signal Proc Mag 2021; 38: 18–44.
33. Salvi, Acharya, Molinari, et al. The impact of pre- and post-image processing techniques on deep learning frameworks: a comprehensive review for digital pathology image analysis. Comput Biol Med 2021; 128: 104129.
34. Zhang, Liang, Dong, et al. A sparse-view CT reconstruction method based on combination of DenseNet and deconvolution. IEEE Trans Med Imag 2018; 37: 1407–1417.
35. Wang, et al. DDPTransformer: dual-domain with parallel transformer network for sparse view CT image reconstruction. IEEE Trans Comput Imag 2022; 8: 1101–1116.
36. Zhou, Chen, Zhou, et al. DuDoDR-Net: dual-domain data consistent recurrent network for simultaneous sparse view and metal artifact reduction in computed tomography. Med Image Anal 2022; 75: 102289.
37. Zhang, Liu, et al. MetaInv-Net: meta inversion network for sparse view CT image reconstruction. IEEE Trans Med Imag 2020; 40: 621–634.
38. Chen, Cong, Vese, et al. A hybrid architecture for compressive sensing 3-D CT reconstruction. IEEE J Em Sel Top C 2012; 2: 616–625.
39. Liu, Lin, Man, et al. GPU-based branchless distance-driven projection and backprojection. IEEE Trans Comput Imag 2017; 3: 617–632.
40. Zhang, Geng, Zhao. Fast parallel image reconstruction for cone-beam FDK algorithm. Concurr Comp-Pract E 2019; 31: e4697.
41. Liu, Zeng. Parallel SART algorithm of linear scan cone-beam CT for fixed pipeline. J X-ray Sci Technol 2009; 17: 221–232.
42. Joseph. An improved algorithm for reprojecting rays through pixel images. IEEE Trans Med Imag 1983; 1: 192–196.
43. Zhang, Tuo, et al. Iterative image reconstruction based on Köhler’s forward projection. J Amb Intel Hum Comp 2023; 14: 11469–11480.
44. Kegel, Schellmann, Gorlatch. Comparing programming models for medical imaging on multi-core systems. Concurr Comp-Pract E 2011; 23: 1051–1065.
45. Zhong, Altun, Tian, et al. Parallel protein secondary structure prediction schemes using Pthread and OpenMP over hyper-threading technology. J Supercomput 2007; 41: 1–16.
46. Jones, Yao, Bhole. Hybrid MPI-OpenMP programming for parallel OSEM PET reconstruction. IEEE Trans Nucl Sci 2006; 53: 2752–2758.
47. Shepp, Logan. The Fourier reconstruction of a head section. IEEE Trans Nucl Sci 1974; 21: 21–43.