Abstract
Computed tomography (CT) is capable of generating detailed cross-sectional images of the scanned objects non-destructively. So far, CT has become an increasingly vital tool for 3D modelling of cultural relics. Compressed sensing (CS)-based CT reconstruction algorithms, such as the algebraic reconstruction technique (ART) regularized by total variation (TV), enable accurate reconstructions from sparse-view data, which consequently reduces both scanning time and costs. However, the implementation of the ART-TV is considerably slow, particularly in cone-beam reconstruction. In this paper, we propose an efficient and high-quality scheme for cone-beam CT reconstruction based on the traditional ART-TV algorithm. Our scheme employs Joseph's projection method for the computation of the system matrix. By exploiting the geometric symmetry of the cone-beam rays, we are able to compute the weight coefficients of the system matrix for two symmetric rays simultaneously. We then employ multi-threading technology to speed up the reconstruction of ART, and utilize graphics processing units (GPUs) to accelerate the TV minimization. Experimental results demonstrate that, for a typical reconstruction of a 512 × 512 × 512 volume from 60 views of 512 × 512 projection images, our scheme achieves a speedup of 14 × compared to a single-threaded CPU implementation. Furthermore, high-quality reconstructions of ART-TV are obtained by using Joseph's projection compared with that using traditional Siddon's projection.
Keywords
Introduction
Computed Tomography (CT) is capable of generating detailed cross-sectional images of scanned objects non-destructively. Over the past decades, industrial CT has been widely used in a variety of applications.1–3 Nowadays, industrial CT has proven to be an effective tool for analyzing and modelling of cultural relics.4–6 To obtain high-quality images for cultural relic objects, complete projections are usually required for reconstruction with analytical methods such as FDK (Feldkamp Davis Kress). 7 This will increase not only the scan time, but also the data acquisition cost. One feasible way to address this problem is to scan the cultural relic objects with larger angular sampling intervals than normal. Such scanning mode, however, leads to severe artifacts in the reconstructions due to incomplete projection data. Hence, reconstructing high-quality images from sparse view projections has become an important issue in the field of CT reconstruction.8–12
Traditional iterative methods, such as the ART 13 and the simultaneous ART (SART), 14 have great advantages over analytical methods for reconstruction under incomplete projection data. However, the reconstructions with these iterative methods usually suffer from severe artifacts under sparse-view conditions. In 2005, Yu et al. introduced total variation (TV) constrained minimization in the optimization process of iterative reconstruction. 15 In 2006, Candes et al. introduced the compressed sensing (CS) theory, which showed that under certain assumptions, unknown signals could be accurately recovered from far fewer samples. 16 The CS theory provides a powerful tool for reconstructing images from sparse-view projections. Very soon after the introduction of CS theory, several researchers have reported their CS-based methods for reconstruction from insufficient projection data. Sidky et al. developed the ART-TV algorithm for fan-beam CT. 17 Then, Sidky and Pan minimized the TV algorithm by steepest descent with an adaptive step-size, and applied it to a circular cone-beam CT system. 18 To improve the reconstruction quality of CS-based methods, Hsieh et al. proposed a modified canny operator and applied it to the CS algorithm. 19 Wu proposed a fast-iterative CT reconstruction algorithm, 20 where the CS theory was applied and the physical and mathematical bases were analysed for the reconstruction.
Over the past decade, several groups of researchers have devoted their efforts to improving the CT reconstruction quality of TV-based approaches. Liu et al. 21 proposed an adaptive-weighted TV (AwTV) minimization algorithm for low-dose CT reconstruction with sparse-view projection. They demonstrated that their AwTV model can preserve fine structures and edges than the traditional TV model. Chen et al. 22 proposed a new anisotropic total variation (ATV) reconstruction approach to solve the limited-angle imaging problem. Song et al. 23 proposed a nonlinear weighted anisotro-pic TV regularization model to preserve the conductivity profiles for electrical impedance tomography by incorporating the internal inhomogeneity information. Xi et al. 24 proposed an adaptive-weighted high order total variation (awHOTV) algorithm, in which they constructed the second order TV-norm using the second order gradient. They claimed that the awHOTV algorithm can effectively suppress block artifacts and preserve image details. Ertas et al. 25 proposed an approach to reduce the artifacts due to insufficient projection data. This approach combines TV and non-local means process, which can obtain better reconstructions compared with traditional ART-TV and ART methods. Although these adapted TV models can improve the reconstruction quality to a certain extent, they will obviously increase the computational cost. Consequently, the reconstruction speed will be a prominent issue to be solved for cone-beam CT.
In previous work, we proposed an efficient method for computing the system matrix of Siddon's projection, 26 and then applied it to ART-TV. By using parallel computing technology, we realized fast implementation of ART-TV. However, this method only solves the system matrix computation for fan-beam geometries, which needs to be further extended for cone-beam CT geometries. More recently, we proposed a hybrid reconstruction model, 27 called L1-αL2-TV, for 3D cultural relics model reconstruction. The L1-αL2-TV model was solved by the split Bregman algorithm, which has been shown to achieve low-noise and high-quality reconstruction from cone-beam sparse view projections. However, since the split Bregman algorithm requires several intermediate variables of image size, it is difficult for this method to be performed for the reconstruction of high-resolution images. Moreover, the above methods use Siddon's projection, which is a relatively simple and less accurate reconstruction model. By contrast, Joseph's projection can provide better reconstruction quality with iterative methods by using linear interpolation. However, existing methods for the computation of Joseph's projection are not efficient, which precludes its practical use for reconstruction. To address this problem, we proposed an efficient method for computation of Joseph's projection. 28 We found that the reconstruction speed and quality can be improved compared with the most used Siddon's projection.
In recent years, deep learning has been successfully used in diverse image processing applications for challenging problems.29–33 To address the sparse-view CT problem, many researchers have paid their attention to deep learning-based approaches. Zhang et al. 34 proposed a DD-Net, which combines the DenseNet and Deconvolution-based network. Li et al. 35 proposed a dual-domain with parallel Transformer network (DDPTransformer) to solve the dual-domain CT reconstruction under sparse-view. Zhou et al. 36 proposed a dual-domain data consistent recurrent network, named DuDoDR-Net. The DuDoDR-Net reconstructs images from both recurrent image domain and sinogram domain restorations. Zhang et al. 37 proposed a meta-inversion network, named MetaInv-Net, where the number of trainable parameters is significantly reduced. Obviously, deep learning-based approaches have shown great potential for high-quality CT reconstruction under sparse-view projections. However, since the reconstruction results rely heavily on the training datasets, it is difficult for the deep learning-based reconstruction approaches to obtain satisfactory results from projection data with any level of sparsity. More importantly, most of these approaches are designed to solve the fan-beam spars-view CT problem.
To enhance the CT reconstruction speed, several parallel computing techniques have been utilized by researchers. Chen et al. 38 accelerated the TV regulated three-dimensional expectation maximization (EM) algorithm by a hybrid architecture using CPU, GPU, and FPGA. Liu et al. 39 utilized the texture memory and hardware interpolation on GPUs to a parallel branchless distance-driven algorithm for cone beam CT. Zhang et al. 40 proposed a fast method for parallel implementation of the FDK algorithm by using multi-GPU. Liu et al. 41 extended the SART algorithm to linear scan cone-beam CT and accelerated it using cluster computing. In addition to optimizing the reconstruction algorithms themselves, these parallel computing techniques offer effective means to further accelerate the cone-beam CT reconstructions.
In this paper, we focus on improving the reconstruction quality and speed against traditional ART-TV algorithm, and propose a fast and high-quality scheme for cone-beam CT reconstruction from spars-view projections. In our scheme, Joseph's projection is utilized for the forward projection operation of ART to improve the reconstruction quality. To accelerate the reconstruction of Joseph's projection based ART, we propose an efficient method to compute the elements of the system matrix for two symmetric rays simultaneously. We then use the multi-threading technology to further speed up the ART procedure, and the GPUs to accelerate the TV minimization processes. The major contributions of this paper are listed as follows:
An improved algorithm ART-TV-J is proposed for high-quality reconstruction from sparse-view projections. To improve the reconstruction speed of ART-TV-J, an efficient method is proposed to compute the system matrix for Joseph's projection using symmetry. To accelerate the implementation of ART-TV-symJ for cone-beam CT system, a hybrid parallel reconstruction scheme is proposed.
The organization of our paper is as follows. In Section 2, we briefly present Joseph's projection based ART-TV algorithm, and then describe the system matrix computation of Joseph's projection using symmetry as well as our parallel acceleration scheme in detail. Section 3 illustrates the experimental results for the evaluation of our method using both simulated and real CT projection data. Finally, conclusions and future work are presented in Section 4.
Methods
ART-TV algorithm
The reconstruction problem of cone-beam CT can be modeled as a linear algebra equation:

Illustration of weight coefficients of 3D Joseph's projection.
As shown in Figure 1, suppose the ith projection line intersects with a yz-plane at point A, whose four nearest voxels are 1, 2, 3, and 4. The slab length between two neighboring yz-planes is L. The horizontal and vertical distances from A to voxel 1 are u1 and v1, respectively. Then, the contributions of the four voxels can be computed by [(1 − u)(1 − v)f1 + (1 − u)vf2 + uvf3 + u(1 − v)f4]L, here u is the horizontal interpolation coefficient, and v is the vertical one, u = u1/δ, v = v1/δ. Obviously, the weight coefficient of voxel 1, 2, 3, and 4 is computed as ai1 = (1 − u)(1 − v)L, ai2 = (1 − u)vL, ai3 = uvL, and ai4 = u(1 − v)L, respectively.
Given the system matrix
The ART algorithm is capable of achieving better image quality than traditional analytical methods with incomplete and noisy data. However, for sparse-view projection data, the reconstruction problem is significantly ill-posed, which usually leads to severe artifacts. The compressed sensing (CS) theory makes it possible to reconstruct high-quality images with sparse-view projection data by utilizing TV regularization. The TV-based CT reconstruction method from sparse view projection can be described as:
As discussed above, the ART-TV includes two computation procedures: the ART algorithm is firstly used to reconstruct image
Computation of the system matrix using symmetry
As can be seen from (2), the computation and storage of the system matrix elements are crucial for fast reconstruction of ART. For cone-beam CT system, both M and N are extremely large, hence it is infeasible to pre-compute and store the nonzero weight coefficients. In practice, they are usually computed on the fly. Recently, we proposed an efficient method for the computation of 3D Joseph's projection. 33 Based on this algorithm, we will exploit the geometric symmetry of the cone-beam rays and further improve the reconstruction speed.
For convenience, the one dimensional image index n can also be represented by three indices: i, j, and k, which correspond to the row, column, and layer index of fn, respectively. Let K be the value of I × I, which represents the total number of voxels within one layer. Thus, n is equal to k × K + i × I + j.
Let S(Sx, Sy, Sz) be the X-ray source point, E(Ex, Ey, Ez) be the center point of a pixel on the 2D detector plane, and (s,t) be the local coordinates of E. The t-axis of the detector is parallel to the z-axis. Suppose |Ex−Sx| is larger than both |Ey−Sy| and |Ez−Sz|. All voxels that have the identical column index j are located on one plane, which is parallel to the yoz-plane (referred to as the jth yz-plane). As shown in Figure 2, suppose SE intersects with the jth yz-plane at point A, whose coordinates are (x,y,z). Then, the indices of the four neighbouring voxels around A, i.e., (i,j,k), (i,j,k−1), (i−1,j,k−1), and (i−1,j,k), as well as the corresponding interpolation coefficients u and v can be computed by its coordinates. Let F(s,-t) be the symmetric point of E(s,t) about the s-axis on the detector. Because SO is always perpendicular to the detector, SE and SF are symmetrical about the xoy-plane. Then, SF will intersects the jth yz-plane at point B, whose coordinates must be (x,y,-z). Thus, according to the symmetry the indices of the four neighbouring voxels around point B must be (i,j,h), (i−1,j,h), (i−1,j,h + 1), and (i,j,h + 1), where h = I + 1 − k. In addition, the interpolation coefficients of point B must be u and 1 − v, respectively. Therefore, if we have computed the system matrix elements of ray SE, we can easily obtain those elements for its symmetry ray SF, which will further reduce the computation cost.

Computation of weight coefficients using symmetry.
Based on the above discussion, we will give a simplified description of our method. Let mxy and mz be the slope of the projection lines of SE on the xoy-plane and yoz-plane, respectively, where mxy = (Ey−Sy)/(Ex−Sx), mz = (Ez−Sz)/(Ex−Sx). Suppose 0 ≤ mxy ≤ 1 and mz ≥ 0. First, we compute the initial four voxel indices as well as their interpolation coefficients u and v. Then, for the next yz-plane, u and v will increase mxy and mz, respectively, i.e., u = u + mxy, v = v + mz. According to the updated values of u and v, we can determine the four new voxels that have contributions to ray SE. Once u or v is larger than 1, it will be decreased by 1. This process is repeated until any of the voxel indices is out of range.
For ray SE, we define an array a to save those values of the one dimensional voxel indices, and array b to save the corresponding weight coefficients. However, for its symmetry ray SF, we just use array c to save the values of voxel indices. The total number of computed voxel indices of SE is represented by variable r. Algorithm 1 summarizes the pseudocode of system matrix computation for two symmetry rays.
System matrix computation for two symmetry rays
Multi-thread acceleration of ART
The ART algorithm consists of three main loops. The middle loop runs over ϕ projection views. Although the projection views can be accessed in different schemes, this loop is impracticable to be parallelized due to its sequential execution. The inner loop runs over T = Nc × Nr rays to perform the ART reconstruction for a given view. Intuitively, we can simply assign T threads to implement this loop. However, since one voxel may be passed by several adjacent rays within a projection view, such method will lead to access conflict when updating the same voxel value by these adjacent rays at the same time. To address the problem of access conflict, we partition the reconstruction procedure of ART into a set of sub-procedures, each of which is executed by an individual CPU thread. Within one thread, the sub-procedure is performed in sequence according to the ray indices. Considering the symmetry, the reconstruction procedure is partitioned by evenly dividing the half detector plane into Nt sub-procedures. Each sub-procedure has Nr/2Nt rows of rays, which is corresponding to a thread. Therefore, there are T/2Nt rays to be processed in one thread. Currently, several types of parallel programming models, such as Pthreads, OpenMP, and Win32 API, have been developed for multithreading programming on shared memory multi-core systems.44–46 Among these models, the OpenMP offers us full control over parallelization. Moreover, it is particularly designed for loop-based data parallelism. Therefore, we use the OpenMP programming model to accelerate our symmetry-based ART. Algorithm 2 summarizes the pseudocode of multi-thread accelerated ART.
GPU acceleration of the TV minimization
In this section, we describe the GPU acceleration of the TV minimization in detail. As introduced above, the program flow of TV minimization can be presented in Algorithm 3 .
In Algorithm 3, array f1[1..N] stores the TV regularized image of previous iteration, and array f [1..N] stores the currently reconstructed image by ART algorithm. The first step (lines 2–3) is to compute the L2 norm distance from f1 and f, which is implemented on CPU. The second step (line 6) is to compute the elements of v. For this purpose, we allocate two arrays f2[1..N] and f3[1..N] on GPU's global memory to store the values of array f and elements of image v, respectively. Then we design the first CUDA kernel with I × I threads, each of which runs a loop on k (1 ≤ k ≤ I) to compute the elements of array f3. The second kernel is to compute the L2 norm of image v (lines 7–8). In this kernel, we specify BN blocks, and TN threads within one block. In each thread, we sum the squares of elements of array f3 and store them in a shared memory array s[1..TN]. Then, we use reduction method to sum all the element values to s[1]. For each block, we store s[1] to a global memory array g[1.. BN]. Finally, we transfer the GPU array g to a CPU array q[1.. BN]. Using array q we can easily compute the L2 norm of image v. Algorithm 4 summarizes the pseudocode of the second kernel.
The final kernel is to update image f2 (line 9 in Algorithm 3). Similar to the first kernel, we specify I × I threads, and each thread runs I loops to update image f according to (5). Having launched the third kernel, the updated image f2 is then transferred to image f. Algorithm 5 summarizes the pseudocode for acceleration of TV minimization with GPU.
Experiment and results
The codes of ART-TV algorithm were written in ANSI-C and compiled in Visual Studio 2013. OpenMP 3.0 and CUDA 8.0 runtime API were utilized for multi-threading and GPU acceleration, respectively. We evaluated the performance of our method on a workstation equipped with an Intel 10-core, 3.30 GHz CPU. The CUDA code for GPU acceleration was executed on a GeForce GTX 1080 Ti GPU with 11 GB on-board memory.
Accuracy of Joseph's projection based ART-TV
We simulated a cone-beam flat-detector CT system. The source-to-origin distance was 780 mm, and the source-to-detector distance was 1040 mm. The flat-detector had Nc × Nr = 512 × 512 detector bins, with a bin size of 0.256 mm × 0.256 mm. We used a three-dimensional extension of the Shepp-Logan phantom (SLP) for the following evaluations. 47 To assess our method with different levels of sparsity of the projections, the view number ϕ was set to 45, 60, 90, and 180 during the experiments.
To evaluate the effectiveness of the proposed method, the 3D SLP was reconstructed using three methods: traditional Siddon's projection-based ART and ART-TV, and our proposed Joseph's projection-based ART-TV (named ART-TV-J). The reconstructed images were 512 × 512 × 512 voxels with a spacing of 0.192 mm × 0.192 mm × 0.192 mm. In the reconstruction procedure, the relaxation factor λ, the acceleration factor a, and NTV were set to 0.2, 0.36, and 20, respectively. For estimating the accuracy of the reconstruction, we used the measure of normalized root mean square (NRMS) in our experiments. The NRMS error is defined as:
Figure 3 presents plots of the NRMS errors versus iteration numbers, where the NRMS errors were computed using the 196th transversal slices obtained by the three methods from different numbers of views. As illustrated in Figure 3, as the number of iterations increases, all three methods gradually converge; however, our ART-TV-J algorithm converges much faster than the traditional ART and ART-TV algorithms. Furthermore, for 45 projection views, the NRMS errors of ART-TV-J are much lower than those of ART and ART-TV. These results indicate that ART-TV-J can significantly improve reconstruction quality compared to the traditional ART-TV algorithm, particularly for sparse-view reconstructions.

Plots of the NRMS errors versus iteration numbers using ART, ART-TV, and ART-TV-J from different views. (a) 45 views; (b) 60 views; (c) 90 views; (d) 180 views.
Figure 4 shows the reconstructed SLP images of the 196th transversal slices using three methods after 10 iterations. For reconstructions from 45 projection views, ART exhibits obvious artifacts, whereas both ART-TV and ART-TV-J produce more satisfactory results. To better compare these methods, we selected regions of interest (ROIs) within the reconstructed images. The ROI includes part of the edge of the SLP, marked by a small red rectangle. The corresponding magnified ROI is indicated by a larger red rectangle within the same image. As can been seen from the magnified ROIs, the reconstructed edges using ART-TV-J are clearer than those produced by ART-TV and ART. Furthermore, the reconstructions of ART-TV-J appear slightly better than those of traditional ART-TV, especially for sparse views reconstruction. However, for larger numbers of projection views, little difference can be observed between the reconstructions of ART-TV and ART-TV-J.

Reconstruction results of the 3D SLP. The first to third row shows the reconstructions using ART, ART-TV, and ART-TV-J, respectively. The first to fourth column shows the reconstructions from 45, 60, 90, and 180 projection views, respectively.
Efficiency of the symmetry-based ART-TV-J
To evaluate our symmetry-based system matrix computation for Joseph's projection, we used this method to perform the ART-TV-J algorithm (named ART-TV-symJ), and made further comparison with ART-TV-J. For implementation of the ART-TV-J, we used our previously proposed Joseph's projection method (see 28 ). We reconstructed the 3D SLP from 60 projections with 10 iterations. Table 1 lists the average reconstruction time for one iteration of the two methods. As shown in Table 1, the computation time for system matrix of Joseph's projection is reduced from 66 s to 39 s by using the symmetry. In the forward projection, the loop of ray indices just runs from 1 to Nr/2, thus the forward projection of two symmetric rays can be performed simultaneously in the same loop. Consequently, the computation time of the forward projection is saved about 21 s. Since the denominator in (2) can be computed only once for two symmetric rays, the back projection procedure is further improved. Overall, the reconstruction time of ART-TV-J is reduced about 56 s for one iteration by using our method.
Reconstruction time for one iteration using two methods (s).
Table 2 lists the NRMS errors of the reconstructed transversal slices using the two methods after 10 iterations. As shown in Table 2, the NRMS values of the 196th slices reconstructed by the two methods are completely identical. However, little difference can be found from those of the 257th and 316th slices. The reason is that the reconstruction values of ART is related to the access sequence of the projection rays. In our method, the projection rays are accessed according to their row indices on the detector plane. For the slices above xoy-plane, such as 196th slice, both ART-TV-J and ART-TV-symJ access the projection rays from row 1 to Nr/2, hence these slices have completely identical reconstruction values. However, for the slices below xoy-plane, such as the 257th and the 316th slices, the ART-TV-J accesses the projection rays from row Nr/2 + 1 to Nr, while the ART-TV-J accesses them from row Nr to Nr/2 + 1. Therefore, there exists very small difference between slices reconstructed by the two methods below xoy-plane. The results show that our symmetry based projection computation method is accuracy.
NRMS errors of the two methods.
Figure 5 shows the reconstructed images and corresponding profile plots of the 196th, 257th, and 316th transversal slices using the two methods after 10 iterations. Visually, the images reconstructed by ART-TV-J and ART-TV-symJ are nearly indistinguishable. As shown in the profile plots, the profiles of the two methods almost coincide with each other. These results further indicate the accuracy of ART-TV-symJ.

Reconstructions of the 3D SLP and the profile plots across the lines indicated in the reconstructed slices. From top to bottom: the 196th, 257th, and 316th transversal slices. The first and second columns show the reconstructions using ART-TV-J and ART-TV-symJ, the third column shows the corresponding profile plots.
Comparisons with other TV-based methods
To further evaluate the effectiveness of our ART-TV-symJ, we conducted more detailed comparisons with other TV-based reconstruction methods under very sparse view conditions. For this evaluation, we reconstructed the 3D SLP from 45 projections using the EM-TV algorithm, the awHOTV constrained ART (ART-awHOTV), the awTV constrained ART (ART-awTV), and ART-TV-symJ, respectively. It should be noted that, for the other three methods, Siddon's projection was applied to compute the system matrix. Figure 6 displays the original phantom and the reconstructed images using different methods with 10 iterations. The chosen ROI contains an ellipse, indicated by a small red rectangle. As shown in Figure 6(b), the reconstruction of EM-TV is very blurry, making it difficult to discern the ellipse both in the ROI and its magnified one, due to the slow convergence of the EM algorithm. Figure 6(c) demonstrates that the reconstruction of ART-awHOTV is somewhat clearer than that of EM-TV. However, streak artifacts can be observed in the result. This is because that the SLP is piecewise constant, while awHOTV is better suited for phantoms with fluctuating features. As can be seen from Figure 6(d) and (e), particularly in their magnified ROIs, the reconstruction of ART-TV-symJ is slightly better than that of ART-awTV. Overall, the reconstruction quality of ART-TV-symJ surpasses that of EM-TV, ART-awHOTV, and ART-awTV.

Original phantom (a) and reconstructed images of the 219th transversal slice of the 3D SLP using different methods: (b) EM-TV, (c) ART-awHOTV, (d) ART-awTV, and (e) ART-TV-symJ.
For a more intuitive comparison, we plotted the profiles of the reconstructed images in Figure 7, where (a) corresponds to images from Figure 6, and (b) shows the local magnified profiles within the red rectangular region of (a). As shown in Figure 7(b), it is clear that the profile of ART-TV-symJ is the closest to that of the original phantom among these algorithms. Additionally, jagged edges are noticeable in the profile of ART-awHOTV.

In addition to visual comparisons, the reconstruction quality is quantitatively evaluated using the NRMS measure. Table 3 lists the NRMS errors of the reconstructed images for different numbers of iterations. It is clear that the NRMS errors of ART-TV-symJ are the lowest across all iteration numbers. After ten iterations, the NRMS error of our method is reduced by 19.9% compared to ART-awTV, and by 55.2% compared to the EM-TV algorithm. These results further verify the accuracy of our ART-TV-symJ algorithm.
NRMS errors of different methods for different iterations.
To evaluate the efficiency of ART-TV-symJ, we compared the time performance of the above methods. Since most TV-based methods involve two primary operations: iterative reconstruction and TV minimization, we measured the execution time of these two operations separately. Given that the EM algorithm updates each voxel by considering all intersecting rays, the reconstruction process using EM is particularly time-consuming. To speed up this process, we employed two look-up tables to store the relevant correction terms for each voxel. Table 4 presents the time performance comparisons of the various methods. Despite this acceleration, the EM implementation remains computationally intensive. In contrast, the reconstruction process of ART-TV-symJ is 3.2 times faster than that of EM-TV. For more complex TV models, the computational cost increases significantly compared to traditional TV minimization. Consequently, for a single iteration, the total implementation time of ART-TV-symJ is 2.4 times faster than that of ART-awTV and 3 times faster than that of ART-awHOTV. These results further demonstrate that ART-TV-symJ exhibits superior efficiency.
Time performance comparisons of a single iteration (s).
Efficiency and accuracy of the parallel scheme
To evaluate our parallel reconstruction method, we utilized ART-TV-J and ART-TV-symJ to reconstruct the 3D SLP from 60 projections with 10 iterations. The ART algorithm was executed on the CPU using varying numbers of threads, while the TV minimization was implemented on the GPU using our designed CUDA kernels. Given the number of cores in our test CPU, the number of threads was capped at 10 in our experiments. For the first and third kernels, the size of block was set to 16 × 16, and that of grid was set to 32 × 32. For the second kernel, the block was set to 1024, and the grid was set to 512.
Table 5 lists the average reconstruction time for one iteration of the two methods. In the case of single-threaded reconstruction, only the TV minimization procedure is accelerated. Therefore, as can be seen from Tables 1 and 5, the TV operation is reduced from 140 s to 2.7 s by using GPU acceleration. Compared with the performance of single-threaded CPU, the TV operation is improved about 51 times. When using two threads, the reconstruction time for ART is improved by about 1.9 times. Clearly, the more threads utilized, the greater the reduction in reconstruction time. With ten threads for ART-TV-symJ, the reconstruction time is improved by approximately 14 times compared to ART-TV-J implemented on the CPU. The results demonstrate that the proposed multi-threaded and GPU acceleration method is highly efficient.
Reconstruction time for a single iteration with different numbers of threads (s).
Table 6 lists the NRMS errors of the 196th transversal slices achieved by ART-TV-J and ART-TV-symJ. The iteration number was set to 10, and the number of threads varied from 1 to 10. As can be seen from Table 6, the accelerated ART-TV-J and ART-TV-symJ methods maintain the accuracy of reconstruction, which further demonstrates the effectiveness of our hybrid parallel scheme.
NRMS errors with different numbers of threads.
Figure 8 shows the reconstructed 196th transversal slices using ART-TV-J and ART-TV-symJ. As shown in Figure 8, the reconstructed images are almost identical when accelerated using different numbers of threads. The results indicate that our method, which combines multi-threading with GPU acceleration for the ART-TV-symJ, is highly accurate.

Reconstructed images of the SLP by two methods using multi-thread and GPU acceleration. The first to fourth column shows the reconstructions using 1, 4, 8, and 10 threads, respectively. The first (second) row shows results achieved by the ART-TV-J (ART-TV-smyJ) algorithm.
Real data study
Since the efficiency of the ART-TV-symJ has been assessed in detail, in this sub-section, we focus on evaluating its accuracy for real CT projection data. The projections were obtained by scanning a physical phantom on a real cone-beam CT scanner. The phantom is a ceramic teapot, which has almost one single material. The source-to-origin distance was 1080 mm, and the source-to-detector distance was 1950 mm. The flat detector had a resolution of 2048 × 2048 bins with a pixel size of 0.2 mm × 0.2 mm. The reconstruction image was 512 × 512 × 512 with a voxel size of 0.443 mm × 0.443 mm × 0.443 mm. A total of 600 projections were acquired with equal angular spacing over 360° rotation. For comparison, we reconstructed the teapot using the FDK, ART-TV, and ART-TV-symJ methods. For traditional ART-TV method, Siddon's forward projection was used. The FDK algorithm was implemented using all projections, whose reconstruction results can be considered as the reference for comparison. Both ART-TV and ART-TV-symJ were implemented using a subset of 40 equally spaced projections with 20 iterations.
Figure 9 presents the reconstructed images of the teapot using these methods. Specifically, it displays the 246th transversal slice and the 210th coronal slice. As shown in Figure 9, the reconstructed images from these methods appear virtually identical, indicating that both ART-TV and ART-TV-symJ are highly effective in reconstructing single-material objects from very sparse view projections. For further comparison, we plotted the profiles (marked by red lines in Figure 9) in Figure 10. As depicted in Figure 10(a) and (c), the profiles of ART-TV and ART-TV-symJ are nearly coincident. However, as shown in Figure 10(b) and (d), the profiles of ART-TV-symJ exhibit greater consistency with those of the FDK method.

Comparison of reconstruction results of the ceramic teapot. The first (second) row shows the 246th transversal (210th coronal) slices. The first to third column shows the images reconstructed using FDK, ART-TV, and our multi-thread and GPU accelerated ART-TV-smyJ.

Profiles comparison of the images in Fig. 9. (a) Profiles along the 200th horizontal lines of the Figs. 9(a), (b), and (c); (b) the magnified profiles within the blue rectangular region of (a); (c) Profiles along the 360th horizontal lines of the Figs. 9(d), (e), and (f); (d) the magnified profiles within the blue rectangular region of (c).
The teapot phantom features both a relatively simple surface structure and material composition. To evaluate our method with more complex phantoms, we scanned a metal bottle using 1080 equally spaced projections on the same CT scanner. The metal bottle has intricate design elements, including two dragon-shaped ears and carved texture patterns on its surface. Additionally, it is composed of a variety of materials. The source-to-origin distance was 1949 mm, and the source-to-detector distance was 2221 mm. The reconstruction image was 512 × 512 × 512 with a voxel size of 0.7024 mm3. We reconstructed the metal bottle phantom using the three aforementioned methods. For FDK, all 1080 projections were utilized. In contrast, the ART-TV and ART-TV-symJ reconstructions used only 40 equally spaced projections, each with 20 iterations.
Figure 11 displays the reconstructed 246th transversal and 235th coronal slices of the metal bottle, with corresponding profiles plotted in Figure 12. As shown in Figure 11, the contours of images reconstructed using FDK are clearer compared to those obtained using ART-TV and ART-TV-symJ. Also, the reconstructions from ART-TV and ART-TV-symJ exhibit smoother images as well as several artifacts. We believe that increasing the number of iterations would further improve the reconstruction quality of both ART-TV and ART-TV-symJ methods. As illustrated in the magnified profiles of Figure 12(b) and (d), the reconstruction quality of ART-TV-symJ is superior to that of traditional ART-TV. These results further demonstrate that our ART-TV-symJ method remains effective for more complex phantoms with real CT projection data.

Comparison of reconstruction results of the metal bottle. The first (second) row shows the 256th transversal (235th coronal) slices. The first to third column shows the images reconstructed using FDK, ART-TV, and ART-TV-smyJ.

Profiles comparison of the images in Fig. 11. (a) Profiles along the 232th horizontal lines of Figs. 11(a), (b), and (c); (b) magnified profiles within the blue rectangular region of (a); (c) Profiles along the middle horizontal lines of Figs. 11(d), (e), and (f); (d) magnified profiles within the blue rectangular region of (c).
Conclusion
Improving the reconstruction speed has always been a research in the field of CT. We have presented an efficient and high-quality reconstruction scheme for cone-beam sparse-view CT based on traditional ART-TV algorithm. Existing ART-TV methods usually utilize Siddon's projection to implement ART. By using Joseph's projection, high-quality can be obtained for ART-TV. However, the ART-TV method is extremely time-consuming particularly for cone-beam CT system. For this reason, we propose an efficient method to compute the system matrix for Joseph's projection. Our method exploits the geometric symmetry of the rays in cone-beam geometry. By computing the system matrix elements for one ray, we could easily obtain the corresponding elements for its symmetric ray. On this basis, we present a hybrid parallelization scheme for fast implementation of ART-TV-symJ, which involves both multi-thread and GPU computations. The experimental results indicate that our method is efficient. We believe that the proposed scheme could be further applied to the modelling of cultural relic objects with very sparse view projections.
Note that, our method employs bilinear interpolation to generate the virtual projection data. While this process may sacrifice the spatial resolution of the projection data and further compromise the overall reconstruction quality to some extent. Moreover, our method depends heavily on the number of physical cores, which is a limitation for the acceleration of ART-TV-symJ. Compared with ART, the SART algorithm has good parallelism, which is suitable for GPU acceleration. Hence, our future work will study the total variation regularized cone-beam SART reconstruction.
Footnotes
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (Grant Number. 61772421).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
