Comparison and verification of two deep learning models for the detection of chest CT rib fractures

Abstract

Background

A high false-positive rate remains a technical obstacle to the broad application of deep-learning-based diagnostic tools for rib fracture detection in routine radiological practice.

Purpose

To examine the performance of two versions of a deep-learning-based software tool in aiding radiologists to diagnose rib fractures on chest computed tomography (CT) images.

Material and Methods

In total, 123 patients (708 rib fractures) were included in this retrospective study. Two groups of radiologists with different levels of experience reviewed the images for rib fractures in concurrent mode, aided by RibFrac-High Sensitivity (HS) and RibFrac-High Precision (HP). We compared their diagnostic performance against the reference standard in terms of sensitivity and positive predictive value (PPV).

Results

On a per-patient basis, RibFrac-HS exhibited a higher sensitivity than RibFrac-HP (mean difference = 0.051, 95% CI = 0.012–0.090; P = 0.011), whereas RibFrac-HP significantly outperformed RibFrac-HS in terms of PPV (mean difference = 0.273, 95% CI = 0.238–0.308; P < 0.0001). Compared with no software assistance, the use of RibFrac-HP significantly improved the sensitivities of the junior and senior groups by 0.058 (95% CI = 0.033–0.083; P < 0.0001) and 0.058 (95% CI = 0.034–0.081; P < 0.0001), respectively, and decreased their diagnosis times by 206 s (95% CI = 191–220; P < 0.0001) and 79 s (95% CI = 67–92; P < 0.0001), respectively.

Conclusion

The sensitivity and efficiency of radiologists in identifying rib fractures can be improved by using RibFrac-HS and/or RibFrac-HP. With an added module for false-positive suppression, RibFrac-HP largely maintains sensitivity while increasing the PPV of fracture detection compared with RibFrac-HS.

Keywords

Rib fracture, artificial intelligence, computed tomography, false positives, computer-aided diagnosis

Introduction

Rib fractures are among the most common consequences of traumatic injury, occurring in around 10% of trauma patients overall (1) and in 40%–80% of patients with blunt chest trauma caused by high-impact collisions (2). Studies have shown that the mortality rate of patients with post-traumatic rib fractures is 10%–16% (3) and may rise to as high as 65% when pulmonary infection develops secondary to the fracture (4). Prompt and effective detection of post-traumatic rib fractures is therefore essential for stratifying trauma severity and for determining appropriate patient management.

Chest digital radiography (DR) (5) and computed tomography (CT) (6) are the most commonly used imaging techniques for the early assessment of patients with chest trauma. Although radiographs can be obtained more quickly in the emergency setting, incomplete or non-displaced fractures are frequently missed (7). CT examinations, on the other hand, are more time-consuming to perform and interpret than radiographs, owing to the large number of images and the frequent coexistence of other injuries and incidental findings (8). In addition, the anatomy and orientation of the rib cage require radiologists to follow each rib across multiple slices in axial, coronal, or sagittal views to detect sometimes subtle discontinuities or distortions of the rib cortices, which makes fracture detection more difficult than in other osseous structures (9).

In recent decades, considerable progress has been made in applying deep learning (DL) to medical image processing and recognition. As a consequence, DL-based computer-aided diagnosis (CAD), for example for the detection and characterization of lung (10), prostate, and breast cancers (11,12), has been widely implemented in clinical practice. For fracture detection, CAD has proven accurate and efficient in facilitating the diagnosis of common fractures, including those of the proximal humerus, femoral neck, and wrist (13–15). However, its precision, performance, and practical feasibility for detecting rib fractures remain inconclusive, as several studies have reported a high rate of false-positive results (9,16).

The aim of the present study was to evaluate the performance of two versions of a DL-based CAD tool for rib fractures (RibFrac; Aitrox, Shanghai, PR China), namely the high-sensitivity version (RibFrac-HS) and the high-precision version (RibFrac-HP), in assisting radiologists in identifying blunt trauma-induced rib fractures. RibFrac-HS was designed to capture as many rib fractures as possible on CT images, whereas RibFrac-HP had an additional module to suppress likely false positives arising during automatic detection. Using a dataset of 123 rib fracture cases with an available ground truth as the reference standard, we compared the performance of two groups of radiologists in diagnosing the same rib fractures with and without the aid of the software tools in terms of sensitivity, positive predictive value (PPV), and diagnosis time.

Material and Methods

This study was approved by the Institutional Review Board of Shanghai Changzheng Hospital before patient information was accessed, and the requirement for informed consent of patients was waived due to the retrospective nature of the analysis and the anonymity of the data.

Case selection

Diagnostic imaging records of patients who underwent thoracic CT at Shanghai Changzheng Hospital between January 2015 and December 2019 were retrospectively reviewed against the following inclusion criteria: (i) the patient had thoracic trauma only; (ii) images were available with a slice thickness of ≤1 mm; and (iii) images were of adequate quality without breathing or motion artifacts. Patients were excluded if they had undergone internal rib fixation or any other thoracic surgery (e.g. lung wedge resection or mastectomy). After screening, the images of 123 patients were included in the study. This cohort also forms part of another of our projects, which required patients to have undergone DR and CT examinations at the same time. Fig. 1 shows the case selection flow chart and the number of patients finally included according to the inclusion and exclusion criteria. Both RibFrac-HS and RibFrac-HP were trained with 1500 chest CT images from two other hospitals.
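The screening rules above can be expressed as a simple filter. The following Python sketch is illustrative only: the record type `CTStudy` and its field names are hypothetical, since the study does not describe how screening was implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CTStudy:
    # Hypothetical record fields; the study does not describe its data schema.
    patient_id: str
    thoracic_trauma_only: bool
    slice_thickness_mm: float
    has_breathing_or_motion_artifact: bool
    prior_rib_fixation_or_thoracic_surgery: bool


def select_cases(studies: List[CTStudy]) -> List[CTStudy]:
    """Keep studies that meet all inclusion criteria and no exclusion criterion."""
    selected = []
    for s in studies:
        meets_inclusion = (
            s.thoracic_trauma_only  # (i) thoracic trauma only
            and s.slice_thickness_mm <= 1.0  # (ii) slice thickness <= 1 mm
            and not s.has_breathing_or_motion_artifact  # (iii) adequate image quality
        )
        # Exclusion: internal rib fixation or any other thoracic surgery
        if meets_inclusion and not s.prior_rib_fixation_or_thoracic_surgery:
            selected.append(s)
    return selected
```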

Fig. 1.

The strategy used for case selection and the number of patients finally recruited per the inclusion and exclusion criteria.

Image acquisition and preprocessing

The CT images were acquired by using one of the following multidetector CT scanners (manufacturer: Toshiba, General Electric, or Philips) with the following scan settings: tube voltage = 120 kVp; tube current = 50–150 mAs; image matrix = 512 × 512 pixels; and scanning duration = 0.5 s. The included images were reconstructed using a standard reconstruction algorithm (17) with a thickness in the range of 0.625–1 mm. All scans were performed from the level of the thoracic cavity to the level of the upper part of the kidney with a slice thickness of 1 mm.

Reference standard of rib fractures

A rib fracture was included in the reference standard if the lesion had been annotated independently by both of two senior radiologists (S.X. and P.W., with 13 and 15 years of experience in chest CT imaging, respectively). When consensus was not reached, a third expert (Y.X., with 25 years of experience) evaluated the lesion and discussed it with them, and the consensus decision was adopted as the reference standard; if disagreement persisted, the lesion was excluded. All image annotations were performed on Prolego (Image Processing System, Aitrox Technology Corporation Limited, Shanghai, China). All CT images were reviewed in a bone window with a window width (WW) of 1400 HU and a window level (WL) of 600 HU. Rib fractures were divided into five categories: completely displaced fracture, completely non-displaced fracture, incomplete fracture, cortical distortion, and callus formation. A completely displaced fracture was defined by a dislocation of >2 mm. Fractures without significant dislocation, including simple, oblique, transverse, and butterfly fractures, were classified as completely non-displaced fractures. An incomplete fracture was defined as a discontinuity of part of the cortical bone without a complete break. A fracture with “focal deformity” was classified as cortical distortion. Callus formation was defined as focal sclerosis of the rib, with or without cortical displacement.
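For readers reproducing the display settings, a bone window with WW = 1400 HU and WL = 600 HU maps the range from −100 HU to 1300 HU onto the full gray scale. The Python sketch below uses a plain linear mapping to 8-bit gray levels; this is a simplifying assumption (the DICOM linear VOI LUT function uses slightly different half-unit offsets), and the function name is hypothetical.

```python
def apply_window(hu: float, width: float = 1400.0, level: float = 600.0) -> float:
    """Map a CT value in HU to a gray level in [0, 255] with a linear window."""
    low = level - width / 2.0  # -100 HU for WW 1400 / WL 600
    high = level + width / 2.0  # 1300 HU for WW 1400 / WL 600
    if hu <= low:
        return 0.0  # values at or below -100 HU (e.g. air, fat) appear black
    if hu >= high:
        return 255.0  # values at or above 1300 HU (dense cortex) saturate to white
    # Linear ramp between the lower and upper window limits
    return (hu - low) / width * 255.0
```

For example, `[apply_window(v) for v in (-1000, 600, 2000)]` gives `[0.0, 127.5, 255.0]`: the window level sits at mid-gray, and values outside the window are clipped.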

DL model architecture

The architecture of “RibFrac-HS” consisted of two modules: (i) a 3D region proposal network for detecting rib fractures, based on a combination of a modified U-Net (18) and a ResNet block (19); and (ii) a bounding-box eliminator that discards detections outside the rib regions (Fig. 2). Compared with “RibFrac-HS,” “RibFrac-HP” had an additional U-Net-based classification network (18) between these two modules to suppress false positives (Figs. 2 and 3). All networks were trained using PyTorch version 1.5.1 (20) with Python version 3.6.8 (Python Software Foundation, Wilmington, DE, USA).
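The order of the modules can be illustrated with a small inference-time sketch. It assumes that the region proposal network has already produced candidate boxes with confidence scores and that the rib region is available as a set of voxel coordinates; the names `Candidate` and `run_pipeline`, the box layout, and the threshold of 0.5 are all hypothetical, because the paper does not specify how rib regions are represented or which decision threshold the classification network uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

Box = Tuple[int, int, int, int, int, int]  # (z0, y0, x0, z1, y1, x1), hypothetical layout
Voxel = Tuple[int, int, int]


@dataclass
class Candidate:
    box: Box
    score: float  # confidence from the region proposal network


def box_center(box: Box) -> Voxel:
    z0, y0, x0, z1, y1, x1 = box
    return ((z0 + z1) // 2, (y0 + y1) // 2, (x0 + x1) // 2)


def run_pipeline(
    candidates: List[Candidate],
    rib_voxels: Set[Voxel],
    fp_classifier: Optional[Callable[[Candidate], float]] = None,
    fp_threshold: float = 0.5,  # hypothetical decision threshold
) -> List[Candidate]:
    """HS-style when fp_classifier is None; HP-style when a classifier is supplied."""
    kept = list(candidates)
    if fp_classifier is not None:
        # Middle stage (HP only): drop candidates that the second-stage
        # classifier scores below the threshold as likely false positives.
        kept = [c for c in kept if fp_classifier(c) >= fp_threshold]
    # Final stage (both versions): drop candidates centered outside the rib region.
    return [c for c in kept if box_center(c.box) in rib_voxels]
```

Passing no classifier reproduces the two-module HS layout; passing one inserts the HP-style filter before the rib-region check, mirroring the module order described above.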

Fig. 2.

Architectures of the developed DL-based software tools (a) “RibFrac-HS” and (b) “RibFrac-HP.” DL, deep learning.

Fig. 3.

Structures of the 3D region proposal network (top) and the novel false-positive suppression module (bottom) in RibFrac-HP.

Diagnostic performance evaluation

We investigated the diagnostic performance of two groups of radiologists with different levels of experience in chest CT interpretation, with and without the aid of each of the two software tools. The junior group consisted of Q.X., Q.S., and C.S., with two, three, and three years of experience, respectively. The senior group consisted of H.S., S.C., and X.W., with eight, nine, and nine years of experience, respectively.

Their performance in identifying rib fractures with and without software assistance was evaluated in the following steps (Fig. 4): (i) the participating radiologists independently marked the locations of rib fractures based on their own judgment using the dedicated software tool Prolego (Image Processing System, Aitrox Technology Corporation Limited, Shanghai, China); (ii) RibFrac-HS and RibFrac-HP were retrospectively applied to the rib fracture dataset, generating a report of the locations of all rib fractures identified for each patient; (iii) referring to the reports generated in step (ii), the radiologists re-examined all CT images and marked the location of every fracture they could identify, and the new diagnosis time was recorded; and (iv) the locations of rib fractures annotated in steps (i), (ii), and (iii) were each compared against the ground truth in terms of detection sensitivity and PPV, calculated as:

\mathrm{Sensitivity} = \frac{TP}{P}, \qquad (1)

where TP is the number of true-positive detections, and P is the total number of rib fractures per the reference standard, and

\mathrm{PPV} = \frac{TP}{TP + FP}, \qquad (2)

where FP is the number of false-positive detections.
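Eqs. (1) and (2) translate directly into code. In this Python sketch, returning NaN when a denominator is zero (e.g. a scan with no detections) and skipping such values when averaging per patient are our assumptions; the paper does not state how such cases were handled.

```python
import math
from typing import Iterable


def sensitivity(tp: int, p: int) -> float:
    """Eq. (1): proportion of reference-standard fractures that were detected."""
    return tp / p if p > 0 else math.nan


def ppv(tp: int, fp: int) -> float:
    """Eq. (2): proportion of detections that are true fractures."""
    return tp / (tp + fp) if tp + fp > 0 else math.nan


def per_patient_mean(values: Iterable[float]) -> float:
    """Average per-patient values, skipping undefined (NaN) entries."""
    defined = [v for v in values if not math.isnan(v)]
    return sum(defined) / len(defined) if defined else math.nan
```

For a hypothetical patient with 8 reference fractures, of which 6 are detected alongside 18 false alarms, `sensitivity(6, 8)` is 0.75 and `ppv(6, 18)` is 0.25, which shows how a high-sensitivity tool can still have a low PPV.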

Fig. 4.

The timeline of the evaluation steps, illustrating the time gap between steps. Figs. 5 and 6 show examples (CT image, HS output, and HP output) of false-positive detections made by RibFrac-HS and eliminated by RibFrac-HP.

Fig. 5.

A candidate rib fracture was detected by RibFrac-HS, and the radiologist suspected cortical distortion; RibFrac-HP did not output this lesion. 3D reconstruction ultimately confirmed that there was no rib fracture.

Fig. 6.

A candidate rib fracture was detected and output by RibFrac-HS, but the radiologist judged that it was not a rib fracture and removed it; RibFrac-HP did not output this lesion.

An identified rib fracture was regarded as a true positive if the center of its bounding box fell inside the bounding box of a rib fracture annotated in the reference standard (i.e. the ground truth).
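The center-in-box criterion can be sketched as follows. Boxes are assumed to be axis-aligned (z0, y0, x0, z1, y1, x1) tuples in voxel coordinates, and points on a box boundary count as inside; both are assumptions. Because the paper does not say how several detections falling inside one annotated fracture were counted, the sketch reports the number of fractures hit (for Eq. (1)) separately from the number of true-positive detections (for Eq. (2)).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float, float]  # (z0, y0, x0, z1, y1, x1)


def center_inside(det: Box, gt: Box) -> bool:
    """True if the center of the detected box lies inside the reference box."""
    center = [(det[i] + det[i + 3]) / 2 for i in range(3)]
    return all(gt[i] <= center[i] <= gt[i + 3] for i in range(3))


def match_detections(dets: List[Box], gts: List[Box]) -> Tuple[int, int, int]:
    """Return (fractures hit, true-positive detections, false-positive detections)."""
    tp_dets = sum(1 for d in dets if any(center_inside(d, g) for g in gts))
    fractures_hit = sum(1 for g in gts if any(center_inside(d, g) for d in dets))
    return fractures_hit, tp_dets, len(dets) - tp_dets
```

With this split, two detections inside the same fracture both count toward PPV, but the fracture is counted only once for sensitivity, which keeps Eq. (1) at or below 1.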

Note that in this diagnostic performance test, (i) there was an interval of at least one month (21) between steps (i) and (iii) to minimize recall bias; (ii) all radiologists were blinded to patient information; and (iii) the order of CT scans was randomized before each new round of reading. The time needed by each radiologist to reach a diagnosis for each patient was automatically recorded by Prolego (Image Processing System, Aitrox Technology Corporation Limited, Shanghai, China). Before this study, all participating radiologists had been trained in the operation of Prolego.
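The per-round randomization and the washout check described above might be implemented as follows. This is an illustrative sketch: seeding the shuffle per round and equating "one month" with 30 days are our assumptions, and the function names are hypothetical.

```python
import random
from datetime import date
from typing import List


def reading_order(case_ids: List[str], seed: int) -> List[str]:
    """Return a shuffled copy of the case list for one reading round."""
    order = list(case_ids)  # copy so the master list is left unchanged
    random.Random(seed).shuffle(order)  # independent, reproducible generator
    return order


def washout_ok(unaided_read: date, aided_read: date, min_days: int = 30) -> bool:
    """Check that the gap between steps (i) and (iii) is at least min_days."""
    return (aided_read - unaided_read).days >= min_days
```

Using a dedicated `random.Random` instance per round keeps each shuffle reproducible without affecting the global random state.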

Statistical analysis

To compare diagnostic performance with and without artificial intelligence (AI) assistance, two-sided paired-sample t-tests were applied. P < 0.05 was considered statistically significant. Statistical analyses were performed in R version 4.0.0 (The R Foundation for Statistical Computing, Vienna, Austria).

Results

Patient demographics

A total of 123 patients (82 men [67%], 41 women [33%]; median age = 54 years; interquartile range = 46–64 years), with 708 rib fractures in the reference standard, met the inclusion and exclusion criteria. Of the 708 fractures, 360 (51%) were completely displaced fractures, 170 (24%) were completely non-displaced fractures, 113 (16%) were incomplete fractures, 40 (6%) were cortical distortions, and the remaining 25 (3%) were callus formations. See Fig. 1 for the case selection strategy and the number of patients included, and Table 1 for the patient characteristics.

Table 1.

Patient demographics.

No. of patients	123
Age (years)	54 (46–64)
Sex
Male	82
Female	41
No. of fractures	708
Completely displaced fracture	360
Completely non-displaced fracture	170
Incomplete fracture	113
Cortical distortion	40
Callus formation	25

Values are given as n or median (IQR).

Diagnostic performance of radiologists with and without AI assistance

Table 2 presents the sensitivities, PPVs, and the reading time in two groups of radiologists in diagnosing rib fractures with and without the support of RibFrac-HS and RibFrac-HP tools.

Table 2.

Summary of the diagnostic performance and diagnosis time of junior and senior radiologists in identifying rib fractures in CT images with and without the aid of two software tools.

Group	Condition	Sensitivity	PPV	Diagnosis time (s)
AI software tool	RibFrac-HS	0.83 ± 0.25 (0.78–0.88)	0.15 ± 0.11 (0.13–0.17)	—
AI software tool	RibFrac-HP	0.79 ± 0.26 (0.74–0.84)	0.43 ± 0.24 (0.38–0.47)	—
Junior radiologists	Without AI	0.71 ± 0.25 (0.66–0.75)	0.68 ± 0.26 (0.63–0.73)	285 ± 88 (269–302)
	With RibFrac-HS	0.76 ± 0.26 (0.72–0.81)	0.63 ± 0.26 (0.58–0.67)	151 ± 64 (140–163)
	With RibFrac-HP	0.78 ± 0.24 (0.74–0.82)	0.63 ± 0.25 (0.58–0.67)	80 ± 33 (74–86)
Senior radiologists	Without AI	0.72 ± 0.24 (0.68–0.77)	0.71 ± 0.24 (0.67–0.75)	222 ± 68 (209–234)
	With RibFrac-HS	0.77 ± 0.24 (0.73–0.81)	0.65 ± 0.25 (0.61–0.70)	156 ± 52 (146–165)
	With RibFrac-HP	0.78 ± 0.25 (0.74–0.83)	0.66 ± 0.25 (0.61–0.70)	142 ± 42 (135–150)

Values are given as mean ± SD (95% CI).

AI, artificial intelligence; CI, confidence interval; CT, computed tomography; PPV, positive predictive value; SD, standard deviation.

On a per-patient basis, the mean diagnostic sensitivities of the junior radiologists’ group without AI assistance, with RibFrac-HS, and with RibFrac-HP were 0.71 (95% confidence interval (CI) = 0.66–0.75), 0.76 (95% CI = 0.72–0.81), and 0.78 (95% CI = 0.74–0.82), respectively. The corresponding sensitivities of the senior radiologists’ group were 0.72 (95% CI = 0.68–0.77), 0.77 (95% CI = 0.73–0.81), and 0.78 (95% CI = 0.74–0.83), respectively (Table 2).

Notably, neither the junior nor the senior radiologists, even with CAD assistance, detected more of the reference-standard rib fractures than the CAD tools alone, which had sensitivities of 0.83 (95% CI = 0.78–0.88) for RibFrac-HS and 0.79 (95% CI = 0.74–0.84) for RibFrac-HP. Compared with unaided reading, CAD assistance significantly decreased the PPV of both groups, regardless of their years of experience (Table 2).

Comparison of performance between RibFrac-HS and RibFrac-HP

Table 3 presents the differences in the sensitivity, PPV, and the reading time between the following: (i) the junior and the senior radiologists; (ii) RibFrac-HS and RibFrac-HP; (iii) human and machine (i.e. RibFrac-HP); (iv) humans with and without the support of the machine; (v) junior radiologists with RibFrac-HS and RibFrac-HP; and (vi) senior radiologists with RibFrac-HS and RibFrac-HP.

Table 3.

Differences in sensitivity, PPV, and diagnosis time between (1) the junior and the senior radiologists, (2) RibFrac-HS and RibFrac-HP, (3) human and machine, (4) senior radiologists with and without the support of the machine, and (5, 6) RibFrac-HS and RibFrac-HP as aids to the junior and the senior radiologists, respectively.

Comparison	Sensitivity, mean difference (95% CI)	P value	PPV, mean difference (95% CI)	P value	Diagnosis time, mean difference in s (95% CI)	P value
Junior vs. Senior	0.000 (-0.025–0.025)	0.995	0.020 (-0.008–0.047)	0.169	63 (49–78)	< 0.0001
RibFrac-HS vs. RibFrac-HP	0.051 (0.012–0.090)	0.011	0.273 (0.238–0.308)	< 0.0001	—	—
Human vs. Machine*	0.076 (0.045–0.106)	< 0.0001	0.281 (0.239–0.322)	< 0.0001	—	—
Human vs. Human + Machine†	0.058 (0.034–0.081)	< 0.0001	0.051 (0.023–0.079)	0.0004	79 (67–92)	< 0.0001
RibFrac-HS vs. RibFrac-HP (junior radiologists)	0.010 (-0.019–0.038)	0.5047	0.001 (-0.030–0.029)	0.9733	72 (61–82)	< 0.0001
RibFrac-HS vs. RibFrac-HP (senior radiologists)	0.008 (-0.017–0.034)	0.5075	0.008 (-0.020–0.036)	0.5720	13 (3–23)	0.0095

*Comparison between senior radiologists and software tool “RibFrac-HP.”

†Comparison between senior radiologists and senior radiologists with the aid of software tool “RibFrac-HP.”

AI, artificial intelligence; CI, confidence interval; PPV, positive predictive value.

With an additional module for false-positive suppression, RibFrac-HP significantly reduced the average number of false-positive detections per scan by a factor of 2.90 (95% CI = 1.88–3.93; P < 0.0001). As a result, its PPV was 0.273 higher than that of RibFrac-HS (95% CI = 0.238–0.308; P < 0.0001), at the cost of a limited decrease in sensitivity of 0.051 (95% CI = 0.012–0.090; P = 0.011) (Tables 2 and 3).

When the software tools were used to assist radiologists, no statistically significant differences in diagnostic sensitivity or PPV were observed between RibFrac-HS and RibFrac-HP. However, RibFrac-HP further decreased the reading time, especially for the junior group, whose time fell from 151 to 80 s (mean difference = 72 s, 95% CI = 61–82 s; P < 0.0001), whereas the senior group's time fell from 156 to 142 s (mean difference = 13 s, 95% CI = 3–23 s; P = 0.0095) (Tables 2 and 3).

Discussion

In the present study, we investigated the performance of two versions of a DL-based rib fracture detection tool, RibFrac-HS and RibFrac-HP, in assisting radiologists in diagnosing rib fractures. The sensitivities of the junior and senior radiologists’ groups were 70.9% and 72.2%, respectively. With DL support, radiologists detected 5.0%–6.5% more rib fractures (in absolute sensitivity) than without it. Furthermore, these tools significantly improved the diagnostic efficiency of radiologists (approximately 54% of diagnosis time was saved), irrespective of their experience.

Compared with a previously published report (22), the PPV of the developed algorithms (RibFrac-HS and RibFrac-HP) was lower. With the aim of improving these models and reducing false positives, the test set deliberately included multiple types of rib fractures, especially fractures at atypical sites that tend to be misdiagnosed in clinical practice. In particular, minor fractures near the sternum and thoracic spine accounted for a higher proportion of fractures in this study (49%), which could be the primary reason for the slightly lower PPV compared with previous studies.

Rib fracture is common in emergency patients with high-impact collision injuries. The sensitivities of the junior (0.71) and senior (0.72) radiologists did not differ significantly (P = 0.995; Table 3), in line with previously published studies (23). One possible explanation is that the difference in experience between the senior and junior radiologists enrolled in this study was relatively small. Another is that rib fractures are so common in trauma patients that junior radiologists also gain substantial experience in detecting them. Although the radiologists’ annotations still fell short of the reference standard, our data clearly show that both algorithms improved the radiologists’ sensitivity and reduced their diagnosis time.

Our results demonstrate that DL-based CAD tools can significantly improve the sensitivity of rib fracture detection on CT and reduce per-patient diagnosis time for radiologists, regardless of their years of experience, at the cost of a small but statistically significant reduction in PPV (Table 3). In our evaluation on a real-world dataset, RibFrac-HP as a standalone tool achieved a markedly higher PPV than RibFrac-HS and, when used as an aid, further decreased diagnosis time, which could be attributed to its false-positive suppression module. RibFrac-HS reduced the per-patient diagnosis time by 36.9% and 32.2% for the junior and senior radiologists, respectively, whereas RibFrac-HP reduced it by 68.8% and 39.2%, respectively.

The comparison of sensitivity and false-positive counts between RibFrac-HS and RibFrac-HP highlights the importance of the false-positive reduction module: it significantly reduced the number of false positives, thereby improving precision in fracture detection with only a limited loss of sensitivity.

To the best of our knowledge, RibFrac-HP is the first software tool specifically designed with an embedded U-Net-based classification network to eliminate false positives. In addition, the data quality and labeling standards of this study were high: we included only images obtained from high-quality, thin-slice (1 mm) CT scans, and the reference standard was annotated by two senior radiologists with 13 and 15 years of experience in chest CT diagnosis, respectively. For the algorithm test, six radiologists of different seniority formed two groups of three. In our view, the results were satisfactory.

The diagnostic sensitivities of both groups of radiologists were high and their diagnosis times reasonably short. More importantly, sensitivity improved further when the radiologists used CAD support. These results suggest that the primary aim of our study, improving diagnostic efficiency while reducing missed diagnoses and misdiagnoses in rib fracture detection, was achieved.

In the present study, the PPV of the software tools was lower than expected, and PPV decreased after human-machine combination. A likely explanation is that the algorithms flagged more candidate fractures, some of which were false positives that the radiologists then had to identify and eliminate manually to reach the correct diagnosis. We accepted this small loss of PPV in exchange for an overall increase in sensitivity and a significant decrease in diagnosis time. Our study therefore indicates that human-machine integration can substantially improve diagnostic performance, at least for rib fracture detection.

In this study, the reference standard contained 708 fractures. Most of the misdiagnosed and missed rib fractures were located near the junctions with the thoracic vertebrae and the sternum, and many cortical distortions and calluses were also misjudged. In addition, the lesions most often misdiagnosed as fractures included osteofibrous dysplasia, vascular sulci, bone islands, and other rib abnormalities. The complexity of the ribs and the surrounding bone structures undoubtedly requires an experienced radiologist to distinguish and classify such lesions. The average of 2.90 false positives per patient in this study was, in our view, largely unavoidable and within an acceptable range.

The present study has some limitations. First, this was a retrospective study in which all rib fracture samples were collected from a single center, and the total number of patients was relatively small. A multi-center study with a larger cohort should be conducted in the future to examine the efficacy of the software tools in a real-world clinical setting. Second, the radiologists in this study were not under the stress of an actual emergency, in which fatigue and stress usually lead to more missed diagnoses and misdiagnoses. Moreover, the radiologists focused only on rib fracture detection, which is never the case in clinical practice. Third, the reference standard was based on the experience of our senior radiologists, whereas different radiologists may apply different standards to the diagnosis of rib fractures, especially inconspicuous ones. However, most non-obvious rib fractures are not clinically significant, so radiologists pay more attention to obvious fractures when evaluating severe complications of trauma. Finally, the space of DL network designs was not fully explored. Our ongoing work aims to optimize the algorithm by improving accuracy and recall and reducing the probability of misdiagnosis, because missing a non-displaced rib fracture is less harmful than mistakenly diagnosing a benign or malignant bone condition as a fracture. To this end, future rib fracture CAD tools should be better able to distinguish fractures from other rib abnormalities and artifacts (e.g. motion artifacts) that can mimic them.

The performance of two versions of a software tool (RibFrac-HS and RibFrac-HP) in assisting radiologists with varying years of experience to identify post-traumatic rib fractures was systematically assessed. We found that: (i) both RibFrac-HS and RibFrac-HP improved the diagnostic sensitivity of radiologists, regardless of their years of experience (e.g. for the junior group by 0.058; 95% CI = 0.033–0.083; P < 0.0001); and (ii) owing to its false-positive suppression module, RibFrac-HP greatly outperformed RibFrac-HS in terms of PPV (mean difference = 0.273; 95% CI = 0.238–0.308; P < 0.0001) and further reduced the diagnosis time for rib fractures (e.g. for the senior group by 79 s; 95% CI = 67–92 s; P < 0.0001).

In conclusion, our findings suggest that the application of advanced AI-guided tools can significantly improve the sensitivity and efficiency of radiologists in diagnosing rib fractures based on CT images. RibFrac-HP may serve as an effective and generally practical tool to be used in assisting the clinical management of patients with chest trauma.

Footnotes

Funding

This work was supported by the National Key R&D Program of China [grant number 2016YFE01030003, grant number 2018YFC0116404]; Contract grant sponsor: Pyramid Talent Project of Shanghai Changzheng Hospital and National Natural Science Foundation of China (grant number 82001812).

ORCID iDs

Sun Hongbiao

Zhang Mingzi

Liu Shiyuan

References

1. Gage, Rivara, Wang, et al. The effect of epidural placement in patients after blunt thoracic trauma. J Trauma Acute Care Surg 2014;76:39–45; discussion 45–46.

2. Lin, Tung, et al. Morbidity, mortality, associated injuries, and management of traumatic rib fractures. J Chin Med Assoc 2016;79:329–334.

3. Flagel, Luchette, Reed, et al. Half-a-dozen ribs: the breakpoint for mortality. Surgery 2005;138:717–723; discussion 723–725.

4. Barnea, Kashtan, Skornick, et al. Isolated rib fractures in elderly patients: mortality and morbidity. Can J Surg 2002;45:43–46.

5. Huang, Liu, Wang, et al. Rectifying supporting regions with mixed and active supervision for rib fracture recognition. IEEE Trans Med Imaging 2020;39:3843–3854.

6. Meng, Wang, et al. A fully automated rib fracture detection system on chest CT images and its impact on radiologist performance. Skeletal Radiol 2021;50:1821–1828.

7. Shelmerdine, Langan, Hutchinson, et al. Chest radiographs versus CT for the detection of rib fractures in children (DRIFT): a diagnostic accuracy observational study. Lancet Child Adolesc Health 2018;2:802–811.

8. Wang, Dong, Pan, et al. Machine vision-based monitoring methodology for the fatigue cracks in U-Rib-to-deck weld seams. IEEE Access 2020;8:94204–94219.

9. Chai, Qian, et al. Development and evaluation of a deep learning algorithm for rib segmentation and fracture detection from multicenter chest CT images. Radiol Artif Intell 2021;3:e200248.

10. Tajbakhsh, Suzuki. Comparing two classes of end-to-end machine-learning models in lung nodule detection and classification: MTANNs vs. CNNs. Pattern Recognit 2017;63:476–486.

11. Goldenberg, Nir, Salcudean. A new era: artificial intelligence and machine learning in prostate cancer. Nat Rev Urol 2019;16:391–403.

12. Rodríguez-Ruiz, Krupinski, Mordang J-J, et al. Detection of breast cancer with mammography: effect of an artificial intelligence support system. Radiology 2019;290:305–314.

13. Chung, Han, Lee, et al. Automated detection and classification of the proximal humerus fracture by using deep learning algorithm. Acta Orthop 2018;89:468–473.

14. Badgeley, Zech, Oakden-Rayner, et al. Deep learning predicts hip fracture using confounding patient and healthcare variables. NPJ Digit Med 2019;2:1–10.

15. Gan, Lin, et al. Artificial intelligence detection of distal radius fractures: a comparison between the convolutional neural network and professional assessments. Acta Orthop 2019;90:394–400.

16. Zhou Q-Q, Wang, Tang, et al. Automatic detection and classification of rib fractures on thoracic CT using convolutional neural network: accuracy and feasibility. Korean J Radiol 2020;21:869.

17. Mauf, Held, Gascho, et al. Flat chest projection in the detection and visualization of rib fractures: a cross-sectional study comparing curved and multiplanar reformation of computed tomography images in different reader groups. Forensic Sci Int 2019;303:109942.

18. Ronneberger, Fischer, Brox. U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2015, pp. 234–241.

19. He, Zhang, Ren, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York, NY: IEEE, 2016, pp. 770–778.

20. Paszke, Gross, Massa, et al. PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems 32. CA: Neural Information Processing Systems (NIPS), 2019.

21. Zhang, Jia, et al. Improving rib fracture detection accuracy and reading efficiency with deep learning-based detection software: a clinical evaluation. Br J Radiol 2021;94:20200870.

22. Weikert, Noordtzij, Bremerich, et al. Assessment of a deep learning algorithm for the detection of rib fractures on whole-body trauma computed tomography. Korean J Radiol 2020;21:891–899.

23. Jin, Yang, Kuang, et al. Deep-learning-assisted detection and segmentation of rib fractures from CT scans: development and validation of FracNet. EBioMedicine 2020;62:103106.