Abstract
OBJECTIVE:
To investigate the application value of a deep learning (DL)-based computer-aided diagnosis (CAD) system for rib fractures during night shifts in a radiology department.
METHODS:
Chest computed tomography (CT) images and structured reports were retrospectively selected from the picture archiving and communication system (PACS) for 2,332 blunt chest trauma patients. For all CT examinations, two on-duty radiologists (radiologists I and II) completed reports using three different reading patterns: P1 = independent reading during the day shift; P2 = independent reading during the night shift; and P3 = reading with the aid of a CAD system as the concurrent reader during the night shift. The locations and types of rib fractures were documented for each reading. The reference standard for rib fractures was established by an expert group. Sensitivity and false positives per scan (FPS) were calculated and compared among P1, P2, and P3.
RESULTS:
The reference standard verified 6,443 rib fractures in the 2,332 patients. The sensitivity of both radiologists decreased significantly in P2 compared to that in P1 (both p < 0.017). The sensitivities of both radiologists showed no statistically significant difference between P3 and P1 (both p > 0.017). The FPS of radiologist I increased significantly in P2 compared to P1 (p < 0.017) and showed no statistically significant difference between P3 and P1 (p > 0.017). The FPS of radiologist II showed no statistically significant difference among the three reading patterns (p > 0.05).
CONCLUSIONS:
DL-based CAD systems can be integrated into the workflow of radiology departments during the night shift to improve diagnostic performance for rib fractures on CT.
Introduction
Rib fractures are the most frequent type of chest injury in patients with blunt trauma. Minor fractures have only a slight negative impact on health, but as the number of fractures increases and complications worsen, they can become life-threatening [1]. Therefore, an accurate diagnosis of rib fractures is important for the patient’s prognosis. Multidetector spiral thin-slice computed tomography (CT) has become the most sensitive examination method for rib fracture detection [2]. However, this technique is not without its disadvantages. First, detecting rib fractures on numerous thin-slice CT images requires side-by-side and rib-by-rib comparisons, which are tedious and time-consuming [2, 3]. Second, the night shift has certain peculiarities when compared to the day shift, such as greater psychological stress, circadian rhythm disturbances, and increased fatigue [4–6]. Circadian rhythm disorders and fatigue may result in more missed diagnoses and/or misdiagnoses by radiologists working night shifts than by those working day shifts [4–8]. Missed or misdiagnosed rib fractures are likely to lead to medical disputes [9, 10]. Finally, the number of radiologists has not increased proportionally to the dramatic increase in radiological examination requests [11], and radiologists in the majority of large healthcare centers are required to provide official written reports at night [12]. These factors can overwhelm night-shift radiologists, who rely solely on manual reading to diagnose rib fractures.
Therefore, research on the diagnostic performance of after-hours radiologists, and on its causes, effects, and mitigation strategies, has become a hot topic [4–8, 13–15]. The use of computer-aided diagnosis (CAD) systems to diagnose rib fractures has been reported in recent years [16–23] because of their potential to improve diagnostic performance [24, 25]. For instance, Weikert et al. [16] developed a deep learning (DL) algorithm to identify rib fractures in 511 patients. They found that the sensitivity of the CAD system was 87.4% and the false positives per scan (FPS) were 0.16 on a per-examination basis. They indicated its potential value in clinical applications as a tool to screen for fractures and help radiologists reduce missed diagnoses. Jin et al. [17] trained a DL-based model for detecting and segmenting rib fractures in a study that included 900 patients. In the test cohort, the sensitivity of the CAD system was 92.9%, the FPS was 5.27, and the Dice score for segmentation was 71.5%. They also found that human readers with the aid of the CAD system could achieve higher detection sensitivity than human-only diagnosis. Meng et al. [18] used a DL model to detect rib fractures on a test dataset that included 300 chest CT images and found that two junior radiologists with the aid of the model could achieve a higher precision rate, recall rate, and F1-score than the radiologists alone.
Nevertheless, these prior retrospective application studies on DL-based CAD systems for rib fracture detection were not specifically designed for the night shift and did not include a group for human reading, either independently or with the aid of a CAD system, during the night shift. As a result, these studies could not realistically simulate the particular conditions of the night shift and did not accurately reflect the readings performed by night shift radiologists, either independently or with the aid of the CAD system. The current study design focuses on the unique factors present during night shift reading, and a DL-based CAD system for rib fractures was used during the night shift of a radiology department. By analyzing the diagnostic performance of three different reading patterns, we aimed to assess any differences in the diagnostic performance of radiologists reading independently during the night versus the day shift. We also examined the utility of CAD systems for rib fracture diagnosis during night shifts.
Dataset and methods
Dataset and classification criteria
Images and structured reports were selected from the picture archiving and communication system (PACS) for 2,362 patients with blunt chest trauma who underwent chest CT scans at the Jingzhou Hospital Affiliated to Yangtze University from January to December 2021. All readings and structured reports were completed by two on-duty radiologists on the day of the examination using three different reading patterns. The inclusion criteria were as follows: blunt chest trauma patients who had been injured within the last month and had callus formation or healed fractures present on follow-up CT examinations. The exclusion criteria were internal rib fixation (n = 8), surgical treatment (n = 9), and severe motion or respiratory artifacts in the images (n = 13). The final analysis included 2,332 patients (1,405 males and 927 females) with a mean age of 51.4 ± 18.2 years (range, 18–83 years). The institutional review board approved the study and waived the need for informed consent. The flowchart of the study is summarized in Fig. 1.

Study flow chart. P1, P2, and P3 represent the reading patterns used during the day, night, and night shifts with CAD, respectively.
In this study, rib fractures were classified into three major types: fresh, healing, and old fractures. Fresh fractures were further classified into complete and incomplete fractures according to the classification function of the CAD system and structured report requirements. Fresh fractures were defined as those having a sharp margin, lacking periosteal reaction or callus formation, and imaged approximately three weeks post-trauma [26]. A healing fracture, intermediate between a fresh and old fracture, was defined as one being imaged between the blurring of the fracture margin and the formation of the post-trauma callus [27]. Bone healing in rib fractures typically takes approximately 12 weeks [28]. Fractures that were imaged approximately three months post-trauma, exhibited a mature callus, bone remodeling, and an invisible fracture line [27], and had no change in subsequent scans, were defined as old fractures. In the current study, a small number of patients were found to have old fractures, most of which were caused by previous trauma rather than current trauma, according to their medical histories. Complete rib fractures show disruption of both the outer and inner cortices. Incomplete fractures involve either the outer or inner cortex, as in the buckle rib fracture, which has a smooth outer cortex and a kink-like inner cortex fracture [29, 30].
Images were acquired using a GE Optima CT660 scanner (GE Healthcare, Milwaukee, WI, USA). The scanning range was from the entrance of the thorax to the end of the 12th rib, and it was performed according to a standard protocol. The scanning parameters were standardized as follows: tube voltage, 120 kV; tube current, real-time adjustment according to the patient’s body shape; reconstruction layer thickness, 1.25 mm; bone algorithm reconstruction; and matrix size, 512×512.
Computer-aided diagnosis (CAD) system
The CAD system for detecting rib fractures (Care.Ai Release 2021.1, Deep Wise Healthcare, Beijing, China) uses a DL algorithm and primarily consists of fracture detection, rib counting, and fracture-type classification models (Fig. 2). The fracture detection model uses a cascade structure consisting of two models: an anomaly screening model and a false-positive elimination model. First, the anomaly screening model uses the CenterNet model [31] to determine whether different spatial locations are anomalous or non-anomalous regions. The false-positive elimination model then determines whether each anomalous region is a fracture or noise. The rib-counting algorithm is also divided into two stages: rib segmentation and counting. The rib segmentation model uses U-Net [32] combined with a bidirectional long short-term memory (LSTM) structure: U-Net segments the rib region on each layer, and the bidirectional LSTM models the 3D connections between layers. The two components are trained jointly to improve segmentation accuracy. During the counting process, a 3D distance transformation is applied to the segmentation results, and the local geometric structure is used to reduce the rib area. Subsequently, adherent ribs are separated, and disconnected sections are combined using a dynamic programming approach that optimizes the counting cost function. In addition, two fracture-type classification models are used to categorize the rib fractures. First, a model distinguishes between fresh, healing, and old fractures. Second, a sub-model determines whether a fracture in the fresh fracture group is complete or incomplete. The classification models use a 34-layer residual network (ResNet-34) [33] pre-trained on the large ImageNet classification dataset, a procedure frequently referred to as “transfer learning.” The model’s parameters were fixed after training on the ImageNet dataset.
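The cascade structure of the fracture detection model can be illustrated with a minimal Python sketch. This is not the vendor's implementation: the `Candidate` class, both function names, the score fields, and both thresholds are hypothetical stand-ins for the CenterNet screening stage and the false-positive elimination model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A candidate region; all field values here are hypothetical."""
    rib_label: str         # e.g. "L8" for the left 8th rib
    anomaly_score: float   # stand-in for the anomaly screening model output
    fracture_score: float  # stand-in for the false-positive elimination output


def screen_anomalies(candidates: List[Candidate], threshold: float = 0.3) -> List[Candidate]:
    """Stage 1: keep regions flagged as anomalous (assumed permissive threshold)."""
    return [c for c in candidates if c.anomaly_score >= threshold]


def eliminate_false_positives(candidates: List[Candidate], threshold: float = 0.5) -> List[Candidate]:
    """Stage 2: keep only regions judged to be fractures rather than noise."""
    return [c for c in candidates if c.fracture_score >= threshold]


def detect_fractures(candidates: List[Candidate]) -> List[Candidate]:
    """Run the two stages in sequence, as in a cascade detector."""
    return eliminate_false_positives(screen_anomalies(candidates))


regions = [
    Candidate("L8", anomaly_score=0.9, fracture_score=0.8),   # passes both stages
    Candidate("R3", anomaly_score=0.6, fracture_score=0.2),   # rejected as noise in stage 2
    Candidate("R10", anomaly_score=0.1, fracture_score=0.9),  # never reaches stage 2
]
detected = [c.rib_label for c in detect_fractures(regions)]
print(detected)
```

The point of the design is that the second stage only sees regions the first stage let through, so the first stage can favor sensitivity while the second controls false positives.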

A simple schematic diagram of algorithmic architecture. CenterNet for rib fracture detection. U-Net and LSTM for rib counting. Model 1: ResNet-34 is used to differentiate between fresh fractures, healing fractures, and old fractures. Model 2: ResNet-34 is used for differentiating a complete fracture from an incomplete fracture.
The DL algorithm was trained, validated, and tested at an 8:1:1 ratio using 10,847 consecutive CT examinations performed between August 1, 2010, and October 31, 2019, at five hospitals. The model was trained using eight NVIDIA GeForce RTX 2080 Ti graphics processing units. On the internal testing dataset, the per-finding sensitivity for rib fractures was 92.7% at 1.25 FPS. At the per-finding level, the classification sensitivities of the model for distinguishing fresh, healing, and old fractures and of the sub-model for distinguishing complete and incomplete fractures were 84.4% and 85.1%, respectively.
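For orientation, an 8:1:1 partition of 10,847 examinations can be sketched as follows. The exact subset sizes, the rounding rule, and whether the split was random or grouped by patient or hospital are not reported, so the shuffling, the seed, and the function name `split_8_1_1` are assumptions made only for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_8_1_1(items: Sequence[int], seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle and split into train/validation/test at roughly 8:1:1.

    Assumption: sizes are floored for train and validation, and the
    remainder goes to the test set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = (n * 8) // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_8_1_1(range(10847))
print(len(train), len(val), len(test))
```

In practice, a split grouped by patient would avoid placing scans of the same person in both the training and test sets; the sketch above does not model that.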
To realistically simulate the night shift setting and evaluate the accuracy of readings performed during the night shift, all diagnostic results for the three reading patterns in this retrospective study were obtained from PACS structured reports. All readings and structured reports were completed by two on-duty radiologists (each with 7-8 years of experience in thoracic CT diagnosis) on the day the examinations were performed. Radiologists I and II evaluated rib fractures using the following three patterns (P1, P2, and P3): P1, independent reading during the day shift; P2, independent reading during the night shift; and P3, reading with the aid of the CAD system acting as a concurrent reader during the night shift.
The radiologists used the axial view, curved-planar reconstruction (CPR), maximum intensity projection (MIP), multi-planar reconstruction (MPR), and volume rendering (VR) techniques to assess rib fractures in all three reading patterns. Neither of the radiologists had knowledge of the CAD diagnostic performance. In P3, the locations and types of rib fractures detected by CAD (Fig. 3) were provided to the night shift radiologist. The location and type of each rib fracture were documented by the radiologist in the structured report. The day shift duty time was from 8 a.m. to 5 p.m., and the night shift duty time was from 5 p.m. to 8 a.m.
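The shift definition above can be expressed as a small helper for labeling report timestamps. This is a sketch only: the function name `shift_for` is hypothetical, and treating 8 a.m. as the first minute of the day shift and 5 p.m. as the first minute of the night shift is an assumption, since the handling of the boundary times is not stated.

```python
from datetime import time

DAY_START = time(8, 0)   # 8 a.m., start of the day shift
DAY_END = time(17, 0)    # 5 p.m., start of the night shift


def shift_for(report_time: time) -> str:
    """Return "day" for 08:00-16:59 and "night" otherwise (assumed boundaries)."""
    return "day" if DAY_START <= report_time < DAY_END else "night"


print(shift_for(time(10, 30)), shift_for(time(23, 15)))
```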

(A) The CAD system automatically marked the locations of fractures with rectangular boxes in the axial, sagittal, and CPR views and provided the type of each fracture for the radiologist’s reference. (B) Shows a partial enlargement of the location of the fracture.
From January to June 2022, two senior radiologists (each with 15-16 years of experience in thoracic CT diagnosis) independently read the initial and follow-up CT images and established the reference standard, including the locations and types of rib fractures. If the conclusions were inconsistent, a thoracic surgeon was invited to participate in the expert group. This study established the results of the final discussion by an expert group as the reference standard [34].
Statistical analysis
All rib fractures diagnosed during the three reading patterns were compared to the reference standard, and the results were tallied by an independent data specialist who was not involved in the reading. A true positive was defined as a reported fracture whose location and type matched the reference standard. A false positive was defined as a reported fracture whose location or type did not match the reference standard. A false negative was defined as a fracture verified by the reference standard but considered normal by the radiologist. Because a rib has countless normal locations, the number of true negatives could not be defined. Sensitivity was calculated as the number of true positives divided by the total number of rib fractures verified by the reference standard. FPS was calculated as the number of false positives divided by the total number of cases [19].
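The two metrics defined above can be sketched in Python as follows. The representation of a finding as a `(patient_id, rib_location, fracture_type)` tuple, the function name `score_reading`, and the toy data are assumptions for illustration; the study's actual tallying was done by a data specialist and is not described at this level of detail.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: (patient_id, rib_location, fracture_type)
Finding = Tuple[str, str, str]


def score_reading(reported: Set[Finding], reference: Set[Finding], n_scans: int) -> Dict[str, float]:
    """Compute sensitivity and false positives per scan (FPS).

    A reported finding is a true positive only if both its location and its
    type match the reference standard; any other reported finding is a
    false positive.
    """
    if not reference:
        raise ValueError("reference standard contains no fractures")
    if n_scans <= 0:
        raise ValueError("n_scans must be positive")
    true_positives = reported & reference
    false_positives = reported - reference
    return {
        "true_positives": len(true_positives),
        "false_positives": len(false_positives),
        "sensitivity": len(true_positives) / len(reference),
        "fps": len(false_positives) / n_scans,
    }


reference = {
    ("p1", "L8", "fresh-incomplete"),
    ("p1", "L9", "fresh-complete"),
    ("p2", "R10", "old"),
    ("p2", "R4", "healing"),
}
reported = {
    ("p1", "L8", "fresh-incomplete"),  # location and type match
    ("p1", "L9", "healing"),           # right location, wrong type: false positive
    ("p2", "R10", "old"),              # location and type match
}
result = score_reading(reported, reference, n_scans=2)
print(result)
```

Note that no specificity is computed, which mirrors the observation that true negatives cannot be counted for this task.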
Data were processed using R software (version 3.6.0, R Foundation for Statistical Computing, Vienna, Austria). A χ2 test was used to compare the sensitivity and FPS of radiologists I and II among the three reading patterns. A p-value < 0.05 was considered statistically significant. If a significant difference was found among the three reading patterns, pairwise (two-by-two) comparisons were performed using the χ2 test with Bonferroni correction, for which the p-value cutoff was 0.017 (0.05 divided by three comparisons).
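The pairwise comparison can be illustrated with a minimal Python sketch of a Pearson χ2 test on a 2×2 table, using only the standard library. The study itself used R; whether a continuity correction was applied is not stated, so this sketch assumes none. The function name `chi2_2x2` and the counts in the example are hypothetical and are not taken from the study's tables.

```python
import math
from typing import Tuple

ALPHA = 0.05
N_PAIRWISE = 3                         # P1 vs P2, P1 vs P3, P2 vs P3
BONFERRONI_ALPHA = ALPHA / N_PAIRWISE  # about 0.0167, reported as 0.017


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (1 degree of freedom) for the table [[a, b], [c, d]].

    No continuity correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: (detected, missed) fractures under two reading patterns
chi2, p_value = chi2_2x2(900, 100, 820, 180)
significant = p_value < BONFERRONI_ALPHA
print(round(chi2, 3), significant)
```

Because three pairwise comparisons are possible among P1, P2, and P3, each one is judged against 0.05/3 rather than 0.05.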
Results
Number of true fractures in three reading patterns
According to the reference standard, 6,443 true fractures (2,332 patients) were identified. Radiologist I diagnosed 1,365 true fractures (515 patients) at P1, 920 true fractures (301 patients) at P2, and 1,017 true fractures (355 patients) at P3. Radiologist II diagnosed 1,314 true fractures (492 patients) at P1, 921 true fractures (337 patients) at P2, and 906 true fractures (332 patients) at P3 (Table 1).
Number of true fractures or patients diagnosed using the three reading patterns
P1, P2, and P3 represent the reading patterns used during the day, night, and night shifts with CAD, respectively.
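As a quick arithmetic check, the per-radiologist, per-pattern counts reported above can be summed and compared with the study totals. All numbers below are taken from the text; the dictionary layout is only an illustrative choice. That the sums match the totals suggests each examination was read once, under a single pattern by a single radiologist, although this is an interpretation rather than something stated explicitly.

```python
# Counts reported in the text, keyed by radiologist and reading pattern
fractures = {
    "I": {"P1": 1365, "P2": 920, "P3": 1017},
    "II": {"P1": 1314, "P2": 921, "P3": 906},
}
patients = {
    "I": {"P1": 515, "P2": 301, "P3": 355},
    "II": {"P1": 492, "P2": 337, "P3": 332},
}

total_fractures = sum(sum(by_pattern.values()) for by_pattern in fractures.values())
total_patients = sum(sum(by_pattern.values()) for by_pattern in patients.values())
print(total_fractures, total_patients)
```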
For each radiologist, the difference in sensitivity of the three reading patterns was statistically significant (both p < 0.05). The sensitivity of the two radiologists significantly decreased by 6.2–7.9% in P2 compared with P1 (both p < 0.017). There was no statistical difference in the sensitivities of the two radiologists between P3 and P1 (both p > 0.017) (Table 2).
Comparisons of sensitivity among the three reading patterns
A two-by-two comparison showed that the sensitivity of both radiologists significantly decreased in P2 compared with P1 (χ2 = 21.336, p < 0.001, χ2 = 17.362, p < 0.001). There was no statistical difference in the sensitivities of the two radiologists between P3 and P1 (χ2 = 0.582, p = 0.446, χ2 = 1.786, p = 0.181). P1, P2, and P3 represent the reading patterns used during the day, night, and night shifts with CAD, respectively.
When comparing the three reading patterns, the difference in sensitivity for fresh fractures was statistically significant for each radiologist (both p < 0.05). The sensitivity of both radiologists for detecting fresh fractures decreased significantly, by 9.1–12.7%, in P2 compared to P1 (both p < 0.017); the sensitivity for incomplete fracture detection by radiologist I was lowered by 23.2% (p < 0.017), while the sensitivity for complete fracture detection by radiologist II was reduced by 15.1% (p < 0.017) (Fig. 4). The sensitivity of the two radiologists for detecting fresh fractures was not statistically different between P3 and P1 (both p > 0.017). For each radiologist, the detection sensitivity for healing and old fractures was not significantly different among the three reading patterns (all p > 0.05) (Table 3).

(A) An incomplete fracture of the left 8th rib (arrow), verified by the reference standard, on CT imaging of a 67-year-old male was missed by radiologist I when reading independently during a night shift. (B) A complete fracture of the right 10th rib (arrow) on CT imaging of a 52-year-old female was overlooked by radiologist II when reading independently during a night shift.
Comparisons of sensitivity for each type of fracture among the three reading patterns
A two-by-two comparison showed that the sensitivity of both radiologists for detecting fresh fractures decreased significantly, by 9.1–12.7%, in P2 compared with P1 (χ2 = 13.036, p < 0.001, χ2 = 30.665, p < 0.001); the sensitivity for incomplete fracture detection by radiologist I was lowered by 23.2% (χ2 = 13.899, p < 0.001), and the sensitivity for complete fracture detection by radiologist II was reduced by 15.1% (χ2 = 40.735, p < 0.001). The sensitivity of both radiologists for detecting fresh fractures showed no statistically significant difference between P3 and P1 (χ2 = 1.283, p = 0.257, χ2 = 1.748, p = 0.186). P1, P2, and P3 represent the reading patterns used during the day, night, and night shifts with CAD, respectively.
The difference in FPS among the three reading patterns was statistically significant for radiologist I (p < 0.05). The FPS of radiologist I was significantly higher in P2 than in P1 (p < 0.017) and showed no statistically significant difference between P3 and P1 (p > 0.017). The FPS of radiologist II showed no statistically significant differences among the three reading patterns (p > 0.05) (Table 4).
Comparisons of FPS among the three reading patterns
A two-by-two comparison revealed that FPS was significantly higher in P2 for radiologist I than in P1 (χ2 = 12.088, p = 0.001). The FPS of radiologist I showed no statistical difference between P3 and P1 (χ2 = 0.125, p = 0.723). P1, P2, and P3 represent the reading patterns used during the day, night, and night shifts with CAD, respectively.
Discussion
The use of DL-based CAD systems has been extensively studied for lung nodule detection and benign or malignant predictions [35, 36]. It has also been applied to the diagnosis of rib fractures in recent years. Meng et al. [18] used a DL model to detect and classify rib fractures on CT images and found that radiologists performed better when assisted by the DL model than when they worked alone. Kaiume et al. [19] reported that the sensitivity of a CAD system based on a deep convolutional neural network for rib fractures was 64.5%, which was greater than that of two interns reading images independently. In addition, the literature on missed fracture diagnoses indicates that they occur more frequently during the night shift than during the day shift. Hallas and Ellingsen [37] retrospectively analyzed the clinical and imaging data of 5,879 trauma patients in the emergency department and found a diurnal variation in the error rate of fracture diagnosis, with a significant peak occurring from 8 p.m. to 2 a.m. Tublin et al. [12] reported that working non-traditional hours affected a radiologist’s professional productivity, health, and social life. However, these prior retrospective application studies on DL-based CAD systems for rib fracture detection were not specifically designed for the night shift and did not include a group for human reading, either independently or with the aid of the CAD system, during the night shift. As a result, these application studies do not accurately reflect the diagnostic performance of radiologists during night shifts. In our study, a DL-based CAD system was applied specifically to the night shift of a radiology department to explore the application value of the rib fracture CAD system.
In terms of sensitivity, our results showed that the fracture detection sensitivity of both radiologists was 6.2–7.9% lower when reading independently during the night shift than during the day shift, which is roughly consistent with the report of Patel et al. [14]. This may be related to circadian rhythm disorders and fatigue experienced by radiologists working the night shift [4–8]. In addition, it has been reported that radiologists can achieve improved diagnostic performance in CT rib fracture diagnosis with the aid of a CAD system. For example, Zhang et al. [20] reported that using a DL-based CAD system as a concurrent reader can improve the sensitivity of rib fracture detection compared with the radiologist’s independent reading. Jin et al. [17] reported that human experts with human-computer collaboration exhibited higher detection sensitivity compared to human- or computer-independent reading. In our study, the CAD system was applied to the night shift, and the results showed that radiologists reading with the aid of the CAD system during the night shift exhibited increased sensitivity for rib fractures, matching that of independent reading during the day shift. This suggests that reading with the aid of a CAD system during the night shift can reduce missed diagnoses of rib fractures.
In terms of sensitivity for each type of fracture, our results showed that the sensitivity for fresh fractures was 9.1–12.7% lower for both radiologists when reading independently during the night shift than during the day shift; the sensitivity for incomplete fracture detection by radiologist I was lowered by 23.2%, while the sensitivity for complete fracture detection by radiologist II was reduced by 15.1%. The difference in sensitivity among the three reading patterns for healing and old fractures was not statistically significant for either radiologist. This suggests that the decrease in sensitivity observed when radiologists read independently during the night shift is mainly associated with a decrease in sensitivity for fresh fractures, including complete and incomplete fractures. In recent years, there have been reports on the sensitivity of CAD systems in rib fracture classification. Zhou et al. [21] classified rib fractures into three types (fresh, healing, and old fractures) and found that the CNN model was superior for diagnosing fresh and healing fractures, as demonstrated by five radiologists (each with 7–9 years of CT diagnosis experience), and took less time to diagnose. Castro-Zunti et al. [22] reported two types of models: a classic model for acute, old (healed), and normal, and a binary model for acute and other classes. It was found that the radiologist with nine years of experience had higher sensitivity for acute fractures, while the best classical model showed higher sensitivity for old fractures and normal ribs. Yang et al. [23] classified rib fractures into fresh fractures without dislocation, fresh fractures with dislocation, old fractures with callus formation, and distortion or high suspicion of healing after old fractures. They found that CAD showed a higher detection performance for all fracture types in comparison to radiologist-only detection (except for senior radiologists), and that two classification models could distinguish fresh fractures from old fractures (87.63% accuracy) and determine whether fresh fractures were dislocated (95.22% accuracy). Inspired by these reports, our study applied the CAD system to the night shift and classified rib fractures into fresh, healing, and old fractures, and further classified fresh fractures into complete and incomplete fractures, according to the classification function of the CAD system and structured report requirements. Our study showed that both radiologists, when reading with the aid of the CAD system as the concurrent reader during the night shift, improved their sensitivity for fresh fractures (including complete and incomplete fractures) to the same level as that of day shift reading. This suggests that reading with the aid of the CAD system during the night shift can reduce missed diagnoses of fresh fractures.
In terms of the FPS, our study showed that the FPS of radiologist I was higher when reading independently during the night shift than during the day shift, while the FPS of radiologist II showed no difference among the three reading patterns, which was generally consistent with the findings of previous literature [4–8]. Recently, a few studies have been conducted on the FPS of CAD systems for rib fractures. For example, Castro-Zunti et al. [22] reported that the DL model had fewer false-positive diagnoses for rib fractures. Zhang et al. [20] reported that the FPS of one radiologist increased with the aid of CAD, whereas a radiologist without CAD exhibited a decrease. In the current study, the CAD system was applied to the night shift, and the results showed that, for one radiologist, reading with the aid of the CAD system decreased the FPS for rib fractures to the level seen during independent day shift reading, while no change was observed for the other radiologist. This suggests that reading with the aid of a CAD system during the night shift may reduce misdiagnoses of rib fractures.
The current study has some limitations. First, the diagnostic performance at each period of the night shift could not be analyzed. Second, the study included only two radiologists from a single center. Further multicenter studies with more radiologists could provide a more accurate depiction of diagnostic performance. Third, because the PACS was unable to record the time of the rib fracture reading in the CT, the reading time of the rib fracture could not be analyzed in this retrospective study. A multicenter prospective study with more readers should be conducted in the future.
In conclusion, the diagnostic performance for rib fractures was lower for radiologists who read independently during the night shift than during the day shift. During the night shift, radiologists reading with a CAD system as the concurrent reader can improve sensitivity without increasing FPS and achieve the same diagnostic performance as independent reading performed during the day shift.
Footnotes
Acknowledgments
We would like to thank Editage (www.editage.cn) for the English language editing.
Conflicts of interest
The authors have no potential conflicts of interest to disclose.
