Abstract
Background
Generation of multiplanar reformation (MPR) images has become automatic on most modern computed tomography (CT) scanners, potentially increasing the workload of the reporting radiologists. It is not always clear if this increases diagnostic performance in all clinical tasks.
Purpose
To assess detection performance using only coronal multiplanar reformations (MPR) when triaging patients for lung malignancies with CT compared to images in three orthogonal planes, and to compare the performance of novice and experienced readers.
Material and Methods
Retrospective study of 63 patients with suspicion of lung cancer, scanned on 64-slice multidetector computed tomography (MDCT) with images reconstructed in three planes. Coronal images were presented to four readers, two novice and two experienced. Readers decided whether the patients were suspicious for malignant disease, and indicated their confidence on a five-point scale. Sensitivity and specificity on a per-patient basis were calculated with regard to a reference standard of histological diagnosis, and compared with the original report using McNemar’s test. Receiver operating characteristic (ROC) curves were plotted to compare the performance of the four readers, using the area under the curve (AUC) as figure of merit.
Results
No statistically significant difference of sensitivity and specificity was found for any of the readers when compared to the original reports. ROC analysis yielded AUCs in the range of 0.92–0.93 for all readers with no significant difference. Inter-rater agreement was substantial (kappa = 0.72).
Conclusion
Sensitivity and specificity were comparable to diagnosis using images in three planes. No significant difference was found between experienced and novice readers.
Introduction
The use of multidetector computed tomography (MDCT) has meant a dramatic increase in the number of images produced per patient, due to the effective decoupling of scan time and slice thickness with wider detector collimations. The resulting isotropic voxels have also made possible the generation of diagnostic images in different image orientations using multiplanar reformations (MPR). Not surprisingly, reconstructed image quality has improved with subsequent generations of scanner technology (1), with wider collimations, smaller slice thickness, and faster rotation speeds. In newer scanners, MPRs are automatically generated and transmitted to PACS with little operator effort. A total of 500–600 images or more per patient is not unusual (2), depending on reconstructed slice thickness. At the Hospital of South West Jutland all CT examinations are routinely reconstructed in three orthogonal planes.
It has been shown that radiologists’ search efficiency is reduced at the end of a long workday, and an increased number of images could be expected to increase physical as well as mental fatigue (3). Besides placing extra strain on radiologists, the increased number of images might lead to interpretation errors due to satisfaction of search (4) and to reduced productivity. In a review of radiological errors (5), CT accounted for a disproportionately large share, and it was hypothesized that the large number of images and incidental findings (anatomical “noise”) were to blame.
CT plays an important role when triaging for lung cancer, and in Denmark it is performed in a standardized “package” within defined time limits. Lung cancer is one of the most common forms of cancer, and the Hospital of Southwest Jutland performs around 400 examinations per year in this setting. If follow-up scans are included, the number of examinations is around 1300. For these “package” patients, radiologists are required to make an on-the-spot decision as to whether to refer the patients for further diagnostic workup. If these decisions can be made confidently from fewer images, there is a possibility of reducing the radiologists’ workload.
As the diameter of the patient is usually smallest in the anterior–posterior axis, coverage of a given volume can be achieved with fewest images in the coronal plane.
Choi et al. (6) demonstrated that image quality on reconstructed coronal MPRs of porcine autopsy lung specimens was comparable to true coronal scanning. Thus, one could expect images of a given volume to have the same information content regardless of reconstructed image orientation.
The concept of using coronal MPR for primary interpretation of lung nodules has been tested by Kwan et al. (7), who concluded that both sensitivity and reading time were adversely affected. That study was performed using a 4-slice scanner and non-isotropic voxels. Others found good detection rates for solitary lung nodules and lymph nodes, respectively (8,9), using 16-slice technology. For other indications such as pulmonary embolism, results using a 64-slice scanner were satisfactory (10). In the lung cancer “package” all potentially malignant lesions, regardless of size and number, are grounds for further investigation, which may include follow-up CT or biopsies. To our knowledge, a study investigating lung cancer detection using only coronal MPRs compared to reconstructions in three orthogonal planes has not been published.
Search efficiency of the radiologist is dependent on factors such as experience and training (11). Experienced radiologists benefit from more efficient search strategies and better decision-making (12).
The purpose of this study was thus to examine diagnostic accuracy for detection of malignant lung lesions in a population with suspicion of lung malignancy, using only coronal MPRs from a 64-slice MDCT scanner. This was done in the context of triaging patients for further diagnostic workup. A secondary aim was to test for any differences in detection performance between expert and novice readers.
Material and Methods
This was a retrospective study using data from the PACS (IDS7™, Sectra AB, Linköping, Sweden) system at the Hospital of Southwest Jutland. Permission to process patient data was obtained from the Danish Data Protection Agency (journal no. 2013-41-1791). Patients aged over 18 years, referred for CT of the thorax and upper abdomen on suspicion of lung cancer in the period April to July 2011 were included. These patients were all part of the lung cancer “package”, and had either suspicious findings on previous chest radiographs or clinical suspicion of lung cancer. Patients with a previous record of cancer were excluded, as well as patients where image quality was described as insufficient. Reports in which the decision to proceed with lung cancer diagnostics was missing or unclear were also excluded.
All examinations were performed on a Toshiba Aquilion™ 64-slice CT-scanner (Toshiba Medical Systems Europe B.V., Zoetermeer, The Netherlands) according to the standard protocol. Image acquisition was done in a single breath hold, at a collimation of 64 × 0.5 mm, a pitch of 0.898, 120 kVp, and using mA-modulation (SURExposure3D™) with a standard deviation noise parameter of 12. Scanning was done from lung apices to, and including, liver and kidneys. In the standard examination images were automatically reconstructed using SUREview™ at 3 mm slice thickness and 3 mm increment in the axial, coronal, and sagittal planes. These images were reconstructed from a volume of 0.5 mm slices with 0.3 mm increment. A medium-smooth reconstruction kernel (FC03) was used for all images. Intravenous contrast (Ultravist™ 370 mgI/mL, Bayer Healthcare AG, Berlin, Germany) was administered, unless contraindicated, with a dosage of 1.2 mL per kg of body weight. Bolus tracking technique was used.
Original reporting was done immediately after scanning by one of five radiologists, each with at least 5 years of experience in CT. The report resulted in a decision as to whether to proceed with lung cancer tests at the department of pulmonary medicine. Positive patients were examined with follow-up MDCT, PET/CT, and/or biopsy.
Coronal datasets for all patients were exported in DICOM-format from the PACS system. The program ViewDex (Sahlgrenska University Hospital, Göteborg, Sweden) (13) was then used to set up an observer performance study. The software facilitates scrolling, window/leveling, and pan/zoom in true DICOM format. Furthermore the images were automatically anonymized and randomized by the program, while answers were automatically recorded. Mediastinal (350/40 HU) and lung (2000/–400 HU) window settings were used with the possibility of manual adjustment.
Two residents (R1 and R2) with 1 year of experience and two senior radiologists (R3 and R4) with more than 10 years of experience each in CT were asked to review these images, focusing only on the possibility of malignancy in the thorax and not to report extra-thoracic findings. Readers were told that all patients were part of the lung cancer program, but otherwise blinded to results of the primary examination, referral information, previous radiographs, and the results of the other readers.
Images were viewed on medical grade monitors fulfilling the standards set by the department of medical physics for CT viewing in the Region of Southern Denmark, i.e. minimum 2 Megapixel resolution and calibrated according to the DICOM grayscale standard.
The four readers were asked to answer “yes” or “no” as to whether lung malignancy was suspected, using the same triaging criteria as in the clinical setting.
Their level of confidence in the presence of malignancy was also recorded on a 5-point scale (1, no malignancy; 2, probably no malignancy; 3, malignancy not excluded; 4, probable malignancy; 5, likely malignancy).
Patients were considered disease-positive according to the reference standard, if they had a histologically verified lung malignancy within 1 year of the primary CT examination date. This was recorded from the pathology database system of the Region of Southern Denmark. The largest diameter of the largest nodule in the positive patient group was measured on a workstation (Syngo.Via™, Siemens AG Healthcare, Erlangen, Germany) using electronic calipers.
Data analysis was performed using STATA 12IC (StataCorp LP, College Station TX, USA). For the binary answer, sensitivity and specificity with corresponding 95% confidence intervals were calculated for both the primary interpretation and each of the four readers, with regard to the reference standard. McNemar’s test for paired data with binary outcome was used to test for statistically significant differences.
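As an illustration of the per-patient analysis described above, the following Python sketch computes sensitivity, specificity, and an exact McNemar test from paired binary calls. It is not the Stata code used in the study: the function names, the toy data, and the choice of a Wilson score interval for the 95% confidence interval are assumptions made for the example.

```python
from math import comb, sqrt


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion (interval type is an assumption)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def sens_spec(calls, truth):
    """Per-patient sensitivity and specificity against the reference standard.

    Assumes at least one disease-positive and one disease-negative patient.
    """
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    fn = sum(1 for c, t in zip(calls, truth) if not c and t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    fp = sum(1 for c, t in zip(calls, truth) if c and not t)
    return tp / (tp + fn), tn / (tn + fp)


def mcnemar_exact(calls_a, calls_b):
    """Two-sided exact McNemar p-value based on the discordant pairs only."""
    b = sum(1 for x, y in zip(calls_a, calls_b) if x and not y)
    c = sum(1 for x, y in zip(calls_a, calls_b) if not x and y)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy data (1 = suspicious / disease-positive), not study data.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]      # reference standard
original = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # three-plane report
reader = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]     # coronal-only reading

sensitivity, specificity = sens_spec(reader, truth)
sens_ci = wilson_ci(sum(truth), len(truth))  # example call on a raw proportion
p_value = mcnemar_exact(reader, original)
print(sensitivity, specificity, p_value)
```

In practice the comparison of sensitivity would be restricted to disease-positive patients and that of specificity to disease-negative patients; the sketch applies the test to all paired calls for brevity.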
Student’s t-test was used to test for differences in mean number of images between original and coronal-only datasets.
To assess any differences in observer performance between the four readers, the 5-point confidence score was used to plot empirical receiver operating characteristic (ROC) curves with respect to the reference standard, and the area under the curve (AUC) was used as figure of merit. Stata’s ROCCOMP command was used to test for statistically significant differences in AUC.
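The empirical AUC derived from an ordinal confidence score can be sketched as follows. The example uses the pairwise (Mann-Whitney) formulation, which equals the trapezoidal area under the empirical ROC curve, and also lists the operating point at each cut of the 5-point scale. The function names and the toy scores are hypothetical, and the sketch does not reproduce the between-reader AUC comparison performed in Stata.

```python
def empirical_auc(scores, truth):
    """Empirical AUC: P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    if not pos or not neg:
        raise ValueError("need both disease-positive and disease-negative cases")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(scores, truth, cuts=(1, 2, 3, 4, 5)):
    """(cut, false-positive rate, true-positive rate) for 'positive if score >= cut'."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    points = []
    for cut in cuts:
        tpr = sum(s >= cut for s in pos) / len(pos)
        fpr = sum(s >= cut for s in neg) / len(neg)
        points.append((cut, fpr, tpr))
    return points


# Hypothetical confidence scores (1-5) for ten patients, not study data.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [5, 4, 3, 5, 1, 2, 3, 1, 2, 4]

auc = empirical_auc(scores, truth)
points = roc_points(scores, truth)
print(auc)
for cut, fpr, tpr in points:
    print(cut, fpr, tpr)
```

The operating point at cut 3 (“malignancy not excluded”) corresponds to the binary yes/no decision when readers refer every patient scored 3 or higher.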
Inter-rater agreement for all five image interpretations was calculated using Fleiss’ kappa statistic for frequencies. Pairwise comparison between each reader and the reference standard was calculated using Cohen’s kappa.
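For readers unfamiliar with the two agreement statistics, a minimal Python sketch of their standard textbook definitions is given below. The function names and the toy ratings are hypothetical, and confidence intervals for kappa are not computed here.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings is a list of per-patient lists, one call per rater."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({r for row in ratings for r in row})
    # counts[i][j]: number of raters assigning patient i to category j
    counts = [[row.count(c) for c in categories] for row in ratings]
    p_i = [
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa undefined: every rating is in a single category")
    return (p_bar - p_e) / (1 - p_e)


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two sets of paired categorical calls."""
    n = len(calls_a)
    categories = set(calls_a) | set(calls_b)
    p_o = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    p_e = sum(
        (calls_a.count(c) / n) * (calls_b.count(c) / n) for c in categories
    )
    if p_e == 1.0:
        raise ValueError("kappa undefined: both raters use a single category")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: four patients, five interpretations each (1 = suspicious).
ratings = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
overall = fleiss_kappa(ratings)
pairwise = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
print(overall, pairwise)
```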
Results
Seventy-one reports were read and 63 patients (31 women, 32 men; age range, 39–83 years; mean age, 64 years) were included in the study. The total number of images per patient was in the range of 454–720 (mean, 524) when reconstructed in coronal, sagittal, and axial orientations. Coronal images alone were in the range of 64–122 images per patient (mean, 91). This represented a statistically significant difference with a P value < 0.001.
In the original reports, using images in all three planes, a decision was taken to refer 35 patients (56%) for further examination for lung cancer. Twenty were positive for malignant disease according to the reference standard. Of these, 10 patients were diagnosed with adenocarcinoma, four with planocellular carcinoma, two with metastases from colorectal cancer, and one each with mesothelioma, lymphoma, sarcoma, and metastases from malignant melanoma. Lesion size was in the range of 3.3–80.6 mm (mean, 8.2 mm).
The histological diagnosis “no sign of malignancy” was recorded for three patients. There was one rheumatic nodule, one rounded atelectasis, and one biopsy positive for sarcoidosis. Nine patients had lesions that were either stationary or diminished on follow-up CT and had no record of biopsies. None of the patients declared disease-negative in the original report had a history of biopsy in the period studied, and all were still alive at the time of writing.
Table 1. Sensitivity, specificity, and agreement with the reference standard. Parentheses contain 95% confidence intervals.
The empirical ROC curves with corresponding AUC for all readers can be seen in Fig. 1. No significant difference in AUC could be found (P = 0.95 – null hypothesis: all AUCs are equal).
Fig. 1. ROC plot of readers 1–4, with corresponding AUC.
For all readers, the sensitivity calculated at cut point 3 was similar to that reported in Table 1.
Fleiss’ kappa yielded a value of 0.69 (confidence interval, 0.65–0.74) between the four readers and the original report, indicating substantial inter-rater agreement (14).
Discussion
The results suggest that use of coronal images only for triage of suspected lung cancer patients may be feasible. No significant difference in sensitivity and specificity was found when compared to the original report.
From a clinical perspective it is important to note that there was almost total agreement on the disease-positive cases, with only one malignant lesion being missed by a single reader. As the purpose of this stage of the “lung cancer package” is to exclude patients who do not need further testing this result is clinically significant.
Specificity was rather low for all four readers, but this was also true of the primary reading, and so is likely not a consequence of the number of images viewed. Rather, it could be the effect demonstrated by Swensson et al. (15), where false-positive rates increased when the readers were told to look specifically for nodules. One of the challenges when using MDCT for lung cancer detection is still the indeterminate lung nodule (16), and the low specificity is probably a reflection of this issue. Also, the high clinical suspicion index of the patients, leading to very conservative decision strategies, might result in an increased number of false-positive findings (15). Judging from the confidence intervals of kappa values when compared to the reference standard, no difference in agreement with the reference standard could be demonstrated, and the overall agreement between the five image interpretations suggests equal robustness of either viewing paradigm.
Both ROC analysis and specificity/sensitivity calculations failed to find any difference in performance between novice and experienced readers. Interestingly the only missed lesion was by one of the experienced radiologists (R4), but the reason for this is unknown. As such there is no basis for suggesting that using coronal-only representation favors either group. Other studies of focused search tasks also showed that performance was independent of expertise although they were not performed on MDCT images (17). In a clinical setting a vast number of potential diagnoses and their presentation must be considered, and in such a task experienced readers would be expected to perform better than inexperienced ones, given their larger knowledge and expertise.
To evaluate any productivity gains, a prospective study recording total reporting time spent, as well as diagnostic performance in localization of lesions is proposed. Such a study could also include measurement of intra-observer agreement, asking radiologists (experienced and novice) to review their own studies using coronal images only after an appropriate time period to minimize memory bias.
If the results of the present study can be confirmed, it is possible that detection may be performed satisfactorily in one plane, with use of the other planes reserved for characterization and measurement of findings, where MPR has been shown to improve reader confidence (18,19). If such a strategy is employed at the Hospital of South West Jutland, this translates into about 100,000 fewer images viewed per year for lung cancer “package” patients alone when a 50% detection rate is assumed. As follow-up scans are performed using the same acquisition protocol, one could expect the same effect in that patient group, but this remains a subject for further study.
Lin et al. found reductions in reading time of 35–40% when using solely coronal MPR images for detecting urinary stones, compared to axial images alone (20). The ratio between the numbers of images in the two reading sessions was approximately 3, as opposed to 6 in our study. One cannot assume a simple linear relationship between number of images and reading time across different anatomical regions, but a ballpark estimate would be a 50% time saving in the lung cancer detection task described above.
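The image counts reported in the Results can be turned into a rough workload estimate, as in the Python sketch below. The mean image counts and the yearly number of examinations are taken from the text; treating the “50% detection rate” as the fraction of examinations read from coronal images alone is an assumption made for this example, as are the function and constant names. Under that reading the estimate comes to roughly 87,000 images per year, the same order of magnitude as the figure quoted above, and it scales linearly with the assumed fraction.

```python
MEAN_IMAGES_THREE_PLANES = 524  # mean images per patient, all three planes
MEAN_IMAGES_CORONAL = 91        # mean images per patient, coronal only
EXAMS_PER_YEAR = 400            # lung cancer "package" examinations per year
CORONAL_ONLY_FRACTION = 0.5     # assumption: share of exams read in one plane


def image_ratio(all_planes, coronal):
    """Ratio of images in the three-plane dataset to the coronal-only dataset."""
    return all_planes / coronal


def images_saved_per_year(exams, fraction, all_planes, coronal):
    """Images not viewed per year if a fraction of exams is read coronal-only."""
    return exams * fraction * (all_planes - coronal)


ratio = image_ratio(MEAN_IMAGES_THREE_PLANES, MEAN_IMAGES_CORONAL)
saved = images_saved_per_year(
    EXAMS_PER_YEAR,
    CORONAL_ONLY_FRACTION,
    MEAN_IMAGES_THREE_PLANES,
    MEAN_IMAGES_CORONAL,
)
print(round(ratio, 2), int(saved))
```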
Our study has some inherent limitations, the most obvious being the relatively small number of patients. All patients had a high pre-test probability of disease, which was known to all readers because the patients were referred to the lung cancer “package”. Not all patients received intravenous contrast agents, mostly because of an increased risk of contrast-induced nephropathy. We chose to include these patients anyway under the assumption that any degradation in diagnostic quality would be equal for both the original and secondary readings. Five non-lung cancer patients were also defined as positive, but this was deemed appropriate as potentially fatal malignancies were detected with consequences for the patients’ further treatment.
The average size of nodules was relatively large, which limits the generalizability of this study to, for example, a screening population. None of the patients in this study were judged disease suspicious based on extra-thoracic findings in the original report. The design of the confidence scoring could be criticized: all readers used the same operating point (“malignancy not excluded”) for deciding to recommend further testing. It has been suggested to use a continuous scale giving a wider scope for possible answers and eliminating verbal clues (21). Whether such an approach would reveal any differences in AUC is speculative. Some studies found no significant difference in this regard (22). Objections could also be raised to the assessment of inter-rater agreement with the original report, treating the original report as a unique reader, when it was in fact five different readers with their own inter-rater variability.
Objections could be raised to the 1-year follow-up time window, which was seen as a practical compromise. However, it is possible that aggressive small-cell lung cancer can appear in between a negative initial CT and any histology and/or follow-up CT. Likewise the timeframe may be too short to exclude malignancy in subsolid nodules.
In conclusion, use of coronal MPRs alone in detection of lung malignancy appears feasible. This approach presented the reader with approximately one-sixth of the number of images, when compared to reconstruction in three orthogonal planes. Novice and experienced readers performed equally well. Further studies could help quantify any possible productivity gains, while including smaller nodule size, a broader search task and intra-observer agreement. At the very least it is hoped that our study could stimulate discussion about how to navigate in a rising sea of imaging information.
Footnotes
Conflict of interest
None declared.
Funding
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
