Abstract
Background
The low subject contrast between cancerous and fibroglandular tissue could obscure breast abnormalities.
Purpose
To investigate radiologists’ performance for detection of breast cancer in low and high mammographic density (MD) when cases are digitally acquired.
Material and Methods
A test set of 60 digital mammography cases, of which 20 were cancerous, was examined by 17 radiologists. Mammograms were categorized as low (≤50%) or high (>50%) MD and rated for suspicion of malignancy using the Royal Australian and New Zealand College of Radiologists (RANZCR) classification system. Radiologist demographics including cases read per year, age, subspecialty, and years of reporting were recorded. Radiologist performance was analyzed by the following metrics: sensitivity, specificity, area under the receiver operating characteristic (ROC) curve (AUC), location sensitivity, and jackknife free-response ROC (JAFROC) figure of merit (FOM).
Results
Comparing high to low MD cases, radiologists showed a significantly higher sensitivity (P = 0.015), AUC (P = 0.003), location sensitivity (P = 0.002), and JAFROC FOM (P = 0.001). In high compared to low MD cases, radiologists with <1000 annual reads and radiologists with no mammographic subspecialty had significantly higher AUC, location sensitivity, and JAFROC FOM. Radiologists with ≥1000 annual reads and radiologists with mammography subspecialty demonstrated a significant increase in location sensitivity in high compared to low MD cases.
Conclusion
In this experimental situation, radiologists’ performance was higher when reading cases with high compared to low MD. Experienced radiologists were able to precisely localize lesions in breasts with higher MD. Further studies in unselected screening materials are needed to verify the results.
Introduction
Breast cancer is the most common cancer in women worldwide. It accounted for 25.4% of the total number of new cancer cases detected in 2018 and caused 15% of all female cancer deaths (1). In Jordan, a Middle Eastern country, breast cancer accounts for 37.3% of all reported female cancers, ranking first in all years from 1996 to 2014 (2–4). Approximately 30% of breast cancer cases were in women aged 40–49 years (3,4). According to the Jordan Breast Cancer Program, 70% of breast cancer cases present at advanced stages (III–IV) when prognoses are poor and mortality rates high (5). The gold standard for the detection of breast pathology, including breast cancer, is mammographic imaging (6).
One of the most challenging aspects of reading mammograms is the presence of mammographic density (MD). This term is used to define the portion of fibroglandular tissue that is displayed on a mammogram relative to the remaining parts of the breast, specifically adipose (fat) tissue (7). It has been shown that high MD is a significant independent risk factor of breast cancer (8,9), having been linked to a four- to sixfold increase in the lifetime risk of breast cancer (8,10). A previous study (9) has demonstrated that 28% of breast cancers were in patients with high MD breasts.
Breast carcinomas are difficult to detect in breasts with high MD due to the low subject contrast between cancerous and fibroglandular tissue. Increased MD is also adversely associated with radiologist performance (11). In breasts with high MD, sensitivity dropped to 30%–83% compared to 80%–98% in low MD cases (12–14) and specificity decreased from 93.5%–96.9% in low MD cases to 89.1% in high MD cases (14,15). Evidence suggests that women with high MD demonstrate a fivefold increased risk of missed cancers (13), higher recall rates (12), later stage, and greater tumor size (>15 mm) at the time of detection (16).
Advances in image acquisition, and in particular the wide adoption of full-field digital mammography (FFDM) in many countries, including Jordan, have positively impacted the detection of cancer. A number of studies (12,15,17,18) have found FFDM to be significantly better than film screen mammography at detecting breast cancer in young premenopausal and perimenopausal women and in women with increased MD. Understanding the impact that MD has on radiologists’ performance in the digital setting is important, especially because post-processing techniques are available to overcome the inherent limitations of film screen mammography and the masking effect of high MD. These include electronic magnification and the ability to change window width and level. Mousa et al. (19) reported that increased MD was associated with improved observer performance when examining FFDM images, suggesting that this may be linked to changed radiologist behavior when interacting with images containing higher MD. Hence, FFDM may lead to high overall sensitivity of any screening program (20).
To the best of our knowledge, no similar studies have investigated radiologists’ performance using FFDM acquisition in Jordan. A secondary aim of the present study was to determine whether reader experience and mammography subspecialty affect the detection of cancer in cases with high MD.
Material and Methods
Ethical approval was granted by the Jordan University of Science and Technology (Project Number 20170326), and written consent was obtained from each radiologist before their participation. Seventeen radiologists participated in the study; Table 1 presents their demographic details.
Demographics of participating radiologists
Image test set
This study used a mammographic test set from the BreastScreen Reader Assessment Strategy (BREAST) program (21). The test set included 60 screening mammographic cases; each consisted of standard bilateral cranio‐caudal (CC) and medio‐lateral oblique (MLO) views (n = 240). Twenty cases with 21 biopsy-proven cancer lesions consisted of a mixture of spiculated masses (n = 9), discrete masses (n = 4), calcifications (n = 2), and non-specific density (n = 6). The remaining 40 cases were malignancy-free cases confirmed after routine follow-up at two years. Cases with cancer lesions were independently interpreted by two accredited radiologists for BreastScreen New South Wales. Each cancer lesion had been missed by one radiologist during the original screening session. All cases were de-identified.
Each case in the test set was assigned an MD category by three expert radiologists with at least eight years of experience in breast imaging. MD was classified using the Royal Australian and New Zealand College of Radiologists (RANZCR) synoptic guidelines (22) as follows: RANZCR 1 = <25% glandular; RANZCR 2 = 25–50% glandular; RANZCR 3 = 51–75% glandular; and RANZCR 4 = >75% glandular. The test set included 14 cases in RANZCR 1, 16 cases in RANZCR 2, 23 cases in RANZCR 3, and seven cases in RANZCR 4.
Cases of RANZCR 1 and RANZCR 2 were defined as low MD (n = 30), while cases of RANZCR 3 and RANZCR 4 were defined as high MD (n = 30). The three radiologists were blinded to each other’s ratings. The majority rating (two of three readers) was used if discordance in MD classification occurred. These readers did not participate in the study as observers and they were blinded to all clinical and pathologic data. There were 11 lesions in low MD cases (five spiculated masses, one discrete mass, one calcification, and four non-specific densities) with a median diameter of 12.3 mm (interquartile range [IQR] = 5.5). In high MD cases, there were 10 lesions (four spiculated masses, three discrete masses, one calcification, and two non-specific densities) with a median diameter of 12 mm (IQR = 3.1).
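The density grouping described above can be illustrated with a minimal Python sketch. It is not the software used in the study. The function names are hypothetical, and returning `None` when all three readers disagree is an assumption, since the text does not say how a three-way split would have been resolved.

```python
from collections import Counter


def ranzcr_category(percent_glandular):
    """Map percent glandular tissue to a RANZCR density category (1-4)."""
    if percent_glandular < 25:
        return 1
    if percent_glandular <= 50:
        return 2
    if percent_glandular <= 75:
        return 3
    return 4


def majority_category(ratings):
    """Return the category chosen by at least two of the three readers.

    Assumption: a three-way disagreement returns None, because the text
    does not describe how such a case would be handled.
    """
    category, votes = Counter(ratings).most_common(1)[0]
    return category if votes >= 2 else None


def density_group(category):
    """RANZCR 1-2 are low MD (<=50%); RANZCR 3-4 are high MD (>50%)."""
    return "low" if category <= 2 else "high"


# Case counts per RANZCR category as reported for the test set.
case_counts = {1: 14, 2: 16, 3: 23, 4: 7}
group_totals = Counter()
for category, count in case_counts.items():
    group_totals[density_group(category)] += count
```

With the reported case counts, `group_totals` reproduces the 30 low MD and 30 high MD cases stated in the text.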
Experimental protocol
A test set environment was prepared to simulate screen reading in a screening setting. A study by Soh et al. (23) reported a considerable level of agreement between laboratory and screening settings. Readings were performed in a 180-m² reporting room with light gray and brown matte painted walls to minimize specular reflection. The ambient lighting was adjusted to 20 lux at the position of the radiologist, according to the recommended standards for maintaining high radiologist performance (24). Ambient room lighting was measured by a calibrated photometer (Model Konica Minolta CL-200, Ramsey, NJ, USA). All cases were displayed using 8-MP RADI FORCE 850 (EIZO, Ishikawa, Japan) monitors. They were calibrated in compliance with the Digital Imaging and Communications in Medicine (DICOM) standard (25), with a pixel depth of 10 bits, minimum luminance of 1.3 cd/m², maximum luminance of 500 cd/m², and contrast ratio of 1450:1.
The participating radiologists were able to freely use standard post-processing techniques, including window width and level adjustment, zooming, and panning, with no time limit set. The BREAST software platform allows the reader to identify the location of the finding and provide a grade to each lesion in accordance with RANZCR classification (22) (1 = no significant abnormality, 2 = benign, 3 = equivocal, 4 = suspicious, 5 = malignant). The radiologists were blinded to the prevalence of cancerous lesions in the test set. A demonstration of the software and the available post-processing tools was given to all the radiologists before reading.
Data analysis
Radiologists’ performance was assessed by the following metrics: sensitivity to measure the proportion of correctly defined cancer cases; specificity to measure the proportion of correctly defined malignancy-free cases; and area under the receiver operating characteristic (ROC) curve (AUC). Location sensitivity was measured as the proportion of true cancer lesions that were correctly localized, defined as a mark within a radius of 50 pixels (0.825 cm) from the center of the lesion (26), and that had been graded 3, 4, or 5. In addition, the jackknife free-response receiver operating characteristic (JAFROC) figure of merit (FOM) was used to examine radiologists’ performance (27).
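The case-level and lesion-level metrics defined above can be sketched in a few lines of Python. This illustrates the definitions only and is not the BREAST platform's implementation. Three points are assumptions: the data layout and function names are hypothetical; the text states the positive threshold (grades 3, 4, or 5) only for location sensitivity, and the sketch applies it to sensitivity and specificity as well; and AUC is computed with the empirical (trapezoidal) estimate, whereas the study does not state its estimation method. The JAFROC FOM is not covered.

```python
import math

PIXEL_RADIUS = 50             # acceptance radius from the text (0.825 cm)
POSITIVE_GRADES = {3, 4, 5}   # assumed positive threshold for all metrics


def sensitivity(cancer_case_grades):
    """Proportion of cancer cases given a positive grade."""
    hits = sum(g in POSITIVE_GRADES for g in cancer_case_grades)
    return hits / len(cancer_case_grades)


def specificity(normal_case_grades):
    """Proportion of malignancy-free cases given a negative grade."""
    hits = sum(g not in POSITIVE_GRADES for g in normal_case_grades)
    return hits / len(normal_case_grades)


def empirical_auc(cancer_case_grades, normal_case_grades):
    """Probability that a cancer case is graded higher than a normal case.

    Ties count as one half, which equals the trapezoidal area under the
    empirical ROC curve.
    """
    score = 0.0
    for c in cancer_case_grades:
        for n in normal_case_grades:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cancer_case_grades) * len(normal_case_grades))


def location_sensitivity(lesions, marks):
    """Proportion of lesions with a positive mark within PIXEL_RADIUS.

    lesions: list of (case_id, x, y) lesion centers in pixels.
    marks:   list of (case_id, x, y, grade) reader marks in pixels.
    """
    found = 0
    for case_id, lx, ly in lesions:
        for mark_case, mx, my, grade in marks:
            close = math.hypot(mx - lx, my - ly) <= PIXEL_RADIUS
            if mark_case == case_id and grade in POSITIVE_GRADES and close:
                found += 1
                break  # count each lesion at most once
    return found / len(lesions)
```

The stated equivalence of 50 pixels and 0.825 cm implies a pixel pitch of 0.165 mm. A real analysis would also match marks to lesions per mammographic view, which this simplified sketch ignores.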
Comparison of radiologists’ performance between high and low MD cases was analyzed. Radiologists were grouped into four categories: those who read ≥1000 mammograms per year; those who read <1000 mammograms per year; those with breast subspecialty; and those without breast subspecialty. Radiologist performance was compared using a paired t-test. Statistical significance was determined at P ≤ 0.05.
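A minimal sketch of the paired comparison, using only the Python standard library, is given below. It assumes that the pairing is by radiologist, so that each reader contributes one value of a metric for high MD cases and one for low MD cases. The example values are hypothetical and are not study data.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(high_md_values, low_md_values):
    """Return the paired t statistic and its degrees of freedom.

    Each position in the two lists is assumed to belong to the same
    radiologist (one metric value per density group).
    """
    if len(high_md_values) != len(low_md_values):
        raise ValueError("each radiologist needs one value per density group")
    diffs = [h - l for h, l in zip(high_md_values, low_md_values)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample SD of differences
    return mean(diffs) / standard_error, n - 1


# Hypothetical AUC values for four readers, for illustration only.
t_value, dof = paired_t_statistic([0.80, 0.75, 0.90, 0.85],
                                  [0.70, 0.70, 0.80, 0.80])
```

The two-sided P value is then obtained from the t distribution with `dof` degrees of freedom; a library routine such as SciPy's `scipy.stats.ttest_rel` returns the statistic and the P value together.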
Results
The performance of individual radiologists is shown in Table 2.
The performance of each radiologist.
AUC, area under the receiver operating characteristic curve; FOM, figure of merit; JAFROC, jack-knife free-response receiver operator characteristics.
Overall performance
Radiologists revealed a significant increase in sensitivity (P = 0.015), AUC (P = 0.003), location sensitivity (P = 0.002), and JAFROC FOM (P = 0.001) when reading high MD compared to low MD cases (Table 3).
Impact of mammographic density for all radiologists.
Values are given as mean ± SD.
AUC, area under receiver operating characteristic curve; JAFROC, jack-knife free-response receiver operator characteristics; FOM, figure of merit.
Performance of radiologists by cases read per year
Radiologists who read <1000 mammograms per year demonstrated a significant increase in AUC (P = 0.007), location sensitivity (P = 0.026), and JAFROC FOM (P = 0.004) for cases with higher MD. Radiologists who read ≥1000 mammograms per year demonstrated significantly increased location sensitivity (P = 0.045) in high MD (72.2%) compared to low MD (51.4%) cases (Table 4).
Impact of mammographic density on radiologists’ performance with different number of mammograms read per year.
Values are given as mean ± SD.
AUC, the area under receiver operating characteristic curve; JAFROC, jack-knife free-response receiver operator characteristics; FOM, figure of merit.
Performance of radiologists without or with mammography subspecialty
Radiologists with no mammographic subspecialty demonstrated a significant increase in AUC (P = 0.02), location sensitivity (P = 0.036), and JAFROC FOM (P = 0.003) for high MD compared with low MD cases. For radiologists with a mammography subspecialty, a significant increase in location sensitivity in high MD compared to low MD cases was also noted (P = 0.027) (Table 5).
Impact of mammographic density on radiologists’ performance without or with mammography subspecialty.
Values are given as mean ± SD.
AUC, the area under receiver operating characteristic curve; JAFROC, jack-knife free-response receiver operator characteristics; FOM, figure of merit.
Performance of radiologists by years reading mammograms
Radiologists with ≤5 years of experience demonstrated a significant increase in AUC (P = 0.038), location sensitivity (P = 0.018), and JAFROC FOM (P = 0.015) for high compared with low MD mammograms. Radiologists with >5 years of experience reading mammograms demonstrated no significant differences in their performance between low and high MD cases (Table 6).
Impact of mammographic density on radiologists’ performance according to years reading mammograms.
Values are given as mean ± SD.
AUC, the area under receiver operating characteristic curve; JAFROC, jack-knife free-response receiver operator characteristics; FOM, figure of merit.
Discussion
In Jordan, 30% of breast cancers are diagnosed in young women with high MD (4). The ability to accurately detect cancer in breasts with high MD is vitally important in this population. The aim of the present study was to investigate radiologists’ performance in both low MD and high MD cases to determine the impact of MD on radiologists’ ability to detect breast cancer. Further analysis was undertaken to explore whether radiologists with different rates of annual reads and subspecialty perform differently when interpreting cases with different MD.
Historically, it is well established that increased MD decreases radiologists’ performance (12–15). However, with the advent of FFDM, detection rates of cancer increased and lower rates of recall were noted (28), especially for women with dense breasts (17,29). Digital technology allows for the optimization of each image with post-processing tools, thereby enhancing the potential visualization of lesions, making them more distinguishable from normal tissue (20).
A recent study (19) focusing on radiologists’ ability to detect lesions overlying dense tissue lends support to our results. The study by Mousa et al. (19) reported that when lesions overlaid the dense tissue, radiologists’ performance increased in high compared to low MD cases with reference to location sensitivity (P = 0.03) and JAFROC FOM (P = 0.05). Ciatto et al. (18) reported higher false-positive (FP) rates in low MD cases, with 109 FP marks compared to 32 FPs for the high MD cases. A study of 23,423 FFDM examinations from the French National Breast Cancer Screening Program found that detection rates of cancer were also higher in breasts with high MD (1.20%) compared to low MD cases (0.59%) (29). The current study is consistent with this finding, demonstrating that radiologist performance increases in high compared to low MD cases in reference to sensitivity (P = 0.015), AUC (P = 0.003), location sensitivity (P = 0.002), and JAFROC FOM (P = 0.001).
A potential explanation for these findings is that increased MD is positively associated with a known four- to sixfold increase in the risk of breast cancer (8,9). Therefore, when a high MD case is displayed, a radiologist’s visual attention is heightened to search, possibly in more detail using post-processing, for any possibly obscured lesions within the fibroglandular tissue. Mousa et al. (30) reported that areas of increased density attracted radiologists’ visual attention, resulting in longer dwell time with a significantly greater number of fixations and greater lesion detection. In addition, the radiologists in the present study were free to use the post-processing tools, which directly assist in determining the nature of the lesion and in differentiating it from the surrounding tissue.
The present study also found that radiologists with <1000 annual reads demonstrated a significant increase in AUC (P = 0.007), location sensitivity (P = 0.026), and JAFROC FOM (P = 0.004) for the higher MD mammograms compared with lower MD mammograms. For radiologists with ≥1000 annual reads, a significant increase was only noted for location sensitivity (P = 0.045) in high (72.2%) compared to low (51.4%) MD cases. Furthermore, for radiologists with a mammography subspecialty, the only significant finding was in location sensitivity (Table 5). Radiologists who are not subspecialized in mammography demonstrated significant increases in AUC (P = 0.02), location sensitivity (P = 0.036), and JAFROC FOM (P = 0.003) for the higher compared to lower MD cases. This suggests that, apart from location sensitivity, the performance of expert breast radiologists was not significantly affected by increased MD; these radiologists appear to have developed an efficient capability to discern true-positive lesions whether the MD is high or low. Performance by expert breast radiologists may have been influenced by the use of post-processing tools. It has been reported that expert breast radiologists have a better understanding than less experienced readers of how to use different manipulation tools and of their impact on the mammographic image (31).
Our findings are further supported by a study by Weigel et al. (20) investigating the impact of MD on the sensitivity of a population-based digital mammography screening program. In that study, the readers were expert radiologists with at least 5000 reads per year and a mammography subspecialty. The authors concluded that the sensitivity of the program was confirmed in both low and high MD cases. A significant decrease in sensitivity was noted only where MD was extremely dense (RANZCR 4) and this was in only 7% of the studied cases.
This study also showed that radiologists with ≤5 years of experience demonstrated a significant increase in AUC (P = 0.038), location sensitivity (P = 0.018), and JAFROC FOM (P = 0.015) for high compared with low MD mammograms. However, radiologists with >5 years of experience reading mammograms demonstrated no significant differences in their performance between low and high MD cases. This may be due to the fact that radiologists’ experience is better measured according to the number of mammographic cases read per year (32). Studies have shown that the annual mammographic case load is the key factor that may affect radiologists’ performance (32–34). It has been reported that radiologists with <1000 annual reads have reduced performance despite increased years of reading mammograms (32).
For routine mammography screening, images are displayed according to processing algorithms specifically designed for screening purposes. These algorithms provide the optimized display of different image features (35,36). Typically, this display is not changed and in high-volume screening settings there is minimum usage of other post-processing tools. However, algorithms that enhance contrast and sharpness and alter the displayed dynamic range to improve low contrast lesion detection in high MD cases are widely available (37). The present study placed no restrictions on the time taken or on the use of post-processing tools, which may have aided lesion detection in images with increased MD. An important implication for radiologist practice that could be derived from the current work is that high MD cases should be given increased reporting time and that the use of post-processing tools should be encouraged for these cases. In addition, more training for less frequent screen readers on how to select and use the appropriate post-processing tools will enhance the benefits of FFDM technology with respect to the detection of cancer.
As a limitation, the present study was conducted using a small test set of 60 cases and only seven cases were classified as RANZCR 4. Therefore, it was not possible to analyze the impact of all four RANZCR density categories separately due to the substantial time that would be required to interpret additional cases. In the present study, time was an important factor in making it possible for radiologists to participate. In addition, prior cases for comparison were not included, which could influence our results as the availability of previous images for comparison is known to increase radiologist performance (38). Another limitation is the relatively low number of each cancer type. Future research should include a greater number of lesions and various cancer types to minimize the bias of any results and to provide additional insights into the effect of lesion conspicuity on detection in high versus low MD cases. The time taken to report each case and the use of post-processing tools were not recorded during the present study. Further research is required to investigate the effect of increased MD on reading time and use of post-processing tools.
In conclusion, the present study provides an important insight into the impact of high and low MD on radiologists’ performance in Jordan. The results of this study suggest that the known relationship between increased MD and risk of breast cancer positively impacts radiologist performance regardless of experience, with detection in high MD images superior to that in low MD images. For expert breast radiologists, this relationship made them more accurate in detecting and localizing lesions. Our results have potential implications for varying the current approaches to high-volume screen reading, including increasing the time spent reading high MD images and utilizing post-processing tools. The results of the present study support the view that, in the digital era, detection rates for breast cancer have increased for women with high MD breasts.
Footnotes
Acknowledgments
The authors thank BreastScreen Reader Assessment Strategy (BREAST) for providing a platform. The authors also thank the Jordan Breast Cancer Program and Fujifilm for their support towards the completion of this study. The authors thank Saeda Majed and Salsabeel Jawarneh for assisting in the research.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received the following financial support for the research, authorship, and/or publication of this article: This work was supported by Jordan University of Science and Technology (Grant no. 20170326).
