Abstract
Background
The O-RADS scoring has been proposed to standardize the reporting of adnexal lesions using magnetic resonance imaging (MRI).
Purpose
To assess intra- and inter-observer agreement of the O-RADS scoring using non-dynamic MRI and its agreement with pathologic diagnosis, and to provide the pitfalls in the scoring based on discordant ratings.
Material and Methods
Adnexal lesions that were diagnosed using non-dynamic MRI at two centers were scored using O-RADS. Intra- and inter-observer agreements were assessed using kappa statistics. Cross-tabulations were made for intra- and inter-observer ratings and for O-RADS scores and pathological findings.
Results
Intra- and inter-observer agreements were assessed for 404 lesions in 339 patients who were admitted to center 1. Intra-observer agreement was almost perfect (97.8%, kappa = 0.963) and inter-observer agreement was substantial (83.2%, kappa = 0.730). The combined data from center 1 and center 2 included 496 patients; of them, 295 (59.5%) underwent surgery. There was no borderline or malignant pathology for the lesions with O-RADS 1 or 2. Of those with an O-RADS score of 3, 3 (4.1%) lesions were borderline and none were malignant. The O-RADS scoring in discriminating borderline/malignant lesions from benign lesions was outstanding (area under the ROC curve 0.950, 95% CI = 0.923–0.971). The sensitivity, specificity, and positive and negative predictive values of O-RADS 4/5 lesions for borderline/malignant lesions were 96.2%, 87.1%, 72.8%, and 98.4%, respectively.
Conclusion
The O-RADS scoring using non-dynamic MRI is a reproducible method and has good discrimination for borderline/malignant lesions. Potential factors that may lead to discordant ratings are provided here.
Introduction
Ovarian cancer is the eighth most common type of cancer in women and the seventh leading cause of cancer-related death (1). The incidence of malignant lesions is less than 10% in premenopausal women and 15% in postmenopausal women who undergo surgery for an incidental adnexal mass (2–4). Differentiation between benign and malignant adnexal masses and accurate determination of the origin of the mass are of crucial importance in terms of conservative follow-up in benign lesions, personalized treatment in borderline tumors, such as minimally invasive surgery to preserve ovarian functions, and debulking surgery and systemic treatment decisions in malignant lesions (5–7).
Ultrasonography (US) is the first-line imaging method used in the diagnosis of adnexal masses. However, an adnexal mass is considered indeterminate if it does not have the typical features of benign lesions on US, and the indeterminate group constitutes 5%–25% of all adnexal masses (8). Magnetic resonance imaging (MRI) has many advantages over US: it has a higher diagnostic sensitivity and specificity for sonographically indeterminate lesions and a high negative predictive value for malignancy (9–11). In addition, MRI is superior in detecting the origin of the lesion. Approximately 10% of adnexal lesions identified on US are non-ovarian, and MRI detects the origin of the lesion with an accuracy of 93% (12). Therefore, the European Society of Urogenital Radiology recommends pelvic MRI for lesions thought to be indeterminate on US (13), and the American College of Radiology Ovarian Adnexal Reporting and Data System (O-RADS) MRI committee published a lexicon for risk stratification for adnexal lesions (10,14).
Ovarian lesions are usually assessed with dynamic contrast-enhanced MRI, which has high reproducibility and predictive value for borderline and malignant lesions (15). Nevertheless, for various reasons, such as lack of software, non-dynamic examination is usually preferred in routine practice. However, there are insufficient data on the diagnostic value and reproducibility of the O-RADS scoring using non-dynamic MRI. The aim of the present study was to investigate the intra- and inter-observer agreement of the O-RADS scoring using non-dynamic MRI and its agreement with pathologic diagnosis. In addition, we describe the pitfalls in interpreting lesions with the O-RADS scoring based on our discordant ratings.
Material and Methods
The data were obtained retrospectively from the hospital database and the requirement for informed consent was waived. This study was approved by the Institutional Review Board of Bilkent City Hospital (02.11.2022 / E1-22-2995) and was performed according to the Declaration of Helsinki principles.
Study population
The flowchart of the study population and analyses is given in Supplementary Fig. 1. Patients who underwent pelvic MRI at two centers (between March 2019 and July 2022 in center 1 and between June 2021 and March 2022 in center 2) were screened for adnexal mass using the hospital databases. Patients without preoperative MRI scans were excluded from the study. The O-RADS scoring was performed by two radiologists with 10 and 15 years of experience in abdominal radiology. Data from center 1 were scored with the O-RADS system by two observers to assess the inter-observer agreement. The same lesions were rated again by one of the observers 1 month after the first assessment to assess the intra-observer agreement. The data from centers 1 and 2 were combined to assess the diagnostic value of the O-RADS scoring in predicting borderline or malignant lesions, and the agreement between the O-RADS scores and pathological findings in patients who underwent surgical treatment (n = 295). The interpretations were performed blinded to the pathologic findings. The radiological scoring was intended to show the intrinsic diagnostic value of MRI; therefore, the scoring was also blinded to the clinical and laboratory parameters.
MRI technique
MRI examinations were performed with a Signa Pioneer 3-T MRI scanner (General Electric, Milwaukee, WI, USA). The sequences and other acquisition parameters used for imaging are shown in Supplementary Table 1. All examinations were performed with non-dynamic contrast-enhanced MRI. T1-weighted (T1W) images were obtained 30–45 s after contrast injection. MRI examinations that were performed without intravenous administration of contrast material were excluded. All examinations were evaluated visually without creating a time intensity curve (TIC).
Statistical analysis
Categorical variables were expressed as frequency and percentage. Continuous variables were given as mean ± standard deviation (SD). Intra- and inter-observer agreement was evaluated using unweighted kappa statistics and using linear and quadratic weighted kappa statistics. Because the kappa statistics are affected by prevalence, prevalence-adjusted bias-adjusted kappa (PABAK) statistics were also given (16). Kappa values were assessed as follows: 0.21–0.40 = fair; 0.41–0.60 = moderate; 0.61–0.80 = substantial; and ≥0.81 = almost perfect agreement (16). Analyses were made using Stata 17.0 (StataCorp LP, College Station, TX, USA).
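As an illustration of the agreement statistics named above, the following is a minimal Python sketch, not the Stata routine used in the study. The function names (`cohen_kappa`, `pabak`, `interpret_kappa`) and the 3 × 3 example table are hypothetical; the multi-category PABAK formula, (k·p_o − 1)/(k − 1), is an assumption about how the statistic was computed; and the label returned for values below 0.21 is a placeholder because the text does not list a category for that range.

```python
def cohen_kappa(table, weights="unweighted"):
    """Cohen's kappa for a square cross-tabulation.

    table[i][j] = number of lesions given category i by the first rating
    and category j by the second rating.
    weights: "unweighted", "linear", or "quadratic".
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Agreement weight: 1 on the diagonal, decreasing with distance.
        if weights == "unweighted":
            return 1.0 if i == j else 0.0
        d = abs(i - j) / (k - 1)
        return 1.0 - d if weights == "linear" else 1.0 - d * d

    p_obs = sum(w(i, j) * table[i][j] for i in range(k) for j in range(k)) / n
    p_exp = sum(w(i, j) * row_tot[i] * col_tot[j]
                for i in range(k) for j in range(k)) / (n * n)
    return (p_obs - p_exp) / (1.0 - p_exp)


def pabak(table):
    """Prevalence- and bias-adjusted kappa, assuming (k*p_o - 1)/(k - 1)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n
    return (k * p_o - 1.0) / (k - 1.0)


def interpret_kappa(kappa):
    """Map a kappa value to the agreement categories listed in the text."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.21:
        return "fair"
    return "below the listed ranges"  # placeholder; not defined in the text


# Hypothetical 3-category table, for illustration only (not study data).
table = [[20, 5, 0],
         [3, 15, 2],
         [0, 2, 13]]

for scheme in ("unweighted", "linear", "quadratic"):
    print(scheme, round(cohen_kappa(table, scheme), 3))
print("PABAK", round(pabak(table), 3))
```

In the hypothetical table all disagreements fall in adjacent categories, so the weighted variants penalize them less than the unweighted statistic does.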
Results
Intra- and inter-observer agreement
Intra- and inter-observer agreement were assessed for 404 lesions in 339 patients (mean age = 39.6 ± 13.8 years) who were admitted to center 1. The agreement was 97.8% and 83.2% for intra- and inter-observer agreement, respectively (Table 1). The kappa and the PABAK values show that intra-observer agreement was almost perfect (0.963 and 0.972, respectively) and inter-observer agreement was substantial (0.730 and 0.789, respectively) (Table 1).
Intra- and inter-observer agreement for O-RADS scoring.
PABAK, prevalence-adjusted bias-adjusted kappa; SE, standard error.
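The reported PABAK values can be approximately reproduced from the observed agreement percentages alone. The worked equation below is a sketch that assumes the multi-category form of PABAK with k = 5 O-RADS categories; the formula is not stated in the text, and p_o denotes the observed proportion of agreement.

```latex
\mathrm{PABAK} = \frac{k\,p_o - 1}{k - 1}, \qquad k = 5

\text{Intra-observer: } \frac{5(0.978) - 1}{4} = \frac{3.89}{4} \approx 0.97
\qquad
\text{Inter-observer: } \frac{5(0.832) - 1}{4} = \frac{3.16}{4} = 0.79
```

Both results are close to the reported values of 0.972 and 0.789; the small differences are compatible with rounding of the agreement percentages.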
Intra- and inter-observer scorings are given in Table 2. The most common category was the O-RADS score of 2, which was observed in more than 50% of lesions. This was followed by a score of 3 (Table 2). The disagreements for the intra-observer scoring were very low (2.2%) and were among the O-RADS scores of 2, 3, and 4 (Table 2).
Cross-tabulations for intra- and inter-observer agreement.
Values are given as n (%). Columns and rows correspond to the scores given in the first and second assessment for intra-observer agreement, and those given by the first and second observer for inter-observer agreement.
As expected, the disagreement for inter-observer scoring was more common (16.8%) (Table 2) compared to that of intra-observer assessment. Most of the disagreements were among the O-RADS 2 to 3 categories. The most confusing lesions were those with multiseptated hemorrhagic or endometriotic cysts, which were categorized as O-RADS 3 by observer 1 and O-RADS 2 by observer 2 (Fig. 1a, b).

(a) T2-weighted and (b) contrast-enhanced T1-weighted images of a patient with endometriotic cyst scored as O-RADS 3 by both observers.
Table 3 provides possible reasons for the disagreement for each score.
Potential causes of discordant scoring.
Numbers in italics correspond to the total number of patients scored discordantly for each cell by observer 1 and observer 2 irrespective of the individual scoring of the raters.
Cross-tabulation for O-RADS and pathologic findings
The combined data from centers 1 and 2 included 561 lesions in 496 patients (mean age = 41.2 ± 14.7 years). Among these patients, 295 (59.5%) were operated on, and the percentages of the patients who underwent surgery for O-RADS 1–5 were 11.1%, 47.9%, 67.6%, 89.2%, and 92.0%, respectively. The pathology revealed that 217 (73.6%) lesions were benign, 25 (8.5%) were borderline, and 53 (17.9%) were malignant. The frequency of each diagnosis is given in Supplementary Table 2.
The cross-tabulation for the pathologic findings and O-RADS scores is given in Table 4. There was no borderline or malignant pathology for the lesions with O-RADS 1 or 2. Of those with an O-RADS score of 3, only 3 (4.1%) lesions were borderline and none were malignant. These three patients had large multiseptated cystic masses with no solid components. These results suggest that the negative predictive value for malignant lesions is high for lesions with O-RADS 3 or below.
Cross-tabulation for O-RADS ratings and the pathological results.
Values are given as n (%).
Among the lesions with an O-RADS score of 4 or 5 (n = 103), 28 (27.2%) were benign (Supplementary Table 3). The most common of these benign lesions were myoma (25%) (Fig. 2a–d), mature cystic teratoma (21%) (Fig. 3a–c), and fibroma/fibrothecoma (21%) (Fig. 4a–c), followed by mucinous cystadenoma (7.1%) (Fig. 5a, b). The reasons for scoring these lesions as O-RADS 4 or 5 were contrast-enhanced components of mature cystic teratomas and contrast enhancement of degenerated fibroids, fibroma, fibrothecoma, and myoma. In addition, one patient had salpingitis and abscess, and one patient had ovarian cyst torsion.

(a) T2W and (b) contrast-enhanced T1W images of a patient with a pathologic diagnosis of myoma scored by both observers as O-RADS 5. (c) T2W and (d) contrast-enhanced T1W images of a patient with degenerated myoma scored by both observers as O-RADS 4. T1W, T1-weighted; T2W, T2-weighted.

(a) T2-weighted, (b) non-contrast T1W, and (c) contrast-enhanced T1W images of a patient with mature cystic teratoma scored as O-RADS 4 by one of the observers. T1W, T1-weighted.

(a) T2-weighted, (b) non-contrast T1W, and (c) contrast-enhanced T1W images of a patient with fibrothecoma scored as O-RADS 5 by both observers. T1W, T1-weighted.

(a) T2-weighted and (b) contrast-enhanced T1-weighted images of a patient with mucinous cystadenoma scored as O-RADS 4 by both observers.
Borderline or malignant lesions were observed in 34 (59.7%) lesions with O-RADS 4 and 41 (89.1%) lesions with O-RADS 5 (Fig. 6a–f). The area under the receiver operating characteristic (ROC) curve (AUC) for O-RADS in discriminating borderline/malignant lesions from benign lesions was 0.950 (95% confidence interval [CI] = 0.923–0.971), suggesting a very good discrimination (Supplementary Fig. 2). Sensitivity, specificity, and positive and negative predictive values (PPV/NPV) of O-RADS 4/5 lesions for borderline/malignant lesions were 96.2%, 87.1%, 72.8%, and 98.4%, respectively.
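The four diagnostic measures above can be checked with a short Python sketch. The 2 × 2 counts are not quoted from Table 4; they are reconstructed here from figures reported in the text (103 lesions with O-RADS 4/5, of which 28 were benign; 25 borderline plus 53 malignant lesions; 217 benign lesions) and should be read as an assumption. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic accuracy measures, returned as proportions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts reconstructed from the reported figures (an assumption):
# positive test = O-RADS 4/5, disease = borderline or malignant pathology.
tp = 103 - 28        # O-RADS 4/5 lesions that were borderline/malignant
fp = 28              # O-RADS 4/5 lesions that were benign
fn = (25 + 53) - tp  # borderline/malignant lesions scored O-RADS 1-3
tn = 217 - fp        # benign lesions scored O-RADS 1-3

metrics = diagnostic_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these reconstructed counts, the three false negatives correspond to the three borderline lesions scored as O-RADS 3, and the printed percentages match the reported 96.2%, 87.1%, 72.8%, and 98.4%.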

(a) T2-weighted, (b) non-contrast T1W, and (c) contrast-enhanced T1W images of endometrioid carcinoma and mucinous borderline tumor (lower panels) scored as O-RADS 4 by both observers.
Discussion
Ovarian cancers have a high mortality rate among gynecological cancers. Therefore, it is important to detect these tumors at an early stage. Borderline ovarian tumors constitute 10%–20% of ovarian epithelial tumors (17). The preoperative diagnosis of lesions in this group is difficult. Although ultrasonographic assessment is the first-line and easily applicable method for diagnosing adnexal masses, it has several important limitations, such as limited imaging area and intestinal gas artifact. MRI overcomes these limitations and is therefore a superior imaging modality in the assessment of ovarian tumors.
The American College of Radiology Ovarian Adnexal Reporting and Data System (O-RADS) MRI committee published a lexicon for risk stratification for adnexal lesions (10,14), aiming to provide standardization in radiological reporting. In the present study, we assessed the reproducibility of the O-RADS scoring with non-dynamic MRI, and its diagnostic performance in discriminating borderline or malignant lesions. Moreover, based on the inter-observer discordant scores, we described some potential pitfalls in scoring that may help readers reach the correct score.
We found that the intra-observer agreement of O-RADS ratings was very good (97.8%, kappa = 0.963) and the inter-observer agreement was substantial (83.2%, kappa = 0.730) but not perfect. This is consistent with previous reports (18) and suggests that the O-RADS scoring using non-dynamic MRI has acceptable reproducibility, with some cautions that may lead to discordant ratings.
The highest inter-observer discordance was observed between categories 3 and 2. Some of the multiseptated hemorrhagic or endometriotic cysts were categorized as O-RADS 3 by observer 1 but O-RADS 2 by observer 2 (Fig. 1a, b). Lesions with several side-by-side cysts were interpreted as multiseptated cysts and evaluated as O-RADS 3 by one of the readers. Detailed evaluation of the lesion in different planes is important in this regard.
The size of premenopausal cystic masses (≤3 cm or >3 cm) is used as a criterion to evaluate the lesion in the O-RADS 1 or 2 category. The discordant results between the O-RADS 1 and 2 ratings were due to small differences in the measurements of the lesion size that were very close to 3 cm.
In one patient, ovarian tissue was considered to be a solid component within the cystic mass and was categorized as O-RADS 4 by observer 1.
Approximately one-quarter of the lesions with an O-RADS score of 4 or 5 had benign pathology (Supplementary Table 3). Most of these scores were due to contrast enhancement of mature cystic teratomas (Fig. 3a–c) and degenerated fibroids, fibroma, fibrothecoma (Fig. 4a–c), and myoma (Fig. 2a–d). For mature cystic teratomas, it has been recommended that the O-RADS score should be 2 in the presence of Rokitansky nodules (19). However, the score can be overestimated due to the possibility of immature cystic teratoma. In a previous study, most of the misclassification was observed for the O-RADS 4 category (19). In the same study, Rokitansky nodules in mature cystic teratomas were classified as O-RADS 4 or 5 due to intermediate or high-risk TIC. If there is a large contrast-enhancing component with an irregular margin, such lesions might be assigned an O-RADS MRI score of 4 (20).
To have a correct classification for fibrous tumors, it should be kept in mind that the T2 signals of these tumors may not always be hypointense. In a previous study, 3/11 (27%) fibroma and fibrothecoma lesions presented dark T2/dark diffusion-weighted image (DWI) (18).
The myomas scored as O-RADS 4 or 5 were large and degenerated. Such cases probably require surgery due to the possibility of being symptomatic; therefore, the misclassification in these cases may not have a clinical implication.
Less common causes of benign lesions with O-RADS 4 or 5 were salpingitis/abscess or ovarian cyst torsion. In the article by Thomassin-Naggara et al. in which they evaluated misclassified cases (19), it was reported that pelvic inflammatory disease was a frequent cause of a false-positive result and that 11/12 cases in their series were false positives. They concluded that radiologists should be very careful in this regard and consider clinical features. American College of Radiology O-RADS MRI recommendations state that adnexal lesions should not be scored in patients presenting with acute symptoms (19). Due to the retrospective design of our study, we did not consider whether these patients had acute symptoms. We also did not exclude these cases from analysis to better emphasize image features that could lead to erroneous inferences, as Thomassin-Naggara et al. did in their article. Clinical findings of these patients along with the imaging are helpful in differential diagnosis, especially in cases such as salpingitis and ovarian torsion.
Dynamic and visual analyses of MRI in diagnosing ovarian malignancies have been compared in a subgroup of patients in the EURAD study (15). Compared with the visual analysis, dynamic contrast-enhanced MRI had a higher sensitivity (96% vs. 76%), specificity (95% vs. 76%), and overall accuracy (86% vs. 78%). In addition, the authors stated that visual analysis may not be very reliable in distinguishing between O-RADS 3 and 4, as some malignant lesions may show enhancement later than the outer myometrium (21). On the other hand, another study indicated that dynamic contrast-enhanced MRI is not superior to conventional delayed contrast-enhanced T1W imaging and DWI in the differentiation of benign and malignant adnexal masses (22). In many centers, dynamic contrast-enhanced MRI cannot be performed due to lack of time and lack of software for intensity curve analysis; therefore, non-dynamic examination is used more commonly in routine practice.
The present study shows that the overall discriminative potential of O-RADS scoring for borderline or malignant lesions with non-dynamic MRI is very high (AUC = 0.950). Thomassin-Naggara et al. assessed the diagnostic value of O-RADS 4/5 scores in diagnosing malignant lesions using dynamic MRI and found a sensitivity of 93%, specificity of 91%, PPV of 71%, and NPV of 98% (12). In the present study, we assessed the same issue using non-dynamic MRI and found very similar figures (sensitivity of 96.2%, specificity of 87.1%, PPV of 72.8%, and NPV of 98.4%). The very high NPV obtained in both studies suggests that scores less than O-RADS 4/5 are valuable in ruling out malignant lesions. In addition, the data imply that patients with O-RADS 4/5 lesions are good candidates for surgical treatment. On the other hand, as there was no borderline or malignant lesion in patients with O-RADS 1 or 2 lesions, these patients can be safely followed up. We observed that among the patients with O-RADS 3, only 3 (4.1%) patients had borderline tumors and there was no malignant tumor in this group. All three patients had large multiseptated cystic masses and no solid component. The solid component of borderline cystadenomas may not be clearly visible. The PPV for malignancy in lesions with the O-RADS 3 category has been reported as approximately 5% (12). In another study, lesions with a low-risk TIC had a PPV of 6.7% for malignancy, and most lesions in this category were found to be borderline tumors (23). These findings are consistent with our study and suggest that for O-RADS 3 lesions, the patient's age and lesion size may play a role in the surgical decision.
In conclusion, the present study shows that the O-RADS scoring with non-dynamic MRI has acceptable accuracy and reproducibility and a very good diagnostic yield. Specifically, the cutoff value of O-RADS 4/5 has a very high NPV for borderline or malignant lesions, suggesting that scores less than O-RADS 4/5 have a high potential in ruling out borderline and malignant lesions. The reasons for discordant ratings that we presented here might be helpful for radiologists with regard to correct scoring.
Supplemental Material
sj-pdf-1-acr-10.1177_02841851241279897 - Supplemental material for Reliability, reproducibility, and potential pitfalls of the O-RADS scoring with non-dynamic MRI
Supplemental material, sj-pdf-1-acr-10.1177_02841851241279897 for Reliability, reproducibility, and potential pitfalls of the O-RADS scoring with non-dynamic MRI by Gulsum Kılıçkap, Betül Akdal Dölek, Serhat Kaya and Numan Ilteriş Çevik in Acta Radiologica
Footnotes
Data availability
Institutional permission for data sharing will be sought upon reasonable request.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
