Abstract
Background
The diagnostic accuracy of diffusion-weighted imaging (DWI) for detecting prostate cancer is well-established. DWI provides both a visual and a quantitative means of detecting tumor, the latter through the apparent diffusion coefficient (ADC). Recently, higher b-values have been used to improve the diagnostic performance of DWI.
Purpose
To determine the diagnostic performance of high b-value DWI at detecting prostate cancer and whether quantifying ADC improves accuracy.
Material and Methods
A comprehensive literature search of published and unpublished databases was performed. Eligible studies had histopathologically proven prostate cancer, DWI sequences using b-values ≥ 1000 s/mm2, at least ten patients, and data for creating a 2 × 2 table. Study quality was assessed with QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies). Sensitivity and specificity were calculated, and tests for statistical heterogeneity and threshold effect were performed. Results were plotted on a summary receiver operating characteristic (sROC) curve, and the area under the curve (AUC) was used to determine the diagnostic performance of high b-value DWI.
Results
Ten studies, comprising 13 data subsets and 522 patients, met the eligibility criteria. Pooled sensitivity and specificity were 0.59 (95% confidence interval [CI], 0.57–0.61) and 0.92 (95% CI, 0.91–0.92), respectively, and the sROC AUC was 0.92. Subgroup analysis showed a statistically significant improvement in accuracy (P = 0.03) with visual assessment of tumor rather than ADC quantification.
Conclusion
High b-value DWI gives good diagnostic performance for prostate cancer detection, and visual assessment of tumor diffusion is significantly more accurate than region-of-interest (ROI) measurements of ADC.
Keywords
Introduction
Prostate cancer is the most commonly diagnosed cancer and the second most common cause of cancer-related death in men (1,2). Magnetic resonance imaging (MRI) is the imaging mainstay of prostate cancer localization. It is recommended in men considered for radical treatment following a positive transrectal ultrasound (TRUS) biopsy, in high-risk patients with a negative biopsy, and in the settings of active surveillance, assessment of treatment response, and recurrence (3,4).
Multi-parametric MRI of the prostate comprises T1-weighted (T1W) and T2-weighted (T2W) imaging, with additional techniques such as diffusion-weighted imaging (DWI) and dynamic contrast-enhanced MRI (3). DWI has become a routine part of prostate MRI protocols, as it provides good tumor contrast without exogenous agents.
Applying DWI with multiple diffusion weightings, or b-values, allows estimation of the apparent diffusion coefficient (ADC), a parameter known to correlate inversely with tumor aggressiveness (5,6). Clinically, both DW images (typically those acquired at higher b-values) and ADC maps are assessed to detect tumor, which appears bright on DW images and dark on ADC maps.
ADC maps are calculated through mono-exponential fits to the diffusion signal decay on a voxel-by-voxel basis. Other signal models have been applied to prostate DWI, including diffusional kurtosis (7), intravoxel incoherent motion (8), the stretched exponential (9), and VERDICT (10). However, the mono-exponential model is the most common, requiring only two b-values for fitting.
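To illustrate the voxel-wise mono-exponential fit, the Python sketch below log-linearizes S(b) = S0·exp(−b·ADC) and solves one ordinary least-squares problem for all voxels at once. It is a minimal sketch under stated assumptions, not the method of any included study or scanner vendor: the function name fit_adc_monoexponential is hypothetical, images are assumed to be stacked along the first axis (one per b-value), and no noise-floor correction or weighting is applied. With two b-values, b1 < b2, it reduces to ADC = ln(S(b1)/S(b2))/(b2 − b1).

```python
import numpy as np


def fit_adc_monoexponential(signals, b_values):
    """Hypothetical helper: voxel-wise fit of S(b) = S0 * exp(-b * ADC).

    signals  -- array of shape (n_b, ...) holding one DW image per b-value
    b_values -- sequence of n_b b-values in s/mm^2
    Returns (adc, s0); adc is in mm^2/s when b is in s/mm^2.
    """
    b = np.asarray(b_values, dtype=float)
    s = np.asarray(signals, dtype=float)
    if b.size < 2 or s.shape[0] != b.size:
        raise ValueError("need at least two b-values and one image per b-value")

    # Log-linearize: ln S = ln S0 - b * ADC. Clipping only avoids log(0) in
    # empty voxels; it is a simplification, not a noise-floor correction.
    log_s = np.log(np.clip(s, 1e-12, None)).reshape(b.size, -1)

    # Design matrix with columns [1, -b]; lstsq fits every voxel in one call.
    design = np.column_stack([np.ones_like(b), -b])
    (ln_s0, adc), *_ = np.linalg.lstsq(design, log_s, rcond=None)

    spatial_shape = s.shape[1:]
    return adc.reshape(spatial_shape), np.exp(ln_s0).reshape(spatial_shape)


# Two synthetic voxels with illustrative (assumed) ADCs at b = 0 and 1000 s/mm^2.
b = [0, 1000]
true_adc = np.array([0.8e-3, 1.6e-3])
images = np.stack([1000.0 * np.exp(-bi * true_adc) for bi in b])
adc, s0 = fit_adc_monoexponential(images, b)  # adc recovers [0.8e-3, 1.6e-3]
```

Fitting all acquired b-values in a single regression, as most included studies did, corresponds to passing every b-value to the same call; producing separate maps from b = (0, 1000) and (0, 2000) s/mm2 corresponds to two calls with different subsets.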
Performing mono-exponential fitting with a higher maximum b-value, bmax, can improve the contrast-to-noise ratio (C/N) of the resulting maps, where C/N = (S_lesion − S_background)/σ_noise, with S_lesion and S_background the lesion and background signals and σ_noise the estimated noise. This gives better characterization of ADC differences between normal and cancerous prostate, improving tumor detection at the expense of a reduced signal-to-noise ratio (S/N) and increased sensitivity to motion artifacts. At 1.5T, bmax values of 500 and 1000 s/mm2 are typically used, but the increased S/N at higher field strengths permits the use of higher bmax (11). Guidelines recommend a bmax of 800–1000 s/mm2 (3), but there is no consensus on whether higher b-values (bmax ≥ 1000 s/mm2) should be used routinely. Many studies have compared high and lower b-value (bmax < 1000 s/mm2) acquisitions, but results have been conflicting (12–17).
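The C/N definition above can be computed directly from ROI samples. In this minimal sketch the helper name contrast_to_noise is hypothetical, and σ_noise is assumed to be the sample standard deviation of a separate noise ROI; the text does not state how noise was estimated, and no correction for Rician bias in magnitude images is made.

```python
import numpy as np


def contrast_to_noise(lesion_roi, background_roi, noise_roi):
    """Hypothetical helper: C/N = (mean S_lesion - mean S_background) / sigma_noise.

    sigma_noise is estimated as the sample standard deviation (ddof=1) of a
    separate noise ROI; this estimator is an assumption of the sketch.
    """
    lesion = np.asarray(lesion_roi, dtype=float)
    background = np.asarray(background_roi, dtype=float)
    sigma_noise = np.std(np.asarray(noise_roi, dtype=float), ddof=1)
    if sigma_noise == 0:
        raise ValueError("noise estimate is zero")
    return (lesion.mean() - background.mean()) / sigma_noise


# Illustrative (made-up) ROI values on a DW image where tumor is bright.
cn = contrast_to_noise([120, 130, 125], [80, 90, 85], [-10, 0, 10])
```

On an ADC map, where tumor is dark, the same expression is negative; its magnitude is what reflects lesion conspicuity.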
The diagnostic accuracy of DWI is well-established, with multiple meta-analyses reporting its diagnostic performance (18–20). There is uncertainty about the usefulness of high b-value DWI, particularly which b-value provides ADC maps with the greatest diagnostic performance. The aim of this study is to determine the diagnostic performance of high b-value DWI at detecting prostate cancer and whether quantifying ADC improves accuracy.
Material and Methods
This meta-analysis was reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (21). The review was registered on PROSPERO before commencement (ref. no. CRD42015027644).
Search strategy
A comprehensive systematic literature search was independently performed by two reviewers (KCG, TSy) to identify studies investigating the diagnostic accuracy or performance of high b-value DWI for detecting prostate cancer. The MEDLINE search strategy is presented in Supplementary Table 1. In addition, EMBASE and the gray literature/trial registry databases (WHO International Clinical Trials Registry Platform and OpenGrey) were searched. Studies were not limited by country of origin but were limited to those published in English. All searches covered the period from database inception to 1 January 2016.
Eligibility criteria
Retrospective and prospective studies were included if they reported detection of prostate cancer in a pre-treatment population using high b-value DWI of the prostate. Only primary research articles available as full text were accepted; however, review articles were checked for additional primary references. High b-value was defined as bmax ≥ 1000 s/mm2. Studies were required to use histopathological results (biopsy or radical prostatectomy) as the reference standard and to provide sufficient data to calculate the numbers of true positive (TP), false positive (FP), false negative (FN), and true negative (TN) findings. If multiple b-values were used, including b < 1000 s/mm2, the study was eligible only if data with b ≥ 1000 s/mm2 could be extracted. Studies using high b-value DWI in combination with other diagnostic sequences to detect cancer were excluded.
Study identification
Titles and abstracts from the search results, and the full-text papers for all studies which met or potentially met the eligibility criteria, were independently reviewed by two reviewers (KCG, TSy). Those studies which met the eligibility criteria on full review were included in the final analysis. Disagreements on inclusion suitability were resolved by consensus (KCG, TSy).
Data extraction
Two reviewers (KCG, TSy) independently extracted data using a predefined template, including publication year, country of origin, sample size, description of study population (age), study design (prospective, retrospective, or unknown), patient enrollment (consecutive or not), inclusion and exclusion criteria, reasons for exclusions from analysis, and number of experts who assessed and interpreted MRI results. Data were also recorded on blinding of MRI measurements to clinical, biochemical, or histopathological results; methods used to determine diagnosis; types of coils; and b-values used. For each study, we recorded the number of TP, FP, TN, and FN findings for high b-value DWI in diagnosing prostate cancer. Disagreements in data extraction were resolved through discussion or through adjudication by a third reviewer (TSm).
Quality assessment
Two reviewers (KCG, TSy) independently assessed each included paper’s quality using QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies) (22). Any disagreements were resolved through discussion or through adjudication with a third reviewer (TSm).
Statistical analysis
Study heterogeneity was assessed by examining the data extraction table. This indicated broad homogeneity across studies, so a meta-analysis was considered appropriate. Statistical heterogeneity was assessed using the chi-squared statistic, Q, and the inconsistency index, I2. When P < 0.10 and I2 > 50%, unexplained statistical heterogeneity was considered evident, and diagnostic performance analyses were performed using a random-effects model.
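The relation between Q and I2 can be made concrete with a short sketch. It uses the standard definitions, Cochran's Q with inverse-variance weights and I2 = max(0, (Q − df)/Q) × 100% with df = k − 1; the helper names are hypothetical, and Meta-DiSc's exact computation of Q for proportions may differ. Assuming k = 13 data subsets (df = 12), the I2 values reported in the Results are recovered from the reported Q values.

```python
def cochran_q(estimates, variances):
    """Cochran's Q: weighted squared deviations from the inverse-variance mean."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))


def i_squared(q, k):
    """Higgins' I^2 (%): share of variation beyond chance, given Q over k estimates."""
    if k < 2:
        raise ValueError("need at least two estimates")
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100.0


# Reported Q values, assuming k = 13 data subsets (df = 12).
for q in (435.05, 89.0, 82.50, 517.45):
    print(q, round(i_squared(q, 13), 1))  # I2: 97.2, 86.5, 85.5, 97.7
```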
The sensitivity and specificity of each study were calculated from 2 × 2 contingency tables. Pooled sensitivity, specificity, and positive and negative likelihood ratios with 95% confidence intervals (CIs) were calculated. Finally, sensitivity and specificity were used to construct a summary receiver operating characteristic (sROC) curve and to calculate the area under the curve (AUC).
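For a single study, the per-study measures follow directly from its 2 × 2 table. The sketch below uses a hypothetical TwoByTwo class and made-up counts; it applies no continuity correction, so LR+ and the diagnostic odds ratio are undefined when a cell is zero. Pooled likelihood ratios are pooled across studies in their own right, so they need not equal ratios formed from the pooled sensitivity and specificity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """One study's 2 x 2 table against the histopathological reference standard."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def positive_lr(self) -> float:
        # LR+ = sensitivity / (1 - specificity); undefined with no false positives.
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def negative_lr(self) -> float:
        # LR- = (1 - sensitivity) / specificity
        return (1.0 - self.sensitivity) / self.specificity

    @property
    def diagnostic_odds_ratio(self) -> float:
        # DOR = (TP * TN) / (FP * FN), equivalently LR+ / LR-
        return (self.tp * self.tn) / (self.fp * self.fn)


# Hypothetical counts, for illustration only.
study = TwoByTwo(tp=59, fp=8, fn=41, tn=92)
```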
The threshold effect was assessed visually, by determining whether the ROC curve had a “shoulder-arm” shape, and quantitatively, using the Spearman correlation coefficient between the logit of sensitivity and the logit of (1 − specificity), with P < 0.05 indicating a significant threshold effect. Meta-regression and subgroup analyses were performed to explore other sources of heterogeneity and their influence on diagnostic performance.
All statistical computations were performed using Meta-DiSc (version 1.4, Javier Zamora) and Review Manager (version 5.3. Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2014).
Results
Search results
A summary of the search strategy results is presented in Supplementary Fig. 1. In total, 351 studies were identified from the search results, of which 61 were deemed potentially eligible. After full-text review, ten studies met the final eligibility criteria and were included in the analysis (16,17,23–30).
Characteristics of included studies
Principal characteristics of included studies.
Median.
PSA, prostate specific antigen (ng/mL); Pro, prospective; Retro, retrospective.
Imaging and methodological characteristics of the included studies.
Twenty-nine patients were imaged with b-values of 0, 50, 200, 1500, and 2000 s/mm2; 24 patients were imaged with b-values of 0 and 1000 s/mm2.
Coil A = without endorectal coil, coil B = with endorectal coil.
ADC, apparent diffusion coefficient; Blind, blinded; Bx, biopsy; C, both RP and Bx included; NSA, number of signal averages; RP, radical prostatectomy; RS, reference standard; Spasm, anti-spasmodics; TE, echo time; Th, threshold; TR, repetition time; U, unclear; Y, yes.
Three studies were prospective and seven retrospective. Field strengths of 1.5T (23,25,26,28,29) and 3T (16,17,24,27,30) were each used in five studies. Radical prostatectomy specimens were used as the reference standard in six studies (16,17,25,27,29,30), biopsy specimens in three (23,24,26), and a combination of both in one (28). The MRI reader was blinded in eight studies (16,17,23,25–28,30); blinding was unclear in two. Anti-spasmodic agents, either glucagon or hyoscine butylbromide, were used in five studies (16,17,24,29,30); their use was not reported in the remainder. Five studies (16,17,24,29,30) used b-values > 1000 s/mm2.
Several methods were used to detect prostate cancer: region-of-interest (ROI)-based ADC quantification was used in six studies (23,24,26–29), and visual assessment of lesions on ADC maps in four studies (16,17,25,30), of which two (17,30) used a scale (such as a Likert scale) and the other two a binary cutoff.
All ten studies used only the mono-exponential model to estimate ADC. In three studies (16,17,27), the extracted data were split into subsets. Kim et al. (16) and Koo et al. (17) generated multiple ADC maps: from b = (0, 1000) and (0, 2000) s/mm2 in the former, and from b = (0, 300), (0, 700), (0, 1000), and (0, 2000) s/mm2 in the latter. Rosenkrantz et al. (27) reported results separately for the peripheral zone and the transition zone. The other studies generated only one set of ADC maps from their DWI data, performing mono-exponential fitting to all acquired b-values.
Quality assessment
Study quality assessment is presented in Supplementary Table 2. Fig. 1 demonstrates the QUADAS-2 graphical summary of the studies indicating the proportion of high, low, or unclear risk in each domain. A high risk of bias was demonstrated in the patient selection domain, but overall the quality of the studies included was considered “good.”
QUADAS-2 results summarizing the proportion of low, high, and unclear risk of bias and applicability concerns.
Diagnostic performance
Results of the subgroup analysis.
P value = comparison of diagnostic odds ratio of subgroups.
AUC, area under the curve.
The pooled sensitivity and specificity of high b-value DWI MRI in detecting prostate cancer were 0.59 (95% CI, 0.57–0.61; Fig. 2) and 0.92 (95% CI, 0.91–0.92; Fig. 3), respectively. Heterogeneity tests gave Q = 435.05 (P < 0.001), I2 = 97.2% for sensitivity and Q = 89 (P < 0.001), I2 = 86.5% for specificity, indicating significant statistical heterogeneity between studies.
Forest plot of sensitivity, with pooled sensitivity, chi-squared heterogeneity statistic (Q), and I-squared.
Forest plot of specificity, with pooled specificity, chi-squared heterogeneity statistic (Q), and I-squared.

The pooled positive and negative likelihood ratios for high b-value DWI MRI in detecting prostate cancer were 6.64 (95% CI, 4.9–9.0; Supplementary Fig. 2) and 0.33 (95% CI, 0.2–0.5; Supplementary Fig. 3), respectively. Positive and negative likelihood ratio heterogeneity tests gave Q = 82.50 (P < 0.001), I2 = 85.5% and Q = 517.45 (P < 0.001), I2 = 97.7%, respectively, indicating significant statistical heterogeneity between studies.
Fig. 4 shows the sROC curve of the ten studies, where AUC = 0.92, indicating “good” diagnostic accuracy (31).
The sROC curve for high b-value DWI in detecting prostate cancer.
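Meta-DiSc fits its sROC curve, to our understanding, with the Moses–Shapiro–Littenberg approach: regress D = logit(TPR) − logit(FPR) on S = logit(TPR) + logit(FPR), back-transform the line into a curve, and integrate it for the AUC. The sketch below illustrates that idea only; it uses unweighted least squares on rates assumed to be already continuity-corrected, all names are hypothetical, and it is not a reproduction of the reported AUC of 0.92.

```python
import math


def _logit(p):
    return math.log(p / (1.0 - p))


def _expit(x):
    # Numerically stable inverse logit.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def moses_littenberg_fit(tpr, fpr):
    """Unweighted least squares of D = logit(TPR) - logit(FPR) on S = logit(TPR) + logit(FPR).

    Returns intercept a and slope b of D = a + b * S.
    """
    d = [_logit(t) - _logit(f) for t, f in zip(tpr, fpr)]
    s = [_logit(t) + _logit(f) for t, f in zip(tpr, fpr)]
    n = len(d)
    if n < 2:
        raise ValueError("need at least two studies")
    mean_s, mean_d = sum(s) / n, sum(d) / n
    sxx = sum((x - mean_s) ** 2 for x in s)
    if sxx == 0:
        raise ValueError("S is constant; slope undefined")
    b = sum((x - mean_s) * (y - mean_d) for x, y in zip(s, d)) / sxx
    return mean_d - b * mean_s, b


def sroc_tpr(fpr, a, b):
    """Back-transform: logit(TPR) = (a + (1 + b) * logit(FPR)) / (1 - b)."""
    if b >= 1:
        raise ValueError("curve undefined for slope >= 1")
    return _expit((a + (1.0 + b) * _logit(fpr)) / (1.0 - b))


def sroc_auc(a, b, n=100_000):
    """Area under the sROC curve by the midpoint rule over FPR in (0, 1)."""
    h = 1.0 / n
    return h * sum(sroc_tpr((i + 0.5) * h, a, b) for i in range(n))
```

When every study shares the same diagnostic odds ratio, the fitted slope is zero and the curve is symmetric; for a common DOR of 10 the AUC is about 0.83, and for DOR = 1 it is 0.5 (chance).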
Meta-regression analysis
The ROC curve did not demonstrate a “shoulder-arm” shape (Supplementary Table 4), and the Spearman correlation coefficient between the logit of sensitivity and the logit of (1 − specificity) was 0.286 (P = 0.344), indicating that a threshold effect does not explain the variation in accuracy between studies.
Subgroup analysis
Subgroup analysis was based on different study characteristics and on perceived sources of bias and applicability concerns identified in the QUADAS-2 assessment. Studies at 3T with and without an endorectal coil demonstrated the highest pooled sensitivities, 0.76 (95% CI, 0.71–0.80) and 0.74 (95% CI, 0.71–0.79), respectively. For protocols with bmax > 1000 s/mm2, the pooled specificity and sROC AUC were greater: 0.94 (95% CI, 0.93–0.95) and 0.98, respectively. A statistically significant improvement (P < 0.05) was seen when tumor presence was assessed visually on ADC maps rather than by ROI measurements. The diagnostic performance and P values for these and other subgroups are shown in Table 3.
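Table 3 reports P values comparing the diagnostic odds ratios (DORs) of subgroups, but the exact test is not stated. One common approach, shown here purely as an assumed illustration, pools log DORs within each subgroup by fixed-effect inverse-variance weighting (0.5 added to each cell) and compares them with a two-sided z-test; a random-effects version would add between-study variance and widen the intervals. All names and counts are hypothetical.

```python
import math


def log_dor_and_var(tp, fp, fn, tn, correction=0.5):
    """Log diagnostic odds ratio and its approximate variance (Woolf)."""
    tp, fp, fn, tn = (c + correction for c in (tp, fp, fn, tn))
    return math.log(tp * tn / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def pooled_log_dor(tables):
    """Fixed-effect inverse-variance pooled log DOR and its variance."""
    pairs = [log_dor_and_var(*t) for t in tables]
    weights = [1.0 / v for _, v in pairs]
    pooled = sum(w * y for w, (y, _) in zip(weights, pairs)) / sum(weights)
    return pooled, 1.0 / sum(weights)


def compare_subgroups(tables_a, tables_b):
    """Ratio of pooled DORs (a / b), z statistic, and two-sided P value."""
    ya, va = pooled_log_dor(tables_a)
    yb, vb = pooled_log_dor(tables_b)
    z = (ya - yb) / math.sqrt(va + vb)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return math.exp(ya - yb), z, p


# Hypothetical (tp, fp, fn, tn) tables for two subgroups.
ratio, z, p = compare_subgroups([(40, 5, 10, 45)], [(20, 10, 20, 40)])
```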
Discussion
This analysis indicates that high b-value imaging is a good diagnostic tool for detecting prostate cancer. The results of the threshold-method subgroup analysis, in which visual assessment outperformed ROI-based ADC quantification, imply a benefit of high bmax imaging in the clinical setting. The lesser value of quantitative ADC thresholding as a tool for detecting tumor is in line with the PI-RADS version 2 recommendations (standardized reporting guidelines for prostate MRI) (32). The evidence on which this analysis was based was graded as “good” quality using the QUADAS-2 tool.
There have been multiple meta-analyses investigating the diagnostic accuracy of DWI alone or in combination with other imaging techniques (18–20). The pooled sensitivity, specificity, and sROC AUC of our study were 0.59, 0.92, and 0.92, respectively, similar to Jie et al.’s (19) meta-analysis of DWI alone. This is likely due to overlap of included studies, with nine of our ten included studies featured in their meta-analysis. However, in contrast to Jin et al.’s meta-analysis of all b-values (20), the sensitivity was lower in our study (0.59 versus 0.77), but the pooled specificity and AUC were higher (0.92 versus 0.84 and 0.92 versus 0.88, respectively). Because specificity is high and sensitivity modest, this suggests that high b-value imaging may be more useful for confirming (ruling in) than for ruling out significant prostate cancer.
There was significant statistical heterogeneity between the included studies that could not be explained by a threshold effect. Subgroup analyses of multiple study parameters were performed in an attempt to explain it. Because the cause of the statistical heterogeneity remains unknown, these findings should be interpreted cautiously.
Improved tumor contrast at high b-values comes at the cost of decreased S/N (11); however, this can be mitigated by imaging at 3T. Most diagnostic accuracy studies specifically of high b-value DWI have used 3T (12,33–35). The subgroup analysis of field strength demonstrated a trend towards improved accuracy at 3T. The sensitivity of the 3T subgroup alone is similar to, and its specificity and accuracy better than, those found by Wu et al. (18) in a meta-analysis of the accuracy of visual assessment of combined T2 and DWI sequences for prostate cancer detection.
On review of the method of prostate cancer detection, four of the ten studies qualitatively assessed the ability of blinded readers to visually detect prostate cancer, either by answering a binary question regarding cancer presence or by using a probability scale (16,17,25,30). The remaining studies used ROI-based ADC calculations to determine prostate cancer presence or absence, or used a scale of ADC values to predict cancer (23,24,26–29). The diagnostic performance of visual tumor assessment on ADC maps was significantly better than that of quantitative ADC methods, with visual assessment giving results similar to those of Wu et al.’s combined T2 and DWI meta-analysis (18), indicating potential clinical value for high b-value imaging.
A potential explanation for the relatively poor performance of quantitative ADC methods is that four of the six ROI-based studies used biopsy, alone or combined with radical prostatectomy, as the reference standard, whereas all visual assessment studies used radical prostatectomy specimens. TRUS biopsy as a reference standard is limited by its poor octant localization, small sampling volume, and substantial number of missed tumors (36,37). Radical prostatectomy studies performed better than biopsy studies, but the difference was not statistically significant.
ADC estimation is influenced by a number of factors, including, but not limited to, noise, fat, and perfusion signals (8,38,39); possible non-Gaussian diffusion (7); and the known diffusion anisotropy of the prostate (40). Because of these confounders, other diffusion models may ultimately prove more appropriate for identifying prostate cancer (41,42). Together with the study-specific echo time (TE), diffusion gradient duration (δ), and diffusion time (Δ), such factors could substantially influence the sensitivity and specificity of DWI for evaluating prostate cancer. None of the reviewed studies reported δ and Δ, and few researchers (42,43) have considered these factors when applying prostate DWI.
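To show why δ and Δ matter even at a nominally fixed b-value, the sketch below evaluates the Stejskal–Tanner relation for ideal rectangular pulsed gradients, b = γ²G²δ²(Δ − δ/3). Gradient ramps and imaging gradients are ignored, the function name is hypothetical, and the timings are illustrative assumptions rather than values from any included study. The same b can be reached with different δ and Δ, and the resulting change in diffusion time can alter the measured ADC in tissue where diffusion is restricted.

```python
GAMMA_PROTON = 2.675221874e8  # proton gyromagnetic ratio, rad s^-1 T^-1


def b_value_pgse(g_mT_per_m, delta_ms, Delta_ms):
    """Stejskal-Tanner b-value for ideal rectangular pulsed gradients, in s/mm^2.

    b = gamma^2 * G^2 * delta^2 * (Delta - delta / 3); ramps are ignored.
    """
    if delta_ms <= 0 or Delta_ms < delta_ms:
        raise ValueError("require 0 < delta <= Delta")
    g = g_mT_per_m * 1e-3      # T/m
    delta = delta_ms * 1e-3    # s
    Delta = Delta_ms * 1e-3    # s
    b_si = GAMMA_PROTON ** 2 * g ** 2 * delta ** 2 * (Delta - delta / 3.0)  # s/m^2
    return b_si * 1e-6         # 1 s/m^2 = 1e-6 s/mm^2


# Illustrative (assumed) timings: G = 40 mT/m, delta = 20 ms, Delta = 40 ms.
print(round(b_value_pgse(40, 20, 40)))  # 1527
```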
None of the other subgroup analyses demonstrated a significant difference between groups. In some subgroups, a statistical difference would have been difficult to demonstrate given the small number of studies. For example, only three studies used biopsy as the reference standard, yet this subgroup analysis provided the second strongest statistical source of heterogeneity (P = 0.18). The limitations of biopsy as a reference standard are described above, but limiting MRI assessment to patients who have undergone prostatectomy introduces patient selection bias, as radical prostatectomy patients tend to be younger and fitter and to have clinically significant tumor that prompts surgery. Radical prostatectomy allows examination of the entire gland, including the anterior gland (which TRUS biopsy cannot), and detects multifocality, which is frequent (44,45). Eight of nine studies in the prostatectomy subgroup assessed for multifocal disease or for tumor in multiple segments of the prostate, so this subgroup’s results may be more representative of the diagnostic accuracy of high b-value diffusion.
There are limitations to this study. Restricting the search to English-language publications and to selected databases may have introduced bias, although the use of two large databases and the gray literature should capture most eligible English-language studies. Publication bias was not assessed, because Deeks’ funnel plot asymmetry test is less accurate in meta-analyses with small numbers of studies (46). Finally, this study was restricted to testing localization of prostate cancer within the gland. This is important in determining the accuracy of high b-value diffusion but is not the only useful outcome; capsular breach, seminal vesicle invasion, and pelvic lymphadenopathy are important staging and prognostic characteristics not assessed in this study.
In conclusion, these findings should be interpreted cautiously given the degree of statistical heterogeneity. However, this meta-analysis demonstrated that high b-value diffusion is a valuable diagnostic tool, with a sensitivity of 59%, a specificity of 92%, and an sROC AUC of 0.92. Visual assessment of high b-value DWI gave better diagnostic performance than ADC quantification.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
References
Supplementary Material
