Abstract
Background
It has been reported that intravoxel incoherent motion (IVIM) diffusion magnetic resonance imaging (MRI) scan–rescan reproducibility is unsatisfactory.
Purpose
To study IVIM MRI parameter reproducibility for liver parenchyma after the removal of motion-contaminated and/or poorly fitted image data.
Material and Methods
Eighteen healthy volunteers underwent liver scans twice in the same session to assess scan–rescan repeatability, and again in another session after an average interval of 13 days to assess reproducibility. Diffusion-weighted images were acquired on a 3-T scanner using a respiratory-triggered echo-planar sequence and 16 b-values (0–800 s/mm2). Measurement was performed on the right liver with segment-unconstrained least-squares fitting. Image series with evident anatomical mismatch, apparent artifacts, or a poorly fitted signal intensity vs. b-value curve were excluded. A minimum of three slices was deemed necessary for IVIM parameter estimation.
Results
Of a total of 54 examinations, six did not satisfy the inclusion criteria, giving a success rate of 89%, and 14 volunteers were finally included in the repeatability/reproducibility study. A total of 3–10 slices per examination (mean = 5.3 slices, median = 5 slices) were used for analysis. Using a threshold b-value of 80 s/mm2, the coefficient of variation and within-subject coefficient of variation for repeatability were 2.86% and 3.36% for Dslow, 3.81% and 4.24% for perfusion fraction (PF), and 18.16% and 24.88% for Dfast; those for reproducibility were 2.48% and 3.24% for Dslow, 4.91% and 5.38% for PF, and 21.18% and 30.89% for Dfast.
Conclusion
Removal of motion-contaminated and/or poorly fitted image data improves IVIM parameter reproducibility.
Introduction
Since the initial study by Yamada et al. (1), there has been growing interest in using the intravoxel incoherent motion (IVIM) diffusion magnetic resonance imaging (MRI) technique to evaluate diffuse liver diseases such as liver fibrosis and non-alcoholic fatty liver disease, to characterize liver tumors, and to evaluate treatment response (2,3). A prerequisite for translating IVIM imaging into clinical applications is accurate measurement of IVIM parameters with acceptable reproducibility. Nevertheless, accurate liver IVIM quantification is challenging, partly because of the limited sampling and low signal-to-noise ratio (SNR) of fast diffusion data acquisition. A major obstacle to clinical application of the IVIM technique in abdominal organs is its unsatisfactory scan–rescan reproducibility (2,3). For example, in a short-term reproducibility study, Andreou et al. (4) reported that the 95% confidence intervals (CI) of the percentage difference between paired measurements of liver parenchyma were (−24.3∼25.1) for PF (f), (−5.12∼8.09) for Dslow (D), and (−31.2∼59.1) for Dfast (D*); and the absolute limits were (0.140∼0.232) for f, (0.951∼1.08) for D, and (35.7∼82.5) for D*.
IVIM diffusion imaging typically involves a long data acquisition time, with images acquired at a series of b-values. It is usually acquired with respiratory gating; however, even respiratory gating is associated with substantial residual respiration-induced motion (2). Respiratory motion can cause inter-b-value motion and intra-b-value motion. Inter-b-value motion causes mismatch of anatomical structures between images of different b-values, whereas intra-b-value motion, which occurs during data acquisition for a slice, causes image artifacts. Diffusion MRI is also influenced by artifacts related to magnet/sequence imperfections, such as B0 inhomogeneity resulting from susceptibility variations and geometric distortions from residual eddy currents induced by motion-probing gradients (5). In this study, we introduce a manual “image data cleaning” process, with the aim of mitigating artifacts associated with respiratory motion as well as with magnet/sequence imperfections. We hypothesize that if IVIM’s scan–rescan reproducibility is satisfactory after “image data cleaning,” then with further technical improvement, such as acquisition of more b-values for curve fitting, advanced methods for motion correction, statistical removal of ill-fitted pixels, or accelerated data acquisition within a single breath-hold, liver IVIM will eventually have clinical diagnostic applicability.
Material and Methods
This study was conducted with the approval of the institutional ethics committee, and informed consent was obtained. Eighteen healthy volunteers underwent IVIM diffusion imaging with a 3-T magnet and a 32-channel dStream Torso coil (Ingenia, Philips Healthcare, Best, The Netherlands). The IVIM diffusion imaging was based on a single-shot spin-echo-type echo-planar imaging sequence, with 16 b-values of 0, 3, 10, 25, 30, 40, 45, 50, 80, 200, 300, 400, 500, 600, 700, and 800 s/mm2, a number of signal averages (NSA) of 2 for b = 700 s/mm2 and b = 800 s/mm2, and NSA = 1 for the other b-values. Spectral pre-saturation with inversion recovery technique was used for fat suppression. Respiratory triggering was performed using an air-filled pressure sensor fixed on the upper abdomen, resulting in an average TR of 2149 ms. Other parameters included TE = 55 ms, slice thickness = 6 mm, matrix = 100 × 116, field of view (FOV) = 360 × 300 mm, EPI factor = 29, sensitivity-encoding (SENSE) factor = 4, and number of slices = 26. The participants were trained to maintain gentle, regular breathing during image acquisition. The average IVIM scan duration was 6 min. All volunteers were scanned twice during the same session to assess scan–rescan repeatability (scan 1.1 and scan 1.2) and once again in another session (scan 2) after an interval of 5–21 days (mean = 13 days) to assess scan–rescan reproducibility.
The IVIM signal attenuation was modeled according to Eq. 1:
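The expression for Eq. 1 is not reproduced in this text. As an assumption, the widely used bi-exponential IVIM model, written with this paper's parameter names (PF, Dslow, Dfast), takes the following form, where SI(b) and SI(0) denote the signal intensity with and without diffusion weighting:

```latex
\begin{equation}
SI(b) = SI(0)\,\left[(1-\mathrm{PF})\,e^{-b\,D_{\mathrm{slow}}} + \mathrm{PF}\,e^{-b\,D_{\mathrm{fast}}}\right]
\end{equation}
```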
All curve-fitting algorithms were implemented in a custom program developed in MATLAB (Mathworks, Natick, MA, USA). First, a pixel-wise analysis was performed to exclude pixels with a poor fit for Dslow, defined as a coefficient of determination R2 < 0.8 (8). Regions of interest (ROIs) were then placed on the Dslow parametric map, after the poorly fitted pixel exclusion process, to cover a large portion of the right liver parenchyma while avoiding large vessels. For ROI analysis, the IVIM parameters were calculated from the mean signal intensity of the whole ROI, which has been shown to offer better estimation than pixel-wise fitting when the SNR of the diffusion-weighted images is low (9,10). As in other reports (11–19), only the right lobe of the liver was measured in the current study (Fig. 1).
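To illustrate ROI-based segmented fitting, the following is a minimal pure-Python sketch, not the authors' MATLAB program. It assumes the standard bi-exponential IVIM model, takes Dslow from a log-linear least-squares fit at b-values at or above the threshold, takes PF from the extrapolated intercept, and then finds Dfast by a simple 1-D grid search (swapped in here for a nonlinear least-squares solver). How PF and Dfast were actually estimated in the study's "segment-unconstrained" scheme is not detailed in this text, and all function names are hypothetical.

```python
import math

# b-values of the acquisition protocol (s/mm2), as listed in the Methods
B_VALUES = [0, 3, 10, 25, 30, 40, 45, 50, 80, 200, 300, 400, 500, 600, 700, 800]


def ivim_signal(b, s0, pf, d_slow, d_fast):
    """Bi-exponential IVIM model (assumed form of the signal equation)."""
    return s0 * ((1.0 - pf) * math.exp(-b * d_slow) + pf * math.exp(-b * d_fast))


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(observed, predicted):
    """Coefficient of determination of a fitted curve."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def segmented_fit(b_values, signal, threshold=80.0, d_fast_max=0.2, steps=2000):
    """Fit ROI-mean signal (all values > 0, b_values[0] == 0).

    d_fast_max = 0.2 mm2/s corresponds to the 200 x 10^-3 mm2/s upper
    boundary mentioned in the text; the grid search is an assumption.
    """
    s0 = signal[0]
    # Step 1: Dslow from a log-linear fit of the high-b segment
    high = [(b, s) for b, s in zip(b_values, signal) if b >= threshold]
    slope, intercept = linear_fit([b for b, _ in high],
                                  [math.log(s) for _, s in high])
    d_slow = -slope
    # Step 2: PF from the intercept extrapolated to b = 0
    pf = 1.0 - math.exp(intercept) / s0
    # Step 3: Dfast by 1-D search with Dslow and PF held fixed
    best_sse, d_fast = None, None
    for k in range(1, steps + 1):
        candidate = d_slow + (d_fast_max - d_slow) * k / steps
        sse = sum((s - ivim_signal(b, s0, pf, d_slow, candidate)) ** 2
                  for b, s in zip(b_values, signal))
        if best_sse is None or sse < best_sse:
            best_sse, d_fast = sse, candidate
    predicted = [ivim_signal(b, s0, pf, d_slow, d_fast) for b in b_values]
    return {"d_slow": d_slow, "pf": pf, "d_fast": d_fast,
            "r2": r_squared(signal, predicted)}
```

Calling `segmented_fit(B_VALUES, roi_mean_signal)` returns the three parameters together with the R2 of the fitted curve, which is the quantity used for the curve-quality thresholds described in this section.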
Fig. 1. One IVIM image series of “good quality.” No evident motion or artifacts are seen in more than two images. The ROI was drawn to cover as much as possible of the liver parenchyma of the right lobe, while avoiding vessels, which are noticeable in the central parts of the parenchyma and also in the area close to the gallbladder in this slice. The signal intensity vs. b-value curve has an R2 > 0.96 for ROI-based fitting. The signal decay is consistent with b-value changes and no obvious outlier is noted.
A manual procedure was followed to “clean the image data” for each examination. First, slices that covered only the lowest part of segments V–VI (usually slices below the gallbladder), the hepatic dome, regions near the digestive tract, or the diaphragmatic surfaces were discarded. Then, each scan’s image series was graded by a radiologist as “good quality,” “fair quality,” or “insufficient quality.” Motion-induced image degradation was visually assessed between consecutive images at different b-values for each slice, noting the location of the following anatomic structures: kidneys, gallbladder, spleen, hepatic edges, and main hepatic vessels (main portal vein, portal veins up to the second order, main hepatic veins). If no motion or artifact was noted, the slice series was graded “good quality” (Fig. 1). Image series of “insufficient quality” were mainly due to motion leading to liver displacement between images of different b-values (inter-b-value motion), and sometimes to apparent artifacts in the hepatic parenchyma that could be due to intra-b-value motion (Fig. 2). Slices presenting only slight displacement or inconspicuous artifacts were graded “fair quality.” Image series of “good quality” and “fair quality” were included in the second step of data cleaning.
Fig. 2. An example of an image series excluded due to inter-image motion. The left kidney size varies among images, and the posterior branch of the right portal vein can be seen in some images but not in others.
Image series that generated a poorly fitted IVIM diffusion curve were then excluded. First, slices whose parameter results had a coefficient of determination R2 < 0.95 for ROI-wise fitting were excluded (20). Then, the plots of signal intensity vs. b-value were individually evaluated. Slices that showed multiple evident outliers in the MRI signal vs. b-value relation and could not be properly fitted were discarded, as were those that resulted in an unreasonably high Dfast value approaching the upper boundary of 200 × 10–3 mm2/s (Fig. 3). The initial part of the curve, associated with low b-values, tended to be difficult to fit accurately. Very fast signal decay sometimes occurred at b-values ≤ 10 s/mm2 (21,22). This strong signal attenuation, which may reflect the velocity of flowing protons in relatively large vessels rather than in the micro-vessels of IVIM theory (22), can lead to overestimation of Dfast. In addition, for an IVIM image series to be valid, we required that at least three slices be included in the final analysis after data cleaning. The mean of all included slice measurements was then regarded as the value of the examination.
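The numerical part of these exclusion rules can be sketched in a few lines of Python. This is an illustration only: the dictionary keys are hypothetical, the visual judgments (quality grade, presence of multiple outliers) are assumed to be supplied by the reader as inputs, and the margin used to decide that Dfast is "approaching" the upper boundary is an assumption, since the text gives no numeric cut-off.

```python
from statistics import mean

R2_MIN = 0.95          # ROI-wise R2 threshold stated in the text
D_FAST_UPPER = 0.2     # upper boundary, 200 x 10^-3 mm2/s, stated in the text
NEAR_BOUND = 0.95      # ASSUMPTION: "approaching" = within 5% of the boundary
MIN_SLICES = 3         # minimum number of valid slices stated in the text


def keep_slice(s):
    """Return True if a slice passes both cleaning steps.

    `s` is a dict with hypothetical keys:
      grade           -- "good", "fair", or "insufficient" (radiologist's grade)
      r2              -- R2 of the ROI-wise fitted curve
      visual_outliers -- True if the reader judged the curve to have multiple outliers
      d_fast          -- fitted Dfast in mm2/s
    """
    if s["grade"] == "insufficient":
        return False
    if s["r2"] < R2_MIN:
        return False
    if s["visual_outliers"]:
        return False
    if s["d_fast"] >= NEAR_BOUND * D_FAST_UPPER:
        return False
    return True


def examination_value(slices, key):
    """Mean of `key` over retained slices, or None if fewer than three remain."""
    kept = [s for s in slices if keep_slice(s)]
    if len(kept) < MIN_SLICES:
        return None
    return mean(s[key] for s in kept)
```

Returning `None` for an examination with fewer than three retained slices mirrors the rule that such an examination is not valid for analysis.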
Fig. 3. Examples of well-fitted (a), acceptable (b), and unacceptable (c–f) signal intensity vs. b-value curves. (c–f) The MRI signal increases and decreases erratically between b = 0 s/mm2 and b = 80 s/mm2, and at least three consecutive data points in this range are outliers below or above the fitted curve. (c, e) Sharp signal drops at b = 0 and b = 3 s/mm2, which therefore yield unreasonably high Dfast values.
Statistical analysis was performed using MedCalc Statistical Software (version 17.6, MedCalc Software bvba, Ostend, Belgium). Intra-scan repeatability between scan 1.1 and scan 1.2, and inter-scan reproducibility between scan 1.1 and scan 2, of PF, Dslow, and Dfast were assessed by the coefficient of variation (CoV), the within-subject coefficient of variation (wCoV), and the Bland–Altman mean difference and 95% limits of agreement (BA-LA). wCoV is defined by Eqs. 2 and 3:
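The expressions for Eqs. 2 and 3 are not reproduced in this text. One common two-equation definition of wCoV for duplicate measurements, given here as an assumption rather than as the authors' exact form, is shown below; the symbols \sigma_w (within-subject standard deviation) and \mu (mean of all 2n measurements) are introduced only for this sketch.

```latex
\begin{align}
\mathrm{wCoV} &= \frac{\sigma_w}{\mu} \times 100\% \\
\sigma_w^{2} &= \frac{1}{2n}\sum_{i=1}^{n}\left(x_{1,i}-x_{2,i}\right)^{2}
\end{align}
```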
With n being the number of individuals (=14 in this study) and x1 and x2 the duplicate parameter measurements for each individual.
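The three agreement statistics named above can be computed as in the following sketch. It assumes that CoV is the mean of the per-subject coefficients of variation, that wCoV is the root-mean-square within-subject standard deviation divided by the grand mean (one common definition, not confirmed to be the authors' exact formula), and that the Bland–Altman limits are the mean difference ± 1.96 standard deviations of the differences. The function names are hypothetical; the study itself used MedCalc.

```python
import math
from statistics import mean, stdev


def mean_cov(pairs):
    """Mean of per-subject CoV in %, each CoV = SD(x1, x2) / mean(x1, x2)."""
    return 100.0 * mean(stdev(p) / mean(p) for p in pairs)


def within_subject_cov(pairs):
    """wCoV in %, ASSUMED form: sqrt(sum((x1 - x2)^2) / (2n)) / grand mean."""
    n = len(pairs)
    sigma_w = math.sqrt(sum((x1 - x2) ** 2 for x1, x2 in pairs) / (2 * n))
    grand_mean = mean(x for p in pairs for x in p)
    return 100.0 * sigma_w / grand_mean


def bland_altman(pairs):
    """Return (mean difference, lower limit, upper limit) of agreement."""
    diffs = [x1 - x2 for x1, x2 in pairs]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

Here `pairs` is a list of `(x1, x2)` tuples, one per volunteer, for example the Dslow values from scan 1.1 and scan 1.2 for repeatability, or from scan 1.1 and scan 2 for reproducibility.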
Results
Of the 54 examinations in total, data cleaning removed six that did not satisfy the inclusion criteria, giving a success rate of 89% (Fig. 4). Fourteen volunteers were finally included in the measurement reproducibility analysis (five men, nine women; mean age = 25.7 years; age range = 24–27 years). Of the scanned slices, 68.11% were included in the final analysis, with a median of five slices (mean = 5.3 slices, range = 3–10 slices) per scan. Image series graded “good quality” had a higher rate of acceptably fitted curves (81.2%) than those graded “fair quality” (62.1%).
Fig. 4. (a) Image-data cleaning process flow diagram. (b) Number of volunteers included for analysis at each step.
The scan–rescan repeatability and reproducibility for PF, Dslow, and Dfast using threshold b-value = 80 s/mm2 are summarized in Tables 1–3. Bland–Altman plots for each parameter are shown in Fig. 5. The 95% BA-LA, mean of CoV and wCoV for repeatability and reproducibility are shown in Tables 4 and 5. Typically, for threshold b-value = 80 s/mm2, Dslow had a CoV and wCoV of 2.86% and 3.36% between scan 1.1 and scan 1.2, and 2.48% and 3.24% between scan 1.1 and scan 2; PF had a CoV and wCoV of 3.81% and 4.24% between scan 1.1 and scan 1.2, and 4.91% and 5.38% between scan 1.1 and scan 2; Dfast had a CoV and wCoV of 18.16% and 24.88% between scan 1.1 and scan 1.2, and 21.18% and 30.89% between scan 1.1 and scan 2.
Fig. 5. Bland–Altman plots of scan–rescan repeatability (scan 1.1 vs. scan 1.2) and reproducibility (scan 1.1 vs. scan 2).
Table 1. PF value (%) in healthy volunteers with b-value = 80 s/mm2 threshold and the number of included slices per examination. CoVslices: coefficient of variation of PF across the slices included after full data-cleaning.
Table 2. Dslow values (×10–3 mm2/s) in healthy volunteers with b-value = 80 s/mm2 threshold. CoVslices (%): coefficient of variation of Dslow across the slices included after full data-cleaning.
Table 3. Dfast value (included slices [n], CoVslices [%]). CoVslices (in %): coefficient of variation of Dfast across the slices included after full data-cleaning.
Table 4. Scan–rescan repeatability of IVIM parameters. Diff, difference; BA-LA, Bland–Altman 95% limits of agreement.
Table 5. Scan–rescan reproducibility of IVIM parameters. Diff, difference; BA-LA, Bland–Altman 95% limits of agreement.
Discussion
Table 6. Summary of liver IVIM parameter reproducibility in liver parenchyma.
All statistics are in % and the results shown are for liver parenchyma except those of Andreou et al. When different acquisition (FB vs. RT, etc.) or post-processing (full-fitting, segmented fitting, with or without constraint, etc.) schemes were compared, only the best results are presented in this table. Participant number corresponds to the number of participants included for reproducibility analysis.
HV, healthy volunteers; Pts, patients; CLD, chronic liver disease; CoV, mean of the coefficient of variation; wCoV, within-subject coefficient of variation; BA-LA, Bland–Altman 95% limits of agreement; 95% CI, 95% confidence intervals; FB, free breathing; RT, respiratory triggering (including navigator triggering); ET, electrocardiography triggering; V-by-V, voxel-by-voxel analysis; ROI, ROI analysis; Seg-anal/b = 150, segmented fitting method with a b-value threshold of 150 s/mm2; LSF, least-squares fitting method; Bayesian, Bayesian fitting method.
In addition to the “data cleaning” process, a few other steps may have contributed to the good results in this study. The participants were trained to avoid irregular breathing or sudden deep breathing during the examination. Sixteen b-values were used, which is at the upper end of the range used in published studies on IVIM reproducibility (Table 6). We were able to use a median of five slices (mean = 5.3 slices) for the reproducibility calculation, which is more than the number of slices used in most published papers on reproducibility (11–19,21,23–29). The signal measurement was ROI-based, and the IVIM parameters were calculated from the mean signal intensity of the whole ROI. The ROI-based approach allows the plots of signal measurements and fitted curves to be assessed for each slice, which is not possible with a pixel-based method in which IVIM parameters are generated on parametric maps.
Dslow for liver parenchyma has been shown to have the best reproducibility, followed by PF, and then Dfast. Interestingly, we found slightly better reproducibility for Dslow and PF estimation using a threshold b-value of 50 or 80 s/mm2 than using a threshold b-value of 200 s/mm2. In contrast, a higher b-value threshold (such as 200 s/mm2) can lead to better reproducibility for Dfast (Tables 4 and 5). This may be partially due to more b-values being included in the Dslow calculation when a lower b-value threshold is used, which highlights the importance of acquiring image data with a sufficient number of b-values. Wurnig et al. proposed the use of an even lower b-value threshold for liver parenchyma, in the range of 20–40 s/mm2, which could reduce the fitting error (31). However, b-value = 200 s/mm2 remains the most commonly selected threshold in the published literature (7,16,19,23,32,33), since the signal vs. b-value curve can be assumed to be mono-exponential (linear on a logarithmic scale) when b-value ≥ 200 s/mm2. It should be noted that the diffusion compartment is less sensitive to pathologies than the perfusion compartment. For example, Luciani et al. (32) reported that Dslow did not differ significantly between healthy and cirrhotic livers, and that cirrhotic livers are mainly associated with Dfast reduction. Andreou et al. (4) reported no statistical difference between Dslow of liver parenchyma (=1.1 × 10–3 mm2/s) and liver metastasis (=1 × 10–3 mm2/s). A recent literature review also showed that, among nine patient studies, only Dfast, despite being the least stable parameter, consistently showed a progressive reduction with increasing liver fibrosis (2). Therefore, it is critically important to measure the perfusion compartment, which can be reflected by Dfast, PF, and Dslow computed with a low threshold b-value to include a perfusion element.
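The point that a lower threshold leaves more b-values for the Dslow fit can be checked against the 16-b-value protocol of this study. The short sketch below counts the data points in the high-b segment for each threshold, assuming that the threshold b-value itself is included in the segment (b ≥ threshold); that inclusion rule and the function name are assumptions.

```python
# b-values of the acquisition protocol (s/mm2), as listed in the Methods
B_VALUES = [0, 3, 10, 25, 30, 40, 45, 50, 80, 200, 300, 400, 500, 600, 700, 800]


def n_points_for_dslow(threshold, b_values=B_VALUES):
    """Number of b-values available for the Dslow fit (b >= threshold assumed)."""
    return sum(1 for b in b_values if b >= threshold)


counts = {t: n_points_for_dslow(t) for t in (50, 80, 200)}
print(counts)  # {50: 9, 80: 8, 200: 7}
```

Under this assumption, thresholds of 50, 80, and 200 s/mm2 leave nine, eight, and seven points, respectively, for the Dslow fit.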
Our study has some limitations. First, the volunteer population included only young healthy participants. While our results may be applicable to diffuse liver diseases such as hepatic fibrosis, whether our approach is applicable to focal liver lesions will require additional studies. Second, we did not ask volunteers to fast before examination, although hepatic flow may vary depending on the fasted/prandial status (34). The reproducibility of IVIM parameters may therefore be better when participants are scanned in a fasted state. Third, the data cleaning criteria presented in this study remain subjective, are not precisely defined, and were not automated. An objective assessment method, including machine-based recognition of anatomical landmarks and quantitative estimation of the scatter of the MRI signal intensity vs. b-value relationship, is being explored in our laboratory to assess data acquisition quality automatically and consistently. Another limitation is that we tested only segment-unconstrained analysis of IVIM data. While segment-unconstrained analysis remains to date the most popular approach for IVIM analysis (7), it has been suggested that Bayesian probability fitting may perform better in terms of fitting consistency (33,35). Finally, in this study, six out of 54 scans could not be used for analysis, giving a success rate of 89%. A better sequence design allowing over-sampling of the targeted liver parenchyma regions could reduce the failure rate and, because of the increased number of “sufficient quality” slices available for averaging, could further increase measurement reproducibility. This may be relevant for further improving Dfast scan–rescan reproducibility.
In conclusion, we demonstrated the proof-of-principle that the scan–rescan reproducibility of IVIM parameters can potentially be good. This understanding is important for further developing the IVIM technique for diagnostic clinical application at an individual patient’s level.
Footnotes
Acknowledgments
The authors thank Miss Yao Tina Li, former research student at the Chinese University of Hong Kong, for programming the image processing tool used in this study; Dr Jean-Pierre Cercueil at Department of Vascular and Interventional Radiology, François-Mitterrand Teaching Hospital, University of Bourgogne/Franche-Comté, Dijon, France, for discussions during the course of data analysis; and Dr Weibo Chen, Philips Healthcare Shanghai, China, for setting-up the IVIM diffusion MRI acquisition protocol in Nanjing.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Dr Olivier Chevallier was supported by a grant provided by the Société Française de Radiologie (SFR) together with the Collège des Enseignants de Radiologie de France (CERF).
