Abstract
BACKGROUND:
Quality control of magnetic resonance imaging includes image validation, which also covers artefact detection. The daily manual review of magnetic resonance images for possible artefacts can be time-consuming, so automated methods for computer-assisted quality assessment of magnetic resonance imaging need to be developed.
OBJECTIVE:
The aim of this study was to develop a method for the automatic detection of Gibbs artefacts in magnetic resonance imaging using a deep learning method called transfer learning, and to demonstrate the potential of this approach for developing an automatic quality control tool for detecting such artefacts in magnetic resonance imaging.
METHODS:
A dataset of magnetic resonance images of a quality assurance phantom was acquired using a turbo spin-echo pulse sequence in the transverse plane. The images were acquired so as to include Gibbs artefacts of varying intensities and were annotated by two independent reviewers. The annotated dataset was used to develop a method for Gibbs artefact detection based on transfer learning. The VGG-16, VGG-19, and ResNet-152 convolutional neural networks were used as pre-trained networks for transfer learning and were compared using 5-fold cross-validation.
RESULTS:
All classification models achieved accuracies of at least 97% and AUC values above 0.99, confirming the high quality of the constructed models.
CONCLUSION:
We show that transfer learning can be successfully used to detect Gibbs artefacts in magnetic resonance images. The main advantages of transfer learning are that it can be applied to small training datasets, the procedures for building the models are relatively simple, and they do not require much computational power. This shows the potential of transfer learning for the more general task of detecting artefacts in magnetic resonance images of patients, which could consequently improve and speed up quality assessment in medical imaging practice.
Introduction
As magnetic resonance imaging (MRI) is a complex diagnostic imaging modality whose use is increasing owing to its greater availability and decreasing costs [1], routine quality control (QC) is required to maintain a high level of magnetic resonance (MR) scanner performance. QC consists of checking the different components of a scanner and validating MR images, which also includes artefact detection [2]. Daily QC can be time-consuming, which is why automated processes are being developed to shorten delays and reduce human error [2, 3, 4].
Artefacts are unwanted signals in an MR image that do not reflect the imaged object, pathology or image noise [5]. Image artefacts can reduce the accuracy of image analysis: they can be mistaken for pathology, obscure important information, or produce false results in quantitative imaging [6]. Artefacts can be patient-related, signal processing-related or hardware-related [5]. The Gibbs artefact, also known as the ringing or truncation artefact, is a signal processing-related artefact that appears as multiple bright or dark lines parallel to edges of abrupt intensity change in MR images [5, 7]. It occurs because MR images are reconstructed from a finite number of terms of a Fourier series [8]. It is less frequent than other artefacts, but it is very common in head and spine imaging, so it is important to detect and eliminate it [9].
Methods for correcting the Gibbs artefact reported in the literature include k-space filtering, increasing the matrix size and using different reconstruction methods [5]; these effectively minimize the artefact but can affect image quality.
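The truncation mechanism and the k-space filtering remedy described above can be illustrated with a one-dimensional simulation. The sketch below is not part of the study's pipeline: it assumes a hypothetical 512-sample bar-shaped profile, treats its discrete Fourier transform as fully sampled k-space, keeps only the samples with |k| ≤ `cutoff`, and optionally tapers them with a Hamming window (one common k-space filter, chosen here purely for illustration).

```python
import numpy as np


def reconstruct_1d(profile, cutoff, hamming=False):
    """Reconstruct a 1D profile from the k-space samples with |k| <= cutoff.

    Discarding the higher spatial frequencies mimics finite k-space sampling.
    With hamming=True the kept samples are tapered (a simple k-space filter).
    """
    n = profile.size
    k = np.fft.fft(profile)                      # "fully sampled" k-space
    freqs = np.fft.fftfreq(n) * n                # integer frequency index of each sample
    weights = (np.abs(freqs) <= cutoff).astype(float)
    if hamming:
        # Hamming taper: weight 1.0 at k = 0, falling to 0.08 at |k| = cutoff
        weights *= 0.54 + 0.46 * np.cos(np.pi * freqs / cutoff)
    return np.real(np.fft.ifft(k * weights))


# Hypothetical test object: a uniform "phantom" profile with two sharp edges
n = 512
profile = np.zeros(n)
profile[128:384] = 1.0

plain = reconstruct_1d(profile, cutoff=32)
filtered = reconstruct_1d(profile, cutoff=32, hamming=True)
print(f"overshoot without filter: {plain.max() - 1:.3f}")
print(f"overshoot with Hamming filter: {filtered.max() - 1:.3f}")
print(f"steepest edge slope: {np.abs(np.diff(plain)).max():.3f} "
      f"vs {np.abs(np.diff(filtered)).max():.3f}")
```

Without the filter, the reconstruction overshoots the true intensity by roughly 9% next to each edge and undershoots just outside it, which is the ringing seen as bright and dark lines. The Hamming taper suppresses this overshoot but lowers the steepest edge slope, i.e. it blurs the edge, which is the image-quality cost of k-space filtering.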
In this work, we did not address Gibbs artefact correction; instead, we aimed to develop methods for the automatic detection of Gibbs artefacts with deep learning techniques. Deep learning is a machine learning approach that solves classification problems using artificial neural networks; for images, convolutional neural networks (CNNs) are used [10]. CNNs combine feature extraction and classification and consequently do not require time-consuming manual segmentation of images, while still providing highly accurate results [11, 12]. CNNs are trained using large datasets of labeled images, which are not usually available in medical imaging. When only a small amount of training data is available, pre-trained general-purpose CNNs can be adapted to recognize specific images by adding or changing specific layers in the CNN model [11, 13]. This method is called transfer learning (TL) [14]. TL helps to reduce the training time [11] and the amount of data required for training deep learning models [15]. We used it here for the automatic detection of Gibbs artefacts.
TL can be used in the diagnostic part of MRI for the automatic detection of various pathologies [11, 16, 17, 18] and in the QC part of MRI for the detection of various artefacts [3, 4, 19, 20]. Since Oksuz [3] and Kelly et al. [4] demonstrated the high performance of TL with pre-trained CNNs for the automatic detection of motion artefacts in head MRI, and Tripathi and Sharma [20] succeeded in reducing noise in MR images using deep CNNs, our motivation was to apply the same approach to Gibbs artefact detection. We tested and compared three pre-trained CNN architectures: VGG-16, VGG-19 and ResNet-152.
Materials and methods
A flowchart of the process used to create and evaluate classification models for automatic Gibbs artefact detection is shown in Fig. 1.
Flowchart of the process used to create and evaluate classification models for automatic Gibbs artefact detection.
The MR images used to simulate Gibbs artefacts were acquired with a phantom on a Philips Achieva 3.0 T TX MR scanner with the dStream system at the Center for Clinical Physiology, Faculty of Medicine, University of Ljubljana. MR images were produced by scanning a quality assurance 4522-130-95955 Sam 200 mm Performance phantom (QA phantom) with different acquisition parameters and a 32-channel head coil (Fig. 2). Phantom fluid concentration was 1000 ml demi water
QA phantom.
Slice types with Gibbs artefact (left: homogeneous slice; middle: inhomogeneous slice part 1, right: inhomogeneous slice part 2).
To emphasize the Gibbs artefact, MR scans with different artefact intensities were generated by changing the matrix size (140
The slices chosen for building the models were annotated by two independent raters into two categories: images with and without Gibbs artefact. Both raters are qualified radiographers: the first has professional experience in MRI, and the second is a master’s student of radiography. All scanning parameters were withheld from the raters. Annotations were collected and stored in a Microsoft Office Excel spreadsheet, and IBM SPSS Statistics 26 was used to calculate Cohen’s kappa coefficient as a measure of inter-rater reliability. The final dataset contained 507 annotated MR images divided into three groups by slice type.
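Cohen’s kappa, used to quantify inter-rater agreement, corrects the observed agreement for the agreement expected by chance. The study computed it in IBM SPSS Statistics 26; the following minimal Python sketch only illustrates the formula, using hypothetical binary labels rather than the study’s annotations (scikit-learn’s `sklearn.metrics.cohen_kappa_score` is an equivalent off-the-shelf implementation).

```python
import numpy as np


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    p_observed = np.mean(a == b)
    categories = np.union1d(a, b)
    p_expected = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels: 1 = Gibbs artefact present, 0 = absent
rater_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
rater_2 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(round(cohens_kappa(rater_1, rater_2), 3))
```

For these hypothetical labels the observed agreement is 0.9 and the chance agreement is 0.5, so kappa is 0.8.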
MR images in the dataset were additionally preprocessed for deep learning (preprocessing step in Fig. 1). Images were resized to 256
We performed additional image augmentation to obtain different variants of images for TL (image augmentation step in Fig. 1) [21]. Image augmentation was performed by left-right rotation, shearing, zooming and shifting of the original images. In addition, the images were scaled to predefined sizes using nearest-neighbor interpolation to fit the input specifications of the CNNs that were used in TL.
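The augmentation and resizing steps could be sketched with SciPy as follows. This is an illustration under stated assumptions, not the study’s code: the rotation, shear, zoom and shift ranges are hypothetical, a single random affine transform combines the four operations, nearest-neighbour interpolation (`order=0`) is used throughout, the 224 × 224 target size is an assumed CNN input size, and the grayscale slice is replicated into three channels because ImageNet-pretrained CNNs expect RGB input.

```python
import numpy as np
from scipy import ndimage


def random_affine(image, rng, max_rot_deg=10.0, max_shear=0.1,
                  zoom_range=(0.9, 1.1), max_shift=0.1):
    """Randomly rotate, shear, zoom and shift a 2D slice about its centre (hypothetical ranges)."""
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    shear = rng.uniform(-max_shear, max_shear)
    zoom = rng.uniform(*zoom_range)
    shift = rng.uniform(-max_shift, max_shift, size=2) * np.array(image.shape)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
    shearing = np.array([[1.0, shear], [0.0, 1.0]])
    forward = zoom * (rotation @ shearing)       # input -> output, about the centre
    inverse = np.linalg.inv(forward)             # affine_transform maps output -> input
    centre = (np.array(image.shape) - 1) / 2.0
    # input_coord = inverse @ (output_coord - centre - shift) + centre
    offset = centre - inverse @ (centre + shift)
    return ndimage.affine_transform(image, inverse, offset=offset,
                                    order=0, mode="nearest")


def resize_nearest(image, out_shape=(224, 224)):
    """Scale a 2D slice to the CNN input size with nearest-neighbour interpolation."""
    factors = np.array(out_shape, dtype=float) / np.array(image.shape)
    return ndimage.zoom(image, factors, order=0)


def to_rgb(image):
    """Replicate a grayscale slice into three identical channels."""
    return np.repeat(image[..., np.newaxis], 3, axis=-1)


rng = np.random.default_rng(0)
slice_2d = np.zeros((256, 256))                  # hypothetical 256 x 256 slice
slice_2d[64:192, 64:192] = 1.0
cnn_input = to_rgb(resize_nearest(random_affine(slice_2d, rng)))
print(cnn_input.shape)
```

Setting all ranges to zero returns the input unchanged, which is a quick sanity check of the coordinate mapping; nearest-neighbour interpolation also guarantees that no new intensity values are introduced.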
TL (transfer learning step in Fig. 1) was performed with three pre-trained CNNs: VGG-16, VGG-19 and ResNet-152. VGG-16 and VGG-19 [16] are CNNs with 16 and 19 weight layers, respectively. ResNet-152 is a residual network with a depth of 152 layers, 8 times deeper than VGG-19 but still of lower complexity [17]. These networks were pre-trained on more than a million images from the ImageNet database (
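A transfer-learning classifier of the kind described above can be sketched with the Python `tf.keras` API (the study used the Keras interface for R, which wraps the same library). In this sketch the ImageNet-pretrained convolutional base is frozen and a small new head outputs the probability that an image contains a Gibbs artefact; the head size, dropout rate, optimizer and learning rate are hypothetical choices, not the study’s settings. Inputs would also need the backbone-specific preprocessing, e.g. `tf.keras.applications.vgg16.preprocess_input`.

```python
import tensorflow as tf

# The three backbones named in the text, as provided by tf.keras.applications
BACKBONES = {
    "VGG-16": tf.keras.applications.VGG16,
    "VGG-19": tf.keras.applications.VGG19,
    "ResNet-152": tf.keras.applications.ResNet152,
}


def build_tl_model(backbone="VGG-16", input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen pre-trained convolutional base plus a small trainable binary head."""
    base = BACKBONES[backbone](include_top=False, weights=weights,
                               input_shape=input_shape, pooling="avg")
    base.trainable = False                       # keep the ImageNet features fixed
    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)             # inference mode (matters for BatchNorm in ResNet)
    x = tf.keras.layers.Dense(256, activation="relu")(x)    # hypothetical head size
    x = tf.keras.layers.Dropout(0.5)(x)                     # hypothetical dropout rate
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # P(Gibbs artefact)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="binary_crossentropy",
                  metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
    return model
```

Calling `build_tl_model("ResNet-152")` (which downloads the ImageNet weights) and then `model.fit` on the training folds would complete the sketch. Freezing the base keeps the number of trained parameters small, which is what makes TL feasible with a dataset of a few hundred images.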
Classification models were evaluated using 5-fold cross-validation. The dataset was randomly divided into 5 folds, and in each iteration one fold was used for testing and the remaining four for training. Because the output of the model is the probability that an input image contains a Gibbs artefact, the output probabilities were used in a ROC analysis to calculate the AUCs. For classification accuracy, the decision threshold was set to 0.5: if the estimated output probability was above this threshold, the input image was classified as containing a Gibbs artefact; otherwise, it was classified as an image without artefact. Because the classes were slightly imbalanced, we also calculated Matthews correlation coefficients (MCCs) [28], which measure the correlation between true and predicted Gibbs artefacts and produce high scores only when the classifiers correctly predict most of the positive and most of the negative instances. No additional adjustment of the classification threshold was made to increase classification accuracy, because we also validated the models in the ROC analysis with AUCs, which do not depend on a predetermined classification threshold and are less sensitive to imbalanced data.
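The evaluation protocol above (random 5-fold split, 0.5 decision threshold, accuracy, MCC and threshold-free AUC) can be written out in NumPy as a minimal sketch. The function names are hypothetical; the AUC is computed as the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one (ties counted as one half), which equals the area under the empirical ROC curve. Library equivalents exist in scikit-learn (`KFold`, `matthews_corrcoef`, `roc_auc_score`).

```python
import numpy as np


def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train, test) index arrays for a random, unstratified k-fold split."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for i in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train, folds[i]


def threshold_metrics(y_true, prob, threshold=0.5):
    """Accuracy and Matthews correlation coefficient at a fixed decision threshold."""
    truth = np.asarray(y_true).astype(bool)
    pred = np.asarray(prob) > threshold          # above the threshold => artefact
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    accuracy = (tp + tn) / truth.size
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return accuracy, mcc


def roc_auc(y_true, prob):
    """Threshold-free AUC: P(score of a random positive > score of a random negative)."""
    truth = np.asarray(y_true).astype(bool)
    scores = np.asarray(prob, dtype=float)
    pos, neg = scores[truth], scores[~truth]
    wins = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (wins + 0.5 * ties) / (pos.size * neg.size)


# Split sizes for a dataset of 507 images (model training omitted)
for fold, (train_idx, test_idx) in enumerate(kfold_indices(507), start=1):
    print(f"fold {fold}: {train_idx.size} train / {test_idx.size} test")

# Toy example with hypothetical labels and model outputs
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
accuracy, mcc = threshold_metrics(y, p)
print(accuracy, mcc, roc_auc(y, p))
```

For the toy data, two positives and two negatives are classified correctly at the 0.5 threshold, giving an accuracy of 4/6 and an MCC of 1/3, while 8 of the 9 positive-negative pairs are ranked correctly, giving an AUC of 8/9.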
The models were trained and validated using the Keras API in R [18], running on top of the TensorFlow deep learning platform [19], on a dedicated computer equipped with an NVIDIA GeForce RTX 2080 GPU.
Results
The Cohen’s kappa coefficient showed nearly perfect inter-rater reliability with
Automatic detection of Gibbs artefacts was evaluated on the entire dataset with 5-fold cross-validation for each classification model separately. Overall results, presented as accuracies and AUC values of each CNN, are summarized in Table 1.
Table 1. Evaluation results with 5-fold cross-validation
The accuracy of Gibbs artefact detection with the modified VGG-16 was 98.8% for the first group of MR images, 98.2% for the second group, and 98.8% for the third group, with AUC values ranging from 0.995 to 0.999. High MCC values greater than 0.96 were also obtained for all three groups. The accuracy with the modified VGG-19 was also very high: 99.4% for the first group, 98.8% for the second group, and 99.4% for the third group, with MCC values of around 0.96 and AUC values between 0.994 and 0.999. The accuracy of the modified ResNet-152 was 97.6% for the first group, 98.2% for the second group, and 97.0% for the third group, with AUC values ranging from 0.998 to 0.999. MCCs were slightly lower in this case, ranging from 0.91 to 0.97.
Discussion
All validation results in our study show the high performance of our models in detecting Gibbs artefacts in the phantom MR images, demonstrating the potential of the TL method for automatic Gibbs artefact detection in general.
Our study focuses on Gibbs artefact detection, while similar approaches have been applied to motion artefact detection in MRI [3, 4]. Kelly et al. [4] used TL to build a model for automatic motion artefact detection and reported accuracies ranging from 96.8% to 98%. These results also demonstrate the high performance of TL models in artefact detection. The main advantages of TL are that it can be used on small training datasets, the procedures for creating the models are relatively simple, and they do not require much computational power [11]. The small differences between the results may be due to different acquisition parameters and scanning conditions. Our image database consisted only of phantom MR images, whereas Kelly et al. used actual MR images of patients. This would be expected to degrade the classification results, but since they used a higher number of MR images for training (
In our study, we used the pre-trained VGG-16 as the baseline CNN because it achieved a top-5 test accuracy of 92.7% on ImageNet and secured first and second place in the localization and classification tasks, respectively, of ILSVRC 2014 (ImageNet Large Scale Visual Recognition Challenge). We compared it with VGG-19 and ResNet-152. VGG-19 has 3 more layers than VGG-16, while ResNet-152 is a residual network with a depth of 152 layers, 8 times deeper than VGG-19 but still of lower complexity. The highest overall accuracy was achieved with VGG-19, followed by VGG-16 and then ResNet-152, although all accuracies were high, at 97% or more. The slightly lower accuracy of ResNet-152 is due to the poorer convergence of this network during TL training. There were no substantial differences in accuracy, suggesting that the choice of pre-trained CNN in TL has little effect on the detection of Gibbs artefacts. The same was observed for automatic motion artefact detection in head MRI, where Oksuz [3] used the pre-trained CNN DenseNet-201 and Kelly et al. [4] used Inception V3 for TL, and both achieved high accuracy.
While we focused on detecting only one artefact, Samani et al. [20] developed a QC tool called QC-Automator to detect various artefacts in diffusion MRI (dMRI): motion, multiband interleaving, ghosting, susceptibility, herringbone and chemical shift artefacts. Diffusion MRI data from multiple scanners were used for TL on four different pre-trained CNN architectures, and the performance of VGGNet, ResNet, Inception and Xception was evaluated with 5-fold cross-validation. VGGNet outperformed the other architectures, achieving 98% accuracy for all artefacts. We used the same validation technique and similarly obtained the highest accuracy with a VGG network, VGG-19.
Fantini et al. [21] trained their detection models in three MRI planes (sagittal, axial and coronal), whereas we used only MR images in the transverse plane. In their study, four CNN architectures (Xception, Inception V3, ResNet-50 and Inception-ResNet) were compared for each plane, showing no significant differences in detection accuracy. The average detection accuracy across all planes and all CNNs was 88.27%, with the best detection accuracies achieved in the axial plane.
Note also that we generated MR images containing Gibbs artefacts of different intensities by scanning the QA phantom with different parameters. This was done intentionally to increase the visibility of Gibbs artefacts for model training. Using different scanning parameters also gave us insight into how these parameters affect the occurrence of Gibbs artefacts. We observed that the Gibbs artefact can be minimized by changing the matrix size, the voxel size and the number of averages. Increasing the matrix size effectively reduces the Gibbs artefact [5, 9]. For a fixed slice thickness and field of view, the in-plane voxel size is determined by the matrix size, so the Gibbs artefact can also be reduced by decreasing the voxel size. A higher number of averages also reduced the Gibbs artefact. We also observed that the phase encoding direction had no effect on the occurrence of Gibbs artefacts.
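The relation between matrix size, voxel size and ringing can be illustrated by extending the truncation idea to a fixed field of view. In this hypothetical sketch, a 200 mm FOV (matching the phantom diameter only for convenience) is imaged with 128 or 256 k-space samples; the in-plane voxel size is FOV divided by matrix, and the function reports where the first bright ring appears and how high it is. Keeping |k| ≤ matrix/2 on a fine grid is an approximation of acquiring `matrix` samples, and averaging and the other acquisition parameters are not modelled.

```python
import numpy as np


def ringing_profile(matrix, fov_mm=200.0, fine=4096):
    """Simulate a 1D phantom edge imaged with `matrix` k-space samples over `fov_mm`.

    The object is a uniform bar occupying the middle half of the FOV, defined on a
    fine grid; keeping only |k| <= matrix/2 approximates acquiring `matrix` samples.
    Returns (distance of the first overshoot lobe from the edge in mm, its height).
    """
    x = np.arange(fine) * fov_mm / fine
    obj = ((x >= fov_mm / 4) & (x < 3 * fov_mm / 4)).astype(float)
    freqs = np.fft.fftfreq(fine) * fine
    kept = np.abs(freqs) <= matrix // 2
    recon = np.real(np.fft.ifft(np.fft.fft(obj) * kept))
    edge = int(np.argmax(obj))                            # first sample inside the bar
    peak = edge + int(np.argmax(recon[edge:fine // 2]))   # first (largest) overshoot lobe
    return x[peak] - x[edge], recon[peak]


for matrix in (128, 256):
    voxel_mm = 200.0 / matrix                             # in-plane voxel size = FOV / matrix
    dist, height = ringing_profile(matrix)
    print(f"matrix {matrix}: voxel {voxel_mm:.2f} mm, "
          f"first ring {dist:.2f} mm from edge, peak {height:.3f}")
```

In this simulation the first ring lies roughly one voxel from the edge and moves closer to it when the matrix is doubled, while the overshoot height stays close to 9%. A larger matrix therefore confines the ringing to a narrower band next to the edge, which is consistent with the reduction of the visible artefact described above.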
A limitation of our study is that we used only MR images acquired under controlled conditions with the QA phantom; however, this was intentional. We wanted to investigate whether such a deep learning approach could be implemented for detecting Gibbs artefacts in MR images, and then to use this approach for detecting artefacts in MR images of patients in general.
Conclusions
In this study, we presented a way to use TL for Gibbs artefact detection in MR images. It requires only a relatively small set of annotated images and a few modifications to a pre-trained CNN used for general image classification. The classification results show the excellent performance of the different CNNs tested with TL in detecting Gibbs artefacts in MR images of the QA phantom. This demonstrates the potential of TL for the more general task of detecting artefacts in MR images of patients. Using such modeling techniques for artefact detection could help avoid the time-consuming manual review of images and would allow them to be incorporated into the regular daily quality control of MR scanners.
Footnotes
Conflict of interest
None to report.
Funding
This research received no specific grants from public, commercial, or nonprofit funding agencies.
