Automatic detection of Gibbs artefact in MR images with transfer learning approach

Abstract

BACKGROUND:

Quality control of magnetic resonance imaging includes image validation, which also covers artefact detection. The daily manual review of magnetic resonance images for possible artefacts can be time-consuming, so automated methods for computer-assisted quality assessment of magnetic resonance imaging need to be developed.

OBJECTIVE:

The aim of this study was to develop automatic detection of Gibbs artefacts in magnetic resonance imaging using a deep learning method called transfer learning, and to demonstrate the potential of this approach for the development of an automatic quality control tool for the detection of such artefacts in magnetic resonance imaging.

METHODS:

The magnetic resonance image dataset of the scanned phantom for quality assurance was created using a turbo spin-echo pulse sequence in the transverse plane. Images were created to include Gibbs artefacts of varying intensities. The images were annotated by two independent reviewers. The annotated dataset was used to develop a method for Gibbs artefact detection using the transfer learning approach. The VGG-16, VGG-19, and ResNet-152 convolutional neural networks were used as pre-trained networks for transfer learning and compared using 5-fold cross-validation.

RESULTS:

All accuracies of the classification models were above 97%, while the AUC values were all above 0.99, confirming the high quality of the constructed models.

CONCLUSION:

We show that transfer learning can be successfully used to detect Gibbs artefacts on magnetic resonance images. The main advantages of transfer learning are that it can be applied to small training datasets, the procedures for building the models are not complicated, and the models do not require much computational power. This shows the potential of transfer learning for the more general task of detecting artefacts in magnetic resonance images of patients, which can in turn improve and speed up the process of quality assessment in medical imaging practice.

Keywords

Gibbs artefact, transfer learning, automatic detection, image quality control

1. Introduction

As magnetic resonance imaging (MRI) is a complex diagnostic imaging modality and its use is increasing due to its availability and decreasing costs [1], routine quality control (QC) is required to maintain a high level of magnetic resonance (MR) scanner performance. QC consists of checking different components of the scanner and validating MR images, which also includes artefact detection [2]. Daily QC can be time-consuming, which is why automated processes are being developed to shorten delays and reduce human error [2, 3, 4].

Artefacts are unwanted signals on an MR image that do not reflect the original object, pathology or image noise [5]. Image artefacts can affect the accuracy of image analysis because they can be mistaken for pathology, can hide important information, or can give false results in quantitative imaging [6]. Artefacts can be patient-related, signal processing-related or hardware-related [5]. The Gibbs artefact, also known as ringing or truncation artefact, is a signal processing-related artefact that appears as multiple bright or dark lines parallel to edges of intensity change in MR images [5, 7]. It occurs because the MR image is reconstructed from a finite number of terms of the Fourier series [8]. It does not appear as often as other artefacts, but it is very common in head and spine imaging, so it is important to detect and eliminate it [9].
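
The truncation mechanism behind the Gibbs artefact can be illustrated numerically. The following is a minimal sketch, not part of the study: it sums the first few terms of the Fourier series of a square wave with levels of +1 and -1 and reports the largest value reached next to the jump. The function names, the grid resolution and the numbers of terms are illustrative assumptions.

```python
import math


def partial_square_wave(x, n_terms):
    """Sum of the first n_terms odd harmonics of a +/-1 square wave."""
    return (4.0 / math.pi) * sum(
        math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(n_terms)
    )


def peak_near_edge(n_terms, n_grid=2000):
    """Largest value of the truncated series on (0, pi/2), sampled on a grid."""
    xs = (i * (math.pi / 2) / n_grid for i in range(1, n_grid))
    return max(partial_square_wave(x, n_terms) for x in xs)


# The plateau of the ideal square wave is 1.0; the truncated series overshoots it.
for n in (10, 50, 200):
    print(n, "terms -> peak", round(peak_near_edge(n), 4))
```

Keeping more terms narrows the ripples but does not remove the overshoot: the peak stays close to 1.18 instead of converging to the plateau value of 1, which is the one-dimensional analogue of the bright and dark lines seen next to sharp edges in an image.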

Methods for correcting the Gibbs artefact presented in the literature include filtering k-space, increasing the matrix size and using different reconstructions [5]. These methods effectively minimize the artefact but also affect image quality.

In our work, we did not address Gibbs artefact correction; instead, we aimed to develop methods for automatic detection of the Gibbs artefact with deep learning techniques. Deep learning is a machine learning approach that solves classification problems using artificial neural networks. In the case of images, convolutional neural networks (CNNs) are used [10]. CNNs combine feature extraction and classification, so they do not require time-consuming manual segmentation of images, yet they still provide highly accurate results [11, 12]. CNNs are usually trained on large datasets of labeled images, which are rarely available in medical imaging. Given a small amount of training data, we modify pre-trained general-purpose CNNs to recognize specific images by adding or changing specific layers in the CNN model [11, 13]. This method is called transfer learning (TL) [14]. TL helps to reduce both the training time [11] and the amount of data required for training deep learning models [15]. We used it for the automatic detection of Gibbs artefacts.

TL can be used in the diagnostic part of MRI for automatic detection of various pathologies [11, 16, 17, 18] and in the QC part of MRI for detection of various artefacts [3, 4, 19, 20]. Since Oksuz [3] and Kelly et al. [4] demonstrated the high performance of TL on pre-trained CNNs for automatic detection of motion artefacts in head MRI, and Tripathi and Sharma [20] succeeded in reducing image noise in MRI using deep CNNs, our motivation was to use the same approach for Gibbs artefact detection. We tested and compared three pre-trained CNN architectures: VGG-16, VGG-19 and ResNet-152.

2. Materials and methods

A flowchart of the process used to create and evaluate classification models for automatic Gibbs artefact detection is shown in Fig. 1.

Figure 1.

Flowchart of the process used to create and evaluate classification models for automatic Gibbs artefact detection.

The MR images used to simulate Gibbs artefacts were generated with a phantom on a Philips Achieva 3.0 T TX MR scanner with the dStream system at the Center for Clinical Physiology, Faculty of Medicine, University of Ljubljana. The images were produced by scanning a quality assurance 4522-130-95955 Sam 200 mm Performance phantom (QA phantom) using different acquisition parameters and a 32-channel head coil (Fig. 2). The phantom fluid consisted of 1000 ml demineralized water + 770 mg CuSO4·5H2O + 1 ml Arquad (1% solution) + 0.15 ml H2SO4 (0.1 N solution). A turbo spin-echo (TSE) sequence in the transverse plane was used. The fixed scanning parameters were: TR 4933 ms, TE 100 ms, slice thickness 3 mm, FOV 210 cm$^{2}$, flip angle 90$^{\circ}$ and 36 slices.

Figure 2.

QA phantom.

Figure 3.

Slice types with Gibbs artefact (left: homogeneous slice; middle: inhomogeneous slice part 1; right: inhomogeneous slice part 2).

MR scans with Gibbs artefacts of different intensities were generated by changing the matrix size (140 $\times$ 135; 256 $\times$ 255; 364 $\times$ 285), the voxel size (0.58 $\times$ 0.74; 0.82 $\times$ 0.82; 1.5 $\times$ 1.56), the number of averages (2 and 3) and the phase-encoding direction (RL and AP). This resulted in eighteen different pulse sequences. Three slice types were selected from each MR scan: one showing the homogeneous part of the phantom and two showing inhomogeneous parts of the phantom at different positions (Fig. 3).
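
The link between matrix size and voxel size can be checked with simple arithmetic. The sketch below assumes a square field of view of 210 mm per side. This value is an assumption made for illustration, chosen because it reproduces the listed voxel sizes. It also assumes that the voxel sizes are given in millimetres. The constant and function names are hypothetical.

```python
# Assumed square field of view in mm (hypothetical; see the lead-in).
FOV_MM = 210.0

# Matrix sizes as listed in the text.
MATRIX_SIZES = [(140, 135), (256, 255), (364, 285)]


def in_plane_voxel_size(matrix, fov_mm=FOV_MM):
    """Return the in-plane voxel size as field of view divided by matrix size."""
    first, second = matrix
    return (round(fov_mm / first, 2), round(fov_mm / second, 2))


for matrix in MATRIX_SIZES:
    print(matrix, "->", in_plane_voxel_size(matrix))
```

Under this assumption the largest matrix gives the smallest voxels, so the voxel sizes in the text pair with the matrix sizes in reverse order of listing.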

The slices chosen for building the models were annotated by two independent raters into two categories: images with and without Gibbs artefact. Both raters are qualified radiographers; the first has professional experience in MRI and the second is a master’s student of radiography. All scanning parameters were hidden from the raters. Annotations were collected in a Microsoft Excel sheet, and IBM SPSS Statistics 26 was used to calculate Cohen’s kappa coefficient as a measure of inter-rater reliability. The final dataset consisted of 507 annotated MR images, divided into three groups by slice type.
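
For readers unfamiliar with Cohen’s kappa, the following minimal sketch shows how the coefficient is obtained from two raters’ labels: the observed agreement is corrected for the agreement expected by chance. It does not reproduce the SPSS procedure used in the study, and the ten example labels are invented for illustration.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of images")
    n = len(rater_a)
    # Fraction of images on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n) for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations: 1 = Gibbs artefact present, 0 = absent.
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(round(cohen_kappa(rater_1, rater_2), 3))
```

In this toy example the raters agree on 9 of 10 images and the chance agreement is 0.5, so kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8.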

MR images in the dataset were additionally preprocessed for deep learning (preprocessing step in Fig. 1). Images were resized to 256 $\times$ 256 and converted to PNG format. They were also filtered with a diffusion-based sharpening filter to highlight the presence of the Gibbs artefact. The preprocessing was done in R using an image processing package based on the CImg Library (http://cimg.eu/).

We performed additional image augmentation to obtain different variants of images for TL (image augmentation step in Fig. 1) [21]. Image augmentation was performed by left-right rotation, shearing, zooming and shifting of the original images. In addition, the images were scaled to predefined sizes using nearest-neighbor interpolation to fit the input specifications of the CNNs that were used in TL.
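
Nearest-neighbor scaling, mentioned above, can be sketched in a few lines. This is an illustration of the general technique on a nested list of pixel values, not the routine used in the study; the function name and the tiny example image are hypothetical, and the target size is a placeholder rather than the input size of any particular CNN.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2D list of pixel values with nearest-neighbor interpolation."""
    old_height, old_width = len(image), len(image[0])
    return [
        [
            # Each output pixel copies the source pixel it maps back onto.
            image[(row * old_height) // new_height][(col * old_width) // new_width]
            for col in range(new_width)
        ]
        for row in range(new_height)
    ]


tiny = [[1, 2],
        [3, 4]]
for line in resize_nearest(tiny, 4, 4):
    print(line)
```

Because every output pixel is a copy of an existing pixel, no new intensity values are created, which keeps sharp intensity edges from being smoothed by the rescaling itself.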

TL (transfer learning step in Fig. 1) was performed with three pre-trained CNNs: VGG-16, VGG-19 and ResNet-152. The VGG-16 and VGG-19 [22] are CNNs with 16 and 19 layers, respectively. The ResNet-152 is a residual network with a depth of 152 layers, 8 times deeper than the VGG-19 but still with lower complexity [23]. All three were trained on more than a million images from the ImageNet database (http://www.image-net.org) and can classify images into 1000 object categories. Since our classification task had only two classes (images with or without Gibbs artefact), we modified the last layers of each pre-trained network to recognize just these two classes. All other layers were left unchanged and kept fixed during training. The modified CNNs were trained with binary cross-entropy between predicted and true labels as the loss function, the RMSprop algorithm as the optimizer, and accuracy as the metric for evaluating final model performance [11]. The models were trained for 20 epochs of 15 steps each.
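
The idea of keeping the pre-trained layers fixed and training only a new two-class output can be illustrated without a deep learning framework. In the toy sketch below, a fixed function stands in for the frozen layers, and only a logistic output layer (two weights and a bias) is trained with binary cross-entropy and an RMSprop update. It is not the Keras model used in the study: the feature function, the six data points and the hyperparameters are all hypothetical.

```python
import math


def frozen_features(x):
    """Stand-in for the frozen pre-trained layers: fixed and never updated."""
    return [x[0] + x[1] - 1.0, x[0] - x[1]]


def predict(params, x):
    """Probability of the positive class from the trainable output layer."""
    f = frozen_features(x)
    z = params[0] * f[0] + params[1] * f[1] + params[2]
    return 1.0 / (1.0 + math.exp(-z))


def bce(params, samples, labels):
    """Mean binary cross-entropy between predicted and true labels."""
    tiny = 1e-12
    total = 0.0
    for x, y in zip(samples, labels):
        p = min(max(predict(params, x), tiny), 1.0 - tiny)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(samples)


def train_head(samples, labels, epochs=300, lr=0.05, rho=0.9, eps=1e-8):
    """Train only the output layer with full-batch RMSprop."""
    params = [0.0, 0.0, 0.0]  # two weights and one bias
    cache = [0.0, 0.0, 0.0]   # running mean of squared gradients
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y in zip(samples, labels):
            f = frozen_features(x)
            err = predict(params, x) - y  # derivative of BCE w.r.t. the logit
            grad[0] += err * f[0]
            grad[1] += err * f[1]
            grad[2] += err
        for i in range(3):
            g = grad[i] / len(samples)
            cache[i] = rho * cache[i] + (1.0 - rho) * g * g
            params[i] -= lr * g / (math.sqrt(cache[i]) + eps)
    return params


# Hypothetical two-class toy data (1 = artefact, 0 = no artefact).
SAMPLES = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.9, 0.8), (0.8, 0.9), (0.7, 0.7)]
LABELS = [0, 0, 0, 1, 1, 1]

head = train_head(SAMPLES, LABELS)
accuracy = sum(
    (predict(head, x) > 0.5) == bool(y) for x, y in zip(SAMPLES, LABELS)
) / len(SAMPLES)
print("loss:", round(bce(head, SAMPLES, LABELS), 4), "accuracy:", accuracy)
```

Only three numbers are updated here, whereas `frozen_features` never changes. In the same way, restricting training to the replaced last layers of a large CNN keeps the number of trainable parameters, and hence the data and computation needed, small.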

Classification models were evaluated using 5-fold cross-validation. The dataset was randomly divided into 5 folds, and in each iteration one fold was used for testing and the rest for training. Because the output of the model is the probability that an input image contains a Gibbs artefact, the output probabilities were used in the ROC analysis to calculate the AUCs. For assessing classification accuracy, the probability threshold for classification was set to 0.5. If the estimated output probability was above this threshold, the input image was classified as containing a Gibbs artefact; otherwise it was classified as an image without artefact. Because the data were slightly unbalanced, we also calculated Matthews correlation coefficients (MCCs) [28], which measure the correlation between true and predicted Gibbs artefacts and produce high scores only when the classifier correctly predicts most of the positive instances and most of the negative instances. No additional adjustment of the classification threshold was made to increase classification accuracy, because we also validated the models in the ROC analysis with the AUCs, which do not depend on a predetermined classification threshold and are also less sensitive to imbalanced data.
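
The evaluation steps described above can be sketched in plain Python: a random split into 5 folds, followed by accuracy and MCC at a threshold of 0.5 and a threshold-free AUC computed from the output probabilities. This is an illustrative sketch, not the evaluation code of the study; the function names, the random seed, the sample count of 169 and the eight example predictions are hypothetical.

```python
import math
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train, test) index pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    return [
        ([idx for j, fold in enumerate(folds) if j != i for idx in fold], folds[i])
        for i in range(k)
    ]


def confusion_counts(y_true, probs, threshold=0.5):
    """Count (tp, fp, tn, fn); a probability above the threshold means artefact."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        predicted = 1 if p > threshold else 0
        if predicted == 1:
            tp, fp = tp + (y == 1), fp + (y == 0)
        else:
            tn, fn = tn + (y == 0), fn + (y == 1)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient (0.0 when a margin of the table is empty)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true, probs):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-fold labels and model output probabilities.
Y_TRUE = [1, 1, 1, 0, 0, 0, 0, 1]
PROBS = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

counts = confusion_counts(Y_TRUE, PROBS)
print("ACC", accuracy(*counts), "MCC", mcc(*counts), "AUC", roc_auc(Y_TRUE, PROBS))
print("fold sizes:", [len(test) for _, test in k_fold_indices(169)])
```

In the toy example one artefact image falls below the threshold and one artefact-free image falls above it, so accuracy and MCC drop while the AUC, which only looks at how the probabilities rank the images, remains comparatively high. This is the reason for reporting a threshold-independent measure alongside accuracy.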

Training and validation of the models were performed using the Keras API in CRAN R [24], running on top of the deep learning platform TensorFlow [25] on a dedicated computer equipped with an NVIDIA GeForce GTX 2080 GPU.

3. Results

Cohen’s kappa coefficient showed nearly perfect inter-rater reliability with $\kappa = 0.965$. The two independent raters disagreed in 6 cases. In these cases, the images were examined by an additional examiner, also a qualified radiographer with professional experience in MRI, who ruled on the presence or absence of Gibbs artefact.

Automatic detection of Gibbs artefacts was evaluated on the entire dataset with 5-fold cross-validation for each classification model separately. Overall results, presented as accuracies (ACC), MCCs and AUC values of each CNN, are summarized in Table 1.

Table 1
Evaluation results with 5-fold cross-validation (ACC: accuracy; MCC: Matthews correlation coefficient; AUC: area under the ROC curve)

Slice type	VGG-16 ACC	VGG-16 MCC	VGG-16 AUC	VGG-19 ACC	VGG-19 MCC	VGG-19 AUC	ResNet-152 ACC	ResNet-152 MCC	ResNet-152 AUC
Homogeneous slices	0.9882	0.9661	0.9955	0.9941	0.9661	0.9948	0.9763	0.9150	0.9992
Inhomogeneous slices part 1	0.9822	0.9647	0.999	0.9882	0.9567	0.999	0.9822	0.9339	0.9981
Inhomogeneous slices part 2	0.9882	0.9733	0.9957	0.9941	0.9672	0.9957	0.9704	0.9661	0.9986

The accuracy in detecting Gibbs artefacts with the modified VGG-16 was 98.8% for the first group of MR images (homogeneous slices), 98.2% for the second group (inhomogeneous slices part 1), and 98.8% for the third group (inhomogeneous slices part 2), with AUC values ranging from 0.995 to 0.999. High MCC values of greater than 0.96 were also obtained for all three groups. The accuracy of Gibbs artefact detection with the modified VGG-19 was also extremely high: 99.4% for the first group, 98.8% for the second group, and 99.4% for the third group, with MCC values around 0.96 and AUC values between 0.994 and 0.999. The accuracy of the modified ResNet-152 was 97.6% for the first group of slices, 98.2% for the second group, and 97.0% for the third group, with AUC values ranging from 0.998 to 0.999. MCCs were slightly lower in this case, ranging from 0.91 to 0.97.

4. Discussion

All validation results in our study show the high performance of our models in detecting Gibbs artefacts on the phantom MR images, demonstrating the potential of the TL method for automatic Gibbs artefact detection in general.

Our study focuses on Gibbs artefact detection, while similar approaches have been applied to motion artefact detection in MRI [3, 4]. Kelly et al. [4] used TL to build a model for automatic motion artefact detection and reported accuracies ranging from 96.8% to 98%. These results also demonstrate the high performance of TL models in artefact detection. The main advantages of TL are that it can be used on small training datasets, the procedures for creating the models are not complicated, and little computational power is required [11]. Small differences in the results may be due to the use of different acquisition parameters and scanning conditions. Our image database consisted only of phantom MR images, while Kelly et al. used actual MR images of patients. Patient images would be expected to degrade the classification results, but the much larger number of MR images they used for training ($N = 207249$) probably explains the high accuracy of their deep learning models [4].

In our study, we used the pre-trained VGG-16 as the baseline CNN because it achieved a top-5 test accuracy of 92.7% on the ImageNet database and won first and second place in the ILSVRC 2014 (ImageNet Large Scale Visual Recognition Challenge). We compared it with VGG-19 and ResNet-152. The VGG-19 has 3 more layers than the VGG-16, while ResNet-152 is a residual network with a depth of 152 layers, 8 times deeper than the VGG-19 but still with lower complexity. The highest overall accuracy was achieved with the VGG-19, a slightly lower one with the VGG-16, and the lowest with the ResNet-152, although all were above 97%. The slightly lower accuracy of the ResNet-152 is due to the poorer convergence of this network during TL training. It can be concluded that there are no significant differences in accuracy, suggesting that the choice of pre-trained CNN in TL has little effect on the detection of Gibbs artefacts. The same was shown in automatic motion artefact detection in head MRI, where Oksuz [3] used the pre-trained CNN DenseNet-201 and Kelly et al. [4] used Inception V3 for TL, and both achieved high accuracy.

While we focused on detecting only one artefact in our study, Samani et al. [26] developed a QC tool called QC-Automator to detect various artefacts in diffusion MRI (dMRI): motion, multiband interleaving, ghosting, susceptibility, herringbone, and chemical shifts. Diffusion MRI data from multiple scanners were used for TL on four different pre-trained CNN architectures. The performance of VGGNet, ResNet, Inception, and Xception was evaluated in a 5-fold cross-validation. The results of the study show that VGGNet outperforms the other architectures, achieving 98% accuracy for all artefacts. We used the same validation technique in our classification and obtained similar results, with the VGG-19 achieving the highest accuracy.

Fantini et al. [27] trained their detection models in three MRI planes (sagittal, axial and coronal), while we used only MR images in the transverse plane. In their study, four different CNN architectures were compared for each plane: Xception, Inception V3, ResNet-50 and Inception-ResNet. The comparison showed no significant differences in detection accuracy. The average detection accuracy for all planes and all CNNs was 88.27%, with the best detection accuracies achieved in the axial plane.

Note also that we generated MR images that contained different intensities of Gibbs artefacts by scanning the QA phantom with different parameters. This was done intentionally to increase the visibility of Gibbs artefacts for model training. By using different parameters during scanning, we also gained insight into how they affect the occurrence of Gibbs artefacts. We can see that the Gibbs artefact can be minimized by changing the matrix size, the voxel size, and the number of averages. Increasing the matrix size effectively reduces the Gibbs artefact [5, 9]. When the slice thickness is not changed, the voxel size follows from the matrix size, which means that we can reduce the Gibbs artefact by decreasing the voxel size or by using a higher number of averages. We also observed that the phase-encoding direction had no effect on the occurrence of Gibbs artefacts.

The limitation of our study is that we only used MR images scanned under controlled conditions with the QA phantom, but this was our intention. We wanted to investigate whether such a deep learning approach could be implemented for the task of detecting Gibbs artefacts in MR images, and then use this approach for detecting artefacts in MR images of patients in general.

5. Conclusions

In this study, we presented a way to use TL for Gibbs artefact detection in MR images. To do so, we require only a relatively small set of annotated images and a few modifications to a pre-trained CNN used for general image classification. The classification results show the excellent performance of the different tested CNNs in TL in detecting Gibbs artefacts in MR images of the QA phantom. This demonstrates the potential of TL for the more general task of detecting artefacts in MR images of patients. Using these types of modeling techniques for artefact detection could help us avoid the time-consuming manual review of images and would allow us to incorporate such techniques into the regular daily quality control of MR scanners.

Footnotes

Conflict of interest

None to report.

Funding

This research received no specific grants from public, commercial, or nonprofit funding agencies.

References

1. Grover, Tognarelli, Crossey, Cox, Taylor-Robinson, McPhail. Magnetic resonance imaging: Principles and techniques: Lessons for clinicians. J Clin Exp Hepatol 2015; 5(3): 246-55. doi: 10.1016/j.jceh.2015.08.001.

2. Sun, Barnes, Dowling, Menk, Stanwell, Greer. An open source automatic quality assurance (OSAQA) tool for the ACR MRI phantom. Australas Phys Eng Sci Med 2015; 38(1): 39-46. doi: 10.1007/s13246-014-0311-8.

3. Oksuz. Brain MRI artefact detection and correction using convolutional neural networks. Comput Methods Programs Biomed 2020; 199(1). doi: 10.1016/j.cmpb.2020.105909.

4. Kelly, Pietsch, Counsell, Tournier. Transfer learning and convolutional neural net fusion for motion artefact detection. Proc Int Soc Magn Reson Med Sci Meet Exhi 2017; 3523.

5. Erasmus, Hurter, Naudé, Kritzinger, Acho. A short overview of MRI artefacts. SA J Radiol 2004; 8(2): 13-7.

6. Oksuz, Clough, Ruijsink, Puyol-Anton, Bustin, Cruz, et al. Detection and correction of cardiac MR motion artefacts during reconstruction from k-space. In: Shen D, Liu T, Peters TM, Staib LH, Essert C, Zhou S, et al., eds. Medical image computing and computer assisted intervention – MICCAI 2019: 22nd international conference. Shenzhen, China: Springer; Vol. 11767, 2019. pp. 695-703. doi: 10.1007/978-3-030-32251-9_76.

7. Stadler, Schima, Ba-Ssalamah, Kettenbach, Eisenhuber. Artifacts in body MR imaging: Their appearance and how to eliminate them. Eur Radiol 2007; 17(5): 1242-55. doi: 10.1007/s00330-006-0470-4.

8. Bourne. The spatial and frequency domains. In: Fundamentals of digital imaging in medicine. London: Springer; 2010. pp. 64-78.

9. Parry, Wani, Jan, Gojwari. Artefacts in magnetic resonance imaging (MRI) and their remedies. IAIM 2019; 6(4): 122-30.

10. Gao, Jiang, Zhou, Chen. Convolutional neural networks for computer-aided detection or diagnosis in medical image analysis: An overview. Mathematical Biosciences and Engineering 2019; 16(6). doi: 10.3934/mbe.2019326.

11. Tripathi, Sharan, Sharma. An augmented deep learning network with noise suppression feature for efficient segmentation of magnetic resonance images. IETE Technical Review 2021; 1-14. doi: 10.1080/02564602.2021.1937349.

12. Sharan, Tripathi, Sharma. Encoder modified U-net and feature pyramid network for multi-class segmentation of cardiac magnetic resonance images. IETE Technical Review 2021; 1-13. doi: 10.1080/02564602.2021.1955760.

13. Gulli, Pal. Deep learning with Keras. Birmingham: Packt Publishing; 2017.

14. Patterson, Gibson. A review of machine learning. In: Deep learning: a practitioner’s approach. Sebastopol (CA): O’Reilly; 2017. pp. 1-35.

15. Venugopal, Joseph, Das, Nath. An EfficientNet-based modified sigmoid transform for enhancing dermatological macro-images of melanoma and nevi skin lesions. Comput Methods Programs Biomed 2022; 222. doi: 10.1016/j.cmpb.2022.106935.

16. Nowak, Mesropyan, Faron, Block, Reuter, Attenberger, et al. Detection of liver cirrhosis in standard T2-weighted MRI using deep transfer learning. Eur Radiol 2021; 31(11): 8807-15.

17. Arbane, Benlamri, Brik, Djerioui. Transfer learning for automatic brain tumor classification using MRI images. IHSH 2020; 210-4.

18. Maqsood, Nazir, Khan, Aadil, Jamal, Mehmood, et al. Transfer learning assisted classification and detection of Alzheimer’s disease stages using 3D MRI scans. Sensors 2019; 19(11).

19. Tripathi, Sharma. Denoising of magnetic resonance images using discriminative learning-based deep convolutional neural network. Technol Health Care 2022; 30(1): 145-60.

20. Tripathi, Sharma. Computer-aided automatic approach for denoising of magnetic resonance images. Comput Methods Biomech Biomed Eng Imaging Vis 2021; 9(6): 707-16. doi: 10.1080/21681163.2021.1944914.

21. Bloice, Roth, Holzinger. Biomedical image augmentation using Augmentor. Bioinformatics 2019; 35(21): 4522-4. doi: 10.1093/bioinformatics/btz259.

22. Simonyan, Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556; 2014. http://arxiv.org/abs/1409.1556.

23. He, Zhang, Ren, Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385; 2015. https://arxiv.org/abs/1512.03385.

24. Arnold. KerasR: R interface to the Keras deep learning library. The Journal of Open Source Software 2017; 2(14). doi: 10.21105/joss.00296.

25. Abadi, Barham, Chen, Davis, Dean, et al. TensorFlow: A system for large-scale machine learning. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016.

26. Samani, Alappatt, Parker, Ismail AAO, Verma. QC-Automator: Deep learning-based automated quality control for diffusion MR images. Front Neurosci 2020; 13: 1456. doi: 10.3389/fnins.2019.01456.

27. Fantini, Rittner, Yasuda, Lotufo. Automatic detection of motion artifacts on MRI using deep CNN. Int Workshop Pattern Recognit Neuroimaging 2018. pp. 1-4.

28. Chicco, Tötsch, Jurman. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Mining 2021; 14: 13. doi: 10.1186/s13040-021-00244-z.
