Multiple semantic X-ray medical image retrieval using efficient feature vector extracted by FPN

Abstract

OBJECTIVE:

Content-based medical image retrieval (CBMIR) has become an important part of computer-aided diagnostics (CAD) systems. The complex medical semantic information inherent in medical images makes it difficult to improve retrieval accuracy. Highly expressive feature vectors play a crucial role in the search process. In this paper, we propose an effective deep convolutional neural network (CNN) model that extracts concise feature vectors for multiple semantic X-ray medical image retrieval.

METHODS:

We build a feature-pyramid-based CNN model with a ResNet50V2 backbone to extract multi-level semantic information. We train and test the proposed model on IRMA, a well-known public X-ray medical image dataset annotated with multiple semantics.

RESULTS:

Our method achieves a retrieval IRMA error of 32.2, the best score among the results reported in the literature on this dataset.

CONCLUSIONS:

The proposed CNN model can effectively extract multi-level semantic information from X-ray medical images. The concise feature vectors can improve the retrieval accuracy of multi-semantic and unevenly distributed X-ray medical images.

Keywords

CBMIR, multiple semantic retrieval, X-ray image, IRMA

1 Introduction

With the rapid development of medical imaging technology, many modalities and protocols have been used to generate digital medical images [1, 2]. Over the past decade, the volume of collected diagnostic images and genomic information has grown more than tenfold, reaching an estimated 2314 exabytes in 2020 [3], of which diagnostic imaging accounts for a large portion. Digital imaging techniques include X-ray, ultrasound, computed tomography (CT), hybrid positron emission tomography and computed tomography (PET-CT), magnetic resonance imaging (MRI) and others. These images are typically managed by a Picture Archiving and Communication System (PACS) in the Digital Imaging and Communications in Medicine (DICOM) format [4]. Radiologists can search a PACS by patient ID, study ID, time range or textual keywords. However, this traditional search method is time-consuming, labor-intensive and inefficient for clinical decision support, and the problem grows worse as the number of images per body region keeps increasing.

Content-Based Medical Image Retrieval (CBMIR) [1, 6] searches existing databases by visual content for medical images of the same anatomical region or with similar disease conditions. The main purpose of CBMIR is to reduce the “medical semantic gap”, i.e., the difference between a physician’s reading of an image and the image features extracted by computer algorithms [1]. CBMIR has become an important part of computer-aided diagnostics (CAD) [7]. The literature on medical image retrieval can be divided into two major directions. One class of methods retrieves medical images of the same imaging modality, body region, body orientation, etc. from a PACS-like database [8–10]. The other class aims to retrieve medical images representing similar diseases for clinical diagnostic comparison [11, 12]. In this paper, we follow the former direction. Considering the huge volume of diagnostic images in modern medical institutions, efficient image retrieval methods are urgently needed.

With the rapid development of deep learning, deep convolutional neural networks (CNNs) have been successfully applied to various tasks in the medical field, and promising results are emerging [7, 13–15]. Deep CNNs with many hidden layers can effectively describe low-, mid- and high-level semantic features of images. Recently, deep CNNs have also been introduced into CBMIR, and experiments show that they make significant progress over traditional search methods [8, 16–20]. Training a deep CNN typically requires a large number of labeled images to learn millions of parameters. In the general image domain, several large-scale datasets are available for training deep CNNs, such as ImageNet [21], COCO [22] and PASCAL VOC [23]. In the medical field, however, such large annotated datasets are quite rare, because manual labeling and annotation by domain experts is prohibitively expensive [1, 15]. Medical image datasets are also often imbalanced, because different body parts are imaged with different protocols and devices and different malignancies have uneven incidence rates [1]. The most common approach in medical image retrieval is to represent images by feature vectors, whose length typically ranges from a few hundred to several thousand dimensions [1, 17]. In a real-world clinical environment, searching with overly long feature vectors is time-consuming and does not meet practical clinical requirements [1]. Therefore, reducing the size of the feature vectors or creating a very sparse search space is an important direction in which many researchers in this field seek improvement [8, 25]. The main difficulty is that retrieval accuracy plays a crucial role in medical image analysis, and constructing concise feature vectors with good discriminability is very challenging [1].

In this paper, we focus on learning concise feature vectors through deep CNNs to retrieve imbalanced X-ray medical images efficiently while maintaining high retrieval accuracy. The IRMA dataset [5] from ImageCLEF, which carries rich medical semantics, is used to validate the proposed method. The contributions of this paper are as follows:

A modified residual structure [26] is used as the medical image feature extraction block. Thanks to identity mapping, a deep CNN built on this block can be trained more effectively on small and extremely unevenly distributed image datasets.

A new network model for medical image retrieval is proposed, which adopts the feature pyramid structure [27] to extract features at different scales. The final feature vector for retrieval is learned from the different pyramid levels. In this way, we obtain a succinct yet highly discriminative feature vector for nearest-neighbor search of similar medical images, which is crucial for medical image retrieval, especially for retrieval tasks over large volumes of image data.

2 Related work

Retrieval precision is a very important factor in medical image analysis: retrieving more relevant images better supports clinical decision-making. Two key factors determine the performance of retrieval systems. (1) Feature representation: medical images are generally represented by feature vectors that describe the image content and can be linked to physicians' visual perception of the images. A good feature vector is fundamental to good performance in medical image retrieval. (2) Retrieval strategy: in the CBMIR literature, classification-based retrieval, nearest-neighbor search, or a combination of the two is commonly used to search for similar features. The retrieval strategy should be chosen carefully for each medical retrieval task.

2.1 Hand-engineered features

In the field of medical image retrieval, traditional features, also called hand-crafted features, are widely used. Common hand-crafted features include texture features, keypoint-based features, local features and global features [1, 29]. Camlica et al. [30] extracted local binary pattern (LBP) features from salient regions of X-ray images and used an SVM classifier for retrieval. LBP features were successfully applied to the 2D-HeLa and brain MRI retrieval tasks of ImageCLEFmed [31]. Jiang et al. [32] used a region of interest (ROI) of a mammogram as the query input and then retrieved breast tumors with SIFT features. Xu et al. [33] proposed a high-speed corner-guided partial shape matching strategy for spine X-ray image retrieval. Wang et al. [34] retrieved basal-cell carcinoma images with SIFT features. Venkatachalam et al. [35] combined Gabor filters and the Walsh-Hadamard transform to extract brain tumor features from MRI, and then used fuzzy c-means clustering with Minkowski distance to implement retrieval.
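To make the hand-crafted baseline concrete, the following is a minimal sketch of the basic 8-neighbour LBP histogram in NumPy. It is a generic textbook formulation assumed for illustration only; it is not the salient-region pipeline of Camlica et al. [30], and the function name `lbp_histogram` is hypothetical.

```python
import numpy as np

def lbp_histogram(image: np.ndarray) -> np.ndarray:
    """Basic 3x3 LBP: compare each interior pixel with its 8 neighbours.

    Returns a normalized 256-bin histogram of the LBP codes."""
    img = np.asarray(image, dtype=np.float64)
    center = img[1:-1, 1:-1]
    # Neighbour offsets (row, col), clockwise from top-left; each sets one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    h, w = img.shape
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

rng = np.random.default_rng(0)
features = lbp_histogram(rng.integers(0, 256, size=(128, 128)))
print(features.shape)  # → (256,)
```

A histogram of this kind is the fixed-length vector that a classifier such as an SVM, or a nearest-neighbour search, would then operate on.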

2.2 Deep CNN features

Deep CNNs are powerful tools for image feature representation and have achieved leading results in many challenges such as image classification, object recognition and object detection. In the medical field, deep CNNs have also been introduced into CBMIR applications, and several enlightening pioneering works have shown promising results. In [8, 16], Khatami et al. proposed two deep CNN-based search space shrinking strategies for medical image retrieval: one method used multiple CNNs together to produce the shrunk search space, and the other employed two-stage data augmentation to train and fine-tune a deep CNN to shrink the search space. Qayyum et al. [9] trained a CNN framework to retrieve images from a multimodal medical image dataset collected by the authors. They tried retrieving medical images both with and without the predicted class label, and also tested their method on the public X-ray dataset IRMA [5]. Pelka et al. [17] tried different image enhancement strategies on the IRMA dataset, using a Random Forest classifier, Inception-v3 and Inception-ResNet-v2 to annotate each axis and the complete IRMA code. Ahn et al. [18] proposed an unsupervised feature learning framework for medical image retrieval and classification. They combined a kernel learning method with a CNN framework to learn sparse initial features from unlabeled medical images, optimized the kernel learning layer by layer, and then stacked all layers in a feedforward way to construct the final feature representation. Hofmanninger and Langs [36] used clinical routine images and radiology reports to train a CNN, and then fine-tuned it for the target medical image retrieval task. Zhang et al. [37] proposed a deep CNN-based method that took advantage of multiple information components generated by empirical mode decomposition. They trained a deep CNN by feeding it the original medical image together with its multiscale empirical mode decomposition components, and derived a concise feature vector for retrieval, achieving very good retrieval performance. Alenezi et al. [38] proposed a W-shaped contrastive loss function to train a CNN for imbalanced skin lesion image retrieval. Alizadeh et al. [39] obtained effective hash codes for histopathology image retrieval using a Siamese deep network structure. Chen et al. [40] developed a multi-scale triplet hashing algorithm for retrieval on X-ray, skin cancer and COVID-19 radiography datasets.

2.3 Retrieval strategies

Due to the close relationship between classification and retrieval in medical image processing, there are two common medical image retrieval strategies:

Use the predicted label of the classifier on the input query image as the retrieval result, as in [9, 17].

Use feature vectors generated by the classifier for nearest-neighbor retrieval, as in [9, 36].

3 Materials and methods

3.1 Dataset

We use the Image Retrieval in Medical Applications (IRMA) X-ray dataset published by ImageCLEF. IRMA is a widely used benchmark for medical image annotation and retrieval [31]; its 2007 version [41] consists of 11,000 training images and 1000 test radiographs divided into 116 classes. Each X-ray image is annotated with a 13-character code TTTT-DDD-AAA-BBB along four hierarchical axes: imaging modality (T), body orientation (D), examined body region (A) and examined biological system (B). Retrieval on the IRMA dataset is difficult because the similarity assessed with the IRMA codes exists along different semantic directions. Meanwhile, the limited number of images and the heavily imbalanced distribution make CNNs prone to overfitting. Figure 1 shows three example images from the IRMA dataset with their IRMA codes and descriptions. Figure 2 shows the data distribution of IRMA2007.
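The four-axis structure of the IRMA code can be illustrated with a small parser. This sketch assumes the code is given as the hyphen-separated string TTTT-DDD-AAA-BBB described above; the `IrmaCode` type and `parse_irma_code` function are hypothetical names, and the example code string is made up for illustration rather than taken from the dataset.

```python
from typing import NamedTuple

class IrmaCode(NamedTuple):
    technical: str    # T: imaging modality (4 characters)
    directional: str  # D: body orientation (3 characters)
    anatomical: str   # A: body region examined (3 characters)
    biological: str   # B: biological system examined (3 characters)

AXIS_LENGTHS = (4, 3, 3, 3)

def parse_irma_code(code: str) -> IrmaCode:
    """Split 'TTTT-DDD-AAA-BBB' into its four hierarchical axes."""
    parts = code.strip().split("-")
    if len(parts) != 4 or tuple(len(p) for p in parts) != AXIS_LENGTHS:
        raise ValueError(f"not a TTTT-DDD-AAA-BBB code: {code!r}")
    return IrmaCode(*parts)

code = parse_irma_code("1121-120-200-700")  # illustrative code string
print(code.anatomical)  # → 200
```

Because each axis is evaluated separately, two images can share, for example, the anatomical axis while differing in modality, which is why similarity on IRMA runs along several semantic directions at once.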

Fig. 1

Examples from the IRMA dataset. These images were randomly chosen from the ImageCLEF 2007 Medical Annotation Task training set. Republished from [41] under a CC BY license, with permission from RWTH Aachen, original copyright 2007.

Fig. 2

The data distribution of IRMA2007 dataset.

As in many medical image analysis tasks, insufficient labeled images and a highly imbalanced data distribution are the two major difficulties in applying deep CNNs to medical image retrieval [1]. Pilot studies using deep CNNs have shown promising results [8, 16], demonstrating that feature vectors learned by deep CNNs outperform traditional methods in medical image retrieval. Our recent work [37] advanced this research in both accuracy and efficiency: it incorporated empirical mode decomposition components into the CNN model to add multi-scale information to the input, and learned concise feature vectors for more effective similarity retrieval. To further improve the expressive power of feature vectors, in this work we propose a new method with two characteristics that address these two difficulties separately. First, we choose the identity-mapping version of ResNet50 (ResNet50V2) as the backbone for medical image feature extraction, to alleviate the performance degradation caused by severe data imbalance. Second, we adopt the Feature Pyramid Network (FPN) framework to make decisions at different scales, improving the model's performance on complex medical semantics with a very limited amount of training data. A brief description of the proposed framework follows.

3.2 Feature extraction backbone: ResNet50V2

Residual blocks, also called “Residual Units” [42], are the basic building units of deep residual networks (ResNets) [26]. ResNets achieved some of the best results on the ImageNet [21] dataset, and ResNets of different depths are currently among the most widely used deep CNN architectures. ResNet50V2 is an improved version of the original ResNet50 [26] that reduces overfitting and eases optimization. By rearranging batch normalization (BN) and ReLU, ResNet50V2 propagates information through the entire residual network more directly. The full pre-activation design [42] places BN and ReLU before the weight layers, so that both the shortcut and the path after the addition are identity mappings. With this design, information can propagate directly in both the forward and backward passes, which makes optimization much easier. Considering the limited amount of labeled medical images and the very uneven data distribution, we choose ResNet50V2 as the feature extraction backbone of our network for better generalization. The structures of the ResNet50 and ResNet50V2 convolutional blocks are shown in Fig. 3 [26].

Fig. 3

The structure of ResNet50 and ResNet50V2 convolution blocks.

As shown in Fig. 3, ResNet50 and ResNet50V2 differ in the position of BN and ReLU. Let $F$ denote the residual function, $f$ the ReLU function, and $W_{l}$ the weights of the convolutional layers in unit $l$. The outputs of the ResNet50 and ResNet50V2 units in Fig. 3 are given by Equations (1) and (2), respectively:

$x_{l+1} = f\left(x_{l} + F(x_{l}, W_{l})\right)$ (ResNet50) (1)

$x_{l+1} = x_{l} + F(x_{l}, W_{l})$ (ResNet50V2) (2)
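The practical difference between Equations (1) and (2) can be checked numerically. In the sketch below, which is an illustration under simplifying assumptions, dense matrices stand in for the convolutions and BN is reduced to per-feature standardization without learned scale and shift; the function names are hypothetical. With all weights set to zero, the residual branch F vanishes, so the ResNet50V2-style unit returns its input unchanged, while the ResNet50-style unit still applies the final ReLU f and clips negative values.

```python
import numpy as np

def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Simplified BN: standardize each feature over the batch (no gamma/beta).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def unit_v1(x, w1, w2):
    # ResNet50 (post-activation), Eq. (1): weight -> BN -> ReLU -> weight -> BN,
    # then ReLU after the addition.
    residual = batch_norm(relu(batch_norm(x @ w1)) @ w2)
    return relu(x + residual)

def unit_v2(x, w1, w2):
    # ResNet50V2 (full pre-activation), Eq. (2): BN -> ReLU -> weight, twice,
    # and no activation after the addition.
    residual = relu(batch_norm(relu(batch_norm(x)) @ w1)) @ w2
    return x + residual

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))        # batch of 8 samples, 4 features
zeros = np.zeros((4, 4))               # weights that switch F off
print(np.allclose(unit_v2(x, zeros, zeros), x))        # → True
print(np.allclose(unit_v1(x, zeros, zeros), relu(x)))  # → True
```

The clean identity path in the second unit is what lets signals and gradients pass through many stacked units unchanged.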

3.3 The proposed medical image retrieval network

The feature pyramid network (FPN) was introduced by Lin et al. [27] and used in the one-stage object detector RetinaNet [43] and the two-stage object detector Faster R-CNN [27]. FPN constructs multi-scale feature pyramids by attaching a top-down pathway and lateral connections to a standard convolutional network. In object detection, FPN attaches a head to each level of the feature pyramid to detect objects at multiple scales.
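The top-down pathway with lateral connections can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: feature maps are channels-last arrays, each lateral 1×1 convolution is written as a matrix product over channels, upsampling is nearest-neighbour by a factor of 2, and the 3×3 smoothing convolutions of the original FPN are omitted. The function `fpn_top_down` and all shapes are hypothetical, not the configuration used in this paper; the channel counts merely mirror the ResNet50 stage outputs.

```python
import numpy as np

def upsample2x(x: np.ndarray) -> np.ndarray:
    # Nearest-neighbour upsampling of an (H, W, C) map to (2H, 2W, C).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_top_down(features, lateral_weights):
    """features: backbone maps [C2, C3, C4, C5], finest first, each (H, W, C_i)
    with spatial size halving at every level.
    lateral_weights: one (C_i, d) matrix per level acting as a 1x1 conv.
    Returns pyramid maps [P2, P3, P4, P5], each with d channels."""
    laterals = [f @ w for f, w in zip(features, lateral_weights)]
    pyramid = [laterals[-1]]                 # P5 comes from C5 alone
    for lat in reversed(laterals[:-1]):      # C4, C3, C2
        pyramid.insert(0, lat + upsample2x(pyramid[0]))
    return pyramid

rng = np.random.default_rng(0)
channels = [256, 512, 1024, 2048]            # illustrative channel counts
sizes = [32, 16, 8, 4]                       # illustrative spatial sizes
feats = [rng.standard_normal((s, s, c)) for s, c in zip(sizes, channels)]
weights = [rng.standard_normal((c, 64)) * 0.01 for c in channels]
print([p.shape for p in fpn_top_down(feats, weights)])
# → [(32, 32, 64), (16, 16, 64), (8, 8, 64), (4, 4, 64)]
```

Each finer level thus receives the coarser, semantically stronger signal through the top-down path while keeping its own spatial detail through the lateral connection.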

In this work, we exploit the rich, multi-scale and semantically strong features extracted by the FPN across all pyramid levels to build a classification-based medical image retrieval framework. Content-based image retrieval helps physicians make comparative diagnoses by finding images of similar anatomy and/or similar cases. The similarity of medical images is defined by multiple aspects of clinical knowledge, that is, by multiple semantics. These semantics include imaging modality, body orientation, anatomical region and biological system, and they can be recognized at different scales. Because the similarity between medical images may exist at different scales, the FPN helps to exploit the various semantic features extracted from the input images. Our proposed model recognizes medical images at different scales using rich semantic information and, more importantly, provides concise feature vectors for similarity retrieval, which is a significant advantage when retrieving from large collections of medical images. The general process of medical image retrieval is shown in Fig. 4.

Fig. 4

The deep CNN-based content-based medical image retrieval flowchart.

The architecture of the proposed network is shown in Fig. 5. ResNet50V2 serves as the backbone for extracting image features in a multiscale, pyramidal way. We use the FPN to combine high- and low-resolution features, which lets us fuse recognition results from different scales into more accurate predictions. In medical image retrieval, nearest-neighbor search is the most widely used retrieval method; it compares the feature vectors of different images by computing a distance measure. Therefore, concise feature vectors with strong expressive ability are an important factor in handling large-scale medical image retrieval. In our framework, we construct a short feature vector at each level of the FPN and concatenate these vectors to form the final feature vector.
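The idea of building a short vector at each pyramid level and concatenating the results can be sketched as follows. The sketch assumes global average pooling followed by a linear projection at each level; the per-level lengths, the function name `pyramid_descriptor` and the random weights are hypothetical placeholders, since the exact layer configuration is given only in Fig. 5 (Table 4 reports a final length of 580 for the proposed model).

```python
import numpy as np

def pyramid_descriptor(pyramid, projections):
    """Build one short vector per FPN level and concatenate them.

    pyramid: list of (H, W, d) feature maps, one per pyramid level.
    projections: one (d, k_i) matrix per level (hypothetical learned weights).
    Returns a 1-D descriptor of length sum(k_i)."""
    parts = []
    for level, proj in zip(pyramid, projections):
        pooled = level.mean(axis=(0, 1))   # global average pooling -> (d,)
        parts.append(pooled @ proj)        # short per-level vector -> (k_i,)
    return np.concatenate(parts)

rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((s, s, 64)) for s in (32, 16, 8, 4)]
lengths = [32, 32, 32, 32]                 # hypothetical per-level lengths
projections = [rng.standard_normal((64, k)) for k in lengths]
desc = pyramid_descriptor(pyramid, projections)
print(desc.shape)  # → (128,)
```

The descriptor length is set by the projection sizes rather than by the backbone width, which is how a multi-level vector can stay far shorter than a 2048-dimensional pooled backbone output.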

Fig. 5

The proposed medical image retrieval framework.

3.4 Training process

All images are grayscale. Because image sizes vary, each image is resized to 128×128 pixels and normalized before being fed to the network. To balance the number of images per class, data augmentation with transformations such as rotation, shearing and scaling is applied to the entire dataset, bringing each class to about 200 images. For classes with more than 200 images, 200 images are randomly sampled as training images. We set the batch size to 16 and enable early stopping, with a maximum of 500 epochs. All networks are trained on Ubuntu 18.04 with an Intel(R) Xeon(R) Gold 6154 CPU, 256 GB RAM and an NVIDIA TITAN V graphics card with 12 GB memory.
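The balancing step can be sketched as a sampling plan: classes with more than 200 images are randomly subsampled to 200, and smaller classes keep all originals and are topped up with augmented copies. This is a minimal sketch assuming exactly 200 images per class and a simple round-robin choice of source images for augmentation; the function `balance_plan` is hypothetical, and the actual transformations (rotation, shearing, scaling) would be applied by whatever augmentation routine is used.

```python
import random
from collections import defaultdict

def balance_plan(labels, target=200, seed=0):
    """Return (image_index, augment) pairs with exactly `target` per class.

    augment=False marks an original image; augment=True marks an extra
    copy to be produced by random rotation/shear/scaling."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    plan = []
    for _, indices in sorted(by_class.items()):
        if len(indices) >= target:
            # Large class: random subsample without replacement.
            plan += [(i, False) for i in rng.sample(indices, target)]
        else:
            # Small class: keep every original, then cycle through them
            # to create augmented copies until the class reaches `target`.
            plan += [(i, False) for i in indices]
            extra = target - len(indices)
            plan += [(indices[k % len(indices)], True) for k in range(extra)]
    return plan

labels = ["A"] * 350 + ["B"] * 12        # toy labels: one large, one small class
plan = balance_plan(labels)
print({c: sum(1 for i, _ in plan if labels[i] == c) for c in ("A", "B")})
# → {'A': 200, 'B': 200}
```

Cycling through the originals spreads the augmented copies evenly over the few available source images of a rare class.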

4 Experimental results and discussion

4.1 Performance measures

For classification, IRMA error [44], average precision (AP), average recall (AR), accuracy (Acc) and F1-measure are adopted to evaluate accuracy. For retrieval, in addition to the IRMA error obtained by nearest-neighbor search, the common measure mean average precision (MAP) is used to evaluate retrieval accuracy.

4.2 Classification performance measures

4.2.1 IRMA error

The total IRMA error [44] can be computed by the following formula:

$\sum_{j=1}^{M} \frac{1}{b_{j}} \frac{1}{j} \delta(y_{j}, \hat{y}_{j}), \quad \text{with} \quad \delta(y_{j}, \hat{y}_{j}) = \begin{cases} 0 & \text{if } y_{k} = \hat{y}_{k} \ \forall k \leq j \\ 0.5 & \text{if } \hat{y}_{k} = * \ \exists k \leq j \\ 1 & \text{if } y_{k} \neq \hat{y}_{k} \ \exists k \leq j \end{cases}$ (3)

Here, $y_{1}^{M} = y_{1}, y_{2}, \dots, y_{M}$ is the correct code along one axis, $\hat{y}_{1}^{M} = \hat{y}_{1}, \hat{y}_{2}, \dots, \hat{y}_{M}$ is the code predicted by the classifier, and $b_{j}$ is the number of possible labels at position $j$. Because of the factors $1/j$ and $1/b_{j}$, a mistake at a coarse position (small $j$) or at a position with few possible labels costs more than one at a fine position with many alternatives. IRMA error is the official assessment indicator proposed by the data provider and is considered to be more in line with medical knowledge. In this paper, IRMA error is the primary indicator for comparing methods.
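A direct transcription of Equation (3) for a single axis is sketched below. Two points are assumptions on our part: the per-position label counts `b` are supplied by the caller (they are not listed in this paper), and the hierarchy is handled by letting the first deviating position decide δ for all subsequent positions, 0.5 if the classifier output there is the don't-know symbol `*` and 1 otherwise. The function name and the example label counts are made up.

```python
def irma_axis_error(truth: str, pred: str, b: list[int]) -> float:
    """Equation (3) for one axis: sum_j (1/b_j) * (1/j) * delta_j.

    truth, pred: label strings of equal length M ('*' in pred = don't know).
    b: number of possible labels at each position j (assumed known)."""
    if not (len(truth) == len(pred) == len(b)):
        raise ValueError("truth, pred and b must have the same length")
    error, delta = 0.0, 0.0
    for j, (y, y_hat, b_j) in enumerate(zip(truth, pred, b), start=1):
        if delta == 0.0 and y_hat != y:
            # First deviation fixes delta for this and all later positions.
            delta = 0.5 if y_hat == "*" else 1.0
        error += delta / (b_j * j)
    return error

b = [4, 2, 5]  # made-up label counts for a 3-position axis
print(irma_axis_error("123", "123", b))            # → 0.0
print(round(irma_axis_error("123", "193", b), 4))  # → 0.3167
print(round(irma_axis_error("123", "1**", b), 4))  # → 0.1583
```

Answering `*` at an uncertain position halves the penalty compared with a wrong guess. How the per-axis values are aggregated over the four axes and all test images should follow the official evaluation [44].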

AP (average precision): $AP = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_{i}}{TP_{i} + FP_{i}}$ (4)

AR (average recall): $AR = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_{i}}{TP_{i} + FN_{i}}$ (5)

Acc (accuracy): $Acc = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_{i} + TN_{i}}{TP_{i} + TN_{i} + FP_{i} + FN_{i}}$ (6)

F1-measure: $F1 = 2 \times \frac{AP \times AR}{AP + AR}$ (7)

Here, for class $i$, $TP_{i}$ (true positives) is the number of images correctly classified as class $i$; $FP_{i}$ (false positives) is the number of images misclassified as class $i$; $TN_{i}$ (true negatives) is the number of images correctly classified as not class $i$; $FN_{i}$ (false negatives) is the number of images of class $i$ misclassified as another class; and $K$ is the total number of classes, which is 116 for the IRMA dataset used in this paper.
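Equations (4) to (7) can be computed from a K×K confusion matrix as sketched below. The sketch assumes that rows index the true class and columns the predicted class, uses recall TP/(TP+FN), and reports 0 for a class whose denominator is empty; the function name `macro_scores` is hypothetical. Equation (7) takes the harmonic mean of the macro-averaged AP and AR rather than averaging per-class F1 scores, and the sketch follows that definition.

```python
import numpy as np

def macro_scores(cm) -> dict:
    """cm[i, j] = number of images of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=np.float64)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as class i but not class i
    fn = cm.sum(axis=1) - tp          # class i predicted as another class
    tn = total - tp - fp - fn

    def safe_div(num, den):
        return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

    ap = safe_div(tp, tp + fp).mean()                       # Eq. (4)
    ar = safe_div(tp, tp + fn).mean()                       # Eq. (5)
    acc = ((tp + tn) / total).mean()                        # Eq. (6)
    f1 = 2 * ap * ar / (ap + ar) if ap + ar > 0 else 0.0    # Eq. (7)
    return {"AP": ap, "AR": ar, "Acc": acc, "F1": f1}

scores = macro_scores([[2, 1], [0, 3]])
print({k: round(float(v), 4) for k, v in scores.items()})
# → {'AP': 0.875, 'AR': 0.8333, 'Acc': 0.8333, 'F1': 0.8537}
```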

4.3 Retrieval performance measure

MAP (mean average precision) [45] is a common metric for evaluating image retrieval performance. It is the mean of the average precision (AP) over the whole query image set $Q$:

$mAP = \frac{1}{|Q|} \sum_{q \in Q} AP(q)$ (8)

where $|Q|$ is the number of query images. $AP(q)$ is defined as

$AP(q) = \frac{1}{N} \sum_{k=1}^{N} P_{q}(R_{k})$ (9)

where $P_{q}(R_{k})$ is the precision at recall point $R_{k}$, and $N$ is the number of relevant images among the top-ranked results for query image $q$.
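Equations (8) and (9) can be sketched for ranked result lists as follows. The sketch assumes that each query's results are given as a binary relevance list in rank order and takes N as the number of relevant images in that list, so that P_q(R_k) is the precision at the rank of the k-th relevant image; the function names are hypothetical, and how relevance is defined for IRMA (for example, an identical code) is left to the caller.

```python
def average_precision(relevance) -> float:
    """Eq. (9): mean of the precision values at the rank of each relevant hit."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)   # P_q(R_k) at the k-th relevant hit
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(ranked_relevance_lists) -> float:
    """Eq. (8): average AP(q) over the query set Q."""
    aps = [average_precision(r) for r in ranked_relevance_lists]
    return sum(aps) / len(aps)

print(round(mean_average_precision([[1, 0, 1, 0], [0, 1]]), 4))  # → 0.6667
```

Because MAP only checks whether each returned image is relevant, it does not distinguish a near miss (for example, correct anatomy but wrong orientation) from a complete miss, whereas the hierarchical IRMA error does.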

4.4 Comparison

In this work, we compare the classification and retrieval performance of our model with that of other methods in the literature.

4.5 Classification

4.5.1 Classification IRMA error using CNN

Table 1 lists the classification IRMA error of different CNN-based methods. The results reflect the steady improvement of deep CNN-based methods. Our proposed method matches our previous work [37] and achieves a lower IRMA error than the other methods in the literature. Although the classification IRMA errors of [37] and of this work are at the same level, the input image of the CNN in [37] has four times as many pixels as in this work (256×256 in [37] vs. 128×128 in this work; in Table 1 we set the input image size of the network of [37] to 128×128 as well).

Table 1
Classification IRMA error of the proposed method and other CNN-based methods in the literature. § indicates that the input image size of the CNN in [37] is 256×256, while that of the proposed method is 128×128, i.e., four times fewer pixels

Method	Classification IRMA error
Proposed	68.48
CNN + EMD [37]§	68.48
Parallel shrink CNN + Radon [8]	165.55
Sequential shrink CNN + LBP [16]	168.05
CNN + Radon [19]	210.35
CNNC + RBC [20]	224.13

4.5.2 AP, AR, Acc and F1-measure

Table 2 reports the commonly used classification measures AP, AR and F1-measure, together with the classification IRMA error, for prevalent deep CNNs, including VGG [46], ResNet50 [26], ResNet50V2 [42], a ResNet50-based FPN [27] and EfficientNet [47]. All models are trained with the same training parameters. ResNet50V2 performs much better than ResNet50; we attribute this mainly to identity mapping, which makes ResNet50V2 easier to optimize on small and heavily imbalanced datasets. The proposed framework achieves the best F1-measure.

Table 2
Comparison of classification performance of the proposed framework with other commonly used deep convolutional models on IRMA images

Method	Classification IRMA error	AP	AR	F1-measure
vgg16 [46]	109.26	0.57	0.53	0.52
ResNet50 [26]	83.45	0.54	0.54	0.52
ResNet50Fpn [27]	64.34	0.68	0.66	0.65
ResNet50v2 [42]	65.76	0.70	0.68	0.67
Efficient B3 [47]	108.93	0.52	0.53	0.50
Efficient B4 [47]	94.18	0.58	0.58	0.56
Efficient B5 [47]	91.78	0.61	0.61	0.60
Efficient B7 [47]	93.29	0.58	0.60	0.57
Proposed	68.48	0.70	0.69	0.68

4.6 Retrieval

4.6.1 Retrieval IRMA error

Nearest-neighbor search is the most common implementation of medical image retrieval. During the search, the distance between the feature vector of the query image and that of each image in the dataset is computed under a chosen distance measure, and the image with the smallest distance is found. The IRMA code of the query image is taken from this nearest image. Thus, an effective feature vector is the key factor affecting accuracy. In Table 3, we compare the retrieval IRMA error of our method with that of other methods in the literature; our proposed framework achieves the best score.
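The nearest-neighbor step described above can be sketched as a brute-force search. This is an illustration under assumptions: feature vectors are stacked row-wise in NumPy arrays, only cosine and Euclidean distances are shown, and the function `retrieve_codes` as well as the toy vectors and codes are hypothetical. Its cost is one distance per database image and grows linearly with the vector length, which is why a short descriptor matters for large archives.

```python
import numpy as np

def retrieve_codes(queries, database, db_codes, metric="cosine"):
    """Assign each query the IRMA code of its nearest database image."""
    q = np.asarray(queries, dtype=np.float64)
    d = np.asarray(database, dtype=np.float64)
    if metric == "cosine":
        qn = q / np.linalg.norm(q, axis=1, keepdims=True)
        dn = d / np.linalg.norm(d, axis=1, keepdims=True)
        dist = 1.0 - qn @ dn.T                       # cosine distance
    elif metric == "euclidean":
        # ||q - d||^2 = ||q||^2 - 2 q.d + ||d||^2, computed for all pairs
        sq = (q ** 2).sum(1)[:, None] - 2 * q @ d.T + (d ** 2).sum(1)[None, :]
        dist = np.sqrt(np.maximum(sq, 0.0))
    else:
        raise ValueError(f"unsupported metric: {metric}")
    nearest = dist.argmin(axis=1)                    # index of closest image
    return [db_codes[i] for i in nearest]

database = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
codes = ["code-A", "code-B", "code-C"]               # hypothetical labels
print(retrieve_codes([[0.9, 0.1], [0.1, 2.0]], database, codes))
# → ['code-A', 'code-B']
```

Swapping the distance only changes how `dist` is computed; the assignment of the code from the closest image stays the same.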

Table 3
Retrieval IRMA error comparison between proposed method and other methods reported in literature

Method	Retrieval IRMA error
Proposed	32.2
CNN + EMD [37]	43.21
SVM + multiscale LBP [30]	146.55
Parallel shrink CNN + LBP, HOG, Radon [8]	165.55
Sequential shrink CNN + LBP [16]	168.05
TAUbiomed [48]	169.5
Diap [48]	178.93
CNN + Radon [19]	210.35
CNNC + RBC [20]	224.13
FEITIJS [48]	242.46
SuperPixel [20]	249.34
VPA [48]	261.16
SP-R [20]	311.8
MedGIFT [48]	317.53
SP-RBC [20]	356.57
IRMA [48]	359.29
MedGIFT [48]	420.91

4.6.2 MAP

MAP is a common evaluation indicator in general image retrieval. In Table 4, we compare the MAP of the proposed method with that of other methods reported in the literature under several distance measures, and list the IRMA error for reference and calibration. Table 4 also lists the feature vector dimension to illustrate each method's potential for large-scale medical image retrieval. The data in Table 4 suggest that MAP may not be a good indicator for this medical image dataset: MAP values under different distance measures are insensitive to retrieval accuracy on the IRMA dataset. For example, the MAP values of the method in [37] and of our proposed method are at the same level, whereas on the more important indicator, IRMA error, which is based on codes annotated by clinical experts, our method leads by a wide margin (32.2 vs. 43.21).

Table 4
Retrieval performance comparison between the proposed method and other state-of-the-art CNN models on the IRMA dataset. * denotes an input image size of 128×128, the same as the proposed method; # denotes the 256×256 input size originally used in [37]

Method	Vector dimension for retrieval	Best IRMA error	MAP (Cosine)	MAP (Euc)	MAP (Mahh)
Sequential shrink CNN + LBP [16]	8496	168.05	–	–	–
Parallel shrink CNN + LBP, HOG, Radon [8]	8496 (LBP), 3528 (HOG), 1800 (Radon)	165.55	–	–	–
CNN+EMD [37]*	32	80.24	0.84	0.84	0.84
CNN+EMD [37]#	32	43.21	0.86	0.84	0.85
vgg16 [46]*	512	55.02	0.47	0.47	0.46
ResNet50 [26]*	2048	47.19	0.72	0.71	0.72
ResNet50Fpn [27]*	580	36.59	0.85	0.85	0.85
ResNet50v2 [42]*	2048	36.54	0.82	0.79	0.82
Efficient B3 [47]*	1536	70.77	0.69	0.68	0.68
Efficient B4 [47]*	1792	61.97	0.72	0.71	0.71
Efficient B5 [47]*	2048	55.35	0.74	0.73	0.73
Efficient B7 [47]*	2560	60.04	0.73	0.73	0.73
Proposed	580	32.2	0.85	0.85	0.85

5 Discussion of both classification and retrieval

The observations in Tables 2 and 4 show that ResNet50V2 is much better than ResNet50 in both classification and retrieval on the IRMA dataset. This is consistent with the motivation of identity mapping in [42], namely making residual networks easier to optimize, which appears particularly helpful on small and unevenly distributed datasets such as IRMA.

From Table 2, we can observe that the proposed method is not clearly superior to the original ResNet50V2 in classification performance: it is slightly better on F1-measure (0.68 vs. 0.67) and slightly worse on IRMA error (68.48 vs. 65.76). This may be because the classification output is a direct prediction of the image label, which in our experiments is one of 116 categories. The weights of the final layer compress the penultimate feature vector in Fig. 5 to 116 dimensions, so the advantage of the multi-scale feature vector constructed by the FPN is not reflected in the classification output.

The retrieval process computes the similarity between the feature vectors of different images with a similarity measure, and the label of the query image is taken from the most similar image. The feature vector of an image is the output of the penultimate layer of the proposed network (marked in Fig. 5). In the retrieval scenario, the multi-scale information extracted by the FPN shows its advantage in the similarity computation. Table 4 shows that the proposed method clearly improves retrieval performance over ResNet50V2 (retrieval IRMA error 32.2 vs. 36.54). This is in line with our analysis that multiscale information is better suited to the multi-semantic labeling of IRMA. Our proposed network obtains the best IRMA error through similarity retrieval, which we attribute mainly to the feature pyramid inherent in deep CNNs providing an effective way to extract semantic information from multiple perspectives.

Table 5 shows the ablation experiments for classification and retrieval, using IRMA error as an indicator.

Table 5
Ablation experiments for classification and retrieval

Method	Classification IRMA error	Retrieval IRMA error
ResNet50	83.45	47.19
ResNet50v2	65.76	36.54
ResNet50v2 + FPN (proposed)	68.48	32.2

We compare the computation time of different networks in Table 6.

Table 6

Computation time of different networks

Method	Train time (ms)	Feature extraction time over 1000 inputs (ms)	Classification time over 1000 inputs (ms)	Retrieval time over 1000 inputs (ms)
Vgg16	484.26	10.15	5.72	382.57
ResNet50	951.19	29.09	18.51	427.77
CNN+EMD	10053.84	43.62	19.75	347.01
ResNet50v2 + FPN (proposed)	2839.46	103.86	138.45	404.7

Figure 6 shows the mAP-Recall curves of the proposed method and other CNNs. In Fig. 7, we show an example of a good query result and an example of a poor query result.

Fig. 6

The mAP-Recall curves of the proposed method and other CNNs.

Fig. 7

An example of a good query result and an example of a poor query result.

6 Conclusions

CBMIR can efficiently support medical image analytics by retrieving images with similar diagnostic semantic content from multiple perspectives (e.g., anatomical region, modality, imaging direction, biological system). This can significantly help domain experts perform comparative medical image diagnosis. Concise yet highly discriminative feature vectors are essential for medical image retrieval systems. In this paper, we proposed a new convolutional neural network that improves the accuracy of retrieving medical images with complex semantics by a large margin. The new method exploits the multi-scale information provided by the feature pyramid network to construct feature vectors for effective nearest-neighbor search. On the highly imbalanced IRMA dataset with complex medical semantics, the proposed method achieves a substantially lower IRMA error than the other methods. Our experiments indicate that, for medical image data with complex semantics, retrieval is much more effective than classification. The concise feature vector also shows the new method's potential for retrieving relevant images from large medical image archives. However, the correspondence between multi-scale information and multiple semantics is complex and needs further clarification, and the method's applicability to computational pathology image retrieval remains unclear. We intend to further examine CBMIR on other medical datasets, on multi-modality data, and in 3D volumetric applications.

Acknowledgments

We thank Prof. Dr. T. M. Deserno of the Dept. of Medical Informatics, RWTH Aachen, Germany, for providing us with the IRMA medical image dataset. This work was supported in part by Ningxia Natural Science Foundation under Grants 2023AAC03263 and 2022AAC05040.

References

1. , Zhang, Müller and Zhang, Large-scale retrieval for medical image analytics: A comprehensive review, Medical Image Analysis 43 (2018), 66–84. doi: 10.1016/j.media.2017.09.007.

2. Müller, Michoux, Bandon and Geissbuhler, A review of content-based image retrieval systems in medical applications: clinical benefits and future directions, International Journal of Medical Informatics 73(1) (2004), 1–23. doi: 10.1016/j.ijmedinf.2003.11.024.

3. Aiello, Cavaliere, D’Albore and Salvatore, The Challenges of Diagnostic Imaging in the Era of Big Data, Journal of Clinical Medicine 8(3) (2019). doi: 10.3390/jcm8030316.

4. Kahn C.E., Carrino J.A., Flynn M.J., Peck D.J. and Horii S.C., DICOM and Radiology: Past, Present and Future, Journal of the American College of Radiology 4(9) (2007), 652–657. doi: 10.1016/j.jacr.2007.06.004.

5. Lehmann T.M. et al., Content-based image retrieval in medical applications, Methods of Information in Medicine 43(4) (2004), 354–361. doi: 10.1055/s-0038-1633877.

6. Akgül C.B., Rubin D.L., Napel, Beaulieu C.F., Greenspan and Acar, Content-Based Image Retrieval in Radiology: Current Status and Future Directions, Journal of Digital Imaging 24(2) (2011), 208–222. doi: 10.1007/s10278-010-9290-9.

7. Roth H.R. et al., Improving Computer-Aided Detection Using Convolutional Neural Networks and Random View Aggregation, IEEE Transactions on Medical Imaging 35(5) (2016), 1170–1181. doi: 10.1109/tmi.2015.2482920.

8. Khatami, Babaie, Khosravi, Tizhoosh H.R. and Nahavandi, Parallel deep solutions for image retrieval from imbalanced medical imaging archives, Applied Soft Computing 63 (2018), 197–205. doi: 10.1016/j.asoc.2017.11.024.

9. Qayyum, Anwar S.M., Awais and Majid, Medical image retrieval using deep convolutional neural network, Neurocomputing 266 (2017), 8–20. doi: 10.1016/j.neucom.2017.05.025.

10. Srinivas, Naidu R.R., Sastry C.S. and Mohan C.K., Content based medical image retrieval using dictionary learning, Neurocomputing 168 (2015), 880–895. doi: 10.1016/j.neucom.2015.05.036.

11. Pan, Qiang, Yuan and Wu, Rapid Retrieval of Lung Nodule CT Images Based on Hashing and Pruning Methods, BioMed Research International 2016 (2016), 3162649. doi: 10.1155/2016/3162649.

12. Shin, Roberts, Lu, Demner-Fushman, Yao and Summers R.M., Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27–30 June 2016, pp. 2497–2506. doi: 10.1109/cvpr.2016.274.

13. Greenspan, Ginneken B.V. and Summers R.M., Guest Editorial Deep Learning in Medical Imaging: Overview and Future Promise of an Exciting New Technique, IEEE Transactions on Medical Imaging 35(5) (2016), 1153–1159. doi: 10.1109/tmi.2016.2553401.

14. Setio A.A.A. et al., Pulmonary Nodule Detection in CT Images: False Positive Reduction Using Multi-View Convolutional Networks, IEEE Transactions on Medical Imaging 35(5) (2016), 1160–1169. doi: 10.1109/tmi.2016.2536809.

15. Shin, Orton M.R., Collins D.J., Doran S.J. and Leach M.O., Stacked Autoencoders for Unsupervised Feature Learning and Multiple Organ Detection in a Pilot Study Using 4D Patient Data, IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8) (2013), 1930–1943. doi: 10.1109/tpami.2012.277.

16. Khatami, Babaie, Tizhoosh H.R., Khosravi, Nguyen and Nahavandi, A sequential search-space shrinking using CNN transfer learning and a Radon projection pool for medical image retrieval, Expert Systems with Applications 100 (2018), 224–233. doi: 10.1016/j.eswa.2018.01.056.

17. Pelka, Nensa and Friedrich C.M., Annotation of enhanced radiographs for medical image retrieval with deep convolutional neural networks, PLOS ONE 13(11) (2018), e0206229. doi: 10.1371/journal.pone.0206229.

18. Ahn, Kumar, Fulham, Feng and Kim, Convolutional sparse kernel network for unsupervised medical image analysis, Medical Image Analysis 56 (2019), 140–151. doi: 10.1016/j.media.2019.06.005.

19. Khatami, Babaie, Khosravi, Tizhoosh H.R., Salaken S.M. and Nahavandi, A deep-structural medical image classification for a Radon-based image retrieval, in 2017 IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), 30 April–3 May 2017, pp. 1–4. doi: 10.1109/ccece.2017.7946756.

20. Liu, Tizhoosh H.R. and Kofman, Generating binary tags for fast medical image retrieval based on convolutional nets and Radon Transform, in 2016 International Joint Conference on Neural Networks (IJCNN), 24–29 July 2016, pp. 2872–2878. doi: 10.1109/ijcnn.2016.7727562.

21. Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision 115(3) (2015), 211–252. doi: 10.1007/s11263-015-0816-y.

22. Lin T.-Y. et al., Microsoft COCO: Common Objects in Context, in Computer Vision – ECCV 2014, Fleet, Pajdla, Schiele and Tuytelaars, Eds., Cham: Springer International Publishing, 2014, pp. 740–755.

23. Everingham, Van Gool, Williams C.K.I., Winn and Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision 88(2) (2010), 303–338. doi: 10.1007/s11263-009-0275-4.

24. Shen, Tao and Ma, Multiview Locally Linear Embedding for Effective Medical Image Retrieval, PLoS ONE 8(12) (2013), e82409. doi: 10.1371/journal.pone.0082409.

25. Shen, Tao and Ma, Dual-Force ISOMAP: A New Relevance Feedback Method for Medical Image Retrieval, PLoS ONE 8(12) (2013), e84096. doi: 10.1371/journal.pone.0084096.

26. , Zhang, Ren and Sun, Deep Residual Learning for Image Recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27–30 June 2016, pp. 770–778. doi: 10.1109/cvpr.2016.90.

27. Lin, Dollár, Girshick, He, Hariharan and Belongie, Feature Pyramid Networks for Object Detection, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21–26 July 2017, pp. 936–944. doi: 10.1109/cvpr.2017.106.

28. Hwang K.H., Lee and Choi, Medical Image Retrieval: Past and Present, Healthcare Informatics Research 18(1) (2012), 3–9. doi: 10.4258/hir.2012.18.1.3.

29. Kumar, Kim, Cai, Fulham and Feng, Content-Based Medical Image Retrieval: A Survey of Applications to Multidimensional and Multimodality Data, Journal of Digital Imaging 26(6) (2013), 1025–1039. doi: 10.1007/s10278-013-9619-2.

30. Çamlica, Tizhoosh H.R. and Khalvati, Medical Image Classification via SVM Using LBP Features from Saliency-Based Folded Data, in 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), 9–11 Dec. 2015, pp. 128–132. doi: 10.1109/ICMLA.2015.131.

31. Müller, de Herrera A.G.S., Kalpathy-Cramer, Demner-Fushman, Antani S.K. and Eggel, Overview of the ImageCLEF 2012 medical image retrieval and classification tasks, 2012, pp. 1–16.

32. Jiang, Zhang, Li and Metaxas D.N., Computer-Aided Diagnosis of Mammographic Masses Using Scalable Image Retrieval, IEEE Transactions on Biomedical Engineering 62(2) (2015), 783–792. doi: 10.1109/tbme.2014.2365494.

33. , Lee, Antani and Long L.R., A Spine X-Ray Image Retrieval System Using Partial Shape Matching, IEEE Transactions on Information Technology in Biomedicine 12(1) (2008), 100–108. doi: 10.1109/titb.2007.904149.

34. Caicedo J.C., Cruz and Gonzalez F.A., Histopathology Image Classification Using Bag of Features and Kernel Functions, in Artificial Intelligence in Medicine, Combi, Shahar and Abu-Hanna, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 126–135.

35. Venkatachalam, Siuly, Bacanin, Hubálovský and Trojovský, An Efficient Gabor Walsh-Hadamard Transform Based Approach for Retrieving Brain Tumor Images From MRI, IEEE Access 9 (2021), 119078–119089. doi: 10.1109/access.2021.3107371.

36. Hofmanninger and Langs, Mapping visual features to semantic profiles for retrieval in medical imaging, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 7–12 June 2015, pp. 457–465. doi: 10.1109/cvpr.2015.7298643.

37. Zhang, Zhi and Zhou, Medical Image Retrieval Using Empirical Mode Decomposition with Deep Convolutional Neural Network, BioMed Research International 2020 (2020), 6687733. doi: 10.1155/2020/6687733.

38. Öztürk, Çelik and Çukur, Content-based medical image retrieval with opponent class adaptive margin loss, Information Sciences 637 (2023), 118938. doi: 10.1016/j.ins.2023.118938.

39. Mohammad Alizadeh, Sadegh Helfroush and Müller, A novel Siamese deep hashing model for histopathology image retrieval, Expert Systems with Applications 225 (2023), 120169. doi: 10.1016/j.eswa.2023.120169.

40. Chen, Tang, Huang and Xiong, Multi-scale Triplet Hashing for Medical Image Retrieval, Computers in Biology and Medicine 155 (2023), 106633. doi: 10.1016/j.compbiomed.2023.106633.

41. Lehmann T.M. et al., IRMA–content-based image retrieval in medical applications, Studies in Health Technology and Informatics 107(Pt 2) (2004), 842–846.

42. , Zhang, Ren and Sun, Identity Mappings in Deep Residual Networks, in Computer Vision – ECCV 2016, Leibe B., Matas, Sebe and Welling, Eds., Cham: Springer International Publishing, 2016, pp. 630–645.

43. Lin, Goyal, Girshick, He and Dollár, Focal Loss for Dense Object Detection, in 2017 IEEE International Conference on Computer Vision (ICCV), 22–29 Oct. 2017, pp. 2999–3007. doi: 10.1109/iccv.2017.324.

44. Müller, Deselaers, Deserno T.M., Kalpathy-Cramer, Kim and Hersh, Overview of the ImageCLEFmed 2007 Medical Retrieval and Medical Annotation Tasks, in Advances in Multilingual and Multimodal Information Retrieval: 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, Budapest, Hungary, September 19–21, 2007, Revised Selected Papers, Springer-Verlag, 2008, pp. 472–491.

45. Deselaers, Keysers and Ney, Features for image retrieval: an experimental comparison, Information Retrieval 11(2) (2008), 77–107. doi: 10.1007/s10791-007-9039-3.

46. Simonyan and Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, arXiv e-prints (2014), arXiv:1409.1556. Available: https://ui.adsabs.harvard.edu/abs/2014arXiv1409.1556S.

47. Tan and Le Q.V., EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, CoRR abs/1905.11946 (2019). Available: http://arxiv.org/abs/1905.11946.

48. Tommasi, Caputo, Welter, Güld M.O. and Deserno T.M., Overview of the CLEF 2009 Medical Image Annotation Track, in Multilingual Information Access Evaluation II. Multimedia Experiments, Peters et al., Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 85–93.

Method	Vector dimension for retrieval	Best IRMA error	MAP (Cosine)	MAP (Euclidean)	MAP (Mahh)
Sequential shrink CNN + LBP [16]	8496	168.05	–	–	–
Parallel shrink CNN + LBP, HOG, Radon [8]	8496 (LBP), 3528 (HOG), 1800 (Radon)	165.55	–	–	–
CNN+EMD [37]*	32	80.24	0.84	0.84	0.84
CNN+EMD [37]#	32	43.21	0.86	0.84	0.85
VGG16 [46]*	512	55.02	0.47	0.47	0.46
ResNet50 [26]*	2048	47.19	0.72	0.71	0.72
ResNet50 + FPN [27]*	580	36.59	0.85	0.85	0.85
ResNet50v2 [42]*	2048	36.54	0.82	0.79	0.82
EfficientNet-B3 [47]*	1536	70.77	0.69	0.68	0.68
EfficientNet-B4 [47]*	1792	61.97	0.72	0.71	0.71
EfficientNet-B5 [47]*	2048	55.35	0.74	0.73	0.73
EfficientNet-B7 [47]*	2560	60.04	0.73	0.73	0.73
Proposed	580	32.2	0.85	0.85	0.85
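The MAP column is broken down by the distance used to rank nearest neighbours. A one-line identity explains why the cosine and Euclidean columns often agree. For L2-normalised vectors a and b, ‖a − b‖² = 2 − 2 cos(a, b). Both distances therefore rank a gallery identically, ties aside. Any gap between those two columns must therefore come from comparing un-normalised vectors. The sketch below checks this numerically. The helper names are hypothetical. The third MAP column's abbreviation is not expanded in this section, so that metric is not reproduced.

```python
import numpy as np


def cosine_distance(gallery, query):
    """1 - cosine similarity between each gallery row and the query."""
    g = np.asarray(gallery, dtype=np.float64)
    q = np.asarray(query, dtype=np.float64)
    sims = (g @ q) / (np.linalg.norm(g, axis=1) * np.linalg.norm(q))
    return 1.0 - sims


def euclidean_distance(gallery, query):
    """L2 distance between each gallery row and the query."""
    g = np.asarray(gallery, dtype=np.float64)
    q = np.asarray(query, dtype=np.float64)
    return np.linalg.norm(g - q, axis=1)


def l2_normalise(x):
    """Scale vectors (last axis) to unit L2 norm."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def ranking(distances):
    """Gallery indices, nearest first; a stable sort keeps gallery order on exact ties."""
    return np.argsort(distances, kind="stable")
```

On raw vectors the two metrics can disagree: with gallery rows (1, 0), (0, 2), (3, 3) and query (1, 1), Euclidean ranks row 0 first while cosine ranks row 2 first. After `l2_normalise`, both produce the same order.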

Table 5
Ablation experiments for classification and retrieval

Method	Classification IRMA error	Retrieval IRMA error
ResNet50	83.45	47.19
ResNet50v2	65.76	36.54
ResNet50v2 + FPN (proposed)	68.48	32.2
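Both tables score methods by IRMA error, the hierarchical metric of the ImageCLEF medical annotation tasks [44, 48]. The sketch below follows our understanding of the ImageCLEF 2007 definition [44]. Each of the four code axes (technical, directional, anatomical, biological) is scored as the sum over positions i of δ_i / (b_i · i). Here b_i is the number of possible labels at position i, and δ_i = 1 once any position up to i is wrong. Each axis is then normalised so that a completely wrong axis scores 0.25, which gives at most 1 per image.

The sketch makes several assumptions:
- A retrieval system predicts the code of its top-ranked image, so the official 0.5 penalty for a "don't know" (*) answer never arises and is omitted.
- b_i is simplified to one value per position, whereas the real values come from the IRMA code hierarchy.
- The branching factors, example codes and function names are hypothetical.

```python
from typing import Sequence


def axis_error(true_code: str, pred_code: str, branches: Sequence[int]) -> float:
    """IRMA error of one code axis, normalised to the range [0, 0.25].

    branches[i] is the number of possible labels at position i (HYPOTHETICAL
    placeholders here; the official scheme takes them from the IRMA hierarchy).
    Only concrete predictions are handled, so the 0.5 "don't know" case is omitted.
    """
    error = max_error = 0.0
    wrong = False
    for i, (t, p, b) in enumerate(zip(true_code, pred_code, branches), start=1):
        weight = 1.0 / (b * i)    # earlier (coarser) positions weigh more
        wrong = wrong or t != p   # after the first mistake, deeper positions count as wrong
        if wrong:
            error += weight
        max_error += weight
    return 0.25 * error / max_error


def irma_error(true_code: str, pred_code: str,
               branches_per_axis: Sequence[Sequence[int]]) -> float:
    """Per-image error in [0, 1] for codes written as 'TTTT-DDD-AAA-BBB'."""
    axes = zip(true_code.split("-"), pred_code.split("-"), branches_per_axis)
    return sum(axis_error(t, p, b) for t, p, b in axes)


# Hypothetical uniform branching factors (10 labels per position) for the four axes.
BRANCHES = [[10] * 4, [10] * 3, [10] * 3, [10] * 3]
```

With these placeholder branching factors, a code that is wrong only in its last biological digit costs 1/22 ≈ 0.045. A code that is wrong in the first digit of every axis costs the maximum of 1. The weighting therefore penalises coarse (early) mistakes far more than fine-grained ones.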