VDVM: An automatic vertebrae detection and vertebral segment matching framework for C-arm X-ray image identification

Abstract

BACKGROUND:

C-arm fluoroscopy, as an effective diagnosis and treatment method for spine surgery, can help doctors perform surgical procedures more precisely. In clinical surgery, the surgeon often determines the specific surgical location by comparing C-arm X-ray images with digital radiography (DR) images. However, this relies heavily on the doctor’s experience.

OBJECTIVE:

In this study, we design a framework for automatic vertebrae detection as well as vertebral segment matching (VDVM) for the identification of vertebrae in C-arm X-ray images.

METHODS:

The proposed VDVM framework is mainly divided into two parts: vertebra detection and vertebra matching. In the first part, a data preprocessing method is used to improve the image quality of C-arm X-ray images and DR images. The YOLOv3 model is then used to detect the vertebrae, and the vertebral regions are extracted based on their position. In the second part, the Mobile-Unet model is first used to segment the vertebrae contour of the C-arm X-ray image and DR image based on vertebral regions respectively. The inclination angle of the contour is then calculated using the minimum bounding rectangle and corrected accordingly. Finally, a multi-vertebra strategy is applied to measure the visual information fidelity for the vertebral region, and the vertebrae are matched based on the measured results.

RESULTS:

We use 382 C-arm X-ray images and 203 full-length X-ray images to train the vertebra detection model, and achieve an mAP of 0.87 on the test dataset of 31 C-arm X-ray images and 0.96 on the test dataset of 31 lumbar DR images. Finally, we achieve a vertebral segment matching accuracy of 0.733 on 31 C-arm X-ray images.

CONCLUSIONS:

A VDVM framework is proposed, which performs well for the detection of vertebrae and achieves good results in vertebral segment matching.

Keywords

C-arm X-ray image, DR image, deep learning, image identification

1 Introduction

At present, image-guided surgery (IGS) is widely used in the field of spine surgery [1]. C-arm X-ray has become the main method for image-guided surgery due to its ability to update the position of instruments in real time and to provide the latest anatomical structures [2]. When performing surgery, a doctor first takes a C-arm X-ray of the patient and determines the surgical location by identifying the vertebral segments in the image. However, due to the narrow field of view of intraoperative X-ray images [3], it is difficult to find anatomical landmarks. At the same time, the low X-ray dose used for C-arm X-ray leads to poor image contrast. In addition, the presence of various types of noise [4] interferes with the images, making it difficult for the surgeon to directly identify C-arm X-ray images. Fig. 1 shows examples of low contrast C-arm X-ray images. In such cases, the surgeon compares the patient’s pre-operative images, such as DR/X-ray, computed tomography (CT), or magnetic resonance imaging (MRI), with the fluoroscopic images to ensure that the procedure is performed at the correct location. However, due to the morphological similarities between vertebrae, manual determination by observation requires highly specialized experience, and a lack of experience may lead to errors in some cases. Therefore, reliable identification of C-arm X-ray images can effectively provide surgical guidance and improve surgical accuracy and efficiency.

Fig. 1

Examples of low contrast C-arm X-ray images.

In recent years, benefiting from its advantages in image processing, deep learning has developed rapidly in the field of medical images, including classification, detection, segmentation, and registration. Deep learning replaces the manual adjustment of a large number of parameters required by traditional machine learning, and can automatically learn multiple levels of visual image features [5], which effectively improves the efficiency of image processing. Deep learning research on C-arm X-ray images has already produced some good results. Li et al. [6] used a feature fusion deep learning (FFDL) model to detect the lumbar spine in a C-arm X-ray image. The model combines the shape information and texture information of the lumbar spine through the Sobel kernel and the Gabor kernel, and finally outputs the detection result. Esfandiari et al. [7] segmented the pedicle screws in C-arm X-ray images using the U-Net model, and the results showed that the method was able to segment the screw axis effectively. Kausch et al. [8] proposed a CNN-based posture regression framework, in which the C-arm X-ray image can be automatically adjusted to the standard projection angle through two sequentially connected CNN models. Esteban et al. [9] proposed a deep learning framework for automatic alignment of C-arm X-ray and CT images, obtaining a registration success rate of more than 97%, making the results effective for intraoperative pose initialization.

There are several preoperative imaging modalities used for C-arm X-ray image comparisons, among which the comparison with DR/X-ray images is the most common. However, previous studies have focused on 2D/3D registration of C-arm X-ray images with CT or MRI images [8–11]. Unlike previous studies, we focus on the feasibility of identifying C-arm X-ray images with DR images. Because vertebrae are morphologically similar to a certain degree, there is, in rare cases and especially in the thoracic vertebrae, a possibility of misjudgment during clinical surgery, so we want to use deep learning to identify C-arm X-ray images and provide more effective surgical guidance for doctors. The main difficulty in achieving automatic recognition of C-arm X-ray images is that the perspective angle of the C-arm is different from that of the preoperative DR image. In some cases, the anatomical structures generated differ significantly from those in the DR images. Also, the pedicle screws implanted in some patients [12], as well as the markers used to identify the surgical location (metal wires, metal rods, scissors, etc.), partially fill the image and can be superimposed on the vertebral area, interfering with the overall identification task. In addition, various types of noise (Gaussian, speckle, Poisson noise, etc.) make C-arm X-ray images differ from DR images [4].

In this study, we propose a framework for automatic identification of vertebrae in C-arm X-ray images. First, we improve the overall similarity between C-arm X-ray images and DR images through preprocessing methods. Then, a detection model is used to localize the vertebrae, and the target area is cropped to minimize the proportion of the image area occupied by interfering objects. Next, a segmentation model is used to correct the angles of individual vertebrae to reduce the effect of shooting angle on the image. Finally, the vertebral segments are matched based on the image similarity metric for the purpose of C-arm X-ray image identification. We demonstrate the validity of the framework by having the identification results evaluated by spine surgery specialists. To summarize, our objective is to design a framework for automatic vertebrae detection and vertebral segment matching (VDVM) for the identification of vertebrae in C-arm X-ray images.

The rest of this paper is organized as follows. Section 2 introduces the dataset and the VDVM framework, including image preprocessing methods, deep learning models, and similarity metrics. Section 3 presents the VDVM experiments and the evaluation metrics, and analyzes the experimental results. Sections 4 and 5 summarize the experimental results, analyze the challenges, and suggest future work.

2 Methods

2.1 Dataset

Since there are no publicly available datasets, the images used in this study were collected from the Affiliated Hospital of Ningbo University Medical College and Ningbo Yinzhou No. 2 Hospital. We divide the images into four datasets based on image type. The images are annotated by orthopedic experts using the Labelme and Labelimg software for training and validation of the different deep learning models. The images from each dataset are illustrated in Fig. 2 and summarized in Table 1.

Fig. 2

Illustration of images and training/testing dataset assignments.

Table 1

Datasets summary

	Image type	Number of images	Application task
Dataset 1	C-arm X-ray image	382	Vertebrae detection and vertebrae segmentation
Dataset 2	X-ray image	203	Vertebrae detection
Dataset 3	DR image	21	Vertebrae segmentation
Dataset 4	C-arm X-ray image and DR image	31, 31	Vertebrae detection and vertebrae segmentation

Dataset 1 – This dataset contains 382 intraoperative C-arm images obtained by the C-arm X-ray machine, including C-arm images of the thoracic and lumbar regions, and is used for the training of the C-arm image vertebra detection model.

Dataset 2 – This dataset contains 203 full-length X-ray images taken before surgery. Due to the lack of available DR images, we use X-ray anteroposterior images instead of DR images for the training of the vertebrae detection of DR images.

Dataset 3 – This dataset contains 21 DR images taken before surgery, which are used for the training of the vertebral segmentation model. The images include part of the thoracic region and the complete lumbar region.

Dataset 4 – This dataset contains 31 sets of intraoperative C-arm images taken by mobile devices, as well as preoperative lumbar DR images of the same patient, for final experimental performance testing.

Both X-ray images and DR images are X-ray images. The difference between X-ray images and DR images used in this paper lies in the image quality and image display area. DR images usually have higher image quality. Because the DR system is digital, it can provide higher spatial resolution and contrast. However, the X-ray images used in this paper include the entire region from the head to the sacrum, while the DR images include only part of the thoracic region and the complete lumbar region.

2.2 Proposed System

VDVM focuses on the identification of C-arm images through deep learning methods and is divided into two parts, vertebrae detection and vertebral segment matching. The main process is shown in Fig. 3. The main purpose of vertebrae identification is to determine the number of each vertebral segment, and the purpose of vertebrae detection in this study is to locate the vertebral regions so that the next step of vertebrae matching can be performed. In the vertebrae detection step, the input C-arm X-ray images and DR images are first preprocessed, and vertebrae detection is then performed separately for both types of images with the YOLOv3 model. In the vertebral segment matching step, the detected vertebral regions are first cropped based on the detection results, and the cropped vertebrae images are segmented with the Mobile-Unet model. Based on the segmentation results, we measure the inclination angle of each vertebra, correct and crop the vertebral region in the original image based on the measured angle, and match the corrected and cropped C-arm vertebrae with the DR image vertebrae. The optimal matching result is obtained from the similarity measure, and finally the vertebral segments are marked based on the matching result.

Fig. 3

Main process of VDVM framework.

2.3 Vertebrae Detection

2.3.1 Data Preprocessing and Augmentation

In the vertebrae detection step, we use Dataset 1 and Dataset 2 for model training, and Dataset 4 for performance validation. For the C-arm images in Dataset 1 and Dataset 4, we resize the images to 512*512 pixels and perform color inversion, median filtering, and histogram normalization to improve image quality. Because too few DR images are available, we use the full-length X-ray films in Dataset 2 to build a simulated DR training set. We crop a region of interest (ROI) from each image so that the display range of the X-ray image approximates that of the DR images. The intensity histogram of the X-ray image along the vertical direction is first obtained, and the columns between the average intensity plus or minus 1.5 standard deviations (std) of the mean intensity are used as the left and right boundaries of the ROI. At the same time, the intensity histogram in the horizontal direction is calculated, and twice the intensity minimum point y is used as the upper boundary, while the lower boundary is kept unchanged. The image is then cropped to the resulting boundaries to simulate the area shown in a lumbar DR film. The cropping process is shown in Fig. 4. After cropping, the image is resized to 512*1280 pixels. After the two groups of images (C-arm images in Dataset 1 and Dataset 4, and X-ray images in Dataset 2) are preprocessed, image augmentation is performed with Gaussian blur and Gaussian noise. The number of images is tripled by augmentation, giving 1146 C-arm images and 609 X-ray images, which are fed into the YOLOv3 model with a 4:1 ratio of training to testing images. We resize the DR images in Dataset 4 to 512*1280 pixels and perform histogram normalization.
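The C-arm preprocessing chain described above (color inversion, median filtering, histogram normalization) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the 3x3 window size, the use of min-max stretching as the "histogram normalization" step, and all function names are assumptions, and resizing is omitted. Images are plain nested lists of 8-bit gray values so that the sketch has no dependencies.

```python
from statistics import median_low


def invert(img, max_val=255):
    """Color inversion: dark pixels become bright and vice versa."""
    return [[max_val - p for p in row] for row in img]


def median_filter3(img):
    """3x3 median filter; windows are clipped at the image border (assumed size)."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            window = [img[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, rows))
                      for cc in range(max(c - 1, 0), min(c + 2, cols))]
            # median_low always returns an existing pixel value
            out[r][c] = median_low(window)
    return out


def stretch(img, max_val=255):
    """Min-max intensity stretch, used here as an assumed form of normalization."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [[0 for _ in row] for row in img]
    return [[round((p - lo) * max_val / (hi - lo)) for p in row] for row in img]


def preprocess_carm(img):
    """Inversion, then median filtering, then normalization, in the order listed."""
    return stretch(median_filter3(invert(img)))
```

The median filter is applied before stretching so that isolated noisy pixels do not determine the minimum and maximum used for normalization.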

Fig. 4

X-ray image cropping process.

2.3.2 YOLOv3 Model

We use YOLOv3 [13] as the vertebra detection model. YOLOv3 is a one-stage object detection algorithm improved from YOLOv2 [14], in which candidate box generation, feature extraction, target classification, and localization are performed in a single network. Compared with YOLOv2, YOLOv3 uses Darknet-53 as the backbone network for extracting features. The first half of the network is a stack of convolutional layers and pooling layers, and the second half is a stack of residual blocks. The use of residual modules allows more convolutional layers to be added, further increasing the depth and complexity of the network. In addition, to improve the accuracy of small target detection, YOLOv3 uses a feature pyramid network (FPN) for feature fusion after feature extraction, which can capture feature information at different scales. The width, height, category labels and category probabilities of the predicted boxes are obtained using three different YOLO layers for boxes of different scales. YOLOv3 also uses the leaky ReLU activation function, which can alleviate the vanishing gradient problem. As a result, YOLOv3 has a faster detection rate in object detection tasks [15]. Fig. 5 shows the structure diagram of the YOLOv3 model set up in this paper.
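To make the three detection scales concrete, the following sketch computes the prediction grid sizes for the two input resolutions used in this paper. It assumes the standard YOLOv3 output strides of 32, 16 and 8 and three anchor boxes per grid cell; the paper does not state whether these defaults were changed. It also assumes that 512*1280 means a width of 512 and a height of 1280. The function names are illustrative.

```python
# Standard YOLOv3 output strides, coarse to fine (assumed unchanged here).
YOLOV3_STRIDES = (32, 16, 8)
ANCHORS_PER_CELL = 3  # standard YOLOv3 default, assumed


def grid_sizes(width, height, strides=YOLOV3_STRIDES):
    """Return the (columns, rows) of the prediction grid at each scale."""
    for s in strides:
        if width % s or height % s:
            raise ValueError(f"input size must be a multiple of stride {s}")
    return [(width // s, height // s) for s in strides]


def total_predicted_boxes(width, height):
    """Number of candidate boxes produced before non-maximum suppression."""
    return ANCHORS_PER_CELL * sum(c * r for c, r in grid_sizes(width, height))


if __name__ == "__main__":
    print(grid_sizes(512, 512))    # C-arm input
    print(grid_sizes(512, 1280))   # simulated DR input
```

The finest grid (stride 8) is the one that contributes most to small-object detection, which is why the multi-scale design matters for small vertebral targets.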

Fig. 5

Structure diagram of YOLOv3.

2.4 Vertebrae Segment Matching

2.4.1 Data Preprocessing and Augmentation

In the vertebrae segment matching step, we first perform vertebrae detection of C-arm image and DR image using the trained YOLOv3 model for pre-processed Dataset 1 and original Dataset 3, respectively, and then crop the vertebrae region based on the results of vertebrae detection. The cropped vertebrae images are resized to 64*128 pixels. We randomly select part of the vertebrae images in Dataset 1 and all vertebrae images in Dataset 3, totaling 416 images, and perform image augmentation through Gaussian blurring and Gaussian noise. After augmentation, the number of images is tripled, totaling 1248 images, and they are fed into the Mobile-Unet model in the 4:1 ratio of training set and test set.
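The augmentation step can be sketched as follows. Each input image yields three outputs (the original, a blurred copy and a noisy copy), which is one way to obtain the threefold increase from 416 to 1248 images reported above. The 3x3 kernel, the noise standard deviation, the border handling and the function names are assumptions for illustration; the paper does not give these parameters.

```python
import random

# 3x3 binomial approximation of a Gaussian kernel (weights sum to 16).
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def gaussian_blur3(img):
    """Blur with the 3x3 kernel above, replicating pixels at the border."""
    rows, cols = len(img), len(img[0])

    def px(r, c):
        return img[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    return [[sum(KERNEL[dr + 1][dc + 1] * px(r + dr, c + dc)
                 for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 16
             for c in range(cols)] for r in range(rows)]


def add_gaussian_noise(img, sigma=5.0, rng=None):
    """Add zero-mean Gaussian noise and clip to the 8-bit range (sigma assumed)."""
    rng = rng or random.Random()
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]


def augment(images, rng=None):
    """Return original, blurred and noisy versions of every image (3x the input)."""
    out = []
    for img in images:
        out.extend([img, gaussian_blur3(img), add_gaussian_noise(img, rng=rng)])
    return out
```

Passing a seeded `random.Random` instance as `rng` makes the noisy copies reproducible between runs.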

2.4.2 Mobile-Unet Model

We use Mobile-Unet as the vertebral segmentation model. The Mobile-Unet network is based on the encoder-decoder structure of the U-Net [16] network: the encoder is replaced by the MobileNetV2 [17] network structure, while the original decoder structure of U-Net is retained. U-Net was originally developed for biomedical image segmentation. The network consists of a contraction path on the left and an expansion path on the right, which are used for feature extraction and feature learning, respectively. Due to the fixed structure and simple image semantics of medical images, the small downsampling of U-Net results in a large degree of information reduction and enables more detailed image features to be obtained at low resolution. Meanwhile, the decoder of U-Net is able to fuse more low-level features through multiple upsampling steps and skip connections, so that the network can obtain more precise features at the image edges, which effectively addresses the blurred edges and complex gradients of medical images [18, 19].

Since each matching requires segmentation of multiple vertebral images, which makes the matching less efficient than clinical manual matching, we use MobileNetV2 as the backbone network in the encoder part to improve the segmentation speed of the model. MobileNetV2 extends MobileNet [20], which is an efficient lightweight convolutional neural network. MobileNet is characterized by the introduction of a depthwise separable convolution layer. Compared with standard convolution, depthwise separable convolution decomposes the convolution operation into a depthwise convolution and a pointwise convolution, which can effectively reduce computing costs. MobileNetV2 introduces an inverted residual structure and a linear bottleneck. The inverted residual structure reduces the number of feature layer channels corresponding to the convolution kernel, thus reducing the computational effort. The linear bottleneck can embed the input into a low-dimensional subspace without using a nonlinear transformation, reducing the depth of convolution and the resulting loss in accuracy [21].
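The saving from depthwise separable convolution can be checked with a short calculation. For a k x k kernel, the ratio of separable to standard multiply-accumulate cost is 1/c_out + 1/k^2, as derived in the MobileNet paper [20]. The layer sizes in the example are arbitrary illustrations and are not taken from the Mobile-Unet used in this study.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """Multiply-accumulates of a standard k x k convolution on an h x w map."""
    return k * k * c_in * c_out * h * w


def separable_conv_cost(k, c_in, c_out, h, w):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise


def cost_ratio(k, c_in, c_out, h, w):
    """Separable cost divided by standard cost; equals 1/c_out + 1/k**2."""
    return separable_conv_cost(k, c_in, c_out, h, w) / standard_conv_cost(k, c_in, c_out, h, w)


if __name__ == "__main__":
    # Illustrative layer: 3x3 kernel, 32 -> 64 channels, 16 x 16 feature map.
    print(round(cost_ratio(3, 32, 64, 16, 16), 4))
```

With a 3 x 3 kernel the ratio approaches 1/9 as the number of output channels grows, so the separable layer needs roughly eight to nine times fewer operations.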

2.4.3 Angle Correction

Due to the change of the perspective angle during the operation, the inclination degree and direction of the vertebral angle in the C-arm image and the DR image may not match. We adopt an angle correction method for the C-arm vertebral images to reduce the influence of the tilt angle on the vertebrae matching. After obtaining the vertebral segmentation result of the C-arm image, the minimum bounding rectangle of the vertebral segmentation contour is obtained first, and the parameters of the external rectangle including the length and width, the horizontal tilt angle and the center of the rectangle are calculated. Then, the center of the bounding rectangle in the original vertebral image is used as the rotation center, and the vertebral image is rotated against the tilt angle. Finally, the image is re-cropped based on the length and width of the rectangle to obtain an angle-corrected vertebral image, and the images are resized to 64*128 pixels. The angle correction process is shown in Fig. 6.
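The geometry of the angle correction can be sketched without any imaging library. The code below finds the minimum-area bounding rectangle of a contour by testing every convex hull edge direction, reports its tilt, and rotates points back about the rectangle center. The paper does not state which tool was used; OpenCV's `cv2.minAreaRect` offers comparable functionality. The function names, the normalization of the tilt to [-45, 45) degrees, and the use of point lists instead of image arrays are assumptions. In image coordinates the y axis points down, so the visual sense of rotation is mirrored.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_bounding_rect(points):
    """Return (center, (side_a, side_b), tilt_deg) of the minimum-area rectangle."""
    hull = convex_hull(points)
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        # Express hull points in a frame whose first axis follows this edge.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        a, b = max(us) - min(us), max(vs) - min(vs)
        if best is None or a * b < best[0]:
            uc, vc = (max(us) + min(us)) / 2, (max(vs) + min(vs)) / 2
            center = (uc * c - vc * s, uc * s + vc * c)
            best = (a * b, center, (a, b), theta)
    _, center, size, theta = best
    tilt = (math.degrees(theta) + 45.0) % 90.0 - 45.0  # normalized to [-45, 45)
    return center, size, tilt


def rotate_points(points, center, angle_deg):
    """Rotate points about center by angle_deg (counter-clockwise, y axis up)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    cx, cy = center
    return [(cx + (x - cx) * c - (y - cy) * s, cy + (x - cx) * s + (y - cy) * c)
            for x, y in points]
```

Rotating the contour by the negative of the reported tilt, for example `rotate_points(contour, center, -tilt)`, leaves its bounding rectangle axis-aligned, which corresponds to the "rotated against the tilt angle" step before re-cropping.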

Fig. 6

Angle correction process.

2.4.4 Visual Information Fidelity

We use visual information fidelity (VIF) [22] as the similarity evaluation metric for multi-vertebra matching. VIF is a similarity evaluation metric based on natural scene statistics (NSS) that evaluates images from the perspective of information communication and sharing. VIF is computed from the mutual information between the reference image C and the distorted image F, and between C and the image E perceived by the human visual system (HVS), as shown in Eq. (1). In the equation, I(C; F) and I(C; E) denote the mutual information between images C and F and between images C and E, respectively. Image E is obtained by inputting the reference image C into the HVS model and adding visual noise. The HVS model is treated here as a distortion baseline, limiting the amount of information that can be transmitted through it. The HVS uncertainty from all sources is aggregated into an additive noise component, called visual noise, which is modeled as stationary, zero-mean, additive white Gaussian noise in the wavelet domain. VIF first divides the reference image into multiple subbands and further parses each subband into multiple blocks, then calculates the mutual information in each subband and block, and finally integrates the results to calculate the image quality. The value range of VIF is [0, 1], and a higher value indicates better visual quality. A VIF of 0 means that all information about the reference image has been lost, and a VIF of 1 means that the image has no distortion at all. Therefore, for all cases of actual distortion, VIF lies between 0 and 1.

$VIF = \frac{I (C; F)}{I (C; E)}$ (1)
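As a rough illustration of Eq. (1), the sketch below computes a heavily simplified, single-scale, pixel-domain variant of VIF over non-overlapping blocks. It is not the wavelet-domain algorithm of [22] and not necessarily what was used in this study. Within each block the distorted image is modeled as a gain `g` applied to the reference plus residual noise, and the two mutual information terms are computed under a Gaussian assumption. The block size, the visual noise variance `sigma_n_sq` and the function names are assumed values chosen for illustration.

```python
import math


def _block_stats(ref, dist, r0, c0, size):
    """Variances and covariance of one block of the reference and distorted image."""
    xs = [ref[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]
    ys = [dist[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return var_x, var_y, cov


def vif_single_scale(ref, dist, block=4, sigma_n_sq=2.0):
    """Simplified VIF: information kept in `dist` over information in `ref`."""
    eps = 1e-10
    num = den = 0.0
    rows, cols = len(ref), len(ref[0])
    for r0 in range(0, rows - block + 1, block):
        for c0 in range(0, cols - block + 1, block):
            var_x, var_y, cov = _block_stats(ref, dist, r0, c0, block)
            g = cov / (var_x + eps)                 # gain of the distortion channel
            sv_sq = max(var_y - g * cov, 0.0)       # residual (additive) noise variance
            if g < 0:                               # anti-correlated block: no shared information
                g, sv_sq = 0.0, var_y
            num += math.log2(1 + g * g * var_x / (sv_sq + sigma_n_sq))  # I(C; F) term
            den += math.log2(1 + var_x / sigma_n_sq)                    # I(C; E) term
    return num / den if den > 0 else 1.0
```

For identical images the gain is 1 and the residual variance is 0, so the ratio is 1. For a flat distorted image the gain is 0 and the ratio is 0, matching the two limiting cases described above.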

2.4.5 Multi-vertebra Matching

We adopt a multi-vertebra matching method to perform vertebral segment matching, so as to make the results more reliable. We first sort the vertebral regions detected in the DR image from top to bottom according to their position. The last five vertebral regions are labeled L1 to L5 from top to bottom, and the vertebral regions above L1 are labeled upward in descending order starting from T12. At the same time, the vertebrae in the DR image are grouped sequentially from top to bottom based on the number n of vertebrae detected in the C-arm image. Assuming that the number of vertebrae detected in the DR image is m, the vertebral regions are divided into m-n+1 groups, each containing n consecutive vertebral regions. Then, the similarity set S = { p_1, ..., p_i }, i = 1, ..., m-n+1, is obtained by using the VIF metric to compare the vertebral regions of the C-arm image with the corresponding vertebral regions of each group from top to bottom, where p_i is the VIF value of the vertebral pair with the smallest similarity in group i. If p_k is the maximum value in S, then group k is the optimal matching result. Finally, the vertebral regions in the C-arm image are labeled based on the vertebral annotations in group k to obtain the multi-vertebra matching result. The multi-vertebra matching process is shown in Fig. 7. The result map (red) after image matching is shown in Fig. 8.
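The matching rule described above is a sliding window over the DR vertebrae with a max-min criterion, and can be sketched as follows. The `similarity` argument stands in for the VIF computation on two cropped vertebral images; the function names and the handling of fewer than five detected DR vertebrae are assumptions that the paper does not cover.

```python
def label_vertebrae(m):
    """Labels for m DR vertebrae, top to bottom, with the last five being L1 to L5."""
    if not 1 <= m <= 17:
        raise ValueError("expected between 1 and 17 thoracic/lumbar vertebrae")
    lumbar = [f"L{i}" for i in range(1, 6)]
    if m <= 5:  # assumption: fewer than five detections are the lowest lumbar levels
        return lumbar[5 - m:]
    thoracic = [f"T{12 - i}" for i in range(m - 5)]  # T12, T11, ... counted upward from L1
    return thoracic[::-1] + lumbar


def match_segments(carm_regions, dr_regions, similarity):
    """Return (labels for the C-arm vertebrae, the similarity set S)."""
    n, m = len(carm_regions), len(dr_regions)
    if n == 0 or n > m:
        raise ValueError("need 1 <= n <= m vertebrae")
    labels = label_vertebrae(m)
    scores = []
    for start in range(m - n + 1):                      # m - n + 1 candidate groups
        group = dr_regions[start:start + n]
        # p_i: similarity of the worst-matching vertebra pair in this group
        scores.append(min(similarity(c, d) for c, d in zip(carm_regions, group)))
    k = max(range(len(scores)), key=scores.__getitem__)  # group with the largest p_i
    return labels[k:k + n], scores
```

Taking the minimum within each group means that a single poorly matching vertebra is enough to reject a candidate alignment, which is what makes the multi-vertebra strategy stricter than matching each vertebra independently.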

Fig. 7

Multi-vertebra matching process.

Fig. 8

Result map (red) after image matching.

3 Results

3.1 Experimental Setup

We perform all experiments on a computer equipped with an Intel i7-10700 CPU and an RTX-2070 Super graphics processor. We use the TensorFlow 1.14 deep learning framework to implement the YOLOv3 model and the Mobile-Unet model in our experiments. For the YOLOv3 model, we set the batch size to 4 and the learning rate to 0.001 in the freeze stage, and set the batch size to 2 and the learning rate to 0.0001 in the unfreeze stage. For the Mobile-Unet model, we set the batch size to 4 and the learning rate to 0.001. All models are trained using Adam as the optimizer, and categorical cross-entropy is used as the loss function.

3.2 Metrics

In the vertebra detection section, we use recall, F1-score and mean average precision (mAP) as evaluation criteria, based on the counts of true positives (TP, vertebra detected and matches the ground truth), false positives (FP, vertebra detected but does not match the ground truth) and false negatives (FN, vertebra present in the ground truth but not detected). Recall indicates the proportion of ground-truth vertebrae that are correctly detected, which reflects the degree of under-detection. The F1-score combines true positives, false positives and false negatives into a single value (the harmonic mean of precision and recall), reflecting the overall agreement between the predicted results and the ground truth. The mAP first calculates the average precision (AP) of each class (vertebrae and background), and then averages the AP over the number of classes N.

$Recall = \frac{TP}{TP + FN}$ (2)

$F1\text{-}Score = \frac{2 \times TP}{2 \times TP + FP + FN}$ (3)

$mAP = \frac{1}{N} \sum_{i = 1}^{N} {AP}_{i}$ (4)

In the vertebral segment matching section, we use accuracy as the evaluation criterion. A result is counted as True (T) when the predicted label in the C-arm image is consistent with the corresponding vertebral serial number in the DR image, and as False (F) otherwise.

$Accuracy = \frac{T}{T + F}$ (5)
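Eqs. (2) to (5) translate directly into code. The sketch below is a minimal illustration with hypothetical function names; the counts in the usage example are made up and are not the study's data.

```python
def recall(tp, fn):
    """Eq. (2): fraction of ground-truth vertebrae that were detected."""
    return tp / (tp + fn)


def f1_score(tp, fp, fn):
    """Eq. (3): harmonic mean of precision and recall, written in terms of counts."""
    return 2 * tp / (2 * tp + fp + fn)


def mean_average_precision(ap_per_class):
    """Eq. (4): mean of the per-class average precision values."""
    return sum(ap_per_class) / len(ap_per_class)


def accuracy(n_true, n_false):
    """Eq. (5): fraction of correctly matched results."""
    return n_true / (n_true + n_false)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(recall(tp=90, fn=10))
    print(f1_score(tp=90, fp=5, fn=10))
    print(accuracy(n_true=3, n_false=1))
```

Writing the F1-score in terms of counts avoids a division by zero in the precision term when no detections are made, as long as at least one ground-truth object exists.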

3.3 Vertebrae Detection Result

In the vertebrae detection step, we measure the model performance with Dataset 4 as the validation set. The performance results are shown in Table 2, and the confusion matrix of vertebrae detection is shown in Fig. 9. The YOLOv3 model has better detection performance on the DR images than on the C-arm X-ray images. On the C-arm images, the YOLOv3 model obtained an mAP of 0.87, a recall of 0.92, and an F1-score of 0.94. On the DR images, the YOLOv3 model obtained an mAP of 0.96, a recall of 0.97, and an F1-score of 0.95. Overall, the results show that VDVM performs well in the vertebrae detection task.

Fig. 9

Confusion matrix of vertebrae detection.

Table 2

Vertebrae detection result

Image type	mAP	Recall	F1-Score
C-arm	0.87	0.92	0.94
DR	0.96	0.97	0.95

3.4 Vertebrae Matching Result

In the vertebrae segment matching step, we set up a comparison experiment to examine the effect of angle correction of vertebrae on the matching accuracy. We performed similarity matching using VIF metrics for Dataset 4 without angle correction and with angle correction, as shown in Table 3, and we obtained matching accuracies of 0.65 and 0.73, respectively. The results indicate that the correction of vertebrae angle can noticeably improve the accuracy of vertebral segment matching, as well as that VDVM has adequate reliability in vertebrae segment matching. The final experimental rendering is shown in Fig. 10.

Table 3

Vertebrae matching result

	Using angle correction	Without angle correction
Accuracy	0.73	0.65

Fig. 10

Experimental rendering.

4 Discussions

Comparing DR images and C-arm X-ray images to determine the surgical location is a necessary step in spine surgery. Reliable C-arm X-ray image identification is of great significance for improving surgical safety and efficiency. In this research, we proposed a VDVM framework for C-arm image recognition based on deep learning methods, and examined the effectiveness of the framework on a dataset of C-arm X-ray images and DR images from the same patients. In the vertebra detection task, we used the YOLOv3 model to detect the vertebrae in the images. Since the C-arm X-ray images used for validation were captured with mobile phones, the images have serious noise interference and blur. We performed image augmentation on the training set by adding Gaussian noise and Gaussian blur to make the model more robust. At the same time, the image quality was improved through the preprocessing method, and an mAP of 0.87 was obtained with the YOLOv3 model. Due to the lack of DR images, we cropped the full-length X-ray images and extracted the image area from the thoracic to the lumbar region as a simulated DR training set, and obtained an mAP of 0.96, which shows that the simulated training set can be used effectively for the DR image detection task. In the vertebral segment matching task, we used the VIF metric to match the vertebrae. We segmented individual vertebral images with Mobile-Unet, and performed angle correction based on the segmentation result to reduce the effect of vertebral angle inconsistency between DR images and C-arm X-ray images. The comparison with the matching results obtained without angle correction demonstrates that angle correction improves the accuracy of image matching, thereby improving the identification performance of the VDVM framework.

In clinical surgery, the VDVM framework can identify C-arm X-ray images based on preoperative DR images and intraoperative C-arm X-ray images, thereby serving as an auxiliary means for clinicians to determine the area located by the C-arm X-ray machine. Compared with previous studies, our study is the first to apply deep learning to the identification of C-arm X-ray images with DR images, which expands the application range of deep learning in C-arm X-ray images. The designed VDVM vertebra identification framework includes two parts, vertebra detection and vertebra matching, and the feasibility of the framework is confirmed through the analysis of the experimental results.

There is room for further improvement in the vertebral segment matching performance of VDVM. The C-arm images matched with the DR images were taken by mobile devices, resulting in variable image quality, and were affected by factors including illumination, shooting angle, and noise. At the same time, because the intensity of these factors varies from image to image, excessive and uniform preprocessing would blur the images and degrade the performance of the method. Therefore, we only performed median filtering in the preprocessing stage, and did not perform further processing for specific noise (ripple noise, Gaussian noise, etc.), illumination, or shooting angle. In addition, due to the influence of the perspective angle and intraoperative landmarks in the C-arm image itself, the vertebral shape in some images is very different from that in the DR image, and effective matching information cannot be obtained from a limited range of vertebrae. Finally, this experiment used only the front view and not the lateral view, so some vertebral information has not been effectively used.

5 Conclusions

In this study, we propose a new C-arm X-ray image identification framework VDVM, which includes two steps of vertebral detection and vertebral segment matching. The experimental results show that the framework can provide an accurate identification result. In the vertebral segment matching task, there is still room for further improvement in the accuracy of vertebral segment matching. In future work, we will further research the vertebral segment matching task in the direction of metric improvement, further improvement of image quality, and combined evaluation using front and lateral views.

References

1. Tjardes, Shafizadeh, Rixen, Paffrath, Bouillon, Steinhausen E.S., et al., Image-guided spine surgery: State of the art and future directions, European Spine Journal 19(1) (2010), 25–45.

2. Holly L.T. and Foley K.T., Intraoperative spinal navigation, Spine 28(15S) (2003), S54–61.

3. Wang, Traub, Heining S.M., Benhimane, Euler, Graumann, et al., Long bone X-ray image stitching using camera augmented mobile C-arm. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2008. pp. 578–86.

4. Yadav, Kulkarni and Patole, Image quality assessment, denoising and comparative analysis using filters for C-arm X-ray images. In: TENCON 2019-2019 IEEE Region 10 Conference (TENCON). IEEE; 2019. pp. 2143–2147.

5. Shen, Wu and Suk H.I., Deep learning in medical image analysis, Annual Review of Biomedical Engineering 19 (2017), 221–248.

6. Li, Liang, Zhang, An and Tan, Automatic lumbar vertebrae detection based on feature fusion deep learning for partial occluded C-arm X-ray images. In: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE; 2016. pp. 647–50.

7. Esfandiari, Newell, Anglin, Street and Hodgson A.J., A deep learning framework for segmentation and pose estimation of pedicle screw implants based on C-arm fluoroscopy, International Journal of Computer Assisted Radiology and Surgery 13(8) (2018), 1269–1282.

8. Kausch, Thomas, Kunze, Privalov, Vetter, Franke, et al., Toward automatic C-arm positioning for standard projections in orthopedic surgery, International Journal of Computer Assisted Radiology and Surgery 15(7) (2020), 1095–1105.

9. Esteban, Grimm, Unberath, Zahnd and Navab, Towards fully automatic X-ray to CT registration. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2019. pp. 631–9.

10. Oulbacha and Kadoury, MRI to C-arm spine registration through Pseudo-3D CycleGANs with differentiable histograms, Medical Physics 47(12) (2020), 6319–6333.

11. Naik R.R., Anitha, Bhat S.N., Ampar and Kundangar, Realistic C-arm to pCT registration for vertebral localization in spine surgery: A hybrid 3D-2D registration framework for intraoperative vertebral pose estimation, Medical & Biological Engineering & Computing 60(8) (2022), 2271–2289.

12. […], Von Recum, Wentzensen and Grützner, Iso-C(3D)-assisted navigated implantation of pedicle screws in thoracic lumbar vertebrae, Der Unfallchirurg 106(11) (2003), 907–913.

13. Redmon and Farhadi, YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767. 2018.

14. Redmon and Farhadi, YOLO9000: Better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. pp. 7263–7271.

15. Park J.H., Hwang H.W., Moon J.H., Yu, Kim, Her S.B., et al., Automated identification of cephalometric landmarks: Part 1—Comparisons between the latest deep-learning methods YOLOv3 and SSD, The Angle Orthodontist 89(6) (2019), 903–909.

16. Ronneberger, Fischer and Brox, U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2015. pp. 234–41.

17. Sandler, Howard, Zhu, Zhmoginov and Chen L.C., MobileNetV2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. pp. 4510–4520.

18. Siddique, Paheding, Elkin C.P. and Devabhaktuni, U-Net and its variants for medical image segmentation: A review of theory and applications, IEEE Access 9 (2021), 82031–82057.

19. […], Cao, Liang, Chen and Zhan, Medical image segmentation based on U-Net: A review, Journal of Imaging Science and Technology 64 (2020), 1–12.

20. Howard A.G., Zhu, Chen, Kalenichenko, Wang, Weyand, et al., MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861. 2017.

21. Zhou, Chen, Wang and Huan, Review of research on lightweight convolutional neural networks. In: 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC). IEEE; 2020. pp. 1713–1720.

22. Sheikh H.R. and Bovik A.C., Image information and visual quality, IEEE Transactions on Image Processing 15(2) (2006), 430–444.
