Abstract
BACKGROUND:
Digital X-ray imaging is essential for diagnosing osteoporosis, but distinguishing affected patients from healthy individuals using these images remains challenging.
OBJECTIVE:
This study introduces a novel method using deep learning to improve osteoporosis diagnosis from bone X-ray images.
METHODS:
A dataset of bone X-ray images was analyzed using a newly proposed procedure. This procedure involves segregating the images into regions of interest (ROI) and non-ROI, thereby reducing data redundancy. The images were then processed to enhance both spatial and statistical features. For classification, a Support Vector Machine (SVM) classifier was employed to distinguish between osteoporotic and non-osteoporotic cases.
RESULTS:
The proposed method demonstrated a promising Area under the Curve (AUC) of 90.8% in diagnosing osteoporosis, benchmarking favorably against existing techniques. This signifies a high level of accuracy in distinguishing osteoporosis patients from healthy controls.
CONCLUSIONS:
The proposed method effectively distinguishes between osteoporotic and non-osteoporotic cases using bone X-ray images. By enhancing image features and employing SVM classification, the technique offers a promising tool for efficient and accurate osteoporosis diagnosis.
Introduction
Osteoporosis is a skeletal condition characterised by decreased bone mass and the breakdown of bone structure, which weakens the bones and increases their susceptibility to fracture [1]. It is estimated that osteoporosis will affect more than 200 million women worldwide, and that a bone fracture occurs somewhere in the world every three seconds as a direct consequence of the disease [2].
In clinical practice, the traditional approach to diagnosing osteoporosis consists of measuring the bone mineral density (BMD) and comparing the result with that of a healthy young adult. The outcome of this test is expressed as a T-score. Normal bone density is defined as a T-score of −1 or higher. A T-score between −1 and −2.5 indicates that osteopenia, a form of bone loss that is less severe than osteoporosis, may be present. A T-score of −2.5 or lower suggests that osteoporosis may be present. However, BMD has certain shortcomings in forecasting the probability of fracture [3].
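The T-score categories can be written as a small lookup. The sketch below assumes the standard cut-offs of −1 and −2.5; the function name is illustrative and does not come from the original work.

```python
def classify_t_score(t_score: float) -> str:
    """Map a BMD T-score to a diagnostic category (assumed standard cut-offs)."""
    if t_score >= -1.0:
        return "normal"          # -1 or higher
    if t_score > -2.5:
        return "osteopenia"      # between -1 and -2.5
    return "osteoporosis"        # -2.5 or lower


for score in (0.3, -1.8, -3.1):
    print(score, classify_t_score(score))
```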
Several methodologies employ texture analysis to extract characteristics from bone X-ray images to improve diagnostic accuracy and automate diagnosis. In one of these approaches, the roughness of bone X-ray images is modelled using fractal analysis [4–9]. Other methods employ transform techniques [10, 11]. A recent work [12] captured the statistical behaviour of phase coefficients by combining wavelet decomposition with parametric circular models. These approaches are all based on low-level features. In contrast, our deep architecture is capable of learning high-level features that can be used during the classification stage to differentiate between osteoporotic and non-osteoporotic cases (Yes/No).
The assisted osteoporosis diagnostic technique proposed in this article is composed of two components: an autoencoder-based feature extractor and an SVM classifier. The two operations are carried out sequentially to assign an image to its semantic category (Yes or No) based on the pixels in the image. In addition, because the autoencoder provides an unsupervised data representation, our method can be categorised as a semi-supervised approach. Unlabelled samples, which are easier and less expensive to obtain than labelled samples, are used when training the autoencoder.
By identifying those at high risk of developing osteoporosis, this technology could pave the way for early intervention and preventative measures. Because X-ray imaging is non-invasive and widely available, it is well suited to large-scale screening. In addition, lossless compression is computationally inexpensive and could be cheap enough for widespread use. X-rays provide some insight into osteoporosis risk, but the main information they offer concerns bone density and structure; hormonal fluctuations, dietary habits, and genetics are among the other important factors. The algorithm's accuracy depends strongly on the training data, and the risk estimate depends on the attributes that are specified. The proposed lossless compression analysis, when combined with additional risk indicators, may provide a more complete risk score, which could lead to better prediction accuracy.
Literature review
Recent advancements in medical technology use more efficient sensors, producing images of higher quality using a variety of approaches [13, 14]. X-ray imaging is one of the most widely used imaging modalities and is employed in a variety of medical settings, including the diagnosis of bone fractures and the treatment of degradation, infections, and tumours [15, 16]. Because of the demand for X-ray imaging, the produced images require a large amount of storage space. To reduce the required storage space, the file sizes of these images must be reduced, which requires investigating the characteristics of the images. One such characteristic is redundancy, which may be statistical or psycho-visual [17]. Psycho-visual redundancy arises because the human visual system is insensitive to certain variations in the intensity of an image.
Lossy and near-lossless compression methods are the most common approaches. They incur some data loss, relying on close approximations of the content to achieve large compression ratios [18]. Lossless techniques also have the potential to produce significant compression ratios [19].
However, the amount of redundancy that can be removed without loss is bounded. Lossless compression preserves full image quality but yields a comparatively low compression ratio. Medical images and their accompanying information are highly significant, and any change to the region of interest (ROI) of these images may produce erroneous results.
Compression without information loss is essential for diagnosis in several medical applications [20]. Segmenting the ROI and compressing it losslessly might provide a solution to this issue: a good compression ratio could be achieved while maintaining the integrity of crucial details. However, extracting ROIs from bone X-ray images is complex. Numerous obstacles, such as the lack of homogeneous intensity in bone and tissue regions and the poor contrast of these images, must be overcome. Standard segmentation approaches, such as the watershed, Otsu [21], and region-growing [22] approaches, perform poorly in ROI extraction because of these issues. These approaches can erroneously identify certain elements of the ROI as not being part of the ROI, excluding bone portions from the ROIs as a result.
The results of these approaches do not reflect precise borders. If the most extensive border of the identified edges is used in place of the true limit of the ROI, some tissue and bone are lost. Numerous lossy, near-lossless, and lossless compression algorithms have been developed for compressing CT and MRI medical images [23–25]. Additionally, two high-performing approaches were presented for extracting ROIs from CT images and angiograms in [20] and [25], respectively. However, a technique for accurate ROI extraction from bone X-ray images has not been provided.
In this article, a novel method is presented for extracting ROIs from bone X-ray images. The technique relies on analysing how the intensity of the background is distributed. The intensity of the boundary separating ROI areas from non-ROI areas is determined. The pixels that compose the background area are then given the same value, increasing the statistical and spatial redundancy. Lossless compression is applied to the resulting image, which retains all necessary medical information. To accomplish this, an appropriate prediction strategy is used. Both the ROI pixels and the binary mask that specifies which pixels belong to the ROI are compressed. The proposed approach is compared with industry-standard, high-performance lossless compression techniques. The experimental results indicate that the proposed strategy is both effective and efficient.
Proposed methodology
The proposed approach starts by extracting characteristics from X-ray images that are linked to bone density and texture. Because image qualities differ between modalities, these features may not be directly applicable to CT or MRI scans. The proposed method is likely to require training on a large dataset of X-ray images with corresponding osteoporosis diagnoses. Extending it to other modalities would require comparable datasets of magnetic resonance imaging (MRI) and computed tomography (CT) scans.
Although lossless compression shows promise for X-ray osteoporosis detection, few studies have investigated its potential in other imaging modalities. Several investigations have used characteristics acquired from CT scans to diagnose osteoporosis. However, these approaches often necessitate more complex image processing methods and thus may not be readily applied in clinical environments. Research on the use of MRI to detect osteoporosis is scarce; nevertheless, small studies using parameters related to bone marrow composition and signal intensity have shown promising results.
In this paper, a neural network is trained using an autoencoder, which learns abstract features by minimising the difference between the input x and the output y representations of the data. The autoencoder reduces this error as it discovers the relationship between the two. Because the autoencoder learns from unlabelled data, it performs unsupervised learning.
Autoencoders are neural networks that learn to recreate their own input data. The input is first encoded into a representation with fewer dimensions, and this lower-dimensional representation is then decoded to recover the original data. The autoencoder is trained to minimise the discrepancy between the input it receives and the reconstruction it produces. To learn abstract features, it must discover an effective and informative way of representing the input: the lower-dimensional representation must retain the most essential aspects of the input data while being as compact as possible. This is achieved by repeatedly adjusting the weights and biases to reduce the reconstruction error, so that the autoencoder progressively learns to identify the most relevant aspects of the input and to represent them efficiently. Reconstruction accuracy is evaluated by how much the output image differs from the input image. A negligible difference indicates that the autoencoder has learned an effective and informative representation of the input data, whereas a significant difference indicates that it has not learned to represent the input accurately, for example because the data were too complex. The abstract features learned in this way are those most essential for recreating the original data; they are often nonlinear and hierarchical.
In this paper, the bone X-ray image is divided into two regions: the region of interest (ROI) and the background. The ROI contains the bone tissue, whereas the background encompasses the surrounding tissue and artefacts. A lossless compression method is then applied to the ROI, which preserves the patient's medical information contained in the ROI. A classifier that can detect osteoporosis is then trained using the compressed ROI. The classifier is trained on a collection of bone X-ray images whose osteoporosis status is already known and is afterwards used to predict the osteoporosis status of unlabelled bone X-ray images.
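Let us assume that $x \in \mathbb{R}^{j}$ is the input vector and that the weights of the autoencoder are represented as $W \in \mathbb{R}^{i \times j}$, with the hidden layer $H \in \mathbb{R}^{i}$ such that

$$H = f(Wx + b),$$

where $b$ is the bias. Here, $f(\cdot)$ is the sigmoid function, which is represented as

$$f(z) = \frac{1}{1 + e^{-z}}.$$

For autoencoder training, the cost function is established such that the discrepancy between the input $x$ and its reconstruction is minimised.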
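To make the encoding step concrete, the following is a minimal sketch of one forward pass through a single-hidden-layer autoencoder with sigmoid activations. The mean squared reconstruction error, the separate decoder weights, the toy layer sizes, and all names are assumptions for illustration, not the configuration used in the study.

```python
import math
import random


def sigmoid(z):
    """Logistic function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def affine_sigmoid(weights, bias, vec):
    """Compute f(W v + bias) row by row."""
    return [sigmoid(sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def reconstruction_error(x, x_hat):
    """Mean squared difference between input and reconstruction (assumed cost)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


rng = random.Random(0)
n_in, n_hidden = 8, 3  # toy sizes; a flattened 32x32 patch would give n_in = 1024
W_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b_enc = [0.0] * n_hidden
W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
b_dec = [0.0] * n_in

x = [rng.random() for _ in range(n_in)]      # input vector
H = affine_sigmoid(W_enc, b_enc, x)          # hidden code, H = f(Wx + bias)
x_hat = affine_sigmoid(W_dec, b_dec, H)      # reconstruction of x
err = reconstruction_error(x, x_hat)         # quantity that training would reduce
```

Training would adjust the weights and biases to reduce `err` over many patches; the hidden vector `H` is what later stages would treat as the patch feature.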
In this paper, the results of applying adaptive histogram equalisation and then altering the contrast of the pictures to enhance previously unseen details and characteristics of interest are analysed. The adaptive histogram equalisation technique is conducted on small, discrete parts of a picture (windows), as opposed to the entire picture. Each window’s contrast is increased to make the resultant histogram more closely follow a normal distribution [31].
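The window-wise idea can be illustrated with a deliberately simplified sketch. It equalises each non-overlapping window through its own cumulative histogram, which targets a flat histogram; it does not interpolate between windows and does not reproduce the target distribution described in [31]. The function names and window size are assumptions.

```python
def equalise_tile(tile, levels=256):
    """Histogram-equalise one window of integer grey levels via its CDF."""
    flat = [p for row in tile for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant window: nothing to stretch
        return [row[:] for row in tile]
    scale = (levels - 1) / (n - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in tile]


def adaptive_equalise(image, window=8, levels=256):
    """Apply equalise_tile independently to each non-overlapping window."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, window):
        for left in range(0, width, window):
            tile = [row[left:left + window] for row in image[top:top + window]]
            for dy, eq_row in enumerate(equalise_tile(tile, levels)):
                out[top + dy][left:left + len(eq_row)] = eq_row
    return out


low_contrast = [
    [100, 101, 100, 101],
    [102, 103, 102, 103],
    [50, 50, 60, 61],
    [50, 50, 62, 63],
]
enhanced = adaptive_equalise(low_contrast, window=2)
```

Each 2×2 window with four distinct grey levels is stretched across the full range, while the constant window is left unchanged.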
The proposed method primarily focuses on bone density and texture, neglecting other crucial factors like patient age, medical history, lifestyle, and genetic predisposition. These factors significantly influence treatment decisions. The method provides a snapshot of bone health at a specific moment, not accounting for potential changes over time. Monitoring bone health dynamics is crucial for individualizing treatment. Osteoporosis treatment also involves various factors like medication, exercise, and dietary modifications. A single algorithm might not be able to capture the intricate interactions between these elements.
Compression
X-ray images require noise reduction and smoothing for efficient segmentation. Therefore, a noise-reduction preprocessing step is required prior to the classification stage.
Blurring the image with a Gaussian filter alone makes the edges less distinct. Guided image filtering, developed in [32], is a relatively recent alternative that computes the output image using the contents of a guidance image.
The method is efficient and reliable, and it keeps gradients and edges intact.
The next stage relies heavily on this stage being completed successfully: without smoothing, the large variations in the background area of the noisy image would cause the following phase to detect incorrect border intensities.
Boundary detection
Determining the intensity of the border pixels that separate the dark background of an image does not suffer from problems such as intensity overlaps between tissue and bone segments. The intensities of the background pixels are lower than those of the other pixels, and in most cases bone intensities are greater than tissue intensities. Some intensity overlap between tissue and bone pixels may be present, but this overlap does not occur in the background. As a result, the background intensity varies little; in other words, the intensity levels of pixels inside the same segment are approximately the same, so these pixels form dense regions of the histogram. This suggests that negative concavities of the histogram can serve as a quick and easy criterion for dividing the image into sections. The backgrounds of bone X-ray images are often quite broad and consistently dark, whereas the ROI, which comprises tissue and bone, is nonuniform and brighter than the background. Negative concavity is therefore used to distinguish the background from the ROI: the first negative concavity of the smoothed image histogram is used to locate the boundary of the background. Using this intensity value as a threshold, a binary mask with the same dimensions as the original image is created. Pixels whose intensities fall below the threshold are set to 0 in the mask, and every pixel above the threshold is set to 1, so that the background of the X-ray image is represented by black pixels and the ROI by white pixels. The background intensity values in the non-ROI parts of the image are then set to zero, while the ROI pixels are left unchanged so that none of their information is lost. Background pixels in bone X-ray images are normalised to a single value because they provide no meaningful information.
Importantly, the human visual system is often unable to detect these alterations unless they affect defining features. Because all background pixels are converted to the same intensity level, the statistical redundancy is increased. This additional statistical redundancy is exploited in the compression phase of the method to gain efficiency.
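One possible reading of the boundary-detection step is sketched below. It smooths the histogram with a moving average, finds the first bin where the discrete second difference is negative (taken here as the background peak), and walks past that peak to the following valley, which is used as the threshold. This interpretation of "first negative concavity", the smoothing radius, and all function names are assumptions.

```python
def smooth(hist, radius=2):
    """Moving-average smoothing of a histogram."""
    out = []
    for i in range(len(hist)):
        lo, hi = max(0, i - radius), min(len(hist), i + radius + 1)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out


def background_threshold(image, levels=256, radius=2):
    """Return the grey level at the valley that follows the first histogram peak."""
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1
    s = smooth(hist, radius)
    # Discrete second difference; a negative value marks a concave-down bin.
    second = [s[i - 1] - 2 * s[i] + s[i + 1] for i in range(1, levels - 1)]
    t = next((i + 1 for i, d in enumerate(second) if d < 0), 0)
    while t + 1 < levels and s[t + 1] >= s[t]:  # climb to the top of the peak
        t += 1
    while t + 1 < levels and s[t + 1] < s[t]:   # descend to the valley
        t += 1
    return t


def roi_mask(image, threshold):
    """Binary mask: 1 for ROI pixels above the threshold, 0 for background."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


def normalise_background(image, mask):
    """Set every background pixel to zero and leave ROI pixels unchanged."""
    return [[p if m else 0 for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


image = ([[5] * 10 for _ in range(6)] + [[4] * 10 for _ in range(2)]
         + [[6] * 10 for _ in range(2)] + [[150] * 10, [151] * 10, [152] * 10])
t = background_threshold(image)
mask = roi_mask(image, t)
clean = normalise_background(image, mask)
```

On this toy image the dark rows fall below the threshold and are zeroed, while the bright rows pass through untouched.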
A run length encoding (RLE) [33] is utilised to compress the binary mask, each element of which can take only the values 0 or 1. RLE first serialises the picture and then counts how many consecutively similar values it finds before encountering a new value. Since our ROI mask contains several consecutive zeros and ones, RLE works well in this scenario.
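A minimal sketch of run-length encoding for a binary mask follows. It serialises the mask row by row and stores the first bit plus the run lengths; this storage layout and the function names are assumptions rather than the exact format used in the study.

```python
def rle_encode(mask):
    """Serialise a 2-D binary mask row by row and run-length encode it."""
    flat = [bit for row in mask for bit in row]
    if not flat:
        return None, []
    runs, current, length = [], flat[0], 0
    for bit in flat:
        if bit == current:
            length += 1
        else:
            runs.append(length)          # close the finished run
            current, length = bit, 1
    runs.append(length)
    return flat[0], runs


def rle_decode(first_bit, runs, width):
    """Rebuild the mask from its first bit, the run lengths, and the row width."""
    flat, bit = [], first_bit
    for length in runs:
        flat.extend([bit] * length)
        bit = 1 - bit                    # runs alternate between 0 and 1
    return [flat[i:i + width] for i in range(0, len(flat), width)]


mask = [[0, 0, 0, 1],
        [1, 1, 0, 0]]
first_bit, runs = rle_encode(mask)
restored = rle_decode(first_bit, runs, width=4)
```

Because the values alternate, only the starting bit and the run lengths need to be stored.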
Because of spatial redundancy, the pixels inside an ROI can benefit from lossless compression. Spatial redundancy arises because neighbouring pixels within a region have similar intensity levels. Prediction methods convert this spatial redundancy into statistical redundancy. ALCM estimates the value of the current pixel as a linear combination of the values of neighbouring pixels [34, 35]. The weights of the linear combination are computed dynamically during the encoding process; initially, equal weight is given to each of the neighbouring pixels [36].
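The general idea of adaptive linear prediction can be sketched as follows. The code predicts each pixel from three causal neighbours with weights that start equal and are adjusted after every pixel. The normalised LMS update, the choice of neighbours, and the learning rate are assumptions for illustration; this is not the published ALCM update rule.

```python
def _neighbours(img, r, c):
    """Causal neighbours (left, top, top-left); 0 outside the image."""
    left = img[r][c - 1] if c > 0 else 0
    top = img[r - 1][c] if r > 0 else 0
    top_left = img[r - 1][c - 1] if r > 0 and c > 0 else 0
    return [left, top, top_left]


def _predict(weights, nbrs):
    """Rounded linear combination of the neighbouring pixel values."""
    return round(sum(w * n for w, n in zip(weights, nbrs)))


def _update(weights, nbrs, error, rate=0.05):
    """Normalised LMS step (an assumed rule, not the ALCM one)."""
    energy = sum(n * n for n in nbrs) + 1e-9
    return [w + rate * error * n / energy for w, n in zip(weights, nbrs)]


def encode(img):
    """Return integer prediction residuals in raster order."""
    weights = [1 / 3] * 3                # equal initial weights
    residuals = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            nbrs = _neighbours(img, r, c)
            err = img[r][c] - _predict(weights, nbrs)
            row.append(err)
            weights = _update(weights, nbrs, err)
        residuals.append(row)
    return residuals


def decode(residuals):
    """Mirror the encoder to rebuild the image exactly from the residuals."""
    weights = [1 / 3] * 3
    img = [[0] * len(residuals[0]) for _ in residuals]
    for r in range(len(residuals)):
        for c in range(len(residuals[0])):
            nbrs = _neighbours(img, r, c)
            err = residuals[r][c]
            img[r][c] = _predict(weights, nbrs) + err
            weights = _update(weights, nbrs, err)
    return img


img = [[0, 0, 0, 0],
       [0, 100, 102, 104],
       [0, 101, 103, 105],
       [0, 0, 0, 0]]
residuals = encode(img)
```

Because the decoder repeats the same predictions and weight updates, the scheme is lossless, and an all-zero background yields all-zero residuals.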
Unsurprisingly, the prediction errors in the non-ROI area are zero, so the number of negligible prediction errors increases considerably. The ROI mask is recorded first. The decoder can reconstruct the mask if it knows the arithmetic code stream, its length, and the bit that indicates where the mask begins. For reconstruction purposes, the decoder also receives an arithmetic code stream that contains the prediction errors of the ROI pixels.
Feature extraction and image subdivision
The method’s capacity to extract robust and meaningful features from a variety of X-ray photos is critically necessary for successful osteoporosis detection. The development of features that are less affected by changes to the image necessitates further research. Datasets used in the proposed study can include a wide variety of X-ray image types, increasing the likelihood that the model will be applicable to real-world clinical scenarios. More extensive and diverse datasets are required for future evaluations.
Deep learning approaches, which automatically learn intermediate and high-level abstractions from raw data (such as photos), have been adopted and used effectively in several computer vision, audio processing, and language comprehension applications. Although these techniques perform best on large datasets with low-resolution pictures, medical field datasets do not necessarily meet these standards. A sliding window procedure is used to mitigate these problems. Each picture is broken up into image patches, each of which is 32 pixels square. The autoencoder is trained with the image patches to extract the output feature vector.
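The patch extraction step can be sketched as below. The non-overlapping stride and the 256×256 image size in the usage example are assumptions, inferred from the 64 patches per image reported later in the paper; the function name is illustrative.

```python
def extract_patches(image, size=32, stride=32):
    """Slide a size x size window over a 2-D image and return flattened patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patch = [image[top + dy][left + dx]
                     for dy in range(size) for dx in range(size)]
            patches.append(patch)
    return patches


# Synthetic 256x256 image used only to exercise the function.
image = [[(r + c) % 256 for c in range(256)] for r in range(256)]
patches = extract_patches(image)
```

Each flattened patch has 1024 entries, which is the form an autoencoder with a column-vector input layer would consume.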
Grouping and summarising
Pooling is employed to build a single vectorial representation for each image using a small number of local features. This output vector acts as a signature of the image and provides a uniform representation. The bone X-rays are preprocessed before image classification. It is assumed that the number of patches in each image is k and that the pooling procedure consists only of adding together the vectors $y_p$, where p = 1, …, k.
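Sum pooling computes the image signature as $Y = \sum_{p=1}^{k} y_p$. A minimal sketch, with an illustrative function name and toy feature vectors:

```python
def sum_pool(patch_features):
    """Add the k patch feature vectors element-wise into one image signature."""
    signature = [0.0] * len(patch_features[0])
    for vec in patch_features:
        for i, value in enumerate(vec):
            signature[i] += value
    return signature


# Three patches with two features each (toy values).
signature = sum_pool([[1, 2], [3, 4], [5, 6]])
```

The signature has the same length as a single patch feature vector regardless of how many patches the image contains.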
Classification is the last stage in our model and involves assigning a class label, either osteoporosis or non-osteoporosis, to each pooled image signature. The coordinates of each ROI (label) are in the centre of the bounding box that has the greatest confidence score for a particular label. Meanwhile, the conversion in LD is only created within the predetermined range, which comprises all the training samples with the same ROI label.
Specifically, the training set is first scanned to locate the lowest and highest possible spots within the allotted range. The dilation procedure is chosen because it produces more reliable results than other procedures. Next, the candidate bounding boxes are narrowed down to include only the bounding boxes that are contained within the given range. Finally, the coordinates of each ROI (label) are found by finding the centre of the candidate box with the greatest score for that label.
For the last phase of classification, the SVM is discriminatively trained using the dataset of image signatures and their associated class labels. The SVM separates the classes by finding the hyperplane with the largest distance to the closest training examples.
Results and discussion
During this research, an open-source database that stores X-ray images of skeletal structures is used [37]. Figure 1 shows the input images: (a) Input Image 1 and (b) Input Image 2.

Figure 1. Input images: (a) Input Image 1; (b) Input Image 2.
Classifying these samples is challenging because differentiating between the two populations based on only the visual inspection of these X-ray photographs is difficult.
The proposed technique is evaluated using 10-fold cross-validation, in which the original dataset is randomly partitioned into ten folds of the same size. In each round, one of the ten folds is held out and used as the validation data for testing the model, and the other nine folds are used as training data; every fold is tested under the same conditions. An autoencoder with three layers is considered. To train the autoencoder, the training images are divided into patches; each patch is 32 by 32 pixels, and consequently each image includes 64 patches. The patch size is determined from the dimensions of the training images. The input layer of the autoencoder is a column vector that represents the pixel intensities of the square patch. Figures 2 and 3 show the compression results for Image 1 based on the mask and on the ROI, respectively, and Figs. 4 and 5 show the corresponding results for Image 2. The mask-based figures show (a) the mask, (b) patch size 32×32 with 5-fold validation, (c) patch size 64×64 with 5-fold validation, and (d) patch size 64×64 with 10-fold validation. In the ROI-based figures, Rows 1 to 3 correspond to the linear, sigmoid, and polynomial classifiers, and the columns correspond to (a) patch size 32×32 with 5-fold validation, (b) patch size 64×64 with 5-fold validation, (c) patch size 32×32 with 10-fold validation, and (d) patch size 64×64 with 10-fold validation.
Figure 6 shows the lossless compression results for Image 1 (Row 1) and Image 2 (Row 2): (a) after lossless compression and (b) before lossless compression. After the deep neural network has learned the local features, the signatures of the training images are generated by combining the features retrieved from the patches that belong to the same image. The performance of the network is then analysed using these signatures. To train the SVM classifier, the signatures are first paired with their corresponding labels. Figure 7 shows the validation error performance, Fig. 8 shows the ROC curve, and Fig. 9 shows the confusion matrix. The computational complexity of the proposed method is shown in Table 1.
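The cross-validation protocol used in this evaluation can be sketched as follows. The shuffling seed, the sample count in the usage example, and the function names are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k folds of (almost) equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_validation_splits(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists, holding out one fold at a time."""
    folds = k_fold_indices(n_samples, k, seed)
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


# Hypothetical dataset of 100 images.
splits = list(train_validation_splits(100, k=10))
```

Every sample appears in exactly one validation fold, so each image is tested once by a model that never saw it during training.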
Table 1. Computational complexity.
Figure 2. Compression results based on the mask for Image 1: (a) mask, (b) patch size 32×32, 5-fold validation, (c) patch size 64×64, 5-fold validation, and (d) patch size 64×64, 10-fold validation.
Figure 3. Compression results based on the ROI for Image 1. Row 1: Linear classifier; Row 2: Sigmoid classifier; Row 3: Polynomial classifier. (a) patch size 32×32, 5-fold validation, (b) patch size 64×64, 5-fold validation, (c) patch size 32×32, 10-fold validation, and (d) patch size 64×64, 10-fold validation.
Figure 4. Compression results based on the mask for Image 2: (a) mask, (b) patch size 32×32, 5-fold validation, (c) patch size 64×64, 5-fold validation, and (d) patch size 64×64, 10-fold validation.
Figure 5. Compression results based on the ROI for Image 2. Row 1: Linear classifier; Row 2: Sigmoid classifier; Row 3: Polynomial classifier. (a) patch size 32×32, 5-fold validation, (b) patch size 64×64, 5-fold validation, (c) patch size 32×32, 10-fold validation, and (d) patch size 64×64, 10-fold validation.
Figure 6. Lossless compression results. Row 1: Image 1; Row 2: Image 2. (a) after lossless compression and (b) before lossless compression.
Figure 7. Validation error performance.
Figure 8. ROC curve.
Figure 9. Confusion matrix.
Comparative research is conducted on the classification performances of several SVM kernel functions, including linear, polynomial, and sigmoid functions. Table 2 shows the performance metrics evaluation for a patch size of 32×32 and 10-fold validation. Table 3 shows the performance metrics evaluation for a patch size of 64×64 and 10-fold validation. Table 4 shows the performance metrics evaluation for a patch size of 32×32 and 5-fold validation. Table 5 shows the performance metrics evaluation for a patch size of 64×64 and 5-fold validation.
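For reference, the three kernel functions being compared have the following common textbook forms. The parameter names (`gamma`, `coef0`, `degree`) and their default values are assumptions; the paper does not state the settings that were used.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    """K(u, v) = <u, v>."""
    return dot(u, v)


def polynomial_kernel(u, v, degree=3, gamma=1.0, coef0=1.0):
    """K(u, v) = (gamma * <u, v> + coef0) ** degree."""
    return (gamma * dot(u, v) + coef0) ** degree


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    """K(u, v) = tanh(gamma * <u, v> + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1, 2], [3, 4]
print(linear_kernel(u, v), polynomial_kernel(u, v, degree=2), sigmoid_kernel(u, v))
```

With the linear kernel the SVM separates the pooled signatures directly in feature space, whereas the other two kernels apply a nonlinear transformation first.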
Table 2. Performance metrics evaluation for patch size 32×32, 10-fold validation.
Table 3. Performance metrics evaluation for patch size 64×64, 10-fold validation.
Table 4. Performance metrics evaluation for patch size 32×32, 5-fold validation.
Table 5. Performance metrics evaluation for patch size 64×64, 5-fold validation.
The obtained rates demonstrate that the linear kernel is more accurate than the other kernel functions. After the image is smoothed, the boundary of the ROI is identified. Then, a binary mask is generated for each image to distinguish the ROI from the non-ROI portions. Afterwards, the intensities of the background pixels are converted into a single value.
For the simulation, the proposed ROI segmentation is applied to each image, and the pixel values of the non-ROI areas are set to zero. This first stage of the preprocessing phase is performed before the simulation begins, so all compression methods receive exactly the same images. The results of these experiments indicate that our method achieves the highest compression ratio and the lowest bit rate among the compared methods in each of these instances.
Hardware accelerators such as graphics processing units (GPUs) or dedicated artificial intelligence processors can considerably reduce the time required for computationally expensive tasks such as feature extraction and model inference. The computational cost of the machine learning model can be lowered significantly, and real-time performance enabled, by optimising the model through parameter pruning, quantization, or efficient architectures. The feasibility of real-time implementation can be further improved by selecting techniques for compression, feature extraction, and model inference that require less processing effort.
Because the method involves lossless compression, feature extraction, and model inference, it may incur a large processing overhead. It may therefore require high-performance computing resources, which would limit its use in real-time applications. To establish generalizability, additional evaluations on a considerable number of different datasets and imaging settings are needed.
In this paper, a deep-learning-based approach is proposed for determining whether a patient has osteoporosis. Unsupervised high-level feature extraction from pixel intensities is accomplished using an autoencoder. The proposed approach can be broken down into three consecutive steps. First, the bone X-ray images are preprocessed to improve their overall quality. Next, image subdivision and feature extraction are performed, primarily to extract high-level features from the image patches. Finally, a pooling method generates image signatures from the high-level features obtained in the preceding step. A support vector machine is then applied to these signatures to classify cases as osteoporotic or not (Yes/No). Because applying deep learning to osteoporosis diagnosis has shown promising results, with an AUC of approximately 90.08%, this approach should receive more attention and be further investigated. Constructing this instrument will be the major focus of our future work.
Modern equipment with sensitive sensors produces high-quality images that are very large and contain more relevant information. Lossless compression techniques are used for storing, retrieving, and transmitting these images, which contain crucial information. However, the compression ratios achieved by these techniques are not very high. The most effective image compression techniques exhibit some degree of data loss, which is unacceptable for medical imaging; as a result, lossless compression techniques are needed. In this paper, a new method for processing bone X-rays to filter out irrelevant information is proposed. The background pixels contain no useful health data. The proposed approach is founded on histogram dispersion. To protect the useful medical information contained in the images, the regions of interest (ROIs) are isolated from the background and compressed using a lossless compression approach. The compression ratios obtained demonstrate the efficacy of the proposed approach in minimising statistical and spatial redundancy.
Acknowledgments
The authors are thankful to the Deanship of Scientific Research at Najran University for supporting this work under the Distinguished Research Program grant code (NU/DRP/MRC/12/30).
Author contributions
Khalaf Alshamrani designed the study and performed the experiments. Hassan A. Alshamrani performed the experiments and analyzed the data.
Conflict of interest
All authors have no conflicts of interest.
