Abstract
BACKGROUND:
Hip fracture is considered one of the leading causes of disability worldwide. People with hip fractures are prone to permanent disability or death from complications. Although bone mineral density is currently the premier determinant of hip fracture risk, it has some notable limitations in hip fracture risk assessment.
OBJECTIVES:
To learn more about bone strength, hip geometric features (HGFs) can be collected. However, organizing a hip fracture risk study for a large population using a manual HGFs collection technique would be too arduous to be practical. Thus, an automatic HGFs extraction technique is needed.
METHOD:
This paper presents an automated HGFs extraction technique using a regional random forest. The regional random forest localizes landmark points in femur DXA images using local constraints derived from hip anatomy. The local region constraints make the random forest robust to noise and improve its performance because it processes fewer points and patches.
RESULTS:
The proposed system achieved an overall accuracy of 96.22% and 95.87% on phantom data and real human scanned data respectively.
CONCLUSION:
The proposed technique’s ability to measure HGFs could be useful in research on the causes and factors of hip fracture and could help in the development of new guidelines for hip fracture risk assessment. The technique will reduce workload and improve the use of X-ray devices.
Introduction
In medical imaging, X-rays have improved the visualization of numerous medical conditions inside the human body. Meanwhile, hip fracture has become one of the most severe and frequent bone injuries in most Western populations. More than 20% of women with a hip fracture die as a consequence of the fracture [1]. Because bone fracture is a life-threatening condition for elderly people, it is critical to comprehensively identify the characteristics that best predict hip fracture risk (HFR). Bone mass is typically considered the premier determinant of bone strength, but because accurate bone mass calculation has limitations, more information about a person’s bones is needed before making a hip fracture prediction.
Based on engineering principles, hip geometric measurements should be related to femoral strength and HFR. Faulkner et al. studied femoral geometry and hip fracture, and their study shows that assessment of hip geometric features (HGFs) can improve the prediction of HFR [2]. Lee et al. conducted a study in 2016 on the relationships between hip geometry, bone mineral density (BMD), and HFR in premenopausal women. They concluded that HGFs are BMD-independent predictors of HFR in premenopausal women [3]. Exact soft tissue estimation over bone, areal BMD calculation for an overweight person with weak bone, and areal BMD calculation for a thin person with healthy bone are some of the main factors that affect the accuracy of HFR calculation using BMD alone. It is quite challenging to assess the relationship between bone strength and anthropometric body measures such as height and weight without knowing the actual HGFs [4, 5].
Usually, an increase in body weight is assumed to bring an increase in bone mineral content (measured in grams) and bone geometric strength. Overweight people are therefore considered to have high BMD and low HFR [6–8]. Contrary to these earlier studies, some recent studies have shown that obese people actually have lower BMD relative to their weight and an increased HFR [9–11]. These limitations of BMD have drawn attention to other factors, such as the shape and size of the bone. HGFs acquisition is therefore significant for more accurate HFR assessment. The extraction of HGFs from DXA images allows us to study the characteristics that contribute to accurate HFR prediction more comprehensively. Hip geometry is clinically important in diagnosis and in assessing fracture risk of the femur neck, trochanteric fractures, developmental dysplasia of the hip, slipped upper femoral epiphysis, neuromuscular disorders of the lower extremity, and avascular necrosis that results in growth arrest of the upper femur. The loss of skeletal integrity of the proximal femur with age is the most common risk factor for hip fracture and can be life-threatening in older adults. It is crucial to take into consideration the effects of any medication on hip geometry during the treatment of patients diagnosed with osteoporosis [12].
Extracting HGFs requires a lot of effort and is time-consuming with a lack of process standardization. To extract geometric features, Beck et al. developed a semiautomatic piece of software known as the Hip Structural Analysis (HSA) program [13]. The HSA program has been employed in many research studies to characterize bone geometry and bone strength [4, 22–27]. As a semi-automated program, HSA extracts geometric features based on a user’s choices. In the health-care domain, manual and semiautomatic processes usually contain a level of diversity that creates inefficiencies. There is a great need for standardization in the field of health care so that groups of people will be able to work more cooperatively. Even though automation has become a part of our everyday lives, automated technology is still significantly underused when it comes to health care. Currently, in the field of medicine, there is great emphasis on accuracy and automation to increase the performance of medical devices. When we talk about studies in the health-care domain, researchers could use medical applications that are designed to maximize the use of medical devices over a large population, thereby decreasing workload.
In this study, we present an automatic HGFs extraction method using femur DXA images that brings consistency to the process of HGFs extraction. A study of HFR could easily be performed with a large cohort using an automatic HGFs model. Our main aim in this study is to bring automation to the health-care domain, thus assisting physicians and researchers with conducting studies over a large population efficiently.
A random forest (RF) classifier is used to identify anatomical landmarks (ALMs) and extract HGFs from femur DXA images. The RF approach is regularly used to locate landmarks on an object [15, 52]. We apply RF to locate ALMs both globally and with local region constraints. The global search tests every point in an image against an ALM, whereas the regionally constrained model searches for an ALM only within a bounded area. Regional random forest (RRF) performs better than global random forest (GRF). The local region constraint makes the RRF classifier very robust, which makes it an accurate landmark localizer and the basis of an accurate HGFs extraction technique. The RRF classifier evaluates points in a given image and casts a vote for the optimal landmark point. Our HGFs algorithm using RRF is composed of five main steps: first, the femur image is denoised; second, the femur image is automatically segmented and hip contours are extracted; third, basic hip anatomical structures are localized and regions of interest (ROIs) are selected; fourth, RRF is applied to identify essential ALMs; and fifth, hip geometry is generated.
Aside from the HGFs covered in the HSA program, we will also extract some new geometric features useful in hip fracture analysis, including acetabulum width, center-edge angle (CEA), trochanter width, femur head offset, greater trochanter to anterior inferior iliac crest distance (GTAIICD), and pelvis to femoral neck angle. The literature review shows the significance of these new HGFs [16–19, 31–33]. Partanen et al., in their study of hip geometry and HFR, demonstrated the importance of acetabulum width [18]. They found that the acetabulum was significantly smaller in cervical fracture patients and that trochanter width was significantly lower in patients with trochanteric fracture. The CEA provides vital information about hip abnormality and conditions that lead to hip fracture. One of the underlying purposes of the CEA is to identify hip dysplasia and its role in hip fracture. Hip dysplasia causes structural damage to the hip and increases mechanical stress at the acetabular rim and femoral neck. Hip dysplasia is one of the causes of hip osteoarthritis [20, 21]. Patients with osteoarthritis have an increased rate of bone loss, which gradually leads to osteoporosis and increased fracture risk [28–30]. Gl et al. did a cohort study on a Korean population concerning proximal hip geometry and HFR assessment. They disclosed that the femur head offset was significantly shorter in patients with femur neck fractures, and hip axis length was significantly longer in patients with an intertrochanteric fracture [31]. Viradia et al. associate an increase in iliac crest to greater trochanter distance with trochanteric bursitis [19]. On a related note, Shbeeb et al. [32] and Brunner et al. [33] consider trochanteric bursitis one of the reasons for hip fracture.
This paper introduces some novel methods. Feature extraction from femur DXA images is of particular significance for hip segmentation, hip contour extraction, automatic ROIs selection, and RRF implementation. Accurate hip segmentation and hip contour extraction from femur DXA images remains a challenge due to low contrast and noise in DXA images, variability in bone shape, inconsistent X-ray beam penetration, and person-to-person variations. We use the pixel label random forest (PLRF) model to segment a hip from femur DXA images with suitable features and a high level of accuracy. Identifying critical anatomical structures in the hip is a significant challenge due to the noise present in DXA images and variability in bone shape. Appropriate contour features are used to localize hip anatomy and automatically select ROIs for RRF implementation. A fully automatic model is used to extract HGFs from femur DXA images.
Related work
The femur is widely studied in the areas of HFR assessment, orthopedics, anthropology, forensics, and human kinematics. Various geometric features are computed from femur X-ray images for HFR assessment, orthopedic surgery, nutrition and medicine evaluation in human anthropology, and anatomical deformity detection in bone. Accurate and quick measurement of anatomical deformities is clinically important for decision-making in HFR assessment, orthopedic surgery planning, and controlling hip dysplasia and trochanteric bursitis.
A literature review shows the importance of geometric features in HFR analysis. Leslie et al. studied the usefulness of hip axis length (HAL) and neck shaft angle (NSA) in the prediction of HFR. They found that figures for HAL and NSA were significantly higher in women with hip fractures. They also found that most hip fractures occur in individuals with a BMD above the osteoporotic threshold [14]. Due to the limitations of BMD and variations in DXA scan results, Boudreaux et al. studied HFR using HGFs and femur strength metrics without BMD data. Their study concluded that an increase in HAL and a decrease in femur strength metrics increases the fracture risk in elderly people [34]. Karlamangla et al. built a hip model with composite metrics of femur strength to examine HFR prediction. They examined 55 people with an average age of around 72 years old for HFR. Bone geometry features were measured manually using a standard metallic metric ruler on printouts of DXA images. Karlamangla et al. associated a larger bone size with lower risk of incident hip fracture and vice versa [35].
Cardadeiro et al. performed a study on young children to analyze the contribution of some geometric measures in conjunction with physical activity on proximal femur bone mass distribution. For their study, they used the programs Adobe Photoshop Elements and Thin-Plate Spline Digitize (a geometric morphometric software) to extract geometric features from DXA images [36]. In 2015, Souza et al. studied the femoral neck anteversion (FNA) and NSA implications in fetuses across different gestational ages. They observed that the FNA and NSA of the femur did not vary significantly during the third trimester [37]. To extract these angles, Souza et al. used the program Microsoft Paint to manually mark ALMs and then used ImageJ software [38] to compute the angles.
In 2016, Lakshmi et al. performed a comparative study of intact adult human femur NSA and FNA using individuals from the Indian, Japanese, and Western populations and found no significant side differences in either the NSA or FNA. They computed all the angles manually with ImageJ software. The NSA controls the mobility of the femur at the hip joint. It is one of the primary diagnostic benchmarks for femoral neck fracture and a possible pathogenic indicator of a hip disorder. An abnormal FNA can sometimes be associated with a variety of clinical problems such as osteoarthritis, developmental dysplasia of the hip and impingement or instability, and wear in total hip arthroplasty [39].
In 2015, Cho et al. used a 3-D model made from CT images to study femur geometric features and analyze sex differences among a population. They found most parameters to be larger in males than females. The height, head diameter, head center offset, and chord length of the diaphysis; most parameters in the distal femur; and the isthmic width of the medullary canal were smaller in Koreans than in other populations. However, the NSA, subtense, and width of the intercondylar notch in the distal femur were larger in Koreans than in those belonging to other populations [40].
As highlighted by this literature review, the extraction of geometric features from DXA images to understand their role in HFR has so far been done either manually or semiautomatically. The process of measuring bone geometry using diagnostic images suffers from various inherent issues. These include distortion due to rotation of the hip during scanning, which results in noisy data; difficulty selecting reliable ALMs; problems with choosing the best slice representing the reference axis; manual error; and consistency and robustness issues. The current study presents a fully automatic HGFs extraction model to overcome some of these inherent limitations.
Materials and methods
Geometric analysis
We extracted a total of 13 HGFs. The geometric features we extract include femur neck width (FNW), femur neck length (FNL), HAL, NSA, shaft width (SW), femur head diameter (FHD), femur head offset (FHO), greater trochanter width (GTW), intertrochanteric distance (ITD), GTAIICD, acetabular width (ACW), CEA, and pelvis to femur neck angle (PNA). Fig. 1 shows some of the most common geometric features.

Geometric features measurements from a femur DXA image.
The geometry of the hip plays a very important role in hip fracture. Evidence to date suggests that an increase in FNL and an increase in the ratio of FNL to FNW put an individual at higher risk of hip fracture [41]. Studies of osteoporotic fractures have demonstrated that an increased HAL and NSA are associated with greater HFR independent of age and BMD [14, 41]. An increase in ACW, an increase in the ratio of FHD to FNW, a lower trochanteric width, and a smaller SW indicate greater HFR [18, 41]. An increase in the CEA and PNA is associated with hip dysplasia and increased HFR [20, 28–30]. A shorter FHO and ITD along with a longer HAL and GTAIICD are associated with greater HFR [29, 32].
An overview of the automatic HGFs extraction technique from femur DXA images is provided in Fig. 2. The model localizes a set of ALMs, $\mathrm{ALMs} = \{\mathrm{ALM}_i \mid i = 1, \ldots, n\}$, to extract the HGFs. The algorithm starts with femur image acquisition using a DXA scanner. The image is denoised using a non-local means filter (NLMF). Then the hip in the image is segmented using a PLRF model. Hip contours are extracted from the segmented image to define structural ROIs. RRF localizes ALMs following the defined regional constraint for landmark localization. Finally, HGFs are extracted using the final set of ALMs.

Hip geometric features extraction flowchart.
To extract RF features for robust localization of ALMs and better visualization of HGFs, we need a clear and high-contrast DXA image. During a DXA scan, we get high energy (HE) and low energy (LE) images. These HE and LE images have very low contrast, so they are combined to yield a high contrast image called a BMD map (BMDMP). The BMDMP provides higher values in the bone region than the soft tissue region, so we are left with a clear and high contrast image. The BMDMP can be extracted from HE and LE images using the following equations:
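As an illustration of how two energy channels can be combined into a bone-weighted map, the sketch below assumes the generic dual-energy log-subtraction model, in which the soft-tissue contribution is cancelled by weighting the HE log-attenuation with the soft-tissue attenuation ratio. This is not necessarily the formulation used in this study. The function name `bmd_map`, the unattenuated intensities `i0_le` and `i0_he`, and all attenuation coefficients are hypothetical and chosen only for the example.

```python
import math

def bmd_map(le, he, i0_le, i0_he, mu_bone_le, mu_bone_he, r_soft):
    """Per-pixel bone thickness/density estimate from LE and HE intensity images.

    Assumed model (generic dual-energy absorptiometry, not this paper's equations):
        ln(I0 / I) = mu_bone * t_bone + mu_soft * t_soft   at each energy,
    with r_soft = mu_soft_LE / mu_soft_HE. Intensities must be positive.
    """
    denom = mu_bone_le - r_soft * mu_bone_he
    out = []
    for row_le, row_he in zip(le, he):
        out_row = []
        for i_le, i_he in zip(row_le, row_he):
            a_le = math.log(i0_le / i_le)   # LE log-attenuation
            a_he = math.log(i0_he / i_he)   # HE log-attenuation
            # Weighted subtraction removes the soft-tissue term.
            out_row.append((a_le - r_soft * a_he) / denom)
        out.append(out_row)
    return out
```

Under this assumed model, pixels that contain only soft tissue map to values near zero and bone pixels map to positive values, which is consistent with the higher bone-region values described above.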
To improve the quality of hip contour extraction from femur DXA images, the reduction of noise in both HE and LE images is often an essential preprocessing procedure. An NLMF denoises DXA images. More details about our previous work on DXA image denoising using an NLMF are available in [42, 43].
Image segmentation
PLRF is used to segment the hip and extract hip contours from femur DXA images automatically. The PLRF differentiates hip bone pixels from soft tissue or background pixels and then extracts the hip contour from the boundary between the hip object and the background. The contours define the fibrous structure of hip anatomy, which is used to select ROIs for the application of the RRF model. Fig. 3 demonstrates hip contours extraction from a femur DXA image via PLRF, starting with the femur data acquisition. Femur data are acquired from a DXA scan that produces HE and LE images. These HE and LE images are denoised using an NLMF. Then four new features maps are generated from the HE and LE images. Thus, a total of six features maps are available to train the PLRF model, which is later used to segment the hip captured in the femur DXA images.

Overview of the pixel label random forest algorithm. Input training data are converted into features vectors that are then fed into the model. A trained model is used on the test data to predict a subject pixel label.
DXA produces HE and LE images. Our method generates four new features maps from these HE and LE images, which means information is being captured more effectively. The newly generated features maps include a BMDMP, a map showing the difference between LE and HE images (LHD), a local standard deviation of the BMDMP (LSTD_BMD), and a composite map (CI) [44]. Including the HE and LE images, a total of six features maps are available for the PLRF model.
This difference image usually contains useful information for femur segmentation. Because the air regions are more uniform than the bone or soft tissue areas, clear boundaries exist between the tissue and air regions. The LHD is used to identify the air regions, which are not included for classification.

Composite images for different values of ‘n’ in Equation 6. (a) For n = 2; (b) for n = 3; (c) for n = 5; (d) for n = 7.
The local standard deviation (STD) indicates how strongly the image gradient changes within a pixel neighborhood. A high STD indicates that the neighborhood pixel values are spread far from the neighborhood mean. Conversely, a small STD indicates that the neighborhood pixel values vary little around the mean. A sudden change in image gradient yields a large STD. The border between bone and soft tissue therefore usually appears brighter because of the sudden change in BMD values. Some of the PLRF features maps are shown in Fig. 5.
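The behavior of the local standard deviation can be illustrated with a minimal sketch. The square window, its `radius` parameter, and the clipping of the window at the image border are assumptions made for the example; the window actually used for the LSTD_BMD map is not stated here.

```python
import math

def local_std(image, radius=1):
    """Standard deviation of each pixel's (2*radius+1)^2 neighborhood.

    The neighborhood is clipped at the image border (an assumption).
    `image` is a list of rows of numbers.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [image[rr][cc]
                    for rr in range(max(0, r - radius), min(h, r + radius + 1))
                    for cc in range(max(0, c - radius), min(w, c + radius + 1))]
            mean = sum(vals) / len(vals)
            out[r][c] = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return out
```

On an image with a sharp step between two flat regions, the output is zero inside each flat region and positive along the step, which is why bone and soft tissue borders stand out in such a map.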

Features maps. (a) High energy image; (b) low energy image; (c) map showing the difference between low and high energy images; (d) bone mineral density map; (e) composite map; (f) local standard deviation of the bone mineral density map.
The PLRF design of our ensemble model is composed of a set of n decision trees from k randomly selected subfeatures out of m features. The PLRF aggregates the votes from all the different decision trees to decide the final class label of the test subject. Each decision tree in the PLRF is implemented based on the standard CART algorithm [45]. The input features data set is represented in the form of a tuple (S, Y) as follows:
The learning classification function fR measures each attribute in a randomly selected subset SR of features data set S to one of the predefined class labels Yj,n in a subtree ‘n’ of the PLRF. The test image data are also represented in the form of features vectors as follows:
Each tree in the PLRF needs stopping criteria as it splits down the tree with the training data. A tree terminates before it reaches maximum depth if some other matching criterion is met, for example, if the node samples are less than the specified threshold, all samples in the node belong to the same class, or no improvement is observed with the best split found compared to a random choice. We use 1% of the total data as a minimum count for training instances assigned to each leaf node of the tree to stop further splitting. If the count is less than the minimum number, then the node is taken as a final leaf node and further splitting is stopped. Trees are pruned to reduce the size of the PLRF model and improve classification accuracy. Cost complexity pruning determines the most significant splits in the model [45]. Nodes with anomalies and least classification performance are removed. The PLRF terminates when all the trees in the model are trained.
Training data is prepared by extracting a features data set from the denoised images. At each pixel i of jth training data set (features map/image), features are extracted to form a features vector Xi,j for that particular pixel, and the data set is assigned a class label (bone or soft tissue) denoted as Yi,j. Thus, in a training data set, the components of vector Xi,j become attributes with dependent variables Yi,j, and each training pixel i of jth training data set is made of a training pair (Xi,j, Yi,j). Minimum (min) and maximum (max) values in the data are used to normalize it to a fixed range between 0 and 1.
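The min-max normalization step can be sketched as follows. The function name `min_max_normalize` and the handling of a constant input (mapped to zeros) are assumptions for illustration.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: no spread to rescale (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `min_max_normalize([2, 4, 6])` maps the minimum to 0, the maximum to 1, and the midpoint to 0.5.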
To train the PLRF, n sub-data-sets are drawn from the training data set; each features vector belonging to pixel i carries the pixel information, a class label, and the label of its sub-data-set. A random function assigns a sub-data-set label to each training pair (Xi,j, Yi,j), adding an extra dimension n to give (Xi,j,n, Yi,j,n). Sub-data-set n contains N features vectors, and each vector has j components called attributes (Xi,j,n) along with its class label (Yi,j,n), where Y is the dependent variable. Thus, each pixel i in the training data set for the nth tree is represented by a training pair (Xi,j,n, Yi,j,n). In each PLRF tree, the splitting attribute that minimizes the Gini index is selected at each node.
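Gini-based split selection, as used in standard CART trees, can be sketched as follows. The helper names `gini` and `best_split`, the exhaustive threshold search over observed values, and the toy data are illustrative assumptions rather than the implementation used in this study.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows, labels):
    """Return (attribute index, threshold, weighted Gini) of the best binary split.

    Each row is a features vector; a sample goes left if row[j] <= threshold.
    """
    best = (None, None, float("inf"))
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best
```

A weighted Gini of zero means both children are pure, so the node needs no further splitting on that branch.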
To prepare the test data set, each pixel i is represented by some features information along with the pixel position in the test image without the inclusion of a label. The test data sets are prepared by following the rules of each randomly created tree. To acquire a response for the input features vector in a particular tree using the PLRF model, it traverses all the nodes of the tree following the splitting criterion and threshold at each node until it reaches a leaf node. The training data vectors present in the leaves respond and place votes to predict the label for input data. Each tree in the model has its vector representation for a test subject and predicts a class label for it. Then by considering the predicted outcomes, the final votes are calculated. A testing subject (pixel) is assigned a class label based on the maximum votes received for a particular class. To resolve classification ties, the model uses a threshold value (e.g., above 50%) to vote for a particular class.
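The vote aggregation with a threshold can be sketched as below. The function name `classify_pixel`, the label strings, and the choice to assign the negative class when the vote share does not exceed the threshold are assumptions for illustration.

```python
def classify_pixel(tree_votes, positive="bone", negative="soft tissue", threshold=0.5):
    """Assign a pixel label from per-tree votes.

    The pixel is labeled `positive` only if the share of trees voting for it
    exceeds `threshold`; otherwise it is labeled `negative` (assumed tie rule).
    """
    share = sum(1 for v in tree_votes if v == positive) / len(tree_votes)
    return positive if share > threshold else negative
```

With an odd number of trees and two classes, an exact tie cannot occur at a 50% threshold; the rule matters for even-sized forests or stricter thresholds.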
The hip boundaries labeled by the PLRF are not smooth. The output of the PLRF is a binary image, as seen in Fig. 6(b). A binary smoothing filter is used to smooth hip boundaries as shown in Fig. 6(c) and Fig. 6(d). Binary smoothing removes small-scale noise in the shape while maintaining large-scale features.

Binary smoothing. (a) Object contours by pixel label random forest; (b) a binary image; (c) application of binary smoothing; (d) object contours after binary smoothing.
Contour features play a vital role in understanding an object’s anatomy. The maximum curvature points are most distinctive along a contour [49]. Our femur DXA image segmentation results in a binary image that is used to compute hip object contours. Hip contours are marked as being superior, inferior, or medial based on the boundaries of a white object against the black background. To identify the superior, inferior, and medial contours, the hip bone pixels are marked as background content, and surrounding tissues pixels are marked as an object. For better separation of the contours, a black frame is drawn around the binary image as shown in Fig. 7(b). The size of a contour and its position in the image define whether it is a superior, inferior, or medial contour. Each contour is enclosed in a bounding box (BX) according to its local minima and local maxima. The superior hip contour is enclosed in the top BX. The top BX contains anatomical structures such as the femur bone lateral edge, greater trochanter, superior edge of the femur neck, and superior edge of the femur head. The inferior hip contour is enclosed in the bottom BX, which includes the femur bone medial edge, hip ischium, lesser trochanter, and inferior edge of the femur head. The medial contour containing the pelvic brim of the hip is enclosed in the pelvic BX.

Contours extraction. (a) Segmentation results using PLRF; (b) binary representation of segmented bone pixels (black) and soft tissue pixels (white); (c) medial and lateral contours of the femur; (d) bounding boxes drawn around medial and lateral contours of the femur.
The proposed algorithm uses hip contours features to obtain structural details about the femur and pelvis. The structural details of the femur and pelvis are used to select the ROIs for RRF implementation. A total of six ROIs are selected: the pelvic brim, femur head, femur neck, greater trochanter, lesser trochanter, and femur shaft regions.
Features representation
Feature extraction based on shape signature and polygonal approximation is done to collect details about the anatomical structure of the femur and pelvis. A shape signature is a one-dimensional function derived from shape boundary coordinates. The technique is robust to noise and invariant to translation, rotation, and scale. A number of features represent points on the contours; these are extracted using triangle-area representation (TAR), the tangent angle function (TAF), the directional change map function (DCMF), and the distance threshold function (DTF).
A positive TAR value means a convex point when traversing a contour in a counterclockwise direction. Similarly, negative and zero values of TAR mean concave and straight-line points, respectively. Some of the anatomical structures such as the acetabulum are represented with concave lines while a convex line represents others such as the lesser trochanter.
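The sign convention of TAR can be illustrated with the signed area of the triangle formed by a contour point and its two neighbors. The sketch assumes a closed contour stored as (x, y) tuples in counterclockwise order with the y axis pointing up; in image coordinates with y pointing down the signs flip. The names `tar` and `classify_point` and the neighbor spacing `step` are hypothetical.

```python
def tar(contour, i, step=1):
    """Signed area of the triangle (p[i-step], p[i], p[i+step]) on a closed contour."""
    n = len(contour)
    x1, y1 = contour[(i - step) % n]
    x2, y2 = contour[i]
    x3, y3 = contour[(i + step) % n]
    return 0.5 * ((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))

def classify_point(contour, i, step=1, eps=1e-9):
    """Label a contour point as convex, concave, or straight from the TAR sign."""
    area = tar(contour, i, step)
    if area > eps:
        return "convex"
    if area < -eps:
        return "concave"
    return "straight"
```

On an L-shaped polygon traversed counterclockwise, the outer corners are reported as convex and the inner corner as concave, which mirrors how concave and convex anatomical structures are distinguished along the hip contours.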
The hip superior, inferior, and medial contours enclosed in the top, bottom, and pelvic BXs are used to obtain the structural details. Automatic ROIs (as shown in Fig. 8) are generated based on different combinations of the abovementioned contour features using TAR, the TAF, the DCMF, or the DTF. The relationship between a structural landmark pi and ROIs R is defined as,

Generate automatic regions of interest. (a) Object contours are traversed to identify initial anatomical landmarks; (b) define region of interest for final localization of anatomical landmarks.
The desired ALMs are localized in the selected ROIs using RRF. Some of these ALMs are identified by RRF (e.g., the femur narrow neck upper and lower landmark points), while others are derived from the basic identification of the ALMs (e.g., the center of the femur neck). Each ALM represents an essential position on the pelvis and femur. The ALMs localization task is formulated as a multiclass classification problem. A feature set € (x) is used to represent a pixel x in the local region Ri. Then the RRF classifier is used to find the probability of pixel x and its relationship to landmark LMi. A local maximum for the probability map is used to cast the final vote for a landmark.
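The effect of the regional constraint on the final vote can be sketched as a search for the probability maximum restricted to an ROI. The rectangular ROI format, the function name `localize_in_roi`, and the toy probability map are assumptions for illustration.

```python
def localize_in_roi(prob_map, roi):
    """Return ((row, col), probability) of the maximum inside a rectangular ROI.

    roi = (row_min, row_max, col_min, col_max) with inclusive bounds.
    """
    r0, r1, c0, c1 = roi
    best_pos, best_p = None, -1.0
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            if prob_map[r][c] > best_p:
                best_pos, best_p = (r, c), prob_map[r][c]
    return best_pos, best_p

# Toy map: a spurious high response at (0, 0) lies outside the ROI.
pm = [[0.9, 0.1, 0.1],
      [0.1, 0.2, 0.6],
      [0.1, 0.3, 0.4]]
regional = localize_in_roi(pm, (1, 2, 1, 2))   # search only the lower-right 2x2 block
global_ = localize_in_roi(pm, (0, 2, 0, 2))    # search the whole map
```

In this toy case the global search is drawn to the spurious response, while the regional search returns the peak inside the anatomically plausible area.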
Random forest features representation
To account for the varied nature of the landmarks, we designed a pool of hybrid features that carry short- and long-range contextual information [52, 53]. This makes more relevant features available for landmark detection in femur DXA images. The hybrid feature pool contains convolutional, Haar, local, and long-range contextual features.

Features representation for landmark detection. (a) Convolutional feature; (b) Haar feature; (c) local feature; (d) long-range contextual feature.
Thus, a pixel x is represented by a composite features vector € (x) in the form of convolutional, Haar, local, and long-range contextual features respectively.
To handle the multidimensional feature pool € (x), we construct a collection of decision trees under a single ensemble of RF [56]. The RF classifier is robust and suitable for multiclass classification in the domain of medical imaging [55]. RF follows a randomized configuration of features selection to construct each tree with a suitable subset of features and then averages the results from all the trees. A terminal/leaf node represents the class label with a learned probability of landmarks as illustrated in Fig. 10. A binary distribution splits an internal node into its children. The splitting process stops when it reaches the leaf node. RF applies a general technique of bagging to learn a tree. Given a training set € (X)=x1, x2, …, xn with a class label Y=y1, y2,..., yn, the posterior probability is learned from features space X at each leaf node following the partition rule. Each tree randomly selects a sample from the features space.

Random forest framework. Leaves represent the learned probability pi(LMi) shown in green. Red arrows follow the prediction of x till it reaches the leaves. The final predicted probability Pr(LMi) of a landmark LMi is calculated as the mean distribution from all trees in the forest.
In the testing stage, a prediction is made about a pixel x using the feature € (x) of that pixel. Feature set € (x) of x is fed into the root node of each tree. Each tree splits the fed feature following the splitting rule until it reaches a leaf and returns posterior probabilities as an association of x to each landmark. The final decision is assembled based on a map generated as mean distributions of posterior probabilities at the leaves level from all trees. The posterior probability for x that belongs to a landmark LMi is estimated by,
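The averaging of leaf-level posteriors can be sketched as follows, assuming an unweighted mean over all trees as described in the caption of Fig. 10. The dictionary representation of a tree's posterior and the function name `forest_posterior` are hypothetical.

```python
def forest_posterior(tree_posteriors):
    """Average per-tree posteriors into one distribution over landmarks.

    tree_posteriors: list of dicts {landmark: probability}, one per tree,
    all sharing the same landmark keys.
    """
    n_trees = len(tree_posteriors)
    landmarks = tree_posteriors[0].keys()
    return {lm: sum(p[lm] for p in tree_posteriors) / n_trees for lm in landmarks}

def predict_landmark(tree_posteriors):
    """Return the landmark with the highest averaged posterior."""
    mean = forest_posterior(tree_posteriors)
    return max(mean, key=mean.get)
```

Because each tree returns a normalized distribution, the averaged values also sum to one and can be read directly as a probability map value for pixel x.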
Image resolution is converted to a pixels-per-centimeter scale by calculating the diagonal density of pixels using the Pythagorean Theorem as follows:
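One way to read this conversion is as the ratio of the image diagonal in pixels to the scanned diagonal in centimeters. The sketch below assumes that reading; the function names are hypothetical. The worked values use the 420×420 image matrix and the 12.5 cm × 12.5 cm scan area reported in the Data acquisition section.

```python
import math

def pixels_per_cm(width_px, height_px, width_cm, height_cm):
    """Diagonal pixel density: diagonal length in pixels over diagonal length in cm."""
    diag_px = math.hypot(width_px, height_px)
    diag_cm = math.hypot(width_cm, height_cm)
    return diag_px / diag_cm

def pixels_to_cm(length_px, ppcm):
    """Convert a length measured in pixels to centimeters."""
    return length_px / ppcm

ppcm = pixels_per_cm(420, 420, 12.5, 12.5)   # about 33.6 pixels per cm
neck_width_cm = pixels_to_cm(100.8, ppcm)    # a 100.8-pixel segment is about 3 cm
```

The resulting density of about 33.6 pixels per centimeter agrees with the display scale stated for the scanned images.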
The intersections of the derived anatomical axes form the anatomical angles of the hip joint (e.g., the angle between the neck axis and shaft axis). An anatomical angle is calculated from the derived axes using the following steps. First, find the slopes m1 and m2 of the two anatomical axes. Second, compute the angle of inclination of each line as $\theta_k = \arctan(m_k)$ for $k = 1, 2$. Third, calculate the acute and obtuse angles between the axes from the difference of the two inclination angles.
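The angle computation can be sketched as follows, assuming both axes have finite slopes. A vertical axis has an undefined slope and would need to be handled separately, for example by working with direction vectors instead. The function name `anatomical_angles` is hypothetical.

```python
import math

def anatomical_angles(m1, m2):
    """Return (acute, obtuse) angles in degrees between two lines with slopes m1 and m2."""
    theta1 = math.degrees(math.atan(m1))   # inclination of the first axis, in (-90, 90)
    theta2 = math.degrees(math.atan(m2))   # inclination of the second axis
    diff = abs(theta1 - theta2)            # in [0, 180)
    acute = min(diff, 180.0 - diff)
    obtuse = 180.0 - acute
    return acute, obtuse
```

For an angle such as the NSA, which is normally greater than 90 degrees, the obtuse value is the one of interest.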
Data acquisition
In this work, we collected phantom and real femur DXA images using a DXA imaging system (OsteoPro MAX, B. M. Tech Worldwide Co., Ltd, Republic of Korea) with a maximum tube voltage and current of 76 keV and 1 mA, respectively. A femur aluminum bar phantom (i.e., Lunar 1043) was used to obtain phantom images following the scanning protocol for a femur. A total area of 12.5 cm × 12.5 cm was scanned with the DXA imaging system. The pixel resolution of the phantom and real femur DXA images was 1.3×2.6 mm² with a field of view of 96×48 mm², and the image matrix size was 420×420. On a display image, every 33.6 pixels represent 1 cm of the scanned object.
Evaluation and performance analysis
Evaluation
The proposed automatic HGFs technique was evaluated experimentally by comparing the extracted features to ground truths. For validation purposes, the femur phantom was first measured manually using a digital measurement tool (Digital Caliper Deluxe Model). The phantom was then scanned using the DXA imaging system, and HGFs were extracted using the proposed model. To generate ground truths for scanned data from a real human, ALMs were manually marked by three different radiology experts on the scanned image of a real human femur. ImageJ software was used to extract the HGFs digitally from the manually marked landmark points. The measurements obtained by the three experts were averaged to form the ground truths of the HGFs. The results of the proposed automatic HGFs model were then compared to the ground truths.
Performance analysis
We used a total of 13 ALMs to calculate the final error in ALM identification.
Our PLRF ensemble model was composed of eleven decision trees with four features randomly selected from the six available features. The data (400 images in total) were divided into 80% for training and 20% for testing to validate the segmentation results. The input features for the PLRF model were an HE image, an LE image, a BMDMP, an LHD, an LSTD_BMD, and a CI. The model’s performance was checked using fivefold cross-validation: in each fold, the reserved 20% of the data was exchanged with another 20% taken from the training data set. Table 1 shows the results of the fivefold cross-validation. Segmentation accuracy was computed as the number of correctly segmented images from the test data set over the five runs. The true positive rate (sensitivity) is the rate of correctly classified object pixels, whereas the true negative rate (specificity) is the rate of correctly classified background pixels. The false positive rate (FPR) is the rate of background pixels marked as object pixels, and the false negative rate (FNR) is the rate of object pixels marked as background. An individual image was considered accurately segmented if both its sensitivity and specificity were greater than 89%. Fig. 11 shows some of the segmentation results.
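The per-image rates and the 89% acceptance criterion can be sketched as below. This is a minimal illustration with hypothetical function names; it assumes binary pixel labels (1 for object, 0 for background) given as flat sequences, and that each image contains at least one pixel of each class.

```python
def segmentation_rates(predicted, truth):
    """Compute sensitivity, specificity, FPR, and FNR from flat 0/1 pixel labels."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # object pixels found
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)  # background pixels kept
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # background marked as object
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # object marked as background
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }

def is_accurately_segmented(rates, threshold=0.89):
    """An image counts as accurate only if both rates exceed the threshold."""
    return rates["sensitivity"] > threshold and rates["specificity"] > threshold
```

Requiring both rates to exceed the threshold prevents an image with a large background from being accepted on specificity alone.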

Pixel label random forest results. Boundaries using pixel label random forest (green) are compared with ground truths (red).
Fivefold cross-validation and accuracy of pixel label random forest
Hip anatomy localization and region-based RF implementation improved the performance of HGFs extraction. Hip anatomical structures were localized with a very high mean JI of 0.9862. The results of hip anatomy localization are shown in Table 2. The JI yields a value between 0 and 1: a value of 0 means no similarity, whereas 1 means 100% similarity between the predicted and ground truth results. The ground truths were generated manually by enclosing each anatomical structure of the hip in a rectangular box. Out of the 200 tested images, only 4 images were detected with bad contours. The JI value in those four images was not robust, so we used the option to correct the bad contours manually.
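For rectangular regions, the JI is the area of overlap divided by the area of the union. The sketch below illustrates this; the function name and the (x_min, y_min, x_max, y_max) box convention are assumptions made for the example.

```python
def jaccard_index(box_a, box_b):
    """Jaccard index of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0
```

Identical boxes give 1, disjoint boxes give 0, and a predicted box shifted by half its width against an equally sized ground truth box gives 1/3.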
Hip anatomy localization
A data set of 400 images was used to train and test the RF classifier. Of these images, 250 were used for training and 150 for independent testing. A total of 17 values were extracted for local and convolutional features. The local patch size for a Haar feature was 25×25. The long-range contextual features were computed in a local region of 25×25 with a size offset of 9×9 for R1 and R2. In the case of GRF, the regional window size was set to 40×30 around a pixel x, and the size of R1 and R2 was set to 16×16. Each model learned from 250 training images with 15 decision trees and 10 random subset features selected out of feature pool € (x). We conducted experiments to study the effect of each feature on the RRF model. The ALM detection results for the proposed system compared to the manual annotations are provided in Table 3. The table reports the mean difference in millimeters between the detected landmarks and ground truths. Fig. 12 illustrates a comparison of landmark detection using either GRF or RRF to the ground truth results. The overall landmark prediction error for the RRF and GRF systems was 2.5 ± 0.6 mm and 5.72 ± 2.99 mm, respectively.
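A landmark error of this kind can be computed as sketched below. This is an illustration under assumptions that the text does not confirm: the error is taken as the Euclidean distance between predicted and ground truth positions, coordinates are in display pixels, and the display scale of 33.6 pixels per centimeter is used for conversion. The function name is hypothetical.

```python
import math
from statistics import mean, stdev

PIXELS_PER_CM = 33.6  # display scale reported for the scanned images

def landmark_errors_mm(predicted, truth, px_per_cm=PIXELS_PER_CM):
    """Distance in millimeters between each predicted landmark and its ground truth."""
    errors = []
    for (px, py), (tx, ty) in zip(predicted, truth):
        dist_px = math.hypot(px - tx, py - ty)     # Euclidean distance in pixels
        errors.append(dist_px / px_per_cm * 10.0)  # pixels -> cm -> mm
    return errors

# Hypothetical coordinates, for illustration only.
errors = landmark_errors_mm([(100, 120), (200, 215)], [(103, 124), (200, 210)])
summary = (mean(errors), stdev(errors))  # reported as mean ± standard deviation
```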

Landmark validation. Landmarks with green color represent ground truths, whereas red items are predicted landmarks by regional random forest and blue items are landmarks predicted by global random forest.
Mean difference of landmark detection compared to ground truths. The difference between detected landmarks and ground truths is calculated in millimeters
The aluminum phantom was scanned with different orientations and different sizes of acrylic sheets over it to increase the data size. Data augmentation was used to further increase the size of the data, using transformations such as shifting, object resizing with linear regression, and rotation relative to the center of the image. For validation purposes, the phantom was physically measured by three different experts using a digital caliper. The mean of the three measurements was taken as the ground truth. Fig. 13 shows some of the phantom results. Table 4 provides a comparison between the RF models and ground truth results.
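When an image is shifted or rotated for augmentation, any annotated landmark coordinates have to be transformed in the same way. The sketch below shows the two coordinate transforms; it is an illustration only, with hypothetical function names, an assumed image center at (width/2, height/2), and an assumed sign convention for the rotation angle.

```python
import math

def shift_point(point, dx, dy):
    """Translate a landmark by (dx, dy) pixels."""
    x, y = point
    return (x + dx, y + dy)

def rotate_about_center(point, angle_deg, image_size=(420, 420)):
    """Rotate a landmark about the image center by angle_deg degrees."""
    width, height = image_size
    cx, cy = width / 2.0, height / 2.0  # assumed definition of the image center
    x, y = point
    angle = math.radians(angle_deg)
    dx, dy = x - cx, y - cy             # offset of the point from the center
    return (
        cx + dx * math.cos(angle) - dy * math.sin(angle),
        cy + dx * math.sin(angle) + dy * math.cos(angle),
    )
```

Because distances and angles between landmarks are preserved under shifts and rotations, the ground truth feature values remain valid for the augmented copies.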

Phantom validation. (a) Aluminum phantom of the femur; (b) scanned aluminum phantom and measured geometric features.
Femur phantom manual results compared to proposed system results. Reference measurements represent the mean of phantom results measured by three different experts. Length and width parameters and their errors are measured in centimeters, whereas angles are measured in degrees
ImageJ software was used to generate the ground truths for the real human data. Measurements made by three different experts were averaged, and the mean and standard deviation (STD) were used to compute the accuracy of the proposed model results. A detailed analysis of three selected cases is presented in Table 5 and Table 6 using RRF and GRF, respectively. The final accuracy of the RRF and GRF techniques is presented in Table 7 as a summary of the mean and STD statistical analysis for 150 real human test cases. The value in each row of the accuracy column of a table represents the individual accuracy of each geometric feature, whereas the value at the end of the accuracy column represents the final accuracy of the proposed model. In addition, a receiver operating characteristic (ROC) analysis was conducted. The performance of RRF and GRF on test data was evaluated using the ROC curve presented in Fig. 14. Some of the test case results from the proposed model are presented in Fig. 15 and Table 8.
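The per-feature error and accuracy can be sketched as follows. The error is taken as the absolute difference between the measured value and the ground truth, as described for the tables. The accuracy formula (one minus the relative error, as a percentage) and the averaging of per-feature accuracies into a final value are assumptions made for illustration, since the text does not state them; the function names are hypothetical.

```python
def error_and_accuracy(measured, ground_truth):
    """Return (absolute error, accuracy in percent) for one geometric feature.

    Assumed accuracy definition: 100 * (1 - error / ground_truth).
    """
    error = abs(measured - ground_truth)
    accuracy = (1.0 - error / ground_truth) * 100.0
    return error, accuracy

def overall_accuracy(pairs):
    """Mean of per-feature accuracies over (measured, ground_truth) pairs (assumed)."""
    accuracies = [error_and_accuracy(m, g)[1] for m, g in pairs]
    return sum(accuracies) / len(accuracies)
```

For example, under these assumptions a feature measured as 9.5 cm against a ground truth of 10.0 cm has an error of 0.5 cm and an accuracy of 95%.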

Receiver operating characteristic curve for regional vs. global random forest.

Hip geometric features.
Real human femur results of the proposed regional random forest system for three different cases compared to ground truths generated using ImageJ. The ground truths are used to calculate the error rate (E) and accuracy (Ac) against the predicted results. The results are presented in centimeters in cases of length and width and degrees in the case of an angle
Real human femur results using global random forest for three different cases compared to ground truths generated using ImageJ. The ground truths are used to calculate the error rate (E) and accuracy (Ac) against the predicted results. The results are presented in centimeters in cases of length and width and degrees in the case of an angle
Comparison of mean statistics for 150 real human images using regional or global random forest to ground truths generated using ImageJ. The error rate shows an absolute difference between the measured results and ground truths. The results are presented in centimeters in the cases of length and width and degrees in the case of an angle
Measurements for each geometric feature shown in Fig. 15
The study shows that the RF classifier was an effective method for locating ALMs in femur DXA images. The constraints of local processing drastically increased the performance of RF and landmark detection. To improve the quality of X-ray images, noise reduction could be an essential preprocessing step [60–65]. Different features and their combinations affect RF classification and voting for a landmark. Contour features play a crucial role in ROI selection for local processing and precise HGFs extraction. Contour processing can effectively be used to extract the structural details of a hip with a very high JI of 0.9962. Extraction of the hip contours using PLRF with an accuracy of 93% is the gateway for hip anatomy localization and automatic ROI selection. Landmark and HGFs detection was performed with the more relevant features captured in femur DXA images. The features in the current study were designed to account for the nature and behavior of DXA data.
A number of studies have supported automatic landmark detection to overcome the manual marking of landmarks in X-ray images [47, 55]. In 2016, Cheng et al. [58] used RF to localize landmarks in dual-energy dental radiographs, which are helpful in assessing the evidence of osteoporosis. Their proposed method achieved an average detection error of 2.9 mm. Our proposed model of ALM detection from femur DXA images using RRF with an effective combination of features achieved an average detection error of 2.46 mm. The most up-to-date studies have discussed the significance of the structural geometry of bones and the role of structural geometry in the risk of fracture [3, 59]. All recent studies have either extracted geometric features manually or extracted them using semiautomatic, hand-operated HSA software. The current model extracts most HGFs in a fully automatic way with a higher accuracy of 96.22%.
The automatic prediction of HGFs was extensively investigated in this study by considering different parameters and features extracted from femur DXA images. The performance of our approach increased substantially with optimal testing. We have shown that PLRF classification with appropriate features provides an effective way to segment and extract contours from DXA images. RF does not need any specialized hardware for training and prediction. It is a powerful classifier compared to other conventional classifiers such as support vector machines, neural networks, the k-nearest neighbors (KNN) algorithm, logistic regression, and naive Bayes [57]. Manual correction of contours was performed for only 4 images out of 200. In addition, the overall accuracy of 93% for hip segmentation and contour extraction makes RF one of the most suitable techniques for the segmentation of DXA images. An ensemble RF is less prone to overfitting than a single decision tree and other conventional classification techniques. The features for contour processing show very high performance concerning anatomy localization and automatic region selection in femur DXA images. The local region bounding search for a landmark point increased the robustness of our model. To the best of our knowledge, this is the first study conducted on fully automatic HGFs extraction from femur DXA images. The high accuracy of 95.87% and 96.22% for real and phantom data, respectively, makes this approach suitable for the automatic collection of HGFs.
Conclusion
Roentgen’s discovery of X-rays changed people’s view of the world in the health-care domain. Researchers dealing with human health continuously strive to obtain more useful information from radiological images [60–77]. In this paper, we proposed a new method for automatic HGFs extraction from femur DXA images. An RRF classifier was used to localize landmarks in femur DXA images and extract HGFs, with rich features used to capture the appearance of the landmarks. This study evaluated the RF model’s ability in landmark detection using global and local constraints. RRF performed well in terms of accuracy compared to GRF. The algorithm was fast and robust due to the local search constrained by the ALMs. The proposed model processed femur DXA images for HGFs extraction with a high accuracy of 95.87% and 96.22% for real and phantom data, respectively. One limitation of this approach is the requirement for the optimal supervised selection of features. Although the latest deep learning methods can learn features from original data, they require a significant amount of training data. We plan to adopt deep learning approaches for HGFs extraction from DXA images in the near future, since a larger data set is being built.
