Abstract
Assessing an individual's age from their bones is a technique that supports the determination of individual skills. In this work, the chronological age of individuals in varying age groups is assessed using left hand wrist radiographs. The experimental datasets are preprocessed, and regions of interest are extracted using an automated segmentation technique based on bit-plane-level data of the radiograph images. The proposed work comprises three stages. In stage 1, the radiographs are preprocessed. In stage 2, the preprocessed radiographs are classified into male and female samples using a convolution-kernel-based deep neural net. In stage 3, distance features are extracted from the origin of the carpal bones to the tips of the phalangeal regions in the stage 2 outputs using the imtool image analyzer, and these features are classified using Support Vector Machines with a Gaussian Kernel (SVM-GK) to label the radiographs with ages from 1 to 17. The experiments are performed on the Pediatric Bone Age Challenge datasets of the Radiological Society of North America (RSNA), comprising about 12,000 images of the 1–17 year age groups. The convergence between actual and clinically validated chronological age is also tested with a Gaussian process regression model (GPRM) alongside SVM. The deep neural network incurs a loss of only about 4.7% in gender classification. With SVM-GK, the classification accuracy is 76.8% for male and 88.1% for female instances, and GPRM yields an RMSE of 1.41 for male and 0.75 for female instances.
Introduction
Bone age indicates the level of development of the bones and the maturity of an individual, and it generally differs from the individual's chronological age. Bone age is one of the crucial parameters for assessing a person's height, determining the age of puberty and diagnosing various growth-related disorders [1]. Quantifying a person's age from bone growth also helps predict their suitability for specific fields such as sports and the army [2].
Generally, bone age is assessed from different combinations of bones in the human body using various scientific methods [3]. The bones commonly used include the long bones of the lower limbs, namely the tibia, fibula, femur, metatarsals and phalanges. Bones of the upper limbs, comprising the humerus, radius, ulna, metacarpals and phalanges, are also used for bone age prediction. Automated systems assess bone age from left hand wrist radiographs by studying the geometric and spatial attributes of the different bones [4]. Figure 1 shows left hand wrist radiographs ranging from an infant to an adolescent.

Instances of left hand wrist radiographs.
The spatial distances between bones are larger in infants than in adolescents, as is clearly apparent from the images. Age assessment using bones involves interpreting the ossification stages, their growth, and the fusion times of the primary and secondary ossification centers [5, 6].
Age assessment based on wrist radiographs is the preferred technique because it involves minimal X-ray exposure and harm. Several factors support the use of radiograph-based methods: multiple ossification centers can be used to evaluate bone maturity; the radiographs help diagnose hereditary diseases, growth irregularities, endocrinological problems and growth disorders; they indicate growth acceleration status for predicting hormonal growth and the final adult height; they help judge therapeutic effect and confirm diagnoses of short stature, growth delay and endocrine disorders; and they support the evaluation of metabolic abnormalities, the interpretation of hormone blood levels in children of pubertal age, and the correction of angular deformities [7].
Some of the significant contributions are as follows. Chen et al. [8] proposed a conventional classification scheme in 2019, modeled as an ST-ResNet network. The performance of SVM and ResNet is evaluated using features of the phalanx and carpal regions, namely their Local Binary Patterns (LBP) and gray level co-occurrence matrix. Guo et al. [9] devised a regression model, BoNet+, in 2020 to handle poor-quality X-ray images. Buia et al. [10] in 2019 integrated the Tanner Whitehouse (TW3) method with deep convolution networks, using Faster-RCNN to extract regions of interest and Inception-v4 networks for the subsequent classification.
Liu et al. [11] in 2019 extracted five ROI features using particle swarm optimization and employed feed-forward and back-propagation neural networks for classification. Stern et al. [12] employed a random forest and a deep convolution neural network for bone feature extraction and encoding. In another attempt, Stern et al. [13] in 2016 adapted the idea of TW2 to magnetic resonance imaging, using the ossification stages of 13 individual bones; the datasets are subjected to feature extraction and classification using convolution networks.
Meicheng et al. [14] proposed a two-stage deep neural network analysis for automated bone age measurement using a deep convolution network. The method comprises a mask-creation network and an age-assessment network. A U-Net with a pre-trained VGG16 encoder is used to extract the bone mask. The RSNA 2017 pediatric bone age dataset is used for experimentation, and a mean absolute error of about 5.98 months is observed. Guo et al. [15] in 2019 introduced support vector regression in conjunction with transfer learning for automatic bone age assessment. An adaptive segmentation threshold approach is used to segment the bones in raw X-ray images, and a transfer learning strategy is then used to extract high-level features from the hand bone images. The results demonstrate that support vector regression is more accurate than existing methods, with a lower root mean square error (RMSE).
Lui et al. [16] in 2018 explored how uneven ageing of the growth plate cartilage underlies variations in bone length and thus helps determine skeletal proportions. They note that bones vary significantly in size across anatomical positions; human femurs, for example, are twenty times longer than the finger and toe phalanges. This indicates that physical, structural and molecular senescent changes occur earlier in the growth plates of smaller bones (metacarpals, phalanges) than in those of larger bones (femurs, tibias). Pengyi et al. [17] employed a residual attention network for hand bone age assessment that predicts from whole X-ray images containing disturbing artifacts. The network learns hierarchical attention through a residual attention subnetwork, analogous to the clinicians' assessment protocol, focusing in particular on the key regions of the X-rays.
Mughal et al. [18] in 2014 reviewed dental age as an alternative to bone age assessment, which also gives an approximate measure of skeletal maturity. CT visualization of the clavicle, iliac bone and femoral head has been studied, but no systematic methods have yet been developed. Existing bone age modalities give varying results and differ in their applicability to different ethnic groups. A pilot study is suggested to compare the approaches and choose the most appropriate one for age assessment.
Harmsen et al. [19] in 2013 proposed a stage-wise technique for bone age assessment using 14 epiphyseal regions of the X-ray. An SVM is combined with cross-correlation computed for every image sample, and classification is performed. A comprehensive evaluation on 1097 hand X-rays from 30 diagnostic groups (0–19 years) compares nominal and real-valued SVMs with k-nearest neighbor classification. The average age estimation error is 1.0 years for 5-NN and 0.83 years for SVM. The recognition precision of the nominal and real-valued SVMs is 91.57% and 96.16%, respectively.
Liu et al. [20] proposed a deep learning model for bone age assessment, drawing on computer vision techniques previously based on hand-crafted features. Bone age is assessed in two stages, first by segmenting the hand wrist region from the X-ray images. Convolutional Neural Networks (CNN), VGG-U-Net and a GAN network are then used for classification. The ranking learning methods improve overall prediction performance compared with other conventional classifiers. The method is evaluated on pediatric bone age datasets and shows a mean absolute error of 6.05 months.
Hao et al. [21] proposed a deep learning technique to compute bone age from left hand wrist radiographs. Carpal bone features are taken as the region of interest for bone age assessment, and CNN models are adapted for segmentation and classification. The performance of the proposed work is compared with VGG16 for different input sizes, and models M1 and M2 are assessed. The work could not achieve high accuracy because of the imbalanced distribution of age groups in the datasets and the limited size of the datasets.
Tong et al. [22] combined support vector regression and convolutional neural networks with a multiple kernel learning model for automated bone age assessment. Multiple heterogeneous features, such as gender, race and X-ray images of the left hand wrist, are combined with automatically extracted features. A series of experiments on two different datasets showed that the CNN provides better accuracy than state-of-the-art methods.
Chai et al. [23] proposed a computer-based model to segment hand bone radiographs using a heuristic approach. A horizontal and vertical band representation of the images is used prior to K-means segmentation to cluster each image into bone, soft tissue and background regions. GLCM texture features are extracted from the clusters with K values of two and three. Finally, one of these two images is used in a reconstruction phase, in which images are reconstructed for various ranges of horizontal and vertical bands.
Mualla et al. [24] devised a method to automate bone age assessment using transfer learning. Pre-trained AlexNet and ResNet models are transferred to automatically extract discriminant features from images. Features extracted by AlexNet and ResNet are compared across different classifiers using performance metrics, and the decision tree classifier performed best in assessing the bone age of eight age groups.
Li et al. [25] proposed human age estimation using weighted and OHRanked sparse representation-based classification. An ordinal hyperplanes ranker and weights based on the number of samples in each class are used to estimate age, and the method is evaluated on the FG-NET database.
It is evident from the literature that none of the works achieves an accuracy above 90%, and that the models used suffer from many design issues. Deep learning networks dominate the various works because these models automate many manual tasks. Despite these efforts, some works exhibit poor learning owing to a lack of data. Since radiograph features are highly sensitive to changes from one age group to another, a semi-automated model for bone age assessment is therefore required.
Analyses using carpal bones raise several observations. First, a number of carpal bones in very young children are not visible on the radiographs, which can mislead recognition. Second, feature engineering must account for how carpal bone structures become visible as a child grows. Third, in adulthood some bones overlap, which complicates the development of computer vision algorithms. Hence, in the proposed work, chronological age for various age groups is assessed using a combined sequence of fully automated and semi-automated approaches.
The protocol for the Automated Bone Age Assessment System (ABAAS) in this work is developed in multiple phases. Age assessment is performed on radiograph images of age groups ranging from 1 to 17 years. The overall radiograph processing protocol comprises radiograph enhancement, segmentation to separate the hand region from the radiograph background, feature extraction, and prediction of skeletal age.
Pre-processing
The capture of wrist radiographs is often affected by noise caused by under- or over-exposure to radiation, which gives the images a grainy appearance [26]. Because wrist radiographs are generated in a constrained environment, enhancement is necessary in all but a few rare cases. The challenges in radiograph enhancement include positioning of the wrist that deviates from the medically accepted standard position, capture of the wrist at varied orientations, faint bone edges, and non-uniform lighting caused by improper closing of the cassette [27, 28]. All of these challenges hinder the extraction of the region of interest from radiographs, so the algorithms developed must handle radiographs of varied lighting and quality.
Enhancement of a radiograph image I is performed using histogram processing [29], resulting in the enhanced image $I_e$ given by (1).
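Equation (1) is not reproduced in this text. Since Fig. 4 labels the enhanced image an equalized image, the following minimal Python sketch assumes that the histogram processing is textbook histogram equalization of an 8-bit image, mapping each gray level r_k to s_k = round((L − 1) · CDF(r_k)). The function name `equalize_histogram` and the nested-list image representation are illustrative assumptions, not the authors' implementation.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints.

    Assumed textbook mapping: s_k = round((L - 1) * CDF(r_k)), where CDF is
    the cumulative normalized histogram of the input gray levels.
    """
    pixels = [p for row in image for p in row]
    total = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of the gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running / total)
    # Map every input level r_k to its output level s_k.
    mapping = [round((levels - 1) * c) for c in cdf]
    return [[mapping[p] for p in row] for row in image]
```

On a hypothetical low-contrast patch `[[50, 50, 52, 52], [54, 54, 56, 56]]`, the four gray levels are spread to 64, 128, 191 and 255, stretching a 6-level range across the full 8-bit scale.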
The enhanced image $I_e$ then undergoes Gaussian filtering [30] with a 3 × 3 neighborhood mask, which eliminates noise and yields the filtered image $I_s$. The effect of Gaussian filtering at an arbitrary pixel (r, c) of $I_s$ is given by (2).
In (2), r and c denote the distances from the origin along the horizontal and vertical axes, σ is the standard deviation, and $I_s(r, c)$ is the filtered image, with $r = 1, 2, 3, \ldots, m$ and $c = 1, 2, 3, \ldots, n$ for an image I of dimension m × n. In the next stage, the filtered and enhanced image $I_s$ is used for segmentation.
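Equation (2) is likewise not reproduced. Assuming it follows the standard two-dimensional Gaussian G(r, c) = exp(−(r² + c²)/(2σ²)) / (2πσ²), the sketch below builds the 3 × 3 mask described above and applies it to a grayscale image. The value σ = 1, the normalization of the mask to unit sum (which makes the 1/(2πσ²) factor redundant) and the edge-replication border rule are assumptions for illustration.

```python
import math


def gaussian_kernel_3x3(sigma=1.0):
    """Normalized 3 x 3 Gaussian mask with offsets r, c in {-1, 0, 1}."""
    weights = [[math.exp(-(r * r + c * c) / (2 * sigma * sigma))
                for c in (-1, 0, 1)] for r in (-1, 0, 1)]
    total = sum(sum(row) for row in weights)
    # Dividing by the sum replaces the 1 / (2 * pi * sigma^2) constant.
    return [[w / total for w in row] for row in weights]


def gaussian_filter(image, sigma=1.0):
    """Smooth a grayscale image (list of rows) with the 3 x 3 Gaussian mask.

    Border pixels replicate the nearest edge pixel (an assumed border rule).
    """
    kernel = gaussian_kernel_3x3(sigma)
    m, n = len(image), len(image[0])
    out = []
    for r in range(m):
        row = []
        for c in range(n):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), m - 1)  # clamp row index
                    cc = min(max(c + dc, 0), n - 1)  # clamp column index
                    acc += kernel[dr + 1][dc + 1] * image[rr][cc]
            row.append(acc)
        out.append(row)
    return out
```

Because the mask sums to one, a uniform region keeps its intensity while isolated noisy pixels are averaged toward their neighbors.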
In the proposed work, segmentation is carried out on the pre-processed radiograph images using specific bit-level information from the k bits of each pixel. For example, in a 4-bit image, the gray level of a pixel is represented by its four bits, which are processed as bit-level details, as shown in Fig. 2.

Bit level representation of gray levels.
The bits shown in red in Fig. 2 are the lower-order bits and the bits in black are the higher-order bits of a specific gray level. To extract the wrist region from the noisy background of the radiograph images, the proposed method combines lower- and higher-order bit planes to create masks. In our work, each image is sliced into eight bit planes, and lower-order bit plane 3 is combined with higher-order bit planes 6 and 7 to extract the region of interest. Figure 3 shows bit planes 0 to 7 for a sample radiograph image.

Bit planes 0 to 7 of an image sample.
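Bit-plane slicing itself is simple to express: bit plane k of an 8-bit pixel p is (p >> k) & 1, so, as in the 4-bit example of Fig. 2, gray level 13 (binary 1101) has bits 1, 0, 1, 1 in planes 0 to 3. The text names planes 3, 6 and 7 but does not state how they are combined into a mask, so the sketch below assumes a logical OR; `bit_plane` and `roi_mask` are hypothetical helper names.

```python
def bit_plane(image, k):
    """Return bit plane k (0 = least significant) of an 8-bit image as 0/1 values."""
    return [[(p >> k) & 1 for p in row] for row in image]


def roi_mask(image, planes=(3, 6, 7)):
    """Hypothetical mask: keep a pixel if it is set in any chosen bit plane.

    The combination operator (logical OR) is an assumption; the paper only
    states that lower-order plane 3 is combined with planes 6 and 7.
    """
    slices = [bit_plane(image, k) for k in planes]
    m, n = len(image), len(image[0])
    return [[int(any(s[r][c] for s in slices)) for c in range(n)]
            for r in range(m)]
```

Swapping the OR for an AND, or weighting the planes, would give a different mask; the choice would need to be checked against the authors' segmentation results in Fig. 4.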
The regions of interest extracted from the segmented radiographs are passed to the feature extraction and classification stage.
Because the radiograph samples include both male and female instances, a deep neural network with a convolution-kernel-based architecture is first used to classify the samples as male or female. The classified samples are then passed on for classification into age groups of 1 to 17 years using distance features computed from the carpal bones to the phalangeal tip regions.
Architecture of deep convolution network-radiograph image classification into male and female types
In this work, a deep neural net architecture based on convolution kernels is proposed to classify the radiograph samples as male or female.
Regions of interest are extracted from 12,559 samples, of which 6833 are male and 5726 are female. Figure 4 shows sample regions of interest extracted using the proposed segmentation technique, which are used as input to the convolution network for male/female classification.

Region of interest of radiograph images-input to convolution network. (1) Equalized image (2) Otsu method (3) Proposed segmentation (4) Region of interest extracted.
The convolution-kernel-based deep neural net mainly comprises an input layer, two hidden layers, a fully connected layer and an output layer. Each hidden layer in turn comprises two convolution layers and one max pooling layer followed by a RELU layer.
In convolution layer 1 (CONV 1) and convolution layer 2 (CONV 2), an input image fed through the input layer undergoes feature extraction with around 64 filters, each of size 11 × 11. The extracted features are then abstracted to a higher level by the max pooling layer (MAXPOOL1) and forwarded to the RELU layer for rectified linear activation.
The proposed convolution-kernel-based deep neural net architecture is shown in Fig. 5.

Proposed convolution kernels based deep neural net architecture.
Once both hidden layers have completed feature extraction, classification is performed by passing the features through the fully connected layer and the output layer. In the output layer, the male (+1) and female (−1) classes are activated using softmax() on the activations received from the fully connected layer.
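The output stage can be sketched as a softmax over the fully connected activations followed by picking the most probable class. The sketch assumes a two-unit output whose first unit corresponds to male (+1) and second to female (−1); the activation values and helper names are hypothetical.

```python
import math

# Assumed output ordering: index 0 -> male (+1), index 1 -> female (-1).
LABELS = (+1, -1)


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # shift avoids overflow
    total = sum(exps)
    return [e / total for e in exps]


def predict_sex(fc_activations):
    """Return the predicted label (+1 male, -1 female) and class probabilities."""
    probs = softmax(fc_activations)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs
```

Subtracting the maximum score before exponentiating leaves the probabilities unchanged but prevents overflow for large activations.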
In the proposed architecture, the stride used for convolution throughout hidden layers 1 and 2 is 2, so successive filter positions overlap.
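Since the input resolution and padding are not stated, the following sketch traces feature-map sizes through the two hidden layers under hypothetical settings: a 512 × 512 input, no padding, 11 × 11 filters with stride 2 in all four convolution layers (the text gives the filter size only for CONV 1 and CONV 2), 64 filters and 2 × 2 max pooling. RELU does not change the spatial size and is omitted from the trace.

```python
def conv_out(size, kernel=11, stride=2, padding=0):
    """Spatial size after a convolution: floor((size + 2p - k) / s) + 1."""
    out = (size + 2 * padding - kernel) // stride + 1
    if out < 1:
        raise ValueError(f"input of size {size} is too small for kernel {kernel}")
    return out


def pool_out(size, window=2):
    """Spatial size after non-overlapping max pooling (halves the size)."""
    return size // window


def trace_shapes(input_size=512, filters=64):
    """Trace feature-map sizes through both hidden layers (assumed settings)."""
    size = input_size
    trace = []
    for layer in (1, 2):
        for conv in (1, 2):
            size = conv_out(size)
            trace.append((f"hidden{layer}-conv{conv}", size))
        size = pool_out(size)
        trace.append((f"hidden{layer}-maxpool", size))
    # Flattened length handed to the fully connected layer.
    trace.append(("fully-connected input", size * size * filters))
    return trace
```

Under these assumptions, the maps shrink 512 → 251 → 121 → 60 → 25 → 8 → 4, giving 4 × 4 × 64 = 1024 inputs to the fully connected layer. A 256 × 256 input would be too small, because the last convolution would receive a 9 × 9 map, smaller than the 11 × 11 filter.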
A significant aspect of the proposed deep neural net architecture is its use of RELU and max pooling. RELU acts as the identity for all positive values and outputs zero for all negative values, as shown in Fig. 6.

RELU activation function.
Max pooling plays a key role in abstracting image features by downsampling an image of dimension P × Q to half its dimensions, as depicted in Fig. 7.

Down sampling -MAXPOOL.
Data passed from one hidden layer to the next is thereby abstracted: for example, the 16 pixels of a 4 × 4 feature map in hidden layer 1 are reduced to a 2 × 2 representation in hidden layer 2 by max pooling each 2 × 2 block to a single pixel.
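The 4 × 4 to 2 × 2 reduction above can be reproduced with a short sketch of non-overlapping 2 × 2 max pooling followed by RELU, in the order stated for each hidden layer. The feature map values are hypothetical.

```python
def relu(feature_map):
    """Identity for positive values, zero for negative values."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Replace each non-overlapping 2 x 2 block with its maximum (P x Q -> P/2 x Q/2)."""
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, len(feature_map[0]) - 1, 2)]
            for r in range(0, len(feature_map) - 1, 2)]


# Hypothetical 4 x 4 feature map from hidden layer 1.
fm = [[1, -3, -2, -5],
      [4, 5, -1, -6],
      [-2, 0, 3, 7],
      [8, -4, 1, 2]]
pooled = max_pool_2x2(fm)   # 16 values reduced to 4
activated = relu(pooled)    # negative block maximum clipped to 0
```

Here `pooled` is `[[5, -1], [8, 7]]` and `activated` is `[[5, 0], [8, 7]]`: each 2 × 2 block keeps only its strongest response, and the one block with no positive response is zeroed by RELU.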
In this technique, bone age is assessed from distance features extracted using the imtool application in MATLAB [24]. Because the distances from the carpal bones to the phalangeal tips are highly discriminative, they are computed separately for the male and female categories of images and analyzed using various classifiers. The regions for which distances are computed in the wrist radiographs are indicated by blue lines in Fig. 8(a) and 8(b).

Distance features computed using imtool application.
The distance features are computed from carpal regions to thumb, index, middle, ring and baby finger designated as
If the training set contains M instances and N features are computed for each instance, the overall feature matrix has dimension M × N.
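In the paper, the distances are measured interactively with imtool. As a hedged illustration of the resulting M × N feature matrix, the sketch below assumes that each measurement is the Euclidean pixel distance between a carpal reference point and the tip of each finger, with N = 5 features per radiograph. The landmark coordinates and helper names are hypothetical.

```python
import math

# Fixed feature order: N = 5 distances per radiograph.
FINGERS = ("thumb", "index", "middle", "ring", "baby")


def distance_features(carpal_origin, fingertips):
    """Euclidean pixel distances from a carpal origin point to each fingertip.

    carpal_origin: (x, y) pixel coordinates of the assumed carpal reference point.
    fingertips: dict mapping finger name -> (x, y) of its phalangeal tip.
    """
    ox, oy = carpal_origin
    return [math.hypot(fingertips[f][0] - ox, fingertips[f][1] - oy)
            for f in FINGERS]


def feature_matrix(samples):
    """Stack per-radiograph feature vectors into an M x N matrix (list of rows)."""
    return [distance_features(origin, tips) for origin, tips in samples]
```

Keeping a fixed finger order matters because each column of the matrix must correspond to the same anatomical distance for every radiograph.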
The distance features are assessed with two classifiers, the Support Vector Machine with Gaussian Kernel (SVM-GK) and the Gaussian Process Regression Model (GPRM), using five attributes to predict skeletal age. Skeletal age is assessed using different combinations of features with these classifiers. The combinations of features include distances from the carpal bone to the phalangeal tip comprising
Distance features tested
In this work, SVM-GK and GPRM classifiers are adapted to classify the distance features and predict skeletal age for the male and female categories. SVM-GK is a non-parametric classifier whose Gaussian kernel is an exponentially decaying function; the advantage of the Gaussian kernel is that it transforms the original data points into a high-dimensional feature space in which the data points are dispersed uniformly in all directions, giving spherical contours. SVM-GK forms a weighted linear combination of kernel functions evaluated between the input data points and the support vectors. For k data points in the input feature space, the Gaussian kernel is k-dimensional: the kernel coordinates of an arbitrary data point are Gaussian functions of its distances to each of the other data points. Let the vector $X = \{x_1, x_2, x_3, \ldots, x_k\}$ represent the data points in the input feature space and $H = \{h_1, h_2, h_3, \ldots, h_k\}$ the coordinates on the hyperplane. In SVM-GK, the dot product of X and H is computed only for a chosen set of data points, namely those near the boundary between one group of data and another. The kernel value K(x, y) for a point on the decision boundary is computed as an exponential function, as given in (4).
At any instance, the chosen points on the decision boundary in the high-dimensional space, $h_1, h_2, h_3, \ldots, h_k$, satisfy $h_i = 1$ or $h_i = -1$ for $i = 1, 2, 3, \ldots, k$. If $h_i = 1$, then the data point $x_i \approx 1$; otherwise $x_i \ll 1$. In the sum of dot products of X and H, the points where $X \cdot H$ evaluates to 0 mark the separation of one class of data points from the other.
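Equation (4) is not reproduced in this text. The sketch below assumes the standard Gaussian kernel K(x, y) = exp(−‖x − y‖²/(2σ²)) and the usual SVM decision function f(x) = Σ α_i y_i K(x_i, x) + b, where y_i ∈ {+1, −1} are the labels of the support vectors. The support vectors, α values, bias and σ are hypothetical; in practice they come from solving the SVM training problem, and the 17 age classes would need several binary SVMs combined (for example one-vs-one), which is not shown.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)): 1 when x == y, decaying with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def svm_decision(x, support_vectors, labels, alphas, bias, sigma=1.0):
    """Weighted sum of kernel values between x and the support vectors, plus bias.

    The sign gives the predicted class (+1 or -1); a value of 0 means x lies
    on the decision boundary.
    """
    return sum(a * y * gaussian_kernel(sv, x, sigma)
               for sv, y, a in zip(support_vectors, labels, alphas)) + bias


# Hypothetical two-support-vector model.
svs = [(0.0, 0.0), (2.0, 2.0)]
ys = [+1, -1]
alphas = [1.0, 1.0]
```

With this toy model, `svm_decision((0.0, 0.0), svs, ys, alphas, 0.0)` is positive, `(2.0, 2.0)` gives a negative value, and the midpoint `(1.0, 1.0)` lies exactly on the boundary, where both kernel terms equal exp(−1) and cancel.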
GPRM maps the input data points into an infinite-dimensional space and is mainly used for multivariate distribution problems. Assuming a Gaussian distribution of the multivariate data points $X = \{x_1, x_2, x_3, \ldots, x_n\}$, the covariance between a data point $x_i$ and another data point $x_j$ is given by the squared exponential function (5).
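Equation (5) and its parameter definitions are not reproduced. Assuming the standard squared exponential covariance k(x_i, x_j) = s² exp(−‖x_i − x_j‖²/(2l²)) and a zero-mean prior, the GPRM prediction is the posterior mean k_*ᵀ(K + ε I)⁻¹ y. The following dependency-free sketch implements that formula; the length scale l, signal variance s², noise variance ε and the toy data are hypothetical, and the paper reports results with an exponential kernel, which would replace `squared_exponential`.

```python
import math


def squared_exponential(xi, xj, length_scale=1.0, signal_var=1.0):
    """k(x_i, x_j) = s^2 * exp(-||x_i - x_j||^2 / (2 l^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return signal_var * math.exp(-sq / (2 * length_scale ** 2))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gp_predict(train_x, train_y, test_x, noise_var=1e-6, **kernel_args):
    """Posterior mean of a zero-mean GP: k_*^T (K + noise * I)^{-1} y."""
    n = len(train_x)
    k = [[squared_exponential(train_x[i], train_x[j], **kernel_args)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    weights = solve(k, train_y)
    return [sum(w * squared_exponential(x, xt, **kernel_args)
                for w, xt in zip(weights, train_x))
            for x in test_x]
```

Near the training inputs, the posterior mean reproduces the training targets almost exactly. Far from the training inputs, every covariance term vanishes and the prediction falls back to the zero prior mean, which is why targets such as ages are often centered before fitting.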
In this work, performance is analyzed separately for the deep convolution neural net used for gender classification and for the distance features with SVM-GK and GPRM used for age-group classification into 1 to 17 years. The datasets used for experimentation are from the Pediatric Bone Age Challenge 2018, the machine learning challenge of the Radiological Society of North America (RSNA) for pediatric bone age prediction. Bone age is given in months in the datasets, for age groups from 1 to 17 years and for both male and female subjects.
The performance of the deep convolution architecture for gender classification, in terms of training and validation loss and accuracy, is presented in Figs. 9a and 9b.

(a) Training versus validation loss over number of epochs. (b) Training versus validation accuracy over number of epochs.
As Figs. 9a and 9b show, the gender classification achieves an accuracy of 95.3%.
The performance of the semi-automated technique using distance features for assessment of chronological age is analyzed using the SVM-GK and GPRM classifiers. The distance features from the carpal bone region to the respective phalanges of the thumb, index, middle, ring and baby fingers are considered for classification [29]. Figure 10 depicts the distances from the carpal bone regions to each of the phalanges for 180 samples (90 male and 90 female).

Distance features from carpal bones to phalangeal region.
Figure 10 shows distance features indicating bone growth with respect to chronological age in months. Panels (a) to (e) show the distance features for males and panels (f) to (j) those for females. Because chronological age in Fig. 10 is expressed in months, many radiograph instances overlap, since bone growth is clearly discernible over spans of years rather than months. Figure 11 therefore presents the features on the yearly scale at which bone growth in males and females becomes discernible.

Distance features versus chronological age in years-Middle finger.
Organizing and interpreting the distance features by year, as in Fig. 11, clearly discriminates individuals of one age from those of another.
These features are then classified using multi-class SVM-GK and GPRM classifiers, whose performance is shown in Figs. 13 and 14 separately for the female and male categories of distance features. Age-wise classification is performed over 17 classes, from age 1 to age 17, for the overall data shown in Fig. 12a. The experiments use a 60:40 split into training (male 4045, female 2858) and testing (male 2697, female 1905) samples, shown by age in years in Figs. 12b and 12c.

(a) Overall dataset statistics. (b) Training samples statistics. (c) Test samples statistics.
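The 60:40 split can be sketched as a seeded random partition applied separately to the male and female samples. The helper name and seed are hypothetical, and this sketch does not stratify by age class. As a consistency check, the counts above give 4045 + 2697 = 6742 male and 2858 + 1905 = 4763 female samples, and a 60% cut of those totals yields exactly the stated 4045 and 2858 training samples.

```python
import random


def train_test_split(samples, train_fraction=0.6, seed=0):
    """Shuffle samples reproducibly and split them into training and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```

Stratifying the split by age class would keep each of the 17 classes in the same 60:40 proportion, which matters when some ages are much rarer than others.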

Confusion matrix- Performance using SVM-GK.

Performance metrics of GPRM.
In Fig. 13, the cells highlighted in green indicate the positive predictive rate and the cells in shades of red indicate the false discovery rate of the classifier. The overall accuracy of the classifier is 76.8% for male and 88.1% for female test instances, for datasets of 6833 male and 4778 female instances split 60:40 into training and testing. Table 3 lists the performance metrics of the SVM-GK classifier for prediction of male and female instances separately.
Performance metrics of SVM-GK for instances of male with 60 : 40 train to test data
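The overall accuracy and the per-class rates highlighted in the confusion matrix can be computed as sketched below. This is a generic illustration: the layout (rows are true classes, columns are predicted classes) and the toy counts are assumptions, not the paper's data.

```python
def confusion_metrics(matrix):
    """matrix[i][j] = count of true class i predicted as class j.

    Returns the overall accuracy and, for each predicted class, the positive
    predictive value (PPV) and false discovery rate (FDR = 1 - PPV).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))  # diagonal = correct predictions
    per_class = []
    for j in range(n):
        predicted = sum(matrix[i][j] for i in range(n))  # column total
        ppv = matrix[j][j] / predicted if predicted else 0.0
        per_class.append({"ppv": ppv, "fdr": 1.0 - ppv if predicted else 0.0})
    return correct / total, per_class
```

For the toy matrix `[[8, 2, 0], [1, 6, 3], [0, 0, 10]]`, the accuracy is 24/30 = 0.8, and the second predicted class has a PPV of 6/8 = 0.75 and therefore an FDR of 0.25.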
The experiments show that the distance from the carpal bone to the middle-finger phalangeal region is more discriminative than the distances to the other phalanges, which accounts for the accuracy reported above. Reducing the number of classes is expected to further increase the reliability of the classifier.
The performance of GPRM for prediction of chronological age is depicted in Figs. 14a and 14b for female (1711) and male (2733) instances, on the same test datasets used with SVM-GK. According to Figs. 14a and 14b, the prediction accuracy for female samples is better than for male samples with respect to the distance features. With an exponential kernel function, the root mean square error of GPRM is 0.7516 for female instances and about 1.4168 for male instances. The predicted classes also converge much more closely to the true classes for female instances than for male instances.
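The RMSE values above follow the standard definition RMSE = sqrt((1/n) Σ (y_i − ŷ_i)²), expressed in the same units as the age labels. A minimal sketch with hypothetical inputs:

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))
```

Because errors are squared before averaging, a few large misclassifications (for example, an age off by several years) raise the RMSE much more than many small ones.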
In summary, chronological age is assessed using the wrist region of left hand radiographs as the region of interest, segmented by bit-plane-level processing. A deep convolutional architecture based on convolution kernels, with two hidden layers, is proposed to classify the radiographs into male and female at level 1. A semi-automated technique then extracts distance features from the carpal bone regions to the phalangeal regions of the segmented region of interest, and these features are classified into multiple age groups using the SVM-GK and GPRM classifiers to predict chronological age. The overall accuracy of the proposed deep learning model is satisfactory. In future work, an automated method for extracting only the carpal bone regions is to be developed as an enhancement, which is expected to further improve the accuracy of age-wise assessment.
