Abstract
The skull has long been recognized and used in forensic investigations, and its analysis has evolved from basic observation to complex examination with modern technologies. Advances in radiology and imaging technology have improved the ability to assess key biological identifiers (sex, stature, and age at death) from the skull. Computed tomography imaging helps practitioners improve the accuracy and reliability of forensic analyses. Artificial intelligence has recently been applied more and more in digital forensic investigations to estimate sex, stature, and age from computed tomography images. The integration of artificial intelligence marks a significant shift toward multidisciplinary collaboration. It offers the potential for more accurate and reliable identification as well as academic advances. However, it is not yet ready for routine forensic work, because it remains largely in the research and development phase. The limitations of artificial intelligence systems must also be carefully considered, including the lack of algorithmic transparency, unclear accountability for errors, and the potential for discrimination. Based on scientific publications from the past decade, this article provides an overview of the application of computed tomography imaging in estimating sex, stature, and age from the skull and discusses future directions for further improvement.
Introduction
Forensic investigations primarily focus on establishing the identities of deceased individuals. A positive identification of human remains recovered in a medicolegal context is a critical goal for legal investigations.1,2 To narrow down the list of potential identities, it is essential to construct a biological profile by estimating sex, stature, age at death, and ancestry, commonly referred to as the “big four” pillars of forensic anthropology. 3 Among the various skeletal elements, the skull plays a particularly significant role in the identification process because of its complex morphology and the wealth of biological information it contains. 4 As one of the strongest and most durable components of the human skeleton, the skull frequently remains well preserved even when other bones are fragmented or missing.5,6
According to a recent comprehensive review, 7 the skull has long been recognized and used in forensic science, and its analysis has evolved from basic to complex. In forensic analysis, researchers continue to study sex determination and stature and/or age estimation from the skull in terms of methodology, techniques, and population-specific databases. Traditionally, skulls are analyzed through direct observation or osteometric analysis. In the morphological approach, assessments are typically based on external sexually dimorphic features, the degree of cranial suture obliteration, and degenerative age-related changes. The morphometric approach, on the other hand, quantifies the dimensions of biological structures, such as width, length, height, angle, area, and ratios. It then combines these anatomical measurements with statistical analysis to derive equations for sex or stature estimation. Although these methods have proven effective in many cases, they have limitations. External morphological features (e.g. orbital margin, zygomatic process, nasal bone, or palate) may be unavailable or damaged.
The advent of advanced imaging technologies has significantly transformed forensic disciplines. Among these modalities, computed tomography (CT) has revolutionized the analysis of cranial features by providing high-resolution, three-dimensional (3-D) imaging and detailed visualization of both external and internal morphology. 8 Compared with traditional techniques, the 3-D reconstructions and detailed cross-sectional images obtained from CT substantially improve the ability to observe and analyze internal structures, such as cranial vault thickness, the paranasal sinuses, and endocranial sutures. CT has therefore opened new possibilities for skull analysis, improving the reliability and accuracy of biological identification.
In recent decades, the integration of CT imaging with artificial intelligence (AI) has become widespread in academia. Machine learning (ML) and deep learning (DL), both subsets of AI, have been developed more and more in recent years. Studies based on AI algorithms suggest that this trend reduces human error and improves the efficiency of forensic analysis.9,10 Integrating advanced ML and DL techniques with traditional forensic methods has also been reported to improve the accuracy of sex, stature, and age estimation. These trends highlight ongoing advancements in forensic science driven by technological innovation and interdisciplinary research. The literature also shows that studies have used different protocols for capturing cranial images (e.g. CT scanner types, scanning parameters, and landmark identification methods). These variations have led to discrepancies in cranial measurements, making it difficult to compare results across studies. Inconsistent protocols also amplify the limitations of current imaging technologies, such as voxel size, intensity range, and software capabilities, which complicate the interpretation of 3-D models. Kragskov et al. 11 addressed this issue, highlighting the challenges of localizing cranial landmarks using multi-planar reconstruction of CT scans. To overcome these challenges, Lottering et al. 12 proposed a set of standardized protocols that could be universally adopted. These protocols would establish consistent guidelines for CT imaging settings, anatomical landmark identification, and measurement techniques. Standardization would provide a framework for reproducible measurements, thereby improving the reliability of research findings and facilitating cross-study comparisons.
This review presents the current status of CT-based methods for sex, stature, and age estimation from the skull. It also highlights key advancements in the field and explores future directions and potential improvements.
What do skulls tell us? The importance of anatomical knowledge
Understanding the anatomy of the skull allows anatomists, anthropologists, and medical practitioners to reconstruct it from available clues and to identify individuals from bone fragments. Understanding how bones develop and change over time is also crucial for estimating age at death, for example, through the closure of cranial sutures, osteophyte growth, and changes in bone density. This requires a solid grasp of normal anatomical processes. For a thorough description of human skull anatomy, see dedicated anatomy textbooks. 13
In most cases, the skull remains well preserved long after other skeletal elements have deteriorated, so it provides a valuable source of information.14,15 Sexual dimorphism is generally evident in the skull, with males and females exhibiting distinct cranial features. Male skulls typically have more pronounced brow ridges, a more robust mandible, and a more prominent external occipital protuberance than female skulls. These differences led to the development of widely used skull-based sex determination protocols (e.g. the Acsádi and Nemeskéri, 16 Buikstra and Ubelaker, 17 and Walker 18 scoring methods). Five morphological traits have received the most attention and form the basis of the most commonly used simple ordinal scoring system: the nuchal crest, the mastoid process, the supraorbital margin, the glabella, and the mental eminence.
The literature on craniofacial aging19,20 indicates that skull morphology changes throughout the adult lifespan. With increasing age, cranial sutures become less distinguishable, and facial features also change. For example, the zygoma becomes more retropositioned, the mandibular angle becomes more obtuse, and the mandible becomes shorter with increased anterior projection. These changes result from the bone remodeling that is part of the normal aging process.
The degree of suture closure is an important feature for estimating age at death from skeletal remains. 21 The cranial sutures, including the coronal, sagittal, lambdoid, squamous, and spheno-occipital sutures, are recognized as age indicators. Facial sutures also provide additional data for age estimation in subadults and adults. Examples include the frontonasal, internasal, and nasomaxillary sutures, and the maxillary sutures (the incisive, anterior median palatine, transverse palatine, and posterior median palatine sutures). 22 Different sutures fuse at different rates; for example, the coronal suture typically begins closing earlier than the sagittal suture. Closure of the facial sutures is often less distinct than closure of the cranial sutures. As with sex determination, these observed differences led to the development of widely used skull-based age estimation protocols (e.g. the Acsádi and Nemeskéri, 16 Meindl and Lovejoy, 23 and Mann 24 scoring methods).
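Ordinal suture scores are usually combined into composite scores before they are compared with reference age ranges. The sketch below assumes the Meindl and Lovejoy 23 convention as we understand it. Each ectocranial site is scored from 0 (open) to 3 (completely obliterated). The vault composite sums sites 1 to 7, and the lateral-anterior composite sums sites 6 to 10. The site names are our own transcription and should be checked against the original publication. The conversion from composite score to an age interval is left out on purpose, because it must come from the published reference tables.

```python
# Ectocranial sites 1-10 as we read Meindl and Lovejoy (verify against the source).
VAULT_SITES = ["midlambdoid", "lambda", "obelion", "anterior_sagittal",
               "bregma", "midcoronal", "pterion"]                # sites 1-7
LATERAL_ANTERIOR_SITES = ["midcoronal", "pterion", "sphenofrontal",
                          "inferior_sphenotemporal",
                          "superior_sphenotemporal"]            # sites 6-10


def composite_scores(scores: dict) -> dict:
    """Sum 0-3 closure scores into vault and lateral-anterior composites."""
    for site, value in scores.items():
        if value not in (0, 1, 2, 3):
            raise ValueError(f"{site}: score must be 0-3, got {value}")
    required = set(VAULT_SITES) | set(LATERAL_ANTERIOR_SITES)
    missing = sorted(required - set(scores))
    if missing:
        raise KeyError(f"unscored sites: {missing}")
    return {
        "vault": sum(scores[s] for s in VAULT_SITES),
        "lateral_anterior": sum(scores[s] for s in LATERAL_ANTERIOR_SITES),
    }


# Hypothetical observation: every site shows minimal closure (score 1).
example = {site: 1 for site in set(VAULT_SITES) | set(LATERAL_ANTERIOR_SITES)}
print(composite_scores(example))  # {'vault': 7, 'lateral_anterior': 5}
```

Midcoronal and pterion count toward both composites, which is why the two sums are computed separately and not from a single partition of the sites.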
Craniofacial features such as bony prominences, sinuses, condyles, foramina, and bony landmarks (Types I, II, and III) are also used to gather information for personal identification. Type I landmarks lie within biological structures that can be identified repeatably, such as suture intersections (e.g. bregma, nasion, and asterion). Type II landmarks are defined geometrically, including points of local maximum or minimum curvature (e.g. basion, glabella, and gonion). Type III landmarks are extremal points, such as the endpoints of a breadth (e.g. pogonion and frontomalare temporale). 25 Landmark-based measurement thus marks the transition from nonmetric to metric methods in skull analysis, that is, from qualitative to quantitative approaches. Several recent studies26,27 based on skull measurements confirmed that variations in skull dimensions, whether linear or angular, can indicate sexual dimorphism and support stature estimation.
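Once landmarks have been placed, linear and angular measurements reduce to simple vector arithmetic on their 3-D coordinates. The sketch below assumes landmark coordinates in millimetres. The coordinate values in the usage line are hypothetical placeholders, not real data. Only a three-landmark angle is shown. Angles defined by tangent lines, such as the gonial angle, need additional construction that is not covered here.

```python
import math


def distance(a, b):
    """Linear distance between two 3-D landmarks (same units as the input)."""
    return math.dist(a, b)


def angle_deg(a, vertex, b):
    """Angle in degrees at `vertex` between the rays vertex->a and vertex->b."""
    u = [ai - vi for ai, vi in zip(a, vertex)]
    v = [bi - vi for bi, vi in zip(b, vertex)]
    cos_t = sum(ui * vi for ui, vi in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


# Hypothetical coordinates in mm, for illustration only.
nasion, prosthion = (0.0, 0.0, 0.0), (0.0, -70.0, 5.0)
print(round(distance(nasion, prosthion), 2))  # a nasion-prosthion (upper facial) height
```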
From an anatomical perspective, the same measurement often goes by different names in different studies, and this variability may affect landmark placement. For instance, “upper facial height” is often synonymous with “nasion-prosthion height,” while “cranial base length” may be referred to as “basion-nasion length.” The names of some anatomical landmarks have also changed, such as the replacement of “auriculare” with “radiculare.” This reflects the variety of terms used to describe similar anatomical measurements in different contexts.

A comprehensive understanding of the intrinsic and extrinsic factors that influence the development and expression of skeletal traits is essential for forensic analysis. One key factor is anatomical variability among populations. Morphologically, cranial traits vary notably across populations, including Caucasian, African, Mongoloid, and admixed groups. These traits include the shape of the skull, the prominence of the brow ridges, the size of the orbits, nasal structures, and overall cranial vault structure. This variability in size and shape significantly affects the accuracy of sex, stature, age, and ancestry estimation, because measurement techniques developed for one population may not apply directly to others. Testing methods developed on specific samples in other populations, especially those from other regions, gives insight into population or sample variation. For instance, Kemkes and Göbel 28 tested a Brazilian method for determining sex from the mastoid process area on samples from Germany and Portugal. The method's accuracy decreased when it was applied to these populations. This finding emphasizes the impact of population variation on the morphological characteristics used in sex determination, such as the mastoid process.
Transitioning from traditional methods to computed tomography: is this the new era of CT?
Before the invention of CT technology in 1972, 29 medical imaging in forensic investigations relied primarily on X-rays. An X-ray image of a corpse was used in a criminal investigation as early as 1898. 30 However, this technique could provide no information beyond the macroscopic features of the bones. The safety equipment needed to protect operators from radiation also added to its cost. CT technology was first applied to forensic autopsy in the 1990s. It was used as an alternative to traditional autopsy when there were sensitive religious concerns or when family members had declined a traditional autopsy for other reasons. In 1993, Haglund and Fligner 31 successfully used CT images to confirm human identification by comparing antemortem and postmortem skull CT images. CT scans subsequently became an established and widely used tool in forensic anthropology research, offering global data accessibility and non-invasive analysis.
Braun et al. 32 discussed another reason for the increased use of CT imaging in recent years. Some osteological collections (e.g. archaeological collections) may not reflect contemporary populations, which could limit the relevance of findings for modern forensic situations. Growing ethical concerns about the use of identified osteological collections and human remains have also led to increasing reliance on virtual CT modalities. In the literature, 33 the rapid growth in the use of CT imaging in forensic anthropology is reflected in three main types of publications: (i) case reports, (ii) the application of CT for victim identification in medicolegal contexts, and (iii) the creation of population-specific databases for biological profiling, which is the most prevalent focus.
Given the growing interest in CT applications within the forensic field, questions have arisen as to whether CT imaging might eventually replace traditional techniques for analyzing the skull. Answering this question requires a comprehensive investigation of how comparable traditional dry-bone analysis and virtual CT methods are for commonly used biological profiling methods. Some studies34–36 reported that craniometric measurements from both dry bones and CT scans were generally accurate and reliable, with only minor differences between the two methods. Nevertheless, it remains difficult to name a single “best” method. In our view, traditional measurement techniques should not be entirely replaced by CT scanning. We instead advocate an integrated approach that combines CT with traditional techniques. Traditional methods remain effective for many forensic and anthropological applications, particularly with well-preserved remains. CT modalities provide detailed insights into macro- and microscopic features but are limited by the quality of the visualization. Combining the two allows a more thorough analysis that draws on the strengths of both methods, creates a path for future development, and opens a new era in forensic investigation.
The use of CT scan and its advantages and disadvantages
Various CT imaging modalities have been used in research, for example, multi-detector CT (MDCT) or multi-slice CT (MSCT), head CT, post-mortem CT (PMCT), cone-beam CT (CBCT), and micro-CT (µCT).
Each modality has unique advantages and limitations that influence research outcomes. For example, MDCT offers high-resolution imaging and rapid acquisition of multiple slices but involves higher radiation doses. CBCT provides detailed 3-D imaging at a lower radiation dose, but its small field of view means that the entire craniofacial skeleton may not be captured. µCT provides high spatial resolution, allowing detailed qualitative and quantitative analyses at the trabecular level. Its drawbacks, however, include high costs and operational challenges. Researchers should therefore choose the technique that best suits the research design and the purpose of the study.
All of the CT modalities mentioned above can generate both 2-D and 3-D images, providing visualization options for detailed anatomical analysis and measurement of skeletal remains or living patients. 37 CT imaging also allows analysis of internal structures such as the endocranial vault, vault thickness, and sinuses. In craniofacial measurement, a significant advantage is that landmarks can be fixed on the skull model. This minimizes errors that may arise when landmarks are relocated for repeated measurements. As a limitation, Ortiz Rosa et al. 38 noted that although CT images provide detailed morphological information, practitioners cannot physically handle the bones. This restricts their ability to directly perceive textures, edges, and ridges.
CT scanning protocols can vary significantly, leading to inconsistencies in image quality. These variations may arise from differences in CT scanners, calibration, operator expertise, field of view, voxel size, slice thickness, or spatial resolution. The high cost of CT scanners, particularly µCT and high-resolution CT systems, is also a substantial limitation for many forensic laboratories. CT may not be readily available for every forensic case, particularly in remote areas. Lack of access to advanced imaging technology in these areas can impede thorough forensic investigations and delay the collection of critical evidence. 8
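Voxel size and slice interval matter because landmark positions selected as voxel indices must be scaled by the voxel spacing before any distance is computed. A landmark can only be localized along the slice axis to within roughly one slice interval. The sketch below is a simplified illustration. It assumes an axis-aligned volume with its origin at (0, 0, 0) and no gantry tilt, and the spacing values are hypothetical. Real scans carry this geometry in their image metadata.

```python
import math


def voxel_to_mm(index, spacing):
    """Convert (i, j, k) voxel indices to mm, assuming an axis-aligned volume
    with its origin at (0, 0, 0)."""
    return tuple(i * s for i, s in zip(index, spacing))


def physical_distance(idx_a, idx_b, spacing):
    """Distance in mm between two landmarks given as voxel indices."""
    return math.dist(voxel_to_mm(idx_a, spacing), voxel_to_mm(idx_b, spacing))


# The same index difference (40 voxels along the slice axis) under two
# hypothetical protocols with different slice intervals.
a, b = (100, 120, 40), (100, 120, 80)
print(physical_distance(a, b, (0.5, 0.5, 0.625)))  # 25.0
print(physical_distance(a, b, (0.5, 0.5, 1.25)))   # 50.0
```

Landmark indices are therefore not comparable across protocols unless the spacing travels with them. This is one practical reason why the standardized protocols discussed earlier matter.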
CT images are a suitable source of current population-specific data for developing skeletal standards. Advanced software and AI algorithms have also been applied to CT data for automated measurement, shape analysis, and quantification of morphological features, potentially increasing efficiency and reducing subjective bias. These advantages may improve the accuracy, efficiency, and objectivity of analyses and ultimately contribute to more reliable identification of unknown individuals.
Tables 1–3 summarize previous studies of sex, stature, and age estimation using CT imaging over the past 10 years.
Table 1. Previous studies of sex determination from the skull using CT scans (morphological, morphometric, and automated methods).
N/A: not available.
Table 2. Previous studies of stature estimation from the skull using CT scans (morphological, morphometric, and automated methods).
N/A: not available.
Table 3. Previous studies of age estimation from the skull using CT scans (morphological, morphometric, and automated methods).
N/A: not available.
Sex determination and methods
Table 1 shows that three kinds of methods have been used for sex determination based on CT scans: morphological, morphometric, and automated. All three share the same general model-based workflow: (i) digitizing the skull by CT scanning, (ii) reconstructing the skull as a 3-D model, (iii) using computer software to observe or measure features on multiple cross-sections and/or 3-D images, and (iv) deriving sex determination equations with statistical methods (discriminant function analysis or logistic regression). As Table 1 shows, the accuracy of sex determination varies among populations because of genetic, environmental, and biological differences that influence skeletal morphology. Although humans exhibit distinct sexual dimorphism, the expression of these traits varies across populations.
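Step (iv) can be illustrated with logistic regression, one of the two statistical methods named above. The sketch below is a minimal pure-Python fit by batch gradient descent. It assumes measurements in millimetres and codes sex as 1 = male, 0 = female. The learning rate, the iteration count, and the small measurement sample are hypothetical and are not drawn from any study in Table 1. A published equation would normally be fitted with a validated statistics package and checked with cross-validation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(X):
    """Z-score each column; return scaled rows plus the means and SDs for reuse."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    Z = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]
    return Z, means, sds


def fit_logistic(Z, y, lr=0.1, n_iter=5000):
    """Batch gradient descent on the mean log-loss; y is 1 for male, 0 for female."""
    n, p = len(Z), len(Z[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        errors = [sigmoid(b0 + sum(wj * zj for wj, zj in zip(w, row))) - yi
                  for row, yi in zip(Z, y)]
        b0 -= lr * sum(errors) / n
        w = [wj - lr * sum(e * row[j] for e, row in zip(errors, Z)) / n
             for j, wj in enumerate(w)]
    return b0, w


def prob_male(row, means, sds, b0, w):
    """Predicted probability of 'male' for one row of raw measurements."""
    z = [(v - m) / s for v, m, s in zip(row, means, sds)]
    return sigmoid(b0 + sum(wj * zj for wj, zj in zip(w, z)))


# Hypothetical bizygomatic breadths (mm); not data from any cited study.
X = [[120], [123], [125], [127], [129], [128], [131], [133], [135], [138]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
Z, means, sds = standardize(X)
b0, w = fit_logistic(Z, y)
print(round(prob_male([136], means, sds, b0, w), 3))
```

Standardizing first keeps the gradient steps well scaled when several measurements with different ranges are combined in one equation.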
Morphological method
Sex determination from the skull using CT imaging relies on sexually dimorphic traits. Three popular protocols (i.e. Loth and Henneberg, 87 Walker 18 from Buikstra and Ubelaker, 88 and Langley et al. 89) have frequently been used in research.
Previous studies32,38–41 found that the glabella,32,38,39,41 supraorbital margin, 32 mastoid process,38,39,41 and mandibular variables 40 showed high sexual dimorphism. Sex determination accuracy was consequently high, ranging from 78.5% to 100%. In terms of repeatability and reproducibility, CT-based assessment showed high accuracy and reliability for sex determination, with good intra- and inter-observer agreement, which supports its use in morphological methods.
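Intra- and inter-observer agreement on ordinal trait scores is often summarized with Cohen's kappa. The sketch below computes unweighted kappa as a generic illustration. The cited studies may have used this statistic, a weighted kappa (often preferred for ordinal scales), or the intraclass correlation coefficient. The scores in the usage line are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two sets of scores on the same specimens."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of specimens given the same score.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal score frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if p_expected == 1:
        # Both raters used one identical category throughout; kappa is
        # undefined here, and this sketch simply reports perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical glabella scores (1-5) from two observers on seven CT models.
print(round(cohens_kappa([1, 2, 3, 4, 5, 3, 2], [1, 2, 3, 4, 4, 3, 2]), 3))
```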
Morphometric method
CT-based morphometric methods analyze the dimensions of craniofacial structures to determine sex. Several studies42–44 highlighted that CT imaging provides high-resolution 3-D data and reveals internal structures and dimensions that traditional visualization techniques cannot capture. Over the past decade, morphometric measurements have been applied to various parts of the skull, including the cranium, mandible, orbit, mastoid process, foramen magnum, paranasal sinuses, crista galli, and foramina. Because the cranium and mandible have received the most attention in morphometric sex determination, Table 1 summarizes only studies of these two structures; the other structures are discussed below.
Cranium
According to Table 1, the cranium has received the most research attention over the past 10 years. Previous studies42,46,48,50–55 measured the cranium using standard measurements and newly defined parameters. They found that several standard measurements yielded high accuracy for sex determination, including bizygomatic breadth,42,46,48,50,51,53–55 cranial base length,52,53,55 and bimastoid diameter.46,48,51 Other parameters also showed potential as morphometric variables for sex determination: mastoid height,46,50 maximum cranial breadth, 48 maximum cranial length, 51 minimum frontal breadth, 48 bizygotemporal breadth, 53 basion-prosthion length, 48 biorbital breadth, 46 orbital height, 48 and nasal height. 50
Accuracy for sex determination ranged from 77.5% to 96.3%. Almost all of these studies42,46,48,50–55 noted that although the methodologies were similar, accuracy depended on factors such as sample size, the parameters used, and population-specific data. For example, some studies52,55 found that accuracy varied significantly. This variation often depends on population-specific factors, such as genetics, cultural norms and practices, environmental influences, socioeconomic conditions, and historical context.
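Accuracy figures like these are usually obtained by classifying each specimen against a cut-off. In a univariate discriminant function with two groups, equal prior probabilities, and equal variances, this cut-off (the sectioning point) is the midpoint between the female and male means. The sketch below illustrates that rule with hypothetical bizygomatic breadths (mm) that are not taken from the cited studies. Its accuracy function assumes that males have the larger value.

```python
from statistics import mean


def sectioning_point(female_values, male_values):
    """Midpoint between the group means (equal priors and variances assumed)."""
    return (mean(female_values) + mean(male_values)) / 2


def classify(value, cut, male_larger=True):
    """Assign sex by which side of the sectioning point the value falls on."""
    return "male" if (value > cut) == male_larger else "female"


def accuracy(female_values, male_values, cut):
    """Share of specimens correctly classified (assumes males are larger)."""
    hits = sum(classify(v, cut) == "female" for v in female_values)
    hits += sum(classify(v, cut) == "male" for v in male_values)
    return hits / (len(female_values) + len(male_values))


# Hypothetical bizygomatic breadths (mm).
females = [120, 123, 125, 127, 129]
males = [128, 131, 133, 135, 138]
cut = sectioning_point(females, males)
print(round(cut, 1), accuracy(females, males, cut))  # 128.9 0.8
```

The two specimens that fall on the wrong side of the cut-off (a 129 mm female and a 128 mm male) show how overlap between the sexes caps achievable accuracy. That overlap differs from one population to another.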
Mandible
Many studies43–45,47,49,50 used mandibular measurements for sex determination. They found that several mandibular parameters were highly correlated with sex, including bigonial breadth,44,45,47,49,50 ramus length,43–45,50 gonial angle,43,44,47,49 bicondylar breadth,43,44,50 and symphysis height.45,47,50 Other parameters showed moderately high accuracy for sex determination: coronoid length,43,45 gonion-gnathion length, 43 corpus length, 50 antegonial notch area, 45 chin width, 45 mandibular foramen area, 47 coronoid breadth, 47 minimum height of the mandibular notch, 47 palatal breadth, 47 coronoid-gonion length, 49 and minimum ramus breadth. 49 Overall classification accuracy ranged from 72.9% to 94.7%. These studies confirmed that mandibular morphometric analysis is also a reliable method for sex determination.
When the cranium and mandible are compared, the cranium yields higher accuracy for sex determination. Gillet et al. 50 investigated sexual dimorphism using cranial and mandibular measurements obtained through MSCT on the same samples. Classification accuracy was generally higher for the cranium than for the mandible. Anthropometric models achieved accuracies of 87% to 88.3% for the cranium and 68% to 81.4% for the mandible. Similarly, geometric morphometric models achieved accuracies of 94% to 100% for the cranium and 84.2% for the mandible.
Few studies have examined whole-skull measurements. One practical reason is that the mandible must be rearticulated with the cranium at the temporomandibular joint. In life, the mandibular condyle is covered by cartilage and separated from the temporal bone by an articular disc. Proper occlusion is therefore essential for accurate anatomical alignment. As a result, few studies have measured the cranium and mandible of dry bones together; most such studies were conducted on living individuals.
Other features
Orbit
The size and shape of the orbits differ between males and females, making orbital dimensions valuable for sex determination in skeletal analysis. Previous studies90–93 found that orbital height90–93 was the most significant variable. Orbital width,90,91,93 biorbital breadth, 90 orbital area, 92 and interzygomatic distance 92 were also significant. The degree of sexual dimorphism in these orbital features allowed sex determination with moderate accuracy, from 73% to 92%. Graillon et al. 90 also showed that males had significantly larger orbital volumes and a higher ratio of total orbital volume to skull centroid size than females, achieving an accuracy of 77.3%.
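Centroid size, the usual size measure in landmark-based studies, is the square root of the summed squared distances of all landmarks from their centroid. The sketch below computes it and then forms a plain volume-to-size ratio. The exact normalization used by Graillon et al. is not given in the text above, so the simple ratio is an assumption. The landmark set in the usage line is hypothetical.

```python
import math


def centroid_size(landmarks):
    """Square root of the summed squared distances of 3-D landmarks from their centroid."""
    n = len(landmarks)
    centroid = [sum(p[k] for p in landmarks) / n for k in range(3)]
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in landmarks))


def orbit_to_skull_ratio(orbital_volume_mm3, skull_landmarks):
    """Total orbital volume divided by skull centroid size (assumed normalization)."""
    return orbital_volume_mm3 / centroid_size(skull_landmarks)


# Hypothetical landmark configuration (mm), for illustration only.
square = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
print(centroid_size(square))  # 2.0
```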
Paranasal sinus
Previous studies94–101 showed that the paranasal sinuses (the frontal, maxillary, ethmoid, and sphenoid sinuses) can be used for sex determination. The maxillary sinus reached the highest classification rates, from 57.8% to 85.7%.95,97,99,101 The frontal sinus had moderate classification rates, from 61.0% to 75.0%.94–96,100,101 The ethmoid 101 and sphenoid sinuses95,98,101 yielded classification rates of 63.0% to 76.3% and 61.8% to 69%, respectively.
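Sinus volumes are derived from a segmentation of the sinus air space. Once the space is segmented, its volume is the number of labeled voxels multiplied by the volume of one voxel. The sketch below assumes the mask is already available as nested 0/1 lists indexed [slice][row][column]; in practice it would come from the study's imaging software. The segmentation step itself (thresholding, manual tracing, and so on) is not shown. The spacing values are hypothetical.

```python
def segmented_volume_mm3(mask, spacing):
    """Volume of a binary segmentation: labeled-voxel count x single-voxel volume.

    mask: nested lists [slice][row][col] of 0/1 (or booleans).
    spacing: voxel dimensions in mm, e.g. (dx, dy, dz).
    """
    n_voxels = sum(v for sl in mask for row in sl for v in row)
    dx, dy, dz = spacing
    return n_voxels * dx * dy * dz


# Hypothetical 2-slice, 2 x 2 mask, fully labeled, with 0.5 x 0.5 x 1.0 mm voxels.
mask = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
volume = segmented_volume_mm3(mask, (0.5, 0.5, 1.0))
print(volume, volume / 1000)  # mm^3 and mL (1 mL = 1000 mm^3)
```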
Mastoid process
Several studies102–106 took CT-based measurements of the mastoid triangle and mastoid process. The mastoid triangle is defined by the distances between three anatomical landmarks (the asterion, porion, and mastoidale) and by the area they enclose. Mastoid process volume is determined by isolating the mastoid process from 3-D models in different geometric planes, including the wider and narrower regions of the mastoid and the mastoid tip. Both the mastoid triangle and mastoid process volume are used in sex determination, because all dimensions of the mastoid process and the triangle area have been found to be significantly greater in males than in females. Studies102,103,106 measured the side lengths, area, and angles of the mastoid triangle. They found significant differences between males and females, with accuracy rates ranging from 71.4% to 89.0%. Other studies104,105 focused on mastoid process volume measured on 3-D images and achieved accuracy rates between 71% and 72%.
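Given 3-D coordinates for the asterion, porion, and mastoidale, the side lengths and enclosed area of the mastoid triangle follow directly. The sketch below computes the area as half the magnitude of the cross product of two edge vectors. This is our choice; some studies use Heron's formula or 2-D projections, which can give different values. The coordinates in the usage line are hypothetical.

```python
import math


def mastoid_triangle(asterion, porion, mastoidale):
    """Side lengths (mm) and area (mm^2) of the mastoid triangle from 3-D landmarks."""
    ab = [p - a for p, a in zip(porion, asterion)]
    ac = [m - a for m, a in zip(mastoidale, asterion)]
    cross = (ab[1] * ac[2] - ab[2] * ac[1],
             ab[2] * ac[0] - ab[0] * ac[2],
             ab[0] * ac[1] - ab[1] * ac[0])
    area = 0.5 * math.hypot(*cross)  # half the parallelogram spanned by ab and ac
    sides = {
        "asterion-porion": math.dist(asterion, porion),
        "porion-mastoidale": math.dist(porion, mastoidale),
        "mastoidale-asterion": math.dist(mastoidale, asterion),
    }
    return sides, area


# Hypothetical coordinates (mm) forming a 3-4-5 right triangle.
sides, area = mastoid_triangle((0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 3.0, 0.0))
print(sides, area)
```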
Foramen magnum
In forensic cases where only skull fragments are available, the foramen magnum can serve as a useful anatomical feature for sex determination.107–113 Studies108–110,112 showed that accuracy using foramen magnum measurements ranged from 69.1% to 86.7%. When other studies107,111,113 combined foramen magnum measurements with occipital condyle measurements, accuracy rates were approximately 70.5% to 71.6%.
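Foramen magnum area is often derived from its sagittal length (L) and transverse width (W) using one of two formulas commonly attributed to Radinsky and to Teixeira. The text above does not say which formula the cited studies used, so both are sketched here. The dimensions in the usage line are hypothetical.

```python
import math


def fm_area_radinsky(length_mm, width_mm):
    """Ellipse-based area: (pi / 4) * L * W."""
    return math.pi / 4 * length_mm * width_mm


def fm_area_teixeira(length_mm, width_mm):
    """Circle-based approximation: pi * ((L + W) / 4) ** 2."""
    return math.pi * ((length_mm + width_mm) / 4) ** 2


# Hypothetical foramen magnum dimensions (mm), for illustration only.
L, W = 35.0, 29.0
print(round(fm_area_radinsky(L, W), 1), round(fm_area_teixeira(L, W), 1))
```

The two formulas agree for a circular outline (L = W). Otherwise the Teixeira value is never smaller, because the arithmetic mean of L and W is never less than their geometric mean. Areas computed with different formulas are therefore not directly comparable across studies.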
Crista galli
Crista galli morphometry for sex determination involves analyzing the shape and dimensions of this bony structure. 114 CT provides detailed views of the crista galli, enabling accurate assessment of its size and shape. Komut et al. 115 reported morphometric sexual dimorphism in crista galli (CG) dimensions (width: 15.15 mm, length: 3.45 mm, height: 13.25 mm). CG length gave the best sex classification accuracy (83.7%), followed by height (81.4%) and width (81.2%). In contrast, Okumuş 114 found no statistically significant difference in height and length between the sexes, although mean width was significantly greater in females than in males. Further validation of this structure is therefore necessary to confirm its applicability across different demographic groups.
Geometric morphometrics
Several studies have used geometric morphometrics (GMM) effectively to analyze sexual dimorphism in the cranium and mandible. GMM involves the statistical analysis of shape using landmark coordinate data (i.e. Type I, II, and III landmarks and semi-landmarks). This approach provides a more detailed understanding of morphology than conventional metric methods. For example, Gillet et al. 50 achieved an accuracy of 97.7% for sex estimation using the cranium and 84.2% using the mandible. The number of studies focusing solely on GMM has decreased in the last decade. However, the principles underlying GMM (e.g. landmark coordinate data and principal component analysis) continue to influence automated methods for shape and form analysis, particularly when combined with more advanced automated techniques.56,57,62,66
Automated method
Automated methods typically apply AI, specifically ML and deep neural networks. Most studies involving AI, particularly ML and DL, used large samples, often exceeding 100 individuals (see Table 1). Preprocessing techniques such as normalization, noise reduction, and resizing are commonly applied to improve the quality and consistency of the training and test data. Model performance is often reported as correct classification rate (accuracy), sensitivity (true positive rate), specificity (true negative rate), and F1 score.
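All four performance measures come from the same 2 x 2 confusion matrix. The sketch below derives them from true and predicted labels. Which sex counts as the "positive" class is arbitrary. Here it is assumed to be male, and swapping it exchanges sensitivity and specificity. The labels in the usage line are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive="male"):
    """Accuracy, sensitivity, specificity, and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


# Hypothetical labels for five test skulls.
truth = ["male", "male", "male", "female", "female"]
preds = ["male", "male", "female", "female", "male"]
print(classification_metrics(truth, preds))
```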
In the literature, ML has been widely used, mainly with supervised learning algorithms.56,57,61–66,116 Several supervised algorithms were commonly used for classification, including logistic regression, decision trees, random forests, and support vector machines. Of these, support vector machines56,57,61–64,66,116 were the most widely adopted because they perform well across various parameter settings. One study also introduced an unsupervised learning algorithm for sex determination, 58 which likewise showed high accuracy, achieving 98.0% for females and 93.02% for males.
ML has recently undergone a significant paradigm shift with the rise of DL approaches.59,60,67,112,117 DL algorithms excel at modeling high-level concepts by learning both linear and non-linear relationships between input and output layers, which lets them learn features directly from large datasets. Traditional ML, in contrast, requires some manual feature extraction. The literature has used two common DL approaches: artificial neural networks (ANNs)60,112,117 and convolutional neural networks (CNNs).59,67 CNN architectures rely on multiple convolutional layers that automatically extract relevant features from skull images, such as shape, density, or structural variations. Overall, DL algorithms showed high classification accuracy, ranging from 88% to 98%.
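A single convolutional layer slides a small kernel over the image and computes a weighted sum at each position, usually followed by a non-linearity such as ReLU. The sketch below shows this forward pass for one channel in pure Python. Its purpose is to make "automatic feature extraction" concrete; it is not a model from the cited studies. The kernel is hand-set for illustration, whereas a CNN learns its kernel weights during training. The intensity patch is hypothetical.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, the operation DL libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(feature_map):
    """Element-wise max(0, x), the usual CNN activation."""
    return [[max(0.0, x) for x in row] for row in feature_map]


# Hypothetical 4 x 4 intensity patch with a vertical low-to-high edge.
patch = [[0, 0, 1, 1]] * 4
vertical_edge = [[-1, 1], [-1, 1]]  # hand-set here; a CNN would learn these weights
print(relu(conv2d_valid(patch, vertical_edge)))
```

The response is non-zero only in the column where the intensity edge lies. Stacking many such learned filters gives a CNN its feature hierarchy.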
Stature estimation and methods
Estimating stature from the skull offers a valuable alternative and complement to traditional long bone measurements, especially in cases where long bones are unavailable.118,119
Previous studies27,68–71 found that the morphometric method remains the primary standard for estimating stature from CT images. Table 2 shows that the variables used for stature estimation were less diverse than those used for sex determination. Most stature-related variables were linear measurements (e.g. maximum cranial length, cranial breadth, cranial base length, minimum frontal breadth, and bizygomatic breadth). Certain variables from specific regions of the skull, such as orbital height and breadth, were also found to correlate with stature.
Previous studies27,68–71 indicated that various standard cranial measurements exhibited a significant positive correlation with stature, including bi-zygomatic breadth,68,70 maximum cranial breadth,68 maximum cranial length,71 cranial base length,27 basion-bregma height,71 and skull circumference.71 Moreover, the standard error of estimate (SEE) values obtained from cranial measurements27,68,70,71 were relatively low, ranging from 5 to 6 cm. When regression equations based on cranial measurements were compared with those based on femur measurements, the strongest correlation with stature was observed for femur length (R = 0.71), followed by cranial base length (R = 0.53) and the distance from basion to the nasal bone (R = 0.50). The MAE values obtained from cranial measurements were also slightly higher than those based on femur measurements.119 In cases of skeletal fragmentation, regression equations based on incomplete femora and tibiae (e.g. the upper breadth and maximum anteroposterior diameter of the lateral condyle of the femur, and the maximum proximal breadth of the tibia) can estimate stature with a low SEE of 1.55–1.58 cm.120 However, estimating adult stature from fragmentary cranial remains is considerably more challenging, because cranial length and breadth measurements are difficult to obtain. Future studies should explore additional cranial parameters that could improve stature estimation from fragmentary remains.
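The quantities reported in these studies (the regression equation, Pearson's R, and the SEE) can be computed as in the following minimal sketch of a single-variable least-squares stature equation. The paired measurements are invented for illustration and do not correspond to any cited dataset; the function name is hypothetical. The SEE is taken as the square root of the residual sum of squares divided by n − 2.

```python
import math

# Hypothetical paired data (invented for illustration): a cranial
# measurement in mm and living stature in cm.
cranial_mm = [90.0, 95.0, 100.0, 105.0, 110.0]
stature_cm = [152.0, 158.0, 171.0, 179.0, 190.0]


def fit_stature_equation(x, y):
    """Ordinary least squares fit of y = a + b * x.

    Returns intercept a, slope b, Pearson's R and the standard error of the
    estimate, SEE = sqrt(sum(residual^2) / (n - 2)).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(sse / (n - 2))  # two parameters estimated, hence n - 2
    return a, b, r, see


a, b, r, see = fit_stature_equation(cranial_mm, stature_cm)
print(f"stature = {a:.2f} + {b:.3f} * x   R = {r:.3f}   SEE = {see:.2f} cm")

# Point estimate for a new individual, reported with +/- 2 SEE as a rough
# interval (this ignores the extra width of a full prediction interval).
estimate = a + b * 102.0
print(f"estimate for x = 102 mm: {estimate:.1f} +/- {2 * see:.1f} cm")
```

Because the SEE is expressed in centimetres of stature, it can be compared directly across the cranial and femoral equations discussed above, provided the samples are comparable.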
The discrepancies observed across studies may be attributed to several factors. First, variations in the cranial measurements employed could have led to inconsistent results. Second, differences in research design, sample size, CT scanning modality, and the characteristics of the sampled populations may affect the accuracy and comparability of the findings.69 In addition, populations may undergo secular trends in physical characteristics due to improvements in nutrition, healthcare, and living conditions. Studies121,122 have shown secular changes in both craniometric and nonmetric traits, indicating that cranial size and shape, limb proportions, and stature may change over time. The potential for secular change must therefore be considered when regression equations derived from older reference samples are applied to modern individuals.
In addition, it is essential to recognize that stature estimation equations are population specific. Our literature review shows that the variables selected for multiple linear and stepwise regression analyses vary across studies. Moreover, Kyllonen et al.69 found that the correlation values, R² values, and SEEs derived from both single and multiple regression equations varied across four ancestry groups in the United States. This indicates that stature estimation equations developed for one group are not directly applicable to other populations. Therefore, we recommend developing population-specific data and incorporating additional parameters (e.g. mandibular variables) in future analyses.
Age estimation and methods
As summarized in Table 3, three approaches have been proposed for age estimation: (i) the morphological method, which scores grades of cranial suture closure; (ii) the morphometric method; and (iii) automated methods based on AI.
Morphological method
One of the mainstream methods for estimating age at death from adult and older skeletal remains is the assessment of cranial suture obliteration. CT imaging allows cranial suture closure to be observed from multiple perspectives, including cross-sectional tomograms and external (ectocranial) and internal (endocranial) views, at resolutions that vary with the CT modality. Together, these views provide a more comprehensive understanding of suture morphology and fusion patterns.
Among cranial sutures, closure of the sagittal suture has received the most attention, but the coronal, lambdoid, and facial sutures have also served as age indicators in volumetric and cross-sectional CT images. As shown in Table 3, the diversity of methodologies, in terms of sample size, imaging modality, approach, demographic factors, and the specific sutures examined, has resulted in inconsistent data interpretation and findings. This variability makes it difficult to compare results directly across studies.
Boyd et al.72 demonstrated a positive correlation between age and the degree of ectocranial suture closure, with good inter-observer agreement. Previous studies73,77 examined closure of the sagittal suture on µCT and PMCT images and showed that it provides valuable information for age estimation. For instance, Nikolova et al.73 reported that the SEE of their models was within 10 years, which was deemed acceptable for forensic applications. Studies75,78 also found that the sagittal, coronal, and lambdoid sutures exhibited a significant positive correlation with age. Fan et al.75 reported a minimum inaccuracy of 7.73 years in their test set, while NJ et al.78 showed that the SEEs of multiple linear regression models using obliteration scores of these sutures ranged from approximately 14 to 15 years. Furthermore, other cranial sutures, including the facial sutures,74 the squamous suture,76 and the palatine suture,79,81 also contributed to age estimation in adults.
Previous studies have also shown that suture fusion slows over time and that differences between individuals become less distinct, causing the margin of error in age estimation to increase significantly. This is particularly true for individuals over 30 years of age, in whom fusion slows considerably and age-related changes in suture patterns become increasingly difficult to distinguish. For example, Ruengdit et al.123 reported broad age estimation ranges, with overestimation in younger individuals (under 40 years of age) and underestimation in older individuals (over 40 years of age). Similarly, Nikolova et al.73 stated that suture maturation is an irregular process only roughly correlated with aging, beginning in the early 20s, peaking around age 30, and gradually declining in the late 40s. Hence, obtaining an accurate age estimate from adult cranial sutures remains challenging for forensic anthropologists, particularly for older individuals, for whom current methods lack the precision needed to define narrow age ranges. Consequently, age intervals for this group are often limited to broad, open-ended categories such as 50+.123,124
Morphometric method
Kobayashi et al.81 introduced a suture closure score for age estimation, which involved visually inspecting the median palatine suture and calculating the ratio of the closed suture length to the total length of the suture. The accuracy of this method was moderate, with a correctness rate of approximately 72% and an SEE of 14 years.
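The closure ratio described above reduces to a simple proportion. The sketch below assumes that the lengths of the individually closed stretches of the median palatine suture have already been measured on the CT images; how those stretches are delimited, and any further scoring rules, are defined in the original study and are not reproduced here. The function name and the example lengths are hypothetical.

```python
def suture_closure_ratio(closed_segment_lengths_mm, total_length_mm):
    """Proportion of the median palatine suture that is closed.

    closed_segment_lengths_mm: lengths of the stretches judged closed
    total_length_mm: total length of the suture
    Returns a value between 0 (fully open) and 1 (fully closed).
    """
    if total_length_mm <= 0:
        raise ValueError("total suture length must be positive")
    closed = sum(closed_segment_lengths_mm)
    if not 0 <= closed <= total_length_mm:
        raise ValueError("closed length must lie between 0 and the total length")
    return closed / total_length_mm


# Hypothetical skull: two closed stretches measured on CT, 41 mm suture in total.
ratio = suture_closure_ratio([12.0, 8.5], 41.0)
print(f"closure ratio = {ratio:.2f}")
```

The ratio would then serve as the predictor in a regression or classification model of age.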
A recent study highlighted the potential of CT-based bone density analysis. Obert et al.80 examined age-related information in density histograms derived from µCT scans of the calvaria in a sample of 341 European human skulls, using a histogram functional shape method. The overall accuracy of the age estimates was moderate, approximately 62.5% for females and 51.6% for males.
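The histogram functional shape method itself is not reproduced here. The sketch below only illustrates the preliminary step such an analysis relies on, namely converting voxel grey values from a region of interest into a normalized density histogram so that skulls with different voxel counts can be compared. The bin range, bin count, and grey values are invented for illustration, and the function name is hypothetical.

```python
def density_histogram(values, lo, hi, n_bins):
    """Normalised histogram of voxel values in [lo, hi).

    Each bin holds the fraction of in-range voxels falling into it, so the
    bins sum to 1 and skulls with different voxel counts can be compared.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding at the upper edge
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


# Hypothetical grey values sampled from a calvarial region of interest.
voxels = [200, 250, 700, 750, 1200, 1250, 1300, 1900]
hist = density_histogram(voxels, lo=0, hi=2000, n_bins=4)
print(hist)
```

A shape-based method would then describe each normalized histogram as a curve and relate changes in that curve to age.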
Automated method
Traditionally, many skeletal aging methods have relied on regression models to predict chronological age from skeletal changes. Such models can yield biased estimates, a phenomenon referred to as age mimicry, because the predictions reflect the age composition of the reference skeletal collection. When a collection lacks diversity in age, the model may overestimate or underestimate age for individuals outside the predominant age group.125
Recent advancements have introduced various statistical models and automated methods. For instance, Fan et al.75 compared four statistical approaches (gradient boosting, support vector machines, decision trees, and Bayesian ridge regression) in a Han male population. Their findings indicated that the support vector machine model showed lower inaccuracy and bias than the other three models, achieving high accuracy for individuals aged 40 to 59 years.
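Inaccuracy and bias, as used in skeletal age-estimation studies, are commonly defined as the mean absolute difference and the mean signed difference between estimated and known ages. The sketch below assumes these common definitions (the exact formulas used by Fan et al. may differ) and also computes them within age bands, since performance often varies with age. The test-set values are invented and the function names are hypothetical.

```python
def inaccuracy_and_bias(estimated, known):
    """Inaccuracy = mean |estimated - known|; bias = mean (estimated - known).

    A negative bias means ages are systematically underestimated.
    """
    if len(estimated) != len(known) or not known:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [e - k for e, k in zip(estimated, known)]
    inaccuracy = sum(abs(d) for d in diffs) / len(diffs)
    bias = sum(diffs) / len(diffs)
    return inaccuracy, bias


def by_age_band(estimated, known, bands):
    """Inaccuracy and bias within each [lo, hi) band of known age."""
    results = {}
    for lo, hi in bands:
        pairs = [(e, k) for e, k in zip(estimated, known) if lo <= k < hi]
        if pairs:
            est, kn = zip(*pairs)
            results[(lo, hi)] = inaccuracy_and_bias(est, kn)
    return results


# Invented test-set results, not taken from Fan et al.
known_ages = [35, 42, 50, 58, 66]
estimated_ages = [38, 40, 49, 52, 57]

print(inaccuracy_and_bias(estimated_ages, known_ages))
print(by_age_band(estimated_ages, known_ages, [(20, 40), (40, 60), (60, 80)]))
```

In this toy example, the positive bias in the youngest band and the negative bias in the oldest band mirror the overestimation and underestimation pattern described above for suture-based methods.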
In particular, automated methods for age-at-death estimation offer significant advantages over traditional regression models by minimizing intra- and inter-observer variation. With AI, practitioners can examine a broader range of skeletal features beyond cranial sutures,83,85 including features of the mandible,82 orbit,84 and paranasal sinuses.86
Nikolova et al.83 explored the use of ML techniques to assess the degree of sagittal suture closure on µCT images for age estimation. They implemented linear regression and k-nearest neighbors algorithms to develop predictive models of age at death. Their findings indicated that this approach effectively supported age estimation, achieving a low root mean square error.
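A k-nearest neighbors regressor predicts age as the mean age of the most similar reference individuals, and its root mean square error (RMSE) can be estimated by leave-one-out validation. The following minimal sketch assumes two hypothetical numeric suture descriptors per individual; the features, ages, and function names are invented and do not reproduce the features or software of Nikolova et al.

```python
import math


def knn_predict_age(train_features, train_ages, query, k=3):
    """Predict age as the mean age of the k training individuals whose
    feature vectors are closest to `query` (Euclidean distance)."""
    ranked = sorted(
        (math.dist(f, query), age) for f, age in zip(train_features, train_ages)
    )
    nearest = ranked[:k]
    return sum(age for _, age in nearest) / len(nearest)


def loo_rmse(features, ages, k=3):
    """Leave-one-out root mean square error of the k-NN age predictor."""
    sq_errors = []
    for i in range(len(features)):
        train_f = features[:i] + features[i + 1:]
        train_a = ages[:i] + ages[i + 1:]
        pred = knn_predict_age(train_f, train_a, features[i], k)
        sq_errors.append((pred - ages[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Hypothetical two-dimensional suture descriptors (e.g. fraction of the
# sagittal suture closed, a normalised density measure) and known ages.
features = [(0.10, 0.20), (0.15, 0.25), (0.40, 0.50),
            (0.45, 0.55), (0.80, 0.90), (0.85, 0.95)]
ages = [25, 28, 41, 44, 63, 67]

# k = 2 only because the toy sample is tiny.
print(knn_predict_age(features, ages, (0.42, 0.52), k=2))
print(round(loo_rmse(features, ages, k=2), 2))
```

Leave-one-out validation is used here because the toy sample is too small to split; with real data, a separate test set or cross-validation would be more usual.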
DL algorithms have recently gained traction in age estimation studies.82,84–86 One study86 introduced a genetic algorithm deep neural network for estimating age from paranasal sinus features. This approach used genetic algorithms to evolve and optimize deep neural networks, improving predictive accuracy and enabling the analysis of complex skeletal features.
CNNs, a type of DL architecture, have been used extensively for complex image analysis tasks, including age estimation from CT images. Their strength lies in their ability to automatically extract and learn relevant features from imaging data.82,84,85 During training, the mean squared error (MSE) was used to reduce the gap between predicted and actual ages; a lower MSE indicates predictions closer to the true ages and thus better performance. MSE was preferred over other error metrics, such as MAE, because squaring the errors makes it more sensitive to large prediction errors. After training, the model was typically evaluated on a separate validation set to assess its accuracy. For example, Joshi et al.85 reported strong performance, with an MSE of 2.35. Similarly, Pham et al.82 reported that their CNN achieved an MAE of 5.15 years and a concordance correlation coefficient of 0.80.
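The difference between MSE and MAE, and the meaning of the concordance correlation coefficient, can be shown with two hypothetical models that have the same MAE. The sketch below uses the standard definitions, with Lin's concordance correlation coefficient computed from population (divide-by-n) moments; all ages and predictions are invented.

```python
def mse(pred, true):
    """Mean squared error: squaring weights large errors more heavily."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def mae(pred, true):
    """Mean absolute error, in the units of the target (years)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def lin_ccc(pred, true):
    """Lin's concordance correlation coefficient with population (1/n) moments:
    2 * cov / (var_pred + var_true + (mean_pred - mean_true) ** 2)."""
    n = len(true)
    mp, mt = sum(pred) / n, sum(true) / n
    var_p = sum((p - mp) ** 2 for p in pred) / n
    var_t = sum((t - mt) ** 2 for t in true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true)) / n
    return 2 * cov / (var_p + var_t + (mp - mt) ** 2)


# Invented known ages and two hypothetical models' predictions (years).
true_ages = [30, 40, 50, 60]
model_a = [32, 38, 52, 58]   # four moderate errors of 2 years
model_b = [30, 40, 50, 68]   # one large error of 8 years

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, mae(pred, true_ages), mse(pred, true_ages),
          round(lin_ccc(pred, true_ages), 3))
```

Both models have an MAE of 2 years, but model B's single large error quadruples its MSE. Its concordance coefficient is also lower, because the coefficient penalizes both scatter and any systematic shift of the predictions away from the line of perfect agreement.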
Intra- and inter-observer reliability
In forensic anthropology, the reliability of methods, including intra- and inter-observer reliability as well as repeatability and reproducibility, has been extensively researched (Tables 1–3). Previous studies varied considerably in the proportion of the sample used for repeated measurements, ranging from as low as 3.2% to as much as 100% of the total sample. This variability highlights the need for standardized guidelines on appropriate sample sizes for repeated measurements, to ensure reliable documentation in future studies.
In morphological analyses, measurement error was quantified using Cohen's kappa. This statistic is particularly well suited to assessing agreement in categorical data, whether between different observers or across repeated observations by the same observer, and is valued because it evaluates the consistency of classifications of morphological features while accounting for agreement expected by chance.39,41,72–74,78 One study38 used the intraclass correlation coefficient (ICC) instead. Cohen's kappa is specifically designed for qualitative (categorical or ordinal) variables, whereas the ICC is more suitable for quantitative (numerical) variables, such as averaged scores. Notably, the Wilcoxon test was applied in some cases to assess measurement error,75 although it is not typically used for this purpose.
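Cohen's kappa compares the observed agreement between two sets of scores with the agreement expected by chance from each set's category frequencies. The following is a minimal sketch of unweighted kappa for repeated suture scores; the 0 to 3 scoring scale and the scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater1)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n  # observed agreement
    c1, c2 = Counter(rater1), Counter(rater2)
    # chance agreement from each rater's marginal category frequencies
    p_e = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical suture scores (0 = open ... 3 = obliterated) for ten skulls,
# scored twice by the same observer.
session1 = [0, 1, 1, 2, 2, 3, 3, 1, 0, 2]
session2 = [0, 1, 2, 2, 2, 3, 2, 1, 0, 2]
kappa = cohens_kappa(session1, session2)
print(f"kappa = {kappa:.3f}")
```

Here the observer agrees on 8 of 10 skulls, but chance agreement (0.27) is discounted, giving a kappa of about 0.73. For ordinal scales, a weighted kappa, which penalizes a disagreement of two grades more than one of one grade, may be preferable; scikit-learn's `cohen_kappa_score` supports this through its `weights` parameter.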
In morphometric analyses, most studies described measurement error using the technical error of measurement (TEM), relative TEM (rTEM), and the coefficient of reliability (R). These statistics are widely used because they effectively identify and quantify measurement error.27,46,52–55 In addition, the ICC was used to analyze both intra- and inter-observer reliability44,45,50 for quantitative data, including continuous measurements (e.g. anthropometric dimensions and other metric data). One study71 introduced Lin's concordance correlation coefficient to evaluate reproducibility; it is particularly useful for assessing agreement between two sets of measurements and is well suited to inter-observer reliability. In contrast, the coefficient of reliability emphasizes the consistency of measurements taken from the same subjects across repeated assessments, ensuring that these measurements are stable and repeatable over time.
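For the common case of two repeated measurements per skull by one observer, TEM is the square root of the sum of squared differences divided by 2N, rTEM expresses TEM as a percentage of the mean measurement, and R = 1 − TEM²/SD². The sketch below follows these usual formulas; it assumes SD² is the sample variance of all measurements, a choice that differs between sources. The function name and the measurement values are hypothetical.

```python
import math
import statistics


def tem_rtem_r(first, second):
    """Intra-observer TEM, relative TEM (%) and coefficient of reliability R
    for two repeated measurements per subject."""
    if len(first) != len(second) or not first:
        raise ValueError("need paired, non-empty measurement lists")
    n = len(first)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(first, second))
    tem = math.sqrt(sum_d2 / (2 * n))
    all_values = list(first) + list(second)
    rtem = 100 * tem / statistics.mean(all_values)
    # R = 1 - TEM^2 / SD^2; SD^2 taken here as the sample variance of all
    # measurements (definitions of this variance differ between sources)
    r = 1 - tem ** 2 / statistics.variance(all_values)
    return tem, rtem, r


# Hypothetical cranial measurement (mm) taken twice on four CT skulls.
first = [100.2, 95.1, 110.4, 102.0]
second = [100.0, 95.5, 110.1, 102.4]
tem, rtem, r = tem_rtem_r(first, second)
print(f"TEM = {tem:.3f} mm, rTEM = {rtem:.2f} %, R = {r:.4f}")
```

Because rTEM is scale-free, it allows measurement error to be compared across variables of very different sizes, which is one reason for reporting it alongside the absolute TEM.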
Overall, the choice of statistical method depends on the nature of the data and the specific research question. Based on Tables 1–3, we strongly recommend that measurement error be described, at a minimum, by Cohen's kappa for morphological methods and by TEM and rTEM for morphometric methods, as these statistics effectively capture measurement error. For morphometric methods, if additional statistics such as the mean difference, mean absolute error, coefficient of reliability, or ICC are used, they should be reported alongside TEM and rTEM to provide a comprehensive picture of measurement error. Furthermore, inappropriate statistical methods, such as paired t-tests, should be avoided for assessing measurement error: a paired t-test detects only a systematic difference in means between repeated measurements and does not quantify how closely individual measurements agree.
To enhance repeatability and reproducibility, we also recommend adopting a standardized orientation, particularly the Frankfort horizontal plane,46 as it promotes consistency in 3-D measurements taken at different times or by different observers. Maintaining this standard orientation may also be crucial for ensuring the validity and reliability of both craniometric and automated methods.
Moreover, we recommend standardized training to enhance intra- and inter-rater reliability and ensure consistent measurements and observations. Clear protocols with detailed, step-by-step procedures should be followed to minimize individual interpretation and variability across practitioners. The use of automated measurement systems could also substantially reduce human error and inconsistency. Musilová et al.57 reported that automated procedures reduced the likelihood of both intra- and inter-observer error and did not require the expertise of an experienced anthropologist.
Challenges and future perspective
Despite the advances in CT imaging, several issues remain, including variations in scanning protocols and differences in population data. The accuracy of estimation methods can be influenced by factors such as the quality of CT images, the condition of skeletal remains, the availability of bone collections, and ethical considerations. Although data on human variability have facilitated the development of models for sex determination and age-at-death estimation from skeletal remains across diverse contexts, access to bone collections from specific populations remains challenging. Mann et al. 126 pointed out that there is a growing number of bone collections that contain either historical or modern bones. However, not all countries have access to such collections due to a variety of factors, including financial constraints, ethical considerations, legal restrictions, and limited research infrastructure. Campanacho et al. 127 further emphasized that the availability of these well-documented collections, including detailed data on the age at death, sex, and other variables for each individual, is vital for ensuring that forensic analyses are accurate and reliable. Without access to such samples, the ability to make precise determinations about skeletal remains—especially in terms of sex, stature, and age—becomes more challenging.
In recent years, advanced statistical and computational methods, such as 3-D reconstruction and 3-D geometric morphometrics, have shown the potential to significantly enhance skull analyses. Furthermore, integrating sophisticated automated algorithms, including AI, can markedly improve measurement accuracy and reduce subjective bias. Strong performance of AI methods has been reported by numerous studies across various fields. However, while AI systems operate with minimal human intervention, they are ultimately created by humans, leaving room for error: biases present in the training data can be reflected in AI outputs, leading to skewed or inaccurate results. This highlights the importance of carefully curating and diversifying training datasets. Future research should focus on developing standards for CT protocols and improving the integration of ML models, CNNs, and other DL architectures (e.g. generative adversarial networks or Bayesian CNNs) to enhance accuracy and reliability.
Moreover, AI remains a new frontier with its own challenges. AI-based applications developed during the research phase require further validation studies using standardized data before they can be effectively implemented in real-world operational contexts.
Collaborative efforts across disciplines and geographic regions are essential to address the existing limitations and improve the robustness of these methods. The development of open-access databases and the standardization of analytical techniques will facilitate the validation and application of CT-based methods for sex, stature, and age estimation in diverse populations.
While AI has shown promising results in predicting bone age in clinical settings (e.g. BoneXpert software), its application in forensic routines is currently limited by the small size of available datasets and the numerous post-mortem variables.10 At present, no official standard guidelines or frameworks for the use of CT and AI technologies have been issued by forensic institutes worldwide or by major professional societies. In light of these challenges, we propose that it is time to collaboratively develop comprehensive guidelines and regulations to ensure the ethical and legal application of these emerging technologies. While integrating these technologies into digital forensic frameworks holds significant promise for improving the efficiency and effectiveness of forensic investigations, issues of integrity, accountability, and regulatory compliance must be carefully addressed to ensure their responsible use.
Moreover, while AI offers powerful tools that can enhance forensic practices, it is crucial to address the ethical implications of its use. This includes ensuring transparency, particularly regarding the “black box” nature of many AI systems, where the decision-making process may be unclear. 128 Since AI systems are often trained on large datasets, it is important to recognize that these datasets may contain biases, which can then be inadvertently amplified or perpetuated by AI algorithms. Therefore, it is essential to establish robust mechanisms for ensuring responsibility, accountability, and potential redress for AI systems and their outcomes. This will help ensure that the use of AI in forensic science remains ethical, fair, and just, promoting trust and confidence in its applications.129,130
Conclusion
CT scans have revolutionized the analysis of skeletal remains by providing detailed insights that significantly improve the assessment of sex, stature, and age at death. As highlighted in this review, the application of CT imaging to the skull has attracted increasing attention and shows a positive trend in both research and practice. This advanced imaging technique enhances the accuracy of estimations and reveals morphological features that were previously difficult to analyze. Furthermore, AI has the potential to transform forensic methodologies, automate complex assessments, and integrate diverse big data sources. Despite significant advances in DL and ML within the forensic discipline, their adoption raises several critical ethical concerns, the most pressing of which relate to data privacy, algorithmic transparency, and the interpretability of automated systems. It is therefore crucial to acknowledge these ongoing challenges and to give them careful consideration, so that these technologies are integrated ethically and responsibly into forensic practice. Moreover, the use of CT imaging and AI has significantly fostered collaboration across disciplines such as forensic science, anatomy, dentistry, archaeology, and data science, as well as with legal professionals and ethicists. This integration promotes a multidisciplinary approach, enabling researchers to develop standardized assessments tailored to specific populations and ushering in a new era in forensic science.
Footnotes
Acknowledgements
We thank the Excellence in Osteology Research and Training Center (ORTC), Chiang Mai University, Chiang Mai, Thailand for their support.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Excellence in Osteology Research and Training Center (ORTC), Chiang Mai University, Chiang Mai, Thailand.
