Abstract
BACKGROUND:
Skull radiography, an assessment method for initial diagnosis and post-operative follow-up, frequently requires retaking of various types of radiographs. During retaking, a radiologic technologist estimates the patient’s rotation angle from the radiograph. This requires an understanding of the relationship between the radiograph and the patient’s angle, and therefore extensive experience.
OBJECTIVE:
To develop and test a new deep learning method that automatically estimates a patient’s angle from radiographs.
METHODS:
The patient’s position is assessed using deep learning to estimate their angle from skull radiographs. Skull radiographs are simulated using two-dimensional projections from head computed tomography images and used as input data to estimate the patient’s angle, using deep learning under supervised training. A residual neural network model is used where the rectified linear unit is changed to a parametric rectified linear unit, and dropout is added. The patient’s angle is estimated in the lateral and superior-inferior directions.
RESULTS:
Applying this new deep learning model, the estimation errors are 0.56±0.36° and 0.72±0.52° in the lateral and superior-inferior angles, respectively.
CONCLUSIONS:
These findings suggest that a patient’s angle can be accurately estimated from a radiograph using a deep learning model, which may reduce retaking time and facilitate skull radiography.
Introduction
Radiography is used for initial diagnosis and post-operative follow-up. In radiography, the patient’s position and the x-ray direction are set so as to obtain a non-overlapping image of the diagnostic area. Subsequently, the radiologic technologist verifies that the diagnostic area is adequately depicted in the radiograph. If the radiograph is not suitable for diagnosis, it is rejected and retaken. An image is also rejected when it is blurred by patient movement or contains excessive noise because of an insufficient dose. Rejection due to image quality has decreased with the spread of digital radiography, but rejection due to positioning has not. Positioning is the most common reason for rejection [4–6]. When retaking, the radiologic technologist estimates the errors in the patient’s position and the irradiation angle from the rejected image.
Two key types of image processing with deep learning have been applied in radiography to date. First, some systems classify the positioning method of the patient’s hands based on radiographs to facilitate the quality assurance (QA) of the images [15]. Second, certain systems analyze radiographs, for example the vertebrae in spinal images [8, 12], whereas others estimate the angle of the lower leg [17]. Additional methods estimate unknown information, e.g., bone age assessment from radiographs [13]. These image analyses are performed after a radiograph has been taken, not during acquisition. To facilitate acquisition, one system automatically sets the reference point for imaging with an X-ray computed tomography (CT) system [3].
With respect to conventional image processing, no algorithm for determining the patient’s angle from a radiograph is known in the radiography literature, and to the best of our knowledge no such research exists. Deep learning, on the other hand, is a general framework for constructing arbitrary input-output functions from training data, so it is expected to realize difficult image processing tasks, here the estimation of the patient’s angle, that were impossible with conventional image processing. Indeed, as shown in this paper, we found that the angle can be estimated with reasonable accuracy using deep learning. In a similar spirit, Ohta et al. developed a retake support system for lateral knee joint radiographs [16]. However, that system estimates only the tilting direction of the knee joint, and to our knowledge no study has used deep learning to estimate the angle of rotation from radiographs.
This study aimed to develop a method that supports retaking by estimating a patient’s angle from skull radiographs. Skull radiography is performed with the patient in the prone position with the chin pulled back. This position is unstable and involves a high retaking rate [2], which prolongs image acquisition and increases the load on the patient. During post-operative follow-up evaluation, radiographs must be acquired in the same position as immediately after the operation, which increases the number of radiographs to be retaken. Since positioning is a contributing factor to retakes, a program that assists positioning is expected to reduce them. In this study, patient’s angles were estimated from projected images of 45 simulated cases. The angle from the frontal image of the same patient was also derived. We demonstrate the effectiveness of the proposed method by performing angle estimation using simulated radiographs.
Proposed method
Residual network
In the proposed method, ResNet was utilized as a regression model to derive the patient’s angle with a skull radiograph as input [10]. We also tested a four-layer convolutional neural network (CNN) as a trial, but it resulted in estimation errors exceeding 2°. Therefore, we used ResNet, which is known to provide higher accuracy in image classification. Figure 1 shows the structure of the residual blocks in the training model, which were constructed using ResNet-D [14] with a parametric rectified linear unit (PReLU) as the activation function. The PReLU parameter (the slope applied to negative inputs) was set to 0.25. In addition, dropout was added with a ratio of 0.25. Table 1 shows the network architecture used in the proposed method. This architecture was based on the re-scaled ResNet (ResNet-RS) with the last layers modified so that it outputs two real numbers corresponding to the estimated angles [7]. A radiograph image was used as the input data. The stem layer downsized the input data and increased the number of channels. The features of the radiograph were then extracted in convolution layers 1 to 4. Finally, the extracted features were converted in the last layer to two real numbers representing the angles. With respect to the network output, angles ranging from -10° to 9° were used, and, for the convenience of implementation, these values were normalized from 0.0 to 1.0 using the following equation:

Structure of residual block. The value of the PReLU function was set to 0.25, and the ratio of dropout was set to 0.25.
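As a minimal illustration of the two modifications named in this caption, the sketch below applies a PReLU with slope 0.25 and dropout with ratio 0.25 to a short list of activations. The function names `prelu` and `dropout` are hypothetical, the slope is treated as a fixed constant, and the dropout uses the common "inverted" scaling, which is an assumption because the text does not specify the variant.

```python
import random


def prelu(x, a=0.25):
    """Parametric ReLU: identity for positive inputs, slope `a` otherwise."""
    return x if x > 0 else a * x


def dropout(values, ratio=0.25, rng=None, training=True):
    """Inverted dropout (assumed variant): zero each value with probability
    `ratio` and rescale the surviving values by 1 / (1 - ratio)."""
    if not training:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - ratio
    return [v / keep if rng.random() >= ratio else 0.0 for v in values]


activations = [-2.0, -0.4, 0.0, 1.5]
after_prelu = [prelu(v) for v in activations]
after_dropout = dropout(after_prelu, rng=random.Random(0))
print(after_prelu)
print(after_dropout)
```

At inference time (`training=False`) the dropout step returns its input unchanged, which is why the surviving values are rescaled during training.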
Diagram of network architecture
The estimated angles were derived from the network output using the following equation:
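The two mappings can be sketched as follows, under the assumption that they form a plain linear (min-max) scaling in which -10° maps to 0.0 and 9° maps to 1.0. The exact constants are an assumption, as are the function names `normalize_angle` and `denormalize_output`.

```python
ANGLE_MIN = -10.0  # smallest rotation angle used in the study (degrees)
ANGLE_MAX = 9.0    # largest rotation angle used in the study (degrees)


def normalize_angle(angle_deg):
    """Map an angle in [ANGLE_MIN, ANGLE_MAX] to a training target in [0, 1]."""
    return (angle_deg - ANGLE_MIN) / (ANGLE_MAX - ANGLE_MIN)


def denormalize_output(y):
    """Map a network output in [0, 1] back to an angle in degrees."""
    return y * (ANGLE_MAX - ANGLE_MIN) + ANGLE_MIN


# Round trip for one (lateral, superior-inferior) pair of angles.
targets = [normalize_angle(a) for a in (3.0, -7.0)]
recovered = [denormalize_output(y) for y in targets]
print(targets, recovered)
```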
In this study, our model estimated two patient angles, corresponding to the superior-inferior and lateral directions, from a simulated radiograph. The experiment comprised four stages: preparation of simulated radiographs, data augmentation, implementation of deep learning using the simulated dataset, and evaluation of accuracy by cross-validation. The details are described in the following sections.
Preparation of projected images
Most existing databases pair images with diseases, not with the patient’s angle. Therefore, training data had to be prepared. There are two methods for obtaining projected images at various patient positions: taking actual radiographs of a human bone phantom with an x-ray device, and generating images by simulation. The phantom method is easy to perform because the images can be obtained under the conditions used in clinical practice. However, we believe that the phantom method and its associated conditions are inappropriate for preparing training data for deep learning. Because currently available phantoms are not made from real human bones, they do not accurately depict the detailed bone regions of the human body. Furthermore, phantom radiographs contain no individual differences. The phantom method has a further problem. In skull radiography, the patient’s position is generally fixed by the radiography method, and x-rays are directed from a predetermined direction.
For automatic estimation of the patient’s angle by deep learning, images must be taken at various patient’s angles while keeping the direction of x-ray incidence constant. However, when the patient’s angle is changed using a phantom, the phantom’s angle is changed manually, which results in large errors in phantom positioning. To avoid these errors, the phantom position can be fixed and the angle of x-ray incidence changed instead. However, the resulting image contains a half-shadow and therefore differs from an actual skull radiograph. We consider the phantom method inappropriate because of this potential difference. Ohta et al. used parallel-beam projected images obtained by simulation, referred to below as “ray-summation (raysum)” images, to estimate the direction of knee joint tilts. They showed that raysum images are effective to a certain degree for estimating a tilt angle in radiography.
There are two major reasons for using simulated rather than real radiographs in this study. First, radiography equipment for which the relationship between the patient’s angle and the image is clear, such as the ORBIX (Siemens) in our hospital, is old in most hospitals around the world and cannot acquire enough images to serve as training data for deep learning. We therefore expected simulated radiographs to be much better suited as training data. Second, as a first step, we thought that simulation studies under a variety of data conditions and with known answers should be performed to clarify the feasibility of estimating the angles from radiographs. In the next step, we plan to validate the method with real test (validation) images taken with the ORBIX. Even in that step, we think that it would be better to use simulated radiographs as the training data.
In this study, simulated radiographs were generated from a set of 3-D head CT images. The head region was extracted from CT images acquired with a positron emission tomography/CT (PET/CT) scanner (Philips) and available from The Cancer Imaging Archive [9]. The slice thickness was 0.6 mm, 1.5 mm, 2.5 mm, or 3.0 mm, depending on the case. We used 45 cases without metal artifacts. Among them, 40 were used for training and five for validation.
Figure 2 shows the geometric arrangement for generating a simulated radiograph from a head CT image. The simulation was performed using occipitofrontal projections. The resulting simulated radiograph was a 2-D line-integral projection using cone-beam X-rays, with the black point in Fig. 2 indicating the location of the X-ray tube [11]. It is well known that a radiograph can be modelled mathematically as a 2-D line-integral projection of a 3-D CT image representing the spatial distribution of the X-ray attenuation coefficient. Figure 3 shows the process of the 2-D projection. The values of the 2-D projection were computed from the CT image using the following equation:

Illustration of simulating skull radiography. White and gray rectangular regions represent bed and image detector, respectively. A head rotates in lateral (black arrow) and superior-inferior (gray arrow) direction to simulate 2-D projection image. In the lateral direction, the right posteroanterior oblique view direction was set as positive, and in the superior-inferior direction, the direction corresponding to the extension was set as positive. The distance from the x-ray tube to the detector was 100 cm, and the distance from the detector to the table was 9 cm.

Process of computing the 2-D projection. (a) The 3-D CT image and the image detector location. The cube drawn with bold lines denotes the 3-D CT image. The hatched area denotes one detector element at the coordinate (x, z) where the X-ray enters the detector. The gray arrow denotes the X-ray that enters the detector element at the coordinate (x, z). (b) The i-th pixel that the X-ray passes through; l_i denotes the length of the intersection between the i-th pixel and the X-ray.
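The quantities in this caption suggest that each detector value is a sum, over the pixels crossed by one X-ray, of the intersection length l_i multiplied by the value of the i-th pixel. A minimal sketch of that ray sum is given below. The function name `ray_sum`, the symbol `mu` for the pixel values, and the sample numbers are illustrative assumptions, and the geometric step that finds the crossed pixels and their intersection lengths is omitted.

```python
def ray_sum(lengths, mu):
    """Discrete line integral along one ray: sum of l_i * mu_i over crossed pixels."""
    if len(lengths) != len(mu):
        raise ValueError("lengths and mu must have the same number of entries")
    return sum(l * m for l, m in zip(lengths, mu))


# Hypothetical ray crossing three pixels.
lengths_cm = [0.5, 1.0, 0.25]   # intersection length l_i in each pixel
mu_per_cm = [0.2, 0.4, 0.2]     # value of each crossed pixel
projection_value = ray_sum(lengths_cm, mu_per_cm)
print(projection_value)
```

Repeating this sum for every detector coordinate (x, z) yields one simulated radiograph.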
Figure 4 illustrates the procedure for determining the patient positioning using the CT image. For the lateral direction, the angle at which the left and right inner ear canals overlap was set to 0°. For the superior-inferior direction, the angle at which the internal auditory canal and the orbital cavity center are horizontal was set to 0°. The 0° angles in the lateral and superior-inferior directions were set by two radiologic technologists. When computing the projection image by simulation, the patient’s angle was confirmed using raysum images from three directions: superior-inferior, lateral, and occipitofrontal.

The procedure to determine patient position from a CT image. The black dots on the raysum image indicate the endpoints of the orbit. The white circle indicates the internal auditory canal. The line between the midpoint of the line connecting the two points of the orbital cavity and the center point of the internal auditory canal is set to be vertical.
The rotation angles in the lateral and superior-inferior directions were each varied from –10° to 9° in 1° increments. Hence, images with 400 different combinations of rotation angles (20 × 20) were created for each case. Figure 5 shows representative samples of the obtained projected images (radiographs), where each skull radiograph comprised 256×256 pixels. In deep learning for radiography, it is usual to extract a part of the image (region of interest) during pre-processing and use it as training data. However, in this study, we used an image of the entire head because using only the region of interest deteriorates estimation accuracy; further, radiologic technologists use the entire image when estimating the patient’s angle manually. Additionally, in the angle estimation of face photographs, three directions are normally estimated. In radiography, however, it is normal to represent the patient’s angle by the two angles corresponding to the lateral and superior-inferior directions. Therefore, these two angles were estimated in this study.

Representative samples from obtained simulated radiographs.
In this study, radiographs for 40 cases were used as training data. However, this number was insufficient to stably train the network with deep learning. Therefore, we performed data augmentation using the grid distortion technique. Figure 6 shows the images before and after applying the grid distortion process, where their difference images are also shown. The grid distortion is a process in which an image is partitioned into a number of small components (grids), and distortion is applied to each grid using predetermined parameter values. A standard data augmentation method applies rotation and translation to the original image.

Example images before and after applying the grid distortion and their difference image. (a) Original image, (b) image after grid distortion, and (c) difference image between (a) and (b).
However, this method is not suitable for angle estimation in radiography. Rotating the image creates an angle in the occipitofrontal direction, which is assumed to be 0° in this research. This invalidates the original relationship between the patient’s angle and the image. When we conducted a basic experiment using the rotation approach, the estimation error did not decrease despite the increase in the amount of training data. On the other hand, the estimation error decreased as expected when grid distortion was used, i.e. distorting the projection image reduced the error due to individual variation in the shape of the skull. Figure 7 shows typical cases with small and with large errors when the estimation was done without data augmentation. The case with large errors differed significantly in skull shape from the case with small errors, from which we conclude that the original training data of 40 cases alone were not sufficient to represent the individual variation in the shape of the skull. The error in the case with large errors was 4.33±3.48°. However, this error was reduced to 0.89±0.76° when the estimation was performed using training data that included the grid distortion images. In summary, grid distortion was powerful in compensating for individual variations in the shape of the skull in the occipitofrontal radiograph.
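To make the grid distortion idea concrete, the sketch below builds a one-dimensional coordinate map in which each grid cell is stretched or compressed by its own random factor. It is a simplified illustration and not the implementation in the Albumentations library; the function name `grid_distortion_map` and the parameter name `num_cells` are hypothetical, while the defaults (eight cells, ±0.1) follow the settings reported in this study.

```python
import random


def grid_distortion_map(size, num_cells=8, distort_limit=0.1, rng=None):
    """Return, for each output pixel index, the (float) source coordinate
    to sample from. Each cell is scaled by 1 + U(-limit, +limit)."""
    rng = rng or random.Random()
    cell = size // num_cells
    mapping = []
    source_start = 0.0
    for start in range(0, size, cell):
        width = min(cell, size - start)
        factor = 1.0 + rng.uniform(-distort_limit, distort_limit)
        source_width = width * factor
        # Spread the output pixels of this cell linearly over the source span.
        for k in range(width):
            mapping.append(source_start + source_width * k / width)
        source_start += source_width
    return mapping


mapping = grid_distortion_map(256, rng=random.Random(1))
print(mapping[:3], mapping[-1])
```

Applying such a map along both image axes, with interpolation between source pixels, changes local proportions of the skull without rotating the image as a whole, which is consistent with the argument above that the relationship between the patient’s angle and the image is preserved.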

Example of the case with large error and the case with small error. (a) The case with small error, and (b) the case with large error.
Grid distortion was implemented in this study as follows. We set the number of grids to eight for each image, and the distorted images were added to the training data. The distortion limit, which defines the range of distortion, was set to ±0.1, and p, the probability of applying the distortion, was set to 0.95. Linear interpolation was used as the interpolation algorithm, and constant extension was used as the pixel extrapolation method. These values were determined by trial and error. The grid distortions were implemented using the GridDistortion tool in the Albumentations library [1]. Using grid distortion, the amount of training data was increased to 32,000 images, i.e. 400 different rotation angles for each of the 40 cases, doubled by the data augmentation described above. The validation data comprised 2,000 images without data augmentation.
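The image counts quoted above follow from simple arithmetic, sketched here. The variable names are illustrative, and the only assumption is that "doubled" means one distorted copy is added for every original image.

```python
angles_per_direction = len(range(-10, 10))    # -10 deg to 9 deg in 1 deg steps: 20 values
angle_pairs_per_case = angles_per_direction ** 2   # lateral x superior-inferior: 400

training_cases = 40
validation_cases = 5

original_training_images = angle_pairs_per_case * training_cases   # 16,000
augmented_training_images = original_training_images * 2           # 32,000
validation_images = angle_pairs_per_case * validation_cases        # 2,000

print(angle_pairs_per_case, augmented_training_images, validation_images)
```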
Deep learning was implemented using PyTorch and a graphics processing unit (NVIDIA GeForce RTX 3090). The model was trained with the mean squared error (MSE) loss function and the Adam optimization method. Table 2 summarizes the values of the training parameters, which were determined by trial and error. In this study, the batch size was set to eight. Furthermore, the learning rate and weighting rate were optimized. Finally, the dropout rate and the value of the PReLU parameter were selected from several combinations so as to give the smallest estimation errors.
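For reference, the MSE loss over one mini-batch of eight samples with two outputs each can be written out by hand as below. This is a plain-Python sketch, not the PyTorch code used in the study; the function name `mse_loss` and the sample values are made up, and averaging over all batch elements and both outputs is assumed.

```python
def mse_loss(predictions, targets):
    """Mean of squared differences over every sample and every output."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(predictions, targets):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# One hypothetical batch: 8 samples, each with (lateral, superior-inferior)
# outputs on the normalized 0.0 to 1.0 scale.
targets = [[0.5, 0.5] for _ in range(8)]
predictions = [[0.6, 0.3] for _ in range(8)]
loss = mse_loss(predictions, targets)
print(loss)
```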
Values of training parameters. The learning rate was changed depending on the number of iterations.
In this 45-case study, 40 cases were used as training data and five cases were used for validation. The 45 cases were divided into nine groups; when one group was being validated, the other eight groups were used as training data after applying grid distortion to increase the size of the training data. Hereafter, we refer to the two estimated angles as “lateral angle” and “superior-inferior angle”, respectively. The estimation errors were evaluated based on the absolute value of the difference between the estimated and correct angles averaged for all the sample cases, which is computed as
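The error measure described here amounts to a mean absolute error. The sketch below computes it together with a standard deviation, and also shows one way to form the nine validation groups. The function names, the consecutive assignment of cases to groups, the use of the population standard deviation, and the sample angles are all assumptions made for illustration.

```python
import statistics


def make_folds(case_ids, n_folds=9):
    """Split the cases into n_folds consecutive, equally sized groups."""
    fold_size = len(case_ids) // n_folds
    return [case_ids[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]


def absolute_error_stats(estimated, correct):
    """Mean and standard deviation of |estimated - correct| in degrees."""
    errors = [abs(e - c) for e, c in zip(estimated, correct)]
    return statistics.mean(errors), statistics.pstdev(errors)


cases = list(range(45))
folds = make_folds(cases)
for validation_group in folds:
    training_group = [c for c in cases if c not in validation_group]
    # 40 training cases and 5 validation cases in every round.
    assert len(training_group) == 40 and len(validation_group) == 5

mean_err, sd_err = absolute_error_stats([1.0, -2.5, 4.0], [1.5, -2.0, 3.0])
print(round(mean_err, 3), round(sd_err, 3))
```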
Figure 8 shows the convergence of the loss function value for one of the nine groups during training, where the gray and black lines show the convergence for the training and validation data, respectively. As shown, the MSE value decreased with the number of iterations. Table 3 shows the estimation errors of the two angles in the lateral and superior-inferior directions. Figure 9 shows histograms of the distribution of the absolute values of the estimation errors for the 45 cases in 2° increments. Figure 10 summarizes the error values corresponding to each rotation angle. We observe that the estimation errors are, on average, less than 1° in both directions. For the case with the lowest error in the lateral direction, the error was 0.26±0.27°. For the case with the lowest error in the superior-inferior direction, the error was 0.22±0.16°. The largest error found was 1.06±0.44° in the lateral direction and 2.01±1.12° in the superior-inferior direction.

Convergence of loss function value. (a) Lateral direction, and (b) superior-inferior direction. Gray and black lines show convergence of loss function in training and validation data, respectively.
Summary of estimated error values of lateral and superior–inferior angles. Each value represents the average value of 45 cases

Histograms in 2° increments showing the distribution of the estimation errors for the 45 cases. (a) Lateral angle, and (b) superior-inferior angle. The histograms were computed with a bin size of 0.20° from the absolute values of the estimation errors. The depth direction of the bird’s-eye view shows the correct angle.

Error values of the estimated angles averaged over all the cases. (a) Lateral angle, and (b) superior-inferior angle. The graphs show the distribution of the data in quartiles. Boxes are drawn between the first and third quartiles. Lines in the boxes show the median. Solid lines indicate variability outside the upper and lower quartiles, and any point outside those lines is considered an outlier. Cross marks and circle marks show averages and outliers, respectively.
Figure 11 shows the estimation error for the same case summarized as a function of lateral angle and superior-inferior angle each of which ranges from –8° to 8° in 2° increments. The estimation error increases as the angle becomes larger in both the lateral and superior-inferior directions.

Error of estimated angle between images of the same case. (a) Lateral angle, and (b) superior-inferior angle.
We estimated the patient’s angle from a simulated skull radiograph. We believe that the results are positive, i.e. the estimation error was less than 1° on average for the 45 cases. Comparing the two directions, the lateral angle was estimated with less error than the superior-inferior angle. This may be because individual differences in how the image changes with rotation are larger in the superior-inferior direction than in the lateral direction. In future, it will be interesting to examine a preprocessing method to eliminate these factors. For example, we expect that the effects of these factors can be reduced by enlarging the training data so that they carry sufficient information on individual variations of skulls, which vary in size and shape. Furthermore, we note that Figs. 9 and 10 show that the estimation errors become larger as the angle increases in both the lateral and superior-inferior directions. Figure 10 also shows that, as the angle increases from 0°, the angle tends to be overestimated in both the lateral and superior-inferior directions.
We think that the validation result depends on the training data used. To verify this, we performed an additional experiment using training data different from those used in the above experiment, in which the patient’s angle was rotated from 0° to 19° in 1° increments. Figure 12 shows the estimation errors for the lateral and superior-inferior angles obtained with the new training data. The characteristics of the errors differed from those in the above experiment. Concerning the lateral angle, the error at a lateral angle of 0° was the smallest in the first experiment, as shown in Fig. 10(a). In the new experiment, however, the error at a lateral angle of 0° was larger than that at a lateral angle of 10°, as shown in Fig. 12(a). Likewise, the superior-inferior angles showed different characteristics depending on the training data. Therefore, to achieve more accurate estimation in future, the relationship between the training data and the accuracy of estimation needs to be investigated. Finally, to improve the accuracy, a training model that combines multiple different training data sets also needs to be investigated.

Graph of error values when the alternative training data, with the patient angle rotated from 0° to 19° in 1° increments, is used. (a) Lateral angle and (b) superior-inferior angle. The graphs show the distribution of the data in quartiles. Boxes are drawn between the first and third quartiles. Lines in the boxes show the median. Solid lines indicate variability outside the upper and lower quartiles, and any point outside those lines is considered an outlier. Cross marks and circle marks show averages and outliers, respectively.
Ohta et al. estimated the tilting direction of the patient using AlexNet. The differences between that previous study and this study, together with the advantages of this study, are summarized as follows. First, the proposed method is based on ResNet-RS; we conjecture that the success in estimating the patient’s angle itself, not only the tilting direction, with reasonable accuracy partly originates from the use of this network. Second, we used grid distortion for data augmentation and showed that it is powerful for this problem. Third, the proposed method estimates the patient’s angle, not only the tilting direction.
Summarizing the experimental results of this study, we estimated the patient’s angles with estimation errors of less than 1°. The estimation error of the angle between images of the same patient was less than 1° for almost every angle from -10° to 9°. In addition, we suggest that the accuracy of the developed methodology can be improved in future by examining preprocessing, increasing the number of cases, and studying the training data.
We developed a method using deep learning to estimate a patient’s angle from a skull radiograph. The estimation results were reasonable, with an average error of less than 1° in both the lateral and superior-inferior directions. However, we observed a tendency for the error to increase as the rotation angle becomes larger. This error depends on the training data. Therefore, we conclude that a higher accuracy can be expected by training with a set of appropriate data that matches each positioning. Additionally, because the number of training cases in this study was only 40, which is insufficient, the training data were augmented using the grid distortion technique. We expect that the estimation accuracy will be improved by increasing the number of original cases in the training data before grid distortion. In future, we will improve the accuracy of the method using the ideas described above and then validate it on a set of real clinical radiographs.
