Abstract
Real-time monitoring and correction of patient position are important in radiotherapy. However, the electronic portal imaging device (EPID) and cone beam computed tomography (CBCT) used in clinical practice introduce varying degrees of delay, so they cannot be used for real-time monitoring. A few products, such as CyberKnife, do offer real-time monitoring and correction, but they are too expensive for most people. The objective of this study is to develop and test a novel independent system, based on binocular localization and applicable to most accelerators, that monitors the treatment center and corrects the patient position. The system monitors the treatment center by tracking markers attached to the patient. Once the treatment center shifts, the system uses the Magic Finger, a device developed to operate the treatment bed controls automatically, to adjust the treatment bed position. To improve the monitoring accuracy, we trained a support vector machine (SVM) on data collected in the clinic; the trained classifier helps users judge how credible a monitoring result is. The experimental results showed that with this monitoring system the monitoring resolution reached 0.5 mm, and the error ratio of the judgment was less than 1.5%.
Introduction
Positioning is very important in radiotherapy, and its accuracy significantly influences the treatment efficiency. With frequent two- and three-dimensional imaging during a course of radiation treatment, using devices such as the electronic portal imaging device (EPID) and cone beam computed tomography (CBCT), the patient can be localized with high precision under the guidance of the imaging coordinates of the actual radiation treatment plan. This process is called image-guided radiation therapy (IGRT) [1–3]. With two orthogonal images taken by the EPID, the radiation therapist compares the EPID images with digitally reconstructed radiographs (DRRs) to determine the geometric shift between the planned and the actual lesion position, and adjusts the patient position according to the shift [4]. Alternatively, a kilovoltage CBCT, which takes less time than CT [5, 6], is used to acquire the anatomy of interest of each patient; the result is matched with DRRs from the planning CT to calculate the positioning error, and the physician adjusts the patient position until he or she is properly aligned in the treatment room.
However, changes in patient posture during treatment are commonly ignored. Because x-ray imaging is time consuming, the patient cannot be monitored in real time. In addition to the imaging delay, such imaging exposes the patient to an additional dose, which may cause irreversible damage. To address these problems, a few products offering real-time monitoring and correction have become available, such as CyberKnife [7, 8]. However, owing to the high cost of CyberKnife, most people, especially those in developing countries, cannot afford it and consequently cannot benefit from the accurate patient positioning that such a system provides.
This paper presents a novel independent system to monitor the treatment center and correct the patient position. The system is based on a binocular setup in which two cameras track infrared-sensitive markers fixed to the patient and thereby monitor the treatment center indirectly. Once the treatment center has shifted by a certain amount, i.e. it is no longer aligned with the isocenter of the linear accelerator, the system uses an in-house controller, called the Magic Finger, to move the treatment bed automatically according to the shift until the shift is within the clinical tolerance.
System composition
The system consists of three components: a binocular system, software that monitors the patient positioning in real time, and the Magic Finger, as shown in Fig. 1. The binocular system includes two cameras that take pictures of the region of interest and transfer the images to the software. The software processes the images, which includes extracting the markers and reconstructing them in three-dimensional space, and then computes the geometric shift between the planned and the actual lesion position by matching the data acquired in real time with the planning CT data. Before the treatment bed is moved, the software judges whether the calculated shift is credible. If it is credible, the treatment bed is adjusted to the correct position; otherwise, the bed is not moved.
The spatial resolution and the monitoring range were taken into account when the binocular system was designed. We chose a 1/2.8 in. grayscale complementary metal-oxide-semiconductor (CMOS) camera (PointGrey Flea3). With this type of camera, image acquisition was synchronized at the hardware level with the infrared light-emitting diode array. The image size was 2048 pixels × 1536 pixels, with a pixel size of 2.5 μm × 2.5 μm. The two cameras were mounted rigidly at a distance of 660 mm and an angle of 13°.
The Magic Finger was designed to control the movement of the treatment bed and is based on an Advanced RISC (reduced instruction set computing) Machine (ARM) processor. It contains three control units. Each unit consists of an actuator, a paddle and a drive rod, and controls one of the three levers on the control panel of the treatment bed, as shown in Fig. 2. The Magic Finger communicates with the software by cable.
Camera calibration and three-dimensional reconstruction
As shown in Fig. 3, the reconstruction work can be divided into two parts: camera calibration and real-time three-dimensional (3D) reconstruction. Before the system is used, the two cameras must be calibrated to estimate their intrinsic and extrinsic parameters. In the real-time 3D reconstruction part, we first capture synchronous images from the two cameras. After preprocessing the images, we extract the centers of the markers and match the centers between the left and right images. Finally, we complete the 3D reconstruction and correct the result.
Camera calibration
Camera model
The camera is assumed to be a standard perspective camera, so the transformation between the world coordinates and the image coordinates is given by Equation (1) [9, 10]:

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = A \begin{bmatrix} R & t \\ 0^{T} & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \tag{1} $$

where $s$ is the scale factor, $[u, v, 1]^{T}$ represents a two-dimensional (2D) point position in homogeneous image coordinates, and $[X_w, Y_w, Z_w, 1]^{T}$ is a 3D point position in homogeneous world coordinates. The 3 × 4 matrix $A$ contains five intrinsic parameters. The extrinsic parameters are described by a rotation matrix $R$ and a translation vector $t$, which denote the transformation from the 3D world coordinates to the 3D camera coordinates. The intrinsic matrix $A$ can be written as shown in Equation (2) [9, 10]:

$$ A = \begin{bmatrix} \alpha_x & \gamma & u_0 & 0 \\ 0 & \alpha_y & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \tag{2} $$

in which $\alpha_x = f/d_x$ and $\alpha_y = f/d_y$, where $d_x$ and $d_y$ are the scale factors relating pixels to distance along the $u$ and $v$ axes and $f$ is the focal length in units of length. $(u_0, v_0)$ is the principal point, i.e. the intersection of the optical axis with the imaging plane. $\gamma$ is the skew between the $u$ and $v$ image axes; ideally it is zero.
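The ideal model of Equations (1) and (2) can be illustrated with a minimal sketch in plain Python. The function name `project_point`, the 8 mm focal length and the principal point at the image center are assumptions made for illustration only; the pixel size and image size are the ones quoted for the cameras in the system description.

```python
def project_point(intrinsics, rotation, translation, world_point):
    """Project a 3D world point (mm) to pixel coordinates (u, v), ideal pinhole model."""
    alpha_x, alpha_y, gamma, u0, v0 = intrinsics
    # World -> camera coordinates: Xc = R * Xw + t
    xc, yc, zc = (
        sum(rotation[i][j] * world_point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if zc <= 0:
        raise ValueError("point is not in front of the camera")
    # The scale factor s of the projection equals the depth zc.
    u = (alpha_x * xc + gamma * yc) / zc + u0
    v = alpha_y * yc / zc + v0
    return u, v


# Hypothetical values: an 8 mm lens (not stated in the paper), 2.5 um pixels,
# zero skew, and the principal point assumed at the center of a 2048 x 1536 image.
FOCAL_MM = 8.0
PIXEL_MM = 0.0025
INTRINSICS = (FOCAL_MM / PIXEL_MM, FOCAL_MM / PIXEL_MM, 0.0, 1024.0, 768.0)
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

u, v = project_point(INTRINSICS, IDENTITY, [0.0, 0.0, 0.0], [100.0, 50.0, 2000.0])
```

With these assumed values, a point 2 m in front of the camera and 100 mm off axis lands about 160 pixels from the principal point. Lens distortion is deliberately left out of this sketch.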
Cameras calibrated with the ideal model of Equation (1) alone do not yield a correct reconstruction, because real cameras exhibit radial and tangential distortion. In this paper, we use a non-analytical method to correct these distortions.
Based on the proposed camera model, the intrinsic and extrinsic parameters of the cameras can be obtained easily. The calibration method is similar to the method proposed by Tsai [9], especially in estimating the intrinsic parameters. The sequence of steps can be summarized as follows:
1. Take pictures of 17×17×17 points in a 1 m × 1 m × 1.5 m space, arranged in 17 planes.
2. Extract the points from the pictures to obtain their image coordinates, and obtain the world coordinates of the corresponding points from the 3D coordinate measuring machine (3D CMM).
3. Compute the whole matrix encompassing the intrinsic and extrinsic parameters, i.e. the 3 × 4 matrix $H = A \begin{bmatrix} R & t \\ 0^{T} & 1 \end{bmatrix}$.
4. Compute the extrinsic and intrinsic parameters using the method proposed by Tsai [9].
5. Divide the space into 4×4×4 grids as shown in Fig. 4, reconstruct the 3D coordinates of the eight corner points of each grid using the whole matrix H given by step 3, and store them.
6. Finally, divide the four grids at the corners further into 2×2×2 grids as shown in Fig. 4, and reconstruct and store them as well.
3D reconstruction
By using the infrared light-emitting diode array and an infrared filter in front of each camera, we obtain images in which only infrared wavelengths pass while visible light is cut off. Because the markers are infrared sensitive, only the markers remain clearly visible in the filtered images, as shown in Fig. 5(b). By processing these images with the Open Source Computer Vision Library (OpenCV), a library of programming functions for real-time computer vision, we can easily calculate the positions of the marker centers in the images.
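The paper performs marker extraction with OpenCV and does not list the exact calls. As a library-free stand-in, the sketch below shows one plausible way to obtain marker centers from a filtered image: threshold the gray values, group bright pixels into 4-connected blobs, and take the intensity-weighted centroid of each blob. The function name `marker_centers` and the threshold of 128 are hypothetical.

```python
def marker_centers(image, threshold=128):
    """Return the intensity-weighted centroid (u, v) of each bright blob.

    image is a list of rows of gray values; a blob is a 4-connected region
    of pixels whose value is at least `threshold`.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    centers = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or image[r0][c0] < threshold:
                continue
            # Flood fill one blob, accumulating weighted coordinate sums.
            stack = [(r0, c0)]
            seen[r0][c0] = True
            total = sum_u = sum_v = 0.0
            while stack:
                r, c = stack.pop()
                weight = image[r][c]
                total += weight
                sum_u += weight * c  # u runs along columns
                sum_v += weight * r  # v runs along rows
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and image[nr][nc] >= threshold):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            centers.append((sum_u / total, sum_v / total))
    return centers


# Synthetic 6 x 8 test image with two bright blobs on a dark background.
IMAGE = [[0] * 8 for _ in range(6)]
for r, c in ((1, 1), (1, 2), (2, 1), (2, 2), (4, 5), (4, 6)):
    IMAGE[r][c] = 200
CENTERS = marker_centers(IMAGE)
```

Weighting by intensity gives sub-pixel centers, which matters when the target resolution is a fraction of a millimeter. A production version would also reject blobs that are too small or too large to be markers.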
Matching method
Assume that there are n points (marker balls) in the real scene; we then obtain n points in each image of the pair from the left and right cameras. Without matching the points between the left and right images, we would have to perform n × n reconstructions to recover the 3D scene. It is therefore very important to match the points between the left and right images. Binocular imaging is subject to a constraint called epipolar geometry [11–13], which we use for the matching. As shown in Fig. 6, the base line of the binocular system is the line connecting the two camera centers. An epipolar plane is a plane containing the base line and an arbitrary point in space. An epipolar point is the intersection of the base line with an imaging plane. An epipolar line is the intersection of an epipolar plane with an imaging plane, and all epipolar lines in an image intersect at the epipolar point. Ideally, the image of an arbitrary point in space always lies on the corresponding epipolar line, as shown in Fig. 6 [11, 14]. This relationship is expressed by Equation (3) [11]:
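$$ x'^{T} F x = 0 \tag{3} $$

where $x'$ and $x$ are the homogeneous image coordinates of the same point in the left and right cameras, respectively, and $F$ is the fundamental matrix (base matrix) of the camera pair, defined in Equation (4).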
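To illustrate how the epipolar constraint prunes candidate matches, the following sketch computes, for each point in the right image, its epipolar line in the left image and picks the left point closest to that line. The function names, the 1-pixel tolerance and the example fundamental matrix (that of an ideal rectified pair, where epipolar lines are image rows) are assumptions for illustration; the matching strategy actually used in the system is not detailed in the text.

```python
import math


def epipolar_line(F, right_point):
    """Return (a, b, c) of the line a*u + b*v + c = 0 in the left image: l = F x."""
    x = (right_point[0], right_point[1], 1.0)
    return tuple(sum(F[i][j] * x[j] for j in range(3)) for i in range(3))


def distance_to_line(line, left_point):
    """Perpendicular distance in pixels from a left-image point to a line."""
    a, b, c = line
    norm = math.hypot(a, b)
    if norm == 0.0:
        return math.inf  # degenerate line, cannot be matched
    return abs(a * left_point[0] + b * left_point[1] + c) / norm


def match_by_epipolar_constraint(F, left_points, right_points, max_distance=1.0):
    """Return (left_index, right_index) pairs that satisfy x'^T F x ~ 0."""
    if not left_points:
        return []
    matches = []
    for right_index, right_point in enumerate(right_points):
        line = epipolar_line(F, right_point)
        distances = [distance_to_line(line, p) for p in left_points]
        left_index = min(range(len(left_points)), key=distances.__getitem__)
        if distances[left_index] <= max_distance:
            matches.append((left_index, right_index))
    return matches


# Hypothetical fundamental matrix of an ideal rectified pair: v' = v.
F_RECTIFIED = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
LEFT = [(120.0, 50.2), (300.0, 199.7), (210.0, 120.1)]
RIGHT = [(100.0, 50.0), (190.0, 120.0), (280.0, 200.0)]
MATCHES = match_by_epipolar_constraint(F_RECTIFIED, LEFT, RIGHT)
```

This greedy version does not enforce one-to-one matches, and two markers lying on the same epipolar line remain ambiguous, so a real implementation would need an additional tie-breaking rule.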
Once the matching is done, we use the whole matrix H computed during calibration and the least squares (LS) method to complete a preliminary reconstruction. With the 3D coordinates from the preliminary reconstruction, we can locate each point in the grids defined during calibration. To eliminate the effects of camera distortion, we developed a near-linear correction method. As shown in Fig. 7, the transformation from grid A to A′ involves distortion and is not a rigorous affine transformation H, so an arbitrary point in A cannot be transformed exactly by H. However, if the grid were infinitely small, any point in the grid could be assumed to be transformed by H. It follows that if the grid is small enough, the transformation of any point inside it varies approximately linearly with its location within the grid. As a result, we can compute the transformation by tri-linear interpolation. The computation formula is given by Equation (5).
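The tri-linear correction can be sketched as follows. This is a simplified illustration, not the authors' formula: it assumes each grid cell is axis-aligned in the reconstructed space and stores, for each of its eight corners, a correction vector (true position minus reconstructed position). The function name and the example offsets are hypothetical; the cell size of 250 mm × 250 mm × 375 mm follows from dividing the 1 m × 1 m × 1.5 m space into 4×4×4 grids.

```python
def trilinear_correct(point, cell_min, cell_size, corner_offsets):
    """Correct a preliminarily reconstructed point inside one grid cell.

    corner_offsets[i][j][k] is the (dx, dy, dz) correction stored for the
    corner with index i, j, k in {0, 1} along x, y and z.
    """
    # Normalized position of the point inside the cell, each in [0, 1].
    a, b, c = ((point[n] - cell_min[n]) / cell_size[n] for n in range(3))
    correction = [0.0, 0.0, 0.0]
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                weight = ((a if i else 1.0 - a)
                          * (b if j else 1.0 - b)
                          * (c if k else 1.0 - c))
                for n in range(3):
                    correction[n] += weight * corner_offsets[i][j][k][n]
    return tuple(point[n] + correction[n] for n in range(3))


# Hypothetical corner corrections (mm): dx grows along x, dz shrinks along z.
OFFSETS = [[[(0.4 * i, 0.0, -0.2 * k) for k in (0, 1)] for j in (0, 1)] for i in (0, 1)]
CELL_MIN = (0.0, 0.0, 0.0)
CELL_SIZE = (250.0, 250.0, 375.0)
CORRECTED = trilinear_correct((125.0, 125.0, 187.5), CELL_MIN, CELL_SIZE, OFFSETS)
```

At the cell center all eight corners contribute equally, so the correction is the mean of the corner offsets; at a corner the stored offset is reproduced exactly. Points outside the cell would be extrapolated, so the enclosing cell should be looked up first.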
Tracking
After the reconstruction, the next step is real-time monitoring. Prior to the CT scan, several markers are attached to the thermoplastic mask. To monitor the patient positioning, we match the markers in the CT data to those acquired by the monitoring system. We then calculate the real-time rotation and translation matrices that represent the transformation from the CT coordinates to the optical coordinates, as shown in Equation (6) and Equation (7):
where $x_{ct}$, $y_{ct}$ and $z_{ct}$ are the 3D coordinates of any point in the CT coordinate system, while $x$, $y$ and $z$ are the 3D coordinates of the same point in the optical coordinate system. The transformation is described by the 4×4 matrix $P_{cp}$. $\gamma_{cp}$, $\varphi_{cp}$ and $\theta_{cp}$ are the rotation angles of roll, pitch and yaw, respectively. $T_x$, $T_y$ and $T_z$ are the translations in the left-right (LR), inferior-superior (IS) and anterior-posterior (AP) directions, respectively. Substituting the treatment center coordinates into Equation (6) gives the real-time treatment center in the monitoring coordinate system, and consequently its deviation from the isocenter in the LR, IS and AP directions, which is the geometric shift.
Once the transformation from the CT coordinates to the optical coordinates is calculated, the treatment center bias, defined as the deviation between the linear accelerator isocenter and the treatment center, can be estimated. In this paper, we use a support vector machine (SVM) to judge whether the calculated translation is credible. If it is credible, we move the treatment bed according to the treatment center bias.
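As a rough illustration of how a 4×4 homogeneous matrix maps the planned treatment center into the optical coordinate system and yields the shift, consider the sketch below. The rotation order (roll about x, then pitch about y, then yaw about z) and all function names are assumptions, since the composition order used in the paper is not recoverable from the text.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def rigid_transform(roll, pitch, yaw, tx, ty, tz):
    """Build a 4x4 matrix with R = Rz(yaw) * Ry(pitch) * Rx(roll) (assumed order)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    rx = [[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]]
    ry = [[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]]
    rz = [[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]]
    r = matmul(rz, matmul(ry, rx))
    return [r[0] + [tx], r[1] + [ty], r[2] + [tz], [0.0, 0.0, 0.0, 1.0]]


def transform_point(matrix, point):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    homogeneous = [point[0], point[1], point[2], 1.0]
    return tuple(sum(matrix[i][j] * homogeneous[j] for j in range(4)) for i in range(3))


def treatment_center_shift(matrix, center_ct, isocenter):
    """Return the (LR, IS, AP) deviation of the treatment center from the isocenter."""
    center_optical = transform_point(matrix, center_ct)
    return tuple(c - i for c, i in zip(center_optical, isocenter))


# Hypothetical numbers (mm, radians): a 90 degree yaw plus a translation.
P_CP = rigid_transform(0.0, 0.0, math.pi / 2, 10.0, 0.0, -5.0)
SHIFT = treatment_center_shift(P_CP, (1.0, 0.0, 0.0), (10.0, 0.0, -5.0))
```

In this example the treatment center ends up 1 mm from the isocenter along the second axis. In the system itself the matrix would be estimated from the matched markers rather than specified by hand.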
SVM classification
To judge whether the treatment center bias is credible, we use the SVM, a supervised learning model, to classify results as credible or incredible. For two training classes that are linearly separable in the selected feature space, the rationale of the SVM is to find a hyperplane that divides the two classes. The hyperplane is defined as [15]:

$$ w^{T} x + b = 0 \tag{8} $$

There may be an unlimited number of such planes that divide the feature space into two classes; the SVM aims to find the plane that maximizes the distance to the nearest data point on each side. The objective becomes [15, 16]:

$$ \min_{w, b} \; \frac{1}{2}\|w\|^{2} \quad \text{s.t.} \quad y_i \left( w^{T} x_i + b \right) \ge 1, \quad i = 1, \dots, N \tag{9} $$

where $x_i$ is the $i$-th training sample and $y_i \in \{-1, +1\}$ is its class label. In practice, however, there is often no hyperplane that separates the two classes completely. To deal with this problem, the SVM introduces a mapping function and a penalty function, and the hyperplane and the objective become [17]:

$$ w^{T} \phi(x) + b = 0, \qquad \min_{w, b, \xi} \; \frac{1}{2}\|w\|^{2} + c \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0 \tag{10} $$

where $w$ is the normal vector to the hyperplane, $c$ is the factor of the penalty function, $\phi(\cdot)$ is the mapping function, which defines the kernel function $K(x_i, x_j) = \phi(x_i)^{T} \phi(x_j)$, and $\xi_i$ is the tolerance error allowed for a sample point lying on the wrong side of the hyperplane. There are five common types of kernel functions; in this paper we investigate three of them, i.e. the linear kernel, the polynomial kernel and the radial basis function (RBF) kernel. The three functions can be expressed as [17–20]:
in which c is a constant, q is the order of the polynomial (set to 3 in this paper), γ is the gamma coefficient, and σ is the width of the RBF.
Because the form of the hyperplane and of the kernel function is fixed [21], supervised learning amounts to determining the optimal parameters of the model. Once the parameters are finalized, the judgment depends on the sign of $w^{T} \phi(x) + b$.
Once the credibility of the computation has been judged, we know how far the treatment bed should be shifted. However, commercially available accelerators do not provide users with a signal interface for controlling the movement of the treatment bed. As a result, we cannot control the bed movement directly by sending control signals to the accelerator. The only remaining option is to operate the control panel that the accelerator provides for the treatment bed. We therefore designed a novel device, the Magic Finger, to control the treatment bed movement automatically. As shown in Fig. 2, the Magic Finger has three paddles that operate the three levers on the panel, and each paddle is driven by an actuator through a drive rod.
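For orientation, the three kernels are sketched below in their commonly used textbook forms. These parameterizations are assumptions, not a transcription of the paper's equation: in particular, the RBF is written with 2σ² in the denominator, which is one of several conventions, and the argument names `gamma`, `coef0` and `degree` are borrowed from common SVM software rather than from the source.

```python
import math


def dot(x, y):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=0.0, degree=3):
    # (gamma * <x, y> + constant) ** order; degree 3 matches q = 3 in the text.
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, sigma=1.0):
    # exp(-||x - y||^2 / (2 * sigma^2)); assumed convention for the width sigma.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


K_LINEAR = linear_kernel([1.0, 2.0], [3.0, 4.0])
K_POLY = polynomial_kernel([1.0, 2.0], [3.0, 4.0], gamma=1.0, coef0=1.0, degree=3)
K_RBF = rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=1.0)
```

The RBF kernel depends only on the distance between samples and equals 1 for identical inputs, which makes it a natural choice when the features are residual distances, as they are here.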
Experimental results and discussion
Reconstruction results
The accuracy of the reconstruction was analyzed in a set of experiments in which a single marker was tracked at different positions set by a three-dimensional measuring machine with an accuracy of 0.05 mm. The positions were evenly distributed in the monitoring space, which has a height of 1.0 m, a width of 1.0 m and a depth of 1.5 m. As shown in Fig. 8, before correction of the reconstructed coordinates, the average error of the reconstruction using the whole matrix directly was 0.79 mm and the maximal error was 2.3 mm. After correction, the average error was reduced to 0.24 mm and the maximal error was kept within 1.0 mm.
We sorted the experimental points by z value. The results show that the reconstruction accuracy is affected to varying degrees by camera distortion at different positions. As shown in Fig. 9, before correction the reconstruction error increases with z, and the greater the distance between the marker and the central axis, the greater the reconstruction error. After correction, the influence of camera distortion was eliminated, as shown in Fig. 9.
Classification results
In this study, 64 sets of clinical data from 6 patients at Nanjing Drum Tower Hospital were collected, including the CT images, the synchronized coordinates of the six markers attached to the patients' thermoplastic masks as measured by the proposed system and by CT, the treatment bias detected by cone beam computed tomography (CBCT), and the corresponding bias calculated by the proposed system at the same time. We found that the credibility of the calculated results is related to the deformation of the mask, which is reflected by the markers attached to it. The classification input vector is defined as follows:
in which $H^{-1}$ is the inverse transformation matrix from the optical coordinate system (CS) to the CT CS, $P_w$ is the 3D coordinates of an arbitrary marker in the optical CS, and $P_{ct}$ is the 3D coordinates of the corresponding marker in the CT CS.
The Euclidean distance between the translation bias detected by CBCT and that detected by the proposed system was binarized with a threshold of 1.0 mm, so the translation bias detected by the proposed system received one of two labels: credible (Euclidean distance ≤ 1.0 mm) or incredible (Euclidean distance > 1.0 mm).
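The labeling rule can be written down directly. In the sketch below the function name and the example bias vectors are hypothetical, and the labels +1 and -1 stand for credible and incredible.

```python
import math


def credibility_label(cbct_bias, system_bias, threshold_mm=1.0):
    """Return +1 (credible) if the two bias vectors agree within the threshold, else -1."""
    distance = math.dist(cbct_bias, system_bias)  # Euclidean distance in mm
    return 1 if distance <= threshold_mm else -1


# Hypothetical (LR, IS, AP) translation biases in mm.
LABEL_CLOSE = credibility_label((1.0, -0.5, 2.0), (1.3, -0.1, 2.0))  # distance 0.5 mm
LABEL_FAR = credibility_label((0.0, 0.0, 0.0), (1.0, 1.0, 0.0))      # distance ~1.41 mm
```

A distance exactly equal to the threshold counts as credible, in line with the "≤ 1.0 mm" definition.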
To validate the classification method, we randomly selected a quarter of the clinical data as the validation set and used the remainder as the training set. For the three kernel functions, we used cross-validation to choose the best parameters c, g and r for each function, where c is the factor of the penalty function in Equation (10), g is the gamma coefficient in the polynomial kernel and sets the width of the RBF in the RBF kernel, and r is the constant coefficient in the polynomial kernel. This exhaustive search takes several cycles. As shown in Fig. 10, the classification accuracy of the linear, polynomial and RBF kernels is 75%, 87.5% and 93.75%, respectively, which corresponds to 12, 14 and 15 correct classifications out of the 16 validation sets. We therefore infer that the RBF kernel performs better than the other two kernel functions on the clinical data.
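The hold-out protocol and the accuracy figures can be reproduced in outline with the sketch below. The function names, the fixed random seed and the example predictions are hypothetical; only the 64 data sets, the one-quarter validation fraction and the reported percentages come from the text.

```python
import random


def holdout_split(samples, validation_fraction=0.25, seed=0):
    """Randomly split samples into (training, validation) lists."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_validation = round(len(samples) * validation_fraction)
    validation = [samples[i] for i in indices[:n_validation]]
    training = [samples[i] for i in indices[n_validation:]]
    return training, validation


def accuracy(predicted, actual):
    """Fraction of predictions that equal the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


TRAINING, VALIDATION = holdout_split(list(range(64)))

# Hypothetical outcome: 15 of 16 validation labels predicted correctly.
ACCURACY = accuracy([1] * 15 + [-1], [1] * 16)
```

With only 16 validation sets, each misclassification changes the accuracy by 6.25 percentage points, so the differences between kernels rest on one or two samples.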
This paper presents a novel system to monitor patient positioning and correct patient posture in real time. Monitoring is based on an in-house binocular system, and we propose a convenient calibration method that eliminates the camera distortion of the binocular system. To reduce the computational load of real-time data processing, we use epipolar geometry to match the feature points between the left and right images from the two cameras. To reduce the probability of erroneous operation of the in-house Magic Finger, we classify the monitoring results into two categories: credible and incredible. To achieve higher monitoring accuracy, our ongoing research will use higher-resolution camera lenses and CMOS sensors. In further research we will also reduce the response time of the Magic Finger to enable synchronous correction of the patient posture.
