A lightweight vehicle detection and tracking technique for advanced driving assistance systems

Abstract

In this paper, an advanced and reliable vehicle detection-and-tracking technique is proposed and implemented. The Real-Time Vehicle Detection-and-Tracking (RT_VDT) technique is well suited for Advanced Driving Assistance Systems (ADAS) applications or Self-Driving Cars (SDC). The RT_VDT is mainly a pipeline of reliable computer vision and machine learning algorithms that augment each other and take in raw RGB images to produce the required bounding boxes of the vehicles that appear in the front driving space of the car. The main contribution of this paper is the careful fusion of the employed algorithms, some of which work in parallel to strengthen each other, in order to produce a precise real-time output. In addition, the RT_VDT provides fast enough computation to be embedded in the CPUs that are currently employed by ADAS systems. The particulars of the employed algorithms together with their implementation are described in detail. Additionally, these algorithms and their various integration combinations are tested, and their performance is evaluated using actual road images and videos captured by the front-mounted camera of the car as well as the KITTI benchmark, on which 87% average precision is achieved. The evaluation of the RT_VDT shows that it reliably detects and tracks vehicle boundaries under various conditions.

Keywords

Computer vision, self-driving car, autonomous driving, ADAS, vehicle detection, vehicle tracking

1 Introduction

Increasing safety, reducing road accidents, and enhancing comfort and the driving experience are the major motivations behind equipping modern cars with Advanced Driving Assistance Systems (ADAS) [1]. In the past couple of decades, major car manufacturers have introduced many sophisticated ADAS functions [3] such as Electronic Stability Control (ESC), Anti-lock Brake System (ABS), Lane Departure Warning (LDW) [5], Lane Keep Assist (LKA) [6], etc. These functions represent steady incremental steps toward a hypothetical future of safe, fully autonomous vehicles [7].

The most recent ADAS functions, such as Collision Avoidance, Automated Highway Driving (Autopilot), Automated Urban Driving, Automated Parking and Cooperative Maneuvering, require increasingly fast and reliable detection and tracking of on-road vehicles [12], which is among the most complex and challenging tasks. In order to successfully detect the other vehicles on the road, potential vehicles must be accurately localized in camera images or LiDAR data, the relative position of these vehicles with respect to the road needs to be determined, and each vehicle’s movement direction should be assessed and verified as well.

Computer vision techniques are considered the main tools that provide the capabilities of sensing the surrounding environment for the detection, identification, and tracking of moving vehicles. The detection of vehicles consists mainly of finding specific patterns, features, or cues such as edges, gradients, colored segments, and color distributions in images. This kind of specification streamlines and guides the process of vehicle detection.

The approach used in this paper, given the name “Real-Time Vehicle Detection and Tracking” (RT_VDT), focuses on the delicate balance among the following three objectives:

Achieving accurate detection of road vehicles from images taken by the front-facing camera of the car.

Running fast enough for timely, accurate decision-making and further processing.

Remaining lightweight (i.e. in terms of memory requirements and computational overhead) so that it can run in real time on the low-cost CPUs that are commissioned in most ADAS modules.

Therefore, the employed approach integrates advanced handcrafted features extracted from camera images with a robust machine learning classification technique for vehicle detection. This approach achieves the following:

The extracted handcrafted features are flexible to integrate and tune, as several of them can be combined together to produce what is called the “feature vector”. This flexibility allows the incorporation of color channels of multiple color spaces in the feature vector. Moreover, it allows the adaptation of the RT_VDT pipeline by tuning only a limited number of parameters. It is not necessary to redesign the whole pipeline or retrain a whole neural network from scratch, as in deep-learning-based methods. This flexibility also helps to customize the RT_VDT for several camera resolutions (higher or lower) without major loss of accuracy. Additionally, future extensions or enhancements are much easier to accomplish, as the RT_VDT has a transparent structure compared to that of deep-learning-based methods, which usually have a black-box structure.

The execution of the employed advanced feature-extraction stages on affordable CPUs is considerably fast and does not need the incorporation of GPUs, as is usual in the case of deep-learning-based methods.

The computing resources required by the RT_VDT (in terms of memory and processing power) are much lower than those required by deep-learning techniques; thus, the RT_VDT is much more suitable for ADAS applications that run on traditional 32-bit scalar processors.

In vehicle detection, runtime is as important as accuracy. It is necessary to trade off between runtime and accuracy rather than sacrificing runtime to increase accuracy. The work surveyed below shows that deep-learning techniques have achieved considerable success in vehicle detection, with some performance improvement over traditional approaches. However, these techniques are computationally intensive, and even with the employment of expensive GPUs and multi-core processors, in most cases they could not reach acceptable real-time performance.

Wei et al. [13] proposed using deconvolution and fusion of CNN feature maps to add context and deeper features for better object detection and to address the object occlusion challenge. The proposed CNN enhancements are evaluated using the KITTI dataset [14] at 1280×384 image resolution. The evaluation experiments are run on expensive hardware: an Intel i7-7700k 4.20 GHz server with 8 CPU cores and 32 GB memory and an Nvidia GeForce GTX 1080 GPU. In spite of that, the best reported inference time per image is 0.24 s, which corresponds to a speed of only about 4 frames/second.

Moreover, Hu et al. [15] present a scale-insensitive convolutional neural network (SINet) for fast detection of vehicles with a large variance of scales. The authors also propose a context-aware ROI pooling and a multi-branch decision network to improve detection accuracy. Evaluation experiments have been conducted using the KITTI dataset on Ubuntu 14.04 with a single GPU (NVIDIA TITAN X) and 8 CPUs (Intel(R) Xeon(R) E5-1620 v3 @ 3.50 GHz). In spite of the extremely expensive hardware used, the best reported inference time per image is 0.2 s, which amounts to a speed of only 5 frames/second.

Xiao, in his thesis [16], adopts an advanced vehicle detection model that incorporates a residual neural network as the feature extractor and a region proposal network as the Region of Interest (RoI) candidate extractor. The model mainly handles the problem of large variation of scales to increase the performance of the vehicle detector. The model is evaluated on a GTX 1080 GPU with 11 GB memory and achieves 0.269 seconds per image inference (i.e. about 3.7 FPS).
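The frame rates quoted for the surveyed works follow directly from the reported per-image inference times. The short sketch below only reproduces that arithmetic; the dictionary and function names are illustrative and not part of any surveyed implementation.

```python
# Inference times (seconds per image) as reported in the surveyed works.
inference_times = {
    "Wei et al. [13]": 0.24,
    "Hu et al. [15]": 0.2,
    "Xiao [16]": 0.269,
}


def frames_per_second(seconds_per_image):
    """Throughput of a detector that processes one image at a time."""
    return 1.0 / seconds_per_image


fps = {name: frames_per_second(t) for name, t in inference_times.items()}

for name, value in fps.items():
    print(f"{name}: {value:.2f} FPS")
```

All three values stay at or below roughly 5 frames/second despite the GPU hardware used.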

The contribution of this paper can be summarized as follows:

Real-time performance: the proposed pipeline focuses on real-time speed to facilitate the deployment of ADAS object-detection features solely on inexpensive on-vehicle hardware. The objective is to achieve 10 FPS [17] without the employment of GPUs. The speed comes from using effective methods that do not depend on iterative searches but rather on a single scan per camera frame, and from concentrating the computation in the image sectors of higher interest.

Using multiple color spaces: the work employs multiple color spaces to improve the robustness of the feature extraction and to build a more comprehensive “feature vector”. The classification algorithm used is trained on multiple color spaces.

Flexibility and Adaptability: The RT_VDT pipeline can be adapted by tuning only a limited number of parameters. It is not necessary to redesign the whole pipeline or retrain a whole neural network from scratch, as in deep-learning-based methods. This flexibility also helps to customize the RT_VDT for several camera resolutions (higher or lower) without major loss of accuracy. Additionally, future extensions or enhancements are much easier to accomplish, as the RT_VDT has a transparent structure compared to that of deep-learning-based methods, which usually have a black-box structure.

Reusability: the proposed pipeline can be reused with some modifications to detect other objects on the road like pedestrians, cyclists, traffic signs and traffic lights.

2 Overview of the RT_VDT Algorithm

The RT_VDT algorithm is designed to utilize a single Charge-Coupled Device (CCD) camera. This camera should be mounted on the front-windshield mirror of the car to capture the front view of the road. Stereo cameras can also be employed; however, for convenience, only a single front camera is considered in this paper. In order to simplify the detection problem, it can be assumed that the setup makes the baseline horizontal, which ensures that “the horizon” is in the image and is parallel to the X-axis (the projected intersection of the left and right lines of the driving lane, after finding them using one of the techniques developed in [5], is referred to as “the horizon”). Nevertheless, for precision, the RT_VDT adjusts the image orientation using the calibration data of the front camera, in conjunction with removing the visual distortions. The following steps, as well as Fig. 1, depict the big picture of the pipeline, highlighting the integration and cooperation of the techniques used:

Fig. 1

The RT_VDT Pipeline.

Camera Calibration: The input to the RT_VDT algorithm is assumed to be a 1200×720 RGB color image. The first thing the algorithm does is to remove the distortion and adjust the orientation using a camera calibration method with chessboard images. This camera calibration is executed only once, at the initialization of the RT_VDT algorithm, rather than with every iteration/frame; hence, it does not affect the real-time performance.

Color-space Conversion: the calibrated image is then converted to grayscale as well as to several color spaces [18] (e.g. HSL, HSV, LAB, LUV, YUV, YCrCb, etc. [19]). Each color space carries its own specific or unique features that may improve the performance of the RT_VDT. Not all of these color spaces will be integrated into the final RT_VDT pipeline. Several of these color-space conversions are carried out for the sake of thorough study, analysis, and testing, and the final pipeline is determined and explained in detail in the discussion section of this paper.

Feature Extraction: After the grayscale and color space conversion step, several features will be extracted from the calibrated-camera images such as the Histogram of Oriented Gradients (HOG) [20], color spatial features [21] and color histogram features [22]. The extraction methods of these features will be explained in detail in the next sections. These features are then combined together to produce what is called “feature vectors”.

Vehicle/non-vehicle Classification: These feature vectors are then fed to a vehicle/non-vehicle classifier, built and trained offline by the Support Vector Machine (SVM) algorithm [23], to identify which feature vectors represent a vehicle and which do not. The complete construction and training of the SVM classifier will be discussed in detail in Section 6.

Potential Vehicle-Object Detection: After the vehicle/non-vehicle classification of the feature vectors, the potential vehicle objects are detected and localized in the calibrated camera images using the SVM classifier in conjunction with sliding windows that scan these images. The scan is not carried out on the full calibrated camera image; instead, a Region of Interest (ROI) is defined and extracted from each image, and the exhaustive search is performed within it. Accordingly, the undesired image details are masked to speed up the detection of vehicle boundaries and to improve the focus and accuracy of the detection procedure, which results in potential car boxes.

Vehicle-Bounding-Box Size Determination and Labeling: The results of the above scanning procedure are used to build active heat-maps that accumulate potential car boxes. The overlapped detected true-positive car boxes are then grouped in bigger boxes and labeled accordingly.

Vehicle-Bounding-Box Drawing: As a final step, the labeled boxes are drawn on the original camera image or video frame. For illustration, working examples of the resultant vehicle boundaries are displayed on the original color image as shown in Fig. 2 and Fig. 3.

Fig. 2

Detected Vehicle boundaries by the RT_VDT algorithm.

Fig. 3

Detected Vehicles’ boundaries by the RT_VDT algorithm.
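The heat-map step of the pipeline can be illustrated with a minimal sketch. It assumes that candidate boxes are given as (x1, y1, x2, y2) pixel coordinates with exclusive upper bounds and that a simple count threshold is used to reject isolated detections; the function names and the threshold value are illustrative, not the paper's implementation. For brevity, the sketch merges all hot pixels into one box, whereas separating several vehicles would additionally require connected-component labeling.

```python
import numpy as np


def build_heatmap(image_shape, boxes):
    """Add 1 to every pixel covered by each candidate box.

    Boxes are (x1, y1, x2, y2) with x2 and y2 exclusive.
    """
    heat = np.zeros(image_shape, dtype=np.int32)
    for x1, y1, x2, y2 in boxes:
        heat[y1:y2, x1:x2] += 1
    return heat


def hot_region_box(heat, threshold):
    """Bounding box of all pixels whose heat exceeds the threshold, or None."""
    ys, xs = np.nonzero(heat > threshold)
    if ys.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)


# Three overlapping candidates around one vehicle and one isolated candidate.
candidates = [(10, 10, 50, 50), (20, 20, 60, 60), (30, 15, 70, 55), (200, 100, 240, 140)]
heat = build_heatmap((200, 300), candidates)   # (rows, columns)
merged = hot_region_box(heat, threshold=1)     # keep pixels hit at least twice
print(merged)
```

In this example the isolated candidate is covered by only one window and therefore does not contribute to the merged box.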

3 Histogram of Oriented Gradients

The HOG is a feature descriptor used in computer vision and image processing for the purpose of object detection [20].

For instance, to detect a specific object ‘O_bj’ in a camera image the following steps can be followed:

The camera image is converted to grayscale.

Start by constructing a rectangular (or square) window that is 64 pixels tall by 64 pixels wide (the dimensions of the window are arbitrary and depend on the designer's choice).

Use it to scan the grayscale camera image searching for O_bj. The search is done by sliding the window both horizontally and vertically with a stride of 8 pixels (as an example).

The object O_bj may of course have different sizes and occupy a bigger or smaller part of the image. Therefore, the analysis should be done not only on the original starting window (64×64) but also on a series (pyramid) of windows with an increment of 16 pixels (as an example), like 80×80, 96×96, 112×112, etc. This pyramid of windows corresponds to larger portions of the original camera image, where O_bj or part of it could be inside one of them.

In each step of the windows slide, the HOG features are computed and get associated with the center position of the corresponding window as a matter of “feature localization”.
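The scanning procedure above can be sketched as a pair of generators. The default values (64-pixel starting window, 8-pixel stride, 16-pixel increment) are the example values given in the text; the function names and the 128×96 test image size are illustrative assumptions.

```python
def sliding_windows(image_width, image_height, window=64, stride=8):
    """Yield (x, y, size) for every window position that fits in the image."""
    for y in range(0, image_height - window + 1, stride):
        for x in range(0, image_width - window + 1, stride):
            yield x, y, window


def window_pyramid(image_width, image_height, start=64, increment=16, stride=8):
    """Scan with a series of growing square windows (64, 80, 96, ...)."""
    size = start
    while size <= min(image_width, image_height):
        yield from sliding_windows(image_width, image_height, size, stride)
        size += increment


def window_center(x, y, size):
    """Position with which the features of a window are associated."""
    return x + size // 2, y + size // 2


base = list(sliding_windows(128, 96))
pyramid = list(window_pyramid(128, 96))
print(len(base), len(pyramid))
```

Each yielded window would then be passed to the feature extractor, with the resulting features associated with `window_center(x, y, size)`.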

To compute the HOG features, the input to the algorithm is expected to be a certain window ‘W_I’ from a gray-level image, possibly from a pyramid, and the workflow continues as follows and as shown in Fig. 4:

Fig. 4

The Histogram of Oriented Gradients Workflow.

Calculate the two gradient components G_x and G_y of the gradient of W_I by central differences: $G_{x} (r, c) = W_{I} (r, c + 1) - W_{I} (r, c - 1)$ (1) $G_{y} (r, c) = W_{I} (r - 1, c) - W_{I} (r + 1, c)$ (2) where r and c are the corresponding row and column numbers of the pixels in window W_I.

The calculated gradient is then converted to polar coordinates as below, with the angle constrained to be between 0° and 180°, so that gradients that point in opposite directions are identified with each other: $G = \sqrt{G_{x}^{2} + G_{y}^{2}}$ (3) $θ = \frac{180}{π} (\tan_{2}^{- 1} (G_{y}, G_{x}) \mod π)$ (4) where $\tan_{2}^{- 1}$ is the four-quadrant inverse tangent, which yields values between -π and π.

Construct the cell orientation histograms by dividing the window W_I into adjacent, non-overlapping cells of size C×C pixels (could be C = 8). In each cell, calculate the histogram of gradient orientations that are enclosed (binned) into B bins (could be B = 9). If the bins are numbered 0 through B-1 and have width $w = \frac{180}{B}$ , then bin i has boundaries [wi, w (i + 1)] and center $c_{i} = w (i + \frac{1}{2})$ . A pixel with magnitude G and orientation θ contributes a vote of:

$v_{j} = G \frac{c_{j + 1} - θ}{w}$ to bin number $j = \lfloor \frac{θ}{w} - \frac{1}{2} \rfloor \mod B$ (5) and a vote of: $v_{j + 1} = G \frac{θ - c_{j}}{w}$ to bin number $(j + 1) \mod B$ (6) This scheme is called voting by bilinear interpolation, and the resulting cell histogram is a vector with B non-negative entries.

The block normalization step is then carried out by grouping the cells together into overlapping blocks of 2×2 cells each. Therefore, each block has a size of 2C×2C pixels. Accordingly, every two horizontally or vertically consecutive blocks overlap by two cells, that is, the block stride is C pixels. Consequently, each internal cell is covered by four blocks. The four cell histograms in each block are concatenated into a single block feature b, and the block feature b is then normalized by its Euclidean norm as: $b \leftarrow \frac{b}{\sqrt{∥ b ∥^{2} + ɛ}}$ (7) where ɛ is a small positive constant that prevents division by zero in gradient-less blocks.

The normalized block features are then concatenated into a single HOG feature vector h, which is normalized as follows: $h \leftarrow \frac{h}{\sqrt{∥ h ∥^{2} + ɛ}}$ (8) $h_{n} \leftarrow \min (h_{n}, τ)$ (9) Here, h_n is the n^th entry of h and τ is a positive threshold (τ = 0.2). Clipping the entries of h to be no greater than τ (after the first normalization) ensures that very large gradients do not have too much influence; otherwise, they would end up washing out all other image detail. After clipping, h is normalized once more as in Equation (8); this final normalization makes the HOG feature independent of overall image contrast. An example of the output of the algorithm is shown in Fig. 5.

Fig. 5

Results of applying HOG.
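The HOG workflow can be sketched with NumPy for a single cell. This is a minimal illustration under stated assumptions, not the implementation used in this work: border pixels are given zero gradient, the vote distance is wrapped around 180° so that angles below w/2 are shared between the last and first bins, and the helper names are hypothetical. The test window is a synthetic horizontal intensity ramp.

```python
import numpy as np


def gradients(window):
    """Central differences of Equations (1)-(2); border pixels stay at zero."""
    w = window.astype(float)
    gx = np.zeros_like(w)
    gy = np.zeros_like(w)
    gx[:, 1:-1] = w[:, 2:] - w[:, :-2]
    gy[1:-1, :] = w[:-2, :] - w[2:, :]
    return gx, gy


def magnitude_and_angle(gx, gy):
    """Polar form, with the angle folded into [0, 180) degrees."""
    mag = np.sqrt(gx ** 2 + gy ** 2)
    theta = np.degrees(np.arctan2(gy, gx) % np.pi)
    return mag, theta % 180.0  # guard against a rounded value of exactly 180


def cell_histogram(mag, theta, bins=9):
    """Bilinear voting over all pixels of one cell."""
    w = 180.0 / bins
    hist = np.zeros(bins)
    for g, t in zip(mag.ravel(), theta.ravel()):
        j = int(np.floor(t / w - 0.5)) % bins
        c_j = w * (j + 0.5)
        d = (t - c_j) % 180.0          # distance from the centre of bin j
        hist[j] += g * (w - d) / w     # vote for bin j
        hist[(j + 1) % bins] += g * d / w  # vote for bin j + 1
    return hist


def normalize_clip(h, tau=0.2, eps=1e-6):
    """Normalize, clip at tau, then normalize again."""
    h = h / np.sqrt(np.sum(h ** 2) + eps)
    h = np.minimum(h, tau)
    return h / np.sqrt(np.sum(h ** 2) + eps)


window = np.tile(np.arange(10.0), (10, 1))      # horizontal ramp, 10x10
gx, gy = gradients(window)
mag, theta = magnitude_and_angle(gx, gy)
hist = cell_histogram(mag[1:9, 1:9], theta[1:9, 1:9])  # one 8x8 cell
feature = normalize_clip(hist)
```

For the ramp, every pixel of the cell has orientation 0°, which lies halfway between the centres of the last and the first bin, so its magnitude is split equally between those two bins.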

4 Support vector machine classifier

Support Vector Machine (SVM) [26] is a supervised learning model with an associated learning algorithm that analyzes data used for classification and regression analysis [27]. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.

Given a training dataset of n points of the form $({\vec{x}}_{1}, y_{1}), \dots, ({\vec{x}}_{i}, y_{i}), \dots, ({\vec{x}}_{n}, y_{n})$ (10) where the y_i are either 1 or -1, each indicating the class to which the point ${\vec{x}}_{i}$ belongs, and each ${\vec{x}}_{i}$ is a p-dimensional real vector. It is required to find the “maximum-margin hyperplane” that divides the group of points ${\vec{x}}_{i}$ for which y_i = 1 from the group of points for which y_i = -1, which is defined so that the distance between the hyperplane and the nearest point ${\vec{x}}_{i}$ from either group is maximized.

Any hyperplane can be written as the set of points $\vec{x}$ satisfying $\vec{w} . \vec{x} - b = 0$ (11) where $\vec{w}$ is the normal vector to the hyperplane. The parameter $\frac{b}{∥ \vec{w} ∥}$ determines the offset of the hyperplane from the origin along the normal vector $\vec{w}$ .

If the training data is linearly separable, the optimization problem can be written as follows: $\text{minimize } ∥ \vec{w} ∥ \text{ subject to } y_{i} (\vec{w} . {\vec{x}}_{i} - b) ⩾ 1, \text{ for } i = 1, 2, \dots, n$ (12)

The $\vec{w}$ and b that solve this problem determine our classifier, $\vec{x} \mapsto sgn (\vec{w} . \vec{x} - b)$ .

If the training data is not linearly separable, the hinge loss function is introduced as $max (0, 1 - y_{i} (\vec{w} . {\vec{x}}_{i} - b))$ (13)

This function is zero if the constraint $y_{i} (\vec{w} . {\vec{x}}_{i} - b) ⩾ 1$ is satisfied, in other words, if ${\vec{x}}_{i}$ lies on the correct side of the margin. For data on the wrong side of the margin, the function’s value is proportional to the distance from the margin. Then the optimization function will be solved:

$\text{minimize } [\frac{1}{n} \sum_{i = 1}^{n} max (0, 1 - y_{i} (\vec{w} . {\vec{x}}_{i} - b))] + λ ∥ \vec{w} ∥^{2}$ (14)

where the parameter λ determines the tradeoff between two opposing requirements: one is increasing the margin size and the other is ensuring that the ${\vec{x}}_{i}$ lie on the correct side of the margin. Accordingly, for sufficiently small values of λ, the second term in the loss function becomes negligible; consequently, the classifier behaves similarly to the hard-margin SVM if the input data are linearly classifiable. However, it will still learn whether a classification rule is viable or not.
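As a small worked example of the hinge loss and of the soft-margin objective, the sketch below evaluates both for a hand-picked hyperplane and four two-dimensional points. The data, the hyperplane, and the value of λ are arbitrary illustrative choices.

```python
def hinge_loss(w, b, x, y):
    """max(0, 1 - y (w . x - b)) for one labelled point."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) - b)
    return max(0.0, 1.0 - margin)


def soft_margin_objective(w, b, points, labels, lam):
    """Mean hinge loss plus lam times the squared norm of w."""
    n = len(points)
    mean_hinge = sum(hinge_loss(w, b, x, y) for x, y in zip(points, labels)) / n
    return mean_hinge + lam * sum(wi * wi for wi in w)


points = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (0.5, 0.0)]
labels = [1, 1, -1, -1]
w, b = (1.0, 0.0), 0.0          # hyperplane x_1 = 0

losses = [hinge_loss(w, b, x, y) for x, y in zip(points, labels)]
objective = soft_margin_objective(w, b, points, labels, lam=0.1)
print(losses, objective)
```

Only the last point, which carries the label -1 but lies on the positive side of the hyperplane, contributes a non-zero loss.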

Suppose that a nonlinear classification rule needs to be learned, and that this nonlinear rule corresponds to a linear classification rule for the transformed data points $φ ({\vec{x}}_{i})$ . Additionally, suppose that a kernel function k is given which satisfies $k ({\vec{x}}_{i}, {\vec{x}}_{j}) = φ ({\vec{x}}_{i}) . φ ({\vec{x}}_{j})$ . Accordingly, the classification vector $\vec{w}$ in the transformed space satisfies $\vec{w} = \sum_{i = 1}^{n} c_{i} y_{i} φ ({\vec{x}}_{i})$ (15) where the c_i’s are obtained by solving the optimization problem

$\text{maximize } f (c_{1} \dots c_{n}) = \sum_{i = 1}^{n} c_{i} - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} y_{i} c_{i} k ({\vec{x}}_{i}, {\vec{x}}_{j}) y_{j} c_{j}, \text{ subject to } \sum_{i = 1}^{n} c_{i} y_{i} = 0 \text{ and } 0 ⩽ c_{i} ⩽ \frac{1}{2 n λ} \text{ for all } i$ (16)

The coefficients c_i can be found using quadratic programming [28], and the offset b is then obtained from a point ${\vec{x}}_{i}$ that lies on the margin boundary: $b = \vec{w} . φ ({\vec{x}}_{i}) - y_{i} = [\sum_{k = 1}^{n} c_{k} y_{k} k ({\vec{x}}_{k}, {\vec{x}}_{i})] - y_{i}$ (17)

Finally, new points ( $\vec{z}$ ) can be classified by computing

$\vec{z} \mapsto sgn (\vec{w} . φ (\vec{z}) - b) = sgn ([\sum_{k = 1}^{n} c_{k} y_{k} k ({\vec{x}}_{k}, \vec{z})] - b)$ (18)
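The kernel form of the classifier can be illustrated with a toy example in which the coefficients c_i are assumed to be already known (they are picked by hand here rather than obtained from a quadratic-programming solver). A linear kernel is used only so that the result is easy to verify by hand; any other kernel function could be passed in its place. All names are illustrative.

```python
def linear_kernel(a, b):
    """k(a, b) = a . b, i.e. the feature map phi is the identity."""
    return sum(ai * bi for ai, bi in zip(a, b))


def kernel_sum(coeffs, labels, support, kernel, point):
    """Sum over k of c_k * y_k * k(x_k, point)."""
    return sum(c * y * kernel(x, point) for c, y, x in zip(coeffs, labels, support))


def offset(coeffs, labels, support, kernel, i):
    """Offset b from training point i, assumed to lie on the margin boundary."""
    return kernel_sum(coeffs, labels, support, kernel, support[i]) - labels[i]


def classify(coeffs, labels, support, kernel, b, z):
    """Sign of the kernel expansion, evaluated at the new point z."""
    return 1 if kernel_sum(coeffs, labels, support, kernel, z) - b >= 0 else -1


# Two training points on opposite sides of the line x_1 = 0.
support = [(1.0, 0.0), (-1.0, 0.0)]
labels = [1, -1]
coeffs = [0.5, 0.5]             # hand-picked; satisfies sum(c_i * y_i) = 0

b = offset(coeffs, labels, support, linear_kernel, 0)
print(b, classify(coeffs, labels, support, linear_kernel, b, (3.0, 2.0)))
```

With these coefficients the expansion reduces to the first coordinate of z, so points with a positive first coordinate are assigned to class 1.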

5 Camera calibration

The conversion from a three-dimensional (3D) real-world scene to a two-dimensional (2D) image, as performed by a camera, results in image distortion, as the transformation from 3D→2D is not perfect. In effect, the shape and size of objects in the resulting 2D image are distorted (changed) relative to their original 3D appearance. Therefore, before using the resulting 2D camera images, this distortion needs to be undone so that correct and useful information can be extracted and analyzed.

Real cameras use a curved lens to form an image. The light rays usually bend around the edges of these lenses to a lesser or greater degree, depending on the focus and position of objects. Therefore, distortion occurs at the edges of the images, so that lines or objects appear to be more or less curved than they actually are. This effect is called “radial distortion” and represents the principal source of distortion.

Moreover, there is another main source of distortion that is the “tangential distortion”. This distortion happens when the camera’s lens is not perfectly aligned parallel to the image plane that is associated with the camera sensor. This produces a tilt effect to the image, which shows objects nearer or farther away than they actually are.

There are three needed coefficients to correct for radial distortion: k₁, k₂, and k₃. To correct the appearance of radially distorted points in an image, one can use a correction formula.

In the following Equation (19) and Equation (20), (x_distorted, y_distorted) is a point in a distorted image. To undistort these points, the first step is to use OpenCV [29] to calculate r, which is the known distance between a point in an undistorted (corrected) image, denoted (x_ideal, y_ideal) in the equations, and the center of the image distortion, which is often the center of that image (x_c, y_c). This center point (x_c, y_c) is sometimes referred to as the distortion center. These points are illustrated below in Fig. 6. $x_{distorted} = x_{ideal} (1 + k_{1} r^{2} + k_{2} r^{4} + k_{3} r^{6})$ (19) $y_{distorted} = y_{ideal} (1 + k_{1} r^{2} + k_{2} r^{4} + k_{3} r^{6})$ (20)

Fig. 6

Points in a distorted and undistorted (corrected) images.
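The radial model can be sketched in a few lines. The sketch assumes the multiplicative polynomial form used by OpenCV, with coordinates measured relative to the distortion center, and it inverts the model by a simple fixed-point iteration, which is one of several possible ways to undistort a point. The coefficient values are arbitrary and do not come from the calibration performed in this work.

```python
def radial_factor(x, y, k1, k2, k3):
    """1 + k1 r^2 + k2 r^4 + k3 r^6, with r measured from the distortion center."""
    r2 = x * x + y * y
    return 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3


def distort_radial(x, y, k1, k2, k3):
    """Map an ideal (undistorted) point to its radially distorted position."""
    f = radial_factor(x, y, k1, k2, k3)
    return x * f, y * f


def undistort_radial(xd, yd, k1, k2, k3, iterations=50):
    """Invert the model by fixed-point iteration, starting from the distorted point."""
    x, y = xd, yd
    for _ in range(iterations):
        f = radial_factor(x, y, k1, k2, k3)
        x, y = xd / f, yd / f
    return x, y


k1, k2, k3 = -0.2, 0.05, 0.0             # illustrative coefficients only
ideal = (0.5, 0.3)
distorted = distort_radial(*ideal, k1, k2, k3)
recovered = undistort_radial(*distorted, k1, k2, k3)
print(distorted, recovered)
```

With a negative k1, as in this example, points are pulled toward the distortion center, and the iteration recovers the ideal point from the distorted one.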

There are two more coefficients that account for tangential distortion: p₁ and p₂, and this distortion can be corrected using a different correction formula as given by Equations (21) and (22). $x_{corrected} = x + [2 p_{1} xy + p_{2} (r^{2} + 2 x^{2})]$ (21) $y_{corrected} = y + [p_{1} (r^{2} + 2 y^{2}) + 2 p_{2} xy]$ (22)

To correct for the mentioned distortions, images of known shapes (chessboard images) are used. Selected points in the distorted planes are then mapped to undistorted planes as shown in Fig. 7. Accordingly, the camera images will be calibrated. The following procedure is implemented to undistort the captured camera images and improve the image quality:

Fig. 7

Mapping from a distorted chessboard image to an undistorted one.

Step 1 – finding the chessboard corners: Using 20 chessboard images that have different sizes and orientations as depicted in Fig. 8, the “cv2.findChessboardCorners()” function from the OpenCV 3 library [29] is used to locate the chessboard corners. A grid of 9×6 corners is detected in 17 of the 20 images depicted in Fig. 8. In the other 3 images, only 9×5 corners have been detected. The corners are drawn using the “cv2.drawChessboardCorners()” function of OpenCV 3.

Step 2 – getting the camera matrices: A test chessboard image that has not been used in finding the corners is converted to grayscale and used, along with the corners found in Step 1, to find the camera matrices. The “cv2.calibrateCamera()” function is used to perform this step. To check the quality of the calibration, the gray test image is used together with the camera matrices to remove the distortion of this image, as shown in Figure 9.

Step 3 – saving the camera matrices: using the Pickle library [30], the camera data (the camera matrix as well as the distortion coefficients) are saved in the pickle file “camera_calibration.p” for easy retrieval later.

Fig. 8

Chessboard images used for calibration with corners drawn.

Fig. 9

A test chessboard image with distortion removal.

Figure 10 provides an example of applying the camera calibration procedure on one of the test images.

Fig. 10

Camera calibration effect (undistortion of images).

6 Implementation of the support vector machines classifier

In this section, the steps to build a classifier based on the SVM algorithm described in Section 4 will be explained in detail, and it is given the abbreviation “SVMC”.

6.1 Training data preparation

The data preparation steps to train the SVMC are summarized as follows:

The data supplied by Udacity [31, 32]: the Udacity supplied data have been used throughout this work. The data consists of almost balanced “non-vehicles” and “vehicles” images:

The “non-vehicles” collections consist of the “GTI” collection [33] and the “Extras”. Together, they contain 8968 RGB images of size (64, 64, 3) pixels.

The “vehicles” collections consist of the “GTI” collection and the “KITTI” collection [14]. Together, they contain 8792 RGB images of size (64, 64, 3) pixels.

These collections have an unzipped size of 149 MB.

Data Augmentation: The data is augmented by flipping all the images around the “Y” axis. As a result, the training data become a total of 35,520 images.
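The augmentation step and the resulting data-set size can be checked with a short NumPy sketch. It assumes that flipping around the “Y” axis means a left-right mirror of each image and that the images are stored as an array of shape (N, 64, 64, 3); the random batch merely stands in for real images, and the names are illustrative.

```python
import numpy as np


def augment_with_flips(images):
    """Append a left-right mirrored copy of every image in an (N, H, W, C) array."""
    flipped = images[:, :, ::-1, :]
    return np.concatenate([images, flipped], axis=0)


rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)  # stand-in images
augmented = augment_with_flips(batch)

# Size of the full training set after augmentation.
n_non_vehicles = 8968
n_vehicles = 8792
total_after_flip = 2 * (n_non_vehicles + n_vehicles)
print(augmented.shape, total_after_flip)
```

The first column of each mirrored image equals the last column of its original, and the total of 35,520 images matches the figure quoted above.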

6.2 Training data visualization

The following steps describe the implemented data visualization steps in order of execution:

Display of Vehicles Data: 50 randomly selected images of the vehicle data have been displayed as shown in Figure 11. Each image has its order in the training data as a title.

Display of Non-Vehicles Data: 50 randomly selected images of the non-vehicle data have been displayed as shown in Figure 12. Each image has its order in the training data as a title.

Display of HOG features of Vehicles Data: A selected image of the vehicle data has been used to extract its HOG features after converting it to grayscale. Moreover, the HOG features of non-vehicle examples are also extracted, and the results of both are shown in Figure 13.

Fig. 11

Visualization of 50 randomly selected vehicle images.

Fig. 12

Visualization of 50 randomly selected non-vehicle images.

Fig. 13

Visualization of HOG features for vehicles and non-vehicle images.

6.3 Feature extraction

The following steps describe the implemented image feature-extraction functions in order of execution:

Color Spatial Features: a function is implemented to extract the contribution of the different color channels in each image, in other words, to compute the binned color features. Each channel of each image is resized to (32×32) and then raveled.

Color Histogram Features: a function is implemented to compute the histogram of each color channel in each image with a designated number of bins, and then concatenate them.

HOG Features: a function is implemented to compute the histogram of oriented gradients of each image channel separately; the channel features can then be used separately or appended together if this option is selected. The SciKit-Image function “hog” [34] is used in the implementation of this function.

Combining All: The above feature extraction functions produce the following feature vectors:

Using the color spatial features and ‘spatial size = (32, 32)’ results in a feature vector of 32×32×3 = 3072 elements.

Using the color histogram features and ‘histogram bins = 32’ results in a feature vector of 32×3 = 96 elements.

Using the HOG features with ‘gradient orientations = 9’, ‘pixels per cell = 8×8’, ‘cells per block = 2×2’, and all color channels results in a feature vector of 7×7×2×2×9 = 1764 elements per channel, that is, 1764×3 = 5292 elements.

If all the above functions are used the resulting feature vector will be of the following length: 3072 + 96 + 5292 = 8460 elements.
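The feature-vector lengths above can be reproduced with a few lines of arithmetic. The sketch assumes 64×64 training images and a block stride of one cell; the function and variable names are illustrative.

```python
def hog_length(window=64, pix_per_cell=8, cell_per_block=2, orient=9, channels=3):
    """Number of HOG features for a square window with a block stride of one cell."""
    cells_per_side = window // pix_per_cell                 # 8 cells
    blocks_per_side = cells_per_side - cell_per_block + 1   # 7 block positions
    per_channel = blocks_per_side ** 2 * cell_per_block ** 2 * orient
    return per_channel * channels


spatial_len = 32 * 32 * 3    # resized 32x32 image, three channels
hist_len = 32 * 3            # 32 histogram bins, three channels
hog_len = hog_length()
total_len = spatial_len + hist_len + hog_len
print(spatial_len, hist_len, hog_len, total_len)
```

The 7×7 factor is the number of block positions: a 64×64 image holds 8×8 cells, and a 2×2-cell block sliding by one cell fits in 7 positions along each side.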

6.4 Training the classifier

The following steps are used to build up and train the vehicle/non-vehicle SVMC classifier:

Compiling a training data set “X” of 35,520×8,460 size which includes 35,520 vehicle/nonvehicle feature vectors of length 8,460 each. This training set represents the input to the classifier.

The feature sets must be scaled, before combining them together, using the SciKit-Learn “StandardScaler().fit()” function [35]. Figure 14 shows the visualization of raw and normalized feature vectors for a vehicle image.

Compiling an output training set “Y” of size 35,520×1, in which each element is a Boolean value (1 ⇒ vehicle, 0 ⇒ non-vehicle).

Shuffling the training sets randomly and splitting them into 80% for training and 20% for testing using the SciKit-Learn “train_test_split()” function.

Using the Linear Support Vector Classifier function “LinearSVC()” of the SciKit-Learn library [36], the model was trained with high accuracy (above 97.7%) for almost all the selected parameter combinations. The trained model was then tested on the prepared test images. The results were not good in several cases. Extensive experimentation was done with many parameter combinations; however, the results were still not acceptable.

After several trials and errors, it is found that the color spatial features are taking a significant portion of the feature vector length (>36%) without adding a real value (sometimes even represents a confusing element) to the distinction between the vehicles / non-vehicles. Moreover, the color histogram features are of a very insignificant contribution (∼ 1.1%) of the feature vector as well as to the distinction between vehicles / non-vehicles.

Therefore, both the color spatial and the color histogram features have been removed from the feature vector, keeping only the HOG features. This reduces the length of the feature vector from 8,460 to 5,292 features. This of course simplifies the training and the real-time application of the algorithm, and results in a large reduction of processing and training time.

The new Linear SVC classifier with a training data set of size = 35,520×5,292 is constructed using several color spaces with the training results shown in Table 1.

Almost all the color spaces produced comparable results except “RGB”. The “LAB” color space produces the fastest performance in both training and prediction, with the second-highest accuracy behind “YUV”. However, when testing on the test images, “YUV” produced more false positives than “LAB”. Therefore, the “LAB” color space is selected for the next steps.

The training time of the SVC does not affect the performance of the RT_VDT pipeline as it is done offline; however, the prediction time for the labels does affect the performance, as it will be part of the detection time for each camera frame.

Fig. 14

Visualization of feature vectors for vehicles’ images.

Table 1

Linear SVC training results

Colour Space	Training Time (Sec)	Prediction Time for 10 Labels (Sec)	Test Accuracy
RGB	19.5	0.01563	0.9716
HSV	8.94	0.001	0.9865
HLS	8.83	0.0015	0.9831
LUV	8.79	0.002	0.9876
YCrCb	7.76	0.002	0.9899
YUV	8.34	0.003	0.9918
LAB	5.7	0.001	0.9916

7 Vehicle detection and tracking pipeline

The following steps constitute the pipeline used in the detection and tracking of other vehicles on the road (RT_VDT). These steps are presented in order of execution:

Finding lane lines: this function is mainly to detect the road boundaries (in other words, the lane lines in front of the car) which represent the driving space (shown in green in 0). This function is fully implemented in [2] and used here for convenience.

Detecting vehicles by the sliding-windows technique: a dedicated function is implemented and called for each camera frame. It uses the following parameters:

“orient = 9” defines the number of orientation bins of the gradient histogram computed for each cell; it is used for the HOG feature extraction from images or video frames.

“pix_per_cell = 8” defines the number of pixels per HOG cell. In this case, each cell is 8×8 pixels.

“cell_per_block = 2” defines the number of HOG cells per block. In this case, each block is 2×2 cells.

x_start, x_stop, y_start, y_stop: these 4 parameters define a rectangular area on the image or frame that represents the region of interest (ROI) in which the function searches for a vehicle by the sliding windows technique.

“step_size = 2” defining how many cells to step (or to slide) to construct a new search window that will overlap with the previous search window.

“Scale_Step = 0.25” defines the step by which the search window size is incremented from one search scan to the next.

Scale_Multiplier_Start, Scale_Multiplier_End: two parameters defining the start and the end of the window-size scaling while scanning the ROI area.

The function applies the trained SVM classifier model to each constructed search window. Sliding windows of different sizes are constructed to cover the defined ROI, as shown in Figure 15. If found necessary, the function may also be applied several times with a different set of the parameters listed above.

Building active heat-maps: the goal is to construct a heat-map from the car boxes found during a sliding-windows scan. The heat-map is used to filter out (or at least minimize) the false-positive boxes. A dedicated parameter “HEAT_THRESHOLD” is used to pass only the car boxes with multiple hits (true-positive boxes), as shown in Figure 16.

Labeling car boxes: the overlapped true-positive vehicle boxes are then grouped in bigger boxes and labeled using the “label()” function from the Sci-Kit Learn library.

Drawing the labeled car boxes: as a final step, the labeled boxes are drawn on the original test image or video frame as shown by the red boxes in Figures 2 and 3.

Fig. 15

Sliding windows with different sizes scanning the ROI.
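The multi-scale scan of the ROI illustrated in Fig. 15 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the 64-pixel base window, the ROI coordinates, the scale range and the function name `sliding_windows` are hypothetical, while `pix_per_cell = 8`, `step_size = 2` and the 0.25 scale step follow the parameter list above. The classifier call that would be applied to each window is omitted.

```python
def sliding_windows(x_start, x_stop, y_start, y_stop,
                    scale_start=1.0, scale_end=2.0, scale_step=0.25,
                    base_window=64, pix_per_cell=8, step_size=2):
    """Return square search windows ((x1, y1), (x2, y2)) covering the ROI at several scales."""
    windows = []
    n_scales = int(round((scale_end - scale_start) / scale_step)) + 1
    for k in range(n_scales):
        scale = scale_start + k * scale_step
        size = int(base_window * scale)                  # window edge in pixels
        stride = int(step_size * pix_per_cell * scale)   # step of `step_size` cells, scaled
        for top in range(y_start, y_stop - size + 1, stride):
            for left in range(x_start, x_stop - size + 1, stride):
                windows.append(((left, top), (left + size, top + size)))
    return windows


# Hypothetical ROI: a band in the lower part of a 1280x720 frame.
windows = sliding_windows(x_start=0, x_stop=1280, y_start=400, y_stop=656)
sizes = sorted({x2 - x1 for (x1, _), (x2, _) in windows})
print(len(windows), sizes)
```

With a 64-pixel window and a step of two 8-pixel cells, neighboring windows are 16 pixels apart, i.e. they overlap by 75%, which is why a true vehicle is normally hit by several windows.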

Fig. 16

Detected vehicle boxes and the resulted heat-maps.
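The heat-map, thresholding and labeling steps shown in Fig. 16 can be sketched on a toy grid as follows. This is a minimal pure-Python sketch under stated assumptions: the box coordinates are made up, the threshold is assumed to be strict (a pixel must be hit more than `HEAT_THRESHOLD` times to survive), and a small breadth-first connected-component search stands in for the “label()” call used in the pipeline.

```python
from collections import deque


def build_heatmap(shape, boxes):
    """Add one unit of heat for every pixel inside each detected box."""
    rows, cols = shape
    heat = [[0] * cols for _ in range(rows)]
    for (x1, y1), (x2, y2) in boxes:
        for y in range(y1, y2):
            for x in range(x1, x2):
                heat[y][x] += 1
    return heat


def apply_threshold(heat, threshold):
    """Zero out pixels whose heat does not exceed the threshold."""
    return [[v if v > threshold else 0 for v in row] for row in heat]


def label_boxes(heat):
    """Group 4-connected non-zero pixels and return one bounding box per group."""
    rows, cols = len(heat), len(heat[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if heat[r][c] == 0 or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            x_min = x_max = c
            y_min = y_max = r
            while queue:
                y, x = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and heat[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append(((x_min, y_min), (x_max + 1, y_max + 1)))
    return boxes


HEAT_THRESHOLD = 1  # hypothetical value
# Three overlapping hits on one vehicle plus one isolated false positive.
detections = [((2, 2), (8, 8)), ((4, 3), (10, 9)), ((3, 4), (9, 10)), ((14, 1), (18, 5))]
heat = apply_threshold(build_heatmap((12, 20), detections), HEAT_THRESHOLD)
car_boxes = label_boxes(heat)
print(car_boxes)  # [((3, 3), (9, 9))]
```

The isolated box is hit only once and is removed by the threshold, while the three overlapping hits merge into a single labeled box.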

Figures 17 & 18 show examples of the results after the execution of the above pipeline on the test images that include shadow patterns that usually confuse vision-based algorithms.

Fig. 17

The execution of vehicle detection and tracking pipeline.

Fig. 18

The execution of vehicle detection and tracking pipeline.

8 Testing and validation

The developed RT_VDT algorithm is further tested on various images representing different scenarios. The results show that the algorithm performs very well under different conditions (at full sunrise, at sunset, with shadows, without shadows, with and without cars in the other lanes). Furthermore, for robustness testing and validation of the developed pipeline, the algorithm is applied to several real-time video samples representing different driving conditions. The RT_VDT proved to be very robust in all the aforementioned conditions, as shown in Figures 2 and 3. Scattered areas of shadow do affect the precision of the produced vehicle boundary boxes, as shown in Fig. 17 and Fig. 18; nevertheless, the results are still acceptable and produce functional output.

As shown in Figures 2, 3, 17 and 18, the images also include the lane detection results from the work in [6].

The pipeline proved to be acceptably fast in real-time execution. Using an Intel Core i5-4200U @1.6GHz (2 cores) with 8GB RAM, which is a very moderate computational platform, the measurements listed in Table 2 were collected for two test video streams.

The lowest measured processing speed of the vehicle detection pipeline alone (without lane detection) is 10.01 frames per second, which is considered adequate as per the recommended performance for this application [17]. More powerful computational hardware, if employed, should significantly enhance the real-time performance of the proposed pipeline.

For the assessment of the RT_VDT performance, the experimental results are evaluated using three statistical measures [37]: Precision, Recall and Intersection over Union (IoU). Precision measures how accurate the predictions are; it is the proportion of true-positive samples among all samples identified as positive. Recall measures the proportion of actual positive samples that are correctly identified (e.g., the percentage of vehicle images that are identified as vehicles). IoU measures the overlap between the predicted area and the ground-truth area, i.e., how well the detected box matches the ground truth. Their expressions are:

$\text{Precision} = \frac{TP}{TP + FP}$ (23)

$\text{Recall} = \frac{TP}{TP + FN}$ (24)

$\text{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$ (25)

where TP is the number of true positives (vehicle images correctly classified as vehicles), TN is the number of true negatives (non-vehicle images correctly classified as non-vehicles), FP is the number of false positives (non-vehicle images classified as vehicles), and FN is the number of false negatives (vehicle images classified as non-vehicles).
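As a worked illustration of Eqs. (23)-(25), the three measures can be computed as in the sketch below. The counts and box coordinates are made-up example values, and the function names are chosen here for illustration only.

```python
def precision(tp, fp):
    """Eq. (23): share of positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Eq. (24): share of actual positives that are found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def iou(box_a, box_b):
    """Eq. (25): overlap area divided by union area of two boxes ((x1, y1), (x2, y2))."""
    (ax1, ay1), (ax2, ay2) = box_a
    (bx1, by1), (bx2, by2) = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    overlap = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - overlap
    return overlap / union if union else 0.0


# Hypothetical counts: 8 vehicles found, 2 false alarms, 2 vehicles missed.
print(precision(tp=8, fp=2))   # 0.8
print(recall(tp=8, fn=2))      # 0.8
# Two 10x10 boxes offset by 5 pixels in each direction: overlap 25, union 175.
print(round(iou(((0, 0), (10, 10)), ((5, 5), (15, 15))), 3))   # 0.143
```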

The well-established average precision (AP) and intersection over union (IoU) metrics [37], which have been widely used to assess vehicle detection algorithms, are used here to evaluate the performance of the RT_VDT and compare it to state-of-the-art techniques [38]. The Single-Shot Detector (SSD) [38] is a state-of-the-art single-stage detector that makes predictions using feature maps of different resolutions. You Only Look Once (YOLO) [39] is another single-stage detector, which makes predictions by treating the raw image as a 7×7 grid. Moreover, Faster R-CNN [40] is another state-of-the-art detector and was the first to incorporate a Region Proposal Network (RPN) as a Region of Interest (RoI) candidate extractor.

Table 3 compares the proposed RT_VDT technique to state-of-the-art techniques based on deep learning (e.g., SSD, YOLO and Fast R-CNN) on the KITTI dataset [14]. The deep learning algorithms clearly achieve higher detection precision than the RT_VDT, but at the expense of an enormous computational cost. For example, YOLOv2 shows high performance in terms of Average Precision (AP) as well as real-time performance on a high-end GPU (∼37 FPS). However, on a very high-grade CPU its performance is extremely poor (∼0.08 FPS) compared to the 12.52 FPS of the RT_VDT on a lower-cost, affordable CPU. For ADAS applications with limited computational resources, the cost of a feature is as important as its accuracy, and a delicate balance between them is what the automotive industry requires.

Table 2

Computation Speed for the RT_VDT Algorithm

Sample Name	No. of Frames	Total Time Min:Sec	Frame per Sec
Challenge Video	485	00:39	12.52
Challenge Video + Lane Detection	485	01:24	5.77
Project Video	1261	02:06	10.01
Project Video + Lane Detection	1261	03:26	6.11

Table 3

Comparison of different techniques on the KITTI car-detection validation set

Technique	KITTI AP, Easy (%)	KITTI AP, Moderate (%)	KITTI AP, Hard (%)	GPU Time (Sec)	CPU Time (Sec)
RT_VDT	87.19	77.40	60.60	0.058 (NVIDIA Tesla K80, 13GB RAM)	0.079 (i5-4200U, 2 Cores @ 1.6 GHz)
SSD [41]	97.68	93.44	80.36	0.087 (NVIDIA GeForce GTX 1080Ti @ 1.6 GHz)	16.784 (i7-6820HQ, 4 Cores @ 2.70 GHz)
YOLOv2 [41]	95.73	89.15	78.69	0.027 (NVIDIA GeForce GTX 1080Ti @ 1.6 GHz)	12.597 (i7-6820HQ, 4 Cores @ 2.70 GHz)
Fast R-CNN [41]	96.55	90.21	81.79	0.093 (NVIDIA GeForce GTX 1080Ti @ 1.6 GHz)	14.721 (i7-6820HQ, 4 Cores @ 2.70 GHz)

The RT_VDT pipeline is also executed on the Google Colab cloud platform [42] in two different modes: GPU (NVIDIA Tesla K80, 13GB RAM) and TPU (v2) [42]. The best result achieved on the GPU is 0.058 sec and on the TPU 0.073 sec. These trials indicate that there is not much difference in performance compared to the results on CPUs. The GPU adds an improvement of only 27% in computational speed, while the TPU adds only 7.5%. The justification for these results is that the GPU mainly speeds up matrix operations, and the developed pipeline does not involve many matrix operations. Moreover, the TPU is mainly designed to speed up tensor-based computation, which is not used in the formulation of the RT_VDT algorithm [43]. The performance of the RT_VDT algorithm is also illustrated in Fig. 19 and Fig. 20.

Fig. 19

Detected Vehicle boundaries by the RT_VDT algorithm on the KITTI dataset - 1.

Fig. 20

Detected Vehicle boundaries by the RT_VDT algorithm on the KITTI dataset - 2.
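The frame rates and speed-up percentages quoted in the comparison above follow from the timings in Table 3 by simple arithmetic, sketched below. The timings are taken from the table; treating each of them as the processing time of one frame is an assumption of this sketch, and small differences from the quoted values are due to rounding.

```python
def fps(seconds_per_frame):
    """Frames per second for a given per-frame processing time."""
    return 1.0 / seconds_per_frame


def speedup_percent(baseline, new):
    """Relative reduction in processing time with respect to the baseline, in percent."""
    return 100.0 * (baseline - new) / baseline


RT_VDT_CPU, RT_VDT_GPU, RT_VDT_TPU = 0.079, 0.058, 0.073   # seconds (Table 3 and text)
YOLOV2_GPU, YOLOV2_CPU = 0.027, 12.597                     # seconds (Table 3)

print(round(fps(YOLOV2_GPU)))                               # 37
print(round(fps(YOLOV2_CPU), 2))                            # 0.08
print(round(speedup_percent(RT_VDT_CPU, RT_VDT_GPU), 1))    # 26.6
print(round(speedup_percent(RT_VDT_CPU, RT_VDT_TPU), 1))    # 7.6
```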

9 Discussion of the implemented approaches

The following points shed some light on some technical tricks and aspects that have been tried or implemented in the described pipelines:

Color spaces: seven different color spaces have been tried on both test images and videos. Throughout the experimentation, HSV and LAB produced the best results in terms of both finding vehicles and keeping false positives low. The other color spaces such as YUV, LUV, YCrCb and HLS produce comparable results, whereas RGB produced by far the worst results. Therefore, HSV and LAB are adopted during the development and testing phases.

Decision function: after applying the trained SVM classifier model to every constructed sliding window to search for vehicles, the decision function [35] (from the SciKit-Learn library) is used instead of a simple prediction function. For a linear SVM, the decision function returns a signed confidence score (proportional to the distance of the sample from the separating hyperplane) rather than a class label: a positive score means that the object is classified as a vehicle, and a negative score means that it is classified as a non-vehicle object. A new parameter “Confidence_score” is defined, which identifies the confidence of an object being a vehicle [44]. The higher the positive value, the higher the confidence that the object is a car. Using the decision function helped to reduce false positives significantly.

Heat-maps filtering: the heat-maps calculated on each frame are not used directly; instead, they are filtered using a finite impulse response (FIR) filter. This filter is designed to use the current heat-map and those of the previous four frames before a threshold is applied. This technique helped to smooth out the constructed vehicle windows and to reduce false positives as well.

Vehicle box vertices filtering: similar to the heat-maps filtering, the vertices of the constructed vehicle boxes are also filtered using FIR filters. The calculated vertices are not used directly but are first filtered using the values calculated for the previous three frames. This technique helped to reduce the jitter in the position and the size of the final car boxes identified for each frame.

Identification of the regions of interest: the RT_VDT pipeline has been constructed to allow the designation of several ROI search areas along both the x and y axes. This approach helps to identify the search areas more accurately, reduces the search time, improves the search performance and eliminates undesired false positives.

Frame sampling: throughout the experimentation, it was found that it is not necessary to search for vehicles in every frame at the current sampling rate of the camera (25 FPS), as the movement of vehicles from frame to frame is not that fast. Therefore, the active search for vehicles is restricted to every other frame, which reduces the video processing time by half and hardly affects the results.

Sanity checks: some sanity checks are used to improve the identified vehicle boxes, such as:

Vehicle box size: the size of the identified vehicle box is measured and checked before the box is drawn on the image or video frame. This is done by measuring the diagonal of the identified box and comparing it with certain specified constraints.

Vehicle box position: some checks are added to validate the position of the identified car boxes. For example, in the test images, vehicle boxes cannot be found at a position lower than “y = 400”.
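Three of the points above (the confidence threshold on the decision score, the FIR filtering of the heat-maps and the box-size sanity check) can be sketched together as follows. This is an illustration under stated assumptions, not the authors' code: the linear score `w·x + b` is the quantity a linear SVM decision function evaluates, but the weights, the threshold value of 0.5, the equal FIR coefficients and the diagonal limits are all hypothetical, as are the function and class names.

```python
import math
from collections import deque


def decision_score(weights, bias, features):
    """Signed score of a linear classifier, w.x + b (positive => vehicle side)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def is_vehicle(score, confidence_score=0.5):
    """Accept a window only if its score reaches the confidence threshold."""
    return score >= confidence_score


class HeatmapFIR:
    """Equal-weight FIR filter over the current and the previous four heat-maps."""

    def __init__(self, taps=5):
        self.history = deque(maxlen=taps)

    def update(self, heat):
        self.history.append(heat)
        n = len(self.history)
        rows, cols = len(heat), len(heat[0])
        return [[sum(h[r][c] for h in self.history) / n for c in range(cols)]
                for r in range(rows)]


def plausible_box(box, min_diag=40.0, max_diag=400.0):
    """Sanity check on the box diagonal (limits are hypothetical)."""
    (x1, y1), (x2, y2) = box
    return min_diag <= math.hypot(x2 - x1, y2 - y1) <= max_diag


# A weak positive score is rejected, a strong one is kept.
score = decision_score([0.5, -0.25], -0.1, [2.0, 1.0])
print(is_vehicle(score), is_vehicle(0.2))            # True False

# The left pixel flashes hot in a single frame only; the right one is stable.
fir = HeatmapFIR()
for frame in ([[0, 3]], [[0, 3]], [[4, 3]], [[0, 3]], [[0, 3]]):
    smoothed = fir.update(frame)
print(smoothed)                                       # [[0.8, 3.0]]

print(plausible_box(((0, 0), (30, 40))), plausible_box(((0, 0), (3, 4))))   # True False
```

After smoothing, a threshold of 1 would remove the single-frame flash (0.8) while keeping the persistent detection (3.0), which is the effect described for the heat-map filter. The same moving-average idea, with a shorter history, applies to the box vertices.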

10 Conclusion

In this paper, a reliable vehicle detection and tracking technique based on handcrafted-feature extraction, named RT_VDT, is developed and presented thoroughly. RT_VDT uses a pipeline that draws on several color spaces such as LAB, YUV and LUV. In addition, it uses computer-vision algorithms such as HOG feature extraction, and machine learning algorithms such as Support Vector Machines. Besides, the pipeline uses comprehensive image-distortion suppression and camera calibration techniques to produce undistorted road images suitable for more accurate vehicle detection. Furthermore, several sanity checks are applied to improve the robustness of the used techniques. The proposed RT_VDT technique needs only raw RGB images from a single CCD camera mounted behind the front windshield of the vehicle. The performance of the RT_VDT algorithm is tested and evaluated using many still images and several real-time videos. The validation results show a fairly accurate and robust detection, with a slight deviation in one scenario where complex shadow patterns exist. The measured throughput (execution time) on an affordable CPU shows that the RT_VDT is very suitable for real-time vehicle detection even without adding further processing power such as GPUs. Furthermore, the performance of the RT_VDT technique is compared to state-of-the-art deep learning techniques on the KITTI dataset. The deep learning algorithms achieve higher detection performance than the RT_VDT, but at the expense of a much higher computational cost (high-end GPUs). On lower-cost CPUs, however, the real-time performance of the RT_VDT clearly shows that it is well suited for ADAS functions or self-driving cars. Future work will focus on augmenting the technique with the detection and tracking of pedestrians and cyclists.

Acknowledgments

This work used the High-Performance Computing (HPC) facilities of the American University of the Middle East, Kuwait.

References

1. Farag, Traffic signs classification by deep learning for advanced driving assistance systems, Intelligent Decision Technologies, IOS Press 13(3) (2019), 215–231.

2. Farag and Saleh, Lane-Lines Detection in Real-Time for Advanced Driving Assistance Systems, Intern Conf on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT’18), Bahrain, 18-20 Nov., (2018).

3. Mansour and Farag, AiroDiag: A Sophisticated Tool that Diagnoses and Updates Vehicles Software Over Air, 2012 IEEE Intern. Electric Vehicle Conference (IEVC), TD Convention Center Greenville, SC, USA, March 4, 2012, ISBN: 978-1-4673-1562-3.

4. Farag, CANTrack: Enhancing automotive CAN bus security using intuitive encryption algorithms, 7th Inter Conf on Modeling, Simulation, and Applied Optimization (ICMSAO), UAE, March 2017.

5. Farag, A Comprehensive Real-Time Road-Lanes Tracking Technique for Autonomous Driving, International Journal of Computing and Digital Systems (IJCDS) 9(3) (2020), 349–362.

6. Farag and Saleh, An Advanced Road-Lanes Finding Scheme for Self-Driving Cars, Smart Cities Symposium (SCS’19), IET Digital Library, Bahrain, 24-26 March, (2019).

7. Farag and Saleh, Behavior Cloning for Autonomous Driving using Convolutional Neural Networks, Intern Conf on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT’18), Bahrain, 18-20 Nov., (2018).

8. Farag, Recognition of traffic signs by convolutional neural nets for self-driving vehicles, International Journal of Knowledge-based and Intelligent Engineering Systems, IOS Press 22(3) (2018), 205–214.

9. Farag and Saleh, Tuning of PID Track Followers for Autonomous Driving, Intern Conf on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT’18), Bahrain, 18-20 Nov., (2018).

10. Farag, Safe-driving cloning by deep learning for autonomous cars, International Journal of Advanced Mechatronic Systems, Inderscience Publishers 7(6) (2019), 390–397.

11. Farag, Cloning Safe Driving Behavior for Self-Driving Cars using Convolutional Neural Networks, Bentham Science Publishers, The Netherlands, Recent Patents on Computer Science 12(2) (2019), 120–127(8).

12. Farag and Saleh, An Advanced Vehicle Detection and Tracking Scheme for Self-Driving Cars, 2nd Smart Cities Symposium (SCS’19), IET Digital Library, Bahrain, 24-26 March, (2019).

13. Wei, He, Zhou, Chen, Tang and Xiong, Enhanced Object Detection With Deep Convolutional Neural Networks for Advanced Driving Assistance, IEEE Trans on Intelligent Transportation Systems, April 2019.

14. Geiger, Lenz, Stiller and Urtasun, Vision meets robotics: The KITTI dataset, The International Journal of Robotics Research 32(11) (2013), 1231–1237.

15. Xu, Xiao, Chen, He, Qin and Heng P.-A., SINet: A Scale-insensitive Convolutional Neural Network for Fast Vehicle Detection, IEEE Transactions on Intelligent Transportation Systems 20(3) (2019), 1010–1019.

16. Xiao, Vehicle Detection in Deep Learning, M.Sc. Thesis, Virginia Polytechnic Institute & State University, USA, 2019.

17. Botsch, Real-time lane detection and tracking on high-performance computing devices, Bachelor’s Thesis in Informatics, Technische Universität München, Germany, March 2015.

18. Rogowitz B.E., Pappas T.N. and Daly S.J., Human Vision and Electronic Imaging XII, SPIE, 2007, ISBN 0-8194-6605-0.

19. Shevell S.K., The Science of Color, 2nd ed., Elsevier Science & Technology, ISBN 0-444-51251-9. (2003), 202–206.

20. Dalal and Triggs, Histograms of oriented gradients for human detection, IEEE Computer Society Conf on Computer Vision and Pattern Recog. (CVPR’05), 20-25 June 2005, San Diego, CA, USA.

21. Kankanhalli M.S., Mehtre B.M. and Huang H.Y., Color and spatial feature for content-based image retrieval, Pattern Recognition Letters, Elsevier 20(1) (1999), 109–118.

22. Sergyan, Color histogram features based image classification in content-based image retrieval systems, 6th International Symposium on Applied Machine Intelligence and Informatics, 21-22 Jan. 2008, Herlany, Slovakia.

23. Cortes and Vapnik V.N., Support-vector networks, Machine Learning 20(3) (1995), 273–297. DOI: 10.1007/BF00994018

24. Feng, Rosenbaum and Dietmayer, Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network For Lidar 3D Vehicle Detection, 21st IEEE Intern Conf on Intelligent Transportation Sys. (ITSC), Hawaii, USA, Nov. 2018.

25. Ben-Hur, Horn, Siegelmann and Vapnik V.N., Support vector clustering, Journal of Machine Learning Research 2 (2001), 125–137.

26. Hsu C.-W. and Lin C.-J., A Comparison of Methods for Multiclass Support Vector Machines, IEEE Transactions on Neural Networks, 2002.

27. Farag and Tawfik, On fuzzy model identification and the gas furnace data, Proceedings of the IASTED International Conference Intelligent Systems and Control, Honolulu, Hawaii, USA, August 14-16, (2000).

28. Nocedal and Wright J.S., Numerical Optimization, (2nd ed.), Berlin, New York: Springer-Verlag, (2006), pp. 449, ISBN 978-0-387-30303-1.

29. Kaehler and Bradski, Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library, O’Reilly Media, ISBN 978-1-4919-3800-3, 2006.

30. Python Pickle Module, https://docs.python.org/3.1/library/pickle.html, retrieved on (24 Sept. 2019).

31. Udacity vehicles data, https://s3.amazonaws.com/udacity-sdc/Vehicle_Tracking/vehicles.zip, retrieved on (24 Sept. 2019).

32. Udacity non-vehicles data, https://s3.amazonaws.com/udacity-sdc/Vehicle_Tracking/non-vehicles.zip, retrieved on (24 Sept. 2019).

33. GTI vehicle image database, http://www.gti.ssr.upm.es/data/Vehicle_database.html, retrieved on (24 Sept. 2018).

34. The HOG feature descriptor, http://scikit-image.org/docs/dev/auto_examples/features_detection/plot_hog.html, retrieved on (24 Sept. 2018).

35. SciKit-Learn StandardScaler Function, http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html, retrieved on (24 Sept. 2018).

36. Linear SVM Classifier Function, http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html, retrieved on (24 Sept. 2018).

37. Everingham, Van Gool, Williams C.K., Winn and Zisserman, The pascal visual object classes (VOC) challenge, International Journal of Computer Vision 88(2) (2010), 303–338.

38. Liu et al., SSD: Single Shot multi-box Detector, in Computer Vision—ECCV, New York, NY, USA: Springer, (2016), pp. 21–37.

39. Redmon and Farhadi, Yolo9000: Better, faster, stronger, in Proc IEEE Conf Computer Vision and Pattern Recognition, (2017), 7236–7271.

40. Ren, He, Girshick R.B. and Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6) (2017), 1137–1149.

41. Liu, Cao, Lasang and Shen, Modular Lightweight Network for Road Object Detection Using a Feature Fusion Approach, IEEE Trans Systems Man & Cybernetics, 2019.

42. Google Colaboratory, https://colab.research.google.com/notebooks/welcome.ipynb, accessed on (5 Jan 2020).

43. Nagiub and Farag, Automatic selection of compiler options using genetic techniques for embedded software design, IEEE 14th Inter. Symposium on Comp. Intelligence and Informatics (CINTI), Budapest, Hungary, 19 Nov., (2013).

44. Farag, Synthesis of intelligent hybrid systems for modeling and control, Ph.D. Thesis, ECE Dept., University of Waterloo, Canada, 1998.