Abstract
Traffic sign detection and recognition has been a topic of research for at least the last two decades. Efforts are being made to reliably detect candidate traffic signs in natural, uncontrolled environments and to recognize their contents. For detection, a large proportion of the relevant literature discusses color based segmentation, either by sticking to a predefined color space (e.g., RGB, HSI, YCbCr) or by using an empirically selected subset of the eigen space to achieve partially data dependent segmentation. Since the input RGB data for the various color classes and the background is not linearly separable, none of the existing methods guarantees complete separation between pixels corresponding to traffic signs and those of background objects. To tackle this problem, we propose a completely data driven segmentation technique that adaptively selects an optimized color space based on the available training data. To recognize the contents of potential traffic signs, we present a hybrid spatio-frequency radial feature extraction technique with an emphasis on the regions containing useful information. We exploit the energy compaction property of the steerable discrete cosine transform for feature extraction and augment it with the well known circular histogram of oriented gradients in a pyramid. In experiments on (1) the German Traffic Sign Detection Benchmark, (2) our self collected dataset, and (3) a hand crafted combination of the two, the proposed method performs competitively with recent state of the art methods, achieving up to 0.978 precision and 0.98 recall at only an insignificant additional computational cost. The method also obtained 0.81 precision on traffic signs partially occluded by other objects.
Introduction
Traffic sign detection and recognition (TSDR) is an active field of research to develop driver assistance systems with the objective of automatically detecting and reading various types of traffic signs under varying environmental conditions [1]. Another purpose of TSDR is inventory management [2] where traffic signs are extracted from videos captured during a field survey and are labeled according to their physical condition.
Both subproblems of TSDR, i.e., detection and recognition, are challenging in real world uncontrolled situations due to problems such as varying illumination, partial occlusion, and sign deterioration [1, 4].
Some methods approach detection by segmenting traffic signs on the basis of color because traffic signs usually possess high contrast edges that make them easily distinguishable against varying backgrounds [5]. Color based segmentation is quite common, and many early systems used thresholding in various color spaces, the most common being HSI, because the hue component, an indicator of color, is invariant to illumination changes. Methods relying on the statistical distribution of the colors of interest in a selected color space constitute another set of segmentation techniques [6–8].
For recognition, the first step is to extract features of the pictogram in the traffic sign. The result is a one dimensional representation of the shape that is expected to be invariant to noise, illumination, rotation, etc. Various spatial and spectral methods are in use for this purpose. To extract features in the spatial domain, the most common approach is to divide the image into overlapping or non overlapping blocks and extract features in each block. The overall image descriptor is then obtained by concatenating all local features. An extension of the block based approach is multiscale feature extraction [9]. In this technique, the process of dividing the input image into blocks is repeated at various spatial scales, and the feature vectors computed at each level are concatenated to obtain the full image descriptor. This method is especially useful when some objects in a scene are small, others are large, and the description of all the objects is required in the final image descriptor [10]. The shape descriptor thus obtained is more detailed, but the technique suffers from (1) high computational cost and (2) irrelevant and redundant data. Shape descriptors obtained by concatenating features computed at different spatial/spectral levels are generally high dimensional [11], and hence training and testing models is time consuming. In the literature, principal component analysis and linear discriminant analysis [12] have been used to reduce the dimensionality of descriptors.
We, in this paper, present a novel automatic traffic sign detection and recognition approach. To detect a potential traffic sign, we map to an optimized color space that ensures maximum statistical discrimination among traffic signs and the background while the data is not linearly separable. Super class is obtained by training a neural network using extreme learning machine [13]. Features from the potential traffic signs are computed using a hybrid spatio-frequency technique. This multilevel feature extraction approach is also effective to recognize partially occluded traffic signs with reasonably high accuracy. In an effort to enhance performance of the classifier, dimensionality of the image descriptor is reduced with the help of feature interaction among data features and the classes [14]. Finally, an SVM classifier [15, 16] is trained on the reduced subset of training data to classify test images.
Our contributions are summarized below: (1) We propose a novel, fully data driven traffic sign detection method that works in a custom color space, keeping in mind the non linearly separable nature of the training data. (2) To recognize the contents of traffic signs, we propose a hybrid radial multiscale feature extraction method. Features are extracted by working at multiple scales and by giving importance to the most informative portion of the pictogram. (3) To get rid of irrelevant and redundant attributes in the long hybrid feature vector, we use an effective feature selection strategy based on feature interaction. (4) In addition to a benchmark dataset, the proposed traffic sign detection and recognition algorithm is tested on a self collected dataset obtained from the longest national highway in Pakistan (N5 highway), which exhibits deteriorations typical of the developing world.
The rest of the paper is organized as follows: Section 2 presents state of the art and recent work related to automatic detection and recognition of traffic signs (TSDR). Section 3 demonstrates, in detail, our proposed TSDR method along with suitable examples. Section 4 gives an overview of the datasets, explains experimental settings and the results compared with the state of the art methods. Section 5 discusses some important implementation details and finally, Section 6 presents conclusions and possible future directions.
Related work
Many early methods for traffic sign detection use color thresholding based segmentation in various color spaces, e.g., RGB, HSI, and CIELab [2, 18]. Fleyeh and Davami [19] performed segmentation in the HSI color space, relying mainly on hue, an illumination invariant channel. Ellayani et al. [17] used thresholding in all three channels of the HSI color space to segment traffic signs in complex natural scenes. Yang et al. [7] presented segmentation on the basis of the maximum posterior probability of gray levels in a color probability map of the input RGB image. The likelihood and prior are pre-calculated on training samples selected from the given database, and the method was tested on the GTSDB [20] dataset. A similar work by Tsai et al. [6] reports an eigen analysis based segmentation of colors and found the Ohta transform [7, 21] to be the best. Kim [8] applied the idea of the eigen color space presented by Ohta [21] for segmentation in the HSI and YCbCr color spaces. Saliency is another technique used to segment traffic signs in real world images. Wang et al. [22] present a HOG based cascaded feature extraction on the color saliency of traffic signs. Their proposed system works in multiple stages. In the first stage, a simple to train linear SVM classifier rejects the windows of the image that do not contain a traffic sign; in the subsequent stages, more complex versions (with more complex kernel choices) are used, in a coarse to fine manner, to select only those windows that definitely contain a traffic sign. Barnas et al. present a radial symmetry based traffic sign segmentation strategy that does not rely on colors [23]. This is useful when the sign has deteriorated and the colors have worn away due to severe environmental conditions. Greenhalgh and Mirmehdi [11] proposed using maximally stable extremal regions (MSER) on a normalized red/blue image transformed from the RGB domain. This color and shape based technique looks for the most stable shapes under varying thresholds [24]. An achromatic approach to localizing traffic signs is given in [9], where the superclass is identified on the basis of pyramidal feature extraction, with the Histogram of Oriented Gradients (HOG) [25] computed at different spatial scales.
Extracting invariant features from the blobs containing candidate traffic signs is a very important step [26]. Ideally, these features are signatures of shapes and remain unaffected by variations in illumination, partial occlusion, and deterioration of traffic signs [1]. HOG [25] and its variants are by far the most widely used feature extraction techniques. Initially devised for pedestrian detection in images, the technique was first used by Greenhalgh and Mirmehdi [11] to recognize the contents of traffic signs. Since then, many variants of the technique, e.g., HOG on HSI [17], signed and unsigned HOG [27], and HOG on occlusion maps [28], have been proposed. Bascon et al. [29] presented a distance to border technique to generate feature vectors from the various shapes that contain traffic signs, e.g., circular, triangular, and hexagonal. Recently, Liu et al. [30] presented a solution for occluded traffic signs using local binary pattern (LBP) based color cubic features. The method is based on automatic color thresholding on an integral image in various color spaces and was tested on a Chinese traffic sign database. A few efforts have been made to reduce the dimensionality of the feature vector in order to feed only the most relevant and least redundant features to the classifier [12, 31].
Recently, authors in [12] have proposed extracting features on the basis of log polar transforms [32] claiming that the descriptors thus computed are invariant to scale and rotation. Authors in [33] have proposed a long descriptor to extract features of German traffic sign detection benchmark on the basis of bispectrum and gray level co-occurrence matrix [24]. Another recent technique inspired by biomedical analysis of images in conjunction with extreme learning machine was proposed by [31].
Detection and recognition of traffic signs with partial occlusions has rarely been addressed in the literature. The authors of [28] presented a technique based on HOG, LBP, and SVM to detect traffic signs in complex outdoor scenes. Yawar et al. [34] proposed training a classifier only on those portions of a traffic sign that are unlikely to be occluded, which they called discriminative patches. Another recent technique [30] discussed detection of occluded traffic signs using color cubic LBP features and a cascade of tree classifiers. There remains, however, a considerable gap in the field of traffic sign detection and recognition in the presence of partial occlusions.
Our proposed method
Mapping on different color spaces using linear transformation
In this step, candidate traffic sign is segmented in a natural scene. For this purpose, as mentioned in Section 2, all previous work uses one or more of many well known color spaces either adopted experimentally or by mapping data on empirically chosen subset of eigen space of color data [6–8]. Keeping in view that the color data is not linearly separable, this work proposes an optimum solution that finds a global maximum in transformed color space. This ensures maximum separation among different color classes by keeping spread of each class as concise as possible.
As mentioned above, given an RGB input, a pixel value in transformed domain is computed with the help of a linear transformation. An illustration of the concept is also given in Table 1 and it can be seen that the parameters of this transformation are different for different color spaces. In our proposed work, maximum discrimination among classes of different colors is achieved by obtaining parameters of the linear transformation from the solution of an optimization problem. In this way, we guarantee to have the optimum discrimination in a given situation. A conceptual description of the process is shown in Fig. 1 in which pixels belonging to different color classes in input RGB color space are mapped to another color space such that their inter centroid distance is increased by keeping the variance of each class as small as possible. The three clouds correspond to two traffic sign colors (i.e., red and blue) and background colors. A famous technique, linear discriminant analysis (LDA), reduces within class variance and increases between class variance but the data is mapped to a hyperplane.
On the other hand, we quickly solve an optimization problem with the help of a set of heuristics to map to another color space (not necessarily a hyperplane), thereby ensuring the least loss of information. Since the data for the various color classes is not linearly separable, it may not appear perfectly separated in the transformed color space, but it is guaranteed to have the optimum discrimination achievable with the available training data. It should be noted that a non linear transformation can achieve better separation on the training data, but that may lead to overfitting [35]. The authors of [6] and [7] adopted the work of Ohta [21] to achieve data dependent segmentation by mapping data to the eigen space and choosing the two empirically best principal components, whereas our proposed approach is linear and completely data driven.
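The idea of scoring a linear color transformation by how well it separates the color classes can be sketched in a few lines. This is only an illustration under stated assumptions: the exact objective of Equation 1 and the authors' heuristics are not reproduced here, so `separation_score` (squared centroid distance divided by the summed class spreads) and the hill climbing `random_search` are hypothetical stand-ins, and the toy pixel values are invented.

```python
import random

def transform(pixel, W):
    """Map an (R, G, B) pixel to the new color space: y = W x."""
    return tuple(sum(W[r][c] * pixel[c] for c in range(3)) for r in range(3))

def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def spread(points, center):
    """Mean squared distance of the points from their centroid."""
    return sum(sum((p[k] - center[k]) ** 2 for k in range(3)) for p in points) / len(points)

def separation_score(classes, W):
    """Assumed criterion: for every class pair, squared centroid distance
    divided by the summed spreads, accumulated over all pairs."""
    mapped = [[transform(p, W) for p in pts] for pts in classes]
    centers = [centroid(pts) for pts in mapped]
    spreads = [spread(pts, c) for pts, c in zip(mapped, centers)]
    score = 0.0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            d2 = sum((centers[i][k] - centers[j][k]) ** 2 for k in range(3))
            score += d2 / (spreads[i] + spreads[j] + 1e-9)
    return score

def random_search(classes, iterations=200, step=0.1, seed=0):
    """Hypothetical heuristic: perturb W at random and keep improvements only."""
    rng = random.Random(seed)
    best_W = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
    best = separation_score(classes, best_W)
    for _ in range(iterations):
        cand = [[w + rng.gauss(0, step) for w in row] for row in best_W]
        s = separation_score(classes, cand)
        if s > best:
            best_W, best = cand, s
    return best_W, best

# Toy training pixels (invented): red signs, blue signs, background.
red = [(200, 30, 40), (220, 50, 35), (180, 40, 60)]
blue = [(30, 60, 200), (40, 80, 220), (50, 50, 180)]
background = [(120, 130, 110), (90, 100, 95), (150, 140, 160)]
classes = [red, blue, background]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W_opt, score_opt = random_search(classes)
```

Because the search starts from the identity matrix (plain RGB) and only accepts improvements, the returned score is never worse than the RGB baseline on the training pixels.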

An illustration of discrimination achieved using linear mapping.
Major steps in the detection process are summarized in Fig. 2. Training data comprises the RGB values of color pixels corresponding to traffic signs in the training images. These values are collected manually and are fed to a parameter estimator module that computes elements of matrix

Block diagram of our proposed detection procedure.
The optimization problem to be solved is given in Equation 1. It is obvious that the Euclidean distance is large for large inter class distance and small variance of each color class. In order to obtain the maximum discrimination, the expression is to be maximized for every two color classes, namely $C_i$ and $C_j$. $E_g(\cdot)$ is a 3 × 1 vector representing the expected value or center of a class in the transformed color space whereas
Elements of matrix
Let

Collecting samples of a given color class from images in training dataset.
Now, as mentioned in Equation 7, Bayes rule [35] can be used to compute the posterior probability of class $C_i$ given
Applying Equation 7 on a test image generates a grayscale image with colors belonging to potential traffic signs represented with grayvalues higher than the background. The denominator of Equation 7 is neglected because it remains the same for all classes. The resultant suppressed background image is a good candidate to extract sign of interest. Maximally stable blobs [11] are then retained and are then fed to the superclass recognition step. Fig. 4 shows the processed images obtained as a result of various steps in the detection procedure. Fig. 4(a) represents an input image from GTSDB dataset [20]. Fig. 4(b) shows the traffic sign enhanced image computed using Equation 7 and Fig. 4(c) represents its intensity surface plot. It can be seen that in the surface plot the pixels corresponding to the traffic sign take significantly higher grayvalues than the background pixels which makes segmentation task easy. Fig. 4(d) shows the segmented image as a result of the search for maximally stable blobs.

(a) A sample image taken from GTSDB database. (b) Image corresponding to posterior probability as a result of applying Equation (7) after some post-processing. (c) Surface map of the image in b showing that pixels corresponding to the traffic sign take comparatively larger gray values. (d) Result of applying repeated thresholding and keeping track of the most stable shapes.
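The background suppression step described above (likelihood times prior, with the class independent denominator dropped) can be illustrated as follows. This is a minimal sketch under assumptions that are not taken from the paper: each color class is modeled as a Gaussian with independent channels, the gray values are obtained by simple rescaling to 0..255, and all function names and pixel values are hypothetical.

```python
import math

def fit_gaussian(samples):
    """Maximum likelihood mean and per-channel variance of a color class."""
    n = len(samples)
    mean = [sum(s[k] for s in samples) / n for k in range(3)]
    var = [sum((s[k] - mean[k]) ** 2 for s in samples) / n + 1e-6 for k in range(3)]
    return mean, var

def log_likelihood(pixel, model):
    """Log density of a pixel under a Gaussian with independent channels."""
    mean, var = model
    return sum(-0.5 * math.log(2 * math.pi * var[k])
               - (pixel[k] - mean[k]) ** 2 / (2 * var[k]) for k in range(3))

def unnormalized_posterior(pixel, model, prior):
    """Likelihood times prior; the evidence term is omitted on purpose."""
    return math.exp(log_likelihood(pixel, model)) * prior

def enhance(image, model, prior=0.5):
    """Gray image in 0..255 where sign-colored pixels receive high values."""
    scores = [[unnormalized_posterior(p, model, prior) for p in row] for row in image]
    peak = max(max(row) for row in scores) or 1.0
    return [[round(255 * s / peak) for s in row] for row in scores]

# Invented training pixels for the red class and a tiny 1 x 2 test image.
red_model = fit_gaussian([(200, 30, 40), (220, 50, 35), (180, 40, 60)])
test_image = [[(205, 42, 44), (120, 130, 110)]]
gray = enhance(test_image, red_model)
```

In this toy example the reddish pixel maps to the top of the gray range while the grayish background pixel is suppressed to zero, which is the behavior the surface plot in Fig. 4(c) illustrates on a real image.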
This procedure includes identifying the superclass of the blob taken from the output of the segmentation process, followed by recognizing its contents. Fig. 5 presents the internal blocks related to both of these sub-procedures. The following subsections provide details of all sub-blocks. Recognizing the superclass, i.e., circular, triangular, etc., is the first recognition related step. A number of techniques have been reported in the literature. A block diagram of our proposed technique is shown in Fig. 6. A low resolution representation of the input blob is computed and fed to an artificial neural network to detect the superclass, i.e., triangle, inverted triangle, or circle. Low resolution models of the shape used in the literature rely on interpolation, which incurs loss of information. In this work, we instead use the energy compaction property of the discrete cosine transform (DCT) to ensure the least information loss. To the best of the authors’ knowledge, this is the first method to use the discrete cosine transform for this purpose. The low resolution version, in our work, is obtained by applying the discrete cosine transform on a rectangular grid [36–38] and retaining only a few low frequency coefficients by scanning each cell in the well known zigzag fashion [24] used for data compression.
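The low resolution representation described above can be sketched for a single square cell: compute the 2-D DCT, walk the coefficients in zigzag order, and keep only the first few. The orthonormal DCT-II normalization, the JPEG-style scan order, and the number of retained coefficients are assumptions made for illustration; the function names are hypothetical.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(u):
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out

def zigzag_indices(n):
    """(row, col) pairs of an n x n block in JPEG-style zigzag order."""
    return sorted(((u, v) for u in range(n) for v in range(n)),
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))

def low_res_descriptor(block, keep):
    """Keep the first `keep` DCT coefficients of the cell in zigzag order."""
    coeffs = dct2(block)
    return [coeffs[u][v] for u, v in zigzag_indices(len(block))[:keep]]

# A constant 4 x 4 cell: all of its energy lands in the DC coefficient.
flat_cell = [[2.0] * 4 for _ in range(4)]
descriptor = low_res_descriptor(flat_cell, keep=3)
```

For a full blob, the same function would be applied to every cell of the grid and the short per-cell vectors concatenated to form the network input.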

Block diagram of our proposed recognition method.

Block diagram of super class identification procedure using low resolution representation of segmented blobs processed using artificial neural network trained using extreme learning machine.
Equation 8 mentions how discrete cosine transform (DCT) is computed on blob
Artificial neural networks are good at classifying low resolution digits. To classify the shapes of the traffic signs (i.e., super class), we use a network with one hidden layer and trained using extreme learning machine (ELM) technique [13] with RBF kernel. ELM is a smart technique in which the input weights are randomly assigned and the output weights are computed analytically without using a gradient descent algorithm [13]. The architecture of the network is shown in Fig. 6, number of input neurons is equal to the size of shape descriptor obtained as a result of the operation mentioned in Equation 9, number of hidden layer neurons are arbitrarily chosen and the output neurons are set equal to three. The blob declared as being a circle, triangle or inverted triangle with a certain minimum probability is passed to the next stage, otherwise it is considered as a background object and is neglected.
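The analytic training step that distinguishes ELM from gradient based training can be sketched as below. The text mentions both random input weights and an RBF kernel; this sketch assumes the kernel formulation of ELM, in which the output weights are obtained by solving one regularized linear system, and that choice is an assumption rather than a confirmed detail of the authors' setup. The values of `gamma`, `C`, and the rejection threshold, as well as the toy two dimensional descriptors, are hypothetical.

```python
import math

def rbf(a, b, gamma):
    """RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [[M[i][n + j] / M[i][i] for j in range(len(B[0]))] for i in range(n)]

def train_kernel_elm(X, labels, n_classes, gamma=1.0, C=100.0):
    """Output weights beta = (K + I / C)^-1 T, computed without gradient descent."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (1.0 / C if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    T = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    return solve(K, T)

def predict(x, X, beta, gamma=1.0, threshold=0.5):
    """Return the winning class, or None (background) if no score is high enough."""
    k = [rbf(x, xi, gamma) for xi in X]
    scores = [sum(k[i] * beta[i][c] for i in range(len(X))) for c in range(len(beta[0]))]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None

# Toy descriptors: 0 = circle, 1 = triangle, 2 = inverted triangle.
X_train = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.5, 5.0), (10.0, 0.0), (10.5, 0.0)]
y_train = [0, 0, 1, 1, 2, 2]
beta = train_kernel_elm(X_train, y_train, n_classes=3)
```

A descriptor far from every training sample produces near zero scores for all three classes and is rejected, mirroring the rule that blobs below a minimum score are treated as background.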
To extract contents, we propose a hybrid spatial and frequency domain descriptor based on circular HOG (CHOG) and steered DCT (SDCT). HOG is a famous shape descriptor presented by [25] for pedestrian detection initially and then was widely used in computer vision and biomedical image analysis [11]. HOG at multiple rectangular spatial scales was proposed by [3] and was applied by [4] for traffic signs superclass identification i.e. triangular or circular etc. In our work, CHOG is applied after dividing an image in various blocks each in a certain range of radius and angle assuming center of the image as origin. As shown in Fig. 7, we propose a unique strategy of applying pyramidal features in a grid. The radial distance from the center r and the angle φ are two variables, we first keep ranges of radial distance fixed and increase the sizes of blocks by increasing ranges of angles only, then we do vice versa and finally increase both. In our work, we tried five bins for r and eight bins for φ. Feature extraction algorithm is applied on each of these schemes and the feature vectors are concatenated.

An explanation of our proposed feature extraction approach, (a) Starting with an initial setting of r and φ (b) r is kept constant and no. of φ bins are increased (c) φ bins are constant whereas r bins are increased and finally (d) both bins are increased. The descriptors, computed at each level are finally concatenated.
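The radial and angular blocking described above can be sketched as follows: each pixel is assigned to a cell by its radius and angle about the image center, a gradient orientation histogram is accumulated per cell, and descriptors from several binning schemes are concatenated. The central difference gradient, the nine unsigned orientation bins, and the particular list of `(r_bins, phi_bins)` schemes are assumptions for illustration, not the authors' exact settings.

```python
import math

def polar_bin(x, y, cx, cy, max_radius, r_bins, phi_bins):
    """Return (radial bin, angular bin) of pixel (x, y), or None if outside."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r >= max_radius:
        return None
    phi = math.atan2(dy, dx) % (2 * math.pi)
    r_idx = min(int(r / max_radius * r_bins), r_bins - 1)
    phi_idx = min(int(phi / (2 * math.pi) * phi_bins), phi_bins - 1)
    return r_idx, phi_idx

def circular_hog(gray, r_bins=5, phi_bins=8, orient_bins=9):
    """One unsigned gradient orientation histogram per (r, phi) cell, flattened."""
    h, w = len(gray), len(gray[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    max_radius = min(w, h) / 2.0
    hist = [[[0.0] * orient_bins for _ in range(phi_bins)] for _ in range(r_bins)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            cell = polar_bin(x, y, cx, cy, max_radius, r_bins, phi_bins)
            if cell is None:
                continue
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation
            o = min(int(angle / math.pi * orient_bins), orient_bins - 1)
            hist[cell[0]][cell[1]][o] += magnitude
    return [v for ring in hist for sector in ring for v in sector]

def pyramid_chog(gray, schemes=((5, 8), (5, 4), (3, 8), (3, 4))):
    """Concatenate descriptors computed with several (r_bins, phi_bins) schemes."""
    descriptor = []
    for r_bins, phi_bins in schemes:
        descriptor.extend(circular_hog(gray, r_bins, phi_bins))
    return descriptor

# Toy 16 x 16 image with a single vertical edge in the middle.
edge_image = [[0.0 if x < 8 else 1.0 for x in range(16)] for _ in range(16)]
single_scale = circular_hog(edge_image)
multi_scale = pyramid_chog(edge_image)
```

With five radial and eight angular bins, the single scale descriptor has 5 × 8 × 9 = 360 entries; the pyramid simply appends the descriptors of the coarser schemes.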
Second part of the feature vector is computed by exploiting famous energy compact property of discrete cosine transform. This transform has been largely used for data compression in addition to its use as a feature extraction technique in various biomedical image processing applications [36, 38]. Numerical formula to compute steered discrete cosine transform on a two dimensional sign image
Feature vectors obtained as a result of applying SDCT and CHOG are finally concatenated in an interleaved style to form a hybrid multiscale circular feature descriptor as shown in Fig. 8. Applying SDCT and CHOG at multiple scales facilitates exploiting various features. Equation 11 describes that the shape descriptor

Graphical representation of how a shape descriptor is obtained as a result of concatenation performed at various spatial scales.
Since the portion outside the border containing the traffic sign is least important, therefore, the final shape descriptor (
An illustration of radial distances from the center of the shape is given in Fig. 9 for different values of the shifting parameter C, with the scaling parameter A kept fixed. The center of the shape is taken as the origin; this ensures that background items lie at a large radial distance and hence are given less importance. The area around the center to be emphasized is adjusted by choosing an appropriate value of parameter C.

Distance function used to apply weights to the shape descriptor.
The hybrid descriptor obtained as a result of spatio-frequency feature extraction is detailed and contains both irrelevant and redundant information in addition to useful data. To obtain a subset of the most relevant and least redundant features, we use feature interaction based feature selection [14, 42]. To the best of our knowledge, this technique has never been used for traffic sign recognition applications before. The feature selection algorithm proceeds as follows:
1. Class relevance computed between a feature f and class
2. Top ranked feature is transferred from
3. Feature with the next highest class relevance (
4. Step 3 is repeated for all remaining features in
In order to add Z number of most relevant and least redundant features to the subset
An expression for exact computation of joint mutual information I (
Each of these member terms can be written in terms of joint entropy between
It is noteworthy that since H (
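The entropy based computation of joint mutual information referred to above can be sketched for discrete features. The identity used, I(f1, f2; C) = H(f1, f2) + H(C) − H(f1, f2, C), is standard; the greedy `select_features` loop is a generic joint mutual information criterion offered as an assumption, and it may differ from the exact feature interaction criterion of [14].

```python
import math
from collections import Counter

def entropy(*columns):
    """Joint Shannon entropy in bits of one or more discrete variables."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information(feature, labels):
    """I(f; C) = H(f) + H(C) - H(f, C)."""
    return entropy(feature) + entropy(labels) - entropy(feature, labels)

def joint_mutual_information(f1, f2, labels):
    """I(f1, f2; C) = H(f1, f2) + H(C) - H(f1, f2, C)."""
    return entropy(f1, f2) + entropy(labels) - entropy(f1, f2, labels)

def select_features(features, labels, z):
    """Greedy selection: the most relevant feature first, then the feature
    with the largest summed joint mutual information with those already chosen."""
    remaining = list(range(len(features)))
    first = max(remaining, key=lambda i: mutual_information(features[i], labels))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < z:
        best = max(remaining, key=lambda i: sum(
            joint_mutual_information(features[i], features[s], labels)
            for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# XOR toy problem: neither feature is informative alone, but the pair is.
f_a = [0, 0, 1, 1]
f_b = [0, 1, 0, 1]
noise = [0, 0, 0, 0]
classes_xor = [0, 1, 1, 0]
chosen = select_features([f_a, f_b, noise], classes_xor, z=2)
```

The XOR example shows why interaction matters: each feature has zero relevance on its own, yet together they determine the class completely, so a purely relevance based ranking would miss the pair.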
Datasets and test bench
German traffic sign detection benchmark (GTSDB) [20] is currently the most widely used dataset for testing and validation of TSDR algorithms [9, 45]. The dataset contains 900 images from 43 classes of European traffic signs both with red rim and filled with blue color. We used 30 classes for the purpose of our experiments out of which some samples are given in Fig. 10(a).

Samples from GTSDB and our self collected datasets.
As a second dataset, we collected 1500 images corresponding to 17 different classes from N5 national highway in Pakistan [46]. It is an over 1800 km long road having traffic signs presented with a variety of deteriorations. The dataset is available online 1 [ObmZCTHFwZjA?usp=sharing] and a few samples from the dataset are shown in Fig. 10(b).
We randomly picked a number of samples from GTSDB and our self collected datasets and introduced occlusions manually. The occlusion covers 5% to 50% of the sign and is placed randomly on the right, left, top, bottom, or center of the board containing the sign. This synthetic dataset contains occluded images of 20 classes, represented by 600 images.
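A procedure of this kind can be automated along the following lines. The occlusions in the paper were introduced manually, so this is only an assumed stand-in: the occluder is a filled rectangle, its shape for each position is a design choice of this sketch, and the function names are hypothetical.

```python
import random

def occlude(image, fraction, position, fill=0):
    """Return a copy of the image with a rectangle covering roughly
    `fraction` of its area at the given position."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    if position in ("left", "right"):
        ow, oh = max(1, round(w * fraction)), h
    elif position in ("top", "bottom"):
        ow, oh = w, max(1, round(h * fraction))
    else:  # center: patch with the same aspect ratio as the image
        side = fraction ** 0.5
        ow, oh = max(1, round(w * side)), max(1, round(h * side))
    x0 = {"left": 0, "right": w - ow}.get(position, (w - ow) // 2)
    y0 = {"top": 0, "bottom": h - oh}.get(position, (h - oh) // 2)
    for y in range(y0, y0 + oh):
        for x in range(x0, x0 + ow):
            out[y][x] = fill
    return out

def random_occlusion(image, rng):
    """Occlude 5% to 50% of the image at a randomly chosen position."""
    fraction = rng.uniform(0.05, 0.50)
    position = rng.choice(["left", "right", "top", "bottom", "center"])
    return occlude(image, fraction, position)

def occluded_pixels(image, fill=0):
    return sum(row.count(fill) for row in image)

sign = [[1] * 20 for _ in range(20)]  # toy 20 x 20 "sign" of ones
left_quarter = occlude(sign, 0.25, "left")
center_quarter = occlude(sign, 0.25, "center")
random_case = random_occlusion(sign, random.Random(0))
```

Both fixed examples blank out 100 of the 400 pixels, i.e. a 25% occlusion, once as a vertical strip and once as a centered square.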
Extreme Learning Machine (ELM) classifier was trained on low resolution versions of broad shapes of the targeted traffic signs. Input neurons were set equal to the number of pixels in the image being presented and the number of output neurons equal to the classes. A blob is considered as a background item if the score of all classes are below a certain threshold. To read contents of traffic signs, support vector machine (SVM) classifier was trained on the training data and was tested on images reserved for the purpose. Since SVM is a two class classifier by default, its implementation given in LibSVM [16] was used to carry out all experiments. LibSVM is originally written in C language and is a famous and widely used implementation of multi class SVM classifier in research literature [1, 26].
Since our proposed algorithm presents a novel method to detect traffic signs in natural scenes (the parameters were estimated solely on the basis of the available training data), its comparison with other detection techniques is of interest. This is provided in Table 2, where the detection portion of our proposed algorithm is compared with four other methods. The first method [3] uses thresholding in the HSI color space; the second method [17] also uses HSI thresholding but with different limits in each channel; the third method [11] uses normalized RGB, in which each channel of the original RGB image is divided by the sum of the R, G, and B values; and the last method [7] uses a variation of the Ohta transform [21] obtained by dividing the color channels in Ohta space by the sum of the R, G, and B values. Table 2 shows that our proposed method achieves the highest detection rate for all three datasets. Normalized Ohta based detection [7] is the next closest technique, whereas thresholding in the HSI domain with fixed limits is the worst choice.
Comparison of detection accuracies on all datasets.
To carry out experiments for recognition, 30 classes of traffic signs were selected for GTSDB, 17 for our self collected and 20 classes for manually occluded datasets. 70% of the available data in each dataset was used for training and the remaining 30% was reserved for testing.
A comparison of precision and recall for various TSDR methods on GTSDB, our self collected, and manually occluded datasets is given in Table 3. The first method is a HOG and SVM method applied to a color enhanced grayscale version of the input image, presented in [11]. The method is state of the art and achieves fairly high values of precision and recall. The second method [9] is a pyramidal HOG based technique in which HOG is applied in a predefined pyramid after some preprocessing, i.e., applying a suitable edge detection filter. Although this method was originally proposed to detect the superclass of the traffic sign, we used it for the purpose of identifying the contents. The third method [7] presents a color probability map technique classified with a computationally expensive convolutional neural network, while claiming that the complete system runs faster than many similar methods. The method presents a low cost computational model and achieves a good detection accuracy. The last method in Table 3 proposes using HOG on the Hue, Saturation, Intensity (HSI) color space [17]. HOG features are computed for each channel and are then concatenated. Table 3 shows that for GTSDB and our self collected datasets, our proposed method achieves the highest precision and recall values. For the manually occluded dataset, it achieves the highest recall.
Precision and Recall values of all three datasets compared with four other techniques
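For reference, per class precision and recall of the kind reported in the tables can be computed from true and predicted labels as sketched below. How the paper aggregates these values across classes is not stated, so the per class form and the toy labels are assumptions.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Precision and recall of one class from paired label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented example: one SL50 sign is misread as SL100.
truth = ["SL50", "SL50", "SL100", "stop"]
predicted = ["SL50", "SL100", "SL100", "stop"]
sl50 = precision_recall(truth, predicted, "SL50")
sl100 = precision_recall(truth, predicted, "SL100")
```

The single misread sign lowers the recall of SL50 and, at the same time, the precision of SL100, which is why both measures are reported side by side.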
Table 4 provides precision and recall values for individual signs taken from GTSDB dataset. The table provides comparison of our proposed method with two other techniques presented in [11] and [9]. The comparison includes 8 mandatory speed limit signs and 10 prohibitory signs. Our proposed method exhibits its superiority by achieving the highest precision and recall values for most of the signs. However, the methods given by [11] and [9] achieve the best recall values only for a few signs. Table 5 shows precision and recall values of all 17 types of traffic signs on our self collected dataset. Our proposed method outperforms the other two competing methods for all types of traffic signs.
Recognition precision and recall on GTSDB [20] Dataset
Recognition precision and recall on self collected real world dataset
Figure 11 shows the error rates of various methods plotted against increasing sizes of feature subsets on the GTSDB dataset. Subsets were computed using the approximate joint mutual information maximization technique described in Section 3. The same technique was also applied to the HOG+SVM method described in [11]. It can be seen that our proposed method achieves a lower error rate with fewer features. The minimum error rate achieved by our proposed method, around 0.03, is obtained with a subset of approximately 1500 features. The method presented in [11], however, exhibits its least error (around 0.04) using 1764 features of the training data. Fig. 11(b) shows the same comparison on our self collected dataset, where our proposed method achieves the least error rate with around 1700 features.

Size of feature vector versus error rates.
On the third dataset, obtained by collecting samples from both GTSDB and our collected dataset, experiments were performed for manually crafted partial occlusions. Fig. 12 shows error rate of three different methods versus these partial occlusions on our third dataset. All three competing methods are claimed to be robust against partial occlusions. Our proposed method was compared with the method presented in [11] and was found better for signs occluded up to 45%. However, for occlusions beyond that the method in [11] performs slightly better. The other method presented by Fleyeh et al. [3] achieved the highest error rate for all tested occlusions.

Finally, to rule out the possibility that the improvement in accuracy achieved by applying our proposed feature selection method to the long shape descriptor was due to chance, a statistical test, i.e., a paired t-test, was performed on all three datasets. Recognition accuracy on the test sets of all three datasets was determined separately before and after applying the feature selection algorithm. The difference of the recognition accuracies for each test sample was calculated, and finally the mean $m_d$ and standard deviation $s_d$ of the difference array were determined. Test statistic was calculated using the formula
Statistical analysis of feature selection method
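The paired t-test described above can be sketched with the textbook statistic t = m_d / (s_d / sqrt(n)), where n is the number of paired observations. Since the paper's own formula did not survive in the text, the use of the sample standard deviation and the function name are assumptions, and the numbers below are invented.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Textbook paired t statistic: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    m_d = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return m_d / (s_d / math.sqrt(n))

# Invented accuracies before and after feature selection.
t_value = paired_t_statistic([0, 0, 0, 0], [1, 2, 3, 4])
```

The resulting value is compared with the critical value of the t distribution with n − 1 degrees of freedom at the chosen significance level.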
Finally, Fig. 13 shows examples of traffic signs from our self collected dataset that were successfully detected using our proposed technique. Fig. 13(a) shows a bridge ahead warning sign found on the N5 highway between two major cities, whereas the sign shown in Fig. 13(b) is an SL100 sign detected on the same highway.

Sample detected signs from our self collected dataset.
A novel method for automatic detection and recognition of traffic signs was tested on three different datasets and was compared with a number of recent and well known methods. Our proposed algorithm involves a number of steps with high computational cost: (1) adjusting the coefficients of the color transformation matrix, (2) adjusting the maximum likelihood estimate based parameters of the color classes of interest, and (3) running the joint mutual information based feature selection algorithm on the long shape descriptor. The last operation is time consuming because it involves a number of steps that require computing joint probabilities between two and three variables.
Fortunately, all these steps are executed on training data only once. In other words, tuning of parameters and execution of feature selection algorithm, the time consuming processes, are complete before the first test instance is presented to the system. A comparison of time taken to train our proposed system and other state of the art methods on both GTSDB and our self collected datasets is shown in Fig. 14(a). Results are based on running the proposed system on Pentium 4, core i7 computer with 8GB RAM. It can be seen that the training time of our proposed method is the largest among all. This is because the parameters of our proposed color space were set by undergoing certain iterations. Moreover, since the data belonging to the shape descriptor consists of multivalued integers, applying feature selection algorithm on this data requires computing joint entropy terms. The formula to compute joint mutual information requires computing joint probability terms which are pretty complicated especially when the variables have more than two values [47, 48].

A comparison of execution speeds of various methods for detection and recognition.
A similar comparison of the average prediction time taken by a single test instance is provided in Fig. 14(b). Our proposed method takes only 8 ms and 10 ms more than the fastest method, i.e., [3], for the self collected and the GTSDB datasets respectively. This is a small price, because the proposed method is at least 5% more accurate than the fastest method [3] on both datasets. Further, Fig. 14(b) also shows that, for the self collected dataset, the proposed method is not the slowest on test data. For both datasets, our proposed method is the most accurate; it achieved 3% higher accuracy while being less than 5 ms slower than the second most accurate method, i.e., [7]. To summarize, the proposed method offers better accuracy on test data at the price of an insignificant compromise on execution speed.
In this paper, we present a novel traffic sign detection and recognition method. Detection is based on segmenting a candidate traffic sign in an uncontrolled natural environment. Optimum separation among the classes of colors of interest and the background is achieved by quickly solving an optimization problem with the help of a set of heuristics. The most stable shapes are then accepted as potential traffic signs. The superclass is identified by testing a pretrained ANN on low resolution versions of the desired shapes.
For recognition, we have proposed multiscale feature extraction based on spatial and frequency analysis of the templates, utilizing multiscale versions of circular HOG and steerable DCT. To reduce the high dimensional shape feature descriptor, joint mutual information based feature selection is proposed to obtain the most representative subset of features. A linear kernel SVM classifier is then trained on the training data (70%) and tested on the withheld instances (30%).
For the experiments, we used three datasets: (1) the German traffic sign detection benchmark (GTSDB), (2) our self collected dataset from the N5 highway in Pakistan, and (3) manually occluded samples from each of these two datasets. The results show that our proposed method was found to be at least 5% more accurate than other recent state of the art methods at only an insignificant additional computational cost.
