Automatic vehicle detection using spatial time frame and object based classification

Abstract

This paper presents an automated method for identifying and classifying vehicles in traffic videos. Unlike traditional methods, the proposed method combines multiple time spatial frames to detect moving objects. These moving objects are potential vehicles, although other moving objects may also be present. Therefore, to further improve accuracy, the moving objects are classified using an object oriented classification scheme. The identification of vehicles from traffic videos plays an important role in Intelligent Transport Systems (ITS). A virtual line is placed on each frame such that the objects crossing this line are the desired moving objects. The object based classifier uses fuzzy rules based on features such as area, perimeter and elongation. These fuzzy rules classify the objects into vehicle and non-vehicle classes. The second level of classification further divides the vehicles into two wheeler, four wheeler and six wheeler vehicles. The method is suitable for traffic surveillance because it also computes the speed of vehicles using the time spatial frames. The proposed method is applied to traffic videos of multiple time lengths. A comparative study with existing methods shows that the proposed method has higher accuracy. Its motion detection, vehicle classification and speed of computation make the method well suited to ITS applications such as traffic surveillance.

Keywords

Vehicle detection, multiple spatial time frames, object oriented classification, rule based method, motion detection

1 Introduction

Computer vision based techniques have evolved into an efficient solution for video analysis. Vehicle monitoring is widely used in applications such as detection of traffic violations, traffic monitoring and detection of suspicious activities. The available methods require specialized camera views and cannot be used for wide area coverage. Also, the quality of surveillance videos is generally poor. Acquisition conditions such as occlusion, night time and changeable weather make the task more challenging. In this paper, these issues are addressed using image processing techniques. Fatalities and accidents caused by traffic are a serious and growing problem worldwide due to the increasing use of automobiles (National Highway Traffic Safety Administration, 2011). In the past decades, much work has been done on designing intelligent transportation systems that can control traffic and detect the possibility of mishaps on the roads. Such a system should be able to automatically smooth traffic movement and raise automatic alerts for over-speeding, congestion and similar events. With the availability of CCTV cameras since the 1950s, designing such systems has become cheaper and easier, as live streams of all movements on the roads can be obtained. The videos obtained from these cameras can be used to design intelligent transportation systems for both real time monitoring and post event analysis to find the reasons for the occurrence of an event.

The first and most important step in designing any Intelligent Transportation System is detecting and tracking vehicles. After detection, the information can be used to design numerous transport surveillance applications, including detection of speed, congestion and suspicious activities [1]. The approaches for detecting vehicles on roads can be classified into three categories: radar sensing [2, 3], lidar sensing [4, 5] and computer vision based detection [6, 7]. The performance of sensor based methods is highly dependent on the quality and availability of controlling frequency signals. Also, radar and lidar sensors are costly and sometimes fail to provide the field of view required for detection. In this context, video based tracking techniques have progressed considerably in recent years, as they do not depend on the availability of a network or such signals.

Vehicle detection methods using image processing techniques can be classified into two broad categories: appearance based and motion based techniques [8]. In appearance based methods, vehicles are detected directly from images by applying pixel operations. Features of vehicles such as length, height, aspect ratio, compact ratio and blob features may be used for type recognition [9]. Motion based approaches, on the other hand, require a sequence of images to recognize vehicles. Motion estimation based detection techniques include the inter-frame difference method [10], the Gaussian Scale Mixture model method [11] and the background subtraction method [12, 13]. The most popular of these is background subtraction. In this technique, the background of the video is estimated and then subtracted from subsequent frames to obtain foreground blobs, which correspond to moving vehicles. Background estimation is the most challenging task in this technique, as the background of a video may change due to camera jitter, shadows, changes in illumination and the presence of noise [14].

Techniques based on deformable 3-D geometric models [15], virtual detection lines [16] and Haar rectangular features [17] have also been proposed in the literature. Detection and tracking methods based on active learning using networks and SVM have also been proposed [18, 19]. Multi-time spatial images have also shown good results for detection and classification of vehicles [1, 14]. Detecting vehicles from traffic videos faces serious challenges, such as the heterogeneous size, shape and color of vehicles and varying acquisition conditions like illumination, background and scene complexity. Many methods have been proposed in the literature, but some of them do not work in cases of occlusion, while others work well but are time consuming and complex. The method proposed in this paper is based on Spatial Time Frame (STF) generation and object based classification. The use of STFs helps in identifying moving objects only, hence limiting the chances of misdetection. In existing time frame generation based methods, classification is done using different nearest neighbor classification methods after extracting features, which increases the complexity of the system. The vehicles to be identified are heterogeneous in nature, so the features used should be shape and texture invariant. Thus, in this paper an object based classification approach is used, which is simpler and better than other state of the art methods. The method is applied to videos from available datasets [20, 21] and compared with existing methods.

2 Proposed method

The proposed method is based on spatial time frame (STF) generation and object based classification using a fuzzy rule based approach. The method is novel in that it uses spatial time frames to handle occlusion, which is a major concern in this area, and combines them with object based techniques, which reduces the complexity relative to prior methods. The flowchart of the proposed method is shown in Fig. 1. The various steps are discussed in the following sections.

Fig. 1

Flowchart of proposed method.

2.1 Frames extraction

Vehicles on the road can be detected and tracked by recording videos of traffic. The recording can be done either by CCTV cameras, which can be installed anywhere at some elevation, or by an ego vehicle. CCTV cameras capture videos of a fixed location, while ego vehicles are moving vehicles with a camera installed. These cameras record videos as a succession of digital frames, or still images. The number of frames a video contains per second is called its frame rate. Most videos are recorded at a frame rate of 25–30 frames per second. Each video sequence can be described as given in Equation (1)

$ϑ_{T} = \{φ_{1}, φ_{2}, φ_{3}, \ldots, φ_{n}\}$ (1)

where ϑ_T is the video sequence recorded at the T-th second and φ_n is the n-th frame of the video sequence.

2.2 Preprocessing

Noise free, high contrast images are required for identification and classification from videos. The videos recorded by CCTV cameras under different environmental and illumination conditions often contain disturbances. These disturbances are modeled as noise, shadows, occlusion, illumination and reflection, and need to be removed before further processing. The preprocessing steps are applied to the HSI image, so the RGB image is first converted to HSI. This is done because, unlike RGB, HSI separates image intensity from color information. Also, HSI components are not much affected by noise and indirect lighting from the surroundings [22]. An HSI frame is obtained from an RGB video frame using the following equations:

$I = \frac{1}{3} (R + G + B)$ (2)

$S = 1 - \frac{3}{R + G + B} \min (R, G, B)$ (3)

$H = \begin{cases} θ & \text{if } B ⩽ G \\ 360^{\circ} - θ & \text{if } B > G \end{cases}$ (4)

where

$θ = \cos^{-1} \left\{ \frac{\frac{1}{2} [(R - G) + (R - B)]}{\sqrt{(R - G)^{2} + (R - B)(G - B)}} \right\}$ (5)
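The conversion in Equations (2) to (5) can be illustrated for a single pixel. The following is a minimal sketch, not the authors' implementation: it assumes RGB components scaled to [0, 1], uses the standard HSI denominator $\sqrt{(R - G)^{2} + (R - B)(G - B)}$, and sets hue to 0 for grey or black pixels where it is undefined. The function name `rgb_to_hsi` is hypothetical.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to (H in degrees, S, I)."""
    total = r + g + b
    intensity = total / 3.0                            # Eq. (2)
    if total == 0:
        return 0.0, 0.0, 0.0                           # black: H and S undefined, set to 0 (assumption)
    saturation = 1.0 - 3.0 * min(r, g, b) / total      # Eq. (3)
    num = 0.5 * ((r - g) + (r - b))
    # max(0.0, ...) guards against tiny negative values caused by rounding
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0:
        return 0.0, saturation, intensity              # grey: hue undefined, set to 0 (assumption)
    ratio = max(-1.0, min(1.0, num / den))             # clamp before acos
    theta = math.degrees(math.acos(ratio))             # Eq. (5)
    hue = theta if b <= g else 360.0 - theta           # Eq. (4)
    return hue, saturation, intensity

if __name__ == "__main__":
    for name, pixel in [("red", (1, 0, 0)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))]:
        print(name, rgb_to_hsi(*pixel))
```

In a full pipeline the same function would be applied to every pixel of every frame, typically with a vectorized array library rather than a Python loop.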

The shadow removal is applied to the HSI image. The videos recorded from different sources are noisy due to the quality of image sensors and camera calibration. The dynamic nature of surveillance sequences is another source of noise. Because of this noise, large areas of the frames may be misclassified as moving objects. Averaging the frames is one possible solution, but it has a limited de-noising effect and also results in blur. Gaussian low pass filters are the traditional models, used mostly for removal of impulse noise in digital images. However, such a filter removes some high frequency components of the image, so detailed edge and shape information may be lost. In this paper, a Butterworth low pass noise removal filter is applied to each HSI frame of the video using Equation (6)

$B (x, y) = \frac{1}{1 + {\left[\frac{D (x, y)}{D_{0}}\right]}^{2 m}}$ (6)

where D₀ is the cut-off frequency, m is the order of the filter, $D (x, y) = \sqrt{x^{2} + y^{2}}$ and (x, y) denotes the spatial location of the pixel.
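The transfer function in Equation (6) can be tabulated on a small grid to see how the cut-off D₀ and the order m shape the filter. This is a sketch under one stated assumption: distances D(x, y) are measured from the centre of the grid, which is the usual convention for a centred spectrum but is not specified in the text. The function name `butterworth_lowpass` is hypothetical.

```python
import math

def butterworth_lowpass(rows, cols, d0, m):
    """Return a rows x cols Butterworth low-pass mask following Eq. (6)."""
    cy, cx = rows // 2, cols // 2        # assumption: origin at the grid centre
    mask = []
    for y in range(rows):
        row = []
        for x in range(cols):
            d = math.sqrt((x - cx) ** 2 + (y - cy) ** 2)    # D(x, y)
            row.append(1.0 / (1.0 + (d / d0) ** (2 * m)))   # B(x, y)
        mask.append(row)
    return mask

if __name__ == "__main__":
    for row in butterworth_lowpass(5, 5, d0=2.0, m=2):
        print(" ".join(f"{value:.3f}" for value in row))
```

The mask equals 1 at the origin and 0.5 at distance D₀; a larger m gives a sharper transition between passed and attenuated components.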

2.3 Contrast enhancement

Contrast enhancement is the technique used to adjust the dynamic grey levels in the image. The process improves the quality of the image so that it gives better results in the subsequent feature extraction and recognition phases. The dynamic histogram equalization technique [23] is used to adjust the contrast of the preprocessed images. The method works in three steps: division of the histogram, allocation of a range to each sub-part and, finally, histogram equalization. The algorithm is as follows:

Step 1: Divide the histogram into parts such that no portion dominates, as follows.

Step 1.1: Apply a one dimensional smoothing filter to get rid of insignificant minima.

Step 1.2: Take the portion of the histogram that falls between the first and last non-zero histogram components.

Step 1.3: Create sub-histograms, taking the first in the range [m₀, m₁], the second in the range [m₁ + 1, m₂] and so on, where m₀, m₁, …, m_n are the (n + 1) grey levels belonging to the n + 1 local minima.

Step 2: Allocate dynamic grey level ranges to each sub-part formed in Step 1 for mapping, using Equations (7) and (8)

$GL_{i} = k_{i} - k_{i-1}$ (7)

where GL_i is the dynamic range of the i-th sub-part of the input and k_i is the i-th local minimum.

$GL_{new} = \frac{GL_{i}}{\sum GL_{i}} \times (L - 1)$ (8)

where L is the number of grey levels (so L − 1 is the highest grey level) and GL_new is the new dynamic range of the output.

Step 3: Apply histogram equalization on each of the sub-parts in the image.
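The range allocation of Step 2 can be sketched numerically. The example below assumes the local minima have already been found in Step 1 and that the image has 256 grey levels; the function name `allocate_ranges` and the sample minima are hypothetical.

```python
def allocate_ranges(minima, levels=256):
    """Compute input spans (Eq. 7) and output ranges (Eq. 8) for each sub-histogram.

    minima: sorted grey levels k_0, k_1, ..., k_n of the local minima.
    levels: number of grey levels L (256 assumed for 8-bit images).
    """
    spans = [minima[i] - minima[i - 1] for i in range(1, len(minima))]   # GL_i
    total = sum(spans)
    ranges = [span / total * (levels - 1) for span in spans]            # GL_new
    return spans, ranges

if __name__ == "__main__":
    spans, ranges = allocate_ranges([10, 30, 110])
    print(spans)    # input span of each sub-histogram
    print(ranges)   # share of the output grey scale given to each sub-histogram
```

Each sub-histogram receives a share of the output range proportional to its input span, so the allocated ranges always sum to L − 1.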

The result of applying this algorithm to an image, together with the corresponding histograms, is shown in Fig. 2.

Fig. 2

(a) Normal Image, (b) Corresponding Histogram of Normal Image, (c) Contrast Enhanced Image, (d) Histogram after Contrast Enhancement.

2.4 Spatial time frames generation

Spatial time frames (STF) are generated from the traffic videos for detecting moving vehicles. Multiple virtual lines [14] are used for generating these frames. A virtual line is a set of pixels in a frame taken in the direction perpendicular to the motion of the vehicles, as shown in Fig. 3.

Fig. 3

Virtual lines on consecutive frames in the video.

An STF frame is generated by placing the pixel intensities of all the vehicles that pass through the virtual line in chronological order, as shown in Fig. 4. Thus the image is generated from the luminance values of the pixels of the moving objects that pass the virtual line. Therefore, given n frames of size x × y in a time interval t, the result is an n × x STF frame. This frame contains the different objects that pass through the virtual line in the time interval t.

Fig. 4

STF generated for 130 consecutive frames.
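The construction of an STF can be sketched as stacking one line of pixels per frame. The sketch below assumes each frame is a list of rows of luminance values and that the virtual line coincides with an image row, which holds when vehicles move vertically in the image; the function names are hypothetical.

```python
def build_stf(frames, line_row):
    """Stack the pixels on the virtual line of each frame in chronological order.

    frames: list of n frames, each a list of rows of luminance values.
    line_row: index of the image row used as the virtual line (assumption).
    Returns an n x x image, one row per frame.
    """
    return [list(frame[line_row]) for frame in frames]

def build_multiple_stfs(frames, line_rows):
    """Build one STF per virtual line, e.g. three lines at different positions."""
    return [build_stf(frames, row) for row in line_rows]

if __name__ == "__main__":
    # Three synthetic 4 x 5 frames whose pixel values encode (time, row, column).
    frames = [[[t * 100 + r * 10 + c for c in range(5)] for r in range(4)]
              for t in range(3)]
    for row in build_stf(frames, line_row=2):
        print(row)
```

A stationary background produces constant columns in the STF, whereas a vehicle crossing the line leaves a compact blob whose extent along the time axis grows with the number of frames it needs to cross.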

In the proposed method, three different virtual lines are used at different locations in the frame as shown in Fig. 5. The basic idea behind using multiple lines is occlusion detection. In certain scenarios, vehicles in STF will seem to be merged into each other resulting in occlusion. As the vehicles will be in motion they will appear disjoint in at least one of the three STFs.

Fig. 5

Multiple virtual lines.

As STFs are generated from the continuous pixel intensities passing the virtual detection line (VDL), the horizontal length of an object indicates the number of frames through which the object has passed, and the vertical length describes its geometry. The next step is therefore to find all the object blobs in the STF. The vehicle identification operator β used for vehicle detection is the top-hat transform of the STF, calculated as

$β = γ (T (x, y)) = T (x, y) - (T (x, y) \circ α)$ (9)

where γ is the top-hat transform of T(x, y), T(x, y) is the STF frame, α is the structuring element and ∘ denotes morphological opening.
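The top-hat operator of Equation (9), followed by thresholding, can be sketched in a few lines. For brevity this example uses a flat square structuring element of half-width k instead of a disc, and clips the window at the image borders; both are simplifying assumptions, and the function names are hypothetical.

```python
def _window_filter(img, k, pick):
    """Apply pick (min or max) over a (2k+1) x (2k+1) window clipped at the borders."""
    rows, cols = len(img), len(img[0])
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            window = [img[j][i]
                      for j in range(max(0, y - k), min(rows, y + k + 1))
                      for i in range(max(0, x - k), min(cols, x + k + 1))]
            out_row.append(pick(window))
        out.append(out_row)
    return out

def top_hat(img, k):
    """White top-hat: the image minus its morphological opening, as in Eq. (9)."""
    eroded = _window_filter(img, k, min)        # erosion
    opened = _window_filter(eroded, k, max)     # dilation of the erosion = opening
    return [[p - o for p, o in zip(row, orow)] for row, orow in zip(img, opened)]

def binarize(img, threshold):
    """Threshold the top-hat response to obtain a binary STF."""
    return [[1 if p > threshold else 0 for p in row] for row in img]

if __name__ == "__main__":
    stf = [[10] * 7 for _ in range(7)]
    stf[3][3] = 200                             # a small bright object on a flat background
    for row in binarize(top_hat(stf, k=1), threshold=50):
        print(row)
```

The opening removes bright structures smaller than the structuring element, so the difference keeps exactly those structures and suppresses the slowly varying background.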

Finally, all the vehicle pixels, along with pixels that are similar to vehicle pixels, are detected by applying a threshold to β to obtain the binary STF ∂_n. After finding ∂_n, all the connected components of the STF frame are identified. The Canny edge detector is used to detect fine details of objects in ∂_n [24]. The gradient components of the image in the x and y directions are calculated as

$G_{x} = \left( \sqrt{2 G} \, m [i + 1, j + 1] - m [i + 1, j] + m [i, j + 1] - m [i, j] \right) \vec{i}$ (10)

$G_{y} = \left( \sqrt{2 G} \, m [i + 1, j + 1] - m [i + 1, j] + m [i, j + 1] - m [i, j] \right) \vec{j}$ (11)

Magnitude of the gradient is calculated as $M = \sqrt{G_{x}^{2} + G_{y}^{2}}$ (12)

Direction of the gradient is calculated as $θ = \arctan (\frac{G_{y}}{G_{x}})$ (13)
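Equations (12) and (13) can be evaluated for a single pair of gradient components as follows. This sketch uses `math.atan2` in place of a plain arctangent of the ratio, which is an implementation choice that avoids division by zero when G_x = 0 and preserves the quadrant; the function name `gradient_polar` is hypothetical.

```python
import math

def gradient_polar(gx, gy):
    """Return the gradient magnitude (Eq. 12) and direction in degrees (Eq. 13)."""
    magnitude = math.hypot(gx, gy)                  # sqrt(gx^2 + gy^2)
    direction = math.degrees(math.atan2(gy, gx))    # quadrant-aware arctan(gy / gx)
    return magnitude, direction

if __name__ == "__main__":
    print(gradient_polar(3.0, 4.0))
    print(gradient_polar(0.0, 2.0))   # vertical gradient, no division by zero
```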

After calculating the gradient, non-maximum suppression is applied to the image to obtain correct edge positions. To remove discontinuities in the edges, a morphological dilation operation, denoted K_γ, is applied with a vertical line structuring element, where γ > 1 is the length of the line in pixels. This operation is followed by a morphological closing operation with a square structuring element St_μ, μ > 1, where μ is the width and height of the square in pixels. The choice of γ and μ may vary and is highly dependent on the type of videos, their resolution and the size of vehicles on the road. As a result, the final STFs of a video sequence contain segmented blobs corresponding to moving objects, and the total number of vehicles can be obtained by counting the blobs in the STFs. Counting errors may occur if objects occlude each other, as shown in Fig. 6. Multiple STFs help in separating such joint blobs. For example, in the first and third STFs the blobs are occluded and hence appear to be a single object, but in the second STF they appear disjoint due to continuous motion.

Fig. 6

Multiple STFs of the continuous frames (a) STF1 (b) STF2 (c) STF3.
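Counting vehicles as blobs amounts to connected component labelling of the binary STF. The following is a minimal sketch using breadth-first search with 4-connectivity; the connectivity and the function name `count_blobs` are assumptions, since the text does not specify how components are labelled.

```python
from collections import deque

def count_blobs(binary):
    """Count 4-connected groups of 1-pixels in a binary STF."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for y in range(rows):
        for x in range(cols):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            count += 1                      # found an unvisited blob
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:                    # flood-fill the whole blob
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

if __name__ == "__main__":
    stf = [
        [1, 1, 0, 0, 0],
        [1, 0, 0, 1, 1],
        [0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0],
    ]
    print(count_blobs(stf))
```

Two occluding vehicles that touch in the STF are counted as one blob by this procedure, which is why the comparison across multiple STFs is needed.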

A feature based technique is used to detect possible merging [1]. As all traffic motion is in one direction and the virtual detection lines are very close to each other, the y coordinate of a blob's centroid and its area do not change significantly between STFs. Therefore, area and centroid are used to compare the blobs generated in different frame sequences. Whether blobs are merged is decided using threshold values: if the dissimilarities are above the selected thresholds, the blobs are considered merged; otherwise they are not. The algorithm developed for determining blob merging is as follows:

Algorithm: Determining Blob Merging
Step 1: Calculate the y coordinate of the centroid and the area of the n-th blob as $C_{n}^{y}$ and $Ar_{n}$ respectively, where n = 1, 2, 3, . . . indexes the blobs generated in each STF.
Step 2: Calculate the dissimilarities as
$η_{1} = \frac{\| C_{l}^{y} - C_{n}^{y} \|}{\| C_{n}^{y} \|}$ (14)
$η_{2} = \frac{\| Ar_{l} - Ar_{n} \|}{\| Ar_{n} \|}$ (15)
where l and n denote the l-th and n-th blob and l ≠ n.
Step 3: Compare with the threshold values:
η₁ > Th_c and η₂ > Th_A
Step 4: If both conditions are true, the blobs are merged; otherwise they are not.

The threshold values for centroid and area depend on illumination and video resolution. Under high illumination, shadows affect the segmentation, and thus high threshold values are taken. The values are selected based on the scenario used in the experiment.
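The merging test of Equations (14) and (15) reduces to two relative differences and a pair of threshold comparisons. In the sketch below the threshold values are hypothetical, chosen only for illustration, since the text selects them per scenario; the function name `is_merged` is likewise an assumption.

```python
def is_merged(centroid_y_l, area_l, centroid_y_n, area_n, th_c, th_a):
    """Return True when blob l is judged to be a merged blob relative to blob n."""
    eta1 = abs(centroid_y_l - centroid_y_n) / abs(centroid_y_n)   # Eq. (14)
    eta2 = abs(area_l - area_n) / abs(area_n)                     # Eq. (15)
    return eta1 > th_c and eta2 > th_a                            # Steps 3 and 4

if __name__ == "__main__":
    # Hypothetical thresholds Th_c = 0.1 and Th_A = 0.5.
    print(is_merged(50, 300, 40, 150, th_c=0.1, th_a=0.5))   # centroid and area both differ strongly
    print(is_merged(41, 155, 40, 150, th_c=0.1, th_a=0.5))   # nearly identical blobs
```

Requiring both conditions keeps a blob from being flagged when only its area changes slightly between STFs, for example because of segmentation noise.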

2.5 Object based classification of vehicles using fuzzy rules

Object based (OB) classification using fuzzy logic is used to analyze the frames. Pixel based methods rely on the information in each pixel. For vehicle detection, the OB technique is more reliable, as it works on information from a set of similar pixels, or objects [25]. The objects can be classified based on size, shape and texture.

The image blobs obtained from the STF correspond to connected regions representing different objects. Using the OB classification approach, these objects are classified as vehicles using specific features. Vehicles are expected to be compact, rectangular objects with a length and width within a certain size range and an orientation parallel to the road. Features expected to reflect these properties were therefore tested. When the perimeter is large relative to the area, the object is thin, which is characteristic of objects such as trees and buildings. Hence, in this paper, shape oriented features are used to remove misdetections and classify vehicles into three categories: two wheeler (T_w), four wheeler (F_w) and six wheeler (S_w). Cars are covered by the F_w class, and trucks and buses are classified as S_w. Once the segmented STF images are obtained, fuzzy logic based rules are applied to classify the vehicles.

Table 1

Features used for Level 1 and Level 2 classification

Class	Description	Feature	Minimum value	Maximum value
Level 1
Non-vehicles	Objects that are non-vehicles	n/a	n/a	n/a
Vehicles	Objects that may be vehicles	a) Rectangularity	r₁	r₂
		b) Compactness	c₁	c₂
		c) Perimeter	p₁	p₂
Level 2 (parent class: Vehicles; objects identified as vehicles are further classified)
T_w	Objects that are two wheelers	a) Area	a_2w1	a_2w2
		b) Elongation	e_2w1	e_2w2
F_w	Objects that are four wheelers or cars	a) Area	a_4w1	a_4w2
		b) Elongation	e_4w1	e_4w2
S_w	Objects that are six wheelers, e.g. buses	a) Area	a_6w1	a_6w2
		b) Elongation	e_6w1	e_6w2

Each blob is assigned to one of the four classes based on its likelihood of belonging to a class, which is defined through membership values. Memberships are calculated as the values returned by fuzzy rules designed using individual features or sets of features. The following features of all the connected regions are calculated for designing the rule base. Elongation is used to determine how close the object is to a rectangular shape. It is measured as the ratio of the area to the square of the perimeter. Image objects were classified using different membership functions and threshold values, which are summarized in Table 1. The classification was done in two levels. In the first level, vehicles and non-vehicles are identified using three features: rectangularity, perimeter and compactness. In the second level, vehicles are further classified into T_w, F_w and S_w using two features: area and elongation. The rule sets used for the classification are defined below.

Using the above parameters the rule set is defined as follows in Table 2.

Table 2

Rule set used for Level 1 and Level 2 classification

Rule Set for Level 1
If (r₁ ⩽ Rectangularity (ob_j) ⩽ r₂)
&& (c₁ ⩽ compactness (ob_j) ⩽ c₂)
&& (p₁ ⩽ Perimeter (ob_j) ⩽ p₂)
then ob_j ∈ W_v
else ob_j ∈ W_m
where W_v and W_m are vehicles and non-vehicle classes
r₁ and r₂ : Range of threshold rectangularity.
c₁ and c₂ : Range of threshold compactness.
p₁ and p₂ : Range of threshold perimeter.
Rule set for Level 2
If (a_2w1 ⩽ area (ob_j) ⩽ a_2w2)
&& (e_2w1 ⩽ elongation (ob_j) ⩽ e_2w2)
then ob_j ∈ T_w
else If (a_4w1 ⩽ area (ob_j) ⩽ a_4w2)
&& (e_4w1 ⩽ elongation (ob_j) ⩽ e_4w2)
then ob_j ∈ F_w
else If (a_6w1 ⩽ area (ob_j) ⩽ a_6w2)
&& (e_6w1 ⩽ elongation (ob_j) ⩽ e_6w2)
then ob_j ∈ S_w
where (a_2w1, a_2w2), (a_4w1, a_4w2) and (a_6w1, a_6w2) are the ranges of threshold area for the 2W, 4W and 6W classes respectively,
and (e_2w1, e_2w2), (e_4w1, e_4w2) and (e_6w1, e_6w2) represent the ranges of threshold elongation for the 2W, 4W and 6W classes respectively.
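The two-level rule set can be sketched as plain range checks. This is an illustrative sketch, not the eCognition rule set used in the experiments: the numeric bounds are the threshold values reported in Table 3, the feature values of each blob are assumed to be computed beforehand, and the names `LEVEL1`, `LEVEL2` and `classify` are hypothetical.

```python
# Threshold ranges as reported in Table 3 (minimum, maximum).
LEVEL1 = {
    "rectangularity": (0.7, 1.0),
    "compactness": (1.2, 1.4),
    "perimeter": (4, 7),
}
LEVEL2 = {  # class label -> (area range, elongation range), checked in this order
    "T_w": ((40, 80), (2.0, 3.5)),
    "F_w": ((100, 180), (4.5, 6.5)),
    "S_w": ((181, 220), (6.0, 9.0)),
}

def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high

def classify(blob):
    """Apply the Level 1 rule, then the Level 2 rules, to a dict of blob features."""
    if not all(in_range(blob[name], bounds) for name, bounds in LEVEL1.items()):
        return "non-vehicle"
    for label, (area_range, elongation_range) in LEVEL2.items():
        if in_range(blob["area"], area_range) and in_range(blob["elongation"], elongation_range):
            return label
    return "vehicle (unclassified)"   # passed Level 1 but matched no Level 2 rule

if __name__ == "__main__":
    blob = {"rectangularity": 0.8, "compactness": 1.3, "perimeter": 5,
            "area": 150, "elongation": 5.0}
    print(classify(blob))
```

The final fallback label is an addition of this sketch: the rule set in Table 2 does not state what happens to a vehicle that matches none of the Level 2 ranges.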

2.6 Behavior analysis of detected vehicles

Once all the vehicles have been identified from the STF of the video, their behavior can be analyzed by calculating the speed of the detected vehicles. As the STF is generated using the pixels on the virtual line of each frame, the x-axis length of an object blob in the STF indicates the number of frames during which the vehicle was crossing the line. Let x-axis(OB_j) denote this length. Let R_v be the resolution of the video in meters per pixel and F_r be the frame rate of the video; then the speed of the vehicle in meters per second may be calculated as

$Speed (OB_{j}) = \frac{R_{v} \times F_{r}}{x\text{-}axis (OB_{j})}$ (16)

Once the speeds of the vehicles are known, over-speeding vehicles can be identified by setting a threshold speed limit.
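The speed computation and the over-speeding check can be sketched as follows. The function `blob_speed` is a literal transcription of Equation (16); the function names, the sample inputs and the speed limit are hypothetical. The conversion from m/s to km/h multiplies by 3.6.

```python
def blob_speed(resolution_m_per_px, frame_rate, x_axis_length):
    """Speed of a blob per Eq. (16): R_v * F_r / x-axis(OB_j)."""
    if x_axis_length <= 0:
        raise ValueError("x-axis length of the blob must be positive")
    return resolution_m_per_px * frame_rate / x_axis_length

def to_kmh(speed_mps):
    """Convert metres per second to kilometres per hour."""
    return speed_mps * 3.6

def is_over_speeding(speed_mps, limit_mps):
    """Flag a vehicle whose speed exceeds the chosen threshold."""
    return speed_mps > limit_mps

if __name__ == "__main__":
    speed = blob_speed(resolution_m_per_px=0.5, frame_rate=25, x_axis_length=5)
    print(speed, to_kmh(speed), is_over_speeding(speed, limit_mps=2.0))
```

A shorter blob along the time axis means the vehicle crossed the virtual line in fewer frames, and therefore yields a higher speed.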

3 Dataset description

Experiments are carried out using publicly available video databases of on-road traffic. One of the databases contains video clips captured with a fixed camera at different locations in Dhaka, Bangladesh, and Suwon [20]. The frame rate of the captured videos is 25 frames per second, with a frame size of 176 × 144. The other dataset used is the LISA-Q vehicle dataset [21]. This dataset contains 1600 consecutive frames from videos captured in sunny conditions. It consists of on-road data that is collected daily and added to an archive for research into naturalistic driving and intelligent driver-assistance systems. The data consists of videos captured from six cameras. The front facing cameras contribute to the LISA-Q Front FOV datasets. Some frames from both datasets are shown in Fig. 7.

Fig. 7

Sample Frames from Datasets (A) First Dataset [20] (B) Second Dataset [21].

The datasets do not include any validation sets. However, for the evaluation procedure, the numerical values of the vehicles are used to compute the performance metrics.

4 Experimental results

The spatial time frames are generated in Matlab 2017a. Thereafter, the blobs are classified using the eCognition software. eCognition is a development environment for object based image classification. It is used to design rule sets or networks and is especially useful in change detection, object recognition and satellite image processing. The results of the various steps of the proposed method for an STF of the video are shown in Fig. 8. The structuring element used for the generation is a disc with a radius of 15. The threshold values used for object classification are summarized in Table 3.

Fig. 8

Results of various steps of proposed method using first data set [20] for incoming vehicle (a) RGB Frame, (b) HSI Frame, (c) Preprocessed Frame, (d) STF Generation, (e) Binary Top-Hat transform of STF, (f) Canny edge detection. (Bottom row): (g) Morphology operation and object detection, (h) Final detection and classification of vehicles.

Table 3

Threshold values used for classification

Feature	Minimum value	Maximum value
Rectangularity	r₁ = 0.7	r₂ = 1
Compactness	c₁ = 1.2	c₂ = 1.4
Perimeter	p₁ = 4	p₂ = 7
Area	a_2w1 = 40	a_2w2 = 80
	a_4w1 = 100	a_4w2 = 180
	a_6w1 = 181	a_6w2 = 220
Elongation	e_2w1 = 2	e_2w2 = 3.5
	e_4w1 = 4.5	e_4w2 = 6.5
	e_6w1 = 6	e_6w2 = 9

The speeds of the detected vehicles in the STF are shown in Table 4. Further results showing the input STF, the binary STF, the Level 1 classification and the final Level 2 classification are shown in Fig. 9.

Fig. 9

Output examples from both datasets (First column): STF input. (Second column): Binary STF. (Third column): Level 1 classification. (Fourth column): Final classification.

Table 4

Speed of vehicles

Object No	Speed m/s	Speed km/hr
1	29.3	105.4
2	21.8	78.4
3	24.5	88.2
4	25.3	91

Table 5

Comparative analysis with other methods

METHOD	TDR	FDR
BDM [13]	75.5%	38%
ALR [17]	80.2%	41.7%
HFD [17]	91%	25%
PMD1	97.2%	12.4%
PMD2	97.8%	17%

5 Quantitative analysis

The performance of the proposed method is quantitatively analyzed using two metrics: True Detection Rate (TDR) and False Detection Rate (FDR). The true detection rate is calculated as the ratio of correctly identified vehicles to the total number of vehicles.

$TDR = \frac{\text{Correctly identified vehicles}}{\text{Total number of vehicles}}$ (17)

There may be some false detections in the frames as well. False detections are object blobs that are not vehicles but are identified as vehicles by the proposed method. The false detection rate is calculated as the ratio of such false detections to the total number of detections.

$FDR = \frac{\text{False detections}}{\text{Total number of detections}}$ (18)
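Both metrics are simple ratios of counts, as the following sketch shows. The counts in the usage example are hypothetical and are not taken from the experiments; the function name `detection_rates` is an assumption.

```python
def detection_rates(correct, total_vehicles, false_detections, total_detections):
    """Return (TDR, FDR) following Eqs. (17) and (18)."""
    tdr = correct / total_vehicles               # share of vehicles that were found
    fdr = false_detections / total_detections    # share of detections that are not vehicles
    return tdr, fdr

if __name__ == "__main__":
    # Hypothetical counts: 90 of 100 vehicles found, 10 of 100 detections are false.
    tdr, fdr = detection_rates(correct=90, total_vehicles=100,
                               false_detections=10, total_detections=100)
    print(f"TDR = {tdr:.1%}, FDR = {fdr:.1%}")
```

Note that the two rates use different denominators, so they need not sum to 1.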

The proposed system was also compared with other methods. The Background Detection Method (BDM) gave a low TDR and a high FDR, due to a large number of misdetections and because the background of the videos is not always constant. Active Learning based methods (ALR) show a better TDR, as they keep learning the changes in the background. The proposed method was tested on two different datasets (PMD1 and PMD2). The comparative analysis is shown in Table 5. It can be seen that the proposed method gives better results than the other methods compared. Results are also summarized as graphs in Figs. 10 and 11.

Fig. 10

Comparative analysis of TDR of different methods.

Fig. 11

Comparative analysis of FDR.

6 Conclusion

In this paper, we proposed an automatic vehicle detection method for video frames. The detection was carried out in two steps. In the first step, frames were extracted from the videos and preprocessed, and multiple spatial time frames were formed from the preprocessed frames. In the second step, binary morphological transforms of the preprocessed frames were calculated, and object oriented segmentation and classification were applied to detect and then classify vehicles from the resulting components. Various experimental and quantitative analyses were carried out to compare the proposed method with other methods proposed in the literature. The proposed method achieves a lower false detection rate and an improved TDR. Also, the method is based on simple object oriented features and is not complex in nature.

References

1. Mithun N.C., Howlader and Rahman S.M., Video based tracking of vehicles using multiple time-spatial images, Expert Systems with Applications (2016), 17–31.

2. Gunnarsson, Svensson, Danielsson and Bengtsson, Tracking vehicles using radar detections, In: Proceedings IEEE intelligent vehicles symposium, vol. 1. Istanbul, Turkey, 2007, pp: 296–302.

3. Tokoro, Kuroda, Kawakubo, Fujita and Fujinami, Electronically scanned millimeter-wave radar for pre-crash safety and adaptive cruise control system, In: Proceedings IEEE intelligent vehicles symposium, 2003, vol. 1. Columbus, pp: 304–309.

4. Premebida, Monterio, Nunes and Peixoto, A lidar and vision based approach for pedestrian and vehicle detection and tracking, In: Proceedings IEEE international conference intelligent transportation systems 1 (2007), 1044–1049.

5. Weigel, Lindner and Wanielik, Vehicle tracking with lane assignment by camera and lidar sensor fusion, In: Proceedings IEEE intelligent vehicles symposium 1 (2009), 513–520.

6. Mandellos N.A., Keramitsoglou and Kiranoudis C.T., A background subtraction algorithm for detecting and tracking vehicles, Expert Systems with Applications 38(3) (2011), 1619–1631.

7. Sivaraman and Trivedi M.M., Looking at vehicles on road: A survey of vision-based vehicle detection, tracking and behavior analysis, IEEE transactions Intelligent Transportation Systems 14(4) (2013), 1773–1795.

8. Brunelli, Template matching techniques in computer vision: Theory and practice (1st), 2009, Wiley & Sons, UK.

9. Boukerche, et al., Automated Vehicle Detection and Classification: Models, Methods, and Techniques, ACM Computing Surveys 50(5) (2017), 62:1–62:39.

10. Park, Lee and Park, Video based detection of street-parking violation, In: Proceedings international conference Image Process, Computer vision, Pattern Recognition, Las Vegas, 2007, pp: 152–156.

11. Zhang, Dong, Huang and Tan, EDA approach for model based localization and recognition of vehicles, In: Proceedings IEEE international conference, Computer vision and pattern recognition, Minneapolis, 2007, pp: 1–8.

12. Xie, Zhu, Wang, Xu and Zhang, Robust vehicles extraction in a video based intelligent transportation system, In: Proceedings IEEE international conference, Circuit systems, 2005, pp: 887–890.

13. Cao and Li, Vehicle object detection of video images based on gray-scale characteristics, In: Proceedings IEEE international conference. Educ. Technol. Computer Science, China, 2009, pp: 936–940.

14. Mithun N.C., Rashid N.U. and Rahman S.M.M., Detection and classification of vehicles from video using Multiple Time-spatial Images, IEEE transactions on intelligent transportation systems 13(3) (2012), 1215–1225.

15. Zhang, Huang C.K. and Tan, Real time moving object classification with automatic scene division, In: Proceedings IEEE international conference Image Process, 2007, pp: 149–152.

16. Anan, Zhaoxuan and Jintao, Video vehicle detection algorithm based on virtual-line group, In: Proceedings IEEE Asia Pacific conf., Circuit systems, Singapore, 2006, pp: 1148–1151.

17. Sivaraman and Trivedi M.M., A general Active-Learning framework for on-road vehicle recognition and tracking, IEEE transactions on intelligent transportation systems 11(2) (2010), 267–276.

18. Sun, Bebis and Miller, Monocular precrash vehicle detection: Features and classifiers, IEEE Trans Image Processing 15(7) (2006), 2019–2034.

19. Sivaraman and Trivedi M.M., Active learning based robust monocular vehicle detection for on-road safety systems, In: Proceedings IEEE international vehicles symposium (2009), pp: 399–404.

20. https://mahbubur.buet.ac.bd/resources/DatabaseEBVT.htm

21. Sivaraman and Trivedi M.M., A General Active Learning Framework for On-road Vehicle Recognition and Tracking, IEEE Transactions on Intelligent Transportation Systems, 2010. Available online: http://cvrr.ucsd.edu/LISA/vehicledetection.html

22. Singh K.K., Pal and Nigam M.J., Shadow detection and removal from remote sensing images using NDI and morphological operators, International journal of computer applications 42(10) (2012), 37–40.

23. Wadud M.A.-A., et al., A Dynamic Histogram Equalization for Image Contrast Enhancement, IEEE Transactions on Consumer Electronics 53(2) (2007), 593–600.

24. Rong, Li, Zhang and Sun, An Improved Canny Edge Detection Algorithm, In: Proceedings IEEE International Conference on Mechatronics and Automation, 2014, pp: 577–582.

25. Tansley, et al., Object-oriented classification of very high resolution airborne imagery for the extraction of hedgerows and field margin cover in agricultural areas, Applied Geography 29 (2009), 145–157.