Collaborative representation based classification for vehicle recognition in acoustic sensor networks

Abstract

Sparse Representation Classification (SRC) has led to state-of-the-art results in pattern classification tasks. However, SRC has high computational complexity, vehicle recognition is a typical small-sample-size problem, and the trained dictionary is under-complete; together these give rise to large representation errors and unstable recognition results. In this paper, we develop a new Collaborative Representation based vehicle recognition framework for acoustic sensor networks that reduces the time complexity of the training and testing phases and improves the classification accuracy in complex scenes. In the recognition, features are extracted from the acoustic signals of vehicles by Fast Fourier Transform to obtain linearly separable samples; we then encode a testing sample as a linear combination of all the training samples with regularized least squares and classify it into the class with the minimum representation error. As demonstrated by experimental results, the proposed method has two important characteristics: (1) it achieves superior performance on complex data sets; (2) it shows highly competitive recognition accuracy with low computational complexity and memory requirements, compared to the k-Nearest Neighbor, Support Vector Machine and Sparse Representation Classification algorithms.

Keywords

Collaborative representation, reconstructing error, vehicle recognition, acoustic sensor networks

1. Introduction

Recognizing moving targets with acoustic sensor networks is an important task with many envisioned applications, and vehicle classification in acoustic sensor networks is a typical pattern classification problem [1]. Eom [2] observed that many features of vehicles can be inferred from the sounds they generate, so it is feasible to classify the type of a moving vehicle from its acoustic signals in complex scenes. Duarte et al. [3] compiled a data set containing time series data of the target vehicle classes, including Assault Amphibian Vehicle (AAV) and Dragon Wagon (DW), recorded at a rate of 4960 Hz, and implemented vehicle type classification in a Wireless Sensor Networks (WSNs) environment. They also detailed the data collection procedure, the feature extraction and the pre-processing steps, and classified the type of moving vehicles in distributed networks using a maximum likelihood classifier based on multi-dimensional frequency spectrum features of the acoustic signals.

In recent years, various classification methods have been proposed in the pattern recognition field to adapt to different situations and improve the recognition rate, such as k-Nearest Neighbor (k-NN) [12, 15], Support Vector Machines (SVM) [5, 13, 14] and Sparse Representation Classification (SRC) [6, 7, 8].

Among these methods, sparse representation theory, which assumes that a data sample can be represented as a sparse linear combination of basic elements in a dictionary, is receiving more and more attention for solving regression and classification problems. Wright et al. [6] found that SRC performed well on face recognition and validated the robustness of SRC under noise and occlusion. Mei and Ling [7] proposed a robust visual tracking and vehicle classification approach using sparse representation and demonstrated its effectiveness on a vehicle tracking and classification task using outdoor infrared video sequences. Wang et al. [9] proposed a novel vehicle recognition framework in acoustic sensor networks via Sparse Representation (SR) with Mel-frequency Cepstral Coefficients (MFCC). Rahim et al. [16] proposed a homogeneous multi-classifier system for moving vehicle noise which improves the classification accuracy compared to a single classifier.

The existing Sparse Representation based (SR-based) methods for vehicle recognition mentioned above suffer from high computational complexity and memory requirements. Zhang et al. [10] analyzed the working mechanism of SRC and pointed out that it was the collaborative representation, not the $l_{1}$-norm sparsity, that made SRC powerful. Based on this analysis, they proposed a very simple and much more efficient classification scheme, called Collaborative Representation based Classification with Regularized Least Square (CRC_RLS) [10]. To address these issues, in this work we develop a new vehicle recognition method based on collaborative representation.

This paper focuses on the effect of different classification methods on vehicle recognition in acoustic sensor networks. The data set originates from sensor data collected during a real-world WSNs experiment carried out at Twenty-nine Palms, CA in November 2001 [3]. In the recognition, features are extracted from the acoustic signals of vehicles by Fast Fourier Transform (FFT) to obtain linearly separable samples, and we then code a testing sample as a linear combination of all the training samples with regularized least squares. The proposed method classifies the testing sample into the class with the minimum representation error.

The rest of the paper is organized as follows. In Section 2, we present classification models based on sparse representation and collaborative representation. In Section 3, we explain our vehicle recognition method based on collaborative representation. Experimental results are presented in Section 4. Section 5 concludes the paper.

2. Related works

In this section, we will briefly review SRC for vehicle recognition and then discuss the collaborative representation.

2.1 Sparse representation classification

Sparse representation is a signal processing method that represents the main information of a signal using as few non-zero coefficients as possible. To find the sparse representation of a signal, the following model can be established [6]: $y=Dx$, where $D$ is an $M\times N$ matrix containing the elements of the over-complete dictionary of the signal $y\in{\bf R}^{M}$. If the signal $y$ can be represented sparsely, there must be an $N\times 1$ sparse vector $x$. The goal is therefore to find an appropriate $x$ with the fewest non-zero coefficients, which can be cast as the optimization problem

$\displaystyle\mathop{\min}\limits_{x}\left\|x\right\|_{0}\quad\quad s.\;t.\;\;y=Dx$ (1)

where $\left\|x\right\|_{0}$ is the $l_{0}$ norm, which counts the non-zero coefficients of $x$. Eq. (1) is NP-hard owing to its combinatorial nature; however, Candes proved that the $l_{0}$ norm can be replaced by the $l_{1}$ norm, as in Eq. (2), to obtain an approximate solution if the solution of Eq. (1) is sparse enough.

$\displaystyle\mathop{\min}\limits_{x}\left\|x\right\|_{1}\quad\quad s.\;t.\;\;y=Dx$ (2)

Sparse representation has been widely used for tasks such as denoising and classification because of its remarkable ability to simplify signals. SRC has proven to be an excellent classification algorithm in image processing, face recognition [6] and vehicle recognition [7], but it suffers from high computational complexity and memory requirements.

The procedure of the SRC algorithm can be described as follows. Let $D_{i}=\left[d_{i,1},d_{i,2},\ldots,d_{i,n}\right]\in{\bf R}^{m\times n}$ be the sub-dictionary of the $i$-th class, whose columns are the feature vectors of the $n$ training samples belonging to that class. Suppose that the test sample $y\in{\bf R}^{m}$ belongs to the $i$-th class; then $y$ can be linearly represented by $D_{i}$. We therefore form $D=\left[{D_{1},D_{2},\ldots,D_{k}}\right]\in{\bf R}^{m\times kn}\left({m\leqslant kn}\right)$ as the dictionary in Eq. (2), which contains $k$ classes with $n$ samples each.

Algorithm 1. The SRC algorithm:

1. Normalize the columns of $D$ to have unit $l_{2}$-norm.

2. Code $y$ over $D$ via $l_{1}$-norm minimization

$\displaystyle\hat{x}=\arg\min_{x}\left\{{\left\|{y-Dx}\right\|_{2}^{2}+\lambda\left\|x\right\|_{1}}\right\}$

where $\lambda$ is a positive scalar.

3. Compute the residuals

$\displaystyle r_{i}=\left\|{y-D_{i}\hat{x}_{i}}\right\|_{2}$

where $\hat{x}_{i}$ is the coefficient vector associated with class $i$.

4. Output the predicted label value of $y$:

$\displaystyle\textit{label}\left(y\right)=\arg\min_{i}\left\{{r_{i}}\right\}$

With the dictionary $D$, also called the training matrix, we can recognize which class the test sample $y$ belongs to once we solve the problem in Eq. (2). To account for modeling error, Eq. (2) is relaxed to an optimization problem over $J\left(x\right)$ as follows.

$\displaystyle\mathop{\min}\limits_{x}\;J\left(x\right)=\frac{1}{2}\left\|{y-Dx}\right\|_{2}^{2}+\lambda\left\|x\right\|_{1}$ (3)

where the first term is the reconstruction residual and $\lambda$ is a trade-off weight. The detailed procedure of the sparse representation classification algorithm is summarized in Algorithm 1.
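The paper does not state which $l_{1}$ solver was used, so the following is only a minimal sketch of the SRC steps, with the iterative soft-thresholding algorithm (ISTA) swapped in as an assumed solver for Eq. (3). The toy three-dimensional samples, the value of `lam`, the iteration count and all function names are hypothetical and chosen for illustration only.

```python
import math

def unit(v):
    """Scale a vector to unit l2-norm (step 1 of the SRC procedure)."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def soft_threshold(v, t):
    """Shrink v toward zero by t; this is the proximal operator of t*|v|."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def ista(cols, y, lam, n_iter=500):
    """Approximately minimise 0.5*||y - Dx||_2^2 + lam*||x||_1 (Eq. (3)).
    cols holds the columns of D."""
    m, n = len(y), len(cols)
    # The squared Frobenius norm upper-bounds the largest eigenvalue of D^T D,
    # so 1/lipschitz is a safe (if conservative) step size.
    lipschitz = sum(a * a for c in cols for a in c)
    x = [0.0] * n
    for _ in range(n_iter):
        recon = [sum(x[j] * cols[j][i] for j in range(n)) for i in range(m)]
        resid = [y[i] - recon[i] for i in range(m)]
        grad = [-sum(cols[j][i] * resid[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] - grad[j] / lipschitz, lam / lipschitz)
             for j in range(n)]
    return x

def src_predict(cols, labels, y, lam):
    """Return the class whose columns reconstruct y with the smallest residual."""
    x = ista(cols, y, lam)
    residuals = {}
    for c in set(labels):
        recon = [sum(x[j] * cols[j][i] for j in range(len(cols)) if labels[j] == c)
                 for i in range(len(y))]
        residuals[c] = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, recon)))
    return min(residuals, key=residuals.get)

# Hypothetical toy data: two training samples per class, three features each.
train = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 0.1, 1.0], [0.1, 0.2, 0.9]]
labels = ["AAV", "AAV", "DW", "DW"]
cols = [unit(v) for v in train]
predicted = src_predict(cols, labels, [0.95, 0.15, 0.05], lam=0.01)
```

Each query requires its own iterative solve, which is the source of the computational burden discussed in the text.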

2.2 Collaborative Representation (CR)

From Algorithm 1, there are two key points in SRC. The first key point is that the coding vector of query sample $y$ is required to be sparse, and the second key point is that the coding of $y$ is performed collaboratively over the whole trained database $D$ instead of each subset $D_{i}$ [8].

In our discussion in Subsection 2.1, we assumed that there are enough training samples for each class so that the dictionary $D_{i}$ is over-complete. Unfortunately, vehicle recognition is a small-sample-size problem [8], and $D_{i}$ is under-complete in general. If we use $D_{i}$ to represent $y$, the representation error can be large, even when $y$ is from the $i$-th class. Consequently, the classification will be unstable, whether the error $e_{i}$, the sparsity $\left\|{x_{i}}\right\|_{p}$, or both are used for decision making.

One obvious solution to this problem is to use more samples of the $i$-th class to represent $y$. It is well known in vehicle recognition that acoustic signals of different classes share similarities, so some samples from other classes may help to represent a testing sample with label $i$. In SRC [6], this “lack-of-samples” problem is solved by taking the acoustic signals from all the other classes as possible samples of each class, which means that it codes the testing signal $y$ collaboratively over the dictionary of all samples $D=\left[{D_{1},D_{2},\ldots,D_{k}}\right]$ under the $l_{1}$-norm sparsity constraint [4].

Figure 1.

The framework of vehicle recognition.

After coding $y$ collaboratively over all classes, the classifier evaluates each class individually. For simplicity of analysis, let us remove the $l_{1}$-norm sparsity term in Eq. (3); the representation then becomes a least squares problem: $\hat{x}=\arg\min_{x}\left\{{\left\|{y-Dx}\right\|_{2}^{2}}\right\}$. The associated representation $\hat{y}=\sum_{i=1}^{k}{D_{i}\hat{x}_{i}}$ is the perpendicular projection of $y$ onto the space spanned by $D$. In SRC, the reconstruction error of each class $r_{i}=\left\|{y-D_{i}\hat{x}_{i}}\right\|_{2}^{2}$ is used for classification. It can be readily derived that $r_{i}=\left\|{y-D_{i}\hat{x}_{i}}\right\|_{2}^{2}=\left\|{y-\hat{y}}\right\|_{2}^{2}+\left\|{\hat{y}-D_{i}\hat{x}_{i}}\right\|_{2}^{2}$. Obviously, it is the quantity $r_{i}^{\ast}=\left\|{\hat{y}-D_{i}\hat{x}_{i}}\right\|_{2}^{2}$ that matters for classification, because $\left\|{y-\hat{y}}\right\|_{2}^{2}$ is constant for all classes [8].
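As a brief sketch of why this decomposition holds, assuming $\hat{x}$ is the unregularized least squares solution above: the normal equations state that the residual is orthogonal to every column of $D$,

$\displaystyle D^{T}\left(y-D\hat{x}\right)=D^{T}\left(y-\hat{y}\right)=0$

Both $\hat{y}$ and $D_{i}\hat{x}_{i}$ lie in the space spanned by $D$, so $\hat{y}-D_{i}\hat{x}_{i}=Dz$ for some vector $z$ (a symbol introduced here only for illustration), and the cross term vanishes:

$\displaystyle\left(y-\hat{y}\right)^{T}\left(\hat{y}-D_{i}\hat{x}_{i}\right)=\left(D^{T}\left(y-\hat{y}\right)\right)^{T}z=0$

Expanding $\left\|{\left(y-\hat{y}\right)+\left(\hat{y}-D_{i}\hat{x}_{i}\right)}\right\|_{2}^{2}$ therefore leaves only the two squared norms.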

It was claimed in [8] that when we judge whether $y$ belongs to class $i$ by using CR, a “double checking” makes the classification more effective and robust. In [6], the $l_{1}$-norm sparsity constraint is imposed on $x$ to make the solution stable. However, it is not necessary to use the strong $l_{1}$-norm to this end [8]; the $l_{2}$-norm can be used to regularize the solution and obtain competitive performance with lower computational complexity. In vehicle recognition, we can thus obtain similar classification results at significantly lower complexity.

3. Proposed scheme

3.1 Vehicle recognition framework

We have developed a new vehicle recognition method based on collaborative representation which has low computational complexity and memory requirements. Figure 1 shows the block diagram of the proposed method. After the raw acoustic signals are received from the sensors distributed in the monitoring zone, pre-processing is essential in order to pick out the useful events in the presence of noise and other uncertain conditions. FFT features are then extracted from the original signals, and the proposed method recognizes the vehicles from these compressed FFT features.

The collaborative representation then yields a residual for each class, and the recognition result is the class with the minimum residual.

For the test samples, multiple observations are captured by multiple heterogeneous sensors. To improve classification accuracy, we can exploit the correlation between different sources. A voting scheme does not exploit the relationship between different sensors, so here we use a joint model that fuses the different sources in order to make a joint classification decision. Generally, an over-complete dictionary is learned to represent the test signals, which we consider to contain the whole information of the original signals. Therefore, our joint classification decision can be easily made with a learned over-complete dictionary.

3.2 Feature extraction

Acoustic signals in the time domain change quickly and appear unsteady, so many feature extraction methods operate in the frequency domain. An appropriate feature extraction method is important for classification. Herein, we use the classical FFT algorithm to extract the acoustic features.
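The paper does not list its FFT settings, so the following is only a minimal sketch of FFT-based feature extraction. The frame length (64 samples), the number of retained bins (`n_bins`), the use of magnitudes and the unit $l_{2}$ scaling are all assumptions made for illustration, and the function names are hypothetical.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_features(frame, n_bins):
    """Magnitude spectrum of one frame, truncated to the first n_bins bins
    and scaled to unit l2-norm."""
    spectrum = fft(frame)
    mags = [abs(c) for c in spectrum[:n_bins]]
    norm = math.sqrt(sum(m * m for m in mags)) or 1.0
    return [m / norm for m in mags]

# Synthetic frame: a pure tone completing 5 cycles within 64 samples.
frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
features = fft_features(frame, n_bins=32)
peak_bin = max(range(len(features)), key=features.__getitem__)
```

For a real-valued frame the second half of the spectrum mirrors the first, which is why only the lower bins are kept in this sketch.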

3.3 Collaborative Representation based Classification (CRC)

In this sub-section, we introduce the collaborative representation algorithm to solve the vehicle recognition problem. Zhang et al. [10] pointed out that it was the collaborative representation, not the $l_{1}$-norm sparsity constraint, that truly improved face recognition performance. Although the $l_{1}$-norm is used in SRC to regularize the solution, the $l_{1}$-norm constraint is not necessary.

According to [6], the $l_{2}$-norm can be used to regularize the solution and obtain competitive performance with lower computational complexity. As a result, a collaborative representation based classification algorithm with regularized least squares is presented for vehicle recognition.

In order to collaboratively represent the query sample using $D$ with low computational burden [8], the objective function of the CRC with regularized least squares is as follows:

$\displaystyle\hat{x}=\arg\min_{x}\left\{{\left\|{y-Dx}\right\|_{2}^{2}+\lambda\left\|x\right\|_{2}^{2}}\right\}$ (4)

where $\lambda$ is the regularization parameter.

It can be seen in Eq. (4) that the collaborative representation uses a least squares ($l_{2}$-norm) constraint instead of the $l_{1}$-norm sparsity constraint. The regularization term plays two roles: first, it makes the least squares solution stable; second, it introduces a certain amount of sparsity to the solution $\hat{x}$, although this sparsity is much weaker than that induced by the $l_{1}$-norm. The solution of the collaborative representation with regularized least squares can be derived analytically as

$\displaystyle\hat{x}=\left({D^{T}D+\lambda I}\right)^{-1}D^{T}y$ (5)

Let $P=\left({D^{T}D+\lambda I}\right)^{-1}D^{T}$. Clearly, $P$ is independent of $y$, so it can be pre-calculated as a projection matrix. Once a query sample $y$ arrives, we simply compute the product $Py$ to obtain $\hat{x}$.
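For completeness, Eq. (5) follows from Eq. (4) in one step; this is the standard ridge regression argument rather than anything specific to this paper. Setting the gradient of the objective in Eq. (4) with respect to $x$ to zero gives

$\displaystyle-2D^{T}\left(y-Dx\right)+2\lambda x=0\quad\Longrightarrow\quad\left({D^{T}D+\lambda I}\right)x=D^{T}y$

Since $D^{T}D$ is positive semi-definite, $D^{T}D+\lambda I$ is positive definite for any $\lambda>0$ and hence invertible, even when the columns of $D$ are linearly dependent, which yields Eq. (5).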

3.4 Summary of algorithm

The proposed method for vehicle recognition based on collaborative representation is summarized in the following steps:

Summary:

1. Use the Constant False Alarm Rate (CFAR) [3] detection algorithm to extract the useful events from the raw time series data.

2. Use FFT to extract multi-dimensional features of the raw time series data.

3. Shuffle the FFT features randomly, choose training sets $D$ of different sizes, and normalize the columns of $D$ to unit $l_{2}$-norm.

4. Choose testing sets $y$ randomly in different sizes.

5. Code the testing sets $y$ over the training sets $D$ by

$\displaystyle\hat{x}=\left({D^{T}D+\lambda I}\right)^{-1}D^{T}y$

where $\lambda$ is a regularization parameter set empirically.

6. Compute the regularized residuals of the testing sets $y$

$\displaystyle r_{i}=\frac{\left\|{y-D_{i}\hat{x}_{i}}\right\|_{2}}{\left\|{\hat{x}_{i}}\right\|_{2}}$

where $\hat{x}_{i}$ is the coefficient vector associated with class $i$.

7. Output the predicted label value:

$\displaystyle\textit{label}\left(y\right)=\arg\min_{i}\left\{{r_{i}}\right\}$
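Steps 3 and 5 to 7 above can be sketched as follows. This is an illustrative sketch, not the authors' MATLAB implementation: the toy three-dimensional samples, the value `lam = 0.01` (chosen for this toy scale, unlike the value used in the experiments) and all function names are assumptions, and a small Gauss-Jordan routine stands in for a linear algebra library so that the block is self-contained.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    """Scale a vector to unit l2-norm (step 3)."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def crc_fit(train, lam):
    """train: list of feature vectors (the columns of D).
    Returns the normalized columns and P = (D^T D + lam*I)^{-1} D^T."""
    cols = [unit(v) for v in train]
    n, m = len(cols), len(cols[0])
    gram = [[dot(cols[j], cols[k]) + (lam if j == k else 0.0) for k in range(n)]
            for j in range(n)]
    inv = invert(gram)
    P = [[sum(inv[j][k] * cols[k][i] for k in range(n)) for i in range(m)]
         for j in range(n)]
    return cols, P

def crc_predict(cols, P, labels, y):
    """Code y as x_hat = P y (step 5), then pick the class with the smallest
    regularized residual ||y - D_i x_i||_2 / ||x_i||_2 (steps 6 and 7)."""
    x = [dot(row, y) for row in P]
    best_label, best_r = None, math.inf
    for c in sorted(set(labels)):
        idx = [j for j, l in enumerate(labels) if l == c]
        recon = [sum(x[j] * cols[j][i] for j in idx) for i in range(len(y))]
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, recon)))
        coef = math.sqrt(sum(x[j] ** 2 for j in idx))
        r = err / coef if coef > 0 else math.inf
        if r < best_r:
            best_label, best_r = c, r
    return best_label

# Hypothetical toy data: two training samples per class, three features each.
train = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 0.1, 1.0], [0.1, 0.2, 0.9]]
labels = ["AAV", "AAV", "DW", "DW"]
lam = 0.01
cols, P = crc_fit(train, lam)  # computed once, reused for every query
predicted = crc_predict(cols, P, labels, [0.95, 0.15, 0.05])
```

Because `P` depends only on the training set, each query costs one matrix-vector product plus the per-class residuals, in contrast to the iterative $l_{1}$ solve required by SRC.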

4. Experiment results

We evaluate the performance of the proposed method on a data set extracted from the sensor data collected during a real-world Wireless Distributed Sensor Networks (WDSN) experiment carried out at Twenty-nine Palms, CA in November 2001 [3]. This data set is available at http://www.ecs.umass.edu/~mduarte/Software.html. It contains the acoustic and infrared information of two kinds of military vehicles, AAV and DW. The raw time series data were recorded by twenty-three sensor nodes distributed around three pre-set running routes, as shown in Fig. 2. The sensor field, covering about $900\times 300$ m$^{2}$, consists of an east-west road, a south-north road and an intersection area.

Figure 2.

Sensor field layout.

We implemented the algorithm proposed in Section 3 in MATLAB 2013a and ran it on a PC with an Intel Core (TM) i5-2401 M 2.30 GHz CPU and 4 GB of RAM.

Table 1

The results of classification through different methods

Training sizes	Recognition rates (%)
	K-NN	SVM	SRC	CRC
20	71.06	81.16	84.10	85.77
30	73.57	81.78	85.55	88.06
40	74.69	81.45	85.27	89.02
50	75.35	82.23	84.67	89.38
60	76.33	82.66	84.21	89.71
70	76.37	83.55	84.80	89.92

Table 2

Computational cost of classification through different methods

Training sizes	Computational cost (s)
	K-NN	SVM	SRC	CRC
20	0.014	0.019	5.593	0.050
30	0.018	0.026	11.756	0.064
40	0.016	0.032	19.022	0.072
50	0.014	0.038	26.337	0.072
60	0.012	0.042	31.596	0.066
70	0.009	0.043	31.803	0.054

Figure 3.

Recognition accuracy by different methods.

In this experiment, we consider only the acoustic data recorded at a rate of 4960 Hz. Firstly, we choose the data collected from the third to eleventh runs (AAV3 $\sim$ AAV11 and DW3 $\sim$ DW11) as the data source for assessing the classification methods. Secondly, to extract the useful events from the raw time series data, we use the CFAR detection algorithm [3], which marks times with high energy values, and pre-process them. Thirdly, the pre-processed events in the time series are used to extract multi-dimensional features by FFT.

Figure 4.

Time cost of vehicle recognition by CRC, SVM and K-NN.

After feature extraction, the features are passed to the proposed classification method to solve the vehicle recognition problem. There are 90 sample sets in total for each vehicle. In particular, the files used for testing are different from the files used for training: we choose feature vectors randomly as training samples and use the rest as testing samples.

K-NN, SVM and SRC serve as reference methods for the proposed method. For these methods, $k$ in K-NN is set to 5, the optimization problem of SVM is solved with the LIBSVM software package [11], and the sparsity level in SRC [7] is 0.7; we set $\lambda$ to 512 in the proposed method.

Figure 5.

Computational cost of vehicle recognition by SRC.

To obtain more reliable vehicle recognition results, we repeated the test 100 times for each method under the same conditions. Given this number of repetitions, we are confident that the results are representative. First, we study the recognition accuracy of CRC with different sizes of training sets. Table 1 and Fig. 3 illustrate the vehicle recognition rates for various training sizes with the different classification methods: SRC, CRC, SVM and K-NN.

Table 1 and Fig. 3 show that the proposed method leads to improved vehicle recognition performance in the acoustic field. Because of the low complexity of the proposed method, there is a big gap between the classification results of CRC and SRC. The recognition rates of the proposed method are better than those of SRC for every training size. Moreover, the proposed method outperforms SRC in terms of CPU running time.

Table 2 and Figs 4 and 5 show the computational cost of the different methods for vehicle recognition. From Tables 1 and 2 we find that, although the recognition rates differ considerably across training sizes, the proposed method improves the recognition rate by about 1.7 to 5.5 percentage points over SRC and runs far faster (0.05–0.07 s versus 5.6–31.8 s). Generally speaking, the recognition performance of the proposed method for acoustic signals is not outstanding; its greatest strength over SRC is the shorter running time.

It can be clearly seen from Fig. 5 that the computational cost of the SRC algorithm grows rapidly as the training size increases. In contrast, Fig. 4 shows that the proposed method takes very little time for recognition regardless of the training size. Thanks to the low complexity of the proposed method, a practical and effective real-time system for vehicle recognition can be built more easily.

Judging from the computational cost and the improved vehicle recognition performance, we conclude that the proposed method performs better than the other three methods. In practical applications, it achieves recognition quality comparable to some traditional methods.

5. Conclusion

In this work, we have developed a vehicle recognition method based on collaborative representation that solves the vehicle recognition problem in acoustic sensor networks. Experimental results demonstrate that the proposed method achieves higher recognition rates with different sizes of testing samples than the other recognition methods. Furthermore, the proposed method has lower computational complexity and computational cost. In the future, we will further study kernel collaborative representation classification, which maps the features into a higher-dimensional space to solve non-linear classification problems in the acoustic field. We will also develop feature selection methods to enhance the efficiency of vehicle recognition in acoustic sensor networks.

Acknowledgments

This research was supported by Zhejiang Provincial Natural Science Foundation of China under Grant No. LY14F030007, and National Natural Science Foundation of China (NSFC) under Grant No. 61301027, 61771299.

References

1. Liu, Wang, Liu and Li, Does wireless sensor network scale? A measurement study on GreenOrbs, IEEE Transactions on Parallel and Distributed Systems 52(10) (Oct. 2013), 1983–1993.

2. Eom K.B., Analysis of acoustic signatures from moving vehicles using time-varying autoregressive models, Multidimensional Systems and Signal Processing 52(5) (Oct. 1999), 357–37.

3. Duarte M.F. and Hen Hu, Vehicle classification in distributed sensor networks, Journal of Parallel and Distributed Computing 64(7) (Oct. 2004), 826–838.

4. Wright, Mairal, Sapiro, Huang and Yan, Sparse representation for computer vision and pattern recognition, Proc. of IEEE, Special Issue on Applications of Compressive Sensing & Sparse Representation 98(6) (2010), 1031–1044.

5. Zhu Y.W., Seismic facies classification based on the improved transductive support vector machine, Journal of Computational Methods in Sciences and Engineering 15(4) (2015), 677–684.

6. Wright, Yang A.Y. and Ganesh, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 31(4) (Mar. 2009), 210–227.

7. Mei and Ling H.B., Robust visual tracking and vehicle classification via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 33(11) (Nov. 2011), 2259–2272.

8. Cui, Class-dependent sparse representation classifier for robust hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing 53(5) (May 2015), 2683–2695.

9. Wang and Feng, Vehicle recognition in acoustic sensor networks via sparse representation, in: Multimedia and Expo Workshops (ICMEW), 2014 IEEE International Conference on, 2014, pp. 1–4.

10. Zhang, Yang and Feng, Sparse Representation or Collaborative Representation: Which Helps Face Recognition? in: Proc. IEEE Int’l Conf. Computer Vision, 2011, pp. 471–478.

11. Hsu C.W., Chang C.C. and Lin C.J., A practical guide to support vector classification, Technical report, Department of Computer Science, National Taiwan University, 2003.

12. Komai, Sasaki, Hara and Nishio, k Nearest neighbor search for location-dependent sensor data in manets, IEEE Access 3 (2015), 942–954.

13. Baser and Apaydin, Hybrid fuzzy support vector regression analysis, Journal of Intelligent & Fuzzy Systems 28(5) (2015), 2037–2045.

14. Aida-zade, Xocayev and Rustamov, Speech recognition using Support Vector Machines, in: 2016 IEEE 10th International Conference on Application of Information and Communication Technologies (AICT), Baku, 2016, pp. 1–4.

15. Zhang and Song, Predicting the number of nearest neighbors for the k-NN classification algorithm, Intelligent Data Analysis 18(3) (2014), 449–464.

16. Rahim N.A., Paulraj M.P., Adom A.H. et al., Homogeneous multi-classifier system for moving vehicles noise classification based on multilayer perceptron, Journal of Intelligent & Fuzzy Systems 26(1) (2014), 421–430.