Introduction
Wireless sensor networks (WSNs) have gained much attention recently due to advances in micro-electro-mechanical systems (MEMS) technology, which has facilitated the development of smart sensors. 1 WSNs consist of a large number of sensor nodes whose positions need not be pre-determined. 2 The protocols and algorithms of WSNs have self-organization capabilities. At the same time, sensor nodes have the ability to carry out simple computations locally and transmit only the required data to the nodes responsible for fusion, instead of sending raw data, which greatly reduces the power consumption of the network. Owing to their powerful self-organization and fault-tolerance capabilities, a wide variety of applications are being envisioned for sensor networks, including habitat monitoring, health monitoring, battlefield surveillance, and target tracking.
Detection and classification of objects moving through the sensor field is an important task in many field applications. 3 For vehicle recognition, early work focused on military vehicle detection for battlefield surveillance. Nowadays, vehicle detection has become an important task for traffic monitoring and management. Recognition of different vehicle types, such as cars, motorbikes, buses, or trucks, provides detailed traffic statistics and useful information about road utilization. Since many characteristics of a vehicle can be inferred from the sound it generates, 4 it is feasible to recognize the type of a moving vehicle in acoustic sensor networks.4–7 However, how to improve the robustness of vehicle recognition algorithms in sensor networks operating in complex and noisy environments remains a critical and challenging issue in practice.
The problem of vehicle classification in acoustic sensor networks is essentially a pattern recognition problem. In recent years, various classification methods have been proposed in this field to adapt to different situations and improve the recognition rate, such as maximum likelihood (ML), k-nearest neighbor (k-NN), support vector machine (SVM),2,8 and decision tree (DT) 9 methods. Wright et al. 10 found that sparse representation–based classification (SRC) performed well on face recognition, especially in noisy and cluttered environments. Mei and Ling 11 proposed a robust visual tracking and vehicle classification approach using sparse representation and demonstrated its effectiveness on a vehicle tracking and classification task using outdoor infrared (IR) video sequences. Kernel methods are effective in machine learning for solving real-world problems with nonlinear data structures. Gao et al. 12 applied kernel sparse representation–based classification (KSRC), which maps the data into a high-dimensional space, to image classification and face recognition and achieved state-of-the-art performance. However, the success of kernel methods often depends on the choice of an appropriate kernel and features. Specific kernel functions have been proposed for particular applications, such as text document categorization 13 and computational biology. 14 So, it is critical to select the kernel function that fits the samples. Instead of selecting one specific kernel function, multiple kernel learning methods, which learn the kernel from the samples as a linear combination of base kernels, are more effective, especially for complex scenarios with different sources or modalities.
The sparse representation classification models above assume a data sample can be represented as a sparse combination of several atoms from pre-specified or non-adaptive dictionaries, which cannot represent a given class of signals efficiently. To address this issue, recent research has focused on designing dictionaries using learning methods.15,16 Engan et al. 17 introduced the method of optimal directions (MOD) to find a dictionary and a sparse matrix that minimize the representation error. This method suffers from relatively high computational complexity. To train a generic dictionary for sparse signal representation, Aharon et al. 18 developed the K-singular value decomposition (K-SVD) algorithm, which updates the dictionary atom by atom in a simple process rather than using a matrix inversion.
In this article, we combine a dictionary learning method with sparse representation to solve the multi-sensor vehicle classification problem, focusing on recognizing different types of vehicles. The dataset contains acoustic recordings observed at each individual sensor in a real-world experiment carried out at the city of Twenty-Nine Palms, CA, in November 2001. 2 The features of the vehicles are extracted using Mel-frequency cepstral coefficients (MFCC), which have proven to be efficient in acoustic signal recognition. Chitra and Sumalatha 19 used MFCC to extract the sound features of emergency vehicles and performed the classification and identification task using SVM. This approach achieved increased accuracy and reduced time delay for emergency response. Matthias and Rainer 20 presented a mobile sound classification system that extracted 13 MFCC from the data collected by a microphone and classified the sound using neural networks to recognize sounds of emergency vehicles in road traffic.
In this work, we study whether this set of features can effectively be applied to vehicle recognition in transportation applications. Since the acoustic signals gathered in a real-world setting are inherently complex, it is difficult to choose the best kernel function. It is better to have a set of kernel functions and let the algorithm select the best subset of kernels. 21 Therefore, we propose multiple kernel sparse representation–based classification (MKSRC), which combines several candidate kernels into one kernel function and optimizes the multiple kernel weights while training the KSRC to adapt to different cases. Meanwhile, in contrast to previous approaches to sparse representation in which the dictionary is fixed by the training samples,10,12 in this article, we update the dictionary by K-SVD to adapt to complex scenes.
The major contributions of this article lie in the following aspects. First, we have developed a new classification algorithm based on MKSRC and successfully applied it to vehicle recognition from acoustic sensor networks. Second, we have developed a new and effective multi-kernel weight update scheme based on gradient descent, enabling our multi-kernel representation to fit different input source characteristics. This source-adaptive representation scheme has demonstrated its unique advantages in our experiments. Third, we have proposed a K-SVD method for dictionary update instead of using a fixed dictionary obtained from the training samples as in existing methods. This new method is able to handle classification tasks within different and complex environments.
The remainder of this article is organized as follows. In section “Framework of vehicle recognition,” we present the framework of vehicle recognition. Section “Sparse representation models” explains the sparsity models and MKSRC. Our dictionary learning method is presented in section “Dictionary learning methods.” Experimental results and performance comparisons are provided in section “Experimental results.” Finally, conclusions and discussions on future work are provided in section “Conclusion.”
Framework of vehicle recognition
Figure 1 shows the major components of our vehicle recognition framework using MKSRC.

The proposed recognition framework of MKSRC.
Sparse representation models
SRC
Sparse representation is a signal processing method that represents the main information of a signal using as few non-zero coefficients as possible. 10 For object recognition, our goal is to classify the test sample using labeled training data. Here, our central approach is to represent the test sample as a sparse linear combination of the training samples.
Suppose we have n training samples from all k classes, arranged as the columns of the matrix D = [d_1, d_2, …, d_n] ∈ R^(m×n). A test sample y ∈ R^m can then be written as a linear combination of the training samples

y = Dx (1)

where x ∈ R^n is the coefficient vector. Ideally, the non-zero entries of x correspond only to training samples from the same class as y, so we seek the sparsest solution

min_x ||x||_0 subject to y = Dx (2)

and, since the l0 problem is NP-hard, it is commonly relaxed to the convex l1 problem

min_x ||x||_1 subject to y = Dx (3)

In fact, since real-world data are often noisy, it may not be possible to express the test sample exactly as a sparse combination of the training samples. 10 The model in equation (1) can be rewritten as

y = Dx + z (4)

where z ∈ R^m is a noise term with bounded energy ||z||_2 ≤ ε, and generally, model (4) is transformed to the unconstrained optimization problem about x

min_x ||y − Dx||_2^2 + λ||x||_1 (5)

where the first part is the residual and the parameter λ > 0 balances the reconstruction error against the sparsity of x. After solving for the sparse code x̂, the test sample y is assigned to the class with the minimum class-wise residual

r_i(y) = ||y − Dδ_i(x̂)||_2, i = 1, …, k (6)

where δ_i(x̂) retains only the coefficients of x̂ associated with class i and sets the others to zero.
Sparse representation–based classification algorithm.
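The classification rule above can be sketched in a few lines. The following is a minimal illustration on synthetic data: the greedy orthogonal matching pursuit (OMP) solver stands in for the l1 solver, and the two-class subspace data, sample counts, and dimensions are assumptions chosen for the example, not the article's setup.

```python
# Minimal SRC sketch: sparse-code the test sample over the training
# dictionary, then pick the class with the smallest class-wise residual.
import numpy as np

def omp(D, y, sparsity):
    """Greedy orthogonal matching pursuit: select `sparsity` atoms of D for y."""
    residual, support = y.copy(), []
    x = np.zeros(D.shape[1])
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def src_classify(D, labels, y, sparsity=3):
    """Assign y to the class whose atoms yield the smallest residual."""
    x = omp(D, y, sparsity)
    residuals = {}
    for c in set(labels):
        mask = np.array(labels) == c      # delta_c(x): keep class-c coefficients
        residuals[c] = np.linalg.norm(y - D[:, mask] @ x[mask])
    return min(residuals, key=residuals.get)

rng = np.random.default_rng(0)
# Two classes living in different 2-D subspaces of R^20 (synthetic stand-in
# for per-class training features).
basis0, basis1 = rng.standard_normal((20, 2)), rng.standard_normal((20, 2))
D = np.column_stack([basis0 @ rng.standard_normal((2, 10)),
                     basis1 @ rng.standard_normal((2, 10))])
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
labels = [0] * 10 + [1] * 10
y = basis1 @ np.array([0.7, -0.2])        # test sample drawn from class 1
print(src_classify(D, labels, y))
```

Because the test sample lies in the class-1 subspace, the class-1 residual collapses to near zero while the class-0 residual stays large, which is the mechanism the classifier exploits.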
The kernel method
To ensure an ideal classification performance, the coefficient vector x in equation (5) should separate the classes well; however, when the samples are not linearly separable in the input space, a linear representation over the training samples is insufficient. Kernel methods address this by implicitly mapping the samples into a high-dimensional feature space in which linear techniques become applicable.
A kernel is called a Mercer kernel if it satisfies Mercer's condition: it is continuous, symmetric, and positive semi-definite. 27
Suppose φ: R^m → F is a (possibly implicit) mapping from the input space into a high-dimensional feature space F. The associated kernel function is κ(u, v) = <φ(u), φ(v)>, where <·,·> denotes the dot product. It transforms the dot product calculation in the high-dimensional feature space into a kernel function evaluation in the input space, avoiding the curse of dimensionality. Then, we can focus on the kernel function instead of the explicit mapping φ.
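The identity κ(u, v) = <φ(u), φ(v)> can be verified concretely for a kernel whose feature map is small enough to write down. The homogeneous degree-2 polynomial kernel in R^2 used below is an illustrative choice, not one taken from the article.

```python
# Kernel trick sanity check: for k(u, v) = (u . v)^2 in R^2, the explicit
# feature map is phi(u) = (u1^2, u2^2, sqrt(2) u1 u2), so the kernel value
# computed in the input space must equal the dot product in feature space.
import numpy as np

def phi(u):
    return np.array([u[0] ** 2, u[1] ** 2, np.sqrt(2) * u[0] * u[1]])

def kernel(u, v):
    return float(np.dot(u, v) ** 2)

u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
lhs = kernel(u, v)                    # computed in the 2-D input space
rhs = float(np.dot(phi(u), phi(v)))  # computed in the 3-D feature space
print(lhs, rhs)                       # the two values coincide
```

For higher-degree kernels or the Gaussian RBF kernel the feature space grows combinatorially or becomes infinite-dimensional, which is precisely why working through κ alone is attractive.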
KSRC
Note that kernel methods are effective for linearly non-separable problems. In this section, we present a kernel-based sparse representation classifier, called KSRC. We can recognize vehicles by solving equation (5) in SRC, but in kernel methods, we should construct a Mercer kernel. To make the training samples separable, we assume that there exists a feature mapping function φ(·) under which the mapped samples become linearly separable in the feature space.
In SRC, the test sample can be sparsely represented by the training samples in the input space; in KSRC, the representation is instead formed in the feature space, φ(y) = Φ(D)x, where Φ(D) = [φ(d_1), φ(d_2), …, φ(d_n)]. Likewise, the optimization problem in equation (5) can be mapped into the high-dimensional space as

min_x ||φ(y) − Φ(D)x||_2^2 + λ||x||_1

However, since the mapping function φ is usually unknown (and the feature space may be infinite-dimensional), the residual is expanded with the kernel trick as

||φ(y) − Φ(D)x||_2^2 = κ(y, y) − 2k(y)^T x + x^T Kx

where K ∈ R^(n×n) is the kernel Gram matrix with K_ij = κ(d_i, d_j), and k(y) ∈ R^n with [k(y)]_i = κ(d_i, y), so that the objective depends on the samples only through kernel evaluations.
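The feature-space residual is thus computable from kernel values alone. A minimal numeric check of this expansion follows; it uses the linear kernel, whose feature map is the identity, so the kernel-only expression can be compared against the residual computed directly. All shapes and data are illustrative.

```python
# Verify ||phi(y) - Phi(D) x||^2 = kappa(y,y) - 2 k(y)^T x + x^T K x
# for the linear kernel (phi = identity), where the right-hand side uses
# only Gram-matrix entries.
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((8, 5))   # 5 training atoms in R^8
y = rng.standard_normal(8)
x = rng.standard_normal(5)        # an arbitrary coefficient vector

K = D.T @ D                       # Gram matrix K_ij = kappa(d_i, d_j)
k = D.T @ y                       # k(y)_i = kappa(d_i, y)
expanded = float(y @ y - 2 * k @ x + x @ K @ x)   # kernel-only expression
direct = float(np.linalg.norm(y - D @ x) ** 2)    # direct feature-space error
print(round(expanded, 6), round(direct, 6))
```

With a nonlinear Mercer kernel the same expansion holds verbatim; only the way K and k(y) are filled in changes.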
Multiple kernel sparse representation–based classification algorithm.
MKSRC
In this article, to determine the best kernel for our classification task, we propose MKSRC, which assumes that the kernel is a convex combination of M base kernels

κ(u, v) = Σ_{m=1}^{M} β_m κ_m(u, v), β_m ≥ 0, Σ_{m=1}^{M} β_m = 1

where β = (β_1, …, β_M) is the vector of kernel weights and each κ_m is a Mercer kernel (a convex combination of Mercer kernels is again a Mercer kernel). Substituting the combined Gram matrix K(β) = Σ_m β_m K_m and vector k(y; β) = Σ_m β_m k_m(y) into the kernelized objective yields a cost function J(β) of the kernel weights. Here, according to Lemma 2 in Chapelle et al., 28 it is possible to differentiate J with respect to β. Using the gradient descent method, we can update the kernel weights by

β ← β − η ∇_β J(β)

followed by projection back onto the simplex, where η is the learning rate. Under the condition of convergence, the learned weights and the corresponding sparse code are used to compute the class-wise residuals, and the test sample is assigned to the class with the minimum residual as in SRC.
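The weight update can be sketched as follows. This is a simplified illustration, not the article's full algorithm: the sparse code x is held fixed (in MKSRC it is re-estimated alongside β), the two RBF base kernels, step size, and iteration count are assumptions, and the simplex projection is the standard Euclidean one.

```python
# Projected gradient descent on the kernel weights beta, minimizing the
# feature-space residual under the combined kernel K(beta) = sum_m beta_m K_m.
import numpy as np

def project_simplex(b):
    """Euclidean projection onto the probability simplex {b >= 0, sum b = 1}."""
    u = np.sort(b)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(b)) + 1) > 0)[0].max()
    return np.maximum(b - css[rho] / (rho + 1.0), 0.0)

def feature_space_residual(beta, Ks, ks, kyy, x):
    """||phi(y) - Phi(D) x||^2 under the combined kernel."""
    K = sum(b * Km for b, Km in zip(beta, Ks))
    kv = sum(b * km for b, km in zip(beta, ks))
    c = sum(b * cm for b, cm in zip(beta, kyy))
    return float(c - 2.0 * kv @ x + x @ K @ x)

def rbf_gram(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(2)
D = rng.standard_normal((6, 8))          # 8 training atoms in R^6 (synthetic)
y = rng.standard_normal(6)
x = 0.3 * rng.standard_normal(8)         # sparse code, held fixed in this sketch

sigmas = [0.5, 2.0]                      # two base RBF kernels (assumed widths)
Ks = [rbf_gram(D.T, D.T, s) for s in sigmas]
ks = [rbf_gram(D.T, y[None, :], s)[:, 0] for s in sigmas]
kyy = [1.0, 1.0]                         # RBF kernel: kappa(y, y) = 1

beta0 = np.array([0.5, 0.5])
beta = beta0.copy()
for _ in range(100):
    # With x fixed, dJ/dbeta_m = kappa_m(y,y) - 2 k_m(y)^T x + x^T K_m x.
    grad = np.array([cm - 2.0 * km @ x + x @ Km @ x
                     for cm, km, Km in zip(kyy, ks, Ks)])
    beta = project_simplex(beta - 0.05 * grad)
print(beta)
```

The projection step keeps the weights a valid convex combination, so the combined kernel remains a Mercer kernel throughout the descent.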
Dictionary learning methods
The key component in sparse representation is to construct an over-complete dictionary, and it is crucial to choose an appropriate one. One can use pre-determined dictionaries, such as undecimated wavelets, 29 steerable wavelets, 30 and curvelets. 31 Their major advantage is simplicity and low complexity. However, their performance largely depends on the specific characteristics of the target signal. In this article, to address this issue, we introduce a dictionary learning method to update the over-complete dictionary and represent the signals sparsely. Dictionary learning has recently been widely used in many signal processing applications, such as image compression and enhancement 32 and classification tasks. 33
To update an over-complete dictionary, learning methods alternately optimize the dictionary atoms and the sparse coefficients of the training samples so that the training set is represented as accurately and as sparsely as possible.
The MOD introduced by Engan et al.17,34 is one of the first methods to implement such sparsification.15 Like other learning methods, MOD alternates two steps: a sparse coding stage that uses OMP, followed by an update of the dictionary. The aim of MOD is to find a dictionary D and a sparse matrix X that minimize the representation error

min_{D,X} ||Y − DX||_F^2 subject to ||x_i||_0 ≤ T_0 for all i

where Y is the matrix whose columns are the training samples, x_i is the ith column of X, and T_0 is the sparsity level. Suppose that the sparse matrix X is fixed; the dictionary update then has the closed-form least-squares solution

D = YX^T (XX^T)^{−1}

which requires a matrix inversion at every iteration and is therefore relatively expensive.
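The MOD dictionary update is a one-line least-squares solve, which the following sketch verifies on synthetic data: D = Y X^T (X X^T)^{-1} is the standard minimizer of ||Y − DX||_F^2 for fixed X, so any perturbation of it can only increase the error. The matrix shapes are illustrative.

```python
# Closed-form MOD dictionary update for a fixed coefficient matrix X.
import numpy as np

rng = np.random.default_rng(3)
Y = rng.standard_normal((10, 40))   # training samples as columns
X = rng.standard_normal((15, 40))   # coefficients (dense here; sparse in practice)

D = Y @ X.T @ np.linalg.inv(X @ X.T)     # least-squares optimal dictionary

err = np.linalg.norm(Y - D @ X)
# Perturbing the closed-form solution can only increase the Frobenius error:
D_pert = D + 0.01 * rng.standard_normal(D.shape)
print(err, np.linalg.norm(Y - D_pert @ X))
```

The explicit inverse of X X^T is the cost the text refers to; K-SVD avoids it by updating one atom at a time.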
In this article, we propose to use K-SVD to update the dictionary, which is more efficient than MOD. The K-SVD algorithm is based on the SVD process (K is the number of columns in D) and updates the dictionary atom by atom. For each atom d_k, the representation error is isolated as

E_k = Y − Σ_{j≠k} d_j x_T^j

where x_T^j denotes the jth row of X. Restricting E_k to the samples that actually use d_k and taking its best rank-1 approximation via the SVD yields the updated atom and the updated coefficients simultaneously, so no matrix inversion is needed.
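A single K-SVD atom update can be sketched in a few lines. The synthetic data, dictionary size, and random sparsity pattern below are assumptions for illustration; a full K-SVD implementation would interleave these updates with OMP sparse coding sweeps.

```python
# One K-SVD atom update: isolate atom k's contribution in the residual,
# restrict to the samples that use it, and replace atom and coefficients
# with the best rank-1 approximation (largest singular triplet).
import numpy as np

def ksvd_atom_update(Y, D, X, k):
    omega = np.nonzero(X[k, :])[0]               # samples that use atom k
    if omega.size == 0:
        return D, X                              # unused atom: leave as-is
    E = Y - D @ X + np.outer(D[:, k], X[k, :])   # add atom k's part back
    E_r = E[:, omega]                            # restrict to the support
    U, s, Vt = np.linalg.svd(E_r, full_matrices=False)
    D[:, k] = U[:, 0]                            # new unit-norm atom
    X[k, omega] = s[0] * Vt[0, :]                # matching coefficients
    return D, X

rng = np.random.default_rng(4)
Y = rng.standard_normal((8, 30))                 # synthetic training set
D = rng.standard_normal((8, 12))
D /= np.linalg.norm(D, axis=0)                   # unit-norm initial atoms
X = rng.standard_normal((12, 30)) * (rng.random((12, 30)) < 0.2)  # sparse codes

before = np.linalg.norm(Y - D @ X)
for k in range(D.shape[1]):                      # one full sweep over atoms
    D, X = ksvd_atom_update(Y, D, X, k)
after = np.linalg.norm(Y - D @ X)
print(before, after)
```

Each rank-1 replacement is optimal for the restricted residual, so the representation error never increases over a sweep, which is what makes the atom-by-atom scheme converge in practice.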
Experimental results
In this section, we evaluate the performance of the proposed method on a dataset collected from a real-world wireless sensor network (WSN) in the city of Twenty-Nine Palms, CA, in November 2001. This dataset is available at http://www.ecs.umass.edu/∼mduarte/Software.html. 2 It contains the acoustic, seismic, and IR information of two types of military vehicles, namely, the Assault Amphibian Vehicle (AAV) and the Dragon Wagon (DW). The original time series data are collected from 18 sensor nodes on 3 routes, as shown in Figure 2. Each node has three types of sensors: microphone, geophone, and polarized IR sensor. These sensors are able to cover a field of about 900 × 300 m², which consists of an east–west road, a south–north road, and an intersection. Each record in the dataset represents a vehicle passing by at a constant speed. Note that the Doppler effect will cause changes in the frequencies of the measured signal. Similar to Duarte and Hu, 2 we do not consider this Doppler effect since the relative speed between the moving vehicles and the sensor nodes is stable and relatively slow.

Sensor field layout.
Feature extraction
In this experiment, we aim to recognize each vehicle using the acoustic data. The acoustic data were recorded at a rate of 4960 Hz by microphones equipped on the sensor nodes. First, we choose the data collected from the 3rd to 11th runs (AAV3–AAV11 and DW3–DW11) as the data source to assess different feature extraction and classification methods. To detect the useful events in the raw time series data, we use the constant false alarm rate (CFAR) detection algorithm, 2 which is able to mark times with high energy values. Then, we use the MFCC method to extract features from the event time series for classification purposes.
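The MFCC pipeline for one frame can be sketched as follows. The frame length of 512 samples, the 4960 Hz rate, and the 12 coefficients match the text; the implementation itself (filter count, mel formula, omitted pre-emphasis and liftering) is an illustrative stand-in for the article's feature extractor, not its exact code.

```python
# Compact single-frame MFCC: power spectrum -> mel filter bank -> log -> DCT.
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, rate):
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc_frame(frame, rate=4960, n_filters=26, n_coeffs=12):
    spectrum = np.abs(np.fft.rfft(frame)) ** 2            # power spectrum
    energies = mel_filterbank(n_filters, len(frame), rate) @ spectrum
    return dct(np.log(energies + 1e-10), norm='ortho')[:n_coeffs]

rng = np.random.default_rng(5)
frame = rng.standard_normal(512)      # one 512-point frame (103.2 ms at 4960 Hz)
print(mfcc_frame(frame).shape)        # 12-dimensional feature vector per frame
```

In practice the per-frame vectors over each detected event are stacked or averaged to form the sample-level feature used by the classifier.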
Choosing an arbitrary set of the above data, we compare its 12-dimensional features extracted by MFCC with the 50-dimensional features extracted by the fast Fourier transform (FFT), 2 which is calculated for every 512-point sample (every 103.2 ms at the current sample rate of 4960 Hz). Figure 3(b) and (c) shows the distinct difference between the two feature sets, with the MFCC features exhibiting a more compact structure.

(a) Sample time series and features extracted by (b) MFCC and (c) FFT.
Dictionary learning
For cross-validation, after feature extraction, we divide the acoustic features into two parts, one as test samples and the other as training samples. For vehicle recognition, we need to compute the sparse representation of the test samples using a specific dictionary. The initial dictionary consists of the acoustic features of the training samples. Then, to better fit the current dataset, we use the K-SVD approach to update the initial dictionary. Using the OMP algorithm, we can first get the corresponding sparse matrix of the test samples. The sparsity level, which stands for the number of non-zero coefficients in the sparse matrix, affects the recognition performance as well as the computational complexity of the algorithm. Assuming that there are 100 training samples in total in the dictionary, we demonstrate the sparse coding result of a test sample in Figure 4 with different sparsity levels.

Sparse coding with different sparsity levels.

Relationship between the time consumption and the sparsity level.
To further study the impact of different dictionaries on the algorithm performance, we define the relative error of the sparse representation as the reconstruction error normalized by the signal energy, ||y − Dx̂||_2 / ||y||_2.
Compared with the initial dictionary, which consists of the original training samples, the dictionary updated by K-SVD (with eight iterations) shows significantly improved performance, as we can see from Figure 6. The relative error decreases significantly with the sparsity level. This implies that, in practice, we need to choose an appropriate sparsity level and find a good tradeoff between the relative error and the complexity.
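The trend of relative error versus sparsity level can be reproduced in miniature. The sketch below uses a random dictionary and a greedy OMP solver on synthetic data (all shapes and the sparsity range are assumptions); since each extra atom can only shrink the least-squares residual, the relative error is non-increasing in the sparsity level, matching the behavior described above.

```python
# Relative error ||y - D x||_2 / ||y||_2 as a function of the sparsity level.
import numpy as np

def omp(D, y, sparsity):
    """Greedy orthogonal matching pursuit with a fixed number of atoms."""
    residual, support = y.copy(), []
    x = np.zeros(D.shape[1])
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(6)
D = rng.standard_normal((20, 100))   # 100-atom dictionary (as in the text)
D /= np.linalg.norm(D, axis=0)
y = rng.standard_normal(20)

errs = []
for T in range(1, 6):                # sparsity levels T = 1..5
    x = omp(D, y, T)
    errs.append(np.linalg.norm(y - D @ x) / np.linalg.norm(y))
print([round(e, 3) for e in errs])
```

The monotone decrease is exactly the tradeoff noted in the text: higher sparsity levels buy lower relative error at the price of more computation per sample.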

Relative error with initial and updated dictionaries.
Classification methods
To solve the vehicle recognition problem with the proposed MKSRC model, the features are mapped into a high-dimensional feature space with the kernel function. Here, we choose two common kernels, the polynomial kernel (22) and the Gaussian RBF kernel (23)

κ(u, v) = (u^T v + 1)^d (22)

κ(u, v) = exp(−||u − v||_2^2 / (2σ^2)) (23)

where d is the degree of the polynomial kernel and σ is the width of the RBF kernel.
In our experiments, there are 90 samples for each vehicle, collected from 9 runs (3–11) of 10 sensor nodes (51–56 and 58–61), as well as 90 samples of the noise in the acquisition process. To validate the results of a classifier, we employ threefold cross-validation with stratified partitioning of the samples. The classifier is trained three times, and each time a different set is used as the validation set. Tables 1 and 2 present the detection, false alarm, and classification rates based on FFT and MFCC features. Here, the detection rate is defined as the ratio between the number of correctly classified samples and the size of the class. The false alarm rate is defined as the ratio between the number of samples incorrectly assigned to a class and the total number of samples in the other classes. Furthermore, to analyze the effect of different dictionaries, we list the classification results based on the dictionary updated by K-SVD in Table 3.
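The two rates defined above translate directly into code. The toy label vectors below are illustrative, not taken from the experimental results.

```python
# Per-class detection and false alarm rates from true vs. predicted labels:
# detection = correct in class / class size,
# false alarm = other-class samples predicted as this class / other-class total.
import numpy as np

def rates(true, pred, cls):
    true, pred = np.asarray(true), np.asarray(pred)
    in_cls = true == cls
    detection = float(np.mean(pred[in_cls] == cls))
    false_alarm = float(np.mean(pred[~in_cls] == cls))
    return detection, false_alarm

# Toy example: 5 AAV and 5 DW samples with a few misclassifications.
true = ['AAV'] * 5 + ['DW'] * 5
pred = ['AAV', 'AAV', 'AAV', 'DW', 'AAV', 'DW', 'DW', 'AAV', 'DW', 'DW']
print(rates(true, pred, 'AAV'))   # (0.8, 0.2)
```

Averaging the per-class detection rates weighted by class size gives the overall classification rate reported in the tables.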
Detection, false alarm, and classification rates based on FFT.
FFT: fast Fourier transform; SVM: support vector machine; SRC: sparse representation–based classification; KSRC: kernel sparse representation–based classification; RBF: radial basis function; MKSRC: multiple kernel sparse representation–based classification; AAV: Assault Amphibian Vehicle; DW: Dragon Wagon.
Bold values indicate that the classification rate of the corresponding method is significantly higher than that of the other methods.
Detection, false alarm, and classification rates based on MFCC.
MFCC: Mel-frequency cepstral coefficients; SVM: support vector machine; SRC: sparse representation–based classification; KSRC: kernel sparse representation–based classification; RBF: radial basis function; MKSRC: multiple kernel sparse representation–based classification; AAV: Assault Amphibian Vehicle; DW: Dragon Wagon.
Bold values indicate that the classification rate of the corresponding method is significantly higher than that of the other methods.
Detection, false alarm, and classification rates based on MFCC with the dictionary updated by K-SVD.
MFCC: Mel-frequency cepstral coefficients; K-SVD: K-singular value decomposition; SVM: support vector machine; SRC: sparse representation–based classification; KSRC: kernel sparse representation–based classification; RBF: radial basis function; MKSRC: multiple kernel sparse representation–based classification; AAV: Assault Amphibian Vehicle; DW: Dragon Wagon.
Bold values indicate that the classification rate of the corresponding method is significantly higher than that of the other methods.
The kernel parameters, such as the degree of the polynomial kernel and the width of the RBF kernel, are tuned on the training data.
To further illustrate the performance of MKSRC, we introduce the normalized correlation between the sparse codes obtained by SRC, KSRC, and MKSRC. We list the results for two classes (each containing 30 samples) in Figure 7, where the entry at position (i, j) is the normalized correlation between the sparse codes of samples i and j.

Normalized correlation between the sparse codes of (a) SRC, (b) KSRC, and (c) MKSRC.
According to the definition of correlation, the normalized correlation matrix of the sparse codes should be block-wise, since sparse codes belonging to the same class are more similar. In Figure 7, we find that MKSRC produces more discriminative sparse codes: the correlation coefficients within the same class (the first 30 samples belong to AAV and the rest to DW) are generally higher for MKSRC than for SRC and KSRC, which facilitates better classification performance.
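The block structure described above can be demonstrated with synthetic codes. In this sketch (all data are assumptions, not the article's sparse codes), the two classes activate disjoint dictionary coordinates, so within-class cosine similarities are high while between-class similarities vanish, giving exactly the block-wise matrix expected of discriminative codes.

```python
# Normalized correlation (cosine similarity) matrix between sparse codes.
import numpy as np

def normalized_correlation(codes):
    """codes: one sparse code per column; returns the cosine-similarity matrix."""
    C = codes / np.linalg.norm(codes, axis=0, keepdims=True)
    return C.T @ C

rng = np.random.default_rng(7)
codes = np.zeros((20, 8))
codes[:10, :4] = np.abs(rng.standard_normal((10, 4)))   # class A: coords 0-9
codes[10:, 4:] = np.abs(rng.standard_normal((10, 4)))   # class B: coords 10-19
R = normalized_correlation(codes)

within = float(R[:4, :4].mean())     # high: same-class codes overlap
between = float(R[:4, 4:].mean())    # zero: disjoint supports
print(round(within, 3), round(between, 3))
```

Plotting R as an image reproduces the two bright diagonal blocks of Figure 7; the sharper the blocks, the easier the residual-based classification.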
Conclusion
In this work, we have studied the problem of vehicle recognition using acoustic sensor networks. We have developed a new method, called MKSRC, for vehicle recognition. Acoustic features of the vehicles are extracted and mapped into a high-dimensional feature space using a kernel function that combines multiple kernels to obtain linearly separable samples. To improve the recognition accuracy, we incorporate dictionary learning into the MKSRC framework. By calculating the reconstruction error and updating the kernel weights, the target vehicles are recognized by solving the optimization problem. Our extensive experimental results demonstrate that the proposed MKSRC method with learned dictionaries outperforms existing methods based on SVM, SRC, and KSRC in the literature on vehicle recognition from complex acoustic sensor network datasets. In our future work, we will focus on the self-adaptation of the kernel parameters to further improve the recognition efficiency and robustness.
