Sage Journals: Discover world-class research

Abstract

Sparse representation–based classification and kernel methods have emerged as important methods for pattern recognition. In this work, we study the problem of vehicle recognition using acoustic sensor networks in real-world applications. To improve the recognition accuracy with noise sensor data collected from challenging sensing environments, we develop a new method, called multiple kernel sparse representation–based classification, for vehicle recognition. In the proposed multiple kernel sparse representation–based classification method, acoustic features of vehicles are extracted and mapped into a high-dimensional feature space using a kernel function, which combines multiple kernels to obtain linearly separable samples. To improve the recognition accuracy, we incorporate dictionary learning method K-singular value decomposition into the multiple kernel sparse representation–based classification framework. The vehicle recognition from acoustic sensor network is then formulated into an optimization problem. Our extensive experimental results demonstrate that the proposed multiple kernel sparse representation–based classification method with learned dictionaries outperforms other existing methods in the literature on vehicle recognition from complex acoustic sensor network datasets.

Keywords

Multiple kernel sparse representation–based classification K-singular value decomposition vehicle recognition sensor networks

Introduction

Wireless sensor networks (WSNs) have gained much attention recently due to advances in micro-electro-mechanical systems (MEMS) technology which has facilitated the development of smart sensors.¹ WSNs consist of a large number of sensor nodes whose position need not be pre-determined.² The protocols and algorithms of WSNs have self-organization capabilities. At the same time, sensor nodes have the ability to carry out simple computations locally and transmit required data to nodes responsible for the fusion instead of sending raw data, which truly compress the power consumption of the networks. For the powerful self-organization and fault-tolerant capacity, a wide variety of applications are being envisioned for sensor networks, including habitat monitoring, health monitoring, battlefield surveillance, and target tracking.

Detection and classification of objects moving through the sensor field is an important task in many field applications.³ For vehicle recognition, early work focused on military vehicle detection for battlefield surveillance. Nowadays, vehicle detection has become an important task for traffic monitoring and management. Recognition of different vehicle types, such as cars, motorbikes, buses, or trucks, provides detailed traffic statistics and useful information about road utilization. Since many characteristics of the vehicle can be inferred from the sound it generates,⁴ it is feasible to recognize the type of the moving vehicle in acoustic sensor networks.^4–7 However, how to improve the robustness of vehicle recognition algorithms from sensor networks within complex and noisy environments remains a critical and challenging issue in practice.

The problem of vehicle classification in acoustic sensor networks is essentially a pattern recognition problem. In recent years, various classification methods have been proposed in this field to adapt to different situations and improve the recognition rate, such as maximum likelihood (ML), k-nearest neighbor (k-NN), support vector machines (SVMs),^2,8 and decision tree (DT)⁹ methods. Wright et al.¹⁰ found that sparse representation–based classification (SRC) performed well on face recognition, especially in noisy and cluttered environments. Mei and Ling.¹¹ proposed a robust visual tracking and vehicle classification approach using sparse representation and demonstrated its effectiveness on a vehicle tracking and classification task using outdoor infrared (IR) video sequences. Kernel methods are effective in machine learning to solve real-world problems with nonlinear data structures. Gao et al.¹² applied kernel sparse representation–based classification (KSRC) to map the data into a high-dimensional space to image classification and face recognition and achieved state-of-the-art performance. However, the success of kernel methods often depends on the choice of an appropriate kernel and features. Specific kernel functions are proposed for particular applications, such as text documents categorizing¹³ and computational biology.¹⁴ So, it is critical to select the kernel function that fits the samples. Instead of selecting one specific kernel function, multiple kernel learning methods which learn the kernel from the samples by a linear combination of base kernels is more effective, especially for complex scenarios with different sources or modalities.

The sparse representation classification model above assumes a data sample can be represented as a sparse combination of several atoms from the pre-specified or non-adaptive dictionaries, which cannot represent a given class of signals efficiently. To address this issue, recent research has focused on designing dictionaries using learning methods.^15,16 Engan et al.¹⁷ introduced the method of optimal directions (MOD) to find a dictionary and a sparse matrix which minimize the representation error. This method suffers from the relatively high complexity. To train a generic dictionary for sparse signal representation, Aharon et al.¹⁸ developed the K-singular value decomposition (K-SVD) algorithm which updated the dictionary atom-by-atom in a simple process rather than using a matrix inversion.

In this article, we try to combine the dictionary learning method with the sparse representation to solve the multi-sensor vehicle classification problem, focusing on recognizing different types of vehicles. The dataset contains acoustic recordings observed at each individual sensor in a real-world experiment carried out at the city of Twenty-Nine Palms, CA, in November 2001.² The features of vehicles are extracted using Mel-frequency cepstral coefficients (MFCC), which has proven to be efficient in acoustic signal recognition. Chitra and Sumalatha¹⁹ used MFCC to extract the sound features of the emergency vehicles and performed the classification and identification task using SVM. This approach achieved increased accuracy and reduced time delay for the emergency. Matthias and Rainer²⁰ presented a mobile sound classification system which extracted 13 MFCC from the data collected by the microphone and classified the sound using the neural networks to recognize sounds of emergency vehicles in road traffic.

In this work, we study whether this set of features can effectively be applied to vehicle recognition in transportation applications. Since the acoustic signals gathered in a real-world setting is inherently complex, it is difficult to choose the best kernel function. It is better to have a set of kernel functions and let the algorithm select the best subset of kernels.²¹ Therefore, we propose multiple kernel sparse representation–based classification (MKSRC), which combines several possible kernels as the kernel function and optimizes the multiple kernel weights while training the KSRC to adapt to different cases. Meanwhile, in contrast to previous approaches on sparse representation in which the dictionary is fixed by the training samples,^10,12 in this article, we update the dictionary by K-SVD to adapt to complex scenes.

The major contribution of this article lies in the following aspects. First, we have developed a new classification algorithm based on MKSRC and successfully applied it to vehicle recognition from acoustic sensor networks. Second, we have developed a new and effective multi-kernel weight update scheme based on gradient descend, enabling our multi-kernel representation to fit different input source characteristics. This source-adaptive representation scheme has demonstrated its unique advantages by our experiments. Third, we have proposed a K-SVD method for dictionary update instead of using fixed dictionary obtained from the training samples as in existing methods. This new method is able to handle the classification tasks within different and complex environments.

The remainder of this article is organized as follows. In section “Framework of vehicle recognition,” we present the framework of vehicle recognition. Section “Sparse representation models” explains the sparsity models and MKSRC. Our dictionary learning method is presented in section “Dictionary learning methods.” Experimental results and performance comparisons are provided in section “Experimental results.” Finally, conclusions and discussions on future work are provided in section “Conclusion.”

Framework of vehicle recognition

Figure 1 shows our vehicle recognition framework using MKSRC, which has the following major components:

Pre-processing. The raw acoustic signals of vehicles are gathered from an acoustic sensor network. This pre-processing step is important for noise reduction. We use the constant false alarm rate (CFAR) detection method.² After the CFAR detection, useful event series are converted to frames and the default frame increment is half length of the frame as shown in Figure 1.

Feature extraction. An appropriate feature extraction method is important for classification. We use MFCC acoustic features. Acoustic signals often change quickly over time. Compared with linear prediction cepstral coefficients (LPCC),²² MFCC is more extensively used because of its robustness.²³

Dictionary learning and classification. We construct an over-complete dictionary based on the MFCC features and map it to a high-dimensional feature space using the kernel function which consists of multiple kernels $ϕ_{1}, ϕ_{2}, \dots, ϕ_{p}$ with weights $α_{1}, α_{2}, \dots, α_{p}$ . To establish the best possible representations for each member in this set with sparsity constraints, we update the dictionary columns by K-SVD in an iterative manner.¹⁸ Then, the object recognition problem becomes determining the best sparse representation of the test sample using the learned dictionary which has the minimum representation error.

Figure 1.

The proposed recognition framework of MKSRC.

Sparse representation models

SRC

Sparse representation is a signal processing method to represent the main information of the signal using non-zero coefficients as few as possible.¹⁰ For object recognition, our goal is to classify the test sample using labeled training data. Here, our central approach is to represent the test sample as a sparse linear combination of training samples.

Suppose we have $l$ classes of objects, and let $D = [D_{1}, D_{2}, . . ., D_{l}] \in R^{m \times n} (m < < n)$ be a set of $n$ training samples in $l$ classes, where $D_{i} = [d_{i, 1}, d_{i, 2}, . . ., d_{i, n_{i}}] \in R^{m \times n_{i}} (i = 1, 2, \dots, l)$ contains m-dimensional features of training samples in the ith class. Suppose the test sample $y \in R^{m \times 1}$ can be sparsely represented by the over-complete dictionary $D$ , we can obtain the sparse vector $x$ by solving the following optimization problem

$min_{x} ‖ x ‖_{0} \begin{matrix} s . t . \begin{matrix} y = Dx \end{matrix} \end{matrix}$ (1)

where $‖ x ‖_{0}$ is the $l_{0}$ norm which stands for the number of the non-zero coefficients of the matrix. While equation (1) is a NP-hard combinational optimization, Candes²⁴ proved that $l_{0}$ norm can be substituted by $l_{1}$ norm in equation (2) as an approximate solution if the solution of equation (1) is sparse enough

$min_{x} ‖ x ‖_{1} \begin{matrix} s . t . \begin{matrix} y = Dx \end{matrix} \end{matrix}$ (2)

In fact, since real-world data are often noisy, it may not be possible to express the test sample exactly as a sparse combination of the training samples.¹⁰ The model in equation (1) can be rewritten as

$min_{x} ‖ x ‖_{0} \begin{matrix} s . t . \begin{matrix} y = Dx \end{matrix} \end{matrix} + z$ (3)

where $z \in R^{m \times 1}$ is a noise term with bounded energy $‖ z ‖_{2} < ε$ . Thus, the sparse presentation model can be modified as

$min_{x} ‖ x ‖_{1} {\begin{matrix} s . t . \begin{matrix} ‖ y - Dx ‖ \end{matrix} \end{matrix}}_{2} \leq ε$ (4)

and generally, model (4) is transformed to the optimization problem about $J (x)$ as follows

$min_{x} J (x) = \frac{1}{2} ‖ y - Dx ‖_{2}^{2} + λ ‖ x ‖_{1}$ (5)

where the first part is the residual and the parameter $λ > 0$ is a scalar regularization parameter that balances the sparsity of the solution and fidelity of the approximation to $y$ . The solution of equation (5) is already available, and we use orthogonal matching pursuit (OMP)²⁵ to solve the sparse minimization problem. After sparsely coding $y$ on $D$ via $l_{1}$ -norm minimization, we obtain the solution denoted by ${\hat{x}}_{r}$ and the classification result can be obtained by computing the residuals for each class

$r_{i} (y) = ‖ y - D δ_{i} ({\hat{x}}_{r}) ‖_{2}$ (6)

where $δ_{i} ({\hat{x}}_{r})$ selects only the non-zero coefficients belonging to class $i$ . We then classify $y$ to the object class that minimizes the residual error.

Algorithm 1.

Sparse representation–based classification algorithm.

1. Normalize the columns of D to have unit l₂-norm.2. Code y over D via l₁-norm minimization

\hat{x} = \arg min_{x} {‖ y - Dx ‖_{2}^{2} + λ {‖ x ‖}_{1}}

where λ is a positive scalar.3. Compute the residuals

r_{i} = ‖ y - D_{i} {\hat{x}}_{i} ‖_{2}

{\hat{x}}_{i}

is the coefficient vector associated with class i.4. Output the identity of y as

identity (y) = \arg min_{i} {r_{i}}

The kernel method

To ensure an ideal classification performance, the vector $D_{i}$ in the dictionary should be uncorrelated. However, in practice, the underlying similarities between the extracted features of different vehicles may result in some correlation among $D_{i}$ and affect the final result. Similar to kernel methods in SVM,²⁶ our kernel method is used to deal with the linearly non-separable problems using nonlinear mapping.

A kernel is called a Mercer kernel if it satisfies the Mercer’s condition: continuous, symmetric, and positive semi-definite.²⁷ Suppose $x$ and $x'$ are two vectors in $χ$ and $ϕ$ is a nonlinear function realizing the mapping from the input space $χ$ to the feature space $F$ . Usually, a Mercer kernel $k$ can be expressed as

$k (x, x') = < ϕ (x), ϕ (x') >$ (7)

where <·,·> denotes dot product. It transforms the dot product calculation in the high-dimensional feature space to the kernel function in the input space avoiding curse of dimensionality. Then, we can focus on the kernel function instead of $ϕ$ . The linear kernel, polynomial kernels, and Gaussian radial basis function (RBF) kernels are commonly used in kernel function design.

KSRC

Note that kernel methods are effective for linearly non-separable problems. In this section, we propose a new classification method, called KSRC, which is a kernel-based sparse representation. We can recognize vehicles by solving equation (5) in SRC, but in kernel methods, we should construct a Mercer kernel. To make the training samples separable, we assume that there exists a feature mapping function $ϕ$ which maps the test sample $y$ and the dictionary $D$ from the input space $χ$ to a high-dimensional kernel feature space $F$

$y \to ϕ (y)$

$\begin{array}{l} D = [d_{1, 1}, d_{1, 2}, \dots, d_{l, n_{l}}] \to \\ D^{'} = [ϕ (d_{1, 1}), ϕ (d_{1, 2}), \dots, ϕ (d_{l, n_{l}})] \end{array}$ (8)

In SRC, the test sample can be sparsely represented by the training samples in the input space $χ$ . Similarly, we arrive at a kernel sparse–based representation of the test sample in the kernel feature space

$min_{x} ‖ x ‖_{0} \begin{matrix} s . t . \begin{matrix} ϕ (y) = D' x \end{matrix} \end{matrix}$ (9)

Likewise, the optimization problem in equation (5) can be mapped in the high-dimensional space $F$ as follows

$min_{x} J (x) = \frac{1}{2} ‖ ϕ (y) - D' x ‖_{2}^{2} + λ ‖ x ‖_{1}$ (10)

However, since the mapping function $ϕ$ is unknown, the solution of KSRC is based on the selected kernel function. When the dictionary $D$ is fixed, equation (10) can be rewritten as

$\begin{matrix} min_{x} J (x) = \frac{1}{2} | | ϕ (y) - D' x | |_{2}^{2} + λ | | x | |_{1} \\ = \frac{1}{2} k (y, y) + \frac{1}{2} x^{T} K_{DD} x - x^{T} K_{D} (y) + λ | | x | |_{1} \\ = L (x) + λ | | x | |_{1} \end{matrix}$ (11)

where $L (x) = \frac{1}{2} k (y, y) + \frac{1}{2} x^{T} K_{DD} x - x^{T} K_{D} (y)$ , $K_{DD}$ is a $k \times k$ matrix with ${K_{DD}}_{ij} = k (D_{i}, D_{j})$ , and $K_{D} (x)$ is a $k \times 1$ vector with ${K_{D} (x)}_{i} = k (D_{i}, x)$ .²⁷ By selecting appropriate kernel functions, we can solve the optimization problem.

Algorithm 2.

Multiple kernel sparse representation–based classification algorithm.

Input. The dictionary D composed by l training samples. A test sample y. The number of kernel function p.1. Initialize the kernel weights

α_{m} = 1 / \sqrt{p}

.2. Compute the coefficient

\bar{x} = \arg min {\frac{1}{2} ‖ ϕ (y) - D' x ‖_{2}^{2} + λ {‖ x ‖}_{1}}

with the kernel function

k (a, b) = \sum_{m = 1}^{p} α_{m}^{2} k_{m} (a, b)

.3. Update the kernel weights by

(α_{m}^{2})^{n + 1} = (α_{m}^{2})^{n} - ε^{n} \nabla_{α^{2}} J

.4. Go back to step 2 until the condition of convergence

| α^{n + 1} - α^{n} | \leq q

is met.5. Compute the residual

r_{j} (y) = \frac{1}{2} ‖ ϕ (y) - D' δ_{j} (\bar{x}) ‖_{2}^{2}

, for

j = 1, \dots, l

.Output.

Identity (y) = \arg min_{j} {r_{j}}

MKSRC

In this article, to determine the best kernel for our classification task, we propose MKSRC which assumes that sample $a$ can be mapped into several different feature spaces by nonlinear mapping functions $ϕ_{m} (a), (m = 1, 2, \dots, p)$ with different weights $α_{m}$ to achieve the best performance in classification. The corresponding kernel function for samples $a$ and $b$ is defined as

$k (a, b) = \sum_{m = 1}^{p} α_{m}^{2} k_{m} (a, b)$ (12)

where $k_{m} (a, b) = ϕ_{m} (a)^{T} ϕ_{k} (b)$ and the kernel weights have to satisfy $\sum_{m = 1}^{p} α_{m}^{2} = 1, α_{m} \geq 0$ . Within this MKSRC framework, the optimization problem in equation (5) can be mapped into a high-dimensional feature space $F$ . The parts related to the kernel can be described as follows

$k (y, y) = \sum_{m = 1}^{p} α_{m}^{2} k_{m} (y, y)$ (13)

$\begin{matrix} K_{DD} = [\begin{matrix} \sum_{m = 1}^{p} α_{m}^{2} k_{m} (d_{1, 1}, d_{1, 1}) & \sum_{k = 1}^{p} α_{m}^{2} k_{m} (d_{1, 1}, d_{1, 2}) & \dots & \sum_{m = 1}^{p} α_{m}^{2} k_{m} (d_{1, 1}, {d_{l,}}_{n_{l}}) \\ \sum_{m = 1}^{p} α_{m}^{2} k_{m} (d_{1, 2}, d_{1, 1}) & \sum_{m = 1}^{p} α_{m}^{2} k_{m} (d_{1, 2}, d_{1, 2}) & \dots & \sum_{k = 1}^{p} α_{m}^{2} k_{m} (d_{1, 2}, {d_{l,}}_{n_{l}}) \\ ⋮ & ⋮ & ⋮ & ⋮ \\ \sum_{m = 1}^{p} α_{m}^{2} k_{m} ({d_{l,}}_{n_{l}}, d_{1, 1}) & \sum_{m = 1}^{p} α_{m}^{2} k_{m} ({d_{l,}}_{n_{l}}, d_{1, 2}) & \dots & \sum_{m = 1}^{p} α_{m}^{2} k_{m} ({d_{l,}}_{n_{l}}, {d_{l,}}_{n_{l}}) \end{matrix}] \\ K_{D} (y) = {[\sum_{m = 1}^{p} α_{m}^{2} k_{m} (d_{1, 1}, y), \sum_{m = 1}^{p} α_{m}^{2} k_{m} (d_{1, 2}, y), \dots, \sum_{m = 1}^{p} α_{m}^{2} k_{m} ({d_{l,}}_{n_{l}}, y)]}^{T} \end{matrix}$ (14)

$k (x, x') = < ϕ (x), ϕ (x') >$ (15)

Here, according to Lemma 2 in Chapelle et al.,²⁸ it is possible to differentiate $J$ with respect to $α_{m}^{2}$ as if $\bar{x}$ does not depend on $α_{m}^{2}$ ( $\bar{x}$ is the vector $x$ where the extreme value in $J$ is attained.). We have

$\begin{matrix} \frac{\partial J (α_{m}^{2})}{\partial α_{m}^{2}} = \frac{1}{2} k_{m} (y, y) + \frac{1}{2} {\bar{x}}^{T} [\begin{matrix} k_{m} (d_{1, 1}, d_{1, 1}) & k_{m} (d_{1, 1}, d_{1, 2}) & \dots & k_{m} (d_{1, 1}, d_{1, n_{l}}) \\ k_{m} (d_{1, 2}, d_{1, 1}) & k_{m} (d_{1, 2}, d_{1, 2}) & \dots & k_{m} (d_{1, 2}, d_{1, n_{l}}) \\ ⋮ & ⋮ & ⋮ & ⋮ \\ k_{m} (d_{1, n_{l}}, d_{1, 1}) & k_{m} (d_{1, n_{l}}, d_{1, 2}) & \dots & k_{m} (d_{1, n_{l}}, d_{1, n_{l}}) \end{matrix}] \bar{x} \\ - {\bar{x}}^{T} [k_{m} (d_{1, 1}, y), k_{m} (d_{1, 2}, y), \dots, k_{m} (d_{1, n_{l}}, y)]^{T} + λ | | x | |_{1} \end{matrix}$ (16)

Using the gradient descent method, we can update kernel weights $(α_{m}^{2})^{n + 1} = (α_{m}^{2})^{n} - ε^{n} \nabla_{α^{2}} J$ and solve the optimization problem. The following table summarizes the proposed MKSRC method.

Under the condition of convergence, $q$ plays a critical role in determining the complexity of the algorithm and it should be adjusted based on the application requirements. Besides, $δ_{j} (x)$ is a vector whose only non-zero entries are the entries in $x$ that are associated with class $j$ .¹⁰ By computing the residual, we can classify the test sample $y$ into the group which minimizes residual error.

Dictionary learning methods

The key component in sparse representation is to construct an over-complete dictionary. It is crucial to choose an appropriate dictionary. One can use pre-determined dictionaries, such as undecimated wavelets,²⁹ steerable wavelets,³⁰ and curvelets.³¹ Their major advantage is simplicity and low complexity. However, their performance largely depends on the specific characteristics of the target signal. In this article, to address this issue, we introduce a dictionary learning method to update the over-complete dictionary and represent the signals sparsely. Dictionary learning has been widely used recently in many signal processing applications, such as image compression enhancement³² and classification tasks.³³

To update an over-complete dictionary $D$ in equation (5), a typical dictionary learning algorithm performs a two-stage procedure iteratively: sparse coding and dictionary update. Sparse coding is the process of computing the representation coefficients $x$ based on the given signal and the present dictionary. The dictionary is updated in stage 2 to reduce the representation error.

The MOD introduced by Engan et al.^17,34 is one of the first methods to implement sparsification.¹⁵ Like other learning methods, MOD alternates two steps, with a sparse coding stage that uses OMP followed by an update of the dictionary. The aim of MOD is to find a dictionary $D$ and a sparse matrix $x$ that minimize the representation error

$\arg min_{D, x} ‖ y - Dx ‖_{F}^{2} {\begin{matrix} s . t . \begin{matrix} ‖ x_{i} ‖ \end{matrix} \end{matrix}}_{0} \leq ε$ (17)

where ${x_{i}}$ represents the columns of $x$ and the notation $‖ x ‖_{F}$ represents the Frobenius norm, defined as

$‖ x ‖_{F} = \sqrt{\sum_{ij} x_{ij}^{2}}$

Suppose that the sparse matrix $x$ is fixed, and we can find the appropriate $D$ which minimizes the above error by taking the derivative of equation (17) with respect to $D$ . To achieve $(y - Dx) x^{T} = 0$ , we use the following iterative method

$D^{(n + 1)} = y x^{{(n)}^{T}} \cdot (x^{(n)} x^{{(n)}^{T}})^{- 1}$ (18)

In this article, we propose to use K-SVD to update the dictionary which is more efficient than MOD. The K-SVD algorithm is based on the SVD process (K is the number of columns in $D$ ) and the dictionary update is performed atom-by-atom in a simple and efficient process.¹⁵ Assuming that $D$ and $x$ are fixed, to update the kth column $d_{k}$ in $D$ , the object function can be rewritten as

$| | y - Dx | |_{2}^{2} = | | y - \sum_{j = 1}^{m} d_{j} x_{j}^{T} | |_{2}^{2} = | | y - \sum_{j = k} d_{j} x_{j}^{T} - d_{k} x_{k}^{T} | |_{2}^{2}$ (19)

where $x_{j}^{T}$ stands for the jth row in $x$ . In the above expression, we aim to the update both $d_{k}$ and $x_{k}^{T}$ by a simple rank-1 approximation³⁵ of the pre-computed error term

$E_{k} = y - \sum_{j \neq k} d_{j} x_{j}^{T}$ (20)

Experimental results

In this section, we evaluate the performance of the proposed method on the dataset collected from a real-world wireless sensor networks (WDSN) in the city of Twenty-Nine Palms, CA, in November 2001. This dataset is available at http://www.ecs.umass.edu/∼mduarte/Software.html.² It contains the acoustic, seismic, and IR information of two types of military vehicles, namely, Assault Amphibian Vehicle (AAV) and Dragon Wagon (DW). The original time series data are collected from 18 sensor nodes on 3 routes, as shown in Figure 2. Each node has three types of sensors: microphone, geophone, and polarized IR sensor. These sensors are able to cover a field of about 900 × 300 m², which consists of an east–west road, a south–north road, and an intersection. Each record in the dataset represents a vehicle passing by at a constant speed. Note that the Doppler effect will cause change in the frequencies of the measured signal. Similar to Duarte and Hu,² we do not consider this Doppler effect since the relative speed between the moving vehicles and sensor nodes is stable and relatively slow.

Figure 2.

Sensor field layout.

Feature extraction

In this experiment, we aim to recognize each vehicle using the acoustic data. The acoustic data were recorded at a rate of 4960 Hz by microphones equipped on sensor nodes. First, we choose the data collected from the 3rd to 11th runs (AAV3–AAV11 and DW3–DW11) as the data source to assess different feature extraction and classification methods. To detect the useful events in raw time series data, we use CFAR detection algorithm² which is able to mark times with high energy values. Then, we use the MFCC method to extract features from the event time series for classification purposes.

Choosing an arbitrary set of the above data and comparing its 12-dimensional features extracted by MFCC with 50-dimensional features extracted by fast Fourier transform (FFT)² which is calculated for every 512 point sample (every 103.2 ms for the current sample rate 4960 Hz), we can find the distinct difference between them in Figure 3(b) and (c) where the x-axes stand for the frame number and the y-axe is on behalf of the magnitude of each feature item. The MFCC features mainly concentrate in lower feature item compared with FFT ones.

Figure 3.

(a) Sample time series and features extracted by (b) MFCC and (c) FFT.

Dictionary learning

For cross-validation, after feature extraction, we divide the acoustic features into two parts, one as test samples and the other as training samples. For vehicle recognition, we need to compute the sparse representation of the test samples using a specific dictionary. The initial dictionary consists of acoustic features of training samples. Then, to better fit the current dataset, we use the K-SVD approach to update the initial dictionary. Using the OMP algorithm, we can first get the corresponding sparse matrix of the test samples. The sparsity level, which stands for the number of non-zero coefficients in the sparse matrix, will affect the recognition performance, as well as the computational complexity of the algorithm. Assumed that there are total 100 training samples in the dictionary, we demonstrate the sparse coding result of a test sample in Figure 4 with different sparsity levels ( $K = 30$ and $K = 60$ ). Figure 5 shows the relationship between the sparsity and the time consumption of the sparse coding process with a single sample. The algorithm is written in MATLAB without optimization for speed, and ran on a laptop of Core i5 at 1.60 G with 4-GB memory. We can see that complexity increases significantly with the sparsity level.

Figure 4.

Sparse coding with different sparsity levels: (a) K = 30 and (b) K = 60.

Figure 5.

Relationship between the time consumption and the sparsity level.

To further study the impact of different dictionaries on the algorithm performance, we define the relative error of the sparse representation as

$γ = \frac{{‖ y - Dx ‖}_{2}}{{‖ y ‖}_{2}}$ (21)

Compared with the initial dictionary which has the original training samples, the dictionary updated by K-SVD (with eight times iteration) shows its significantly improved performance as we can see from Figure 6. The relative error decreases significantly with the sparsity level. This implies that, in practice, we need to choose the appropriate sparsity level and find a good tradeoff between the relative error and complexity.

Figure 6.

Relative error with initial and updated dictionaries.

Classification methods

To solve the vehicle recognition problem by the proposed MKSRC model, the features are mapped into a high-dimensional feature space with the kernel function. Here, we choose two common kernels, polynomial kernel (22), and Gaussian RBF kernel (23)

$k (a, b) = (a^{T} b + 1)^{d}$ (22)

$k (a, b) = \exp (- \frac{‖ a - b ‖_{2}^{2}}{2 σ^{2}}) = \exp (- β ‖ a - b ‖_{2}^{2})$ (23)

where $d$ and $β$ are the parameters related to the characteristics of acoustic features and they are tuned using cross validations. We compare our MKSRC method with existing classification methods SVM^2,36 and SRC^10,37,38

In our experiments, there are 90 samples for each vehicle collected from 9 runs (3–11) of 10 sensor nodes (51–56 and 58–61), as well as 90 samples for the noise in the acquisition process. To validate the results of a classifier, we employ threefold cross validation with stratified partition of the samples. The classifier is trained three times and each time a different set is used as a validation set. Tables 1 and 2 present the detection, false alarm, and classification rates based on FFT and MFCC. Here, the detection rate is defined to be the ratio between the number of correct classification samples and the size of the class. The false alarm rate is defined to be the ratio between the number of incorrect classification samples and the total number of samples in other classes. Furthermore, to analyze the effect with different dictionaries, we list the classification result based on the dictionary updated by K-SVD in Table 3.

Table 1.

Detection, false alarm, and classification rates based on FFT.

Classification method		SVM	SRC	KSRC (RBF)	MKSRC
Detection rate	AAV (%)	73.33	78.89	70.00	74.44
	DW (%)	64.44	72.22	80.00	80.00
	Noise (%)	81.11	40.00	55.56	83.33
False alarm rate	AAV (%)	13.33	22.78	10.56	5.56
	DW (%)	11.67	24.44	28.33	15.56
	Noise (%)	15.56	7.22	8.33	10.00
Classification rate (%)		72.96	63.70	68.52	79.26

FFT: fast Fourier transform; SVM: support vector machine; SRC: sparse representation–based classification; KSRC: kernel sparse representation–based classification; RBF: radial basis function; MKSRC: multiple kernel sparse representation–based classification; AAV: Assault Amphibian Vehicle; DW: Dragon Wagon.

The significance of bold values is that the classification rates of this study method is significantly higher than other methods.

Table 2.

Detection, false alarm, and classification rates based on MFCC.

Classification method		SVM	SRC	KSRC (RBF)	MKSRC
Detection rate	AAV (%)	88.89	68.89	84.44	86.67
	DW (%)	81.11	81.11	81.11	86.67
	Noise (%)	96.67	91.11	97.78	95.56
False alarm rate	AAV (%)	7.22	6.11	6.11	4.44
	DW (%)	5.56	12.78	6.11	5.56
	Noise (%)	3.89	10.56	6.11	5.56
Classification rate (%)		88.89	80.37	87.78	89.63

MFCC: Mel-frequency cepstral coefficients; SVM: support vector machine; SRC: sparse representation–based classification; KSRC: kernel sparse representation–based classification; RBF: radial basis function; MKSRC: multiple kernel sparse representation–based classification; AAV: Assault Amphibian Vehicle; DW: Dragon Wagon.

The significance of bold values is that the classification rates of this study method is significantly higher than other methods.

Table 3.

Detection, false alarm, and classification rates based on MFCC with the dictionary updated by K-SVD.

Classification method		SVM	SRC	KSRC (RBF)	MKSRC
Detection rate	AAV (%)	80.00	73.33	80.00	93.33
	DW (%)	70.00	81.67	88.33	80.00
	Noise (%)	90.00	95.67	95.00	96.97
False alarm rate	AAV (%)	6.67	2.50	1.67	3.33
	DW (%)	15.00	10.00	6.67	3.33
	Noise (%)	8.33	12.50	10.00	8.33
Classification rate (%)		80.00	83.33	87.78	90.00

MFCC: Mel-frequency cepstral coefficients; K-SVD: K-singular value decomposition; SVM: support vector machine; SRC: sparse representation–based classification; KSRC: kernel sparse representation–based classification; RBF: radial basis function; MKSRC: multiple kernel sparse representation–based classification; AAV: Assault Amphibian Vehicle; DW: Dragon Wagon.

The significance of bold values is that the classification rates of this study method is significantly higher than other methods.

The kernel parameters, $d, β$ , are tuned by cross validations. In this work, we have obtained $d = 2, β = 0.0002$ . For feature extraction, the MFCC method has a distinct advantage over FFT since it considers human auditory characteristics. For the choice of dictionary, we find that our method using updated dictionary outperforms other classifiers based on sparse representation. From Table 3, we can see that the proposed MKSRC achieves a high classification rate of 90.00%, outperforming other methods while achieving lower false alarm rates.

To further illustrate the performance of MKSRC, we introduce the normalized correlation between the sparse codes of SRC, KSRC, and MKSRC. We list the results in two classes (each contains 30 samples) in Figure 7, where the x-axes and y-axes both stand for the number of samples used and the first 30 (1–30) and the rest (31–60) are from the same class, respectively. The entry $(p, q)$ is the normalized correlation of the sparse codes between the acoustic test samples $p$ and $q$ .

Figure 7.

Normalized correlation between the sparse codes of (a) SRC, (b) KSRC, and (c) MKSRC.

According to the definition of correlation, the normalized correlation of the sparse codes should be block-wise since the sparse codes belonging to the same class are more similar. In Figure 7, we find that MKSRC becomes more discriminative sparse codes since the correlation coefficients of the same class (the first 30 samples belong to AAV and the rest belong to DW) in MKSRC are generally higher than in SRC and KSRC, which facilitates the better classification performance.

Conclusion

In this work, we have studied the problem of vehicle recognition using acoustic sensor networks. We have developed a new method, called MKSRC, for vehicle recognition. Acoustic features of vehicles are extracted and mapped into a high-dimensional feature space using the kernel function, which combines multiple kernels to obtain linearly separable samples. To improve the recognition accuracy, we incorporate dictionary learning into the MKSRC framework. By calculating the reconstructing error and updating the kernel weights, the objective vehicles will be recognized by solving the optimization problem. Our extensive experimental results demonstrate that the proposed MKSRC method with learned dictionaries outperforms other existing methods based on SVM, SRC, and KSRC in the literature on vehicle recognition from complex acoustic sensor network datasets. In our future work, we will focus on the self-adaption of the kernel parameters to further improve the recognition efficiency and robustness.

Footnotes

Academic Editor: Pedro P Rodrigues

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research,authorship,and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research,authorship,and/or publication of this article: This work was supported by the National Natural Science Foundation of China (NSFC) under grant nos 61301027 and 61375015,and the Zhejiang Provincial Natural Science Foundation under grant no. LY14F030007.

References

Yick

Mukherjee

Ghosal

. Wireless sensor network survey. Comput Netw 2008; 52(12): 2292–2330.

Duarte

. Vehicle classification in distributed sensor networks. J Parallel Distr Com 2004; 64(7): 826–838.

D’Costa

Sayeed

. Collaborative signal processing for distributed classification in sensor networks. In: IPSN’03 proceedings of the 2nd international conference on information processing in sensor network, Palo Alto, CA, 22–23 April 2003, pp.193–208. Berlin: Springer.

Eom

. Analysis of acoustic signatures from moving vehicles using time-varying auto regressive models. Multidim Syst Sign P 1999; 10(4): 357–378.

Thomas

Wilkins

. The analysis of vehicle sounds for recognition. Pattern Recogn 1972; 4(4): 379–389.

Sokolov

Rogers

. Removing harmonic signal nonstationarity by dynamic resampling. In: Proceedings of the IEEE international symposium on industrial electronics, Athens, 10–14 July 1995, vol. 1, pp.303–308. New York: IEEE.

Choe

Karlsen

Gerhart

. Wavelet-based ground vehicle recognition using acoustic signals. Proc SPIE 1996; 2762: 434–445.

Wong

. Detection, classification and tracking of targets in distributed sensor networks. IEEE Signal Proc Mag 2002; 19(2): 17–29.

Ghasemzadeh

Jafari

. Physical movement monitoring using body sensor networks: a phonological approach to construct spatial decision trees. IEEE T Ind Inform 2011; 7(1): 66–77.

10.

Wright

Yang

Ganesh

. Robust face recognition via sparse representation. IEEE T Pattern Anal 2009; 31(2): 210–227.

11.

Mei

Ling

. Robust visual tracking and vehicle classification via sparse representation. IEEE T Pattern Anal 2011; 33(11): 2259–2272.

12.

Gao

Tsang

IWH

Chia

. Kernel sparse representation for image classification and face recognition. In ECCV’10 proceedings of the 11th European conference on computer vision: Part IV, Heraklion, 5–11 September 2010, vol. 6314, pp.1–14. Berlin: Springer.

13.

Huma

Craig

John

. Text classification using string kernels. J Mach Learn Res 2002; 2: 419–444.

14.

Schölkopf

Tsuda

Vert

. Support vector machine applications in computational biology. In: Schölkopf

Tsuda

Vert

J-P

(eds) Kernel methods in computational biology. Cambridge, MA: The MIT Press, 2004, pp.71–92.

15.

Rubinstein

Bruckstein

Elad

. Dictionaries for sparse representation modeling. Proc IEEE 2010; 98(6): 1045–1057.

16.

Tosic

Frossard

. Dictionary learning. IEEE Signal Proc Mag 2011; 28(2): 27–38.

17.

Engan

Aase

Husoy

. Method of optimal directions for frame design. In 1999 IEEE international conference on acoustic, speech, and signal processing, Phoenix, AZ, 15–19 March 1999, vol. 5, pp.2443–2446. New York: IEEE.

18.

Aharon

Elad

Bruckstein

. The K-SVD: an algorithm for designing of overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 2006; 54(11): 4311–4322.

19.

Chitra

VPJ

Sumalatha

. SVM-instance based approach to improve QoS parameters for time critical applications in WSN. In: 2012 fourth international conference on advanced computing (ICoAC), Chennai, India, 13–15 December 2012, pp.1–6. New York: IEEE.

20.

Matthias

Rainer

. Smartphone application for automatic classification of environmental sound. In: 2013 proceedings of the 20th international conference on mixed design of integrated circuits and systems (MIXDES), Gdynia, 20–22 June 2013, pp.512–515. New York: IEEE.

21.

Gönen

Alpaydın

. Multiple kernel learning algorithms. J Mach Learn Res 2011; 12: 2211–2268.

22.

Gray

Jr Markel

. Distance measures for speech processing. IEEE Trans Acoustics Speech Signal Process 1976; 24(5): 380–391.

23.

Hasan

Jamil

Rabbani

. Speaker identification using Mel frequency cepstral coefficients. In: 3rd international conference on electrical & computer engineering, Dhaka, Bangladesh, 28–30 December 2004, pp.565–568, https://pdfs.semanticscholar.org/32c4/db25607bd52a6d0aeb5498ec9c8d564e6d2e.pdf

24.

Candes

Romberg

Tao

. Robust Uncertainty Principles: exact Signal Reconstruction from Highly Incomplete Frequency Information. IEEE T Inform Theory 2004; 52(2): 489–509.

25.

Tropp

Gilbert

. Signal recovery from random measurements via orthogonal matching pursuit. IEEE T Inform Theory 2007; 53(12): 4655–4666.

26.

Burges

CJC

. A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc 1998; 2(2): 121–167.

27.

Zhang

Zhou

Chang

. Kernel sparse representation-based classifier. IEEE T Signal Proces 2012; 60(4): 1684–1695.

28.

Chapelle

Vapnik

Bousquet

. Choosing multiple parameters for support vector machines. Mach Learn 2002; 46(1–3): 131–159.

29.

Starck

Fadili

Murtagh

. The undecimated wavelet decomposition and its reconstruction. IEEE Trans Image Process 2007; 16(2): 297–309.

30.

Portilla

Strela

Wainwright

. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans Image Process 2003; 12(11): 1338–1351.

31.

Starck

Candes

Donoho

. The curvelet transform for image denoising. IEEE Trans Image Process 2002; 11(6): 670–684.

32.

Elad

. Sparse and redundant representations: from theory to applications in signal and image processing. Berlin, Germany: Springer, 2010.

33.

Mairal

Bach

Ponce

. Task-driven dictionary learning. IEEE Trans Pattern Anal Mach Intell 2012; 34(4): 791–804.

34.

Engan

Rao

Kreutz-Delgado

. Frame design using FOCUSS with method of optimized directions (MOD). In: Proceedings of the NORSIG’99, Oslo, September 1999, pp.65–69, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.719.8961&rep=rep1&type=pdf

35.

Golub

Van Loan

. Matrix computations. 3rd ed.Baltimore, MD: Johns Hopkins University Press, 1996.

36.

Zhou

Chen

. The research of vehicle classification using SVM and KNN in a ramp. In: International forum on computer science-technology and applications, USA, 25–27 December 2009, vol. 3, pp.25–27. New York: IEEE.

37.

Zheng

. Two-stage nonnegative sparse representation for large-scale face recognition. IEEE Trans Neural Netw Learn Syst 2013; 24(1): 35–46.

38.

Wang

Feng

. Vehicle recognition in acoustic sensor networks via sparse representation. In: 2014 IEEE international conference on multimedia and expo workshops (ICMEW), Chengdu, China, 14–18 July 2014, pp.1–4. New York: IEEE.

Vehicle recognition in acoustic sensor networks using multiple kernel sparse representation over learned dictionaries

Abstract

Keywords

Introduction

Framework of vehicle recognition

Sparse representation models

SRC

The kernel method

KSRC

MKSRC

Dictionary learning methods

Experimental results

Feature extraction

Dictionary learning

Classification methods

Conclusion

Footnotes

Declaration of conflicting interests

Funding

References