Abstract

In this article, we propose a novel multi-task hybrid dictionary learning approach for moving vehicle classification in multi-sensor networks, aiming to improve classification accuracy in complex scenes at low time complexity. The approach considers both the correlations and the complementary information among multiple heterogeneous sensors and learns a hybrid dictionary from the observations of each sensor. The hybrid dictionary consists of a synthesis dictionary and an analysis dictionary: the trained analysis dictionary generates discriminative codes, and the trained synthesis dictionary achieves class-specific discriminative reconstruction. Extensive experiments are conducted on real data sets captured by multiple heterogeneous sensors. The results demonstrate that the proposed method improves vehicle classification accuracy through multi-feature fusion and that, with the learned hybrid dictionary, the sparse coding matrix is obtained by a simple linear mapping. Moreover, the method avoids solving an $ℓ_{p}$-norm $(p ⩽ 1)$ sparse coding problem, which reduces its time complexity compared with the support vector machine, sparse representation classification, label consistent KSVD, Fisher discrimination dictionary learning, hybrid dictionary learning, multi-task sparse representation classification, and multi-task Fisher discrimination dictionary learning algorithms.

Keywords

Multi-task hybrid dictionary learning, sensor networks, vehicle classification, dictionary learning

Introduction

Multi-sensor fusion has attracted wide attention over the past few years in both civilian and military applications.^1–3 Classification is of particular interest in multi-sensor fusion, especially moving vehicle classification,^4,5 in which the essential problem is how to exploit relevant information from different tasks that record the same physical event in order to improve classification performance. KB Eom⁶ proposed deducing the features of vehicles from the sounds they generate, which makes the classification of moving vehicles from acoustic signals practicable in complex scenes. Later, Duarte and Hu⁷ detailed the procedures of data collection, feature extraction, and pre-processing, and classified the types of moving vehicles in distributed networks with a maximum likelihood classifier based on the multi-dimensional frequency spectrum features of the sensor signals.

Many classification approaches have been put forward to widen applicability to different situations and to enhance classification performance, such as support vector machines (SVM),^8,9 sparse representation classification (SRC),^10–12 kernel sparse representation classification (KSRC),^13,14 label consistent KSVD (LC-KSVD),^15,16 Fisher discrimination dictionary learning (FDDL),^17,18 and hybrid dictionary learning (HDL).¹⁹ Each of these methods has achieved state-of-the-art performance in its specific setting.

For sparse signal representation, the dictionary plays an important role. When a sparse representation model with an analytical dictionary is used to represent a signal, computing the representation coefficients reduces to a simple inner product operation. Gao et al.¹³ utilized kernel sparse representation classification (KSRC) to map the data to a high-dimensional space for classification, which avoids the limitation that the model must be linear, and it showed state-of-the-art performance. However, it is less effective at modeling the complex local structures of natural scenes. In recent years, the synthesis dictionary has been widely studied and applied in sparse representation. With a synthesis dictionary, a signal is decomposed and its representation coefficients are usually obtained via $ℓ_{p}$-norm $(p ⩽ 1)$ sparse coding, which is more time-consuming but can better model the local structures of complex images.¹⁹ Then, Yang et al.¹⁷ introduced an FDDL method based on the Fisher discrimination criterion, in which the dictionary atoms correspond to the class labels, so that not only can the representation residual be used to distinguish different classes, but the representation coefficients also have small within-class scatter and large between-class scatter. However, it incurs considerable time consumption in the training and testing phases because it solves an $ℓ_{p}$-norm $(p ⩽ 1)$ sparse coding problem. To improve the classification performance, Guo et al.¹⁹ proposed HDL for vehicle classification in acoustic sensor networks, in which discriminative codes are generated by the trained analysis dictionary and class-specific discriminative reconstruction is achieved by the trained synthesis dictionary. However, it considers only the classification problem of a single sensor, focusing on classifying the types of moving vehicles to improve classification accuracy with low time complexity in complex scenes.

However, the information shared among different sensors has not been considered. Therefore, Nguyen et al.²⁰ proposed a novel multi-task multivariate (MTMV) sparse representation method for multi-sensor classification, which exploits the fact that different sensors carry related information while recording the same physical event, and achieved excellent classification performance. However, the representation coefficients are still obtained by solving a sparse coding problem, so the time complexity in the training and testing phases remains high. Since then, various sophisticated techniques have been developed and applied to pattern classification problems such as acoustic signal classification,²¹ hyperspectral target detection,²² and visual classification.^23,24 For example, Zhang et al.²¹ put forward a joint sparse model for acoustic signal classification, which exploits the fact that a few columns of the training dictionary can simultaneously represent multiple observations from the same class. Therefore, the coefficient vectors associated with these observations share the same sparsity pattern. Similarly, for visual classification, Yuan and Yan²³ studied a multi-task model under the assumption that tasks belonging to the same class share the same sparse support on their coefficient vectors. In signal processing, the inherent information often lies in low-dimensional subspaces, and the semantic information is usually encoded in the sparse representation. With the emergence of these models, sparse representation and the related optimization problems have attracted increasing attention from researchers.

In this article, inspired by the advantages of HDL¹⁹ and multi-task dictionary learning, we propose a novel method, namely, multi-task hybrid dictionary learning (MT-HDL), which thoroughly considers the correlations and complementary information among multiple heterogeneous sensors. This technique imposes joint-sparsity constraints both within each task and across multiple tasks, effectively combining HDL and multi-task learning to achieve strong performance.

The contribution of our work is threefold. First, we consider the correlations as well as the complementary information among different sensors simultaneously to solve the multi-sensor classification problem. The experimental results show the benefit of exploiting collaborative heterogeneous sensors. Second, we use the multi-feature signals to learn a hybrid dictionary, in which discriminative codes are generated by the trained analysis dictionary G and class-specific discriminative reconstruction is achieved by the trained synthesis dictionary D. Third, we obtain the vehicle types by a decision fusion function, which avoids the high time complexity caused by solving the $ℓ_{p}$-norm $(p ⩽ 1)$ sparse coding problem.

The remainder of this article is organized as follows. Section “Related work” briefly introduces SRC and dictionary learning (DL) methods. Section “The framework of MT-HDL in sensor networks for vehicle classification” presents a framework of MT-HDL in sensor networks for vehicle classification. Section “MT-HDL” describes the single-task HDL and the MT-HDL algorithms in detail. Extensive experiments are reported in section “Experimental results,” and conclusions are drawn in section “Conclusion.”

Related work

SRC

Suppose the sparse representation of a test sample $y \in R^{m}$ can be achieved using the over-complete dictionary D, then the sparse vector $x$ can be obtained by solving the following optimization problem

$(ℓ_{0}): \quad {\hat{x}}_{0} = \arg\min_{x} ‖ x ‖_{0} \quad \text{s.t.} \quad y = D x$ (1)

Since this is an NP-hard combinatorial optimization problem, the $ℓ_{0}$-norm in equation (1) can be replaced by the $ℓ_{1}$-norm, as in equation (2), to obtain an approximate solution if the solution of equation (1) is sparse enough

$(ℓ_{1}): \quad {\hat{x}}_{1} = \arg\min_{x} ‖ x ‖_{1} \quad \text{s.t.} \quad y = D x$ (2)

In the noisy case, the problem becomes

$(ℓ_{0}^{s}): \quad \hat{x} = \arg\min_{x} ‖ x ‖_{0} \quad \text{s.t.} \quad y = D x + z$ (3)

where $z \in R^{m}$ is a noise term with bounded energy $‖ z ‖_{2} < ε$. Accordingly, we obtain the modified sparse representation model in equation (4)

$(ℓ_{1}^{s}): \quad \hat{x} = \arg\min_{x} ‖ x ‖_{1} \quad \text{s.t.} \quad ‖ y - D x ‖_{2} ⩽ ε$ (4)

Using only the coefficients of the $j$th class, we can approximately reconstruct the test sample $y$ as ${\hat{y}}_{j} = D δ_{j} (\hat{x})$ and calculate the residual between $y$ and ${\hat{y}}_{j}$ as in equation (5)

$r_{j} (y) = ‖ y - D δ_{j} (\hat{x}) ‖_{2}, \quad j \in \{1, 2, \dots, K\}$ (5)

Finally, we identify which category y belongs to by minimizing the residual error according to equation (6)

$\text{identity} (y) = \arg\min_{j} r_{j} (y)$ (6)
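The decision rule in equations (5) and (6) can be illustrated with a minimal sketch. It assumes that the sparse vector $\hat{x}$ has already been computed by some $ℓ_{1}$ solver (not shown), and the function name `src_classify`, the `atom_labels` list, and the toy dictionary are hypothetical names and values chosen for illustration only. Classes are indexed from 0 in the code.

```python
import math


def matvec(D, x):
    """Multiply an m-by-h matrix (list of rows) by a length-h vector."""
    return [sum(d * v for d, v in zip(row, x)) for row in D]


def src_classify(D, atom_labels, x_hat, y, num_classes):
    """Return (predicted class, residuals) following equations (5) and (6)."""
    residuals = []
    for j in range(num_classes):
        # delta_j keeps the coefficients of class j and zeroes the rest.
        x_j = [v if lab == j else 0.0 for v, lab in zip(x_hat, atom_labels)]
        y_j = matvec(D, x_j)
        residuals.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_j))))
    predicted = min(range(num_classes), key=residuals.__getitem__)
    return predicted, residuals


# Toy example (assumed values): two classes, two atoms each, samples in R^2.
D = [[1.0, 0.9, 0.0, 0.1],
     [0.0, 0.1, 1.0, 0.9]]
atom_labels = [0, 0, 1, 1]      # class of each dictionary column
y = [0.1, 1.0]                  # test sample
x_hat = [0.0, 0.1, 0.9, 0.1]    # assumed output of an l1 solver
label, residuals = src_classify(D, atom_labels, x_hat, y, num_classes=2)
```

In this toy case most of the coefficient energy sits on the atoms of the second class, so its class-specific reconstruction leaves the smaller residual.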

Multi-task sparse representation classification

We consider a multi-task (multi-sensor) $K$-class classification problem. For each sensor $i = 1, \dots, M$, we denote by $D^{i} = [D_{1}^{i}, D_{2}^{i}, \dots, D_{j}^{i}, \dots, D_{K}^{i}]$ an $m \times h$ dictionary consisting of K sub-dictionaries $D_{j}^{i}$ with respect to the K classes, in which $D_{j}^{i} = [D_{j, 1}^{i}, D_{j, 2}^{i}, \dots, D_{j, t}^{i}, \dots, D_{j, h_{j}}^{i}]$.

For the testing sample $Y = [Y^{1}, Y^{2}, \dots, Y^{i}, \dots, Y^{M}]$, suppose that $Y^{i}$, the observation of the $i$th sensor, belongs to the $j$th class; then it can be reconstructed by the over-complete dictionary $D^{i} = [D_{1}^{i}, D_{2}^{i}, \dots, D_{j}^{i}, \dots, D_{K}^{i}]$ as in equation (7)

$Y^{i} = D^{i} X^{i} + Z^{i}$ (7)

where $X^{i}$ is a sparse matrix associated with the over-complete dictionary $D^{i}$ , and $Z^{i}$ is a small noise matrix.

Define $X = [X^{1}, X^{2}, \dots, X^{i}, \dots, X^{M}]$; then X is a sparse matrix with only $q_{j}$ nonzero terms. The representation coefficient matrix X can be obtained by solving the following $ℓ_{p}$-norm $(p ⩽ 1)$ least squares problem in equation (8)

$\hat{X} = \arg\min_{X} \frac{1}{2} \sum_{i = 1}^{M} ‖ Y^{i} - D^{i} X^{i} ‖_{F}^{2} + λ ‖ X ‖_{1, p}$ (8)

where $λ$ is a constant and $p ⩽ 1$ to make the optimization convex.

Once $\hat{X}$ is obtained, we can reconstruct the original signal and classify the target type by the minimal residual. The classification decision function is given by equations (9) and (10)

$r_{j} (Y) = \sum_{i = 1}^{M} ‖ Y^{i} - D^{i} δ_{j}^{i} ({\hat{X}}^{i}) ‖_{2}, \quad i \in \{1, 2, \dots, M\}, \; j \in \{1, 2, \dots, K\}$ (9)

$\text{identity} (Y) = \arg\min_{j} r_{j} (Y)$ (10)

where $δ_{j}^{i}$ is a unit matrix corresponding to the $j$th class.

DL

KSVD is a generalization of the k-means clustering method via a singular value decomposition approach and is used as a powerful DL algorithm for sparse representations. It works by iteratively alternating between sparse coding of the input data based on the current dictionary and updating the atoms in the dictionary to better fit the data.

The unsupervised DL algorithm KSVD has achieved promising results in signal restoration, but it is not adequate for classification tasks because the learned dictionary only represents the training samples. The success of DL in signal restoration has sparked its application to classification tasks. Since the goal of classification is to assign the correct class label to the test sample, the main concern is the discrimination ability of the dictionary. There are two categories of discriminative DL methods for pattern classification.

In the first category, a shared dictionary^25,26 for all classes is learned by making the representation coefficients discriminative. However, a shared dictionary loses the correspondence between dictionary atoms and class labels, which makes it impossible to perform classification based on the class-specific representation residual. The second category learns a structured dictionary to promote discrimination between classes. In the algorithm of Wang et al.,¹⁸ the coding coefficients are made more discriminative by the Fisher discrimination criterion. However, the Fisher discrimination criterion is limited by its assumption on the data distribution and fails to take the local manifold structure of the coding coefficients into consideration.

The framework of MT-HDL in sensor networks for vehicle classification

To address the difficulties of long-term vehicle classification using sensor networks in complex scenes, we establish a vehicle classification framework based on MT-HDL, shown in Figure 1, with the main components described below. To describe this model, let us first consider a two-task classification problem with a testing sample $Y$ consisting of two tasks $Y^{1}$ and $Y^{2}$ collected from acoustic and seismic sensors, respectively.

Figure 1.

The framework of multi-task hybrid dictionary learning model for vehicle classification.

Pre-processing

The raw acoustic and seismic signals of vehicles are gathered from multiple heterogeneous sensor nodes in complex scenes using sensor networks. However, the signals are inevitably corrupted by noise and other uncertain conditions, so pre-processing is essential to pick out the useful events. Because the useful event series span only the short period of time when a vehicle is close to the sensor nodes, the constant false alarm rate (CFAR) algorithm is used to detect whether a vehicle is present, and the useful event series are then converted to frames.
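As an illustration of the detection step, the following is a minimal sketch of a cell-averaging CFAR detector applied to a sequence of frame energies. The text does not specify which CFAR variant or parameters are used, so the cell-averaging scheme, the function name `ca_cfar`, the window sizes, and the threshold scale are all assumptions made for illustration.

```python
def ca_cfar(energy, num_train=4, num_guard=1, scale=3.0):
    """Flag cells whose energy exceeds scale * (mean of nearby training cells).

    Training cells lie on both sides of the cell under test, separated from
    it by num_guard guard cells so the target itself does not bias the
    noise estimate.
    """
    n = len(energy)
    detections = [False] * n
    for k in range(n):
        train = []
        for offset in range(num_guard + 1, num_guard + num_train + 1):
            if k - offset >= 0:
                train.append(energy[k - offset])
            if k + offset < n:
                train.append(energy[k + offset])
        if not train:
            continue
        noise_level = sum(train) / len(train)
        detections[k] = energy[k] > scale * noise_level
    return detections


# Toy frame energies (assumed values) with one high-energy event at index 5.
energy = [1.0, 1.2, 0.9, 1.1, 1.0, 9.0, 1.0, 0.8, 1.1, 1.0]
hits = ca_cfar(energy)
```

Because the threshold adapts to the local noise estimate, the same scale factor can be used when the background level drifts over a long recording.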

Feature extraction

Acoustic and seismic signals often change quickly over time and appear non-stationary, so many feature extraction approaches work in the frequency domain, where short segments of the signals can be considered quasi-stationary and analyzed using the Fourier transform. Among them, the Mel Frequency Cepstral Coefficient (MFCC)^27,28 is widely used because of its robustness. In this article, MFCC is used to extract multi-dimensional frequency spectrum features of target vehicles.

Multi-task hybrid dictionary training

By exploiting both the correlations and the complementary information of the heterogeneous sensors, we construct a multi-task hybrid dictionary based on multi-feature signals, in which the synthesis dictionary and the analysis dictionary are trained jointly, so no time is spent on an $ℓ_{p}$-norm $(p ⩽ 1)$ sparsity regularizer. The multi-task hybrid dictionary is then output for the vehicle classification task.

Vehicle classification

Once a multi-task hybrid dictionary is trained using multi-feature signals, we use the analysis sub-dictionary $G_{k}^{m}$ to produce small coefficients for samples from classes other than k. Simultaneously, the samples of class k are reconstructed by the synthesis sub-dictionary $D_{k}^{m}$. Finally, more accurate classification results can be obtained by minimizing the decision fusion function.

MT-HDL

Single-task HDL

In discriminative DL models, the sparse representation of the signal A usually relies on the synthesis dictionary D. The representation coefficients X are therefore obtained by solving an $ℓ_{p}$-norm $(p ⩽ 1)$ sparse coding problem, which leads to high time consumption in the training and testing phases. To achieve both signal representation and discrimination, the discriminative synthesis dictionary D is extended to a hybrid dictionary $(G, D)$. In the hybrid dictionary $(G, D)$, the synthesis dictionary $D \in R^{m \times s}$ sparsely represents the signal A over X as $A = DX$, and the analysis dictionary $G \in R^{s \times m}$ yields the coding coefficient matrix X as $X = GA$, where s is the size of the dictionaries D and G. In this way, the representation of A becomes very efficient, and the HDL model is

$J (G, D) = \underset{G, D}{\arg\min} ‖ A - DGA ‖_{F}^{2} + r (D, G, A, Y)$ (11)

where Y is the class label matrix of the signal A. Since the sub-dictionary $G_{l}$ can project the samples from class $j$, $j \neq l$, to a nearly null space, the coefficient matrix GA is nearly block diagonal.

As for the synthesis dictionary D, the sub-dictionary $D_{j}$ is used to make the data matrix $A_{j}$ well reconstructed from its projective code matrix $G_{j} A_{j}$ . Then, the following HDL model can be obtained while minimizing the reconstruction residual of the hybrid dictionary

$J (G, D) = \underset{G, D}{\arg\min} \sum_{j = 1}^{K} \left( ‖ A_{j} - D_{j} G_{j} A_{j} ‖_{F}^{2} + λ ‖ G_{j} {\bar{A}}_{j} ‖_{F}^{2} \right) \quad \text{s.t.} \quad ‖ d_{l} ‖_{2}^{2} ⩽ 1$ (12)

where $d_{l}$ is the $l$th atom of the synthesis dictionary D, and ${\bar{A}}_{j}$ denotes the complementary data matrix of $A_{j}$ in the whole training set A.

As shown in equation (12), although the HDL model is not a sparse representation model, group sparsity is enforced on the code matrix $GA$, since $GA$ is nearly block diagonal. Because the objective function of equation (12) is not jointly convex in $(G, D)$ and is therefore difficult to solve directly, we introduce a variable matrix X, in which $X_{j} = G_{j} A_{j}$, and relax the problem so that it can be solved by alternating over X, G, and D.

With fixed hybrid dictionary $(G, D)$ , the objective function of HDL can be simplified to a standard least squares problem shown as equation (13), and then it can be handled by the closed-form solution as

$X^{*} = \arg\min_{X} \sum_{j = 1}^{K} \left( ‖ A_{j} - D_{j} X_{j} ‖_{F}^{2} + τ ‖ G_{j} A_{j} - X_{j} ‖_{F}^{2} \right)$ (13)

$X_{j}^{*} = (D_{j}^{T} D_{j} + τ I)^{- 1} (τ G_{j} A_{j} + D_{j}^{T} A_{j})$ (14)

With fixed synthesis dictionary D and fixed variable matrix X, the objective function of HDL can be simplified to equation (15) and then we can obtain the closed-form solutions as

$G^{*} = \arg\min_{G} \sum_{j = 1}^{K} \left( τ ‖ G_{j} A_{j} - X_{j} ‖_{F}^{2} + λ ‖ G_{j} {\bar{A}}_{j} ‖_{F}^{2} \right)$ (15)

$G_{j}^{*} = τ X_{j} A_{j}^{T} (τ A_{j} A_{j}^{T} + λ {\bar{A}}_{j} {\bar{A}}_{j}^{T} + γ I)^{- 1}$ (16)

where $γ$ is a scalar constant.

Moreover, with fixed analysis dictionary G and fixed variable matrix X, the objective function can be simplified to equation (17), and then we introduce the ADMM²⁹ algorithm to solve the problem

$D^{*} = \arg\min_{D} \sum_{j = 1}^{K} ‖ A_{j} - D_{j} X_{j} ‖_{F}^{2} \quad \text{s.t.} \quad ‖ d_{l} ‖_{2}^{2} ⩽ 1$ (17)

In the testing stage, suppose that a test sample y comes from class l. Then its projective coding vector under $G_{l}^{*}$ is likely to be significant, while its projective coding vectors under $G_{j}^{*}$, $j \neq l$, tend to be small. Therefore, the reconstruction residual ${‖ y - D_{l}^{*} G_{l}^{*} y ‖}_{2}^{2}$ is usually much smaller than the residual ${‖ y - D_{j}^{*} G_{j}^{*} y ‖}_{2}^{2}$, $j \neq l$.

MT-HDL

In the previous section, we considered only a single task, where the test sample is captured by a single sensor and contains only one vector representing a single observation. However, the test sample $Y$ may consist of multiple observations of the same physical event obtained by the same sensor: $Y = [Y^{1}, Y^{2}, \dots, Y^{i}, \dots, Y^{M}]$, in which $Y^{i} \in R^{m \times 1}$. More importantly, the same moving event is monitored by multiple heterogeneous sensors as the vehicle moves, so the classification and recognition of the moving vehicle can be carried out by integrating the observations of the multiple heterogeneous sensors. Using the correlated and complementary information of these sensors, their observations are treated as jointly sparse to improve the classification accuracy for the moving vehicle.

In view of the above, we build on the advantages of the HDL algorithm in SRC tasks and develop the MT-HDL algorithm. In this section, we consider a multi-task $K$-class classification problem and suppose that we have a training set of l samples, in which each sample has different feature modalities.

To avoid the time consumption caused by solving an $ℓ_{p}$-norm $(p ⩽ 1)$ sparse coding problem in the training and testing phases, we introduce a hybrid dictionary to achieve both signal representation and discrimination. Specifically, in the hybrid dictionary, the synthesis dictionary $D = [D^{1}, D^{2}, \dots, D^{i}, \dots, D^{M}]$, with $D^{i} \in R^{m \times s}$ and $A^{i} = D^{i} X^{i}$, sparsely represents the signal A over X, and the analysis dictionary $G = [G^{1}, G^{2}, \dots, G^{i}, \dots, G^{M}]$, with $G^{i} \in R^{s \times m}$ and $X^{i} = G^{i} A^{i}$, yields the coding coefficient matrix X. In this way, the representation of A becomes very efficient. The MT-HDL model is given in equation (18)

$\{G^{*}, D^{*}\} = \arg\min_{G, D} \sum_{i = 1}^{M} \left( ‖ A^{i} - D^{i} G^{i} A^{i} ‖_{F}^{2} + γ (D^{i}, G^{i}, A^{i}, Y) \right)$ (18)

where $γ (D^{i}, G^{i}, A^{i}, Y)$ is the discrimination promotion function, the synthesis dictionary $D^{i}$ is utilized to reconstruct $A^{i}$ and the analysis dictionary $G^{i}$ is used to analytically code $A^{i}$ .

Because the discrimination power of equation (18) depends on the discriminative fidelity term $γ (D^{i}, G^{i}, A^{i}, Y)$, we train a synthesis dictionary $D^{i} = [D_{1}^{i}, D_{2}^{i}, \dots, D_{j}^{i}, \dots, D_{K}^{i}]$ and an analysis dictionary $G^{i} = [G_{1}^{i}, G_{2}^{i}, \dots, G_{j}^{i}, \dots, G_{K}^{i}]$ to form a hybrid dictionary whose sub-dictionaries correspond to the different classes. In addition, if the signal satisfies certain irrelevant conditions, the sample can be represented by its corresponding dictionary. The sub-dictionary $G_{l}^{i}$ can then project the samples from class $j$, $j \neq l$, to a nearly null space as in equation (19), so the coefficient matrix $G^{i} A^{i}$ is nearly block diagonal

$G_{j}^{i} A_{l}^{i} \approx 0, \forall j \neq l$ (19)

For the synthesis dictionary $D^{i}$, we want the data matrix $A_{l}^{i}$ to be well reconstructed from its projective code matrix $G_{l}^{i} A_{l}^{i}$ over the sub-dictionary $D_{l}^{i}$, that is, we minimize the reconstruction error of the hybrid dictionary

$\min_{G, D} \sum_{i = 1}^{M} \sum_{j = 1}^{K} ‖ A_{j}^{i} - D_{j}^{i} G_{j}^{i} A_{j}^{i} ‖_{F}^{2}$ (20)

Combining equations (19) and (20), we obtain the MT-HDL model

$\{G^{*}, D^{*}\} = \arg\min_{G, D} \sum_{i = 1}^{M} \sum_{j = 1}^{K} \left( ‖ A_{j}^{i} - D_{j}^{i} G_{j}^{i} A_{j}^{i} ‖_{F}^{2} + λ ‖ G_{j}^{i} {\bar{A}}_{j}^{i} ‖_{F}^{2} \right) \quad \text{s.t.} \quad ‖ d_{l}^{i} ‖_{2}^{2} ⩽ 1$ (21)

where $d_{l}^{i}$ is the $l$th atom of the synthesis dictionary $D^{i}$, and ${\bar{A}}_{j}^{i}$ is the complementary data matrix of $A_{j}^{i}$ in the whole data set $A^{i}$ of the $i$th sensor feature.

As shown in equation (21), although the MT-HDL model is not a sparse representation model, group sparsity is enforced on the code matrix $G^{i} A^{i}$, since $G^{i} A^{i}$ is nearly block diagonal. Because the objective function of equation (21) is not jointly convex in $(G, D)$ and is therefore difficult to solve directly, we introduce a variable matrix $X = [X^{1}, X^{2}, \dots, X^{i}, \dots, X^{M}]$, in which $X_{j}^{i} = G_{j}^{i} A_{j}^{i}$, and relax the problem to

$\{G^{*}, D^{*}, X^{*}\} = \arg\min_{G, D, X} \sum_{i = 1}^{M} \sum_{j = 1}^{K} \left( ‖ A_{j}^{i} - D_{j}^{i} X_{j}^{i} ‖_{F}^{2} + τ ‖ G_{j}^{i} A_{j}^{i} - X_{j}^{i} ‖_{F}^{2} + λ ‖ G_{j}^{i} {\bar{A}}_{j}^{i} ‖_{F}^{2} \right) \quad \text{s.t.} \quad ‖ d_{l}^{i} ‖_{2}^{2} ⩽ 1$ (22)

where $τ$ is a scalar constant, and the objective function is convex in each of G, D, and X when the other two are fixed.

With fixed hybrid dictionary, the objective function of MT-HDL can be simplified to a standard least squares problem shown as equation (23), and then it can be handled by the closed-form solution as

$X^{*} = \arg\min_{X} \sum_{i = 1}^{M} \sum_{j = 1}^{K} \left( ‖ A_{j}^{i} - D_{j}^{i} X_{j}^{i} ‖_{F}^{2} + τ ‖ G_{j}^{i} A_{j}^{i} - X_{j}^{i} ‖_{F}^{2} \right)$ (23)

${(X_{j}^{i})}^{*} = ((D_{j}^{i})^{T} D_{j}^{i} + τ I)^{- 1} (τ G_{j}^{i} A_{j}^{i} + (D_{j}^{i})^{T} A_{j}^{i})$ (24)

With fixed synthesis dictionary D and fixed variable matrix X, the objective function of MT-HDL can be simplified to equation (25) and then we can obtain the closed-form solutions as

$G^{*} = \arg\min_{G} \sum_{i = 1}^{M} \sum_{j = 1}^{K} \left( τ ‖ G_{j}^{i} A_{j}^{i} - X_{j}^{i} ‖_{F}^{2} + λ ‖ G_{j}^{i} {\bar{A}}_{j}^{i} ‖_{F}^{2} \right)$ (25)

${(G_{j}^{i})}^{*} = τ X_{j}^{i} {(A_{j}^{i})}^{T} {(τ A_{j}^{i} {(A_{j}^{i})}^{T} + λ {\bar{A}}_{j}^{i} {({\bar{A}}_{j}^{i})}^{T} + ζ I)}^{- 1}$ (26)

where $ζ = 10 e^{- 4}$ is a small number.
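The two closed-form updates in equations (24) and (26) can be sketched for a single sensor $i$ and a single class $j$ as follows. This is an illustrative sketch rather than the implementation used in the experiments: the helper names, the toy matrices, and the values of $τ$, $λ$, and $ζ$ are assumptions, and plain Python lists with a small Gauss-Jordan solver are used only to keep the example free of dependencies.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(M, B):
    """Solve M Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[r]) + list(B[r]) for r in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def update_codes(D, G, A, tau):
    """Equation (24): X = (D^T D + tau I)^{-1} (tau G A + D^T A)."""
    Dt = transpose(D)
    lhs = matmul(Dt, D)
    for k in range(len(lhs)):
        lhs[k][k] += tau
    GA, DtA = matmul(G, A), matmul(Dt, A)
    rhs = [[tau * g + d for g, d in zip(g_row, d_row)]
           for g_row, d_row in zip(GA, DtA)]
    return solve(lhs, rhs)


def update_analysis(X, A, A_bar, tau, lam, zeta):
    """Equation (26): G = tau X A^T (tau A A^T + lam A_bar A_bar^T + zeta I)^{-1}."""
    AAt = matmul(A, transpose(A))
    BBt = matmul(A_bar, transpose(A_bar))
    m = len(AAt)
    M = [[tau * AAt[r][c] + lam * BBt[r][c] + (zeta if r == c else 0.0)
          for c in range(m)] for r in range(m)]
    # M is symmetric, so G = (M^{-1} (tau A X^T))^T avoids an explicit inverse.
    rhs = [[tau * v for v in row] for row in matmul(A, transpose(X))]
    return transpose(solve(M, rhs))


# Toy data for one sensor and one class (m = 3 features, s = 2 atoms).
A = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]        # samples of class j (m x n)
A_bar = [[0.0, 1.0], [2.0, 1.0], [1.0, 1.0]]    # samples of the other classes
D = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]        # synthesis sub-dictionary (m x s)
G = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]          # analysis sub-dictionary (s x m)
tau, lam, zeta = 0.05, 0.003, 1e-4              # assumed values for illustration

X = update_codes(D, G, A, tau)
G_new = update_analysis(X, A, A_bar, tau, lam, zeta)
```

Both updates reduce to solving a small linear system whose size depends only on the dictionary size or the feature dimension, which is why no iterative sparse coding step is needed.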

Moreover, with fixed analysis dictionary G and fixed variable matrix X, the objective function can be simplified to equation (27), and then we introduce the ADMM²⁹ algorithm to solve the problem

$D^{*} = \arg\min_{D} \sum_{i = 1}^{M} \sum_{j = 1}^{K} ‖ A_{j}^{i} - D_{j}^{i} X_{j}^{i} ‖_{F}^{2} \quad \text{s.t.} \quad ‖ d_{l}^{i} ‖_{2}^{2} ⩽ 1$ (27)

The learned hybrid dictionary is then used for the vehicle classification task: the trained analysis sub-dictionary $G_{j}^{i}$ generates small coefficients for samples from classes other than j and significant coding coefficients only for samples of class j, and the trained synthesis sub-dictionary $D_{j}^{i}$ reconstructs the samples of class j from the projective coefficients $(G_{j}^{i})^{*} A_{j}^{i}$.

In the testing stage, suppose that a test vehicle sample $Y = [Y^{1}, Y^{2}, \dots, Y^{i}, \dots, Y^{M}]$ comes from class l. Then its projective coding vector under $(G_{l}^{i})^{*}$ is likely to be significant, while its projective coding vectors under $(G_{j}^{i})^{*}$, $j \neq l$, tend to be small. Therefore, the reconstruction residual $\sum_{i = 1}^{M} ‖ Y^{i} - {(D_{l}^{i})}^{*} {(G_{l}^{i})}^{*} Y^{i} ‖_{2}$ is usually much smaller than the residual $\sum_{i = 1}^{M} ‖ Y^{i} - {(D_{j}^{i})}^{*} {(G_{j}^{i})}^{*} Y^{i} ‖_{2}$, $j \neq l$. Based on this idea, we use the class-specific reconstruction residual in equation (28) to identify the class label of $Y$, and the classifier associated with the MT-HDL model is

$r_{j} (Y) = \sum_{i = 1}^{M} ‖ Y^{i} - D_{j}^{i} G_{j}^{i} Y^{i} ‖_{2}, \quad j \in \{1, 2, \dots, K\}, \; i \in \{1, 2, \dots, M\}$ (28)

$\text{identity} (Y) = \arg\min_{j} r_{j} (Y)$ (29)
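The decision fusion in equations (28) and (29) only requires matrix-vector products, as the following minimal sketch shows. The function name `mt_hdl_classify`, the nested-list layout `D[i][j]` and `G[i][j]` (sensor $i$, class $j$), and the toy dictionaries are assumptions for illustration; classes are indexed from 0 in the code.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def mt_hdl_classify(D, G, Y):
    """Return (predicted class, residuals) following equations (28) and (29).

    D[i][j] is the m-by-s synthesis sub-dictionary and G[i][j] the s-by-m
    analysis sub-dictionary of sensor i and class j; Y[i] is the feature
    vector observed by sensor i.
    """
    num_classes = len(D[0])
    residuals = []
    for j in range(num_classes):
        r = 0.0
        for D_i, G_i, y_i in zip(D, G, Y):
            code = matvec(G_i[j], y_i)       # analysis coding: G y
            recon = matvec(D_i[j], code)     # synthesis reconstruction: D G y
            r += math.sqrt(sum((a - b) ** 2 for a, b in zip(y_i, recon)))
        residuals.append(r)
    predicted = min(range(num_classes), key=residuals.__getitem__)
    return predicted, residuals


# Toy hybrid dictionaries (assumed): class 0 lives on the first axis and
# class 1 on the second axis, for both of the two sensors.
D_class0, G_class0 = [[1.0], [0.0]], [[1.0, 0.0]]
D_class1, G_class1 = [[0.0], [1.0]], [[0.0, 1.0]]
D = [[D_class0, D_class1], [D_class0, D_class1]]
G = [[G_class0, G_class1], [G_class0, G_class1]]
Y = [[0.9, 0.1], [1.0, 0.2]]    # acoustic and seismic observations (toy)
label, residuals = mt_hdl_classify(D, G, Y)
```

Summing the per-sensor residuals before taking the minimum lets a sensor with a clear decision compensate for one whose residuals are nearly tied.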

The MT-HDL algorithm is given in Table 1.

Table 1.

The details of multi-task hybrid dictionary learning algorithm.

1. Input the training samples $A^{i} = [A_{1}^{i}, A_{2}^{i}, \dots, A_{j}^{i}, \dots, A_{K}^{i}]$, $i \in \{1, 2, \dots, M\}$, and the testing sample $Y = [Y^{1}, Y^{2}, \dots, Y^{i}, \dots, Y^{M}]$.

2. Initialize the hybrid dictionary $(G^{(0)}, D^{(0)})$ as random matrices with unit Frobenius norm.

3. Fix the hybrid dictionary $(G, D)$, then solve the optimization problem of the objective function in equation (23) by equation (24).

4. Fix the synthesis dictionary D and the variable matrix X, then solve the optimization problem of the objective function in equation (25) by equation (26).

5. Fix the analysis dictionary G and the variable matrix X, then solve the optimization problem of the objective function in equation (27) using the ADMM²⁹ algorithm.

6. Return to step 3 until convergence or the maximal iteration number is reached.

7. Compute the residuals via the learned $(G, D)$: $r_{j} (Y) = \sum_{i = 1}^{M} ‖ Y^{i} - D_{j}^{i} G_{j}^{i} Y^{i} ‖_{2}$.

8. Output the identity of $Y$: $\text{identity} (Y) = \arg\min_{j} r_{j} (Y)$.

Experimental results

In this section, extensive experiments on a real multi-sensor data set are performed, and the results are compared with those of several traditional classification methods to demonstrate the effectiveness of the proposed approach. Here, we first consider a two-task classification problem with a testing sample collected from acoustic and seismic sensors, respectively.

Experimental setup

Data sets

All experiments in this article are run on a desktop PC with an Intel(R) Core(TM) i5-2467M 1.60 GHz CPU and 4 GB of memory. The sensor data sets were collected by the Defense Advanced Research Projects Agency (DARPA) in the DARPA/IXOs SensIT program through a truly distributed wireless sensor network. In the experiment, two types of military vehicles, the Assault Amphibian Vehicle (AAV) and the Dragon Wagon (DW), were observed by multiple heterogeneous sensor nodes distributed around three pre-set running routes, as shown in Figure 2. Three types of features were obtained, namely acoustic, seismic, and infrared information; the AAV repeated the movement 9 times and the DW 12 times. In this article, we select the acoustic and seismic data sets as the major features for the vehicle classification task. The sensor field consists of an east-west road, a south-north road, and an intersection area, and the data set is available at http://www.ecs.umass.edu/mduarte/Software.html

Figure 2.

Sensor field layout.

Feature extraction

The acoustic and seismic sensor data are recorded by microphones equipped on multiple heterogeneous sensor nodes at a rate of 4960 Hz, and the signals are inevitably disturbed by noise and other uncertain conditions during the experiment. To reduce the accidental error, we classify the sensor data using a larger number of test data and report the average over multiple tests.

The acoustic and seismic sensor data collected by nodes 41 to 60 during the third to eleventh runs of the two kinds of military vehicles, denoted AAV3_41 to AAV11_60 and DW3_41 to DW11_60, are selected, and samples are shown in Figure 3(a)–(d). This gives 450 sets of sensor data, which serve as the data source for evaluating the feature extraction and classification tasks. To extract useful events from the raw time series data, the CFAR detection algorithm is used to mark times with high energy values.

Figure 3.

Sample time series and features extracted by MFCC: (a) acoustic time series (AAV3_51), (b) seismic time series (AAV3_51), (c) acoustic time series (DW3_51), (d) seismic time series (DW3_51), (e) MFCC features of acoustic time series (AAV3_51), (f) MFCC features of seismic time series (AAV3_51), (g) MFCC features of acoustic time series (DW3_51), and (h) MFCC features of seismic time series (DW3_51).

A large number of feature extraction methods have been proposed in the frequency domain, since acoustic signals in the time domain change rapidly and tend to be unstable. Among them, Mel frequency cepstral coefficients (MFCC) are widely used because of their robustness to noise and because they account for the variation of the critical bandwidths of the human ear with frequency. The major procedures of MFCC are: (1) fast Fourier transform, which transforms the signal from the time domain to the frequency domain; (2) Mel filtering, in which a bank of triangular filters that mimics the frequency response of the human ear is applied to obtain the Mel spectral coefficients; (3) taking the logarithm of the Mel spectral coefficients, which compresses the dynamic range of the spectrum and simultaneously removes multiplicative noise; and (4) discrete cosine transform, which transforms the logarithmic Mel spectrum back to the time domain and yields the Mel frequency cepstral coefficients, which are the features needed. The multi-dimensional frequency spectrum features shown in Figure 3(e)–(h) are extracted from the event time series using the MFCC^27,28 algorithm.
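The four MFCC steps above can be sketched for a single frame as follows. This is a minimal illustration and not the implementation used in the experiments: the function names, the number of filters and coefficients, and the Hamming window are assumptions, and only the 4960 Hz sampling rate comes from the text.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc_frame(frame, sample_rate=4960, n_filters=20, n_coeffs=12):
    """MFCC of one frame; n_filters and n_coeffs are illustrative assumptions."""
    frame = np.asarray(frame, dtype=float)
    n_fft = len(frame)
    # (1) FFT: time domain to frequency domain (the Hamming window is an added assumption).
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2 / n_fft
    # (2) Mel filtering: triangular filter bank applied to the power spectrum.
    mel_energy = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    # (3) Logarithm: compresses the dynamic range (small offset avoids log(0)).
    log_mel = np.log(mel_energy + 1e-10)
    # (4) DCT-II of the log Mel spectrum gives the cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), n + 0.5) / n_filters)
    return basis @ log_mel
```

Applying `mfcc_frame` to consecutive frames of an event and stacking the results would give a multi-dimensional feature vector of the kind used for classification; how the frames are concatenated to reach the feature dimension reported in the article is not specified here.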

Vehicle classification

After feature extraction by MFCC, the multi-dimensional frequency spectrum features of the vehicles are fed to the proposed classification method, with the aim of improving classification accuracy and reducing time complexity in vehicle classification tasks. We select 75, 90, 105, 120, 135, and 150 sets of sensor data as the training data and 300 sets of sensor data as the testing data, including acoustic and seismic signals, to classify the target vehicles. To speed up the MT-HDL model without reducing classification performance, the maximal iteration number is set to 25 and the dictionary size is set to 30.

At the same time, several other classification methods, namely the SVM, SRC, LC-KSVD, FDDL, HDL, MT-SRC, and MT-FDDL algorithms, are used as baselines for the proposed method, and all of them utilize the acoustic signal to classify the types of the moving vehicles. Among them, the SVM algorithm is derived from Huang et al.,⁸ where the optimization problem is solved by the LIBSVM software package. The SRC algorithm is obtained from Mei and Ling,¹¹ in which the sparse level is set to 0.7. The LC-KSVD algorithm is proposed in the paper by Jiang et al.,¹⁵ where the maximal iteration number is set to 25 and the sparsity threshold is set to 8. The FDDL algorithm is presented in the study by Yang et al.,¹⁷ in which the dictionary is initialized by PCA and the maximal iteration number is set to 80. Finally, the single-task HDL algorithm is described in section “MT-HDL,” in which the maximal iteration number is set to 25 and the dictionary size is set to 30.

Classification accuracy

To obtain more reliable vehicle classification results, we report classification rates obtained by running the classification procedure 50 times. Our experiments are divided into single-task and multi-task classification experiments.

1. Single-task classification analysis

Figure 4 illustrates the vehicle classification rates of the different classification algorithms under the acoustic or seismic signals of moving vehicles. The classification rates under the acoustic signals are significantly higher than those under the seismic signals. Under both signal types, the classification rate of the FDDL algorithm is improved compared with the SVM, SRC, and LC-KSVD algorithms, for the reason that the size of the over-complete dictionary in SRC is much larger than that of the Fisher discrimination dictionary in FDDL. Moreover, under the acoustic signals the HDL method achieves higher classification rates than the FDDL algorithm, whereas under the seismic signals it is slightly lower than the FDDL algorithm. Therefore, we conclude that the HDL algorithm is more suitable for vehicle classification tasks under acoustic signals, where its classification rates are superior to those of the SVM, SRC, LC-KSVD, and FDDL methods.

Figure 4.

The trends of classification rates across various classification methods under single signals.

Figure 4 shows the general trend of the classification accuracy of the various classification algorithms in vehicle classification. The detailed detection rates, false alarm rates, and classification rates of each algorithm are listed in Table 2.

Table 2.

The classification rates across various classification methods under single signals (%).

Classification methods	Number of training samples	Detection rate (AAV)	Detection rate (DW)	Detection rate (Noise)	False alarm rate (AAV)	False alarm rate (DW)	False alarm rate (Noise)	Classification rate
SVM (Acoustic)	75	78.1	76.7	96.8	14.6	15.5	0.8	79.9
	90	80.2	79.3	96.4	13.2	13.8	0.9	81.9
	105	80.7	81.5	99.3	12.9	12.3	0.2	83.5
	120	82.9	82.9	97.1	11.4	11.4	0.7	84.8
	135	84.2	83.1	97.0	10.6	11.3	0.8	85.4
	150	83.7	83.5	96.8	10.9	11.0	0.8	85.3
SRC (Acoustic)	75	78.2	79.3	98.1	17.3	16.7	0.5	81.9
	90	77.7	79.4	98.0	17.5	16.4	0.5	81.7
	105	80.3	78.9	97.5	15.7	16.5	0.6	82.6
	120	79.7	78.3	95.5	16.2	17.1	1.1	81.8
	135	80.0	78.7	96.0	15.6	16.9	1.0	82.2
	150	77.5	77.1	96.0	17.4	17.7	1.0	80.4
LC-KSVD (Acoustic)	75	78.5	80.9	98.9	15.6	15.1	1.0	82.2
	90	79.5	79.6	97.6	15.3	15.1	0.9	82.0
	105	80.5	80.3	98.4	15.8	15.1	0.5	82.8
	120	80.3	82.0	98.9	14.5	14.0	0.5	83.5
	135	81.3	81.5	98.3	14.0	13.2	0.4	83.7
	150	83.2	81.2	98.4	13.9	13.3	0.3	84.4
FDDL (Acoustic)	75	79.9	80.9	99.6	13.4	12.7	0.1	82.9
	90	81.9	81.2	100	12.0	12.6	0	84.0
	105	82.7	83.4	100	11.5	11.1	0	85.3
	120	84.0	83.5	99.4	10.7	11.0	0.2	85.8
	135	84.2	83.3	100	10.6	11.1	0	85.9
	150	83.9	83.8	99.8	10.8	11.8	0.1	85.9
HDL (Acoustic)	75	83.0	82.8	99.4	11.3	11.5	0.2	85.0
	90	83.9	83.4	99.0	10.7	11.1	0.3	85.7
	105	86.0	84.2	98.9	9.4	10.1	0.3	86.9
	120	84.7	84.5	98.6	10.2	10.3	0.3	86.4
	135	86.1	85.8	98.0	9.3	9.5	0.5	87.5
	150	86.1	86.8	97.4	9.3	8.8	0.7	87.9
SVM (Seismic)	75	68.4	69.4	100	21.1	20.4	0	73.0
	90	68.5	68.9	100	21.0	20.7	0	72.8
	105	69.2	69.1	100	20.5	20.6	0	73.2
	120	69.9	70.0	100	20.0	20.0	0	73.9
	135	68.4	70.9	100	21.1	19.4	0	73.6
	150	70.7	67.8	100	19.5	21.5	0	73.3
SRC (Seismic)	75	63.8	60.7	100	24.1	26.2	0	67.2
	90	62.9	61.5	100	24.8	25.6	0	67.2
	105	59.2	61.8	100	27.2	25.4	0	65.8
	120	58.6	60.0	100	27.6	26.7	0	64.7
	135	58.6	59.9	100	27.6	26.7	0	64.7
	150	58.2	58.4	100	27.9	27.7	0	63.8
LC-KSVD (Seismic)	75	64.6	63.4	100	9.7	10.8	0	68.9
	90	65.2	63.7	99.9	9.1	10.1	0	69.2
	105	66.0	65.3	99.7	9.3	9.1	0	70.2
	120	66.2	66.6	99.9	9.6	8.9	0	70.9
	135	65.9	65.6	100	8.5	8.8	0	70.3
	150	65.8	65.9	99.9	8.1	8.8	0	70.4
FDDL (Seismic)	75	68.7	69.0	100	20.9	20.7	0	72.9
	90	69.2	70.2	100	20.5	19.8	0	73.7
	105	70.2	69.6	100	19.9	20.3	0	73.9
	120	70.8	71.2	100	19.5	19.2	0	74.8
	135	70.0	70.7	100	20.0	19.5	0	74.3
	150	70.2	71.2	100	19.9	19.2	0	74.6
HDL (Seismic)	75	66.4	68.2	100	22.4	21.3	0	71.6
	90	65.3	65.9	100	23.1	22.7	0	70.2
	105	65.3	67.4	100	23.2	21.7	0	70.8
	120	64.4	65.5	100	23.7	22.9	0	69.6
	135	66.1	65.3	100	22.5	23.2	0	70.2
	150	65.2	64.2	100	23.3	23.9	0	69.3

From the detection rates of noise in Table 2, we see that the FDDL algorithm recognizes the background noise of the environment well in both acoustic and seismic sensor networks. Table 2 also shows that the classification rate of the HDL algorithm based on acoustic signals (up to 87.9%) is much higher than that of the HDL algorithm based on seismic signals (at most 71.6%). The former increases gradually with the number of training samples, while the latter remains essentially stable or shows a slight downward trend. Overall, the HDL method achieves the highest single-task classification rates under acoustic signals.
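For readers who want to reproduce figures of this kind, the sketch below shows one common way to derive per-class detection rates, per-class false alarm rates, and an overall classification rate from a confusion matrix. The definitions are assumptions: the article does not spell out its exact formula for the false alarm rate, and the function name `rates_from_confusion` is hypothetical.

```python
import numpy as np

def rates_from_confusion(conf):
    """conf[i, j] counts samples of true class i that were predicted as class j.

    Assumed definitions (not taken from the article):
      detection rate of class c   = correct predictions of c / samples of class c
      false alarm rate of class c = other-class samples predicted as c / other-class samples
      overall rate                = all correct predictions / all samples
    """
    conf = np.asarray(conf, dtype=float)
    true_totals = conf.sum(axis=1)
    correct = np.diag(conf)
    detection = correct / true_totals
    false_positives = conf.sum(axis=0) - correct
    false_alarm = false_positives / (conf.sum() - true_totals)
    overall = correct.sum() / conf.sum()
    return detection, false_alarm, overall
```

With three classes ordered as AAV, DW, and noise, the three returned quantities correspond to the detection rate columns, the false alarm rate columns, and the final rate column of the table, under the assumed definitions.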

2. Multi-task classification analysis

As shown in Figure 5 and Table 3, the MT-SRC algorithm, which combines the features of both acoustic and seismic signals, achieves much higher classification rates (up to 88.0%) than the SRC algorithm under single acoustic or seismic signals. In addition, the MT-SRC algorithm also outperforms the single-task FDDL and HDL algorithms, although its margin over the acoustic HDL algorithm (87.9%) is small. Therefore, it is significant to study target classification and recognition based on multiple sensors.

Figure 5.

The trends of classification rates across various classification methods.

Table 3.

The classification rates across various classification methods (%).

Classification methods	Number of training samples	Detection rate (AAV)	Detection rate (DW)	Detection rate (Noise)	False alarm rate (AAV)	False alarm rate (DW)	False alarm rate (Noise)	Classification rate
SVM (Acoustic)	75	78.1	76.7	96.8	14.6	15.5	0.8	79.9
	90	80.2	79.3	96.4	13.2	13.8	0.9	81.9
	105	80.7	81.5	99.3	12.9	12.3	0.2	83.5
	120	82.9	82.9	97.1	11.4	11.4	0.7	84.8
	135	84.2	83.1	97.0	10.6	11.3	0.8	85.4
	150	83.7	83.5	96.8	10.9	11.0	0.8	85.3
SRC (Acoustic)	75	78.2	79.3	98.1	17.3	16.7	0.5	81.9
	90	77.7	79.4	98.0	17.5	16.4	0.5	81.7
	105	80.3	78.9	97.5	15.7	16.5	0.6	82.6
	120	79.7	78.3	95.5	16.2	17.1	1.1	81.8
	135	80.0	78.7	96.0	15.6	16.9	1.0	82.2
	150	77.5	77.1	96.0	17.4	17.7	1.0	80.4
LC-KSVD (Acoustic)	75	78.5	80.9	98.9	15.6	15.1	1.0	82.2
	90	79.5	79.6	97.6	15.3	15.1	0.9	82.0
	105	80.5	80.3	98.4	15.8	15.1	0.5	82.8
	120	80.3	82.0	98.9	14.5	14.0	0.5	83.5
	135	81.3	81.5	98.3	14.0	13.2	0.4	83.7
	150	83.2	81.2	98.4	13.9	13.3	0.3	84.4
FDDL (Acoustic)	75	79.9	80.9	99.6	13.4	12.7	0.1	82.9
	90	81.9	81.2	100	12.0	12.6	0	84.0
	105	82.7	83.4	100	11.5	11.1	0	85.3
	120	84.0	83.5	99.4	10.7	11.0	0.2	85.8
	135	84.2	83.3	100	10.6	11.1	0	85.9
	150	83.9	83.8	99.8	10.8	11.8	0.1	85.9
HDL (Acoustic)	75	83.0	82.8	99.4	11.3	11.5	0.2	85.0
	90	83.9	83.4	99.0	10.7	11.1	0.3	85.7
	105	86.0	84.2	98.9	9.4	10.1	0.3	86.9
	120	84.7	84.5	98.6	10.2	10.3	0.3	86.4
	135	86.1	85.8	98.0	9.3	9.5	0.5	87.5
	150	86.1	86.8	97.4	9.3	8.8	0.7	87.9
MT-SRC	75	84.5	85.2	100	10.3	9.9	0	86.9
	90	85.0	84.7	100	9.9	10.2	0	86.8
	105	85.8	85.5	100	9.5	9.7	0	87.5
	120	85.9	85.9	100	9.4	9.4	0	87.7
	135	85.9	85.9	100	9.4	9.4	0	87.7
	150	85.7	86.9	100	9.6	8.8	0	88.0
MT-FDDL	75	84.4	85.3	100	10.4	9.8	0	86.8
	90	85.7	86.8	100	9.5	8.8	0	88.1
	105	87.2	87.2	100	8.5	8.5	0	88.9
	120	87.9	87.0	100	8.1	8.6	0	89.1
	135	89.3	88.5	100	7.1	7.6	0	90.4
	150	88.8	89.0	100	7.5	7.3	0	90.4
MT-HDL	75	82.8	83.2	100	11.5	11.2	0	85.2
	90	85.3	86.1	100	9.3	9.3	0	87.6
	105	86.0	86.0	100	9.3	9.3	0	87.9
	120	86.0	85.9	100	9.3	9.4	0	87.8
	135	86.3	86.1	100	9.1	9.3	0	88.0
	150	87.5	87.0	100	8.4	8.6	0	88.9

AAV: Assault Amphibian Vehicle; DW: Dragon Wagon; SVM: support vector machine; SRC: sparse representation classification; LC-KSVD: label consistent KSVD; FDDL: Fisher discrimination dictionary learning; HDL: hybrid dictionary learning; MT-SRC: multi-task sparse representation classification; MT-FDDL: multi-task Fisher discrimination dictionary learning; MT-HDL: multi-task hybrid dictionary learning.

It can also be seen that, compared with the SRC algorithm, the MT-SRC method makes full use of the noise recognition advantage of the seismic signal and greatly improves the classification rates of both AAV and DW while keeping the noise recognition rate stable at 100%. The classification rates of the MT-FDDL algorithm based on the combination of acoustic and seismic signals (about 90%) are much better than those of the FDDL method under the acoustic signals (about 85%). This shows that the multi-task feature fusion method exploits the advantages of each signal feature, so that its classification rates exceed those obtained from any single sensor feature. Figure 5 also shows that the MT-FDDL algorithm classifies vehicles better than the MT-SRC algorithm (about 87%). From Tables 2 and 3, we conclude that the MT-FDDL method preserves the noise recognition advantage of seismic signals in moving vehicle classification tasks (about 100%). In addition, as the number of training samples increases, the classification rate of the MT-FDDL method rises and stabilizes at about 90%. In summary, the MT-FDDL algorithm greatly improves on the classification accuracy of single-sensor classification algorithms.

The classification rates of the MT-HDL algorithm (up to 88.9%), which combines acoustic and seismic signals, are higher than those of the HDL algorithm, and they tend to improve as the number of training samples increases. In addition, the classification rates of the MT-HDL algorithm are higher than those of the MT-SRC algorithm (88.0%) but lower than those of the MT-FDDL algorithm (90.4%). In other words, dictionary learning (DL) plays a decisive role in sparse representation classification.

From the detection rates and the false alarm rates in Table 3, we see that the MT-HDL algorithm performs competitively in the vehicle classification task compared with the other classification algorithms. Furthermore, as the number of training samples increases, its performance improves and it achieves higher classification rates. Like the other multi-feature fusion methods, the MT-HDL algorithm inherits the advantage of seismic signals in noise recognition (the noise recognition rate reaches 100%), and the classification rates of both AAV and DW are basically stable at around 86.0%. Therefore, we conclude that the MT-HDL algorithm achieves high and stable classification performance in vehicle classification and is suitable for target classification and recognition tasks in complex cases.

Time complexity

Time complexity is also an important criterion for evaluating a classification model; thus, to further demonstrate the efficiency of the MT-HDL method, we analyze the time complexity of the algorithm. In the training phase, the time complexities of updating $X_{j}^{i}$ , $G_{j}^{i}$ , and $D_{j}^{i}$ are $O (smh + s^{3} + s^{2} h)$ , $O (shm + m^{3} + s m^{2})$ , and $O (W (msh + s^{3} + s^{2} m + m^{2} s))$ , respectively, where s denotes the size of the dictionaries D and G, and W is the iteration number for updating D. In our proposed scheme, the number of training samples is set to 75, 90, 105, 120, 135, and 150, which is much smaller than the dimension of the multi-dimensional frequency spectrum features $(m = 636)$ . Therefore, in the training phase, the major computational cost is concentrated in updating $G_{j}^{i}$ , which involves the inverse of an $m \times m$ matrix $τ A_{j}^{i} (A_{j}^{i})^{T} + λ {\bar{A}}_{j}^{i} ({\bar{A}}_{j}^{i})^{T} + ζ I$ . Fortunately, this matrix does not change during the iterations, so its inverse can be pre-computed.
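The benefit of pre-computing the inverse can be sketched as follows. The $m \times m$ matrix is taken from the text, but the closed-form update $G = τ X A^{T} M^{-1}$ used below is an assumption made for illustration, as are the function name `analysis_updates`, the default parameter values, and the convention that the codes of successive iterations are passed in as a list.

```python
import numpy as np

def analysis_updates(code_iterates, A, A_bar, tau=1.0, lam=1.0, zeta=1e-3):
    """Update the analysis dictionary G for each iterate of the coding matrix X.

    A     : m x h matrix appearing in tau * A A^T
    A_bar : m x h' matrix appearing in lam * A_bar A_bar^T
    The update rule G = tau * X A^T M^{-1} is an illustrative assumption.
    """
    m = A.shape[0]
    # This m x m matrix does not depend on the codes, so it is inverted once,
    # at O(m^3) cost, outside the iteration loop.
    M = tau * A @ A.T + lam * A_bar @ A_bar.T + zeta * np.eye(m)
    M_inv = np.linalg.inv(M)

    # Each iteration now needs only matrix products with the cached inverse.
    return [tau * X @ A.T @ M_inv for X in code_iterates]
```

Caching the inverse removes the $m^{3}$ term from every iteration after the first, which matters here because $m = 636$ is much larger than both the dictionary size and the number of training samples.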

In the testing phase, thanks to the low complexity of computing the class-specific reconstruction error $\sum_{i = 1}^{M} \| Y^{i} - D_{j}^{i} G_{j}^{i} A_{j}^{i} \|_{F}^{2}$ , our classification scheme is much faster. The running times of the various classification methods are given in Table 4 to further confirm the advantage of the proposed method in time consumption.
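A minimal sketch of classification by class-specific reconstruction error is given below. It assumes that the code of a test sample is obtained by applying the analysis dictionary directly to the sample, so that no sparse coding problem is solved at test time; the function name `classify` and the nested-list layout of the dictionaries are hypothetical.

```python
import numpy as np

def classify(samples, D, G):
    """Return the class with the smallest summed reconstruction error.

    samples[i] : feature vector of task (sensor) i
    D[j][i]    : synthesis dictionary of class j for task i (m x s)
    G[j][i]    : analysis dictionary of class j for task i (s x m)
    """
    errors = []
    for D_j, G_j in zip(D, G):
        err = 0.0
        for y, D_ji, G_ji in zip(samples, D_j, G_j):
            code = G_ji @ y                        # linear mapping, no iterative solver
            err += np.sum((y - D_ji @ code) ** 2)  # squared reconstruction error
        errors.append(err)
    return int(np.argmin(errors)), errors
```

Each class costs only two matrix-vector products per task, which is why the testing phase is fast compared with methods that solve a sparse coding problem for every test sample.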

Table 4.

The running time of various classification methods for different sizes of training samples (s).

Classification methods	75	90	105	120	135	150
SVM (Acoustic)	0.13	0.15	0.17	0.19	0.21	0.23
SRC (Acoustic)	22.97	35.17	50.93	70.75	94.78	124.20
LC-KSVD (Acoustic)	3.62	4.42	5.42	6.11	7.23	7.94
FDDL (Acoustic)	17.83	23.10	29.68	36.11	43.45	52.14
HDL (Acoustic)	0.19	0.19	0.31	0.34	0.36	0.36
SVM (Seismic)	0.03	0.03	0.04	0.04	0.05	0.06
SRC (Seismic)	25.45	39.18	56.94	79.42	107.50	140.36
LC-KSVD (Seismic)	4.43	4.98	5.34	6.44	7.05	8.56
FDDL (Seismic)	20.22	26.13	33.31	38.87	49.29	58.16
HDL (Seismic)	0.19	0.19	0.35	0.35	0.34	0.32
MT-SRC	31.21	47.70	73.37	105.39	144.40	190.67
MT-FDDL	17.84	23.47	40.24	50.68	59.94	70.87
MT-HDL	0.68	0.78	0.53	0.99	0.93	0.91

SVM: support vector machine; SRC: sparse representation classification; LC-KSVD: label consistent KSVD; FDDL: Fisher discrimination dictionary learning; HDL: hybrid dictionary learning; MT-SRC: multi-task sparse representation classification; MT-FDDL: multi-task Fisher discrimination dictionary learning; MT-HDL: multi-task hybrid dictionary learning.

The SRC algorithm needs to solve the $ℓ_{p}$ -norm $(p ⩽ 1)$ sparse coding problem, so its time consumption in the training and testing phases is huge compared with the SVM and LC-KSVD algorithms. Although the FDDL algorithm also requires sparse coding, its trained dictionary is much smaller than the over-complete dictionary of the SRC algorithm. Thus, the FDDL algorithm consumes much less time than the SRC algorithm, but still more than the SVM and LC-KSVD algorithms. Table 4 confirms these observations. Based on the experimental results above, we conclude that, by learning a discriminative dictionary, the FDDL classification algorithm improves the classification rates of vehicle classification and reduces the time complexity relative to the SRC algorithm.

Comparing the FDDL and MT-FDDL methods, although the sample data of the MT-FDDL method combine those of the FDDL method under the single acoustic and seismic signals, its time consumption is far less than the sum of the two in moving vehicle classification. Therefore, we conclude that the MT-FDDL method based on multi-feature fusion reduces the time complexity to some extent. However, Table 4 also reveals a shortcoming of the MT-FDDL method: as the number of training samples increases, its time consumption also increases and remains much larger than that of the SVM algorithm.

In addition, the MT-HDL method is efficient in both the training and testing phases. It avoids the time consumption caused by solving $ℓ_{p}$ -norm $(p ⩽ 1)$ sparse coding problems, so its time consumption is much smaller than that of the MT-SRC and MT-FDDL methods and remains below one second for all training sizes, close to the level of the SVM algorithm. Therefore, the MT-HDL method greatly reduces the time complexity.
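To put these differences in proportion, the short calculation below computes running-time ratios relative to MT-HDL from the 150-sample column of Table 4. The dictionary and function names are illustrative; the numbers are copied from the table.

```python
# Running times in seconds for 150 training samples, copied from Table 4.
times_150 = {
    "SVM (Acoustic)": 0.23,
    "MT-SRC": 190.67,
    "MT-FDDL": 70.87,
    "MT-HDL": 0.91,
}

def ratio_to_mt_hdl(method):
    """Running time of `method` divided by the MT-HDL running time."""
    return times_150[method] / times_150["MT-HDL"]

for name in ("MT-SRC", "MT-FDDL", "SVM (Acoustic)"):
    print(f"{name}: {ratio_to_mt_hdl(name):.2f}x the MT-HDL running time")
```

At this training size, MT-SRC takes roughly 210 times as long as MT-HDL and MT-FDDL roughly 78 times as long, while the acoustic SVM takes about a quarter of the MT-HDL time.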

Conclusion

In this work, we propose a new method, called MT-HDL, for moving vehicle classification in complex scenes, which achieves improved performance and significantly higher efficiency on data collected from acoustic and seismic sensor nodes. The multi-feature fusion method is used to fuse the features and reduce the time complexity. Our experimental results demonstrate that the method yields accurate classification performance: it outperforms classical methods such as the SVM, SRC, LC-KSVD, and FDDL, exceeds the MT-SRC, and approaches the MT-FDDL at a small fraction of its running time. Furthermore, by applying our model to moving vehicle classification tasks in complex scenes, we show experimentally that the proposed method takes advantage of each feature signal and thereby improves the classification accuracy of the moving vehicles. In particular, by jointly learning an analysis dictionary and a synthesis dictionary, the MT-HDL obtains the sparse coding matrix through a simple linear mapping, which avoids the time complexity caused by solving the $ℓ_{p}$ -norm $(p ⩽ 1)$ sparse coding problem.

Footnotes

Handling Editor: Jaime Lloret

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by National Natural Science Foundation of China (NSFC) under Grant Nos 61771299, 61301027, 61771322, and 61375015.

References

1. Zhou. Multi-sensor fusion for robust target tracking in the simultaneous presence of set-membership and stochastic Gaussian uncertainties. IET Radar Sonar Nav 2017; 11(4): 621–628.

2. Nevat, Peters, Collings IB. Distributed detection in sensor networks over fading channels with multiple antennas at the fusion centre. IEEE T Signal Proces 2014; 62(3): 671–683.

3. Belmonte-Hernández, Hernández-Peñaloza, Álvarez, et al. Adaptive fingerprinting in multi-sensor fusion for accurate indoor tracking. IEEE Sens J 2017; 17: 4983–4998.

4. Tian, Dong, Jia, et al. Multi-sensor signature fusion algorithm for vehicle type classification. J S China Univ Tech 2014; 42(3): 52–58.

5. Klausner, Tengg, Rinner. Vehicle classification on multi-sensor smart cameras using feature- and decision-fusion. In: ACM/IEEE international conference on distributed smart cameras, Vienna, 25–28 September 2007, pp.67–74. New York: IEEE.

6. Eom KB. Analysis of acoustic signatures from moving vehicles using time-varying autoregressive models. Multidim Syst Sign P 1999; 10(4): 357–378.

7. Duarte, Hu YH. Vehicle classification in distributed sensor networks. J Parallel Distr Com 2004; 64(7): 826–838.

8. Huang, Lei, Suykens JAK. Solution path for pin-SVM classifiers with positive and negative τ values. IEEE T Neur Net Lear 2016; 28(7): 1584–1593.

9. Górriz, Ramírez, Suckling, et al. Case-based statistical learning: a non-parametric implementation with a conditional-error rate SVM. IEEE Access 2017; 5: 11468–11478.

10. Wright, Yang, Ganesh. Robust face recognition via sparse representation. IEEE T Pattern Anal 2009; 31(2): 210–227.

11. Mei, Ling. Robust visual tracking and vehicle classification via sparse representation. IEEE T Pattern Anal 2011; 33(11): 2259–2272.

12. Cui, Prasad. Class-dependent sparse representation classifier for robust hyperspectral image classification. IEEE T Geosci Remote 2015; 53(5): 2683–2695.

13. Gao, Tsang IHT, Chia LT. Kernel sparse representation for image classification and face recognition. In: Daniilidis, Maragos, Paragios (eds) Computer Vision-ECCV 2010, Berlin: Springer, 2010, pp.1–14.

14. Zhang, Sun, Xia, et al. Multiple kernel sparse representation-based orthogonal discriminative projection and its cost-sensitive extension. IEEE T Image Process 2016; 25(9): 4271–4285.

15. Jiang, Lin, Davis LS. Label consistent K-SVD: learning a discriminative dictionary for recognition. IEEE T Pattern Anal 2013; 35(11): 2651–2664.

16. Jiang, Lin, Davis. Learning a discriminative dictionary for sparse coding via label consistent K-SVD. In: IEEE conference on computer vision and pattern recognition (CVPR), Colorado Springs, CO, 20–25 June 2011, pp.1697–1704. New York: IEEE.

17. Yang, Zhang, Feng, et al. Fisher discrimination dictionary learning for sparse representation. In: IEEE international conference on computer vision (ICCV), Barcelona, 6–13 November 2011, pp.543–550. New York: IEEE.

18. Wang, Guo, et al. Fisher discriminative dictionary learning for vehicle classification in acoustic sensor networks. J Signal Process Syst 2017; 86(1): 99–107.

19. Guo, Wang, Liu, et al. Vehicle classification in acoustic sensor networks based on hybrid dictionary learning. In: IEEE 14th international conference on dependable, autonomic and secure computing, 14th international conference on pervasive intelligence and computing, 2nd international conference on big data intelligence and computing and cyber science and technology congress (DASC/PiCom/DataCom/CyberSciTech), Auckland, New Zealand, 8–12 August 2016, pp.861–865. New York: IEEE.

20. Nguyen, Nasrabadi, Tran TD. Robust multi-sensor classification via joint sparse representation. In: Proceedings of the 14th international conference on information fusion (FUSION), Chicago, IL, 5–8 July 2011, pp.1–8. New York: IEEE.

21. Zhang, Nasrabadi, Huang, et al. Transient acoustic signal classification using joint sparse representation. In: International conference on acoustics, speech and signal processing (ICASSP), Prague, 22–27 May 2011, pp.2220–2223. New York: IEEE.

22. Chen, Nasrabadi, Tran TD. Sparse representation for target detection in hyperspectral imagery. IEEE J Sel Top Signa 2011; 5(3): 629–640.

23. Yuan, Yan. Visual classification with multi-task joint sparse representation. In: IEEE computer society conference on computer vision and pattern recognition (CVPR), San Francisco, CA, 13–18 June 2010, pp.3493–3500. New York: IEEE.

24. Mairal, Bach, Ponce, et al. Discriminative learned dictionaries for local image analysis. In: Conference on computer vision and pattern recognition (CVPR), Anchorage, AK, 23–28 June 2008, pp.1–8. New York: IEEE.

25. Mairal, Bach, Ponce. Task-driven dictionary learning. IEEE T Pattern Anal 2012; 34(4): 791–804.

26. Taalimi. Chapter 4 - Supervised dictionary learning. Learning-based local visual representation and indexing. Elsevier Inc. 2015.

27. Muda, Begam, Elamvazuthi. Voice recognition algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) techniques. J Comput 2010; 2(3): 138–143.

28. Shumaila, Tahira, Malik. Voice recognition system using HMM with MFCC for secure ATM. IJCSI Int J Comput Sci Issues 2011; 8(3): 297–302.

29. Robinson, Tappenden. A flexible ADMM algorithm for big data applications. J Sci Comput 2017; 71: 435–467.