Abstract
Keywords
Introduction
Multi-sensor fusion has attracted considerable attention over the past few years for both civilian and military applications.1–3 Among these applications, classification is of particular interest, especially moving vehicle classification,4,5 in which the essential problem is how to exploit the relevant information from different tasks (sensors) recording the same physical event to improve classification performance. Eom 6 proposed deducing the features of vehicles from the sounds they generate, which makes it practicable to classify moving vehicles from acoustic signals in complex scenes. Later, Duarte and Hu 7 detailed the data collection, feature extraction, and pre-processing steps, and classified the types of moving vehicles in distributed networks with a maximum likelihood classifier based on multi-dimensional frequency spectrum features of the sensor signals.
Many classification approaches have been proposed to broaden applicability to different situations and to improve classification performance, such as support vector machines (SVM),8,9 sparse representation classification (SRC),10–12 kernel sparse representation classification (KSRC),13,14 label consistent KSVD (LC-KSVD),15,16 Fisher discrimination dictionary learning (FDDL),17,18 and hybrid dictionary learning (HDL).19 Each of these methods has achieved state-of-the-art performance in specific situations.
For the sparse representation of signals, the dictionary plays an important role. When a sparse representation model with an analysis dictionary is used to represent a signal, computing the representation coefficients reduces to a simple inner product operation. Gao et al. 13 used kernel sparse representation based classification (KSRC) to map the data into a high-dimensional space for classification, which removes the restriction that the model must be linear, and achieved state-of-the-art performance. However, KSRC is less effective at modeling the complex local structures of natural scenes. In recent years, synthesis dictionaries have been widely applied and studied in sparse representation. With a synthesis dictionary, a signal is decomposed and its representation coefficients are usually obtained via a
However, the information shared among different sensors has not been considered. Therefore, Nguyen et al. 20 proposed a multi-task multivariate (MTMV) sparse representation method for multi-sensor classification, which exploits the related information that different sensors record about the same physical event and achieved excellent classification performance. However, the representation coefficients are still obtained by solving a sparse coding problem, so the time complexity of the training and testing phases remains high. Since then, various sophisticated techniques have been developed and applied to pattern classification tasks such as acoustic signal classification,21 hyperspectral target detection,22 and visual classification.23,24 For example, Zhang et al. 21 proposed a joint sparse model for acoustic signal classification, which exploits the fact that several columns of the training dictionary can simultaneously represent multiple observations from the same class; consequently, the coefficient vectors associated with these observations share the same sparse pattern. Similarly, for visual classification, Yuan and Yan 23 studied a multi-task model under the assumption that tasks belonging to the same class share the same sparse support in their coefficient vectors. In signal processing, the inherent information often lies in low-dimensional subspaces, and the semantic information is usually encoded in the sparse representation. With the emergence of these appealing models, sparse representation and the related optimization problems have attracted increasing attention from researchers.
In this article, inspired by the advantages of HDL 19 and multi-task dictionary learning, we propose a novel method, multi-task hybrid dictionary learning (MT-HDL), which fully considers the correlations and complementary information among multiple heterogeneous sensors. The technique imposes joint-sparsity constraints both within each task and across multiple tasks, effectively combining HDL with multi-task learning to achieve strong classification performance.
The contribution of our work is threefold. First, we simultaneously consider the correlations and the complementary information among different sensors to solve the multi-sensor classification problem; the experimental results confirm the benefit of exploiting collaborative heterogeneous sensors. Second, we use multi-feature signals to learn a hybrid dictionary, in which discriminative codes can be generated by the trained analysis dictionary
The remainder of this article is organized as follows. Section “Related work” briefly introduces SRC, multi-task SRC, and dictionary learning (DL) methods. Section “The framework of MT-HDL in sensor networks for vehicle classification” presents the framework of MT-HDL in sensor networks for vehicle classification. Section “MT-HDL” describes the single-task HDL and the MT-HDL algorithms in detail. Extensive experiments are reported in section “Experimental results,” and conclusions are drawn in section “Conclusion.”
Related work
SRC
Suppose the sparse representation of a test sample
Since this is an NP-hard combinatorial optimization problem, the
The noise case is shown as
where
Using the coefficients of the
Finally, we identify which category
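The SRC equations in this subsection did not survive extraction. For illustration only, the following sketch follows the standard SRC recipe the text outlines: sparse-code the test sample over all training samples, keep the coefficients of one class at a time, and assign the class with the smallest residual. As an assumption, the ℓ1 problem is solved in its Lagrangian (Lasso) form with plain ISTA; `lam`, `n_iter`, the helper names, and the toy data are hypothetical and are not the settings used in the experiments.

```python
import numpy as np


def ista_lasso(D, y, lam=0.05, n_iter=500):
    """Approximately solve min_x 0.5*||y - D x||_2^2 + lam*||x||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2     # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))     # gradient step on the data term
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft-thresholding
    return x


def src_classify(D, labels, y, lam=0.05):
    """Return the class whose coefficients give the smallest residual, plus all residuals."""
    x = ista_lasso(D, y, lam)
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(y - D @ np.where(labels == k, x, 0.0))  # keep class-k coefficients only
        for k in classes
    ])
    return classes[int(np.argmin(residuals))], residuals


# Toy usage: two classes, each lying in its own random 3-D subspace of R^20.
rng = np.random.default_rng(0)
m, n_per = 20, 10
bases = [np.linalg.qr(rng.standard_normal((m, 3)))[0] for _ in range(2)]
D = np.hstack([B @ rng.standard_normal((3, n_per)) for B in bases])
D /= np.linalg.norm(D, axis=0)                  # unit-norm training columns
labels = np.repeat([0, 1], n_per)
y = bases[0] @ rng.standard_normal(3)           # test sample drawn from class 0
y /= np.linalg.norm(y)
pred, res = src_classify(D, labels, y)
print(pred, res)
```

The Lagrangian form corresponds to the noisy case; a larger `lam` gives sparser codes at the cost of a larger residual for every class.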
Multi-task sparse representation classification
Consider a multi-task (multi-sensor) K-class classification problem. For each sensor
For the testing sample
where
Define
where
Once
where
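The multi-task equations above were lost in extraction. As a hedged illustration of the joint-sparsity idea only, the sketch below assumes that each sensor t has its own dictionary D_t whose columns are aligned to the same training events, requires all sensors to use the same atom indices (with sensor-specific values), and fuses sensors by summing class-specific residuals. The greedy simultaneous-OMP-style solver is our substitution for whatever convex solver the original formulation uses; the sensor feature sizes and toy data are hypothetical.

```python
import numpy as np


def joint_somp(dicts, ys, n_nonzero):
    """Greedy joint-sparse coding: every task uses the same set of atom indices."""
    n_atoms = dicts[0].shape[1]
    residuals = [y.copy() for y in ys]
    coefs = [np.zeros(n_atoms) for _ in ys]
    support = []
    for _ in range(n_nonzero):
        # Score each atom by its summed correlation with all task residuals.
        scores = sum(np.abs(D.T @ r) for D, r in zip(dicts, residuals))
        if support:
            scores[support] = -np.inf              # never reselect an atom
        support.append(int(np.argmax(scores)))
        for t, (D, y) in enumerate(zip(dicts, ys)):
            sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
            coefs[t] = np.zeros(n_atoms)
            coefs[t][support] = sol                # task-specific values, shared support
            residuals[t] = y - D @ coefs[t]
    return coefs


def mt_src_classify(dicts, labels, ys, n_nonzero=3):
    """Sum class-specific residuals over all sensors and pick the smallest."""
    coefs = joint_somp(dicts, ys, n_nonzero)
    classes = np.unique(labels)
    totals = np.array([
        sum(np.linalg.norm(y - D @ np.where(labels == k, x, 0.0))
            for D, y, x in zip(dicts, ys, coefs))
        for k in classes
    ])
    return classes[int(np.argmin(totals))], totals


# Toy usage: an "acoustic" (20-D) and a "seismic" (15-D) sensor; hypothetical sizes.
rng = np.random.default_rng(0)
n_per, dims = 10, [20, 15]
events = [rng.standard_normal((3, n_per)) for _ in range(2)]    # latent codes per class
bases = [[np.linalg.qr(rng.standard_normal((m, 3)))[0] for _ in range(2)] for m in dims]
dicts = []
for t in range(len(dims)):
    D = np.hstack([bases[t][k] @ events[k] for k in range(2)])  # columns aligned across sensors
    dicts.append(D / np.linalg.norm(D, axis=0))
labels = np.repeat([0, 1], n_per)
c = rng.standard_normal(3)
ys = [bases[t][1] @ c for t in range(len(dims))]               # one class-1 event, two sensors
pred, totals = mt_src_classify(dicts, labels, ys)
print(pred, totals)
```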
DL
KSVD generalizes the k-means clustering method, using a singular value decomposition to update the dictionary atoms, and is a powerful DL algorithm for sparse representation. It iteratively alternates between sparse coding the input data with the current dictionary and updating the dictionary atoms to better fit the data.
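A compact sketch of this alternation, assuming orthogonal matching pursuit (OMP) for the sparse-coding stage and the standard rank-1 SVD update for each atom. The dictionary size, sparsity level, iteration count, helper names, and synthetic data are illustrative choices, not values from this article.

```python
import numpy as np


def omp(D, y, n_nonzero, tol=1e-10):
    """Orthogonal matching pursuit: greedy n_nonzero-sparse code of y over D."""
    x = np.zeros(D.shape[1])
    support, r = [], y.astype(float)
    for _ in range(n_nonzero):
        if np.linalg.norm(r) < tol:
            break
        support.append(int(np.argmax(np.abs(D.T @ r))))
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = sol
        r = y - D @ x
    return x


def ksvd(Y, n_atoms, n_nonzero, n_iter=10, seed=0):
    """Alternate OMP sparse coding with rank-1 SVD updates of each atom."""
    rng = np.random.default_rng(seed)
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        X = np.column_stack([omp(D, y, n_nonzero) for y in Y.T])   # sparse coding stage
        for k in range(n_atoms):                                    # dictionary update stage
            users = np.flatnonzero(X[k])
            if users.size == 0:
                continue
            # Error without atom k, restricted to the signals that use it.
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]
            X[k, users] = s[0] * Vt[0]
    return D, X


rng = np.random.default_rng(0)
D_true = rng.standard_normal((20, 30))
D_true /= np.linalg.norm(D_true, axis=0)
codes = np.zeros((30, 200))
for j in range(200):
    codes[rng.choice(30, 3, replace=False), j] = rng.standard_normal(3)
Y = D_true @ codes                               # synthetic 3-sparse training signals
D, X = ksvd(Y, n_atoms=30, n_nonzero=3)
rel_err = np.linalg.norm(Y - D @ X) / np.linalg.norm(Y)
print(round(rel_err, 3))
```

Because each atom update keeps the support of the codes fixed, the sparsity level set in the coding stage is preserved through the update stage.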
The unsupervised DL algorithm KSVD has achieved promising results in signal restoration, but it is not adequate for classification tasks because the learned dictionary is trained only to represent the training samples, not to discriminate between classes. The success of DL in signal restoration has nevertheless sparked its application to classification. Since the goal of classification is to assign the correct class label to a test sample, the main concern is the discriminative ability of the dictionary. There are two categories of discriminative DL methods for pattern classification.
In the first category, a shared dictionary25,26 is learned for all classes by making the representation coefficients discriminative. However, a shared dictionary loses the correspondence between dictionary atoms and class labels, which makes it impossible to classify based on the class-specific representation residual. The second category learns a structured dictionary to promote discrimination between classes. In the algorithm of Wang et al.,18 the coding coefficients become more discriminative through the Fisher discrimination criterion. However, the Fisher discrimination criterion relies on assumptions about the data distribution and does not take the local manifold structure of the coding coefficients into account.
The framework of MT-HDL in sensor networks for vehicle classification
To address the difficulties of long-term vehicle classification using sensor networks in complex scenes, we establish a vehicle classification framework based on MT-HDL, shown in Figure 1, which has the following main components. To describe this model, let us first consider a two-task classification with a testing sample

The framework of multi-task hybrid dictionary learning model for vehicle classification.
Pre-processing
The raw acoustic and seismic signals of vehicles are gathered from multiple heterogeneous sensor nodes in complex scenes using sensor networks. Because the signals are inevitably corrupted by noise and other uncertain conditions, pre-processing is essential to pick out the useful events. Since a useful event spans only the short period when a vehicle is close to the sensor nodes, a constant false alarm rate (CFAR) algorithm is used to detect whether a vehicle is present, and the detected event series are then converted into frames.
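The text does not specify which CFAR variant is used. A minimal sketch, assuming cell-averaging CFAR (CA-CFAR) applied to per-frame energy; the frame length, hop, guard and training cell counts, false-alarm probability, and the synthetic signal are illustrative only. The scale factor α = N(P_fa^(-1/N) − 1) is the standard CA-CFAR result for exponentially distributed noise power with N reference cells; frame energy is not exactly exponential, so the realized false-alarm rate will differ from `pfa`.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def ca_cfar(energy, n_train=64, n_guard=2, pfa=1e-3):
    """Cell-averaging CFAR over a 1-D energy sequence; returns a boolean detection mask."""
    n_ref = 2 * n_train
    # Standard CA-CFAR scale factor for exponentially distributed noise power.
    alpha = n_ref * (pfa ** (-1.0 / n_ref) - 1.0)
    half = n_train + n_guard
    detections = np.zeros(len(energy), dtype=bool)
    for i in range(half, len(energy) - half):    # edge cells are left undetected
        lead = energy[i - half:i - n_guard]          # training cells before the guard band
        lag = energy[i + n_guard + 1:i + half + 1]   # training cells after the guard band
        noise = (lead.sum() + lag.sum()) / n_ref
        detections[i] = energy[i] > alpha * noise
    return detections


fs = 4960                                        # sampling rate quoted in the text
rng = np.random.default_rng(1)
x = rng.standard_normal(20 * fs)                 # 20 s of unit-variance background noise
t = np.arange(1408) / fs
x[48640:48640 + 1408] += 10 * np.sin(2 * np.pi * 100 * t)   # short synthetic "vehicle" event
frames = frame_signal(x, frame_len=256, hop=128)
energy = (frames ** 2).mean(axis=1)
det_frames = np.flatnonzero(ca_cfar(energy))
print(det_frames)
```

A CA-CFAR reference window must be long compared with the event: if the event fills the training cells, it raises its own threshold and is masked. Long vehicle passes would therefore need a longer window or a different noise estimator.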
Feature extraction
Acoustic and seismic signals change quickly over time and are non-stationary; over short frames, however, they can be treated as quasi-stationary and analyzed with the Fourier transform, so many feature extraction approaches operate in the frequency domain. Among them, Mel Frequency Cepstral Coefficients (MFCC)27,28 are the most widely used because of their robustness. In this article, MFCC is used to extract multi-dimensional frequency spectrum features of the target vehicles.
Multi-task hybrid dictionary training
By exploiting both the correlation and the complementary information of different heterogeneous sensors, we construct a multi-task hybrid dictionary based on multi-feature signals, in which the synthesis dictionary and the analysis dictionary are trained jointly, so that no time is consumed in
Vehicle classification
Once a multi-task hybrid dictionary is trained using multi-feature signals, we use the analysis sub-dictionary
MT-HDL
Single-task HDL
In discriminative DL models, the sparse representation of signal
where
As for the synthesis dictionary
where
As shown in equation (12), although the HDL model is not a sparse representation model, group sparsity is enforced on the code matrix
With fixed hybrid dictionary
With fixed synthesis dictionary
where
Moreover, with fixed analysis dictionary
In the testing stage, suppose that a test sample
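The HDL objective (equation (12)) and its update rules were lost in extraction. As a hedged illustration of the alternating scheme the text describes (a closed-form code update with the dictionaries fixed, then analysis- and synthesis-dictionary updates, and classification by the smallest class-specific reconstruction error), the sketch below follows the projective dictionary pair learning (DPL) objective of Gu et al., which we assume is close to, but not necessarily identical with, HDL: Σ_k ‖X_k − D_k A_k‖²_F + τ‖P_k X_k − A_k‖²_F + λ‖P_k X̄_k‖²_F, where X̄_k holds the training samples of all other classes. The values of `tau`, `lam`, `gamma`, and `n_atoms` are illustrative, and the synthesis step uses a least-squares solve followed by projecting each atom onto the unit ball instead of DPL's ADMM solver.

```python
import numpy as np


def train_dpl(X, labels, n_atoms=5, tau=0.05, lam=0.003, gamma=1e-4, n_iter=20, seed=0):
    """Alternating updates for one (synthesis D_k, analysis P_k) pair per class."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    classes = np.unique(labels)
    D = {k: rng.standard_normal((m, n_atoms)) for k in classes}
    P = {k: rng.standard_normal((n_atoms, m)) for k in classes}
    for k in classes:
        D[k] /= np.linalg.norm(D[k], axis=0)
        P[k] /= np.linalg.norm(P[k], axis=1, keepdims=True)
    I_a, I_m = np.eye(n_atoms), np.eye(m)
    for _ in range(n_iter):
        for k in classes:
            Xk, Xbar = X[:, labels == k], X[:, labels != k]
            # Code update (D_k, P_k fixed): closed-form least squares.
            A = np.linalg.solve(D[k].T @ D[k] + tau * I_a, tau * P[k] @ Xk + D[k].T @ Xk)
            # Analysis update: closed form; the lam term drives P_k @ Xbar toward zero.
            G = tau * Xk @ Xk.T + lam * Xbar @ Xbar.T + gamma * I_m
            P[k] = np.linalg.solve(G, tau * Xk @ A.T).T
            # Synthesis update: least squares, then project each atom onto the unit ball.
            D[k] = np.linalg.solve(A @ A.T + 1e-6 * I_a, A @ Xk.T).T
            D[k] /= np.maximum(np.linalg.norm(D[k], axis=0), 1.0)
    return D, P


def classify_dpl(D, P, x):
    """Analysis coding is one matrix product per class: no sparse-coding solver."""
    classes = sorted(D)
    errs = np.array([np.linalg.norm(x - D[k] @ (P[k] @ x)) for k in classes])
    return classes[int(np.argmin(errs))], errs


# Toy usage: two classes on orthogonal 3-D coordinate subspaces of R^20.
rng = np.random.default_rng(0)
m, n = 20, 30
X = np.zeros((m, 2 * n))
X[:3, :n] = rng.standard_normal((3, n))
X[3:6, n:] = rng.standard_normal((3, n))
X /= np.linalg.norm(X, axis=0)
labels = np.repeat([0, 1], n)
D, P = train_dpl(X, labels)
Xt = np.zeros((m, 10))
Xt[:3, :5] = rng.standard_normal((3, 5))
Xt[3:6, 5:] = rng.standard_normal((3, 5))
pred = [classify_dpl(D, P, x)[0] for x in Xt.T]
print(pred)
```

At test time the code of a sample is simply P_k x, so classification costs a few matrix-vector products per class; this is where the speed advantage over SRC-style sparse coding comes from.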
MT-HDL
In the previous section, we considered only a single task, in which the test sample is captured by a single sensor and contains only one vector representing a single observation. However, the test sample
To address these problems, we exploit the advantages of the HDL algorithm in SRC tasks and develop the MT-HDL algorithm. In this section, we consider a multi-task
To avoid the time-consuming process of solving a problem of
where
Because the discrimination power of equation (20) depends on the discriminative fidelity term
For the synthesis dictionary
where
As shown in equation (21), although the MT-HDL model is not a sparse representation model, group sparsity is enforced on the code matrix
where
With the hybrid dictionary fixed, the objective function of MT-HDL reduces to a standard least squares problem, equation (23), which has the closed-form solution
With fixed synthesis dictionary
where
Moreover, with fixed analysis dictionary
The resulting optimal hybrid dictionary is then used for vehicle classification, in which the trained analysis sub-dictionary
In the testing stage, suppose that a test vehicle sample
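The MT-HDL test rule itself is not recoverable from the text. A minimal sketch of one plausible fusion rule, assuming each sensor t has its own trained pair (D_t,k, P_t,k) per class and that the per-sensor class-specific reconstruction errors are summed before taking the minimum, as in MT-SRC-style classifiers. The orthonormal pairs built here are hypothetical stand-ins for trained dictionaries, and the sensor feature sizes are made up.

```python
import numpy as np


def mt_classify(pairs, ys):
    """pairs[t][k] = (D_tk, P_tk) for sensor t and class k; ys[t] = sensor t's observation."""
    classes = sorted(pairs[0])
    totals = np.zeros(len(classes))
    for task_pairs, y in zip(pairs, ys):
        for i, k in enumerate(classes):
            D, P = task_pairs[k]
            totals[i] += np.linalg.norm(y - D @ (P @ y))   # analysis code is just P @ y
    return classes[int(np.argmin(totals))], totals


# Toy usage with stand-in dictionaries: D = U, P = U.T for an orthonormal U,
# so D @ P projects onto a class subspace.
rng = np.random.default_rng(0)
pairs, ys = [], []
for m in (20, 12):                                  # "acoustic" and "seismic" feature sizes
    U = {k: np.linalg.qr(rng.standard_normal((m, 4)))[0] for k in ("AAV", "DW")}
    pairs.append({k: (U[k], U[k].T) for k in U})
    ys.append(U["AAV"] @ rng.standard_normal(4) + 0.05 * rng.standard_normal(m))
pred, totals = mt_classify(pairs, ys)
print(pred, totals)
```

Summing errors lets a sensor that separates the classes well outweigh one that does not; weighting the per-sensor errors would be a natural variant.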
The MT-HDL algorithm is given in Table 1.
The details of multi-task hybrid dictionary learning algorithm.
Experimental results
In this section, we perform extensive experiments on a real multi-sensor data set and compare the results with those of several traditional classification methods to demonstrate the effectiveness of the proposed approach. We consider a two-task classification problem in which each testing sample is collected from an acoustic sensor and a seismic sensor.
Experimental setup
Data sets
All experiments are run on a desktop PC with an Intel(R) Core(TM) i5-2467M 1.60 GHz CPU and 4 GB of memory. The sensor data sets were captured in the DARPA/IXO SensIT program through a truly distributed wireless sensor network. In the experiment, two types of military vehicles, the Assault Amphibian Vehicle (AAV) and the Dragon Wagon (DW), were observed by multiple heterogeneous sensor nodes distributed around three pre-set routes, as shown in Figure 2, and three types of information were recorded: acoustic, seismic, and infrared. The AAV repeated its run 9 times and the DW 12 times. In this article, we select the acoustic and seismic data sets as the major features for the vehicle classification task. The sensor field consists of an east-west road, a south-north road, and an intersection area; the data set is available at http://www.ecs.umass.edu/mduarte/Software.html

Sensor field layout.
Feature extraction
The acoustic and seismic data are recorded by multiple heterogeneous sensor nodes at a sampling rate of 4960 Hz, and the signals are inevitably disturbed by noise and other uncertain conditions during the experiment. To reduce random error, we increased the number of test samples and averaged the results over multiple tests.
We select the acoustic and seismic data collected by nodes 41 to 60 during runs 3 to 11 of the two military vehicles, denoted AAV3_41 to AAV11_60 and DW3_41 to DW11_60; examples are shown in Figure 3(a)–(d). This gives 450 sets of sensor data, which serve as the data source for evaluating feature extraction and classification. To extract useful events from the raw time series, the CFAR detection algorithm is used to mark the time intervals with high energy.

Sample time series and features extracted by MFCC: (a) acoustic time series (AAV3_51), (b) seismic time series (AAV3_51), (c) acoustic time series (DW3_51), (d) seismic time series (DW3_51), (e) MFCC features of acoustic time series (AAV3_51), (f) MFCC features of seismic time series (AAV3_51), (g) MFCC features of acoustic time series (DW3_51), and (h) MFCC features of seismic time series (DW3_51).
Many frequency-domain feature extraction methods have been proposed because acoustic signals change rapidly in the time domain and appear non-stationary. Among them, MFCC is widely used because it is robust to noise and accounts for the variation of the human ear's critical bandwidth with frequency. The main steps of MFCC are: (1) Fast Fourier Transform, which transforms the signal from the time domain to the frequency domain; (2) Mel filtering, in which a bank of triangular filters that mimics the frequency resolution of the human ear is applied to the spectrum to obtain the Mel spectral coefficients; (3) taking the logarithm of the Mel spectral coefficients, which compresses the dynamic range of the spectrum and turns multiplicative noise into an additive term that is easier to remove; and (4) Discrete Cosine Transform, which transforms the log Mel spectrum into the cepstral domain, yielding the Mel frequency cepstral coefficients used as features. The multi-dimensional frequency spectrum features shown in Figure 3(e)–(h) are extracted from the event time series for classification using the MFCC27,28 algorithm.
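The four MFCC steps can be sketched in NumPy as follows. The Hamming window, 512-sample frames, 26 triangular filters, 13 retained coefficients, and the common Mel formula mel = 2595 log10(1 + f/700) are our assumptions; the text does not state the MFCC settings used in the experiments.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the Mel scale; one row per filter."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        for b in range(left, center):                 # rising edge
            fb[j - 1, b] = (b - left) / (center - left)
        for b in range(center, right):                # falling edge
            fb[j - 1, b] = (right - b) / (right - center)
    return fb


def dct_ii_matrix(n):
    """Orthonormal DCT-II matrix (rows = cosine basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C


def mfcc(frames, fs, n_filters=26, n_ceps=13):
    """MFCCs for a (n_frames, frame_len) array of time-domain frames."""
    n_fft = frames.shape[1]
    windowed = frames * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2 / n_fft          # (1) FFT
    mel_energy = power @ mel_filterbank(n_filters, n_fft, fs).T          # (2) Mel filtering
    log_mel = np.log(mel_energy + 1e-10)                                 # (3) logarithm
    return log_mel @ dct_ii_matrix(n_filters)[:n_ceps].T                 # (4) DCT


fs = 4960                                        # sampling rate quoted in the text
rng = np.random.default_rng(0)
t = np.arange(10 * 512) / fs
x = np.sin(2 * np.pi * 300 * t) + 0.1 * rng.standard_normal(t.size)   # synthetic signal
feats = mfcc(x.reshape(10, 512), fs)
print(feats.shape)                               # (10, 13)
```

The small constant added before the logarithm guards against empty filters, which can occur when narrow low-frequency filters fall between FFT bins.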
Vehicle classification
After MFCC feature extraction, the multi-dimensional frequency spectrum features of the vehicles are fed to the proposed classification method to improve classification accuracy and reduce the time complexity of vehicle classification. We select 75, 90, 105, 120, 135, and 150 sets of sensor data, each including acoustic and seismic signals, as training data and 300 sets as testing data to classify the target vehicles. To speed up the MT-HDL model without reducing classification performance, the maximal number of iterations is set to 25 and the dictionary size is set to 30.
For comparison, the SVM, SRC, LC-KSVD, FDDL, HDL, MT-SRC, and MT-FDDL algorithms are also used as references for the proposed method in classifying the types of moving vehicles. The SVM algorithm follows Huang et al.,8 where the optimization problem is solved with the LIBSVM software package. The SRC algorithm follows Mei and Ling,11 with the sparsity level set to 0.7. The LC-KSVD algorithm was proposed by Jiang et al.,15 and its maximal number of iterations is set to 25 and its sparsity threshold to 8. The FDDL algorithm is presented in the study by Yang et al.,17 where the dictionary is initialized with PCA and the maximal number of iterations is set to 80. Finally, the single-task HDL algorithm is described in section “MT-HDL,” with the maximal number of iterations set to 25 and the dictionary size set to 30.
Classification accuracy
To obtain more reliable vehicle classification results, we run the classification procedure 50 times and report the average classification rates. The experiments are divided into single-task and multi-task classification experiments.
1. Single-task classification analysis
Figure 4 shows the vehicle classification rates of different classification algorithms using either the acoustic or the seismic signals of moving vehicles. The classification rates with acoustic signals are significantly higher than those with seismic signals. The figure also shows that, with either acoustic or seismic signals, the FDDL algorithm achieves much higher classification rates than the SVM, SRC, and LC-KSVD algorithms, which we attribute to the over-complete dictionary in SRC being much larger than the Fisher discrimination dictionary in FDDL. Moreover, with acoustic signals, the HDL method achieves higher classification rates than FDDL, although it is slightly lower than FDDL with seismic signals. We therefore conclude that HDL is better suited to vehicle classification with acoustic signals, where it outperforms the SVM, SRC, LC-KSVD, and FDDL methods.

The trends of classification rates across various classification methods under single signals.
Figure 4 shows the general trend of classification accuracy of the various algorithms in vehicle classification. Table 2 gives the detailed classification results of each algorithm in moving vehicle classification.
The classification rates across various classification methods under single signals (%).
AAV: Assault Amphibian Vehicle; DW: Dragon Wagon; SVM: support vector machine; SRC: sparse representation classification; LC-KSVD: label consistent KSVD; FDDL: Fisher discrimination dictionary learning; HDL: hybrid dictionary learning.
From the noise detection rates in Table 2, the FDDL algorithm recognizes the background environmental noise well in both the acoustic and the seismic sensor networks. The table also shows that the classification rate of HDL with acoustic signals (87.9%) is much higher than that with seismic signals (71.6%); moreover, the former increases gradually with the number of training samples, while the latter remains essentially stable or decreases slightly. Overall, the HDL method performs well, particularly with acoustic signals.
2. Multi-task classification analysis
As shown in Figure 5 and Table 3, the MT-SRC algorithm, which combines acoustic and seismic features, achieves a much higher classification rate (88.0%) than classification with a single acoustic or seismic signal. MT-SRC also outperforms the single-task FDDL and HDL algorithms. This shows the value of studying target classification and recognition based on multiple sensors.

The trends of classification rates across various classification methods.
The classification rates across various classification methods (%).
AAV: Assault Amphibian Vehicle; DW: Dragon Wagon; SVM: support vector machine; SRC: sparse representation classification; LC-KSVD: label consistent KSVD; FDDL: Fisher discrimination dictionary learning; HDL: hybrid dictionary learning; MT-SRC: multi-task sparse representation classification; MT-FDDL: multi-task Fisher discrimination dictionary learning; MT-HDL: multi-task hybrid dictionary learning.
Comparing SRC with MT-SRC, the MT-SRC method makes full use of the high noise recognition rate of the seismic signal, greatly improving the classification rates of both AAV and DW while keeping the noise recognition rate stable at 100%. The classification rate of MT-FDDL with combined acoustic and seismic signals (about 90%) is much higher than that of FDDL with acoustic signals alone (about 85%). This shows that multi-task feature fusion exploits the strengths of each signal feature, yielding classification rates much higher than those of any single sensor feature. The figure also shows that MT-FDDL classifies vehicles better than MT-SRC (about 87%). From Tables 2 and 3, we conclude that MT-FDDL preserves the noise recognition advantage of seismic signals in moving vehicle classification (about 100%). In addition, as the number of training samples increases, the classification rate of MT-FDDL rises and stabilizes at about 90%. In summary, MT-FDDL greatly improves on the accuracy of single-sensor classification algorithms for moving vehicles.
The classification rate of MT-HDL (88.9%), which combines acoustic and seismic signals, is clearly higher than that of HDL, and it improves as the number of training samples increases. It is also slightly higher than that of MT-SRC (88.0%) but lower than that of MT-FDDL (90.4%). This indicates that dictionary learning plays a decisive role in sparse representation based classification.
From the detection rates and false alarm rates in Table 3, MT-HDL performs well in the vehicle classification task compared with the other classification algorithms, and its classification rates improve as the number of training samples increases. Like the other multi-feature fusion methods, MT-HDL inherits the noise recognition advantage of the seismic signals (the noise recognition rate reaches 100%), and the classification rates of both AAV and DW are stable at around 86.0%. We therefore conclude that MT-HDL achieves high and stable classification performance in vehicle classification and is suitable for target classification and recognition in complex scenes.
Time efficiency
Time complexity is also an important criterion for evaluating a classification model, so to further demonstrate the efficiency of MT-HDL, we analyze its time complexity. In the training phase, the time complexities of
In the testing phase, thanks to the low complexity of the class-specific reconstruction error
The running efficiency of various classification methods (s).
SVM: support vector machine; SRC: sparse representation classification; LC-KSVD: label consistent KSVD; FDDL: Fisher discrimination dictionary learning; HDL: hybrid dictionary learning; MT-SRC: multi-task sparse representation classification; MT-FDDL: multi-task Fisher discrimination dictionary learning; MT-HDL: multi-task hybrid dictionary learning.
The SRC algorithm needs to solve the problem of
Comparing FDDL with MT-FDDL, although MT-FDDL uses the combined sample data of the single-signal FDDL runs on acoustic and seismic signals, its running time in moving vehicle classification is far less than the sum of the two. We therefore conclude that MT-FDDL, based on multi-feature fusion, reduces the time complexity to some extent. However, Table 4 also shows a shortcoming of MT-FDDL: its running time grows with the number of training samples and is much larger than that of SVM.
In addition, the MT-HDL method is efficient, achieving better running times in both the training and testing phases. It avoids the time consumption caused by solving
Conclusion
In this work, we propose a new method, MT-HDL, for moving vehicle classification in complex scenes using data collected from acoustic and seismic sensor nodes, which achieves improved performance and significantly higher efficiency. Multi-feature fusion is used to combine the features and reduce the time complexity. Our experimental results demonstrate that the method yields highly accurate classification and outperforms many classical methods, such as SVM, SRC, LC-KSVD, and FDDL, as well as the multi-task MT-SRC algorithm, while offering a better balance of accuracy and efficiency than MT-FDDL. Furthermore, by applying our model to moving vehicle classification in complex scenes, we show experimentally that the proposed method takes advantage of each feature signal and improves the classification accuracy of moving vehicles. In particular, MT-HDL shows good classification performance: by jointly learning an analysis dictionary and a synthesis dictionary, it obtains the coding matrix through a simple linear mapping, which also reduces the time complexity caused by solving the problem of
