Abstract
1. Introduction
Wireless sensor networks (WSNs) have been widely used in a variety of applications such as precision agriculture [1], personal health monitoring [2], and environment surveillance [3], in which sensor nodes with limited battery energy are deployed and periodically report their sensor readings to the base station or sink node. Therefore, a key issue in wireless sensor networks is how to efficiently gather these data from sensors provided with only limited energy resources. In the classical data gathering approach [4], each sensor node simply forwards its sensor readings to the sink, resulting in a large amount of traffic and energy consumption.
Recently, Compressive Sensing (CS) [5, 6] has emerged as a new approach to tackle the efficient data gathering problem in WSNs. Taking advantage of the sparsity in sensor readings, CS based methods [7–9] require fewer data packets than the classical approach. However, there are many practical problems when applying CS to data gathering in WSNs. Firstly, CS based methods require a prior dictionary to sparsify the sensor readings. Secondly, the measurement matrix in CS is composed of independent and identically distributed random Gaussian entries, which is dense with very few zero elements. Therefore, sensor nodes need to sample all sensor readings and perform a considerable number of measurement operations, resulting in a large amount of energy waste. Thirdly, CS requires the number of measurements to exceed a certain threshold (depending on the sparsity level of sensor readings) to achieve exact recovery. However, the realistic sensor signal is not always exactly sparse as it should be. Therefore, low sampling rate may lead to insufficient measurements and result in a bad recovery accuracy.
As an extension of CS, matrix completion [10] has shown its potential for enabling efficient data gathering in WSNs. Because the sensor readings are highly temporal-spatial correlated, the data matrix structured by the sensor readings will approximate to a low-rank matrix. Therefore, the sink node can gather only a few of the total sensor readings and adopt the matrix completion algorithm to reconstruct the missing data. However, unlike CS based methods, matrix completion based methods do not require the prior dictionary to sparsify the original signal. Furthermore, the sampling matrix (or measurement matrix) in matrix completion is much sparser than CS, which makes it more suitable for wireless sensor networks.
Utilizing the low-rankness feature in sensory data, there are many pioneering works [11–13] on applying matrix completion to WSN, which adopt the alternating least squares technique to estimate the low-rank matrix. The recovery accuracy is further improved by utilizing the spatiotemporal structure in the WSN data. However, the improvement is limited because the spatiotemporal structure directly implies the low-rank feature, and in some sense, these two features are equivalent. Moreover, the alternating least squares algorithm does not scale to large rank.
Besides the low-rankness feature, we observe that the sensor readings in WSNs also exhibit Discrete Cosine Transform (DCT) compactness feature. In other words, the sensor readings can be approximated by only a small number of DCT coefficients. Therefore, by taking full advantage of the DCT compactness feature of the WSNs data, in this paper, we propose a DCT Regularized Matrix Completion (DRMC) algorithm. We analyze the time complexity of DRMC, which indicates that the proposed algorithm has a low computational cost. Moreover, we analyze the recovery error of DRMC and derive a theoretical upper bound. The error bound is then validated by experimental results. Extensive experiments are carried out on three datasets that are collected from two realistic WSN testbeds. We compare the performance of DRMC with state-of-the-art methods. Experimental results show that DRMC outperforms state-of-the-art methods for low sampling rate and achieves a good recovery accuracy even if the sampling rate is very low.
The main contributions of this paper are summarized in the following:
We examine the sensor data collected from real-world WSNs, which reveal two data features: (1) Inspired by these observations, we design a novel We analyze the time complexity of the DRMC algorithm, which indicates that DRMC has a low computational cost. We analyze the recovery error of the DRMC algorithm and derive a theoretical upper bound, which is then validated by experimental results.
The rest of this paper is organized as follows. Section 2 reviews the state-of-the-art methods. Section 3 models the problem. Section 4 examines the data features in WSNs. Section 5 proposes the DRMC algorithm. Section 6 analyzes the time complexity of the algorithm. Section 7 analyzes the recovery error of the algorithm. Section 8 evaluates the effectiveness of the proposed algorithm. Section 9 concludes this paper.
2. Related Works
In this section, we make a brief review of previous works related to the data gathering problem in wireless sensor networks.
2.1. Compressive Sensing
Compressive Sensing (CS) theory suggests that sparse signals can be accurately reconstructed from only a small number of measurements [5, 6]. It is a new paradigm for signal processing of networked data [7] and there are many CS based methods for data gathering in wireless sensor networks. Luo et al. proposed a data gathering scheme that applies Compressive Sensing theory to reduce communication cost [14]. Quer et al. presented a framework for data gathering and signals recovery in actual WSN deployments with the integration of CS [9]. Ebrahimi and Assi recently proposed a decentralized method to apply the Compressive Sensing to data gathering in wireless sensor networks [15].
2.2. Matrix Completion
Recently, there are many applications that apply matrix completion technique to wireless sensor networks. Utilizing the low-rankness and spatiotemporal correlation, Zhang et al. proposed a method to recover the lost data in internet traffic matrices [13]. Kong et al. designed an algorithm using the low-rank structure, time stability, space similarity, and multiattribute correlation to estimate the missing data in highly incomplete data matrix [12]. Cheng et al. presented a Spatiotemporal Compressive Data Collection (STCDG) algorithm that utilizes the low-rankness and short-term stability features to reduce data traffic in WSNs [11].
In our earlier work [16], we have studied the data recovery problem in wireless sensor networks when historical data are available and proposed a DCT-Regularized Partial Matrix Completion (DCT-RPMC) algorithm. However, the new algorithm proposed in this paper does not depend on any historical data, which greatly widens its applicability to more general scenarios.
3. Problem Formulation
In this section, we formally formulate the data gathering and recovery problem in wireless sensor networks and state the goal of this paper. The main notations that will be used in the rest of this paper are listed in Summary of Notations.
Assume that the network is composed of
In order to reduce energy consumption, only a fraction of the entries of
Let
After obtaining
4. Exploring the Data Features
In this section, we examine the data features in real-world wireless sensor networks.
4.1. Experimental Datasets
We use three datasets, which are collected from two WSN testbeds, to serve as the ground truth. The summary of the datasets is shown in Table 1.
Experimental datasets.
The first category of datasets is collected by 54 Mica2Dot nodes deployed in the Intel Berkeley Research Lab [17] between February 28 and April 5, 2004. The Mica2Dot node reports collected sensor data including humidity and temperature once every 30 seconds. However, we find that the raw dataset has considerable missing data. Therefore, we have rearranged the raw data (by changing the reporting interval from 30 seconds to 10 minutes) to avoid the missing data.
The second category of dataset consists of temperature readings, which are collected with a 10-minute interval by our own testbed, namely, PARED. PARED consists of 50 sensor nodes. More details about PARED can be found in [18].
4.2. Low-Rank Structure
We first examine the low-rank structure in WSN datasets using the Singular Value Decomposition (SVD). The environment matrix
Because sensor readings in a WSN are spatiotemporal correlated, the environment matrix
Figure 1 shows the low-rank approximation quality of the three datasets. We found that the largest 10 singular values occupy the 93%–99% of the total energy, which suggests that the WSN datasets exhibit a good low-rank feature.

Low-rankness feature.
4.3. DCT Compactness
We also observed that the sensor readings in WSN exhibit DCT compactness feature. In other words, the first
We first define
Therefore, if the first
From Figure 2, we can see that the first 10 DCT coefficients concentrate

DCT compactness feature.
5. Algorithm
In this section, we proposed a novel matrix completion algorithm, namely, DCT Regularized Matrix Completion (DRMC), to solve the data recovery problem in WSN data gathering. DRMC takes full advantage of the low-rankness and DCT compactness features to improve the recovery accuracy.
5.1. Utilization of Low-Rankness
As mentioned before in Section 3, the goal of the recovery problem is to estimate
However, the rank minimization problem (12) is NP-hard and is not solvable in polynomial time. Since the nuclear norm is the optimal convex approximation of the rank function, a reasonable solution is to solve a convex relaxation problem with the rank function replaced by the nuclear norm:
However, in a more realistic occasion, the environment matrix
5.2. Utilization of DCT Compactness
Though we can estimate
Therefore, to reduce the overfitting in (14), we exploit the DCT compactness feature of the sensor data. As mentioned in Section 4.3,
5.3. The DRMC Algorithm
We present the DRMC algorithm by solving the optimization problem in (15). The pseudocode is shown in Algorithm 1. Next, we will describe the design of DRMC in details.
( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (
The object function in (15) can be rewritten into the following form:
Note that
Since
We then introduce an auxiliary variable
What is more, we consider a warm-start technique for the nuclear regularization parameter
Lemma 1.
Proof.
Since
And note that
Proposition 2.
Assume that
Proof.
Note that
Definition 3.
Decompose the matrix
Lemma 4.
Let
Proof.
Proposition 5.
Let
Proof.
Consider
6. Complexity Analysis
In this section, we discuss the time complexity of the proposed algorithm.
After analyzing the steps in Algorithm 1, we find that the most computationally intensive step is the singular value shrinkage operation that performs SVD on
We can further decrease the time complexity of the DRMC algorithm by applying Partial Reorthogonalization Package (PROPACK) [26] in the singular value shrinkage operation. PROPACK uses the Lanczos method [24] to compute only a partial SVD of
The time complexity of the Lanczos method is
7. Error Analysis
In this section, we analyze the recovery error of the DRMC algorithm and present a theoretical upper bound.
Before starting the analysis, we first introduce some assumptions and lemmas.
Assumption 6.
The original data matrix
Assumption 7.
Let
Then, there exists a constant
Lemma 8.
Suppose that
Proof.
Lemma 9.
Suppose that
Proof.
Consider
The upper bound of the recovery error is then given in the following theorem.
Theorem 10.
Suppose that
Proof.
When the iterative process in Algorithm 1 has converged,
Combining (43) and Lemma 8, we have
Then, applying Assumption 6 and Lemma 9 to (44), we have
Let
8. Evaluation
We designed a data gathering scheme based on the proposed DRMC algorithm. The data gathering procedure is similar to [11]. Firstly, sink node broadcasts a sampling rate to all sensor nodes. Secondly, each sensor node randomly and independently decides whether to forward its readings to the sink according to the sampling rate. Finally, the sink node collects the incomplete data matrix and uses DRMC to retrieve the missing data. After implementing this data gathering scheme by Matlab, we carried out extensive experiments on three real-world datasets (as shown in Table 1) to evaluate the effectiveness of DRMC.
8.1. Baseline Methods
We select two state-of-the-art methods to compare with DRMC. The first method is Compressive Sensing (CS). We choose the DCT matrix defined in (9) to serve as the orthonormal basis in CS. The second method is Spatiotemporal Compressive Data Collection (STCDG). The parameters of STCDG are set to
8.2. Recovery Accuracy
Firstly, we compared the recovery accuracy of the proposed algorithm with two baseline methods described above. The parameters of DRMC are listed in Table 2.
Parameter settings for DRMC.
Simulation experiments are carried out on three real-world datasets. Each simulation is conducted for 100 independent trials. The recovery errors are computed according to (6) and are averaged over the 100 trials.
Comparison results are shown in Figures 3–5. For experiments on Intel Temperature Trace, all methods achieve nearly the same recovery accuracy when the sampling rate is high. When the sampling rate is below a certain value (

Recovery accuracy on Intel Temperature Trace.

Recovery accuracy on Intel Humidity Trace.

Recovery accuracy on PARED Temperature Trace.
Comparison results on Intel Humidity Trace are very similar to that on Intel Temperature Trace. Recovery error of DRMC is about 9% when the sampling rate is 0.03, which is noticeably better than that of baseline methods.
For experiments on PARED Temperature Trace, DRMC still outperforms baseline methods for low sampling rate. The recovery error of DRMC is about
8.3. Parameter Settings
The DRMC algorithm depends on several input parameters. Clearly, the choice of these parameters will affect the recovery performance of DRMC. In this subsection, we discuss how to choose the parameters for DRMC.
The nuclear norm regularization parameter

Effect of parameter
Parameter

Effect of parameter
Recall that

Effect of parameter
8.4. Energy Consumption and Network Lifespan
DRMC-based data gathering protocol is more energy efficient because it transmits less packets than the classic one (receiving and forwarding). As a result, DRMC-based protocol can save more energy and prolong the lifespan of wireless sensor networks. To verify this, simulation experiments are conducted and the simulation configuration is shown in Table 3.
Simulation configuration for network lifespan.
In the simulation, sensor nodes are randomly deployed in a 500 m × 500 m area and the sink node is deployed in the center. Each sensor node is equipped with 1 J energy. To evaluate the energy consumption, we adopt the following energy model [29]:
Figure 9 demonstrates the network lifespan of DRMC-based protocol and other baseline protocols under different sampling rate. Note that the network lifespan is defined as the time when the first energy exhausted node appears. Apparently, the sampling rate does not play a role in the classic data gathering, since it directly transmits all data without compression. Therefore, the lifespan curve of the classic protocol in Figure 9 is a straight line. For the CS method, the smaller the sampling rate is, the less measurements are taken. And as shown in Figure 9, the lifespan of CS is decreasing with the sampling rate. However, when the sampling rate is above a certain value, the lifespan of CS is even worse than the classic one. The reason why CS performs badly for large sampling rate is well analyzed in [30]. Figure 9 shows that DRMC-based protocol achieves the best lifespan. Similarly, the lifespan of DRMC is decreasing with the sampling rate. When the sampling rate decreases to 1, DRMC-based protocol is equivalent to the classic protocol and the lifespan of DRMC is equal to the classic one. Note that the lifespan of STCDG is exactly the same as DRMC, because both of the two methods are based on matrix completion. The lifespan of DRMC is longer than CS because the sampling matrix in DRMC is much sparser than that in CS.

Relation of network lifespan to sampling rate.
9. Conclusion
In this paper, we studied the data gathering and reconstruction problem in WSNs. We modeled the problem as matrix completion problem and investigated the data features in real WSN datasets. Then, by taking advantage of the low-rankness and DCT compactness features in WSNs, we proposed a DCT Regularized Matrix Completion (DRMC) algorithm to reconstruct the missing data. The recovery error of DRMC is carefully analyzed and a theoretical error upper bound is presented. Experimental results show that DRMC outperforms state-of-the-art methods for low sampling rate and achieves a good recovery accuracy even if the sampling rate is very low.
