Abstract
Keywords
Introduction
Wireless sensor networks (WSNs) are regarded as a bridge between human society and the physical world, and they are widely deployed for monitoring and disseminating information about various phenomena of interest.1,2 A WSN can contain a large number of sensor nodes, but it has its own features: (1) sensor nodes are constrained in resources such as energy, computation, storage, and bandwidth; (2) sensor networks are error-prone, suffering from packet loss, transmission errors, and abnormal readings; and (3) the sensory data of the same monitoring field often have strong spatial–temporal correlation. One of the major challenges in designing sensor networks is to minimize the number of samples and the communication cost while still obtaining high-fidelity information at the sink.
In-network compression is an essential technique for reducing communication costs. Traditional compression techniques, such as transform-based compression and joint entropy compression, require sensor nodes with strong computational power and need to exchange side information among sensor nodes, so they are not well suited to data compression in WSNs. Compressive sensing (CS) is a new sampling and compression paradigm based on the fact that a relatively small number of linear combinations of a compressible or sparse signal can contain most of its salient information. To the best of our knowledge, existing periodical compressive data gathering techniques based on CS still separate the processes of sampling and compression.3–6 Such sampling-then-compression CS data gathering techniques bring several problems. First, the sampling ratio is relatively high because too many sensor nodes must take part in gathering one measurement. Second, the gathered data can easily be damaged because the sensor network is error-prone. Third, the number of CS measurements is difficult to control adaptively because the sink lacks trusted sensory data for comparison.
In this article, we present a random sampling zero-encoding data gathering model to reconstruct the sensory data for WSNs. It aims to perform compression and sampling simultaneously, improving the robustness of CS measurements and reducing the sampling rate. Our main contributions are as follows:
We present a random sampling zero-encoding data gathering model based on a virtual Gaussian energy diffusion model, which performs sampling and compression simultaneously and does not need to assign a projection matrix to sensor nodes
We show that the orthogonal Gaussian energy diffusion basis has good compression performance for spatially correlated signals. We also prove that the sampling matrix and the orthogonal Gaussian energy diffusion basis satisfy the restricted isometry property (RIP) condition with probability tending to 1
Based on the proposed random sampling zero-encoding data gathering model, we propose an efficient missing sensory data recovery scheme, which can significantly reduce the number of sampling sensors
The rest of this article is organized as follows. In section “Related works,” we present the related work. The foundations of CS are introduced in section “Basic of CS.” In section “Problem statement,” we give the problem statement and present random sampling zero-encoding data gathering model. In section “Random sampling zero-encoding sensory data reconstruction,” we propose two types of applications based on random sampling zero-encoding model in detail. Section “Experimental results” reports our experimental results, and the conclusions are given in section “Conclusion.”
Related works
Compressive data gathering
In recent years, many compressive data gathering techniques have been proposed. Baron et al. 7 proposed distributed CS, which enables new distributed coding algorithms for multi-signal ensembles that exploit both intra- and inter-signal correlation structures, and also gave three joint sparsity models. Rabbat and colleagues8,9 applied CS theory to single-hop data gathering in WSNs to obtain efficient compression of network data. Lee et al.10,11 proposed joint optimization of transport cost and reconstruction for spatially localized CS in multi-hop sensor networks. Luo et al.5,6 proposed compressive data gathering based on CS theory to effectively reduce communication costs and prolong network lifetime in large-scale monitoring sensor networks. In previous studies,3,4,12 the authors extended CS data gathering to dual-layer compressed aggregation and adapted the number of measurements during data gathering. To the best of our knowledge, existing research still separates the processes of sampling and compression. Such sampling-then-compression compressive data gathering techniques incur a high data transmission cost.
Incomplete sensory data recovery
Incomplete sensory data recovery falls into two groups: recovering missing sensory data from correctly received sensory data, and reconstructing the sensory data of the monitoring field from incomplete sampling. Missing sensory data can be obtained by retransmission techniques or reconstructed by spatial interpolation techniques.13 However, retransmission increases the network burden, which can lead to more packet loss and more energy consumption. To reconstruct the entire monitoring field from incomplete sampling, spatial correlation interpolation or transform domain interpolation can be used to reconstruct the missing samples.3,14–16 Sheikhhasan 14 and Umer et al. 15 presented distance-weighted interpolation techniques that reconstruct the missing samples based on the spatial correlation of the sensory data. Guo et al. 16 proposed a sparsity-based spatial interpolation algorithm via solving the
Basic of CS
CS is a new kind of compression and sampling paradigm. It asserts that a small number of linear projections of a sparse or compressible signal can contain sufficient information to reconstruct the signal.17–19 While Shannon–Nyquist sampling theory states that the sampling rate must be at least twice the maximum frequency to avoid losing information when capturing a signal, CS theory breaks through this bottleneck for sparse or compressible signals and makes simultaneous sampling and compression possible.
We assume that
where
where
Definition 1 (RIP)
Suppose
for all
For the second problem, the reconstruction
If the measurement vector
where
From the framework of CS theory, we can see that the CS codec scheme shifts the complexity from the encoder to the decoder and makes the encoder very simple.
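The asymmetry between encoder and decoder can be illustrated with a small numerical sketch. It is not taken from the article: the sizes `n`, `m`, `k`, the random orthonormal matrix `psi` used as a stand-in sparse basis, the Gaussian projection matrix `phi`, and the use of orthogonal matching pursuit (OMP) as the decoder are all assumptions made for illustration. The encoder is a single matrix-vector product, while the decoder runs an iterative sparse recovery.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily pick k columns of A to explain y."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Select the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    theta = np.zeros(A.shape[1])
    theta[support] = coef
    return theta

rng = np.random.default_rng(0)
n, m, k = 256, 100, 5            # signal length, measurements, sparsity (assumed values)

# k-sparse coefficient vector and a stand-in orthonormal sparse basis (assumption).
theta = np.zeros(n)
theta[rng.choice(n, k, replace=False)] = rng.uniform(1.0, 2.0, k) * rng.choice([-1, 1], k)
psi = np.linalg.qr(rng.standard_normal((n, n)))[0]
x = psi @ theta                  # compressible signal

phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random projection matrix (assumption)
y = phi @ x                      # encoder: one matrix-vector product

theta_hat = omp(phi @ psi, y, k) # decoder: iterative sparse recovery
x_hat = psi @ theta_hat
max_err = float(np.max(np.abs(x_hat - x)))
```

Here `max_err` measures how far the decoded signal is from the original; all of the iterative work sits in `omp`, on the decoder side.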
Problem statement
In this article, we focus on periodical compressive data gathering in large-scale WSNs. According to CS theory, the recovery of a compressible signal first recovers its corresponding sparse signal and then recovers the original signal by the inverse sparse basis transformation.
In this section, we shall establish an energy diffusion model to meet the above conditions and to prove that a single sample value can be considered as a CS measurement.
Basic assumption
To simplify the problem statement, we make the following reasonable assumptions for periodical compressive data gathering in WSNs:
The monitoring field contains
All sensors sample once in a given periodical time interval, and each such interval is called a round of data gathering
The monitoring field is partitioned into
For each round data gathering, the monitoring data of
Problem formulation
To transform spatially correlated signals into energy diffusion sources, we need to establish an energy diffusion model. This model does not need to match a real energy diffusion process because the energy sources are virtual. For the same spatially correlated signal, selecting a different energy diffusion model means a different distribution of energy sources in the monitoring field. Without loss of generality, we define the energy diffusion model as the Gaussian model
where
We denote Gaussian energy sources by
Since there exist
where
where
where
If we can exploit
and
where
Does
satisfy sparsity?
In this subsection, we display that
For any spatially correlated signal

The element mean and element variance of every row of
Figure 2 shows the compression results for two types of signals. In Figure 2(a), the signal comes from a block of pixel values of “Lena,” which can be considered a strongly spatially correlated signal. In Figure 2(c), the signal comes from GreenOrbs 22 within the same round of data gathering, sorted by sensor node Mote_ID, which can be considered a weakly spatially correlated signal. Figure 2(b) and (d) illustrate that the energy of the transformed coefficients under the orthogonal Gaussian energy diffusion basis is mainly concentrated in a few elements. Based on the above analysis, we can consider that the orthogonal energy diffusion basis compresses spatially correlated signals.
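The kind of energy-concentration check described above can be sketched numerically. Everything specific in this sketch is an assumption rather than the article's construction: the cells are laid out on a line, the kernel is `exp(-d^2 / (2 * sigma^2))` with an arbitrary `sigma`, the kernel matrix is orthogonalized through its singular vectors, and the test signal is synthetic. The helper names `gaussian_diffusion_basis` and `energy_fraction` are hypothetical.

```python
import numpy as np

def gaussian_diffusion_basis(n, sigma):
    """Orthonormal basis derived from a Gaussian kernel on n positions (assumed form)."""
    pos = np.arange(n)
    dist2 = (pos[:, None] - pos[None, :]) ** 2
    kernel = np.exp(-dist2 / (2.0 * sigma ** 2))
    # Orthogonalize through the left singular vectors of the kernel matrix.
    u, _, _ = np.linalg.svd(kernel)
    return u

def energy_fraction(coeffs, k):
    """Share of total energy carried by the k largest-magnitude coefficients."""
    energy = np.sort(coeffs ** 2)[::-1]
    return float(energy[:k].sum() / energy.sum())

n = 128
basis = gaussian_diffusion_basis(n, sigma=4.0)

# Synthetic spatially correlated signal (not data from the article).
t = np.arange(n)
signal = (20.0
          + 5.0 * np.exp(-(t - 40) ** 2 / (2 * 15.0 ** 2))
          + 3.0 * np.exp(-(t - 90) ** 2 / (2 * 10.0 ** 2)))

coeffs = basis.T @ signal        # transformed coefficients
fractions = {k: energy_fraction(coeffs, k) for k in (1, 4, 16, n)}
```

Inspecting `fractions` for small `k` indicates how compressible the signal is under this particular basis; a different kernel width or orthogonalization would give different numbers.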

The transformed coefficient signal
Does
obey RIP?
In this subsection, we first present the statistical properties of

The element mean and element variance of every row of
In the following part, we shall prove that
Definition 2 (sub-Gaussian)
A random variable
holds for all
Corollary 1
If the
Theorem 1
Fix
for all
According to Theorem 1, we know that
Random sampling zero-encoding sensory data reconstruction
In the above section, we have analyzed that each sample can be considered as a CS measurement under Gaussian energy diffusion model. In this section, we present a random sampling zero-encoding data gathering scheme according to the above theory discussion. Our proposed compressive data gathering model can be applied to two types of practical applications: (1) recover the missing sensory data due to packet loss, transmission error, abnormal reading, and so on and (2) extend the monitoring field using incomplete random sampling.
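The statement that each raw sample can act as a CS measurement can be explored empirically. The sketch below is only an illustration, not a proof and not the article's basis: it uses a random orthonormal matrix `psi` as a stand-in, draws random sparse coefficient vectors, keeps `m` randomly chosen rows (the reporting sensors), and records the scaled energy ratio between the samples and the coefficients. All sizes are assumed values.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k, trials = 256, 96, 8, 200     # cells, reporting sensors, sparsity, repetitions

# Stand-in orthonormal basis (assumption; the article uses a Gaussian diffusion basis).
psi = np.linalg.qr(rng.standard_normal((n, n)))[0]

ratios = np.empty(trials)
for t in range(trials):
    theta = np.zeros(n)
    theta[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    rows = rng.choice(n, m, replace=False)   # which sensors report in this round
    y = psi[rows] @ theta                    # raw samples, no encoding at the nodes
    # Scale by n/m so that a perfectly isometric sampling would give a ratio of 1.
    ratios[t] = (n / m) * (y @ y) / (theta @ theta)

mean_ratio = float(ratios.mean())
spread = (float(ratios.min()), float(ratios.max()))
```

Ratios that stay close to 1 across trials are consistent with RIP-like behavior for this stand-in basis; they do not replace the formal argument.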
Efficiently recovering missing sensory data
Because a sensor network is error-prone, packet loss, transmission errors, and abnormal readings are very common, especially in large-scale WSNs. First, we analyzed the sensory data of two real sensor systems to illustrate the statistics of missing sensory data. We selected 45 rounds of sensory data from GreenOrbs 22, which contains 330 sensor nodes, and from the Intel Berkeley Research lab 25, which contains 54 sensor nodes. In this subsection, missing sensory data refer only to lost packets and abnormal readings. Figure 4(a) displays the missing sensory data ratio of 45 consecutive rounds of GreenOrbs data starting from 18 December 2010, 00:00, with a 10-min interval for each round. Figure 4(b) shows the missing sensory data ratio of 45 consecutive rounds of Intel Berkeley Research lab data from epoch 6486 to 6528. Figure 4 illustrates that missing sensory data are very common and that the ratio can exceed 50%, as in the Intel Lab Data. Missing sensory data at this scale seriously affect the overall monitoring results. Retransmission techniques are commonly used to handle packet loss. They can effectively resolve packet loss when the load on the corresponding sensor node and communication link is light. Otherwise, packet retransmission can lead to more packet loss.

The missing sensory data ratio of two real sensor networks: (a) missing sensory data ratio of 45 consecutive rounds of GreenOrbs data with a 10-min interval for each round and (b) missing sensory data ratio of 45 consecutive rounds of Intel Lab Data from epoch 6486 to 6528.
To recover the missing sensory data, we designed a post-processing sensory data recovery scheme based on our proposed random sampling zero-encoding data gathering model. In this scheme, lost packets do not need to be retransmitted, which reduces the load on the sensor network. Our proposed missing sensory data recovery scheme includes the following steps:
Partition the monitoring field into
Record the missing sensory data;
Reconstruct all sensory data based on our proposed random sampling zero-encoding data gathering scheme using correctly received sensory data;
Extract the missing sensory data from the reconstructed
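The steps above can be sketched end to end on synthetic data. This is a minimal illustration under stated assumptions, not the article's implementation: `n = 256` grid cells, a random orthonormal matrix `psi` standing in for the orthogonal basis, an exactly `k`-sparse coefficient vector, a 40% random loss rate, and a compact CoSaMP-style solver written here for the example.

```python
import numpy as np

def cosamp(A, y, k, iters=50, tol=1e-10):
    """CoSaMP-style greedy recovery of a k-sparse theta from y = A @ theta."""
    n = A.shape[1]
    theta = np.zeros(n)
    residual = y.copy()
    for _ in range(iters):
        proxy = A.T @ residual
        candidates = np.argsort(np.abs(proxy))[-2 * k:]           # 2k strongest columns
        support = np.union1d(candidates, np.flatnonzero(theta))   # merge with current support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)  # fit on merged support
        keep = np.argsort(np.abs(coef))[-k:]                      # prune to the k largest
        theta = np.zeros(n)
        theta[support[keep]] = coef[keep]
        residual = y - A @ theta
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break
    return theta

rng = np.random.default_rng(2)
n, k = 256, 8                                        # grid cells and sparsity (assumed)
psi = np.linalg.qr(rng.standard_normal((n, n)))[0]   # stand-in orthogonal basis (assumption)
theta = np.zeros(n)
theta[rng.choice(n, k, replace=False)] = rng.uniform(1.0, 2.0, k)
field = psi @ theta                                  # synthetic readings of all n cells

# Record which readings are missing (random loss, assumed 40%).
missing = rng.random(n) < 0.4
received = ~missing

# Reconstruct all readings using only the correctly received ones.
theta_hat = cosamp(psi[received], field[received], k)
field_hat = psi @ theta_hat

# Extract only the missing readings from the reconstruction.
recovered = field_hat[missing]
max_err = float(np.max(np.abs(recovered - field[missing])))
```

Note that the sink never builds a projection matrix for the nodes: the rows of `psi` that enter the solver are determined solely by which readings arrived.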
In this data recovery scheme, the number of grid cells
If the missing sensory data can be considered as randomly distributed in all sensory data, then the correctly received sensory data can also be considered as random distribution. We can use
and
where
where
If the missing sensory data cannot be considered randomly distributed, such as consecutive sensory data loss corresponding to a spatial block of the sensory field, we can transform this case into a randomly distributed one by reshuffling techniques, which can also improve the compressibility of the original sensory data. In the following, we give a sorting method based on the Gaussian energy diffusion model, which is called Gaussian sort (GS). Figure 5 shows the GS order of

Gaussian energy diffusion order—GS order.
Extend monitoring field using incomplete random sampling
Obtaining monitoring data over a larger area with a limited number of sensor nodes is also an important research issue for sensor networks. For example, to obtain the sensory data of the entire monitoring field as an “image,” deploying a sensor at each “pixel” of the monitoring field would require too many sensors, which may be impossible considering the scale of the sensor network and sensor node deployment. In this subsection, we propose a scheme that extends the monitoring field using incomplete random sampling, based on our proposed random sampling zero-encoding data gathering model.
To implement our proposed scheme, we assume that the monitoring field is a two-dimensional (2D) plane and that the sink knows the location of each sensor node. To simplify the reconstruction scheme, we also assume that the WSN is ideal and that no sensory data are missing. The process of extending the monitoring field using incomplete random sampling contains the following steps:
Divide the entire monitoring field into
Assign the
The sink gathers
The sink reconstructs the monitoring data of
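The first steps, dividing the field into cells and relating sensors to cells, can be sketched as a mapping from known sensor locations to grid-cell indices, from which the sink can build a row-selection sampling matrix. The function names `cell_indices` and `sampling_matrix`, the rectangular field, the row-major cell numbering, and the assumption of at most one sensor per cell are all hypothetical choices for illustration.

```python
import numpy as np

def cell_indices(xy, width, height, rows, cols):
    """Map sensor coordinates to flat, row-major grid-cell indices."""
    xy = np.asarray(xy, dtype=float)
    # Clamp so that points on the far edge fall into the last row or column.
    col = np.minimum((xy[:, 0] / width * cols).astype(int), cols - 1)
    row = np.minimum((xy[:, 1] / height * rows).astype(int), rows - 1)
    return row * cols + col

def sampling_matrix(cells, n_cells):
    """Row-selection matrix: one row per sampled cell with a single 1 in that cell's column."""
    phi = np.zeros((len(cells), n_cells))
    phi[np.arange(len(cells)), cells] = 1.0
    return phi

# Four sensors in a 10 x 10 field divided into a 4 x 4 grid (assumed values).
sensors = [(0.5, 0.5), (3.2, 1.7), (9.99, 9.99), (10.0, 0.0)]
cells = cell_indices(sensors, width=10.0, height=10.0, rows=4, cols=4)
phi = sampling_matrix(cells, n_cells=16)
```

Because `phi` only selects entries, the sensors transmit raw readings and the sink needs nothing beyond their locations to form the measurement model.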
For step 1, we first must divide the entire monitoring field into
and
where
Experimental results
In this section, we conduct extensive experiments to evaluate the performance of our proposed zero-encoding sensory data gathering model in two respects: the performance of missing data recovery and the performance of extending the monitoring field using incomplete sensory data. Before the experiments, we describe the experimental data sets and the performance metrics. To obtain experimental results on a variety of real data, we use four different ways to generate the real data sets for our experiments:
In our experiments, we use the CoSaMP 21 algorithm to solve
where
Real sensed data recovery
To evaluate the performance of our proposed missing sensory data recovery scheme, the experimental data sets are selected from the Intel Lab Data 25 and GreenOrbs data. 22 The CS technique cannot be applied directly to a single round of data gathering because the Intel Lab Data contains only 54 sensor nodes. In our experiment, we therefore treat multi-round sensory data as a single signal to carry out our proposed missing data recovery scheme. In the following, we carry out our experiment on three types of sensory data under Mote_ID order and GS order. The three types of sensory data are as follows:
Six consecutive rounds of temperature sensory data (epochs 6520–6525) from the Intel Lab Data, containing 208 correctly received sensory data and 116 missing sensory data
One round of temperature sensory data (19 December 2010, 14:30–14:40) from the GreenOrbs data, containing 274 correctly received sensory data and 40 missing sensory data
Randomly selected missing sensory data from the 256 correctly received temperature sensory data of the same GreenOrbs round (19 December 2010, 14:30–14:40)
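For the first data set, treating several rounds as one signal amounts to concatenating the per-round readings and keeping track of which entries arrived. The sketch below uses synthetic values, not the Intel Lab Data; the helper name `stack_rounds`, the use of NaN to mark missing readings, and the 35% loss rate are assumptions.

```python
import numpy as np

def stack_rounds(rounds):
    """Concatenate per-round readings (NaN marks a missing value) into one signal and a mask."""
    signal = np.concatenate([np.asarray(r, dtype=float) for r in rounds])
    received = ~np.isnan(signal)
    return signal, received

rng = np.random.default_rng(3)
n_nodes, n_rounds = 54, 6                               # 54 nodes over six rounds

rounds = []
for _ in range(n_rounds):
    readings = 20.0 + rng.standard_normal(n_nodes)      # synthetic temperatures
    readings[rng.random(n_nodes) < 0.35] = np.nan       # synthetic losses (assumed rate)
    rounds.append(readings)

signal, received = stack_rounds(rounds)
n_received = int(received.sum())
n_missing = int((~received).sum())
```

With 54 nodes and six rounds the stacked signal has 324 entries, matching the 208 + 116 split reported for the real data; the counts in this synthetic run will differ.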
Figure 6 shows the original continuous six rounds temperature sensory data of Intel Lab Data and its recovery sensory data under Mote_ID order and GS order. In Figure 6(a), there are 324 temperature sensory data which contain 208 correctly received sensory data and 116 missing sensory data; the missing temperature sensory value is set to

(a) Continuous six rounds sensory data come from Intel Lab Data sorted by Mote_ID; (b) and (c) are the recovery sensory data using Mote_ID order and GS order, respectively; (d) 208 correctly received sensory data; and (e) and (f) are the recovery sensory data corresponding to the correctly received sensory data using Mote_ID order and GS order, respectively.
Because the Mote_ID-ordered sensory data of the Intel Lab Data can be considered a strongly spatially correlated signal, good recovery performance is achieved. In the following, we use GreenOrbs data to evaluate our proposed missing sensory data recovery scheme. Figure 7 shows one round of temperature sensory data of GreenOrbs and its recovered sensory data under Mote_ID order and GS order. In Figure 7(a), there are 274 correctly received sensory data and 40 missing sensory data which are denoted by

(a) One round of sensory data from GreenOrbs data sorted by Mote_ID; (b) and (c) are the recovered sensory data using Mote_ID order and GS order, respectively; (d) 274 correctly received sensory data; and (e) and (f) are the recovered sensory data corresponding to the correctly received sensory data using Mote_ID order and GS order, respectively.
To better display the comparison between the original and the recovered sensory data, we randomly selected missing sensory data from 256 correctly received sensory data of the GreenOrbs data set to evaluate our proposed missing data recovery scheme. Figure 8 illustrates that the recovery performance of GS order significantly outperforms that of Mote_ID order under different numbers of CS measurements. In fact, our proposed random sampling zero-encoding data gathering model can be used not only for missing sensory data recovery but also to reduce the number of samples in WSN data gathering; namely, we can randomly select a subset of sensors to take part in sampling during each round of data gathering.
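The evaluation protocol described here, hiding known readings so that the recovery error can be measured against ground truth, can be sketched as follows. The data are synthetic, the measurement counts are arbitrary, and `np.interp` is used purely as a placeholder recovery step so that the sketch stays short; it is not the article's scheme. The mean absolute error (MAE) is computed in its usual form, the mean of the absolute differences.

```python
import numpy as np

def mean_absolute_error(original, recovered):
    """Mean of absolute differences between original and recovered values."""
    original = np.asarray(original, dtype=float)
    recovered = np.asarray(recovered, dtype=float)
    return float(np.mean(np.abs(original - recovered)))

rng = np.random.default_rng(4)
n = 256
t = np.arange(n)
truth = 15.0 + 3.0 * np.sin(2 * np.pi * t / n)        # synthetic stand-in for 256 known readings

mae_by_m = {}
for m in (64, 128, 192):                              # numbers of measurements (assumed)
    kept = np.sort(rng.choice(n, m, replace=False))   # readings treated as received
    hidden = np.setdiff1d(t, kept)                    # readings treated as missing
    # Placeholder recovery by linear interpolation (not the article's method).
    estimate = np.interp(hidden, kept, truth[kept])
    mae_by_m[m] = mean_absolute_error(truth[hidden], estimate)
```

Replacing the interpolation line with a CS reconstruction under Mote_ID order or GS order would give the kind of comparison that Figure 8 reports.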

The MAE comparison between Mote_ID order and GS order of sensory data under different number of measurements.
Meteorological data reconstruction
In this subsection, we evaluate the performance of our proposed extension of the monitoring field using incomplete random sampling over a set of temperature distribution data provided by WorldClim. 26 The temperature data set partitions the global surface into

A snapshot of mean monthly surface temperature in September over global land areas, excluding Antarctica.

(a) Original temperature data and (b), (c), (d), (e), and (f) are the reconstructed temperature data corresponding to
The

The reconstructed errors corresponding to Figure 10(b)–(f).
Conclusion
This article investigates the problem of sensory data reconstruction in WSNs based on CS. We first proposed a random sampling compressive data gathering model based on a virtual Gaussian energy diffusion model. Then, we showed that the orthogonal Gaussian energy diffusion basis compresses spatially correlated signals well and proved that the product of the sampling matrix and the orthogonal Gaussian energy diffusion basis satisfies the RIP of CS. Our proposed random sampling compressive data gathering model makes simultaneous sampling and compression possible and does not need to assign a projection matrix to sensor nodes. The experimental results show that our proposed random sampling zero-encoding data gathering model performs well.
