1. Introduction
Wireless sensor networks (WSNs) have emerged as a research hot spot in recent years. In a WSN, the resources of each sensor node are limited, so reducing energy consumption and extending node lifetime is a major challenge. The energy cost of transmitting images remains a main factor affecting the lifetime of a sensor node. To reduce the bandwidth and energy consumed by image transmission, a more effective image compression method is needed.
Current image compression algorithms in WSNs are sensitive to random changes in image content, and it is unrealistic to describe the variety of real-world images with a single image model. To address this issue, neural networks have been adopted in WSNs to compress images. As an important branch of neural networks, Deep Learning [1, 2] offers many computational models, and the Restricted Boltzmann Machine (RBM) [3, 4] is one of its principal models. When a multilayer RBM based network is used to compress images, the quality of the compressed images depends strongly on the likelihood each RBM layer assigns to the training data. Moreover, the training complexity of the RBM has a great effect on the energy consumption of image compression coding.
Most current RBM training algorithms rely on large quantities of samples drawn with the Markov Chain Monte Carlo (MCMC) method. The average joint probability between visible and hidden units is estimated from these samples without calculating the normalizing parameter. However, when MCMC sampling is conducted, enough state transitions are required to ensure that the acquired samples follow the target distribution, and large quantities of samples are needed to improve the accuracy of the estimate, which makes RBM training difficult. To address these problems, an alternative iteration algorithm is applied to the RBM training process.
We adopt the alternative iteration algorithm in the RBM training process. In this algorithm, the normalizing parameter is treated as another unknown parameter, so the likelihood function can be split into two subfunctions: one concerning the normalizing parameter and the other concerning the model distribution parameter. The model distribution parameter to be estimated is computed alternately with the normalizing parameter and is eventually obtained through an efficient, low-complexity training process. This algorithm improves the likelihood of the RBM for the training data.
Furthermore, we apply the improved RBM training process to image compression in WSNs and present a multilayer improved-RBM based image compression method. This method extracts more abstract data for coding based on image features and achieves a better compression effect. In the simulations, the reconstructed image quality of the multilayer RBM network is superior to that of a comparison image compression method under the same compression ratio, as detailed in Section 5. At the same time, the proposed method reduces the energy consumption of the image data transmission process.
The rest of the paper is organized as follows. Section 2 discusses related work on image compression and RBM training algorithms. Section 3 presents the basic idea of the multilayer RBM network based image compression method. The RBM model and the improved RBM algorithm based on alternative iteration are described in Section 4. The performance of the proposed algorithm is compared with some typical algorithms in Section 5. Finally, conclusions and future work are presented in Section 6.
2. Related Work
Typical image compression algorithms include space-time correlation based compression, wavelet transform based compression, distributed compression, and improved versions of traditional compression algorithms.
The space-time correlation based data compression algorithm mainly includes prediction coding and linear fitting of time series. A prediction coding method is proposed in [5]; it can effectively estimate the source data based on its temporal correlation, but it does not handle the transmission of large amounts of image data. Reference [6] proposes a curve-fitting based data stream compression method that compresses data collected at each node and restores it at the base station; however, the method is very complex and does not consider the transmission delay at each sensor node. Reference [7] presents a space-time data compression technique based on a simple linear regression model, which can eliminate data redundancy at single nodes and collector nodes, respectively; however, only data that satisfies the error requirement is considered, and abnormal data is not handled.
Wavelet transform is a time-frequency analysis method which is superior to traditional signal analysis methods. Reference [8] considers the existence of stream data in the data transmission of sensor networks. It compresses data by using wavelet transform based on the data aggregation and the DC routing algorithm. In [9, 10], a ring model based distributed time-space data compression algorithm and a wavelet based self-fitting data compression algorithm are proposed. Storage efficient two-dimension and three-dimension continuous wavelet data compression methods are proposed in [11]. They are based on the ring model of fitting sensor network wavelet transform and the overlapping cluster partition model, respectively. They are storage efficient and can save the transmission energy consumption in networks.
The distributed data compression algorithm builds on the fact that both centralized and decentralized information services can be implemented. Its characteristic feature is reducing the amount of data through cooperative work among different sensor nodes. A chain-model based distributed data compression algorithm is proposed in [12]; it designs a chain model suitable for wavelet transform and supports wavelet functions of arbitrary length.
Traditional lossless data compression methods mainly include Run Length Encoding, Huffman coding, dictionary-based compression, and arithmetic coding. These methods are mainly used on powerful computers or workstations, whereas in sensor networks the processing capacity and memory of each node are limited, so the traditional compression algorithms must be optimized. In [13], the difference between two consecutive sensed data items is encoded with a self-adaptive Huffman coding algorithm. Reference [14] proposes a region of interest (ROI) based lossy-lossless image compression method that applies different coding schemes to the small, important region and the remaining large region; in this way, the compression ratio is improved while sensitive information is preserved.
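To make the simplest of the traditional lossless methods above concrete, the following is a minimal sketch of Run Length Encoding (the function names are ours, not taken from the cited works):

```python
def rle_encode(data):
    """Run Length Encoding: collapse each run of equal values into a
    (value, run_length) pair."""
    encoded, i = [], 0
    while i < len(data):
        run_length = 1
        while i + run_length < len(data) and data[i + run_length] == data[i]:
            run_length += 1
        encoded.append((data[i], run_length))
        i += run_length
    return encoded

def rle_decode(pairs):
    """Inverse of rle_encode: expand (value, run_length) pairs."""
    return [value for value, run_length in pairs for _ in range(run_length)]
```

RLE pays off only when the data contains long runs, which is why sensor-network variants such as [13] first encode differences between consecutive readings.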
In recent years, Deep Learning (DL) has been widely used in WSNs for image compression. DL extracts features of the data from low layers to high layers by modeling the layered analysis performed in the human brain. However, the effect of DL-based image compression depends on the likelihood of the RBM for the training data and on the training complexity of the RBM. Therefore, an improved RBM training algorithm is also proposed in this paper.
Researchers have conducted many studies on RBM training algorithms. In 2002, Hinton proposed a fast RBM learning algorithm, Contrastive Divergence (CD) [15]. It is a highly efficient approximate learning algorithm, but the RBM model obtained by CD is not a maximum entropy model and does not achieve a high likelihood for the training data [16].
In 2008, Tijmen Tieleman proposed the Persistent Contrastive Divergence (PCD) algorithm [17], which remedies the deficiency of CD: it is as efficient as CD and does not violate maximum likelihood learning. In addition, the RBM obtained by PCD training has a more powerful pattern generation capacity. In 2009, Tieleman and Hinton further improved PCD [18] and proposed the Fast Persistent Contrastive Divergence (FPCD) algorithm, in which a group of auxiliary parameters is introduced to improve the mixing rate of the Markov chain in PCD. Another group of parameters, which are called Fast Weights and denoted by
RBM learning algorithms based on tempered MCMC sampling have also appeared in recent years. A parallel tempering algorithm for RBMs is introduced in [19]. This algorithm maintains a state for each distribution at a given temperature; during state transitions, the low-temperature state can be swapped with a high-temperature state. In this way, the low-temperature chain has a good chance of reaching a remote mode, so the whole distribution can be sampled. In 2014, Xu et al. proposed a tempering-based MCMC method, Tempered Transition [20], to learn the RBM model. Its main idea is to maintain the current state in the target distribution; when a new state is proposed, state transitions are carried out step by step from low to high temperature, which weakens the pull of the current mode on the state, and a final group of transitions from high back to low temperature is conducted until the original temperature is restored. In essence, both algorithms improve RBM training efficiency by adopting tempering-based MCMC sampling [21].
3. Image Compression Using Multilayer RBM Network
In a WSN, the data transmission process can be divided into two parts: the data compression encoding process and the data decoding process. The image sending and receiving process in a WSN is shown in Figure 1.

The image sending and receiving process.
The basic idea of the multilayer based RBM network data compression encoding method is as follows: firstly, an image whose pixel is
Then, the image matrix

Basic idea of image compression using multilayer RBM.
The bottom input layer consists of
Image decoding is the inverse of the image compression coding process. The compressed image is input into the topmost layer and decoded layer by layer; finally, the bottom layer outputs the reconstructed image.
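The layer-by-layer coding and decoding described above can be sketched as follows. This is a simplified, deterministic illustration with hypothetical layer sizes, mean-field probabilities in place of stochastic units, and untrained weights; it shows only the shape of the data flow, not the paper's trained network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class StackedRBMCodec:
    """Each layer holds a weight matrix W plus hidden/visible biases.
    Layer sizes shrink toward the top, whose activations form the
    compressed code; decoding runs the layers in reverse."""
    def __init__(self, layer_sizes, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.layers = [
            (rng.normal(0.0, 0.01, (n_in, n_out)),  # W
             np.zeros(n_out),                       # hidden bias
             np.zeros(n_in))                        # visible bias
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
        ]

    def encode(self, v):
        # Forward pass: image vector -> progressively smaller codes.
        for W, b_h, _ in self.layers:
            v = sigmoid(v @ W + b_h)
        return v

    def decode(self, h):
        # Inverse pass: top-layer code -> reconstruction at the bottom.
        for W, _, b_v in reversed(self.layers):
            h = sigmoid(h @ W.T + b_v)
        return h

codec = StackedRBMCodec([64, 32, 16])   # hypothetical sizes
code = codec.encode(np.random.default_rng(1).random(64))
recon = codec.decode(code)              # same length as the input: 64
```

The 64-to-16 shrinkage here plays the role of the compression ratio discussed in Section 5.2.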
The main part of the improved image compression coding method is RBM. It is essential to improve the likelihood of RBM for image data so as to ensure high similarity between the original image and the image after decoding. Therefore, we have improved the training method of RBM.
4. An Improved RBM Algorithm Based on Alternative Iteration Algorithm
4.1. The RBM Model
An RBM can be viewed as an undirected graph model [22]. As shown in Figure 3,

Graph of RBM model.
Assume that there are
The object of RBM learning process is to determine
The stochastic gradient ascent is usually used to get the optimum parameter
From current research, the approximate value of the joint distribution can be obtained by sampling methods such as Gibbs sampling [24] and CD. However, most of these methods make RBM training very complex because of frequent state transitions and large quantities of sampling. We propose applying the alternative iteration algorithm to RBM training: when the model parameter cannot be calculated directly because of other unknown parameters, alternative iteration can obtain the maximum likelihood estimates of these parameters through an iterative strategy.
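For reference, the sampling-based estimation discussed above relies on block Gibbs transitions between the visible and hidden layers. A minimal sketch for binary units follows; the variable names and parameter values are ours, chosen only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_h, b_v, rng):
    """One block Gibbs transition for a binary RBM: sample h ~ P(h|v),
    then a new visible state v' ~ P(v|h). Chaining these steps is how
    MCMC-based trainers draw the model-side samples for the gradient."""
    p_h = sigmoid(v @ W + b_h)                  # P(h_j = 1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b_v)                # P(v_i = 1 | h)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, (6, 3))                # 6 visible, 3 hidden units
v = (rng.random(6) < 0.5).astype(float)
v, h = gibbs_step(v, W, np.zeros(3), np.zeros(6), rng)
```

The cost the paper objects to is exactly the need to repeat this transition many times per gradient update.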
4.2. The Alternative Iteration
The alternative iteration algorithm is a common method to solve optimization problems. For instance, there is a maximization problem
The alternative iteration algorithm is adopted in this paper to solve the problem of RBM training. The likelihood function
The traditional way of getting the value of
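The alternating strategy can be illustrated on a toy maximization problem: each variable is updated in turn while the other is held fixed. This is only a generic coordinate-ascent sketch with a made-up objective, not the paper's specific subfunctions for the normalizing and model parameters:

```python
def alternating_maximize(grad_x, grad_y, x, y, lr=0.1, iters=500):
    """Coordinate ascent: climb in x with y held fixed, then climb in y
    with the updated x held fixed, and repeat."""
    for _ in range(iters):
        x = x + lr * grad_x(x, y)   # subproblem 1: y fixed
        y = y + lr * grad_y(x, y)   # subproblem 2: new x fixed
    return x, y

# Toy objective f(x, y) = -(x - 1)^2 - (y - 2)^2, maximized at (1, 2).
gx = lambda x, y: -2.0 * (x - 1.0)
gy = lambda x, y: -2.0 * (y - 2.0)
x, y = alternating_maximize(gx, gy, 0.0, 0.0)
```

In the paper's setting, x plays the role of the normalizing parameter and y the model distribution parameter, with each subproblem solved while the other quantity is treated as known.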
4.3. The Process of Calculating RBM Parameters by Alternative Iteration
Assume that there is a group of training samples
When the algorithm begins, the model parameter is firstly initialized by
When sample
Multiply the denominator and numerator of the right fraction in (6) by
We can deduce from the Jensen inequality and the property concave function the following equation:
Equation (8) is true if and only if
According to
When
The normalized parameter can be estimated and we get a value
At this time, we need to choose the equation to calculate
We can get the marginal distribution of the joint probability distribution
Then, we keep
However, the initial value
The improved RBM algorithm is described in Algorithm 1.
Setting the convergence threshold
Input: the value of the model parameter obtained by pretraining; the number of iterations
Output: the final value of the model parameter
For (outer loop over alternations)
    For (inner loop over iterations)
        Computing (the estimate of the normalizing parameter)
        Computing (the update of the model parameter)
        Judging whether the reconstruction error of the model reaches the threshold
    End for
    Judging whether the difference of the likelihoods of the two RBMs defined by the adjacent parameters is within the threshold
End for
5. Simulation Experiments and Results Analysis
The experiment consists of three parts: the performance analysis of RBM; the analysis of the compression performance of the proposed image compression method and the evaluation of reconstructed image quality; the analysis of energy consumption in WSNs when multilayer RBM network image compression method is used. MATLAB 2013a is used to carry out the simulations.
5.1. Performance Analysis of the Improved RBM
The datasets of our experiment are the well-known MNIST handwritten digit database [25] and a toy dataset. The MNIST dataset consists of 50,000 groups of training samples and 10,000 groups of testing samples. Each group of samples consists of a grayscale image whose resolution is

Part of samples of MNIST database.
Compared with the MNIST dataset, the toy dataset is simpler and lower dimensional. It consists of 10,000 images. Each image has
We compare the proposed algorithm with PCD algorithm, parallel tempering algorithm (PT-K) [27], and parallel tempering with equienergy moves (PTEE) [28] in the experiments. In PT-K,
We evaluate the quality of each algorithm by the likelihood of the RBM for the training data, using two methods: the reconstruction error and exact computation by enumerating the states of the hidden units.
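The reconstruction error used in the first evaluation method can be computed as below. This sketch assumes the common mean-field definition (propagate the data up to the hidden layer and back down once, then average the squared difference); the paper may use a slightly different variant:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_error(V, W, b_h, b_v):
    """Mean-field reconstruction error over a batch V (rows are samples):
    map the data to hidden probabilities, map those back to visible
    probabilities, and average the squared difference from the input."""
    P_h = sigmoid(V @ W + b_h)
    V_recon = sigmoid(P_h @ W.T + b_v)
    return float(np.mean((V - V_recon) ** 2))

# With all parameters zero the reconstruction is uniformly 0.5,
# so binary data gives an error of exactly 0.25.
V = np.array([[0.0, 1.0], [1.0, 0.0]])
err = reconstruction_error(V, np.zeros((2, 3)), np.zeros(3), np.zeros(2))
```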
Firstly, we compare the reconstruction errors of the four algorithms with different numbers of hidden units on the MNIST dataset and the toy dataset. The first 30,000 groups of samples in MNIST are divided into three parts, each containing 10,000 groups of samples. The number of hidden units is set to 10, 15, 20, 25, 30, 50, 100, 200, 250, 300, and 350. The number of iterations on each part ranges from 1 to 45, and the average reconstruction errors of the three parts after 15 and 30 iterations are shown, respectively, below. The experiments on the toy dataset are then executed with the number of hidden units set to 10, 15, 20, 25, 30, 50, 100, 150, 200, 250, and 300. Results obtained by PT-10, PTEE-10, PCD, and the proposed algorithm are shown in Figures 5–8.

The reconstruction errors of the four algorithms after 15 times of iterations on the MNIST dataset.

The reconstruction errors of the four algorithms after 30 times of iterations on the MNIST dataset.

The reconstruction errors of the four algorithms after 15 times of iterations on the toy dataset.

The reconstruction errors of the four algorithms after 30 times of iterations on the toy dataset.
From Figures 5 and 6, we can see that the average reconstruction error of the proposed algorithm is always smaller than the other three algorithms on the MNIST dataset. And we can get similar results from Figures 7 and 8.
Figures 5–8 show that the reconstruction errors of all four algorithms decrease as the number of hidden units increases. With few hidden units, the reconstruction error of the proposed algorithm is close to that of the other three algorithms; as the number of hidden units grows, however, the superiority of the proposed algorithm gradually appears. We can also compare how much PT-10, PTEE-10, and the proposed algorithm reduce the average reconstruction error relative to PCD with the same number of hidden units on the MNIST and toy datasets. When the number of hidden units is 350, after 30 iterations, the reconstruction error of the proposed algorithm is 26.60% lower than that of PCD on the MNIST dataset; under the same conditions, PT-10 is 8.20% lower and PTEE-10 is 16.64% lower than PCD.
Next, a small-scale experiment with 15 hidden units is conducted on the MNIST dataset. The log-likelihood can be obtained exactly by enumerating the states of the hidden units, so high accuracy can be achieved. Figure 9 shows the log-likelihood averaged over training each model 5 times.
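With as few as 15 hidden units, the partition function, and hence the exact log-likelihood, is tractable by summing over all hidden configurations while marginalizing the visible units analytically. A sketch of that computation for a binary RBM (our own formulation, using hypothetical small sizes for the demo):

```python
import numpy as np
from itertools import product

def log_Z_by_hidden_enumeration(W, b_v, b_h):
    """Exact log partition function of a binary RBM, summing over all
    2^n_h hidden configurations. Feasible only for small hidden layers."""
    log_terms = []
    for bits in product([0.0, 1.0], repeat=len(b_h)):
        h = np.array(bits)
        # log of e^{b_h . h} * prod_i (1 + e^{b_v_i + (W h)_i})
        log_terms.append(b_h @ h + np.sum(np.logaddexp(0.0, b_v + W @ h)))
    log_terms = np.array(log_terms)
    m = log_terms.max()
    return m + np.log(np.sum(np.exp(log_terms - m)))  # stable log-sum-exp

def avg_log_likelihood(V, W, b_v, b_h):
    """Average exact log p(v) over the rows of V."""
    log_Z = log_Z_by_hidden_enumeration(W, b_v, b_h)
    # Negative free energy: b_v . v + sum_j log(1 + e^{b_h_j + (v W)_j})
    neg_free_energy = V @ b_v + np.sum(np.logaddexp(0.0, b_h + V @ W), axis=1)
    return float(np.mean(neg_free_energy - log_Z))

rng = np.random.default_rng(0)
n_v, n_h = 6, 4
W = rng.normal(0.0, 0.1, (n_v, n_h))
V = (rng.random((5, n_v)) < 0.5).astype(float)
ll = avg_log_likelihood(V, W, np.zeros(n_v), np.zeros(n_h))
```

A useful sanity check: with all parameters zero the model is uniform over visible states, so the average log-likelihood equals -n_v log 2 exactly.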

The average likelihood of the four algorithms when there are 15 hidden units on the MNIST dataset.
Figure 9 shows that within the first 10,000 parameter updates the likelihood of the proposed algorithm is not as good as that of the other three algorithms. However, as the number of updates increases, the likelihood of the proposed algorithm for the data also increases and eventually exceeds that of PTEE-10. When the number of updates reaches 30,000, the likelihood of PCD peaks and then decreases, because the number of Gibbs transitions increases while the model distribution becomes steeper and steeper. The PT-10 algorithm is a tempering-based Monte Carlo method: its distribution becomes more even as the temperature rises, so it can overcome steep distributions by conducting state transitions from low to high temperatures, and it therefore performs better than PCD. PTEE-10 introduces a new type of move, the equienergy move, which improves the swap rates between neighboring chains to some extent. But after

The average likelihood of the two algorithms when there are 10 hidden units on the toy dataset.
Moreover, we also recorded the running time of different algorithms on one epoch and on one training sample when they are applied to the toy dataset and the MNIST dataset. All experiments were conducted on a Windows operating system machine with Intel® Core
The running time (in seconds) of different algorithms when they are applied to toy dataset and MNIST dataset.
Based on the results in Table 1, the running time of our proposed algorithm is less than that of PT-10 and PTEE-10.
All the simulation results above show that the reconstruction error of the proposed algorithm is better than that of PCD, PT-10, and PTEE-10. Under the same conditions, PT-10 and PTEE-10 perform better than PCD but take roughly ten times as long. The experimental results obtained on the MNIST dataset show that the proposed algorithm outperforms the other three algorithms, which is also validated on the toy dataset. Moreover, the proposed algorithm takes only 2 to 3 times as long as PCD.
5.2. Performance Analysis of the Multilayer RBM Network Image Compression Method
In this section, the
In this experiment, ROI image compression method [14] is compared with the proposed algorithm. The compression ratio of the ROI compression method is calculated based on the region of interest:
In the proposed algorithm, the compression ratio is determined by the number of neural units in hidden layer:
During the experiment, the number of hidden layer units
The SNR and PSNR of Lena image when using multilayer RBM network compression algorithm and interest based compression algorithm.
The compression ratio is in inverse proportion to the number of hidden units
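For reference, the SNR and PSNR reported in Table 2 are presumably the standard definitions, which can be computed as follows for 8-bit images (our sketch, not code from the paper):

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak = 255 for 8-bit images)."""
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def snr(original, reconstructed):
    """Signal-to-noise ratio in dB: signal power over error power."""
    err = original.astype(float) - reconstructed.astype(float)
    return 10.0 * np.log10(np.sum(original.astype(float) ** 2)
                           / np.sum(err ** 2))

orig = np.full((4, 4), 100.0)
recon_demo = orig + 10.0      # uniform error of 10 gray levels
p = psnr(orig, recon_demo)    # about 28.1 dB
s = snr(orig, recon_demo)     # exactly 20 dB
```

Note that PSNR normalizes by the fixed peak value while SNR normalizes by the actual signal power, so the two metrics can rank reconstructions differently on dark versus bright images.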
From Table 2, we can also conclude that the reconstructed image quality of multilayer RBM networks is superior to that of ROI compression method under the same compression ratio. The ROI compression algorithm can compress the interest region and the background region, respectively, and therefore it can get high compression ratio. But the overall reconstructed image quality is not good because of the high compression ratio of background region. In multilayer RBM networks, the compression ratio can be improved by setting the number of neural units in each layer of RBM. In addition, the training process in a multilayer RBM network is layered. The data from the bottom input layer to the bottom hidden layer is the first compression. The data from the first hidden layer to the second hidden layer is the second compression. The second compression is based on the first compression. RBM in each layer will compress the image and greatly remove the redundancy of the original image.
5.3. The Energy Consumption Analysis of Wireless Sensor Network
In this section, the energy consumption of a WSN is analyzed in the aspect of image transmitting. The energy consumed during the transmitting process can be calculated using the formula below:
In the process of simulation, when cluster head nodes receive images, they transmit these images to coding nodes to carry out compression coding. Then, the compressed image is assigned to the transmitting node. In the experiment, we calculate the energy consumption of every transmitting node. We compare the proposed algorithm with the ROI lossy-lossless image compression method under three conditions: (1) no image compression methods are used; (2) only the multilayer RBM network compression method is used; (3) only the ROI compression method is used. We compare the energy consumption of transmitting nodes under the three conditions. In conditions (2) and (3), we compare the energy consumption of the two algorithms under the same SNR.
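The transmission-energy formula itself is elided above; WSN energy analyses commonly assume the first-order radio model, sketched below with typical parameter values from the literature (an assumption on our part, not necessarily the paper's exact formula or settings):

```python
def transmit_energy(k_bits, d, E_elec=50e-9, eps_fs=10e-12,
                    eps_mp=0.0013e-12, d0=87.0):
    """First-order radio model: a per-bit electronics cost plus an
    amplifier cost growing as d^2 below the crossover distance d0
    (free-space model) and as d^4 above it (multipath model).
    Energies in joules, d in meters."""
    if d < d0:
        return k_bits * E_elec + k_bits * eps_fs * d ** 2
    return k_bits * E_elec + k_bits * eps_mp * d ** 4
```

Under this model, transmission energy is linear in the number of bits sent, which is why a higher compression ratio translates directly into energy savings at the transmitting nodes.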
The parameter settings are as follows:

The energy consumption of transmitting nodes.
Figure 11 shows that more energy is consumed as the transmitting distance increases. At the same transmitting distance, the energy consumption of transmitting nodes using the multilayer RBM network is clearly smaller than that using the ROI compression method. Although the ROI compression method can code the interest region and background region separately and obtain a high compression ratio, it cannot ensure a high-quality reconstructed image. The multilayer RBM network compression method, in contrast, reduces data redundancy in every layer and therefore achieves a high compression ratio while preserving reconstruction quality.
We continue to find out the relationship between image compression performance and compression energy consumption. The image compression energy consumption can be calculated using the formula below:
We continue to test the total energy consumption in WSN when the three image compression algorithms are used, respectively, and Figure 12 shows the results.

Total energy consumption in WSN.
Figure 12 shows that although RBM training process extends
6. Conclusions and Future Work
Image compression is an important research field in WSNs. It is difficult to find a comprehensive image compression method because of the complex features of sensor networks. A multilayer RBM network based image compression method is proposed in this paper, and an improved RBM training algorithm based on alternative iteration is presented to improve the likelihood of the RBM. However, many problems remain when using a multilayer RBM network to compress images. The multilayer RBM network can affect the delay in the sensor network, and a more suitable normalizing parameter function should be sought for the RBM training process. Besides, the problem of finding routing paths should also be considered. We therefore endeavor to develop a more integrated image compression method so as to accelerate the application of WSNs in real life.
