Abstract
1. Introduction
With the development of wireless sensor network (WSN) technology, a variety of WSN-based applications have appeared, such as land cover classification [1], SCR node detection in vehicular networks [2], fault detection [3, 4], and groundwater quality estimation [5]. Traditionally, these applications analyze sample data in a fusion center [6]. However, when a large-scale WSN contains thousands of sensors, the performance of processing the sample data is limited by the fusion center's hardware, which is too expensive to be upgraded frequently. Moreover, transmitting data over the network consumes a large amount of power, especially at wireless relaying nodes.
Data mining techniques, which have been developed for years to extract useful information from massive data, are considered an effective tool for analyzing such data. In the 1990s, shallow data mining models such as the support vector machine (SVM), boosting, and logistic regression were proposed, and they have been used successfully in massive data analysis since 2000 [7]. Using a shallow data mining algorithm can improve the fusion center's analysis performance, but the power consumption problem remains unsolved. Alternatively, these algorithms could be executed in the sensors to reduce the amount of transmitted data, but they are usually too complex to run on wireless sensors.
In 2006, Hinton [8] proposed a deep data mining model called the deep neural network (DNN), which can be used to extract internal representations and reduce data dimensionality. It has helped researchers achieve state-of-the-art results in speech recognition, image recognition, and semantic analysis [9–11]. Moreover, a DNN employs a layered structure, which can be divided by layers and executed in different hierarchies of the WSN.
In order to improve the fusion center's data mining performance and reduce transmission power consumption, this paper proposes a distributed data mining method based on DNN for WSN. Section 2 briefly introduces the DNN and describes the problems to be solved. Section 3 presents the principle of the distributed data mining method based on DNN. Section 4 proposes the training method of the DNN, as well as the tradeoff between calculating and transmitting power consumption. A simulation is presented in Section 5 to verify the validity of the proposed method. A conclusion is given in the last section.
2. Preliminaries and Problem Formulation
2.1. Introduction of DNN
Although there is no exact definition of a DNN, we can characterize it by some typical features, such as self-taught learning ability, internal representation extracting ability [12], and a multilayer perceptron (MLP) structure with more than one hidden layer [13]. Actually, the self-taught learning and internal representation extracting abilities of DNN are developed from a basic neural network called the autoencoder (AE). An AE network can be trained without any predefined labels, which saves a large amount of manual labeling work.
In this subsection, we introduce the principle of the AE with the three-layer network (3-2-3) shown in Figure 1. An input vector is mapped by the first weight layer to a two-dimensional hidden representation, and the second weight layer maps this representation back to a three-dimensional reconstruction of the input. Training minimizes the reconstruction error, so the hidden layer is forced to learn a compact internal representation of the data without any labels.

Illustration of autoencoder network.
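As a concrete illustration, the 3-2-3 AE of Figure 1 can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation: the weight initialization, learning rate, and single-sample gradient step are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Autoencoder:
    """Minimal 3-2-3 autoencoder in the spirit of Figure 1 (illustrative)."""
    def __init__(self, n_in=3, n_hidden=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.5, (n_hidden, n_in))  # encoder weights
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.5, (n_in, n_hidden))  # decoder weights
        self.b2 = np.zeros(n_in)

    def encode(self, x):
        # Map the 3-dimensional input to the 2-dimensional hidden code.
        return sigmoid(self.W1 @ x + self.b1)

    def decode(self, h):
        # Map the hidden code back to a 3-dimensional reconstruction.
        return sigmoid(self.W2 @ h + self.b2)

    def train_step(self, x, lr=0.5):
        """One gradient-descent step on the squared reconstruction error;
        no labels are needed, only the input itself."""
        h = self.encode(x)
        y = self.decode(h)
        d_y = (y - x) * y * (1 - y)              # output-layer delta
        d_h = (self.W2.T @ d_y) * h * (1 - h)    # hidden-layer delta
        self.W2 -= lr * np.outer(d_y, h); self.b2 -= lr * d_y
        self.W1 -= lr * np.outer(d_h, x); self.b1 -= lr * d_h
        return float(np.sum((y - x) ** 2))
```

Repeated calls to train_step on sample data drive the reconstruction error down, which is the label-free training property the text describes.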
Now we can build deep neural networks as in Figure 2. The main idea is to train the layers one by one, which is called greedy layer-wise training (GLT) [14]. To train a given hidden layer, we treat it as the hidden layer of an AE whose input is the output of the previously trained layer; once trained, its output in turn serves as the training input for the next layer.

Structure of DNN.
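The GLT procedure can be sketched as below: each hidden layer is trained as the hidden layer of an AE fed with the codes of the previously trained layer. The batch gradient-descent trainer, layer sizes, and hyperparameters are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ae(X, n_hidden, lr=0.1, epochs=200, seed=0):
    """Train one AE layer on data X (n_samples x n_in); return the
    encoder parameters and the hidden codes of X."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1 = rng.normal(0, 0.5, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, (n_hidden, n_in)); b2 = np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)             # encode
        Y = sigmoid(H @ W2 + b2)             # decode
        dY = (Y - X) * Y * (1 - Y)           # backpropagate squared error
        dH = (dY @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(0)
    return (W1, b1), sigmoid(X @ W1 + b1)

def greedy_layerwise(X, hidden_sizes):
    """Greedy layer-wise training: layer k is trained on the codes
    produced by the already-trained layer k-1."""
    layers, data = [], X
    for n_h in hidden_sizes:
        params, data = train_ae(data, n_h)
        layers.append(params)
    return layers
```

Stacking the returned encoder parameters yields the DNN of Figure 2; no layer ever needs labeled data.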
2.2. Advantages of Applying Distributed DNN Data Mining to WSN
In this subsection, we discuss the advantages of using distributed data mining based on DNN. Besides the advantages of DNN introduced in Section 2.1, a distributed structure brings further benefits, especially for WSN. In summary, we gain at least four advantages.
(1) There is no need to manually label large amounts of training data for different applications; the training can be finished automatically. (2) The internal representations can be combined with other data mining algorithms, helping them achieve better results. (3) The dimensionality reduction ability of DNN reduces the data transmitted via the WSN and saves the WSN's power. (4) The distributed calculating reduces the calculating burden of the fusion center, which saves the cost of hardware upgrades.
2.3. Challenge of Applying Distributed DNN Data Mining to WSN
Before we can use a distributed DNN data mining structure in a WSN, two challenges need to be overcome.
One challenge is training the distributed layers of the DNN. In a distributed data mining structure, some nodes in the WSN take on data mining tasks; such a node is called a calculating unit in this paper. Obviously, we need to ensure consistency of the data processed by these distributed calculating units, which means that the DNN layers in each calculating unit must have the same parameters. Training the DNN layers in each calculating unit separately may lead to different parameters. A solution is to train the DNN in the fusion center and send the corresponding parameters to every calculating unit. However, training in the fusion center requires a large number of samples, which must be transmitted via the WSN, and transmitting these samples consumes much of the WSN's power. Thus, training becomes the first challenge.
The other challenge is the tradeoff between calculating power consumption and transmitting power consumption. When the calculating units join the data mining process, extra power is needed to support the calculating, and this may counterbalance the power saved by reducing transmitted data. Pottie and Kaiser [16] pointed out that transmitting one bit over 100 meters consumes about as much power as executing 3000 instructions. This relationship implies that the design of distributed data mining should trade off these two power consumptions, which is the second challenge.
3. Principle of Distributed Data Mining Based on DNN
In this section, we introduce the principle of the distributed data mining based on DNN proposed in this paper. Consider a WSN whose fusion center aggregates three levels (Figure 3(a)) and a 3-layer DNN (Figure 3(b)). Note that the topology of the WSN and the structure of the DNN have a similar hierarchy. A feasible solution is therefore to divide the DNN into layers and place them in different levels of the WSN. Figure 3(c) gives an example of dividing the DNN into two parts and placing them in the fusion center and in all sensors.

Principle of distributed data mining.
Generally, assume that a WSN is aggregated in several levels and the DNN is divided among them accordingly. The distributed data mining then proceeds as follows. Step 1: start at the sensor level, where each sensor applies its assigned DNN layers to its raw sample data. Step 2: each calculating unit computes its assigned DNN layers on the inputs received from the calculating units of the former level; if a higher calculating level remains, the outputs are forwarded to it and Step 2 repeats. Step 3: if the current level is the last calculating level, go to Step 4. Step 4: send the data to the fusion center. Step 5: data mining is finished.
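The level-by-level computation can be sketched as a forward pass split across the WSN hierarchy. This is a sketch under assumptions: the per-level layer assignment, weight shapes, and activation are placeholders, not the paper's configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_distributed(sample, assignment):
    """Propagate one sensor sample up the WSN hierarchy.

    `assignment` lists, per level, the (W, b) DNN layers placed there;
    level 0 is the sensor itself, and the output of the last level is
    what finally reaches the fusion center."""
    data = sample
    for layers in assignment:
        for W, b in layers:
            # Each calculating unit applies its assigned DNN layers.
            data = sigmoid(data @ W + b)
        # `data` is the (smaller) vector this level transmits upward.
    return data

# Hypothetical split: a 41-field sample shrinks to 30 values at the
# sensor and to 16 values at the relay before reaching the fusion center.
rng = np.random.default_rng(0)
assignment = [
    [(rng.normal(0, 0.1, (41, 30)), np.zeros(30))],  # sensor level
    [(rng.normal(0, 0.1, (30, 16)), np.zeros(16))],  # relay level
]
reduced = run_distributed(rng.uniform(0, 1, 41), assignment)
```

Because each hidden layer is narrower than its input, every hop up the hierarchy transmits fewer values than it received, which is exactly the power-saving mechanism the section describes.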
4. Training and Design Methods
This section proposes solutions for the challenges described in Section 2.3. In Section 4.1, a random source data selection method is used to solve the training problem; the tradeoff between calculating and transmitting power consumption is discussed in Section 4.2.
4.1. Training the Distributed DNN
Before applying the DNN to data mining, we first need to train it in the fusion center. As shown in Figure 4, the training data are sampled from the WSN sensors, and the trained DNN parameters are then sent to the DNN layers distributed among the calculating units.

DNN training process.
Although a wireless sensor network can supply a mass of training data, transmitting all of it also consumes a lot of the network's power. In practice, a sensor's sample data do not change much over a short time, so a subset of the data is enough to train the DNN; the problem is that we do not know when the data change. A random data selection method [17] has been proved useful in solving this problem, and a digit recognition study [18] showed that randomly selecting 10% of the training data can achieve a good result. Thus, a random selection method can effectively reduce the amount of redundant data to be transmitted.
Then, we give the training flow with a random selection method as follows.
Step 1: the fusion center randomly generates a sensor's ID and sends a request to that sensor. Step 2: the selected sensor receives the request and sends its sample data. Step 3: the fusion center receives the training data from the selected sensor and passes the data to the GLT algorithm. Step 4: the GLT algorithm checks whether the training result meets the stop condition. If YES, go to Step 5; else, go to Step 1. Step 5: the fusion center sends each part of the DNN's configuration data to the corresponding calculating unit.
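The training flow above can be sketched as a loop at the fusion center. The sensor interface, the stop condition (a fixed batch size standing in for the GLT convergence check), and the seed are illustrative assumptions.

```python
import random

def train_at_fusion_center(sensors, batch_needed, seed=42):
    """Sketch of the random-selection training flow (Steps 1-5).

    `sensors` maps a sensor ID to a callable returning its current
    sample; `batch_needed` stands in for the GLT stop condition."""
    rng = random.Random(seed)
    ids = list(sensors)
    collected = []
    while len(collected) < batch_needed:   # Step 4: stop condition
        sid = rng.choice(ids)              # Step 1: random sensor ID
        sample = sensors[sid]()            # Steps 2-3: request and reply
        collected.append((sid, sample))    # data handed to the GLT algorithm
    # Step 5 would send each trained DNN part to its calculating unit.
    return collected
```

Only the randomly selected sensors transmit samples, so the training traffic grows with the stop condition rather than with the network size.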
4.2. Tradeoff between Calculating and Transmitting Power Consumption
The distribution hierarchy of DNN depends on its application. However, any distribution hierarchy should be constrained by power consumption. In this subsection, we discuss the rule of designing the distribution hierarchy based on the tradeoff between calculating and transmitting power consumption.
Assume that a calculating unit executes a certain number of instructions to compute its assigned DNN layers and thereby reduces the amount of data it must transmit. Following the relationship reported in [16], distributing those layers to the unit saves power only if the energy of the extra instructions is less than the energy of the transmission saved, roughly one bit transmitted over 100 meters per 3000 instructions executed. The distribution hierarchy should therefore assign to each unit only as many layers as satisfy this constraint.
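This rule of thumb can be expressed as a simple check. The 3000-instructions-per-bit figure comes from [16]; the concrete instruction and bit counts in the example are hypothetical.

```python
def distribution_is_efficient(instructions, bits_saved, instr_per_bit=3000):
    """Tradeoff check based on [16]: sending one bit over 100 m costs
    about as much energy as executing 3000 instructions.  Distributing
    a DNN layer pays off only if the instructions it adds cost less
    energy than the transmission it saves."""
    return instructions < bits_saved * instr_per_bit

# Hypothetical example: a layer that costs 10,000 instructions but
# saves 32 transmitted bits (worth ~96,000 instructions) is worthwhile.
worthwhile = distribution_is_efficient(10_000, 32)
```

Any candidate distribution hierarchy can be screened with this inequality before committing layers to a calculating unit.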
5. Simulation
5.1. Simulation Description
To verify the distributed data mining method, we create an application scenario of fault detection in Matlab 2010a. The DNN contains two parts (shown in Figure 5): a data representation analysis part and a classifying part. The former uses a 2-layer AE network to extract the internal representations of the sample data, and the latter uses a Softmax regression algorithm. Both parts adopt the Sigmoid function as the activation function.

DNN for the simulation application.
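The two-part structure of Figure 5 can be sketched as a feature extractor followed by a Softmax classifier. The layer widths (41 inputs, 23 output classes matching the KDD99 setup below, and hidden sizes in between) are illustrative placeholders; the weights would come from the training of Section 4.1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict(x, ae_layers, W_out, b_out):
    """Two-part DNN of Figure 5: a stacked-AE representation analysis
    part followed by a Softmax classifying part."""
    h = x
    for W, b in ae_layers:
        h = sigmoid(h @ W + b)           # representation analysis part
    return softmax(h @ W_out + b_out)    # classifying part

# Hypothetical dimensions: 41 KDD99 fields -> 24 -> 16 features -> 23 classes.
rng = np.random.default_rng(0)
ae_layers = [(rng.normal(0, 0.1, (41, 24)), np.zeros(24)),
             (rng.normal(0, 0.1, (24, 16)), np.zeros(16))]
W_out, b_out = rng.normal(0, 0.1, (16, 23)), np.zeros(23)
probs = predict(rng.uniform(0, 1, 41), ae_layers, W_out, b_out)
```

The output is a probability distribution over the fault types, whose argmax gives the detected class.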
The simulated WSN has three levels: one fusion center, ten transmitting relays, and two hundred wireless sensor nodes. Every sensor is a calculating unit with an ARM9 CPU, and the mean distance between sensors is 100 meters. The simulation sample data come from the KDD99 database, whose records have 41 fields. Each sensor samples 15,000 raw data records. 300,000 sample records are labeled manually with 23 types; 1/3 of them are used to train the Softmax algorithm and the remaining data are used to test it.
This paper uses three criteria to verify the proposed method: calculating share rate, fault detection rate, and power consumption rate.
The calculating share rate is defined as the number of instructions executed by the distributed calculating units divided by the number of instructions of the whole data mining task taken by the WSN. The fault detection rate is defined as the count of correctly detected faults divided by the total number of faults. The power consumption rate is defined as the calculating units' power consumption when executing the DNN divided by their power consumption without executing the DNN.
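The three criteria are simple ratios and can be computed directly; the helper names below are ours, and the sample values in the assertions are only for illustration.

```python
def calculating_share_rate(distributed_instr, total_instr):
    """Instructions executed by the distributed calculating units
    divided by the instructions of the whole data mining task."""
    return distributed_instr / total_instr

def fault_detection_rate(detected_faults, total_faults):
    """Correctly detected fault count divided by the total fault count."""
    return detected_faults / total_faults

def power_consumption_rate(power_with_dnn, power_without_dnn):
    """Calculating units' power consumption when executing the DNN
    divided by their power consumption without executing it."""
    return power_with_dnn / power_without_dnn
```

A share rate near 1 means the distributed units carry most of the computation; a power consumption rate below 1 means running the DNN locally costs less than transmitting the raw data.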
5.2. Design the Distributed DNN
This subsection designs the distribution hierarchy of the simulated DNN under the power constraint of Section 4.2.
In ARM9, a multiply instruction needs seven execution cycles [23], which equals seven add instructions. However, the Sigmoid function is more complicated; it is given by sigma(x) = 1 / (1 + e^(-x)), whose evaluation requires several multiply and divide operations per sample.
Parameter values for the calculating unit.
According to Table 1, the number of instructions a calculating unit executes for its assigned DNN layers can be estimated, which allows its calculating power consumption to be compared against the transmitting power it saves when choosing the distribution hierarchy.
5.3. Simulation Result
The calculating share rate is the first criterion checked; it can be directly calculated from the simulation assumptions and the parameters given in Table 1.
The simulation then checks the effect of different hidden layer sizes on the fault detection rate. The result in Figure 6 shows that when the hidden layer has more than 15 neurons, the detection rate becomes stable, and when it has fewer than 12 neurons, the detection rate decreases rapidly. This indicates that the fault detection rate does not increase linearly with the hidden layer size, which verifies that the raw data contain much redundant information and that the DNN can effectively extract the internal representations to help improve the data mining.

Fault detection rate with different hidden layer size.
Moreover, Table 2 lists the state-of-the-art results of four different data mining algorithms. Compared with Figure 6, when the hidden layer size is larger than 16, half of the detection rates are better than those listed in the table. We can therefore conclude that the training method is effective and that the distributed data mining method based on DNN improves the data mining performance.
State-of-the-art result of different algorithms.
To check the power consumption rate of the two strategies described in Section 5.2, we run another simulation. Figure 7 gives the results: both ratios increase as the hidden layer size increases, and the difference between the two cases is quite small.

Tradeoff between processing and power consumption.
Combining the results in Figures 6 and 7, setting the hidden layer size to 16 is quite reasonable. With this setting, the calculating share rate is 64.06% and the power consumption rate is 41.169%.
In conclusion, the above simulations verify the four advantages declared in Section 2.2, so we can assert that the proposed distributed data mining method based on DNN (D-DMBDD) achieves its goal. Moreover, the training and design methods are also shown to be valid.
6. Conclusion
In this paper, we have presented a distributed data mining method for WSN based on DNN by solving two challenges: training the distributed layers of the DNN and trading off calculating power consumption against transmitting power consumption. The proposed solution can learn internal representations from unlabeled data collected by distributed sensors, and these representations improve data mining results. Additionally, a distributed DNN solution saves both the power consumption of the WSN and the cost of upgrading hardware for mass data processing. An application simulation verifies the validity of the method: the results show that the data mining performance of the WSN is improved, and the distributed calculating mode is especially suitable for large-scale WSN. As future work, we plan further research on sample data noise filtering and on data mining with deeper DNN layers.
