Abstract
Introduction
Wireless sensor network (WSN) data are the result of measuring available values from the physical environment. Sensor data are already widely used in everyday life. The types of data that can be obtained from sensors include temperature, brightness, motion, chemical values, and biological signals. These data can be used to observe human health conditions, system conditions, or environmental information. A system can also continuously monitor the data coming from the sensors, and if a sensor value satisfies a certain condition or falls outside the reference range, it can handle the appropriate task corresponding to the situation.
With the rapid enhancement of hardware in recent years, the WSN environment has become more complex and more versatile. Munir et al. 1 proposed a WSN architecture composed of multi-core nodes, and the data collected in this environment can support higher-level applications than in the past. As the quality of sensor data improves, studies are being conducted to analyze WSN data with machine learning. Machine learning is a technique for analyzing the meaning of new data based on the history of past data. In the case of streaming sensor data, data can be collected and stored continuously; therefore, we can analyze and monitor sensor data using various machine learning techniques such as classification or regression. Alsheikh et al. 2 surveyed machine learning studies related to WSN and showed that various machine learning algorithms are applicable to WSN because it covers a very wide range of information and communication technology (ICT) applications. In particular, high-level machine learning, represented by deep learning, shows fairly accurate classification and prediction performance. Thus, when new WSN data arrive, we can predict the potential state of the system in the ICT domain.
However, in an environment where temporal processing is important, it may be difficult to analyze the streaming sensor data stably. For machine learning analysis, it is necessary to construct several sensor data streams into one feature vector. Also, the feature vector should consist of data from the same time window. In an environment like WSN, data may not arrive within a definite time or may be imprecise due to problems such as network delay, hardware limitations of devices, and the complex nature of the surroundings. 3 In this case, if we wait for all data to complete the vector, the analysis results cannot be produced within the deadline, and we will not obtain results usable in time-critical environments. On the other hand, if the analysis is performed without considering the missing data, the results can be obtained within a definite time, but the accuracy of the analysis is low and the monitoring stability deteriorates.
In this article, we propose a query processing system for analyzing streaming sensor data collected from WSN. In particular, we design a real-time monitor that can perform deep learning–based analysis within the deadline in a time-constrained environment. A simulation using published data and the performance of deep learning under time constraints are reported and discussed to demonstrate the effectiveness of the proposed technique.
Query processing framework for analyzing sensor data
Overview
In a WSN system, the base station, a powerful server outside of the WSN, collects sensor data and processes them for use. 4 In addition, it monitors the overall situation. Our approach is to equip the base station with a deep learning analyzer so as to provide useful analysis results. The overall process flow of sensor data collection, deep learning analysis, and results reporting is shown in Figure 1.

Overview of WSN data processing framework.
As shown in Figure 1, the distributed WSN nodes transmit data to the base station server via the main sink node. The base station server collects the data and performs basic processing. The collected data are stored in a database, each sensor measurement is checked to identify whether it satisfies a specific constraint condition, and the result is reported to the user. In addition, the proposed method sends the data as a query to the deep learning analyzer to perform an in-depth analysis of the current situation. For example, even when the individual sensor values do not exceed their constraints and seem unproblematic, the deep learning analyzer can predict that the system is in a state of danger when the values are assessed comprehensively. The predicted information is used to trigger the appropriate actions or is delivered to the user.
Architecture design
The architecture of the base station server is shown in Figure 2. In the figure, solid rectangles represent modules, and dotted rectangles represent the decomposition of logical functions. The following is a description of the major modules.

Architecture of the proposed base station monitor.
Streaming data collector
The streaming data collector receives the sensor data transmitted from the main sink node and stores them in the database for further processing.
Deep learning model generator
In order to perform deep learning, a training model must be created in advance. The model generator builds the deep learning model by training on the historical sensor data accumulated in the database.
Deep learning query generator
The data acquired from the sensors need to be formed into a query for deep learning analysis. The query is the result of constructing the individual sensor data into a feature vector. The query generator holds the recently arrived data in a temporary in-memory database and constructs the feature vector at the point where the deep learning query is to be executed. In order to construct the values of several heterogeneous sensors into one feature vector, the values need to be synchronized in time. Therefore, the value of each sensor is extracted based on a time window of a predetermined size and is formed into a query. However, since certain feature values may be imprecise depending on the situation, we perform query refinement to compensate for this. The completed query is sent to the query processor.
Query processor
The query processor receives the completed query, submits it to the deep learning predictor, and returns the analysis result.
Deep learning predictor
The deep learning predictor classifies the input query using the pre-trained model and produces the prediction result.
Monitoring condition checker
The basic roles of the base station server are to check the values of the sensor data, to grasp the current situation, to respond appropriately, and to notify the manager. The monitoring condition checker examines each incoming sensor value against the registered conditions and raises a notification when a condition is violated.
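As a minimal sketch of such a condition check (the sensor names and reference ranges here are hypothetical, not from the paper):

```python
def check_conditions(sample, limits):
    """Return (sensor, value) pairs whose value falls outside its reference range."""
    alerts = []
    for name, value in sample.items():
        lo, hi = limits[name]
        if not (lo <= value <= hi):
            alerts.append((name, value))
    return alerts

# Hypothetical reference ranges for two sensors
limits = {"temp": (0, 60), "gas": (0, 10)}
print(check_conditions({"temp": 80, "gas": 5}, limits))  # [('temp', 80)]
```

Values that pass this simple per-sensor check can still be forwarded as a query to the deep learning analyzer for the comprehensive assessment described above.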
Deep learning analysis
In this study, we use deep learning, which shows good performance in recent data processing among machine learning methods. 5
Deep learning was introduced in the 1980s, but it was not widely used because of problems such as convergence to local optima and excessive computation. As learning algorithms and hardware performance have improved, it is attracting attention again. Deep learning is most notable for showing higher accuracy than previously known machine learning algorithms. Currently, our architecture is designed to use deep learning techniques; however, it can be easily integrated with other machine learning algorithms requiring less computation power depending on the situation. The deep learning technique described in this section is the convolutional neural network (CNN).
Figure 3 shows the architecture of the CNN model used in our deep learning engine.
In the first convolution layer, 200 convolution filters of size 3 × 3 are used. In our experimental environment, there are 72 features. Each convolution filter extracts 3 × 3 convolution features from the 8 × 9 input data, and the 200 filters collect data in different ways. The size and number of filters can affect the analysis depending on factors such as the size, type, and complexity of the data; therefore, we determined values that show good performance through experiments.
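The filter arithmetic above can be checked with a short NumPy sketch: a 3 × 3 "valid" convolution over the 8 × 9 input (the 72 features reshaped to a grid) yields a 6 × 7 feature map per filter. The random filters below stand in for learned weights:

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 2D 'valid' convolution (cross-correlation) of x with kernel k."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# 72 gas-sensor features reshaped to an 8 x 9 grid, as in the paper
x = np.arange(72, dtype=float).reshape(8, 9)
filters = np.random.randn(200, 3, 3)   # 200 filters of size 3 x 3
maps = np.stack([conv2d_valid(x, f) for f in filters])
print(maps.shape)  # each 3 x 3 filter yields a 6 x 7 map -> (200, 6, 7)
```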
From the convolution features extracted by the convolution layer, a characteristic value is selected through a max-pooling operation.
The proposed model consists of two stages of convolution and pooling.
The values from the convolution and pooling layers are then passed to a fully connected layer, which produces the final class prediction.

CNN model architecture used in our deep learning engine.
The prediction process is very similar to the training process described above; the same three steps (1–3) are applied when classifying a new query.
Deep learning analysis of sensor data in real-time system
Guaranteeing timing constraints for streaming sensor data analysis
We focus on the real-time processing of deep learning. In time-critical environments such as cruise control, online stock trading, and medical devices, violations of time constraints can cause great disasters, so the specification of timing constraints and their monitoring are very important. In a real-time system, task processing must be deterministic, and an operating system capable of priority-based task scheduling is essential. However, a system in which a high-priority task is merely not preempted by other tasks is not a complete real-time system. It is more important that each task finishes execution within its deadline, and this must be guaranteed when the time constraint is hard real-time. In this section, we present how our framework guarantees timing constraints in streaming sensor data analysis. Figure 4(a) shows the process of collecting various sensor data from the WSN and analyzing them within the deadline.

Example for temporal processing of sensor data: (a) normal processing example and (b) deadline missed example.
First, the horizontal axis of the figure represents time, and three types of sensor values, A, B, and C, are collected by the base station over time. Using these data, the deep learning task performs analysis periodically. We assume a real-time system, so the deep learning task has the constraint that it must complete between its release time and its deadline. This task combines the collected sensor data to generate a query and inputs the query into the deep learning analyzer for query processing.
However, sensor data may not always be fully collected due to network delays or differences in sampling periods. Figure 4(b) shows a situation where data B arrives late. The deep learning task tries to perform query generation after the release time but remains pending due to the missing data. After B arrives, the query is completed and the analysis is performed. However, by the time execution completes, the deadline has been missed and the real-time constraint is not satisfied. We therefore propose query refinement to solve this problem. Figure 5 shows an example of the proposed technique. In this example, the deep learning task starts without sensor data B, as in Figure 4(b). If data are missing, we perform query refinement to derive the expected value B′ based on the recent history records and complete the query. After normal query processing is performed, the analysis results can be obtained within the deadline.

Proposed method for temporal processing of streaming sensor data.
A specification using real-time logic (RTL)
We specify RTL-like expressions for the release events of the deep learning task, the query generation start and end events, and the query processing start and end events, where @(e, i) denotes the timestamp of the i-th occurrence of event e:
@(release, i): release time of the i-th deep learning task instance
@(QGstart, i): time at which query generation starts for the i-th instance
@(QGend, i): time at which query generation ends for the i-th instance
@(QPstart, i): time at which query processing starts for the i-th instance
@(QPend, i): time at which query processing ends for the i-th instance
The deadline constraint can be expressed as:
@(QPend, i) ≤ @(release, i) + D
where D is the relative deadline of the deep learning task. Since the task is released periodically with period P, the release events satisfy:
@(release, i + 1) = @(release, i) + P
On the other hand, deep learning query processing always takes a deterministic amount of time C because a feature set of the same size is always used, whereas the execution time of query refinement may vary depending on the arrival of the data. The next statement shows the delay time constraint:
@(QGend, i) ≤ @(release, i) + D − C
That is, query generation (including any refinement) must finish early enough to leave the deterministic processing time C before the deadline.
The real-time logic checker constantly monitors these conditions to ensure satisfaction of temporal constraints.
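A sketch of such a checker, assuming per-instance event timestamps are recorded (the dictionary layout and the symbols D and C are illustrative, not the paper's implementation):

```python
def check_timing(events, deadline, qp_cost):
    """Check RTL-style timing constraints for each task instance.

    events: list of per-instance timestamp dicts with keys
            'release', 'qg_end', and 'qp_end' (layout is hypothetical).
    deadline: relative deadline D of the deep learning task.
    qp_cost: deterministic query processing time C.
    Returns one boolean per instance (True = constraints satisfied).
    """
    results = []
    for e in events:
        # deadline constraint: query processing ends within D of release
        meets_deadline = e["qp_end"] <= e["release"] + deadline
        # delay constraint: query generation (incl. refinement) must leave
        # room for the deterministic query processing time
        leaves_slack = e["qg_end"] <= e["release"] + deadline - qp_cost
        results.append(meets_deadline and leaves_slack)
    return results

trace = [{"release": 0.0, "qg_end": 0.3, "qp_end": 0.8}]
print(check_timing(trace, deadline=1.0, qp_cost=0.5))  # [True]
```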
Query refinement
The streaming sensor data may be missing in some situations or may not be synchronized in time. Therefore, we perform refinement of inaccurate sensor data in order to reliably execute the deep learning query. Refinement of the query means complementing some inaccurate sensor data to generate a complete feature vector. The following is a process for refining a query:
Collect sensor data within the time window to be analyzed.
Identify missing sensor data.
Perform exponential smoothing to estimate the missing sensor value(s).
Complete the query and send it to the deep learning predictor.
The exponential smoothing used in this article is also known as the exponentially weighted moving average (EWMA). 8 The estimate is computed as
s_t = α · x_t + (1 − α) · s_{t−1}
where 0 < α ≤ 1 is the smoothing factor, x_t is the sensor value observed at time t, and s_{t−1} is the previous smoothed value; the most recent observations therefore receive exponentially greater weight.
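The refinement steps can be sketched as follows; the per-feature history layout, the marker None for a missing value, and α = 0.3 are illustrative assumptions:

```python
def ewma(history, alpha=0.3):
    """Exponentially weighted moving average over a value history (oldest first)."""
    s = history[0]
    for x in history[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

def refine_query(window, history, alpha=0.3):
    """Replace missing features (None) with EWMA estimates from recent history."""
    return [ewma(history[i], alpha) if v is None else v
            for i, v in enumerate(window)]

# Feature 0 is missing from the current window; its recent history is [10, 12, 11]
print(refine_query([None, 23], [[10, 12, 11], [20, 22, 24]]))
```

The refined vector is then complete and can be submitted to the deep learning predictor without waiting for the late sensor value.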
Case study
Experiment settings
For the case study, we performed deep learning analysis using sensor data sets as a simple simulation of the proposed technique.
Table 1 shows the data set information used in our experiments. There are 72 gas sensors, and each sensor is used as a feature to construct a vector composed of 72 features. The value of each feature is an integer representing an air gas value. We used 1,349,470 records acquired over time, and each feature vector has a fan speed as its class. The fan speed was divided into 30 classes by range. In the case of deep learning, it is necessary to repeat training to reach a certain level of prediction accuracy; the prediction level stabilized after 50 training iterations. Based on these models, we measured how accurately the fan speed was predicted when a new query was entered. The base station server configuration used in the experiment is summarized in Table 1.
Data set and experiment setting.
Experiments were performed with a query processing period of 1 s, and the deadline was set to 1 s as the time constraint. We set the number of queries to 1000 each time, and the prediction accuracy was analyzed while sensor values were missed, one more per cycle. The purpose of the analysis is to predict the appropriate air cleaning fan speed according to the degree of air pollution. The ratio of correct predictions is expressed as the prediction accuracy.
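The accuracy metric used here reduces to the fraction of correctly predicted fan-speed classes; a minimal sketch:

```python
def prediction_accuracy(predicted, actual):
    """Fraction of queries whose predicted class matches the actual class."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

print(prediction_accuracy([1, 2, 3, 4], [1, 2, 0, 4]))  # 0.75
```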
Experiment result
Table 2 and Figure 6 show the prediction accuracy measured in the experiment.
Prediction accuracy for missing data growth.

Prediction accuracy for missing data growth.
The prediction accuracy is as high as 0.94 when all sensor data arrive normally.
Also, some features may have stronger predictive power than others. This means that the accuracy can be significantly reduced if a feature that has a large impact on predictive performance is missing. To confirm this, we examined how the prediction accuracy changes when each feature is missed one by one. Figure 7 shows the prediction result when one feature is missing. The X-axis shows the 72 features sorted in order of importance, and the Y-axis shows the prediction accuracy when that feature is absent.

Prediction power of features.
We also measured the time spent in actual query processing. Table 3 shows the query processing time for 10 rounds. Each round is configured to process 1000 queries at once.
Query processing time.
Related work
WSN is moving into our daily lives, as in smart offices and smart homes. 10 However, today's advanced ICT environment increasingly requires intelligent WSN techniques. In this regard, the convergence of WSN and machine learning is very active.
Alsheikh et al. 2 categorized machine learning approaches by how they contribute to WSN's functional issues and how they improve non-functional requirements such as performance. Functional issues addressed by machine learning include network routing, node clustering, data aggregation, event detection, and query processing. For non-functional requirements, machine learning can be used for security, anomaly and intrusion detection, quality of service, data integrity, and fault detection. Among these, our research is related to query processing and event detection.
Conventional WSN query processing directed tasks by checking whether the collected sensor data met certain conditions. However, machine learning–based query processing has emerged to handle the appropriate action for each situation. In this technique, supervised learning is performed in advance using a set of correct answers, and a new query is used to predict the current situation. Yu et al. 11 applied the neural network method to query processing for fire detection. In their study, a large number of sensors collect forest data (e.g. temperature and relative humidity) and construct a neural network. Based on the neural network, when new sensor data are input, they are classified into a weather index to detect whether there is a high possibility of fire. Bahrepour et al. 12 also used a decision tree for early detection of disasters.
By extending these studies further, our research attempts to converge deep learning and WSN query processing, because deep learning is drawing attention in many areas of computer engineering concerned with machine learning. Abdel-Hamid et al. 13 used CNNs for speech recognition.
Meanwhile, in the case of WSN, real-time processing tends to be important. For example, in a disaster detection system, WSN data should be analyzed and reported to the administrator as quickly as possible. More precisely, processing should be performed within a precise time condition rather than merely as fast as possible. To do this, we need to set a time constraint (deadline) and monitor whether the analysis is completed in a timely manner. Such monitoring requires a time specification, and Jahanian and Mok 6 proposed a method to formalize timing properties in real-time systems. They defined various timing constraints in a formal language using the relationships between event timestamps. Based on various real-time specifications, Mok et al. 17 studied the size of the bounded event history, and Song and Parmer 18 conducted monitoring studies to detect timing errors and deadlocks. Our research also monitors time constraints in computation-intensive environments like deep learning, because deep learning research is expected to shift to domains in which the time-critical factor is as important as accuracy.
Conclusion
We propose a framework for analyzing and monitoring WSN data. Our query processing is based on deep learning, and the system is useful in environments where sensor data collection can be incomplete. The proposed method satisfies the temporal constraint because it uses correction values obtained through the query refinement process even when sensor data are missing, and at the same time, the prediction accuracy is not significantly reduced. Through case studies, we confirmed the accuracy of deep learning analysis using real-world sensor data and the stability of task execution. We plan to predict and monitor more complex WSN environments, which will further enhance the benefits of deep learning.
