Abstract
Introduction
In recent years, much research has been carried out in the field of the Internet of Things (IoT).1–3 This research can be classified into three visions: the thing-oriented vision, the Internet-oriented vision, and the semantic-oriented vision.3 Within the thing-oriented vision, the wireless sensor network (WSN) is one of the critical research topics. With WSN technology, data about the physical environment can be collected and forwarded to a common network framework, where they can be analyzed to support smart decisions. To date, WSNs have played a significant role in a wide range of application domains.4,5
As WSNs have developed, many research challenges have emerged that need to be addressed. First, a preemptive multithreaded operating system (OS) is needed on WSN nodes to achieve real-time scheduling. However, multithreaded OSes have a high data memory cost, so they are not suitable for most memory-constrained WSN nodes, for example, the moteZ node, which has only 4 KB of RAM. Reducing the data memory cost of multithreaded WSN systems is therefore important. Second, WSN nodes are commonly powered by energy-limited batteries and are difficult to recharge after deployment, which makes energy conservation critical.6 Third, WSN nodes are often deployed in harsh environments (mines, fields, etc.), where they are difficult to retrieve and maintain after deployment. Consequently, over-the-air reprogramming and fault tolerance become essential: with these mechanisms, WSN nodes can maintain high availability and be reprogrammed remotely without being physically retrieved.
Many mechanisms have been investigated in the past to address these challenges. Although these approaches are effective, they are not sufficient for the widespread adoption of WSN technology; energy limitation and fault tolerance, for example, remain major obstacles. It is therefore important to investigate new design concepts and research approaches that address these challenges more efficiently.
In this article, a new WSN platform termed LiveWSN is designed and implemented. LiveWSN aims to address current WSN challenges such as the memory constraint, the energy limitation, remote reprogramming, and fault tolerance. On the one hand, LiveWSN applies new design concepts such as hierarchical shared-stack scheduling and pre-linked native-code reprogramming. These mechanisms significantly decrease the memory cost of the WSN platform and substantially improve the reprogramming performance of WSN nodes. On the other hand, LiveWSN realizes a new research approach that addresses WSN challenges by combining software techniques with multi-core hardware techniques, rather than relying on software techniques alone. The multi-core hardware infrastructure allows the energy conservation and fault-tolerance performance of WSN nodes to be optimized efficiently. Owing to these mechanisms, LiveWSN is a WSN platform that is memory efficient, energy efficient, reprogrammable, and fault tolerant.
The rest of this article is organized as follows. Section “LiveWSN memory-efficient multithreaded scheduling mechanism” presents the new LiveWSN hierarchical shared-stack scheduling mechanism, which significantly reduces the memory cost of multithreaded WSN systems. As a result, a real-time multithreaded OS can run even on highly memory-constrained WSN platforms. Section “LiveWSN memory-efficient and energy-efficient reprogramming mechanism” introduces the LiveWSN pre-linked native-code reprogramming mechanism, which decouples the application code of LiveWSN from the system code. As a result, only the application binary, rather than the monolithic software binary, needs to be reprogrammed, and the reprogramming performance can be improved efficiently. Section “LiveWSN energy conservation mechanism” investigates the LiveWSN differential-sampling and multi-core energy conservation mechanisms, which conserve energy and prolong the lifetime of WSN nodes. Section “LiveWSN multi-core fault-tolerant mechanism” discusses the LiveWSN multi-core formal validation and fault-recovery schemes, which improve the fault-tolerance performance of WSN nodes and make the network more reliable. Finally, section “Conclusions and ongoing works” presents the conclusions and ongoing work.
LiveWSN memory-efficient multithreaded scheduling mechanism
The scheduling mechanisms of WSN OSes can be classified into two kinds: event-driven scheduling and preemptive multithreaded scheduling.7,8 In an event-driven system, preemption cannot be performed, and all tasks are executed one by one within a global stack. For this reason, the data memory cost of the OS stays low. However, the real-time performance of the system decreases significantly, because time-critical tasks cannot be executed immediately once they are ready. To improve real-time performance, preemptive multithreaded scheduling needs to be realized. However, each thread in a multithreaded system needs to run in an independent stack, which is used to store the thread's context when the thread is preempted. As a result, the data memory cost of a multithreaded OS is high, and such an OS is not feasible on highly memory-constrained WSN nodes. To address this challenge, the stack memory cost of multithreaded WSN systems must be reduced. In LiveWSN, this objective is achieved by implementing the hierarchical-scheduling mechanism (section “LiveWSN hierarchical scheduling”) and the shared-stack scheduling mechanism (section “LiveWSN shared-stack scheduling”). These mechanisms reduce the number of stacks and the average stack size of the multithreaded system, respectively. As a result, the stack memory cost of the multithreaded system can be decreased significantly.
The data memory cost of the multithreaded WSN OS can be denoted as follows
in which
Since each thread will be reserved an independent stack, the stack number
LiveWSN hierarchical scheduling
In LiveWSN, the optimization to the stack number
In LiveWSN, this concept is realized by the hierarchical-scheduling mechanism. WSN tasks are classified into two kinds: preemptive tasks and non-preemptive tasks (Figure 1). Preemptive tasks, for example, real-time tasks, require preemption. Thus, they are scheduled by the preemptive real-time rate-monotonic scheduling (RMS) algorithm, and each such thread runs in an independent stack. Non-preemptive tasks do not require preemption support. Thus, they can be scheduled by the first-in, first-out (FIFO) algorithm, and all of them run in a shared stack. By doing so, the stack number

LiveWSN hierarchical-scheduling mechanism.
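The two-level dispatch described above can be sketched as follows. This is a minimal model under stated assumptions, not the LiveWSN kernel: the names Task, HierarchicalScheduler, stack_count, make_ready, and pick_next are hypothetical, RMS priority is taken as "shorter period wins," and the FIFO tasks are assumed to run to completion on one shared stack whenever no preemptive task is ready.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    preemptive: bool
    period_ms: int = 0  # only used to rank preemptive tasks under RMS


class HierarchicalScheduler:
    """Two-level scheduler sketch: RMS over preemptive tasks (one private
    stack each) and FIFO over non-preemptive tasks (one shared stack)."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.ready_rt = []     # ready preemptive tasks
        self.fifo = deque()    # ready non-preemptive tasks, in arrival order

    def stack_count(self):
        # p private stacks plus one shared stack if any FIFO task exists
        n_pre = sum(1 for t in self.tasks if t.preemptive)
        has_fifo = any(not t.preemptive for t in self.tasks)
        return n_pre + (1 if has_fifo else 0)

    def make_ready(self, task):
        if task.preemptive:
            self.ready_rt.append(task)
        else:
            self.fifo.append(task)

    def pick_next(self):
        # RMS: the ready preemptive task with the shortest period runs first
        if self.ready_rt:
            task = min(self.ready_rt, key=lambda t: t.period_ms)
            self.ready_rt.remove(task)
            return task
        # otherwise the oldest non-preemptive task runs to completion
        if self.fifo:
            return self.fifo.popleft()
        return None
```

For a configuration like the SIS project (nine tasks, three of them preemptive), stack_count() gives 4, whereas a conventional multithreaded OS would reserve nine private stacks.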
It is assumed that the number of the preemptive tasks in the WSN system is
Since
LiveWSN shared-stack scheduling
With the hierarchical scheduling, the stack number
In a conventional multithreaded system, each thread runs in an independent pre-reserved stack whose size is reserved statically by estimation. To avoid stack overflow, the stack size is commonly set to a large value that covers the worst-case scenario; for example, the stack size in uCOS is set to 128 bytes for the 8-bit AVR microcontroller.10 However, not all of the reserved stack memory is used at run time, so memory is wasted (Figure 2(a)).

(a) Thread stack allocation mechanism in the conventional multithreaded system and (b–d) LiveWSN shared-stack allocation mechanism.
Since the memory waste is caused by pre-reserved static allocation, LiveWSN realizes shared-stack dynamic allocation. Instead of running in independent pre-reserved stacks, all threads in the shared-stack LiveWSN run within a shared memory area, and the thread stacks are allocated dynamically as required (Figure 2(b)). Compared with the conventional static pre-reservation mechanism, LiveWSN shared-stack dynamic allocation costs less memory. For example, in Figure 2(a), 384 bytes of memory are needed for the thread stacks (384 = 128 × 3), whereas only 220 bytes are used in the shared-stack multithreaded LiveWSN (220 = 72 + 66 + 82).
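The arithmetic of the Figure 2 example can be written out directly. This is an idealized sketch: it assumes the shared area needs exactly the sum of the bytes each thread actually uses and ignores any allocator bookkeeping; the function names are hypothetical.

```python
def static_stack_bytes(n_threads, reserved_per_thread):
    """Conventional OS: every thread reserves a worst-case stack up front."""
    return n_threads * reserved_per_thread


def shared_stack_bytes(used_per_thread):
    """Shared stack (idealized): each thread occupies only what it uses."""
    return sum(used_per_thread)


def saving_percent(static_bytes, shared_bytes):
    return round(100 * (1 - shared_bytes / static_bytes), 1)


static = static_stack_bytes(3, 128)        # 384, as in Figure 2(a)
shared = shared_stack_bytes([72, 66, 82])  # 220, as in Figure 2(b)
print(static, shared, saving_percent(static, shared))  # 384 220 42.7
```

In this example, the shared-stack layout saves 164 of the 384 reserved bytes, about 42.7%.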
After being allocated within a shared memory area, the thread stacks are adjacent to each other. Therefore, the execution of one thread can corrupt the stack data of the others; for example, in Figure 2(b), the stack data of the thread
It is assumed that the thread number is
Since the pre-reserved
According to equations (3) and (4), the LiveWSN shared-stack scheduling system costs less data memory than the conventional multithreaded system.
Integration of the hierarchical scheduling and the shared-stack scheduling
The LiveWSN hierarchical scheduling and shared-stack scheduling can decrease the data memory cost of the conventional multithreaded OS by optimizing the stack number

Stack memory structure of the comprehensive-scheduling LiveWSN.
It is assumed that the number of the preemptive tasks is
According to equations (4) and (5), the memory cost of the comprehensive-scheduling LiveWSN is much lower than that of the conventional multithreaded WSN system.
Performance evaluation
In this section, the stack memory cost and the scheduling efficiency of the conventional multithreaded WSN OS MANTIS OS,9 the hierarchical-scheduling LiveWSN, the shared-stack LiveWSN, and the comprehensive-scheduling LiveWSN are evaluated. Scheduling efficiency is evaluated by counting the clock cycles spent in the thread switching process, which in a multithreaded system consists of storing the current thread's context, selecting the next thread to execute, and restoring the next thread's context. The evaluation is performed on the IWoT (Internet of Things and Web of Things) node, which is equipped with the 8-bit AVR ATmega128RFA1 microcontroller (Figure 4), and the smart irrigation system (SIS) project12 is used as the evaluation example.

Prototype board of the IWoT node.
In the SIS project, nine tasks are created. Three of them are real-time preemptive tasks, while the others are non-preemptive (namely,
Data memory cost and scheduling efficiency of the different multithreading mechanisms.
RMS: rate-monotonic scheduling; FIFO: first-in, first-out; WSN: wireless sensor network.
Discussion
The results in Table 1 show that the scheduling overhead of shared-stack threading is high, because context shifting must be performed whenever a thread is preempted in a shared-stack multithreaded system. This result indicates that the shared-stack multithreading mechanism, used on its own, is not well suited to resource-constrained WSN platforms.
One effective way to improve the scheduling efficiency of the shared-stack threading system is to combine it with the hierarchical threading system. With the hierarchical scheme, thread preemption takes place only among preemptive threads, for example, real-time threads, so the preemption frequency can be decreased greatly. Since context shifting takes place only during thread preemption, a lower preemption frequency significantly lowers the context-shifting frequency, which in turn considerably decreases the total scheduling overhead of the shared-stack threading system. In this way, not only is the memory cost of the multithreaded system reduced (section “Integration of the hierarchical scheduling and the shared-stack scheduling”), but the scheduling overhead is also lowered.
LiveWSN memory-efficient and energy-efficient reprogramming mechanism
Since application requirements may change over time, it is practical and economical to be able to reprogram WSN applications. To avoid retrieving the nodes manually, the reprogramming needs to be performed remotely over the air. However, wireless communication is energy-expensive,13 and WSN bandwidth is commonly constrained.14,15 Therefore, minimizing the size of the reprogramming code is important in WSNs.
To minimize the reprogramming code size, the application code must be decoupled from the system code. With this decoupling, only the application part, rather than the monolithic software system, needs to be reprogrammed, which makes application reprogramming much more efficient. Several mechanisms have been implemented to decouple application code from system code, and they can be categorized into four kinds according to their code linking method and code execution model, as depicted in Figure 5. The linking method is either static pre-linking or dynamic loading. Compared with static linking, dynamic linking offers higher flexibility but has a higher memory cost. The execution model is either direct execution (e.g. machine-code binaries) or indirect execution (e.g. interpreted Java bytecode). Compared with directly executed code, indirectly executed code is more portable because it can be platform independent; however, the energy cost of an indirect-execution system is higher.

Categorization of the different mechanisms to decouple the application code from the system code.
Concept and implementation of the LiveWSN reprogramming mechanism
Since WSN platforms are constrained in memory and energy, it is important to develop a reprogramming mechanism that decouples the application code from the system code at low memory and energy cost. To achieve this objective, a static-linking machine-code mechanism is essential (Figure 5). With static linking, most of the linking work is performed on the personal computer rather than on the WSN nodes, which reduces the memory and energy cost on the nodes. With machine code, the application code can be executed directly by the processor, so the run-time execution overhead stays low. Unfortunately, the static-linking machine-code reprogramming mechanism is currently not considered in the WSN research field, because the application binary it generates is regarded as inflexible:16 when pre-linked code is used, any modification to the low-level system binary can invalidate the application binary. However, this problem can be eased by an intermediary jump table in the system space. With this table, the application code need not be hard-linked to the corresponding system code; instead, it is linked indirectly to a given entry in the system jump table, which then forwards the call to the related system function (Figure 6). In this way, changes to the system binary no longer invalidate the application binary.

Software elementary diagram of the LiveWSN reprogramming mechanism.
This pre-linked machine-code reprogramming has been implemented in LiveWSN. The application functions of LiveWSN are classified into two kinds: local functions and system-call functions. The former are called within the application space, while the latter are called from the application space into the system space (Figure 6). During application development, the application code is built independently of the system code. After the raw application binary is produced, it is re-linked by the LiveWSN reprogramming linker. During this re-linking process, the call addresses of the application's system-call interfaces are changed to the appropriate entries in the system jump table (Figure 6). In this way, the static-linking machine-code mechanism is implemented while the application binary remains flexible.
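The re-linking step and the jump-table indirection can be modeled at a high level as below. This is only an illustrative sketch: a real linker patches call addresses inside a machine-code binary, whereas here call sites are symbol names and the jump table is a list of Python callables. SYSCALL_SLOTS, relink, SystemImage, and run_app are hypothetical names, and the fixed slot order is an assumed system ABI.

```python
# Hypothetical system-call ABI: a fixed order of jump-table slots.
SYSCALL_SLOTS = {"radio_send": 0, "sensor_read": 1, "timer_sleep": 2}


def relink(raw_calls, local_symbols):
    """PC-side re-link: rewrite each system call to a jump-table slot,
    leaving calls to the application's own functions untouched."""
    linked = []
    for sym in raw_calls:
        if sym in local_symbols:
            linked.append(("local", sym))
        elif sym in SYSCALL_SLOTS:
            linked.append(("jt", SYSCALL_SLOTS[sym]))
        else:
            raise ValueError(f"unresolved symbol: {sym}")
    return linked


class SystemImage:
    """One build of the system code; only its jump table is visible to apps."""

    def __init__(self, impls):
        order = sorted(SYSCALL_SLOTS, key=SYSCALL_SLOTS.get)
        self.jump_table = [impls[name] for name in order]

    def dispatch(self, slot):
        # the jump-table entry forwards the call to the current implementation
        return self.jump_table[slot]()


def run_app(linked, system, local_impls):
    return [system.dispatch(t) if kind == "jt" else local_impls[t]()
            for kind, t in linked]
```

Because the application only stores slot indices, a rebuilt SystemImage with relocated or modified functions keeps the already-linked application valid, provided the slot order is preserved.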
With the above mechanisms, the application code can call the system code, but the system still cannot call back into the application. To solve this problem, LiveWSN implements a callback registration mechanism. When an application function needs to be called back from the system space, it is registered with the system by calling the registration interface. Once it is registered, its address is delivered to the system space, and the system can then call back this application function.
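The registration idea can be sketched as a system-side table that maps events to application functions. This assumes one callback per event and uses a Python callable in place of a stored function address; CallbackRegistry, register, notify, and the event names are hypothetical.

```python
class CallbackRegistry:
    """System-side table of application callbacks, keyed by event name."""

    def __init__(self):
        self._callbacks = {}

    def register(self, event, fn):
        # called from application space; stores the function "address"
        if not callable(fn):
            raise TypeError("callback must be callable")
        self._callbacks[event] = fn

    def notify(self, event, *args):
        # called from system space when the event occurs
        fn = self._callbacks.get(event)
        return None if fn is None else fn(*args)


registry = CallbackRegistry()
registry.register("packet_received", lambda payload: len(payload))
```

Here registry.notify("packet_received", b"abc") calls back into the registered application function, while an event with no registered callback is simply ignored.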
Performance evaluation
In this section, the reprogramming performance of LiveWSN is evaluated in terms of memory cost and reprogramming code size, in comparison with the Darjeeling JVM,11 Contiki dynamic reprogramming,16,17 and Atmel OTAU (over-the-air upgrade).18
The memory cost of the different mechanisms is shown in Figure 7. The JVM and dynamic-loading reprogramming mechanisms have a high memory cost because a bytecode interpreter or a dynamic linker, respectively, must be implemented on the node. The memory cost of the LiveWSN reprogramming system is much lower than that of the others because it uses the pre-linked machine-code mechanism: most of the linking operations are performed on the personal computer rather than on the WSN nodes, so fewer memory resources are consumed on the node.

Memory cost and reprogramming code size of the different reprogramming mechanisms.
In addition to memory cost, the reprogramming code size is another critical criterion for evaluating reprogramming performance, as it is directly proportional to the reprogramming time and the reprogramming energy cost. To evaluate the reprogramming code size, an environment-monitoring WSN application is used, and the resulting reprogramming code sizes of the different mechanisms are shown in Figure 7. Atmel OTAU does not decouple the application code from the system code and reprograms the monolithic software binary, so its reprogramming code size is large. The reprogramming code size of the JVM and dynamic-loading mechanisms is also large, because extra interpretation data or reference-resolution data must be contained in the application binary. LiveWSN generates a pre-linked machine-code application binary; since no redundant data are contained in the application code, its reprogramming code size is small. The results in Figure 7 show that the code size of LiveWSN is reduced by 92.7%, 72.6%, and 99.8% compared with the Darjeeling JVM, Contiki dynamic-loading reprogramming, and Atmel OTAU mechanisms, respectively. These results indicate that the LiveWSN reprogramming mechanism combines low memory cost with high reprogramming performance, and it is particularly suitable for memory-constrained, energy-limited, and bandwidth-limited WSN platforms.
Discussion
Currently, LiveWSN supports reprogramming only the application code; system modules such as hardware drivers and network protocols cannot be reprogrammed. This does not mean, however, that system-module reprogramming cannot be realized in LiveWSN. Reprogramming a system module is more difficult than reprogramming application code because the updated module can be larger than the original one. In this case, the updated module can no longer be stored in its original space and must be moved to a new space with larger capacity. Once moved to a new location, however, the updated module can no longer be accessed correctly by the other modules. To solve this problem, an indirect access mechanism should be applied in LiveWSN: the system modules do not call each other directly, but indirectly through an intermediary jump table. After an updated module moves to a new area, updating the intermediary table keeps the module accessible to the others. However, the indirect access mechanism increases system complexity and decreases execution efficiency. Therefore, whether system reprogramming should be implemented depends on the application requirements.
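The relocation problem and its table-based fix can be modeled as below. This is a sketch under simplifying assumptions that are not part of LiveWSN: flash is a flat byte range, new regions come from a simple bump allocator, and the old region of a moved module is not reclaimed. FlashLayout and its methods are hypothetical names.

```python
class FlashLayout:
    """Models indirect module access: callers look up a module's entry
    address in a table instead of hard-coding it."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.next_free = 0
        self.slots = {}   # module -> (start, size) of its flash region
        self.table = {}   # intermediary jump table: module -> entry address

    def _alloc(self, size):
        # bump allocation; freed regions are not reused in this sketch
        if self.next_free + size > self.capacity:
            raise MemoryError("no free flash region large enough")
        start = self.next_free
        self.next_free += size
        return start

    def install(self, name, size):
        start = self._alloc(size)
        self.slots[name] = (start, size)
        self.table[name] = start

    def update(self, name, new_size):
        start, size = self.slots[name]
        if new_size > size:
            # module grew: move it to a new region and patch only the table
            start, size = self._alloc(new_size), new_size
        self.slots[name] = (start, size)
        self.table[name] = start

    def resolve(self, name):
        # every inter-module call goes through this lookup
        return self.table[name]
```

A module that shrinks or keeps its size stays in place; one that grows is moved, and only its table entry changes, so callers that resolve through the table still reach it.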
LiveWSN energy conservation mechanism
Energy conservation is important for WSNs. On the one hand, most WSN nodes have limited energy resources; on the other hand, recharging outdoor WSN nodes is quite difficult. A set of energy conservation mechanisms has been implemented. Data prediction reduces energy cost by reducing sampling redundancy.19 Data compression,20 data aggregation,21 and topology control22 reduce energy cost by lowering communication overhead. In addition, some energy-efficient protocols have been developed that shorten the active period of the wireless transceivers, thereby lowering the energy cost of wireless communication.23,24 Although these mechanisms prolong node lifetime to a degree, energy limitation is still a major challenge for the widespread adoption of WSN technology.
To optimize the energy cost of WSN nodes efficiently, a new LiveWSN multi-core energy conservation mechanism is proposed. Unlike current conservation mechanisms,19–24 which reduce energy cost only from the software side, LiveWSN optimizes energy cost by combining software techniques with multi-core hardware techniques. With the multi-core hardware infrastructure, the lifetime of WSN nodes can be prolonged effectively.
LiveWSN multi-core task-aware energy conservation mechanism
Concept of LiveWSN multi-core energy conservation mechanism
The LiveWSN multi-core energy conservation approach is based on the experimental observation that executing the same task on different microcontrollers can cost different amounts of energy (Table 2). Thus, several asymmetric microcontrollers with different features can be equipped on a WSN node. At run time, each WSN task is assigned to the microcontroller that executes it most energy efficiently. In this way, the WSN node becomes energy-aware with respect to its tasks, and its energy cost can be optimized.
Energy cost and execution time of the WSN tasks on the different microcontrollers.
WSN: wireless sensor network.
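The per-task assignment rule can be sketched as a simple argmin over a cost table. The energy figures below are hypothetical placeholders (Table 2's values are not reproduced here), the microcontroller labels are illustrative, and any cost of moving work between cores is ignored at this step.

```python
def assign_tasks(energy):
    """energy[task][mcu]: energy to run `task` once on `mcu` (e.g. in uJ).
    Each task goes to the microcontroller that runs it most cheaply."""
    assignment = {task: min(costs, key=costs.get) for task, costs in energy.items()}
    total = sum(energy[task][mcu] for task, mcu in assignment.items())
    return assignment, total


# Hypothetical per-task energy costs on an 8-bit AVR and a 32-bit ARM core.
energy = {
    "sample_temp": {"avr": 10, "arm": 30},
    "fft_vibration": {"avr": 80, "arm": 25},
}
assignment, total = assign_tasks(energy)
```

With these illustrative numbers, the light sampling task goes to the AVR and the compute-heavy task to the ARM, for a combined cost of 35 units.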
It is supposed that there are
Yet, if these tasks are executed on the multi-core LiveWSN node, the energy cost of these tasks will be
In equation (7),
Since
Performance evaluation
To evaluate the LiveWSN multi-core energy optimization performance, a multi-core WSN node, EMWSN (Energy-efficient Multi-core WSN node), was designed and implemented (Figure 8). EMWSN is composed of two microcontrollers: the 8-bit low-power AVR ATmega1281 and the 32-bit ARM AT91SAM7X. In addition, a low-power IGLOO nano field-programmable gate array (FPGA)25 is equipped. With the IGLOO FPGA, the working modes of the AVR and ARM microcontrollers (active, sleeping, or powered off) can be configured without any rewiring. Experimental results from previous work26 show that the lifetime of the multi-core EMWSN is prolonged by 34.6% and 69.8% compared with the single-core AVR node and the single-core ARM node, respectively.

Prototype board of the multi-core EMWSN node.
Discussion
In some applications, the additional energy cost
In these applications, the energy saved by asymmetric multi-core task scheduling is less than the additional energy it costs, so it is not energy efficient to distribute the WSN tasks across different microcontrollers. To conserve energy in this situation, another multi-core energy conservation strategy can be applied: instead of distributing the tasks across microcontrollers, the microcontroller that executes the whole task set most energy efficiently is selected, and all WSN tasks are assigned to run only on this microcontroller. By doing this, the energy cost of all these tasks will be
Since
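The choice between the two multi-core strategies can be sketched as a direct comparison. This assumes the additional cost of distributed scheduling can be summarized as one lump-sum overhead and that running everything on one core adds no such overhead; the function name and all numbers are hypothetical.

```python
def choose_strategy(energy, switch_overhead):
    """Compare per-task distribution (plus a lump-sum multi-core overhead)
    with running the whole task set on the single cheapest microcontroller."""
    mcus = list(next(iter(energy.values())))
    distributed = sum(min(costs.values()) for costs in energy.values())
    distributed += switch_overhead
    best_mcu = min(mcus, key=lambda m: sum(costs[m] for costs in energy.values()))
    single = sum(costs[best_mcu] for costs in energy.values())
    if distributed < single:
        return ("distributed", None, distributed)
    return ("single", best_mcu, single)


# Hypothetical per-task energy costs (same units for every entry).
energy = {
    "sample_temp": {"avr": 10, "arm": 30},
    "fft_vibration": {"avr": 80, "arm": 25},
}
```

With these numbers, an overhead of 10 units keeps per-task distribution cheaper (45 versus 55), whereas an overhead of 30 makes running everything on the ARM core the better choice.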
LiveWSN differential-sampling energy conservation mechanism
In addition to the multi-core task-aware approach, differential sampling is another mechanism applied in LiveWSN to optimize the energy cost of WSN nodes. The concept of LiveWSN differential sampling is based on the observation that sensors of the same type can be available in several resolutions, each striking a different tradeoff between sampling accuracy and energy cost. Therefore, several sensors of different resolutions can be equipped on a WSN node. At run time, the sensor that satisfies the sampling accuracy requirement at the lowest energy cost is selected to perform the sampling task. In this way, the energy cost of the sensing subsystem can be reduced compared with a WSN platform equipped with only one type of sensor.
The LiveWSN differential-sampling mechanism has been applied on the MiLive platform (Figure 9) for a target tracking application. In this application, a low-accuracy sensor with a working current of 290 mA is used to detect the target. Once the target is locked, high-accuracy sensors with a working current of 415 mA are activated to analyze the target's characteristics. In this way, the target tracking task is performed while the sampling energy cost of the LiveWSN platform is reduced by 18% compared with a traditional WSN platform equipped with only high-accuracy sensors.
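The selection rule can be sketched in a few lines. The working currents below are taken from the MiLive tracking example, but the resolution values, sensor names, and the Sensor and pick_sensor helpers are hypothetical, and working current is used as a stand-in for energy cost.

```python
from dataclasses import dataclass


@dataclass
class Sensor:
    name: str
    resolution: float   # smallest resolvable change; smaller means finer
    current_ma: float   # working current while sampling


def pick_sensor(sensors, required_resolution):
    """Cheapest sensor whose resolution meets the current requirement."""
    ok = [s for s in sensors if s.resolution <= required_resolution]
    if not ok:
        raise ValueError("no sensor meets the required resolution")
    return min(ok, key=lambda s: s.current_ma)


# Currents from the tracking example; resolutions are illustrative only.
sensors = [Sensor("coarse", 1.0, 290.0), Sensor("fine", 0.1, 415.0)]
```

During detection, a loose requirement selects the coarse sensor; once the target is locked, a tighter requirement forces the fine sensor.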

Prototype board of the multi-core MiLive node.
LiveWSN multi-core fault-tolerant mechanism
Fault tolerance can reduce the economic loss and maintenance cost caused by node failures. A set of fault-tolerant mechanisms has been realized in WSNs. Topology management recovers from communication failures by reconstructing the network topology.27 Multi-path routing improves routing reliability by providing active route replication.28 Data aggregation ensures that sufficient information can still be delivered to the recipients even if some nodes fail.21 All these mechanisms improve network availability effectively. However, they are limited in that they perform fault recovery at the network level, without investigating recovery at the node level. To further improve the fault tolerance of WSNs, fault-tolerant mechanisms that perform fault recovery at the node level are needed, and LiveWSN achieves this through its multi-core formal validation and multi-core fault-recovery mechanisms.
Concept of the LiveWSN multi-core fault-tolerant mechanism
Traditionally, WSN nodes use a single-core architecture with only one microcontroller. If that microcontroller runs abnormally, the node cannot function correctly. To improve fault tolerance, two microcontrollers can be equipped on the LiveWSN node: a working microcontroller plus a highly reliable auxiliary microcontroller. At run time, the working microcontroller performs the WSN tasks, while the auxiliary microcontroller inspects the run-time status of the working microcontroller. If the working microcontroller is observed to run abnormally, the auxiliary microcontroller can take action to recover it. In this way, the WSN node can recover from faults.
With this multi-core mechanism, the reliability of the WSN node depends mainly on the auxiliary microcontroller rather than on the working microcontroller. As long as the auxiliary microcontroller is more reliable than the working microcontroller, the fault tolerance of the LiveWSN node is improved compared with a traditional single-core WSN node. Since the auxiliary microcontroller only needs to perform simple fault-validation and recovery tasks, its run-time reliability can be kept high (commonly, the lower the system complexity, the higher the reliability). Because the auxiliary microcontroller is highly available, faults that occur on the working microcontroller can be monitored, and the fault-recovery mechanism can then be applied.
LiveWSN multi-core formal specification and verification
To detect whether faults have occurred on the working microcontroller, the run-time behavior of its software system needs to be specified formally. Based on this specification, the auxiliary microcontroller can validate whether the working microcontroller runs correctly. If the working microcontroller runs outside the predefined specification, the auxiliary microcontroller can launch the fault-recovery mechanism to recover it.
Popular formal specification models include history-based, state-based, transition-based, functional, and operational specifications.29 LiveWSN applies a state-based finite-state machine (FSM) specification. Based on the FSM, the software system of the working microcontroller is divided into a set of modules, each of which is treated as a state at run time. Using the FSM computation model (equation (9)), the auxiliary microcontroller can validate the run-time process of the working microcontroller step by step
Since memory and computing resources on WSN nodes are constrained, the auxiliary microcontroller does not specify and validate every element in equation (9). Instead, only some critical elements are inspected, such as the state transitions, the state outputs, and the state execution times. At run time, the working microcontroller sends the debugging code of every state transition to the auxiliary microcontroller in sequence. Based on the predefined software behavior specification and the received state codes, the auxiliary microcontroller can determine whether an illegal function jump has occurred on the working microcontroller. In addition to illegal function jumps, LiveWSN also inspects execution results. After the code of a state runs to completion, its execution result is sent to the auxiliary microcontroller, which checks whether the result lies within a reasonable range; for example, an air temperature reading should not be higher than 60°C. If it does not, a fault is indicated during the execution of that state's code. Besides illegal function jumps and exceptional execution results, infinite loops (dead loops) are also detected at run time. Before the code of each state is executed, a watchdog timer is set, and it is cancelled once the code of that state runs to completion. If the timer fires, a dead-loop fault is indicated during the execution of that state. With these mechanisms, run-time faults on the working microcontroller can be detected immediately, and the fault-recovery mechanism can then be performed.
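The three checks (legal transition, output range, per-state watchdog) can be sketched as a small monitor running on the auxiliary microcontroller. This is an illustrative model, not the LiveWSN firmware: FsmMonitor, its methods, the state names, and the time budgets are hypothetical, and the watchdog is modeled as a deadline that the monitor polls rather than a hardware timer interrupt.

```python
class FsmMonitor:
    """Checks the working MCU's state reports against an FSM specification."""

    def __init__(self, transitions, output_range, deadline_ms, initial, now_ms=0):
        self.transitions = transitions    # state -> set of legal next states
        self.output_range = output_range  # state -> (low, high) for its result
        self.deadline_ms = deadline_ms    # state -> watchdog budget
        self.state = initial
        self.started_ms = now_ms          # watchdog armed at state entry

    def watchdog_expired(self, now_ms):
        # polled by the auxiliary MCU; True means a dead-loop fault
        return now_ms - self.started_ms > self.deadline_ms[self.state]

    def on_state_done(self, output, next_state, now_ms):
        """Report from the working MCU: the current state finished with
        `output` and wants to jump to `next_state`. Returns detected faults."""
        faults = []
        low, high = self.output_range.get(self.state, (float("-inf"), float("inf")))
        if not low <= output <= high:
            faults.append("bad_output")
        if next_state not in self.transitions.get(self.state, set()):
            faults.append("illegal_jump")
        if not faults:
            self.state = next_state
            self.started_ms = now_ms      # cancel old watchdog, arm new one
        return faults


spec = dict(
    transitions={"sample": {"send"}, "send": {"sleep"}, "sleep": {"sample"}},
    output_range={"sample": (-40, 60)},   # air temperature in degrees C
    deadline_ms={"sample": 50, "send": 200, "sleep": 1000},
)
monitor = FsmMonitor(initial="sample", **spec)
```

On a detected fault, the monitor leaves the current state unchanged, so a recovery step can re-execute that state.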
LiveWSN multi-core fault-recovery mechanism
After a fault is detected, the working microcontroller can be recovered. LiveWSN currently implements recovery mechanisms at two granularities: state-granularity recovery and microcontroller-granularity recovery. State-granularity recovery recovers from a fault by re-executing the code of the faulty state. Microcontroller-granularity recovery recovers from a fault by restarting the working microcontroller through the power supply unit. For state-granularity recovery, the run-time context of each state must be stored at the beginning of the state and restored when the recovery mechanism is launched.
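State-granularity recovery, with escalation to a restart when re-execution does not help, can be sketched as follows. This C code is a minimal illustration under stated assumptions. The state context is a small hypothetical struct checkpointed by struct assignment, and a state body reports a detected fault by returning `false`. The retry bound `max_retries` and the escalation to microcontroller-granularity recovery are also assumptions, not documented LiveWSN behavior. `example_state` and its `inject_faults` counter exist only to demonstrate the rollback.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical run-time context of one state; a real node would also
 * need to capture the relevant stack and peripheral state. */
typedef struct {
    int16_t sample;
    uint8_t seq;
} state_context;

/* A state body returns true when it completes with a valid result and
 * false when a fault was detected during its execution. */
typedef bool (*state_fn)(state_context *ctx);

typedef enum {
    RECOVERY_NONE,          /* state completed, possibly after re-execution */
    RECOVERY_RESTART_MCU    /* escalate to microcontroller-granularity recovery */
} recovery_result;

/* State-granularity recovery: store the context at the beginning of the
 * state; on a fault, restore it and re-execute the state. After
 * max_retries failed re-executions (an assumed bound), request a restart
 * through the power supply unit instead. */
recovery_result run_state_with_recovery(state_fn body, state_context *ctx,
                                        unsigned max_retries) {
    const state_context checkpoint = *ctx;        /* store context */

    for (unsigned attempt = 0; attempt <= max_retries; attempt++) {
        if (body(ctx)) {
            return RECOVERY_NONE;
        }
        *ctx = checkpoint;                        /* restore context */
    }
    return RECOVERY_RESTART_MCU;
}

/* Illustrative state body: it mutates the context before "detecting" a
 * fault, so a correct restore is observable. Faults are injected through
 * the hypothetical inject_faults counter. */
static unsigned inject_faults = 0;

static bool example_state(state_context *ctx) {
    ctx->seq++;                    /* side effect that must be rolled back */
    if (inject_faults > 0) {
        inject_faults--;
        ctx->sample = -1;          /* corrupted partial result */
        return false;
    }
    ctx->sample = 215;             /* e.g. 21.5 degC in tenths */
    return true;
}
```

For example, with one injected fault and `max_retries` set to 2, the first attempt fails, the context is restored, and the second attempt completes with only one increment of `seq` visible.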
Compared with state-granularity recovery, microcontroller-granularity recovery is easier to implement. However, the run-time context data are lost once the microcontroller is restarted. This recovery mechanism is therefore suited to nodes located at the edge of the network, for example, end devices in a ZigBee network. Because these nodes do not relay traffic for other nodes, restarting their working microcontroller does not directly affect network connectivity or the other WSN nodes.
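The choice between the two granularities by node role can be expressed as a small policy function. This is a hypothetical sketch. The role names follow the ZigBee device types, but the enumerations, the function name, and the rule that coordinators and routers fall back to a restart only when no context checkpoint exists are assumptions.

```c
#include <stdbool.h>

/* ZigBee device types; only end devices sit at the edge of the network. */
typedef enum { ZB_COORDINATOR, ZB_ROUTER, ZB_END_DEVICE } zigbee_role;

typedef enum { RECOVER_STATE, RECOVER_RESTART_MCU } recovery_kind;

/* Hypothetical policy: an end device does not relay traffic for other
 * nodes, so the simpler restart is acceptable there. Coordinators and
 * routers keep the network connected, so they re-execute the faulty
 * state when a context checkpoint is available and restart only as a
 * last resort. */
recovery_kind choose_recovery(zigbee_role role, bool context_checkpointed) {
    if (role == ZB_END_DEVICE) {
        return RECOVER_RESTART_MCU;
    }
    return context_checkpointed ? RECOVER_STATE : RECOVER_RESTART_MCU;
}
```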
Evaluation
The above LiveWSN multi-core fault-recovery mechanism has been implemented on the multi-core IWoT node (Figure 4). The IWoT node is equipped with an 8-bit AVR ATmega128rfa1 as the working microcontroller and a 4-bit ultra-low-power, highly reliable TinyRISC as the auxiliary microcontroller. In the experiment, three multi-core IWoT nodes were deployed in the outdoor garden of ISIMA more than 3 years ago, and none of them has failed to date (faults are likely to have occurred during this period, but the nodes recovered from them). By contrast, before the LiveWSN multi-core technique was implemented, more than 30% of the traditional single-core Live nodes failed within about 2 months of deployment. 30 An online demo for verifying the reliability of the multi-core IWoT system is available from the website edss.isima.fr (login name:
Discussion
The LiveWSN multi-core fault-tolerant mechanism improves the reliability of the WSN system. However, the energy cost of the WSN platform increases accordingly because of the additional auxiliary microcontroller. One way to strike a balance between reliability and energy cost is to integrate the LiveWSN multi-core energy conservation mechanism (section “LiveWSN energy conservation mechanism”) with the LiveWSN multi-core fault-tolerant mechanism (section “LiveWSN multi-core fault-tolerant mechanism”). With this integration, the reliability of the WSN system can be improved while the energy cost of the WSN nodes is also optimized. Figure 10 depicts the development stages of the integrated LiveWSN multi-core fault-tolerant and energy-efficient platform.

Figure 10. Integration of the LiveWSN multi-core energy conservation and fault-tolerant mechanisms.
Conclusion and ongoing works
In this article, we designed and implemented LiveWSN, a new memory-efficient, energy-efficient, reprogrammable, and fault-tolerant WSN platform. LiveWSN implements new hierarchical scheduling, shared-stack scheduling, comprehensive scheduling, and pre-linked native-code reprogramming mechanisms. As a result, the stack memory cost of LiveWSN is reduced by more than 25% compared with the multithreaded MANTIS OS, and the reprogramming code size is reduced by 72.6% compared with Contiki dynamic-linking reprogramming. In addition to these design concepts, LiveWSN applies new multi-core energy conservation and multi-core fault-tolerance mechanisms. Using the multi-core hardware infrastructure, the lifetime of a WSN node can be prolonged by more than 34% compared with a traditional single-core AVR node. Moreover, the fault tolerance of the WSN is improved: none of the three multi-core IWoT nodes deployed outdoors has failed in more than 3 years. With these features, the LiveWSN platform is suited to running outdoor real-time WSN applications on resource-constrained WSN nodes. More work on LiveWSN is available from the website (edss.isima.fr/sites/smir).
Ongoing work on LiveWSN will focus on the following aspects: (1) designing and implementing a comprehensive multi-core WSN platform that integrates the different LiveWSN multi-core mechanisms; and (2) researching an intelligent multi-core scheduling mechanism that lets WSN nodes decide the multi-core task scheduling strategy adaptively according to the run-time context.
