Abstract
1. Introduction
Wireless sensor networks (WSNs), consisting of large numbers of sensor nodes, have enabled a new monitoring, and sensing paradigm for large geographical areas. WSN applications include detection of forest fires, habitat monitoring and target tracking in the battlefield. In a sensor network, the data generated by individual sensor nodes is forwarded to the sink for further processing.
Typical sensor network applications generate large amounts of data and send that data to the base station using multihop routing. However, transporting large quantities of data to the base station can quickly drain the limited energy resources of the sensor nodes and reduce the lifetime of the sensor network. One way to significantly reduce communication cost in sensor networks is to perform in-network aggregation (e.g., AVG and MIN) [1, 2] or in-network processing (e.g., beamforming [3, 4]). However, due to the inherent loss of detail, these techniques do not provide the fine data granularity desired by several sensor network applications. Often times, all of the data generated by the sensor network is necessary for the purposes of accuracy and modeling in these applications. For instance, it has been noted that in a networked structural health monitoring application [5], more than 500 samples per second are required to efficiently detect damage. Another example is the Sonoma Redwoods project [6] where the biologists researching the project require as much detailed data from the sensor network as possible to try various physical models and test various hypotheses over the data.
Besides, such multihop routing with static sink nodes results in the early death of the one-hop neighbors of the sinks and make the sensor network unusable. This is expected because most of the data traffic is relayed to the sink by these nodes, thus greatly increasing their energy consumption, resulting in their untimely death and partitioning of the network topology. Consequently, nodes located remotely will be unable to report to the sink, which greatly reduces the lifetime of the WSN. Similarly, in hostile environments, it may not be feasible to replenish the batteries of the sensor nodes.
Recently, mobile data sinks have been proposed as a solution for datacollection to geographicaly balance the energy consumption among the sensor nodes throughout the network. Mobile sinks, either proactively or reactively, move in the sensor network and collect the data from the sensor nodes. This not only solves early death problem of the one-hop neighbors of the sink but also extends the network lifetime by distributing the responsibility of relaying data to the sink among many nodes in the sensor network. The sink mobility assumption may be useful for many applications such as target tracking, intrusion detection, and habitat monitoring. It is also possible to use this strategy of mobile sinks in other environments, where mobile sinks can traverse the sensor network area and collect the data based on the protocols.
Even though a mobile sink is desirable, it poses new challenges for designing energy-efficient sensor network protocols. As the sink moves in the sensor field, it leads to high control overhead to find a route to the sink and route packets to it, which may potentially offset the energy saved from a mobile sink strategy. Furthermore, frequent changes to the sink location can result in an increase in packet latency.
The majority of published methods assume periodic sensing, where sensors are continuously monitoring the network and reporting data to the sink. However, there are a large number of applications (e.g., intrusion detection, environmental monitoring, habitat monitoring) where an event-driven approach is more appropriate. In event-driven sensor networks, data is only generated when events occur.
In this paper, we present an energy-efficient datacollection protocol for event-driven sensor networks. This protocol, which is executed in a distributed fashion on every node in the network, uses a two-tier geographic hash table-based scheme for datacollection. The proposed mobility model moves the sink node only upon the occurrence of an event according to the evolution of current events, so as to eliminate the energy consumption incurred by the multihop transmission of the event-data. Data is collected via single-hop routing when the mobile sink is closer to the sensor nodes. Simulation results demonstrate significant gains in energy savings, while keeping the latency and the communication overhead at very satisfactory levels for various parameter values.
The rest of this paper is organized as follows. In Section 2, we describe the related work. In Section 3, we explain the geographic hash table-based datacollection scheme. In Section 4, we provide our analytical framework to evaluate the performance of the proposed protocol. We present other datacollection protocols for comparison in Section 5 and then present our simulation results in Section 6. Finally, we summarize the paper in Section 7.
2. Related Work
Several approaches have been proposed for data collection in static sensor networks. Partial Network Coding is proposed as a tool for continuous data collection in [7]. Parts of data segments are combined using network coding techniques to extend the network lifetime. A cascading data collection mechanism [8] is proposed for periodic data collection in wireless sensor networks. During each round, only a subset of the nodes send data directly to the sink. Each intermediate node combines all the information using network coding and forwards the data to the sink. An adaptive sampling approach to data collection is proposed in [9]. Data is directly collected from a dynamically changing subset of sampler nodes whereas data for the non-sampler nodes is predicted based on the use of probabilistic models. Although this is an energy-efficient approach for periodic data collection, it is only proposed for static sensor networks. Enhancing data collection by leveraging temporal correlation between data is explored in [10]. Methods for individual and aggregate data collections are proposed in [11]. An offline algorithm to compute the optimal data update strategy as well as online algorithms to cope with message losses have been proposed for static wireless sensor networks. These schemes do not consider mobile sinks for extending the network lifetime.
Several mobile-sink-based approaches are also proposed for data collection in the literature. These approaches can be classified into three categories based on the mobility model of the sink: random mobility, predictable mobility, and controlled mobility of the mobile sink. A sensor network architecture that collects data using Mobile Ubiquitous LAN Extensions (MULEs) that leverages the random mobility of mobile agents has been proposed in [12]. In SEnsor Networks with Mobile Agents (SENMA) [13], data is sent directly to the mobile agent that is flying above the sensor field. Cluster-based data collection is described in [14]. The authors assume that the mobile sink is either a helicopter, airplane or an LEO satellite and hence the sensors have direct access to the mobile sink. Mobile sink periodically collects the data from the sensor nodes. This scheme is not optimized for event-based sensor networks like HTDC. In [15], authors propose three protocols for energy-efficient data collection in sensor networks with multiple mobile sinks. Mobile sinks leave an imprint of their movement in the sensor network and coordinate to reduce the overlap area for data collection amongst each other. Mobile sinks continuously move and collect data from the sensor nodes. This strategy can waste a significant amount of energy for the mobile sinks in event-driven sensor networks where events occur at discrete times. Multi-hop operation between the mobile sink and the source nodes is considered in [16]. Forecast algorithms are used to predict the location of the next event. The work [17–21] describe several applications that perform data collection with mobile sinks.
Two decentralized mobility models for data collection are proposed in [22]. In each mobility model, a data collection request is given bids by all mobile sinks and the mobile sink with the lowest bid services the data collection request. However, each sensor node sends an individual data collection request to the mobile sinks and this consumes more energy compared to our proposed scheme. The security of the data collection mechanism using mobile sinks is studied in [23]. The authors propose a secure data collection mechanism based on one-way hash chains in which the sensors must verify the source of the beacon messages sent by the mobile sink before they send the data to the mobile sink. In [17], data collection by multiple mobile sinks in underwater acoustic networks is studied and three algorithms are presented. Some of the problems identified are the same issues we have attempted to solve in this paper. However, the scheme that we present here is more energy-efficient due to the two-tier hash table-based design.
In this paper, we propose a solution that is significantly different from all the above approaches. We assume an event-driven scenario, where sensors that have detected an event send data to the sink node via single-hop routing. Our goal is to control the mobility of the sink so as to ensure an energy-efficient operation of the network. The sink node is notified about the occurrence of events, and its trajectory so as to maximize network lifetime.
3. HTDC Design
Hash Table-based Data Collection (HTDC) is a reactive protocol: each sensor node reactively initiates the data collection process after an event is sensed by the sensor node. It proposes controlled mobility for the mobile sink to improve the energy-efficiency of the mobile sinks in an event-driven sensor network.
HTDC uses a two-level geographic hash table-based mechanism to select the event announcer node locations and the mobile link locations. This is more energy-efficient compared to a random selection of these nodes. If the random selection mechanism were to be used, the location of the mobile sinks and the announcer nodes must be communicated to all the sensor nodes. This location information can then be used by the sensor nodes to route event-announcement data packets to the event announcer nodes and the mobile sinks. The communication cost for broadcasting this information to all the sensor nodes is
HTDC binds individual sensor nodes in the network to the cells of a virtual grid based on their geographic location. This limits the cost of event announcement to an individual cell. Each node maintains only the information necessary to determine the location of the Local Event ANnouncer (LEAN) node and no additional state needs to be maintained. This provides fault tolerance to the sensor nodes in the case of the failure of an event announcer node as the underlying geographic routing algorithm routes to the sensor node closest to the location specified as the destination and not to a specific node.
HTDC achieves three goals. First it ensures correctness by attaching each sensor node to a mobile sink. Second, it adaptively selects the positions for the local event announcer nodes in each cell as well as the agent nodes for the mobile sinks. Third, by dividing the sensor field into virtual cells among each mobile sink, it ensures load balancing among available mobile sinks.
HTDC runs above the MAC and routing layers. HTDC utilizes geographic routing protocol as the underlying routing protocol. Control packets from the sensor nodes notifying LEAN nodes of new events are routed to the LEAN nodes and control packets from the LEAN nodes are routed to the mobile sink.
3.1. System Model
Consider a sensor network, where All the sensor nodes are fixed and are aware of their location (either through localization techniques or through a GPS receiver). The maximum radio range is the same as The mobile sink nodes are non-sensing nodes, and are aware of their own locations. However, it is not necessary for mobile sinks to know the locations of other sensor nodes. Mobile sinks can communicate with each other in a single-hop. Mobile sinks communicate with the base station via multihop routing. We assume that there are specific types of events in the sensor network. We recognize that an event has occurred when at least two sensor nodes have generated data packets related to that event.
3.2. Virtual Grid Construction
HTDC divides the sensor network into a virtual grid of cells. Each cell is a square. Each sensor node identifies itself with a single cell. For a particular sensor node at location
Division of the sensor field into uniform virtual grids is highly energy-efficient compared to other schemes such as Voronoi graph division [24]. The construction of Voronoi graph consumes more energy in resource-constrained sensor nodes. Besides, in uniform grid, each sensor node is implicitly aware of the boundaries of each cell unlike in a Voronoi graph division.
3.3. Geographic Hash Table
The data collection system architecture, which we describe in this section to meet the above-enumerated design criteria, is a two-level geographic hash table (GHT) [25]. GHT hashes a key
3.4. Hash Functions
HTDC utilizes two hash functions:
This hash function determines the location of the event announcer node in a cell. This hash function takes the cell id of the source node as a parameter and outputs the location of the Event Announcer. Since a sensor node may not be present at the exact location as determined by the hash function, any sensor node that is the closest to the hashed location can act as the Event Announcer. Each event announcement message can travel through multiple hops in the cell and finally stops on a sensor node that is the closest to the location specified in the message. Hash1 is evaluated on the sensor nodes in each cell. The hash function is selected such that the location of the LEAN node is within the cell boundaries. The input to the hash function is the coordinates of the center of the cell and hence there will not be any collisions due to using Hash1 hash function. This hash function determines the location of the mobile sink. This hash function takes the mobile sink name as a parameter and outputs the location of the mobile sink. Each mobile sink stays at the location determined by Hash2 until a data collection request arrives. Hash2 is evaluated on the LEAN nodes to determine the location of the mobile sinks. Hash collisions are possible with Hash2 hash function. If there is a collision, location of more than one mobile sink will be the same. However, such a scenario can be avoided by the selection of an appropriate hash function.
3.5. Local Event Announcer Node
Figure 2 shows a sensor network with four mobile sinks where an event has occurred. After the occurrence of an event, all the sensor nodes that are within the area affected by the event, generate sensory data related to that event. Next, these source sensor nodes execute the AnnounceEvent procedure. The pseudocode for the

Procedure AnnounceEvent.

An event has occurred in the sensor network with four mobile sinks.
The LEAN node stores the source information including the location coordinates of the source node. It waits for a preset number of event announcement message to arrive from the source nodes in its cell. Once event announcement packets are received, LEAN node executes the
Once the event announcement message reaches it, the mobile sink initiates the data collection process by executing the
Each mobile sink also maintains a list of its immediate neighbors and they maintain the state of the mobile sink in turn. A mobile sink periodically refreshes the state of the perimeter nodes (mobile sink agents) by broadcasting beacon messages as shown in Figure 3. They can be in one of the three states, namely,

Local Event Announcer Node. D is the hashed location of the LEAN node. A is the LEAN node for this virtual cell. B and C are the perimeter nodes for the LEAN node.

Source nodes in each cell send an event announcement message to LEAN.
The pseudocode for notifying the mobile sink of an event occurrence by a LEAN node in HTDC is shown in Figure 7.
3.6. Data Collection
The event announcement message contains the location of the cell. The mobile sink moves to the cell where the event has occurred and initiates data collection among the nodes in the cell. The mobile sink traverses the cell in a snake-like fashion in order to cover the complete area of each virtual cell. We chose this mobility model for the mobile sinks as it covers all nodes in the cell for a square-shaped cell. The mobility model of the mobile sink depends on the shape of the cell and node density. It should allow the mobile sink to collect data from all the nodes in the cell. If the form of the cells is different, other mobility models can be considered.
The data is collected passively: sinks periodically broadcast a beacon message at distance
3.7. HTDC Messages
HTDC uses three types of messages to perform data collection.
DCR: data collection request. When a local event announcer node has detected an event in its cell, it can communicate this new event to the mobile sink by transmitting a DCR message. PRQ: Perimeter Request. If a mobile sink agent receives a DCR message and the mobile sink is not at its location, then the mobile sink agent constructs this PRQ request and sends it to the next closest mobile sink node location. LEAM: Local event announcement. This message is sent by a source sensor node upon detecting an event.
3.8. Load Sharing
In this section, we discuss how our data collection mechanism enables load sharing. Sensor networks can have events whose event area can be of different sizes. In such cases, events can span multiple cells. To provide load sharing, the location of the mobile sinks is periodically changed. This will ensure that each mobile sinks stays at a different location in every period. This will also reduce the hotspots in the sensor network. Sensor nodes also periodically change the Event Announcer node in each cell to reduce the hotspot problem.
3.9. Routing of Messages to the Mobile Sink
The Event Announcer node communicates with the mobile sink using a DCR message as shown in Figure 5. If the mobile sink is currently servicing an event, the state of the mobile sink agent (MSA) nodes is set to “not present”. MSA nodes receive the Event Announcement messages meant for the mobile sink in its absence. Upon receipt of the Event Announcement messages, the mobile sink agent nodes generate a new Event Announcement message and forward it to the next nearest mobile sink. This will ensure that mobile sinks that are currently free from servicing any event can share the workload. In applications such as target tracking, the target continuously moves enabling sensor nodes to generate data continuously. In such a scenario, a sensor node will only report an event occurrence to the LEAN node if there is data in its cache that is yet to be collected and no event occurrence message has been sent to report this data. Once a mobile sink collects the data, the sensed data is removed from the node's cache. If the same node senses data related to the same event later, the new data must be collected again by the mobile sink. The sensor node sends a new event announcement message to the LEAN node about the event occurrence. As the LEAN node is aware that the mobile sink is in the same cell, it informs the mobile sink that collects the data again.

LEAN nodes send a DCR packet to the mobile sink to request data collection.

Procedure SendDCRPacket.

Procedure DataCollection.
A “Mobile Sink failed” message will be sent to the base station to inform the base station about the failure of the mobile sink.
4. Analysis
In this section, we analyze the energy consumption of HTDC.
4.1. Model and Notation
We consider a square sensor field of area
4.2. Energy Consumption
The total energy consumption is the summation of three components:
Energy consumed during event notification by the source sensor nodes; Energy consumed to notify the mobile sink; Energy consumed during data collection. This can be divided into two parts: (a) energy consumed by the mobile sink and (b) energy consumed by the sensor nodes.
Now we analyze the energy consumption for each of these parts separately.
4.2.1. Energy Consumed for Event Notification
The energy consumed during event notification is directly proportional to the number of sensors that sensed the event. Suppose all the sensor nodes in a single cell sense the event. Each such sensor node sends an event notification message to the LEAN node. The maximum number of hops the packet has to traverse is equal to the diameter of the subnetwork inside the cell.
The number of nodes in each cell is given by
The expected number of transmissions needed to notify the LEAN node of an event by a single source node in the cell is given by
4.2.2. Energy Consumed for Mobile Sink Notification
Once LEAN node receives event notification messages from the source sensor nodes in the cell, it constructs a data collection request packet and sends the packet towards the mobile sink. The average distance between the mobile sink and the LEAN node can be evaluated by computing the average distance between two randomly chosen points in a unit square. Suppose (
Since there are
Since (6) does not yield a simple closed form solution, we simplify the analysis by assuming that each mobile sink will cover a square of area
The average distance between the cell and the mobile sink is given by
If there are a total of
4.2.3. Energy Consumed for Data Collection
The energy consumed by the mobile sink for data collection is proportional to the distance traveled. The energy consumed by a sensor node in the data collection process is for listening to the beacon messages from the mobile sink and transmission of data.
4.3. Time Complexity
In this section, we determine the delay in collecting the data using HTDC. The delay
The parameter
The parameter
The parameter
The parameter
5. Other Data Collection Algorithms
In this section, we describe the three datacollection algorithms against which we will compare the performance of HTDC.
5.1. Reactive Data Collection (RDC)
In RDC, a source node that wants its data collected either floods the network with its data packet or if it has multiple data packets that needs to be collected reactively [28], locates the mobile sink by flooding the network with a query about the location of a mobile sink. Once the query reaches the mobile sink, the sink moves toward the source node and transmits a beacon message to the source node indicating its readiness to collect the data from the source node. The algorithm converges when data from all the source nodes that detected the event is collected by the mobile sink. Reactive data collection consumes the least amount of time in notifying the mobile sink about the occurrence of the event. However, it consumes a vast amount of energy because it floods the network with query requests.
5.2. Continuous Data Collection (CDC)
Several approaches such as [15] partition the network based on geography and allocate one mobile sink to each partition. This is a continuous data collection approach where each mobile sink continuously moves in its allocated grid partition in rounds. In each round, a mobile sink moves in a designated trajectory that enables it to reach all the nodes in the grid partition. The mobile sink will broadcast a beacon message querying each sensor node for data as it moves closer to the node. Data is collected from all the source nodes in the grid partition in each round via one-hop routing. If a node does not have data in a certain round, it will not respond to the beacon message of the mobile sink and it is ignored by the mobile sink.
5.3. Ideal Data Collection (IDC)
In IDC, each mobile sink immediately knows about the occurrence of an event in its grid partition and which sensor nodes are the source nodes with data about that event. The mobile sink immediately moves towards the source sensor nodes after the occurrence of the event and collect the data. We call this
6. Performance Evaluation
We used a 150 node network within a 250 × 250 m monitoring area. In the simulations, a LEAN node only sends a DCR to a mobile sink after at least two nodes in the cell have reported the occurrence of the event. The results are averaged over 10 simulations runs and are shown with 95% confidence intervals. The vertical lines on the bars for all graphs indicate the confidence intervals. Confidence intervals are calculated using the method of independent replications [29] for these simulations. We obtain a single output variable in each of the simulation runs and its distribution is not known. We can use statistical inference based on normal distribution (because of the Central Limit Theorem) to determine the confidence intervals. However, since the number of simulation runs is small, we have used the Students
6.1. Energy Model
Each sensor node is assumed to have a radio range of 20
Radio characteristics [30].
6.2. Simulation Environment
We have compared the different data collection approaches using the J-SIM sensor network simulator [31]. J-SIM is an open-source network simulator that provides a modeling and simulation environment for wireless sensor networks. To implement the HTDC protocol we used the GPSR extensions to the J-SIM simulator. The general simulation environment is summarized in Table 2.
Simulation environment parameters.
6.3. Energy-Efficiency of the Virtual Grid
HTDC divides the sensor monitoring area into several virtual cells. The monitoring area can be divided into cells in a number of ways. We compare the performance of virtual grid with the division of the sensor field using voronoi graphs. LEAN are selected randomly to become a voronoi site. Upon occurrence of an event, sensor nodes in the Voronoi site send the event announcement packets to the LEAN node which in turn sends a DCR packet to the mobile sink. The results are shown in Figure 8. As the cell size increases, Voronoi graph construction increases much more than using a uniform grid. We have also compared the election of a LEAN node using clustering and using hashing in a uniform virtual grid. Election of a LEAN node using clustering mechanism consumes energy in cluster-head election. As the size of the cell increases, the cost of electing a cluster head increases. HTDC performs better compared to the other two methods because a geographic hash table required only a minimal setup.

Total number of transmissions versus cell size.
6.4. Effect of Mobile Sinks
First we show the effect of the number of mobile sinks on the network lifetime as well the average data collection delay. For this simulation, one event is generated for this experiment and the event persists for a period of 100 seconds and can be sensed by sensor nodes in an area of 3600

Total number of transmissions versus number of mobile sinks.
Next, we show the effect of varying number of mobile sinks on datacollection delay. The data collection delay is defined as the total time taken to collect all the data generated by the events. Figure 10 shows that RDC achieves low datacollection delay as flooding the network usually informs the mobile sink quickly about the event occurrence. HTDC performs equally well in terms of the datacollection delay with only using very few messages, thus improving the network lifetime. CDC performs the worst as the mobile sink continuously traverses the entire grid partition without any knowledge of the event occurrence. This shows that the continuous data collection protocols are unsuitable for event-driven sensor networks.

Datacollection delay versus the number of mobile sinks.
6.5. Effect of the Number of Events
We have also investigated the effect of the number of events on the performance of HTDC. Figure 11 shows the total number of transmissions with an increase in the number of events. For this experiment, we placed four mobile sinks in the sensor field. As the number of events increase, the number of transmissions increase for all four protocols. However, RDC performs the worst as source sensor nodes flood the network with data collections requests. HTDC performs as well as the IDC that uses global information for data collection.

Total number of transmissions versus number of events.
Figure 12 shows the effect of increasing the number of events on the data collection delay. The 95% confidence intervals are within 1% of the mean and hence not shown. As the number of events increase, we can see that data collection delay for CDC decreases compared with other methods. This is because as the number of events increase, CDC benefits from the fact that event occurrences are continuous in this simulation and occur sequentially.

Datacollection delay versus number of events.
6.6. Effect of Cell Size
Figure 13 shows the effect of varying the cell size on the performance of HTDC. The cell size is described in terms of the number of nodes present in the cell. As the cell size increases, the number of transmissions decreases initially but increases from cell size 16 nodes and beyond. This is due to the fact that when the cell size is small, the number of transmissions enroute to the mobile sink dominate the total number of transmissions but as the cell size gets larger, the number of transmissions inside the cell dominate. This can be observed clearly from Figure 13.

Total number of transmissions versus cell size.
6.7. Effect of DCR Broadcast
In HTDC, each LEAN node sends a DCR message to the closest mobile sink as a unicast. If the mobile sink is not present at its location, a PRQ message is sent to the next closest mobile sink. On the other hand, each LEAN node can also broadcast the DCR message to all the mobile sinks. We compared the energy savings of HTDC compared with a modified version of HTDC where we broadcast the DCR, HTDC-B and the results are as shown in Figure 14. We found that HTDC realizes higher energy savings due to fewer message transmissions compared to a broadcast. A major problem with the broadcast approach is that once all mobile sinks receive the message, if all of them attempt to respond to the DCR message and move towards the source cell where the event has occurred, it would be a wastage of resources since only one mobile sink is sufficient. Hence, a handshake mechanism to identify which mobile sink will collect the data is required, leading to further consumption of precious energy resources.

Total number of transmissions versus number of events.
6.8. Impact of Error in Localization
Errors in localization do not have any effect on the performance of HTDC as the geographic-routing methods correctly route even when localization errors are present. A sensor node with an incorrect location estimate can wrongly identify it's virtual cell. In such a scenario, the sensor node will forward its event notification announcements to a different LEAN node. This will only lead to a few more additional messages compared to a network with zero errors in location estimation. Hence, we expect the effect of errors in localization on the energy-efficiency of HTDC to be negligible.
7. Conclusions
Mobile data sinks have been proposed in the literature as a solution for datacollection to balance the energy consumption among the sensor nodes and improve the network lifetime. In this paper, we have presented HTDC, which leverages mobile sinks to significantly extend the lifetime of the sensor network through the use of a two-tier geographic hash table. The proposed mobility-management method moves the sink node only upon the occurrence of an event according to the evolution of current events to minimize the energy consumption incurred by the multihop transmission of the event-data. Data is therefore collected in this method only via single-hop routing. Simulation results demonstrate significant gains in energy savings for various parameter values.
