Abstract
Introduction
A wireless sensor network (WSN) is a collection of sensor nodes distributed within a particular area to observe some physical conditions and gather the observation at a base station (BS). 1 The nodes are typically battery-operated, and they have inherent limitations in computation and communication capacities. In most monitoring applications, it is often pricey and unattainable to interchange the node’s batteries since the networks are formed in harsh regions with hundreds to thousands of nodes. 2 As a result, the sensor nodes are expected to operate cooperatively for extended periods without any battery back-ups. 3 Conventional WSN applications produce sizable quantities of data. Transporting such large amounts of data from the nodes to the BS will deplete the deficient energy reserves of the nodes timelier and reduce their lifetime. Moreover, since nodes within a close range may sense identical events, it is incompetent to transport the sensed readings directly from the nodes to the BS. 4 Therefore, it is very significant to consolidate the sensed observations from the sensor nodes into valuable information at the intermediate nodes to guarantee considerably reduced overheads of energy consumption and data redundancies. Clustering technology provides an effective method to reduce the communication energy consumption of the sensor nodes and enhance their lifetime.5–7 Clustering helps to simplify the network structure and evade direct transmission among the nodes and BS. It facilitates the self-organization of nodes into clusters by specific rules, such that each cluster head (CH) can fuse the data from their members and filter them to eradicate redundancies. This aggregation strategy reduces the traffic loads and communication energy consumption of the network.
The low-energy adaptive clustering hierarchy (LEACH) protocol is one of the well-known classical routing schemes based on clustering. 8 LEACH protocol runs in two phases: setup phase and steady phase. During the setup phase, CHs are designated and made known to other nodes. In the steady phase, the nodes report their data to the heads in fixed slots. Although LEACH has some drawbacks such as poor CH node allocation, various improvements have been made on the LEACH protocol over the recent years.2,9 Typically, cluster-based sensor networks with static sink are faced with overheads of uneven power consumption and low success rates. This is because CH nodes around the sink are usually loaded with traffic, as they serve as relay nodes to transfer the data packets from the upper levels to the BS and thereby deplete their energy faster. Once a CH node dies at a fade zone surface, the other crucial performance metrics of the network such as throughput drop hastily. A possible solution to ensure success rate and longer lifetime of the nodes is by assigning the relay nodes discretely from various clusters. In this way, the CHs can have relay cooperation to transmit their accumulated packets to the destination and achieve longer lifetimes.
The scalable efficient clustering hierarchical (SEACH) routing is one among the classical cluster-based routing that is based on dedicated-relay cooperation. 10 Though the data deliveries model of SEACH protocol is time-driven, which makes it more appropriate for WSN applications with periodic communication requirements. In the literature, SEACH protocol is revealed to encourage more energy efficient and scalable communication well above the other conventional time-driven cluster-based routing protocol. In general, relay cooperation in cluster-based WSNs facilitates improved total channel capacities and guarantees data deliveries from the cluster sections even when there is a loss of connectivity or failure of a CH node. It also allows more utilization of WSN broadcast nature and efficient cooperative communication with network coding (NC). 11 Mainly, NC communications in WSN offer many benefits ranging from improved throughputs with reduced latency to enhanced network lifetime. It enables linear combinations of many packets from various nodes at a shared channel. 2 At the receiver nodes, the coded packet can be recovered upon computations of a few logical or mathematical formulae.
However, one of the significant challenges of cooperative relaying techniques in WSN is the selection of the appropriate relay nodes.12,13 In WSN, relay communication typically increases the traffic activates of the relay nodes. This causes the relay or cooperator nodes to deplete their energy quicker than the other nodes. 14 Once a relay node fails, a coverage hole appears in that area and other nodes may have no links to reach their destination. When the relay nodes are isolated or far apart, more energy is expended from long-distance communications. Moreover, NC operations in WSN necessitate good connectivity and extra energy resources at the coding links to successfully receive and process the packet flows from other shared mediums. 15 Therefore, when the coverage and residual energy of the cooperator nodes or coding nodes are poor, the coding gain drops adversely.
Motivated by these aforementioned problems, this article proposes a lifetime-enhancing cooperative data gathering and relaying algorithm (LCDGRA) that combines full advantages of clustering technology with multi-hop cooperative relay and NC communication. The proposed LCDGRA operates in three phases and mainly focuses on resolving the routing problem of the cluster-based WSN through enhancing the power consumption and network lifetime, packet latency, and data receiving rate metrics.
Our new contributions in this article are the following:
We propose a new cluster-based event-driven scheme called LCDGRA. The proposed scheme can be applied in various event-driven monitoring applications to strengthen the routing performances in terms of network lifetime, energy efficiency, data delivery rate, and latency.
We propose a centralized hybrid-clustering scheme that combines K-means clustering and Huffman entropy coding design. The hybrid-clustering scheme is meant to secure the coverage and residual energy conditions of the CH nodes as well as minimize the energy consumption of the nodes at the setup phase.
We frame the relay node assignment as an NP-hard problem in terms of residual energy, and placements of the relay nodes. Also, to resolve the NP-hard problem, we propose a gradient descent heuristic-based algorithm. This scheme can be applied in various cluster-based networks to enhance the relay node selection and achieve higher transmission success rates.
Subsequent sections of this article are as follows: section “Related works” reviews some of the significantly related works. The network communication and energy models are discussed in section “Proposed system model.” Section “Design flow of LCDGRA” adequately describes the design phases of the proposed system. Section “Simulation results” presents and discusses the comparative performance evaluation results, while section “Conclusion” gives the closing comments.
Related works
In recent years, significant focus has been given to routing protocols in WSN due to the differences in network architectures and routing stipulations in various sensing applications. 16 In WSN, routing is needed to establish reliable communication paths between nodes and their BSs, and it significantly influences the power consumption of the nodes. Recently, many energy-efficient cluster-based routing schemes have been applied in various WSNs. 17
The LEACH-centralized (LEACH-C) protocol is an improved version of LEACH. 9 This scheme is meant to improve the CH allocation in LEACH by using a centralized control algorithm to distribute the CH nodes throughout the system. In LEACH-C, the BS computes the average residual energy of the nodes across the network and only allows the nodes with residual battery powers equivalent to the computed average residual energy to work as CH in each round. The LEACH with fixed clusters (LEACH-F) is another centralized variant of LEACH. 18 In LEACH-F, the setup phase is not required at all rounds, only the CH position rotates among the nodes. The data fusion oriented clustered routing protocol based on LEACH (DF-LEACH) is another variant of LEACH protocol. 19 DF-LEACH aims to facilitate both global energy efficiency and reduced energy dissipation at the CHs. In this scheme, data fusion is performed in a hop-by-hop mode among the CHs, before transmission to the BS.
The threshold sensitive energy-efficient sensor network (TEEN) protocol was proposed for WSNs as an event-driven cluster-based protocol. 20 TEEN employs two limits: a soft and a hard threshold. The hard threshold is aimed to decrease total data transport. Therefore, this limit enables the nodes to communicate their sensed data only when their observations correspond to the attribute of interest events. The soft threshold is intended to decrease total data transport by enabling the nodes to disregard data deliveries when variations in their observations are minor, or there are no variations. The adaptive periodic TEEN (APTEEN) protocol is a hybrid alternative of TEEN that merges the benefits of reactive routing and proactive routing systems. 21 In APTEEN, the nodes can react to changes in their monitoring environment and as well perform periodic data deliveries.
Roy and Das 22 introduced the cluster-based event-driven routing protocol (CERP). In CERP, the nodes form clusters utilizing an event-based clustering system, whereby the clustering takes place only when there are event occurrences in the field. In the clustering phase, every node estimates a competing value concerning the transceiving and aggregation energy. Eventually, each node having the maximum competing value in the respective clusters is elected as CH and introduced to the members. The hybrid energy-efficient distributed clustering protocol (HEED) was proposed to support various time-driven WSNs by Younis and Filmy. 23 In HEED, initial probabilities of sensor nodes to serve as CH depend on their residual energy and proximities to other neighbors.
The K-means and Dijkstra’s algorithm-based routing (KDUCR) protocol is an energy-aware routing scheme based on K-means clustering algorithm. 24 The main aim of K-means clustering in WSN is to select random points in the sensing area and allocate the nodes to their nearest points to establish K-sections of cluster. The algorithm computes the centroid for each group repeatedly until the centers converge. This algorithm minimizes the intricacies associated with developing clusters at various levels in WSN. KDUCR protocol uses Dijkstra’s shortest path algorithm to determine the shortest routes with sufficient residual-energy to dispatch the aggregated data from each CH node in the sink.
The secured scalable energy-efficient clustering hierarchical (S-SEACH) protocol is an improved version of SEACH. 25 S-SEACH uses similar relay node assignment scheme as SEACH and mainly aims to achieve both improved energy efficiency and protection against intrusion. Recently, one time-driven cluster-based scheme with relay cooperation was Was proposed for WSNs by Wu et al. 26 It is intended to satisfy the basic needs of network connectivity in large-scale WSN farmland services. Ahmed and Samreen, 27 proposed the cluster chain-based relay nodes assignment protocol (CCBRNA). It merges the advantages of relay cooperative communication with clustering and chain-based routing mechanisms.
The mode of linear NC was first recommended for WSN by Ahlswede et al. 11 The authors showed that a node could combine many packets of other nodes by logical or mathematical formulae. The coding mode is of two types, namely the intra-section NC and the intersection NC. 28 The intra-section NC is intended to improve data delivery rates in lossy networks, while the intersection NC is intended to improve the throughputs of lossless sensor networks. Meanwhile, both coding methods are described generally as binary (XOR) and random linear coding (RLNC). In the literature, there are specific recent works carried out on NC in WSN. 29
Yin et al. 28 proposed a multi-hop routing scheme that combines both NC and compressive sensing (CS) methods. It is proposed to facilitate energy-saving and reliable data reporting multi-hop WSNs. Sun et al. 29 introduced the NC-WSN protocol to promote data delivery reliability, effective load balancing, and improved response time in various WSNs. Migabo et al. 30 proposed the cooperative and adaptive NC for gradient-based routing (CoAdNC-GBR) protocol to support efficient NC and data communication in WSNs. CoAdNC-GBR estimates the total amount of nodes in the neighborhoods periodically to implement NC processes and uses the sink nodes’ address to deliver the coded packets. However, CoAdNC-GBR is more suitable for query-based applications since it considers an on-demand gradient-based routing system, in which queries from the data operator actuates the data communication phases of the network.
Hence, considering the works above, a new LCDGRA for cluster-based WSNs is proposed.
Proposed system model
Network model
In this study, we consider the sensor network model depicted in Figure 1.

Proposed system network model.
This study aims to develop an improved event-based clustering routing scheme, characterized by low latency, low-energy consumption, extended longevity, and high data receiving rates. Therefore, considering the sensor network model above, a new event-driven cluster-based routing scheme called LCDGRA is presented in this work. The proposed system is designed acknowledging existing WSN event-driven cluster-based routing design, including the assumptions as follows:
The BS and sensor nodes are static after placements.
There is only one BS in the network that is at the far end of the network.
An external source powers the BS.
The sensor network is uniform, and the initial energy of all nodes is the same.
Energy consumption model
Figure 2 shows the considered radio model in this work.
2
Both free space
where
where
where
The energy for transceiving and processing
where

Energy dissipation model.
Design flow of LCDGRA
This section describes the design methods of the proposed LCDGRA, which functions in three phases:
Network initialization and clustering
Relay nodes selection
Data aggregation and reporting
Network initialization and clustering
In this phase, the network is initialized and the nodes are grouped into few clusters as described below.
Network initialization
Here the network is initialized. The central BS initiates this with initialization messages conveyed to all the nodes in the network. Then, all the nodes respond to this request with initialization responses. The response messages from the nodes contain their residual energy and current locations in the sensing region. Through the processes described above, the network is initialized.
Clustering based on hybrid K-means
In this point, the nodes are grouped into clusters. A hybrid K-means clustering system is proposed in this work to group the nodes into K number of clusters effectively. The hybrid approach merges the advantages of K-means clustering algorithm and Huffman coding algorithm. It is meant to optimize the sensor nodes’ transmission distances and energy usage during the clustering phase. Hence, the clustering and CH decisions are carried out by the central BS as described in the stages below.
Stage 1: cluster formation
In this stage, the nodes are divided into K units of clusters. Initially, the optimal total K-points of clusters is estimated as follows 31
where
Once the total amount of K-points is computed, the nodes are assigned to their closest cluster-centroids. The distance between the nodes and the cluster centroids is expressed as follows
where
Finally, new centroids are computed for every cluster until the centroid points become fixed.
Stage 2: appointment of CHs
In this stage, the CHs are elected. Initially, a competing value is calculated for each node in all the clusters as follows
where
Once the contending value of every node is estimated, the value for each node is multiplied with a random value in the range of 0–1, to determine their respective probabilities. The probabilities obtained for every node is later summed-up to one and organized in descending form. After that, a code is formed for each of the nodes by the Huffman coding algorithm to evaluate their weights. Finally, the node that has the smallest weight in the different clusters is chosen as the CH node and declared to the cluster members.
The reader can get more information about the Huffman coding algorithm in the work of Mehfuz et al. 32 In each round, different CHs are chosen in all clusters to facilitate load distribution until every node depletes within the sensing area.
Relay node selection
In this stage, the relay node selection takes place. Relay nodes are appointed in the various cluster regions to cooperatively help the CHs to deliver their aggregated data to the BS. Similar to the CH election, the election of relay nodes is carried out by the central BS. However, it is a familiar phenomenon in WSN that nodes utilize higher energy in data deliveries as opposed to receiving and processing. Hence, the central BS is bound to opt for relay nodes that have higher residual energy and shortest communication distances with adequate coverage. This study examines this condition as an NP-hard problem and proposes an effective gradient descent algorithm to solve this problem.
Gradient descent algorithm is an incremental first order iterative method intended to provide optimal weights that effectively reduce a cost function. The algorithm finds the local minimum weight by taking steps reciprocal to the negative gradient of the cost function, as it searches for the best weight. Typically, the weight update operation is given by
In the proposed system, the following are defined as the constrained functions for decent gradient optimization:
Constraint fa
The first constraint function
where
Constraint fb
The second constraint function
The average distance between the nodes and the central BS can be determined using
where
Hence, based on the above-constrained functions, as per
where
Algorithm 1 gives a sufficient description of the proposed relay node selection scheme. Once the algorithm is computed for the various nodes, the BS learns the optimal cost for electing the relay. Once it selects the relay nodes based on the above preferences, it notifies the selected nodes with a declaration message, which indicates their current functions as relay nodes. Afterwards, the relay nodes declare their information to the network with short notification messages, which has their residual energy, cluster identity (ID), relay node ID, and a header that reveals the message as a mere notification. Once each CH or relay node gets a join notification, it examines the message for membership suitability by analyzing the communication distance and signal strength of the sender. After the relay node or CH has confirmed the membership suitability, they notify the relay node of interest with a join message. The join message contains the ID of the preferred relay node and the sender ID that indicates the duty of the node as either relay node or a CH node; this is how the routing information that is needed for the cooperatively relaying operations is updated across the network. Once a relay node is about to ON or OFF its radio unit during the operation, it declares such transition to other relay and CH nodes having it as their relay members. This proposed relay node selection scheme can be applied in WSN to ensure transmission reliability and improvement in relay node allocation.
Data aggregation and reporting
After the CH and relay nodes are elected, the nodes transit into idle states awaiting events. Besides, an event represents a variation in the sensed value that exceeds specific sensing thresholds. Followed by the event occurrence in the sensing field, the stages of data aggregation and reporting are performed. It is well known in WSN, that the amount of data communications has a significant influence on energy consumption of the network. Therefore, it is essential to reduce the number of deliveries to guarantee considerably less energy consumption. In the proposed scheme, to ensure minimal energy consumption, random linear coding is executed per hop from the source to the destination.
Once an event occurs, the cluster members of that particular region, report their sensed data to the CH node. Upon receiving the sensory data of the members, the CH node organizes the variations in the readings above specific threshold into N blocks of data packets
where

Proposed LCDGRA scheme flowchart.
Simulation results
In this section, the performance of the proposed LCDGRA scheme is assessed using MATLAB 2018b simulations. The analysis was conducted with 100 nodes scattered randomly over an (
Simulation parameters.
In this work, an efficient hybrid K-means clustering scheme combining K-means algorithm with Huffman coding design has been applied to organize the nodes into K quantity of clusters. In addition, an efficient gradient descent heuristic-based algorithm has been applied to find the most appropriate relay nodes to serve the CHs report their aggregated data to the destination cooperatively. Figure 4 plots the convergence of the gradient decent cost function as per equation (11). It can be observed from Figure 4 that the cost function converges successively at about 31 iterations.

Cost function convergence.
Figure 5 compares the latency of the network when using each of the schemes. The average latency presents the time expended from when data are disseminated from the sender to the arrival time at the destination. It includes delays in data queuing, data propagation, and processing. It is visible from the simulation results presented in Figure 5 that the latency of the proposed LCDGRA routing system is 18% lesser than that of CERP. It is also noticeable that the latency of CERP is 16% lesser than TEEN, when examining the network latency.

Network latency.
Figure 6 shows the data-receiving rate of the network. The packet-receiving rate represents the ratio of the sent quantity of data to the amount of data well received at the destination. It reveals the packet loss rates and link qualities of the sensor network. From the simulation results depicted in Figure 6, it is evident that the proposed scheme keeps a stable, even, and high receiving rate in the range of 1–0.98, which is 0.82–0.72 higher than that of CERP protocol. For TEEN protocol, it is observed that the receiving rate drops entirely at a later stage of network operation, and the receiving rate is merely about 0.99–0.98, which is considerably less than that of LCDGRA and CERP.

Receiving rate.
Figure 7 analyzes the average energy consumption in the course of data aggregation. The average energy consumption in each round is estimated using equation (13), where

Energy consumption.
Figure 8 compares the lifetime of the network until half of the nodes deplete. The network lifetime is regarded as the period from the network initialization, to when half of the nodes deplete over the network. From Figure 8, it is clear that for the proposed LCDGRA, half of the nodes deplete around 3500, while in CERP and TEEN, half of the nodes depletes around 2600 and 2100, respectively.

Network lifetime.
The above simulation results confirm that the proposed LCDGRA truly merges the benefits of random linear coding and cooperative multi-hop communication with the best relay nodes in a cluster-based topology. Furthermore, the proposed LCDGRA outperforms both the protocols, TEEN and CERP, on the following rationales: in TEEN, the nodes wait for their assigned time slots to report their data to CH nodes. Thus, in scenarios where sensor nodes have no data to report, they are still constrained to waste their limited energy resources. Besides, the CH nodes also keep their radio units turned on always, to collect the sensed data of their cluster members. Therefore, during network operations, the CHs spend their energy quicker, which cause them to fail faster and thereby reducing the data receiving rates. In CERP, some of the nodes are isolated such that they have no links to other nodes due to variations in the clusters created at every round initiation. Therefore, the nodes fail faster, which creates more packet loss.
These are in contrast to the proposed LCDGRA, where the nodes are clustered adopting an efficient hybrid K-means clustering method that considers the communication distance and energy. Moreover, dedicated cooperating relay nodes that have good placements and energy resources consistently assist the CH nodes to communicate their aggregated data. In addition to that, the payload packets are coded at each hop from the source to the destination with random linear NC. These results prove that the proposed scheme does not only decrease sensor network latency and energy consumption but improves the throughput and longevity as well. Hence, it is attestable that the proposed scheme guarantees to meet the essentials for more energy efficiency, coherent and timely event reporting in WSNs.
Conclusion
In this article, a new event-driven cluster-based routing algorithm called LCDGRA was presented. The proposed LCDGRA routing method is simple and consists of three main phases. In Phase 1, the nodes are grouped into K clusters and the CHs are allocated in every cluster by a hybrid K-means clustering that combines both K-means clustering and Huffman coding algorithms. In Phase 2, rather than dedicating the relay nodes task to the CH nodes, relay nodes are assigned from the non-CH nodes to perform the data delivery tasks. Therefore, the CHs have sets of cooperating relay nodes that are dedicated to obtain and communicate their aggregated data to the destination. The relay node election is expressed as an NP-hard problem in studies of residual energies and communication distances of the nodes. Also, an efficient gradient descent heuristic-based scheme is proposed to solve the NP-hard challenge. In the final phase, the aggregated packets from the event region are coded with random linear coding and relayed in multi-hops to the central BS cooperatively.
The performance of the proposed LCDGRA has been assessed with other classical event-driven cluster-based protocol, including TEEN and CERP, through simulation. The simulation results confirm that the proposed LCDGRA significantly outperforms both TEEN and CERP routing protocols in terms of reduced energy consumption with prolonged lifetime and increased data delivery rates with reduced latency.
