1. Introduction
As an emerging store-and-forward networking architecture, delay tolerant networks (DTNs) have been widely studied and deployed. In recent years, DTNs have achieved great success in challenging networks operating in extreme environments, such as the interplanetary Internet, habitat monitoring networks, underwater sensor networks [1, 2], vehicular ad hoc networks [3], pocket switched networks [4, 5], and mobile social networks [6, 7]. Unlike the traditional Internet, however, DTNs are characterized by frequent topology partitions [8], sparse node density, limited network resources (e.g., storage and bandwidth), extremely high end-to-end latency, asymmetric data rates, high bit error rates, and heterogeneous interconnection. As a result, a complete end-to-end path between sender and receiver may never exist, and successful message transmission in DTNs faces great challenges.
To cope with intermittent connectivity, the DTN architecture [9, 10] introduces a bundle layer between the application layer and the transport layer to implement a store-carry-and-forward routing strategy. The bundle layer also shields the heterogeneity of the underlying networks, allowing communication across multiple regions with different network architectures and protocols. Based on the bundle layer, DTNs relay messages hop by hop until they reach their destination nodes. However, because network resources and bandwidth are extremely limited, the current node must make careful next-hop routing selections to control the number of message copies.
So far, a large number of probabilistic routing strategies have been proposed to optimize next-hop selection in the absence of global topology knowledge. Most of them (e.g., Prophet [11]) attempt to predict the encounter probabilities between nodes and then make routing decisions based on the computed probability values. These strategies can certainly improve the message delivery ratio in opportunistic routing. However, the time-to-live (TTL) of a message is gradually depleted as time progresses, and most traditional probabilistic routing protocols replicate a message to the node with a higher delivery probability without taking the message's TTL into consideration. Even if a selected intermediate node has a higher probability of encountering the destination, the delivery will still fail if the message's TTL is exhausted before they meet; given the limited buffer resources, the message copy should not be delivered to that node in the first place. A higher encounter probability only indicates that two nodes are more likely to meet; they may still need a long time to actually encounter each other, and if the message's TTL runs out during that time the delivery fails. From this point of view, such a node is not a good choice despite its higher encounter probability. Consequently, DTNs call for an efficient probabilistic routing algorithm that fully accounts for the message's TTL.
In this paper, from a new perspective, we propose a statistical analysis based probabilistic routing (SAPR) algorithm, which predicts the probability that a message can be successfully delivered to its destination within its remaining time-to-live. First, for each pair of nodes, we use statistical methods to compute the mathematical expectation of the intermeeting times (IMTs) between them. Second, fully taking into account the message's remaining time-to-live, we predict the probability that the message can be successfully delivered. Finally, we make routing selections according to the computed probabilities; that is, we replicate a message to the intermediate node that gives it a higher delivery probability. In addition, we introduce a buffer management policy into the proposed routing algorithm: when a node's buffer overflows, message management is modeled as a 0-1 knapsack problem. By solving this knapsack problem, each node always keeps the set of messages that maximizes the sum of delivery probabilities.
The rest of this paper is organized as follows. In Section 2, we make the routing assumptions and explain the mathematical notations used in this paper. Section 3 gives the detailed descriptions of the proposed algorithm. The performance evaluations and comparisons are presented in Section 4. Section 5 discusses some related works. Finally in Section 6, we summarize this paper.
2. Assumptions and Preliminary
In order to analyze and implement the proposed probabilistic routing algorithm, we make the following assumptions.
(i) The intermeeting time (IMT) between each pair of nodes is exponentially distributed or at least has an exponential tail.
(ii) Nodes move independently and their mobility is heterogeneous; that is, different node pairs have different exponential distribution parameters.
(iii) The network resources (e.g., storage, bandwidth, and energy) are limited.
Regarding the first assumption, it has been shown that many simple synthetic mobility models (e.g., Random Walk, Random Waypoint, and Random Direction [12, 13]) have this property. Furthermore, it is a known result in the theory of random walks on graphs that hitting times on subsets of vertices usually have an exponential tail [14], and [15] derives that the expected intermeeting time in the Random Walk model also follows an exponential distribution. The Exponential Correlated Random Mobility model can likewise be used to support the first assumption. So the assumption that most nodes exhibit random mobility is reasonable in opportunistic networks. Some recent studies have suggested that, in some human mobility traces, the intermeeting time instead follows a power law distribution. However, using a diverse set of measured human mobility traces, Karagiannis et al. [16] have argued that the intermeeting time still exhibits an exponential tail. They find as an invariant property that there is a characteristic time, on the order of half a day, beyond which the intermeeting time follows an exponential distribution; within the characteristic time, it follows a power law distribution. That is to say, in many human traces, although the intermeeting time follows a power law distribution over a limited period, it still exhibits an exponential tail. Taking the Cambridge Haggle Content dataset as an example, the trace was collected over about two months (far longer than half a day). According to the above conclusion of Karagiannis et al., the time period in which the intermeeting time follows an exponential distribution is then much longer than the period in which it follows a power law, so the entire cumulative distribution can be approximated by an exponential distribution.
In addition, in the MIT trace collected with Bluetooth devices, up to 60% of the observed intermeeting times are above one day (greater than half a day), and such large intermeeting times can also be found in the traces collected by UCSD and Dartmouth. So, in some sense, the distributions of intermeeting times in these traces can also be regarded as exponential. Our routing algorithm is not specifically designed for human mobility models but aims at some degree of generality; taking the above factors into account, we adopt the exponential distribution assumption, and the results of our simulations also support its reasonableness. Regarding the second assumption, it is clear that nodes follow different moving trajectories and that different node pairs usually have different encounter rates in the real world: some nodes may encounter each other frequently, while others may never meet.
The mathematical notations used in this paper are listed and explained in Notations section.
3. Statistical Analysis Based Probabilistic Routing (SAPR)
Before presenting our SAPR algorithm, we first introduce some analysis works and routing models based on the above assumptions.
3.1. Estimating Exponential Distribution Parameter
Here we assume that the intermeeting time T between a given pair of nodes follows an exponential distribution with parameter λ, that is, with probability density function f(t) = λe^(-λt) for t ≥ 0. The parameter λ can be estimated from the observed encounter history of the node pair; by maximum likelihood, the estimate is the reciprocal of the sample mean of the observed intermeeting times.
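Under the exponential assumption, the distribution parameter can be estimated from the observed encounter history by maximum likelihood, where the estimate is simply the reciprocal of the sample mean; a minimal sketch (the function name and sample values are illustrative, not from the paper):

```python
def estimate_lambda(imt_samples):
    """MLE of the exponential rate: lambda_hat = 1 / sample mean.

    imt_samples: observed intermeeting times (in seconds) for one node pair.
    """
    if not imt_samples:
        raise ValueError("need at least one observed intermeeting time")
    mean_imt = sum(imt_samples) / len(imt_samples)
    return 1.0 / mean_imt

# Example: encounters observed roughly every hour on average.
samples = [3000, 3600, 4200, 3900, 3300]
lam = estimate_lambda(samples)  # rate per second, here 1/3600
```

The larger the number of observed intermeeting times, the more reliable this estimate becomes, which is why the statistical sampling described below matters.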
3.2. Computing IMT's Mathematical Expectation
In order to compute the value of the expectation of the intermeeting times, note that for an exponential distribution with parameter λ the expectation is 1/λ, so it can be estimated directly by the sample mean of the observed intermeeting times.
Note that we use random sampling in this paper. For a node pair, it is easy to compute the interval between two encounters; by repeating this operation in a random way, we can finally obtain the sample data. The value of the expectation is then estimated as the mean of these samples.
In order to compute the expectation more accurately, the estimate should be updated as new intermeeting times are observed, so that it always reflects the most recent encounter history.
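One way to keep the expectation estimate up to date without storing the full sample history is an incremental running mean; the following bookkeeping is our assumption about how such an update can be organized, not the paper's exact scheme:

```python
class ImtEstimator:
    """Running estimate of the expected intermeeting time for one node pair."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def add_sample(self, imt):
        """Fold a newly observed intermeeting time into the running mean."""
        self.count += 1
        self.mean += (imt - self.mean) / self.count

est = ImtEstimator()
for t in (3000, 3600, 4200):
    est.add_sample(t)
# est.mean is now 3600.0, the mean of the three samples
```

Each node would maintain one such estimator per encountered peer and fold in a new sample at every contact.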
Theorem 1.
For the intermeeting times between any two nodes, assuming their expectation is E, the probability that an intermeeting time does not exceed kE is 1 - e^(-k) for any k > 0.
Proof.
Firstly, we assume the probability density function of the intermeeting time T is f(t) = λe^(-λt) with λ = 1/E. Then P(T ≤ kE) = ∫ from 0 to kE of λe^(-λt) dt = 1 - e^(-λkE) = 1 - e^(-k), which completes the proof.
To some extent, Theorem 1 shows the central tendency of the intermeeting times; that is, it gives the interval in which most intermeeting times are clustered.
Corollary 2.
For the intermeeting times between any two nodes, at least 95% of them are not greater than three times their expectation.
Proof.
If we set k = 3 in Theorem 1, we get P(T ≤ 3E) = 1 - e^(-3) ≈ 0.95, so at least 95% of the intermeeting times are not greater than 3E.
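Under the exponential assumption, the fraction of intermeeting times not exceeding k times the expectation has the closed form 1 - e^(-k), independent of the rate parameter; a quick numeric check:

```python
import math

def fraction_within(k):
    """P(T <= k * E[T]) for an exponential T, independent of the rate."""
    return 1.0 - math.exp(-k)

# About 63% of intermeeting times fall below the expectation itself,
# and about 95% fall below three times the expectation.
print(round(fraction_within(1), 3))  # 0.632
print(round(fraction_within(3), 3))  # 0.95
```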
If two nodes have not encountered each other for a long period of time, they should update the expectation estimate so that outdated encounter information is aged out.
3.3. Predicting Message's Delivery Probability
After computing and updating the expectation of the intermeeting times for each node pair, we can predict the probability that a message will be successfully delivered within its remaining time-to-live.
Theorem 3.
Assuming the remaining time-to-live of a message is R, the time elapsed since the carrying node last met the destination is t, and the expected intermeeting time between them is E, then the probability that the node can deliver the message before its TTL expires is 1 - e^(-(t+R)/E).
Proof.
The probability that a message can be successfully delivered by a node is equal to the probability that the next intermeeting time between the node and the destination is not greater than the sum of the message's remaining TTL and the time that has elapsed; that is, P = P(T ≤ t + R) = ∫ from 0 to t+R of λe^(-λτ) dτ = 1 - e^(-λ(t+R)), where λ = 1/E.
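Under the exponential assumption, the delivery probability described above is the exponential CDF evaluated at the elapsed time plus the remaining TTL; a minimal sketch (function and variable names are ours, not the paper's):

```python
import math

def delivery_probability(expected_imt, elapsed, remaining_ttl):
    """P(next IMT <= elapsed + remaining TTL) for an exponential IMT
    with mean expected_imt (all times in the same unit)."""
    lam = 1.0 / expected_imt
    return 1.0 - math.exp(-lam * (elapsed + remaining_ttl))

# A relay that meets the destination every hour on average, last seen
# 10 minutes ago, carrying a message with 50 minutes of TTL left:
p = delivery_probability(3600.0, 600.0, 3000.0)  # 1 - e^(-1), about 0.632
```

Note how a message with little remaining TTL gets a low probability even at a relay with a short expected intermeeting time, which is exactly the effect SAPR is designed to capture.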
3.4. Next Hop Selection Strategy
Now we focus on the next hop selection strategy. When a communication opportunity arises, a message should be delivered to the relay node that gives it a higher delivery probability. The detailed process is shown in Algorithm 1.
Algorithm 1: Next hop selection strategy.
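The selection rule in Algorithm 1, replicating a message only to a peer that yields a strictly higher delivery probability, can be sketched as follows; the data layout and helper names are our assumptions, not the paper's code:

```python
def messages_to_forward(buffer, my_prob, peer_prob):
    """Return the messages worth replicating to an encountered peer.

    buffer:    list of (message_id, remaining_ttl) tuples on this node.
    my_prob:   my_prob(msg_id, ttl) -> this node's delivery probability.
    peer_prob: same signature, for the encountered peer.
    """
    forward = []
    for msg_id, ttl in buffer:
        # Replicate only when the peer offers a strictly better chance.
        if peer_prob(msg_id, ttl) > my_prob(msg_id, ttl):
            forward.append(msg_id)
    return forward

# Example with toy probability functions:
mine = lambda msg_id, ttl: 0.3
peer = lambda msg_id, ttl: 0.5 if msg_id == "a" else 0.1
chosen = messages_to_forward([("a", 100), ("b", 100)], mine, peer)  # ["a"]
```

In the actual protocol both nodes would first exchange and update their intermeeting-time expectations before evaluating these probabilities.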
3.5. Buffer Management Policy
In DTNs, node buffer resources are usually limited, so when a node's buffer overflows a resource allocation problem arises: the node must decide whether to receive an incoming message and which message to drop. To this end, we first define the optimization objective and then make decisions based on it. In this paper, our objective is to use the limited buffer to maximize the sum of the delivery probabilities of the messages stored by the current node. We formalize optimal buffer management as a 0-1 knapsack problem and solve it using the back track technique.
Theorem 4.
Optimal buffer management is a 0-1 knapsack problem.
Proof.
If we view a node's buffer size as the capacity of a knapsack, the messages' delivery probabilities as the values of goods, and the messages' sizes as the weights of goods, then selecting and storing the messages that maximize the sum of delivery probabilities is exactly filling the knapsack so as to maximize the total value. Consequently, buffer management can be modeled as a 0-1 knapsack problem, and standard techniques for the knapsack problem can be used to solve the optimal buffer management problem.
Definition 5.
The optimal buffer management is formalized as follows: maximize the sum over i of p_i * x_i subject to the sum over i of s_i * x_i ≤ B and x_i ∈ {0, 1}, where p_i is the delivery probability of message i, s_i is its size, B is the buffer size of the current node, and x_i indicates whether message i is kept in the buffer.
The above formalization ensures that each node always keeps the messages that maximize the sum of delivery probabilities. The remaining issue is how to solve this optimization problem.
The common way to solve a knapsack problem is dynamic programming. However, considering the huge cost of dynamic programming for this problem, we use the back track technique instead. The detailed process is shown in Algorithm 2, which also calls Algorithms 3 and 4. Algorithm 4 computes the upper bound of the optimal value of the right subtree in the search process; it is called by Algorithm 3 to determine whether to continue searching the right subtree, that is, we cut off a subtree if its upper bound is less than the current best value. Algorithm 3 is the back track algorithm, which searches the entire solution space tree and records the current optimal solution. In Algorithm 2, line 1 first sorts messages in descending order of the unit value computed by (25). Lines 2–9 initialize the global variables that will be used in Algorithms 3 and 4; note that these global variables can also be modified by those algorithms. Finally, lines 11–13 delete or reject the messages that are not included in the optimal solution computed by Algorithm 3. In order to determine which message to drop and which message to receive, the incoming message is treated together with the buffered messages as candidates when solving the knapsack problem.
Algorithm 2: Optimal buffer management.
Algorithm 3: Back track search over the solution space tree (parameter: starting index).
Algorithm 4: Computing the upper bound of the optimal value.
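As a concrete illustration of Algorithms 2–4, the following is a compact back-track solver for the knapsack formulation of Definition 5, with a fractional-relaxation upper bound used for pruning. It is a hedged sketch written under the assumption that the unit value in (25) is delivery probability per unit message size; it is not the authors' code:

```python
def knapsack_backtrack(probs, sizes, capacity):
    """Keep the subset of messages maximizing the delivery-probability sum
    subject to the buffer capacity.  Returns (best_value, kept_indices)."""
    # Sort by unit value (probability per unit size), as in Algorithm 2.
    order = sorted(range(len(probs)), key=lambda i: probs[i] / sizes[i], reverse=True)
    p = [probs[i] for i in order]
    s = [sizes[i] for i in order]
    n = len(p)
    best = {"value": 0.0, "keep": []}

    def bound(i, value, room):
        # Fractional relaxation of the remaining items (Algorithm 4's role).
        for j in range(i, n):
            if s[j] <= room:
                room -= s[j]
                value += p[j]
            else:
                return value + p[j] * room / s[j]
        return value

    def search(i, value, room, chosen):
        if i == n:
            if value > best["value"]:
                best["value"], best["keep"] = value, chosen[:]
            return
        if s[i] <= room:                               # left subtree: keep item i
            chosen.append(order[i])
            search(i + 1, value + p[i], room - s[i], chosen)
            chosen.pop()
        if bound(i + 1, value, room) > best["value"]:  # prune the right subtree
            search(i + 1, value, room, chosen)

    search(0, 0.0, capacity, [])
    return best["value"], sorted(best["keep"])

# Three messages, buffer of 4 units: keeping messages 1 and 2
# (total size 4) maximizes the probability sum at 0.9.
best_value, kept = knapsack_backtrack([0.6, 0.5, 0.4], [3, 2, 2], 4)
```

The pruning step is what makes back tracking cheaper in practice than a full dynamic programming table over the buffer size.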
4. Simulation
In this section, we use the ONE [17] simulator to conduct extensive simulations for evaluating the performance of SAPR under various settings. The simulation settings, evaluation metrics, and results are described as follows.
4.1. Simulation Settings
To evaluate the routing performance of SAPR under our assumptions, we first conduct simulations based on synthetic traces generated by the Random Walk model. Random Walk is a typical movement model in which the intermeeting times between nodes follow exponential distributions, so it is well suited to evaluating our algorithm. The detailed simulation settings are shown in Table 1. We include Epidemic, Prophet, and Source Spray and Wait in the simulations and comparisons. Epidemic is a typical flooding-based multicopy routing protocol, used to verify the performance improvements of SAPR. Prophet is a typical probabilistic routing protocol based on the encounter probabilities between nodes, which differs from SAPR and is used to evaluate SAPR from the probabilistic routing perspective. Source Spray and Wait is a typical opportunistic routing protocol that strictly limits the number of message copies, used to evaluate the overhead ratio of SAPR. In addition, we implement the drop-front buffer management policy in these three algorithms in order to evaluate our proposed buffer management policy.
Simulation settings in Random Walk.
Taking into consideration the shortcomings of Random Walk, we also conduct simulations based on synthetic traces generated by the Helsinki City model. The Helsinki City model is a more realistic mobility scenario based on real map data, so it is very helpful for evaluating routing protocols. Moreover, its parameters can be modified as needed to reproduce various empirical mobility properties, which benefits the performance evaluations; this is also why we use the Helsinki City model instead of real traces.
We use 126 nodes in the Helsinki City scenario; the map area and the other detailed simulation settings are shown in Table 2.
Simulation settings in Helsinki City.
4.2. Evaluation Metrics
In this paper, the simulations are grouped into three categories: varying buffer size, varying message time-to-live, and varying message generation interval. Under the same guideline, we evaluate all routing algorithms based on the following metrics.
(i) Delivery ratio: measures the delivery capability of each algorithm.
(ii) Overhead ratio: reflects the efficiency of message transmission; a low overhead ratio is desirable.
(iii) Average latency: a lower average latency means better routing performance.
(iv) Average hop count: reducing transmission cost, such as bandwidth and energy, is another routing goal.
(v) Dropped messages: fewer dropped messages are desirable, so as to improve the utilization efficiency of storage.
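For concreteness, the first two metrics are commonly computed from simple event counters; the formulas below follow the usual convention in the ONE simulator's message statistics reports (our reading, stated here as an assumption rather than a quotation):

```python
def delivery_ratio(delivered, created):
    """Fraction of created messages that reached their destinations."""
    return delivered / created

def overhead_ratio(relayed, delivered):
    """Extra transmissions per delivered message:
    (relayed - delivered) / delivered."""
    return (relayed - delivered) / delivered

# e.g. 80 of 100 messages delivered after 400 relay transmissions:
# delivery ratio 0.8, overhead ratio 4.0
```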
4.3. Simulation Results
4.3.1. Performance Evaluations in Random Walk Model
Figure 1 shows the different routing performance by varying buffer size from 5 MB to 50 MB. Figure 2 shows the different simulation results by varying message's TTL from 2 hours to 5 hours. Figure 3 shows the performance comparisons by varying message's generation interval from 10 s to 50 s.

Delivery ratio, overhead ratio, average latency, and average hop count versus buffer size when setting TTL and message interval to 3 hours and 40 seconds in Random Walk.

Delivery ratio, overhead ratio, average latency, and average hop count versus TTL when setting buffer size and message interval to 50 MB and 40 seconds in Random Walk.

Delivery ratio, overhead ratio, average latency, and average hop count versus message interval when setting buffer size and TTL to 50 MB and 3 hours in Random Walk.
Regarding Figure 1, we can see that SAPR achieves the highest delivery ratio, the lowest overhead ratio, and the lowest average hop count compared to Epidemic and Prophet. This demonstrates the accuracy and efficiency of SAPR's routing selections and verifies its improvements in predicting message delivery probability. By limiting the number of message copies, Source Spray and Wait gets a slightly higher delivery ratio than SAPR when the buffer is insufficient (i.e., less than 10 MB), but SAPR achieves the highest delivery ratio once the buffer size exceeds 10 MB. For the same reason, Source Spray and Wait also gets the lowest overhead ratio and the lowest hop count, but SAPR's overhead and average hop count are very close to those of Source Spray and Wait.
From Figure 2, we find that SAPR still outperforms Epidemic and Prophet in terms of overhead ratio and average hop count, and SAPR still gets the highest delivery ratio when the message TTL is greater than 3 hours. This shows that SAPR is well suited to scenarios with longer message TTLs.
Figure 3 shows simulation results similar to Figure 1. Compared to Epidemic and Prophet, SAPR has advantages in delivery ratio, overhead ratio, and average hop count. Moreover, the overhead ratio and average hop count of SAPR are close to those of Source Spray and Wait, while SAPR gets a higher delivery ratio.
Finally, from Figures 1–3, we can see that SAPR achieves a very low overhead ratio, greatly controls the average hop count, and achieves a satisfying delivery ratio. However, SAPR shows no advantage in delivery latency in this scenario.
4.3.2. Performance Evaluations in Helsinki City Model
Figure 4 shows the different routing performance by varying buffer size from 4 MB to 20 MB. Figure 5 shows the different simulation results by varying message's TTL from 2 hours to 5 hours. Figure 6 shows the performance comparisons by varying message's generation interval from 30 s to 90 s.

Delivery ratio, overhead ratio, average latency, and average hop count versus buffer size when setting TTL and message interval to 3 hours and 40 seconds in Helsinki model.

Delivery ratio, overhead ratio, average latency, and average hop count versus TTL when setting buffer size and message interval to 20 MB and 40 seconds in Helsinki model.

Delivery ratio, overhead ratio, average latency, and average hop count versus message interval when setting buffer size and TTL to 20 MB and 3 hours in Helsinki model.
Regarding Figure 4, we can see that SAPR achieves the highest delivery ratio, the lowest overhead ratio, the shortest average latency, and the lowest average hop count, which once again shows the accuracy and efficiency of SAPR's selections.
In Figure 5, SAPR outperforms the other three routing protocols in terms of delivery latency and average hop count. When the message TTL is greater than 3 hours, SAPR achieves the highest delivery ratio. Moreover, SAPR retains its advantage in overhead ratio compared to Epidemic and Prophet.
Figure 6 shows that SAPR still achieves some advantages in message delivery ratio, network overhead ratio, delivery latency, and average hop count compared to the other three algorithms.
Finally, from Figures 4–6, we can conclude that SAPR enhances routing performance in terms of delivery ratio, overhead ratio, average delivery latency, and average hop count compared to Epidemic, Prophet, and Source Spray and Wait.
4.3.3. Performance Evaluations of Dropped Messages
Figures 7(a)–7(c) compare the numbers of dropped messages in the Random Walk model as buffer size, TTL, and message interval vary. Figures 7(d)–7(f) show the corresponding results in the Helsinki City model.

Dropped message versus buffer size, TTL, and message interval in Random Walk model and Helsinki model.
From Figure 7 we can see that Epidemic drops the largest number of messages. This is because Epidemic uses a flooding strategy that distributes a message to every encountered node, spreading a large number of copies through the whole network. When storage is insufficient, these copies are frequently dropped, which makes it hard to spread messages to farther regions and is not conducive to a good distribution of messages. By contrast, the other algorithms all control message redundancy through different schemes and thus drop fewer messages. Epidemic is therefore more sensitive to the amount of network resources than the other algorithms, and its performance relies heavily on abundant buffer resources. This explains why Epidemic does not perform better than the other algorithms.
Prophet drops fewer messages than Epidemic and thus achieves better performance. Compared to Epidemic and Prophet, SAPR drops fewer messages still, and its number of dropped messages is only slightly higher than that of Source Spray and Wait. This indicates that SAPR greatly controls message redundancy and thereby improves the utilization efficiency of storage resources, which helps explain its better routing performance.
5. Related Works
Prophet is a typical probabilistic routing protocol that makes routing selections based on the encounter probabilities between nodes: the current node delivers a message to an intermediate node if that node is more likely to meet the final destination. In addition to using the transitivity property to update encounter probabilities, Prophet also applies an aging function that discounts outdated information as time progresses.
For message dropping problem, many traditional policies (e.g., drop-tail, drop-front, random drop, etc.) have been proposed, which can play a role in opportunistic routing. In [18], Zhang et al. analyze the buffer constrained Epidemic routing and make the conclusion that drop-front can outperform drop-tail in DTN context. In [19], a node first deletes the message that has the largest number of copies in order to mitigate the impact on routing performance. Based on a specific community detection algorithm, [20] proposes an efficient buffer management policy for social delay tolerant networks, which utilizes social relation and centrality to avoid dropping meaningful messages.
In [21], Li et al. propose an optimal routing strategy by exploiting the heterogeneous features of nodes to enhance the routing performance. It takes into consideration nodes' heterogeneous contact rates and delivery costs when selecting intermediate nodes to minimize the delivery cost. For mobile sensor networks, [22] provides a reliable routing scheme with an enhanced delaying technique, which estimates connectivity based on the ratio of past and present connections. When the connectivity is unreliable, nodes will delay message transmission.
With a home-aware model, CAOR [23] turns mobile social networks into a network that only includes community homes. Then, in the network of community homes, it computes the minimum expected delivery delay by a reverse Dijkstra algorithm. In [24], by introducing a metric to accurately detect the quality of friendship, each node defines its friendship community as the set of nodes having close friendship with itself either directly or indirectly. Then temporally differentiated friendships are used to make the forwarding decisions of messages.
6. Conclusion
In this paper, we improve probabilistic routing performance by taking into account the message's remaining TTL, so as to avoid the shortcomings of routing messages directly based on the encounter probabilities between nodes. Our motivation is that a higher encounter probability only indicates that two nodes meet each other frequently; they may still need a period of time to encounter each other again, and the message transmission will fail if the message's TTL is exhausted during this period. An effective scheme that fully accounts for the remaining TTL when computing the delivery probability can therefore achieve better probabilistic routing performance. To this end, using statistical analysis methods, we propose an efficient scheme to compute and update the expectation of the intermeeting times between nodes, and then, based on the exponential distribution, we predict the probability that a message can be successfully delivered before its TTL is exhausted.
In addition, we improve the buffer management policy by modeling the message dropping problem as a 0-1 knapsack problem; by solving it with the back track technique, each node always keeps the messages that maximize the sum of delivery probabilities. Extensive simulations conducted on the Random Walk model and the Helsinki City model show that the proposed SAPR greatly enhances routing performance in the DTN context.
