Abstract
Introduction
Predicting human mobility has always been one of the hot topics, which includes human migration pattern, understanding human behaviors, evolution of epidemics and predicting the spread of disease to the emerging location–based social network (LBSN), and elevating the quality of service (QoS) of mobile application. 1 With the uprising development of mobile devices and applications, people leave their digital “footsteps” as well as they use mobile devices, such as Global Positioning System (GPS),2–4 device’s linking information of access point (AP) like cell tower5–7 and WIFI,8,9 and the check-in data of LBSN.10–12 These abundant mobility data facilitate our ways of understanding human mobility and the study of location prediction. Recent research shows that people’s mobility trajectories possess a strong regularity most of their time, people tend to being at a few important places (i.e. home, office) by fixed route. In the meanwhile, human mobility possesses a certain degree of randomness, people have occasionally gone to places where usually they would not have been (i.e. cinema or some restaurants). 13 We manage to study the overall location prediction based on the two above characters of human mobility. Location prediction, in general, is using users’ historical digital footprints to build their mobility models by statistical methods and predicting users’ whereabouts by utilizing these models and related data. 14 The strategy and QoS of the various applications that mentioned above have all depended on the accuracy of user’s location prediction. Therefore, predicting user’s location accurately is of great significance to various fields.
Location prediction has been extensively studied in the literature. To predict user’s location more accurately, different researchers work on it from different angles and come up with a variety of location prediction algorithms that aim at different kinds of digital footprints. Some researchers solve the problem from the perspective of statistics, summarize the times of user being at each locations, and then calculate the transition probability between locations.2,3,11 However, this approach relies on user’s past movements entirely, it will dis-function when user appears at new locations or user’s mobility pattern shifts. To resolve the situation when user deviates from original mobility patterns, there are also people taking advantage of users’ relationships. This type of methods usually uses a specific connection between users, which impacts user’s mobility, to analyze and model movement patterns and then uses this new obtained model to modify the prediction result which is given by the personal historical data-based prediction model. Although these kinds of methods can, to a certain extent, predict user’s location when they deviate from original movement patterns. However, these methods usually depend on some specific data (i.e. call record) which make them hard to transfer to other platforms. Moreover, these methods normally find a connection between users first and then analyzing how it affects user’s trajectory. This means pre-narrowing the connection between users and neglecting the mobile similarity among strangers.
Inspired by social contagion theory, we proposed a novel location prediction algorithm by discovering groups of users who have high trajectory similarity and using their mobility models to modify the prediction results, called trajectory similarity–based location prediction (TLP). Social contagion theory points out that personal behavior under the influence of crowd tends to adapting oneself to group behavior. 15 This way, our method extracts users’ connection directly from the mobility dataset, cast off the limitation of pre-choosing the connection between users. We first employ the second-order Markov Chain (2-MMC) to build user’s personal mobility model, which is suitable for the movement regularity. The next step is followed using the concept of covering algorithm to compress the large-scale trajectory data, and thus to get the consistent subset from trajectory set. Subsequently, we proposed a trajectory-based user similarity computing method, and selecting a group of users of high similarity with the tested user from the subset, and naming it personal group of similar trajectories. Finally, apply mobility models from the group to modify the location prediction result.
The rest of this article is organized as follows: in section “Related works,” the related work of the current approach of location prediction will be introduced briefly. In section “The proposed problem and its formulation,” first of all, we will point out the proposed research problem and then formalize the problem in detail. In section “A social contagion theory–based approach for location prediction,” the system architecture of TLP is designed first in mobile environment. Moreover, the proposed solution is put forward for addressing the formalized research problem. At the end of this section, the design and implementation process of the proposed algorithm are introduced in detail. In section “Experiments,” we will present experiments and results. In section “Conclusion and future work,” we draw conclusions and discuss future work.
Related works
In recent years, lots of user location prediction strategies use the digital footprints produced by user’s mobile devices—AP node (Cell Tower,5–7,16 WiFi8,9), GPS,2–4,17 and check-in data from location-based service (LBS) of social network.10,11 Aiming to solve the problem of predicting user’s continuous locations, we focus on employing the more continuous and intact digital footprints on location prediction, due to the sparsity and lack of integrity in the nature of check-in data. 10 Based on whether others’ historical data are used by the predicting algorithm, we classify current location prediction algorithms into two categories: independent model-based algorithm and combining others’ trajectory-based algorithm.
Independent model-based algorithm
As mentioned above, from the long-term perspective, human mobility possesses the characteristics of strong regularity and periodicity.10,13 Independent model-based algorithm leverages these characteristics of human mobility by analyzing the pattern in user’s personal historical trajectory data. In the literature, Ashbrook and Starner 2 proposed an algorithm which first extracts most frequently visited place of interest (POI) from user’s past GPS data by k-means clustering algorithm. Then calculate the transmitting rate from one POI to another resulting in a mobility Markov model to predict user’s location. However, this algorithm needs to set the number of cluster centers in advance and the number affects the predicting accuracy. Gambs et al. 3 extract POI by density and join (DJ)-clustering algorithm and model user’s mobility by multi-order Markov Chain (n-MMC), and predict user’s locations.
Because of the number of the cluster centers varies with different tightness of GPS data, the extracted POIs are more suitable to present the pattern of mobile trajectory. But n-MMC algorithm cannot offer a correct answer when user moves toward an unvisited location because the algorithm is completely based on user’s historical data. Ying et al.
17
came up with a prediction method based on the semantic information of location, which transfers a fixed physical location to the semantic information of the location (i.e. “home,”“work”) and store the semantic trajectory. When predicting location, the algorithm returns the best matched location by searching the tested user’s context in semantic trajectory. Yuan et al.
11
proposed a location prediction algorithm,
Location prediction algorithm includes others’ trajectories
Since human mobility possesses the characteristics of randomness and exploratory, researcher recently focusing on how to take advantage of others’ mobility data to elevate the accuracy of location prediction. Facing the problems in location prediction and recommendation, user’s social connection has been taken into consideration.5,18–20 Musolesi et al. 20 proposed a location prediction algorithm group mobility (GM) which introduces social relationship into prediction model. Boldrini and Passarella 18 added user’s location preferences into GM algorithm. Backstrom et al. 21 extracted user’s social relationships from Facebook and combined it with user provide location semantic information to predict user’s location. In the literature, Hossmann et al. 19 use the contact information in user’s mobile phone as their social relationships. Zhang et al. 5 discovering the latent connection between users’ call records and their interactions with each other and, proposed a location prediction algorithm based on call records. The core idea of these methods is using external or internal data to obtain connection between users and building user’s mobility model based on the influence of connection on user’s trajectory, then including the mobility model in location prediction algorithm. However, there are always some latent connections between users being neglected, we proposed a location prediction algorithm which directly discovers group of users who have influence over the tested user and uses their mobility model to modify the result deriving from user’s personal mobility model.
In order to discover group of users who have impact on the tested user’s movement, the wildly used method in recommending system—collaborative filtering—has been applied to location recommendation and prediction by more and more researchers.10,13,22–24 Lian et al. 22 proposed a location recommendation algorithm GeoMF using the geographic influence between locations and user’s location preference. The influence between locations is preset instead of analyzing from user’s movement pattern. There are also other algorithms using collaborative filtering in location recommendation.22,23 However, location recommendation methods are not quite suitable for location prediction. Lian et al. 13 proposed an algorithm called collaborative exploration and periodically returning model (CEPR) which uses collaborative filtering for location prediction. CEPR uses hidden Markov Chain to build a mobility for user’s periodic moving pattern in order to predicting location and apply collaborative filtering for location prediction when user explores new places. But the accuracy of CEPR is limited by the accuracy of initiative user’s moving pattern detection. Wang et al. 10 also proposed a method which builds a mobility model and processing strategy separately for the periodic and random mobility situation and then merges the two into a universal mobility model to predict user’s location. But when predicting the next locations, instead of considering user’s current mobility trace, the algorithm is using time and possibility of user appear at every location which is calculating from user’s historical trajectory. So, it is more suitable for forecasting user’s whereabouts in the future on a macroscopic level than predict user’s next location on a microcosmic level.
The main contributions of this article are as follows:
We proposed a social contagion theory–based location prediction algorithm.
We present a novel method to calculate users’ similarity based on mobility traces.
We implement a large-scale trajectory data compressing method by leveraging the idea of covering algorithm.
The proposed problem and its formulation
Proposed problem
In mobile environment, system records people’s mobility data through their mobile devices, that is, location information at every moment. This information can be an absolute position, like GPS coordinates, a relative position, or the AP’s location of which the mobile device links in. System uses user’s current or historical location data to build a user’s mobility prediction model and predicts user’s next location with the model. In general, system uses the mobility prediction model constructed by user’s location information to predict user’s future whereabouts. Different mobility prediction models may be constructed from different kinds of historical data and by data deriving from mobility data. The former, for example, use exclusively the tested user’s mobility data or combine with others to build the model. The later, for example, analyze the average time duration between phone calls and users’ interaction and build the prediction model with calling records. Different modeling methods may use various data, but they normally use accuracy as the measurement metrics. The prediction accuracy is the ratio between the number of correct predictions and the number of overall predictions. There is no doubt that a more advanced mobility prediction modeling method will achieve a better accuracy. Therefore, it is necessary to design and implement an efficient mobility prediction modeling strategy.
Problem formulation
The problem of predicting user’s next location is defined as follows: assuming there are
The result of predicting locations is barely satisfactory simply using the tested user’s historical data for modeling. Hence, during the process of constructing user’s mobility prediction model, we normally need to calculate other’s degree of association with the tested user. Choose a part of strong connection users to forge into a group and use their mobility data
In this article, we choose 2-MMC to model user’s mobility data.
where
There are
The current state of user is known
Calculate users’ similarity. At first, we convert users’ mobility data into their trajectories. The complete trajectory of user
The similarity between any two users
To sum up, the proposed approach is a new aspect of building user’s location prediction model. It uses the tested user’s and other’s trajectories to derive user’s next location. The details will be given in the following.
A social contagion theory–based approach for location prediction
Design of system architecture
Figure 1 describes the architecture of the proposed TLP approach in mobile environment. It shows the interaction between TLP and other entities. Moreover, TLP plays an important role in the whole architecture. First of all, the data fusion module gathers users’ real-time connection information from each AP and then transfers them to user’s mobility data

The architecture of TLP.
In this article, the proposed TLP algorithm is a social contagion theory–based location prediction method. It is suitable for forecasting user’s location in big data environment and uses crowd’s mobile patterns to adjust the prediction results. TLP applies the reduction idea of covering algorithm and the 2-MMC model to predicting user’s next location in cloud computing environment. Furthermore, it refines the prediction results and demonstrates better spatial and temporal performances at the same time.
The location prediction process of TLP is shown in Figure 2. First, TLP gathers all users’ mobility data and transfers the data into every user’s trajectory

The process of TLP.
Algorithm of TLP
In this section, we describe the specific process of TLP. Details are given in the following.
Model building phase
The sampling process is based on covering algorithm. Evenly distribute the trace id in the total trace set
Since there are
where
Predicting phase
From the microcosmic perspective, we apply the method of covering algorithm for sampling the large-scale coding trajectory data. Even though covering algorithm is normally used as a kind of supervised machine learning classification algorithm, we use it, from another angle, for data sampling. The main idea of covering algorithm is dividing the sample space by a series of hypersphere into two parts, inside and outside the hypersphere. Through computing, the results are a set of sphere center
The TLP algorithm proposed in this article uses 2-MMC for modeling based on a presentative feeling. The core idea of Markov prediction is that next state of the tested user is only related to
From the whole, the idea of this article comes from social contagion theory. In social contagion theory, personal behavior is unconsciously, on some degree, influenced by crowd. Accordingly, a person gives up some of the personal behavior and adopts the crowd behavior as the new personal behavior. However, the personal behavior within the crowd is more similar to certain groups which have strong influence on the person. We define the group of people who have strong influence on the tested user as personal trajectory similarity group. Inspired by social contagion theory, when this article builds the location prediction model, we use not only user’s personal mobility data, but also the mobility data of users from the personal trajectory similarity group to modify former. However, when people speak of personal trajectory similar group, they naturally think of social relation between users in real life (i.e. classmate, coworker, relative). We consider simply using social relations to construct personal trajectory similarity group is not the most optimal option in terms of social contagion. Respect to location prediction, fairly speaking, the personal trajectory similarity group which is constructed based on social relations possesses strong influence on the tested users under certain situations. To address the above problem, constructing a more comprehensive personal trajectory similarity group and discovering all the contagions between users which are hidden in historical data are required. In this article, as part of the TLP algorithm, the proposed procedure of searching personal trajectory similarity group is based on the idea of collaborative filtering. The similarity of trajectory is chosen to be the sole standard of determining the influence intensity between users. This makes the procedure focusing on finding out the users who have similar trajectory with the tested user and leaving out the social relations and the user’s co-occuring behaviors. With regard to calculating trajectories’ similarity, considering user’s randomness of mobility, there are hardly two trajectories exactly the same. Hence, user’s trajectory can be considered as a series of traces. After splitting the trajectory into traces, we calculate user’s mobility similarity by counting the number of same trace between different users. Finally, elements in the tested user’s personal trajectory similarity group are selected by similarity descendingly.
Experiments
In this section, we conduct a series experiments to evaluate and verify the proposed TLP algorithm with dataset gathered in real scenario. The two aforementioned algorithms, 2-MMC and NextCell, are chosen to be the baselines. Particularly, we would like to answer:
The influence of time duration, between current time and the arriving time of next location, on prediction accuracy of the algorithms;
The influence of the size of the user group on prediction accuracy;
The influence of the parameter
The influence of the social activeness of user on prediction accuracy.
Experiment design
Metrics
We select two metrics to measure the performance of the prediction algorithms—
where
The time-related accuracy,
Experiment design details
We choose MIT Reality Mining Dataset 25 to evaluate the performance of proposed location prediction algorithm. This dataset contains the location information, communications, and the mobile application preferences of 106 users during 9 months gathering. The details of the dataset are in Table 1.
Details of MIT dataset.
GSM: Global System for Mobile Communication.
We implement the proposed TLP algorithm on Hadoop platform and MapReduce programming framework. 26 There are 21 nodes in the cluster, 1 master node and 20 slave nodes, each of which is equipped with 16 GB memory and CPU 2.83 GHz.
The influence of arriving time
In this section, we examine the influence of the time duration between current time and the arriving time of location on prediction accuracy. The reason for conducting this experiment is many research works employ methods of predicting future locations from an un-linear time point of view. However, TLP is built as the predicting model from the perspective of trajectory (personal trajectory and trajectory similarity). So, we design the experiment to examine the influence of time on the trajectory-based location prediction method. We divide the locations by the time duration between it and its previous one. One hour as an interval, there are five intervals. To examine the impact of social contagion on TLP, we remove the social contagion–based part of TLP and use the remaining personal trajectory–based method, 2-MMC, as one of the baselines. Meanwhile, we invite a state-of-the-art social-based prediction algorithm, NextCell, 5 as another baseline. The result is illustrated in Figure 3. In general, the TLP is affected by the time variant least, the accuracy slightly descends with an increase in time. The difference of accuracy between the best (first interval) and worst (fifth interval) is merely 11%. NextCell performs slightly worse than TLP, but it also shows the same stability. The difference between the highest and lowest is about 17%. NextCell is a location and un-linear time-based statistics method, so time will not make too much difference on its accuracy. However, the accuracy of 2-MMC declines sharply with an increase in time. At the first interval, the different between it and TLP is 13%, but the difference raise to 27% at the fifth interval. Since the only difference between the two is the social contagion part, we think that the social contagion adjusts well with varying time. The reason might be that with the time duration between two locations shorter, user may focus on a kind of periodic behaviors or a series of movement left by in order to finish a task. This means the bonding between locations in the trajectory is closer. When the time duration increases, user might have already finished the behaviors, which means the association between locations starting to fade and user might begin another kind of behaviors. So, when the link between locations is weakening, the accuracy of 2-MMC declines sharply and the accuracy of TLP influence by social contagion of other people is impacted slightly.

Influence of time duration between locations on the accuracy of the model.
The influence of user group size
In this section, to examine the effect of user group size on accuracy, we extract 30 users and 60 users randomly from MIT dataset, 25 along with the whole 106 users in the dataset as three comparing datasets. The result of the three datasets is shown in Figure 4. It is shown in the histogram, 2-MMC has the least impact and its accuracy is floating around 45%–50%. This is because that the prediction accuracy is only related to personal historical trajectory. However, the accuracy of TLP and NextCell is ascending with an increase in user number. NextCell is mildly lower than TLP except the user number is 60, accuracy of NextCell is 63.8% which is higher than the 62% of TLP. This is caused by the dataset containing more phone call records in it, and the NextCell has more useful information to elevate its prediction accuracy. When the user number is 106, the accuracy of 2-MMC, NextCell, and TLP is 48.6%, 66.7%, and 71.6%, respectively.

Influence of user number on prediction accuracy.
The influence of parameter K on TLP
Because the prediction accuracy of TLP associates with the tested user’s personal trajectory similarity group, we adjust the parameter

Influence of parameter
The influence of user’s social activeness
Because the social contagion of TLP is based on the associations between crowd and a person in the real world to construct the mobility model, the crowd is not divided into friends and strangers. So, in this section, we want to examine the influence of social activeness. We classify the users in the dataset as active, semi-active, and passive based on the total number of communications with other people and the number of friends which are both offered in the dataset. We choose one user from each classification—user A (active), user B (semi-active), and user C (passive). 2-MMC and NextCell are the baselines in this section. The result is demonstrated in Figure 6. All three algorithms show the same tendency, in which the accuracy of C is the highest and the accuracy of A is the lowest. The accuracies of 2-MMC, NextCell, and TLP are, respectively, 38.4%, 53.3%, and 58.7%. The reason is that the trajectory of the active user shows more randomness in it, so it is more difficult to predict his next whereabouts. The semi-active user demonstrates more regularity, so the prediction is more accurate. Especially the passive user, the accuracies of the three algorithms are 90.6%, 93.8%, and 94.3%, respectively, as it is shown in the histogram. Even though affected by user’s mobility randomness, the difference of TLP’s prediction accuracy rate is the smallest, 35.5% between the best and worst, which is a bit smaller than the NextCell’s 40.2% and much smaller than 2-MMC’s 52.2%. TLP is based on trajectory similarity to find the similar users. During prediction, it can use others’ decisions under the same state which the tested user is in as reference for predicting the next locations. Hence, whether the user is active or not, TLP can achieve preferable accuracy.

Influence of user’s social activeness.
In conclusion of the results above, we have achieved these following objectives: First of all, we use the time duration as the variable to see how it affects prediction accuracy. And TLP shows better accuracy and is affected by the time variant least. Second, the result shows that with more users involved as input, TLP achieves better accuracy than itself and than others. Third, we adjust the size of the similarity group by changing the value of parameter
Conclusion and future work
This article proposed a location prediction algorithm TLP based on social contagion theory. Addressing to the regularity and randomness of human mobility, TLP builds second-order Markov-based personal mobility model and user’s personal trajectory similarity group based on social contagion theory. During the process of finding the personal group, we split user’s trajectory and transfer it into the coding trajectory. Then combining the idea of covering method, we compress the large-scale user trajectory dataset into a consistent subset. Furthermore, we apply the trajectory-based user similarity algorithm to find out user’s personal group in the subset. Finally, we conduct a series of experiments on the performance of the proposed algorithm from multiple aspects with MIT Reality Mining dataset. The result shows that the proposed social contagion–based location prediction algorithm possesses great accuracy.
In the future, we are planning to use the semantic information of location (such as user-defined location tags) as users’ trajectories, which should be beneficial for achieving more prediction accuracy and have wider range of realistic usage due to the semantic information. Moreover, a more advanced trajectory model can also be used for similarity computing, such as a tree-like model or graph structure. These models can preserve more information than the BOOL vector and achieve more accuracy in similarity computation. Furthermore, we prepare to apply TLP to the applications of resource allocation in wireless network.
