Abstract
Introduction
With the popularity of the mobile Internet, mobile user interest prediction has attracted extensive research from both industry and academia. On one hand, user interest prediction provides an exciting opportunity to develop personalized applications, such as search and recommendation. On the other hand, by identifying users’ interests, service providers can make the best use of network resources, for example when constructing a content distribution network (CDN).
Despite various datasets1–3 generated in mobile Internet,
However, typical spatial-temporal features that can be used in the strategies above ignore the fact that human behaviors are highly predictable and centralized in spatiality and temporality. On one hand, people spend most of their time at a few locations,9–11 such as home and workplace. We define these places as hotspots in this article. Most individuals have a regular mobility pattern: commuting to the workplace in the morning, spending most of the daytime there, taking part in leisure activities after work, and returning home in the evening.12 An example of a typical user in a single day is illustrated in Figure 1. On the other hand, burstiness is inherent in human behavior.13 Analysis of human temporal behavior reveals both memory effects14,15 and periodic characteristics,16 indicating that regular behavior patterns may exist. We define the hours containing heavy network usage as hot-times. Thus, human behavior shows centrality in both temporality (hot-times) and spatiality (hotspots). An intuitive question then arises: can these centralities improve the prediction of mobile user interest?

Behavior of a typical user in a single day. The upper panel shows the online activity records. Each bin represents a network usage, and the width indicates the length of corresponding record. The middle panel shows the heavy mobile Internet usage on timeline. The lower panel is spatial plane. Each dot specifies the location of base station. S1, S3, and S4 are hotspots attracting most of the network usages. In particular, S1 and S3 are home and work places, respectively. S4 is a frequently visited location for leisure activities, such as bar or shop. T1–T5 are hot-times and correspond to five heavy network usage time periods. Meanwhile, they also correspond to status transition on spatiality. Both memory effect and periodic characteristics contribute to the predictability of state transition on temporality and spatiality.
Therefore, this article provides a new solution
The main contributions of this article are listed as follows:
Based on the typical fields that are widely existent in various datasets, we systematically compare the importance of traditional spatial-temporal features and how much they matter in predicting mobile user interest using standard classification algorithms.
Integrating hotspot and hot-time information, we propose a novel and effective feature transformation method, TCB, for interest prediction in the era of mobile big data. Validated with the state-of-the-art classification algorithms DecisionTree (DT) and RandomForest (RF), extensive experiments show that the feature sets generated by TCB have great advantages over traditional spatial-temporal feature sets in terms of precision, recall, and f1-score.
Meanwhile, extensive experiments show that statistical summaries related to hotspots and hot-times contribute significantly to the prediction of mobile users’ interests, which provides new insight into human dynamics related to interest and mobility.
The rest of this article is organized as follows. Related work is presented in the next section, followed by a section on the characteristics of mobile network usage. The subsequent section details the feature transformation method TCB, and the section after that presents the performance and comparison of the various feature sets. We conclude this article in the final section.
Related work
This article focuses on predicting the interests of mobile users using multidimensional contextual information, and concerns human dynamics in the following aspects:
Temporal features based
In Yuan et al.,4 timeslot
Spatial features based
The location of cellular network
Preference based
Song et al.5 adopted the preference selection scheme. Specifically, when a user chooses to return to a historically visited location with the probability
Characteristics of mobile Internet usage
In this section, we study the characteristics of mobile network usage on a large-scale Usage Detail Records (UDRs) dataset, which is described in detail in section “Dataset and preliminary.” First, we investigate the diversity of mobile Internet users on spatiality and interest, respectively. Then, we measure how predictable mobile users’ interests are and the effect of spatial-temporal information on predicting them. Finally, we introduce the centrality phenomenon on both spatiality and temporality, which inspires our feature transformation method.
Diversity
First of all, we take an overview of the diversity of mobile users. The diversity of a user is the number of unique interests or locations visited by that user. The statistical results are shown in Figure 2.

Diversity of mobile users’ interests/locations. The horizontal axis represents the level of corresponding diversity, while the vertical axis denotes the proportion of corresponding users.
Both location diversity and interest diversity take the form of a lognormal distribution, which is widely found in analyses of human behavior.21,22 Users with extreme location/interest diversity are rare, and most users have a constrained scope in both spatiality and interest. Specifically, users are more limited in spatiality than in interest: the distribution of interest diversity is wider and shorter than that of location diversity. Since it is much easier for users to explore a new website than to move physically, users are freer and more willing to explore new interests on the Internet, which makes interest prediction much trickier and more valuable. We then measure the predictability of mobile Internet users.
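The diversity measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the record layout and the field names `user`, `location`, and `interest` are assumptions made for the example, and the toy records are invented.

```python
from collections import defaultdict

def diversity(records, field):
    """Count the distinct values of `field` observed for each user."""
    seen = defaultdict(set)
    for rec in records:
        seen[rec["user"]].add(rec[field])
    return {user: len(values) for user, values in seen.items()}

# Hypothetical toy records; the field names are illustrative only.
records = [
    {"user": "u1", "location": "S1", "interest": "news"},
    {"user": "u1", "location": "S1", "interest": "video"},
    {"user": "u1", "location": "S3", "interest": "shopping"},
    {"user": "u2", "location": "S4", "interest": "news"},
]

location_diversity = diversity(records, "location")
interest_diversity = diversity(records, "interest")
```

A histogram of these per-user counts over the whole population would give the kind of distribution summarized in Figure 2.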
Predictability
To evaluate the predictability of mobile users’ interests, this article utilizes information entropy, inspired by Song et al.11 A larger entropy value means larger uncertainty. First, the max (or random) entropy of user
where
where
where

Cumulative distribution of users with max, uncorrelated, and conditional entropy. The form A|B means the conditional entropy of A under the condition B. To obtain the conditional entropy of activity under the condition time, we extract the hour in timestamp from each record.
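As a rough illustration of the three entropy measures named in the caption, the sketch below uses the standard textbook definitions: the log of the number of distinct interests for the max (random) entropy, Shannon entropy of the empirical frequencies for the uncorrelated entropy, and a frequency-weighted average of per-condition entropies for the conditional entropy. The equations of this article are not reproduced here, so the function names and the exact normalization are assumptions.

```python
import math
from collections import Counter, defaultdict

def max_entropy(interests):
    """Entropy if every distinct interest were equally likely."""
    return math.log2(len(set(interests)))

def uncorrelated_entropy(interests):
    """Shannon entropy of the empirical interest frequencies."""
    total = len(interests)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(interests).values())

def conditional_entropy(interests, conditions):
    """H(interest | condition), e.g. condition = hour of the record."""
    groups = defaultdict(list)
    for interest, cond in zip(interests, conditions):
        groups[cond].append(interest)
    total = len(interests)
    return sum(len(g) / total * uncorrelated_entropy(g)
               for g in groups.values())

# Hypothetical user: interest is fully determined by the hour of day.
interests = ["news", "news", "video", "video"]
hours = [8, 8, 20, 20]
h_max = max_entropy(interests)
h_unc = uncorrelated_entropy(interests)
h_cond = conditional_entropy(interests, hours)
```

In this toy case the conditional entropy drops to zero because knowing the hour removes all uncertainty about the interest, which is the effect the A|B curves are meant to capture.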
Spatiality
To investigate the relation of mobile network usage on spatiality, we refer to the tie strength theory proposed in Granovetter.23 Let

Network usage characteristics on spatiality and temporality: (a) and (b) The complementary cumulative distribution of strong/weak tie from five most favorable locations, respectively. (c) and (d) Network usage of a typical user in observation period (23 days). Color means the proportion of network usage in strong/weak ties. The white-dashed lines indicate weekends. Horizontal axis represents the timeslot, and the vertical axis represents the index of days.
Both strong and weak ties are statistically concentrated within the scope of a few most favorable locations. Given a frequently visited location, the farther away a place is, the less likely it is to attract mobile network usage. Given a distance threshold, the probability that network usage occurs at a location tends to be proportional to its popularity. Besides, over 70% of network usages (strong/weak ties) are attributed to the most popular location (location 1), and the second most popular location (location 2) attracts about 18% of strong and weak ties.
Figure 4(a) and (b) indicate that users tend to access the mobile Internet at a few locations, and that these locations may be more informative than traditional spatial features, such as gyration and the distance between consecutive records. Similar to the philosophy of principal component analysis (PCA), we select the hotspots defined in section “TCB algorithm” as reference anchors to specify their effects on current online activity on spatiality. The details of this process are presented in section “TCB algorithm.”
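The concentration of usage at a few locations can be checked with a simple frequency ranking. The sketch below is only an illustration: it assumes that hotspots are the top-k locations by record count, which follows the informal description of hotspots as the most frequently visited locations but is not necessarily the exact definition used in section “TCB algorithm.” The function name and the toy data are hypothetical.

```python
from collections import Counter

def top_locations(locations, k):
    """Return the k most visited locations with their share of all records."""
    counts = Counter(locations)
    total = sum(counts.values())
    return [(loc, n / total) for loc, n in counts.most_common(k)]

# Hypothetical visit log for one user (one entry per usage record).
visits = ["S1"] * 7 + ["S3"] * 2 + ["S4"]
hotspots = top_locations(visits, 2)
```

For this toy user, the two candidate hotspots already account for 90% of the records, mirroring the skew reported for locations 1 and 2.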
Temporality
To better illustrate how regular mobile Internet usage is on temporality, we randomly select a user from the dataset and plot his or her network usage in Figure 4(c) and (d). Without loss of generality, we split the timeline into 24 hourly timeslots. Weekends are indicated by white-dashed lines.
Several clear vertical lines are observed over the whole observation period, indicating that the user tends to access the mobile Internet at a few particular timeslots and that the temporal pattern is stable. This phenomenon also coincides with human behavior characteristics such as memory effects14,15 and periodic patterns.16 So, similar to the spatial features, we utilize the hot-times defined in section “TCB algorithm” as reference anchors to specify their effects on current online activity on temporality. We will provide a detailed description of this process in section “TCB algorithm.”
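A minimal sketch of how such recurring timeslots might be located: bucket each record into one of the 24 hourly slots and keep the busiest ones. Ranking by record count is an assumption made here (ranking by total usage duration would be an equally plausible reading of "heavy network usage"), and the timestamps are invented for the example.

```python
from collections import Counter
from datetime import datetime

def hot_times(timestamps, k):
    """Return the k hourly timeslots (0-23) with the most records, busiest first."""
    counts = Counter(ts.hour for ts in timestamps)
    return [hour for hour, _ in counts.most_common(k)]

# Hypothetical record timestamps for one user over two days.
stamps = [
    datetime(2015, 1, 5, 8, 10), datetime(2015, 1, 5, 8, 40),
    datetime(2015, 1, 5, 12, 5), datetime(2015, 1, 6, 8, 15),
    datetime(2015, 1, 6, 21, 30), datetime(2015, 1, 6, 21, 45),
]
candidate_hot_times = hot_times(stamps, 2)
```

Here the 08:00 and 21:00 slots stand out, which corresponds to the vertical stripes visible in a day-by-timeslot heat map.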
Feature transformation based central behavior
According to the analysis provided in section “Characteristics of mobile Internet usage,” mobile users exhibit centralities on both spatiality and temporality. Specifically, users tend to access the mobile Internet at a few most frequently visited locations (hotspots) and contribute relatively more network usage at particular timeslots (hot-times). We then integrate these centralities into the design of TCB.
TCB algorithm
The intuition behind TCB is that centralities (hotspots and hot-times) can affect user online activity. We assume that these centralities are stationary over time, and that the closer a centrality is, the more influential it can be. For simplicity of illustration, we make the definitions below:
Algorithm
To begin with, TCB collects
Then, in the loop L5-L10, TCB first fetches original information such as user
Based on the above description, all features used in this article are shown in Table 1. They are classified into five groups according to their generation and background. In particular, HS and HT are feature sets produced by TCB. Both HS and HT consist of effects received from centralities and statistical summaries at centralities.
Symbols and corresponding illustration.
Centrality detection
As mentioned in sections “Spatiality” and “Temporality,” hotspots and hot-times are behavior centralities on spatiality and temporality, respectively. Therefore, the philosophy behind
Let
Correlation analysis
In this section, this article investigates the correlation between different feature sets with users’ interests. Distance correlation (DC)
To make the distance between different interests computable, this article utilizes dummy variables to represent each interest. For the details about DC, we refer the readers to Székely et al.,27 and we apply the package “energy”29 in the process of computing DC values. Since the complexity of DC is
As shown in Figure 5, in general, both HS and HT have similar performance, and show great advantages over O/S/T, indicating that feature sets generated by TCB are much more informative in predicting mobile users’ interests. The performance of O and S is similar, while T ranks the worst. It also suggests that classical spatial-temporal features (T and S) are limited in predicting mobile users’ interests. In section “Evaluation,” we will compare the performance of different feature sets in detail.

Correlation between different feature sets with users’ interest with various steps. The red line in each box indicates the median. Only three steps are taken due to the huge time complexity. The numbers of hotspots and hot-times are set to 3, and statistical window is 7.
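To make the DC computation concrete, the following is a small pure-Python sketch of the sample distance correlation with dummy-coded interests. It follows the usual definition based on double-centered pairwise distance matrices and is quadratic in the number of samples. It is not the “energy” package used in this article, and the helper names and toy data are hypothetical.

```python
import math

def _centered_distances(rows):
    """Pairwise Euclidean distance matrix, double-centered."""
    n = len(rows)
    d = [[math.dist(a, b) for b in rows] for a in rows]
    row_mean = [sum(r) / n for r in d]  # equals column means (symmetric matrix)
    grand = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation between two equally long lists of vectors."""
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)

    def mean_prod(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2, dvar_x, dvar_y = mean_prod(a, b), mean_prod(a, a), mean_prod(b, b)
    if dvar_x * dvar_y == 0:
        return 0.0  # one side is constant, so no dependence is measurable
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))

def one_hot(labels):
    """Dummy-variable encoding so distances between interests are computable."""
    classes = sorted(set(labels))
    return [[1.0 if lab == c else 0.0 for c in classes] for lab in labels]

# Hypothetical example: a single feature that perfectly separates two interests.
features = [[0.0], [0.0], [1.0], [1.0]]
interests = ["news", "news", "video", "video"]
dc = distance_correlation(features, one_hot(interests))
```

Because every pair of samples must be compared, the cost grows quickly with sample size, which is consistent with the remark in the caption that only a few steps were evaluated.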
Evaluation
In this section, we compare the performances of various feature sets under the state-of-the-art classification algorithms. In particular, we investigate the performance of X/STA to analyze the importance of statistical summaries related to feature set X. Moreover, we also investigate the effects of the number of hotspots and hot-times, and the effect of length of time window used to obtain them. Although the framework of our experiment can be regarded as an interest prediction method, we lay our emphasis on the performance of various feature sets under standard classification algorithms.
Dataset and preliminary
The UDRs used in this article span 23 days and cover nine municipalities in the south of China. Each UDR is generated when a user accesses the mobile Internet via applications on his or her smartphone. The key fields and examples in our UDRs are provided in Table 2. Note that
Key fields and examples in UDRs.
UDRs: Usage Detail Records; URL: Uniform Resource Locator.
Each record is a four-element tuple
To obtain ground truth data for training and validation, several challenges need to be considered. On one hand, due to the screen limitations of mobile devices, mistaken operations are common in mobile Internet usage. Thus, massive noise is captured alongside meaningful online behavior. On the other hand, the bipartite matrix
To obtain reliable and representative candidate users, we discard individuals with fewer than 15 records per day on average. When filtering candidate websites, we consider duration, frequency, and the number of users covered. Specifically, for each criterion, websites are ranked in descending order of their values, and we select the subsets when the energy exceeds the defined threshold
Euclidean distance is not sensitive if variates vary within small intervals. Moreover, scale transformation methods can hardly guarantee fairness among all variates. To avoid these defects, we choose classification algorithms from Pedregosa et al.32 that use entropy as an index in the process of modeling, namely DT and RF. In our case, Gini impurity is used for both DT and RF to measure the quality of a split. RF has 10 trees, each of them with
This article then utilizes precision, recall, and f1-score to measure the performance of different feature sets, which are defined in equations (4)–(6).
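For reference, the three metrics can be computed per interest class from true-positive, false-positive, and false-negative counts, as in the sketch below. This uses the standard definitions; how the per-class values are averaged across interest categories in equations (4)–(6) is not shown here, so no averaging step is included. The labels are invented for the example.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} using the standard definitions."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores

# Hypothetical ground truth and predictions for four records.
y_true = ["news", "news", "video", "video"]
y_pred = ["news", "video", "video", "video"]
scores = per_class_scores(y_true, y_pred)
```

In this toy case "news" is predicted precisely but with low recall, while "video" shows the opposite trade-off, which is why the f1-score is reported alongside the other two metrics.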
Specifically,
We use cross-validation in our experiments to avoid overfitting. In particular, for each feature set, we split the total data into
Performance comparison
In this section, we compare the performance of TCB with that of the feature sets O/T/S. Since TCB generates HS and HT, and both contain corresponding statistical summaries, we also investigate the performance of the sub-feature sets (namely HS and HT) generated by TCB and the effect of the statistical summaries related to them. Without loss of generality, the number of hotspots
Single feature set
First of all, we investigate the performance of each feature set on its own, and the results are given in Figure 6. Note that X/STA denotes the feature set X without its corresponding statistical summaries.

Performance (precision/recall/f1-score) of DT and RF, respectively, when different feature sets used.
Regardless of metric and classification algorithm, the tendencies of the different feature sets are nearly identical across cross-validation runs, indicating that the feature sets used in DT and RF are stable and reliable. Compared to the other feature sets, HS and HT rank best, followed by HS/STA and HT/STA, indicating that (1) the feature sets generated by TCB are much more informative and suitable for predicting mobile user interest, and (2) the statistical information related to hotspots (hot-times) in HS (HT) is important for HS (HT) to achieve better performance. In contrast, the traditional spatial-temporal feature sets S and T perform even worse than the original features O, implying that spatial-temporal features in a single dimension are insufficient for predicting mobile user interest. Both spatial and temporal information should be taken into account. Moreover, the performance of O, S, T, HS, and HT is largely consistent with the relation depicted in Figure 5, indicating that the DCs between the different feature sets and user interest are reliable.
Dual combination
Second, we examine the performance of dual combination of different feature sets. Note that values in Table 3 are the average after 10 runs in cross validation.
Performance (precision/recall/f1-score) of dual combination of different feature sets.
HT: hot-time; HS: hotspot.
Results show that the performance of DT and RF is very similar. Compared to the original feature set O (Figure 6), both S and T provide additional meaningful information for classification. However, the improvement is limited. Besides, both OS and OT outperform ST, which means that the traditional spatial-temporal feature sets are redundant with each other and even less informative than the original information recorded in O. In contrast, the feature sets generated by TCB are much richer than the original and traditional spatial-temporal feature sets and bring universal and remarkable improvement in the dual combination cases. In particular, on temporality, when combined with the original information O, HT improves precision by 38.3% compared to feature set T in the best case (when RF is used). On spatiality, HS improves precision by 36.1% compared to feature set S in the best case. Moreover, although statistical summaries about hotspots and hot-times are meaningful for predicting user interest, the original feature set O and the traditional spatial feature set S can still compensate considerably when statistical summaries are missing, whereas the traditional temporal feature set T cannot. Finally, despite the impressive performance of HS and HT when each is combined with O, S, or T, the combination of HS and HT does not show great improvement. An intuitive interpretation is that HS and HT are highly coupled in space–time. Since user mobility is constrained to a small area, mobile network usage that occurs in hot-times is very likely to occur at hotspots.
Multiple combination
Next, we investigate the performance of multi-combined feature sets, and how useful each feature set is in different combination cases. RF is used in modeling for simplicity. The original feature set O is used as the baseline since it is the basic information in the raw data. We also compare the performance of the preference selection proposed in Zhao et al.6 For simplicity, the probability of inertia is set to 0. A user prefers to return to a historical interest category with the probability

Prediction performance comparison. The upper panel shows the performance of different feature sets. The lower panel shows the importance of different feature sets in model fitting in corresponding classification task (from left to right).
As shown in Figure 7, HS- and HT-related feature sets bring universal improvement to user interest prediction. Compared to OST, OHSHT improves performance by 16.2%, and the figure is 13% when OHS/STAHT/STA is used. By integrating HS and HT, the final precision of OSTHSHT reaches 83%, a 17.2% improvement over using OST alone. Without the statistical summaries related to HS/HT, the improvement decreases to 12.6%, which is still remarkable. Note that the performance of OHS/STAHT/STA and OSTHS/STAHT/STA is very close, which indicates that the traditional spatial-temporal features in S and T are redundant once HS/STA and HT/STA are taken into account. Preference selection is superior only to OST and is far less effective than the feature sets that combine HS and HT.
We then go further and investigate, from the perspective of feature importance,32 how much the different feature sets matter when multiple sets are used in model training. The importance of each feature set is the sum of the importances of its features. Compared to the traditional spatial-temporal feature sets S and T, HS and HT are more valuable in the process of modeling, which is obvious in the cases of OST and OHSHT. In the case of OSTHS/STAHT/STA, although the most important feature set is O, the performance improvement brought by the combination of O, S, T, HS/STA, and HT/STA is about 30% compared to using O alone. Finally, in the case of OSTHSHT, HS and HT are the most significant feature sets in the process of classification modeling. The different importance distributions in OSTHS/STAHT/STA and OSTHSHT also indicate that statistical information at hotspots and hot-times contributes substantially to the prediction of users’ interests.
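The aggregation rule stated above (the importance of a feature set is the sum of the importances of its features) can be sketched as follows. The feature names, the mapping to feature sets, and the importance values are all hypothetical; in practice the per-feature values would come from the fitted forest, for example the `feature_importances_` attribute of a scikit-learn random forest.

```python
from collections import defaultdict

def feature_set_importance(importances, set_of):
    """Sum per-feature importances within each feature set."""
    totals = defaultdict(float)
    for feature, value in importances.items():
        totals[set_of[feature]] += value
    return dict(totals)

# Hypothetical per-feature importances from a fitted model (they sum to 1).
importances = {
    "hour": 0.10, "weekday": 0.05,          # temporal features (T)
    "lat": 0.15, "lon": 0.10,               # spatial features (S)
    "dist_to_hotspot_1": 0.35,              # hotspot-derived feature (HS)
    "gap_to_hot_time_1": 0.25,              # hot-time-derived feature (HT)
}
set_of = {
    "hour": "T", "weekday": "T", "lat": "S", "lon": "S",
    "dist_to_hotspot_1": "HS", "gap_to_hot_time_1": "HT",
}
by_set = feature_set_importance(importances, set_of)
```

Summing within sets makes the comparison independent of how many individual features each set happens to contain.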
Effect of the number of hotspots/hot-times
Since the number of hotspots/hot-times affects both the effects received from hotspots/hot-times (HS/STA and HT/STA) and the corresponding statistical summaries at hotspots/hot-times, in this section we investigate the performance of HT, HT/STA, HS, and HS/STA as a function of the number of hotspots/hot-times. For simplicity of illustration, we refer to hotspots and hot-times as centralities, and set

Performance of HS, HS/STA, HT, and HT/STA when the number of hotspots/hot-times changes.
At the very beginning, the performance of HS/STA and HT/STA improves significantly as the number of centralities increases, and it saturates as the number of centralities grows further. The saturation states of both feature sets show great advantages over O/S/T (Figure 6). However, varying the number of centralities has little effect when the corresponding statistical summaries are taken into account (see the performance of HS and HT), indicating that statistical summaries at centralities can compensate effectively when the number of centralities is limited. Moreover, the performance of HS is always superior to that of HS/STA, which implies that statistical summaries at hotspots cannot be replaced by a larger number of hotspots in HS/STA. This phenomenon also verifies that statistical summaries related to hotspots and hot-times contribute significantly to the prediction of mobile user interests.
To sum up, although a larger number of centralities can lead to a better prediction performance, it also means a higher dimensionality for data processing. With the help of statistical summaries at centralities, TCB can achieve relatively high performance even in a limited dimensionality, which makes it favorable in the era of mobile big data.
Effect of statistical window
Since the hotspots/hot-times information and corresponding statistical summaries are extracted in a time window

Performance variation when the window length changes.
Results show that the time window has little effect on the performance of HT, indicating that few records are required to obtain the hot-time information used for projecting the original temporal features into a new vector space. On the other hand, the performance of HS, HS/STA, and HT/STA increases along with
Conclusion
Based on the observation that human behaviors are highly predictable, this article proposes a novel feature construction method, TCB, that utilizes hotspot and hot-time information. Specifically, TCB uses several hotspots and hot-times as reference anchors on spatiality and temporality, respectively. Then, the effects that the current record receives from each hotspot and hot-time are collected according to its influence and distance. Besides, statistical summaries, such as average displacement to a hotspot, average record duration at a hotspot, average time interval to a hot-time, and average record dwelling time at a hot-time, are also meaningful and are integrated into mobile user interest prediction. Using classical classification algorithms, namely DT and RF, the proposed TCB is validated on a large UDR dataset generated in the real physical world. Results show that the features generated by TCB have an advantage over traditional spatial-temporal and preference features in precision, recall, and f1-score. With the help of TCB, the final precision can reach 83%, more than 17.2% improvement compared to using the original and traditional spatial-temporal features when RF is used. TCB only requires
