Abstract
Introduction
User reviews play an important role in decision making since they provide valuable information. Exploiting the openness of online review platforms, many attackers seek profit and monetary rewards through irregular activities such as writing fake reviews and posting advertisements to interfere with product rankings.
The task of detecting fraudulent reviews and reviewers has existed for a long time. Some works focus on the characteristics of content and build text-based classifiers to achieve high accuracy. Others analyze abnormal behaviors (e.g. timestamps, footprints, and distributions) to discover suspicious patterns. Graph-based methods are popular since they leverage the relationship ties between users, which has led to considerable progress in spotting malicious accounts. At the same time, these methods are gradually being adopted by many organizations for deploying their risk-control systems.1,2
Unfortunately, attackers have begun to evade detection systems by changing their tactics. Some try to look normal by adding links to popular entities.1 Some hire workers to take part in spam activities through crowdsourcing platforms (e.g. RapidWorkers and Microworkers).3 In particular, deep learning-based models can be applied to generate fake reviews.4 Even worse, all of these attack methods are practical and effective.
As fraudsters become adversarial, distributed, and dynamic, anti-spam tasks face a huge challenge. Figure 1 shows three types of fraud users in online app review platforms. A(I) is an attacker who wants to promote the rank of a certain product and posts many positive reviews for it; such behavior is easily detected by behavior features. To avoid detection systems, A(II) adopts an updated strategy of posting positive reviews to different products, meaning the attacker looks like a normal user through camouflage. A(III) is more sophisticated: reviewers take part in crowdsourcing activities and post reviews to products from distributed time slots and different devices, which poses a real challenge to fraud user detection.

Three types of adversarial fraud users in an online app review platform.
In this paper, we draw attention to camouflaged users and propose our DDF system utilizing the Graph Convolutional Network (GCN)5 algorithm. By studying the patterns of the data set and analyzing the behaviors of fraud users, we first identify obviously abnormal users in the train set as positive samples (seed candidates) and treat the others as negative samples. Second, we construct a graph with 82,542 nodes and 42,433,134 edges and extract text and behavior characteristics for each node. Then, we train a GCN model to find suspicious users in the test set.
Finally, we measure the precision and recall rates of the detected suspicious users.
Considering that some adversaries act like normal users in the early days and only later engage in illegal activities to avoid detection systems, it is hard to verify the accuracy of the detected suspicious users. Therefore, we evaluate these suspicious users through expert review of their text and behavior characteristics over 30 days. Remarkably, DDF is able to spot almost 50% of potential adversaries while precision remains nearly 95%.
We summarize the contributions of this work as follows:
We shed light on the importance of detecting adversarial fraudsters in anti-spam tasks and build our DDF system for discovering more potential abnormal users.
Our system is efficient and scalable thanks to Graph Convolutional Networks. Additionally, it can be transferred to other anti-spam applications and platforms.
We validate the performance of our system on real-world datasets by deploying it on the Tencent Venus Computation Platform, showing that our DDF system is highly competitive and achieves high precision in real-world anomaly detection industry tasks.
Related work
The framework
Our goal is to detect adversarial fraudsters in a large online app review platform. In this work, we propose the DDF system, which combines content, behavior, and graph characteristics, and deploy it in real-world scenarios.
Overview
As Figure 2 shows, DDF mainly consists of three components: a pre-processor, a seed collector, and a detector. The pre-processor handles raw data processing and builds a graph for the GCN module’s training and prediction. The seed collector is designed around the feature extraction performed by the pre-processor. After characterizing and modeling fraud users in online app review systems, we can identify a small number of precision-focused seeds. Over many iterations of training a GCN model, our seed set expands dynamically and uncovers new types of potential adversarial fraudsters. The graph-based detection module focuses on the structural characteristics of fraud users; it is designed to leverage neighbor information via the GCN algorithm.

Architecture of DDF system.
First, we collect the user review data set and divide it into two parts: (1) raw review logs for text and behavior feature extraction and (2) a user graph for obtaining structural characteristics. Second, we assume that fraud users can be identified using content (similarity, special symbols, semantics) and behavior (review time, device updates, frequency) features. By setting different thresholds, the seed collector can identify fraud users with high precision. Then, we train our GCN models by leveraging user features and label information (a positive label means the corresponding user comes from the seed set). After that, users who have high similarity with fraud users are recognized. In particular, hidden (distant) users can be discovered via a propagation function. Finally, we evaluate the detected users and expand our seed set to uncover as many adversarial fraudsters as possible.
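The iterative seed-expansion loop described above can be sketched as follows. This is an illustrative toy, not the actual DDF implementation: the scoring rule is a simple neighbor-vote stand-in for the trained GCN, and all names and thresholds are hypothetical.

```python
def expand_seeds(users, neighbors, seeds, rounds=3, threshold=0.5):
    """Toy seed expansion: a user whose neighbors are mostly seeds becomes
    suspicious; confirmed users join the seed set for the next round.
    The neighbor-vote score stands in for the trained GCN's output."""
    seeds = set(seeds)
    for _ in range(rounds):
        new = set()
        for u in users:
            nbrs = neighbors.get(u, [])
            if u not in seeds and nbrs:
                score = sum(n in seeds for n in nbrs) / len(nbrs)
                if score >= threshold:
                    new.add(u)
        if not new:          # converged: no newly suspicious users
            break
        seeds |= new
    return seeds

# toy graph: a-b-c chain; starting from seed "a", suspicion propagates to b, then c
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
assert expand_seeds(["a", "b", "c"], neighbors, seeds={"a"}) == {"a", "b", "c"}
```

Note how the hidden (distant) user "c" is only reached in a later round, mirroring how propagation uncovers users with no direct link to the initial seeds.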
Feature extraction
In this section, we discuss the process of feature extraction, which leverages content and behavior characteristics for each user.
On an online app review platform, a user review (comment) is regarded as an explicit feedback signal about a product. In addition, the rating of a review, scaled from 1 star (worst) to 5 stars (best), represents a user’s attitude. Table 1 shows the 11 content-based features we extract for each user. The SRN (Similar Review Number) is calculated using the Simhash15 algorithm, and the WF/BF is labeled using our own blacklist dictionary for online app reviews.
Content features extraction.
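A minimal sketch of the Simhash technique used for the SRN feature: each token votes on each fingerprint bit, so near-duplicate reviews tend to produce fingerprints that differ in only a few bits. The hash function and bit width here are illustrative choices, not necessarily those of the production system.

```python
import hashlib

def simhash(text, bits=64):
    """Simhash fingerprint: tokens vote +1/-1 on each bit; the final
    fingerprint keeps the sign of each bit's vote total."""
    v = [0] * bits
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if v[i] > 0)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

# identical reviews collide exactly; unrelated reviews almost surely differ
dup = "great app five stars download now"
assert hamming(simhash(dup), simhash(dup)) == 0
assert hamming(simhash(dup), simhash("terrible experience constant crashes")) > 0
```

Counting how many of a user's reviews fall within a small Hamming distance of each other then yields an SRN-style feature.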
In reality, user behavior contains much information that distinguishes different users, such as review times, the frequency of posting reviews, and the number of review devices. Table 2 shows the three behavior features we extract for each user.
Behavior features extraction.
Apart from that, temporal actions reflect the sequential behavior of users. Table 3 lists the temporal action features we extract for each user, which are used as node attributes in our graph models.
TQD is a 24-dimensional vector; each dimension represents the number of reviews a user posts during the corresponding hour of the day.
SQD is a vector of the user’s ratings from 1 star to 5 stars.
Temporal action features.
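The two temporal vectors above can be built with a single pass over the review logs. This sketch assumes SQD is a 5-dimensional count of the user's ratings per star level, which is one plausible reading of the description; the tuple layout of `reviews` is likewise hypothetical.

```python
from collections import defaultdict

def temporal_features(reviews):
    """Build per-user TQD and SQD vectors.
    `reviews`: list of (user_id, hour_of_day 0-23, star_rating 1-5)."""
    tqd = defaultdict(lambda: [0] * 24)  # TQD: reviews per hour of day
    sqd = defaultdict(lambda: [0] * 5)   # SQD: reviews per star rating (assumed 5-dim)
    for user, hour, stars in reviews:
        tqd[user][hour] += 1
        sqd[user][stars - 1] += 1
    return tqd, sqd

reviews = [("u1", 3, 5), ("u1", 3, 5), ("u1", 4, 1), ("u2", 12, 3)]
tqd, sqd = temporal_features(reviews)
assert tqd["u1"][3] == 2 and tqd["u1"][4] == 1   # two reviews at 3am, one at 4am
assert sqd["u1"][4] == 2 and sqd["u2"][2] == 1   # two 5-star, one 3-star
```

A burst of reviews concentrated in a few hours, or an extreme skew toward 5-star ratings, then shows up directly in these vectors.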
Seed identification
Aiming to model different types of users on an online app review platform, we categorize all users into three types: fraud users, normal users, and suspicious users. Normal users are hard to verify since adversarial fraudsters can camouflage themselves to look normal. For example, given a review that contains no illegal information, we can hardly determine whether it is honest. If a person writes three reviews for the same product, we also cannot be sure that he/she is a fraud user from behavior features alone. Through our investigation, we find that fraud users inevitably share the same distribution in some characteristics, so we define fraud users first. A fraud user is defined using the content and behavior features listed in Tables 1 and 2. For instance, if a person posts advertisements, phone numbers, or irrelevant content, or shows an obvious intent to promote products with a large number of fake reviews, we regard him/her as a fraud user. Other undetected users are labeled as normal users. However, most adversarial fraudsters escape detection and are labeled as normal users in this process. To distinguish normal users from adversarial fraudsters, we introduce a new type of user: the suspicious user. Based on the labeled data of fraud users and normal users, we build a GCN-based model to discover users who have strong relationships with fraud users, even if they camouflage themselves as normal users in content and behavior characteristics. In particular, we define suspicious users with the help of domain experts.
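A rule-based seed labeler of the kind described above might look like the following. All field names, thresholds, and blacklist terms are illustrative assumptions, not the paper's actual rules or values.

```python
def label_seed(user, max_similar_reviews=20, max_devices=5,
               blacklist=("http", "call now", "download at")):
    """Illustrative seed labeling: content rules first, then behavior rules.
    Thresholds and blacklist are hypothetical placeholders."""
    text = user["reviews_text"].lower()
    # content rule: advertisements, contact info, or blacklisted phrases
    if any(term in text for term in blacklist):
        return "fraud"
    # content rule: too many near-duplicate reviews (SRN-style feature)
    if user["similar_review_count"] > max_similar_reviews:
        return "fraud"
    # behavior rule: implausible number of review devices
    if user["device_count"] > max_devices:
        return "fraud"
    return "normal"  # may still be a camouflaged adversary -> handled by the GCN step

spammer = {"reviews_text": "Download at http://spam.example",
           "similar_review_count": 0, "device_count": 1}
honest = {"reviews_text": "Nice UI but drains battery",
          "similar_review_count": 2, "device_count": 1}
assert label_seed(spammer) == "fraud"
assert label_seed(honest) == "normal"
```

The key design point is asymmetry: these rules aim for high precision (few false "fraud" labels), accepting that many camouflaged fraudsters fall through to "normal" and must be recovered later by the graph model.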
Graph construction
In this work, we consider a two-layer GCN for fraudster detection. As Figure 3 shows, nodes in the graph represent users, and an edge connects two users who have reviewed the same app during a period.
Here, we define the layer-wise propagation rule as H^(l+1) = σ(D̃^(−1/2) Ã D̃^(−1/2) H^(l) W^(l)).

A demonstration for the construction of graph.
Note that Ã = A + I_N is the adjacency matrix with added self-loops, D̃ is the corresponding degree matrix with D̃_ii = Σ_j Ã_ij, W^(l) is a layer-specific trainable weight matrix, H^(l) is the matrix of node activations with H^(0) = X, and σ(·) denotes an activation function such as ReLU.
Here, the softmax activation function is defined as softmax(x_i) = exp(x_i) / Σ_j exp(x_j), where the sum runs over all output classes, so each node receives a probability distribution over the classes.
Figure 4 shows the two-layer GCN we construct in our DDF system. Propagating feature information from neighboring nodes at every layer improves classification. We show that the GCN model can detect more suspicious fraudsters than other classification models.

A fraud detection model based on a graph convolutional network: users with high risk (red nodes) and users with uncertain risk (black nodes). An edge between two users (nodes) means they have reviewed the same app during a period.
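The two-layer forward pass can be written compactly in numpy. This is a minimal sketch of the standard GCN formulation cited above, with random weights and a toy three-user graph; the real system trains the weights on the labeled seeds.

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalized adjacency with self-loops:
    D̃^(-1/2) (A + I) D̃^(-1/2)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A, X, W0, W1):
    """Two-layer GCN forward pass: each layer mixes a node's features
    with its neighbors' before the linear transform."""
    A_norm = normalize_adj(A)
    H = np.maximum(A_norm @ X @ W0, 0)       # layer 1 + ReLU
    logits = A_norm @ H @ W1                 # layer 2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # per-node softmax over classes

# toy graph: 3 users, users 0 and 1 co-reviewed an app; user 2 is connected
# only through its self-loop
A = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 0.]])
rng = np.random.default_rng(0)
X = rng.random((3, 4))                       # 4 input features per user
W0, W1 = rng.normal(size=(4, 8)), rng.normal(size=(8, 2))
probs = gcn_forward(A, X, W0, W1)
assert probs.shape == (3, 2)                 # one class distribution per user
assert np.allclose(probs.sum(axis=1), 1.0)
```

Because `A_norm @ X` averages each user's features with its co-reviewers', two layers let information propagate two hops, which is what allows a camouflaged user to inherit risk from fraud seeds it never directly touches.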
Deployment
Finally, we deploy our risk-control system DDF on the Venus Computation Platform provided by Tencent Inc. Specifically, we implement the content and behavior feature extraction module using Hive SQL, train the graph model using Python, and store graph and model information in PCG S3, an object storage system at Tencent.
Experiment
In this section, we evaluate DDF on a real-world data set against baseline methods to verify its fraudster detection performance.
Data set
The review data set provided by Tencent includes 85,025 users, 302,097 reviews, and 7,584 apps. Based on this data set, we extract features for each user as described in Section 3.2 and construct a graph structure as introduced in Section 3.4. Table 4 shows the number of nodes and edges. It is worth mentioning that we have excluded isolated nodes from the graph: because isolated nodes have no neighbors, they cannot be affected by their neighborhood during network learning.
Graph statistics.
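Filtering out the isolated nodes mentioned above is a one-pass operation over the edge list; this small sketch (with hypothetical user IDs) shows the idea.

```python
def remove_isolated(nodes, edges):
    """Drop nodes with no incident edges: they receive no neighbor
    information during GCN message passing, so keeping them only
    wastes computation."""
    connected = {u for edge in edges for u in edge}
    return [n for n in nodes if n in connected]

nodes = ["u1", "u2", "u3"]
edges = [("u1", "u2")]           # u3 never co-reviewed an app with anyone
assert remove_isolated(nodes, edges) == ["u1", "u2"]
```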
We divide the data set into a train set and a test set to verify the detection performance of our DDF system. As Table 4 shows, the train set contains 31,450 users and the test set contains 51,092. In real industry work, it is hard to manually label each log since adversarial users change their attack methods frequently and human labor is expensive. As introduced in Section 3.3, we filter out obviously abnormal users (also called seeds) as positive samples using experience rules on our train set. For the test set, we label each user as a fraudster or not according to his/her text and behaviors over the next 30 days. Fraud users are regarded as positive samples and the others as negative samples. The labeled results are listed in Table 4.
Seed selection
In this work, two rules are utilized to select obviously abnormal users as seeds. Fraud users usually review apps on consecutive days, or use many devices to publish their comments in order to seek profit. We choose two attributes to label users, expressed as
where,
By setting

Distribution comparison between fraud users and normal users with RQ and RRR features: (a) normal users, and (b) fraud users.
Combining the two attributes above, Table 5 lists the number of fraud users identified using different thresholds of the two attributes.
Suspicious user statistics of the train set using different thresholds.
Baseline methods
We compare the detection results against the following state-of-the-art baselines. In this work, two widely used classification methods and two well-known graph-structure-based node embedding methods serve as our baselines.
Detection results
We evaluate our experiment with the precision and recall metrics. Table 6 lists the detection results of DDF. Users in the train set are labeled through the thresholds introduced in Section 4.2. In Table 6, (
Detection results of DDF using different thresholds.
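The precision and recall metrics used throughout this evaluation reduce to simple set arithmetic over the detected users and the 30-day ground-truth labels; the user IDs below are of course illustrative.

```python
def precision_recall(detected, fraud):
    """Precision: fraction of detected users that are truly fraudulent.
    Recall: fraction of truly fraudulent users that were detected."""
    detected, fraud = set(detected), set(fraud)
    tp = len(detected & fraud)                       # true positives
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(fraud) if fraud else 0.0
    return precision, recall

p, r = precision_recall(detected=["u1", "u2", "u3", "u4"],
                        fraud=["u1", "u2", "u5"])
assert (p, r) == (0.5, 2 / 3)   # 2 of 4 detections correct; 2 of 3 frauds found
```

High precision with moderate recall, as reported for DDF, corresponds to a detector whose flags are nearly always right while some camouflaged fraudsters still escape.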
Figure 6 clearly shows the detection details under different thresholds. The black histograms represent fraud users in the train set at the corresponding thresholds. The red histograms represent suspicious users in the test set detected by the correspondingly trained model. We summarize the results from two aspects:

Detection performance of DDF by using different thresholds.
Accordingly, seed selection appears crucial for adversarial fraudster detection when it is impossible to label ground truth for each user in the train set.
Performance comparisons
Detected users of the baseline methods are listed in Table 7. Performance comparisons of DDF and the baseline methods are shown in Figure 7. It can be seen from Figure 7 that the recall of DDF (the red line) remains stable under different thresholds. Comparatively, the recall rates of LR and RF are high at (d-10) but drop considerably under other thresholds such as (d-50) and (
Detected users of baseline methods with different thresholds.

Detection performance comparisons with DDF and baseline methods.
FdGars14 is our previous work applying GCN to fraud detection. In that work, we mainly focused on network construction without considering the influence of seed selection. To control the complexity of our experiments, we chose only one group of thresholds to select fraud users as our initial seeds.
In reality, seed selection is the key to fraud detection because labeling fraud users is difficult. Therefore, in this work, we design several groups of thresholds to test the stability and efficiency of the methods. The detection results demonstrate the good performance of our proposed system.
Efficiency
Efficiency is an important factor in industry tasks. Table 8 lists the time costs of the training and prediction processes. The number of GCN iterations is set to 500, and we use only a CPU to train the GCN model. We can see that the time costs are all within a tolerable range. As mentioned in Section 3.5, the detection system is deployed on the Tencent Venus Computation Platform and provided as a fraud detection service on the Tencent Beacon Platform.
Time costs of training and predicting procedures with different thresholds.
Conclusion
In this paper, we study adversarial fraudsters on online app review platforms and categorize them into three types by their motivations. To tackle the problem of finding new abnormal users in a large-scale network, we present the DDF (Detect, Defense, and Forecast) system, which combines content, behavior, and temporal action features and builds a GCN model to capture structural information between users. We then evaluate the DDF system on a real-world review data set, comparing its detection results with baseline methods. Finally, we demonstrate its good performance on adversarial fraudster detection and provide the system on the Tencent Beacon Platform.
