Abstract
Keywords
Introduction
With the development of sensor-based technology, cyber-physical systems (CPSs) are becoming an important part of our daily life.1–4 Many studies show that the number of sensing devices around the world exceeded the world population in 2013 and that its growth rate is still rising.5–9 These sensing devices are deployed in a wide range of settings.10,11 For example, radio-frequency identification (RFID) devices deployed in supermarkets can carry out real-time statistical analysis of data on sold items. In addition, intelligent sensing equipment such as mobile phones12,13 provides not only Global Positioning System (GPS) positioning but also payment functions. In the context of Big Data, large and complex data generated in a variety of fields can provide a great deal of valuable information.14,15 Improving people’s living conditions and business efficiency relies primarily on using big data intelligently and correctly and on retrieving meaningful knowledge from massive data. It then becomes possible to seamlessly integrate the virtual world and the physical world.16,17 Among these data, shopping data from malls are considered to be the most valuable. Therefore, a recommender system, which recommends a variety of products to users, is designed to utilize these data directly by mining big data.18–20
The neighbor-based filtering algorithm is considered the most basic and popular algorithm used in recommender systems. As a consequence, it has been not only studied in depth in the academic community but also widely used in industry. Neighbor-based algorithms fall into two main categories: user-based filtering and item-based filtering. The user-based filtering algorithm consists of two main steps: (1) obtain the set of users with similar interests and (2) recommend to a given user in this set some items that he has never bought before, based on the shopping records of the other users in the set. Item-based filtering also consists of two steps: (1) obtain the set of similar items and (2) recommend to a user items that are in the same set as those he has already bought.
In addition, it is reported that the user-based and item-based filtering algorithms perform differently in different applications.21,22 Generally speaking, the user-based filtering algorithm is better suited to settings where users’ subjective preferences are strong. For example, it performs well in a news recommender system because those who have similar interests are more likely to pay attention to the same news. For books, e-commerce, and movies, however, the item-based method provides better recommendations. As a result, some researchers have put forward a mixed algorithm combining the user-based and item-based algorithms.23
In subsequent work, trust information is collected either explicitly from users or implicitly from users’ behaviors. Moreover, researchers have verified that integrating the trust degree among users into recommendation leads to better performance. Therefore, a mixed method incorporating trust into the user-based or item-based algorithm has been put forward, and its correctness and effectiveness have been demonstrated.24
In this work, we note that the most important issue in a collaborative filtering (CF) recommendation system is how to select the most valuable neighbors. Neighborhoods can be divided into two kinds: user neighborhoods and item neighborhoods. For the user neighborhood, if the selected neighbors are of low quality, meaning that their opinions carry little value for others, subsequent improvements to the recommendation will not have a significant effect. For the item neighborhood, the similarity between two items also matters a great deal. User similarity, trust degree among users, and item similarity are therefore the decisive factors in selecting these two kinds of neighbors, and taking all three into account should enhance recommendation performance. On the basis of this analysis, this work presents an integrated collaborative filtering recommendation (ICFR) approach that combines item ratings, user ratings, and social trust to make better recommendations. In contrast to previous CF recommendation work, the ICFR approach makes full use of the correlation between data to select the most influential neighbors of users and items based on user similarity, trust degree, and item similarity. The trust degree includes two aspects: (1) global trust (i.e. the reputation of users) and (2) local trust (i.e. the direct trust between two users). Therefore, in the ICFR approach, the user neighbors selected based on similarity and trust are those whose opinions and behaviors can make a difference to others, and the item neighbors selected based on similarity are those that share many characteristics with the target items. The ICFR approach then searches for potential products to recommend to a user based not only on the opinions and shopping behaviors of his user neighbors but also on the item neighbors of the items he has bought before.
By combining these three factors, the ICFR approach proposed in this work provides accurate recommendations. Extensive experiments show that the proposed ICFR approach effectively enhances recommendation performance in terms of mean absolute error (MAE) and root mean square error (RMSE).
Related work
The core idea of the proposed ICFR approach is the CF algorithm, which helps extract useful information from a large-scale dataset. Because of its advantage in dealing with information that has latent and complex relationships in recommender systems, the CF algorithm is becoming more and more popular.
Tapestry25 is considered one of the earliest implementations of a CF system. It requires the members of a community to state their views explicitly, and the system then filters out the most valuable content based on these opinions. CF was then applied successfully in GroupLens26,27 in the 1990s, which provided a pseudonymous CF solution for Usenet news and movies. CF algorithms then gradually penetrated the e-commerce industry, where they are used to provide precise recommendations for customers. The most famous application is Amazon.28
As e-commerce develops quickly, recommender systems that provide precise recommendations are in great demand. A crucial factor in making recommendations more accurate is how to utilize large-scale data, which means uncovering the relationships between different data. In the context of Big Data, a persistent problem is the large amount of redundant information, so the CF algorithm is integrated into recommendation to achieve better performance.
In general, CF algorithms can be divided into two categories: item-based algorithms29 and user-based algorithms.
Because of the enormous amount of data, the ordinary user-based algorithm is limited in terms of running time. In the work by Zhao and Shang,30 users are partitioned into two groups according to specific principles and the system is run on a cloud computing platform, namely, Hadoop, which yields a linear speedup of CF and addresses the scalability problem. To alleviate the sparsity problem, a graph-based algorithm is proposed by Chen and colleagues.31,32 The authors treat shopping transactions as graphs and incorporate link prediction methods33–35 to make precise recommendations. Six linkage measures are employed to estimate the likelihood that a new link, that is, a possible transaction, will appear. Social networks have also recently become a key factor in recommender systems.36,37 Influence from friends, items’ general acceptance, and other user behavior data38 are taken into account in the filtering algorithm. Y Wang et al.39 proposed a trust-based probabilistic recommendation model for social networks. In their work, the trust of products, which consists of two aspects, reputation and purchase frequency, is introduced to guard against malicious products. There are many approaches to measuring the trust degree in social networks. For example, a trust measurement based on game theory has been proposed,40 in which the trust degree is obtained from three aspects, service reliability, feedback effectiveness, and recommendation credibility, representing the reliability of the service provided by nodes in the network, the trustworthiness of the feedback made by nodes, and the trustworthiness of the recommendations given by nodes, respectively.
Like similarity among users, trust among users in a social network is considered a useful factor that leads to more robust recommendations.41–45 On most e-commerce websites, a user can explicitly express his view of other users, which makes it more convenient to collect trust information among users.
Compared with the user-based algorithm, the item-based algorithm produces precise recommendations more quickly, even on large-scale problems.29,46 B Sarwar et al. analyzed the user-item matrix to uncover the potential relationships among items. Item-item correlation and cosine similarity are each considered as measures of the similarity between two items. Weighted-sum and regression models are then used with these similarities to obtain the final recommendation. The results show an improvement in recommendation performance compared with the user-based algorithm.29 To address a common problem in computing item relationships, namely how to define and compute the weight of a user, a PageRank-based user ranking approach called userrank has been proposed to weight each user in the network.47 Applying userrank when inferring the potential relationships among items contributes to a more accurate result.
To overcome the sparsity problem inherent in rating data, a combination of the user-based and item-based filtering algorithms through similarity fusion, built on a generative probabilistic framework, has been introduced.48,49
In addition to the approaches discussed above, many other works address the optimization of the CF algorithm. However, a few problems remain, such as the cold start problem and computational speed. Our work draws on the strengths of various algorithms and adjusts each part to exploit the corresponding advantages and enhance recommendation performance.
System model and problem statements
User neighborhood model
The user neighborhood model adopted is shown in Figure 1. The whole network is a community, and every user registered on the shopping website is considered a resident. Moreover, everyone in this community can develop his own circle of friends, whose members can also be regarded as neighbors in the K Nearest Neighbors sense.50

Simple illustration of user neighborhood model.
In this model, there are three criteria: similarity, reputation, and explicit local trust between two users. The weighted sum of these three criteria is used to select neighbors, so that the selected users have high similarity, reputation, and local trust. Computing these criteria requires shopping records and trust records. Users’ shopping data contain the user id, the item id, and the rating given by the user. Trust data consist of three parts: the ids of the two users involved and the rating that one gives the other, which represents the trust degree. These historical records, stored in databases, serve as the input for recommendation.
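The weighted-sum selection described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function name, the dictionary-based inputs, the equal default weights, and the tie-breaking by user id are all assumptions introduced for the example.

```python
from typing import Dict, List, Tuple


def select_user_neighbors(
    target: str,
    similarity: Dict[Tuple[str, str], float],
    reputation: Dict[str, float],
    local_trust: Dict[Tuple[str, str], float],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),  # illustrative only
    k: int = 4,
) -> List[str]:
    """Return the k candidates with the highest weighted score for `target`.

    Hypothetical helper: missing similarity or trust entries count as 0.
    """
    w_sim, w_rep, w_local = weights
    scores = {}
    for candidate in reputation:
        if candidate == target:
            continue
        pair = (target, candidate)
        scores[candidate] = (
            w_sim * similarity.get(pair, 0.0)
            + w_rep * reputation[candidate]
            + w_local * local_trust.get(pair, 0.0)
        )
    # Highest score first; ties are broken by user id for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))[:k]
```

Keeping the three criteria as separate inputs makes it straightforward to try different weightings without recomputing similarity or trust.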
In Figure 1, a recommender algorithm selects four users as neighbors of the target user for whom recommendations are made. These users are highly similar to the target user, have a good reputation in the community, and are trusted by the target user. E-commerce websites can then recommend items to the target user based on these four users. The greater the reference value of these users, the more profitable the recommendations are for businesses.
Item neighborhood model
To incorporate the item-based algorithm into the user-based algorithm, an item neighborhood model is also adopted. For convenience of demonstration, several electronic items are used as examples in Figure 2.

Simple illustration of item neighborhood model.
Unlike in the user neighborhood model, items become neighbors of a target item on the basis of high similarity alone, so the only criterion is the similarity between two items. The similarity between each pair of items is computed from the historical shopping data, stored in databases, of the users who have bought them. These records also contain user ids, item ids, and rating values. In Figure 2, four items are selected as the neighbors of the target item, which means that they are highly similar to it. When an e-commerce website recommends items to a user who has bought the target item, the neighbors of this item are chosen for recommendation. Therefore, the more similar the items are to the target item, the better the recommender system performs.
Evaluation model
In the evaluation model, two criteria are introduced so that the performance of each method can be assessed accurately. One of them is MAE, and the other is RMSE.
MAE is a quantity used to measure how close forecasts are to the actual ratings given by users. This criterion denotes the overall performance of each method. The MAE is given by
where
RMSE, the square root of the mean squared prediction error, is a popular means of quantifying the difference between predicted and actual values. To some extent, it also indicates the stability of the algorithm. Therefore, it is reasonable to adopt it in the evaluation model. The definition of RMSE is
However, the values of these two criteria do not always move together. A collaborative algorithm may perform well on the MAE but be slightly weaker on the RMSE, or vice versa. Consequently, a fixed criterion is adopted: when the disadvantage in the MAE is not obvious, the approach with the better RMSE is accepted. The rationale is that, for users, one perfect recommendation is better than many merely acceptable ones.
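For reference, the two criteria can be computed as in the sketch below. It uses the standard definitions of MAE and RMSE over paired predicted and actual ratings; the function names and the list-based inputs are assumptions made for illustration.

```python
import math
from typing import Sequence


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error over paired predictions and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error over paired predictions and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))
```

Because RMSE squares each error before averaging, a method with a few large errors can have a lower MAE yet a higher RMSE than a method with many small errors, which is why the two criteria may rank methods differently.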
Problem statements
To judge the performance of the filtering recommender algorithm, we predict many users’ ratings on items and use the evaluation model to compare the predicted ratings with the actual ratings in the historical databases.
As a consequence, the specific problem can be stated as an optimization problem: to minimize the MAE and the RMSE defined in the evaluation model. And it can be expressed as
The design of ICFR approach
Given that users and items are equally important in recommendation, and that considering either one alone has shortcomings, we propose an approach that enhances prediction performance by combining user-based prediction and item-based prediction. In addition, the user-based filtering algorithm uses not only similarity but also two kinds of trust. On one hand, a user has a certain reputation (i.e. global trust) in the network, which may be good or bad. This reputation influences how readily other users accept his recommendations. On the other hand, just as in real life, everyone prefers to accept the opinions of his own friends rather than those of strangers. So, the explicit trust degree (i.e. local trust), either trust or distrust, is adopted in our work; it can be obtained directly from our dataset. The more criteria are adopted in selecting neighbors, the more precise the recommendation should be. With this idea, we can overcome many shortcomings that user-based prediction or item-based prediction alone would encounter.
On one hand, when a user-based algorithm is used to make predictions on e-commerce websites with millions of users and items, real-time prediction is difficult because the user-based approach has to search a great number of neighbors. Item-based prediction does not face this limitation to the same degree: because the items on a website are relatively stable, the similarity table can be computed offline and need not be updated frequently, so forecasts can be made quickly even with millions of users and items. Moreover, it is difficult to provide a precise prediction for a newly registered user who does not have adequate shopping records, because there are too few data to find enough high-quality neighbors, and precision drops as a result. With item-based prediction, because the preprocessing has been finished offline, predictions on other items can be made from the precomputed similarities as soon as the newly registered user has made a few transactions.
On the other hand, incorporating user-based prediction into item-based prediction also makes it more stable and more widely applicable. For example, the item-based approach cannot make a prediction on a new item that lacks enough rating records to update the similarity table, whereas user-based prediction can compute the predicted rating on this new item for a user once his neighbors buy it. There is a further problem. The number of items available online is typically larger than the number of users. It would lead to an issue that there are no adequate records to find those users who rate both item
Having considered the strengths and weaknesses discussed above, we regard it as reasonable to maximize the advantages of both methods, and we put forward several tasks to be completed following the steps shown in Figure 3.

The overall idea of our algorithm.
First, the user-based algorithm is divided into three specific tasks. The first is to compute the similarity between each pair of users. The second is to compute the global trust of every user in the network. The final task is to compute the explicit local trust between each pair of users. After these three tasks are completed, the neighbors of users are selected.
Then, the main task in the item-based algorithm is to compute the similarity between each pair of items, after which the neighbors of items can also be selected.
Finally, the ICFR approach is performed based on these two neighborhoods.
User-based prediction for ICFR approach
First, the user-based prediction is established; the key problem is how to select neighbors for a user. The simplest and most common factor used to choose the neighbors of a user is user similarity, which is calculated with the Pearson coefficient and is defined as
Based on what is discussed above, a more concrete algorithm is put forward in Algorithm 1.
In this algorithm, the similarity between users is obtained so that neighbors with similar interests can be selected. This is the first step of the ICFR approach. Social trust is then integrated into the proposed approach to select users with both high similarity and a high trust degree.
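The similarity step can be illustrated with a short sketch of the Pearson coefficient between two users. It assumes that each user's ratings are stored as a dictionary from item id to rating and that the means are taken over co-rated items only; some variants use each user's overall mean instead, and the source's exact definition may differ. The function name is hypothetical.

```python
import math
from typing import Dict


def pearson_similarity(ratings_u: Dict[str, float], ratings_v: Dict[str, float]) -> float:
    """Pearson correlation over the items rated by both users.

    Assumption: returns 0.0 when fewer than two items are co-rated or when
    either user's co-rated ratings have zero variance.
    """
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    cov = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)
```

The result lies between -1 and 1, so it can be mixed with trust values on a comparable scale when neighbors are selected.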
Improved user-based prediction
To improve the stability of user-based prediction, trust among users is introduced into the ICFR approach. We propose a co-prediction that combines trust with similarity because we have observed that the users who are more credible are not necessarily those who are more similar to other users. Two kinds of trust need to be obtained.
One of them is global trust, which is exactly the reputation of a user in a network. It means how much other users in the same network trust this user. The more he is trusted, the more reliable and acceptable his opinions and recommendations would be. If a user with good reputation is selected as a neighbor, the recommendation based on his behaviors would perform better than before. Therefore, global trust is integrated into the process of selecting neighbors in ICFR approach.
In a network, a user’s global trust can be positive or negative, which means that his reputation is good or bad. Every user in the network can be rated by others, and his reputation is obtained from the ratings he receives.
As shown in Figure 4, in-degree means that the user is trusted or distrusted by others. The red arrow is trust (i.e., a positive in-degree), while the blue arrow is distrust (i.e., a negative in-degree). And
where

An example of network.
An example of computation of global trust.
After introducing the general process above, a concrete algorithm is proposed in Algorithm 2.
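To make the idea of reputation from in-degree concrete, the sketch below computes one possible global trust score per user. The formula used here, the difference between trust and distrust in-degree divided by the total in-degree, is an assumption chosen for illustration and is not taken from the source; the function name and the statement format are likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def global_trust(statements: Iterable[Tuple[str, str, int]]) -> Dict[str, float]:
    """Map each rated user to a reputation score in [-1, 1].

    `statements` holds (truster, trustee, value) with value +1 (trust)
    or -1 (distrust). Assumed formula: (pos - neg) / (pos + neg).
    """
    positive = defaultdict(int)  # trust in-degree per user
    negative = defaultdict(int)  # distrust in-degree per user
    for _truster, trustee, value in statements:
        if value > 0:
            positive[trustee] += 1
        elif value < 0:
            negative[trustee] += 1
    users = set(positive) | set(negative)
    return {
        u: (positive[u] - negative[u]) / (positive[u] + negative[u])
        for u in users
    }
```

Under this assumed formula, a user trusted by two others and distrusted by one receives a score of 1/3, and a user who is only distrusted receives -1.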
The other type of trust is local trust, which is given explicitly by users. It represents whether a user trusts another user or not. If the users in a user’s neighborhood are those he trusts, his shopping choices will clearly be influenced by his neighbors. If, however, the users in his neighborhood are distrusted by him, their shopping behaviors carry little value for him. Therefore, taking explicit local trust into consideration in the ICFR approach improves the quality of neighbors and thereby enhances recommendation performance. The concrete procedure is shown in Algorithm 3.
Integrating these three factors mentioned above, the neighborhood obtained would be more valuable, which leads to the enhancement of user-based algorithm. Based on the shopping records of selected neighbors, the predicted rating for a user
And the concrete algorithm is in Algorithm 4.
Compared with the common approach of selecting neighbors simply by similarity between users, the ICFR approach integrates two kinds of social trust into the selection, which is more realistic and leads to more accurate recommendations.
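Once neighbors have been selected, a predicted rating can be formed from their records. The sketch below uses a mean-centered weighted average, a common form in neighborhood-based CF; it is an assumption for illustration and may differ from the prediction formula used in Algorithm 4. The neighbor weight is taken to be the combined similarity-and-trust score, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Neighbor = Tuple[float, float, Dict[str, float]]  # (weight, mean rating, ratings)


def predict_user_based(target_mean: float, neighbors: List[Neighbor], item: str) -> float:
    """Predict the target user's rating of `item` from neighbor ratings.

    Assumed form: target mean plus the weighted average of each neighbor's
    deviation from their own mean. Falls back to the target mean when no
    neighbor has rated the item.
    """
    numerator = 0.0
    denominator = 0.0
    for weight, neighbor_mean, ratings in neighbors:
        if item in ratings:
            numerator += weight * (ratings[item] - neighbor_mean)
            denominator += abs(weight)
    if denominator == 0:
        return target_mean
    return target_mean + numerator / denominator
```

Mean-centering compensates for users who rate systematically high or low, so a generous neighbor does not inflate the prediction.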
Item-based prediction with item similarity
Second, the item-based prediction is computed in the following steps.
The main idea of the item-based algorithm is to use the similarity between items, which plays a role analogous to user similarity. Unlike user similarity, the resemblance among items is calculated with cosine similarity. The initial similarity is defined as
Given the similarity between a specific item and the other items, we can make the prediction based on the neighbors of the item. Similar to the approach in section “User neighborhood model,” the equation is defined as
Item
And the concrete algorithm to make an item-based prediction is shown in Algorithm 5.
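The two item-based steps, computing cosine similarity and predicting from item neighbors, can be sketched as follows. The sketch assumes that the vector norms are taken over the users who rated both items and that the prediction is a similarity-weighted average of the user's ratings on neighbor items with positive similarity; both choices, and the function names, are assumptions rather than the definitions used in Algorithm 5.

```python
import math
from typing import Dict, Optional


def cosine_similarity(ratings_i: Dict[str, float], ratings_j: Dict[str, float]) -> float:
    """Cosine similarity of two items over the users who rated both.

    Each argument maps user id -> rating for one item.
    """
    common = set(ratings_i) & set(ratings_j)
    if not common:
        return 0.0
    dot = sum(ratings_i[u] * ratings_j[u] for u in common)
    norm_i = math.sqrt(sum(ratings_i[u] ** 2 for u in common))
    norm_j = math.sqrt(sum(ratings_j[u] ** 2 for u in common))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


def predict_item_based(
    user_ratings: Dict[str, float], sims_to_target: Dict[str, float]
) -> Optional[float]:
    """Similarity-weighted average of the user's ratings on neighbor items.

    Returns None when the user has rated no neighbor with positive similarity.
    """
    numerator = 0.0
    denominator = 0.0
    for item, sim in sims_to_target.items():
        if item in user_ratings and sim > 0:
            numerator += sim * user_ratings[item]
            denominator += sim
    return numerator / denominator if denominator > 0 else None
```

Because item similarities change slowly, the `sims_to_target` table can be precomputed offline and reused across many predictions.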
Integrating the two predictions
Having worked out the concrete processes of user-based prediction and item-based prediction, we combine the results of these two steps with Algorithm 6.
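A minimal sketch of the combination step is given below, assuming that the two predictions are mixed linearly and that a missing prediction falls back to the other one; Algorithm 6 may handle these cases differently. The default weight of 0.3 for the user-based prediction mirrors the 3:7 proportion reported in the experiments, and the function name is hypothetical.

```python
from typing import Optional


def combine_predictions(
    user_pred: Optional[float],
    item_pred: Optional[float],
    user_weight: float = 0.3,  # 3:7 user-based to item-based, as in the experiments
) -> Optional[float]:
    """Linear mix of the user-based and item-based predictions.

    Assumption: when one prediction is unavailable (None), the other is
    returned unchanged; when both are unavailable, None is returned.
    """
    if not 0.0 <= user_weight <= 1.0:
        raise ValueError("user_weight must lie in [0, 1]")
    if user_pred is None:
        return item_pred
    if item_pred is None:
        return user_pred
    return user_weight * user_pred + (1.0 - user_weight) * item_pred
```

The fallback reflects the complementary roles described earlier: user-based prediction can cover new items, and item-based prediction can cover new users.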
Performance analysis and experimental results
Having introduced the proposed ICFR approach, which combines user-based prediction and item-based prediction, we conduct a series of experiments to evaluate its performance. These experiments take into account the four elements mentioned above, which are introduced one by one to determine their combined effect. To demonstrate the correctness and generality of the approach, we adopt the Epinions dataset, which is relatively large compared with others available online. It contains 132,000 users who issued 841,372 statements toward other users, including 717,667 trust and 123,705 distrust statements, on a website called Epinions, where users can comment on items, and other users can give feedback on previous reviews and express trust or distrust toward other users based on their reviews. In addition, there are 13,668,319 feedback ratings on 1,560,144 reviews. Consequently, it is a suitable dataset for our experiments. For the sake of objectivity, the experiments are performed three times, on randomly sampled rating records of 30,000, 40,000, and 50,000 users, respectively. In addition, some data are randomly removed from the chosen dataset to serve as the test dataset. The k-NN algorithm is then run on the remaining data, called the training dataset. Finally, the ICFR approach is validated on the test dataset several times.
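The random hold-out described above can be sketched as follows. The test fraction of 0.2, the fixed seed, and the function name are assumptions made for the example; the source does not state how much data is removed.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[int, int, float]  # (user id, item id, rating)


def split_ratings(
    records: Sequence[Record],
    test_fraction: float = 0.2,  # assumed; not specified in the source
    seed: int = 0,
) -> Tuple[List[Record], List[Record]]:
    """Randomly hold out a fraction of rating records as the test set."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must lie in [0, 1]")
    rng = random.Random(seed)  # local generator keeps the split reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Calling the function with different seeds gives the repeated validation runs, with each run predicting the held-out ratings from the training records only.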
In the following experiments, we first make predictions for 100 users and then forecast for more users in later trials to demonstrate the stability and wide validity of our method. The results shown below demonstrate that the ICFR approach is effective and steady.
In the experiments,
Therefore, it is wise and necessary to determine a threshold
Assessments of user-based prediction
User-based prediction means predicting with the user neighborhood. The user neighborhood is determined by the combination of user similarity, global trust, and local trust. To find the best-performing way to predict with it, we first perform a series of experiments to demonstrate that incorporating the two trust factors (i.e. global trust and local trust) with user similarity improves precision. We then find the optimal weight of each factor and show the performance of prediction through the user neighborhood. In addition, the size of neighborhood
First, some preliminary experiments are conducted as a simple check of the claim that combining trust and user similarity makes a difference. In these trials, we set the proportions to 0.45:0.3:0.25, 0.4:0.35:0.25, and 0.375:0.3:0.325. We choose these proportions because we regard the three criteria as roughly equally important to neighborhood selection, so similar weights for each factor are considered appropriate. Compared with user-based prediction using user similarity alone (i.e. the set with a proportion of 1:0:0 in Figure 5), the combination enhances the overall performance. To find a more precise proportion that maximizes this benefit, we conduct more experiments in the following steps.

Proposed ICFR approach’s general performance compared with approach with similarity.
Performance of combining user similarity and global trust
The first experiment is conducted to show that combining user similarity and global trust is an effective way to determine the user neighborhood, which is the basis of prediction. We mix the two elements with a linear combination and make predictions based on their weighted sum.
To find the optimal proportion of each, several sets of values are compared. The results are shown in Figures 6 and 7, where “1:0” means that global trust is not taken into account and the other labels denote the ratio of user similarity to global trust. As depicted in these figures, the ratio 6:4 gives the best results: the MAE decreases from 0.58 to 0.522 and the RMSE decreases from 0.75 to 0.6578. We therefore conclude that combining user similarity and global trust in the proportion 6:4 is more effective than the other settings tested.

The MAE of user-based prediction with various proportion of similarity and global trust.

The RMSE of user-based prediction with various proportion of similarity and global trust.
Performance of combining user similarity and local trust
Second, an experiment is conducted to demonstrate the effectiveness of combining user similarity and local trust. As before, a series of proportions between user similarity and local trust is tested.
The results are depicted in Figures 8 and 9. Aggregating the effects of user similarity and local trust also leads to a better result. The figures show that when the weight of user similarity equals the weight of local trust, the MAE decreases to 0.555 and the RMSE decreases to 0.743, both the lowest values among the settings tested. Consequently, we set the weights of both user similarity and local trust to 0.5 for further study.

The MAE of user-based prediction with various proportion of similarity and local trust.

The RMSE of user-based prediction with various proportion of similarity and local trust.
Performance of incorporating global and local trust with user similarity
The experiments above show that global trust and local trust, each combined with user similarity, enhance prediction performance. We therefore conduct experiments that integrate user similarity, global trust, and local trust all together.
Based on the results of the previous trials, we perform experiments with the proportion of user similarity, global trust, and local trust set to 0.375:0.25:0.375.
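The source does not spell out how this proportion is obtained. One reading that reproduces the numbers is to merge the two pairwise ratios found earlier, 6:4 for similarity to global trust and 1:1 for similarity to local trust, by scaling them to a common similarity share and normalizing. The sketch below checks that arithmetic; the function name is hypothetical and the derivation is an assumption.

```python
from fractions import Fraction
from typing import Tuple


def merge_ratios(
    sim_to_global: Tuple[int, int], sim_to_local: Tuple[int, int]
) -> Tuple[Fraction, Fraction, Fraction]:
    """Merge two pairwise ratios that share the similarity term.

    Returns normalized weights (similarity, global trust, local trust).
    """
    s_g, g = sim_to_global
    s_l, l = sim_to_local
    # Scale both ratios so that similarity has the same magnitude in each.
    sim = Fraction(s_g * s_l)
    glob = Fraction(g * s_l)
    loc = Fraction(l * s_g)
    total = sim + glob + loc
    return sim / total, glob / total, loc / total


weights = merge_ratios((6, 4), (1, 1))
print([float(w) for w in weights])  # [0.375, 0.25, 0.375]
```

Scaling gives 6:4:6, and dividing by the total of 16 gives 0.375:0.25:0.375, which matches the proportion used in this experiment.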
The results are shown in Figures 10 and 11, where “similarity” denotes user-based prediction without any kind of trust and “integrated” denotes prediction that integrates both kinds of trust. The MAE decreases from 0.58 to 0.51, and the RMSE decreases to 0.701, which is better than prediction by user similarity alone but worse than prediction by user similarity and global trust. This result indicates that combining all factors improves the overall performance but introduces some instability, which may be caused by the sparseness of the data, especially the lack of trust rating data. Without enough rating records for a user, it is hard to find many users whom he trusts, which leads to inaccuracy when ICFR makes recommendations according to the shopping records of those users. Moreover, only two rating values are used to denote the degree of trust, which also affects performance because two values cannot express the degree of trust between users precisely. We address this problem as far as possible with the proposals that follow.

The MAE of different kinds of user-based prediction.

The RMSE of different kinds of user-based prediction.
Assessments of combining user-based prediction and item-based prediction
After determining the weight of each factor that influences the selection of a user’s neighbors, we combine the user-based prediction with the item-based prediction, that is, forecasting with the item neighborhood, to optimize the result.
In this experiment, we compare the performance of four different methods. The first is the approach with similarity alone. The second is the result obtained above by integrating user similarity with global and local trust. The third is the prediction by item neighbors alone, which are selected through the item similarity proposed in section “The design of ICFR approach.” However, due to the limitation of rating records, we simply select those whose similarity is not negative instead of setting a specific threshold

The MAE of different kinds of prediction.

The RMSE of different kinds of prediction.
As expected, the MAE decreases to 0.46, the best value so far, and the RMSE decreases from 0.701 to 0.64, which indicates that the ICFR approach performs better than both the user-based and item-based filtering algorithms. To find an even better-performing configuration, we vary the proportion so that each element can contribute its advantage as fully as possible. Besides, as is mentioned above, the value of
Due to the limited data, the number of neighbors is set to 10, 15, and 20 in turn, and five different proportions are chosen for comparison. The results are shown in Figures 14 and 15.

The MAE with different

The RMSE with different
As depicted in Figures 14 and 15, when the neighborhood size is 10 and the proportion of user-based to item-based prediction is about 3:7, both the MAE and the RMSE are lower than in the other settings. Therefore, further experiments are conducted with these parameters.
To confirm the effectiveness of the proposed approach, more users are randomly selected as recommendation targets, and the performance of the ICFR approach is compared with that of the other algorithms.
The data in Figures 16 and 17 show that the item-based prediction makes the worst forecast compared with the others: its MAE is around 0.70 and its RMSE is around 1.40, which is consistent with what is presumed in section “The design of ICFR approach.” The prediction that combines the other two kinds performs best and most stably, with an MAE of around 0.42 and an RMSE of around 0.6. From these two figures, it can be concluded that the proposed ICFR approach is effective and stable.
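For reference, the two metrics used throughout these experiments can be computed as in the sketch below. The rating lists are made-up illustrative values, not data from the experiments; they are chosen so that both predictors have the same MAE while one has a much larger RMSE, which is the sense in which the RMSE is read here as a measure of stability.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equally long rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large individual errors more."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


actual = [4.0, 3.0, 5.0, 2.0]
steady = [3.5, 3.5, 4.5, 2.5]   # every prediction is off by 0.5
erratic = [4.0, 3.0, 5.0, 4.0]  # three exact hits, one miss by 2.0

print(mae(actual, steady), rmse(actual, steady))    # 0.5 0.5
print(mae(actual, erratic), rmse(actual, erratic))  # 0.5 1.0
```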

The MAE with more users to be recommended for.

The RMSE with more users to be recommended for.
Further experiments with 40,000 users
As described in the sections above, we have given a preliminary demonstration of the feasibility of the proposed ICFR approach, which integrates user-based and item-based prediction to make recommendations. To make the evidence more reliable, further experiments are conducted with other records sampled from our dataset.
First, the rating records of another 40,000 users are randomly selected for the experiments. On the basis of previous experiments, we assign
Unlike in the earlier experiments, the results shown in Figures 18 and 19 have MAE values that are very close across all sets, ranging between 0.347 and 0.367, whereas the RMSE differs more noticeably among the sets, descending from 0.796 to 0.605. Therefore, we still think that setting the proportion of user-based to item-based forecast to 3:7 enhances the performance: although its MAE is not the best among these sets, it has an obvious advantage in the RMSE, which denotes a much more stable prediction for users.

The MAE with 400 users to be recommended for on another 40 k dataset.

The RMSE with 400 users to be recommended for on another 40 k dataset.
Then, as before, the number of users for whom ratings are predicted is changed. In this experiment, however, we set up three sets with 400, 500, and 600 users in order to demonstrate the effectiveness with more users.
Figures 20 and 21 show the performance of the three types of prediction for different numbers of users. Generally speaking, the proposed ICFR approach forecasts well as measured by the MAE. In Figure 21, however, the RMSE shows that the item-based prediction is not very stable, which we consider the obstacle that keeps the RMSE of the proposed approach from improving much. More experiments are needed to examine the approach and verify this assumption.

The MAE with more users to be recommended.

The RMSE with more users to be recommended.
Further experiments with 50,000 users
We continue by making predictions for 400 users with the rating records of 50,000 randomly sampled users. In this experiment, our suggested proportion (i.e. 3:7) does well both in the MAE, whose value is 0.42 (Figure 22), and in the RMSE, whose value is 0.58 (Figure 23), which strengthens our confidence.

The MAE with 400 users to be recommended for on another 50 k dataset.

The RMSE with 400 users to be recommended for on another 50 k dataset.
Then, we again form three sets with another randomly selected 400, 500, and 600 users for prediction. The results in Figure 24 indicate that the ICFR approach improves precision compared with both user-based and item-based prediction. However, the RMSE of the ICFR approach shown in Figure 25 is higher than that of user-based prediction, which indicates a lack of stability. The RMSE of item-based prediction reaches 1.59, the maximum value in our series of experiments. Because the ICFR approach combines the two types of prediction linearly, a higher RMSE of item-based prediction is bound to increase the RMSE of our proposed method. As mentioned in section “The design of ICFR approach,” it is sparsity that causes the instability of item-based prediction, and a further analysis of our dataset supports this. As listed before, the dataset contains 132,000 users with 13,668,319 feedbacks on reviews, but there are 1,560,144 reviews available (one review corresponds to one item), which means that an item receives fewer than 10 feedbacks on average. It is therefore hard to find enough users who have given feedback on both of two items, which makes it difficult to select enough high-quality neighbors for an item and ultimately reduces the precision of prediction. This is considered a disadvantage of the ICFR approach.
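The sparsity argument can be checked with simple arithmetic on the dataset figures quoted above. The short sketch below does so; the variable names are ours, and the density value assumes that every user could in principle give feedback on every review, which is an illustrative simplification.

```python
# Dataset figures as quoted in the text.
num_users = 132_000
num_feedbacks = 13_668_319
num_reviews = 1_560_144  # one review corresponds to one item

# Average number of feedbacks received by a single item.
avg_feedback_per_item = num_feedbacks / num_reviews
print(round(avg_feedback_per_item, 2))  # 8.76

# Fraction of the user-by-item matrix that is actually filled
# (assumes any user may rate any review).
density = num_feedbacks / (num_users * num_reviews)
print(f"{density:.6%}")
```

With fewer than nine feedbacks per item on average, two arbitrary items are unlikely to share many raters, which is consistent with the difficulty of finding reliable item neighbors.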

The MAE with more users to be recommended with another test dataset.

The RMSE with more users to be recommended with another test dataset.
Comparison with the state-of-the-art recommendation algorithm
In addition to the user-based and item-based filtering algorithms, there is also a kind of recommendation algorithm based on matrix factorization, for which there are two kinds of optimization algorithms. One is alternating least squares (ALS), which is usually implemented with Spark, a platform designed for large-scale data applications. The other is stochastic gradient descent (SGD), which we consider more appropriate for matrix factorization in our setting because of the limitations of our hardware. Therefore, the SGD algorithm is adopted to make recommendations based on matrix factorization for comparison. Moreover, because of limited computation speed, we reduce the size of the data and predict for 100 users. The performances of our proposed ICFR approach and of the recommendation based on the SGD algorithm are shown in Figure 26.
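To make the baseline concrete, the sketch below shows one common form of matrix factorization trained with SGD, using L2 regularization and no bias terms. It is an assumption about the general technique, not the exact configuration used in the comparison; the hyperparameters, function names, and toy ratings are all hypothetical.

```python
import math
import random


def train_mf_sgd(ratings, n_users, n_items, n_factors=2,
                 lr=0.01, reg=0.02, epochs=300, seed=0):
    """Factorize a sparse rating list [(user, item, rating), ...] with SGD."""
    rng = random.Random(seed)
    # Small positive random initialization of user (P) and item (Q) factors.
    P = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the observed ratings in random order
        for u, i, r in samples:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating is the dot product of user and item factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def training_rmse(ratings, P, Q):
    """RMSE of the model on the ratings it was fitted to."""
    sq = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))


# Toy data: (user index, item index, rating); values are made up.
toy = [(0, 0, 5.0), (0, 1, 3.0), (0, 3, 1.0), (1, 0, 4.0),
       (1, 3, 1.0), (2, 1, 1.0), (2, 2, 5.0), (2, 3, 4.0)]
P, Q = train_mf_sgd(toy, n_users=3, n_items=4)
print(training_rmse(toy, P, Q))
```

Each epoch touches every observed rating once, so the training cost grows with the number of ratings times the number of factors and epochs.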

The MAE and RMSE of ICFR approach and SGD algorithm.
It can be concluded from the figure that our proposed ICFR approach performs better than SGD in MAE and that the two perform similarly in RMSE. We think that recommendation based on matrix factorization with SGD performs poorly because the amount of data is so limited. The ICFR approach, in contrast, also takes the trust records into account, so it can make recommendations based on more information. Moreover, when conducting the experiments with the SGD algorithm, we noticed that it spends a longer time on recommendation, which is why it is usually used offline and run on platforms designed for large-scale data applications. Therefore, our proposed ICFR approach is considered a more suitable recommendation algorithm when data are limited and devices have low computing ability.
Conclusion and future work
User-based and item-based filtering algorithms are both widely applied in recommender systems because they are easy to implement. But as discussed above, both raise concerns that can lead to less accurate results. To address these shortcomings, some works have combined the item-based and user-based algorithms to improve recommendation performance. In our work, the trust-based filtering algorithm is additionally integrated into recommendation. Two aspects of trust in social networks are taken into consideration, the explicit local trust and the global trust, which we can either get directly from our dataset or obtain through calculation. Consequently, when selecting the neighborhood of a specific user, a combined criterion (i.e. similarity, global trust, and explicit local trust) is adopted rather than selection by similarity alone. Finally, an optimized user-based algorithm and an item-based algorithm are combined by a weighted sum to obtain the recommendation. Extensive experiments are conducted in a fixed order to verify the effectiveness of our proposed strategies, and the outcome supports the correctness and feasibility of the ICFR approach. In the future, we would like to perform more experiments on different datasets for further validation. In addition, due to the use of K-NN in user-based algorithm, we figure out and recommend an optimized
Despite the satisfactory performance in most cases, there is still an obstacle. When dealing with a large-scale item set and a relatively small user set, the ICFR approach may lack stability, as discussed above. Therefore, in further research, our team will analyze more potential factors that may influence the result of recommendation, such as how frequently a certain item is bought in a certain period, which reflects how popular the item is. And assigning different size of neighborhood
