Sage Journals

Abstract

Building on research into social networks and the Internet of Things, a new research topic in the Internet of Things field, the Social Internet of Things (SIoT), has gradually formed. The SIoT applies research results on social networks to different aspects of the Internet of Things and solves specific problems in Internet of Things research, which brings new opportunities for its development. With the development of Internet of Things technology, user information in the spatial SIoT structure includes both sensitive and non-sensitive attributes, and public user information can be used to infer the information of private users, even their sensitive attributes. This article proposes an information inference method based on the core users of spatial social networks, which estimates non-core user information from the public information of core users. First, the user's spatial social network is divided into communities, the core nodes of each community are calculated with the PageRank algorithm, and the convergence of the algorithm is proved. Then, the public information of the core nodes of each community is used to infer the private information of users related to these core nodes. Next, we experimentally analyze the community structures of two SIoT networks (Twitter and Sina Weibo) and of an ER random network and an NW small-world network, anonymize 5%, 10%, 15%, and 20% of the information in each of the four networks, and analyze their clustering coefficient, Q-modularity, and other properties. Finally, the key node information of the four spatial social structures is inferred to evaluate the effectiveness of the proposed method. Compared with a non-core inference method, this method has advantages in the completeness of the inferred information and in running time.

Keywords

Internet of Things; Social Internet of Things; sensitive attributes speculation; community partitioning; PageRank algorithm

Introduction

With the continuous development of the Internet, more and more devices and everyday objects can be networked and communicate with their owners in real time, which laid the foundation for the Social Internet of Things (SIoT). The SIoT applies the characteristics and principles of social networks to the Internet of Things, creates social interaction between objects, and uses attributes of social networks to solve practical problems in the Internet of Things. In other words, the SIoT combines social networks with the Internet of Things: it uses Internet of Things technologies, including a perception layer built on radio frequency identification (RFID) and sensors, and forms a network of objects based on the topology of social networks at the application layer. Combining the Internet of Things with social networks also brings in a new set of technologies, which make it possible to integrate item data from the perception layer and to network items and data in the form of a social network.

With the rapid development of SIoT, an increasing number of users record their lives and communicate with friends through it. To communicate better and obtain information of interest, users can add friends in the SIoT by searching for education, interests, and other information. Because such users share information and interests, they are more likely to befriend one another. In SIoT, the network formed by a user's friends and the relationships among those friends is called the network of relationships. Within it, the user's friends can be divided into groups according to closeness: members of the same group are closer to each other than to members of other groups, and these groups are the communities of the network of relationships.¹ Studies have shown that users of current online SIoT pay little attention to personal privacy; some show only limited concern, and the risk of their privacy being inferred is underestimated. To understand how much attention users pay to their privacy and how personal privacy is leaked in SIoT, researchers have studied user behavior, cognition, policy, and law.^2,3 Among these studies, a survey of Carnegie Mellon University students on the well-known social networking site Facebook showed that 80% of users used personal photos as their avatars, while only 2% had turned on the privacy settings for their accounts. Other surveys indicate that people's views about their privacy depend on their own environment and on whether they share characteristics with the groups they communicate with. In this sense, if users believe that they and the group they communicate with have common attributes (such as hobbies or interests) or certain characteristics, these are exactly the points that make them of interest to the people around them, with a large effect on their privacy.

Related work

Reasoning attack (inference attack)^4–6 studies the indirect disclosure of personal information through social relations: when users choose to hide their private information, that information is inferred from information related to them. For example, if a user's check-in location is hidden, an attacker can still guess the user's location from the user's public friend information. Studies^5,6 use Bayesian networks to perform this kind of inference and examine the factors that affect its accuracy; understanding which private information can plausibly be inferred helps in protecting privacy. Another study⁴ shows that the online groups users join can also be used to infer hidden sensitive attributes, because the member lists of these groups are public. Zheleva and Getoor⁷ inferred users' private information from their friends and their groups and compared different attack models, concluding that results obtained from group relationships are more accurate. Their experiments assumed that the information of 50% of users' friends is public, whereas the experimental data used in this article are entirely real SIoT data. In another study,⁸ the authors used a modified naive Bayes method to infer other user information, such as political views, from users' personal information and their relationships with friends, and compared the results of using personal information only, friend relationships only, and both combined. Dey, Tang, and colleagues⁹ inferred users' ages and other information by applying different methods to information disclosed at different levels.
These methods use (1) the school years disclosed by users themselves; (2) the ages of friends in the friend lists disclosed by users; and (3) for users who disclose neither their school years nor their friend lists, retrieving the users' friends and then applying method (2). The authors further propose an iterative algorithm that uses not only the information of a user's friends but also that of friends of friends, up to three layers of friendship, to infer the user's age. Dey et al.⁹ also show that, in a community classification structure, information published on third-party platforms through sensitive link relationships can reveal private community information. In other applications,¹⁰ users' private medical data are protected through multiple layers of encryption control. Other work^11–14 uses community classification to carry out related research.

Although these studies use the public information and friends disclosed by users to infer users' private information, they all treat users' friend networks as a whole. Unlike them, this paper infers users' information from the information disclosed at critical nodes. In SIoT, a key node is the central point of contact among the members of a community and carries a large amount of information. If the key node information of a community is disclosed, the information of other nodes can be inferred from it. If it is not disclosed, the key node information can first be inferred from secondary key nodes or from the nodes connected to the key node, and the inferred key node information can then be used to infer the private information of other non-public nodes. The schematic diagram of the SIoT-based structural model proposed in this article is shown in Figure 1.

Figure 1.

Structural model diagram of SIoT.

The main work of this article is as follows: (1) defining critical nodes and proving the convergence of the algorithm used to find them; (2) defining the community division; (3) inferring users' other private information from the critical node information already disclosed; and (4) inferring the private information of undisclosed critical nodes. All four tasks are illustrated with the inference of private location information. The first step is to determine the key nodes, which requires proving the convergence of the algorithm that identifies them. The second step divides the network into communities and identifies the key nodes within each community; community partitioning reduces the amount of computation and makes key node identification more efficient. The third step uses the identified key nodes to infer the non-critical nodes, which is comparatively easy. The fourth step infers key nodes from non-critical nodes: a key node's information is inferred from a relatively large number of non-critical nodes.

SIoT and related proof

SIoT structure

Building on the three-layer architecture of the Internet of Things, we extend its application layer and propose a hierarchical architecture model for the Social Internet of Things, as shown in Figure 2. Perception layer: the perception layer acquires data from the physical world and the human world (including video, identifiers, various physical quantities, and audio). It is deployed in the environment, where wireless sensors and other sensing devices upload the collected data to the network layer over Bluetooth, infrared, industrial fieldbus, and other transmission methods. This layer includes human-machine interfaces, object interfaces, and service application programming interfaces (APIs), through which people, machines, and services sense one another and obtain sensed data in a timely manner.

Figure 2.

SIoT structural model.

The network layer is mainly responsible for transmitting information over the communication network. Its functions include object analysis, owner control, service discovery, integrity management, identification (ID), service composition, and the like.

The base layer manages the storage of data and related descriptors, recording the description information and social relationships of nodes and the activity information of objects in the real and virtual worlds. It includes metadata, ontologies, and semantics, which provide the basic and logical relationships for related services.

The component layer includes functions such as relationship management, service discovery, service composition, and trust management, and combines related network services such as cellular networks and WLAN.

The application layer supports object-based social behavior, enabling interactive applications that target people, objects, and third-party services.

SIoT key node definitions

The ideal core node of an SIoT is a node connected to all other nodes in the network; such a node is the most important core node. For example, the central node of a star network is obviously the most important “core node” of that network. Core nodes are an important guarantee of the stability of the whole network: if a core node has a security problem, the entire network is insecure, and inferring the private information of core nodes allows the private information of the whole network to be inferred. In SIoT, however, the adjacency matrix is sparse: there are few connections between communities and a large amount of information exchange within each community, as shown in Figure 3.

Figure 3.

SIoT key node model.

Definition 1

A node is a critical node of the SIoT if and only if it is a critical node of its community. Let $P_{S}$ denote the set of critical nodes of the SIoT, $P_{i}^{K}$ the set of critical nodes of community K, and $P_{i}$ a node; then

$P_{i} \in P_{i}^{K} \Leftrightarrow P_{i} \in P_{S}$ (1)

where K is the index of the community and i is the index of the node. Usually, each community contains at least one critical node.

PageRank algorithm based on key nodes

Eigenvector centrality and its variants are widely applied; the best-known example is the PageRank algorithm,¹⁵ the core algorithm of the Google search engine. Initially, each node (web page) is given the same PR value, and the values are then updated iteratively. At each step, every node distributes its current PR value equally among all the nodes it points to, and the new PR value of each node is the sum of the PR values it receives. Thus, the PR value of $P_{i}$ at time t is

$P R_{i} (t) = \sum_{j = 1}^{n} a_{ij} \frac{P R_{j} (t - 1)}{k_{j}^{out}}$ (2)

where $k_{j}^{out}$ is the out-degree of node $P_{j}$ and $a_{ij}$ is the $(i, j)$ entry of the adjacency matrix of the SIoT ($a_{ij} = 1$ if $P_{j}$ links to $P_{i}$, and 0 otherwise). The iteration continues until the PR value of every node is stable.
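The iteration of formula (2), with the stopping rule of formula (3), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is given as a hypothetical `out_links` dictionary mapping each node $P_{j}$ to the nodes it points to, every node is assumed to have at least one out-link, and, as in formula (2), no damping factor is used.

```python
def pagerank(out_links, eps=1e-8, max_iter=1000):
    """Iterate formula (2) until formula (3) holds for every node.

    out_links: dict mapping each node j to the list of nodes it points to
    (the nodes i with a_ij = 1). Every node is assumed to have at least one
    out-link, so k_j^out >= 1. No damping factor is used, as in formula (2).
    Returns (PR values, number of iterations performed).
    """
    nodes = list(out_links)
    pr = {v: 1.0 / len(nodes) for v in nodes}      # PR(1) = [1/n, ..., 1/n]
    for t in range(1, max_iter + 1):
        new = {v: 0.0 for v in nodes}
        for j, targets in out_links.items():
            share = pr[j] / len(targets)           # PR_j(t-1) / k_j^out
            for i in targets:
                new[i] += share                    # summed into PR_i(t)
        if max(abs(new[v] - pr[v]) for v in nodes) < eps:   # formula (3)
            return new, t
        pr = new
    return pr, max_iter


# Undirected toy graph: triangle 0-1-2 plus node 3 attached to node 2.
pr, iterations = pagerank({0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]})
print(max(pr, key=pr.get), iterations)   # node 2 has the largest PR value
```

For an undirected network, each edge is listed in both directions; provided the network is connected and not bipartite, the PR values converge to each node's degree divided by twice the number of edges, so higher-degree nodes rank higher.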

PageRank algorithm convergence proof

Because PageRank has been in use for a long time, it is known that stable PR values are usually obtained after a number of iterations. The question is how many iterations are needed to reach the precision required by PageRank. With ε as the termination threshold, the algorithm stops when

$| PR_{i}(t) - PR_{i}(t - 1) | < ε$ (3)

In theory, ε is set by the user: the smaller ε is, the higher the required precision, and as $t \to \infty$ the difference in formula (3) tends to 0. The PageRank literature illustrates that the iterative process converges but does not explain this with a mathematical proof, so we give the following proof.

Definition 2

If the entries of every column of a nonnegative matrix sum to 1 (that is, the matrix is column-stochastic), the matrix is called a convergence matrix.

Proof: let $PR(t) = [PR_{1}(t), PR_{2}(t), \ldots, PR_{n}(t)]^{T}$ and

$M = \begin{pmatrix} \frac{a_{11}}{k_{1}^{out}} & \frac{a_{12}}{k_{2}^{out}} & \cdots & \frac{a_{1n}}{k_{n}^{out}} \\ \frac{a_{21}}{k_{1}^{out}} & \frac{a_{22}}{k_{2}^{out}} & \cdots & \frac{a_{2n}}{k_{n}^{out}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{a_{n1}}{k_{1}^{out}} & \frac{a_{n2}}{k_{2}^{out}} & \cdots & \frac{a_{nn}}{k_{n}^{out}} \end{pmatrix}, \qquad PR(t) = M \cdot PR(t - 1) = M^{t - 1} \cdot PR(1)$ (4)

where $\sum_{i = 1}^{n} PR_{i}(t) = 1$ for every t, because each column of M sums to 1; proving formula (3) therefore reduces to proving that $M^{t}$ converges. The initial value PR(1) is the weight assigned to each node; without any preference, all nodes can be given the same weight, $PR(1) = [1/n, 1/n, \ldots, 1/n]^{T}$.

Let $M = \begin{pmatrix} m_{11} & m_{12} & \cdots & m_{1n} \\ m_{21} & m_{22} & \cdots & m_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ m_{n1} & m_{n2} & \cdots & m_{nn} \end{pmatrix}$ and, assuming that M is diagonalizable, let $\Lambda = \mathrm{diag}(\lambda_{1}, \lambda_{2}, \ldots, \lambda_{n})$ be the diagonal matrix of its eigenvalues. Then

$M = P \Lambda P^{-1} \Rightarrow M^{t} = P \Lambda^{t} P^{-1}$ (5)

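By the Gershgorin circle theorem, applied to the columns of M, every eigenvalue satisfies $\lambda_{i} \in \bigcup_{k = 1}^{n} D_{k}(M)$, where $D_{k}(M)$ is the disk centered at $m_{kk}$ with radius $R_{k} = \sum_{j \neq k} | m_{jk} |$. In the matrix M, $\sum_{j = 1}^{n} m_{jk} = 1$ and $m_{kk} = 0$ (no node links to itself), so every disk is centered at 0 with radius 1, and the n disks of M are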

$D_{i}(M) = \{ \lambda : | \lambda | \leq 1 \}, \quad i = 1, 2, \ldots, n$ (6)

Hence all n eigenvalues of M lie in the union of these n disks in the complex plane, which here all coincide with the unit disk, as shown in Figure 4.

Figure 4.

N eigenvalues and disk set.

In Figure 4, the eigenvalues $\lambda_{i}$ lie within the blue disk, so every eigenvalue of M satisfies $| \lambda_{i} | \leq 1$. By equation (5),

$M^{t} = P \begin{pmatrix} \lambda_{1}^{t} & 0 & \cdots & 0 \\ 0 & \lambda_{2}^{t} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{n}^{t} \end{pmatrix} P^{-1}$

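As $t \to \infty$, $\lambda_{i}^{t} \to 0$ for every eigenvalue with $| \lambda_{i} | < 1$, and $\lambda_{i}^{t} = 1$ for $\lambda_{i} = 1$ (1 is always an eigenvalue, because the columns of M sum to 1). An eigenvalue on the unit circle other than 1, such as $\lambda_{i} = -1$, has no limit, so the argument additionally requires that 1 be the only eigenvalue of modulus 1; by the Perron-Frobenius theorem this holds, with 1 a simple eigenvalue, when the network is strongly connected and aperiodic (for an undirected network: connected and not bipartite). Ordering the eigenvalues so that $\lambda_{1} = 1$, we then have

$\lim_{t \to \infty} M^{t} = P \begin{pmatrix} \lim_{t \to \infty} \lambda_{1}^{t} & 0 & \cdots & 0 \\ 0 & \lim_{t \to \infty} \lambda_{2}^{t} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lim_{t \to \infty} \lambda_{n}^{t} \end{pmatrix} P^{-1} = P \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} P^{-1}$

so $\lim_{t \to \infty} M^{t}$ exists, that is, M is a convergence matrix, and PR(t) converges to the eigenvector of M for eigenvalue 1, normalized so that its entries sum to 1.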

In this article, whether the PageRank algorithm can be applied is determined by proving that $\lim_{t \to \infty} M^{t}$ exists, that is, that M is a convergence matrix. In practice, the PR values are obtained by iterating the algorithm; if the matrix converges, the difference in formula (3) eventually satisfies the stopping condition.
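The role of eigenvalues on the unit circle can be checked numerically with a small sketch in plain Python on hypothetical toy graphs. It applies $PR(t) = M \cdot PR(t-1)$ for a fixed number of steps and reports the last change $\max_{i} | PR_{i}(t) - PR_{i}(t-1) |$, the left side of formula (3). On a three-node star, which is bipartite, M has the eigenvalue -1 and the PR vector keeps oscillating, so formula (3) is never satisfied for small ε; a graph containing a triangle has no such eigenvalue and the iteration converges.

```python
def iterate(out_links, steps):
    """Return [PR(1), PR(2), ...] for PR(t) = M * PR(t-1), starting from 1/n."""
    nodes = sorted(out_links)
    pr = {v: 1.0 / len(nodes) for v in nodes}
    history = [pr]
    for _ in range(steps):
        new = {v: 0.0 for v in nodes}
        for j, targets in out_links.items():
            for i in targets:
                new[i] += pr[j] / len(targets)   # a_ij * PR_j(t-1) / k_j^out
        pr = new
        history.append(pr)
    return history


def last_change(history):
    """max_i |PR_i(t) - PR_i(t-1)| for the final step (left side of formula (3))."""
    prev, cur = history[-2], history[-1]
    return max(abs(cur[v] - prev[v]) for v in cur)


# Star with edges 0-1 and 0-2: bipartite, so M has eigenvalue -1.
star = {0: [1, 2], 1: [0], 2: [0]}
# Triangle 0-1-2 plus pendant node 3: connected and not bipartite.
triangle_tail = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

print(last_change(iterate(star, 200)))           # stays at 1/3: PR oscillates
print(last_change(iterate(triangle_tail, 200)))  # essentially 0: PR converges
```

In practice, PageRank implementations usually add a damping factor to guarantee convergence on any graph; formula (2) has none, which is why the connectivity condition matters here.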

SIoT community-based division of user privacy leak

SIoT community structure definition

A community structure is a group of nodes whose internal connections are dense, while connections between different clusters are relatively sparse. It is defined as follows.¹⁶ Let G be the graph of the network and i a node of G; the degree of node i is

$k_{i} = Σ_{j} M_{ij}$ (7)

Let S be a subgraph of G containing node i, so that the node set and edge set of S are subsets of the node set and edge set of G. The degree of node i can then be divided into two parts

$K_{i} (S) = K_{i}^{in} (S) + K_{i}^{out} (S)$ (8)

If every node i in S has more connections to nodes inside S than to nodes outside S, that is, $K_{i}^{in}(S) > K_{i}^{out}(S)$, the subgraph is regarded as a strong community structure; otherwise, it is a weak community structure.
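A minimal sketch of this classification, assuming an undirected graph stored as a hypothetical `adj` dictionary of neighbor sets and applying the strong-community condition of formula (8) to every node of S:

```python
def community_strength(adj, members):
    """Classify the subgraph S = members as 'strong' or 'weak' via formula (8).

    adj: dict node -> set of neighbours (undirected graph).
    S is strong when every node i in S has K_i^in(S) > K_i^out(S).
    """
    s = set(members)
    for i in s:
        k_in = sum(1 for j in adj[i] if j in s)   # K_i^in(S): links inside S
        k_out = len(adj[i]) - k_in                # K_i^out(S): links leaving S
        if k_in <= k_out:
            return "weak"
    return "strong"


# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(community_strength(adj, [0, 1, 2]), community_strength(adj, [0, 1, 2, 3]))
```

Adding node 3 to the first triangle makes the subgraph weak, because node 3 then has one internal link and two external links.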

Suppose that the attribute feature vector of community $S_{i}$ is $f_{i} = (i_{1}, i_{2}, \ldots, i_{n})$, where $i_{k}$, $k \in \{1, 2, \ldots, n\}$, is the probability that the community attribute belongs to the k-th type. Then the attribute similarity $f_{ij}$ between communities $S_{i}$ and $S_{j}$ is

$f_{ij} = \cos(f_{i}, f_{j}) = \frac{f_{i} \cdot f_{j}}{| f_{i} | \, | f_{j} |} = \frac{\sum_{k = 1}^{n} i_{k} j_{k}}{\sqrt{\sum_{k = 1}^{n} i_{k}^{2}} \sqrt{\sum_{k = 1}^{n} j_{k}^{2}}}$ (9)
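Formula (9) is the cosine similarity of the two attribute vectors. A direct transcription in Python, with the hypothetical convention that an all-zero vector has similarity 0:

```python
import math


def attribute_similarity(f_i, f_j):
    """Cosine similarity f_ij between community attribute vectors (formula (9))."""
    if len(f_i) != len(f_j):
        raise ValueError("attribute vectors must have the same length")
    dot = sum(a * b for a, b in zip(f_i, f_j))      # sum_k i_k * j_k
    norm_i = math.sqrt(sum(a * a for a in f_i))
    norm_j = math.sqrt(sum(b * b for b in f_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0                                  # assumed convention
    return dot / (norm_i * norm_j)


print(attribute_similarity([0.6, 0.4], [0.3, 0.7]))
```

Because the entries are probabilities and therefore nonnegative, $f_{ij}$ lies between 0 (no shared attribute types) and 1 (identical attribute profiles up to scale).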

Modularity is a measure proposed by Newman specifically for community structure. It quantifies the difference between the community structure of the network and that of a random network, and is expressed¹⁷ as follows

$Q = \frac{1}{2n} \sum_{ij} \left( m_{ij} - \frac{K_{i} K_{j}}{2n} \right) \delta(C_{i}, C_{j})$ (10)

where $K_{i}$ and $K_{j}$ are the degrees of nodes i and j, $C_{i}$ is the community to which node i belongs, and n is the total number of connections in the network. $m_{ij} = 1$ if nodes i and j are connected, and $m_{ij} = 0$ otherwise; $\delta(C_{i}, C_{j}) = 1$ if $C_{i} = C_{j}$, and 0 otherwise. Modularity is at most 1 (it lies in the range [-1/2, 1]); a value of 0 indicates that the network has no more community structure than a random network, and the larger Q is, the stronger the community structure. Community classification based on formula (9) usually yields relatively strong and relatively weak communities; particularly weak communities contain few nodes and can be ignored. The number of communities depends on the number of nodes in the social network, and large social networks can be divided into many communities.
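Formula (10) can be evaluated without building the full $m_{ij}$ matrix: summed over ordered pairs, the $m_{ij}$ terms count each internal edge twice, and the $K_{i} K_{j}$ terms reduce to the squared total degree of each community. A sketch, assuming an undirected graph without self-loops given as a hypothetical edge list:

```python
def modularity(edges, community):
    """Q of formula (10) for an undirected, unweighted graph without self-loops.

    edges: list of (i, j) pairs, each connection listed once.
    community: dict node -> community label C_i.
    """
    n = len(edges)                                  # total number of connections
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    # sum_ij m_ij * delta: each internal edge appears as (i, j) and (j, i).
    internal = sum(2 for i, j in edges if community[i] == community[j])
    # sum_ij K_i * K_j * delta = sum over communities of (total degree)^2.
    totals = {}
    for v, k in degree.items():
        totals[community[v]] = totals.get(community[v], 0) + k
    expected = sum(t * t for t in totals.values()) / (2 * n)
    return (internal - expected) / (2 * n)


edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
print(modularity(edges, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}))
```

For these two triangles joined by one edge, splitting the graph into its two triangles gives Q = 5/14 ≈ 0.357, while placing all nodes in one community gives Q = 0.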

The Q value is a criterion for whether a community structure is strong or weak, and it affects the community structure. A large Q value indicates a strong community structure with many key nodes; the more key nodes are disclosed, the easier it is to infer the other non-public nodes in the community. A small Q value indicates a weak community structure with few key nodes, which makes it difficult to infer non-public private information. The Q value therefore directly reflects the strength of community relationships and the amount of information that can be inferred.

After many years of development, many ideas and algorithms for community discovery have emerged. The main representative algorithms are spectral clustering,^18,19 hierarchical clustering,²⁰ label propagation,^21,22 and modularity optimization.^23,24 This article uses the modularity-based community discovery algorithm proposed by VD Blondel et al.²⁵ (the Louvain method). The algorithm first treats each node in the network as its own community and then repeatedly moves nodes to the community that maximizes modularity, until modularity no longer increases or only one node remains. According to the literature,^26,27 SIoT networks have the same community characteristics.
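The node-moving idea described above can be sketched as follows. This is a deliberately simplified, hypothetical version of the first phase of the Louvain method, not Blondel et al.'s implementation: it recomputes Q from scratch for every candidate move instead of using the incremental modularity gain, and it omits the community aggregation phase, so it is only suitable for small illustrative graphs.

```python
def modularity(edges, community):
    """Q of formula (10) for an undirected graph without self-loops."""
    n = len(edges)
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    internal = sum(2 for i, j in edges if community[i] == community[j])
    totals = {}
    for v, k in degree.items():
        totals[community[v]] = totals.get(community[v], 0) + k
    return (internal - sum(t * t for t in totals.values()) / (2 * n)) / (2 * n)


def local_moving(edges):
    """Simplified Louvain phase 1: move single nodes while Q strictly increases."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    community = {v: v for v in adj}          # every node starts alone
    best_q = modularity(edges, community)
    improved = True
    while improved:
        improved = False
        for v in sorted(adj):
            original = community[v]
            best_c = original
            # Only the communities of v's neighbours are candidate targets.
            for c in {community[u] for u in adj[v]} - {original}:
                community[v] = c             # tentatively move v into c
                q = modularity(edges, community)
                if q > best_q + 1e-12:
                    best_q, best_c = q, c
            community[v] = best_c            # keep the best move (or none)
            improved = improved or best_c != original
    return community, best_q


edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
print(local_moving(edges))
```

Each accepted move strictly increases Q and there are finitely many partitions, so the loop terminates; on the two joined triangles it recovers the two triangles as communities.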

Users who share information are more likely to become friends and to form strong connections, and therefore communities. Conversely, in SIoT, users can be divided into different communities according to how closely they interact. Users within a community share common information, such as educational experience and hobbies.

User privacy information speculations

In SIoT, many users disclose their personal information.²⁸ By counting how often each disclosed value appears among the users of a community, one can infer the information shared within the community and thus the private information of users who have not disclosed it.
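A minimal sketch of this frequency-based inference, assuming that the most frequent disclosed value in a community is assigned to every member who has hidden that attribute. This is one simple reading of the statistical approach described above, and `public_attr` is a hypothetical mapping from users to their disclosed values.

```python
from collections import Counter


def infer_by_frequency(community, public_attr):
    """Infer a hidden attribute inside one community by majority vote.

    community: iterable of user ids in C_i.
    public_attr: dict user -> disclosed value (absent users have hidden it).
    Returns dict hidden_user -> inferred value (None if nobody disclosed).
    """
    members = list(community)
    disclosed = [public_attr[u] for u in members if u in public_attr]
    guess = Counter(disclosed).most_common(1)[0][0] if disclosed else None
    return {u: guess for u in members if u not in public_attr}


print(infer_by_frequency(["a", "b", "c", "d"], {"a": "X", "b": "X", "c": "Y"}))
```

The vote is taken per community rather than over the whole friend network, which reflects the premise that members of a community share common information.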

For a user u in the SIoT, the friend relationship network formed by u's friends is $G_{u} = (V_{u}, E_{u})$, where $V_{u} = N_{u}$ is the set of nodes adjacent to u and $E_{u} = \{ (i, j) \mid i, j \in V_{u} \}$ is the set of relationships among u's friends. A community discovery algorithm divides $G_{u}$ into communities $C_{i}$ $(i = 1, 2, \ldots, N)$, where N is the total number of communities. For each community $C_{i}$, $Pub_{i}$ denotes the set of users in $C_{i}$ who have made their information public, $P_{i}$ the set of core nodes of $C_{i}$, and $K_{i}$ the set of core nodes of $C_{i}$ whose information is public. We then compute

$T_{i} = \frac{| Pub_{i} |}{| C_{i} |} \quad \text{and} \quad T_{i}^{k} = \frac{| K_{i} |}{| P_{i} |}$ (11)

where $| C_{i} |$ is the total number of users in the community, $| Pub_{i} |$ the number of users in the community who disclose their information, $| P_{i} |$ the number of core nodes in the community, and $| K_{i} |$ the number of public core nodes in the community. To ensure the effectiveness of the experiment, $T_{i}$ and $T_{i}^{k}$ should satisfy

$T_{i} ⩾ θ \quad \text{and} \quad T_{i}^{k} ⩾ θ'$ (12)

where θ and θ' are thresholds set in the experiment, and θ' is the key parameter in this paper. Two situations are discussed. If $T_{i}^{k}$ is very small, inference relies on the public information counted by $T_{i}$, as shown in Figures 5-7. In Figure 5, the information of the key node is hidden but the information of the nodes connected to it is public, so the key node information can be inferred. In Figure 6, the key node information is hidden and some of the nodes connected to the key node are hidden as well. There are two cases: (1) if the number of hidden nodes is limited, the key node information can be inferred directly; (2) if the number of hidden nodes is large, the key node cannot be inferred directly. In Figure 6, A1, A3, and A4 are non-public nodes, and A1 is the key node. The public information of A5 and A6 is therefore used to infer the information of key node A1. If the community structure is strong, the inference of A1 is highly reliable, and the information of users A3 and A4 is then inferred from A1. If the community structure is weak, this inference is less reliable and only part of the information can be inferred: as shown in Figure 7, the public information of the surrounding nodes is used to infer the key node information. In Figure 7, A1, A3, and A4 are non-public nodes and are also key nodes. A1 is inferred from the many public nodes A2, A5, and A6; A4 is inferred from A8, A9, and A10; finally, A3 is inferred from A1, A7, and A8.
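The two-stage inference of Figures 5 and 6 (first the hidden key node from its disclosed neighbors, then its hidden neighbors from the key node) can be sketched as follows. Majority voting is an assumed inference rule chosen for illustration, since the text does not specify how individual values are combined; the node names in the example are hypothetical.

```python
from collections import Counter


def infer_key_then_neighbours(adj, key, public_attr):
    """Two-stage inference around one key node (cf. Figures 5 and 6).

    Stage 1: if the key node is hidden, infer it from its disclosed neighbours.
    Stage 2: give the key node's value to its still-hidden neighbours.
    adj: dict node -> set of neighbours; public_attr: dict node -> value.
    """
    known = dict(public_attr)
    if key not in known:
        votes = Counter(known[u] for u in adj[key] if u in known)
        if not votes:
            return known                 # nothing disclosed around the key node
        known[key] = votes.most_common(1)[0][0]
    for u in adj[key]:
        if u not in known:
            known[u] = known[key]
    return known


adj = {"A1": {"A2", "A3", "A4", "A5", "A6"}}
print(infer_key_then_neighbours(adj, "A1", {"A2": "X", "A5": "X", "A6": "Y"}))
```

If the key node is already public (the situation of Figure 8), stage 1 is skipped and its value is propagated directly; how reliable either stage is depends on the community strength discussed above.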

Figure 5.

Key node information speculation.

Figure 6.

Key node and other nodes hidden.

Figure 7.

Speculation with other nodes hidden.

If $T_{i}^{k}$ is very large, most key node information is disclosed, as shown in Figures 8 and 9. In Figure 8, the disclosed information of the key node and its connected nodes can be used to infer the information of the undisclosed node A3. In Figure 9, many of the nodes connected to the key node are hidden, so the other hidden nodes cannot be inferred directly from the key node and its connected nodes; in this case, inference relies on the public information counted by $T_{i}$.

Figure 8.

Speculating other nodes from key nodes.

Figure 9.

Public key nodes and node speculation.

In this paper, a large $T_{i}$ value can greatly increase the prediction accuracy for critical nodes. When both $T_{i}$ and $T_{i}^{k}$ are very large, the system has high prediction accuracy. When $T_{i}$ is very large but $T_{i}^{k}$ is very small, the key node information can be inferred through the nodes connected to the key node, which increases $T_{i}^{k}$ and thus the prediction accuracy of the system. When $T_{i}$ is very small and $T_{i}^{k}$ is very large, the connected nodes can be inferred from the key node, which increases $T_{i}$ and thus the prediction accuracy. When both $T_{i}$ and $T_{i}^{k}$ are very small, the system cannot guarantee high prediction accuracy.

The algorithm in this paper is as follows:

Step 1. Use a community discovery algorithm with modularity as its measure to partition the network into communities;

Step 2. Use the PageRank algorithm to determine the critical nodes of each community;

Step 3. Calculate $T_{i}$ and $T_{i}^{k}$ with thresholds $θ = 0.4$ and $θ' = 0.5$; if $T_{i}$ is large, $θ'$ can be set somewhat smaller;

Step 4. If $T_{i} ⩾ 0.4$ and $T_{i}^{k} ⩾ 0.5$,

Infer the information of the private connected nodes through the key nodes and then infer the other key nodes, or first infer all key nodes and then infer the other nodes;

If $T_{i} ⩾ 0.4$ and $T_{i}^{k} < 0.5$,

Infer the private information of the key nodes from their connected nodes, then infer the other connected nodes;

If $T_{i} < 0.4$ and $T_{i}^{k} ⩾ 0.5$,

Infer the private information of the connected nodes through the key nodes, then infer the other critical nodes.
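Steps 3 and 4 can be sketched as follows: formula (11) gives $T_{i}$ and $T_{i}^{k}$ for one community, and the four combinations of large and small values discussed earlier in this section select the inference order. The function names and returned labels are hypothetical, and the default thresholds are the values θ = 0.4 and θ' = 0.5 from Step 3.

```python
def disclosure_ratios(community, core_nodes, public_users):
    """T_i = |Pub_i| / |C_i| and T_i^k = |K_i| / |P_i| (formula (11))."""
    c = set(community)
    p = set(core_nodes) & c            # P_i: core nodes of C_i
    pub = c & set(public_users)        # Pub_i: users of C_i who disclose
    k = p & pub                        # K_i: core nodes that disclose
    t = len(pub) / len(c)
    tk = len(k) / len(p) if p else 0.0
    return t, tk


def choose_strategy(t, tk, theta=0.4, theta_k=0.5):
    """Inference order for one community, following the four (T_i, T_i^k) cases."""
    if t >= theta and tk >= theta_k:
        return "key nodes -> connected private nodes -> other key nodes"
    if t >= theta:                     # many public users, few public key nodes
        return "connected nodes -> key nodes -> other connected nodes"
    if tk >= theta_k:                  # few public users, many public key nodes
        return "key nodes -> connected nodes -> other key nodes"
    return "no reliable inference"     # both ratios small


t, tk = disclosure_ratios(range(10), [0, 1], [0, 1, 2, 3])
print(t, tk, choose_strategy(t, tk))
```

Both ratios are computed per community, so a network can mix strategies: each community is handled according to its own disclosure levels.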

Experimental analyses

We select the popular online SIoT platforms Twitter and Sina Weibo as the two real network topologies in this article; the related data are all from Internet resources. In addition, we use the corresponding generation algorithms to produce two simulated networks: an ER random network and an NW small-world network. We assume that the edges of each network are undirected and unweighted. The topological characteristics of each network are shown in Table 1.

Table 1.

Basic structure.

Network	Nodes	Edges	Node avg.	Communities	Coefficient
ER network	5,000	25,348	5.06	421	0.0089
NW network	5,000	35,094	7.02	217	0.5063
Twitter	145,942	203,152	1.392	1,035	0.0014
Sina Weibo	146,091	205,408	1.546	1,234	0.0123

ER: Erdős-Rényi; NW: Newman and Watts.
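The two simulated networks can be generated with the standard Erdős-Rényi and Newman-Watts models. The sketch below uses Python's `random` module; the function names are hypothetical, and the parameter values (n, p, k) in the example are placeholders, since the generation parameters behind Table 1 are not given here. In the Newman-Watts model, random shortcuts are added to a ring lattice without removing any lattice edge.

```python
import random


def er_network(n, p, seed=None):
    """Erdős-Rényi G(n, p): each of the n(n-1)/2 node pairs is linked with probability p."""
    rng = random.Random(seed)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}


def nw_network(n, k, p, seed=None):
    """Newman-Watts small world: a ring lattice where every node links to its
    k nearest neighbours on each side, plus random shortcuts; no edge is removed."""
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for d in range(1, k + 1):
            edges.add(tuple(sorted((i, (i + d) % n))))     # lattice edges
    for i in range(n):
        for _ in range(k):
            if rng.random() < p:                           # add a shortcut
                j = rng.randrange(n)
                if j != i:
                    edges.add(tuple(sorted((i, j))))
    return edges


ring_plus = nw_network(100, 2, 0.1, seed=42)
print(len(er_network(100, 0.05, seed=42)), len(ring_plus))
```

The pairwise loop in `er_network` is quadratic in n, which is acceptable for a sketch but slow for networks with thousands of nodes.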

Social structure analysis

In this article, 5%, 10%, 15%, and 20% of the attribute values in each of the four data sets above are randomly selected as the anonymous attribute set. Depending on the data publisher's anonymity requirements, the anonymous attribute set can include sensitive attribute values, irrelevant attribute values, and other highly identifying attribute values. The experiments examine the changes in social structure and attribute distribution of the data sets before and after anonymization, and the experimental analysis indicates that the algorithm performs well. The experiments also examine how the error rate of attribute inference changes with the node's social degree. Comparing the data before and after anonymization shows that anonymization greatly reduces the attacker's ability to determine target attributes through the social structure, so users' private attributes can be protected.
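Randomly selecting a given proportion of attribute values as the anonymous attribute set can be sketched as follows. The sketch assumes a hypothetical `records` layout (user, then attribute, then value) and simply deletes the selected values, since the text does not specify how anonymized values are represented.

```python
import random


def anonymize(records, fraction, seed=None):
    """Hide a random fraction of all (user, attribute) values.

    records: dict user -> dict attribute -> value (left unchanged).
    Returns (copy with the chosen values removed, list of hidden (user, attribute)).
    """
    rng = random.Random(seed)
    cells = [(u, a) for u, attrs in records.items() for a in attrs]
    hidden = rng.sample(cells, round(fraction * len(cells)))
    out = {u: dict(attrs) for u, attrs in records.items()}
    for u, a in hidden:
        del out[u][a]
    return out, hidden


records = {u: {a: f"{u}-{a}" for a in "abcde"} for u in range(20)}
for fraction in (0.05, 0.10, 0.15, 0.20):
    _, hidden = anonymize(records, fraction, seed=7)
    print(fraction, len(hidden))
```

Sampling over (user, attribute) cells rather than over users means that a single user may lose several attributes while another loses none.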

The ER random network, NW small-world network, Twitter, and Sina Weibo data sets were analyzed for the degree of change in users' SIoT structure before and after anonymization. Figures 10-13 show the clustering coefficient and Q-modularity of the users' social structure before and after anonymization (note: the integer part of the X-axis coordinate indicates the proportion of the anonymous attribute set, 5%, 10%, 15%, or 20%; the fractional part indicates the attribute correlation threshold, 0.3, 0.5, or 0.7; and the "original" item is the initial state of the data set).
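The clustering coefficient can be computed, for example, as the average local clustering coefficient. The sketch below assumes that definition (the text does not state whether the average local or the global coefficient is used) and an undirected graph stored as neighbor sets with comparable node labels.

```python
def average_clustering(adj):
    """Mean local clustering coefficient of an undirected graph.

    adj: dict node -> set of neighbours. Nodes with degree < 2 contribute 0.
    """
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count each link between two neighbours of v exactly once (u < w).
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


tri_tail = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_clustering(tri_tail))
```

Computing this value on the graph before and after anonymization gives one of the two structural indicators compared in Figures 10-13.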

Figure 10.

Analysis of SIoT structure before and after ER network anonymity.

Figure 11.

Analysis of SIoT structure before and after NW Network anonymity.

Figure 12.

Analysis of SIoT structure before and after Twitter anonymity.

Figure 13.

Analysis of SIoT structure before and after Sina anonymity.

Figures 10-13 analyze how the proportion of the anonymous attribute set and the choice of attribute correlation threshold affect the usability of the anonymized results for the ER random network, NW small-world network, Twitter, and Sina Weibo data sets, respectively. All four figures show that as the proportion of the anonymous attribute set selected by the user increases, the usability of the data gradually decreases; for the same proportion, however, choosing different threshold parameters has no significant effect on the anonymized result. This indicates that attribute correlation is pronounced locally and that different thresholds cause little disturbance to the social structure partition and the attribute partition.

Figures 14-17 show the node degree distribution of each data set before and after anonymization.

Figure 14.

Analysis of SIoT degrees before and after ER network anonymity.

Figure 15.

Analysis of SIoT degrees before and after NW Network anonymity.

Figure 16.

Analysis of SIoT degrees before and after Twitter anonymity.

Figure 17.

Analysis of SIoT degrees before and after Sina anonymity.

The algorithm preserves the degree distribution characteristics well. Figures 14-17 correspond to the ER random network, NW small-world network, Twitter, and Sina Weibo data sets. For the first two data sets, the trend of the node degree distribution is almost unchanged before and after anonymization. For the latter two data sets, the initial part of the degree distribution shows an upward trend, but the overall trend of the degree distribution remains consistent. When a node is split, the algorithm may lose part of the social connections of the newly generated nodes, and when many nodes are split in a local area, the social connections of nodes with more attributes increase. Therefore, the anonymized result set contains more low-degree and high-degree nodes than the original data sets. Nevertheless, the figures show that the anonymized results still faithfully reflect the degree distribution trends of the data sets.
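Comparing degree distributions before and after anonymization only requires the empirical distribution P(k), the fraction of nodes with degree k. A minimal sketch for an undirected graph stored as neighbor sets (a hypothetical layout):

```python
from collections import Counter


def degree_distribution(adj):
    """P(k): fraction of nodes with degree k, for an undirected graph."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


tri_tail = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(degree_distribution(tri_tail))
```

Evaluating this function on the original and anonymized graphs yields the two curves compared in each of Figures 14-17.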

Community analysis in spatial SIoT

The method proposed in this article speculates private information through the core nodes of communities. The change in the number of communities therefore reflects how the whole SIoT changes before and after anonymization. Figure 18 shows how the number of communities changes under different anonymization settings. (Note: on the X-axis, the integer part indicates the proportion of the anonymous attribute set, namely 5%, 10%, 15% and 20%. The fractional part indicates the attribute correlation threshold, namely 0.3, 0.5 and 0.7, respectively. The "original" item denotes the initial state of the data set.)
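The abstract states that the core node of each community is computed with PageRank. The sketch below illustrates that step under several assumptions: the graph is undirected and stored as adjacency sets, the community labels are already known, and the damping factor is the conventional 0.85, since the article gives no value here. It runs power-iteration PageRank and takes the highest-ranked node of each community as its core node. `pagerank`, `core_nodes` and the toy graph are hypothetical.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on an undirected graph given as {node: set of neighbours}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # rank held by nodes without neighbours is spread uniformly over all nodes
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for u in adj[v]:
                    new[u] += share
        # since damping < 1, the L1 change shrinks by at least that factor per iteration
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


def core_nodes(adj, community_of):
    """Return {community: node with the highest PageRank in that community}."""
    rank = pagerank(adj)
    best = {}
    for v, c in community_of.items():
        if c not in best or rank[v] > rank[best[c]]:
            best[c] = v
    return best


# Toy graph: hub "a" with three leaves (community A), hub "b" with two leaves (community B).
edges = [("a", "a1"), ("a", "a2"), ("a", "a3"), ("a", "b"), ("b", "b1"), ("b", "b2")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
community_of = {"a": "A", "a1": "A", "a2": "A", "a3": "A", "b": "B", "b1": "B", "b2": "B"}
print(core_nodes(adj, community_of))  # {'A': 'a', 'B': 'b'}
```

Because the damping factor is below 1, each iteration contracts the L1 error. This is the standard argument behind PageRank convergence, which the abstract says the article proves.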

Figure 18.

Number of communities before and after anonymity in the four spatial SIoT networks.

As Figure 18 shows, the number of communities in all four networks decreases after anonymization. The ER random network and NW small-world network data sets contain relatively few communities, and their number of communities decreases as the proportion of the anonymous attribute set increases. The Twitter and Sina Weibo data sets contain relatively many nodes and communities, and the decrease is more pronounced. In these two data sets as well, the number of communities decreases correspondingly as the anonymous attribute set proportion and the attribute correlation threshold increase.
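This section does not name the community detection algorithm. The sketch below therefore uses a stand-in, which is an assumption and not necessarily the authors' choice. It counts communities with asynchronous label propagation in the style of Raghavan et al. It then scores the partition with Newman-Girvan modularity Q, the measure the abstract mentions. Running it on a graph before and after anonymization would yield community counts of the kind plotted in Figure 18. The function names and the toy graph are hypothetical.

```python
import random
from collections import Counter


def label_propagation(adj, seed=0, max_iter=100):
    """Asynchronous label propagation on {node: set of neighbours}; returns {node: label}.

    Node ids are used as initial labels, so they must be mutually comparable (e.g. all ints).
    """
    rng = random.Random(seed)
    label = {v: v for v in adj}  # every node starts in its own community
    nodes = list(adj)

    def top_labels(v):
        # the most frequent labels among v's neighbours, sorted for reproducibility
        counts = Counter(label[u] for u in adj[v])
        best = max(counts.values())
        return sorted(lab for lab, c in counts.items() if c == best)

    for _ in range(max_iter):
        rng.shuffle(nodes)
        for v in nodes:
            if adj[v]:  # isolated nodes keep their own label
                label[v] = rng.choice(top_labels(v))
        # stop once every node already carries one of its neighbours' most frequent labels
        if all(label[v] in top_labels(v) for v in nodes if adj[v]):
            break
    return label


def modularity(adj, label):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    if two_m == 0:
        return 0.0
    inside = Counter()  # twice the number of edges inside each community
    degree = Counter()  # total degree of each community
    for v, nbrs in adj.items():
        degree[label[v]] += len(nbrs)
        inside[label[v]] += sum(1 for u in nbrs if label[u] == label[v])
    return sum(inside[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


# Toy graph: two disjoint triangles, so two communities are expected.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = label_propagation(adj, seed=1)
print(len(set(labels.values())), modularity(adj, labels))
```

Label propagation is randomized, so the sketch seeds its generator to make runs reproducible. Modularity, by contrast, is deterministic for a given partition.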

Comparison of privacy information speculation using key nodes

To speculate network privacy attributes, this article selects the core nodes in the four networks, anonymizes 5%, 10%, 15% and 20% of the attributes, and averages the results for comparative analysis. Key nodes and non-key nodes are each used to speculate user privacy in the four networks. The resulting privacy node discovery rates are shown in Figures 19–22.
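The text does not define the privacy node discovery rate precisely. The sketch below rests on two stated assumptions. First, a private node's hidden attribute is guessed as the public attribute of its community's core node. Second, the discovery rate is the fraction of hidden nodes guessed correctly. Mirroring the procedure above, it hides 5%, 10%, 15% and 20% of the non-core nodes and averages the four rates. A non-key baseline would replace the core node with some other public node of the same community. All names and the toy data are hypothetical.

```python
import random


def split_public_private(attr, fraction, protected=(), seed=0):
    """Hide the attribute of a random `fraction` of the unprotected nodes (at least one).

    Returns (public, truth): the visible attributes and the hidden ground truth.
    """
    rng = random.Random(seed)
    candidates = sorted(v for v in attr if v not in protected)
    k = max(1, round(fraction * len(candidates)))
    hidden = set(rng.sample(candidates, k))
    public = {v: a for v, a in attr.items() if v not in hidden}
    truth = {v: attr[v] for v in hidden}
    return public, truth


def infer_from_core_nodes(community_of, core_of, public_attr, private_nodes):
    """Guess each private node's attribute as the public attribute of its community's core node.

    Nodes whose community has no core node with a public value get no guess.
    """
    guesses = {}
    for v in private_nodes:
        core = core_of.get(community_of[v])
        if core in public_attr:
            guesses[v] = public_attr[core]
    return guesses


def discovery_rate(guesses, truth):
    """Fraction of hidden nodes whose attribute was guessed correctly (assumed definition)."""
    if not truth:
        return 0.0
    hits = sum(1 for v, value in truth.items() if guesses.get(v) == value)
    return hits / len(truth)


# Toy data: two communities whose members share a home region; core nodes are 1 and 6.
attr = {v: ("Chongqing" if v <= 5 else "Hunan") for v in range(1, 11)}
community_of = {v: ("A" if v <= 5 else "B") for v in attr}
core_of = {"A": 1, "B": 6}

rates = []
for fraction in (0.05, 0.10, 0.15, 0.20):
    public, truth = split_public_private(attr, fraction, protected=set(core_of.values()), seed=42)
    guesses = infer_from_core_nodes(community_of, core_of, public, truth)
    rates.append(discovery_rate(guesses, truth))
mean_rate = sum(rates) / len(rates)
print(rates, mean_rate)
```

In this toy data every community is attribute-homogeneous, so every rate is 1.0. In real networks, the rate depends on how strongly members resemble their core node.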

Figure 19.

Relations between ER random network and privacy node discovery rate.

Figure 20.

Relations between NW small world network and privacy node discovery rate.

Figure 21.

Relations between Twitter and privacy node discovery rate.

Figure 22.

Relations between Weibo and privacy node discovery rate.

Comparing the four networks shows that the number of discovered privacy nodes increases with network size. When the network is small, privacy node discovery takes relatively little time. When the network is large, discovery takes considerably longer. As the network size and its community structure evolve, the privacy nodes discovered within the same time also differ.

Conclusion

In this article, we propose a method for inferring private user information based on SIoT key nodes. The method infers the information of other privacy nodes through the key nodes of an SIoT community. The inference relies on the public information of the community and the number of key nodes, and is relatively simple. In the latter part of the work, the role of key nodes in information dissemination is identified by learning the complete social information. Similarity rules between the information of non-key nodes and key nodes are then derived, and an inference rule data set is obtained. From these rules, the inference of non-key nodes from key nodes in SIoT is deduced. Finally, four network models are used to compare inference from key nodes with inference from non-key nodes; inferring private nodes from key nodes shows clear advantages. Future work will focus on the characteristics of different data sets and on analyzing unknown information from known information. For social data sets with the same characteristics and structure, this kind of speculation makes unknown private information easier to analyze.

Footnotes

Handling Editor: Leo Zhang

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (grant nos KJQN201802101, KJZD-K201802101); the Open Fund of Chongqing Key Laboratory of Spatial Data Mining and Big Data Integration for Ecology and Environment; National Natural Science Foundation of China (grant no. 71473074); the Planning Project of Hubei Province Educational Science (grant no. 2018GB066); Natural Science Foundation of Hunan Province, China (grant no. 2019JJ40097); the Research Foundation of Hunan University of Science and Engineering, China (grant no. 17XKY068); and the construct program of applied characteristic discipline in Hunan University of Science and Engineering.

ORCID iD

Tangsen Huang
