Abstract
Introduction
A wireless sensor network (WSN) is a multi-hop, self-organizing wireless communication system formed by deploying a large number of cheap micro-sensor nodes in a monitoring region. Its purpose is to perceive, gather, and process the information detected by the tiny sensors in the coverage area and to send the useful information to observers and control centers.1 Localization is key to many WSN applications, such as network management, environmental monitoring, target tracking, and routing. WSN node localization technology has therefore gained wide attention and become a research hotspot.
Much work has been done in the field of localization in WSNs. WSN node localization schemes can be divided into two major categories. One is centralized localization, such as semi-definite programming2,3 and multidimensional scaling.4,5 The other is distributed localization, such as DV-Hop,6,7 MDS-MAP,8,9 and MDL.10 Distributed localization algorithms have lower computation and communication costs, so they are suitable for large-scale WSNs.
In recent years, many scholars have proposed localization algorithms based on machine learning.11–16 In learning-based approaches, a model is built from training data gathered from the network, and each sensor node can then estimate its own position using the trained model.17 Learning approaches can therefore be used in a distributed manner, and each sensor can be localized independently of other sensor nodes.
Support vector machine (SVM) is a machine learning method that evolved from statistical learning theory. SVM has high generalization ability and good classification precision, and it can solve the overfitting problem effectively. SVM has therefore been applied to a number of fields, such as classification and regression. Several localization algorithms based on SVM have been proposed in Pan et al.,12 Xu et al.,13 Kim et al.,14 Safa,15 Mao et al.,16 and Huan et al.17 But for a large-scale WSN, localization based on SVM faces the problem of large-scale training samples, which lead to a high training computation burden, overfitting, and low classification accuracy. These problems make the localization algorithm impractical. In this article, we improve the SVM training process to reduce training time and computation cost and to achieve higher localization accuracy.
The rest of this article is organized as follows. Section “Localization based on SVM” briefly introduces the theory of SVM and explains the localization algorithm based on SVM. Section “Large-scale training samples reduction method (FCMTSR)” introduces our training sample reduction method for SVM (FCMTSR-SVM), which reduces training time and improves localization accuracy. Section “Simulation of localization algorithm based on SVM-FCMTSR” presents the simulation results of the localization algorithm based on FCMTSR-SVM. Section “Discussion” discusses the benefits and limitations of the proposed algorithm. Finally, the article is concluded in section “Conclusion.”
Localization based on SVM
Basic theory of SVM
SVM is based on the principle of structural risk minimization.18 The idea of SVM is to find an optimal hyperplane in the feature space that separates the two-class data with the largest interval. SVM is a machine learning algorithm, so it must be trained to build the model. SVM can solve both linear and nonlinear classification. If the training samples are nonlinearly separable, SVM maps them into a high-dimensional feature space through a nonlinear mapping function, where the samples become linearly separable and the optimal classification hyperplane is constructed. To avoid the curse of dimensionality in the high-dimensional space, SVM introduces the kernel function.19–21
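As a concrete illustration of the two-class formulation, the following sketch trains a binary SVM with an RBF kernel on toy data using scikit-learn (an assumption on our part; the article's own experiments use LIBSVM, which scikit-learn's SVC wraps internally):

```python
# Minimal sketch of two-class SVM training with an RBF kernel.
# The data, parameters, and variable names here are illustrative.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated Gaussian clouds as toy training samples.
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(2.0, 0.5, (50, 2))])
y = np.array([+1] * 50 + [-1] * 50)

clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(X, y)

# Only a subset of samples become support vectors; the remaining
# samples do not influence the optimal hyperplane.
print(len(clf.support_), "support vectors out of", len(X))
```

The `support_` attribute lists exactly the samples with nonzero Lagrange multipliers, which is the observation the FCMTSR reduction method later exploits.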
According to the idea of SVM, the hyperplane must not only separate the two-class data but also make the classification interval largest. It is assumed that the linearly separable sample set is {(x_i, y_i)}, i = 1, 2, …, n, where x_i ∈ R^d is a feature vector and y_i ∈ {+1, −1} is its class label. The separating hyperplane is

w · x + b = 0    (1)

After the normalization |w · x_i + b| ≥ 1, the classification interval is described in equation (2)

2 / ||w||    (2)

The problem of searching for the optimal classification hyperplane is converted into getting the minimum value of (1/2)||w||^2 by satisfying the conditions

y_i (w · x_i + b) ≥ 1, i = 1, 2, …, n    (3)

Introducing Lagrange multipliers α_i ≥ 0 transforms this into the dual problem

max Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i * x_j), s.t. Σ_i α_i y_i = 0, α_i ≥ 0    (4)

where * is the inner product. The resulting decision function is

f(x) = sgn( Σ_i α_i y_i (x_i * x) + b )    (5)

where the α_i are the optimal Lagrange multipliers and only the support vectors have α_i > 0. For nonlinear classification, the inner product is replaced by a kernel function, and the decision function becomes

f(x) = sgn( Σ_i α_i y_i K(x_i, x) + b )    (6)

where K(x_i, x) is the kernel function.
Localization based on SVM
It is assumed that we randomly deploy a WSN with N nodes {S1, S2, S3, …, SN} in a two-dimensional (2D) geographic area [0, D]^2 (D > 0). The communication range of each node is the same. In the WSN, k nodes are beacon nodes which know their own locations; the other N − k nodes are non-beacon nodes which need to estimate their locations. We divide the 2D area into M × M square cells; the X-dimension and Y-dimension are each divided into M parts. Each cell can be seen as one class pair: the x-coordinate has M = 2^m classes cxi and the y-coordinate has M = 2^m classes cyi, where m is the number of binary SVM classifications required for each dimension. Each node resides in one cell, so each node belongs to a class pair [cxi, cyj]; in other words, each node lies in the unit [(i−1)D/M, iD/M) × [(j−1)D/M, jD/M). If the x-coordinate of a node equals D, the node belongs to the last interval [(M−1)D/M, D]; likewise for a y-coordinate equal to D.
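The cell-assignment rule above can be sketched as follows (the function names are illustrative, and the boundary convention for a coordinate equal to D follows the text):

```python
# Sketch of the cell-class mapping: the [0, D]^2 area is split into
# M x M cells, and a node at (x, y) is assigned the class pair (cx, cy).
def cell_class(x, D, M):
    i = int(x * M / D)       # index of the [iD/M, (i+1)D/M) interval
    return min(i, M - 1)     # a coordinate equal to D falls in the last cell

def node_class(x, y, D=100.0, M=128):
    return cell_class(x, D, M), cell_class(y, D, M)
```

With the simulation settings D = 100 m and M = 128, each cell is about 0.78 m wide and the predicted position is the cell center.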
The connectivity information is gathered by beacon nodes from the WSN and sent to the head node (sink node or base station), where the SVM training algorithm runs. Suppose that node Ni's true coordinates are (xi, yi). The vector Hi = <h(Ni, N1), h(Ni, N2), …, h(Ni, Nk)> is taken as the feature vector used by the SVM to classify node Ni, where h(Ni, Nj) denotes the shortest-path hop count between nodes Ni and Nj. All the beacons' feature vectors {Hi} (i = 1, …, k) compose the training sample set. Using this set, the SVM is trained to build a model that is broadcast to all nodes, whether they are beacons or not. Each non-beacon node is classified into one cell by this SVM model, and the center of that cell is used as its predicted position. If every classification is correct, the maximum localization error of each non-beacon node is half the cell diagonal, √2 D/(2M).
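The hop-count feature vector H_i can be computed from connectivity alone; a minimal sketch, assuming an adjacency-list representation of the network (names and structure are illustrative):

```python
# Sketch of building the SVM feature vector H_i: h(N_i, N_j) is the
# shortest-path hop count, computed here by BFS on the adjacency list.
from collections import deque

def hop_counts(adj, src):
    """BFS shortest-path hop counts from node src to every reachable node."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def feature_vector(adj, node, beacons):
    """H_i = <h(N_i, B_1), ..., h(N_i, B_k)> for the beacon list `beacons`."""
    d = hop_counts(adj, node)
    return [d.get(b, -1) for b in beacons]   # -1 marks an unreachable beacon
```

For example, on the chain 0-1-2-3 with beacons {1, 3}, node 0's feature vector is [1, 3].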
Obviously, the localization algorithm based on SVM described above is a multi-class problem, but SVM can only solve two-class problems. So, how to solve the multi-class problem is the key to the localization algorithm. In this article, a decision tree is used to solve the multi-class problem.
Take the X-dimension for example. The X-dimension is divided into M parts, so the x-coordinate has M − 1 classes cxi {cx1, cx2, …, cxM−1}, where each class cxi contains the nodes with x-coordinate greater than or equal to iD/M. According to the decision-tree strategy, M − 1 binary SVMs need to be trained, and localization of each node needs log2 M binary classifications. For the Y-dimension, the processing method is the same. Figure 1 shows the case where the X-dimension is divided into 16 parts.

Decision tree used for X-dimension classification (M = 16).
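The decision-tree classification of Figure 1 amounts to a binary search over the cell indices, where each query is answered by one trained binary SVM. The following sketch uses an idealized, always-correct classifier in place of each SVM (names are illustrative):

```python
# Sketch of the decision-tree strategy: with M classes per dimension,
# log2(M) binary decisions locate a node's cell index.
def classify_dimension(ask_ge, M):
    """Binary search over cell indices 0..M-1 using log2(M) queries.

    ask_ge(t) stands in for the trained binary SVM that decides whether
    the coordinate is >= t*D/M.
    """
    lo, hi = 0, M - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ask_ge(mid):      # one binary SVM decision
            lo = mid
        else:
            hi = mid - 1
    return lo
```

For M = 16 and D = 100, a node at x = 37.0 is located in cell 5 after exactly 4 binary decisions.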
Large-scale training samples reduction method (FCMTSR)
The SVM has good generalization ability and high classification precision. But for a large-scale WSN, the localization algorithm based on SVM faces the problem of large-scale training samples, which result in slow learning speed and large storage demand. Moreover, if the training data are mingled with outlier data, the trained classifier's accuracy declines. These problems directly hamper the SVM's application in the localization algorithm. So, before training the SVM, we need to preprocess the training sample data to reduce its scale and remove the outlier data. This article proposes a large-scale training sample reduction method based on fuzzy C-means (FCM), called FCMTSR for short.
Issues to be considered
According to the basic theory of SVM described in section “Localization based on SVM,” SVM training is formulated as a quadratic programming (QP) problem; the training time and space complexities are O(n^3) and O(n^2), respectively, where n is the number of training samples. Hence, reducing the size of the training data can improve the training speed and lower the space demand.
From the basic theory of SVM, it is well known that the optimal hyperplane which determines the classification result is constructed by the support vectors. This means the other, non-support-vector samples are useless for training and do not affect the position of the optimal hyperplane. So, removing the non-support-vector samples not only preserves the classification accuracy but also speeds up the training process. Clearly, only those samples near the two-class border are likely to become support vectors; we call these samples potential support vectors. Training performs better when the potential support vectors are used as the training sample set: under the premise of ensuring classification accuracy, the training speed is improved, because this strategy removes only the samples that are useless to SVM learning and keeps the potential support vectors, which have the greatest effect on the optimal hyperplane. When sample aliasing exists between the two types of training data, a wrong optimal hyperplane will be obtained; in this situation, we should prune away the aliased sample data to guarantee accuracy.
The idea of training samples reduction method
Suppose that the training data sets are as shown in Figure 2. The • represents training sample data belonging to class +1, the Δ represents training sample data belonging to class −1, H is the optimal hyperplane, and H1 and H2 are the planes constructed by the respective support vectors. From the above analysis, it is known that the potential support vectors must be near the border of each sample set. How to judge the relationship between a sample point and a sample set is the key to the reduction method.

Training sample set.
Figure 2 shows that the data near the border are closer to the other sample set. So, one simple method of judging border data is the following: if point “S” belongs to class +1 and there exist sample data belonging to class −1 in the neighborhood of “S,” then “S” can be considered near the border. Following this idea, this article proposes the determination rule for border data below, based on point set theory.
In FCMTSR, each training sample is regarded as a point and each training sample set is regarded as a point set. Assume that there is a finite point set A in a space R^d. Based on point set theory, we give the following definitions.
Definition 1
For a point S, if there exists a neighborhood B of S such that B ⊆ A, then S is defined as an inner point of A.
Definition 2
For a point S, if there exists a neighborhood of S that contains no point of A, then S is defined as an outer point of A.
Definition 3
If S is neither an inner point nor an outer point of A (that is, in any neighborhood of S, some points belong to A and others belong to the complementary set of A), then S is defined as an edge point of A.
We regard the training samples of class +1 as point set A1 and the training samples of class −1 as point set A2.
Assume that a point S is in set A1. If S is an outer point for set A2, then S must belong to class +1 and is not at the edge of the two classes; in this situation, S is probably a non-support vector. If S is an inner point for set A2, then S should belong to class −1, so S is outlier data mixed into set A1 rather than a support vector. If S is an edge point for set A2, then S is probably at the edge of the two classes and is most likely a support vector. The type of each point in set A2 can be analyzed in the same way. Thus, to obtain the potential support vectors, we just need to find the edge points of the two classes, and these potential support vectors are then used as the training samples. In this way, the scale of the training samples is reduced and the outlier data are removed.
FCMTSR
To judge the type of each sample point directly, the neighborhood points of each point would need to be obtained by calculating the distance between every pair of points in the high-dimensional space. The distance can be calculated by the Euclidean metric

d(x_i, x_j) = ||x_i − x_j|| = ( Σ_l (x_il − x_jl)^2 )^(1/2)    (7)

where x_il denotes the l-th component of sample x_i. For n samples, this pairwise computation costs O(n^2), which is too expensive for large sample sets; FCMTSR therefore judges point types through clustering and membership degrees instead.
In FCMTSR, the training samples are divided into c subclasses according to spatial similarity through clustering. Taking Figure 2 as an example, after clustering, each class's training sample set is divided into five subclasses: subclasses “+A,” “+B,” “+C,” “+D,” and “+E” belong to class +1, and subclasses “−A,” “−B,” “−C,” “−D,” and “−E” belong to class −1, as shown in Figure 3. The clustering center of each subclass represents all the points belonging to that subclass, and each point belongs to exactly one subclass.

Training sample set after clustering.
Each point's membership degrees of belonging to every subclass are calculated. The membership degree represents spatial similarity and is used to analyze the type of each point. Take points “a” and “b” in Figure 3 as an example: point “a” belongs to subclass “+A” and point “b” belongs to subclass “+E.” The spatial position of “a” is closer to class −1 than that of “b,” so the sum of the membership degrees between “a” and the subclasses of class −1 must be bigger than that of “b.”
Suppose that the sum of membership degrees between any point “i” and the subclasses of class −1 is u_i^−, and that the sum between point “i” and the subclasses of class +1 is u_i^+    (8)

Figure 4 shows the situation where an outlier point exists. In Figure 4, point “m” is an outlier which should belong to class −1 but is wrongly mixed into sample set “+1.” This error affects the training result and causes the classification hyperplane to deviate, so point “m” should be removed before training. According to spatial similarity, formula (9) must hold; that is, the membership-degree difference of outlier data must be negative

u_m^+ − u_m^− < 0    (9)

Outlier point existing in training sample set.
Based on the above theoretical analysis, a new sample set reduction method (FCMTSR) based on spatial similarity analysis and clustering is proposed.
FCM is a clustering algorithm that allows one piece of data to belong to two or more clusters. It provides many advantages, including implementation simplicity, rapid convergence, and high efficiency.22–24
In FCMTSR, the sample set containing N vectors x_i (i = 1, 2, …, N) is divided into c fuzzy subclasses with cluster centers v_j (j = 1, 2, …, c). Let u_ij be the membership degree of sample x_i belonging to subclass j, subject to

Σ_{j=1..c} u_ij = 1, u_ij ∈ [0, 1]    (10)

where the memberships of each sample over all subclasses sum to one. The objective function is

J = Σ_{i=1..N} Σ_{j=1..c} u_ij^m ||x_i − v_j||^2    (11)

where m > 1 is the fuzziness exponent. Minimizing J alternates two update steps until convergence: the membership update u_ij = 1 / Σ_{l=1..c} ( ||x_i − v_j|| / ||x_i − v_l|| )^{2/(m−1)} and the center update v_j = Σ_i u_ij^m x_i / Σ_i u_ij^m.
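A minimal FCM sketch following the standard alternating updates; the fuzziness exponent m = 2, the tolerance, and the variable names are illustrative choices:

```python
import numpy as np

def fcm(X, c, m=2.0, tol=1e-6, max_iter=200, seed=0):
    """Fuzzy C-means: returns memberships U (n x c) and centers V (c x d)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)       # each row sums to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]         # center update
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                         # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)     # membership update
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U, V
```

On well-separated data the centers converge close to the cluster means, and the row sums of U stay equal to one, matching the membership constraint.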
Based on the above FCM method, we can obtain the membership value of every training sample point for the SVM. Then, we decide the relationship of any sample point S in sample set A1 (belonging to class +1) with sample set A2 (belonging to class −1) through the following rules:

If u_S^+ − u_S^− > ε, S is an outer point for A2; it is a non-support vector and is removed.

If u_S^+ − u_S^− < −ε, S is an inner point for A2; it is outlier data and is removed.

If |u_S^+ − u_S^−| ≤ ε, S is an edge point for A2; it is a potential support vector and is kept.

where ε > 0 is a given threshold.
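A sketch of the pruning decision, assuming the rule compares the summed membership degrees u^+ and u^− against a small threshold (the names u_plus, u_minus, and eps, and the threshold value, are illustrative):

```python
# Sketch of the FCMTSR pruning rule: near-equal membership sums mark a
# potential support vector (keep); a strongly larger u_minus marks an
# outlier (drop); a strongly larger u_plus marks a safe non-support
# vector deep inside its own class (drop).
def classify_sample(u_plus, u_minus, eps=0.1):
    diff = u_plus - u_minus
    if diff < -eps:
        return "outlier"            # looks like the opposite class
    if diff > eps:
        return "non-support"        # far from the border
    return "potential-SV"           # near the border: keep for training

def reduce_samples(samples):
    """Keep only potential support vectors from (features, u+, u-) triples."""
    return [s for s, up, um in samples
            if classify_sample(up, um) == "potential-SV"]
```

Only the "potential-SV" samples enter SVM training, which is how the sample scale shrinks and the outliers are removed in one pass.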
Realization of FCMTSR
According to the basic idea of the proposed algorithm above, the FCMTSR-SVM realization steps are as follows:

Step 1. Cluster the training samples of each class into c subclasses with FCM, where n is the total number of samples and c = 2m.

Step 2. For each sample point, compute the sums of membership degrees u^+ and u^− with respect to the subclasses of its own class and of the opposite class.

Step 3. Judge the type of each point by the rules above: remove the non-support vectors and the outlier data, and keep the potential support vectors as the reduced training set.

Step 4. Train the binary SVMs of the decision tree on the reduced training set, whose size is much smaller than n.
Simulation of localization algorithm based on SVM-FCMTSR
We conducted the simulations on a network of 1000 sensor nodes randomly distributed in a 100 m × 100 m 2D area. The beacon nodes are randomly selected from the network. Three different beacon populations were considered: 20% of the WSN size (k = 200 beacon nodes), 25% (k = 250), and 30% (k = 300). The communication range of each node is 10 m. The x-dimension and y-dimension are each divided into 128 parts (M = 128). All simulation experiments were done on a PC with an Intel Core 2 Duo E7500 CPU and 2 GB memory. The simulation platform is MATLAB 7.0, and the LIBSVM software is used for SVM classification. The kernel of the SVM is the radial basis function (RBF). FCMTSR proposed in section “Large-scale training samples reduction method (FCMTSR)” is used to preprocess the training sample set before training the SVM. The parameter
Effect of beacon population
We randomly selected three simulation results, shown in Figures 5–7, for 1000 nodes with beacon populations of 20%, 25%, and 30% in the square-shaped network. The communication range is 10 m. The * represents the positions of beacon nodes, the Δ represents the real positions of non-beacon nodes, and the ○ represents the estimated positions of non-beacon nodes.

Estimation result of 20% beacon nodes in the square-shaped network.

Estimation result of 25% beacon nodes in the square-shaped network.

Estimation result of 30% beacon nodes in the square-shaped network.
The localization error is 2.6 m at a beacon population of 20%, 1.9 m at 25%, and 1.2 m at 30% in these three simulations. The localization error E is described as follows

E = (1 / (N − k)) Σ_{i=1..N−k} sqrt( (x_i − x̂_i)^2 + (y_i − ŷ_i)^2 )

where (x_i, y_i) is the true position of non-beacon node i and (x̂_i, ŷ_i) is its estimated position.
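The error metric can be sketched directly (the function name is illustrative):

```python
# Average Euclidean distance between the true and estimated positions
# of the non-beacon nodes.
import math

def localization_error(true_pos, est_pos):
    assert len(true_pos) == len(est_pos)
    total = sum(math.dist(t, e) for t, e in zip(true_pos, est_pos))
    return total / len(true_pos)
```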
The above results show that the localization error decreases as the number of beacon nodes increases. To make the results reliable, every simulation was run 10 times and the results were averaged. The resulting localization errors are shown in Table 1.
Localization error of different beacon population in the square-shaped network.
Table 1 shows that increasing the number of beacon nodes decreases the localization error. The accuracy of learning methods depends on the amount of training data. Since beacon nodes provide the training data, more beacon nodes mean more training data, so the model can be trained more accurately and, in the localization phase, produces more accurate results.
FCMTSR-SVM versus SVM with different beacon population
We compared the localization algorithm based on SVM with and without the FCMTSR preprocessing proposed in section “Large-scale training samples reduction method (FCMTSR)” at different beacon populations. The results are shown in Figure 8. The localization accuracy of FCMTSR-SVM is higher than that of SVM: since the mixed-in outlier data are removed by FCMTSR, the generalization of the SVM is improved and the positions of the nodes are computed more accurately. From Figure 8, we can see that the average localization error of FCMTSR-SVM is about 2% lower than that of SVM.

Average localization error comparison of FCMTSR-SVM versus SVM.
Figure 9 shows the simulation results for the training time consumption of FCMTSR-SVM versus SVM. When the beacon population is 20%, 25%, and 30%, the FCMTSR-SVM training time is about 21.14, 39.44, and 47.33 s, while the SVM training time is about 38.91, 71.68, and 86.05 s, respectively. The simulation results show that the localization accuracy of FCMTSR-SVM is improved by 2% and the total training time is reduced to about 55% of that of SVM.

Average training time comparison of FCMTSR-SVM versus SVM.
Border problem
The purpose of this simulation is to study the performance of the FCMTSR-SVM localization algorithm with respect to the border problem. The border problem exists in many localization algorithms: sensor nodes near the edge of the deployment area obtain much lower localization accuracy than those near the center. We investigate this problem for two beacon populations (20% and 30%) and select the 100 nodes closest to the edge. Figures 10 and 11 show the simulation results. The localization error of the 100 border nodes lies between the minimum and maximum errors of the whole network. The results illustrate that in the FCMTSR-SVM localization algorithm, the localization accuracy of sensor nodes near the edge is very close to that of nodes near the center, so the proposed algorithm is free of the border problem. This property can be explained as follows: FCMTSR-SVM localization depends only on the beacon nodes and is independent of the other nodes, and the proposed FCMTSR does not affect the support vectors provided by the beacon nodes. Therefore, the position of a sensor node does not have a substantial effect on its localization accuracy.

Estimation result of 100 border nodes with beacon population 20%.

Estimation result of 100 border nodes with beacon population 30%.
Coverage hole problem
The purpose of these simulations is to evaluate the performance of FCMTSR-SVM in networks with coverage holes. The simulations are composed of two groups: one in the C-shape network area and the other in the network with two holes. In the first group of simulations, 1000 sensor nodes are deployed randomly in the 100 m × 100 m C-shape network area. The communication range is 10 m. The node distribution is shown in Figure 12.

Node distribution in C-shape network.
Table 2 shows the average localization error of FCMTSR-SVM in this C-shape network with beacon populations of 20%, 25%, and 30%. Comparing Table 1 with Table 2 shows that the average localization errors are consistent.
Localization error of different beacon population in the C-shape network.
FCMTSR-SVM was compared with SVM in this network with beacon populations of 20%, 25%, and 30%. The simulation results for average localization error and training time are shown in Figures 13 and 14, respectively. As these two figures show, FCMTSR-SVM achieves higher accuracy and less training time than SVM in the C-shape network area.

Average localization error comparison of FCMTSR-SVM versus SVM in C-shape network.

Average training time comparison of FCMTSR-SVM versus SVM in C-shape network.
In the second group of simulations, 1000 sensor nodes are deployed randomly in a 100 m × 100 m square-shaped network area with two holes. The communication range is 10 m. The radius of the left hole is 10 m and the radius of the right hole is 15 m. The node distribution is shown in Figure 15.

Node distribution in the network with two holes.
FCMTSR-SVM was compared with SVM in this network with beacon population 20%, 25%, and 30%. Figures 16 and 17 show the results of average localization error and training time, respectively. Similarly, the results suggest that in the network with two holes, FCMTSR-SVM has higher accuracy and less training time than SVM.

Average localization error comparison of FCMTSR-SVM versus SVM in the network with two holes.

Average training time comparison of FCMTSR-SVM versus SVM in the network with two holes.
From the above simulation results, we can see that the localization accuracy of FCMTSR-SVM may be a little lower in networks with holes, but the existence of coverage holes does not have a big effect on FCMTSR-SVM. Moreover, FCMTSR-SVM performs better than SVM in networks with holes. For example, when the beacon population is 30% in the network with two holes, the average localization error of SVM is 1.5 m and its average training time is 86.55 s, while the average localization error of FCMTSR-SVM is 1.2 m and its average training time is 49.22 s. Therefore, FCMTSR-SVM performs well for both the border problem and the coverage hole problem. The reason is that localization based on SVM relies on hop counts rather than geographic distances, so network holes do not have a significant effect on localization accuracy. This also shows that FCMTSR can reduce the training sample scale while keeping the support vectors effectively in networks with holes.
Discussion
Because the power and memory of a sensor node are limited, an effective localization algorithm must consider storage, calculation, and communication capacities. In the localization algorithm based on FCMTSR-SVM, each node needs 2 × (M − 1) × (4k + 4) + k memory units, where k is the number of beacons and M is the number of parts the network is divided into. Each of the k hop-count values can be represented by 1 byte. Each SVM needs the
In terms of computation cost, each node needs to compute the decision function 2 log2 M times. Hence, for k = 50 and M = 128, this results in not more than 14 × 50 = 700 multiplications. At the base station, when the SVM is trained without FCMTSR (sample size n), the iterative process uses the Hessian matrix many times, and the kernel function needs O(n^2) calculations. FCMTSR itself only needs O(iter × c × n) Euclidean distance calculations for the FCM clustering process, where c is the number of clusters and iter is the maximum number of iterations, with c << n and iter << 100. After reduction, it needs
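A quick back-of-the-envelope check of the per-node computation cost (2 log2 M decision-function evaluations, about k multiplications each):

```python
import math

M, k = 128, 50                      # cells per dimension, number of beacons
decisions = 2 * int(math.log2(M))   # 2 * log2(M) decision functions per node
multiplications = decisions * k     # ~k multiplications per decision function
print(decisions, multiplications)   # 14 decisions, 700 multiplications
```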
In terms of communication cost, to calculate the hop counts, each node except the base station (or sink node) broadcasts a message of less than k bytes; that is, each node transmits less than k memory units to its neighbor nodes in the initial phase. To train the SVM, each beacon node transmits k bytes of hop counts to the base station. After building the SVM model, the base station broadcasts 2(M − 1) groups of SVM parameters
However, the proposed localization algorithm has its own limitations. Like other SVM-based localization algorithms, the quality of the training process directly affects the localization accuracy. The training data are provided by beacon nodes, so the number of beacon nodes affects accuracy: the accuracy decreases as the beacon nodes are reduced. The FCMTSR-SVM localization algorithm is aimed at large-scale WSNs; for small-scale WSNs, FCMTSR is not very effective.
Conclusion
In this article, we propose a distributed localization algorithm for large-scale WSNs based on FCMTSR-SVM. The proposed algorithm transforms the location estimation problem into a multi-class problem, and binary SVM classification is used to solve it, so it requires neither pairwise distance measurements nor special assisting devices. The SVM classification accuracy is the key to the localization accuracy, and the training process is the most important factor influencing SVM performance. Large-scale training samples lead to a high training computation burden, overfitting, and low classification accuracy, so this article proposes a new reduction method based on FCM (FCMTSR). Using fuzzy clustering, the potential support vectors are obtained and the mixed-in non-boundary outlier data are removed; thus, the training time is reduced and the localization accuracy is improved. We then conducted simulations to examine the localization performance. The simulation results show that the localization algorithm based on FCMTSR-SVM has lower training time cost and higher accuracy than the localization algorithm based on SVM, and that the border problem and the coverage hole problem, which many other localization algorithms have not overcome, do not affect its performance. The communication and processing overheads are kept small, so the algorithm is suitable for practical application in large-scale WSNs.
