Abstract
Introduction
Today, the Internet of Things (IoT) has received growing attention and has been investigated in widespread fields such as mobile communication, wireless network, and intelligent devices. One of the focuses of IoT application is location-based service (LBS). Currently, global positioning system (GPS)1,2 is largely available and widely applied in outdoor location. However, the multipath effect is a downside of GPS technology that limits its usage in the indoor environment. Therefore, indoor localization techniques that are independent from satellite and station signals have been extensively studied, such as wireless local area network (WLAN), radio frequency identification (RFID), Bluetooth, and infrared technology. Among the aforementioned localization technologies, WLAN is the most promising approach because of its low cost, non-line-of-sight propagation, and ubiquitous coverage. 3
Indoor localization technologies can be classified into two categories: those based on distance and time, and those based on the characteristics analysis of localization area. 4 The former includes angle of arrival (AOA), 5 times of arrival (TOA), 6 and time difference of arrival (TDOA), 7 which requires multiple stations for signals measurement. Herein, TOA and TDOA need the hardware that can achieve high accuracy of time synchronization between access points (APs) and reference points. 8 On the contrary, the latter, such as RFID, 9 requires the deployment of additional tags to be used in the object location estimation based on received signal strength indication (RSSI). For WLAN, it not only has a low cost, but also achieves a high accuracy in the object location estimation.
Currently, one of the most esteemed solutions for WLAN-based indoor localization is the fingerprint technology. 10 In offline stage, “fingerprints,” stored in the database, is obtained by the signal strength (SS) of APs. In online stage, object locations are calculated by comparing the current fingerprint with the previous one. Traditional fingerprint localization technology requires that all the fingerprint information be compared and calculated, which would result in the high complexity and low real time of the system.11,12 Virtual Reference Elimination (VIRE) 13 requires only a small amount of fingerprint information to create fingerprint database through D-value and sparse reconstruction. However, it could be easily influenced by the difference of heterogeneous devices, leading to the larger errors.
These traditional WLAN-based indoor localization technologies demand the calculation of a large amount of fingerprint vectors. To mitigate the complexity, a four-phase localization technology has been proposed to calculate the target location. During the offline stage, localization area is divided into different subareas utilizing fuzzy C-means (FCM) algorithm. Then, binary clustering for APs is calculated via k-means++ algorithm. After that, the useful APs in subareas are selected. During the online stage, target subarea is selected utilizing nearest neighbor (NN) algorithm based on useful APs, after which the target location is calculated by relative distance fuzzy localization (RDFL) algorithms proposed in this article.
This article is organized as follows: In section “Related works,” some basic algorithms are introduced and the key factors affecting the localization accuracy are analyzed. Then, a novel model of indoor location, used to reduce the computation complexity of fingerprint during the online stage, is shown in section “Methodology.” In order to evaluate the performance of the proposed model, several experiments have been conducted, which is shown in section “Experimental result.” Section “Conclusion” concludes the article.
Related works
Although terminal localization technology based on cellular mobile communication network and assisted global positioning system (A-GPS) has been used in wide-ranging applications, they fail to achieve a high accuracy in indoor localization. Here demonstrates some technologies used in indoor localization, such as optical tracking, 14 ZigBee, 15 infrared,4,16 Bluetooth, 17 WLAN, 18 and RFID. 19 The requirement of optical tracking technology is visible between a tracking target and a detector. Although the ZigBee technology has high localization accuracy, it requires extensive hardware support and time–space cost. Infrared only applies to areas without obstacles because the infrared signals of short-range transmission cannot traverse them. Active Badge20,21 is an example of infrared technology-based indoor localization system. Bluetooth is easy to integrate, yet it is unstable and susceptible to complicated environment. RFID exploits received signal strength (RSS) to calculated locations, but it fails to make full use of the existing WLAN infrastructure in localization area.
WLAN-based localization technology can be divided into two categories: propagation based and training based. The former requires strict time synchronization and high sensitivity of the hardware to obtain arrival time information. Compared to the latter, those localization techniques under the first category only obtain low localization accuracy and are generally more time–space cost-effective.
Training-based localization technology includes two models: propagation and location fingerprint. The former utilizes an empirical model to describe the relationship between SS and distance. 22 However, the impact of indoor noises and the complex relationship between SS and distance lead to lower accuracy in the former model. Therefore, the primary research direction of WLAN localization technology is on the latter, which is the foremost research content of this article. Fingerprint-based localization model creates location fingerprint database in offline stage using the stored location and the mapping relation of signal value and then calculates the location of the object point by comparing the fingerprint information received by an AP via signal matching algorithm in online stage.
A typical fingerprint-based indoor localization system is radio detection and ranging (RADAR), which uses the received SS by APs to match the aggregated signal layout and calculates the location by NN algorithm. Chen
23
has conducted intensive research on the NN algorithm and proposed that the location in WLAN environment could be estimated by robust pattern match and vector matching. Roos et al.
24
proposed a fingerprint localization technology based on probability, which utilized kernel function to fit probability density function (PDF) of SS in offline and matched bias functions to calculate location in online. Youssef and Agrawala
25
use location-clustering techniques to reduce the computational requirements of the localization algorithm. M Zhou et al.
26
proposed a novel information-based approach to derive the fundamental limit of localization and explore the relationship between the localization precision and AP placement, with the use of Fisher information matrix (FIM). Chen and Wang
27
proposed a novel indoor subarea localization scheme based on fingerprint passive crowdsourcing and unsupervised clustering, which first classifies unlabeled RSS measurements into several clusters and then relates to indoor subareas to generate subarea fingerprints. Wang et al.
28
applied the surface fitting technique to construct SS spatial distribution functions and proposed two location search methods to find the target location. Liu et al.
29
obtained accurate acoustic ranging estimates among peer phones and then mapped their locations jointly against Wi-Fi signature map subjecting to ranging constraints. As shown in Figure 1, the fingerprint vector is

Model of fingerprint localization.
In localization area, APs’ location and SS generate a radio map which we named it “fingerprint map (FM)” and then calculate location through Euclidean distance between the FM and the real-time SS in location stage. In order to gain high accuracy for the FM, a large number of samples are required to be collected during the offline stage. The complexity of fingerprint location comprises two parts: one is the large amount of fingerprint vectors for samples and the other is the higher dimension fingerprint vector for APs.30–33 Collecting this amount of fingerprint data can be increasingly more difficult, as the considered area becomes larger and more complicated. The current fingerprint technology has failed to estimate exactly how many APs are affected and where to place them in a given workspace.
A large amount of higher dimension fingerprint vectors may take up vast space–time resources, rather than improving the location accuracy. During the experiment, SS showed a large attenuation when the wireless signal passed through obstacles such as wall and table. As shown in Figure 2, SS is stable in a certain area, but it has noteworthy differences in disparate areas.

Change of SS indoor.
In response to above problems, k-nearest neighbor (KNN)-based two-step FCM weight (KTFW)
34
algorithm has been proposed, and
Above algorithms reduce the complexity of calculation and provide an available localization area to calculate location in offline stage. For algorithms,34,35 the KNN-FCM hybrid algorithm is utilized in online stage, which can enhance the load of calculation in location stage, because a large amount of APs need to be selected by KNN and cluster by FCM. Therefore, a method which divides subarea by FCM in offline has been proposed.
For example, A Aksu and P Krishnamurthy 37 proposed a method based on subarea divisions, and Xu et al. 38 proposed a method based on the generated distance loss model through subarea divisions. However, the above two method inadequately considered the influence in different APs and failed to reduce the complexity of location calculation in online stage. Therefore, AP selection has been proposed.
According to experiments and theoretical analysis, there is often a part of SS of APs which does not change with the fingerprint localization environment. Therefore, AP selection can remove useless APs whose received signal SS is weak and reduce the dimension of fingerprint vectors.
Kushki et al.
39
discussed the importance of AP selection in locating stage and described the basic principles of correlation minimization among APs selection. Chen et al.
40
proposed a power-efficient technique known as CaDet which calculates the location through intelligent selection of APs. Zhu and Deng
41
proposed an AP selection method, known as neighborhood rough sets (NRS). However, this method is prone to omitting some valuable APs and has certain limitations as it considers only the neighborhood area. Zou et al.
42
proposed an AP selection method, which combines information gain and mutual information entropy to decrease the computation cost. Residual-ranking achieves higher accuracy of coordinate calculation through selecting APs of smaller difference between reference points and object points.
After AP selection, some localization technologies utilize KNN or fuzzy set to calculate location. The core idea of KNN is the calculation of Euclidean distance between the SS of APs in offline and the SS of APs in online, as well as the selection of k-nearest APs. It then calculates these location average or gives different weights according to distance to estimate the location of object. KNN can obtain higher accuracy of localization when the environment remains unchanged in both offline and online stages. However, its factors are usually diversified in localization environment, such as temperature, humidity, and weather, which leads to a change in the SS of APs and a reduction in the accuracy of localization. The fuzzy set and KNN are somewhat similar. However, KNN describes the far or near position of the SS of APs in offline and online through Euclidean distance, while fuzzy set describes it through the similarity of signals. First, fingerprint transform model (FTM) transforms the fingerprint information of APs into distance fingerprint information. Second, the location is calculated based on fuzzy set.
Methodology
Algorithm
As shown in Figure 3, fingerprint localization model is divided into two stages: offline and online. During the offline stage, the localization area is divided into different clusters by FCM, using the technique proposed in this article, and the binary classification for APs by k-means++ algorithm in each subarea to select available APs. During the online stage, subareas where the object points exist are selected using NN algorithm. Then, the coordinate of the target point is calculated according to RDFL algorithm.

Model of fingerprint localization.
FCM algorithm
Among the cluster algorithms based on objective function, the theory of the FCM algorithm is the most mature. It was derived from the optimization in hard
Suppose that a sample set
In formula (2),
In formula (3),
The sample set is divided into
K-means++ algorithm
K-means++ algorithm improves the k-means algorithm and solves the well-known problem of cluster center selection. It divides a given data set through a certain number of clusters. The algorithm aims to find the center point of a cluster by minimizing the distance between the cluster and members of the same cluster. The algorithm comprises the following steps in Table 1.
K-means++ clustering algorithm.
Fuzzy set theory
By the classic set theory,
LA Zadeh 43 proposed a concept of fuzzy set that describes the fuzziness of object by a degree of membership function whose value interval is [0, 1].
The degree of membership close to 1 indicates that the object is close to set
Suppose
Let
Subarea division
Suppose in this location area,
With our prior knowledge on signal information between reference points and APs, we record the reference points coordinate information
Those “FSKEP” is divided into different clusters and each clusters as a subarea by cluster algorithm. In this article, we utilize FCM to divide the fingerprint database into
The FCM cluster algorithm based on fingerprint database is divided into two main categories: one is fingerprint normalization and the other is subarea division. Fingerprint normalization changes the degree of membership of fingerprint from 0 and 1 to [0, 1]. After fingerprint normalization, FCM is utilized to divide fingerprint database into different clusters. The specific steps are as follows.
Fingerprint normalization
In general, the SS of wireless is between −100 and 0 db in indoor environment. If SS and location are imported directly as cluster characteristics, different characteristics of dimension correspond to different contribution for clustering. Suppose two reference points fingerprint
In formula (10), because difference measurement unit may cause other characteristics be ignored as its change is not obvious when we use coordinates of
Suppose reference point
In formula (11),
In formula (12),
Subarea division
FCM cluster algorithm has been improved based on objective function. In formula (13), data set
1. Initialize the number of clusters
2.
In formula (14),
3.
4. Determine whether to stop the algorithm. If
Select maximal degree of membership cluster as subclass of reference point
In formula (16),
In formula (17),
In formula (18),
AP selection
The fingerprint technique requires a fingerprint vector by a combination of geographical coordinate of the preference point and RSSI value received by different APs. In online stage, the location is estimated by comparing the measured SS value acquisition and the previously measured value. As previously mentioned, the more the number of APs is, the higher the dimension of fingerprint vector becomes. Nevertheless, not all of the APs can improve the localization accuracy; on the contrary, the fault data may be imported. As shown in Figure 4, the performance of SS values is between 5 preference points and 65 APs, and a mass of SS value of APs is weak.

RSSI between AP.
In Figure 4, APs can be divided into two sub-clusters: one is the APs of strong signal that works for location accuracy and the other is APs of weak signal that are abandoned. In this article, k-means++ algorithm has been utilized to divide AP into two sub-cluster which are the APs of strong signals and the APs of weak signals. We select those APs which fall under the category of the former sub-cluster as useful APs
In formula (19),
Pseudo-code of k-means++ algorithm.
Subarea selection
In the online location stage, we first match the measured RSSI sample with cluster center and then select the subarea that is most similar to the measured RSSI sample as the area of the sample. According to NN algorithm, the measured object point and clusters center distance are compared, and the area that has the closest distance with the object’s area is selected.
Suppose that the measured sample fingerprint is
In formula (22),
Fuzzy set localization
Fingerprint transform model
The indoor wireless signal transmission accords with logarithmic path loss model
In formula (23),
Suppose the distance between the two APs and the signal source is
Since two APs are in the same localization area, we suppose that
Convert formula (26) to formula (27)
As shown in formula (27),
The difference between
According to formulas (27) and (29), formula (30) is as follows
The form of formula (30) after normalization is as follows
Formula (31) represents the distance fingerprint of
Relative distance fuzzy localization
Suppose there are
As shown in formula (33), according to fingerprint conversion model, we can calculate the distance fingerprint
The degree of membership function of AP is defined as follows
According to formula (34), we could calculate the similarity between APs and object point, sort them in descending order and select the top
Experimental result
The experimental region, which includes a corridor, four offices, and eight classrooms, is approximately 860 m2, with five Aps deployed. 44 In every classroom, a reference point could receive signals from at least three APs on the current floor and several APs on the neighboring floors. For instance, reference points (blue points) in a room are separated by 3.7 m and the nearest distance between the reference points in current room and adjacent room is 2.6 m. According to Figure 5, the number of APs from 3 to 13 can be perceived in each position, and the average number is 7.

Layout of experimental context.
The experiment comprises three parts: subarea division, AP selection, and subarea selection. The specific procedure is shown in the following.
Subarea division
According to the proposed method, FM is divided into different subareas. As shown in Figures 6 and 7, the available number of clusters is 13 and 14, respectively. Comparing with Figures 5–7, we can find that Figure 6 has the best performance.

Division of cluster (

Division of cluster (
When the number of cluster is 14, we randomly select 68 test points and calculate their locations by KNN, fuzzy, and Bayesian algorithm. First, the locations of 68 test points are calculated by KNN (

Probability of error of KNN algorithm.
The distance of error is defined as the distance between calculated location and real location in localization stage. The probability of error is calculated as follows
where
Second, the locations of 68 test points are calculated by fuzzy algorithm. The probability of error of the two localization processes is compared in Figure 9, where one process includes subarea division, and the other excludes.

Probability of error of fuzzy algorithm.
Third, the locations of 68 test points are calculated by Bayesian algorithm. The probability of error of two localization processes is compared in Figure 10, where one process includes subarea division, and the other excludes.

Probability of error of Bayesian algorithm.
Fourth, the locations of 68 test points are calculated by KNN, fuzzy, and Bayesian algorithm with subarea division (the number of subareas is 14). The probability of error of these is compared in Figure 11, which indicates that the performance of fuzzy is better than others. The experimental results show that subarea division can improve the accuracy of localization.

Probability of error of KNN, fuzzy, and Bayesian algorithm.
AP selection
There are 68 test points selected randomly and 65 APs in Figure 5, and each test point perceives the number of APs differently. The experiment is divided into two parts: one is non-divide subarea in which the sample set consists of 68 test points and the other is divided subarea in which the sample set of each subarea consists of the test points distributed in the subarea, which the sum of test points in each subarea being 68. After that, the probability of error based on fuzzy algorithm is obtained.
First, four test points are selected randomly in the area without subarea divisions, and then the distribution of received SS is observed. As shown in Figure 12, the SS on the horizontal axis is dispersed, which indicates that the difference of APs in different position is greater. Therefore, binary clustering for APs calculated via k-means++ algorithm would lead to larger errors.

SS of four sampling points inner subarea.
As shown in Figure 13, APs are divided into two subclasses utilizing the k-means++ algorithm when location area is non-divided. The number of selected APs is 7, and KNN is applied to calculate location. The experimental results show that the localization accuracy of AP selection is lower than the one without AP selection. This is because that some APs are lost in the AP selection by k-means++ algorithm.

Probability of error of AP select.
As shown in Figure 14, four test points are selected randomly in the area with subarea divisions, and then the distribution of received SS is observed. The SS on the horizontal axis is intensive, which indicates that the difference of APs in different positions is not obvious. As shown in Figure 15, after AP selection, the number of useful APs is from 4 to 7 in each subareas, and the average number is 5.6.

SS of four sampling points inner subarea.

Bar of AP select inner subarea.
When the number of subareas is 14, locations of these 68 test points are calculated by KNN, fuzzy, and Bayesian algorithm. First, the locations of 68 test points are calculated by KNN (

Probability of error of KNN algorithm.
Second, the locations of 68 test points are calculated by fuzzy algorithm. The probability of error of the two localization processes is compared in Figure 17, where one process includes AP selection, and the other excludes.

Probability of error of fuzzy algorithm.
Third, the locations of 68 test points are calculated by Bayesian algorithm. The probability of error of the two localization processes is compared in Figure 18, where one process includes AP selection, and the other excludes.

Probability of error of Bayesian algorithm.
Fourth, the locations of 68 test points are calculated by KNN, fuzzy, and Bayesian algorithm with AP selection. The probability of error of these is compared in Figure 19, and the performance of fuzzy is better than others.

Probability of error of KNN, fuzzy, and Bayesian algorithm.
After subarea division and AP selection, the number of useful APs is reduced from 65 to 5.6. Table 3 shows that the localization time of KNN, fuzzy, and Bayesian algorithm are reduced by about 21%, 20%, and 13%, respectively.
Times statistics of around APs selection.
KNN: k-nearest neighbor; APs: access points.
Subarea selection
As mentioned above, the experimental area is divided into 14 subareas. The received SS of test point is normalized, and then the nearest distance is selected from the set containing the Euclidean distance between the normalized values and each subarea. Finally, the subarea corresponding to the nearest distance is considered as the area where test point exists.
In the experiment, 68 test points, distributed in different areas, are randomly selected and their residing subareas are calculated by NN and fuzzy algorithms. As shown in Figure 20, the number of error points both is 4 and the accuracy of subarea selection is up to 94%.

Result of subarea selection.
RDFL localization
As shown in Figure 21, the probability of error of fuzzy is similar to RDFL when the distance between the calculated location and real location is within 2 m. Meanwhile, the probability of error of fuzzy is lower than that of RDFL when the distance of error between calculated location and real location is greater than 2 m.

Probability of error of RDFL and fuzzy algorithms.
As shown in Table 4, the accuracy of RDFL is improved by 0.46 m compared to that of fuzzy set technology.
Average distance of error and average probability of error.
RDFL: relative distance fuzzy localization.
As shown in Figure 22, the probability of error of KNN which is based on FTM is lower than that of RDFL.

Probability of error of FTM-KNN and RDFL algorithm.
As shown in Table 5, in spite of the 0.12 m improvement in accuracy using RDFL, the average probability of error of RDFL is around the same as that of FTM-based KNN.
Average distance of error and average probability of error.
FTM: fingerprint transform model; KNN: k-nearest neighbor; RDFL: relative distance fuzzy localization.
As shown in Figure 23, the probability of error of RDFL is higher than that of other methods. As shown in Table 6, the average distance of error of the RDFL is reduced to 2.58 m, which is smaller than that of other localization methods.

Probability of error different algorithms.
Average distance of error and average probability of error.
RDFL: relative distance fuzzy localization; FTM: fingerprint transform model; KNN: k-nearest neighbor.
Conclusion
An indoor positioning model based on fingerprint has been designed. In this article, the proposed approach consists of FCM and k-means++ algorithm. During the offline stage, localization area is divided into different subareas utilizing FCM algorithm. Then, binary clustering for APs is calculated via k-means++ algorithm, after which useful APs in subareas are selected. During the online stage, target subarea is selected utilizing NN algorithm based on selected APs. After that, target location is calculated by RDFL algorithm which has also been proposed in this article. Experimental results demonstrate that the proposed method is more effective and more robust, and its performance is competitive against other state-of-the-art methods.
