Abstract
Outlier detection is an important problem in data mining and machine learning. This paper proposes an information-entropy-based k-nearest-neighborhood relevant outlier factor algorithm that combines Shannon information theory with a triangle pruning strategy. The algorithm accounts for data points whose k-nearest neighbors lie on the edge of the range within a designated radius, and it considers the neighborhood influence on each point to address the problems of information concealment and submergence. Information entropy is used to compute weights that distinguish the importance of each attribute. Based on these attribute weights, the improved pruning strategy removes some inliers to obtain an outlier candidate dataset, reducing the computational complexity of the subsequent steps. Finally, using the weighted distances between objects in the candidate dataset and those in the original dataset, the algorithm computes the dissimilarity between each object and its k-nearest neighbors; the $r$ points with the highest dissimilarity are reported as outliers. Experimental results show that, compared to existing methods, the proposed approach improves the pruning and detection rates while maintaining the coverage rate.
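The entropy-weighted k-NN scoring described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the common entropy-weight normalization $w_j = (1 - H_j) / \sum_k (1 - H_k)$ over histogram-discretized attributes, uses the mean weighted k-NN distance as the dissimilarity score, and omits the triangle pruning step. All function names and parameters are hypothetical.

```python
import numpy as np

def entropy_weights(X, bins=10):
    """Assumed entropy-weight scheme: discretize each attribute into
    histogram bins, compute its normalized Shannon entropy H_j, and
    set w_j proportional to (1 - H_j), so more informative (less
    uniform) attributes receive larger weights."""
    n, d = X.shape
    H = np.empty(d)
    for j in range(d):
        counts, _ = np.histogram(X[:, j], bins=bins)
        p = counts[counts > 0] / n
        H[j] = -np.sum(p * np.log(p)) / np.log(bins)  # normalized to [0, 1]
    w = 1.0 - H
    if w.sum() == 0:                     # all attributes equally uninformative
        return np.full(d, 1.0 / d)
    return w / w.sum()

def knn_dissimilarity(X, k=5, weights=None):
    """Dissimilarity of each point: mean weighted Euclidean distance
    to its k nearest neighbors (a standard k-NN outlier score)."""
    w = np.ones(X.shape[1]) if weights is None else weights
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((w * diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)          # exclude self-distance
    knn = np.sort(D, axis=1)[:, :k]
    return knn.mean(axis=1)

def top_r_outliers(X, k=5, r=1):
    """Report the indices of the r points with the highest dissimilarity."""
    w = entropy_weights(X)
    scores = knn_dissimilarity(X, k=k, weights=w)
    return np.argsort(scores)[::-1][:r]
```

For example, on a tight Gaussian cluster plus one distant point, `top_r_outliers(X, k=5, r=1)` returns the index of the distant point, since its mean weighted distance to its 5 nearest neighbors dominates every in-cluster score.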