Abstract
With the growth of the internet and the arrival of large volumes of data, the analysis of transactional data has become increasingly important in data mining, and clustering algorithms for transactional trade datasets have become an active research topic. Among them, the clustering with slope (CLOPE) algorithm is widely used because of its superior performance, lower memory use, and better clustering quality compared with other clustering algorithms. However, the quality of CLOPE's output depends on the order in which the data are input: different input sequences of the same dataset produce different clusterings, and some sequences lead to poor clusters. To address this problem, this paper analyzes the CLOPE algorithm in depth and proves theoretically that placing records with more items earlier in the input greatly improves the quality of the result. A procedure that preprocesses the dataset according to item similarity is proposed. Experimental results show that, with the proposed method, the algorithm produces clearly better clusterings and runs 10% faster than the traditional procedure. The resulting algorithm is effective and produces high-quality results for transactional datasets.
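To make the order dependence concrete, the following is a minimal sketch of the first (allocation) pass of CLOPE, using the standard CLOPE profit criterion with repulsion parameter r. It is not the paper's implementation. The names `Cluster`, `delta_add`, `clope_first_pass`, `profit`, and `longest_first` are hypothetical, transactions are assumed to be non-empty, and `longest_first` is only a simple stand-in for the observation that records with more items should come first. The paper's actual preprocessing is based on item similarity and is not specified in this abstract.

```python
from collections import Counter


class Cluster:
    """Running statistics CLOPE keeps per cluster."""

    def __init__(self):
        self.n = 0            # number of transactions in the cluster
        self.size = 0         # S(C): total item occurrences
        self.occ = Counter()  # item -> occurrence count

    @property
    def width(self):
        # W(C): number of distinct items in the cluster
        return len(self.occ)

    def add(self, t):
        self.n += 1
        self.size += len(t)
        self.occ.update(t)


def delta_add(cluster, t, r):
    """Change in S(C) * |C| / W(C)^r if transaction t joins the cluster."""
    new_size = cluster.size + len(t)
    new_width = cluster.width + sum(1 for item in t if item not in cluster.occ)
    new_val = new_size * (cluster.n + 1) / new_width ** r
    old_val = cluster.size * cluster.n / cluster.width ** r if cluster.n else 0.0
    return new_val - old_val


def clope_first_pass(transactions, r=2.0):
    """Assign each transaction, in input order, to the cluster with the largest gain."""
    clusters, labels = [], []
    for t in transactions:
        t = set(t)  # assumption: each transaction is a non-empty set of items
        candidates = clusters + [Cluster()]  # last candidate is a fresh cluster
        best = max(range(len(candidates)),
                   key=lambda i: delta_add(candidates[i], t, r))
        if best == len(clusters):
            clusters.append(candidates[best])
        candidates[best].add(t)
        labels.append(best)
    return labels, clusters


def profit(clusters, r=2.0):
    """CLOPE profit: sum of S(C)/W(C)^r * |C| over clusters, divided by total |C|."""
    total = sum(c.n for c in clusters)
    return sum(c.size / c.width ** r * c.n for c in clusters) / total


def longest_first(transactions):
    """Illustrative reordering only: records with more items come first."""
    return sorted(transactions, key=len, reverse=True)
```

Because each assignment depends on the cluster statistics accumulated so far, running `clope_first_pass(data)` and `clope_first_pass(longest_first(data))` on the same dataset can yield different partitions, and `profit` gives one way to compare them.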
